1. A Bayesian Evolutionary Distance for Parametrically Aligned Sequences

Journal of Computational Biology, Volume 3, Issue 1, 1996, Pages 1-17
PANKAJ AGARWAL, DAVID J. STATES
Abstract:
There is an inherent relationship between the process of pairwise sequence alignment and the estimation of evolutionary distance. This relationship is explored and made explicit. Assuming an evolutionary model and given a specific pattern of observed base mismatches, the relative probabilities of evolution at each evolutionary distance are computed using a Bayesian framework. The mean or the median of this probability distribution provides a robust estimate of the central value. The evolutionary distance has traditionally been computed as zero for an observed homology of 20 bases with no mismatches; we prove that it is highly probable that the distance is greater than 0.01. The mean of the distribution is 0.047, which is a better estimate of the evolutionary distance. Bayesian estimates of the evolutionary distance incorporate arbitrary prior information about variable mutation rates both over time and along sequence position, thus requiring only a weak form of the molecular-clock hypothesis. The endpoints of the similarity between genomic DNA sequences are often ambiguous. The probability of evolution at each evolutionary distance can be estimated over the entire set of alignments by choosing the best alignment at each distance and the corresponding probability of duplication at that evolutionary distance. A central value of this distribution provides a robust evolutionary distance estimate. We provide an efficient algorithm for computing the parametric alignment, considering evolutionary distance as the only parameter. These techniques and estimates are used to infer the duplication history of the genomic sequence in C. elegans and in S. cerevisiae. Our results indicate that repeats discovered using a single scoring matrix show a considerable bias in subsequent evolutionary distance estimates.
ISSN:1066-5277
DOI:10.1089/cmb.1996.3.1
Year: 1996
Data source: MAL
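
The Bayesian reasoning summarized in the abstract of entry 1 can be illustrated with a minimal sketch. It assumes a Jukes-Cantor substitution model, a flat prior on the distance truncated at a hypothetical upper bound `d_max`, and plain grid integration. None of these choices is taken from the paper, so the values it produces need not match the 0.047 quoted in the abstract.

```python
import math

def jc_match_prob(d):
    """Probability that two homologous sites agree at distance d
    (expected substitutions per site) under the Jukes-Cantor model."""
    return 0.25 + 0.75 * math.exp(-4.0 * d / 3.0)

def posterior_summary(n_match, n_mismatch, d_max=2.0, steps=20000, threshold=0.01):
    """Return (posterior mean, P(d > threshold)) for the distance d.

    Assumptions (illustrative, not from the paper): flat prior on
    [0, d_max], trapezoid integration on a uniform grid.
    """
    h = d_max / steps
    total = 0.0      # normalizing constant
    first = 0.0      # integral of d * likelihood
    tail = 0.0       # integral of likelihood over d > threshold
    for i in range(steps + 1):
        d = i * h
        p = jc_match_prob(d)
        # Constant factors (e.g. 1/3 per mismatch) cancel in the posterior.
        like = p ** n_match * (1.0 - p) ** n_mismatch
        w = h / 2 if i in (0, steps) else h  # trapezoid weights
        total += w * like
        first += w * like * d
        if d > threshold:
            tail += w * like
    return first / total, tail / total

mean, prob_above = posterior_summary(n_match=20, n_mismatch=0)
print(f"posterior mean = {mean:.4f}, P(d > 0.01) = {prob_above:.3f}")
```

Even with no observed mismatches, the posterior under these assumptions puts most of its mass above zero, which is the qualitative point the abstract makes.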

2. Complete Families of Linear Invariants for Some Stochastic Models of Sequence Evolution, with and without the Molecular Clock Assumption

Journal of Computational Biology, Volume 3, Issue 1, 1996, Pages 19-31
MICHAEL D. HENDY, DAVID PENNY
Abstract:
For various models of sequence evolution, the set of linear functions of the frequencies of the nucleotide patterns forms a vector space, the invariant space. Here we distinguish between the model of nucleotide substitution, and the phylogenetic tree T describing the paths on which these changes occur. We describe a procedure to construct a basis of the invariant space for those models that are extensions of models incorporating Kimura's three substitution model of nucleotide change, including both the Jukes–Cantor and Cavender–Farris models. The dimension of the invariant space is determined, for those models where it is independent of the tree topology, as a function of the number of sequences. These are calculated where the nucleotide distribution at the root is unspecified, and both with, and without, the assumption of the molecular clock hypothesis. The invariants have a number of potential applications, including tree identification, and testing the fit of models (which could include the molecular clock) to sequence d…
ISSN:1066-5277
DOI:10.1089/cmb.1996.3.19
Year: 1996
Data source: MAL

3. Approximate Matching of Network Expressions with Spacers

Journal of Computational Biology, Volume 3, Issue 1, 1996, Pages 33-51
EUGENE W MYERS
Abstract:
Two algorithmic results are presented that are pertinent to the matching of patterns typically used by biologists to describe regions of macromolecular sequences that encode a given function. The first result is a threshold-sensitive algorithm for approximately matching both network and regular expressions. Network expressions are regular expressions that can be composed only from union and concatenation operators. Kleene closure (i.e., unbounded repetition) is not permitted. The algorithm is threshold-sensitive in that its performance depends on the threshold, k, of the number of differences allowed in an approximate match. This result generalizes the O(kn) expected-time algorithm of Ukkonen for approximately matching keywords. The second result concerns the problem of matching a pattern that is a network expression whose elements are approximate matches to network or regular expressions interspersed with specifiable distance ranges. For this class of patterns, it is shown how to determine a backtracking procedure whose order of evaluation is optimal in the sense that its expected time is minimal over all such procedures.
ISSN:1066-5277
DOI:10.1089/cmb.1996.3.33
Year: 1996
Data source: MAL
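
As background for the keyword case mentioned in the abstract of entry 3, the following is a minimal sketch of approximate keyword matching with at most k differences. It uses the classical column-by-column dynamic program, which takes time proportional to pattern length times text length. It is a baseline only: it is neither the threshold-sensitive algorithm of the paper nor Ukkonen's O(kn) expected-time method, and the function name `approx_match_ends` is hypothetical.

```python
def approx_match_ends(pattern, text, k):
    """Return 1-based end positions j in text such that some substring
    ending at j is within edit distance k of pattern.

    Baseline dynamic program (not the paper's algorithm): col[i] holds
    the minimum number of differences between pattern[:i] and any
    substring of text ending at the current position.
    """
    m = len(pattern)
    col = list(range(m + 1))  # before reading text: i deletions for pattern[:i]
    ends = []
    for j, ch in enumerate(text, start=1):
        prev_diag = col[0]    # value of row i-1 in the previous column
        col[0] = 0            # a match may start anywhere in the text
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            new = min(prev_diag + cost,  # match or substitution
                      col[i] + 1,        # extra text character
                      col[i - 1] + 1)    # missing pattern character
            prev_diag = col[i]
            col[i] = new
        if col[m] <= k:
            ends.append(j)
    return ends

print(approx_match_ends("ACGT", "TTACGTTT", 1))
```

Threshold-sensitive methods speed this up by only computing the part of each column whose values can still be at most k.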

4. Fast Protein Folding in the Hydrophobic–Hydrophilic Model within Three-Eighths of Optimal

Journal of Computational Biology, Volume 3, Issue 1, 1996, Pages 53-96
WILLIAM E. HART, SORIN C. ISTRAIL
Abstract:
We present performance-guaranteed approximation algorithms for the protein folding problem in the hydrophobic–hydrophilic model (Dill, 1985). Our algorithms are the first approximation algorithms in the literature with guaranteed performance for this model (Dill, 1994). The hydrophobic–hydrophilic model abstracts the dominant force of protein folding: the hydrophobic interaction. The protein is modeled as a chain of amino acids of length n that are of two types: H (hydrophobic, i.e., nonpolar) and P (hydrophilic, i.e., polar). Although this model is a simplification of more complex protein folding models, the protein folding structure prediction problem is notoriously difficult for this model. Our algorithms have linear (3n) or quadratic time and achieve a three-dimensional protein conformation that has a guaranteed free energy no worse than three-eighths of optimal. This result answers the open problem of Ngo et al. (1994) about the possible existence of an efficient approximation algorithm with guaranteed performance for protein structure prediction in any well-studied model of protein folding. By achieving speed and near-optimality simultaneously, our algorithms rigorously capture salient features of the recently proposed framework of protein folding by Sali et al. (1994). Equally important, the final conformations of our algorithms have significant secondary structure (antiparallel sheets, β-sheets, compact hydrophobic core). Furthermore, hypothetical folding pathways can be described for our algorithms that fit within the framework of diffusion-collision protein folding proposed by Karplus and Weaver (1979). Computational limitations of algorithms that compute the optimal conformation have restricted their applicability to short sequences (length ≤ 90). Because our algorithms trade computational accuracy for speed, they can construct near-optimal conformations in linear time for sequences of an…
ISSN:1066-5277
DOI:10.1089/cmb.1996.3.53
Year: 1996
Data source: MAL
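
To make the objective in the abstract of entry 4 concrete, here is a minimal sketch of how a conformation is commonly scored in the hydrophobic–hydrophilic model: count pairs of H residues that are lattice neighbours but not adjacent along the chain, and take the energy as the negative of that count. The sketch assumes a cubic lattice with integer coordinates. The function name `hp_contacts` is hypothetical, and the code only evaluates a given fold; it does not implement the approximation algorithms of the paper.

```python
def hp_contacts(sequence, coords):
    """Count H-H contacts of a lattice conformation.

    sequence: string over {'H', 'P'}.
    coords:   list of integer (x, y, z) lattice points, one per residue.
    Assumes (illustratively) a cubic lattice and a self-avoiding chain;
    neither is checked here.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    n = len(sequence)
    contacts = 0
    for i in range(n):
        if sequence[i] != "H":
            continue
        # Start at i + 2 so chain neighbours are not counted as contacts.
        for j in range(i + 2, n):
            if sequence[j] != "H":
                continue
            dist = sum(abs(a - b) for a, b in zip(coords[i], coords[j]))
            if dist == 1:  # unit lattice distance
                contacts += 1
    return contacts

# A four-residue chain folded around a unit square.
fold = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(-hp_contacts("HPPH", fold))  # energy of this conformation
```

An approximation guarantee such as "three-eighths of optimal" is a statement about this kind of contact count relative to the best achievable count for the same sequence.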

5. A Mathematical Solution to a Network Designing Problem

Journal of Computational Biology, Volume 3, Issue 1, 1996, Pages 97-141
YOSHIKANE TAKAHASHI
Abstract:
One of the major open issues in neural network research includes a Network Designing Problem (NDP): find a polynomial-time procedure that produces minimal structures (the minimum intermediate size, thresholds and synapse weights) of multilayer threshold feed-forward networks so that they can yield outputs consistent with given sample sets of input–output data. The NDP includes as a subproblem a Network Training Problem (NTP) where the intermediate size is given. The NTP has been studied mainly by use of iterative algorithms of network training. This paper, making use of both rate distortion theory in information theory and linear algebra, solves the NDP mathematically rigorously. On the basis of this mathematical solution, it furthermore develops a mathematical solution procedure to the NDP that computes the minimal structure straightforwardly from the sample set. The procedure precisely attains the minimum intermediate size, although its computational time complexity can be of nonpolynomial order at worst cases. The paper also refers to a polynomial-time shortcut of the procedure for practical use that can reach an approximate minimum intermediate size with its error measurable. The shortcut, when the intermediate size is prespecified, reduces to a promising alternative as well to current network training algorithms to the NT…
ISSN:1066-5277
DOI:10.1089/cmb.1996.3.97
Year: 1996
Data source: MAL

6. Probabilistic Learning in Immune Network: Weighted Tree Matching Model

Journal of Computational Biology, Volume 3, Issue 1, 1996, Pages 143-162
RAJANI R. JOSHI, K. KRISHNANAND
Abstract:
Adaptive learning properties (of clonal selection and affinity maturation) in the immune network model are investigated in this paper under a nonlinear data structural representation of the involved molecules. Weighted trees are constructed to model the multiple paratopes/epitopes on the antibodies/antigens. Parallel computing experiments are carried out for the canonical coding of these trees and the corresponding multiple matching interactions. Our experiments on real data have shown significant results on the cognitive properties of the immune network. These and other computational results are presented along with a discussion of future applications.
ISSN:1066-5277
DOI:10.1089/cmb.1996.3.143
Year: 1996
Data source: MAL

7. Improving Prediction of Protein Secondary Structure Using Structured Neural Networks and Multiple Sequence Alignments

Journal of Computational Biology, Volume 3, Issue 1, 1996, Pages 163-183
SØREN KAMARIC RIIS, ANDERS KROGH
Abstract:
The prediction of protein secondary structure by use of carefully structured neural networks and multiple sequence alignments has been investigated. Separate networks are used for predicting the three secondary structures α-helix, β-strand, and coil. The networks are designed using a priori knowledge of amino acid properties with respect to the secondary structure and the characteristic periodicity in α-helices. Since these single-structure networks all have less than 600 adjustable weights, overfitting is avoided. To obtain a three-state prediction of α-helix, β-strand, or coil, ensembles of single-structure networks are combined with another neural network. This method gives an overall prediction accuracy of 66.3% when using 7-fold cross-validation on a database of 126 nonhomologous globular proteins. Applying the method to multiple sequence alignments of homologous proteins increases the prediction accuracy significantly to 71.3% with corresponding Matthew's correlation coefficients Cα = 0.59, Cβ = 0.52, and Cc = 0.50. More than 72% of the residues in the database are predicted with an accuracy of 80%. It is shown that the network outputs can be interpreted as estimated probabilities of correct prediction, and, therefore, these numbers indicate which residues are predicted with high conf…
ISSN:1066-5277
DOI:10.1089/cmb.1996.3.163
Year: 1996
Data source: MAL
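
The per-class correlation coefficients quoted in the abstract of entry 7 (Cα, Cβ, Cc) are Matthews correlation coefficients. A minimal sketch of the standard definition, computed from the counts of a one-class-versus-rest confusion table, follows. The function name `matthews_cc` and the convention of returning 0.0 when the denominator vanishes are assumptions made for illustration, not details taken from the paper.

```python
import math

def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient for one class versus the rest.

    tp, tn, fp, fn: counts of true positives, true negatives,
    false positives, and false negatives.
    Returns 0.0 when any marginal is empty (a common convention,
    assumed here).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical counts for a helix-versus-rest evaluation.
print(round(matthews_cc(tp=40, tn=45, fp=5, fn=10), 3))
```

The coefficient ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (perfect prediction), which makes it more informative than raw accuracy when the three classes are unevenly populated.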

8. A Simple Flexible Program for the Computational Analysis of Amino Acyl Residue Distribution in Proteins: Application to the Distribution of Aromatic versus Aliphatic Hydrophobic Amino Acids in Transmembrane α-Helical Spanners of Integral Membrane Transport Proteins

Journal of Computational Biology, Volume 3, Issue 1, 1996, Pages 185-190
SIMON TSANG, MILTON H. SAIER
Abstract:
We describe a simple, flexible program (AAD) with a primary function of depicting the distribution of aliphatic and aromatic amino acid residues along the linear aligned sequence of a family of homologous proteins and a secondary function of depicting the distribution of all amino acids along the same linear sequence. The program is used to examine the distribution of aromatic versus aliphatic residues in representative well-characterized families of polytopic membrane proteins. Many but not all such protein families are shown to exhibit a predominance of aliphatic residues in the central regions of their transmembrane spanners but a predominance of aromatic residues at the peripheries of their spanners. We propose that this distribution stabilizes the hydrophobic–hydrophilic interface and renders the centers of these integral membrane proteins more fluid than their peripherie…
ISSN:1066-5277
DOI:10.1089/cmb.1996.3.185
Year: 1996
Data source: MAL

9. Integrated Access to Metabolic and Genomic Data

Journal of Computational Biology, Volume 3, Issue 1, 1996, Pages 191-212
PETER D. KARP, SUZANNE PALEY
Abstract:
The EcoCyc system consists of a knowledge base (KB) that describes the genes and intermediary metabolism of Escherichia coli, and a graphical user interface (GUI) for accessing that knowledge. This paper addresses two problems: How can we create a GUI that provides integrated access to metabolic and genomic data? We describe the design and implementation of visual presentations that closely mimic those found in the biology literature, and that offer hypertext navigation among related entities, and multiple views of the same entity. We employ a frame knowledge representation system (FRS) called HyperTHEO to manage the EcoCyc knowledge base. Among the advantages of FRSs are an expressive data model for capturing the complexities of biological information, and schema-evolution capabilities that facilitate the constant schema changes that biological databases tend to undergo. HyperTHEO also includes rule-based inference facilities that are the foundation of expert systems, a constraint language for maintaining data integrity, and a declarative query language. A graphic KB editor and browser allow the EcoCyc developers to interactively inspect and modify this evolving KB.
ISSN:1066-5277
DOI:10.1089/cmb.1996.3.191
Year: 1996
Data source: MAL