1. |
Robust Proofs of NP-Hardness for Protein Folding: General Lattices and Energy Potentials |
|
Journal of Computational Biology,
Volume 4,
Issue 1,
1997,
Page 1-22
WILLIAM E. HART,
SORIN ISTRAIL,
Preview
|
PDF (6459KB)
|
|
摘要:
ABSTRACTThis paper addresses the robustness of intractability arguments for simplified models of protein folding that use lattices to discretize the space of conformations that a protein can assume. We present two generalized NP-hardness results. The first concerns the intractability of protein folding independent of the lattice used to define the discrete protein-folding model. We consider a previously studied model and prove that foranyreasonable lattice the protein-structure prediction problem is NP-hard. The second hardness result concerns the intractability of protein folding for a class of energy formulas that contains a broad range of mean force potentials whose form is similar to commonly used pair potentials (e.g., the Lennard-Jones potential). We prove that protein-structure prediction is NP-hard for any energy formula in this class. These are the first robust intractability results that identify sources of computational complexity of protein-structure prediction that transcend particular problem formulations.
ISSN:1066-5277
DOI:10.1089/cmb.1997.4.1
年代:1997
数据来源: MAL
|
2. |
Towards Integration of Multiple Alignment and Phylogenetic Tree Construction |
|
Journal of Computational Biology,
Volume 4,
Issue 1,
1997,
Page 23-34
MARTIN VINGRON,
ARNDT VON HAESELER,
Preview
|
PDF (1373KB)
|
|
摘要:
ABSTRACTA central problem in the study of molecular evolution is the reconstruction of the history of a set of biological sequences in the form of a phylogenetic tree. One step in calculating this tree is the computation of a multiple alignment. Most existing approaches treat the two problems of multiple alignment and tree construction as separate while in fact they influence each other. Based on three-way alignments of pre-aligned groups of sequences we adapt a commonly used tree construction procedure to produce both tree and multiple alignment simultaneously. In contrast to existing iterative algorithms the new method can change alignments made early in the course of the computation at a later stage. A sufficient criterion to prevent the introduction of edges with negative length reduces the number of three-way alignments that need to be computed. Applications of the new approach to the alignment of protein and of nucleic acid sequences are presented.
ISSN:1066-5277
DOI:10.1089/cmb.1997.4.23
年代:1997
数据来源: MAL
|
3. |
Central Limit Theorem for Renewal Theory for Several Patterns |
|
Journal of Computational Biology,
Volume 4,
Issue 1,
1997,
Page 35-44
MIROSLAV S. TANUSHEV,
RICHARD ARRATIA,
Preview
|
PDF (921KB)
|
|
摘要:
ABSTRACTWe prove a joint central limit theorem for the vector of counts of nonoverlapping occurrences ofmgiven words as competing renewals. Our underlying model is ani.i.d.sequence over a finite alphabet. The motivation involves restriction enzymes in DNA sequences. We give a simple explicit formula for the limit covariance. This is in terms of the matrix of overlap-matching polynomials, following works of Guibas and Odlyzko (1980), of Breenet al. (1985), and of Biggins and Cannings (1987). The corresponding central limit theorem for counts of overlapping occurrences, rather than competing renewals, was derived by Lundstrom (1990). The above is a special case of a general situation of competing renewals in which occurrences of each type individually form a renewal process, and the individual processes interact in such a way that occurrences of either of two given types also form a renewal process. There is a simple expression for the limit covariance in this general case, involving only the means and variances for each type.
ISSN:1066-5277
DOI:10.1089/cmb.1997.4.35
年代:1997
数据来源: MAL
|
4. |
Score Distributions for Simultaneous Matching to Multiple Motifs |
|
Journal of Computational Biology,
Volume 4,
Issue 1,
1997,
Page 45-59
TIMOTHY L. BAILEY,
MICHAEL GRIBSKOV,
Preview
|
PDF (1720KB)
|
|
摘要:
ABSTRACTSeveral computer algorithms now exist for discovering multiple motifs (expressed as weight matrices) that characterize a family of protein sequences known to be homologous. This paper describes a method for performing similarity searches of protein sequence databases using such a group of motifs. By simultaneously using all the motifs that characterize a protein family, the sensitivity and specificity of the database search are increased. We define the p-value for a target sequence to be the probability of a random sequence of the same length scoring as well or better in comparison to all the motifs that characterize the family. (The p-value of a database search can be determined from this value and the size of the database.) We show that estimating the distribution of single motif scores by a Gaussian extreme value distribution is insufficiently accurate to provide a useful estimate of the p-value, but that this deficiency can be corrected by reestimating the parameters of the underlying Gaussian distribution from observed scores for comparison of a given motif and sequence database. These parameters are used to calculate a "reduced variate" which has a Gumbel limiting distribution. Multiple motif scores are combined to give a single p-value by using the sum of the reduced variates for the motif scores as the test statistic. We give a computationally efficient approximation to the distribution of the sum of independent Gumbel random variables and verify experimentally that it closely approximates the distribution of the test statistic. Experiments on pseudorandom sequences show that the approximated p-values are conservative, so the significance of high scores in database searches will not be overstated. Experiments with real protein sequences and motifs identified by the MEME algorithm show that determining an overall p-value based on the combination of multiple motifs gives significantly better database search results than using p-values of single motifs.
ISSN:1066-5277
DOI:10.1089/cmb.1997.4.45
年代:1997
数据来源: MAL
|
5. |
Coverage Processes in Physical Mapping by Anchoring Random Clones |
|
Journal of Computational Biology,
Volume 4,
Issue 1,
1997,
Page 61-82
SOPHIE SCHBATH,
Preview
|
PDF (2091KB)
|
|
摘要:
ABSTRACTThe aim of this paper is to provide general results for predicting progress in a physical mapping project by anchoring random clones, when clones and anchors are not homogeneously distributed along the genome. A complete physical map of the DNA of an organism consists of overlapping clones spanning the entire genome. Several schemes can be used to construct such a map, depending on the way that clones overlap. We focus here on the approach consisting of assembling clones sharing a common random short sequence called an anchor. Some mathematical analyses providing statistical properties of anchored clones have been developed in the stationary case. Modeling the clone and anchor processes as nonhomogeneous Poisson processes provides such an analysis in a general nonstationary framework. We apply our results to two natural nonhomogeneous models to illustrate the effect of inhomogeneity. This study reveals that using homogeneous processes for clones and anchors provides an overly optimistic assessment of the progress of the mapping project.
ISSN:1066-5277
DOI:10.1089/cmb.1997.4.61
年代:1997
数据来源: MAL
|
6. |
Generating Programs for Predicting the Activity of Functional Sites |
|
Journal of Computational Biology,
Volume 4,
Issue 1,
1997,
Page 83-90
MICHAEL P. PONOMARENKO,
ANNA N. KOLCHANOVA,
NIKOLAY A. KOLCHANOV,
Preview
|
PDF (828KB)
|
|
摘要:
ABSTRACTThe computer system ACTIVITY is intended for generating programs with which to predict the activity of functional sites by nucleotide sequences. ACTIVITY analyzes a basis set of nucleotide sequences with known activity. The novelty of this approach is that Zadeh's fuzzy logic and decision-making theory have been employed for determining the best "sequence → activity" regression. The best one thus determined is then transformed into the text of a program with which the activity for any nucleotide sequence is to be predicted. Testing with independent data has proved this prediction reliable. We have compared our approach with the two commonly used on identical data sets to find the ACTIVITY-generated programs quite competitiv
ISSN:1066-5277
DOI:10.1089/cmb.1997.4.83
年代:1997
数据来源: MAL
|