|
1. |
Special RECOMB99 Issue |
|
Journal of Computational Biology,
Volume 6,
Issue 3-4,
1999,
Page 279-279
Sorin Istrail,
Preview
|
PDF (40KB)
|
|
ISSN:1066-5277
DOI:10.1089/106652799318265
出版商:Mary Ann Liebert, Inc.
年代:1999
数据来源: MAL
|
2. |
Clustering Gene Expression Patterns |
|
Journal of Computational Biology,
Volume 6,
Issue 3-4,
1999,
Page 281-297
Amir Ben-Dor,
Ron Shamir,
Zohar Yakhini,
Preview
|
PDF (728KB)
|
|
摘要:
Recent advances in biotechnology allow researchers to measure expression levels for thousands of genes simultaneously, across different conditions and over time. Analysis of data produced by such experiments
offers potential insight into gene function and regulatory mechanisms. A key step in the analysis of gene expression data is the detection of groups of genes that manifest similar expression patterns. Thecorresponding algorithmic problem is to cluster multicondition gene expression patterns. In this paper we describe a novel clustering algorithm that was developed for analysis of gene expression data. We
define an appropriate stochastic error model on the input, and prove that under the conditions of the model, the algorithm recovers the cluster structure with high probability. The running time of the algorithmon an n-gene dataset is O{n2[log(n)]c}. We also present a practical heuristic based on the same algorithmic ideas. The heuristic was implemented and its performance is demonstrated
on simulated data and on real gene expression data, with very promising results.
ISSN:1066-5277
DOI:10.1089/106652799318274
出版商:Mary Ann Liebert, Inc.
年代:1999
数据来源: MAL
|
3. |
Performance of Threading Scoring Functions Designed Using New Optimization Method |
|
Journal of Computational Biology,
Volume 6,
Issue 3-4,
1999,
Page 299-311
Jadwiga R. Biekowska,
Robert G. Rogers,
Temple F. Smith,
Preview
|
PDF (225KB)
|
|
摘要:
We present a new procedure for optimization of a threading scoring function. A scoring function is usually formulated in terms of the structural environment states that describe the protein fold model.
We propose a method for the optimal selection of those structural environment states that naturally follows from the probabilistic description of the threading problem and is done prior to threading experiments.We demonstrate the selection of the optimal structural environment states for the solvent exposure of the amino acid position, and present the results of threading experiments performed using scoring functions
designed with and without the optimization of the structural environment states. These results confirm that the optimal scoring function predicts the sequence-to-structure alignments most accurately. Threadingexperiments performed with 15 optimally designed scoring functions show that the correlation coefficient between the information content of the amino acid distribution that determines the scoring function
and the accuracy of the optimal sequence-to-structure alignment is 0.94.
ISSN:1066-5277
DOI:10.1089/106652799318283
出版商:Mary Ann Liebert, Inc.
年代:1999
数据来源: MAL
|
4. |
Fast Detection of Common Geometric Substructure in Proteins |
|
Journal of Computational Biology,
Volume 6,
Issue 3-4,
1999,
Page 313-325
L. Paul Chew,
Dan Huttenlocher,
Klara Kedem,
Jon Kleinberg,
Preview
|
PDF (237KB)
|
|
摘要:
We consider the problem of identifying common three-dimensional substructures between proteins. Our method is based on comparing the shape of the-carbon backbone structures of the proteins in order
to find three-dimensional (3D) rigid motions that bring portions of the geometric structures into correspondence. We propose a geometric representation of protein backbone chains that is compact yet allowsfor similarity measures that are robust against noise and outliers. This representation encodes the structure of the backbone as a sequence of unit vectors, defined by each adjacent pair of-carbons.
We then define a measure of the similarity of two protein structures based on the root mean squared (RMS) distance between corresponding orientation vectors of the two proteins. Our measure has severaladvantages over measures that are commonly used for comparing protein shapes, such as the minimum RMS distance between the 3D positions of corresponding atoms in two proteins. A key advantage is that this
new measure behaves well for identifying common substructures, in contrast with position-based measures where the nonmatching portions of the structure dominate the measure. At the same time, it avoidsthe quadratic space and computational difficulties associated with methods based on distance matrices and contact maps. We show applications of our approach to detecting common contiguous substructures
in pairs of proteins, as well as the more difficult problem of identifying common protein domains (i.e., larger substructures that are not necessarily contiguous along the protein chain).
ISSN:1066-5277
DOI:10.1089/106652799318292
出版商:Mary Ann Liebert, Inc.
年代:1999
数据来源: MAL
|
5. |
De NovoPeptide Sequencing via Tandem Mass Spectrometry |
|
Journal of Computational Biology,
Volume 6,
Issue 3-4,
1999,
Page 327-342
Vlado Dank,
Theresa A. Addona,
Karl R. Clauser,
James E. Vath,
Pavel A. Pevzner,
Preview
|
PDF (680KB)
|
|
摘要:
Peptide sequencing via tandem mass spectrometry (MS/MS) is one of the most powerful tools in proteomics for identifying proteins. Because complete genome sequences are accumulating rapidly, the recent trend
in interpretation of MS/MS spectra has been database search. However,de novoMS/MS spectral interpretation remains an open problem typically involving manual interpretation by expert mass spectrometrists.
We have developed a new algorithm, SHERENGA, forde novointerpretation that automatically learns fragment ion types and intensity thresholds from a collection of test spectra generated from any
type of mass spectrometer. The test data are used to construct optimal path scoring in the graph representations of MS/MS spectra. A ranked list of high scoring paths corresponds to potential peptide sequences.SHERENGA is most useful for interpreting sequences of peptides resulting from unknown proteins and for validating the results of database search algorithms in fully automated, high-throughput peptide sequencing.
ISSN:1066-5277
DOI:10.1089/106652799318300
出版商:Mary Ann Liebert, Inc.
年代:1999
数据来源: MAL
|
6. |
Evolution of Metabolisms: A New Method for the Comparison of Metabolic Pathways Using Genomics Information |
|
Journal of Computational Biology,
Volume 6,
Issue 3-4,
1999,
Page 343-360
Christian V. Forst,
Klaus Schulten,
Preview
|
PDF (364KB)
|
|
摘要:
The abundance of information provided by completely sequenced genomes defines a starting point for new insights in the multilevel organization of organisms and their evolution. At the lowest level enzymes
and other protein complexes are formed by aggregating multiple polypeptides. At a higher level enzymes group conceptually into metabolic pathways as part of a dynamic information-processing system, andsubstrates are processed by enzymes yielding other substrates. A method based on a combination of sequence information with graph topology of the underlying pathway is presented. With this approach pathways
of different organisms are related to each other by phylogenetic analysis, extending conventional phylogenetic analysis of individual enzymes. The new method is applied to pathways related to electron transferand to the Krebs citric acid cycle. In addition to providing a more comprehensive understanding of similarities and differences between organisms, this method indicates different evolutionary rates between
substrates and enzymes.
ISSN:1066-5277
DOI:10.1089/106652799318319
出版商:Mary Ann Liebert, Inc.
年代:1999
数据来源: MAL
|
7. |
Optimal Reconstruction of a Sequence from its Probes |
|
Journal of Computational Biology,
Volume 6,
Issue 3-4,
1999,
Page 361-368
Alan M. Frieze,
Franco P. Preparata,
Eli Upfal,
Preview
|
PDF (189KB)
|
|
摘要:
An important combinatorial problem, motivated by DNA sequencing in molecular biology, is the reconstruction of a sequence over a small finite alphabet from the collection of its probes (the sequencespectrum),
obtained by sliding a fixed sampling pattern over the sequence. Such construction is required for Sequencing-by-Hybridization (SBH), a novel DNA sequencing technique based on an array (SBH chip) of shortnucleotide sequences (probes). Once the sequence spectrum is biochemically obtained, a combinatorial method is used to reconstruct the DNA sequence from its spectrum.Since technology limits
the number of probes on the SBH chip, a challenging combinatorial question is the design of a smallest set of probes that can sequence an arbitrary DNA string of a given length. We present in this worka novel probe design, crucially based on the use of universal bases [bases that bind to any nucleotide (Loakes and Brown, 1994)] that drastically improves the performance of the SBH process and asymptoticallyapproaches the information-theoretic bound up to a constant factor. Furthermore, the sequencing algorithm we propose is substantially simpler than the Eulerian path method used in previous solutions of
this problem.
ISSN:1066-5277
DOI:10.1089/106652799318328
出版商:Mary Ann Liebert, Inc.
年代:1999
数据来源: MAL
|
8. |
Disk-Covering, a Fast-Converging Method for Phylogenetic Tree Reconstruction |
|
Journal of Computational Biology,
Volume 6,
Issue 3-4,
1999,
Page 369-386
Daniel H. Huson,
Scott M. Nettles,
Tandy J. Warnow,
Preview
|
PDF (268KB)
|
|
摘要:
The evolutionary history of a set of species is represented by aphylogenetic tree, which is a rooted, leaf-labeled tree, where internal nodes represent ancestral species and the leaves represent
modern day species. Accurate (or even boundedly inaccurate) topology reconstructions of large and divergent trees from realistic length sequences have long been considered one of the major challenges insystematic biology. In this paper, we present a simple method, theDisk-Covering Method(DCM), whichbooststhe performance of base phylogenetic methods under various Markov models of evolution.
We analyze the performance of DCM-boosted distance methods under the Jukes-Cantor Markov model of biomolecular sequence evolution, and prove that for almost all trees,polylogarithmiclength sequences
suffice for complete accuracy with high probability, while polynomial length sequences always suffice. We also provide an experimental study based upon simulating sequence evolution on model trees. Thisstudy confirms substantial reductions in error rates at realistic sequence lengths.
ISSN:1066-5277
DOI:10.1089/106652799318337
出版商:Mary Ann Liebert, Inc.
年代:1999
数据来源: MAL
|
9. |
Efficient Algorithms for Protein Sequence Design and the Analysis of Certain Evolutionary Fitness Landscapes |
|
Journal of Computational Biology,
Volume 6,
Issue 3-4,
1999,
Page 387-404
Jon M. Kleinberg,
Preview
|
PDF (352KB)
|
|
摘要:
Protein sequence designis a natural inverse problem to protein structure prediction: given atarget structurein three dimensions, we wish to design an amino acid sequence that is likely
fold to it. A model of Sun, Brem, Chan, and Dill casts this problem as an optimization on a space of sequences of hydrophobic (H) and polar (P) monomers; the goal is to find a sequence that achieves a densehydrophobic core with few solvent-exposed hydrophobic residues. Sunet al. developed a heuristic method to search the space of sequences, without a guarantee of optimality or near-optimality; Hart
subsequently raised the computational tractability of constructing an optimal sequence in this model as an open question. Here we resolve this question by providing an efficient algorithm to construct optimalsequences; our algorithm has a polynomial running time, and performs very efficiently in practice. We illustrate the implementation of our method on structures drawn from the Protein Data Bank. We also
consider extensions of the model to larger amino acid alphabets, as a way to overcome the limitations of the binary H/P alphabet. We show that for a natural class of arbitrarily large alphabets, it remainspossible to design optimal sequences efficiently. Finally, we analyze some of the consequences of this sequence design model for the study of evolutionary fitness landscapes. A given target structure may
have many sequences that are optimal in the model of Sunet al.; following a notion raised by the work of J. Maynard Smith, we can ask whether these optimal sequences are "connected" by successive
point mutations. We provide a polynomial-time algorithm to decide this connectedness property, relative to a given target structure. We develop the algorithm by first solving an analogous problem expressedin terms ofsubmodular functions, a fundamental object of study in combinatorial optimization.
ISSN:1066-5277
DOI:10.1089/106652799318346
出版商:Mary Ann Liebert, Inc.
年代:1999
数据来源: MAL
|
10. |
An Anytime Local-to-Global Optimization Algorithm for Protein Threading in O(m22) Space |
|
Journal of Computational Biology,
Volume 6,
Issue 3-4,
1999,
Page 405-418
Richard H. Lathrop,
Preview
|
PDF (284KB)
|
|
摘要:
This paper describes a novel anytime branch-and-bound or best-first threading search algorithm for gapped block protein sequence-structure alignment with general sequence residue pair interactions. The
new algorithm (1) returns a good approximate answer quickly, (2) iteratively improves that answer to the global optimum if allowed more time, (3) eventually produces a proof that the final answer foundis indeed the global optimum, and (4) always terminates correctly within a bounded number of steps if allowed sufficient space and time. It runs in polynomial space, which is asymptotically dominated by
the O(m22) space required by the lower bound computation. Using previously published data sets and the Bryant-Lawrence (1993) objective function, the algorithm found the true
(proven) global optimum in less than 5 min in all search spaces size 1025or smaller (sequences to 478 residues), and a putative (not guaranteed) optimum in less than 5 hr in all search spaces
size 1060or smaller (sequences to 793 residues, cores to 42 secondary structure segments). The threading in the largest case studied was eventually proven to be globally optimal; the corresponding
search speed in that case was the equivalent of 1.5x1056threadings/sec, a speed-up exceeding 1025over previously published batch branch-and-bound speeds, and exceeding 1050over previously published exhaustive search speeds, using the same objective function and threading paradigm. Implementation-independent measures of search efficiency are defined for equivalent branching
factor, depth, and probability of success per draw; empirical data on these measures are given. The general approach should apply to other alignment methodologies and search methods that use a divide-and-conquerstrategy.
ISSN:1066-5277
DOI:10.1089/106652799318355
出版商:Mary Ann Liebert, Inc.
年代:1999
数据来源: MAL
|
|