首页   按字顺浏览 期刊浏览 卷期浏览 Walking tree heuristics for biological string alignment, gene location, and phylogenies
Walking tree heuristics for biological string alignment, gene location, and phylogenies

 

作者: P. Cull,   J. L. Holloway,   J. D. Cavener,  

 

期刊: AIP Conference Proceedings  (AIP Available online 1999)
卷期: Volume 465, issue 1  

页码: 201-215

 

ISSN:0094-243X

 

年代: 1999

 

DOI:10.1063/1.58246

 

出版商: AIP

 

数据来源: AIP

 

摘要:

Basic biological information is stored in strings of nucleic acids (DNA, RNA) or amino acids (proteins). Teasing out the meaning of these strings is a central problem of modern biology. Matching and aligning strings brings out their shared characteristics. Although string matching is well-understood in the edit-distance model, biological strings with transpositions and inversions violate this model’s assumptions. We propose a family of heuristics calledwalking treesto align biologically reasonable strings. Both edit-distance and walking tree methods can locate specific genes within a large string when the genes’ sequences are given. When we attempt to match whole strings, the walking tree matches most genes, while the edit-distance method fails. We also give examples in which the walking tree matches substrings even if they have been moved or inverted. The edit-distance method was not designed to handle these problems. We include an example in which the walking tree “discovered” a gene. Calculating scores for whole genome matches gives a method for approximating evolutionary distance. We show two evolutionary trees for the picornaviruses which were computed by the walking tree heuristic. Both of these trees show great similarity to previously constructed trees. The point of this demonstration is that WHOLE genomes can be matched and distances calculated. The first tree was created on a Sequent parallel computer and demonstrates that the walking tree heuristic can be efficiently parallelized. The second tree was created using a network of work stations and demonstrates that there is suffient parallelism in the phylogenetic tree calculation that the sequential walking tree can be used effectively on a network. ©1999 American Institute of Physics.

 

点击下载:  PDF (1317KB)



返 回