id:"swepub:oai:gup.ub.gu.se/253129"

Clustering protein sequences--structure prediction by transitive homology.

Bolten, E (author)
Schliep, Alexander, 1967 (author)
Gothenburg University, Department of Computer Science and Engineering, Computing Science (GU)
Schneckener, S (author)
Schomburg, D (author)
Schrader, R (author)
2001
English.
In: Bioinformatics (Oxford, England). ISSN 1367-4803. 17:10, pp. 935-41
  • Journal article (peer-reviewed)
Abstract
It is widely believed that for two proteins A and B a sequence identity above some threshold implies structural similarity due to a common evolutionary ancestor. Since this is only a sufficient, but not a necessary condition for structural similarity, the question remains what other criteria can be used to identify remote homologues. Transitivity refers to the concept of deducing a structural similarity between proteins A and C from the existence of a third protein B, such that A and B as well as B and C are homologues, as ascertained if the sequence identity between A and B as well as that between B and C is above the aforementioned threshold. It is not fully understood if transitivity always holds and whether transitivity can be extended ad infinitum.

We developed a graph-based clustering approach, where transitivity plays a crucial role. We determined all pair-wise similarities for the sequences in the SwissProt database using the Smith-Waterman local alignment algorithm. This data was transformed into a directed graph, where protein sequences constitute vertices. A directed edge was drawn from vertex A to vertex B if the sequences A and B showed similarity, scaled with respect to the self-similarity of A, above a fixed threshold. Transitivity was important in the clustering process, as intermediate sequences were used, limited though by the requirement of having directed paths in both directions between proteins linked over such sequences. The length dependency of the scaling of the alignment scores, implied by the self-similarity, appears to be an effective criterion to avoid clustering errors due to multi-domain proteins. To deal with the resulting large graphs we have developed an efficient library. Methods include the novel graph-based clustering algorithm capable of handling multi-domain proteins and cluster comparison algorithms.

Structural Classification of Proteins (SCOP) was used as an evaluation data set for our method, yielding a 24% improvement over pair-wise comparisons in terms of detecting remote homologues.

The software is available to academic users on request from the authors.

e.bolten@science-factory.com; schliep@zpr.uni-koeln.de; s.schneckener@science-factory.com; d.schomburg@uni-koeln.de; schrader@zpr.uni-koeln.de

http://www.zaik.uni-koeln.de/~schliep/ProtClust.html
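
The graph construction described in the abstract can be illustrated with a small sketch. This is not the authors' software. It is one possible reading, in which the "directed paths in both directions" requirement is taken to mean strongly connected components of the directed graph. The function names, the toy scores and the threshold of 0.5 are assumptions made for illustration. Alignment scores are assumed to be precomputed (for example by Smith-Waterman) and symmetric.

```python
def build_graph(self_score, pair_score, threshold):
    """Directed edge a -> b if score(a, b) / score(a, a) >= threshold."""
    graph = {v: set() for v in self_score}
    for (a, b), s in pair_score.items():
        # Scaling by the source's self-similarity makes the edge direction
        # depend on sequence length: a long multi-domain protein rarely
        # points back at a short single-domain one.
        if s / self_score[a] >= threshold:
            graph[a].add(b)
        if s / self_score[b] >= threshold:
            graph[b].add(a)
    return graph


def strongly_connected_components(graph):
    """Kosaraju's algorithm (iterative). Every vertex must be a key of graph."""
    reverse = {v: set() for v in graph}
    for v, targets in graph.items():
        for w in targets:
            reverse[w].add(v)

    # Pass 1: record vertices in order of DFS completion.
    order, seen = [], set()
    for root in graph:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(graph[root]))]
        while stack:
            v, it = stack[-1]
            for w in it:
                if w not in seen:
                    seen.add(w)
                    stack.append((w, iter(graph[w])))
                    break
            else:
                stack.pop()
                order.append(v)

    # Pass 2: explore the reversed graph in decreasing completion order.
    components, assigned = [], set()
    for root in reversed(order):
        if root in assigned:
            continue
        assigned.add(root)
        component, stack = set(), [root]
        while stack:
            v = stack.pop()
            component.add(v)
            for w in reverse[v]:
                if w not in assigned:
                    assigned.add(w)
                    stack.append(w)
        components.append(frozenset(component))
    return components


# Toy data (invented): A, B, C are single-domain; M is a long multi-domain
# protein sharing one domain with A and another with D.
self_score = {"A": 100, "B": 100, "C": 100, "D": 100, "M": 300}
pair_score = {("A", "B"): 60, ("B", "C"): 60, ("A", "C"): 20,
              ("A", "M"): 90, ("D", "M"): 90}

graph = build_graph(self_score, pair_score, threshold=0.5)
clusters = set(strongly_connected_components(graph))
```

In this toy case A and C end up in one cluster through the intermediate B although their direct score is below the threshold, while M receives edges from A and D but sends none back, so it does not merge the two unrelated families.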

Subject headings

NATURAL SCIENCES  -- Computer and Information Sciences -- Bioinformatics (hsv//eng)

Keywords

Algorithms
Cluster Analysis
Computational Biology
Databases, Protein / statistics & numerical data
Proteins / genetics
Sensitivity and Specificity
Sequence Alignment / statistics & numerical data
Sequence Homology, Amino Acid
Software

Publication and content type

ref (subject category)
art (subject category)
