SwePub
Sök i SwePub databas

  Utökad sökning

Träfflista för sökning "WFRF:(Schneckener S) "

Sökning: WFRF:(Schneckener S)

  • Resultat 1-2 av 2
Sortera/gruppera träfflistan
   
NumreringReferensOmslagsbildHitta
1.
  • Bolten, E, et al. (författare)
  • Clustering protein sequences--structure prediction by transitive homology.
  • 2001
  • Ingår i: Bioinformatics (Oxford, England). - 1367-4803. ; 17:10, s. 935-41
  • Tidskriftsartikel (refereegranskat)abstract
    • It is widely believed that for two proteins Aand Ba sequence identity above some threshold implies structural similarity due to a common evolutionary ancestor. Since this is only a sufficient, but not a necessary condition for structural similarity, the question remains what other criteria can be used to identify remote homologues. Transitivity refers to the concept of deducing a structural similarity between proteins A and C from the existence of a third protein B, such that A and B as well as B and C are homologues, as ascertained if the sequence identity between A and B as well as that between B and C is above the aforementioned threshold. It is not fully understood if transitivity always holds and whether transitivity can be extended ad infinitum.We developed a graph-based clustering approach, where transitivity plays a crucial role. We determined all pair-wise similarities for the sequences in the SwissProt database using the Smith-Waterman local alignment algorithm. This data was transformed into a directed graph, where protein sequences constitute vertices. A directed edge was drawn from vertex A to vertex B if the sequences A and B showed similarity, scaled with respect to the self-similarity of A, above a fixed threshold. Transitivity was important in the clustering process, as intermediate sequences were used, limited though by the requirement of having directed paths in both directions between proteins linked over such sequences. The length dependency-implied by the self-similarity-of the scaling of the alignment scores appears to be an effective criterion to avoid clustering errors due to multi-domain proteins. To deal with the resulting large graphs we have developed an efficient library. Methods include the novel graph-based clustering algorithm capable of handling multi-domain proteins and cluster comparison algorithms. Structural Classification of Proteins (SCOP) was used as an evaluation data set for our method, yielding a 24% improvement over pair-wise comparisons in terms of detecting remote homologues.The software is available to academic users on request from the authors.e.bolten@science-factory.com; schliep@zpr.uni-koeln.de; s.schneckener@science-factory.com; d.schomburg@uni-koeln.de; schrader@zpr.uni-koeln.de.http://www.zaik.uni-koeln.de/~schliep/ProtClust.html.
  •  
2.
  • Pipenbacher, P, et al. (författare)
  • ProClust: improved clustering of protein sequences with an extended graph-based approach.
  • 2002
  • Ingår i: Bioinformatics (Oxford, England). - 1367-4803. ; 18 Suppl 2
  • Tidskriftsartikel (refereegranskat)abstract
    • The problem of finding remote homologues of a given protein sequence via alignment methods is not fully solved. In fact, the task seems to become more difficult with more data. As the size of the database increases, so does the noise level; the highest alignment scores due to random similarities increase and can be higher than the alignment score between true homologues. Comparing two sequences with an arbitrary alignment method yields a similarity value which may indicate an evolutionary relationship between them. A threshold value is usually chosen to distinguish between true homologue relationships and random similarities. To compensate for the higher probability of spurious hits in larger databases, this threshold is increased. Increasing specificity however leads to decreased sensitivity as a matter of principle. Sensitivity can be recovered by utilizing refined protocols. A number of approaches to this challenge have made use of the fact that proteins are often members of some larger protein family. This can be exploited by using position-specific substitution matrices or profiles, or by making use of transitivity of homology. Transitivity refers to the concept of concluding homology between proteins A and C based on homology between A and a third protein B and between B and C. It has been demonstrated that transitivity can lead to substantial improvement in recognition of remote homologues particularly in cases where the alignment score of A and C is below the noise level. A natural limit to the use of transitivity is imposed by domains. Domains, compact independent sub-units of proteins, are often shared between otherwise distinct proteins, and can cause substantial problems by incorrectly linking otherwise unrelated proteins.We extend a graph-based clustering algorithm which uses an asymmetric distance measure, scaling similarity values based on the length of the protein sequences compared. Additionally, the significance of alignment scores is taken into account and used for a filtering step in the algorithm. Post-processing, to merge further clusters based on profile HMMs is proposed. SCOP sequences and their super-family level classification are used as a test set for a clustering computed with our method for the joint data set containing both SCOP and SWISS-PROT. Note, the joint data set includes all multi-domain proteins, which contain the SCOP domains that are a potential source of incorrect links. Our method compares at high specificities very favorably with PSI-Blast, which is probably the most widely-used tool for finding remote homologues. We demonstrate that using transitivity with as many as twelve intermediate sequences is crucial to achieving this level of performance. Moreover, from analysis of false positives we conclude that our method seems to correctly bound the degree of transitivity used. This analysis also yields explicit guidance in choosing parameters. The heuristics of the asymmetric distance measure used neither solve the multi-domain problem from a theoretical point of view, nor do they avoid all types of problems we have observed in real data. Nevertheless, they do provide a substantial improvement over existing approaches.The complete software source is freely available to all users under the GNU General Public License (GPL) from http://www.bioinformatik.uni-koeln.de/~proclust/download/
  •  
Skapa referenser, mejla, bekava och länka
  • Resultat 1-2 av 2
Typ av publikation
tidskriftsartikel (2)
Typ av innehåll
refereegranskat (2)
Författare/redaktör
Schliep, Alexander, ... (2)
Schneckener, S (2)
Schomburg, D (2)
Schrader, R (2)
Bolten, E (1)
Pipenbacher, P (1)
visa fler...
Schönhuth, A (1)
visa färre...
Lärosäte
Göteborgs universitet (2)
Språk
Engelska (2)
Forskningsämne (UKÄ/SCB)
Naturvetenskap (2)

År

Kungliga biblioteket hanterar dina personuppgifter i enlighet med EU:s dataskyddsförordning (2018), GDPR. Läs mer om hur det funkar här.
Så här hanterar KB dina uppgifter vid användning av denna tjänst.

 
pil uppåt Stäng

Kopiera och spara länken för att återkomma till aktuell vy