SwePub

Result list for search "WFRF:(Noble William Stafford)"

  • Results 1-16 of 16
1.
  • Freestone, Jack, et al. (author)
  • Semi-supervised Learning While Controlling the FDR with an Application to Tandem Mass Spectrometry Analysis
  • 2024
  • In: Research in Computational Molecular Biology - 28th Annual International Conference, RECOMB 2024, Proceedings. - : Springer Science and Business Media Deutschland GmbH. ; , pp. 448-453
  • Conference paper (peer-reviewed)
    • Canonical procedures to control the false discovery rate (FDR) among the list of putative discoveries rely on our ability to compute informative p-values. The competition-based approach offers a fairly novel and increasingly popular alternative when computing such p-values is impractical. The popularity of this approach stems from its wide applicability: instead of computing p-values, which requires knowing the entire null distribution for each null hypothesis, a competition-based approach only requires a single draw from each such null distribution. This drawn example is known as a “decoy” in the mass spectrometry community (which was the first to adopt the competition approach) or as a “knockoff” in the statistics community. The decoy is competed with the original observation so that only the higher scoring of the two is retained. The number of decoy wins is subsequently used to estimate and control the FDR among the target wins. In this paper we offer a novel method to extend the competition-based approach to control the FDR while taking advantage of side information, i.e., additional features that can help us distinguish between correct and incorrect discoveries. Our motivation comes from the problem of peptide detection in tandem mass spectrometry proteomics data. Specifically, we recently showed that a popular mass spectrometry analysis software tool, Percolator, can apparently fail to control the FDR. We address this problem here by developing a general protocol called “RESET” that can take advantage of the additional features, such as the ones Percolator uses, while still theoretically and empirically controlling the FDR.
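The competition step described in this abstract can be illustrated with a minimal Python sketch. It is not the RESET protocol and it uses no side information; it only shows plain target-decoy competition. The function names, the rule that ties go to the decoy, and the "+1" in the FDR estimate are assumptions of this sketch, not details given in the abstract.

```python
def target_decoy_competition(target_scores, decoy_scores):
    """Compete each target with its paired decoy and keep the higher score.

    Returns a list of (winning_score, is_target) pairs. Ties go to the
    decoy, a conservative choice assumed for this sketch.
    """
    winners = []
    for t, d in zip(target_scores, decoy_scores):
        if t > d:
            winners.append((t, True))
        else:
            winners.append((d, False))
    return winners


def accepted_targets(winners, alpha):
    """Return the scores of target wins accepted at estimated FDR <= alpha.

    At each cutoff the estimate is (decoy wins + 1) / (target wins).
    The +1 is an assumption of this sketch.
    """
    ranked = sorted(winners, key=lambda w: w[0], reverse=True)
    n_target = 0
    n_decoy = 0
    best_cutoff = 0  # number of top-ranked winners to keep
    for i, (_, is_target) in enumerate(ranked, start=1):
        if is_target:
            n_target += 1
        else:
            n_decoy += 1
        if n_target > 0 and (n_decoy + 1) / n_target <= alpha:
            best_cutoff = i
    return [score for score, is_target in ranked[:best_cutoff] if is_target]


if __name__ == "__main__":
    winners = target_decoy_competition([9, 8, 7, 6, 5, 1],
                                       [1, 2, 3, 2, 5.5, 4])
    print(accepted_targets(winners, alpha=0.25))
```

The decoy wins are never reported as discoveries; they serve only to estimate how many of the reported target wins are likely to be incorrect.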
2.
  • Granholm, Viktor, 1986-, et al. (author)
  • A cross-validation scheme for machine learning algorithms in shotgun proteomics
  • 2012
  • In: BMC Bioinformatics. - : Springer Nature. - 1471-2105. ; 13:S16, p. S3-
  • Journal article (peer-reviewed)
    • Peptides are routinely identified from mass spectrometry-based proteomics experiments by matching observed spectra to peptides derived from protein databases. The error rates of these identifications can be estimated by target-decoy analysis, which involves matching spectra to shuffled or reversed peptides. Besides estimating error rates, decoy searches can be used by semi-supervised machine learning algorithms to increase the number of confidently identified peptides. As for all machine learning algorithms, however, the results must be validated to avoid issues such as overfitting or biased learning, which would produce unreliable peptide identifications. Here, we discuss how the target-decoy method is employed in machine learning for shotgun proteomics, focusing on how the results can be validated by cross-validation, a frequently used validation scheme in machine learning. We also use simulated data to demonstrate the proposed cross-validation scheme's ability to detect overfitting.
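As a rough illustration of the cross-validation idea in this abstract, the following Python sketch scores every item with a model that was trained without it. The round-robin fold assignment and the `train` and `score` callbacks are hypothetical placeholders chosen for this sketch; the abstract does not specify how the paper's scheme splits or scores the data.

```python
def k_fold_indices(n_items, k):
    """Assign item indices 0..n_items-1 to k folds in round-robin order."""
    folds = [[] for _ in range(k)]
    for i in range(n_items):
        folds[i % k].append(i)
    return folds


def cross_validated_scores(items, train, score, k=3):
    """Score every item with a model that never saw it during training.

    `train(list_of_items)` returns a model and `score(model, item)`
    returns a number; both are supplied by the caller.
    """
    scores = [None] * len(items)
    for held_out in k_fold_indices(len(items), k):
        held_out_set = set(held_out)
        training = [x for i, x in enumerate(items) if i not in held_out_set]
        model = train(training)
        for i in held_out:
            scores[i] = score(model, items[i])
    return scores


if __name__ == "__main__":
    # Toy "model": the mean of the training values.
    mean = lambda xs: sum(xs) / len(xs)
    print(cross_validated_scores([1, 2, 3, 4, 5, 6], mean,
                                 lambda m, x: x - m))
```

Because no item contributes to the model that scores it, a model that merely memorizes its training set gains no advantage on the held-out scores, which is what makes overfitting visible.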
3.
  • Granholm, Viktor, 1986-, et al. (author)
  • Determining the calibration of confidence estimation procedures for unique peptides in shotgun proteomics
  • 2013
  • In: Journal of Proteomics. - : Elsevier BV. - 1874-3919 .- 1876-7737. ; 80, pp. 123-131
  • Journal article (peer-reviewed)
    • The analysis of a shotgun proteomics experiment results in a list of peptide-spectrum matches (PSMs) in which each fragmentation spectrum has been matched to a peptide in a database. Subsequently, most protein inference algorithms rank peptides according to the best-scoring PSM for each peptide. However, there is disagreement in the scientific literature on the best method to assess the statistical significance of the resulting peptide identifications. Here, we use a previously described calibration protocol to evaluate the accuracy of three different peptide-level statistical confidence estimation procedures: the classical Fisher's method, and two complementary procedures that estimate significance, respectively, before and after selecting the top-scoring PSM for each spectrum. Our experiments show that the latter method, which is employed by MaxQuant and Percolator, produces the most accurate, well-calibrated results.
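The abstract names Fisher's method as one of the three procedures evaluated. A generic sketch of that method in Python follows; how the paper applies it to the PSMs of a peptide is not described in the abstract, so the function below only shows the textbook combination of independent p-values. It relies on the closed-form chi-square survival function for an even number of degrees of freedom, and the function name is an assumption of this sketch.

```python
import math


def fisher_combined_p_value(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null hypothesis. For even
    degrees of freedom the upper tail has the closed form
    exp(-X/2) * sum_{j=0}^{k-1} (X/2)^j / j!.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    term = 1.0   # (X/2)^j / j! for j = 0
    total = 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    return math.exp(-half_stat) * total


if __name__ == "__main__":
    print(fisher_combined_p_value([0.5, 0.5]))
```

With a single p-value the function returns that p-value unchanged, which is a quick sanity check on the formula.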
4.
  • Granholm, Viktor, 1986-, et al. (author)
  • On Using Samples of Known Protein Content to Assess the Statistical Calibration of Scores Assigned to Peptide-Spectrum Matches in Shotgun Proteomics
  • 2011
  • In: Journal of Proteome Research. - : American Chemical Society (ACS). - 1535-3893 .- 1535-3907. ; 10:5, pp. 2671-2678
  • Journal article (peer-reviewed)
    • In shotgun proteomics, the quality of a hypothesized match between an observed spectrum and a peptide sequence is quantified by a score function. Because the score function lies at the heart of any peptide identification pipeline, this function greatly affects the final results of a proteomics assay. Consequently, valid statistical methods for assessing the quality of a given score function are extremely important. Previously, several research groups have used samples of known protein composition to assess the quality of a given score function. We demonstrate that this approach is problematic, because the outcome can depend on factors other than the score function itself. We then propose an alternative use of the same type of data to validate a score function. The central idea of our approach is that database matches that are not explained by any protein in the purified sample comprise a robust representation of incorrect matches. We apply our alternative assessment scheme to several commonly used score functions, and we show that our approach generates a reproducible measure of the calibration of a given peptide identification method. Furthermore, we show how our quality test can be useful in the development of novel score functions.
5.
  • Halloran, John T., et al. (author)
  • Speeding Up Percolator
  • 2019
  • In: Journal of Proteome Research. - : AMER CHEMICAL SOC. - 1535-3893 .- 1535-3907. ; 18:9, pp. 3353-3359
  • Journal article (peer-reviewed)
    • The processing of peptide tandem mass spectrometry data involves matching observed spectra against a sequence database. The ranking and calibration of these peptide-spectrum matches can be improved substantially using a machine learning postprocessor. Here, we describe our efforts to speed up one widely used postprocessor, Percolator. The improved software is dramatically faster than the previous version of Percolator, even when using relatively few processors. We tested the new version of Percolator on a data set containing over 215 million spectra and recorded an overall reduction to 23% of the running time as compared to the unoptimized code. We also show that the memory footprint required by these speedups is modest relative to that of the original version of Percolator.
6.
  • Käll, Lukas, 1969-, et al. (author)
  • Assigning significance to peptides identified by tandem mass spectrometry using decoy databases
  • 2008
  • In: Journal of Proteome Research. - : American Chemical Society (ACS). - 1535-3893 .- 1535-3907. ; 7:1, pp. 29-34
  • Journal article (peer-reviewed)
    • Automated methods for assigning peptides to observed tandem mass spectra typically return a list of peptide-spectrum matches, ranked according to an arbitrary score. In this article, we describe methods for converting these arbitrary scores into more useful statistical significance measures. These methods employ a decoy sequence database as a model of the null hypothesis, and use false discovery rate (FDR) analysis to correct for multiple testing. We first describe a simple FDR inference method and then describe how estimating and taking into account the percentage of incorrectly identified spectra in the entire data set can lead to increased statistical power.
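One way to read the two steps in this abstract is sketched below in Python: a decoy-based FDR estimate at each score threshold, optionally scaled by an estimate of the fraction of incorrect target matches, and then converted to q-values. The function name, the `pi0` parameter, and the assumption that the target and decoy searches are separate and of equal size are choices made for this sketch rather than details stated in the abstract.

```python
def decoy_q_values(target_scores, decoy_scores, pi0=1.0):
    """Return one q-value per target score, in the input order.

    At each target score t the FDR is estimated as
    pi0 * (#decoys scoring >= t) / (#targets scoring >= t),
    where pi0 is the assumed fraction of incorrect target matches.
    """
    decoys = sorted(decoy_scores, reverse=True)
    order = sorted(range(len(target_scores)),
                   key=lambda i: target_scores[i], reverse=True)
    fdrs = []
    n_decoys = 0
    for rank, i in enumerate(order, start=1):
        # Count decoys scoring at least as high as this target.
        while n_decoys < len(decoys) and decoys[n_decoys] >= target_scores[i]:
            n_decoys += 1
        fdrs.append(min(1.0, pi0 * n_decoys / rank))
    # q-value: the smallest FDR at this threshold or any more permissive one.
    q_values = [0.0] * len(target_scores)
    running_min = 1.0
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        q_values[order[rank]] = running_min
    return q_values


if __name__ == "__main__":
    print(decoy_q_values([10, 8, 6, 4], [7, 5, 3, 1]))
    print(decoy_q_values([10, 8, 6, 4], [7, 5, 3, 1], pi0=0.5))
```

Setting `pi0` below 1 lowers every estimate proportionally, which is the source of the increased statistical power the abstract mentions.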
7.
  • Käll, Lukas, et al. (author)
  • Non-parametric estimation of posterior error probabilities associated with peptides identified by tandem mass spectrometry
  • 2008
  • In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 24:16, pp. i42-i48
  • Journal article (peer-reviewed)
    • Motivation: A mass spectrum produced via tandem mass spectrometry can be tentatively matched to a peptide sequence via database search. Here, we address the problem of assigning a posterior error probability (PEP) to a given peptide-spectrum match (PSM). This problem is considerably more difficult than the related problem of estimating the error rate associated with a large collection of PSMs. Existing methods for estimating PEPs rely on a parametric or semiparametric model of the underlying score distribution. Results: We demonstrate how to apply non-parametric logistic regression to this problem. The method makes no explicit assumptions about the form of the underlying score distribution; instead, the method relies upon decoy PSMs, produced by searching the spectra against a decoy sequence database, to provide a model of the null score distribution. We show that our non-parametric logistic regression method produces accurate PEP estimates for six different commonly used PSM score functions. In particular, the estimates produced by our method are comparable in accuracy to those of PeptideProphet, which uses a parametric or semiparametric model designed specifically to work with SEQUEST. The advantage of the non-parametric approach is applicability and robustness to new score functions and new types of data.
8.
  • Käll, Lukas, 1969-, et al. (author)
  • Posterior error probabilities and false discovery rates : two sides of the same coin
  • 2008
  • In: Journal of Proteome Research. - : American Chemical Society (ACS). - 1535-3893 .- 1535-3907. ; 7:1, pp. 40-44
  • Journal article (peer-reviewed)
    • A variety of methods have been described in the literature for assigning statistical significance to peptides identified via tandem mass spectrometry. Here, we explain how two types of scores, the q-value and the posterior error probability, are related and complementary to one another.
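One commonly used way to connect the two scores is that the expected FDR of an accepted set equals the average posterior error probability (PEP) of its members. The Python sketch below uses that relation to derive q-values from a list of PEPs. It is an illustration under that assumption, not a reproduction of the paper's argument, and the function name is hypothetical.

```python
def q_values_from_peps(peps):
    """Return q-values for PEPs sorted from most to least confident.

    The q-value at rank n is the mean of the n smallest PEPs. Because
    the PEPs are sorted in ascending order, the running mean never
    decreases, so no extra monotonicity step is needed.
    """
    q_values = []
    total = 0.0
    for n, pep in enumerate(sorted(peps), start=1):
        total += pep
        q_values.append(total / n)
    return q_values


if __name__ == "__main__":
    print(q_values_from_peps([0.5, 0.0, 0.1]))
```

The PEP describes a single identification, while the q-value describes the whole set of identifications accepted at that threshold, so an item with a fairly high PEP can still sit at a low q-value.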
9.
  • Käll, Lukas, 1969-, et al. (author)
  • QVALITY : non-parametric estimation of q-values and posterior error probabilities
  • 2009
  • In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 25:7, pp. 964-966
  • Journal article (peer-reviewed)
    • Qvality is a C++ program for estimating two types of standard statistical confidence measures: the q-value, which is an analog of the p-value that incorporates multiple testing correction, and the posterior error probability (PEP, also known as the local false discovery rate), which corresponds to the probability that a given observation is drawn from the null distribution. In computing q-values, qvality employs a standard bootstrap procedure to estimate the prior probability of a score being from the null distribution; for PEP estimation, qvality relies upon non-parametric logistic regression. Relative to other tools for estimating statistical confidence measures, qvality is unique in its ability to estimate both types of scores directly from a null distribution, without requiring the user to calculate p-values.
10.
  • Käll, Lukas, et al. (author)
  • Semi-supervised learning for peptide identification from shotgun proteomics datasets
  • 2007
  • In: Nature Methods. - : Springer Science and Business Media LLC. - 1548-7091 .- 1548-7105. ; 4:11, pp. 923-925
  • Journal article (peer-reviewed)
    • Shotgun proteomics uses liquid chromatography-tandem mass spectrometry to identify proteins in complex biological samples. We describe an algorithm, called Percolator, for improving the rate of confident peptide identifications from a collection of tandem mass spectra. Percolator uses semi-supervised machine learning to discriminate between correct and decoy spectrum identifications, correctly assigning peptides to 17% more spectra from a tryptic Saccharomyces cerevisiae dataset, and up to 77% more spectra from non-tryptic digests, relative to a fully supervised approach.
11.
  • McIlwain, Sean, et al. (author)
  • Crux : Rapid Open Source Protein Tandem Mass Spectrometry Analysis
  • 2014
  • In: Journal of Proteome Research. - : American Chemical Society (ACS). - 1535-3893 .- 1535-3907. ; 13:10, pp. 4488-4491
  • Journal article (peer-reviewed)
    • Efficiently and accurately analyzing big protein tandem mass spectrometry data sets requires robust software that incorporates state-of-the-art computational, machine learning, and statistical methods. The Crux mass spectrometry analysis software toolkit (http://cruxtoolkit.sourceforge.net) is an open source project that aims to provide users with a cross-platform suite of analysis tools for interpreting protein mass spectrometry data.
12.
  • Merrihew, Gennifer E., et al. (author)
  • Use of shotgun proteomics for the identification, confirmation, and correction of C. elegans gene annotations
  • 2008
  • In: Genome Research. - : Cold Spring Harbor Laboratory. - 1088-9051 .- 1549-5469. ; 18:10, pp. 1660-1669
  • Journal article (peer-reviewed)
    • We describe a general mass spectrometry-based approach for gene annotation of any organism and demonstrate its effectiveness using the nematode Caenorhabditis elegans. We detected 6779 C. elegans proteins (67,047 peptides), including 384 that, although annotated in WormBase WS150, lacked cDNA or other prior experimental support. We also identified 429 new coding sequences that were unannotated in WS150. Nearly half (192/429) of the new coding sequences were confirmed with RT-PCR data. Thirty-three (approximately 8%) of the new coding sequences had been predicted to be pseudogenes, 151 (approximately 35%) reveal apparent errors in gene models, and 245 (57%) appear to be novel genes. In addition, we verified 6010 exon-exon splice junctions within existing WormBase gene models. Our work confirms that mass spectrometry is a powerful experimental tool for annotating sequenced genomes. In addition, the collection of identified peptides should facilitate future proteomics experiments targeted at specific proteins of interest.
13.
  • Palmblad, Magnus, et al. (author)
  • Interpretation of the DOME Recommendations for Machine Learning in Proteomics and Metabolomics
  • 2022
  • In: Journal of Proteome Research. - : American Chemical Society (ACS). - 1535-3893 .- 1535-3907. ; 21:4, pp. 1204-1207
  • Journal article (peer-reviewed)
    • Machine learning is increasingly applied in proteomics and metabolomics to predict molecular structure, function, and physicochemical properties, including behavior in chromatography, ion mobility, and tandem mass spectrometry. These must be described in sufficient detail to apply or evaluate the performance of trained models. Here we look at and interpret the recently published and general DOME (Data, Optimization, Model, Evaluation) recommendations for conducting and reporting on machine learning in the specific context of proteomics and metabolomics.
14.
  • Reynolds, Sheila M., et al. (author)
  • Transmembrane topology and signal peptide prediction using dynamic bayesian networks
  • 2008
  • In: PLoS Computational Biology. - : Public Library of Science (PLoS). - 1553-734X .- 1553-7358. ; 4:11, p. e1000213-
  • Journal article (peer-reviewed)
    • Hidden Markov models (HMMs) have been successfully applied to the tasks of transmembrane protein topology prediction and signal peptide prediction. In this paper we expand upon this work by making use of the more powerful class of dynamic Bayesian networks (DBNs). Our model, Philius, is inspired by a previously published HMM, Phobius, and combines a signal peptide submodel with a transmembrane submodel. We introduce a two-stage DBN decoder that combines the power of posterior decoding with the grammar constraints of Viterbi-style decoding. Philius also provides protein type, segment, and topology confidence metrics to aid in the interpretation of the predictions. We report a relative improvement of 13% over Phobius in full-topology prediction accuracy on transmembrane proteins, and a sensitivity and specificity of 0.96 in detecting signal peptides. We also show that our confidence metrics correlate well with the observed precision. In addition, we have made predictions on all 6.3 million proteins in the Yeast Resource Center (YRC) database. This large-scale study provides an overall picture of the relative numbers of proteins that include a signal-peptide and/or one or more transmembrane segments as well as a valuable resource for the scientific community. All DBNs are implemented using the Graphical Models Toolkit. Source code for the models described here is available at http://noble.gs.washington.edu/proj/philius. A Philius Web server is available at http://www.yeastrc.org/philius, and the predictions on the YRC database are available at http://www.yeastrc.org/pdr.
15.
  • Spivak, Marina, et al. (author)
  • Improvements to the percolator algorithm for Peptide identification from shotgun proteomics data sets
  • 2009
  • In: Journal of Proteome Research. - : American Chemical Society (ACS). - 1535-3893 .- 1535-3907. ; 8:7, pp. 3737-3745
  • Journal article (peer-reviewed)
    • Shotgun proteomics coupled with database search software allows the identification of a large number of peptides in a single experiment. However, some existing search algorithms, such as SEQUEST, use score functions that are designed primarily to identify the best peptide for a given spectrum. Consequently, when comparing identifications across spectra, the SEQUEST score function Xcorr fails to discriminate accurately between correct and incorrect peptide identifications. Several machine learning methods have been proposed to address the resulting classification task of distinguishing between correct and incorrect peptide-spectrum matches (PSMs). A recent example is Percolator, which uses semisupervised learning and a decoy database search strategy to learn to distinguish between correct and incorrect PSMs identified by a database search algorithm. The current work describes three improvements to Percolator. (1) Percolator's heuristic optimization is replaced with a clear objective function, with intuitive reasons behind its choice. (2) Tractable nonlinear models are used instead of linear models, leading to improved accuracy over the original Percolator. (3) A method, Q-ranker, for directly optimizing the number of identified spectra at a specified q value is proposed, which achieves further gains.
16.
  • Ting, Ying S., et al. (author)
  • Peptide-Centric Proteome Analysis : An Alternative Strategy for the Analysis of Tandem Mass Spectrometry Data
  • 2015
  • In: Molecular & Cellular Proteomics. - : Elsevier BV. - 1535-9476 .- 1535-9484. ; 14:9, pp. 2301-2307
  • Research review (peer-reviewed)
    • In mass spectrometry-based bottom-up proteomics, data-independent acquisition is an emerging technique because of its comprehensive and unbiased sampling of precursor ions. However, current data-independent acquisition methods use wide precursor isolation windows, resulting in cofragmentation and complex mixture spectra. Thus, conventional database searching tools that identify peptides by interpreting individual tandem MS spectra are inherently limited in analyzing data-independent acquisition data. Here we discuss an alternative approach, peptide-centric analysis, which tests directly for the presence and absence of query peptides. We discuss how peptide-centric analysis resolves some limitations of traditional spectrum-centric analysis, and we outline the unique characteristics of peptide-centric analysis in general.