SwePub

Result list for search "WFRF:(Noble William Stafford)"

  • Results 1-10 of 16
1.
  • Freestone, Jack, et al. (author)
  • Semi-supervised Learning While Controlling the FDR with an Application to Tandem Mass Spectrometry Analysis
  • 2024
  • In: Research in Computational Molecular Biology - 28th Annual International Conference, RECOMB 2024, Proceedings. - : Springer Science and Business Media Deutschland GmbH. ; , pp. 448-453
  • Conference paper (peer-reviewed)
    • Canonical procedures to control the false discovery rate (FDR) among the list of putative discoveries rely on our ability to compute informative p-values. Competition-based approach offers a fairly novel and increasingly popular alternative when computing such p-values is impractical. The popularity of this approach stems from its wide applicability: instead of computing p-values, which requires knowing the entire null distribution for each null hypothesis, a competition-based approach only requires a single draw from each such null distribution. This drawn example is known as a “decoy” in the mass spectrometry community (which was the first to adopt the competition approach) or as a “knockoff” in the statistics community. The decoy is competed with the original observation so that only the higher scoring of the two is retained. The number of decoy wins is subsequently used to estimate and control the FDR among the target wins. In this paper we offer a novel method to extend the competition-based approach to control the FDR while taking advantage of side information, i.e., additional features that can help us distinguish between correct and incorrect discoveries. Our motivation comes from the problem of peptide detection in tandem mass spectrometry proteomics data. Specifically, we recently showed that a popular mass spectrometry analysis software tool, Percolator, can apparently fail to control the FDR. We address this problem here by developing a general protocol called “RESET” that can take advantage of the additional features, such as the ones Percolator uses, while still theoretically and empirically controlling the FDR.
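The competition step described in this abstract can be sketched as follows. This is a minimal illustration of plain target-decoy competition, not the RESET protocol itself. The function name, the handling of ties, and the "+1" in the FDR estimate (a commonly used conservative correction) are assumptions of this sketch rather than details taken from the abstract.

```python
def competition_fdr(target_scores, decoy_scores, alpha):
    """Return indices of target wins accepted at estimated FDR <= alpha."""
    # Each observation competes with its own decoy; only the higher score survives.
    winners = []
    for i, (t, d) in enumerate(zip(target_scores, decoy_scores)):
        if t > d:
            winners.append((t, True, i))    # target win
        elif d > t:
            winners.append((d, False, i))   # decoy win
        # Assumption: exact ties are dropped in this sketch.
    winners.sort(key=lambda w: w[0], reverse=True)

    cutoff = 0
    targets = decoys = 0
    for rank, (_, is_target, _) in enumerate(winners, start=1):
        if is_target:
            targets += 1
        else:
            decoys += 1
        # Decoy wins estimate the number of false target wins above this rank.
        if targets > 0 and (decoys + 1) / targets <= alpha:
            cutoff = rank  # keep the largest rank that still meets the level
    return [i for (_, is_target, i) in winners[:cutoff] if is_target]
```

With five clear target wins and one decoy win, the smallest achievable estimate is 1/5, so a level of 0.25 accepts all five targets while a level of 0.1 accepts none.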
2.
  • Granholm, Viktor, 1986-, et al. (author)
  • A cross-validation scheme for machine learning algorithms in shotgun proteomics
  • 2012
  • In: BMC Bioinformatics. - : Springer Nature. - 1471-2105. ; 13:S16, p. S3-
  • Journal article (peer-reviewed)
    • Peptides are routinely identified from mass spectrometry-based proteomics experiments by matching observed spectra to peptides derived from protein databases. The error rates of these identifications can be estimated by target-decoy analysis, which involves matching spectra to shuffled or reversed peptides. Besides estimating error rates, decoy searches can be used by semi-supervised machine learning algorithms to increase the number of confidently identified peptides. As for all machine learning algorithms, however, the results must be validated to avoid issues such as overfitting or biased learning, which would produce unreliable peptide identifications. Here, we discuss how the target-decoy method is employed in machine learning for shotgun proteomics, focusing on how the results can be validated by cross-validation, a frequently used validation scheme in machine learning. We also use simulated data to demonstrate the proposed cross-validation scheme's ability to detect overfitting.
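The core idea of cross-validation mentioned in this abstract, that every item is scored by a model that never saw it during training, might look like the generic k-fold sketch below. The `train` argument is a hypothetical callable that takes a list of training items and returns a scoring function. The sketch does not reproduce details of the paper's scheme, such as how scores from different folds are made comparable.

```python
def cross_validated_scores(items, train, k=3):
    """Score every item with a model trained only on the other folds."""
    # Deal the items into k folds in round-robin order.
    folds = [items[i::k] for i in range(k)]
    scored = []
    for held_out_idx, held_out in enumerate(folds):
        # Training data: every fold except the one being scored.
        training = [x for j, fold in enumerate(folds)
                    if j != held_out_idx for x in fold]
        model = train(training)  # hypothetical: returns a callable scorer
        scored.extend((x, model(x)) for x in held_out)
    return scored
```

A model that merely memorizes its training set cannot inflate the held-out scores under this arrangement, which is the property that lets cross-validation expose overfitting.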
3.
  • Granholm, Viktor, 1986-, et al. (author)
  • Determining the calibration of confidence estimation procedures for unique peptides in shotgun proteomics
  • 2013
  • In: Journal of Proteomics. - : Elsevier BV. - 1874-3919 .- 1876-7737. ; 80, pp. 123-131
  • Journal article (peer-reviewed)
    • The analysis of a shotgun proteomics experiment results in a list of peptide-spectrum matches (PSMs) in which each fragmentation spectrum has been matched to a peptide in a database. Subsequently, most protein inference algorithms rank peptides according to the best-scoring PSM for each peptide. However, there is disagreement in the scientific literature on the best method to assess the statistical significance of the resulting peptide identifications. Here, we use a previously described calibration protocol to evaluate the accuracy of three different peptide-level statistical confidence estimation procedures: the classical Fisher's method, and two complementary procedures that estimate significance, respectively, before and after selecting the top-scoring PSM for each spectrum. Our experiments show that the latter method, which is employed by MaxQuant and Percolator, produces the most accurate, well-calibrated results.
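For readers unfamiliar with Fisher's method, named in this abstract as one of the three procedures compared, the textbook form is sketched below. It combines k independent p-values through the statistic X = -2 Σ ln p_i, which follows a chi-square distribution with 2k degrees of freedom under the null. How the paper applies it to the PSMs of a peptide is not shown here, and the function name is an assumption of this sketch.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values (each > 0) with Fisher's method."""
    k = len(p_values)
    # Test statistic: chi-square with 2k degrees of freedom under the null.
    x = -2.0 * sum(math.log(p) for p in p_values)
    # For even degrees of freedom the chi-square survival function is
    # P(X >= x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = x / 2.0
    term = total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total
```

A single p-value is returned unchanged, which is a quick sanity check on the closed-form survival function.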
4.
  • Granholm, Viktor, 1986-, et al. (author)
  • On Using Samples of Known Protein Content to Assess the Statistical Calibration of Scores Assigned to Peptide-Spectrum Matches in Shotgun Proteomics
  • 2011
  • In: Journal of Proteome Research. - : American Chemical Society (ACS). - 1535-3893 .- 1535-3907. ; 10:5, pp. 2671-2678
  • Journal article (peer-reviewed)
    • In shotgun proteomics, the quality of a hypothesized match between an observed spectrum and a peptide sequence is quantified by a score function. Because the score function lies at the heart of any peptide identification pipeline, this function greatly affects the final results of a proteomics assay. Consequently, valid statistical methods for assessing the quality of a given score function are extremely important. Previously, several research groups have used samples of known protein composition to assess the quality of a given score function. We demonstrate that this approach is problematic, because the outcome can depend on factors other than the score function itself. We then propose an alternative use of the same type of data to validate a score function. The central idea of our approach is that database matches that are not explained by any protein in the purified sample comprise a robust representation of incorrect matches. We apply our alternative assessment scheme to several commonly used score functions, and we show that our approach generates a reproducible measure of the calibration of a given peptide identification method. Furthermore, we show how our quality test can be useful in the development of novel score functions.
5.
  • Halloran, John T., et al. (author)
  • Speeding Up Percolator
  • 2019
  • In: Journal of Proteome Research. - : AMER CHEMICAL SOC. - 1535-3893 .- 1535-3907. ; 18:9, pp. 3353-3359
  • Journal article (peer-reviewed)
    • The processing of peptide tandem mass spectrometry data involves matching observed spectra against a sequence database. The ranking and calibration of these peptide-spectrum matches can be improved substantially using a machine learning postprocessor. Here, we describe our efforts to speed up one widely used postprocessor, Percolator. The improved software is dramatically faster than the previous version of Percolator, even when using relatively few processors. We tested the new version of Percolator on a data set containing over 215 million spectra and recorded an overall reduction to 23% of the running time as compared to the unoptimized code. We also show that the memory footprint required by these speedups is modest relative to that of the original version of Percolator.
6.
  • Käll, Lukas, 1969-, et al. (author)
  • Assigning significance to peptides identified by tandem mass spectrometry using decoy databases
  • 2008
  • In: Journal of Proteome Research. - : American Chemical Society (ACS). - 1535-3893 .- 1535-3907. ; 7:1, pp. 29-34
  • Journal article (peer-reviewed)
    • Automated methods for assigning peptides to observed tandem mass spectra typically return a list of peptide-spectrum matches, ranked according to an arbitrary score. In this article, we describe methods for converting these arbitrary scores into more useful statistical significance measures. These methods employ a decoy sequence database as a model of the null hypothesis, and use false discovery rate (FDR) analysis to correct for multiple testing. We first describe a simple FDR inference method and then describe how estimating and taking into account the percentage of incorrectly identified spectra in the entire data set can lead to increased statistical power.
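A simple decoy-based FDR calculation of the kind this abstract describes could be sketched as below, assuming separate target and decoy searches and no tied target scores. The `pi0` parameter stands for the estimated fraction of incorrect target matches. Here it is supplied by the caller, since the abstract does not say how it is estimated, and the function name is likewise an assumption.

```python
def decoy_q_values(target_scores, decoy_scores, pi0=1.0):
    """Return a q-value for each target score, in input order."""
    n_targets, n_decoys = len(target_scores), len(decoy_scores)
    order = sorted(range(n_targets), key=lambda i: target_scores[i], reverse=True)
    decoys_sorted = sorted(decoy_scores, reverse=True)

    fdrs = []
    d = 0  # number of decoys scoring at or above the current threshold
    for rank, i in enumerate(order, start=1):
        while d < n_decoys and decoys_sorted[d] >= target_scores[i]:
            d += 1
        # Expected false targets = pi0 * (decoy fraction above threshold) * n_targets.
        fdrs.append(min(1.0, pi0 * (d / n_decoys) * n_targets / rank))

    # q-value: the smallest FDR over all thresholds that still accept this score.
    q = [0.0] * n_targets
    running = 1.0
    for pos in range(n_targets - 1, -1, -1):
        running = min(running, fdrs[pos])
        q[order[pos]] = running
    return q
```

Setting `pi0` below 1 scales every estimate down, so more targets pass a given threshold, which is the gain in statistical power the abstract refers to.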
7.
  • Käll, Lukas, et al. (author)
  • Non-parametric estimation of posterior error probabilities associated with peptides identified by tandem mass spectrometry
  • 2008
  • In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 24:16, pp. i42-i48
  • Journal article (peer-reviewed)
    • Motivation: A mass spectrum produced via tandem mass spectrometry can be tentatively matched to a peptide sequence via database search. Here, we address the problem of assigning a posterior error probability (PEP) to a given peptide-spectrum match (PSM). This problem is considerably more difficult than the related problem of estimating the error rate associated with a large collection of PSMs. Existing methods for estimating PEPs rely on a parametric or semiparametric model of the underlying score distribution. Results: We demonstrate how to apply non-parametric logistic regression to this problem. The method makes no explicit assumptions about the form of the underlying score distribution; instead, the method relies upon decoy PSMs, produced by searching the spectra against a decoy sequence database, to provide a model of the null score distribution. We show that our non-parametric logistic regression method produces accurate PEP estimates for six different commonly used PSM score functions. In particular, the estimates produced by our method are comparable in accuracy to those of PeptideProphet, which uses a parametric or semiparametric model designed specifically to work with SEQUEST. The advantage of the non-parametric approach is applicability and robustness to new score functions and new types of data.
8.
  • Käll, Lukas, 1969-, et al. (author)
  • Posterior error probabilities and false discovery rates : two sides of the same coin
  • 2008
  • In: Journal of Proteome Research. - : American Chemical Society (ACS). - 1535-3893 .- 1535-3907. ; 7:1, pp. 40-44
  • Journal article (peer-reviewed)
    • A variety of methods have been described in the literature for assigning statistical significance to peptides identified via tandem mass spectrometry. Here, we explain how two types of scores, the q-value and the posterior error probability, are related and complementary to one another.
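One standard way to see the relationship between the two scores: the FDR of an accepted set equals the average posterior error probability (PEP) of its members, and the q-value is the smallest such FDR over all sets that include a given item. The abstract states only that the two scores are related and complementary, so the sketch below illustrates that general relationship under these assumptions and is not a procedure taken from the paper.

```python
def q_values_from_peps(peps):
    """Given PEPs ordered from best to worst score, return matching q-values."""
    # FDR of the top-n set is the mean PEP of its members.
    fdrs = []
    total = 0.0
    for n, pep in enumerate(peps, start=1):
        total += pep
        fdrs.append(total / n)
    # q-value: minimum FDR over every set that contains this item.
    q = fdrs[:]
    for i in range(len(q) - 2, -1, -1):
        q[i] = min(q[i], q[i + 1])
    return q
```

The PEP describes one match in isolation, whereas the q-value describes the whole set of matches accepted down to that score, so a match with a PEP of 0.5 can still sit inside a set whose q-value is 0.2.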
9.
  • Käll, Lukas, 1969-, et al. (author)
  • QVALITY : non-parametric estimation of q-values and posterior error probabilities
  • 2009
  • In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811. ; 25:7, pp. 964-966
  • Journal article (peer-reviewed)
    • Qvality is a C++ program for estimating two types of standard statistical confidence measures: the q-value, which is an analog of the p-value that incorporates multiple testing correction, and the posterior error probability (PEP, also known as the local false discovery rate), which corresponds to the probability that a given observation is drawn from the null distribution. In computing q-values, qvality employs a standard bootstrap procedure to estimate the prior probability of a score being from the null distribution; for PEP estimation, qvality relies upon non-parametric logistic regression. Relative to other tools for estimating statistical confidence measures, qvality is unique in its ability to estimate both types of scores directly from a null distribution, without requiring the user to calculate p-values.
10.
  • Käll, Lukas, et al. (author)
  • Semi-supervised learning for peptide identification from shotgun proteomics datasets
  • 2007
  • In: Nature Methods. - : Springer Science and Business Media LLC. - 1548-7091 .- 1548-7105. ; 4:11, pp. 923-925
  • Journal article (peer-reviewed)
    • Shotgun proteomics uses liquid chromatography-tandem mass spectrometry to identify proteins in complex biological samples. We describe an algorithm, called Percolator, for improving the rate of confident peptide identifications from a collection of tandem mass spectra. Percolator uses semi-supervised machine learning to discriminate between correct and decoy spectrum identifications, correctly assigning peptides to 17% more spectra from a tryptic Saccharomyces cerevisiae dataset, and up to 77% more spectra from non-tryptic digests, relative to a fully supervised approach.