SwePub - search: WFRF:(The Matthew)

No.	Reference
1.	Afkham, Heydar Maboudi, et al. (author) Uncertainty estimation of predictions of peptides' chromatographic retention times in shotgun proteomics 2017 In: Bioinformatics. - : OXFORD UNIV PRESS. - 1367-4803 .- 1367-4811. ; 33:4, pp. 508-513 Journal article (peer-reviewed) Abstract: Motivation: Liquid chromatography is frequently used as a means to reduce the complexity of peptide-mixtures in shotgun proteomics. For such systems, the time when a peptide is released from a chromatography column and registered in the mass spectrometer is referred to as the peptide's retention time. Using heuristics or machine learning techniques, previous studies have demonstrated that it is possible to predict the retention time of a peptide from its amino acid sequence. In this paper, we are applying Gaussian Process Regression to the feature representation of a previously described predictor ELUDE. Using this framework, we demonstrate that it is possible to estimate the uncertainty of the prediction made by the model. Here we show how this uncertainty relates to the actual error of the prediction. Results: In our experiments, we observe a strong correlation between the estimated uncertainty provided by Gaussian Process Regression and the actual prediction error. This relation provides us with new means for assessment of the predictions. We demonstrate how a subset of the peptides can be selected with lower prediction error compared to the whole set. We also demonstrate how such predicted standard deviations can be used for designing adaptive windowing strategies.
2.	Chang, Yun Chien, et al. (author) Decrypting lysine deacetylase inhibitor action and protein modifications by dose-resolved proteomics 2024 In: Cell Reports. - : Elsevier BV. - 2211-1247. ; 43:6 Journal article (peer-reviewed) Abstract: Lysine deacetylase inhibitors (KDACis) are approved drugs for cutaneous T cell lymphoma (CTCL), peripheral T cell lymphoma (PTCL), and multiple myeloma, but many aspects of their cellular mechanism of action (MoA) and substantial toxicity are not well understood. To shed more light on how KDACis elicit cellular responses, we systematically measured dose-dependent changes in acetylation, phosphorylation, and protein expression in response to 21 clinical and pre-clinical KDACis. The resulting 862,000 dose-response curves revealed, for instance, limited cellular specificity of histone deacetylase (HDAC) 1, 2, 3, and 6 inhibitors; strong cross-talk between acetylation and phosphorylation pathways; localization of most drug-responsive acetylation sites to intrinsically disordered regions (IDRs); an underappreciated role of acetylation in protein structure; and a shift in EP300 protein abundance between the cytoplasm and the nucleus. This comprehensive dataset serves as a resource for the investigation of the molecular mechanisms underlying KDACi action in cells and can be interactively explored online in ProteomicsDB.
3.	Collaboration, The Theia, et al. (author) Theia: Faint objects in motion or the new astrometry frontier 2017 In: arXiv.org. ; , pp. 01348-1707 Journal article (peer-reviewed)
4.	Ellinghaus, David, et al. (author) Analysis of five chronic inflammatory diseases identifies 27 new associations and highlights disease-specific patterns at shared loci 2016 In: Nature Genetics. - New York, USA : Nature Publishing Group. - 1061-4036 .- 1546-1718. ; 48:5, pp. 510-518 Journal article (peer-reviewed) Abstract: We simultaneously investigated the genetic landscape of ankylosing spondylitis, Crohn's disease, psoriasis, primary sclerosing cholangitis and ulcerative colitis to investigate pleiotropy and the relationship between these clinically related diseases. Using high-density genotype data from more than 86,000 individuals of European ancestry, we identified 244 independent multidisease signals, including 27 new genome-wide significant susceptibility loci and 3 unreported shared risk loci. Complex pleiotropy was supported when contrasting multidisease signals with expression data sets from human, rat and mouse together with epigenetic and expressed enhancer profiles. The comorbidities among the five immune diseases were best explained by biological pleiotropy rather than heterogeneity (a subgroup of cases genetically identical to those with another disease, possibly owing to diagnostic misclassification, molecular subtypes or excessive comorbidity). In particular, the strong comorbidity between primary sclerosing cholangitis and inflammatory bowel disease is likely the result of a unique disease, which is genetically distinct from classical inflammatory bowel disease phenotypes.
5.	Griss, Johannes, et al. (author) Response to "Comparison and Evaluation of Clustering Algorithms for Tandem Mass Spectra" 2018 In: Journal of Proteome Research. - : AMER CHEMICAL SOC. - 1535-3893 .- 1535-3907. ; 17:5, pp. 1993-1996 Journal article (peer-reviewed) Abstract: In the recent benchmarking article entitled "Comparison and Evaluation of Clustering Algorithms for Tandem Mass Spectra", Rieder et al. compared several different approaches to cluster MS/MS spectra. While we certainly recognize the value of the manuscript, here, we report some shortcomings detected in the original analyses. For most analyses, the authors clustered only single MS/MS runs. In one of the reported analyses, three MS/MS runs were processed together, which already led to computational performance issues in many of the tested approaches. This fact highlights the difficulties of using many of the tested algorithms on the nowadays produced average proteomics data sets. Second, the authors only processed identified spectra when merging MS runs. Thereby, all unidentified spectra that are of lower quality were already removed from the data set and could not influence the clustering results. Next, we found that the authors did not analyze the effect of chimeric spectra on the clustering results. In our analysis, we found that 3% of the spectra in the used data sets were chimeric, and this had marked effects on the behavior of the different clustering algorithms tested. Finally, the authors' choice to evaluate the MS-Cluster and spectra-cluster algorithms using a precursor tolerance of 5 Da for high-resolution Orbitrap data only was, in our opinion, not adequate to assess the performance of MS/MS clustering approaches.
6.	Halloran, John T., et al. (author) Speeding Up Percolator 2019 In: Journal of Proteome Research. - : AMER CHEMICAL SOC. - 1535-3893 .- 1535-3907. ; 18:9, pp. 3353-3359 Journal article (peer-reviewed) Abstract: The processing of peptide tandem mass spectrometry data involves matching observed spectra against a sequence database. The ranking and calibration of these peptide-spectrum matches can be improved substantially using a machine learning postprocessor. Here, we describe our efforts to speed up one widely used postprocessor, Percolator. The improved software is dramatically faster than the previous version of Percolator, even when using relatively few processors. We tested the new version of Percolator on a data set containing over 215 million spectra and recorded an overall reduction to 23% of the running time as compared to the unoptimized code. We also show that the memory footprint required by these speedups is modest relative to that of the original version of Percolator.
7.	Lee, J.-Y., et al. (author) ABRF Proteome Informatics Research Group (iPRG) 2016 Study : Inferring Proteoforms from Bottom-up Proteomics Data 2018 In: Journal of biomolecular techniques : JBT. - : NLM (Medline). - 1943-4731 .- 1524-0215. ; 29:2, pp. 39-45 Journal article (peer-reviewed) Abstract: This report presents the results from the 2016 Association of Biomolecular Resource Facilities Proteome Informatics Research Group (iPRG) study on proteoform inference and false discovery rate (FDR) estimation from bottom-up proteomics data. For this study, 3 replicate Q Exactive Orbitrap liquid chromatography-tandem mass spectrometry datasets were generated from each of 4 Escherichia coli samples spiked with different equimolar mixtures of small recombinant proteins selected to mimic pairs of homologous proteins. Participants were given raw data and a sequence file and asked to identify the proteins and provide estimates on the FDR at the proteoform level. As part of this study, we tested a new submission system with a format validator running on a virtual private server (VPS) and allowed methods to be provided as executable R Markdown or IPython Notebooks. The task was perceived as difficult, and only eight unique submissions were received, although those who participated did well with no one method performing best on all samples. However, none of the submissions included a complete Markdown or Notebook, even though examples were provided. Future iPRG studies need to be more successful in promoting and encouraging participation. The VPS and submission validator easily scale to much larger numbers of participants in these types of studies. The unique "ground-truth" dataset for proteoform identification generated for this study is now available to the research community, as are the server-side scripts for validating and managing submissions.
8.	Rubenson, Samuel, et al. (author) Preface 2019 In: Sojourners : Monastic Letters and Spiritual Teachings from the Desert. - 9781732985230 ; 1, pp. 7-11 Book chapter (popular science, debate, etc.)
9.	Schober, Florian A., et al. (author) The one-carbon pool controls mitochondrial energy metabolism via complex I and iron-sulfur clusters 2021 In: Science Advances. - : American Association for the Advancement of Science (AAAS). - 2375-2548. ; 7:8 Journal article (peer-reviewed) Abstract: Induction of the one-carbon cycle is an early hallmark of mitochondrial dysfunction and cancer metabolism. Vital intermediary steps are localized to mitochondria, but it remains unclear how one-carbon availability connects to mitochondrial function. Here, we show that the one-carbon metabolite and methyl group donor S-adenosylmethionine (SAM) is pivotal for energy metabolism. A gradual decline in mitochondrial SAM (mitoSAM) causes hierarchical defects in fly and mouse, comprising loss of mitoSAM-dependent metabolites and impaired assembly of the oxidative phosphorylation system. Complex I stability and iron-sulfur cluster biosynthesis are directly controlled by mitoSAM levels, while other protein targets are predominantly methylated outside of the organelle before import. The mitoSAM pool follows its cytosolic production, establishing mitochondria as responsive receivers of one-carbon units. Thus, we demonstrate that cellular methylation potential is required for energy metabolism, with direct relevance for pathophysiology, aging, and cancer.
10.	Stevens, Kristen N, et al. (author) 19p13.1 is a triple negative-specific breast cancer susceptibility locus 2012 In: Cancer Research. - 0008-5472 .- 1538-7445. ; 72, pp. 1795- Journal article (peer-reviewed) Abstract: The 19p13.1 breast cancer susceptibility locus is a modifier of breast cancer risk in BRCA1 mutation carriers and is also associated with risk of ovarian cancer. Here we investigated 19p13.1 variation and risk of breast cancer subtypes, defined by estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor-2 (HER2) status, using 48,869 breast cancer cases and 49,787 controls from the Breast Cancer Association Consortium (BCAC). Variants from 19p13.1 were not associated with breast cancer overall or with ER-positive breast cancer but were significantly associated with ER-negative breast cancer risk [rs8170 Odds Ratio (OR)=1.10, 95% Confidence Interval (CI) 1.05 - 1.15, p=3.49 x 10^-5] and triple negative (TN) (ER, PR and HER2 negative) breast cancer [rs8170 OR=1.22, 95% CI 1.13 - 1.31, p=2.22 x 10^-7]. However, rs8170 was no longer associated with ER-negative breast cancer risk when TN cases were excluded [OR=0.98, 95% CI 0.89 - 1.07, p=0.62]. In addition, a combined analysis of TN cases from BCAC and the Triple Negative Breast Cancer Consortium (TNBCC) (n=3,566) identified a genome-wide significant association between rs8170 and TN breast cancer risk [OR=1.25, 95% CI 1.18 - 1.33, p=3.31 x 10^-13]. Thus, 19p13.1 is the first triple negative-specific breast cancer risk locus and the first locus specific to a histological subtype defined by ER, PR, and HER2 to be identified. These findings provide convincing evidence that genetic susceptibility to breast cancer varies by tumor subtype and that triple negative tumors and other subtypes likely arise through distinct etiologic pathways.
11.	The, Matthew, et al. (author) A Protein Standard That Emulates Homology for the Characterization of Protein Inference Algorithms 2018 In: Journal of Proteome Research. - : American Chemical Society (ACS). - 1535-3893 .- 1535-3907. ; 17:5, pp. 1879-1886 Journal article (peer-reviewed) Abstract: A natural way to benchmark the performance of an analytical experimental setup is to use samples of known composition and see to what degree one can correctly infer the content of such a sample from the data. For shotgun proteomics, one of the inherent problems of interpreting data is that the measured analytes are peptides and not the actual proteins themselves. As some proteins share proteolytic peptides, there might be more than one possible causative set of proteins resulting in a given set of peptides and there is a need for mechanisms that infer proteins from lists of detected peptides. A weakness of commercially available samples of known content is that they consist of proteins that are deliberately selected for producing tryptic peptides that are unique to a single protein. Unfortunately, such samples do not expose any complications in protein inference. Hence, for a realistic benchmark of protein inference procedures, there is a need for samples of known content where the present proteins share peptides with known absent proteins. Here, we present such a standard, that is based on E. coli expressed human protein fragments. To illustrate the application of this standard, we benchmark a set of different protein inference procedures on the data. We observe that inference procedures excluding shared peptides provide more accurate estimates of errors compared to methods that include information from shared peptides, while still giving a reasonable performance in terms of the number of identified proteins. We also demonstrate that using a sample of known protein content without proteins with shared tryptic peptides can give a false sense of accuracy for many protein inference methods.
12.	The, Matthew, et al. (author) Distillation of label-free quantification data by clustering and Bayesian modeling Other publication (other academic/artistic) Abstract: In shotgun proteomics, the amount of information that can be extracted from label-free quantification experiments is typically limited by the identification rate as well as the noise level of the quantitative signals. This generally causes a low sensitivity in differential expression analysis on protein level. Here, we present a new method, MaRaQuant, in which we reverse the typical identification-first workflow into a quantification-first approach. Specifically, we apply unsupervised clustering on both MS1 and MS2 level to summarize all analytes of interest without assigning identities. This ensures that no valuable information is discarded due to analytes missing identification thresholds and allows us to spend more effort on the identification process due to the data reduction achieved by clustering. Furthermore, we propagate error probabilities from feature level all the way to protein level and input these to our probabilistic protein quantification method, Triqler. Applying this methodology to an engineered dataset, we managed to identify multiple analytes of interest that would have gone unnoticed in traditional pipelines, specifically, through the use of open modification and de novo searches. MaRaQuant/Triqler obtains significantly more identifications on all levels compared to MaxQuant/Perseus, including differentially expressed proteins. Notably, we managed to identify differentially expressed proteins in a clinical dataset where previously none were discovered. Furthermore, our differentially expressed proteins allowed us to attribute multiple functional annotation terms to both clinical datasets that we investigated.
13.	The, Matthew, et al. (author) Fast and accurate protein false discovery rates on large-scale proteomics data sets with Percolator 3.0 Other publication (other academic/artistic) Abstract: Percolator is a widely used software tool that increases yield in shotgun proteomics experiments and assigns reliable statistical confidence measures, such as q values and posterior error probabilities, to peptides and peptide-spectrum matches (PSMs) from such experiments. Percolator's processing speed has been sufficient for typical data sets consisting of hundreds of thousands of PSMs. With our new scalable approach, we can now also analyze millions of PSMs in a matter of minutes on a commodity computer. Furthermore, with the increasing awareness for the need for reliable statistics on the protein level, we compared several easy-to-understand protein inference methods and implemented the best-performing method - grouping proteins by their corresponding sets of theoretical peptides and then considering only the best-scoring peptide for each protein - in the Percolator package. We used Percolator 3.0 to analyze the data from a recent study of the draft human proteome containing 25 million spectra (PM:24870542). The source code and Ubuntu, Windows, MacOS and Fedora binary packages are available from http://percolator.ms/ under an Apache 2.0 license.
14.	The, Matthew, et al. (author) Fast and Accurate Protein False Discovery Rates on Large-Scale Proteomics Data Sets with Percolator 3.0 2016 In: Journal of the American Society for Mass Spectrometry. - : Springer. - 1044-0305 .- 1879-1123. ; 27:11, pp. 1719-1727 Journal article (peer-reviewed) Abstract: Percolator is a widely used software tool that increases yield in shotgun proteomics experiments and assigns reliable statistical confidence measures, such as q values and posterior error probabilities, to peptides and peptide-spectrum matches (PSMs) from such experiments. Percolator’s processing speed has been sufficient for typical data sets consisting of hundreds of thousands of PSMs. With our new scalable approach, we can now also analyze millions of PSMs in a matter of minutes on a commodity computer. Furthermore, with the increasing awareness for the need for reliable statistics on the protein level, we compared several easy-to-understand protein inference methods and implemented the best-performing method - grouping proteins by their corresponding sets of theoretical peptides and then considering only the best-scoring peptide for each protein - in the Percolator package. We used Percolator 3.0 to analyze the data from a recent study of the draft human proteome containing 25 million spectra (PM:24870542). The source code and Ubuntu, Windows, MacOS, and Fedora binary packages are available from http://percolator.ms/ under an Apache 2.0 license.
15.	The, Matthew, et al. (author) Focus on the spectra that matter by clustering of quantification data in shotgun proteomics 2020 In: Nature Communications. - : Springer Nature. - 2041-1723. ; 11:1 Journal article (peer-reviewed) Abstract: In shotgun proteomics, the analysis of label-free quantification experiments is typically limited by the identification rate and the noise level in the quantitative data. This generally causes a low sensitivity in differential expression analysis. Here, we propose a quantification-first approach for peptides that reverses the classical identification-first workflow, thereby preventing valuable information from being discarded in the identification stage. Specifically, we introduce a method, Quandenser, that applies unsupervised clustering on both MS1 and MS2 level to summarize all analytes of interest without assigning identities. This reduces search time due to the data reduction. We can now employ open modification and de novo searches to identify analytes of interest that would have gone unnoticed in traditional pipelines. Quandenser+Triqler outperforms the state-of-the-art method MaxQuant+Perseus, consistently reporting more differentially abundant proteins for all tested datasets. Software is available for all major operating systems at https://github.com/statisticalbiotechnology/quandenser, under Apache 2.0 license. Matching mass spectra to peptide sequences is the usual first step in proteomics data analysis, often followed by peptide quantification. Here, the authors show that clustering and quantifying mass spectral features prior to peptide identification can increase the sensitivity of label-free quantitative proteomics.
16.	The, Matthew, et al. (author) How to talk about protein-level false discovery rates in shotgun proteomics Other publication (other academic/artistic) Abstract: A frequently sought output from a shotgun proteomics experiment is a list of proteins that we believe to have been present in the analyzed sample before proteolytic digestion. The standard technique to control for errors in such lists is to enforce a preset threshold for the false discovery rate. Many researchers consider protein-level false discovery rates a difficult and vague concept, as the measurement entities, spectra, are manifestations of peptides and not proteins. Here, we argue that this confusion is unnecessary and provide a framework on how to think about protein-level false discovery rates, starting from its basic principle: the null hypothesis. Specifically, we point out that two competing null hypotheses are used concurrently in today's protein inference methods, which has gone unnoticed by many. Using simulations of a shotgun proteomics experiment, we show how confusing one null hypothesis for the other can lead to serious discrepancies in the false discovery rate. Furthermore, we demonstrate how the same simulations can be used to verify false discovery rate estimates of protein inference methods. In particular, we show that, for a simple protein inference method, decoy models can be used to accurately estimate protein-level false discovery rates for both competing null hypotheses.
17.	The, Matthew, et al. (author) How to talk about protein-level false discovery rates in shotgun proteomics 2016 In: Proteomics. - : Wiley-Blackwell. - 1615-9853 .- 1615-9861. ; 16:18, pp. 2461-2469 Journal article (peer-reviewed) Abstract: A frequently sought output from a shotgun proteomics experiment is a list of proteins that we believe to have been present in the analyzed sample before proteolytic digestion. The standard technique to control for errors in such lists is to enforce a preset threshold for the false discovery rate (FDR). Many consider protein-level FDRs a difficult and vague concept, as the measurement entities, spectra, are manifestations of peptides and not proteins. Here, we argue that this confusion is unnecessary and provide a framework on how to think about protein-level FDRs, starting from its basic principle: the null hypothesis. Specifically, we point out that two competing null hypotheses are used concurrently in today's protein inference methods, which has gone unnoticed by many. Using simulations of a shotgun proteomics experiment, we show how confusing one null hypothesis for the other can lead to serious discrepancies in the FDR. Furthermore, we demonstrate how the same simulations can be used to verify FDR estimates of protein inference methods. In particular, we show that, for a simple protein inference method, decoy models can be used to accurately estimate protein-level FDRs for both competing null hypotheses.
18.	The, Matthew, et al. (author) Integrated identification and quantification error probabilities for shotgun proteomics Other publication (other academic/artistic) Abstract: Protein quantification by label-free shotgun proteomics experiments is plagued by a multitude of error sources. Typical pipelines for identifying differentially expressed proteins use intermediate filters in an attempt to control the error rate. However, they often ignore certain error sources and, moreover, regard filtered lists as completely correct in subsequent steps. These two indiscretions can easily lead to a loss of control of the false discovery rate (FDR). We propose a probabilistic graphical model, Triqler, that propagates error information through all steps, employing distributions in favor of point estimates, most notably for missing value imputation. The model outputs posterior probabilities for fold changes between treatment groups, highlighting uncertainty rather than hiding it. We analyzed 3 engineered datasets and achieved FDR control and high sensitivity, even for truly absent proteins. In a bladder cancer clinical dataset we discovered 35 proteins at 5% FDR, with the original study discovering none at this threshold. Compellingly, these proteins showed enrichment for functional annotation terms. The model executes in minutes and is freely available at https://pypi.org/project/triqler/.
19.	The, Matthew, et al. (author) Integrated Identification and Quantification Error Probabilities for Shotgun Proteomics 2019 In: Molecular & Cellular Proteomics. - : AMER SOC BIOCHEMISTRY MOLECULAR BIOLOGY INC. - 1535-9476 .- 1535-9484. ; 18:3, pp. 561-570 Journal article (peer-reviewed) Abstract: Protein quantification by label-free shotgun proteomics experiments is plagued by a multitude of error sources. Typical pipelines for identifying differential proteins use intermediate filters to control the error rate. However, they often ignore certain error sources and, moreover, regard filtered lists as completely correct in subsequent steps. These two indiscretions can easily lead to a loss of control of the false discovery rate (FDR). We propose a probabilistic graphical model, Triqler, that propagates error information through all steps, employing distributions in favor of point estimates, most notably for missing value imputation. The model outputs posterior probabilities for fold changes between treatment groups, highlighting uncertainty rather than hiding it. We analyzed 3 engineered data sets and achieved FDR control and high sensitivity, even for truly absent proteins. In a bladder cancer clinical data set we discovered 35 proteins at 5% FDR, whereas the original study discovered 1 and MaxQuant/Perseus 4 proteins at this threshold. Compellingly, these 35 proteins showed enrichment for functional annotation terms, whereas the top ranked proteins reported by MaxQuant/Perseus showed no enrichment. The model executes in minutes and is freely available at https://pypi.org/project/triqler/.
20.	The, Matthew, et al. (author) MaRaCluster : A Fragment Rarity Metric for Clustering Fragment Spectra in Shotgun Proteomics 2016 In: Journal of Proteome Research. - : American Chemical Society (ACS). - 1535-3893 .- 1535-3907. ; 15:3, pp. 713-720 Journal article (peer-reviewed) Abstract: Shotgun proteomics experiments generate large amounts of fragment spectra as primary data, normally with high redundancy between and within experiments. Here, we have devised a clustering technique to identify fragment spectra stemming from the same species of peptide. This is a powerful alternative method to traditional search engines for analyzing spectra, specifically useful for larger scale mass spectrometry studies. As an aid in this process, we propose a distance calculation relying on the rarity of experimental fragment peaks, following the intuition that peaks shared by only a few spectra offer more evidence than peaks shared by a large number of spectra. We used this distance calculation and a complete-linkage scheme to cluster data from a recent large-scale mass spectrometry-based study. The clusterings produced by our method have up to 40% more identified peptides for their consensus spectra compared to those produced by the previous state-of-the-art method. We see that our method would advance the construction of spectral libraries as well as serve as a tool for mining large sets of fragment spectra. The source code and Ubuntu binary packages are available at https://github.com/statisticalbiotechnology/maracluster (under an Apache 2.0 license).
21.	The, Matthew (author) Statistical and machine learning methods to analyze large-scale mass spectrometry data 2016 Licentiate thesis (other academic/artistic) Abstract: As in many other fields, biology is faced with enormous amounts of data that contains valuable information that is yet to be extracted. The field of proteomics, the study of proteins, has the luxury of having large repositories containing data from tandem mass-spectrometry experiments, readily accessible for everyone who is interested. At the same time, there is still a lot to discover about proteins as the main actors in cell processes and cell signaling. In this thesis, we explore several methods to extract more information from the available data using methods from statistics and machine learning. In particular, we introduce MaRaCluster, a new method for clustering mass spectra on large-scale datasets. This method uses statistical methods to assess similarity between mass spectra, followed by the conservative complete-linkage clustering algorithm. The combination of these two resulted in up to 40% more peptide identifications on its consensus spectra compared to the state of the art method. Second, we attempt to clarify and promote protein-level false discovery rates (FDRs). Frequently, studies fail to report protein-level FDRs even though the proteins are actually the entities of interest. We provided a framework in which to discuss protein-level FDRs in a systematic manner to open up the discussion and take away potential hesitance. We also benchmarked some scalable protein inference methods and included the best one in the Percolator package. Furthermore, we added functionality to the Percolator package to accommodate the analysis of studies in which many runs are aggregated. This reduced the run time for a recent study regarding a draft human proteome from almost a full day to just 10 minutes on a commodity computer, resulting in a list of proteins together with their corresponding protein-level FDRs.
22.	The, Matthew (author) Statistical and machine learning methods to analyze large-scale mass spectrometry data 2018 Doctoral thesis (other academic/artistic) Abstract: Modern biology is faced with vast amounts of data that contain valuable information yet to be extracted. Proteomics, the study of proteins, has repositories with thousands of mass spectrometry experiments. These data gold mines could further our knowledge of proteins as the main actors in cell processes and signaling. Here, we explore methods to extract more information from this data using statistical and machine learning methods. First, we present advances for studies that aggregate hundreds of runs. We introduce MaRaCluster, which clusters mass spectra for large-scale datasets using statistical methods to assess similarity of spectra. It identified up to 40% more peptides than the state-of-the-art method, MS-Cluster. Further, we accommodated large-scale data analysis in Percolator, a popular post-processing tool for mass spectrometry data. This reduced the runtime for a draft human proteome study from a full day to 10 minutes. Second, we clarify and promote the contentious topic of protein false discovery rates (FDRs). Often, studies report lists of proteins but fail to report protein FDRs. We provide a framework to systematically discuss protein FDRs and take away hesitance. We also added protein FDRs to Percolator, opting for the best-peptide approach which proved superior in a benchmark of scalable protein inference methods. Third, we tackle the low sensitivity of protein quantification methods. Current methods lack proper control of error sources and propagation. To remedy this, we developed Triqler, which controls the protein quantification FDR through a Bayesian framework. We also introduce MaRaQuant, which proposes a quantification-first approach that applies clustering prior to identification. This reduced the number of spectra to be searched and allowed us to spot unidentified analytes of interest. Combining these tools outperformed the state-of-the-art method, MaxQuant/Perseus, and found enriched functional terms for datasets that had none before.
23.	The, Matthew, et al. (author) Triqler for MaxQuant : Enhancing Results from MaxQuant by Bayesian Error Propagation and Integration 2021 In: Journal of Proteome Research. - : American Chemical Society (ACS). - 1535-3893 .- 1535-3907. ; 20:4, pp. 2062-2068 Journal article (peer-reviewed) Abstract: Error estimation for differential protein quantification by label-free shotgun proteomics is challenging due to the multitude of error sources, each contributing uncertainty to the final results. We have previously designed a Bayesian model, Triqler, to combine such error terms into one combined quantification error. Here we present an interface for Triqler that takes MaxQuant results as input, allowing quick reanalysis of already processed data. We demonstrate that Triqler outperforms the original processing for a large set of both engineered and clinical/biological relevant data sets. Triqler and its interface to MaxQuant are available as a Python module under an Apache 2.0 license from https://pypi.org/project/triqler/.
24.	Truong, Patrick, et al. (author) Triqler for Protein Summarization of Data from Data-Independent Acquisition Mass Spectrometry 2023 In: Journal of Proteome Research. - : American Chemical Society (ACS). - 1535-3893 .- 1535-3907. ; 22:4, pp. 1359-1366 Journal article (peer-reviewed) Abstract: A frequent goal, or subgoal, when processing data from a quantitative shotgun proteomics experiment is a list of proteins that are differentially abundant under the examined experimental conditions. Unfortunately, obtaining such a list is a challenging process, as the mass spectrometer analyzes the proteolytic peptides of a protein rather than the proteins themselves. We have previously designed a Bayesian hierarchical probabilistic model, Triqler, for combining peptide identification and quantification errors into probabilities of proteins being differentially abundant. However, the model was developed for data from data-dependent acquisition. Here, we show that Triqler is also compatible with data-independent acquisition data after applying minor alterations for the missing value distribution. Furthermore, we find that it has better performance than a set of compared state-of-the-art protein summarization tools when evaluated on data-independent acquisition data.
