SwePub

Result list for search "WFRF:(Elofsson Arne) srt2:(2020-2021)"

Search: WFRF:(Elofsson Arne) > (2020-2021)

  • Results 1-10 of 15
1.
  • Attwood, Misty M. (author)
  • Membrane-bound proteins : Characterization, evolution, and functional analysis
  • 2020
  • Doctoral thesis (other academic/artistic)
    • Alpha-helical transmembrane proteins are important components of many essential cell processes including signal transduction, transport of molecules across membranes, protein and membrane trafficking, and structural and adhesion activities, amongst others. Their involvement in critical networks makes them the focus of interest in investigating disease pathways, as candidate drug targets, and in evolutionary analyses to identify homologous protein families and possible functional activities. Transmembrane (TM) proteins can be categorized into major groups based on the same gross structure, i.e., the number of transmembrane helices, which is often correlated with specific functional activities, for example as receptors or transporters. The focus of this thesis was to analyze the evolution of the membrane proteome from the last holozoan common ancestor (LHCA) through metazoans to garner insight into the fundamental functional clusters that underlie metazoan diversity and innovation. Twenty-four eukaryotic proteomes were analyzed, with results showing that more than 70% of metazoan transmembrane protein families have a pre-metazoan origin. In concert with that, we characterized the previously unstudied groups of human proteins with three, four, and five membrane-spanning regions (3TM, 4TM, and 5TM) and analyzed their functional activities, involvement in disease pathways, and unique characteristics. Combined, we manually curated and classified nearly 11% of the human transmembrane proteome with these three studies. The 3TM data set included 152 proteins, nearly 45% of which localize specifically to the endoplasmic reticulum (ER), and are involved in membrane biosynthesis and lipid biogenesis, protein trafficking, catabolic processes, and, owing to the large ionotropic glutamate receptor family, signal transduction. The 373 proteins identified in the 4TM data set are predominantly involved in transport activities, as well as cell communication and adhesion, and function as structural elements. The compact 5TM data set includes 58 proteins that engage in localization and transport activities, such as protein targeting, membrane trafficking, and vesicle transport. Notably, ~60% are identified as cancer prognostic markers that are associated with clinical outcomes of different tumour types. This thesis investigates the evolutionary origins of the human transmembrane proteome, characterizes formerly dark areas of the membrane proteome, and extends the fundamental knowledge of transmembrane proteins.
2.
  • Baldassarre, Federico, et al. (author)
  • GraphQA: Protein Model Quality Assessment using Graph Convolutional Networks
  • 2020
  • In: Bioinformatics. - : Oxford University Press. - 1367-4803 .- 1367-4811 .- 1460-2059. ; 37:3, pp. 360-366
  • Journal article (peer-reviewed)
    • Motivation: Proteins are ubiquitous molecules whose function in biological processes is determined by their 3D structure. Experimental identification of a protein’s structure can be time-consuming, prohibitively expensive, and not always possible. Alternatively, protein folding can be modeled using computational methods, which however are not guaranteed to always produce optimal results. GraphQA is a graph-based method to estimate the quality of protein models that possesses favorable properties such as representation learning, explicit modeling of both sequential and 3D structure, geometric invariance, and computational efficiency. Results: GraphQA performs similarly to state-of-the-art methods despite using a relatively low number of input features. In addition, the graph network structure provides an improvement over the architecture used in ProQ4 operating on the same input features. Finally, the individual contributions of GraphQA components are carefully evaluated. Availability and implementation: PyTorch implementation, datasets, experiments, and link to an evaluation server are available through this GitHub repository: github.com/baldassarreFe/graphqa. Supplementary information: Supplementary material is available at Bioinformatics online.
3.
  • Bassot, Claudio, et al. (author)
  • Accurate contact-based modelling of repeat proteins predicts the structure of new repeats protein families
  • 2021
  • In: PLoS Computational Biology. - : Public Library of Science (PLoS). - 1553-734X .- 1553-7358. ; 17:4
  • Journal article (peer-reviewed)
    • Repeat proteins are widespread among organisms and particularly abundant in eukaryotic proteomes. Their primary sequences contain repetitions in the amino acid sequence that give rise to structures with repeated folds/domains. Although the repeated units can often be recognised from the sequence alone, structural information is often missing. Here, we used contact prediction to predict the structure of repeat proteins directly from their primary sequences. We benchmark the methods on a dataset comprising all the known repeat structures. We evaluate the contact predictions and the obtained models for different classes of repeat proteins. Further, we develop and benchmark a quality assessment (QA) method specific for repeat proteins. Finally, we used the prediction pipeline for all PFAM repeat families without resolved structures and found that forty-one of them could be modelled with high accuracy. Repeat proteins are abundant in eukaryotic proteomes. They are involved in many eukaryote-specific functions, including signalling. For many of these proteins, the structure is not known, as they are difficult to crystallise. Today, using direct coupling analysis and deep learning, it is often possible to predict a protein's structure. However, the unique sequence features present in repeat proteins have made it challenging to use direct coupling analysis for predicting contacts. Here, we show that deep learning-based methods (trRosetta, DeepMetaPsicov (DMP) and PconsC4) overcome this problem and can predict intra- and inter-unit contacts in repeat proteins. In a benchmark dataset of 815 repeat proteins, about 90% can be correctly modelled. Further, among 48 PFAM families lacking a protein structure, we produce models of forty-one families with estimated high accuracy.
4.
  • Bryant, Patrick, et al. (author)
  • Decomposing Structural Response Due to Sequence Changes in Protein Domains with Machine Learning
  • 2020
  • In: Journal of Molecular Biology. - : Elsevier BV. - 0022-2836 .- 1089-8638. ; 432:16, pp. 4435-4446
  • Journal article (peer-reviewed)
    • How protein domain structure changes in response to mutations is not well understood. Some mutations change the structure drastically, while most only result in small changes. To gain an understanding of this, we decompose the relationship between changes in domain sequence and structure using machine learning. We select pairs of evolutionarily related domains with a broad range of evolutionary distances. In contrast to earlier studies, we do not find a strictly linear relationship between sequence and structural changes. We train a random forest regressor that predicts the structural similarity between pairs with an average accuracy of 0.029 lDDT (local Distance Difference Test) score, and a correlation coefficient of 0.92. Decomposing the feature importance shows that the domain length, or analogously size, is the most important feature. Our model enables assessing deviations in relative structural response, and thus prediction of evolutionary trajectories, in protein domains across evolution.
5.
  • Bryant, Patrick, et al. (author)
  • Estimating the impact of mobility patterns on COVID-19 infection rates in 11 European countries
  • 2020
  • In: PeerJ. - : PeerJ. - 2167-8359. ; 8
  • Journal article (peer-reviewed)
    • Background: As governments across Europe have issued non-pharmaceutical interventions (NPIs) such as social distancing and school closing, the mobility patterns in these countries have changed. Most states have implemented similar NPIs at similar time points. However, it is likely that different countries and populations respond differently to the NPIs and that these differences cause mobility patterns and thereby the epidemic development to change. Methods: We build a Bayesian model that estimates the number of deaths on a given day dependent on changes in the basic reproductive number, R0, due to differences in mobility patterns. We utilise mobility data from Google mobility reports using five different categories: retail and recreation, grocery and pharmacy, transit stations, workplace and residential. The importance of each mobility category for predicting changes in R0 is estimated through the model. Findings: The changes in mobility have a considerable overlap with the introduction of governmental NPIs, highlighting the importance of government action for population behavioural change. The shift in mobility in all categories shows high correlations with the death rates 1 month later. Reduction of movement within the grocery and pharmacy sector is estimated to account for most of the decrease in R0. Interpretation: Our model predicts 3-week epidemic forecasts, using real-time observations of changes in mobility patterns, which can provide governments with direct feedback on the effects of their NPIs. The model predicts the changes in a majority of the countries accurately but overestimates the impact of NPIs in Sweden and Denmark and underestimates them in France and Belgium. We also note that the exponential nature of all epidemiological models based on the basic reproductive number, R0, causes small errors to have extensive effects on the predicted outcome.
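The closing remark in the abstract above, that small errors in the reproductive number grow into large errors in the predicted outcome, can be illustrated with a toy calculation. This is a minimal sketch and not the authors' Bayesian model: it assumes a constant reproduction number and discrete infection generations, and the function names and numeric values are hypothetical.

```python
def projected_cases(initial_cases: float, r: float, generations: int) -> float:
    """Cases in generation n when every case infects r others (constant r)."""
    return initial_cases * r ** generations


def relative_error(r_estimated: float, r_true: float, generations: int) -> float:
    """Ratio of projected to true cases; equals (r_estimated / r_true) ** generations."""
    projected = projected_cases(1.0, r_estimated, generations)
    actual = projected_cases(1.0, r_true, generations)
    return projected / actual


if __name__ == "__main__":
    r_true = 1.20       # hypothetical true reproduction number
    r_estimated = 1.26  # hypothetical estimate, 5% too high
    for n in (1, 3, 5, 10):
        # The 5% error in r compounds once per generation.
        print(n, round(relative_error(r_estimated, r_true, n), 3))
```

Under these assumptions the ratio depends only on the relative error in the reproduction number and on the number of generations, so a forecast over a longer horizon is more sensitive to the same estimation error.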
6.
  • Delucchi, Matteo, et al. (author)
  • A New Census of Protein Tandem Repeats and Their Relationship with Intrinsic Disorder
  • 2020
  • In: Genes. - : MDPI AG. - 2073-4425. ; 11:4
  • Journal article (peer-reviewed)
    • Protein tandem repeats (TRs) are often associated with immunity-related functions and diseases. Since the last census of protein TRs in 1999, the number of curated proteins has increased more than seven-fold and new TR prediction methods have been published. TRs appear to be enriched with intrinsic disorder and vice versa. The significance and the biological reasons for this association are unknown. Here, we characterize protein TRs across all kingdoms of life and their overlap with intrinsic disorder in unprecedented detail. Using state-of-the-art prediction methods, we estimate that 50.9% of proteins contain at least one TR, often located at the sequence flanks. Positive linear correlation between the proportion of TRs and the protein length was observed universally, with Eukaryotes in general having more TRs, but when the difference in length is taken into account the difference is quite small. TRs were enriched with disorder-promoting amino acids and were inside intrinsically disordered regions. Many such TRs were homorepeats. Our results support that TRs mostly originate by duplication and are involved in essential functions such as transcription processes, structural organization, electron transport and iron-binding. In viruses, TRs are found in proteins essential for virulence.
7.
  • Grapotte, M, et al. (author)
  • Discovery of widespread transcription initiation at microsatellites predictable by sequence-based deep neural network
  • 2021
  • In: Nature Communications. - : Springer Science and Business Media LLC. - 2041-1723. ; 12:1, p. 3297
  • Journal article (peer-reviewed)
    • Using the Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of transcription start sites (TSSs) in several species. Strikingly, ~72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. Here, we probe these unassigned TSSs and show that, in all species studied, a significant fraction of CAGE peaks initiate at microsatellites, also called short tandem repeats (STRs). To confirm this transcription, we develop Cap Trap RNA-seq, a technology which combines cap trapping and long read MinION sequencing. We train sequence-based deep learning models able to predict CAGE signal at STRs with high accuracy. These models unveil the importance of STR surrounding sequences not only to distinguish STR classes, but also to predict the level of transcription initiation. Importantly, genetic variants linked to human diseases are preferentially found at STRs with high transcription initiation level, supporting the biological and clinical relevance of transcription initiation at STRs. Together, our results extend the repertoire of non-coding transcription associated with DNA tandem repeats and complexify STR polymorphism.
8.
  • Hatos, Andras, et al. (author)
  • DisProt : intrinsic protein disorder annotation in 2020
  • 2020
  • In: Nucleic Acids Research. - : Oxford University Press (OUP). - 0305-1048 .- 1362-4962. ; 48:D1, pp. D269-D276
  • Journal article (peer-reviewed)
    • The Database of Protein Disorder (DisProt, URL: https://disprot.org) provides manually curated annotations of intrinsically disordered proteins from the literature. Here we report recent developments with DisProt (version 8), including the doubling of protein entries, a new disorder ontology, improvements of the annotation format and a completely new website. The website includes a redesigned graphical interface, a better search engine, a clearer API for programmatic access and a new annotation interface that integrates text mining technologies. The new entry format provides greater flexibility, simplifies maintenance and allows the capture of more information from the literature. The new disorder ontology has been formalized and made interoperable by adopting the OWL format, and its structure and term definitions have been improved. The new annotation interface has made the curation process faster and more effective. We recently showed that new DisProt annotations can be effectively used to train and validate disorder predictors. We believe the growth of DisProt will accelerate, contributing to the improvement of function and disorder predictors and thereby helping to illuminate the 'dark' proteome.
9.
  • Laine, Elodie, et al. (author)
  • Protein sequence-to-structure learning : Is this the end(-to-end revolution)?
  • 2021
  • In: Proteins. - : Wiley. - 0887-3585 .- 1097-0134. ; 89:12, pp. 1770-1786
  • Research review (peer-reviewed)
    • The potential of deep learning has been recognized in the protein structure prediction community for some time, and became indisputable after CASP13. In CASP14, deep learning has boosted the field to unanticipated levels reaching near-experimental accuracy. This success comes from advances transferred from other machine learning areas, as well as methods specifically designed to deal with protein sequences and structures, and their abstractions. Novel emerging approaches include (i) geometric learning, that is, learning on representations such as graphs, three-dimensional (3D) Voronoi tessellations, and point clouds; (ii) pretrained protein language models leveraging attention; (iii) equivariant architectures preserving the symmetry of 3D space; (iv) use of large meta-genome databases; (v) combinations of protein representations; and (vi) finally, truly end-to-end architectures, that is, differentiable models starting from a sequence and returning a 3D structure. Here, we provide an overview and our opinion of the novel deep learning approaches developed in the last 2 years and widely used in CASP14.
10.
  • Lamb, John, et al. (author)
  • pyconsFold : a fast and easy tool for modeling and docking using distance predictions
  • 2021
  • In: Bioinformatics. - : Oxford University Press (OUP). - 1367-4803 .- 1367-4811 .- 1460-2059. ; 37:21, pp. 3959-3960
  • Journal article (peer-reviewed)
    • Motivation: Contact predictions within a protein have recently become a viable method for accurate prediction of protein structure. Using predicted distance distributions has been shown in many cases to be superior to only using a binary contact annotation. Using predicted interprotein distances has also been shown to be able to dock some protein dimers. Results: Here, we present pyconsFold. Using CNS as its underlying folding mechanism and predicted contact distance, it outperforms regular contact prediction-based modeling on our dataset of 210 proteins. It performs marginally worse than the state-of-the-art pyRosetta folding pipeline but is on average about 20 times faster per model. More importantly, pyconsFold can also be used as a fold-and-dock protocol by using predicted interprotein contacts/distances to simultaneously fold and dock two protein chains. Availability and implementation: pyconsFold is implemented in Python 3 with a strong focus on using as few dependencies as possible for longevity. It is available both as a pip package in Python 3 and as source code on GitHub and is published under the GPLv3 license. The data underlying this article together with source code are available on GitHub, at https://github.com/johnlamb/pyconsfold.