SwePub

Search results for "WFRF:(Capuccini Marco)"

  • Results 1-10 of 12
1.
  •  
2.
  • Ahmed, Laeeq, et al. (authors)
  • Efficient iterative virtual screening with Apache Spark and conformal prediction
  • 2018
  • In: Journal of Cheminformatics. BioMed Central. ISSN 1758-2946. Vol. 10
  • Journal article (peer-reviewed). Abstract:
    • Background: Docking and scoring large libraries of ligands against target proteins forms the basis of structure-based virtual screening. The problem is trivially parallelizable, and calculations are generally carried out on computer clusters or on large workstations in a brute force manner, by docking and scoring all available ligands. Contribution: In this study we propose a strategy that is based on iteratively docking a set of ligands to form a training set, training a ligand-based model on this set, and predicting the remainder of the ligands to exclude those predicted as 'low-scoring' ligands. Then, another set of ligands are docked, the model is retrained and the process is repeated until a certain model efficiency level is reached. Thereafter, the remaining ligands are docked or excluded based on this model. We use SVM and conformal prediction to deliver valid prediction intervals for ranking the predicted ligands, and Apache Spark to parallelize both the docking and the modeling. Results: We show on 4 different targets that conformal prediction based virtual screening (CPVS) is able to reduce the number of docked molecules by 62.61% while retaining an accuracy for the top 30 hits of 94% on average and a speedup of 3.7. The implementation is available as open source via GitHub (https://github.com/laeeq80/spark-cpvs) and can be run on high-performance computers as well as on cloud resources.
  •  
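The iterative strategy described in the abstract above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: `dock` is a hypothetical stand-in for a real docking program, a one-dimensional class-centroid model replaces the SVM, and the batch size, significance level `eps`, score threshold and target efficiency are all arbitrary assumptions.

```python
import random

def dock(ligand):
    """Hypothetical stand-in for an expensive docking run (higher = better)."""
    return ligand["x"] + ligand["noise"]

def nonconformity(x, y, centroids):
    """Margin-like score: distance to own class centroid minus distance to the other."""
    other = "low" if y == "high" else "high"
    return abs(x - centroids[y]) - abs(x - centroids[other])

def p_value(cal_alphas, alpha):
    """Conformal p-value: share of calibration scores at least as nonconforming."""
    return (sum(a >= alpha for a in cal_alphas) + 1) / (len(cal_alphas) + 1)

def fit(docked):
    """Split docked ligands into training and calibration halves, fit class
    centroids (assumed stand-in for the SVM) and collect per-class calibration scores."""
    half = len(docked) // 2
    train, cal = docked[:half], docked[half:]
    centroids = {}
    for y in ("high", "low"):
        xs = [x for x, label in train if label == y]
        if not xs:
            return None  # a class is missing; dock more before modelling
        centroids[y] = sum(xs) / len(xs)
    cal_alphas = {y: [nonconformity(x, y, centroids) for x, label in cal if label == y]
                  for y in centroids}
    return centroids, cal_alphas

def prediction_set(x, centroids, cal_alphas, eps):
    """Labels whose p-value exceeds the significance level eps."""
    return {y for y in centroids
            if p_value(cal_alphas[y], nonconformity(x, y, centroids)) > eps}

def screen(ligands, batch=100, eps=0.2, threshold=0.5, target_eff=0.7):
    """Return how many ligands were actually docked."""
    pool, docked = list(ligands), []
    while pool:
        # Dock the next batch and label each ligand by its docking score.
        new, pool = pool[:batch], pool[batch:]
        docked += [(l["x"], "high" if dock(l) >= threshold else "low") for l in new]
        model = fit(docked)
        if model is None:
            continue
        sets = [prediction_set(l["x"], *model, eps) for l in pool]
        # Efficiency here means the share of single-label prediction sets.
        efficiency = sum(len(s) == 1 for s in sets) / len(sets) if sets else 1.0
        # Exclude ligands confidently predicted as low-scoring.
        kept = [(l, s) for l, s in zip(pool, sets) if s != {"low"}]
        pool = [l for l, _ in kept]
        if efficiency >= target_eff:
            # Final pass: dock only ligands whose prediction set contains "high".
            final = [l for l, s in kept if "high" in s]
            docked += [(l["x"], "high" if dock(l) >= threshold else "low") for l in final]
            pool = []
    return len(docked)

rng = random.Random(0)
library = [{"x": rng.random(), "noise": rng.gauss(0, 0.05)} for _ in range(2000)]
print(screen(library), "of", len(library), "ligands docked")
```

Ligands excluded as low-scoring are never docked, which is where the saving in this toy setup comes from; the figures it prints say nothing about the results reported in the paper.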
3.
  •  
4.
  • Capuccini, Marco (author)
  • Enabling Scalable Data Analysis on Cloud Resources with Applications in Life Science
  • 2019
  • Doctoral thesis (other academic/artistic). Abstract:
    • Over the past 20 years, the rise of high-throughput methods in life science has enabled research laboratories to produce massive datasets of biological interest. When dealing with this "data deluge" of modern biology, researchers encounter two major challenges: first, there is a need for substantial technical skills for dealing with Big Data; second, infrastructure procurement becomes difficult. In connection to this second challenge, the computing model and business trend that was originally popularized by Amazon under the name of cloud computing represents an interesting opportunity. Instead of buying computing infrastructure upfront, cloud providers enable the allocation and release of virtual resources on-demand. These resources are then billed with a pay-per-use pricing model and physical infrastructure management is delegated to the provider. In this thesis, we introduce a number of methods for running Big Data analyses of biological interest using cloud computing. Considerable efforts were made in enabling the application of trusted, bioinformatics software to Big Data scenarios as opposed to reimplementing the existing codebase. Further, we improve the accessibility of the technology with the aim of reducing the entry barrier for biologists. The thesis includes 5 papers. In Papers I and II, we explore the applicability of Apache Spark, one of the leading Big Data analytics platforms in cloud environments, to two drug-discovery use cases. In Paper III, we present a general method for running bioinformatics analyses on the cloud using the microservices-oriented architecture. In Paper IV, we introduce a method that combines microservices and Apache Spark with the aim of providing the best of both technologies. In Paper V, we discuss how to reduce the entry barrier for the allocation of cloud research environments. We show that all of the developed methods scale well and we provide high-level programming interfaces for improving accessibility. We have also made the developed software publicly available.
  •  
5.
  • Capuccini, Marco, et al. (authors)
  • Large-scale virtual screening on public cloud resources with Apache Spark
  • 2017
  • In: Journal of Cheminformatics. BioMed Central. ISSN 1758-2946. Vol. 9
  • Journal article (peer-reviewed). Abstract:
    • Background: Structure-based virtual screening is an in-silico method to screen a target receptor against a virtual molecular library. Applying docking-based screening to large molecular libraries can be computationally expensive, however it constitutes a trivially parallelizable task. Most of the available parallel implementations are based on message passing interface, relying on low failure rate hardware and fast network connection. Google's MapReduce revolutionized large-scale analysis, enabling the processing of massive datasets on commodity hardware and cloud resources, providing transparent scalability and fault tolerance at the software level. Open source implementations of MapReduce include Apache Hadoop and the more recent Apache Spark. Results: We developed a method to run existing docking-based screening software on distributed cloud resources, utilizing the MapReduce approach. We benchmarked our method, which is implemented in Apache Spark, docking a publicly available target receptor against ~2.2 M compounds. The performance experiments show a good parallel efficiency (87%) when running in a public cloud environment. Conclusion: Our method enables parallel Structure-based virtual screening on public cloud resources or commodity computer clusters. The degree of scalability that we achieve allows for trying out our method on relatively small libraries first and then to scale to larger libraries.
  •  
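One way to picture a MapReduce-style docking screen like the one in the abstract above is a map step that docks each partition of the library independently and a reduce step that merges partial results. The sketch below uses only the Python standard library and runs sequentially. It is not the paper's Spark code: `dock` is a hypothetical scoring function, and keeping only the top-k hits is an assumption, since the abstract does not specify the map and reduce functions.

```python
import heapq
from functools import reduce

def dock(molecule):
    """Hypothetical stand-in for a docking program; returns (score, molecule)."""
    return (sum(map(ord, molecule)) % 97) / 97.0, molecule

def map_partition(partition, k):
    """Map step: dock every molecule in one partition and keep its k best."""
    return heapq.nlargest(k, (dock(m) for m in partition))

def merge(a, b, k):
    """Reduce step: merge two partial top-k lists into a single top-k list."""
    return heapq.nlargest(k, a + b)

def top_hits(library, n_partitions=4, k=3):
    # Round-robin split of the library; on a cluster each partition
    # would be processed by a different worker.
    partitions = [library[i::n_partitions] for i in range(n_partitions)]
    partials = [map_partition(p, k) for p in partitions]
    return reduce(lambda a, b: merge(a, b, k), partials)

library = ["mol%d" % i for i in range(50)]
print(top_hits(library))
```

Because every global top-k hit is also among the top k of its own partition, merging the partial lists gives the same result as docking and ranking the whole library in one place.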
6.
  • Capuccini, Marco, et al. (authors)
  • MaRe : Processing Big Data with application containers on Apache Spark
  • 2020
  • In: GigaScience. Oxford University Press. ISSN 2047-217X. Vol. 9(5)
  • Journal article (peer-reviewed). Abstract:
    • Background: Life science is increasingly driven by Big Data analytics, and the MapReduce programming model has been proven successful for data-intensive analyses. However, current MapReduce frameworks offer poor support for reusing existing processing tools in bioinformatics pipelines. Furthermore, these frameworks do not have native support for application containers, which are becoming popular in scientific data processing. Results: Here we present MaRe, an open source programming library that introduces support for Docker containers in Apache Spark. Apache Spark and Docker are the MapReduce framework and container engine that have collected the largest open source community; thus, MaRe provides interoperability with the cutting-edge software ecosystem. We demonstrate MaRe on 2 data-intensive applications in life science, showing ease of use and scalability. Conclusions: MaRe enables scalable data-intensive processing in life science with Apache Spark and application containers. When compared with current best practices, which involve the use of workflow systems, MaRe has the advantage of providing data locality, ingestion from heterogeneous storage systems, and interactive processing. MaRe is generally applicable and available as open source software.
  •  
7.
  •  
8.
  • Capuccini, Marco, et al. (authors)
  • On-demand virtual research environments using microservices
  • 2019
  • In: PeerJ Computer Science. PeerJ. ISSN 2376-5992. Vol. 5
  • Journal article (peer-reviewed). Abstract:
    • The computational demands for scientific applications are continuously increasing. The emergence of cloud computing has enabled on-demand resource allocation. However, relying solely on infrastructure as a service does not achieve the degree of flexibility required by the scientific community. Here we present a microservice-oriented methodology, where scientific applications run in a distributed orchestration platform as software containers, referred to as on-demand, virtual research environments. The methodology is vendor agnostic and we provide an open source implementation that supports the major cloud providers, offering scalable management of scientific pipelines. We demonstrate applicability and scalability of our methodology in life science applications, but the methodology is general and can be applied to other scientific domains.
  •  
9.
  • Emami Khoonsari, Payam, et al. (authors)
  • Interoperable and scalable data analysis with microservices : Applications in metabolomics
  • 2019
  • In: Bioinformatics. Oxford University Press (OUP). ISSN 1367-4803, 1367-4811. Vol. 35(19), pp. 3752-3760
  • Journal article (peer-reviewed). Abstract:
    • Motivation: Developing a robust and performant data analysis workflow that integrates all necessary components whilst still being able to scale over multiple compute nodes is a challenging task. We introduce a generic method based on the microservice architecture, where software tools are encapsulated as Docker containers that can be connected into scientific workflows and executed using the Kubernetes container orchestrator. Results: We developed a Virtual Research Environment (VRE) which facilitates rapid integration of new tools and developing scalable and interoperable workflows for performing metabolomics data analysis. The environment can be launched on-demand on cloud resources and desktop computers. IT-expertise requirements on the user side are kept to a minimum, and workflows can be re-used effortlessly by any novice user. We validate our method in the field of metabolomics on two mass spectrometry, one nuclear magnetic resonance spectroscopy and one fluxomics study. We showed that the method scales dynamically with increasing availability of computational resources. We demonstrated that the method facilitates interoperability using integration of the major software suites resulting in a turn-key workflow encompassing all steps for mass-spectrometry-based metabolomics including preprocessing, statistics and identification. Microservices is a generic methodology that can serve any scientific discipline and opens up for new types of large-scale integrative science.
  •  
10.
  • Olsson, Henrik, et al. (authors)
  • Estimating diagnostic uncertainty in artificial intelligence assisted pathology using conformal prediction
  • 2022
  • In: Nature Communications. Springer Nature. ISSN 2041-1723. Vol. 13(1)
  • Journal article (peer-reviewed). Abstract:
    • Unreliable predictions can occur when an artificial intelligence (AI) system is presented with data it has not been exposed to during training. We demonstrate the use of conformal prediction to detect unreliable predictions, using histopathological diagnosis and grading of prostate biopsies as example. We digitized 7788 prostate biopsies from 1192 men in the STHLM3 diagnostic study, used for training, and 3059 biopsies from 676 men used for testing. With conformal prediction, 1 in 794 (0.1%) predictions is incorrect for cancer diagnosis (compared to 14 errors [2%] without conformal prediction) while 175 (22%) of the predictions are flagged as unreliable when the AI-system is presented with new data from the same lab and scanner that it was trained on. Conformal prediction could with small samples (N = 49 for external scanner, N = 10 for external lab and scanner, and N = 12 for external lab, scanner and pathology assessment) detect systematic differences in external data leading to worse predictive performance. The AI-system with conformal prediction commits 3 (2%) errors for cancer detection in cases of atypical prostate tissue compared to 44 (25%) without conformal prediction, while the system flags 143 (80%) unreliable predictions. We conclude that conformal prediction can increase patient safety of AI-systems.
  •  
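The flagging of unreliable predictions mentioned in the abstract above can be illustrated with a small inductive conformal classifier. This is only a sketch under stated assumptions and not the study's pipeline: the nonconformity score (one minus the model's probability for a label), the pooled calibration set, the significance level and all probabilities below are made up for illustration.

```python
def conformal_sets(cal_probs, cal_labels, test_probs, eps):
    """Return one label set per test example at significance level eps."""
    # Nonconformity of a calibration example: 1 - probability of its true label.
    cal_alphas = [1.0 - probs[y] for probs, y in zip(cal_probs, cal_labels)]
    n = len(cal_alphas)
    sets = []
    for probs in test_probs:
        labels = set()
        for y, p in probs.items():
            alpha = 1.0 - p
            # p-value: share of calibration scores at least as nonconforming.
            p_value = (sum(a >= alpha for a in cal_alphas) + 1) / (n + 1)
            if p_value > eps:
                labels.add(y)
        sets.append(labels)
    return sets

def is_unreliable(label_set):
    """Flag a prediction when the set is empty or holds more than one label."""
    return len(label_set) != 1

# Made-up calibration data: model probabilities and the true label of each example.
true_probs = [0.9, 0.95, 0.8, 0.85, 0.99, 0.7, 0.9, 0.6, 0.97]
cal_labels = ["cancer", "benign"] * 4 + ["cancer"]
cal_probs = [{y: p, ("benign" if y == "cancer" else "cancer"): 1 - p}
             for p, y in zip(true_probs, cal_labels)]

test_probs = [{"cancer": 0.92, "benign": 0.08},   # confident prediction
              {"cancer": 0.55, "benign": 0.45}]   # borderline prediction
for s in conformal_sets(cal_probs, cal_labels, test_probs, eps=0.2):
    print(sorted(s), "unreliable" if is_unreliable(s) else "ok")
```

In this toy data the confident example receives the single label "cancer", while the borderline example receives an empty set and is flagged.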
Publication type
journal article (8)
conference paper (2)
other publication (1)
doctoral thesis (1)
Content type
peer-reviewed (9)
other academic/artistic (3)
Author/editor
Capuccini, Marco (12)
Toor, Salman (5)
Larsson, Anders (4)
Spjuth, Ola (4)
Spjuth, Ola, Docent (4)
Emami Khoonsari, Pay ... (3)
Kultima, Kim (3)
Carone, Matteo (3)
Novella, Jon Ander (3)
Sadawi, Noureddin (3)
Hellander, Andreas (2)
Laure, Erwin (2)
Hankemeier, Thomas (2)
Ahmed, Laeeq (2)
Schaal, Wesley, PhD (2)
Spjuth, Ola, Docent, ... (2)
Neumann, Steffen (2)
Lampa, Samuel (2)
Spjuth, Ola, Profess ... (2)
Salek, Reza M (2)
Kale, Namrata (2)
Haug, Kenneth (2)
Schober, Daniel (2)
Rocca-Serra, Philipp ... (2)
Steinbeck, Christoph (2)
de Atauri, Pedro (2)
Cascante, Marta (2)
Zanetti, Gianluigi (2)
Dahlö, Martin (2)
Bergmann, Sven (2)
Lindskog, Cecilia (1)
Egevad, Lars (1)
Eklund, Martin (1)
Carlsson, Lars (1)
Norinder, Ulf, 1956- (1)
Georgiev, Valentin (1)
Burman, Joachim, 197 ... (1)
Tordsson, Johan, Doc ... (1)
Viklund, Lars (1)
Olsson, Henrik (1)
Schaal, Wesley (1)
Möller, J (1)
O'Donovan, Claire (1)
Notredame, Cedric (1)
Murtagh, Donal, 1959 (1)
Ebbels, Timothy M D (1)
Glen, Robert (1)
Samaratunga, Hemamal ... (1)
Pearce, Jake T. M. (1)
Gao, Jianliang (1)
University
Uppsala universitet (12)
Kungliga Tekniska Högskolan (2)
Umeå universitet (1)
Örebro universitet (1)
Chalmers tekniska högskola (1)
Karolinska Institutet (1)
Language
English (12)
Research subject (UKÄ/SCB)
Natural sciences (10)
Engineering and technology (3)
Medical and health sciences (2)
Humanities (1)