SwePub

Probabilistic Random Forest improves bioactivity predictions close to the classification threshold by taking into account experimental uncertainty

Mervin, Lewis H. (author)
AstraZeneca AB
Trapotsi, Maria Anna (author)
University Of Cambridge
Afzal, Avid M. (author)
AstraZeneca AB
Barrett, Ian P. (author)
AstraZeneca AB
Bender, Andreas (author)
University Of Cambridge
Engkvist, Ola, 1967 (author)
AstraZeneca AB, Chalmers University of Technology
2021-08-19
2021
English.
In: Journal of Cheminformatics. - : Springer Science and Business Media LLC. - 1758-2946 .- 1758-2946. ; 13:1
  • Journal article (peer-reviewed)
Abstract
  • Measurements of protein–ligand interactions have reproducibility limits due to experimental errors. Any model based on such assays will consequently be affected by these unavoidable errors, which should ideally be factored into modelling and output predictions, for example through the actual standard deviation of experimental measurements (σ) or the comparability of activity values between heterogeneous activity units (i.e., Ki versus IC50 values) aggregated during dataset assimilation. However, experimental errors are usually a neglected aspect of model generation. To improve upon the current state of the art, we present a novel approach to predicting protein–ligand interactions using a Probabilistic Random Forest (PRF) classifier. The PRF algorithm was applied to in silico protein target prediction across ~550 tasks from ChEMBL and PubChem. Predictions were evaluated under various scenarios of experimental standard deviation in both training and test sets, and performance was assessed using fivefold stratified shuffled splits for validation. The largest benefit of incorporating the experimental deviation in PRF was observed for data points close to the binary threshold boundary, where such information is not considered in any way by the original RF algorithm. For example, when σ ranged between 0.4 and 0.6 log units and the ideal probability estimates lay between 0.4 and 0.6, the PRF outperformed RF with a median absolute error margin of ~17%. In comparison, the baseline RF outperformed PRF for cases with high confidence of belonging to the active class (far from the binary decision threshold), although the RF models gave errors smaller than the experimental uncertainty, which could indicate that they were overtrained and/or over-confident. Finally, PRF models trained with putative inactives performed worse than PRF models trained without them. This could be because putative inactives were not assigned an experimental pXC50 value and were therefore treated as inactives with low uncertainty (which in practice might not be true). In conclusion, PRF can be useful for target prediction models, in particular for data where class boundaries overlap with the measurement uncertainty and where a substantial part of the training data is located close to the classification threshold.
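
The idea of replacing a hard active/inactive label with a probability that reflects experimental uncertainty can be illustrated with a minimal sketch. It rests on an assumption the abstract does not state: that a measured pXC50 value carries Gaussian error with standard deviation σ, so the probability that the true value lies above the activity threshold is a normal CDF. The function name `activity_probability` and the threshold of 6.0 are hypothetical, chosen only for illustration.

```python
import math


def activity_probability(pxc50: float, threshold: float, sigma: float) -> float:
    """Probability that the true activity is at or above `threshold`.

    Assumption (not taken from the abstract): the measurement error is
    Gaussian with standard deviation `sigma`, in log units.
    """
    if sigma <= 0:
        # No uncertainty: fall back to the usual hard binary label.
        return 1.0 if pxc50 >= threshold else 0.0
    # Normal CDF of the signed distance to the threshold, via the error function.
    z = (pxc50 - threshold) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


THRESHOLD = 6.0  # hypothetical pXC50 activity cut-off
SIGMA = 0.5      # hypothetical experimental standard deviation (log units)

for measured in (5.0, 5.9, 6.0, 6.1, 7.5):
    p = activity_probability(measured, THRESHOLD, SIGMA)
    print(f"pXC50={measured:.1f}  P(active)={p:.3f}")
```

Under these assumptions, measurements within a fraction of σ of the threshold receive probabilities near 0.5, while measurements far from it receive probabilities close to 0 or 1. This matches the regime in which the abstract reports the largest benefit for PRF.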

Subject headings

MEDICAL AND HEALTH SCIENCES -- Medical Biotechnology -- Biomedical Laboratory Science/Technology (hsv//eng)
NATURAL SCIENCES -- Computer and Information Sciences -- Bioinformatics (hsv//eng)
NATURAL SCIENCES -- Mathematics -- Probability Theory and Statistics (hsv//eng)

Keywords

Uncertainty estimation
Applicability Domain
Probabilistic random forest
Experimental error
Target prediction

Publication and content type

art (subject category)
ref (subject category)
