SwePub

onr:"swepub:oai:DiVA.org:his-3542"
 


Utilizing Information on Uncertainty for In Silico Modeling using Random Forests

Boström, Henrik (author)
Högskolan i Skövde, Institutionen för kommunikation och information, Forskningscentrum för Informationsteknologi, Skövde Artificial Intelligence Lab (SAIL); Stockholm University, Sweden
Norinder, Ulf (author)
AstraZeneca R&D, Södertälje, Sweden
Skövde : University of Skövde, 2009
English.
In: Proceedings of the 3rd Skövde Workshop on Information Fusion Topics (SWIFT 2009). - Skövde : University of Skövde. - ISBN 9789197851329, pp. 59-62
  • Conference paper (peer-reviewed)
Abstract

  • Information on the uncertainty of measurements or estimates of molecular properties is rarely utilized by in silico predictive models. In this study, different approaches to handling uncertain numerical features are explored when using the state-of-the-art random forest algorithm for generating predictive models. Two main approaches are considered: i) sampling from probability distributions prior to tree generation, which does not require any change to the underlying tree learning algorithm, and ii) adjusting the algorithm to allow for handling probability distributions, similar to how missing values are typically handled, i.e., partitions may include fractions of examples. An experiment with six datasets concerning the prediction of various chemical properties is presented, where 95% confidence intervals are included for one of the 92 numerical features. In total, five approaches to handling uncertain numeric features are compared: ignoring the uncertainty, sampling from distributions that are assumed to be uniform and normal respectively, and adjusting tree learning to handle probability distributions that are assumed to be uniform and normal respectively. The experimental results show that all approaches that utilize information on uncertainty indeed outperform the single approach ignoring it, with respect to both accuracy and area under the ROC curve. A decomposition of the squared error of the constituent classification trees shows that the highest variance is obtained by ignoring the information on uncertainty, but that this also results in the highest mean squared error of the constituent trees.
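
The two approaches named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the function names `sample_value` and `left_fraction` are hypothetical, the uniform case treats the 95% confidence interval as the full support of the distribution, and the normal case centers the distribution on the interval midpoint with a standard deviation of (hi - lo) / (2 * 1.96). The record does not state how the paper parameterizes either distribution.

```python
import math
import random

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def _normal_params(lo, hi):
    # Assumption: the interval is symmetric around the mean of the normal.
    return (lo + hi) / 2.0, (hi - lo) / (2.0 * Z_95)


def sample_value(lo, hi, dist, rng):
    """Approach (i): draw one concrete feature value before tree generation."""
    if dist == "uniform":
        return rng.uniform(lo, hi)
    if dist == "normal":
        mu, sigma = _normal_params(lo, hi)
        return rng.gauss(mu, sigma)
    raise ValueError(f"unknown distribution: {dist}")


def left_fraction(lo, hi, threshold, dist):
    """Approach (ii): fraction of an example assigned to the branch
    'feature <= threshold', i.e. P(X <= threshold) under the assumed
    distribution. The remainder goes to the other branch."""
    if dist == "uniform":
        if hi == lo:  # no uncertainty: the example goes entirely to one side
            return 1.0 if lo <= threshold else 0.0
        return min(1.0, max(0.0, (threshold - lo) / (hi - lo)))
    if dist == "normal":
        mu, sigma = _normal_params(lo, hi)
        if sigma == 0.0:
            return 1.0 if mu <= threshold else 0.0
        z = (threshold - mu) / (sigma * math.sqrt(2.0))
        return 0.5 * (1.0 + math.erf(z))
    raise ValueError(f"unknown distribution: {dist}")


rng = random.Random(0)
lo, hi = 2.0, 4.0  # hypothetical 95% confidence interval for one feature

# Approach (i): each tree could be trained on a freshly sampled value.
samples = [sample_value(lo, hi, "uniform", rng) for _ in range(5)]

# Approach (ii): the example is split fractionally at a candidate threshold.
frac_uniform = left_fraction(lo, hi, 2.5, "uniform")  # 0.25
frac_normal = left_fraction(lo, hi, 3.0, "normal")    # 0.5
```

Under approach (i) the tree learner itself is unchanged, since it only ever sees sampled point values. Under approach (ii) the learner would need to carry example weights through each split, in the same way fractional examples are commonly used for missing values.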

Subject headings

NATURAL SCIENCES -- Computer and Information Sciences (hsv//eng)

Keyword

Informatics, computer and systems science
Technology
Skövde Artificial Intelligence Lab (SAIL)

Publication and Content Type

ref (subject category)
kon (subject category)
