Learning Random Forest from Histogram Data Using Split Specific Axis Rotation

Gurung, Ram B. (author)
Stockholms universitet, Institutionen för data- och systemvetenskap
Lindgren, Tony (author)
Stockholms universitet, Institutionen för data- och systemvetenskap
Boström, Henrik (author)
KTH, Stockholms universitet, Institutionen för data- och systemvetenskap, Skolan för informations- och kommunikationsteknik (ICT)
2018-02
2018
English.
In: International Journal of Machine Learning and Computing. EJournal Publishing. ISSN 2010-3700. 8:1, pp. 74-79
  • Journal article (peer-reviewed)
Abstract
  • Machine learning algorithms for data containing histogram variables have not been explored to any major extent. In this paper, an adapted version of the random forest algorithm is proposed to handle variables of this type, assuming identical structure of the histograms across observations, i.e., the histograms for a variable all use the same number and width of the bins. The standard approach of representing bins as separate variables may cause the learning algorithm to overlook the underlying dependencies. In contrast, the proposed algorithm handles each histogram as a unit. When performing split evaluation of a histogram variable during tree growth, the proposed algorithm employs a sliding window of fixed size to constrain the sets of bins that are considered together. A small number of all possible sets of bins is randomly selected, and principal component analysis (PCA) is applied locally on all examples in a node. Split evaluation is then performed on each principal component. Results from applying the algorithm to both synthetic and real-world data are presented, showing that the proposed algorithm outperforms the standard approach of using random forests together with bins represented as separate variables, with respect to both AUC and accuracy. In addition to introducing the new algorithm, we elaborate on how real-world data for predicting NOx sensor failure in heavy-duty trucks was prepared, demonstrating that predictive performance can be further improved by adding variables that represent changes of the histograms over time.
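The split evaluation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that the sliding window yields contiguous sets of bins, that PCA is computed by SVD on mean-centered data, and that candidate splits are scored by weighted Gini impurity. The function names (`sliding_windows`, `gini`, `best_split_for_histogram`) and the parameters `window_size` and `n_windows` are hypothetical and chosen only for illustration.

```python
import numpy as np

def sliding_windows(n_bins, window_size):
    """Return all contiguous bin-index windows of a fixed size (assumed window shape)."""
    return [list(range(s, s + window_size)) for s in range(n_bins - window_size + 1)]

def gini(y):
    """Gini impurity of an integer label array; an empty array counts as pure."""
    if y.size == 0:
        return 0.0
    p = np.bincount(y) / y.size
    return 1.0 - np.sum(p ** 2)

def best_split_for_histogram(H, y, window_size=3, n_windows=2, rng=None):
    """Evaluate splits for one histogram variable at one tree node.

    H: (n_examples, n_bins) array, one histogram per example in the node.
    y: (n_examples,) array of non-negative integer class labels.
    Returns (score, bins, component, threshold), or None if no split exists.
    The threshold applies to the mean-centered projection onto the component.
    """
    rng = np.random.default_rng(rng)
    windows = sliding_windows(H.shape[1], window_size)
    # Randomly select a small number of the candidate bin sets.
    chosen = rng.choice(len(windows), size=min(n_windows, len(windows)), replace=False)
    best = None
    for w in chosen:
        bins = windows[w]
        # Local PCA: center the selected bins over the examples in this node.
        X = H[:, bins] - H[:, bins].mean(axis=0)
        _, _, Vt = np.linalg.svd(X, full_matrices=False)
        for component in Vt:
            z = X @ component  # projection of each example onto this component
            values = np.unique(z)
            # Candidate thresholds: midpoints between consecutive projected values.
            for t in (values[:-1] + values[1:]) / 2:
                left = z <= t
                score = (left.sum() * gini(y[left]) + (~left).sum() * gini(y[~left])) / y.size
                if best is None or score < best[0]:
                    best = (score, bins, component, t)
    return best
```

A lower score means a purer split. In a full forest, a routine of this kind would be called once per histogram variable at each node, alongside the usual evaluation of ordinary numeric variables.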

Subject headings

NATURAL SCIENCES -- Computer and Information Sciences -- Computer Sciences (hsv//eng)
NATURAL SCIENCES -- Computer and Information Sciences (hsv//eng)

Keyword

Histogram random forest
histogram data
random forest PCA
histogram features
Computer and Systems Sciences

Publication and Content Type

ref (subject category)
art (subject category)
