SwePub

id:"swepub:oai:DiVA.org:kth-4788"

On practical machine learning and data analysis

Gillblad, Daniel, 1975- (author)
RISE, KTH, Computational Biology, CB, Decisions, Networks and Analytics lab
Lansner, Anders (thesis advisor)
KTH, Computational Biology, CB
Jensen, Finn, Professor (opponent)
Aalborg University
ISBN 9789171789938
Stockholm : KTH, 2008
English, ix, 217 pp.
Series: TRITA-CSC-A, 1653-5723; 2008-11
Series: SICS Dissertation Series, 1101-1335; 49
  • Doctoral thesis (other academic/artistic)
Abstract
This thesis discusses and addresses some of the difficulties associated with practical machine learning and data analysis. Introducing data-driven methods in, e.g., industrial and business applications can lead to large gains in productivity and efficiency, but the cost and complexity are often overwhelming. Creating machine learning applications in practice usually involves a large amount of manual labour. This work often has to be performed by an experienced analyst who lacks significant experience of the application area. Here we discuss some of the hurdles faced in a typical analysis project and suggest measures and methods to simplify the process.

One of the most important issues when applying machine learning methods to complex data, such as data from industrial applications, is modelling the processes that generate the data in an appropriate way. Relevant aspects have to be formalised and represented in a way that allows us to perform our calculations efficiently. We present a statistical modelling framework, Hierarchical Graph Mixtures, based on a combination of graphical models and mixture models. It allows us to create consistent, expressive statistical models that simplify the modelling of complex systems. Using a Bayesian approach, we allow prior knowledge to be encoded and make the models applicable in situations where relatively little data are available.

Detecting structures in data, such as clusters and dependency structure, is very important both for understanding an application area and for specifying the structure of, e.g., a hierarchical graph mixture. We discuss how this structure can be extracted for sequential data. By using the inherent dependency structure of sequential data, we construct an information-theoretic measure of correlation that avoids the problems that most common correlation measures have with this type of data.

In many diagnosis situations it is desirable to perform classification in an iterative and interactive manner. The matter is often complicated by the very limited knowledge and few examples available when a new system to be diagnosed is first brought into use. We describe how to create an incremental classification system based on a statistical model trained from empirical data. We also show how the limited background information available can still be used initially to obtain a functioning diagnosis system.

To minimise the effort needed to achieve results in data analysis projects, we need to address not only the models used but also the methodology and applications that can help simplify the process. We present a methodology for data preparation and a software library intended for rapid analysis, prototyping, and deployment.

Finally, we study a few example applications, presenting tasks within classification, prediction, and anomaly detection. The examples include demand prediction for supply chain management, approximating complex simulators to speed up parameter optimisation, and fraud detection and classification within a media-on-demand system.

Subject headings

NATURAL SCIENCES  -- Computer and Information Sciences -- Computer Sciences (hsv//eng)
NATURAL SCIENCES  -- Computer and Information Sciences (hsv//eng)

Keywords

Computer science

Publication and content type

vet (scholarly; subject category)
dok (doctoral thesis; subject category)
