SwePub
GDTM: Graph-based Dynamic Topic Models

Ghoorchian, Kambiz, 1981- (author)
KTH, Software and Computer Systems, SCS, KTH Royal Institute of Technology, Sweden
Sahlgren, Magnus (author)
RISE, Computer Science
 (creator_code:org_t)
2020-05-15
2020
English.
Published in: Progress in Artificial Intelligence. Springer Nature. ISSN 2192-6352, 2192-6360. Vol. 9, pp. 195-207.
  • Journal article (peer-reviewed)
Abstract
  • Dynamic Topic Modeling (DTM) is the ultimate solution for extracting topics from short texts generated in Online Social Networks (OSNs) like Twitter. A DTM solution must be scalable and must account for both the sparsity of short texts and the dynamicity of topics. Current solutions combine probabilistic mixture models such as the Dirichlet Multinomial or the Pitman-Yor Process with approximate inference approaches such as Gibbs Sampling and Stochastic Variational Inference to account for dynamicity and scalability, respectively. However, these solutions rely on weak probabilistic language models, which do not account for sparsity in short texts. In addition, their inference is based on iterative optimization algorithms, which scale poorly for DTM. We present GDTM, a single-pass graph-based DTM algorithm, to solve this problem. GDTM combines a context-rich, incremental feature representation model called Random Indexing (RI) with a novel online graph partitioning algorithm to address scalability and dynamicity. In addition, GDTM uses a rich language modeling approach based on the Skip-gram technique to account for sparsity. We run multiple experiments on a large-scale Twitter dataset to analyze the accuracy and scalability of GDTM and compare the results with four state-of-the-art approaches. The results show that GDTM outperforms the best approach by 11% in accuracy, runs an order of magnitude faster, and produces 4 times higher topic quality on standard evaluation metrics.
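The abstract names Random Indexing (RI) as GDTM's incremental feature representation. The block below is a minimal sketch of generic, textbook RI and not the authors' implementation. The class name `RandomIndexing`, the parameters `dim`, `nonzeros`, `window` and `seed`, and the plain sliding-window context are illustrative assumptions. The Skip-gram-based language modeling and the online graph partitioning step described in the abstract are not shown.

```python
import random
from collections import defaultdict


class RandomIndexing:
    """Generic Random Indexing sketch (hypothetical names/parameters, not GDTM's code).

    Each new word is assigned a sparse ternary index vector: `nonzeros` entries
    set to +1 or -1 at random positions in a fixed `dim`-dimensional space.
    A word's context vector is the running sum of the index vectors of words
    seen within `window` positions of it, so it can be updated one text at a time.
    """

    def __init__(self, dim=2000, nonzeros=8, window=2, seed=0):
        self.dim = dim
        self.nonzeros = nonzeros
        self.window = window
        self.rng = random.Random(seed)
        self.index = {}  # word -> {position: +1 or -1}
        self.context = defaultdict(lambda: defaultdict(float))  # word -> sparse context vector

    def _index_vector(self, word):
        # Lazily assign a fixed random index vector the first time a word is seen.
        if word not in self.index:
            positions = self.rng.sample(range(self.dim), self.nonzeros)
            self.index[word] = {p: self.rng.choice((1, -1)) for p in positions}
        return self.index[word]

    def update(self, tokens):
        # Single pass over one text: add each neighbour's index vector
        # to the context vector of the word at position i.
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - self.window), min(len(tokens), i + self.window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                for pos, sign in self._index_vector(tokens[j]).items():
                    self.context[word][pos] += sign

    def cosine(self, a, b):
        # Cosine similarity between two sparse context vectors (0.0 if either is empty).
        va, vb = self.context.get(a, {}), self.context.get(b, {})
        dot = sum(v * vb.get(k, 0.0) for k, v in va.items())
        na = sum(v * v for v in va.values()) ** 0.5
        nb = sum(v * v for v in vb.values()) ** 0.5
        return dot / (na * nb) if na and nb else 0.0


if __name__ == "__main__":
    ri = RandomIndexing()
    for text in ["new phone launch today", "phone launch event tonight", "rain storm warning today"]:
        ri.update(text.split())
    print(ri.cosine("phone", "launch"))
```

The relevant design point is that the vector dimensionality stays fixed at `dim` however many new words arrive. A stream of short texts can therefore be folded in incrementally without re-fitting a model, which is consistent with the abstract's emphasis on a single-pass, scalable approach.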

Subject headings

NATURAL SCIENCES  -- Computer and Information Sciences -- Computer Sciences (hsv//eng)

Keywords

Topic Modeling
Dimensionality Reduction
Distributional Semantics
Language Modeling
Graph Partitioning
Computer Science

Publication and content type

ref (subject category)
art (subject category)
