SwePub
onr:"swepub:oai:DiVA.org:kth-263828"
 

GDTM: Graph-based Dynamic Topic Models

Ghoorchian, Kambiz, 1981- (author)
KTH, Software and Computer Systems (SCS), KTH Royal Institute of Technology, Sweden
Sahlgren, Magnus (author)
RISE, Computer Science
2020-05-15
2020
English.
In: Progress in Artificial Intelligence. Springer Nature. ISSN 2192-6352, 2192-6360. 9, pp. 195-207
  • Journal article (peer-reviewed)
Abstract
  • Dynamic Topic Modeling (DTM) is the ultimate solution for extracting topics from short texts generated in Online Social Networks (OSNs) like Twitter. A DTM solution must be scalable and must account for both the sparsity of short texts and the dynamicity of topics. Current solutions combine probabilistic mixture models, such as the Dirichlet Multinomial or the Pitman-Yor Process, with approximate inference approaches, such as Gibbs Sampling and Stochastic Variational Inference, to account for dynamicity and scalability, respectively. However, these solutions rely on weak probabilistic language models that do not account for sparsity in short texts. In addition, their inference is based on iterative optimization algorithms, which scale poorly in the DTM setting. We present GDTM, a single-pass graph-based DTM algorithm, to solve this problem. GDTM combines a context-rich, incremental feature representation model called Random Indexing (RI) with a novel online graph partitioning algorithm to address scalability and dynamicity. In addition, GDTM uses a rich language modeling approach based on the Skip-gram technique to account for sparsity. We run multiple experiments over a large-scale Twitter dataset to analyze the accuracy and scalability of GDTM and compare the results with four state-of-the-art approaches. The results show that GDTM outperforms the best approach by 11% on accuracy and runs an order of magnitude faster, while achieving four times better topic quality on standard evaluation metrics.
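
For readers unfamiliar with Random Indexing, the following is a minimal, generic sketch of the technique in Python. It is not the GDTM implementation and is not taken from the article: the function names, the vector dimension, the number of nonzero entries, and the window size are all illustrative assumptions. It shows only the general idea of giving each word a fixed sparse random vector and building context vectors incrementally in a single pass over the documents.

```python
import random
from collections import defaultdict


def make_index_vector(dim, nonzeros, rng):
    """Sparse ternary index vector: a few +1/-1 entries, the rest 0."""
    vec = [0] * dim
    positions = rng.sample(range(dim), nonzeros)
    for i, pos in enumerate(positions):
        vec[pos] = 1 if i % 2 == 0 else -1
    return vec


def random_indexing(documents, dim=64, nonzeros=4, window=2, seed=0):
    """Build context vectors in one pass (all parameters are illustrative)."""
    rng = random.Random(seed)
    index = {}                                # word -> fixed random index vector
    context = defaultdict(lambda: [0] * dim)  # word -> accumulated context vector
    for doc in documents:
        tokens = doc.lower().split()
        # Assign an index vector the first time a word is seen.
        for tok in tokens:
            if tok not in index:
                index[tok] = make_index_vector(dim, nonzeros, rng)
        # Add the index vectors of neighbouring words to each word's context.
        for i, tok in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    ctx = context[tok]
                    for k, value in enumerate(index[tokens[j]]):
                        ctx[k] += value
    return dict(context)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Because new words simply receive a new index vector and existing context vectors are updated by addition, the representation can grow as new documents arrive without reprocessing earlier ones, which is the incremental property the abstract refers to.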

Subject headings

NATURVETENSKAP  -- Data- och informationsvetenskap -- Datavetenskap (hsv//swe)
NATURAL SCIENCES  -- Computer and Information Sciences -- Computer Sciences (hsv//eng)

Keyword

Topic Modeling
Dimensionality Reduction
Distributional Semantics
Language Modeling
Graph Partitioning
Datalogi
Computer Science

Publication and Content Type

ref (subject category)
art (subject category)
