GDTM: Graph-based D...
Abstract
- Dynamic Topic Modeling (DTM) is the ultimate solution for extracting topics from short texts generated in Online Social Networks (OSNs) like Twitter. A DTM solution must be scalable and must account for both the sparsity of short texts and the dynamicity of topics. Current solutions combine probabilistic mixture models, such as the Dirichlet Multinomial or the Pitman-Yor Process, with approximate inference approaches, such as Gibbs Sampling and Stochastic Variational Inference, to account for dynamicity and scalability, respectively. However, these solutions rely on weak probabilistic language models, which do not account for sparsity in short texts. In addition, their inference is based on iterative optimization algorithms, which scale poorly in the DTM setting. We present GDTM, a single-pass graph-based DTM algorithm, to address these problems. GDTM combines a context-rich and incremental feature representation model, called Random Indexing (RI), with a novel online graph partitioning algorithm to address scalability and dynamicity. In addition, GDTM uses a rich language modeling approach based on the Skip-gram technique to account for sparsity. We run multiple experiments over a large-scale Twitter dataset to analyze the accuracy and scalability of GDTM and compare the results with four state-of-the-art approaches. The results show that GDTM outperforms the best competing approach by 11% in accuracy and runs an order of magnitude faster, while producing four times better topic quality on standard evaluation metrics.
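The abstract names Random Indexing (RI) as GDTM's incremental feature representation. As a rough illustration of the general RI idea only, the sketch below assigns each word a fixed sparse ternary index vector and builds a word's context vector by summing the index vectors of its neighbours in a single pass over the text. It is not the authors' implementation: the Skip-gram language model and the online graph partitioning step are omitted, and the names `DIM`, `NONZERO`, `WINDOW`, `index_vector`, `update_context_vectors`, `new_context`, and `cosine`, along with their values, are assumptions chosen for this example.

```python
import math
import random
from collections import defaultdict

DIM = 64      # size of the reduced space (assumed value)
NONZERO = 4   # number of +1/-1 entries per index vector (assumed value)
WINDOW = 2    # neighbours considered on each side (assumed value)


def index_vector(word, dim=DIM, nonzero=NONZERO):
    """Return a sparse ternary vector that is always the same for a given word."""
    rng = random.Random(word)  # seeding with the word keeps the vector stable
    positions = rng.sample(range(dim), nonzero)
    vec = [0] * dim
    for i, pos in enumerate(positions):
        vec[pos] = 1 if i % 2 == 0 else -1  # half +1, half -1
    return vec


def new_context(dim=DIM):
    """Create an empty store that maps each word to its context vector."""
    return defaultdict(lambda: [0] * dim)


def update_context_vectors(tokens, context, window=WINDOW):
    """Single pass: add each neighbour's index vector to the word's context vector."""
    for i, word in enumerate(tokens):
        start, end = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j == i:
                continue
            target = context[word]
            for k, value in enumerate(index_vector(tokens[j])):
                target[k] += value


def cosine(a, b):
    """Cosine similarity between two dense vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    context = new_context()
    for tweet in (["graph", "topic", "model"], ["graph", "partitioning", "topic"]):
        update_context_vectors(tweet, context)  # documents are processed once, in order
    print(cosine(context["graph"], context["topic"]))
```

Because each document only adds to the existing context vectors, new text can be folded in without revisiting earlier documents, which is the property that makes this kind of representation suitable for streaming input.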
Subject terms
- NATURVETENSKAP -- Data- och informationsvetenskap -- Datavetenskap (hsv//swe)
- NATURAL SCIENCES -- Computer and Information Sciences -- Computer Sciences (hsv//eng)
Keywords
- Topic Modeling
- Dimensionality Reduction
- Distributional Semantics
- Language Modeling
- Graph Partitioning
- Computer Science (Datalogi)
Publication and content type
- ref (subject category)
- art (subject category)