SwePub
Sök i LIBRIS databas

  Utökad sökning

WFRF:(Boman Magnus)
 

Sökning: WFRF:(Boman Magnus) > Graph Algorithms fo...

Graph Algorithms for Large-Scale and Dynamic Natural Language Processing

Ghoorchian, Kambiz, 1981- (författare)
KTH,Programvaruteknik och datorsystem, SCS
Boman, Magnus, Professor (preses)
KTH,Programvaruteknik och datorsystem, SCS
Sahlgren, Magnus (preses)
Research Institute of Sweden (RISE)
visa fler...
Velupillai, Sumithra, associate professor (opponent)
King’s College London
visa färre...
 (creator_code:org_t)
ISBN 9789178733774
KTH Royal Institute of Technology, 2019
Engelska 42 s.
Serie: TRITA-EECS-AVL ; 2019:85
  • Doktorsavhandling (övrigt vetenskapligt/konstnärligt)
Abstract Ämnesord
Stäng  
  • In Natural Language Processing, researchers design and develop algorithms to enable machines to understand and analyze human language. These algorithms benefit multiple downstream applications including sentiment analysis, automatic translation, automatic question answering, and text summarization. Topic modeling is one such algorithm that solves the problem of categorizing documents into multiple groups with the goal of maximizing the intra-group document similarity. However, the manifestation of short texts like tweets, snippets, comments, and forum posts as the dominant source of text in our daily interactions and communications, as well as being the main medium for news reporting and dissemination, increases the complexity of the problem due to scalability, sparsity, and dynamicity. Scalability refers to the volume of the messages being generated, sparsity is related to the length of the messages, and dynamicity is associated with the ratio of changes in the content and topical structure of the messages (e.g., the emergence of new phrases). We improve the scalability and accuracy of Natural Language Processing algorithms from three perspectives, by leveraging on innovative graph modeling and graph partitioning algorithms, incremental dimensionality reduction techniques, and rich language modeling methods. We begin by presenting a solution for multiple disambiguation on short messages, as opposed to traditional single disambiguation. The solution proposes a simple graph representation model to present topical structures in the form of dense partitions in that graph and applies disambiguation by extracting those topical structures using an innovative distributed graph partitioning algorithm. Next, we develop a scalable topic modeling algorithm using a novel dense graph representation and an efficient graph partitioning algorithm. Then, we analyze the effect of temporal dimension to understand the dynamicity in online social networks and present a solution for geo-localization of users in Twitter using a hierarchical model that combines partitioning of the underlying social network graph with temporal categorization of the tweets. The results show the effect of temporal dynamicity on users’ spatial behavior. This result leads to design and development of a dynamic topic modeling solution, involving an online graph partitioning algorithm and a significantly stronger language modeling approach based on the skip-gram technique. The algorithm shows strong improvement on scalability and accuracy compared to the state-of-the-art models. Finally, we describe a dynamic graph-based representation learning algorithm that modifies the partitioning algorithm to develop a generalization of our previous work. A strong representation learning algorithm is proposed that can be used for extracting high quality distributed and continuous representations out of any sequential data with local and hierarchical structural properties similar to natural language text.

Ämnesord

TEKNIK OCH TEKNOLOGIER  -- Elektroteknik och elektronik -- Datorsystem (hsv//swe)
ENGINEERING AND TECHNOLOGY  -- Electrical Engineering, Electronic Engineering, Information Engineering -- Computer Systems (hsv//eng)

Nyckelord

Natural Language Processing; Lexical Disambiguation; Topic Modeling; Representation Learning; Graph Partitioning; Distributed Algorithms; Dimensionality Reduction; Random Indexing

Publikations- och innehållstyp

vet (ämneskategori)
dok (ämneskategori)

Hitta via bibliotek

Till lärosätets databas

Hitta mer i SwePub

Av författaren/redakt...
Ghoorchian, Kamb ...
Boman, Magnus, P ...
Sahlgren, Magnus
Velupillai, Sumi ...
Om ämnet
TEKNIK OCH TEKNOLOGIER
TEKNIK OCH TEKNO ...
och Elektroteknik oc ...
och Datorsystem
Delar i serien
Av lärosätet
Kungliga Tekniska Högskolan

Sök utanför SwePub

Kungliga biblioteket hanterar dina personuppgifter i enlighet med EU:s dataskyddsförordning (2018), GDPR. Läs mer om hur det funkar här.
Så här hanterar KB dina uppgifter vid användning av denna tjänst.

 
pil uppåt Stäng

Kopiera och spara länken för att återkomma till aktuell vy