Extracting Text into Meta-Data: Improving machine text-understanding of news-media articles

Lindén, Johannes, 1993- (author): Mid Sweden University, Department of Information Systems and Technology

Zhang, Tingting, 1957- (thesis advisor): Mid Sweden University, Department of Information Systems and Technology

Forsström, Stefan, 1984- (thesis advisor): Mid Sweden University, Department of Information Systems and Technology

Österberg, Patrik, 1975- (thesis advisor): Mid Sweden University, Department of Information Systems and Technology

Holeňa, Martin, Professor (opponent): Czech Technical University, Prague

ISBN 9789189341029
Sundsvall : Mid Sweden University, 2021
English, 55 pp.
Series: Mid Sweden University licentiate thesis, 1652-8948 ; 181

Related links: https://miun.diva-po... (primary) (Raw object); https://urn.kb.se/re...

Licentiate thesis (other academic/artistic)

Abstract

Society is constantly in need of information. It is important to consume event-based information about what is happening around us, as well as facts and knowledge. As society grows, the amount of information to consume grows with it. This thesis demonstrates one way to extract knowledge from the text of news media articles and represent it in a machine-readable form.

Three objectives are considered in developing a machine learning system that retrieves categories, entities, relations and other meta-data from text paragraphs. The first is to sort the terminology by topic, which makes it easier for machine learning algorithms to understand the text and the unique words used. The second is to construct a service for use in production, where scalability and performance are evaluated. Features are implemented to iteratively improve the model predictions, and several model versions run at the same time so that they can, for example, be compared in an A/B test. The third is to further extract the gist of what is expressed in the text. The gist is extracted in the form of triples by connecting two related entities using a combination of natural language processing algorithms.

The research presents a comparison of five auto-categorization algorithms, together with an evaluation of their hyperparameters and of how they would perform under the load of thousands of large, concurrent predictions. The aim is to build an auto-categorization system that the news media industry can use to help writers and journalists focus on the story rather than on filling in meta-data for each article. The best-performing algorithm is a bidirectional Long Short-Term Memory neural network.

Three information extraction algorithms for extracting the gist of paragraphs are also compared. The proposed information extraction algorithm supports texts in multiple languages, with accuracy competitive with the state-of-the-art OpenIE and MinIE algorithms, which extract information in a single language. The multilingual models help local news media to write articles in different languages, as an aid to integrating immigrants into society.

About the subject

NATURAL SCIENCES: NATURAL SCIENCES; and Computer and Inf ...; and Language Technol ...

NATURAL SCIENCES: NATURAL SCIENCES; and Computer and Inf ...; and Computer Science ...
