SwePub

Code and Data for “Classification of Medieval Documents: Determining the Issuer, Place of Issue, and Decade for Old Swedish Charters”

Dahllöf, Mats, 1965- (author)
Uppsala University, Department of Linguistics and Philology
2020
English.
  • Other publication
Abstract
Code and data for the article “Classification of Medieval Documents: Determining the Issuer, Place of Issue, and Decade for Old Swedish Charters” (to appear in DHN2020, Digital Humanities in the Nordic Countries, Riga, 17-20 March 2020).

The study based on this code and dataset is a comparative exploration of different classification tasks for Swedish medieval charters (transcriptions from the SDHK collection) and of different classifier setups. In particular, we explore the identification of the issuer, the place of issue, and the decade of production. The experiments used features based on lowercased words and on character 3- and 4-grams. We evaluated the performance of two learning algorithms: linear discriminant analysis and decision trees. Evaluation used five-fold cross-validation, and we report accuracy and macro-averaged F1 score. The validation made use of six labeled subsets of SDHK, combining the three tasks with the two languages, Old Swedish and Latin. Issuer identification for the Latin dataset (595 charters from 12 issuers) reached the highest scores, above 0.9, with the decision tree classifier using word features. The best corresponding accuracy for Old Swedish was 0.81. Place and decade identification yielded lower scores for both languages. Which classifier design works best seems to depend on peculiarities of the dataset and the classification task. The study does, however, support the idea that text classification is also useful for medieval documents characterized by extreme spelling variation.
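The record does not reproduce the published code, so the following is only a minimal sketch, not the authors' implementation, of the feature types and evaluation measures the abstract names: lowercased word features, character 3- and 4-gram features, a five-fold split, accuracy, and macro-averaged F1. Whitespace tokenization, the shuffling seed, counting labels from both gold and predicted sets in macro F1, and all function names are assumptions made for illustration. The classifiers themselves (linear discriminant analysis and decision trees) are omitted; the feature counts below would first need to be vectorized before being passed to any such classifier.

```python
from collections import Counter
import random


def word_features(text):
    """Bag of lowercased word tokens (whitespace tokenization is an assumption)."""
    return Counter(text.lower().split())


def char_ngram_features(text, ns=(3, 4)):
    """Bag of character n-grams, for each n in ns, from the lowercased text."""
    t = text.lower()
    feats = Counter()
    for n in ns:
        for i in range(len(t) - n + 1):
            feats[t[i:i + n]] += 1
    return feats


def five_fold_indices(n_items, k=5, seed=0):
    """Shuffle item indices (hypothetical fixed seed) and deal them into k folds."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def accuracy(gold, pred):
    """Fraction of items whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    scores = []
    for c in set(gold) | set(pred):
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)
```

For example, `char_ngram_features("abcd")` yields the n-grams `abc`, `bcd`, and `abcd`, each with count 1. Macro averaging gives every class equal weight regardless of its size, which matters when some issuers, places, or decades are represented by far fewer charters than others.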

Subject headings

NATURAL SCIENCES  -- Computer and Information Sciences -- Language Technology (hsv//eng)

Keyword

History
Computational Linguistics
Scandinavian Languages
Latin

Publication and Content Type

ovr (subject category)
