SwePub
onr:"swepub:oai:DiVA.org:lnu-37062"
 

Automated classification of textual documents based on a controlled vocabulary in engineering

Golub, Koraljka (author)
Library and Information Science
Hamon, Thierry (author)
Ardö, Anders (author)
Ergon-Verlag, 2007
English.
In: Knowledge organization. Ergon-Verlag. ISSN 0943-7444. 34:4, pp. 247-263
  • Journal article (peer-reviewed)
Abstract

  • Automated subject classification has been a challenging research issue for many years, and it has received particular attention in the past decade because of the rapid increase in digital documents. The most frequent approach to automated classification is machine learning. It, however, requires training documents and performs well on new documents only if they are sufficiently similar to the training documents. We explore a string-matching algorithm based on a controlled vocabulary, which does not require training documents; instead, it reuses the intellectual work put into creating the controlled vocabulary. Terms from the Engineering Information thesaurus and classification scheme were matched against the titles and abstracts of engineering papers from the Compendex database. Simple string-matching was enhanced by several methods, such as term weighting schemes and cut-offs, exclusion of certain terms, and enrichment of the controlled vocabulary with automatically extracted terms. The best results are 76% recall when the controlled vocabulary is enriched with new terms, and 79% precision when certain terms are excluded. Precision of individual classes is up to 98%. These results are comparable to state-of-the-art machine-learning algorithms.
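
The general idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the vocabulary entries, class labels, weights, title boost, cut-off value, and excluded-term list below are all hypothetical, and plain substring counting stands in for whatever matching rules the paper actually uses.

```python
from collections import defaultdict

# Hypothetical toy vocabulary: term -> (class label, term weight).
# These entries are illustrative only and are not taken from the
# Engineering Information thesaurus.
VOCABULARY = {
    "heat transfer": ("Thermodynamics", 2.0),
    "turbine": ("Mechanical Engineering", 1.0),
    "neural network": ("Computer Science", 2.0),
    "algorithm": ("Computer Science", 1.0),
}

# Hypothetical list of terms judged too general to be useful.
EXCLUDED_TERMS = {"algorithm"}

TITLE_BOOST = 2.0  # assumed: a match in the title counts double
CUTOFF = 2.0       # assumed: minimum score for a class to be assigned


def classify(title, abstract):
    """Return the sorted class labels whose score reaches the cut-off."""
    scores = defaultdict(float)
    title_l, abstract_l = title.lower(), abstract.lower()
    for term, (label, weight) in VOCABULARY.items():
        if term in EXCLUDED_TERMS:
            continue
        # Plain substring counting; a real system would need tokenization.
        matches = TITLE_BOOST * title_l.count(term) + abstract_l.count(term)
        scores[label] += weight * matches
    return sorted(label for label, score in scores.items() if score >= CUTOFF)


def precision_recall(assigned, correct):
    """Precision and recall of assigned labels against correct labels."""
    assigned, correct = set(assigned), set(correct)
    hits = len(assigned & correct)
    precision = hits / len(assigned) if assigned else 0.0
    recall = hits / len(correct) if correct else 0.0
    return precision, recall


labels = classify(
    "Heat transfer in a gas turbine",
    "We present an algorithm for modelling heat transfer.",
)
print(labels)
print(precision_recall(labels, ["Thermodynamics"]))
```

In this sketch, raising the cut-off or excluding more terms tends to favour precision, while adding terms to the vocabulary tends to favour recall, which mirrors the trade-off the abstract reports.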

Subject headings

SAMHÄLLSVETENSKAP  -- Medie- och kommunikationsvetenskap -- Biblioteks- och informationsvetenskap (hsv//swe)
SOCIAL SCIENCES  -- Media and Communications -- Information Studies (hsv//eng)

Keyword

Biblioteks- och informationsvetenskap
Library and Information Science

Publication and Content Type

ref (subject category)
art (subject category)

By the university
Linnaeus University
