SwePub

Hit list for search "WFRF:(Strandqvist Wiktor)"

  • Results 1-8 of 8
1.
  • Santini, Marina, 1960-, et al. (author)
  • Can We quantify domainhood? : Exploring measures to assess domain-specificity in web corpora
  • 2018
  • In: Commun. Comput. Info. Sci. - Cham : Springer International Publishing. - 9783319991320 - 9783319991337 ; pp. 207-217
  • Conference paper (peer-reviewed). Abstract:
    • Web corpora are a cornerstone of modern Language Technology. Corpora built from the web are convenient because their creation is fast and inexpensive. Several studies have been carried out to assess the representativeness of general-purpose web corpora by comparing them to traditional corpora. Less attention has been paid to assessing the representativeness of specialized or domain-specific web corpora. In this paper, we focus on the assessment of domain representativeness of web corpora and we claim that it is possible to assess the degree of domain-specificity, or domainhood, of web corpora. We present a case study where we explore the effectiveness of different measures - namely the Mann-Whitney-Wilcoxon test, Kendall correlation coefficient, Kullback–Leibler divergence, log-likelihood and burstiness - to gauge domainhood. Our findings indicate that burstiness is the most suitable measure to single out domain-specific words from a specialized corpus and to allow for the quantification of domainhood.
2.
  • Santini, Marina, 1960-, et al. (author)
  • Designing an Extensible Domain-Specific Web Corpus for “Layfication” : A Case Study in eCare at Home : Chapter 6
  • 2019
  • In: Cyber-Physical Systems for Social Applications. - Hershey PA, USA 17033 : Engineering Science Reference. - 9781522578796 - 9781522578802 ; pp. 98-155
  • Book chapter (other academic/artistic). Abstract:
    • In the era of data-driven science, corpus-based language technology is an essential part of cyber physical systems. In this chapter, the authors describe the design and the development of an extensible domain-specific web corpus to be used in a distributed social application for the care of the elderly at home. The domain of interest is the medical field of chronic diseases. The corpus is conceived as a flexible and extensible textual resource, where additional documents and additional languages will be appended over time. The main purpose of the corpus is to be used for building and training language technology applications for the “layfication” of the specialized medical jargon. “Layfication” refers to the automatic identification of more intuitive linguistic expressions that can help laypeople (e.g., patients, family caregivers, and home care aides) understand medical terms, which often appear opaque. Exploratory experiments are presented and discussed.
3.
  • Santini, Marina, et al. (author)
  • Designing an Extensible Domain-Specific Web Corpus for “Layfication” : A Case Study in eCare at Home
  • 2019
  • In: Cyber-Physical Systems for Social Applications. - Hershey, PA, USA : IGI Global. - 9781522593454 - 9781522578802 ; pp. 98-155
  • Book chapter (peer-reviewed). Abstract:
    • In the era of data-driven science, corpus-based language technology is an essential part of cyber physical systems. In this chapter, the authors describe the design and the development of an extensible domain-specific web corpus to be used in a distributed social application for the care of the elderly at home. The domain of interest is the medical field of chronic diseases. The corpus is conceived as a flexible and extensible textual resource, where additional documents and additional languages will be appended over time. The main purpose of the corpus is to be used for building and training language technology applications for the “layfication” of the specialized medical jargon. “Layfication” refers to the automatic identification of more intuitive linguistic expressions that can help laypeople (e.g., patients, family caregivers, and home care aides) understand medical terms, which often appear opaque. Exploratory experiments are presented and discussed.
4.
5.
  • Santini, Marina, et al. (author)
  • Profiling Domain Specificity of Specialized Web Corpora using Burstiness. Explorations and Open Issues
  • 2018
  • Conference paper (peer-reviewed). Abstract:
    • In this paper we describe an approach to profile the domain specificity of specialized web corpora in Swedish. The proposed approach is based on burstiness. Burstiness is a statistical measure that identifies words with uneven distribution across the documents of a corpus. We apply burstiness to two medical web corpora that differ in size and in domain granularity. Results are promising and show that burstiness is an appropriate measure to profile the domain specificity when matched against reference lists (gold standards) that represent the target domains. However, further research is needed to find adequate evaluation metrics, less empirical cut-off points and more principled gold standard design.
6.
  • Santini, Marina, et al. (author)
  • Profiling specialized web corpus qualities : A progress report on "Domainhood"
  • 2019
  • In: Argentinian Journal of Applied Linguistics. - : FEDERACION ARGENTINA ASOC PROFESORES INGLES-FAAP. - 2314-3576. ; 7:1, pp. 8-26
  • Journal article (peer-reviewed). Abstract:
    • In this article we describe ways to profile the domain specificity, a.k.a. domainhood, of specialized web corpora in English and in Swedish. Several studies have been carried out to measure the "qualities" of general-purpose web corpora. By contrast, less attention has been paid to the evaluation of specialized or domain-specific web corpora. To fill this gap, in this article we present case studies where we explore the effectiveness of several statistical measures - i.e. rank correlation coefficients (Kendall and Spearman), Kullback–Leibler divergence, log-likelihood and burstiness - to assess domainhood. Our findings indicate that it is possible to profile the domainhood quality of a corpus. However, further research is needed to generalize the results.
7.
8.
  • Strandqvist, Wiktor, et al. (author)
  • Towards a Quality Assessment of Web Corpora for Language Technology Applications
  • 2018
  • In: Technological Innovation for Specialized Linguistic Domains Languages for Digital Lives and Cultures.
  • Conference paper (peer-reviewed). Abstract:
    • In the experiments presented in this paper we focus on the creation and evaluation of domain-specific web corpora. To this purpose, we propose a two-step approach, namely (1) the automatic extraction and evaluation of term seeds from personas and use cases/scenarios; (2) the creation and evaluation of domain-specific web corpora bootstrapped with term seeds automatically extracted in step 1. Results are encouraging and show that: (1) it is possible to create a fairly accurate term extractor for relatively short narratives; (2) it is straightforward to evaluate a quality such as domain-specificity of web corpora using well-established metrics.