SwePub

Result list for search "L773:9782951740891"

  • Results 1-13 of 13
1.
  • Adouane, Wafia, 1985, et al. (author)
  • Gulf Arabic Resource Building for Sentiment Analysis
  • 2016
  • In: Proceedings of the Language Resources and Evaluation Conference (LREC), 23-28 May 2016, Portorož, Slovenia. - : European Language Resources Association. - 9782951740891
  • Conference paper (peer-reviewed). Abstract:
    • This paper deals with building linguistic resources for Gulf Arabic, one of the Arabic varieties, for the task of sentiment analysis using machine learning. To our knowledge, no previous work has been done on Gulf Arabic sentiment analysis, despite the fact that the variety is present on different online platforms. Hence, the first challenge is the absence of annotated data and sentiment lexicons. To fill this gap, we created these two main linguistic resources. We then conducted four experiments: using a Naive Bayes classifier without any lexicon; adding a sentiment lexicon designed primarily for Modern Standard Arabic (MSA); using only the compiled Gulf Arabic sentiment lexicon; and finally using both the MSA and Gulf Arabic sentiment lexicons. The Gulf Arabic lexicon gives a good improvement in classifier accuracy (90.54%) over a baseline that does not use the lexicon (82.81%), while the MSA lexicon causes the accuracy to drop to 76.83%. Moreover, mixing the MSA and Gulf Arabic lexicons causes the accuracy to drop to 84.94% compared to using only the Gulf Arabic lexicon. This indicates that it is useless to use MSA resources to deal with Gulf Arabic due to the considerable differences and conflicting structures between these two languages.
2.
  • Andersson, Marta, et al. (author)
  • Annotating Topic Development in Information Seeking Queries
  • 2016
  • In: The LREC 2016 Proceedings. - 9782951740891 ; pp. 1755-1761
  • Conference paper (peer-reviewed). Abstract:
    • This paper contributes to the limited body of empirical research in the domain of discourse structure of information seeking queries. We describe the development of an annotation schema for coding topic development in information seeking queries and the initial observations from a pilot sample of query sessions. The main idea that we explore is the relationship between constant and variable discourse entities and their role in tracking changes in the topic progression. We argue that the topicalized entities remain stable across the development of the discourse and can be identified by a simple mechanism where anaphora resolution is a precursor. We also claim that a corpus annotated in this framework can be used as training data for dialogue management and computational semantics systems.
3.
  • Dobrovoljc, Kaja, et al. (author)
  • The Universal Dependencies Treebank of Spoken Slovenian
  • 2016
  • In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). - 9782951740891 ; pp. 1566-1573
  • Conference paper (peer-reviewed). Abstract:
    • This paper presents the construction of an open-source dependency treebank of spoken Slovenian, the first syntactically annotated collection of spontaneous speech in Slovenian. The treebank has been manually annotated using the Universal Dependencies annotation scheme, a one-layer syntactic annotation scheme with a high degree of cross-modality, cross-framework and cross-language interoperability. In this original application of the scheme to spoken language transcripts, we address a wide spectrum of syntactic particularities in speech, either by extending the scope of application of existing universal labels or by proposing new speech-specific extensions. The initial analysis of the resulting treebank and its comparison with the written Slovenian UD treebank confirms significant syntactic differences between the two language modalities, with spoken data consisting of shorter and more elliptic sentences, fewer and simpler nominal phrases, and more relations marking disfluencies, interaction, deixis and modality.
4.
  • Edlund, Jens, et al. (author)
  • Hidden resources - Strategies to acquire and exploit potential spoken language resources in national archives
  • 2016
  • In: Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016. - : European Language Resources Association (ELRA). - 9782951740891 ; pp. 4531-4534
  • Conference paper (peer-reviewed). Abstract:
    • In 2014, the Swedish government tasked a Swedish agency, the Swedish Post and Telecom Authority (PTS), with investigating how best to create and populate an infrastructure for spoken language resources (Ref N2014/2840/ITP). As a part of this work, the Department of Speech, Music and Hearing at KTH Royal Institute of Technology has taken inventory of existing potential spoken language resources, mainly in Swedish national archives and other governmental or public institutions. In this position paper, key priorities, perspectives, and strategies that may be of general, rather than only Swedish, interest are presented. We discuss, first, the broad types of potential spoken language resources available; second, the extent to which these resources are free to use; and third, as the main contribution, strategies to ensure the continuous acquisition of spoken language resources in a manner that facilitates speech and speech technology research.
5.
  • Forsberg, Markus, 1974, et al. (author)
  • Deriving Morphological Analyzers from Example Inflections
  • 2016
  • In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC-2016) May 23-28, 2016, Portorož, Slovenia. - 9782951740891
  • Conference paper (peer-reviewed). Abstract:
    • This paper presents a semi-automatic method, suitable for languages with alphabetic writing systems, to derive morphological analyzers from a limited number of example inflections. The system we present learns the inflectional behavior of morphological paradigms from examples and converts the learned paradigms into a finite-state transducer that is able to map inflected forms of previously unseen words into lemmas and corresponding morphosyntactic descriptions. We evaluate the system when provided with inflection tables for several languages collected from Wiktionary.
6.
  • François, Thomas, et al. (author)
  • SVALex: a CEFR-graded lexical resource for Swedish foreign and second language learners.
  • 2016
  • In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), May 23-28, 2016 Portorož, Slovenia. - Paris : European Language Resources Association. - 9782951740891
  • Conference paper (peer-reviewed). Abstract:
    • The paper introduces SVALex, a lexical resource primarily aimed at learners and teachers of Swedish as a foreign and second language that describes the distribution of 15,681 words and expressions across the levels of the Common European Framework of Reference (CEFR). The resource is based on a corpus of coursebook texts, and thus describes the receptive vocabulary learners are exposed to during reading activities, as opposed to the productive vocabulary they use when speaking or writing. The paper describes the methodology applied to create the list and to estimate the frequency distribution. It also discusses some characteristics of the resulting resource and compares it to other lexical resources for Swedish. An interesting feature of this resource is the possibility to separate the wheat from the chaff, distinguishing the core vocabulary at each level, i.e. vocabulary shared by several coursebook writers at that level, from peripheral vocabulary, which is used by a minority of the coursebook writers.
7.
  • Johansson, Richard, 1975, et al. (author)
  • A Multi-domain Corpus of Swedish Word Sense Annotation
  • 2016
  • In: 10th edition of the Language Resources and Evaluation Conference, 23-28 May 2016, Portorož (Slovenia). - : European Language Resources Association. - 9782951740891
  • Conference paper (peer-reviewed). Abstract:
    • We describe the word sense annotation layer in Eukalyptus, a freely available five-domain corpus of contemporary Swedish with several annotation layers. The annotation uses the SALDO lexicon to define the sense inventory, and allows word sense annotation of compound segments and multiword units. We give an overview of the new annotation tool developed for this project, and finally present an analysis of the inter-annotator agreement between two annotators.
8.
  • Klang, Marcus, et al. (author)
  • WikiParq: A Tabulated Wikipedia Resource Using the Parquet Format
  • 2016
  • In: Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016). - 9782951740891 ; pp. 4141-4148
  • Conference paper (peer-reviewed). Abstract:
    • Wikipedia has become one of the most popular resources in natural language processing and is used in a large number of applications. However, Wikipedia requires a substantial pre-processing step before it can be used. For instance, its set of nonstandardized annotations, referred to as the wiki markup, is language-dependent and needs a specific parser for each language: English, French, Italian, etc. In addition, the different Wikipedia resources (main article text, categories, wikidata, infoboxes) are scattered within the article document or across different files, which makes it difficult to have a global view of this outstanding resource. In this paper, we describe WikiParq, a unified format based on the Parquet standard to tabulate and package the Wikipedia corpora. In combination with Spark, a map-reduce computing framework, and the SQL query language, WikiParq makes it much easier to write database queries to extract specific information or subcorpora from Wikipedia, such as all the first paragraphs of the articles in French, or all the articles on persons in Spanish, or all the articles on persons that have versions in French, English, and Spanish. WikiParq is available in six language versions and is potentially extensible to all the languages of Wikipedia. The WikiParq files are downloadable as tarball archives from this location: http://semantica.cs.lth.se/wikiparq/.
9.
  • Megyesi, Beata, 1971-, et al. (author)
  • The Uppsala Corpus of Student Writings : Corpus Creation, Annotation, and Analysis
  • 2016
  • In: LREC 2016. - Paris : EUROPEAN LANGUAGE RESOURCES ASSOC-ELRA. - 9782951740891 ; pp. 3192-3199
  • Conference paper (peer-reviewed). Abstract:
    • The Uppsala Corpus of Student Writings consists of Swedish texts produced as part of a national test of students ranging in age from nine (in year three of primary school) to nineteen (the last year of upper secondary school) who are studying either Swedish or Swedish as a second language. National tests have been collected since 1996. The corpus currently consists of 2,500 texts containing over 1.5 million tokens. Parts of the texts have been annotated on several linguistic levels using existing state-of-the-art natural language processing tools. In order to make the corpus easy to interpret for scholars in the humanities, we chose the CoNLL format instead of an XML-based representation. Since spelling and grammatical errors are common in student writings, the texts are automatically corrected while keeping the original tokens in the corpus. Each token is annotated with part-of-speech and morphological features as well as syntactic structure. The main purpose of the corpus is to facilitate the systematic and quantitative empirical study of the writings of various student groups based on gender, geographic area, age, grade awarded or a combination of these, synchronically or diachronically. The intention is for this to be a monitor corpus, currently under development.
10.
  • Nivre, Joakim, 1962-, et al. (author)
  • Universal Dependencies v1 : A Multilingual Treebank Collection
  • 2016
  • In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). - Paris : EUROPEAN LANGUAGE RESOURCES ASSOC-ELRA. - 9782951740891 ; pp. 1659-1666
  • Conference paper (peer-reviewed). Abstract:
    • Cross-linguistically consistent annotation is necessary for sound comparative evaluation and cross-lingual learning experiments. It is also useful for multilingual system development and comparative linguistic studies. Universal Dependencies is an open community effort to create cross-linguistically consistent treebank annotation for many languages within a dependency-based lexicalist framework. In this paper, we describe v1 of the universal guidelines, the underlying design principles, and the currently available treebanks for 33 languages.
11.
  • Oepen, Stephan, et al. (author)
  • Towards Comparability of Linguistic Graph Banks for Semantic Parsing
  • 2016
  • In: Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC). - : European Language Resources Association. - 9782951740891 ; pp. 3991-3995
  • Conference paper (peer-reviewed). Abstract:
    • We announce a new language resource for research on semantic parsing, a large, carefully curated collection of semantic dependency graphs representing multiple linguistic traditions. This resource is called SDP 2016 and provides an update and extension to previous versions used as Semantic Dependency Parsing target representations in the 2014 and 2015 Semantic Evaluation Exercises (SemEval). For a common core of English text, this third edition comprises semantic dependency graphs from four distinct frameworks, packaged in a unified abstract format and aligned at the sentence and token levels. SDP 2016 is the first general release of this resource and is available for licensing from the Linguistic Data Consortium from May 2016. The data is accompanied by an open-source SDP utility toolkit and system results from previous contrastive parsing evaluations against these target representations.
12.
  • Seraji, Mojgan, et al. (author)
  • Universal Dependencies for Persian
  • 2016
  • In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). - Paris : EUROPEAN LANGUAGE RESOURCES ASSOC-ELRA. - 9782951740891 ; pp. 2361-2365
  • Conference paper (peer-reviewed). Abstract:
    • The Persian Universal Dependency Treebank (Persian UD) is a recent effort to treebank Persian with Universal Dependencies (UD), an ongoing project that designs unified and cross-linguistically valid grammatical representations including part-of-speech tags, morphological features, and dependency relations. The Persian UD is a conversion of the Uppsala Persian Dependency Treebank (UPDT) to the Universal Dependencies framework and consists of nearly 6,000 sentences and 152,871 word tokens, with an average sentence length of 25 words. Besides following different syntactic annotation guidelines, the two treebanks differ in tokenization. All words containing unsegmented clitics (pronominal and copula clitics), which are annotated with complex labels in the UPDT, have been separated from the clitics and appear with distinct labels in the Persian UD. The UPDT has its original syntactic annotation scheme based on Stanford Typed Dependencies. In this paper, we present the approaches taken in the development of the Persian UD.
13.
  • Volodina, Elena, 1973, et al. (author)
  • SweLL on the rise: Swedish Learner Language corpus for European Reference Level studies.
  • 2016
  • In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), May 23-28, 2016, Portorož, Slovenia. - Paris : European Language Resources Association. - 9782951740891
  • Conference paper (peer-reviewed). Abstract:
    • We present a new resource for Swedish, SweLL, a corpus of Swedish Learner essays linked to learners’ performance according to the Common European Framework of Reference (CEFR). SweLL consists of three subcorpora, SpIn, SW1203 and Tisus, collected from three different educational establishments. The common metadata for all subcorpora includes age, gender, native languages, time of residence in Sweden, and type of written task. Depending on the subcorpus, learner texts may contain additional information, such as text genres, topics, and grades. Five of the six CEFR levels are represented in the corpus: A1, A2, B1, B2 and C1, comprising 339 essays in total. The C2 level is not included since courses at C2 level are not offered. The workflow consists of collection of essays and permits, essay digitization and registration, metadata annotation, and automatic linguistic annotation. Inter-rater agreement is presented on the basis of the SW1203 subcorpus. The work on SweLL is still ongoing, with more than 100 essays waiting in the pipeline. This article describes both the resource and the “how-to” behind the compilation of SweLL.