SwePub

Result list for search "WFRF:(Johansson Richard 1975) srt2:(2015-2019)"

  • Result 1-36 of 36
1.
  • Adesam, Yvonne, 1975, et al. (author)
  • Defining the Eukalyptus forest – the Koala treebank of Swedish
  • 2015
  • In: Proceedings of the 20th Nordic Conference of Computational Linguistics, NODALIDA 2015, May 11-13, 2015, Vilnius, Lithuania. Edited by Beáta Megyesi. - 1650-3686 .- 1650-3740. - 9789175190983 ; , s. 1-9
  • Conference paper (peer-reviewed)
    • This paper details the design of the lexical and syntactic layers of a new annotated corpus of Swedish contemporary texts. In order to make the corpus adaptable into a variety of representations, the annotation is of a hybrid type with head-marked constituents and function-labeled edges, and with a rich annotation of non-local dependencies. The source material has been taken from public sources, to allow the resulting corpus to be made freely available.
  •  
2.
  • Adesam, Yvonne, 1975, et al. (author)
  • Multiwords, Word Senses and Multiword Senses in the Eukalyptus Treebank of Written Swedish
  • 2015
  • In: Proceedings of the Fourteenth International Workshop on Treebanks and Linguistic Theories (TLT14), 11–12 December 2015 Warsaw, Poland. - 9788363159184 ; , s. 3-12
  • Conference paper (peer-reviewed)
    • Multiwords reside at the intersection of the lexicon and syntax and in an annotation project, they will affect both levels. In the Eukalyptus treebank of written Swedish, we treat multiwords formally as syntactic objects, which are assigned a lexical type and sense. With the help of a simple dichotomy, analyzed vs unanalyzed multiwords, and the expressiveness of the syntactic annotation formalism employed, we are able to flexibly handle most multiword types and usages.
  •  
3.
  • Adesam, Yvonne, 1975, et al. (author)
  • The Eukalyptus Treebank of Written Swedish
  • 2018
  • In: Seventh Swedish Language Technology Conference (SLTC), Stockholm, 7–9 November 2018.
  • Conference paper (other academic/artistic)
  •  
4.
  •  
5.
  • Brändén, Gisela, 1975, et al. (author)
  • Coherent diffractive imaging of microtubules using an X-ray laser.
  • 2019
  • In: Nature communications. - : Springer Science and Business Media LLC. - 2041-1723. ; 10:1
  • Journal article (peer-reviewed)
    • X-ray free electron lasers (XFELs) create new possibilities for structural studies of biological objects that extend beyond what is possible with synchrotron radiation. Serial femtosecond crystallography has allowed high-resolution structures to be determined from micrometer-sized crystals, whereas single particle coherent X-ray imaging requires development to extend the resolution beyond a few tens of nanometers. Here we describe an intermediate approach: the XFEL imaging of biological assemblies with helical symmetry. We collected X-ray scattering images from samples of microtubules injected across an XFEL beam using a liquid microjet, sorted these images into class averages, merged these data into a diffraction pattern extending to 2 nm resolution, and reconstructed these data into a projection image of the microtubule. Details such as the 4 nm tubulin monomer became visible in this reconstruction. These results illustrate the potential of single-molecule X-ray imaging of biological assemblies with helical symmetry at room temperature.
  •  
6.
  • Johansson, Richard, 1975, et al. (author)
  • A Multi-domain Corpus of Swedish Word Sense Annotation
  • 2016
  • In: 10th edition of the Language Resources and Evaluation Conference, 23-28 May 2016, Portorož (Slovenia). - : European Language Resources Association. - 9782951740891
  • Conference paper (peer-reviewed)
    • We describe the word sense annotation layer in Eukalyptus, a freely available five-domain corpus of contemporary Swedish with several annotation layers. The annotation uses the SALDO lexicon to define the sense inventory, and allows word sense annotation of compound segments and multiword units. We give an overview of the new annotation tool developed for this project, and finally present an analysis of the inter-annotator agreement between two annotators.
  •  
7.
  • Kågebäck, Mikael, 1981, et al. (author)
  • Neural context embeddings for automatic discovery of word senses
  • 2015
  • In: Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing. Denver, United States. - 9781941643464 ; , s. 25-32
  • Conference paper (peer-reviewed)
    • Word sense induction (WSI) is the problem of automatically building an inventory of senses for a set of target words using only a text corpus. We introduce a new method for embedding word instances and their context, for use in WSI. The method, Instance-context embedding (ICE), leverages neural word embeddings, and the correlation statistics they capture, to compute high quality embeddings of word contexts. In WSI, these context embeddings are clustered to find the word senses present in the text. ICE is based on a novel method for combining word embeddings using continuous Skip-gram, based on both semantic and temporal aspects of context words. ICE is evaluated both in a new system, and in an extension to a previous system for WSI. In both cases, we surpass the previous state of the art on the WSI task of SemEval-2013, which highlights the generality of ICE. Our proposed system achieves a 33% relative improvement.
  •  
8.
  • Tahmasebi, Nina, 1982, et al. (author)
  • Visions and open challenges for a knowledge-based culturomics
  • 2015
  • In: International Journal on Digital Libraries. - : Springer Science and Business Media LLC. - 1432-5012 .- 1432-1300. ; 15:2-4, s. 169-187
  • Journal article (peer-reviewed)
    • The concept of culturomics was born out of the availability of massive amounts of textual data and the interest to make sense of cultural and language phenomena over time. Thus far, however, culturomics has only made use of, and shown the great potential of, statistical methods. In this paper, we present a vision for a knowledge-based culturomics that complements traditional culturomics. We discuss the possibilities and challenges of combining knowledge-based methods with statistical methods and address major challenges that arise due to the nature of the data: diversity of sources, changes in language over time, as well as temporal dynamics of information in general. We address all layers needed for knowledge-based culturomics, from natural language processing and relations to summaries and opinions.
  •  
9.
  • Adouane, Wafia, 1985, et al. (author)
  • Arabicized and Romanized Berber Automatic Identification
  • 2016
  • In: Proceedings of TICAM 2016. - Morocco : IRCAM.
  • Conference paper (peer-reviewed)
    • We present an automatic language identification tool for both Arabicized Berber (Berber written in the Arabic script) and Romanized Berber (Berber written in the Latin script). The focus is on short texts (social media content). We use a supervised machine learning method with character and word-based n-gram models as features. We also describe the corpora used in this paper. For both Arabicized and Romanized Berber, character-based 5-grams score the best, giving an F-score of 99.50%.
  •  
10.
  •  
11.
  • Adouane, Wafia, 1985, et al. (author)
  • Automatic Detection of Arabicized Berber and Arabic Varieties
  • 2016
  • In: Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects; 63–72; December 12; Osaka, Japan.
  • Conference paper (peer-reviewed)
    • Automatic Language Identification (ALI) is the detection of the natural language of an input text by a machine. It is the first necessary step to do any language-dependent natural language processing task. Various methods have been successfully applied to a wide range of languages, and the state-of-the-art automatic language identifiers are mainly based on character n-gram models trained on huge corpora. However, there are many languages which are not yet automatically processed, for instance minority and informal languages. Many of these languages are only spoken and do not exist in a written format. Social media platforms and new technologies have facilitated the emergence of written format for these spoken languages based on pronunciation. The latter are not well represented on the Web, commonly referred to as under-resourced languages, and the currently available ALI tools fail to properly recognize them. In this paper, we revisit the problem of ALI with the focus on Arabicized Berber and dialectal Arabic short texts. We introduce new resources and evaluate the existing methods. The results show that machine learning models combined with lexicons are well suited for detecting Arabicized Berber and different Arabic varieties and distinguishing between them, giving a macro-average F-score of 92.94%.
  •  
12.
  • Adouane, Wafia, 1985, et al. (author)
  • Gulf Arabic Resource Building for Sentiment Analysis
  • 2016
  • In: Proceedings of the Language Resources and Evaluation Conference (LREC), 23-28 May 2016, Portorož, Slovenia. - : European Language Resources Association. - 9782951740891
  • Conference paper (peer-reviewed)
    • This paper deals with building linguistic resources for Gulf Arabic, one of the Arabic variations, for the sentiment analysis task using machine learning. To our knowledge, no previous work has been done on Gulf Arabic sentiment analysis despite the fact that it is present in different online platforms. Hence, the first challenge is the absence of annotated data and sentiment lexicons. To fill this gap, we created these two main linguistic resources. Then we conducted different experiments: use a Naive Bayes classifier without any lexicon; add a sentiment lexicon designed basically for MSA; use only the compiled Gulf Arabic sentiment lexicon; and finally use both MSA and Gulf Arabic sentiment lexicons. The Gulf Arabic lexicon gives a good improvement of the classifier accuracy (90.54%) over a baseline that does not use the lexicon (82.81%), while the MSA lexicon causes the accuracy to drop to 76.83%. Moreover, mixing MSA and Gulf Arabic lexicons causes the accuracy to drop to 84.94% compared to using only the Gulf Arabic lexicon. This indicates that it is useless to use MSA resources to deal with Gulf Arabic due to the considerable differences and conflicting structures between these two languages.
  •  
13.
  • Adouane, Wafia, 1985, et al. (author)
  • Romanized Arabic and Berber Detection Using PPM and Dictionary Methods
  • 2017
  • In: 13th ACS/IEEE International Conference on Computer Systems and Applications AICCSA 2016. - Morocco. - 2161-5322. - 9781509043200
  • Conference paper (peer-reviewed)
    • Arabic is one of the Semitic languages written in Arabic script in its standard form. However, the recent rise of social media and new technologies has contributed considerably to the emergence of a new form of Arabic, namely Arabic written in Latin script, often called Romanized Arabic or Arabizi. While Romanized Arabic is an informal language, Berber or Tamazight uses Latin script in its standard form with some orthography differences depending on the country it is used in. Both these languages are under-resourced and unknown to the state-of-the-art language identifiers. In this paper, we present an automatic language identifier for both Romanized Arabic and Romanized Berber. We also describe the built linguistic resources (large dataset and lexicons) including a wide range of Arabic dialects (Algerian, Egyptian, Gulf, Iraqi, Levantine, Moroccan and Tunisian dialects) as well as the most popular Berber varieties (Kabyle, Tashelhit, Tarifit, Tachawit and Tamzabit). We use the Prediction by Partial Matching (PPM) and dictionary-based methods. The methods reach a macro-average F-Measure of 98.74% and 97.60% respectively.
  •  
14.
  • Adouane, Wafia, 1985, et al. (author)
  • Romanized Arabic and Berber Detection Using Prediction by Partial Matching and Dictionary Methods
  • 2016
  • In: 2016 IEEE/ACS 13TH INTERNATIONAL CONFERENCE OF COMPUTER SYSTEMS AND APPLICATIONS (AICCSA). - 9781509043200
  • Conference paper (peer-reviewed)
    • Arabic is one of the Semitic languages written in Arabic script in its standard form. However, the recent rise of social media and new technologies has contributed considerably to the emergence of a new form of Arabic, namely Arabic written in Latin script, often called Romanized Arabic or Arabizi. While Romanized Arabic is an informal language, Berber or Tamazight uses Latin script in its standard form with some orthography differences depending on the country it is used in. Both these languages are under-resourced and unknown to the state-of-the-art language identifiers. In this paper, we present an automatic language identifier for both Romanized Arabic and Romanized Berber. We also describe the built linguistic resources (large dataset and lexicons) including a wide range of Arabic dialects (Algerian, Egyptian, Gulf, Iraqi, Levantine, Moroccan and Tunisian dialects) as well as the most popular Berber varieties (Kabyle, Tashelhit, Tarifit, Tachawit and Tamzabit). We use the Prediction by Partial Matching (PPM) and dictionary-based methods. The methods reach a macro-average F-Measure of 98.74% and 97.60% respectively.
  •  
15.
  • Adouane, Wafia, 1985, et al. (author)
  • Romanized Berber and Romanized Arabic Automatic Language Identification Using Machine Learning
  • 2016
  • In: Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects; 53–61; December 12, 2016 ; Osaka, Japan. - : Association for Computational Linguistics. - 0736-587X.
  • Conference paper (peer-reviewed)
    • The identification of the language of text/speech input is the first step to be able to properly do any language-dependent natural language processing. The task is called Automatic Language Identification (ALI). Being a well-studied field since the early 1960s, various methods have been applied to many standard languages. The standard ALI methods require datasets for training and use character/word-based n-gram models. However, social media and new technologies have contributed to the rise of informal and minority languages on the Web. The state-of-the-art automatic language identifiers fail to properly identify many of them. Romanized Arabic (RA) and Romanized Berber (RB) are cases of these informal languages which are under-resourced. The goal of this paper is twofold: detect RA and RB, at a document level, as separate languages and distinguish between them as they coexist in North Africa. We consider the task as a classification problem and use supervised machine learning to solve it. For both languages, character-based 5-grams combined with additional lexicons score the best, with F-scores of 99.75% and 97.77% for RB and RA respectively.
  •  
16.
  • Borin, Lars, 1957, et al. (author)
  • Here be dragons? The perils and promises of inter-resource lexical-semantic mapping
  • 2015
  • In: Linköping Electronic Conference Proceedings. Semantic resources and semantic annotation for Natural Language Processing and the Digital Humanities. Workshop at NODALIDA , May 11, 13-18 2015, Vilnius. - 1650-3686 .- 1650-3740. - 9789175190495 ; 112, s. 1-11
  • Conference paper (peer-reviewed)
    • Lexical-semantic knowledge sources are a stock item in the language technologist’s toolbox, having proved their practical worth in many and diverse natural language processing (NLP) applications. In linguistics, lexical semantics comes in many flavors, but in the NLP world, wordnets reign more or less supreme. There has been some promising work utilizing Roget-style thesauruses instead, but wider experimentation is hampered by the limited availability of such resources. The work presented here is a first step in the direction of creating a freely available Roget-style lexical resource for modern Swedish. Here, we explore methods for automatic disambiguation of inter-resource mappings with the longer-term goal of utilizing similar techniques for automatic enrichment of lexical-semantic resources.
  •  
17.
  • Dods, Robert, 1989, et al. (author)
  • From Macrocrystals to Microcrystals: A Strategy for Membrane Protein Serial Crystallography.
  • 2017
  • In: Structure. - : Elsevier BV. - 1878-4186 .- 0969-2126. ; 25:9, s. 1461-1468
  • Journal article (peer-reviewed)
    • Serial protein crystallography was developed at X-ray free-electron lasers (XFELs) and is now also being applied at storage ring facilities. Robust strategies for the growth and optimization of microcrystals are needed to advance the field. Here we illustrate a generic strategy for recovering high-density homogeneous samples of microcrystals starting from conditions known to yield large (macro) crystals of the photosynthetic reaction center of Blastochloris viridis (RCvir). We first crushed these crystals prior to multiple rounds of microseeding. Each cycle of microseeding facilitated improvements in the RCvir serial femtosecond crystallography (SFX) structure from 3.3-Å to 2.4-Å resolution. This approach may allow known crystallization conditions for other proteins to be adapted to exploit novel scientific opportunities created by serial crystallography.
  •  
18.
  • Ehrlemark, Anna, et al. (author)
  • Retrieving Occurrences of Grammatical Constructions
  • 2016
  • In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics : Technical Papers, December 11–17; Osaka, Japan. - 1525-2477. - 9784879747020
  • Conference paper (peer-reviewed)
    • Finding authentic examples of grammatical constructions is central in constructionist approaches to linguistics, language processing, and second language learning. In this paper, we address this problem as an information retrieval (IR) task. To facilitate research in this area, we built a benchmark collection by annotating the occurrences of six constructions in a Swedish corpus. Furthermore, we implemented a simple and flexible retrieval system for finding construction occurrences, in which the user specifies a ranking function using lexical-semantic similarities (lexicon-based or distributional). The system was evaluated using standard IR metrics on the new benchmark, and we saw that lexical-semantical rerankers improve significantly over a purely surface-oriented system, but must be carefully tailored for each individual construction.
  •  
19.
  • Fares, Murhaf, et al. (author)
  • The 2018 Shared Task on Extrinsic Parser Evaluation: On the Downstream Utility of English Universal Dependency Parsers
  • 2018
  • In: Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. - : Association for Computational Linguistics.
  • Conference paper (peer-reviewed)
    • We summarize empirical results and tentative conclusions from the Second Extrinsic Parser Evaluation Initiative (EPE 2018). We review the basic task setup, downstream applications involved, and end-to-end results for seventeen participating parsers. Based on both quantitative and qualitative analysis, we correlate intrinsic evaluation results at different layers of morpho-syntactic analysis with observed downstream behavior.
  •  
20.
  • Ghanimifard, Mehdi, 1984, et al. (author)
  • Enriching Word-sense Embeddings with Translational Context
  • 2015
  • In: Proceedings of Recent Advances in Natural Language Processing / edited by Galia Angelova, Kalina Bontcheva, Ruslan Mitkov. International Conference, Hissar, Bulgaria 7–9 September, 2015. - 1313-8502. ; , s. 208-215
  • Conference paper (peer-reviewed)
    • Vector-space models derived from corpora are an effective way to learn a representation of word meaning directly from data, and these models have many uses in practical applications. A number of unsupervised approaches have been proposed to automatically learn representations of word senses directly from corpora, but since these methods use no information but the words themselves, they sometimes miss distinctions that could be possible to make if more information were available. In this paper, we present a general framework that we call context enrichment that incorporates external information during the training of multi-sense vector-space models. Our approach is agnostic as to which external signal is used to enrich the context, but in this work we consider the use of translations as the source of enrichment. We evaluated the models trained using the translation-enriched context using several similarity benchmarks and a word analogy test set. In all our evaluations, the enriched model outperformed the purely word-based baseline soundly.
  •  
21.
  • Johansson, Richard, 1975, et al. (author)
  • Combining Relational and Distributional Knowledge for Word Sense Disambiguation
  • 2015
  • In: Proceedings of the 20th Nordic Conference of Computational Linguistics, May 12-13, Vilnius, Lithuania. Linköping Electronic Conference Proceedings 109, Linköping University Electronic Press... - 1650-3686 .- 1650-3740. - 9789175190983 ; , s. 69-78
  • Conference paper (peer-reviewed)
    • We present a new approach to word sense disambiguation derived from recent ideas in distributional semantics. The input to the algorithm is a large unlabeled corpus and a graph describing how senses are related; no sense-annotated corpus is needed. The fundamental idea is to embed meaning representations of senses in the same continuous-valued vector space as the representations of words. In this way, the knowledge encoded in the lexical resource is combined with the information derived by the distributional methods. Once this step has been carried out, the sense representations can be plugged back into e.g. the skip-gram model, which allows us to compute scores for the different possible senses of a word in a given context. We evaluated the new word sense disambiguation system on two Swedish test sets annotated with senses defined by the SALDO lexical resource. In both evaluations, our system soundly outperformed random and first-sense baselines. Its accuracy was slightly above that of a well-known graph-based system, while being computationally much more efficient.
  •  
22.
  • Johansson, Richard, 1975, et al. (author)
  • Embedding a Semantic Network in a Word Space
  • 2015
  • In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Denver, United States, May 31 – June 5, 2015. - 9781941643495 ; , s. 1428-1433
  • Conference paper (peer-reviewed)
    • We present a framework for using continuous-space vector representations of word meaning to derive new vectors representing the meaning of senses listed in a semantic network. It is a post-processing approach that can be applied to several types of word vector representations. It uses two ideas: first, that vectors for polysemous words can be decomposed into a convex combination of sense vectors; secondly, that the vector for a sense is kept similar to those of its neighbors in the network. This leads to a constrained optimization problem, and we present an approximation for the case when the distance function is the squared Euclidean. We applied this algorithm to a Swedish semantic network, and we evaluate the quality of the resulting sense representations extrinsically by showing that they give large improvements when used in a classifier that creates lexical units for FrameNet frames.
  •  
23.
  • Johansson, Richard, 1975 (author)
  • EPE 2017: The Trento–Gothenburg Opinion Extraction System
  • 2017
  • In: Proceedings of the 2017 Shared Task on Extrinsic Parser Evaluation at the Fourth International Conference on Dependency Linguistics and the 15th International Conference on Parsing Technologies. - Stroudsburg, USA : Association for Computational Linguistics (ACL). - 9781945626746
  • Conference paper (peer-reviewed)
    • We give an overview of one of the three downstream systems in the Extrinsic Parser Evaluation shared task of 2017: the Trento–Gothenburg system for opinion extraction. We describe the modifications required to make the system agnostic to its input dependency representation, and discuss how the input affects the various submodules of the system. The results of the EPE shared task are presented and discussed, and to get a more detailed understanding of the effects of the dependencies we run two of the submodules separately. The results suggest that the module where the effects are strongest is the opinion holder extraction module, which can be explained by the fact that this module uses several dependency-based features. For the other modules, the effects are hard to measure.
  •  
24.
  • Malmerberg, Erik, 1980, et al. (author)
  • Conformational activation of visual rhodopsin in native disc membranes
  • 2015
  • In: Science Signaling. - : American Association for the Advancement of Science (AAAS). - 1945-0877 .- 1937-9145. ; 8:367
  • Journal article (peer-reviewed)
    • Rhodopsin is the G protein-coupled receptor (GPCR) that serves as a dim-light receptor for vision in vertebrates. We probed light-induced conformational changes in rhodopsin in its native membrane environment at room temperature using time-resolved wide-angle x-ray scattering. We observed a rapid conformational transition that is consistent with an outward tilt of the cytoplasmic portion of transmembrane helix 6 concomitant with an inward movement of the cytoplasmic portion of transmembrane helix 5. These movements were considerably larger than those reported from the basis of crystal structures of activated rhodopsin, implying that light activation of rhodopsin involves a more extended conformational change than was previously suggested.
  •  
25.
  • Middeldorp, Christel M., et al. (author)
  • The Early Growth Genetics (EGG) and EArly Genetics and Lifecourse Epidemiology (EAGLE) consortia : design, results and future prospects
  • 2019
  • In: European Journal of Epidemiology. - : Springer Science and Business Media LLC. - 0393-2990 .- 1573-7284. ; 34:3, s. 279-300
  • Journal article (peer-reviewed)
    • The impact of many unfavorable childhood traits or diseases, such as low birth weight and mental disorders, is not limited to childhood and adolescence, as they are also associated with poor outcomes in adulthood, such as cardiovascular disease. Insight into the genetic etiology of childhood and adolescent traits and disorders may therefore provide new perspectives, not only on how to improve wellbeing during childhood, but also how to prevent later adverse outcomes. To achieve the sample sizes required for genetic research, the Early Growth Genetics (EGG) and EArly Genetics and Lifecourse Epidemiology (EAGLE) consortia were established. The majority of the participating cohorts are longitudinal population-based samples, but other cohorts with data on early childhood phenotypes are also involved. Cohorts often have a broad focus and collect(ed) data on various somatic and psychiatric traits as well as environmental factors. Genetic variants have been successfully identified for multiple traits, for example, birth weight, atopic dermatitis, childhood BMI, allergic sensitization, and pubertal growth. Furthermore, the results have shown that genetic factors also partly underlie the association with adult traits. As sample sizes are still increasing, it is expected that future analyses will identify additional variants. This, in combination with the development of innovative statistical methods, will provide detailed insight on the mechanisms underlying the transition from childhood to adult disorders. Both consortia welcome new collaborations. Policies and contact details are available from the corresponding authors of this manuscript and/or the consortium websites.
  •  
26.
  • Mogren, Olof, 1980, et al. (author)
  • Character-based Recurrent Neural Networks for Morphological Relational Reasoning
  • 2017
  • In: Proceedings of the First Workshop on Subword and Character Level Models in NLP. - Stroudsburg, PA, United States : Association for Computational Linguistics.
  • Conference paper (peer-reviewed)
    • We present a model for predicting word forms based on morphological relational reasoning with analogies. While previous work has explored tasks such as morphological inflection and reinflection, these models rely on an explicit enumeration of morphological features, which may not be available in all cases. To address the task of predicting a word form given a demo relation (a pair of word forms) and a query word, we devise a character-based recurrent neural network architecture using three separate encoders and a decoder. We also investigate a multiclass learning setup, where the prediction of the relation type label is used as an auxiliary task. Our results show that the exact form can be predicted for English with an accuracy of 94.7%. For Swedish, which has a more complex morphology with more inflectional patterns for nouns and verbs, the accuracy is 89.3%. We also show that using the auxiliary task of learning the relation type speeds up convergence and improves the prediction accuracy for the word generation task.
  •  
27.
  • Mogren, Olof, et al. (author)
  • Character-based Recurrent Neural Networks for Morphological Relational Reasoning
  • 2019
  • In: Journal of Language Modeling. - : Institute of Computer Science, Polish Academy of Sciences. - 2299-856X .- 2299-8470. ; 7:1, s. 93-124
  • Journal article (peer-reviewed)
    • We present a model for predicting inflected word forms based on morphological analogies. Previous work includes rule-based algorithms that determine and copy affixes from one word to another, with limited support for varying inflectional patterns. In related tasks such as morphological reinflection, the algorithm is provided with an explicit enumeration of morphological features which may not be available in all cases. In contrast, our model is feature-free: instead of explicitly representing morphological features, the model is given a demo pair that implicitly specifies a morphological relation (such as write:writes specifying infinitive:present). Given this demo relation and a query word (e.g. watch), the model predicts the target word (e.g. watches). To address this task, we devise a character-based recurrent neural network architecture using three separate encoders and one decoder. Our experimental evaluation on five different languages shows that the exact form can be predicted with high accuracy, consistently beating the baseline methods. Particularly, for English the prediction accuracy is 95.60%. The solution is not limited to copying affixes from the demo relation, but generalizes to words with varying inflectional patterns, and can abstract away from the orthographic level to the level of morphological forms.
  •  
28.
  • Nieto Piña, Luis, 1988, et al. (author)
  • A Simple and Efficient Method to Generate Word Sense Representations
  • 2015
  • In: Proceedings of International Conference in Recent Advances in Natural Language Processing / edited by Galia Angelova, Kalina Bontcheva, Ruslan Mitkov, Hissar, Bulgaria 7–9 September, 2015. - 1313-8502. ; , s. 465-472
  • Conference paper (peer-reviewed)
    • Distributed representations of words have boosted the performance of many Natural Language Processing tasks. However, usually only one representation per word is obtained, not acknowledging the fact that some words have multiple meanings. This has a negative effect on the individual word representations and the language model as a whole. In this paper we present a simple model that enables recent techniques for building word vectors to represent distinct senses of polysemic words. In our assessment of this model we show that it is able to effectively discriminate between words’ senses and to do so in a computationally efficient manner.
  •  
29.
  • Nieto Piña, Luis, 1988, et al. (author)
  • Automatically Linking Lexical Resources with Word Sense Embedding Models
  • 2018
  • In: The Third Workshop on Semantic Deep Learning (SemDeep-3), August 20th, 2018, Santa Fe, New Mexico, USA / Luis Espinosa Anke, Thierry Declerck, Dagmar Gromann (eds.). - 9781948087568
  • Conference paper (peer-reviewed)
    • Automatically learnt word sense embeddings are developed as an attempt to refine the capabilities of coarse word embeddings. The word sense representations obtained this way are, however, sensitive to underlying corpora and parameterizations, and they might be difficult to relate to word senses as formally defined by linguists. We propose to tackle this problem by devising a mechanism to establish links between word sense embeddings and lexical resources created by experts. We evaluate the applicability of these links in a task to retrieve instances of Swedish word senses not present in the lexicon.
  •  
30.
  • Nieto Piña, Luis, 1988, et al. (author)
  • Benchmarking Word Sense Disambiguation Systems for Swedish
  • 2016
  • In: The Sixth Swedish Language Technology Conference.
  • Research review (peer-reviewed)
    • We compare several word sense disambiguation systems for Swedish and evaluate them on seven different sense-annotated corpora. Our results show that unsupervised systems beat a random baseline, but generally do not outperform a first-sense baseline considerably. On a lexical-sample dataset that allows us to train a supervised system, the unsupervised disambiguators are strongly outperformed by the supervised one.
  •  
31.
  • Nieto Piña, Luis, 1988, et al. (author)
  • Embedding Senses for Efficient Graph-based Word Sense Disambiguation
  • 2016
  • In: Proceedings of TextGraphs-10: the Workshop on Graph-based Methods for Natural Language Processing. - : Association for Computational Linguistics.
  • Conference paper (peer-reviewed)
    • We propose a simple graph-based method for word sense disambiguation (WSD) where sense and context embeddings are constructed by applying the Skip-gram method to random walks over the sense graph. We used this method to build a WSD system for Swedish using the SALDO lexicon, and evaluated it on six different annotated test sets. In all cases, our system was several orders of magnitude faster than a state-of-the-art PageRank-based system, while outperforming a random baseline soundly.
  •  
32.
  • Nieto Piña, Luis, 1988, et al. (author)
  • Training Word Sense Embeddings With Lexicon-based Regularization
  • 2017
  • In: Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Taipei, Taiwan, November 27 – December 1, 2017. - : Asian Federation of Natural Language Processing. - 9781948087001
  • Conference paper (peer-reviewed)
    • We propose to improve word sense embeddings by enriching an automatic corpus-based method with lexicographic data. Information from a lexicon is introduced into the learning algorithm’s objective function through a regularizer. The incorporation of lexicographic data yields embeddings that are able to reflect expert-defined word senses, while retaining the robustness, high quality, and coverage of automatic corpus-based methods. These properties are observed in a manual inspection of the semantic clusters that different degrees of regularizer strength create in the vector space. Moreover, we evaluate the sense embeddings in two downstream applications: word sense disambiguation and semantic frame prediction, where they outperform simpler approaches. Our results show that a corpus-based model balanced with lexicographic data learns better representations and improves their performance in downstream tasks.
  •  
33.
  • Oepen, Stephan, et al. (author)
  • The 2017 Shared Task on Extrinsic Parser Evaluation. Towards a Reusable Community Infrastructure
  • 2017
  • In: Proceedings of the 2017 Shared Task on Extrinsic Parser Evaluation at the Fourth International Conference on Dependency Linguistics and the 15th International Conference on Parsing Technologies. - Stroudsburg, USA : Association for Computational Linguistics (ACL). - 9781945626746
  • Conference paper (peer-reviewed)
    • The 2017 Shared Task on Extrinsic Parser Evaluation (EPE 2017) seeks to provide better estimates of the relative utility of different types of dependency representations for a variety of downstream applications that depend centrally on the analysis of grammatical structure. EPE 2017 defines a generalized notion of lexicalized syntactico-semantic dependency representations and provides a common interchange format to three state-of-the-art downstream applications, viz. biomedical event extraction, negation resolution, and fine-grained opinion analysis. As a first step towards building a generic and extensible infrastructure for extrinsic parser evaluation, the downstream applications have been generalized to support a broad range of diverse dependency representations (including divergent sentence and token boundaries) and to allow fully automated re-training and evaluation for a specific collection of parser outputs. Nine teams participated in EPE 2017, submitting 49 distinct runs that encompass many different families of dependency representations, distinct approaches to preprocessing and parsing, and various types and volumes of training data.
  •  
34.
  • Sandberg, Linn A.C., et al. (author)
  • Issue salience on Twitter during Swedish party leaders' debates
  • 2019
  • In: Nordicom Review. - : Walter de Gruyter GmbH. - 2001-5119 .- 1403-1108. ; 40:2, s. 49-61
  • Journal article (peer-reviewed)
    • The objective of this study is to contribute knowledge about formation of political agendas on Twitter during mediated political events, using the party leaders' debates in Sweden before the general election of 2014 as a case study. Our findings show that issues brought up during the debates were largely mirrored on Twitter, with one striking discrepancy. Contrary to our expectations, issues on the left-right policy dimension were more salient on Twitter than in the debates, whereas issues such as the environment, immigration and refugees, all tied to a liberal-authoritarian value axis, were less salient on Twitter. © 2019 Sandberg, L.A.C., Bjereld, U., Bunyik, K., Forsberg, M. & Johansson, R., published by Sciendo 2019.
  •  
35.
  • Sharma, Amit, et al. (author)
  • Asymmetry in serial femtosecond crystallography data
  • 2017
  • In: Acta Crystallographica a-Foundation and Advances. - : International Union of Crystallography (IUCr). - 2053-2733. ; 73, s. 93-101
  • Journal article (peer-reviewed)
    • Serial crystallography is an increasingly important approach to protein crystallography that exploits both X-ray free-electron laser (XFEL) and synchrotron radiation. Serial crystallography recovers complete X-ray diffraction data by processing and merging diffraction images from thousands of randomly oriented non-uniform microcrystals, of which all observations are partial Bragg reflections. Random fluctuations in the XFEL pulse energy spectrum, variations in the size and shape of microcrystals, integrating over millions of weak partial observations and instabilities in the XFEL beam position lead to new types of experimental errors. The quality of Bragg intensity estimates deriving from serial crystallography is therefore contingent upon assumptions made while modeling these data. Here it is observed that serial femtosecond crystallography (SFX) Bragg reflections do not follow a unimodal Gaussian distribution and it is recommended that an idealized assumption of single Gaussian peak profiles be relaxed to incorporate apparent asymmetries when processing SFX data. The phenomenon is illustrated by re-analyzing data collected from microcrystals of the Blastochloris viridis photosynthetic reaction center and comparing these intensity observations with conventional synchrotron data. The results show that skewness in the SFX observations captures the essence of the Wilson plot and an empirical treatment is suggested that can help to separate the diffraction Bragg intensity from the background.
  •  
36.
  • Åkerström, Joakim, et al. (author)
  • Natural Language Processing in Policy Evaluation: Extracting Policy Conditions from IMF Loan Agreements
  • 2019
  • In: Proceedings of the 22nd Nordic Conference on Computational Linguistics; September 30 – October 2; Turku, Finland. - : Linköping University Electronic Press.
  • Conference paper (peer-reviewed)
    • Social science researchers often use text as the raw data in investigations: for instance, when investigating the effects of IMF policies on the development of countries under IMF programs, researchers typically encode structured descriptions of the programs using a time-consuming manual effort. Making this process automatic may open up new opportunities in scaling up such investigations. As a first step towards automatizing this coding process, we describe an experiment where we apply a sentence classifier that automatically detects mentions of policy conditions in IMF loan agreements and divides them into different types. The results show that the classifier is generally able to detect the policy conditions, although some types are hard to distinguish.
  •  
Type of publication
conference paper (27)
journal article (8)
research review (1)
Type of content
peer-reviewed (34)
other academic/artistic (2)
Author/Editor
Johansson, Richard, ... (31)
Adouane, Wafia, 1985 (7)
Adesam, Yvonne, 1975 (5)
Bouma, Gerlof, 1979 (5)
Semmar, Nasredine (5)
Borin, Lars, 1957 (3)
Forsberg, Markus, 19 ... (3)
Dubhashi, Devdatt, 1 ... (2)
Katona, Gergely, 197 ... (2)
Straker, Leon (1)
Groop, Leif (1)
Jacobsson, Bo, 1960 (1)
Magnus, Per (1)
Ahlqvist, Emma (1)
Fadista, Joao (1)
Exner, Peter (1)
Li, Jin (1)
Tahmasebi, Nina, 198 ... (1)
Raitakari, Olli T (1)
Viikari, Jorma (1)
Heinrich, Joachim (1)
Koppelman, Gerard H. (1)
Melén, Erik (1)
Cooper, Cyrus (1)
Sunyer, Jordi (1)
Melbye, Mads (1)
Richmond, Rebecca C. (1)
Estivill, Xavier (1)
Strachan, David P (1)
Bobicev, Victoria (1)
Semmar, N. (1)
Gauderman, W James (1)
Robinson, Robert C. (1)
Larsson, Henrik, 197 ... (1)
Lichtenstein, Paul (1)
Almgren, Peter (1)
McCarthy, Mark I (1)
Ahluwalia, Tarunveer ... (1)
Linneberg, Allan (1)
Grarup, Niels (1)
Pedersen, Oluf (1)
Hansen, Torben (1)
Ma, Ronald C W (1)
van Duijn, Cornelia ... (1)
Mohlke, Karen L (1)
Nugues, Pierre (1)
Liu, Jun (1)
Johansson, Stefan (1)
Seuring, Carolin (1)
Willemsen, Gonneke (1)
University
University of Gothenburg (36)
Chalmers University of Technology (4)
Lund University (2)
Uppsala University (1)
Örebro University (1)
Mid Sweden University (1)
Karolinska Institutet (1)
Language
English (36)
Research subject (UKÄ/SCB)
Natural sciences (31)
Humanities (15)
Engineering and Technology (2)
Medical and Health Sciences (2)
Social Sciences (2)
