SwePub
Search the SwePub database

  Advanced search

Result list for the search "WFRF:(Klang Marcus)"

Search: WFRF:(Klang Marcus)

  • Results 1-10 of 16
Sort/group the result list
   
Numbering | Reference | Cover image | Find
1.
  • Ahmed, Rafsan, et al. (author)
  • EasyNER: A Customizable Easy-to-Use Pipeline for Deep Learning- and Dictionary-based Named Entity Recognition from Medical Text
  • 2023
  • Other publication (other academic/artistic), abstract
    • Medical research generates a large number of publications with the PubMed database already containing >35 million research articles. Integration of the knowledge scattered across this large body of literature could provide key insights into physiological mechanisms and disease processes leading to novel medical interventions. However, it is a great challenge for researchers to utilize this information in full since the scale and complexity of the data greatly surpass human processing abilities. This becomes especially problematic in cases of extreme urgency like the COVID-19 pandemic. Automated text mining can help extract and connect information from the large body of medical research articles. The first step in text mining is typically the identification of specific classes of keywords (e.g., all protein or disease names), so-called Named Entity Recognition (NER). Here we present an end-to-end pipeline for NER of typical entities found in medical research articles, including diseases, cells, chemicals, genes/proteins, and species. The pipeline can access and process large medical research article collections (PubMed, CORD-19) or raw text and incorporates a series of deep learning models fine-tuned on the HUNER corpora collection. In addition, the pipeline can perform dictionary-based NER related to COVID-19 and other medical topics. Users can also load their own NER models and dictionaries to include additional entities. The output consists of publication-ready ranked lists and graphs of detected entities and files containing the annotated texts. An associated script allows rapid inspection of the results for specific entities of interest. As model use cases, the pipeline was deployed on two collections of autophagy-related abstracts from PubMed and on the CORD-19 dataset, a collection of 764 398 research article abstracts related to COVID-19.
  •  
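The dictionary-based NER step described in entry 1 can be illustrated with a short Python sketch: scan text for terms from a class-specific dictionary and report their character offsets. The term list, label, and matching strategy below are simplified stand-ins chosen for the example, not EasyNER's actual implementation.

# Minimal sketch of dictionary-based NER; the dictionary and label are
# illustrative and do not come from EasyNER.
import re

medical_terms = {"COVID-19", "SARS-CoV-2", "autophagy"}  # hypothetical dictionary

def dictionary_ner(text, terms, label):
    """Return (start, end, matched term, label) for every dictionary hit."""
    hits = []
    for term in terms:
        for m in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
            hits.append((m.start(), m.end(), m.group(0), label))
    return sorted(hits)

text = "COVID-19 has been linked to changes in autophagy."
print(dictionary_ner(text, medical_terms, "MEDICAL_TERM"))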
2.
  • Exner, Peter, et al. (author)
  • A Distant Supervision Approach to Semantic Role Labeling
  • 2015
  • In: Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics (*SEM 2015). - 9781941643396 ; pp. 239-248
  • Conference paper (peer-reviewed), abstract
    • Semantic role labeling has become a key module for many language processing applications such as question answering, information extraction, sentiment analysis, and machine translation. To build an unrestricted semantic role labeler, the first step is to develop a comprehensive proposition bank. However, creating such a bank is a costly enterprise, which has only been achieved for a handful of languages. In this paper, we describe a technique to build proposition banks for new languages using distant supervision. Starting from PropBank in English and loosely parallel corpora such as versions of Wikipedia in different languages, we carried out a mapping of semantic propositions we extracted from English to syntactic structures in Swedish using named entities. We trained a semantic parser on the generated Swedish propositions and we report the results we obtained. Using the CoNLL 2009 evaluation script, we could reach the scores of 52.25 for labeled propositions and 62.44 for the unlabeled ones. We believe our approach can be applied to train semantic role labelers for other resource-scarce languages.
  •  
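A hedged sketch of the entity-based alignment idea behind entries 2 and 4: find a Swedish sentence that mentions the same named entities as an English sentence carrying a PropBank-style proposition, then project the argument labels onto those entities. The sentences, frame, and labels below are invented for illustration and do not reproduce the papers' data.

# Illustrative projection of semantic-role labels across loosely parallel
# text via shared named entities; all data here is made up for the sketch.
english_proposition = {
    "predicate": "visit.01",
    "A0": "Obama",        # agent
    "A1": "Stockholm",    # thing visited
}
swedish_sentences = [
    "Stockholm är Sveriges huvudstad.",
    "Obama besökte Stockholm 2013.",
]

def project(proposition, candidates):
    """Pick the first candidate sentence containing every entity argument
    and return it with the projected argument labels."""
    entity_labels = {v: k for k, v in proposition.items() if k.startswith("A")}
    for sentence in candidates:
        if all(entity in sentence for entity in entity_labels):
            return sentence, entity_labels
    return None, {}

sentence, labels = project(english_proposition, swedish_sentences)
print(sentence)   # "Obama besökte Stockholm 2013."
print(labels)     # {"Obama": "A0", "Stockholm": "A1"}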
3.
  • Exner, Peter, et al. (author)
  • Multilingual Supervision of Semantic Annotation
  • 2016
  • In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers. - 9784879747020 ; pp. 1007-1017
  • Conference paper (peer-reviewed), abstract
    • In this paper, we investigate the annotation projection of semantic units in a practical setting. Previous approaches have focused on using parallel corpora for semantic transfer. We evaluate an alternative approach using loosely parallel corpora that does not require the corpora to be exact translations of each other. We developed a method that transfers semantic annotations from one language to another using sentences aligned by entities, and we extended it to include alignments by entity-like linguistic units. We conducted our experiments on a large scale using the English, Swedish, and French language editions of Wikipedia. Our results show that the annotation projection using entities in combination with loosely parallel corpora provides a viable approach to extending previous attempts. In addition, it allows the generation of proposition banks upon which semantic parsers can be trained.
  •  
4.
  • Exner, Peter, et al. (author)
  • Using distant supervision to build a proposition bank
  • 2014
  • Conference paper (peer-reviewed), abstract
    • Semantic role labeling has become a key module of many language processing applications. To build an unrestricted semantic role labeler, the first step is to develop a comprehensive proposition bank. However, building such a bank is a costly enterprise, which has only been achieved for a handful of languages. In this paper, we describe a technique to build proposition banks for new languages using distant supervision. Starting from PropBank in English and loosely parallel corpora such as versions of Wikipedia in different languages, we carried out a mapping of semantic propositions we extracted from English to syntactic structures in Swedish using named entities. We could identify 2,333 predicate–argument frames in Swedish.
  •  
5.
  • Klang, Marcus (author)
  • Building Knowledge Graphs : Processing Infrastructure and Named Entity Linking
  • 2019
  • Doctoral thesis (other academic/artistic), abstract
    • Things such as organizations, persons, or locations are ubiquitous in all texts circulating on the internet, particularly in the news, forum posts, and social media. Today, there is more written material than any single person can read through during a typical lifespan. Automatic systems can help us amplify our abilities to find relevant information, where, ideally, a system would learn knowledge from our combined written legacy. Ultimately, this would enable us, one day, to build automatic systems that have reasoning capabilities and can answer any question in any human language. In this work, I explore methods to represent linguistic structures in text, build processing infrastructures, and how they can be combined to process a comprehensive collection of documents. The goal is to extract knowledge from text via things, entities. As text, I focused on encyclopedic resources such as Wikipedia. As knowledge representation, I chose to use graphs, where the entities correspond to graph nodes. To populate such graphs, I created a named entity linker that can find entities in multiple languages such as English, Spanish, and Chinese, and associate them to unique identifiers. In addition, I describe a published state-of-the-art Swedish named entity recognizer that finds mentions of entities in text that I evaluated on the four majority classes in the Stockholm-Umeå Corpus (SUC) 3.0. To collect the text resources needed for the implementation of the algorithms and the training of the machine-learning models, I also describe a document representation, Docria, that consists of multiple layers of annotations: A model capable of representing structures found in Wikipedia and beyond. Finally, I describe how to construct processing pipelines for large-scale processing with Wikipedia using Docria.
  •  
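The thesis in entry 5 represents knowledge as a graph with entities as nodes. A minimal sketch of that representation, assuming per-document lists of linked entity identifiers; the documents and the choice of co-occurrence edges are invented for the example.

# Minimal knowledge-graph sketch: entities become nodes and co-occurrence
# within a document becomes an edge. The documents below are made up.
from itertools import combinations
from collections import defaultdict

documents = {
    "doc1": ["Q76", "Q60"],      # e.g. Barack Obama, New York City
    "doc2": ["Q76", "Q30"],      # e.g. Barack Obama, United States
}

graph = defaultdict(set)         # node -> set of neighbouring nodes
for entities in documents.values():
    for a, b in combinations(sorted(set(entities)), 2):
        graph[a].add(b)
        graph[b].add(a)

print(dict(graph))               # each entity node with its neighbours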
6.
  • Klang, Marcus, et al. (author)
  • Comparing LSTM and FOFE-based Architectures for Named Entity Recognition
  • 2018
  • Conference paper (peer-reviewed), abstract
    • LSTM architectures (Hochreiter and Schmidhuber, 1997) have become standard to recognize named entities (NER) in text (Lample et al., 2016; Chiu and Nichols, 2016). Nonetheless, Zhang et al. (2015) recently proposed an approach based on fixed-size ordinally forgetting encoding (FOFE) to translate variable-length contexts into fixed-length features. This encoding method can be used with feed-forward neural networks and, despite its simplicity, reach accuracy rates matching those of LSTMs in NER tasks (Xu et al., 2017). However, the figures reported in the NER articles are difficult to compare precisely as the experiments often use external resources such as gazetteers and corpora. In this paper, we describe an experimental setup, where we reimplemented the two core algorithms, to level the differences in initial conditions. This allowed us to measure more precisely the accuracy of both architectures and to report what we believe are unbiased results on English and Swedish datasets.
  •  
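Entry 6 compares LSTMs with the FOFE encoding of Zhang et al. (2015). A minimal sketch of FOFE itself, which folds a variable-length word sequence into one fixed-size vector; the toy vocabulary and the forgetting factor used here are illustrative, not the paper's setup.

# Fixed-size ordinally forgetting encoding (FOFE): z_t = alpha * z_{t-1} + e_t,
# where e_t is the one-hot vector of the t-th word and 0 < alpha < 1.
import numpy as np

def fofe(word_ids, vocab_size, alpha=0.7):
    """Encode a word-id sequence (oldest word first) as one fixed-size vector."""
    z = np.zeros(vocab_size)
    for word_id in word_ids:
        z = alpha * z
        z[word_id] += 1.0
    return z

vocab = {"the": 0, "city": 1, "of": 2, "new": 3, "york": 4}  # toy vocabulary
left_context = [vocab[w] for w in ["the", "city", "of", "new"]]
print(fofe(left_context, len(vocab)))  # most recent word gets the largest weight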
7.
  • Klang, Marcus, et al. (author)
  • Docforia: A Multilayer Document Model
  • 2016
  • Conference paper (peer-reviewed), abstract
    • In this paper, we describe Docforia, a multilayer document model and application programming interface (API) to store formatting, lexical, syntactic, and semantic annotations on Wikipedia and other kinds of text and visualize them. While Wikipedia has become a major NLP resource, its scale and heterogeneity makes it relatively difficult to do experimentations on the whole corpus. These experimentations are rendered even more complex as, to the best of our knowledge, there is no available tool to visualize easily the results of a processing pipeline. We designed Docforia so that it can store millions of documents and billions of tokens, annotated using different processing tools, that themselves use multiple formats, and compatible with cluster computing frameworks such as Hadoop or Spark. The annotation output, either partial or complete, can then be shared more easily. To validate Docforia, we processed six language versions of Wikipedia: English, French, German, Spanish, Russian, and Swedish, up to semantic role labeling, depending on the NLP tools available for a given language. We stored the results in our document model and we created a visualization tool to inspect the annotation results. The Docforia API is available at https://github.com/marcusklang/docforia.
  •  
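Entry 7 (and its proceedings version in entry 8) describes a multilayer document model. Below is a small Python sketch of the general idea, one text with independent layers of span annotations that different tools can add separately; the class and method names are illustrative, not the actual Docforia API, which is a Java library available at the GitHub link in the abstract.

# Illustrative multilayer document model: one text, many annotation layers.
from dataclasses import dataclass, field

@dataclass
class Span:
    start: int          # character offset, inclusive
    end: int            # character offset, exclusive
    properties: dict = field(default_factory=dict)

@dataclass
class Document:
    text: str
    layers: dict = field(default_factory=dict)   # layer name -> list of spans

    def add(self, layer, start, end, **props):
        self.layers.setdefault(layer, []).append(Span(start, end, props))

doc = Document("Barack Obama visited Lund.")
doc.add("token", 0, 6)
doc.add("token", 7, 12)
doc.add("named_entity", 0, 12, label="PERSON")
doc.add("named_entity", 21, 25, label="LOCATION")
print([doc.text[s.start:s.end] for s in doc.layers["named_entity"]])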
8.
  • Klang, Marcus, et al. (author)
  • Docforia: A Multilayer Document Model
  • 2017
  • In: Proceedings of the 21st Nordic Conference of Computational Linguistics. - 1650-3740 .- 1650-3686. ; 131, pp. 226-230
  • Conference paper (peer-reviewed), abstract
    • In this paper, we describe Docforia, a multilayer document model and application programming interface (API) to store formatting, lexical, syntactic, and semantic annotations on Wikipedia and other kinds of text and visualize them. While Wikipedia has become a major NLP resource, its scale and heterogeneity makes it relatively difficult to do experimentations on the whole corpus. These experimentations are rendered even more complex as, to the best of our knowledge, there is no available tool to visualize easily the results of a processing pipeline. We designed Docforia so that it can store millions of documents and billions of tokens, annotated using different processing tools, that themselves use multiple formats, and compatible with cluster computing frameworks such as Hadoop or Spark. The annotation output, either partial or complete, can then be shared more easily. To validate Docforia, we processed six language versions of Wikipedia: English, French, German, Spanish, Russian, and Swedish, up to semantic role labeling, depending on the NLP tools available for a given language. We stored the results in our document model and we created a visualization tool to inspect the annotation results.
  •  
9.
  • Klang, Marcus, et al. (author)
  • Hedwig : A named entity linker
  • 2020
  • In: LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings. - 9791095546344 ; pp. 4501-4508
  • Conference paper (peer-reviewed), abstract
    • Named entity linking is the task of identifying mentions of named things in text, such as “Barack Obama” or “New York”, and linking these mentions to unique identifiers. In this paper, we describe Hedwig, an end-to-end named entity linker, which uses a combination of word and character BILSTM models for mention detection, a Wikidata and Wikipedia-derived knowledge base with global information aggregated over nine language editions, and a PageRank algorithm for entity linking. We evaluated Hedwig on the TAC2017 dataset, consisting of news texts and discussion forums, and we obtained a final score of 59.9% on CEAFmC+, an improvement over our previous generation linker Ugglan, and a trilingual entity link score of 71.9%.
  •  
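Entry 9 combines BiLSTM mention detection with PageRank-based entity linking. A hedged illustration of the ranking idea only: score candidate entities with PageRank over a small graph of knowledge-base relatedness. The mentions, candidate identifiers, and the single edge below are invented; Hedwig's actual knowledge base and weighting are not reproduced here.

# Graph-based candidate ranking sketch (not Hedwig's implementation).
import networkx as nx

candidates = {
    "Barack Obama": ["Q76"],
    "New York": ["Q60", "Q1384"],   # city vs. state, both hypothetical candidates
}

g = nx.Graph()
for mention_candidates in candidates.values():
    g.add_nodes_from(mention_candidates)
g.add_edge("Q76", "Q60")            # assume the knowledge base relates these two

scores = nx.pagerank(g)
for mention, ids in candidates.items():
    best = max(ids, key=lambda qid: scores.get(qid, 0.0))
    print(mention, "->", best)      # Barack Obama -> Q76, New York -> Q60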
10.
  • Klang, Marcus, et al. (author)
  • Langforia: Language pipelines for annotating large collections of documents.
  • 2016
  • In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations. - 9784879747037 ; pp. 74-78
  • Conference paper (peer-reviewed), abstract
    • In this paper, we describe Langforia, a multilingual processing pipeline to annotate texts with multiple layers: formatting, parts of speech, named entities, dependencies, semantic roles, and entity links. Langforia works as a web service, where the server hosts the language processing components and the client, the input and result visualization. To annotate a text or a Wikipedia page, the user chooses an NLP pipeline and enters the text or the name of the Wikipedia page in the input field of the interface. Once processed, the results are returned to the client, where the user can select the annotation layers s/he wants to visualize. We designed Langforia with a specific focus on Wikipedia, although it can process any type of text. Wikipedia has become an essential encyclopedic corpus used in many NLP projects. However, processing articles and visualizing the annotations are nontrivial tasks that require dealing with multiple markup variants, encoding issues, and tool incompatibilities across the language versions. This motivated the development of a new architecture.
  •  
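Entry 10 describes a client-server setup where the client posts text to an annotation pipeline and receives layered annotations back. The generic client sketch below shows that pattern; the endpoint URL, parameters, and response shape are placeholders and do not describe Langforia's real interface.

# Generic client for a text-annotation web service of this kind.
# The endpoint and payload format are hypothetical.
import json
import urllib.request

SERVICE_URL = "http://localhost:8080/annotate"      # placeholder endpoint

def annotate(text, pipeline="en_default"):
    """POST text to the (hypothetical) service and return its JSON layers."""
    payload = json.dumps({"text": text, "pipeline": pipeline}).encode("utf-8")
    request = urllib.request.Request(
        SERVICE_URL, data=payload,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))

if __name__ == "__main__":
    layers = annotate("Barack Obama visited Lund.")
    print(layers.get("named_entities", []))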