SwePub
Search results for "WFRF:(Virk Shafqat 1979)"

  • Results 1-10 of 32
1.
  • Virk, Shafqat, 1979, et al. (author)
  • A Deep Learning System for Automatic Extraction of Typological Linguistic Information from Descriptive Grammars
  • 2021
  • In: International Conference Recent Advances in Natural Language Processing, RANLP. - : INCOMA Ltd. Shoumen, BULGARIA. - 1313-8502. ; , pp. 1480-1489
  • Conference paper (peer-reviewed). Abstract:
    • Linguistic typology is an area of linguistics concerned with analysis of and comparison between natural languages of the world based on certain of their linguistic features. For that purpose, historically, the area has relied on manual extraction of linguistic feature values from textual descriptions of languages. This makes it a laborious and time-consuming task that is also bound by human brain capacity. In this study, we present a deep learning system for the task of automatic extraction of linguistic features from textual descriptions of natural languages. First, textual descriptions are manually annotated with special structures called semantic frames. Those annotations are learned by a recurrent neural network, which is then used to annotate un-annotated text. Finally, the annotations are converted to linguistic feature values using a separate rule-based module. Word embeddings, learned from general-purpose text, are used as a major source of knowledge by the recurrent neural network. We compare the proposed deep learning system to a previously reported machine-learning-based system for the same task, and the deep learning system wins in terms of F1 scores by a fair margin. Such a system is expected to be a useful contribution for the automatic curation of typological databases, which otherwise are manually developed.
2.
  • Virk, Shafqat, 1979, et al. (author)
  • A Novel Machine Learning Based Approach for Post-OCR Error Detection
  • 2021
  • In: Proceedings of the International Conference on Recent Advances in Natural Language Processing, 1–3 September, 2021 / Edited by Galia Angelova, Maria Kunilovskaya, Ruslan Mitkov, Ivelina Nikolova-Koleva. - Shoumen, Bulgaria : INCOMA. - 1313-8502 .- 2603-2813. - 9789544520724
  • Conference paper (peer-reviewed). Abstract:
    • Post-processing is the most conventional approach for correcting errors that are caused by Optical Character Recognition (OCR) systems. Two steps are usually taken to correct OCR errors: detection and correction. For the first task, supervised machine learning methods have shown state-of-the-art performance. Previously proposed approaches have focused most prominently on combining lexical, contextual and statistical features for detecting errors. In this study, we report a novel system for error detection which is based merely on the n-gram counts of a candidate token. In addition to being simple and computationally less expensive, our proposed system beats previous systems reported in the ICDAR2019 competition on OCR-error detection by notable margins. We achieved state-of-the-art F1-scores for eight out of the ten involved European languages. The maximum improvement is for Spanish, which improved from 0.69 to 0.90, and the minimum for Polish, from 0.82 to 0.84.
3.
  • Virk, Shafqat, 1979, et al. (author)
  • An Open-Source Punjabi Resource Grammar
  • 2011
  • In: Proceedings of RANLP-2011, Recent Advances in Natural Language Processing, Hissar, Bulgaria, 12-14 September, 2011. ; , pp. 70-76
  • Conference paper (peer-reviewed)
4.
  • Virk, Shafqat, 1979, et al. (author)
  • An Open Source Urdu Resource Grammar
  • 2010
  • In: Proceedings of the 8th Workshop on Asian Language Resources (Coling 2010 workshop).
  • Conference paper (peer-reviewed)
5.
  • Virk, Shafqat, 1979, et al. (author)
  • Exploiting frame semantics and frame-semantic parsing for automatic extraction of typological information from descriptive grammars of natural languages
  • 2019
  • In: International Conference Recent Advances in Natural Language Processing, RANLP. - Shoumen : Incoma Ltd. - 1313-8502. - 9789544520557 - 9789544520564 ; 2019-September, pp. 1247-1256
  • Conference paper (peer-reviewed). Abstract:
    • We describe a novel system for automatic extraction of typological linguistic information from descriptive grammars of natural languages, applying the theory of frame semantics in the form of frame-semantic parsing. The current proof-of-concept system covers a few selected linguistic features, but the methodology is general and can be extended not only to other typological features but also to descriptive grammars written in languages other than English. Such a system is expected to be a useful aid for automatic curation of typological databases, which otherwise are built manually, a very labor- and time-consuming as well as cognitively taxing enterprise.
6.
  • Borin, Lars, 1957, et al. (author)
  • A bird’s-eye view on South Asian languages through LSI: Areal or genetic relationships?
  • 2021
  • In: Journal of South Asian Languages and Linguistics. - : Walter de Gruyter GmbH. - 2196-0771 .- 2196-078X. ; 7:2, pp. 151-185
  • Journal article (peer-reviewed). Abstract:
    • We present initial exploratory work on illuminating the long-standing question of areal versus genealogical connections in South Asia using computational data visualization tools. With respect to genealogy, we focus on the subclassification of Indo-Aryan, the most ubiquitous language family of South Asia. The intent here is methodological: we explore computational methods for visualizing large datasets of linguistic features, in our case 63 features from 200 languages representing four language families of South Asia, coming out of a digitized version of Grierson’s Linguistic Survey of India. To this dataset we apply phylogenetic software originally developed in the context of computational biology for clustering the languages and displaying the clusters in the form of networks. We further explore multiple correspondence analysis as a way of illustrating how linguistic feature bundles correlate with extrinsically defined groupings of languages (genealogical and geographical). Finally, map visualization of combinations of linguistic features and language genealogy is suggested as an aid in distinguishing genealogical and areal features. On the whole, our results are in line with the conclusions of earlier studies: Areality and genealogy are strongly intertwined in South Asia, the traditional lower-level subclassification of Indo-Aryan is largely upheld, and there is a clearly discernible areal east–west divide cutting across language families.
7.
  • Borin, Lars, 1957, et al. (author)
  • Language technology for digital linguistics: Turning the Linguistic Survey of India into a rich source of linguistic information
  • 2018
  • In: Lecture Notes in Computer Science. Computational Linguistics and Intelligent Text Processing, 18th International Conference, CICLing 2017, Budapest, Hungary, April 17–23, 2017. - Cham : Springer. - 0302-9743 .- 1611-3349. ; , pp. 550-563
  • Conference paper (peer-reviewed). Abstract:
    • We present our work aiming at turning the linguistic material available in Grierson’s classical Linguistic Survey of India (LSI) from a printed discursive textual description into a formally structured digital language resource, a database suitable for a broad array of linguistic investigations of the languages of South Asia. While doing so, we develop state-of-the-art language technology for automatically extracting the relevant grammatical information from the text of the LSI, and interactive linguistic information visualization tools for better analysis and comparisons of languages based on their structural and functional features.
8.
  • Borin, Lars, 1957, et al. (author)
  • Many a little makes a mickle - infrastructure component reuse for a massively multilingual linguistic study
  • 2018
  • In: Selected papers from the CLARIN Annual Conference 2017, Budapest, 18–20 September 2017. - Linköping : Linköping University Electronic Press. - 1650-3686 .- 1650-3740. - 9789176852736
  • Conference paper (peer-reviewed). Abstract:
    • We present ongoing work aiming at turning the linguistic material available in Grierson’s classical Linguistic Survey of India (LSI) into a digital language resource, a database suitable for a broad array of linguistic investigations of the languages of South Asia and studies relating to language typology and contact linguistics. The project has two concrete main aims: (1) to conduct a linguistic investigation of the claim that South Asia constitutes a linguistic area; (2) to develop state-of-the-art language technology for automatically extracting the relevant information from the text of the LSI. In this presentation we focus on how, in the first part of the project, a number of existing research infrastructure components provided by Swe-Clarin, the Swedish CLARIN consortium, have been ‘recycled’ in order to allow the linguists involved in the project to quickly orient themselves in the vast LSI material, and to be able to provide input to the language technologists designing the tools for information extraction from the descriptive grammars.
9.
  • Borin, Lars, 1957, et al. (author)
  • Swedish FrameNet++ and comparative linguistics
  • 2021
  • In: The Swedish FrameNet+. - Amsterdam : John Benjamins Publishing Company. - 9789027209900 - 9789027258489 ; , pp. 139-166
  • Book chapter (other academic/artistic). Abstract:
    • In this chapter we describe a multilingual extension of Swedish FrameNet++, intended to address research questions of a broad comparative nature, in genealogical, areal and typological linguistics, focusing on the integration into Swedish FrameNet++ of so-called core vocabularies, used in several linguistic subfields in order to conduct massive comparative studies involving large numbers of languages. Specifically, we describe the inclusion of two such lexical databases covering several hundred South Asian languages, with the aim of investigating areal and genealogical connections among these languages.
10.
  • Borin, Lars, 1957, et al. (author)
  • Towards a Big Data View on South Asian Linguistic Diversity
  • 2016
  • In: WILDRE-3 – 3rd Workshop on Indian Language Data: Resources and Evaluation. - Paris : ELRA.
  • Conference paper (peer-reviewed). Abstract:
    • South Asia, with its rich and diverse linguistic tapestry of hundreds of languages, including many from four major language families, and a long history of intensive language contact, provides rich empirical data for studies of linguistic genealogy, linguistic typology, and language contact. South Asia is often referred to as a linguistic area, a region where, due to close contact and widespread multilingualism, languages have influenced one another to the extent that both related and unrelated languages are more similar on many linguistic levels than we would expect. However, with some rare exceptions, most studies are largely impressionistic, drawing examples from a few languages. In this paper we present our ongoing work aiming at turning the linguistic material available in Grierson’s Linguistic Survey of India (LSI) into a digital language resource, a database suitable for a broad array of linguistic investigations of the languages of South Asia. In addition to this, we aim to contribute to the methodological development of large-scale comparative linguistics drawing on digital language resources, by exploring NLP techniques for extracting linguistic information from free-text language descriptions of the kind found in the LSI.
Publication type
conference paper (26)
book chapter (2)
edited collection (editorship) (1)
journal article (1)
doctoral thesis (1)
licentiate thesis (1)
Content type
peer-reviewed (24)
other academic/artistic (8)
Author/editor
Virk, Shafqat, 1979 (29)
Borin, Lars, 1957 (11)
Dannélls, Dana, 1976 (6)
Forsberg, Markus, 19 ... (5)
Prasad, K V S, 1952 (5)
Saxena, Anju, 1959- (5)
Ranta, Aarne, 1963 (4)
Saxena, Anju (3)
Hammarström, Harald, ... (3)
Sheikh, Muhammad Aza ... (3)
Wichmann, Søren (3)
Virk, Shafqat Mumtaz ... (3)
Tahmasebi, Nina, 198 ... (2)
Angelov, Krasimir, 1 ... (2)
Comrie, Bernard (2)
Humayoun, Muhammad, ... (2)
Berdicevskis, Aleksa ... (1)
Enache, Ramona, 1985 (1)
Volodina, Elena, 197 ... (1)
Hammarström, Harald (1)
Camilleri, John J., ... (1)
Caprotti, Olga, 1964 (1)
Sköldberg, Emma, 196 ... (1)
Ohlsson, Claes (1)
Klang, Per (1)
Détrez, Grégoire (1)
Hallgren, Thomas, 19 ... (1)
Schlechtweg, Dominik (1)
Zhang, Tuo (1)
Malm, Per (1)
Nishioka, Miki (1)
Kaushik, C. A. G. (1)
Sander, Pauline (1)
Theuer Linke, Lukas (1)
Schulte im Walde, Sa ... (1)
Foster, Daniel (1)
Saleem, Raheela (1)
ABOLAHRAR, ELNAZ, 19 ... (1)
Aslam, Muhammad Irfa ... (1)
Iqbal, Saania (1)
Khurram, Nazia (1)
Prasad, K.V.S (1)
University
Göteborgs universitet (30)
Chalmers tekniska högskola (13)
Uppsala universitet (7)
Högskolan i Skövde (1)
Language
English (32)
Research subject (UKÄ/SCB)
Natural sciences (28)
Humanities (15)
Engineering and technology (4)