SwePub

Search results for "WFRF:(Seraji Mojgan)"

  • Results 1-12 of 12
1.
  • Seraji, Mojgan, et al. (authors)
  • A Basic Language Resource Kit for Persian
  • 2012
  • In: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12). European Language Resources Association. ISBN 9782951740877, pp. 2245-2252
  • Conference paper (peer-reviewed). Abstract:
    • Persian, with about 100,000,000 speakers worldwide, belongs to the group of languages with less developed linguistically annotated resources and tools. The few existing resources and tools are neither open source nor freely available. Thus, our goal is to develop open source resources, such as corpora and treebanks, and tools for data-driven linguistic analysis of Persian. We do this by exploring the reusability of existing resources and adapting state-of-the-art methods for linguistic annotation. We present fully functional tools for text normalization, sentence segmentation, tokenization, part-of-speech tagging, and parsing. As for resources, we describe the Uppsala PErsian Corpus (UPEC), which is a modified version of the Bijankhan corpus with additional sentence segmentation and consistent tokenization modified for more appropriate syntactic annotation. The corpus consists of 2,782,109 tokens and is annotated with parts of speech and morphological features. A treebank is derived from UPEC with an annotation scheme based on Stanford Typed Dependencies and is planned to consist of 10,000 sentences, of which 215 have already been annotated.
  •  
2.
  • Seraji, Mojgan, et al. (authors)
  • A Persian Treebank with Stanford Typed Dependencies
  • 2014
  • In: Proceedings of Language Resources and Evaluation. ISBN 9782951740884, pp. 796-801
  • Conference paper (peer-reviewed). Abstract:
    • We present the Uppsala Persian Dependency Treebank (UPDT) with a syntactic annotation scheme based on Stanford Typed Dependencies. The treebank consists of 6,000 sentences and 151,671 tokens with an average sentence length of 25 words. The data is from different genres, including newspaper articles and fiction, as well as technical descriptions and texts about culture and art, taken from the open source Uppsala Persian Corpus (UPC). The syntactic annotation scheme is extended for Persian to include all syntactic relations that could not be covered by the primary scheme developed for English. In addition, we present open source tools for automatic analysis of Persian containing a text normalizer, a sentence segmenter and tokenizer, a part-of-speech tagger, and a parser. The treebank and the parser have been developed simultaneously in a bootstrapping procedure. The result of a parsing experiment shows an overall labeled attachment score of 82.05% and an unlabeled attachment score of 85.29%. The treebank is freely available as an open source resource.
  •  
3.
  • Seraji, Mojgan (author)
  • A Statistical Part-of-Speech Tagger for Persian
  • 2011
  • In: Proceedings of the 18th Nordic Conference of Computational Linguistics NODALIDA 2011, pp. 340-343
  • Conference paper (peer-reviewed). Abstract:
    • This paper presents the statistical part-of-speech tagger HunPoS trained on a Persian corpus. The result of the experiments shows that HunPoS provides an overall accuracy of 96.9%, which is the best result reported for Persian part-of-speech tagging.
  •  
4.
  •  
5.
  • Seraji, Mojgan, et al. (authors)
  • Bootstrapping a Persian Treebank
  • 2012
  • In: Linguistic Issues in Language Technology. ISSN 1945-3604, 7:18
  • Journal article (peer-reviewed)
  •  
6.
  • Seraji, Mojgan, et al. (authors)
  • Dependency Parsers for Persian
  • 2012
  • In: Proceedings of 10th Workshop on Asian Language Resources, COLING 2012, 24th International Conference on Computational Linguistics, Mumbai, India. Mumbai, India: ACL Anthology.
  • Conference paper (peer-reviewed). Abstract:
    • We present two dependency parsers for Persian, MaltParser and MSTParser, trained on the Uppsala PErsian Dependency Treebank. The treebank currently consists of 1,000 sentences. Its annotation scheme is based on Stanford Typed Dependencies (STD) extended for Persian with regard to object marking and light verb constructions. The parsers and the treebank are developed simultaneously in a bootstrapping scenario. We evaluate the parsers by experimenting with different feature settings. Parser accuracy is also evaluated on automatically generated and gold standard morphological features. Best parser performance is obtained when MaltParser is trained and optimized on 18,000 tokens, achieving 68.68% labeled and 74.81% unlabeled attachment scores, compared to 63.60% labeled and 71.08% unlabeled attachment scores obtained by optimizing MSTParser.
  •  
7.
  • Seraji, Mojgan, et al. (authors)
  • Dependency Parsers for Persian
  • 2012
  • In: Proceedings of 10th Workshop on Asian Language Resources. Association for Computational Linguistics, pp. 35-44
  • Conference paper (peer-reviewed)
  •  
8.
  • Seraji, Mojgan (author)
  • Morphosyntactic Corpora and Tools for Persian
  • 2015
  • Doctoral thesis (other academic/artistic). Abstract:
    • This thesis presents open source resources in the form of annotated corpora and modules for automatic morphosyntactic processing and analysis of Persian texts. More specifically, the resources consist of an improved part-of-speech tagged corpus and a dependency treebank, as well as tools for text normalization, sentence segmentation, tokenization, part-of-speech tagging, and dependency parsing for Persian. In developing these resources and tools, two key requirements are observed: compatibility and reuse. The compatibility requirement encompasses two parts. First, the tools in the pipeline should be compatible with each other in such a way that the output of one tool is compatible with the input requirements of the next. Second, the tools should be compatible with the annotated corpora and deliver the same analysis that is found in these. The reuse requirement means that all the components in the pipeline are developed by reusing resources, standard methods, and open source state-of-the-art tools. This is necessary to make the project feasible. Given these requirements, the thesis investigates two main research questions. The first is: how can we develop morphologically and syntactically annotated corpora and tools while satisfying the requirements of compatibility and reuse? The approach taken is to accept the tokenization variations in the corpora to achieve robustness. The tokenization variations in Persian texts are related to the orthographic variations of writing fixed expressions, as well as various types of affixes and clitics. Since these variations are inherent properties of Persian texts, it is important that the tools in the pipeline can handle them. Therefore, they should not be trained on idealized data. The second question concerns how accurately we can perform morphological and syntactic analysis for Persian by adapting and applying existing tools to the annotated corpora. The experimental evaluation of the tools shows that the sentence segmenter and tokenizer achieve an F-score close to 100%, the tagger has an accuracy of nearly 97.5%, and the parser achieves a best labeled accuracy of over 82% (with unlabeled accuracy close to 87%).
  •  
9.
  • Seraji, Mojgan, et al. (authors)
  • ParsPer : A Dependency Parser for Persian
  • 2015
  • In: Depling 2015. Uppsala: Uppsala universitet. ISBN 9789163789656, pp. 300-309
  • Conference paper (peer-reviewed). Abstract:
    • We present a dependency parser for Persian, called ParsPer, developed using the graph-based parser in the Mate Tools. The parser is trained on the entire Uppsala Persian Dependency Treebank with a specific configuration that was selected by MaltParser as the best performing parsing representation. The treebank’s syntactic annotation scheme is based on Stanford Typed Dependencies with extensions for Persian. The results of the ParsPer evaluation revealed a best labeled accuracy over 82% with an unlabeled accuracy close to 87%. The parser is freely available and released as an open source tool for parsing Persian.
  •  
10.
  •  
11.
  • Seraji, Mojgan, et al. (authors)
  • Universal Dependencies for Persian
  • 2016
  • In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). Paris: European Language Resources Association (ELRA). ISBN 9782951740891, pp. 2361-2365
  • Conference paper (peer-reviewed). Abstract:
    • The Persian Universal Dependency Treebank (Persian UD) is a recent effort of treebanking Persian with Universal Dependencies (UD), an ongoing project that designs unified and cross-linguistically valid grammatical representations including part-of-speech tags, morphological features, and dependency relations. The Persian UD is the converted version of the Uppsala Persian Dependency Treebank (UPDT) to the universal dependencies framework and consists of nearly 6,000 sentences and 152,871 word tokens with an average sentence length of 25 words. In addition to the universal dependencies syntactic annotation guidelines, the two treebanks differ in tokenization. All words containing unsegmented clitics (pronominal and copula clitics) annotated with complex labels in the UPDT have been separated from the clitics and appear with distinct labels in the Persian UD. The treebank has its original syntactic annotation scheme based on Stanford Typed Dependencies. In this paper, we present the approaches taken in the development of the Persian UD.
  •  
12.
  •  