SwePub
Search the SwePub database


Result list for the search "WFRF:(Basirat Ali 1982 )"

Search: WFRF:(Basirat Ali 1982 )

  • Result 1-10 of 15
1.
  • Basirat, Ali, Postdoctoral Researcher, 1982-, et al. (author)
  • An empirical study on the contribution of formal and semantic features to the grammatical gender of nouns
  • 2021
  • In: Linguistics Vanguard. - : Walter de Gruyter. - 2199-174X. ; 7:1
  • Journal article (peer-reviewed). Abstract:
    • This study conducts an experimental evaluation of two hypotheses about the contributions of formal and semantic features to the grammatical gender assignment of nouns. One of the hypotheses (Corbett and Fraser, 2000) claims that semantic features dominate formal ones. The other hypothesis, formulated within the optimal gender assignment theory (Rice, 2006), states that form and semantics contribute equally. Both hypotheses claim that the combination of formal and semantic features yields the most accurate gender identification. In this paper, we operationalize and test these hypotheses by trying to predict grammatical gender using only character-based embeddings (that capture only formal features), only context-based embeddings (that capture only semantic features), and the combination of both. We performed the experiment using data from three languages with different gender systems (French, German and Russian). Formal features are a significantly better predictor of gender than semantic ones, and the difference in prediction accuracy is very large. Overall, formal features are also significantly better than the combination of form and semantics, but the difference is very small and the results for this comparison are not entirely consistent across languages.
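The comparison described in the abstract above can be pictured as a small classification experiment: fit one classifier per feature set and compare accuracies. The sketch below is a hedged illustration, not the authors' code; X_form, X_sem and the synthetic data are hypothetical stand-ins for pretrained character-based and context-based noun embeddings with gold gender labels.

```python
# Minimal sketch of the form-vs-semantics evaluation (assumed setup, not the
# paper's implementation). Random arrays stand in for real noun embeddings.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_nouns = 1000
X_form = rng.normal(size=(n_nouns, 50))   # character-based (formal) features
X_sem = rng.normal(size=(n_nouns, 50))    # context-based (semantic) features
y = rng.integers(0, 3, size=n_nouns)      # gender labels, e.g. m/f/n

for name, X in [("form", X_form),
                ("semantics", X_sem),
                ("form+semantics", np.hstack([X_form, X_sem]))]:
    acc = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
    print(f"{name:>15}: {acc:.3f}")
```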
2.
  • Basirat, Ali, 1982-, et al. (author)
  • Bridge the gap between statistical and hand-crafted grammars
  • 2013
  • In: Computer speech & language (Print). - : Elsevier BV. - 0885-2308 .- 1095-8363. ; 27:5, pp. 1085-1104
  • Journal article (peer-reviewed). Abstract:
    • LTAG is a rich formalism for performing NLP tasks such as semantic interpretation, parsing, machine translation and information retrieval. Depending on the specific NLP task, different kinds of LTAGs for a language may be developed. Each of these LTAGs is enriched with specific features, such as semantic representation and statistical information, that make it suitable for that task. The distribution of these capabilities among the LTAGs makes it difficult to benefit from all of them in NLP applications. This paper discusses a statistical model for bridging between two kinds of LTAGs for a natural language in order to benefit from the capabilities of both. To do so, an HMM was trained that links an elementary tree sequence of a source LTAG to an elementary tree sequence of a target LTAG. Training was performed with the standard HMM training algorithm, Baum–Welch. To guide the training algorithm toward a better solution, the initial state of the HMM was also trained by a novel EM-based semi-supervised bootstrapping algorithm. The model was tested on two English LTAGs, XTAG (XTAG-Group, 2001) and MICA's grammar (Bangalore et al., 2009), as the target and source LTAGs, respectively. The empirical results confirm that the model provides a satisfactory way of linking these LTAGs so that their capabilities can be shared.
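As a rough illustration of the linking model's shape, the sketch below fits a discrete HMM with Baum–Welch, treating hidden states as elementary trees of the target LTAG and observations as elementary trees of the source LTAG. It relies on the hmmlearn library (CategoricalHMM, present in recent releases); the tree inventories and toy sequences are hypothetical, not the paper's grammars.

```python
# Sketch of HMM-based LTAG linking (assumed shape, not the paper's code).
# Hidden states ~ target-LTAG elementary trees; observations ~ source-LTAG
# elementary trees. fit() runs the Baum-Welch (EM) training algorithm.
import numpy as np
from hmmlearn import hmm

n_target_trees = 5   # hidden-state inventory size (target LTAG)
n_source_trees = 8   # observation inventory size (source LTAG)

# Each "sentence" is a sequence of source elementary-tree ids (toy data).
sentences = [np.array([0, 3, 1, 7]),
             np.array([2, 2, 5]),
             np.array([4, 6, 1, 0, 3])]
X = np.concatenate(sentences).reshape(-1, 1)
lengths = [len(s) for s in sentences]

model = hmm.CategoricalHMM(n_components=n_target_trees, n_iter=100,
                           random_state=0)
model.fit(X, lengths)                    # Baum-Welch training
target_seq = model.predict(X, lengths)   # most likely target-tree sequence
print(target_seq)
```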
3.
  •  
4.
  • Basirat, Ali, 1982-, et al. (author)
  • Linguistic information in word embeddings
  • 2019
  • In: Agents and Artificial Intelligence. - Cham : Springer. - 9783030054526 - 9783030054533 ; pp. 492-513
  • Conference paper (peer-reviewed). Abstract:
    • We study the presence of linguistically motivated information in word embeddings generated with statistical methods. The nominal distinctions uter/neuter, common/proper, and count/mass in Swedish are selected to represent grammatical, semantic, and mixed types of nominal categories, respectively. Our results indicate that typical grammatical and semantic features are easily captured by word embeddings. In our experiments, based on a single-layer feed-forward neural network, the classification of semantic features required significantly fewer neurons than that of grammatical features. However, semantic features also generated higher entropy in the classification output despite their high accuracy. Furthermore, the count/mass distinction posed difficulties for the model, even when the number of neurons was tuned close to its maximum.
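The probing setup can be read as a hidden-layer-size sweep: a single-layer feed-forward classifier predicts one nominal category from word embeddings, and the number of hidden neurons is varied to see how many the category needs. The sketch below is an assumed reconstruction with synthetic data; real experiments would use Swedish embeddings and gold uter/neuter (or common/proper, count/mass) labels.

```python
# Assumed probing sketch: how many hidden neurons does a nominal category
# need? Synthetic data stands in for Swedish word embeddings and labels.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 100))    # stand-in word embeddings
y = rng.integers(0, 2, size=2000)   # stand-in binary labels (e.g. uter/neuter)

for n_hidden in (2, 8, 32, 128):
    clf = MLPClassifier(hidden_layer_sizes=(n_hidden,), max_iter=500,
                        random_state=0)
    acc = cross_val_score(clf, X, y, cv=3).mean()
    print(f"{n_hidden:4d} neurons: accuracy {acc:.3f}")
```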
5.
  •  
6.
  •  
7.
  • Basirat, Ali, 1982- (author)
  • Principal Word Vectors
  • 2018
  • Doctoral thesis (other academic/artistic). Abstract:
    • Word embedding is a technique for associating the words of a language with real-valued vectors, enabling us to use algebraic methods to reason about their semantic and grammatical properties. This thesis introduces a word embedding method called principal word embedding, which makes use of principal component analysis (PCA) to train a set of word embeddings for words of a language. The principal word embedding method involves performing a PCA on a data matrix whose elements are the frequency of seeing words in different contexts. We address two challenges that arise in the application of PCA to create word embeddings. The first challenge is related to the size of the data matrix on which PCA is performed and affects the efficiency of the word embedding method. The data matrix is usually a large matrix that requires a very large amount of memory and CPU time to be processed. The second challenge is related to the distribution of word frequencies in the data matrix and affects the quality of the word embeddings. We provide an extensive study of the distribution of the elements of the data matrix and show that it is unsuitable for PCA in its unmodified form. We overcome the two challenges in principal word embedding by using a generalized PCA method. The problem with the size of the data matrix is mitigated by a randomized singular value decomposition (SVD) procedure, which improves the performance of PCA on the data matrix. The data distribution is reshaped by an adaptive transformation function, which makes it more suitable for PCA. These techniques, together with a weighting mechanism that generalizes many different weighting and transformation approaches used in the literature, enable principal word embedding to train high-quality word embeddings in an efficient way. We also provide a study on how principal word embedding is connected to other word embedding methods. We compare it to a number of word embedding methods and study how the two challenges in principal word embedding are addressed in those methods. We show that the other word embedding methods are closely related to principal word embedding and, in many instances, can be seen as special cases of it. The principal word embeddings are evaluated in both intrinsic and extrinsic ways. The intrinsic evaluations are directed towards the study of the distribution of word vectors. The extrinsic evaluations measure the contribution of principal word embeddings to some standard NLP tasks. The experimental results confirm that the newly proposed features of principal word embedding (i.e., the randomized SVD algorithm, the adaptive transformation function, and the weighting mechanism) are beneficial to the method and lead to significant improvements in the results. A comparison between principal word embedding and other popular word embedding methods shows that, in many instances, the proposed method is able to generate word embeddings that are better than or as good as other word embeddings while being faster than several popular word embedding methods.
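The pipeline described in the thesis abstract (word-context counts, a reshaping transformation, then a randomized SVD) can be condensed into a few lines. The sketch below is illustrative only: the seventh-root power transform is a stand-in for the adaptive transformation function, the Poisson matrix is a stand-in for real corpus counts, and no weighting mechanism is applied.

```python
# Condensed sketch of a PCA-style embedding pipeline (stand-ins throughout):
# counts -> transformation -> randomized SVD -> low-dimensional word vectors.
import numpy as np
from sklearn.utils.extmath import randomized_svd

rng = np.random.default_rng(0)
# Stand-in word-context frequency matrix (rows: words, columns: contexts).
counts = rng.poisson(0.3, size=(5000, 2000)).astype(float)

transformed = counts ** (1.0 / 7.0)   # reshape the heavy-tailed counts
U, S, Vt = randomized_svd(transformed, n_components=100, random_state=0)
word_vectors = U * S                  # one 100-dimensional vector per word
print(word_vectors.shape)             # (5000, 100)
```

Randomized SVD matters here because it approximates only the leading singular subspace rather than computing a full decomposition of the large data matrix.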
8.
  • Basirat, Ali, 1982- (author)
  • Random Word Vectors
  • 2019
  • Conference paper (other academic/artistic)
9.
  • Basirat, Ali, 1982-, et al. (author)
  • Real-valued syntactic word vectors
  • 2020
  • In: Journal of experimental and theoretical artificial intelligence (Print). - 0952-813X .- 1362-3079. ; 32:4, pp. 557-579
  • Journal article (peer-reviewed). Abstract:
    • We introduce a word embedding method that generates a set of real-valued word vectors from a distributional semantic space. The semantic space is built with a set of context units (words) selected by an entropy-based feature selection approach according to the certainty of their contextual environments. We show that the most predictive context of a target word is its preceding word. An adaptive transformation function is also introduced that reshapes the data distribution to make it suitable for dimensionality reduction techniques. The final low-dimensional word vectors are formed by the singular vectors of a matrix of transformed data. We show that the resulting word vectors are as good as other sets of word vectors generated with popular word embedding methods.
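Reading "certainty" as low entropy of a context's co-occurrence distribution over target words, the selection step might look like the sketch below; the matrix contents and the cut-off k are illustrative assumptions, not the published configuration.

```python
# Assumed sketch of entropy-based context selection: keep the k context
# units whose co-occurrence distributions over targets are most peaked.
import numpy as np

rng = np.random.default_rng(0)
cooc = rng.poisson(0.5, size=(3000, 1000)).astype(float)  # targets x contexts

p = cooc / np.clip(cooc.sum(axis=0, keepdims=True), 1e-12, None)
entropy = -(p * np.log(np.clip(p, 1e-12, None))).sum(axis=0)  # per context

k = 200
selected = np.argsort(entropy)[:k]   # k lowest-entropy (most certain) contexts
reduced = cooc[:, selected]          # semantic space over selected contexts
print(reduced.shape)                 # (3000, 200)
```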
10.
  • Basirat, Ali, 1982-, et al. (author)
  • Real-valued Syntactic Word Vectors (RSV) for Greedy Neural Dependency Parsing
  • 2017
  • Conference paper (peer-reviewed). Abstract:
    • We show that a set of real-valued word vectors formed by the right singular vectors of a transformed co-occurrence matrix is meaningful for determining different types of dependency relations between words. Our experimental results on the task of dependency parsing confirm the superiority of these word vectors over word vectors generated by popular word embedding methods. We also study the effect of using these vectors on the accuracy of dependency parsing in different languages, compared with using more complex parsing architectures.