| 1. |
- Stymne, Sara, 1977-
(author)
-
Compound Processing for Phrase-Based Statistical Machine Translation
- 2009
-
Licentiate thesis (other academic)
- In this thesis I explore how compound processing can be used to improve phrase-based statistical machine translation (PBSMT) between English and German/Swedish. Both German and Swedish generally use closed compounds, which are written as one word without spaces or other indicators of word boundaries. Compounding is both common and productive, which makes it problematic for PBSMT, mainly due to sparse data problems. The adopted strategy for compound processing is to split compounds into their component parts before training and translation. For translation into Swedish and German the parts are merged after translation. I investigate the effect of different splitting algorithms for translation between English and German, and of different merging algorithms for German. I also apply these methods to a different language pair, English–Swedish. Overall the studies show that compound processing is useful, especially for translation from English into German or Swedish. But there are improvements for translation into English as well, such as a reduction of unknown words. I show that for translation between English and German different splitting algorithms work best for different translation directions. I also design and evaluate a novel merging algorithm based on part-of-speech matching, which outperforms previous methods for compound merging, showing the need for information that is carried through the translation process, rather than only external knowledge sources such as word lists. Most of the methods for compound processing were originally developed for German. I show that these methods can be applied to Swedish as well, with similar results.
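The splitting step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's own algorithm: it assumes a simple frequency-based heuristic (pick the two-way split whose parts have the highest geometric-mean corpus frequency), and the `FREQ` table, the function name `split_compound` and the handling of a linking "s" are all hypothetical choices made for illustration.

```python
from math import sqrt

# Hypothetical toy frequencies; a real system would count these in training data.
FREQ = {"fotboll": 5, "fot": 40, "boll": 30, "match": 50, "fotbollsmatch": 1}

def split_compound(word, freq=FREQ, min_len=3):
    """Return the two-way split whose parts have the highest geometric-mean
    frequency, or [word] if no split scores higher than the unsplit word."""
    best_parts, best_score = [word], freq.get(word, 0)
    for i in range(min_len, len(word) - min_len + 1):
        head, tail = word[:i], word[i:]
        # Also try the first part without a trailing linking "s"
        # (assumption: "s" is the only linking element handled here).
        candidates = [head]
        if head.endswith("s"):
            candidates.append(head[:-1])
        for h in candidates:
            if h in freq and tail in freq:
                score = sqrt(freq[h] * freq[tail])
                if score > best_score:
                    best_parts, best_score = [h, tail], score
    return best_parts
```

With the toy table, `split_compound("fotbollsmatch")` yields the parts `fotboll` and `match`, while a short word such as `match` is left unsplit. The same heuristic also splits `fotboll` into `fot` and `boll`, which shows how a purely frequency-based criterion can over-split.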
|
|
| 2. |
|
|
| 3. |
|
|
| 4. |
- Wesslén, Karin, 1953-
(author)
-
Att skriva, tala och tänka samhällskunskap
- 2011
-
Licentiate thesis (other academic)
- This study is based on the presumption that language is fundamental to the construction of knowledge. In addition, linguistic demands are incorporated in the policy documents for upper secondary education in Sweden; students shall, during their education, be given the opportunity to appropriate certain linguistic tools. The purpose of this thesis is to investigate how teachers and students in upper secondary education manage and utilize the discourse of social science in both speech and writing. More specifically, two classes are studied during three terms. The teachers' ability to organize and support the students orally and in writing is examined, as are the effects of the teaching on students' writing. The study takes as its starting point a sociocultural stance drawn from Vygotskij, Bakhtin and Halliday. A combination of a functional and a cognitive perspective on language is explored. The study is comparative in nature, consisting of an experimental class and a control class. The importance of language for the creation of knowledge has been communicated to the teachers of the experimental class, who were also provided with complementary subject-didactic literature. This literature offers support for teachers to augment the use of explicit teaching and enhance student awareness of how conceptual structures mould social science. Qualitative analyses are performed on the basis of teacher-student dialogue and written tasks by a group of selected students. The analytical tools object language – metalanguage, linguistic operations and knowledge structures are developed for the purpose of processing data, and have been combined with the tools activity analysis, subject-related concepts and text activity. The results from the analyses display no difference in the handling of the discourse of social science between the experimental class and the control class.
The teachers of the experimental class, like the teacher of the control class, primarily use object language in which knowledge structures are visible, as opposed to a combination of object language and metalanguage. Furthermore, they make little use of dialogue in their teaching. The students of both classes, for their part, demonstrate equal progress in textual development. This study concludes that the experimental class has not been provided with sufficiently explicit support to advance in the structuring of knowledge and in level of reasoning. More effective support for teachers in managing these analytical tools would, in all probability, give them, and through them the students, a more profound insight into the structuring of text activities, the meaning and signalling of linguistic operations, the construction of subject-related concepts and, most importantly, how these three tools are interrelated.
|
|
| 5. |
- Italia, Julie, 1973-
(author)
-
El alineamiento socio-pragmatico en hablantes de ESL inmersos en la comunidad meta
- 2011
-
Licentiate thesis (other academic)
- The aim of this work is to carry out a comparative study of two groups of informants in the area of socio-pragmatic competence and the impact it has on rapport management. The study was based on two groups: a study group composed of Swedish non-native speakers of Spanish with a high level of acquired competence, resident in Chile for a minimum of five years, and a control group composed of Chilean native speakers. Specifically, this investigation sets out to analyse the differences between non-native and native speakers in the frequency and use of expressions of Affect as a means of creating emotional bonds with the interlocutor when negotiating a request. The comparative analysis of the use of expressions of Affect in the two groups studied gives some indications of the effect that such expressions have on rapport management in intercultural communication between Chileans and Swedes. The results of the present study indicate that socio-pragmatic alignment in the use of affective strategies is decisive for the perception of rapport management in some of the subcategories of the theoretical model of analysis. Therefore, the non-native speaker's knowledge and appropriate use of Affect in communication can contribute to the maintenance or failure of rapport management in the dialogues studied.
|
|
| 6. |
- Hall, Johan, 1973-
(author)
-
MaltParser -- An Architecture for Inductive Labeled Dependency Parsing
- 2006
-
Licentiate thesis (other academic)
- This licentiate thesis presents a software architecture for data-driven dependency parsing, i.e. for automatically creating a syntactic analysis in the form of dependency graphs for sentences in natural-language texts. The architecture builds on the idea that it should be possible to vary the parsing algorithm, feature model and learning method independently of one another. As the basis for this architecture we have used the theoretical framework for inductive dependency parsing presented by Nivre (2006). The architecture has been realized in the software MaltParser, in which complex feature models can be defined in a special description language. In this thesis we place particular emphasis on describing how we have integrated the learning method support vector machines (SVM). MaltParser is validated in three series of experiments using data from three languages (Chinese, English and Swedish). The first series checks whether the implementation realizes the underlying architecture. The experiments show that MaltParser outperforms a trivial method for dependency parsing (a baseline) and that the basic requirements on well-formed dependency graphs are met. In addition, the experiments show that it is possible to vary the parsing algorithm, feature model and learning method independently of one another. The second series focuses on the special properties of the SVM interface. The experiments show that learning and parsing time can be reduced without loss of parsing accuracy by splitting the training data according to the part-of-speech tag of the next word in the current parser configuration. The third and final series presents an empirical study comparing SVM with memory-based learning (MBL). The study uses five feature models, where all combinations of language, learning method and feature model have undergone extensive parameter optimization.
The experiments show that SVM outperforms MBL for more complex and lexicalized feature models with respect to parsing accuracy. There are also some indications that SVM, with a splitting strategy, can parse a text faster than MBL. For Swedish we can report the highest parsing accuracy to date, and for Chinese and English the results are close to the best that have been reported.
|
|
| 7. |
- Nilsson, Jens, 1979-
(author)
-
Tree Transformations in Inductive Dependency Parsing
- 2007
-
Licentiate thesis (other academic)
- This licentiate thesis deals with automatic syntactic analysis, or parsing, of natural languages. A parser constructs the syntactic analysis, which it learns by looking at correctly analyzed sentences, known as training data. The general topic concerns manipulations of the training data in order to improve the parsing accuracy. Several studies using constituency-based theories for natural languages in such automatic and data-driven syntactic parsing have shown that training data, annotated according to a linguistic theory, often needs to be adapted in various ways in order to achieve an adequate, automatic analysis. A linguistically sound constituent structure is not necessarily well-suited for learning and parsing using existing data-driven methods. Modifications to the constituency-based trees in the training data, and corresponding modifications to the parser output, have successfully been applied to increase the parser accuracy. The topic of this thesis is to investigate whether similar modifications in the form of tree transformations to training data, annotated with dependency-based structures, can improve accuracy for data-driven dependency parsers. In order to do this, two types of tree transformations are in focus in this thesis. The first one concerns non-projectivity. The full potential of dependency parsing can only be realized if non-projective constructions are allowed, which pose a problem for projective dependency parsers. On the other hand, non-projective parsers tend, among other things, to be slower. In order to maintain the benefits of projective parsing, a tree transformation technique to recover non-projectivity while using a projective parser is presented here. The second type of transformation concerns linguistic phenomena that are possible but hard for a parser to learn, given a certain choice of dependency analysis.
This study has concentrated on two such phenomena, coordination and verb groups, for which tree transformations are applied in order to improve parsing accuracy, in case the original structure does not coincide with a structure that is easy to learn. Empirical evaluations are performed using treebank data from various languages, and using more than one dependency parser. The results show that the benefit of these tree transformations, used in preprocessing and postprocessing, is to a large extent independent of language, treebank and parser.
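To make the notion of non-projectivity in the abstract concrete, the following sketch checks whether a dependency tree is projective. It only detects the property; it does not implement the thesis's transformation technique. The head-list encoding (`heads[i]` is the 1-based head of token `i+1`, with 0 for the root) and the function name `is_projective` are assumptions made for illustration, and the input is assumed to be a well-formed tree without cycles.

```python
def is_projective(heads):
    """heads[i] is the 1-based head index of token i+1; 0 marks the root.
    Assumes a well-formed tree (every token reaches the root, no cycles)."""

    def dominated_by(node, ancestor):
        # Follow head links upward from node until the root (0) is reached.
        while node != 0:
            if node == ancestor:
                return True
            node = heads[node - 1]
        return False

    for dep in range(1, len(heads) + 1):
        head = heads[dep - 1]
        if head == 0:
            continue  # the arc from the artificial root spans only its own descendants
        lo, hi = sorted((head, dep))
        # An arc is projective if every token strictly between head and
        # dependent is a descendant of the head.
        for between in range(lo + 1, hi):
            if not dominated_by(between, head):
                return False
    return True
```

For a toy analysis of "A hearing is scheduled on the issue today" encoded as `[2, 3, 0, 3, 2, 7, 5, 4]`, the arc from "hearing" to "on" crosses "is", which is not a descendant of "hearing", so the function returns `False`. A simple three-token tree such as `[2, 0, 2]` is projective.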
|
|
| 8. |
- Liddle, Roy
(author)
-
SPEED and TIME in the event modifier lexemes slow, fast and quick : A cognitive perspective
- 2010
-
Licentiate thesis (other academic)
- In this thesis I examine the event modifier lexemes slow, fast and quick and the events they modify. A number of observations made of the use of these modifiers cannot be explained by existing event typologies. Instead I propose a model for event structure and construal, in which I define events according to their temporal configuration in terms of DURATIVITY, BOUNDEDNESS and CHANGE. By applying this model to examples of contemporary British English, I show that there are distinct usage patterns for each modifier with regard to the temporal configuration of the events they modify. I further demonstrate that the readings of SPEED and TIME which result from the combination of event and modifier depend on a number of factors, and in doing so provide an insight into the complex conceptual nature of SPEED and its relation to TIME. In the course of the analysis, several other factors, such as the position of the modifier, the function of the -ly suffix, and the adverbial/adjectival status of the modifier are brought to light. Keywords: SPEED, TIME, BOUNDEDNESS, CHANGE, events, event modifiers, dual-form adverbs, configurational structure, temporal configuration, construal.
|
|
| 9. |
- Holmqvist, Maria, 1979-
(author)
-
Word Alignment by Re-using Parallel Phrases
- 2008
-
Licentiate thesis (other academic)
- In this thesis we present the idea of using parallel phrases for word alignment. Each parallel phrase is extracted from a set of manual word alignments and contains a number of source and target words and their corresponding alignments. If a parallel phrase matches a new sentence pair, its word alignments can be applied to the new sentence. There are several advantages of using phrases for word alignment. First, longer text segments include more context and will be more likely to produce correct word alignments than shorter segments or single words. More importantly, the use of longer phrases makes it possible to generalize words in the phrase by replacing words with parts of speech or other grammatical information. In this way, the number of words covered by the extracted phrases can go beyond the words and phrases that were present in the original set of manually aligned sentences. We present experiments with phrase-based word alignment on three types of English–Swedish parallel corpora: a software manual, a novel and proceedings of the European Parliament. In order to find a balance between improved coverage and high alignment accuracy we investigated different properties of generalised phrases to identify which types of phrases are likely to produce accurate alignments on new data. Finally, we have compared phrase-based word alignments to state-of-the-art statistical alignment with encouraging results. We show that phrase-based word alignments can be used to enhance statistical word alignment. To evaluate word alignments an English–Swedish reference set for the Europarl corpus was constructed. The guidelines for producing this reference alignment are presented in the thesis.
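The idea of matching a stored parallel phrase against a new sentence pair and re-using its links can be sketched as follows. This is a simplified illustration, not the thesis's implementation: the phrase representation (source pattern, target pattern, relative links), the convention that a pattern item matches either a word form or a part-of-speech tag, and the names `find_match` and `apply_phrase` are all assumptions.

```python
def find_match(pattern, tokens):
    """Return the first offset at which pattern matches tokens, else None.
    Each token is a (word, pos) pair; a pattern item matches a token if it
    equals either the word or the POS tag (the generalised case)."""
    for start in range(len(tokens) - len(pattern) + 1):
        window = tokens[start:start + len(pattern)]
        if all(item in (word, pos) for item, (word, pos) in zip(pattern, window)):
            return start
    return None

def apply_phrase(phrase, src_tokens, tgt_tokens):
    """If both sides of the phrase match, return its word links shifted to
    absolute (source index, target index) positions; otherwise return []."""
    src_pattern, tgt_pattern, links = phrase
    s = find_match(src_pattern, src_tokens)
    t = find_match(tgt_pattern, tgt_tokens)
    if s is None or t is None:
        return []
    return [(s + i, t + j) for i, j in links]

# Hypothetical generalised phrase: English "the NOUN" aligned to a single
# Swedish NOUN (definite form), with both English tokens linked to it.
PHRASE = (["the", "NOUN"], ["NOUN"], [(0, 0), (1, 0)])
SRC = [("open", "VERB"), ("the", "DET"), ("file", "NOUN")]
TGT = [("öppna", "VERB"), ("filen", "NOUN")]
```

Calling `apply_phrase(PHRASE, SRC, TGT)` links "the" and "file" to "filen" even if the word "file" never occurred in the manually aligned data, which is the coverage gain that generalisation to parts of speech is meant to provide.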
|
|
| 10. |
- Sundblad, Håkan, 1977-
(author)
-
Question Classification in Question Answering Systems
- 2007
-
Licentiate thesis (other academic)
- Question answering systems can be seen as the next step in information retrieval, allowing users to pose questions in natural language and receive succinct answers. In order for a question answering system as a whole to be successful, research has shown that the correct classification of questions with regard to the expected answer type is imperative. Question classification has two components: a taxonomy of answer types, and a machinery for making the classifications. This thesis focuses on five different machine learning algorithms for the question classification task. The algorithms are k nearest neighbours, naïve Bayes, decision tree learning, sparse network of winnows, and support vector machines. These algorithms have been applied to two different corpora, one of which has been used extensively in previous work and has been constructed for a specific agenda. The other corpus is drawn from a set of users' questions posed to a running online system. The results showed that the performance of the algorithms on the different corpora differs both in absolute terms and in their relative ranking. On the novel corpus, naïve Bayes, decision tree learning, and support vector machines perform on par with each other, while on the biased corpus there is a clear difference between them, with support vector machines being the best and naïve Bayes being the worst. The thesis also presents an analysis of questions that are problematic for all learning algorithms. The errors can roughly be attributed to categories with few members, variations in question formulation, the actual usage of the taxonomy, keyword errors, and spelling errors. A large portion of the errors were also hard to explain.
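As a rough illustration of the classification task, the sketch below applies k nearest neighbours, one of the five algorithm families named in the abstract, to a handful of questions. The training questions, the answer-type labels, the Jaccard word-overlap similarity and the function names are all hypothetical; the thesis's features, taxonomy and corpora are not reproduced here.

```python
from collections import Counter

# Hypothetical toy training set of (question, expected answer type) pairs.
TRAIN = [
    ("who wrote hamlet", "PERSON"),
    ("who discovered penicillin", "PERSON"),
    ("where is uppsala", "LOCATION"),
    ("where was the treaty signed", "LOCATION"),
    ("when did the war end", "DATE"),
    ("when was the university founded", "DATE"),
]

def jaccard(a, b):
    """Word-set overlap between two whitespace-tokenised questions."""
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b) if a | b else 0.0

def classify(question, train=TRAIN, k=3):
    """Label a question by majority vote among its k most similar training questions."""
    ranked = sorted(train, key=lambda ex: jaccard(question, ex[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

With this toy data, `classify("who painted guernica")` is labelled `PERSON` because the only overlapping training questions share the word "who". This reliance on surface words also hints at why variations in question formulation and spelling errors appear among the error sources listed in the abstract.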
|
|