SwePub

Result list for search "WFRF:(Mannila Heikki)"

Search: WFRF:(Mannila Heikki)

  • Results 1-5 of 5
1.
  •  
2.
  • Lijffijt, Jefrey, et al. (authors)
  • Significance testing of word frequencies in corpora
  • 2016
  • In: Digital Scholarship in the Humanities. Oxford University Press (OUP). ISSN 2055-768X, 2055-7671. 31:2, pp. 374-397
  • Journal article (peer-reviewed). Abstract:
    • Finding out whether a word occurs significantly more often in one text or corpus than in another is an important question in analysing corpora. As noted by Kilgarriff (Language is never, ever, ever, random, Corpus Linguistics and Linguistic Theory, 2005; 1(2): 263–76.), the use of the χ² and log-likelihood ratio tests is problematic in this context, as they are based on the assumption that all samples are statistically independent of each other. However, words within a text are not independent. As pointed out in Kilgarriff (Comparing corpora, International Journal of Corpus Linguistics, 2001; 6(1): 1–37) and Paquot and Bestgen (Distinctive words in academic writing: a comparison of three statistical tests for keyword extraction. In Jucker, A., Schreier, D., and Hundt, M. (eds), Corpora: Pragmatics and Discourse. Amsterdam: Rodopi, 2009, pp. 247–69), it is possible to represent the data differently and employ other tests, such that we assume independence at the level of texts rather than individual words. This allows us to account for the distribution of words within a corpus. In this article we compare the significance estimates of various statistical tests in a controlled resampling experiment and in a practical setting, studying differences between texts produced by male and female fiction writers in the British National Corpus. We find that the choice of the test, and hence data representation, matters. We conclude that significance testing can be used to find consequential differences between corpora, but that assuming independence between all words may lead to overestimating the significance of the observed differences, especially for poorly dispersed words. We recommend the use of the t-test, Wilcoxon rank-sum test, or bootstrap test for comparing word frequencies across corpora.
  •  
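The text-level comparison described in the abstract of entry 2 can be illustrated with a small sketch. It assumes whitespace tokenisation and uses a generic pooled-resampling bootstrap test on per-text relative frequencies; the function names and parameters are hypothetical, and this is not necessarily the exact procedure used in the article.

```python
import random
from statistics import mean


def relative_frequencies(texts, word):
    """Per-text relative frequency of `word` (occurrences / tokens).

    Assumption: tokens are lower-cased, whitespace-separated strings.
    """
    freqs = []
    for text in texts:
        tokens = text.lower().split()
        if tokens:  # skip empty texts
            freqs.append(tokens.count(word) / len(tokens))
    return freqs


def bootstrap_p_value(sample_a, sample_b, n_resamples=2000, seed=0):
    """Two-sided bootstrap test for a difference in mean frequency.

    Each text contributes one observation, so independence is assumed
    between texts rather than between individual words. Under the null
    hypothesis both samples come from the pooled set of texts.
    """
    rng = random.Random(seed)
    observed = abs(mean(sample_a) - mean(sample_b))
    pooled = list(sample_a) + list(sample_b)
    hits = 0
    for _ in range(n_resamples):
        resampled_a = rng.choices(pooled, k=len(sample_a))
        resampled_b = rng.choices(pooled, k=len(sample_b))
        if abs(mean(resampled_a) - mean(resampled_b)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)


if __name__ == "__main__":
    corpus_a = ["she said she would", "she left early", "then she smiled"]
    corpus_b = ["he said he would", "he left early", "then he smiled"]
    freq_a = relative_frequencies(corpus_a, "she")
    freq_b = relative_frequencies(corpus_b, "she")
    print(bootstrap_p_value(freq_a, freq_b))
```

Because each text yields a single number, a word concentrated in a few texts produces a high-variance sample and a correspondingly weaker result, which is the dispersion effect the abstract describes.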
3.
  • Lundmark, Per E, et al. (authors)
  • Evaluation of HapMap data in six populations of European descent
  • 2008
  • In: European Journal of Human Genetics. Springer Science and Business Media LLC. ISSN 1018-4813, 1476-5438. 16:9, pp. 1142-1150
  • Journal article (peer-reviewed). Abstract:
    • We studied how well the European CEU samples used in the Haplotype Mapping Project (HapMap) represent five European populations by analyzing nuclear family samples from the Swedish, Finnish, Dutch, British and Australian (European ancestry) populations. The number of samples from each population (about 30 parent-offspring trios) was similar to that in the HapMap sample sets. A panel of 186 single nucleotide polymorphisms (SNPs) distributed over the 1.5 Mb region of the GRID2 gene on chromosome 4 was genotyped. The genotype data were compared pair-wise between the HapMap sample and the other population samples. Principal component analysis (PCA) was used to cluster the data from different populations with respect to allele frequencies and to define the markers responsible for observed variance. The only sample with detectable differences in allele frequencies was that from Kuusamo, Finland. This sample also separated from the others, including the other Finnish sample, in the PCA. A set of tagSNPs was defined based on the HapMap data and applied to the samples. The tagSNPs were found to capture the genetic variation in the analyzed region at r² > 0.8 at levels ranging from 95% in the Kuusamo sample to 87% in the Australian sample. To capture the maximal genetic variation in the region, the Kuusamo, HapMap and Australian samples required 58, 63 and 73 native tagSNPs, respectively. The HapMap CEU sample represents the European samples well for tagSNP selection, with some caution regarding estimation of allele frequencies in the Finnish Kuusamo sample, and a slight reduction in tagging efficiency in the Australian sample.
  •  
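The r² > 0.8 criterion in the abstract of entry 3 refers to the squared correlation between pairs of SNPs. A minimal sketch of that calculation follows, assuming phased haplotypes with alleles coded 0 or 1 are already available (the study genotyped trios, so phasing would be a separate step). The function names are hypothetical and the study's own tagging software is not reproduced here.

```python
def r_squared(haplotypes):
    """Squared correlation r^2 between two biallelic SNPs.

    `haplotypes` is a list of (a, b) pairs, one per chromosome,
    with each allele coded 0 or 1 (an assumption of this sketch).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n       # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n       # frequency of allele 1 at SNP B
    p_ab = sum(a * b for a, b in haplotypes) / n  # frequency of the 1-1 haplotype
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    d = p_ab - p_a * p_b                          # linkage disequilibrium coefficient D
    return d * d / denom


def fraction_captured(r2_to_tags, threshold=0.8):
    """Share of SNPs whose best r^2 with any tag SNP exceeds `threshold`.

    `r2_to_tags` holds one row per SNP, each row listing that SNP's
    r^2 with every tag SNP.
    """
    best = [max(row) for row in r2_to_tags]
    return sum(r2 > threshold for r2 in best) / len(best)


if __name__ == "__main__":
    print(r_squared([(0, 0), (1, 1), (0, 0), (1, 1)]))
    print(fraction_captured([[0.9, 0.1], [0.5, 0.2], [1.0, 0.0]]))
```

A capture level such as the 95% reported for the Kuusamo sample corresponds to the kind of fraction returned by `fraction_captured`, computed over all SNPs in the region.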
4.
  • Norén, G. Niklas, 1977- (author)
  • Statistical methods for knowledge discovery in adverse drug reaction surveillance
  • 2007
  • Doctoral thesis (other academic/artistic). Abstract:
    • Collections of individual case safety reports are the main resource for early discovery of unknown adverse reactions to drugs once they have been introduced to the general public. The data sets involved are complex and based on voluntary submission of reports, but contain pieces of very important information. The aim of this thesis is to propose computationally feasible statistical methods for large-scale knowledge discovery in these data sets. The main contributions are a duplicate detection method that can reliably identify pairs of unexpectedly similar reports and a new measure for highlighting suspected drug-drug interaction. Specifically, we extend the hit-miss model for database record matching with a hit-miss mixture model for scoring numerical record fields and a new method to compensate for strong record field correlations. The extended hit-miss model is implemented for the WHO database and demonstrated to be useful in real world duplicate detection, despite the noisy and incomplete information on individual case safety reports. The Information Component measure of disproportionality has been in routine use since 1998 to screen the WHO database for excessive adverse drug reaction reporting rates. Here, it is further refined. We introduce improved credibility intervals for rare events, post-stratification adjustment for suspected confounders and an extension to higher order associations that allows for simple but robust screening for potential risk factors. A new approach to identifying reporting patterns indicative of drug-drug interaction is also proposed. Finally, we describe how imprecision estimates specific to each prediction of a Bayes classifier may be obtained with the Bayesian bootstrap. Such case-based imprecision estimates allow for better prediction when different types of errors have different associated loss, with a possible application in combining quantitative and clinical filters to highlight drug-ADR pairs for clinical review.
  •  
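The abstract of entry 4 does not define the Information Component. As an illustration of what a disproportionality measure does, the sketch below uses one commonly cited shrinkage form, the base-2 logarithm of (observed + 0.5) / (expected + 0.5), with the expected count taken under independence of drug and reaction. Both the formula and the function names are assumptions of this sketch; the thesis's own definition and its credibility intervals are not reproduced.

```python
import math


def expected_count(n_drug, n_reaction, n_total):
    """Expected number of reports naming both the drug and the reaction,
    assuming the two are reported independently of each other."""
    return n_drug * n_reaction / n_total


def information_component(observed, expected, shrinkage=0.5):
    """Shrunk log2 observed-to-expected ratio (assumed form, see lead-in).

    Positive values mean the drug-reaction pair is reported more often
    than expected; the shrinkage term damps estimates for rare events.
    """
    return math.log2((observed + shrinkage) / (expected + shrinkage))


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    e = expected_count(n_drug=100, n_reaction=100, n_total=1000)
    print(information_component(observed=31, expected=e))
```

With very small counts the shrinkage term pulls the value toward zero, which is why a measure of this kind tends to be more conservative for rare events than a raw observed-to-expected ratio.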
5.
  •  