SwePub

Result list for search "L773:1544 6115"

Search: L773:1544 6115

  • Results 1-10 of 16
1.
  • Dalevi, Daniel, 1974, et al. (author)
  • A New Order Estimator for Fixed and Variable Length Markov Models with Applications to DNA Sequence Similarity
  • 2006
  • In: Statistical Applications in Genetics and Molecular Biology. - : Walter de Gruyter GmbH. - 1544-6115. ; 5:1
  • Journal article (peer-reviewed). Abstract:
    • Recently Peres and Shields discovered a new method for estimating the order of a stationary fixed order Markov chain. They showed that the estimator is consistent by proving a threshold result. While this threshold is valid asymptotically in the limit, it is not very useful for DNA sequence analysis where data sizes are moderate. In this paper we give a novel interpretation of the Peres-Shields estimator as a sharp transition phenomenon. This yields a precise and powerful estimator that quickly identifies the core dependencies in data. We show that it compares favorably to other estimators, especially in the presence of variable dependencies. Motivated by this last point, we extend the Peres-Shields estimator to Variable Length Markov Chains. We compare it to a well-established estimator and show that it is superior in terms of the predictive likelihood. We give an application to the problem of detecting DNA sequence similarity in plasmids. Copyright ©2006 The Berkeley Electronic Press. All rights reserved.
2.
  • Gusnanto, A, et al. (author)
  • Fold-change estimation of differentially expressed genes using mixture mixed-model
  • 2005
  • In: Statistical Applications in Genetics and Molecular Biology. - : Walter de Gruyter GmbH. - 1544-6115 .- 2194-6302. ; 4, p. Article26-
  • Journal article (peer-reviewed). Abstract:
    • Microarray experiments produce expression measurements for thousands of genes simultaneously, though usually for a small number of RNA samples. The most common problem is the identification of genes that are differentially expressed between different groups of samples or biological conditions. As the number of genes far exceeds the number of RNA samples, the inherent multiplicity poses a severe problem in both hypothesis testing and effect estimation. While much of the recent literature is focused on the hypothesis aspects, we concentrate in this paper on effect estimation as a tool for the identification of differentially expressed genes. We propose a linear mixed model where the random effects are assumed to follow a mixture distribution, and study in detail the case of three normals, corresponding to genes that are down-, up- or non regulated. Our approach leads to a new type of non-linear shrinkage estimation, where a proportion of estimates is shrunk to zero, while the rest follows standard linear shrinkage. This allows us to estimate the log fold-change of the genes involved and to identify those that are differentially expressed within the same model framework. We investigate the operating characteristics of our method using simulation and spike-in studies, and illustrate its application to real data using a breast-cancer dataset.
3.
  •  
4.
  • Kristiansson, Erik, 1978, et al. (author)
  • Quality Optimised Analysis of General Paired Microarray Experiments
  • 2006
  • In: Statistical Applications in Genetics and Molecular Biology. - : Walter de Gruyter GmbH. - 1544-6115. ; 5:1
  • Journal article (peer-reviewed). Abstract:
    • In microarray experiments, several steps may cause sub-optimal quality and the need for quality control is strong. Often the experiments are complex, with several conditions studied simultaneously. A linear model for paired microarray experiments is proposed as a generalisation of the paired two-sample method by Kristiansson et al. (2005). Quality variation is modelled by different variance scales for different (pairs of) arrays, and shared sources of variation are modelled by covariances between arrays. The gene-wise variance estimates are moderated in an empirical Bayes approach. Due to correlations all data is typically used in the inference of any linear combination of parameters. Both real and simulated data are analysed. Unequal variances and strong correlations are found in real data, leading to further examination of the fit of the model and of the nature of the datasets in general. The empirical distributions of the test-statistics are found to have a considerably improved match to the null distribution compared to previous methods, which implies more correct p-values provided that most genes are non-differentially expressed. In fact, assuming independent observations with identical variances typically leads to optimistic p-values. The method is shown to perform better than the alternatives in the simulation study.
5.
  • Landfors, Mattias, 1977-, et al. (author)
  • MC-normalization : a novel method for dye-normalization of two-channel microarray data
  • 2009
  • In: Statistical Applications in Genetics and Molecular Biology. - Berkeley : The Berkeley Electronic Press (bepress). - 1544-6115. ; 8:1, p. 42-
  • Journal article (peer-reviewed). Abstract:
    • Motivation: Pre-processing plays a vital role in two-color microarray data analysis. An analysis is characterized by its ability to identify differentially expressed genes (its sensitivity) and its ability to provide unbiased estimators of the true regulation (its bias). It has been shown that microarray experiments regularly underestimate the true regulation of differentially expressed genes. We introduce the MC-normalization, where C stands for channel-wise normalization, with considerably lower bias than the commonly used standard methods. Methods: The idea behind the MC-normalization is that the channels’ individual intensities determine the correction, rather than the average intensity, which is the case for the widely used MA-normalization. The two methods were evaluated using spike-in data from an in-house produced cDNA experiment and a publicly available Agilent experiment. The methods were applied to background-corrected and non-background-corrected data. For the cDNA experiment the methods were either applied separately to data from each of the print-tips or applied to the complete array data. Altogether 24 analyses were evaluated. For each analysis the sensitivity, the bias and two variance measures were estimated. Results: We prove that the MC-normalization has lower bias than the MA-normalization. The spike-in data confirmed the theoretical result and suggest that the difference is significant. Furthermore, the empirical data suggest that the MC- and MA-normalization have similar sensitivity. A striking result is that print-tip normalizations had considerably higher sensitivity than analyses using the complete array data.
6.
  • Lee, W, et al. (author)
  • Sparse Canonical Covariance Analysis for High-throughput Data
  • 2011
  • In: Statistical Applications in Genetics and Molecular Biology. - : Walter de Gruyter GmbH. - 2194-6302 .- 1544-6115. ; 10:1
  • Journal article (other academic/artistic). Abstract:
    • Canonical covariance analysis (CCA) has gained popularity as a method for the analysis of two sets of high-dimensional genomic data. However, it is often difficult to interpret the results because canonical vectors are linear combinations of all variables, and the coefficients are typically nonzero. Several sparse CCA methods have recently been proposed for reducing the number of nonzero coefficients, but these existing methods are not satisfactory because they still give too many nonzero coefficients. In this paper, we propose a new random-effect model approach for sparse CCA; the proposed algorithm can adapt arbitrary penalty functions to CCA without much computational demand. Through simulation studies, we compare various penalty functions in terms of the performance of correct model identification. We also develop an extension of sparse CCA to address more than two sets of variables on the same set of observations. We illustrate the method with an analysis of the NCI cancer dataset.
7.
  • Lonnstedt, I., et al. (author)
  • Empirical Bayes microarray ANOVA and grouping cell lines by equal expression levels
  • 2005
  • In: Statistical Applications in Genetics and Molecular Biology. - : Walter de Gruyter GmbH. - 1544-6115 .- 2194-6302. ; 4
  • Journal article (peer-reviewed). Abstract:
    • In the exploding field of gene expression techniques such as DNA microarrays, there are still few general probabilistic methods for analysis of variance. Linear models and ANOVA are heavily used tools in many other disciplines of scientific research. The usual F-statistic is unsatisfactory for microarray data, which explore many thousand genes in parallel, with few replicates. We present three potential one-way ANOVA statistics in a parametric statistical framework. The aim is to separate genes that are differently regulated across several treatment conditions from those with equal regulation. The statistics have different features and are evaluated using both real and simulated data. Our statistic B-1 generally shows the best performance, and is extended for use in an algorithm that groups cell lines by equal expression levels for each gene. An extension is also outlined for more general ANOVA tests including several factors. The methods presented are implemented in the freely available statistical language R. They are available at http://www.math.uu.se/staff/pages/?uname=ingrid.
8.
  • Svennblad, Bodil, et al. (author)
  • Improving divergence time estimation in phylogenetics : More taxa vs. longer sequences
  • 2007
  • In: Statistical Applications in Genetics and Molecular Biology. - : Walter de Gruyter GmbH. - 1544-6115 .- 2194-6302. ; 6, p. 35-
  • Journal article (peer-reviewed). Abstract:
    • Maximum Likelihood (ML) is used as a standard method for estimating divergence times in phylogenetic trees. The method is consistent and hence the precision can be improved by analyzing longer sequences. In this paper we show that the precision can be improved also by including more taxa to the existing tree. It is a theoretical study, complemented with simulations, showing that the gain in precision is faster with increasing sequence length than with increasing number of taxa. We further compare the results of estimating divergence times using Maximum Likelihood with the much faster and less complex estimation method of Mean Path Length (MPL), which works with the evolution model of Jukes-Cantor (1969). It is shown that MPL is as good as ML in estimating divergence times of nodes that are located near the root in the tree, but ML is better in estimating the divergence times of nodes lower down.
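The abstract above names the Jukes-Cantor (1969) model but gives no formulas. As a rough illustration only, the standard Jukes-Cantor distance between two aligned sequences can be sketched in Python as follows. The function name and the input format are assumptions made for this example; the sketch shows the evolution model's distance correction, not the ML or MPL estimators studied in the paper.

```python
import math


def jukes_cantor_distance(seq_a, seq_b):
    """Expected substitutions per site between two aligned DNA sequences
    under the Jukes-Cantor (1969) model: d = -3/4 * ln(1 - 4p/3),
    where p is the observed fraction of differing sites.

    Hypothetical helper for illustration; gaps are not handled.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, equal-length and non-empty")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    p = mismatches / len(seq_a)
    if p >= 0.75:
        # The correction is undefined once the sequences look saturated.
        raise ValueError("observed difference too large for Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

For example, `jukes_cantor_distance("ACGTACGT", "ACGTACGA")` corrects an observed difference of 1/8 upward to roughly 0.137 substitutions per site. Because p is a sample proportion, its sampling error shrinks as the sequences get longer.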
9.
  • Wang, Z.A., et al. (author)
  • Comparing spatial maps of human population-genetic variation using Procrustes analysis
  • 2010
  • In: Statistical Applications in Genetics and Molecular Biology. - : Walter de Gruyter GmbH. - 1544-6115. ; 9:1, p. e13-
  • Journal article (peer-reviewed). Abstract:
    • Recent applications of principal components analysis (PCA) and multidimensional scaling (MDS) in human population genetics have found that "statistical maps" based on the genotypes in population-genetic samples often resemble geographic maps of the underlying sampling locations. To provide formal tests of these qualitative observations, we describe a Procrustes analysis approach for quantitatively assessing the similarity of population-genetic and geographic maps. We confirm in two scenarios, one using single-nucleotide polymorphism (SNP) data from Europe and one using SNP data worldwide, that a measurably high level of concordance exists between statistical maps of population-genetic variation and geographic maps of sampling locations. Two other examples illustrate the versatility of the Procrustes approach in population-genetic applications, verifying the concordance of SNP analyses using PCA and MDS, and showing that statistical maps of worldwide copy-number variants (CNVs) accord with statistical maps of SNP variation, especially when CNV analysis is limited to samples with the highest-quality data. As statistical maps with PCA and MDS have become increasingly common for use in summarizing population relationships, our examples highlight the potential of Procrustes-based quantitative comparisons for interpreting the results in these maps.
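The abstract above does not spell out the similarity statistic or the test. One common way to compare two two-dimensional maps with Procrustes analysis is sketched below in Python, purely as an assumption-laden illustration. The function names, the complex-number representation of points, and the label-shuffling permutation test are choices made for this example and are not taken from the paper.

```python
import math
import random


def procrustes_similarity(points_a, points_b):
    """Similarity in [0, 1] between two paired 2-D point configurations
    after removing translation, scale, rotation and reflection.
    A value of 1 means the maps match exactly up to those transformations.
    """
    if len(points_a) != len(points_b) or len(points_a) < 2:
        raise ValueError("need two paired configurations with at least 2 points")
    # Represent each point as a complex number and centre both maps.
    z = [complex(x, y) for x, y in points_a]
    w = [complex(x, y) for x, y in points_b]
    mean_z = sum(z) / len(z)
    mean_w = sum(w) / len(w)
    z = [v - mean_z for v in z]
    w = [v - mean_w for v in w]
    norm_z = math.sqrt(sum(abs(v) ** 2 for v in z))
    norm_w = math.sqrt(sum(abs(v) ** 2 for v in w))
    if norm_z == 0.0 or norm_w == 0.0:
        raise ValueError("a configuration has all points at the same location")
    # Best fit over rotations, and over rotations combined with a reflection.
    rotation_fit = abs(sum(a.conjugate() * b for a, b in zip(z, w)))
    reflection_fit = abs(sum(a * b for a, b in zip(z, w)))
    return max(rotation_fit, reflection_fit) / (norm_z * norm_w)


def permutation_p_value(points_a, points_b, n_perm=1000, seed=0):
    """Share of random re-pairings whose similarity is at least as high
    as the observed one (with the usual +1 correction)."""
    rng = random.Random(seed)
    observed = procrustes_similarity(points_a, points_b)
    shuffled = list(points_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if procrustes_similarity(points_a, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

In a setting like the one described, `points_a` might hold the first two principal-component coordinates of each sample and `points_b` the longitude and latitude of its sampling location, paired by index.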
10.
  • Fernandez-Ricaud, L., et al. (author)
  • Testing of Chromosomal Clumping of Gene Properties
  • 2009
  • In: Statistical Applications in Genetics and Molecular Biology. - 1544-6115. ; 8:1
  • Journal article (peer-reviewed). Abstract:
    • Clumping of gene properties like expression or mutant phenotypes along chromosomes is commonly detected using completely random null-models where their location is equally likely across the chromosomes. Interpretation of statistical tests based on these assumptions may be misleading if dependencies exist that are unequal between chromosomes or in different chromosomal parts. One such regional dependency is the telomeric effect, observed in several studies of Saccharomyces cerevisiae, under which e.g. essential genes are less likely to reside near the chromosomal ends. In this study we demonstrate that standard randomisation test procedures are of limited applicability in the presence of telomeric effects. Several extensions of such standard tests are here suggested for handling clumping simultaneously with regional differences in essentiality frequencies in sub-telomeric and central gene positions. Furthermore, a general non-homogeneous discrete Markov approach for combining parametrically modelled position dependent probabilities of a dichotomous property with a simple single parameter clumping is suggested. This Markov model is adapted to the observed telomeric effects and then simulations are used to demonstrate properties of the suggested modified randomisation tests. The model is also applied as a direct alternative tool for statistical analysis of the S. cerevisiae genome for clumping of phenotypes.
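As a hedged illustration of the kind of "completely random" null model the abstract above criticises, a basic randomisation test for clumping on a single chromosome could look like the Python sketch below. The choice of test statistic (number of adjacent gene pairs that both carry the property) and all names are assumptions made for this example. The sketch deliberately ignores telomeric effects and is not one of the modified tests proposed in the paper.

```python
import random


def adjacent_pairs(flags):
    """Count neighbouring gene pairs along the chromosome in which
    both genes carry the property (flags are 0/1 or False/True)."""
    return sum(1 for a, b in zip(flags, flags[1:]) if a and b)


def clumping_p_value(flags, n_perm=2000, seed=0):
    """One-sided randomisation test: shuffle the property labels uniformly
    over gene positions and report how often the shuffled data look at
    least as clumped as the observed data (with the usual +1 correction)."""
    rng = random.Random(seed)
    observed = adjacent_pairs(flags)
    shuffled = list(flags)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if adjacent_pairs(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Uniform shuffling treats every position as equally likely to carry the property. If the property is in fact rarer near the chromosome ends, that position effect alone can inflate the statistic, which is the limitation the abstract points to.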