SwePub
Sök i SwePub databas

  Utökad sökning

Träfflista för sökning "WFRF:(Schliep Alexander 1967) srt2:(2010-2014)"

Sökning: WFRF:(Schliep Alexander 1967) > (2010-2014)

  • Resultat 1-10 av 11
Sortera/gruppera träfflistan
   
NumreringReferensOmslagsbildHitta
1.
  • Hafemeister, Christoph, et al. (författare)
  • Classifying short gene expression time-courses with Bayesian estimation of piecewise constant functions.
  • 2011
  • Ingår i: Bioinformatics (Oxford, England). - : Oxford University Press (OUP). - 1367-4811 .- 1367-4803. ; 27:7, s. 946-52
  • Tidskriftsartikel (refereegranskat)abstract
    • Analyzing short time-courses is a frequent and relevant problem in molecular biology, as, for example, 90% of gene expression time-course experiments span at most nine time-points. The biological or clinical questions addressed are elucidating gene regulation by identification of co-expressed genes, predicting response to treatment in clinical, trial-like settings or classifying novel toxic compounds based on similarity of gene expression time-courses to those of known toxic compounds. The latter problem is characterized by irregular and infrequent sample times and a total lack of prior assumptions about the incoming query, which comes in stark contrast to clinical settings and requires to implicitly perform a local, gapped alignment of time series. The current state-of-the-art method (SCOW) uses a variant of dynamic time warping and models time series as higher order polynomials (splines).We suggest to model time-courses monitoring response to toxins by piecewise constant functions, which are modeled as left-right Hidden Markov Models. A Bayesian approach to parameter estimation and inference helps to cope with the short, but highly multivariate time-courses. We improve prediction accuracy by 7% and 4%, respectively, when classifying toxicology and stress response data. We also reduce running times by at least a factor of 140; note that reasonable running times are crucial when classifying response to toxins. In conclusion, we have demonstrated that appropriate reduction of model complexity can result in substantial improvements both in classification performance and running time.A Python package implementing the methods described is freely available under the GPL from http://bioinformatics.rutgers.edu/Software/MVQueries/.
  •  
2.
  • Marschall, Tobias, et al. (författare)
  • CLEVER: clique-enumerating variant finder.
  • 2012
  • Ingår i: Bioinformatics (Oxford, England). - : Oxford University Press (OUP). - 1367-4811 .- 1367-4803. ; 28:22, s. 2875-82
  • Tidskriftsartikel (refereegranskat)abstract
    • Next-generation sequencing techniques have facilitated a large-scale analysis of human genetic variation. Despite the advances in sequencing speed, the computational discovery of structural variants is not yet standard. It is likely that many variants have remained undiscovered in most sequenced individuals.Here, we present a novel internal segment size based approach, which organizes all, including concordant, reads into a read alignment graph, where max-cliques represent maximal contradiction-free groups of alignments. A novel algorithm then enumerates all max-cliques and statistically evaluates them for their potential to reflect insertions or deletions. For the first time in the literature, we compare a large range of state-of-the-art approaches using simulated Illumina reads from a fully annotated genome and present relevant performance statistics. We achieve superior performance, in particular, for deletions or insertions (indels) of length 20-100 nt. This has been previously identified as a remaining major challenge in structural variation discovery, in particular, for insert size based approaches. In this size range, we even outperform split-read aligners. We achieve competitive results also on biological data, where our method is the only one to make a substantial amount of correct predictions, which, additionally, are disjoint from those by split-read aligners.CLEVER is open source (GPL) and available from http://clever-sv.googlecode.com.as@cwi.nl or tm@cwi.nl.Supplementary data are available at Bioinformatics online.
  •  
3.
  • Georgi, Benjamin, et al. (författare)
  • PyMix--the python mixture package--a tool for clustering of heterogeneous biological data.
  • 2010
  • Ingår i: BMC bioinformatics. - : Springer Science and Business Media LLC. - 1471-2105. ; 11
  • Tidskriftsartikel (refereegranskat)abstract
    • Cluster analysis is an important technique for the exploratory analysis of biological data. Such data is often high-dimensional, inherently noisy and contains outliers. This makes clustering challenging. Mixtures are versatile and powerful statistical models which perform robustly for clustering in the presence of noise and have been successfully applied in a wide range of applications.PyMix - the Python mixture package implements algorithms and data structures for clustering with basic and advanced mixture models. The advanced models include context-specific independence mixtures, mixtures of dependence trees and semi-supervised learning. PyMix is licenced under the GNU General Public licence (GPL). PyMix has been successfully used for the analysis of biological sequence, complex disease and gene expression data.PyMix is a useful tool for cluster analysis of biological data. Due to the general nature of the framework, PyMix can be applied to a wide range of applications and data sets.
  •  
4.
  • Hafemeister, Christoph, et al. (författare)
  • Selecting oligonucleotide probes for whole-genome tiling arrays with a cross-hybridization potential.
  • 2011
  • Ingår i: IEEE/ACM transactions on computational biology and bioinformatics. - 1557-9964. ; 8:6, s. 1642-52
  • Tidskriftsartikel (refereegranskat)abstract
    • For designing oligonucleotide tiling arrays popular, current methods still rely on simple criteria like Hamming distance or longest common factors, neglecting base stacking effects which strongly contribute to binding energies. Consequently, probes are often prone to cross-hybridization which reduces the signal-to-noise ratio and complicates downstream analysis. We propose the first computationally efficient method using hybridization energy to identify specific oligonucleotide probes. Our Cross-Hybridization Potential (CHP) is computed with a Nearest Neighbor Alignment, which efficiently estimates a lower bound for the Gibbs free energy of the duplex formed by two DNA sequences of bounded length. It is derived from our simplified reformulation of t-gap insertion-deletion-like metrics. The computations are accelerated by a filter using weighted ungapped q-grams to arrive at seeds. The computation of the CHP is implemented in our software OSProbes, available under the GPL, which computes sets of viable probe candidates. The user can choose a trade-off between running time and quality of probes selected. We obtain very favorable results in comparison with prior approaches with respect to specificity and sensitivity for cross-hybridization and genome coverage with high-specificity probes. The combination of OSProbes and our Tileomatic method, which computes optimal tiling paths from candidate sets, yields globally optimal tiling arrays, balancing probe distance, hybridization conditions, and uniqueness of hybridization.
  •  
5.
  • Hochstättler, Winfried., et al. (författare)
  • CATBox
  • 2010
  • Bok (övrigt vetenskapligt/konstnärligt)
  •  
6.
  • Mahmud, Md Pavel, et al. (författare)
  • Fast MCMC sampling for hidden Markov Models to determine copy number variations.
  • 2011
  • Ingår i: BMC bioinformatics. - : Springer Science and Business Media LLC. - 1471-2105. ; 12
  • Tidskriftsartikel (refereegranskat)abstract
    • Hidden Markov Models (HMM) are often used for analyzing Comparative Genomic Hybridization (CGH) data to identify chromosomal aberrations or copy number variations by segmenting observation sequences. For efficiency reasons the parameters of a HMM are often estimated with maximum likelihood and a segmentation is obtained with the Viterbi algorithm. This introduces considerable uncertainty in the segmentation, which can be avoided with Bayesian approaches integrating out parameters using Markov Chain Monte Carlo (MCMC) sampling. While the advantages of Bayesian approaches have been clearly demonstrated, the likelihood based approaches are still preferred in practice for their lower running times; datasets coming from high-density arrays and next generation sequencing amplify these problems.We propose an approximate sampling technique, inspired by compression of discrete sequences in HMM computations and by kd-trees to leverage spatial relations between data points in typical data sets, to speed up the MCMC sampling.We test our approximate sampling method on simulated and biological ArrayCGH datasets and high-density SNP arrays, and demonstrate a speed-up of 10 to 60 respectively 90 while achieving competitive results with the state-of-the art Bayesian approaches.An implementation of our method will be made available as part of the open source GHMM library from http://ghmm.org.
  •  
7.
  • Mahmud, Md Pavel, et al. (författare)
  • Indel-tolerant read mapping with trinucleotide frequencies using cache-oblivious kd-trees.
  • 2012
  • Ingår i: Bioinformatics (Oxford, England). - : Oxford University Press (OUP). - 1367-4811 .- 1367-4803. ; 28:18
  • Tidskriftsartikel (refereegranskat)abstract
    • Mapping billions of reads from next generation sequencing experiments to reference genomes is a crucial task, which can require hundreds of hours of running time on a single CPU even for the fastest known implementations. Traditional approaches have difficulties dealing with matches of large edit distance, particularly in the presence of frequent or large insertions and deletions (indels). This is a serious obstacle both in determining the spectrum and abundance of genetic variations and in personal genomics.For the first time, we adopt the approximate string matching paradigm of geometric embedding to read mapping, thus rephrasing it to nearest neighbor queries in a q-gram frequency vector space. Using the L(1) distance between frequency vectors has the benefit of providing lower bounds for an edit distance with affine gap costs. Using a cache-oblivious kd-tree, we realize running times, which match the state-of-the-art. Additionally, running time and memory requirements are about constant for read lengths between 100 and 1000 bp. We provide a first proof-of-concept that geometric embedding is a promising paradigm for read mapping and that L(1) distance might serve to detect structural variations. TreQ, our initial implementation of that concept, performs more accurate than many popular read mappers over a wide range of structural variants.TreQ will be released under the GNU Public License (GPL), and precomputed genome indices will be provided for download at http://treq.sf.net.pavelm@cs.rutgers.eduSupplementary data are available at Bioinformatics online.
  •  
8.
  • Mahmud, Md Pavel, et al. (författare)
  • Speeding up Bayesian HMM by the four Russians method
  • 2011
  • Ingår i: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). - Berlin, Heidelberg : Springer Berlin Heidelberg. - 0302-9743 .- 1611-3349.
  • Konferensbidrag (refereegranskat)abstract
    • Bayesian computations with Hidden Markov Models (HMMs) are often avoided in practice. Instead, due to reduced running time, point estimates - maximum likelihood (ML) or maximum a posterior (MAP) - are obtained and observation sequences are segmented based on the Viterbi path, even though the lack of accuracy and dependency on starting points of the local optimization are well known. We propose a method to speed-up Bayesian computations which addresses this problem for regular and time-dependent HMMs with discrete observations. In particular, we show that by exploiting sequence repetitions, using the four Russians method, and the conditional dependency structure, it is possible to achieve a Θ(logT) speed-up, where T is the length of the observation sequence. Our experimental results on identification of segments of homogeneous nucleic acid composition, known as the DNA segmentation problem, show that the speed-up is also observed in practice. © 2011 Springer-Verlag.
  •  
9.
  • Roy, Rajat S, et al. (författare)
  • SLIQ: simple linear inequalities for efficient contig scaffolding.
  • 2012
  • Ingår i: Journal of computational biology : a journal of computational molecular cell biology. - : Mary Ann Liebert Inc. - 1557-8666. ; 19:10, s. 1162-75
  • Tidskriftsartikel (refereegranskat)abstract
    • Scaffolding is an important subproblem in de novo genome assembly, in which mate pair data are used to construct a linear sequence of contigs separated by gaps. Here we present SLIQ, a set of simple linear inequalities derived from the geometry of contigs on the line that can be used to predict the relative positions and orientations of contigs from individual mate pair reads and thus produce a contig digraph. The SLIQ inequalities can also filter out unreliable mate pairs and can be used as a preprocessing step for any scaffolding algorithm. We tested the SLIQ inequalities on five real data sets ranging in complexity from simple bacterial genomes to complex mammalian genomes and compared the results to the majority voting procedure used by many other scaffolding algorithms. SLIQ predicted the relative positions and orientations of the contigs with high accuracy in all cases and gave more accurate position predictions than majority voting for complex genomes, in particular the human genome. Finally, we present a simple scaffolding algorithm that produces linear scaffolds given a contig digraph. We show that our algorithm is very efficient compared to other scaffolding algorithms while maintaining high accuracy in predicting both contig positions and orientations for real data sets.
  •  
10.
  • Schilling, Ruben, et al. (författare)
  • pGQL: A probabilistic graphical query language for gene expression time courses.
  • 2011
  • Ingår i: BioData mining. - : Springer Science and Business Media LLC. - 1756-0381. ; 4
  • Tidskriftsartikel (refereegranskat)abstract
    • Timeboxes are graphical user interface widgets that were proposed to specify queries on time course data. As queries can be very easily defined, an exploratory analysis of time course data is greatly facilitated. While timeboxes are effective, they have no provisions for dealing with noisy data or data with fluctuations along the time axis, which is very common in many applications. In particular, this is true for the analysis of gene expression time courses, which are mostly derived from noisy microarray measurements at few unevenly sampled time points. From a data mining point of view the robust handling of data through a sound statistical model is of great importance.We propose probabilistic timeboxes, which correspond to a specific class of Hidden Markov Models, that constitutes an established method in data mining. Since HMMs are a particular class of probabilistic graphical models we call our method Probabilistic Graphical Query Language. Its implementation was realized in the free software package pGQL. We evaluate its effectiveness in exploratory analysis on a yeast sporulation data set.We introduce a new approach to define dynamic, statistical queries on time course data. It supports an interactive exploration of reasonably large amounts of data and enables users without expert knowledge to specify fairly complex statistical models with ease. The expressivity of our approach is by its statistical nature greater and more robust with respect to amplitude and frequency fluctuation than the prior, deterministic timeboxes.
  •  
Skapa referenser, mejla, bekava och länka
  • Resultat 1-10 av 11

Kungliga biblioteket hanterar dina personuppgifter i enlighet med EU:s dataskyddsförordning (2018), GDPR. Läs mer om hur det funkar här.
Så här hanterar KB dina uppgifter vid användning av denna tjänst.

 
pil uppåt Stäng

Kopiera och spara länken för att återkomma till aktuell vy