De novo search for non-coding RNA genes in the AT-rich genome of Dictyostelium discoideum : Performance of Markov-dependent genome feature scoring
- Larsson, Pontus (author), Uppsala University, Department of Cell and Molecular Biology
- Hinas, Andrea (author), Uppsala University, Department of Medical Biochemistry and Microbiology
- Ardell, David H. (author), Uppsala University, Centre for Bioinformatics
- Kirsebom, Leif A. (author), Uppsala University, Department of Cell and Molecular Biology
- Virtanen, Anders (author), Uppsala University, Molecular Biology
- Söderbom, Fredrik (author)
- 2008-03-17
- 2008
- English.
In: Genome Research. Cold Spring Harbor Laboratory. ISSN 1088-9051, 1549-5469. 18:6, pp. 888-899
- Related links:
- http://genome.cshlp....
- https://urn.kb.se/re...
- https://doi.org/10.1...
Abstract
- Genome data are increasingly important in the computational identification of novel regulatory non-coding RNAs (ncRNAs). However, most ncRNA gene-finders are either specialized to well-characterized ncRNA gene families or require comparisons of closely related genomes. We developed a method for de novo screening for ncRNA genes with a nucleotide composition that stands out against the background genome based on a partial sum process. We compared the performance when assuming independent and first-order Markov-dependent nucleotides, respectively, and used Karlin-Altschul and Karlin-Dembo statistics to evaluate the significance of hits. We hypothesized that a first-order Markov-dependent process might have better power to detect ncRNA genes since nearest-neighbor models have been shown to be successful in predicting RNA structures. A model based on a first-order partial sum process (analyzing overlapping dinucleotides) had better sensitivity and specificity than a zeroth-order model when applied to the AT-rich genome of the amoeba Dictyostelium discoideum. In this genome, we detected 94% of previously known ncRNA genes (at this sensitivity, the false positive rate was estimated to be 25% in a simulated background). The predictions were further refined by clustering candidate genes according to sequence similarity and/or searching for an ncRNA-associated upstream element. We experimentally verified six out of 10 tested ncRNA gene predictions. We conclude that higher-order models, in combination with other information, are useful for identification of novel ncRNA gene families in single-genome analysis of D. discoideum. Our generalizable approach extends the range of genomic data that can be searched for novel ncRNA genes using well-grounded statistical methods.
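The scoring idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each overlapping dinucleotide is scored by a log-odds ratio between a hypothetical foreground and background first-order Markov model, and that candidate regions are the maximal-scoring segments of the resulting partial sum. The function names, the pseudocount, and the toy training sequences are all assumptions made for illustration, and the Karlin-Altschul and Karlin-Dembo significance evaluation is not attempted here.

```python
import math

ALPHABET = "ACGT"

def transition_probs(seq, pseudocount=1.0):
    """Estimate first-order transition probabilities P(b | a) from one sequence.

    The pseudocount is an assumption; it keeps every probability non-zero.
    """
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for a, b in zip(seq, seq[1:]):
        if a in counts and b in counts[a]:
            counts[a][b] += 1
    probs = {}
    for a in ALPHABET:
        total = sum(counts[a].values())
        probs[a] = {b: counts[a][b] / total for b in ALPHABET}
    return probs

def log_odds_scores(fg, bg):
    """Score each dinucleotide 'ab' as log(P_fg(b | a) / P_bg(b | a))."""
    return {a + b: math.log(fg[a][b] / bg[a][b])
            for a in ALPHABET for b in ALPHABET}

def max_scoring_segment(seq, scores):
    """Return (score, start, end) of the best-scoring run of overlapping
    dinucleotides; start and end are nucleotide indices, end exclusive.

    The running partial sum is reset whenever it drops to zero or below.
    """
    best, best_start, best_end = 0.0, 0, 0
    running, run_start = 0.0, 0
    for i in range(len(seq) - 1):
        running += scores.get(seq[i:i + 2], 0.0)  # unknown symbols score 0
        if running <= 0.0:
            running, run_start = 0.0, i + 1
        elif running > best:
            best, best_start, best_end = running, run_start, i + 2
    return best, best_start, best_end

# Toy data (hypothetical): a GC-rich "foreground" and an AT-rich "background".
fg = transition_probs("GC" * 10)
bg = transition_probs("AT" * 10)
scores = log_odds_scores(fg, bg)

query = "ATATATAT" + "GCGCGCGC" + "ATATATAT"
best, start, end = max_scoring_segment(query, scores)
print(round(best, 2), start, end, query[start:end])
```

In this toy query the reported segment covers the GC-rich insert, which stands out against the AT-rich flanks. A zeroth-order variant would score single nucleotides instead of overlapping pairs; the abstract reports that the first-order model performed better on D. discoideum.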
Subject headings
- NATURAL SCIENCES -- Biological Sciences -- Biochemistry and Molecular Biology (hsv//eng)
Keywords
- Cell and molecular biology
Publication and content type
- ref (subject category)
- art (subject category)