SwePub

Comparison of sequencing data processing pipelines and application to underrepresented human populations

Breton, Gwenna (author)
Uppsala University, Human Evolution
Johansson, Anna C. V. (author)
Uppsala University, Science for Life Laboratory, SciLifeLab
Sjödin, Per (author)
Uppsala University, Human Evolution
Schlebusch, Carina, 1977- (author)
Uppsala University, Human Evolution, Science for Life Laboratory, SciLifeLab, Palaeo-Research Institute, University of Johannesburg, P.O. Box 524, Auckland Park 2006, South Africa
Jakobsson, Mattias (author)
Uppsala University, Science for Life Laboratory, SciLifeLab, Human Evolution, Palaeo-Research Institute, University of Johannesburg, P.O. Box 524, Auckland Park 2006, South Africa
English.
In: BMC Bioinformatics. - ISSN 1471-2105.
  • Journal article (peer-reviewed)
Abstract
  • Population genetic studies of humans make increasing use of high-throughput sequencing in order to capture human diversity in an unbiased way. Sequencing technologies and bioinformatic tools are abundant, and the number of available genomes is increasing. Studies have evaluated and compared some of these technologies and tools, such as the Genome Analysis Toolkit (GATK) and its “Best Practices” bioinformatic pipelines. However, such studies often focus on a few genomes of Eurasian origin in order to detect technical issues. We instead surveyed the use of the GATK tools and established a pipeline for processing high-coverage full genomes from a diverse set of populations, including Sub-Saharan African groups, in order to reveal challenges arising from human diversity and stratification.

We started by surveying 29 studies using high-throughput sequencing data, and compared their strategies for data pre-processing and variant calling. We found that data processing varies widely across studies, that the GATK “Best Practices” are seldom followed strictly, and that processing pipelines are often not reported in full detail. We then compared three versions of the GATK pipeline, differing in the inclusion of an indel realignment step and in a modification of the base quality score recalibration step. We applied the pipelines to a diverse set of 28 individuals and compared them in terms of the count of called variants and the overlap of the callsets. We found that the pipelines resulted in similar callsets, in particular after callset filtering. We also ran one of the pipelines on a larger dataset of 179 individuals. We noted that including more individuals at the joint genotyping step resulted in different counts of variants. At the individual level, we observed that the average genome coverage was correlated with the number of variants called.

We conclude that applying the GATK “Best Practices” pipeline, including its recommended reference datasets, to underrepresented populations does not lead to a decrease in the number of called variants compared to alternative pipelines. We recommend aiming for a coverage of >30X and working with large sample sizes at the variant calling stage, also for underrepresented individuals and populations.
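
The abstract compares pipelines by the count of called variants and the overlap of their callsets. The following is a minimal sketch of such a comparison, not the authors' method: it assumes each callset has already been reduced to a set of (chromosome, position, reference allele, alternate allele) keys, and it uses the Jaccard index as one possible overlap measure. The function name `compare_callsets`, the pipeline labels, and the toy variants are all hypothetical.

```python
from itertools import combinations

# Assumption: a variant is identified by (chromosome, position, ref allele, alt allele).
Variant = tuple[str, int, str, str]


def compare_callsets(callsets: dict[str, set[Variant]]) -> dict[tuple[str, str], dict[str, float]]:
    """Return shared/private variant counts and Jaccard overlap for each pair of callsets."""
    results = {}
    for (name_a, set_a), (name_b, set_b) in combinations(callsets.items(), 2):
        shared = len(set_a & set_b)   # variants called by both pipelines
        union = len(set_a | set_b)    # variants called by at least one pipeline
        results[(name_a, name_b)] = {
            "shared": shared,
            "only_a": len(set_a - set_b),
            "only_b": len(set_b - set_a),
            # Two empty callsets are treated as identical (overlap 1.0).
            "jaccard": shared / union if union else 1.0,
        }
    return results


# Toy callsets (invented for illustration, not data from the study).
toy = {
    "pipeline_A": {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 50, "G", "A")},
    "pipeline_B": {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 75, "T", "C")},
}
summary = compare_callsets(toy)
for pair, stats in summary.items():
    print(pair, stats)
```

In practice the variant keys would be read from VCF files produced by each pipeline; keying on the alleles as well as the position keeps different alternate alleles at the same site from being counted as shared.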

Publication and content type

ref (subject category)
art (subject category)
