SwePub

onr:"swepub:oai:DiVA.org:umu-181727"
 

Comparison of Methods for Feature Selection in Clustering of High-Dimensional RNA-Sequencing Data to Identify Cancer Subtypes

Källberg, David, 1982- (author)
Umeå universitet,Handelshögskolan vid Umeå universitet,Institutionen för matematik och matematisk statistik
Vidman, Linda (author)
Umeå universitet,Institutionen för matematik och matematisk statistik,Onkologi
Rydén, Patrik (author)
Umeå universitet,Institutionen för matematik och matematisk statistik
2021-02-24
2021
English.
In: Frontiers in Genetics. Frontiers Media S.A. ISSN 1664-8021. Vol. 12
  • Journal article (peer-reviewed)
Abstract

Cancer subtype identification is important to facilitate cancer diagnosis and select effective treatments. Clustering of cancer patients based on high-dimensional RNA-sequencing data can be used to detect novel subtypes, but only a subset of the features (e.g., genes) contains information related to the cancer subtype. Therefore, it is reasonable to assume that the clustering should be based on a set of carefully selected features rather than all features. Several feature selection methods have been proposed, but how and when to use these methods is still poorly understood. Thirteen feature selection methods were evaluated on four human cancer data sets, all with known subtypes (gold standards), which were only used for evaluation. The methods were characterized by considering the mean expression and standard deviation (SD) of the selected genes, the overlap with other methods, and their clustering performance, obtained by comparing the clustering result with the gold standard using the adjusted Rand index (ARI). The results were compared to a supervised approach as a positive control and two negative controls in which either a random selection of genes or all genes were included. For all data sets, the best feature selection approach outperformed the negative control, and for two data sets the gain was substantial, with ARI increasing from (−0.01, 0.39) to (0.66, 0.72), respectively. No feature selection method completely outperformed the others, but using the dip-rest statistic to select 1000 genes was overall a good choice. The commonly used approach, where genes with the highest SDs are selected, did not perform well in our study.
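
The abstract names two building blocks that are easy to illustrate: ranking genes by standard deviation (the common baseline it reports as performing poorly) and scoring a clustering against a gold standard with the adjusted Rand index. The following is a minimal sketch using only the Python standard library. It is not the authors' code; the function names `adjusted_rand_index` and `top_sd_genes`, the dictionary-based input layout, and the toy values are assumptions made for illustration.

```python
from collections import Counter
from math import comb
from statistics import stdev


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two partitions of the same samples."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have equal length")
    n = len(labels_true)
    if n < 2:
        raise ValueError("need at least two samples")
    # Count sample pairs that share a contingency cell, a row, or a column.
    cells = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)  # chance-level agreement
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Both partitions are trivial (one cluster, or all singletons)
        # and identical; return 1.0 by convention.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def top_sd_genes(expression, k):
    """Names of the k genes with the largest sample standard deviation.

    `expression` maps a gene name to its expression values across samples
    (a hypothetical layout chosen for this sketch).
    """
    ranked = sorted(expression, key=lambda g: stdev(expression[g]), reverse=True)
    return ranked[:k]


# Toy data, invented for illustration only.
gold = ["A", "A", "B", "B"]
print(adjusted_rand_index(gold, [1, 1, 0, 0]))  # same grouping, relabelled
print(top_sd_genes({"g1": [1, 1, 1, 1], "g2": [0, 10, 0, 10], "g3": [1, 2, 1, 2]}, 2))
```

An ARI of 1 means the clustering reproduces the gold standard up to relabelling, values near 0 correspond to chance-level agreement, and negative values indicate agreement below chance, which is how a negative-control score such as −0.01 should be read.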

Subject headings

NATURAL SCIENCES -- Mathematics -- Probability Theory and Statistics (hsv//eng)
NATURAL SCIENCES -- Biological Sciences -- Bioinformatics and Systems Biology (hsv//eng)

Keywords

cancer subtypes
feature selection
gene selection
high-dimensional
RNA-seq

Publication and Content Type

ref (peer-reviewed)
art (journal article)
