Domain expertise–agnostic feature selection for the analysis of breast cancer data*

Pozzoli, Susanna (author): KTH Royal Institute of Technology, Sweden (Software and Computer Systems, SCS); Politecnico di Milano, Italy

Soliman, Amira (author): RISE, RISE SICS

Bahri, Leila (author): KTH Royal Institute of Technology, Sweden (Software and Computer Systems, SCS)

Branca, Rui (author): Karolinska Institutet

Girdzijauskas, Sarunas (author): KTH Royal Institute of Technology, Sweden (Computer Science; Software and Computer Systems, SCS); RISE

Brambilla, Marco (author): Politecnico di Milano, Italy

Elsevier B.V., 2020
English.
In: Artificial Intelligence in Medicine. Elsevier B.V. ISSN 0933-3657, E-ISSN 1873-2860; vol. 108

Related links: https://zenodo.org/r...; https://urn.kb.se/re...; https://doi.org/10.1...; https://urn.kb.se/re...; http://kipublication...

Journal article (peer-reviewed)

Abstract

Progress in proteomics has enabled biologists to accurately measure the amount of protein in a tumor. This work is based on a breast cancer data set that resulted from the proteomics analysis of a cohort of tumors carried out at Karolinska Institutet. While evidence suggests that an anomaly in protein content is related to the cancerous nature of tumors, the proteins that could serve as markers of cancer types and subtypes, and the interactions underlying them, are not completely known. This work sheds light on the potential of unsupervised learning in the analysis of such data sets, namely in detecting distinctive proteins that identify cancer subtypes in the absence of domain expertise.

In the analyzed data set, the number of samples (tumors) is significantly lower than the number of features (proteins); consequently, the input can be treated as high-dimensional data. High-dimensional data are already widespread, and a great deal of effort has been put into analyzing them by means of feature selection, but that analysis is still largely based on prior specialist knowledge, which in this case is incomplete. There is a growing need for unsupervised feature selection, which raises two issues: how to generate promising subsets of features among all possible combinations, and how to evaluate the quality of these subsets in the absence of specialist knowledge.

We propose a new wrapper method that generates subsets of features via spectral clustering and evaluates them via modularity. We conduct experiments to test the effectiveness of the new method on the breast cancer data in a domain expertise–agnostic context. Furthermore, we show that the method can be successfully augmented by incorporating an external source of data on known protein complexes. Our approach reveals a large number of subsets of features that cluster the samples better, in terms of modularity, than the state-of-the-art classification, and it shows potential to be useful for future proteomics research.
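
To make the evaluation idea in the abstract concrete, the following is a minimal sketch of scoring a feature subset by the modularity of a sample partition. It is not the authors' implementation: the record does not describe how the graph is built or how the spectral clustering step generates subsets, so the Gaussian similarity graph, the toy measurements, the fixed labels, and the function names `similarity_graph` and `modularity` are all assumptions made for illustration.

```python
import math
from itertools import combinations


def similarity_graph(samples, features):
    """Build a weighted sample graph using only the chosen feature indices.

    Assumption: edge weight = exp(-squared Euclidean distance), no self-loops.
    """
    n = len(samples)
    w = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d2 = sum((samples[i][f] - samples[j][f]) ** 2 for f in features)
        w[i][j] = w[j][i] = math.exp(-d2)
    return w


def modularity(w, labels):
    """Newman modularity of a partition on a weighted undirected graph:
    Q = (1 / 2m) * sum_ij (w_ij - k_i * k_j / 2m) * [labels_i == labels_j]
    """
    n = len(w)
    degree = [sum(row) for row in w]
    two_m = sum(degree)
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += w[i][j] - degree[i] * degree[j] / two_m
    return q / two_m


# Hypothetical toy data: 4 samples (tumors) x 3 features (proteins).
# Features 0 and 1 separate the two groups; feature 2 does not.
samples = [
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.1],
    [2.0, 2.1, 0.8],
    [2.1, 2.0, 0.0],
]
labels = [0, 0, 1, 1]  # candidate partition of the samples

q_informative = modularity(similarity_graph(samples, (0, 1)), labels)
q_uninformative = modularity(similarity_graph(samples, (2,)), labels)
print(f"features (0, 1): Q = {q_informative:.3f}")
print(f"feature  (2,):   Q = {q_uninformative:.3f}")
```

In this toy case the subset made of features 0 and 1 scores a higher modularity than feature 2 alone, which is the kind of comparison a wrapper method can use to rank candidate subsets without labels supplied by a domain expert.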

Subject terms

NATURAL SCIENCES -- Computer and Information Sciences
MEDICAL AND HEALTH SCIENCES -- Clinical Medicine -- Cancer and Oncology
NATURAL SCIENCES -- Computer and Information Sciences -- Computer Sciences
ENGINEERING AND TECHNOLOGY -- Medical Engineering -- Other Medical Engineering

Keywords

Breast cancer
Clustering
Clustering performance evaluation
Dimensionality reduction
Feature selection
Proteomics
Unsupervised learning
Clustering algorithms
Diseases
Feature extraction
Proteins
Set theory
Tumors
Breast cancer data
High dimensional data
High-dimensional data analysis
Number of samples
Protein complexes
Proteomics research
Spectral clustering
Unsupervised feature selection
Quality control

Publication and content type

ref (subject category)
art (subject category)
