SwePub

Contemporary developments and applications of unsupervised machine learning methods

Tas Kiper, Busra, 1991- (author)
Stockholm University, Department of Mathematics
Li, Chun-Biu, Docent (supervisor)
Stockholm University, Department of Mathematics
Tyrcha, Joanna, Professor, 1956- (supervisor)
Stockholm University, Department of Mathematics
Wallin, Jonas, Docent (opponent)
Department of Statistics, Lund University, Sweden
ISBN 9789180148696
Stockholm : Department of Mathematics, Stockholm University, 2024
English, 35 pp.
Doctoral thesis (other academic/artistic)
Abstract
This thesis presents state-of-the-art developments in the field of unsupervised learning, particularly in clustering analysis. Unsupervised learning is a branch of machine learning whose task is to discover hidden patterns and relationships in high-dimensional data without any labels. It is an important step in providing valuable insights for downstream statistical analyses, e.g., the existence of important discrete structures and low-dimensional features, as well as in revealing anomalies. The achievements of this thesis, detailed below, advance our toolboxes for pattern recognition and anomaly detection, with potential applications in many scientific areas involving unstructured and unlabelled data.

Paper I presents the application of unsupervised change point (CP) detection to molecular time series to explain the dynamics of motor proteins. Data-driven, non-parametric CP detection enables objective identification and modelling of stepping patterns in molecular motors. Beyond CP detection, this study provides further tools for analyzing molecular motors, such as reliably extracting reaction statistics and establishing a predictive model for the reaction rates. The methods developed and applied in this paper are applicable to time series data from a broad range of scientific fields.

Paper II proposes the Graph-based Fuzzy Density Peak Clustering (GF-DPC) method, which comprises comprehensive generalizations of existing density-based clustering methods. The first generalization employs graph-based methods to estimate densities and capture nonlinearities in the data, which enhances the power to detect clusters with arbitrary shapes. Second, a fuzzy extension is formulated to provide a probabilistic framework for assigning data points to clusters. Finally, the identification of cluster centers and of the number of clusters is automated using the fuzzy clustering validation index.
Compared with other well-known fuzzy clustering methods, the superior performance of GF-DPC in discovering clusters with arbitrary shapes, densities, separations and degrees of overlap is demonstrated using both intuitive examples and real datasets.

Paper III establishes a versatile validation framework for fuzzy clustering, termed Shape-aware Generalized Silhouette Analysis (SAGSA), based on the silhouette index. In SAGSA, a probabilistic framework is formulated to quantify the degree of cohesion and separation of the detected fuzzy clusters. In addition, graph-based distances are employed in SAGSA to enable accurate validation of nonlinear clustering structures. Most importantly, a two-dimensional graphical tool, the cohesion-separation (CS) plot, is introduced to enable visual diagnosis of possible problems in the clustering results at the point-wise, cluster-wise and global levels, regardless of the dimensionality of the dataset. Finally, we illustrate the effectiveness of SAGSA for cluster validation, compared with other commonly used methods, on various test examples of clustering challenges, including clusters with arbitrary shapes, imbalanced sizes, overlap, hierarchical structures, and clusters mixed with noise.
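Paper III's SAGSA is based on the silhouette index. For orientation only, the following minimal sketch computes the standard crisp silhouette with Euclidean distances: for point i, a(i) is the mean distance to the other members of its own cluster (cohesion), b(i) is the smallest mean distance to the members of any other cluster (separation), and s(i) = (b(i) - a(i)) / max(a(i), b(i)). It does not implement SAGSA's probabilistic (fuzzy) formulation, its graph-based distances or the CS plot; it simply returns a(i) and b(i) alongside s(i), since cohesion and separation are the quantities SAGSA generalizes. The function name `silhouette_values` is hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def silhouette_values(
    points: Sequence[Point], labels: Sequence[int]
) -> List[Tuple[float, float, float]]:
    """Return (a, b, s) per point: cohesion, separation and crisp silhouette."""
    # Group point indices by cluster label.
    clusters: Dict[int, List[int]] = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    def mean_dist(p: Point, members: List[int]) -> float:
        return sum(math.dist(p, points[j]) for j in members) / len(members)

    results = []
    for i, p in enumerate(points):
        # Separation b(i): smallest mean distance to any other cluster.
        b = min(
            mean_dist(p, members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        own = [j for j in clusters[labels[i]] if j != i]
        if not own:
            # Singleton cluster: a(i) is undefined; s(i) = 0 by convention.
            results.append((0.0, b, 0.0))
            continue
        # Cohesion a(i): mean distance to the other members of i's cluster.
        a = mean_dist(p, own)
        denom = max(a, b)
        s = (b - a) / denom if denom > 0 else 0.0
        results.append((a, b, s))
    return results
```

Values of s(i) near 1 indicate a point that sits well inside its cluster, while a negative value flags a point that is, on average, closer to another cluster: the kind of point-wise problem a validation tool is meant to surface.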

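Paper II's GF-DPC generalizes density peak clustering (DPC). As a reference point only, the sketch below implements the classic crisp DPC scheme of Rodriguez and Laio (2014), not GF-DPC: it assumes Euclidean distances and a simple cutoff-kernel density instead of graph-based density estimates, assigns each point to exactly one cluster instead of using fuzzy memberships, and takes the number of clusters as a user-supplied input instead of selecting it automatically with a validation index. The function name `density_peak_clustering` and the parameters `dc` and `n_clusters` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def density_peak_clustering(
    points: Sequence[Point], dc: float, n_clusters: int
) -> List[int]:
    """Crisp density peak clustering with a cutoff kernel (illustrative only)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density rho(i): number of other points closer than the cutoff dc.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc) for i in range(n)]

    # Visit points in decreasing density; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    # delta(i): distance to the nearest point earlier in this order (i.e. of
    # higher, or tied but earlier, density); nearest_higher stores that point.
    delta = [0.0] * n
    nearest_higher = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # Densest point: by convention, delta is its largest distance.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        nearest_higher[i] = j

    # Centers: the densest point plus the remaining points with the largest
    # gamma(i) = rho(i) * delta(i), up to n_clusters in total.
    densest = order[0]
    ranked = sorted(range(n), key=lambda i: -rho[i] * delta[i])
    centers = [densest] + [i for i in ranked if i != densest][: n_clusters - 1]

    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Every non-center inherits the label of its nearest higher-density point,
    # which is already labelled because we follow the density order.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels
```

For example, with `pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]`, the call `density_peak_clustering(pts, dc=0.5, n_clusters=2)` returns `[0, 0, 0, 1, 1, 1]`. Always making the densest point a center is a simplification chosen here so that every other point's nearest higher-density neighbour is guaranteed to be labelled before it is visited; ranking the remaining candidates by rho * delta is one common heuristic for picking centers.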
Subject terms

NATURAL SCIENCES  -- Mathematics -- Computational Mathematics (hsv//eng)

Keywords

Clustering analysis
Fuzzy clustering
Graph-based methods
Clustering validation
Time series analysis
Change point detection
Computational Mathematics

Publication and content type

vet (subject category)
dok (subject category)
