SwePub
Search the SwePub database


Search: WFRF:(Martins Rafael Messias Dr. 1984 )

  • Result 1-50 of 50
1.
  • Chatzimparmpas, Angelos, 1994-, et al. (author)
  • A survey of surveys on the use of visualization for interpreting machine learning models
  • 2020
  • In: Information Visualization. - : Sage Publications. - 1473-8716 .- 1473-8724. ; 19:3, s. 207-233
  • Journal article (peer-reviewed), abstract:
    • Research in machine learning has become very popular in recent years, with many types of models proposed to comprehend and predict patterns and trends in data originating from different domains. As these models get more and more complex, it also becomes harder for users to assess and trust their results, since their internal operations are mostly hidden in black boxes. The interpretation of machine learning models is currently a hot topic in the information visualization community, with results showing that insights from machine learning models can lead to better predictions and improve the trustworthiness of the results. Due to this, multiple (and extensive) survey articles have been published recently trying to summarize the high number of original research papers published on the topic. But there is not always a clear definition of what these surveys cover, what is the overlap between them, which types of machine learning models they deal with, or what exactly is the scenario that the readers will find in each of them. In this article, we present a meta-analysis (i.e., a "survey of surveys") of manually collected survey papers that refer to the visual interpretation of machine learning models, including the papers discussed in the selected surveys. The aim of our article is to serve both as a detailed summary and as a guide through this survey ecosystem by acquiring, cataloging, and presenting fundamental knowledge of the state of the art and research opportunities in the area. Our results confirm the increasing trend of interpreting machine learning with visualizations in the past years, and that visualization can assist in, for example, online training processes of deep learning models and enhancing trust in machine learning. However, the question of exactly how this assistance should take place is still considered an open challenge of the visualization community.
2.
  • Chatzimparmpas, Angelos, 1994-, et al. (author)
  • DeforestVis : Behavior Analysis of Machine Learning Models with Surrogate Decision Stumps
  • 2024
  • In: Computer graphics forum (Print). - : John Wiley & Sons. - 0167-7055 .- 1467-8659.
  • Journal article (peer-reviewed), abstract:
    • As the complexity of Machine Learning (ML) models increases and their application in different (and critical) domains grows, there is a strong demand for more interpretable and trustworthy ML. A direct, model-agnostic, way to interpret such models is to train surrogate models—such as rule sets and decision trees—that sufficiently approximate the original ones while being simpler and easier-to-explain. Yet, rule sets can become very lengthy, with many if-else statements, and decision tree depth grows rapidly when accurately emulating complex ML models. In such cases, both approaches can fail to meet their core goal—providing users with model interpretability. To tackle this, we propose DeforestVis, a visual analytics tool that offers summarization of the behavior of complex ML models by providing surrogate decision stumps (one-level decision trees) generated with the Adaptive Boosting (AdaBoost) technique. DeforestVis helps users to explore the complexity vs fidelity trade-off by incrementally generating more stumps, creating attribute-based explanations with weighted stumps to justify decision making, and analyzing the impact of rule overriding on training instance allocation between one or more stumps. An independent test set allows users to monitor the effectiveness of manual rule changes and form hypotheses based on case-by-case analyses. We show the applicability and usefulness of DeforestVis with two use cases and expert interviews with data analysts and model developers.
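The surrogate idea described in this abstract can be sketched independently of the tool: boost one-level decision trees (AdaBoost-style) against the predictions of a black-box model rather than the ground truth, and watch fidelity change as stumps are added. A minimal plain-Python sketch; the data, the black-box rule, and all names below are illustrative assumptions, not DeforestVis code:

```python
import math, random

random.seed(1)

# Toy data plus an opaque "black box" whose behavior we want to summarize.
X = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(200)]
def black_box(x):                       # stands in for any complex ML model
    return 1 if x[0] + 0.7 * x[1] > 0 else -1
yb = [black_box(x) for x in X]          # surrogate targets: model outputs

def fit_stump(X, y, w):
    """Best one-level tree (feature, threshold, polarity) under weights w."""
    best = None
    for f in range(len(X[0])):
        vals = sorted({x[f] for x in X})
        for t in [(a + b) / 2 for a, b in zip(vals, vals[1:])]:
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (pol if xi[f] > t else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, pol)
    return best

def boost_stumps(X, y, n_stumps):
    """AdaBoost: each new stump focuses on what the ensemble still gets wrong."""
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(n_stumps):
        err, f, t, pol = fit_stump(X, y, w)
        err = min(max(err, 1e-9), 1 - 1e-9)      # clamp to avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, f, t, pol))
        # Re-weight: instances this stump misclassified gain weight.
        w = [wi * math.exp(-alpha * yi * (pol if xi[f] > t else -pol))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def fidelity(ensemble):
    """Agreement between the weighted stumps and the black-box model."""
    hits = 0
    for x, yi in zip(X, yb):
        score = sum(a * (p if x[f] > t else -p) for a, f, t, p in ensemble)
        hits += (1 if score >= 0 else -1) == yi
    return hits / len(X)

print(fidelity(boost_stumps(X, yb, 1)), fidelity(boost_stumps(X, yb, 10)))
```

Adding stumps one at a time typically raises fidelity, which is the complexity-versus-fidelity trade-off the tool lets users explore interactively.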
3.
  • Chatzimparmpas, Angelos, 1994-, et al. (author)
  • Empirical Study : Visual Analytics for Comparing Stacking to Blending Ensemble Learning
  • 2021
  • In: Proceedings of the 23rd International Conference on Control Systems and Computer Science (CSCS23), 26–28 May 2021, Bucharest, Romania. - : IEEE. - 9781665439404 - 9781665439398 ; , s. 1-8
  • Conference paper (other academic/artistic), abstract:
    • Stacked generalization (also called stacking) is an ensemble method in machine learning that uses a metamodel to combine the predictive results of heterogeneous base models arranged in at least one layer. K-fold cross-validation is employed at the various stages of training in this method. Nonetheless, another validation strategy is to try out several splits of data leading to different train and test sets for the base models and then use only the latter to train the metamodel—this is known as blending. In this work, we present a modification of an existing visual analytics system, entitled StackGenVis, that now supports the process of composing robust and diverse ensembles of models with both aforementioned methods. We have built multiple ensembles using our system with the two respective methods, and we tested the performance with six small- to large-sized data sets. The results indicate that stacking is significantly more powerful than blending based on three performance metrics. However, the training times of the base models and the final ensembles are lower and more stable during various train/test splits in blending rather than stacking.
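The two validation strategies compared in this study can be sketched in a few lines: stacking builds the metamodel's training set from k-fold out-of-fold predictions, while blending uses predictions on a single holdout split. A plain-Python sketch with toy data and deliberately simple base models (nearest centroid and a one-feature stump); these stand-ins are illustrative assumptions, not StackGenVis code:

```python
import random

random.seed(7)

# Toy binary task: the label is the sign of x0 + x1 (illustrative data only).
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]
y = [1 if a + b > 0 else 0 for a, b in X]

class Centroid:
    """Toy base model: nearest class centroid."""
    def fit(self, X, y):
        self.c = {}
        for label in (0, 1):
            pts = [x for x, yi in zip(X, y) if yi == label]
            self.c[label] = [sum(col) / len(pts) for col in zip(*pts)]
        return self
    def predict(self, x):
        return min(self.c, key=lambda l: sum((a - b) ** 2
                                             for a, b in zip(x, self.c[l])))

class Stump:
    """Toy base model: threshold on the first feature."""
    def fit(self, X, y):
        self.t = sum(x[0] for x in X) / len(X)
        hits = sum((1 if x[0] > self.t else 0) == yi for x, yi in zip(X, y))
        self.flip = hits < len(X) / 2
        return self
    def predict(self, x):
        p = 1 if x[0] > self.t else 0
        return 1 - p if self.flip else p

BASES = [Centroid, Stump]

def stacking(X, y, k=5):
    """Metamodel trained on k-fold out-of-fold predictions: every row is used."""
    meta_X = [None] * len(X)
    for f in range(k):
        fold = set(range(f, len(X), k))
        tr = [i for i in range(len(X)) if i not in fold]
        models = [M().fit([X[i] for i in tr], [y[i] for i in tr]) for M in BASES]
        for i in fold:
            meta_X[i] = [m.predict(X[i]) for m in models]
    meta = Centroid().fit(meta_X, y)
    final = [M().fit(X, y) for M in BASES]       # base models refit on all data
    return lambda x: meta.predict([m.predict(x) for m in final])

def blending(X, y, holdout=0.3):
    """Metamodel trained only on holdout predictions: cheaper, fewer meta rows."""
    cut = int(len(X) * (1 - holdout))
    models = [M().fit(X[:cut], y[:cut]) for M in BASES]
    meta = Centroid().fit([[m.predict(x) for m in models] for x in X[cut:]],
                          y[cut:])
    return lambda x: meta.predict([m.predict(x) for m in models])

acc = lambda pred: sum(pred(x) == yi for x, yi in zip(X, y)) / len(X)
print(acc(stacking(X, y)), acc(blending(X, y)))
```

The structural difference is visible in the two functions: stacking trains the metamodel on all rows but pays for k rounds of base-model training, while blending trains each base model once and gives the metamodel only the holdout rows.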
4.
  • Chatzimparmpas, Angelos, 1994-, et al. (author)
  • FeatureEnVi : Visual Analytics for Feature Engineering Using Stepwise Selection and Semi-Automatic Extraction Approaches
  • 2022
  • In: IEEE Transactions on Visualization and Computer Graphics. - : IEEE. - 1077-2626 .- 1941-0506. ; 28:4, s. 1773-1791
  • Journal article (peer-reviewed), abstract:
    • The machine learning (ML) life cycle involves a series of iterative steps, from the effective gathering and preparation of the data—including complex feature engineering processes—to the presentation and improvement of results, with various algorithms to choose from in every step. Feature engineering in particular can be very beneficial for ML, leading to numerous improvements such as boosting the predictive results, decreasing computational times, reducing excessive noise, and increasing the transparency behind the decisions taken during the training. Despite that, while several visual analytics tools exist to monitor and control the different stages of the ML life cycle (especially those related to data and algorithms), feature engineering support remains inadequate. In this paper, we present FeatureEnVi, a visual analytics system specifically designed to assist with the feature engineering process. Our proposed system helps users to choose the most important feature, to transform the original features into powerful alternatives, and to experiment with different feature generation combinations. Additionally, data space slicing allows users to explore the impact of features on both local and global scales. FeatureEnVi utilizes multiple automatic feature selection techniques; furthermore, it visually guides users with statistical evidence about the influence of each feature (or subsets of features). The final outcome is the extraction of heavily engineered features, evaluated by multiple validation metrics. The usefulness and applicability of FeatureEnVi are demonstrated with two use cases and a case study. We also report feedback from interviews with two ML experts and a visualization researcher who assessed the effectiveness of our system.
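The value of feature transformation described in this abstract can be seen even with a crude stand-in for the system's statistical guidance: rank candidate features, raw and transformed, by absolute Pearson correlation with the target. A plain-Python sketch on synthetic data; the data-generating rule and all names are illustrative assumptions, not FeatureEnVi code:

```python
import math, random

random.seed(3)

def pearson(a, b):
    """Plain Pearson correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

# Synthetic data: the target depends on f0 and on the logarithm of f1,
# so raw f1 should be a weaker predictor than its transformed copy.
n = 300
f0 = [random.uniform(0, 1) for _ in range(n)]
f1 = [random.uniform(0.001, 1) for _ in range(n)]
target = [a + math.log(b) + random.gauss(0, 0.1) for a, b in zip(f0, f1)]

features = {"f0": f0, "f1": f1, "log(f1)": [math.log(v) for v in f1]}
ranking = sorted(features, key=lambda k: -abs(pearson(features[k], target)))
print(ranking)
```

Here the log-transformed copy of f1 should outrank the raw feature, which is the kind of statistical evidence the system surfaces to justify a transformation.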
5.
  • Chatzimparmpas, Angelos, 1994-, et al. (author)
  • StackGenVis : Alignment of Data, Algorithms, and Models for Stacking Ensemble Learning Using Performance Metrics
  • 2021
  • In: IEEE Transactions on Visualization and Computer Graphics. - : IEEE Computer Society Digital Library. - 1077-2626 .- 1941-0506. ; 27:2, s. 1547-1557
  • Journal article (peer-reviewed), abstract:
    • In machine learning (ML), ensemble methods—such as bagging, boosting, and stacking—are widely-established approaches that regularly achieve top-notch predictive performance. Stacking (also called "stacked generalization") is an ensemble method that combines heterogeneous base models, arranged in at least one layer, and then employs another metamodel to summarize the predictions of those models. Although it may be a highly-effective approach for increasing the predictive performance of ML, generating a stack of models from scratch can be a cumbersome trial-and-error process. This challenge stems from the enormous space of available solutions, with different sets of data instances and features that could be used for training, several algorithms to choose from, and instantiations of these algorithms using diverse parameters (i.e., models) that perform differently according to various metrics. In this work, we present a knowledge generation model, which supports ensemble learning with the use of visualization, and a visual analytics system for stacked generalization. Our system, StackGenVis, assists users in dynamically adapting performance metrics, managing data instances, selecting the most important features for a given data set, choosing a set of top-performant and diverse algorithms, and measuring the predictive performance. In consequence, our proposed tool helps users to decide between distinct models and to reduce the complexity of the resulting stack by removing overpromising and underperforming models. The applicability and effectiveness of StackGenVis are demonstrated with two use cases: a real-world healthcare data set and a collection of data related to sentiment/stance detection in texts. Finally, the tool has been evaluated through interviews with three ML experts.
6.
  • Chatzimparmpas, Angelos, 1994-, et al. (author)
  • t-viSNE : A Visual Inspector for the Exploration of t-SNE
  • 2018
  • In: Presented at IEEE Information Visualization (VIS '18), Berlin, Germany, 21-26 October, 2018.
  • Conference paper (peer-reviewed), abstract:
    • The use of t-Distributed Stochastic Neighbor Embedding (t-SNE) for the visualization of multidimensional data has proven to be a popular approach, with applications published in a wide range of domains. Despite their usefulness, t-SNE plots can sometimes be hard to interpret or even misleading, which hurts the trustworthiness of the results. By opening the black box of the algorithm and showing insights into its behavior through visualization, we may learn how to use it in a more effective way. In this work, we present t-viSNE, a visual inspection tool that enables users to explore anomalies and assess the quality of t-SNE results by bringing forward aspects of the algorithm that would normally be lost after the dimensionality reduction process is finished.
7.
  • Chatzimparmpas, Angelos, 1994-, et al. (author)
  • t-viSNE : Interactive Assessment and Interpretation of t-SNE Projections
  • 2020
  • In: IEEE Transactions on Visualization and Computer Graphics. - : IEEE. - 1077-2626 .- 1941-0506. ; 26:8, s. 2696-2714
  • Journal article (peer-reviewed), abstract:
    • t-Distributed Stochastic Neighbor Embedding (t-SNE) for the visualization of multidimensional data has proven to be a popular approach, with successful applications in a wide range of domains. Despite their usefulness, t-SNE projections can be hard to interpret or even misleading, which hurts the trustworthiness of the results. Understanding the details of t-SNE itself and the reasons behind specific patterns in its output may be a daunting task, especially for non-experts in dimensionality reduction. In this work, we present t-viSNE, an interactive tool for the visual exploration of t-SNE projections that enables analysts to inspect different aspects of their accuracy and meaning, such as the effects of hyper-parameters, distance and neighborhood preservation, densities and costs of specific neighborhoods, and the correlations between dimensions and visual patterns. We propose a coherent, accessible, and well-integrated collection of different views for the visualization of t-SNE projections. The applicability and usability of t-viSNE are demonstrated through hypothetical usage scenarios with real data sets. Finally, we present the results of a user study where the tool’s effectiveness was evaluated. By bringing to light information that would normally be lost after running t-SNE, we hope to support analysts in using t-SNE and making its results better understandable.
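One of the quality aspects mentioned in this abstract, neighborhood preservation, is easy to state concretely: compare each point's k nearest neighbors before and after projection. A minimal plain-Python sketch using mean Jaccard overlap on toy data; this is a generic metric and illustrative data, not t-viSNE's exact computation:

```python
import random

random.seed(5)

def knn(points, k):
    """Index set of the k nearest neighbors (Euclidean) of every point."""
    out = []
    for i, p in enumerate(points):
        d = sorted((sum((a - b) ** 2 for a, b in zip(p, q)), j)
                   for j, q in enumerate(points) if j != i)
        out.append({j for _, j in d[:k]})
    return out

def neighborhood_preservation(high, low, k=7):
    """Mean Jaccard overlap of k-NN sets before and after projection."""
    hi, lo = knn(high, k), knn(low, k)
    return sum(len(a & b) / len(a | b) for a, b in zip(hi, lo)) / len(high)

# Toy check: 10-D points whose structure lives in the first two dimensions.
high = [[random.gauss(0, 1), random.gauss(0, 1)]
        + [random.gauss(0, 0.05) for _ in range(8)] for _ in range(150)]
good = [p[:2] for p in high]          # keeps the informative dimensions
shuffled = good[:]
random.shuffle(shuffled)              # destroys all neighborhood structure
print(neighborhood_preservation(high, good),
      neighborhood_preservation(high, shuffled))
```

A faithful projection scores near 1, while a projection that scrambles the data scores near the chance level of roughly k divided by n.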
8.
  • Chatzimparmpas, Angelos, 1994-, et al. (author)
  • The State of the Art in Enhancing Trust in Machine Learning Models with the Use of Visualizations
  • 2020
  • In: Computer graphics forum (Print). - : John Wiley & Sons. - 0167-7055 .- 1467-8659. ; 39:3, s. 713-756
  • Journal article (peer-reviewed), abstract:
    • Machine learning (ML) models are nowadays used in complex applications in various domains such as medicine, bioinformatics, and other sciences. Due to their black box nature, however, it may sometimes be hard to understand and trust the results they provide. This has increased the demand for reliable visualization tools related to enhancing trust in ML models, which has become a prominent topic of research in the visualization community over the past decades. To provide an overview and present the frontiers of current research on the topic, we present a State-of-the-Art Report (STAR) on enhancing trust in ML models with the use of interactive visualization. We define and describe the background of the topic, introduce a categorization for visualization techniques that aim to accomplish this goal, and discuss insights and opportunities for future research directions. Among our contributions is a categorization of trust against different facets of interactive ML, expanded and improved from previous research. Our results are investigated from different analytical perspectives: (a) providing a statistical overview, (b) summarizing key findings, (c) performing topic analyses, and (d) exploring the data sets used in the individual papers, all with the support of an interactive web-based survey browser. We intend this survey to be beneficial for visualization researchers whose interests involve making ML models more trustworthy, as well as researchers and practitioners from other disciplines in their search for effective visualization techniques suitable for solving their tasks with confidence and conveying meaning to their data.
9.
  • Chatzimparmpas, Angelos, 1994-, et al. (author)
  • VisEvol : Visual Analytics to Support Hyperparameter Search through Evolutionary Optimization
  • 2021
  • In: Computer graphics forum (Print). - : John Wiley & Sons. - 0167-7055 .- 1467-8659. ; 40:3, s. 201-214
  • Journal article (peer-reviewed), abstract:
    • During the training phase of machine learning (ML) models, it is usually necessary to configure several hyperparameters. This process is computationally intensive and requires an extensive search to infer the best hyperparameter set for the given problem. The challenge is exacerbated by the fact that most ML models are complex internally, and training involves trial-and-error processes that could remarkably affect the predictive result. Moreover, each hyperparameter of an ML algorithm is potentially intertwined with the others, and changing it might result in unforeseeable impacts on the remaining hyperparameters. Evolutionary optimization is a promising method to try and address those issues. According to this method, performant models are stored, while the remainder are improved through crossover and mutation processes inspired by genetic algorithms. We present VisEvol, a visual analytics tool that supports interactive exploration of hyperparameters and intervention in this evolutionary procedure. In summary, our proposed tool helps the user to generate new models through evolution and eventually explore powerful hyperparameter combinations in diverse regions of the extensive hyperparameter space. The outcome is a voting ensemble (with equal rights) that boosts the final predictive performance. The utility and applicability of VisEvol are demonstrated with two use cases and interviews with ML experts who evaluated the effectiveness of the tool.
10.
  • Chatzimparmpas, Angelos, 1994-, et al. (author)
  • VisRuler : Visual Analytics for Extracting Decision Rules from Bagged and Boosted Decision Trees
  • 2023
  • In: Information Visualization. - : Sage Publications. - 1473-8716 .- 1473-8724. ; 22:2, s. 115-139
  • Journal article (peer-reviewed), abstract:
    • Bagging and boosting are two popular ensemble methods in machine learning (ML) that produce many individual decision trees. Due to the inherent ensemble characteristic of these methods, they typically outperform single decision trees or other ML models in predictive performance. However, numerous decision paths are generated for each decision tree, increasing the overall complexity of the model and hindering its use in domains that require trustworthy and explainable decisions, such as finance, social care, and health care. Thus, the interpretability of bagging and boosting algorithms—such as random forest and adaptive boosting—reduces as the number of decisions rises. In this paper, we propose a visual analytics tool that aims to assist users in extracting decisions from such ML models via a thorough visual inspection workflow that includes selecting a set of robust and diverse models (originating from different ensemble learning algorithms), choosing important features according to their global contribution, and deciding which decisions are essential for global explanation (or locally, for specific cases). The outcome is a final decision based on the class agreement of several models and the explored manual decisions exported by users. We evaluated the applicability and effectiveness of VisRuler via a use case, a usage scenario, and a user study. The evaluation revealed that most users managed to successfully use our system to explore decision rules visually, performing the proposed tasks and answering the given questions in a satisfying way.
11.
  • Espadoto, Mateus, et al. (author)
  • Toward a Quantitative Survey of Dimension Reduction Techniques
  • 2021
  • In: IEEE Transactions on Visualization and Computer Graphics. - : IEEE. - 1077-2626 .- 1941-0506. ; 27:3, s. 2153-2173
  • Journal article (peer-reviewed), abstract:
    • Dimensionality reduction methods, also known as projections, are frequently used in multidimensional data exploration in machine learning, data science, and information visualization. Tens of such techniques have been proposed, aiming to address a wide set of requirements, such as ability to show the high-dimensional data structure, distance or neighborhood preservation, computational scalability, stability to data noise and/or outliers, and practical ease of use. However, it is far from clear for practitioners how to choose the best technique for a given use context. We present a survey of a wide body of projection techniques that helps answering this question. For this, we characterize the input data space, projection techniques, and the quality of projections, by several quantitative metrics. We sample these three spaces according to these metrics, aiming at good coverage with bounded effort. We describe our measurements and outline observed dependencies of the measured variables. Based on these results, we draw several conclusions that help comparing projection techniques, explain their results for different types of data, and ultimately help practitioners when choosing a projection for a given context. Our methodology, datasets, projection implementations, metrics, visualizations, and results are publicly open, so interested stakeholders can examine and/or extend this benchmark.
12.
  • Kucher, Kostiantyn, et al. (author)
  • Analysis of VINCI 2009–2017 Proceedings
  • 2018
  • In: Proceedings of the 11th International Symposium on Visual Information Communication and Interaction (VINCI '18), 13-15 August 2018, Växjö, Sweden. - New York, NY, USA : Association for Computing Machinery (ACM). - 9781450365017 ; , s. 97-101
  • Conference paper (peer-reviewed), abstract:
    • Both the metadata and the textual contents of scientific publications can provide us with insights about the development and the current state of the corresponding scientific community. In this short paper, we take a look at the proceedings of VINCI from the previous years and conduct several types of analyses. We summarize the yearly statistics about different types of publications, identify the overall authorship statistics and the most prominent contributors, and analyze the current community structure with a co-authorship network. We also apply topic modeling to identify the most prominent topics discussed in the publications. We hope that the results of our work will provide insights for the visualization community and will also be used as an overview for researchers previously unfamiliar with VINCI.
13.
  • Kucher, Kostiantyn, 1989-, et al. (author)
  • Project in Visualization and Data Analysis : Experiences in Designing and Coordinating the Course
  • 2021
  • In: Proceedings of 42nd Annual Conference of the European Association for Computer Graphics (EG '21) — Education Papers. - : Eurographics - European Association for Computer Graphics. - 9783038681328 ; , s. 39-44
  • Conference paper (peer-reviewed), abstract:
    • Visual analytics involves both visual and computational components for empowering human analysts who face the challenges of making sense and making use of large and heterogeneous data sets in various application domains. In order to facilitate the learning process for the students at higher education institutions with regard to both the theoretical knowledge and practical skills in visual analytics, the respective courses must cover a variety of topics and include multiple assessment methods and activities. In this paper, we report on the design and first instantiation of a full term project-based course in visualization and data analysis, which was recently offered to graduate and post-graduate students at our department and met with positive feedback from the course participants.
14.
  • Kucher, Kostiantyn, 1989-, et al. (author)
  • StanceVis Prime : Visual Analysis of Sentiment and Stance in Social Media Texts
  • 2020
  • In: Journal of Visualization. - : Springer. - 1343-8875 .- 1875-8975. ; 23:6, s. 1015-1034
  • Journal article (peer-reviewed), abstract:
    • Text visualization and visual text analytics methods have been successfully applied for various tasks related to the analysis of individual text documents and large document collections such as summarization of main topics or identification of events in discourse. Visualization of sentiments and emotions detected in textual data has also become an important topic of interest, especially with regard to the data originating from social media. Despite the growing interest for this topic, the research problem related to detecting and visualizing various stances, such as rudeness or uncertainty, has not been adequately addressed by existing approaches. The challenges associated with this problem include development of the underlying computational methods and visualization of the corresponding multi-label stance classification results. In this paper, we describe our work on a visual analytics platform, called StanceVis Prime, which has been designed for the analysis of sentiment and stance in temporal text data from various social media data sources. The use case scenarios intended for StanceVis Prime include social media monitoring and research in sociolinguistics. The design was motivated by the requirements of collaborating domain experts in linguistics as part of a larger research project on stance analysis. Our approach involves consuming documents from several text stream sources and applying sentiment and stance classification, resulting in multiple data series associated with source texts. StanceVis Prime provides the end users with an overview of similarities between the data series based on dynamic time warping analysis, as well as detailed visualizations of data series values. Users can also retrieve and conduct both distant and close reading of the documents corresponding to the data series. 
We demonstrate our approach with case studies involving political targets of interest and several social media data sources and report preliminary user feedback received from a domain expert.
15.
  • Martins, Rafael Messias, Dr. 1984-, et al. (author)
  • Efficient Dynamic Time Warping for Big Data Streams
  • 2019
  • In: Proceedings of the IEEE International Conference on Big Data (Big Data '18). - : IEEE. - 9781538650356 - 9781538650363 ; , s. 2924-2929
  • Conference paper (peer-reviewed), abstract:
    • Many common data analysis and machine learning algorithms for time series, such as classification, clustering, or dimensionality reduction, require a distance measurement between pairs of time series in order to determine their similarity. A variety of measures can be found in the literature, each with their own strengths and weaknesses, but the Dynamic Time Warping (DTW) distance measure has occupied an important place since its early applications for the analysis and recognition of spoken word. The main disadvantage of the DTW algorithm is, however, its quadratic time and space complexity, which limits its practical use to relatively small time series. This issue is even more problematic when dealing with streaming time series that are continuously updated, since the analysis must be re-executed regularly and with strict running time constraints. In this paper, we describe enhancements to the DTW algorithm that allow it to be used efficiently in a streaming scenario by supporting an append operation for new time steps with a linear complexity when an exact, error-free DTW is needed, and even better performance when either a Sakoe-Chiba band is used, or when a sliding window is the desired range for the data. Our experiments with one synthetic and four natural data sets have shown that it outperforms other DTW implementations and the potential errors are, in general, much lower than another state-of-the-art approximated DTW technique.
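The append operation described in this abstract follows from how the DTW dynamic program fills its cost matrix row by row: if only the last row is kept, extending the stream by one value costs O(n) instead of recomputing the O(n*m) table. A minimal plain-Python sketch of the exact, full-window case; variable names are illustrative, and the Sakoe-Chiba band and sliding-window variants from the paper are omitted:

```python
import math

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic-programming DTW."""
    prev = [0.0] + [math.inf] * len(b)
    for x in a:
        cur = [math.inf] * (len(b) + 1)
        for j, y in enumerate(b, 1):
            cur[j] = abs(x - y) + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[-1]

class StreamingDTW:
    """Keeps only the last DP row, so each append costs O(len(query))."""
    def __init__(self, query):
        self.q = query
        self.row = [0.0] + [math.inf] * len(query)  # row for the empty stream
    def append(self, value):
        new = [math.inf] * (len(self.q) + 1)
        for j, y in enumerate(self.q, 1):
            new[j] = abs(value - y) + min(self.row[j], new[j - 1],
                                          self.row[j - 1])
        self.row = new
        return new[-1]  # exact DTW(query, stream so far)

query = [0.0, 1.0, 2.0, 1.0]
stream = [0.1, 0.9, 1.0, 2.1, 0.8]
s = StreamingDTW(query)
for t, v in enumerate(stream, 1):
    # each append agrees with a full quadratic recomputation
    assert abs(s.append(v) - dtw_distance(query, stream[:t])) < 1e-9
```

Each append returns the exact DTW distance between the query and the stream so far, matching a full recomputation at a fraction of the cost.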
16.
  • Mohseni, Zeynab (Artemis) (author)
  • Development of Visual Learning Analytic Tools to Explore Performance and Engagement of Students in Primary, Secondary, and Higher Education
  • 2024
  • Doctoral thesis (other academic/artistic), abstract:
    • Schools and educational institutions collect large amounts of data about students and their learning, including text, grades, quizzes, timestamps, and other activities. However, in primary and secondary education, this data is often dispersed across different digital platforms, lacking standardized methods for collection, processing, analysis, and presentation. These issues hinder teachers and students from making informed decisions or strategic and effective use of data. This presents a significant obstacle to progress in education and the effective development of Educational Technology (EdTech) products. Visual Learning Analytics (VLA) tools, also known as Learning Analytics Dashboards (LADs), are designed to visualize student data to support pedagogical decision-making. Despite their potential, the effectiveness of these tools remains limited. Addressing these challenges requires both technical solutions and thoughtful design considerations, as explored in Papers 1 through 5 of this thesis. Paper 1 examines the design aspects of VLA tools by evaluating higher education data and various visualization and Machine Learning (ML) techniques. Paper 2 provides broader insights into the VLA landscape through a systematic review, mapping key concepts and research gaps in VLA and emphasizing the potential of VLA tools to enhance pedagogical decisions and learning outcomes. Meanwhile, Paper 3 delves into a technical solution (data pipeline and data standard) considering a secure Swedish warehouse, SUNET. This includes a data standard for integrating educational data into SUNET, along with customized scripts to reformat, merge, and hash multiple student datasets. Papers 4 and 5 focus on design aspects, with Paper 4 discussing the proposed Human-Centered Design (HCD) approach involving teachers in co-designing a simple VLA tool. 
Paper 5 introduces a scenario-based framework for Multiple Learning Analytics Dashboards (MLADs) development, stressing user engagement for tailored LADs that facilitate informed decision-making in education. The dissertation offers a comprehensive approach to advancing VLA tools, integrating technical solutions with user-centric design principles. By addressing data integration challenges and involving users in tool development, these efforts aim to empower teachers in leveraging educational data for improved teaching and learning experiences.
17.
  • Neves, Tácito Trindade de Araújo Tiburtino, et al. (author)
  • Fast and Reliable Incremental Dimensionality Reduction for Streaming Data
  • 2022
  • In: Computers & graphics. - : Elsevier. - 0097-8493 .- 1873-7684. ; 102, s. 233-244
  • Journal article (peer-reviewed), abstract:
    • Streaming data applications are becoming more common due to the ability of different information sources to continuously capture or produce data, such as sensors and social media. Although there are recent advances, most visualization approaches, particularly Dimensionality Reduction (DR) techniques, cannot be directly applied in such scenarios due to the transient nature of streaming data. A few DR methods currently address this limitation using online or incremental strategies, continuously updating the visualization as data is received. Despite their relative success, most impose the need to store and access the data multiple times to produce a complete projection, not being appropriate for streaming where data continuously grow. Others do not impose such requirements but cannot update the position of the data already projected, potentially resulting in visual artifacts. This paper presents Xtreaming, a novel incremental DR technique that continuously updates the visual representation to reflect new emerging structures or patterns without visiting the high-dimensional data more than once. Our tests show that in streaming scenarios where data is not fully stored in memory, Xtreaming is competitive in terms of quality compared to other streaming and incremental techniques while being orders of magnitude faster.
  •  
18.
  • Ulan, Maria, et al. (author)
  • Artifact: Quality Models Inside Out : Interactive Visualization of Software Metrics by Means of Joint Probabilities
  • 2018
  • Other publication (software/multimedia) (peer-reviewed)abstract
    • Assessing software quality, in general, is hard; each metric has a different interpretation, scale, range of values, or measurement method. Combining these metrics automatically is especially difficult, because they measure different aspects of software quality, and creating a single global final quality score limits the evaluation of the specific quality aspects and trade-offs that exist when looking at different metrics. We present a way to visualize multiple aspects of software quality. In general, software quality can be decomposed hierarchically into characteristics, which can be assessed by various direct and indirect metrics. These characteristics are then combined and aggregated to assess the quality of the software system as a whole. We introduce an approach for quality assessment based on joint distributions of metric values. Visualizations of these distributions allow users to explore and compare the quality metrics of software systems and their artifacts, and to detect patterns, correlations, and anomalies. Furthermore, it is possible to identify common properties and flaws, as our visualization approach provides rich interactions for visual queries to the quality models’ multivariate data. We evaluate our approach in two use cases based on: 30 real-world technical documentation projects with 20,000 XML documents, and an open source project written in Java with 1000 classes. Our results show that the proposed approach allows an analyst to detect possible causes of bad or good quality.
  •  
19.
  • Ulan, Maria, et al. (author)
  • Quality Models Inside Out : Interactive Visualization of Software Metrics by Means of Joint Probabilities
  • 2018
  • In: Proceedings of the 2018 Sixth IEEE Working Conference on Software Visualization, (VISSOFT), Madrid, Spain, 2018. - : IEEE. - 9781538682920 - 9781538682937 ; , s. 65-75
  • Conference paper (peer-reviewed)abstract
    • Assessing software quality, in general, is hard; each metric has a different interpretation, scale, range of values, or measurement method. Combining these metrics automatically is especially difficult, because they measure different aspects of software quality, and creating a single global final quality score limits the evaluation of the specific quality aspects and trade-offs that exist when looking at different metrics. We present a way to visualize multiple aspects of software quality. In general, software quality can be decomposed hierarchically into characteristics, which can be assessed by various direct and indirect metrics. These characteristics are then combined and aggregated to assess the quality of the software system as a whole. We introduce an approach for quality assessment based on joint distributions of metric values. Visualizations of these distributions allow users to explore and compare the quality metrics of software systems and their artifacts, and to detect patterns, correlations, and anomalies. Furthermore, it is possible to identify common properties and flaws, as our visualization approach provides rich interactions for visual queries to the quality models’ multivariate data. We evaluate our approach in two use cases based on: 30 real-world technical documentation projects with 20,000 XML documents, and an open source project written in Java with 1000 classes. Our results show that the proposed approach allows an analyst to detect possible causes of bad or good quality.
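The core idea of scoring quality through joint distributions of metric values can be sketched with an empirical joint probability. The metric names, their distributions, and the "at least as good as" reading below are illustrative assumptions, not the paper's exact model:

```python
import numpy as np

rng = np.random.default_rng(1)
# hypothetical metric values for 1000 software artifacts:
# lower complexity and higher test coverage are "better"
complexity = rng.gamma(2.0, 2.0, size=1000)
coverage = rng.beta(5.0, 2.0, size=1000)

def joint_score(cx, cov):
    """Empirical P(complexity <= cx AND coverage >= cov): the fraction of
    artifacts that this artifact is at least as good as on both metrics."""
    return float(np.mean((complexity <= cx) & (coverage >= cov)))

scores = np.array([joint_score(c, v) for c, v in zip(complexity, coverage)])
print(scores.min() >= 0.0 and scores.max() <= 1.0)  # True
```

Because every score is a probability, metrics with very different scales and units become directly comparable, which is one motivation for probability-based aggregation.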
  •  
20.
  • Ventocilla, Elio, 1984-, et al. (author)
  • Scaling the Growing Neural Gas for Visual Cluster Analysis
  • 2021
  • In: Big Data Research. - : Elsevier. - 2214-5796 .- 2214-580X. ; 26
  • Journal article (peer-reviewed)abstract
    • The growing neural gas (GNG) is an unsupervised topology learning algorithm that models a data space through interconnected units that stand on the populated areas of that space. Its output is a graph that can be visually represented on a two-dimensional plane and be used as a means to disclose cluster patterns in datasets. GNG, however, creates highly connected graphs when trained on high-dimensional data, which in turn leads to highly cluttered representations that fail to disclose any meaningful patterns. Moreover, its sequential learning limits its potential for faster executions on local datasets and, more importantly, its potential for training on distributed datasets while leveraging the computational resources of the infrastructures in which they reside. This paper presents two methods that improve GNG for the visualization of cluster patterns in large and high-dimensional datasets. The first one focuses on providing more meaningful and accurate cluster pattern representations of high-dimensional datasets, by avoiding connections that lead to high-dimensional graphs in the modeled topology, which may, in turn, lead to visual cluttering in 2D representations. The second method presented in this paper enables the use of GNG on big and distributed datasets with faster execution times, by modeling and merging separate parts of a dataset using the MapReduce model. Quantitative and qualitative evaluations show that the first method leads to the creation of lower-dimensional graph structures, which in turn provide more accurate and meaningful cluster representations; and that the second method preserves the accuracy and meaning of the cluster representations while enabling its execution in distributed settings.
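For intuition about how GNG models a data space through interconnected units, the core adaptation step can be sketched as follows. This is a deliberately reduced illustration with a fixed number of units and simple edge aging; real GNG also inserts new units in high-error regions and prunes isolated ones, which is omitted here:

```python
import numpy as np

rng = np.random.default_rng(5)

units = rng.normal(size=(20, 3))        # unit positions in data space
edges = {}                              # (i, j) with i < j -> edge age

def adapt(x, eps_b=0.2, eps_n=0.006, max_age=50):
    d = np.linalg.norm(units - x, axis=1)
    s1, s2 = np.argsort(d)[:2]          # the two units nearest to the sample
    edges[(min(s1, s2), max(s1, s2))] = 0   # connect winners, reset age
    units[s1] += eps_b * (x - units[s1])    # move the winner toward the sample
    for (i, j) in list(edges):
        if s1 in (i, j):
            edges[(i, j)] += 1          # age edges incident to the winner
            n = j if i == s1 else i
            units[n] += eps_n * (x - units[n])  # drag topological neighbors
            if edges[(i, j)] > max_age:
                del edges[(i, j)]       # drop stale edges

for _ in range(500):                    # sequential, one sample at a time
    adapt(rng.normal(size=3))
print(len(edges) > 0)  # True
```

The strictly sequential sample-by-sample loop is also visible here, which is the property the abstract identifies as the obstacle to distributed training.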
  •  
21.
  • Witschard, Daniel, et al. (author)
  • A Statement Report on the Use of Multiple Embeddings for Visual Analytics of Multivariate Networks
  • 2021
  • In: Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2021) - Volume 3: IVAPP. - : SciTePress. - 9789897584886 ; , s. 219-223
  • Conference paper (peer-reviewed)abstract
    • The visualization of large multivariate networks (MVNs) continues to be a great challenge and will probably remain so for the foreseeable future. The field of Multivariate Network Embedding seeks to meet this challenge by providing MVN-specific embedding technologies that target different properties such as network topology or attribute values of nodes or links. Although many steps forward have been taken, the goal of efficiently embedding all aspects of an MVN remains distant. This position paper contrasts the current trend of finding new ways of jointly embedding several properties with the alternative strategy of instead using, and combining, already existing state-of-the-art single-scope embedding technologies. From this comparison, we argue that the latter strategy provides a more generic and flexible approach with several advantages. Hence, we hope to convince the visual analytics community to invest more work in resolving some of the key issues that would make this methodology possible.
  •  
22.
  • Witschard, Daniel, et al. (author)
  • Interactive Optimization of Embedding-based Text Similarity Calculations
  • 2022
  • In: Information Visualization. - : Sage Publications. - 1473-8716 .- 1473-8724. ; 21:4, s. 335-353
  • Journal article (peer-reviewed)abstract
    • Comparing text documents is an essential task for a variety of applications within diverse research fields, and several different methods have been developed for this. However, calculating text similarity is an ambiguous and context-dependent task, so many open challenges still exist. In this paper, we present a novel method for text similarity calculations based on the combination of embedding technology and ensemble methods. By using several embeddings, instead of only one, we show that it is possible to achieve higher quality, which in turn is a key factor for developing high-performing applications for text similarity exploitation. We also provide a prototype visual analytics tool that helps the analyst to find optimally performing ensembles and gain insights into the inner workings of the similarity calculations. Furthermore, we discuss the generalizability of our key ideas to fields beyond the scope of text analysis.
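The key idea above, combining several embeddings instead of relying on one, can be sketched as a weighted ensemble of per-embedding cosine similarities. The embedding names, vectors, and weights below are hypothetical placeholders, not the paper's actual configuration:

```python
import numpy as np

def cosine(u, v):
    # cosine similarity between two vectors
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def ensemble_similarity(doc_a, doc_b, weights):
    """doc_a, doc_b: dicts mapping embedding-model name -> vector.
    weights: dict mapping the same names -> non-negative weight.
    Returns the weighted average of per-embedding cosine similarities."""
    total = sum(weights.values())
    return sum(w * cosine(doc_a[name], doc_b[name])
               for name, w in weights.items()) / total

# two documents, each represented by two different (hypothetical) embeddings
a = {"tfidf": np.array([1.0, 0.0, 1.0]), "sbert": np.array([0.2, 0.9])}
b = {"tfidf": np.array([1.0, 1.0, 0.0]), "sbert": np.array([0.1, 1.0])}
sim = ensemble_similarity(a, b, {"tfidf": 0.4, "sbert": 0.6})
print(0.0 <= sim <= 1.0)  # True
```

Tuning the weights per corpus is where an interactive tool such as the one described can help the analyst search for a well-performing ensemble.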
  •  
23.
  • Witschard, Daniel, et al. (author)
  • Multiple Embeddings for Multivariate Network Analysis
  • 2020
  • In: 6th annual Big Data Conference at Linnaeus University, in Växjö, Sweden.
  • Conference paper (other academic/artistic)abstract
    • The visualization and visual analytics of large multivariate networks (MVNs) continues to be a great challenge and will probably remain so for the foreseeable future. The field of Multivariate Network Embedding seeks to meet this challenge by providing MVN-specific embedding technologies that target different properties such as network topology or attribute values of nodes or links. Embeddings are relatively low-dimensional vector representations of the embedded items, and they are well suited for similarity calculations. Although many steps forward have been taken, the goal of efficiently embedding all aspects of an MVN remains distant. As a possible way forward, we suggest a new angle of approach where, instead of trying to fit all aspects of an MVN into one embedding, the strategy would be to embed each property by itself and then find ways to combine these sets of embeddings.
  •  
24.
  • Buljan, Matej, et al. (author)
  • An Investigation on the Impact of Non-Uniform Random Sampling Techniques for t-SNE
  • 2020
  • In: 2020 Swedish Workshop on Data Science, SweDS 2020. - : Institute of Electrical and Electronics Engineers Inc.. - 9781728192048 - 9781728192055
  • Conference paper (peer-reviewed)abstract
    • t-Distributed Stochastic Neighbor Embedding (t-SNE) is a dimensionality reduction technique that has gained much popularity for its increased capability of creating low-dimensional embeddings that preserve well-separated clusters from high-dimensional spaces. Despite its strengths, the running times for t-SNE are usually high and do not scale well with the size of datasets, which limits its applicability to scenarios that involve, for example, Big Data and interactive visualization. Downsampling the dataset into more manageable sizes is a possible straightforward workaround, but it is not clear from the literature how much the quality of the embedding suffers from the downsampling, and whether uniform random sampling is indeed the best possible solution. In this paper, we report on a thorough series of experiments performed to draw conclusions about the quality of embeddings from running t-SNE on samples of data using different sampling techniques: Uniform random sampling, random walk sampling, our proposed affinity-based random walk sampling, and the so-called hubness sampling. Throughout our testing, the affinity-based variant of random walk sampling distinguished itself as a promising alternative to uniform random sampling.
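A minimal sketch of two of the sampling families compared above, uniform random sampling versus random-walk sampling on a kNN graph, might look as follows. The walk length and restart policy are illustrative choices, not the paper's exact parameters:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))            # toy high-dimensional dataset

def knn_graph(X, k=10):
    # brute-force k-nearest-neighbor indices for every point
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]

def random_walk_sample(neighbors, n_samples, walk_len=5):
    """Collect distinct nodes visited by short random walks restarted
    from uniformly chosen seeds, until n_samples nodes are gathered."""
    seen = set()
    n = neighbors.shape[0]
    while len(seen) < n_samples:
        node = int(rng.integers(n))      # restart from a random seed
        for _ in range(walk_len):
            seen.add(node)
            node = int(rng.choice(neighbors[node]))  # hop to a neighbor
    return np.array(sorted(seen)[:n_samples])

uniform = rng.choice(len(X), size=50, replace=False)   # uniform baseline
walk = random_walk_sample(knn_graph(X), 50)
print(len(set(walk)) == 50)  # True
```

Either index set would then be fed to t-SNE in place of the full dataset; the paper's question is how much embedding quality each choice sacrifices.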
  •  
25.
  • Coimbra, Danilo B., et al. (author)
  • Analyzing the quality of local and global multidimensional projections using performance evaluation planning
  • 2021
  • In: Theoretical Computer Science. - : Elsevier. - 0304-3975 .- 1879-2294. ; 872, s. 41-54
  • Journal article (peer-reviewed)abstract
    • Among the challenges of the big data era, the analysis of high-dimensional data is still an open research area. As a result, several multidimensional projection techniques have been developed to reduce data dimensionality, becoming important visualization and visual analytics tools. In order to ensure the quality of projections, it is necessary to assess the low-dimensional embeddings by using different dataset configurations as input and analyzing evaluation metrics. However, it is not clear to the user how factors such as the number of dimensions, instances, or clusters, can affect the projection mapping and its quality regarding different projection techniques and assessment metrics. The research reported in this paper aims to clarify how much these factors affect each response variable via performance evaluation planning. We present an evaluation approach, supported by factorial design, that carries out a complete analysis, in the sense of measuring all possible combinations of all the input factors. The results of the analyses of local and global structure preservation in the projections yield a better understanding of how distinct dataset properties can influence the choice of projections based on quality metrics results.
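A full-factorial evaluation plan of the kind described, measuring all possible combinations of the input factors, can be enumerated directly. The factor names and levels below are hypothetical examples, not the paper's actual design:

```python
from itertools import product

# hypothetical dataset factors and their levels
factors = {
    "n_dimensions": [10, 50, 100],
    "n_instances": [500, 2000],
    "n_clusters": [2, 5],
}

# one experiment run per combination of factor levels (full factorial)
runs = [dict(zip(factors, combo)) for combo in product(*factors.values())]
print(len(runs))  # 12 = 3 * 2 * 2
```

Each run would then be projected and scored with the chosen quality metrics, so the effect of every factor on every response variable can be analyzed.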
  •  
26.
  • Coimbra, Danilo B., et al. (author)
  • Explaining three-dimensional dimensionality reduction plots
  • 2016
  • In: Information Visualization. - : Sage Publications. - 1473-8716 .- 1473-8724. ; 15:2, s. 154-172
  • Journal article (peer-reviewed)abstract
    • Understanding three-dimensional projections created by dimensionality reduction from high-variate datasets is very challenging. In particular, classical three-dimensional scatterplots used to display such projections do not explicitly show the relations between the projected points, the viewpoint used to visualize the projection, and the original data variables. To explore and explain such relations, we propose a set of interactive visualization techniques. First, we adapt and enhance biplots to show the data variables in the projected three-dimensional space. Next, we use a set of interactive bar chart legends to show variables that are visible from a given viewpoint and also assist users to select an optimal viewpoint to examine a desired set of variables. Finally, we propose an interactive viewpoint legend that provides an overview of the information visible in a given three-dimensional projection from all possible viewpoints. Our techniques are simple to implement and can be applied to any dimensionality reduction technique. We demonstrate our techniques on the exploration of several real-world high-dimensional datasets.
  •  
27.
  • Felizardo, Katia Romero, et al. (author)
  • Content based visual mining of document collections using ontologies
  • 2009
  • In: II Workshop on Web and Text Intelligence (WTI) 2009.
  • Conference paper (peer-reviewed)abstract
    • Document collections are important data sets in many applications. It has been shown that content-based visual mappings of documents can be done effectively through projection and point placement strategies. An important step in this process is the creation of a vector space model, in which weighted terms selected from the text are used as attributes of the vector space. In many cases, that step impairs the quality of the projection due to the existence, in the data set, of many terms that are frequent but do not represent important concepts in the user's particular context. This paper proposes and evaluates the use of ontologies for content-based visual analysis of textual data sets as a means to improve the displays for the analysis of the collection. The results show that when the ontology effectively represents the data domain, it increases the quality of the maps.
  •  
28.
  • Felizardo, Katia Romero, et al. (author)
  • Using visual text mining to support the study selection activity in systematic literature reviews
  • 2011
  • In: 2011 Fifth International Symposium on Empirical Software Engineering and Measurement, ESEM 2011. - Washington : IEEE. - 9781457722035 - 9780769546049 ; , s. 77-86
  • Conference paper (peer-reviewed)abstract
    • Background: A systematic literature review (SLR) is a methodology used to aggregate all relevant existing evidence to answer a research question of interest. Although crucial, the process used to select primary studies can be arduous, time consuming, and must often be conducted manually. Objective: We propose a novel approach, known as 'Systematic Literature Review based on Visual Text Mining' or simply SLR-VTM, to support the primary study selection activity using visual text mining (VTM) techniques. Method: We conducted a case study to compare the performance and effectiveness of four doctoral students in selecting primary studies manually and using the SLR-VTM approach. To enable the comparison, we also developed a VTM tool that implemented our approach. We hypothesized that students using SLR-VTM would present improved selection performance and effectiveness. Results: Our results show that incorporating VTM in the SLR study selection activity reduced the time spent in this activity and also increased the number of studies correctly included. Conclusions: Our pilot case study presents promising results suggesting that the use of VTM may indeed be beneficial during the study selection activity when performing an SLR.
  •  
29.
  • Felizardo, Katia Romero, et al. (author)
  • Visual Text Mining : Ensuring the Presence of Relevant Studies in Systematic Literature Reviews
  • 2015
  • In: International journal of software engineering and knowledge engineering. - : World Scientific. - 0218-1940. ; 25:5, s. 909-928
  • Journal article (peer-reviewed)abstract
    • One of the activities associated with the Systematic Literature Review (SLR) process is the selection review of primary studies. When the researcher faces large volumes of primary studies to be analyzed, the process used to select studies can be arduous. In a previous experiment, we conducted a pilot test to compare the performance and accuracy of PhD students in conducting the selection review activity manually and using Visual Text Mining (VTM) techniques. The goal of this paper is to describe a replication study involving PhD and Master students. The replication study uses the same experimental design and materials of the original experiment. This study also aims to investigate whether the researcher's level of experience with conducting SLRs and research in general impacts the outcome of the primary study selection step of the SLR process. The replication results have confirmed the outcomes of the original experiment, i.e., VTM is promising and can improve the performance of the selection review of primary studies. We also observed that both accuracy and performance increase as a function of the researcher's experience level in conducting SLRs. The use of VTM can indeed be beneficial during the selection review activity.
  •  
30.
  • Hilasaca, Gladys M., et al. (author)
  • A Grid-based Method for Removing Overlaps of Dimensionality Reduction Scatterplot Layouts
  • 2024
  • In: IEEE Transactions on Visualization and Computer Graphics. - : IEEE. - 1077-2626 .- 1941-0506. ; 30:8, s. 5733-5749
  • Journal article (peer-reviewed)abstract
    • Dimensionality Reduction (DR) scatterplot layouts have become a ubiquitous visualization tool for analyzing multidimensional datasets. Despite their popularity, such scatterplots suffer from occlusion, especially when informative glyphs are used to represent data instances, potentially obfuscating critical information for the analysis under execution. Different strategies have been devised to address this issue, either producing overlap-free layouts that lack the powerful capabilities of contemporary DR techniques in uncovering interesting data patterns or eliminating overlaps as a post-processing strategy. Despite the good results of post-processing techniques, most of the best methods typically expand or distort the scatterplot area, thus reducing glyphs’ size (sometimes) to unreadable dimensions, defeating the purpose of removing overlaps. This article presents Distance Grid (DGrid), a novel post-processing strategy to remove overlaps from DR layouts that faithfully preserves the original layout's characteristics and bounds the minimum glyph sizes. We show that DGrid surpasses the state-of-the-art in overlap removal (through an extensive comparative evaluation considering multiple different metrics) while also being one of the fastest techniques, especially for large datasets. A user study with 51 participants also shows that DGrid is consistently ranked among the top techniques for preserving the original scatterplots’ visual characteristics and the aesthetics of the final results.
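The flavor of grid-based overlap removal can be conveyed with a much-simplified sketch that sorts points into rows by y and into columns by x, guaranteeing one glyph per cell. The actual DGrid uses a recursive median split and distance-preservation machinery not reproduced here:

```python
import math
import numpy as np

def grid_layout(points):
    """Assign each 2-D point a unique (row, col) grid cell:
    sort by y to form rows, then by x within each row.
    A simplified, non-recursive flavor of overlap-removal gridding."""
    n = len(points)
    rows = math.ceil(math.sqrt(n))
    cols = math.ceil(n / rows)
    order_y = np.argsort(points[:, 1])          # split into rows by y
    cells = np.empty((n, 2), dtype=int)
    for r in range(rows):
        chunk = order_y[r * cols:(r + 1) * cols]
        for c, idx in enumerate(chunk[np.argsort(points[chunk, 0])]):
            cells[idx] = (r, c)                 # order within the row by x
    return cells

pts = np.random.default_rng(3).normal(size=(37, 2))
cells = grid_layout(pts)
print(len({tuple(c) for c in cells}) == 37)  # all cells unique: no overlap
```

Because every point gets its own cell, glyphs drawn at cell centers can never overlap, at the cost of some distortion of the original scatterplot geometry.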
  •  
31.
  • Laitinen, Mikko, 1973-, et al. (author)
  • The Nordic Tweet Stream : A Dynamic Real-Time Monitor Corpus of Big and Rich Language Data
  • 2018
  • In: DHN 2018 Digital Humanities in the Nordic Countries 3rd Conference. - : CEUR-WS.org. ; , s. 349-362
  • Conference paper (peer-reviewed)abstract
    • This article presents the Nordic Tweet Stream (NTS), a cross-disciplinary corpus project of computer scientists and a group of sociolinguists interested in language variability and in the global spread of English. Our research integrates two types of empirical data: We not only rely on traditional structured corpus data but also use unstructured data sources that are often big and rich in metadata, such as Twitter streams. The NTS downloads tweets and associated metadata from Denmark, Finland, Iceland, Norway and Sweden. We first introduce some technical aspects in creating a dynamic real-time monitor corpus, and the following case study illustrates how the corpus could be used as empirical evidence in sociolinguistic studies focusing on the global spread of English to multilingual settings. The results show that English is the most frequently used language, accounting for almost a third. These results can be used to assess how widespread English use is in the Nordic region and offer a big data perspective that complements previous small-scale studies. The future objectives include annotating the material, making it available for the scholarly community, and expanding the geographic scope of the data stream outside the Nordic region.
  •  
32.
  • Martins, Rafael Messias, Dr. 1984-, et al. (author)
  • Explaining Neighborhood Preservation for Multidimensional Projections
  • 2015
  • In: EG UK Computer Graphics & Visual Computing (2015). - : Eurographics - European Association for Computer Graphics. ; , s. 7-14
  • Conference paper (peer-reviewed)abstract
    • Dimensionality reduction techniques are the tools of choice for exploring high-dimensional datasets by means of low-dimensional projections. However, even state-of-the-art projection methods fail, to varying degrees, to perfectly preserve the structure of the data, expressed in terms of inter-point distances and point neighborhoods. To support better interpretation of a projection, we propose several metrics for quantifying errors related to neighborhood preservation. Next, we propose a number of visualizations that allow users to explore and explain the quality of neighborhood preservation at different scales, captured by the aforementioned error metrics. We demonstrate our exploratory views on three real-world datasets and two state-of-the-art multidimensional projection techniques.
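One simple metric in the spirit of those proposed, quantifying how well each point's high-dimensional neighborhood survives projection, is the Jaccard overlap of k-nearest-neighbor sets. This is an illustrative choice, not necessarily one of the paper's metrics:

```python
import numpy as np

def knn_sets(X, k):
    # brute-force k-nearest-neighbor index sets for every point
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return [set(row) for row in np.argsort(d, axis=1)[:, :k]]

def neighborhood_preservation(high, low, k=7):
    """Per-point Jaccard overlap between the k nearest neighbors in the
    original space and in the projection (1 = perfectly preserved)."""
    H, L = knn_sets(high, k), knn_sets(low, k)
    return np.array([len(h & l) / len(h | l) for h, l in zip(H, L)])

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 10))
P = X[:, :2]                      # a naive "projection": drop 8 dimensions
scores = neighborhood_preservation(X, P)
print(scores.shape == (100,) and 0.0 <= scores.mean() <= 1.0)  # True
```

Mapping such per-point scores to color in the projected scatterplot is one way to visualize where neighborhood errors concentrate, at the scale chosen by k.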
  •  
33.
  • Martins, Rafael Messias, Dr. 1984- (author)
  • Explanatory visualization of multidimensional projections
  • 2016
  • Doctoral thesis (other academic/artistic)abstract
    • One way of getting insight into large data collections (known nowadays under the name of ‘big data’) is by depicting them visually and next interactively exploring the resulting visualizations. However, both the number of data points or measurements, and the number of dimensions describing each measurement, can be very large – much like a data table can have many rows and columns. Visualizing such so-called high-dimensional datasets is very challenging. One way to do this is to construct low (two or three) dimensional depictions of the data, and find patterns of interest in these depictions rather than in the original high-dimensional data. Techniques that perform this, called projections, have several advantages – they are visually scalable, work well with noisy data, and are fast to compute. However, a major limitation they have is that they generate hard-to-interpret images for the average user. We approach this problem in this thesis from several angles – by showing where errors appear in the projection, and by explaining projections in terms of the original high dimensions both locally and globally. Our proposed mechanisms are simple to learn, computationally scalable, and easy to add to any data exploration pipeline using any type of projection. We demonstrate and validate our proposals on several applications using data from measurements, scientific simulations, software engineering, and networks.
  •  
34.
  • Martins, Rafael Messias, Dr. 1984-, et al. (author)
  • Multidimensional Projections for Visual Analysis of Social Networks
  • 2012
  • In: Journal of Computer Science and Technology. - : Springer. - 1000-9000 .- 1860-4749. ; 27:4, s. 791-810
  • Journal article (peer-reviewed)abstract
    • Visual analysis of social networks is usually based on graph drawing algorithms and tools. However, social networks are a special kind of graph in the sense that interpretation of displayed relationships is heavily dependent on context. Context, in its turn, is given by attributes associated with graph elements, such as individual nodes, edges, and groups of edges, as well as by the nature of the connections between individuals. In most systems, attributes of individuals and communities are not taken into consideration during graph layout, except to derive weights for force-based placement strategies. This paper proposes a set of novel tools for displaying and exploring social networks based on attribute and connectivity mappings. These properties are employed to lay out nodes on the plane via multidimensional projection techniques. For the attribute mapping, we show that node proximity in the layout corresponds to similarity in attributes, making it easy to locate similar groups of nodes. The projection based on connectivity yields an initial placement that forgoes force-based or graph analysis algorithms, reaching a meaningful layout in one pass. When a force algorithm is then applied to this initial mapping, the final layout presents better properties than conventional force-based approaches. Numerical evaluations show a number of advantages of pre-mapping points via projections. User evaluation demonstrates that these tools promote ease of manipulation as well as fast identification of concepts and associations which cannot be easily expressed by conventional graph visualization alone. In order to allow better space usage for complex networks, a graph mapping on the surface of a sphere is also implemented.
  •  
35.
  • Martins, Rafael Messias, Dr. 1984-, et al. (author)
  • Visual Analysis of Dimensionality Reduction Quality for Parameterized Projections
  • 2014
  • In: Computers & graphics. - : Elsevier. - 0097-8493 .- 1873-7684. ; 41, s. 26-42
  • Journal article (peer-reviewed)abstract
    • In recent years, many dimensionality reduction (DR) algorithms have been proposed for visual analysis of multidimensional data. Given a set of n-dimensional observations, such algorithms create a 2D or 3D projection thereof that preserves relative distances or neighborhoods. The quality of resulting projections is strongly influenced by many choices, such as the DR techniques used and their various parameter settings. Users find it challenging to judge the effectiveness of a projection in maintaining features from the original space and to understand the effect of parameter settings on these results, as well as performing related tasks such as comparing two projections. We present a set of interactive visualizations that aim to help users with these tasks by revealing the quality of a projection and thus allowing inspection of parameter choices for DR algorithms, by observing the effects of these choices on the resulting projection. Our visualizations target questions regarding neighborhoods, such as finding false and missing neighbors and showing how such projection errors depend on algorithm or parameter choices. By using several space-filling techniques, our visualizations scale to large datasets. We apply our visualizations on several recent DR techniques and high-dimensional datasets, showing how they easily offer local detail on point and group neighborhood preservation while relieving users from having to understand technical details of projections.
  •  
36.
  • Martins, Rafael Messias, Dr. 1984-, et al. (author)
  • Visual Learning Analytics of Multidimensional Student Behavior in Self-regulated Learning
  • 2019
  • In: Transforming Learning with Meaningful Technologies. - Cham : Springer. - 9783030297350 - 9783030297367 ; , s. 737-741
  • Conference paper (peer-reviewed)abstract
    • In Self-Regulated Learning (SRL), the lack of a predefined, formal learning trajectory makes it more challenging to assess students’ progress (e.g. by comparing it to specific baselines) and to offer relevant feedback and scaffolding when appropriate. In this paper we describe a Visual Learning Analytics (VLA) solution for exploring students’ datasets collected in a Web-Based Learning Environment (WBLE). We employ mining techniques for the analysis of multidimensional data, such as t-SNE and clustering, in an exploratory study for identifying patterns of students with similar study behavior and interests. An example use case is presented as evidence of the effectiveness of our proposed method, with a dataset of learning behaviors of 6423 students who used an online study tool during 18 months.
  •  
37.
  • Mohseni, Zeynab, et al. (author)
  • A technical infrastructure for primary education data that contributes to data standardization
  • 2024
  • In: Education and Information Technologies. - : Springer. - 1360-2357 .- 1573-7608.
  • Journal article (peer-reviewed)abstract
    • There is a significant amount of data available about students and their learning activities in many educational systems today. However, these datasets are frequently spread across several different digital services, making it challenging to use them strategically. In addition, there are no established standards for collecting, processing, analyzing, and presenting such data. As a result, school leaders, teachers, and students do not capitalize on the possibility of making decisions based on data. This is a serious barrier to the improvement of work in schools, teacher and student progress, and the development of effective Educational Technology (EdTech) products and services. Data standards can be used as a protocol on how different IT systems communicate with each other. When working with data from different public and private institutions simultaneously (e.g., different municipalities and EdTech companies), having a trustworthy data pipeline for retrieving the data and storing it in a secure warehouse is critical. In this study, we propose a technical solution containing a data pipeline by employing a secure warehouse—the Swedish University Computer Network (SUNET), which is an interface for information exchange between operational processes in schools. We conducted a user study in collaboration with four municipalities and four EdTech companies based in Sweden. Our proposal involves introducing a data standard to facilitate the integration of educational data from diverse resources in our SUNET drive. To accomplish this, we created customized scripts for each stakeholder, tailored to their specific data formats, with the aim of merging the students’ data. The results of the first four steps show that our solution works. Once the results of the next three steps are in, we will contemplate scaling up our technical solution nationwide. 
With the implementation of the suggested data standard and the utilization of the proposed technical solution, diverse stakeholders can benefit from improved management, transportation, analysis, and visualization of educational data.
  •  
38.
  • Mohseni, Zeynab, et al. (author)
  • Co-Developing an Easy-to-Use Learning Analytics Dashboard for Teachers in Primary/Secondary Education : A Human-Centered Design Approach
  • 2023
  • In: Education Sciences. - : MDPI. - 2227-7102. ; 13:12
  • Journal article (peer-reviewed)abstract
    • Learning Analytics Dashboards (LADs) can help provide insights and inform pedagogical decisions by supporting the analysis of large amounts of educational data obtained from sources such as Digital Learning Materials (DLMs). Extracting requirements is a crucial step in developing a LAD, as it helps identify the underlying design problem that needs to be addressed. In fact, determining the problem that requires a solution is one of the primary objectives of requirements extraction. Although there have been studies on the development of LADs for K12 education, these studies have not specifically emphasized the use of a Human-Centered Design (HCD) approach to better comprehend the teachers' requirements and produce more stimulating insights. In this paper, we apply prototyping, which is widely acknowledged as a successful way to rapidly implement cost-effective designs and efficiently gather stakeholder feedback, to elicit such requirements. We present a three-step HCD approach, involving a design cycle that employs paper and interactive prototypes to guide the systematic and effective design of LADs that truly meet teacher requirements in primary/secondary education, actively engaging teachers in the design process. We then conducted interviews and usability testing to co-design and develop a LAD that can be used in classrooms' everyday learning activities. Our results show that the visualizations of the interactive prototype were easily interpreted by the participants, verifying our initial goal of co-developing an easy-to-use LAD.
  •  
39.
  • Mohseni, Zeynab, et al. (author)
  • Improving Classification in Imbalanced Educational Datasets using Over-sampling
  • 2020
  • In: Proceedings of the 28th international conference on computer in education. - : Asia-Pacific Society for Computers in Education. - 9789869721455 ; , s. 278-283
  • Conference paper (peer-reviewed)abstract
    • Learning Analytics (LA) involves a growing range of methods for understanding and optimizing learning and the environments in which it occurs. Different Machine Learning (ML) algorithms or learning classifiers can be used to implement LA, with the goal of predicting learning outcomes and classifying the data into predetermined categories. Many educational datasets are imbalanced, where the number of samples in one category is significantly larger than in other categories. Ordinarily, it is ML's performance on the minority categories that is the most important. Since most ML classification algorithms tend to ignore the minority categories and, in turn, perform poorly on them, learning from imbalanced datasets is challenging. In order to address this challenge and to improve the performance of different classifiers, the Synthetic Minority Over-sampling Technique (SMOTE) is used to oversample the minority categories. In this paper, the accuracy of seven well-known classifiers under 5- and 10-fold cross-validation, as well as the F1-score, are compared. The imbalanced dataset, collected from self-regulated learning activities, contains the learning behaviour of 6,423 medical students who used a web-based study platform—Hypocampus—with different educational topics for one year. Also, two diagnostic tools, Area Under the Receiver Operating Characteristic (AUC-ROC) curves and Precision-Recall (PR) curves, are applied to predict the probability of an observation belonging to each category in a classification problem. These diagnostic tools may help LA researchers deal with imbalanced educational datasets. Our experimental results show that, when the SMOTE technique is applied, a Neural Network has the highest accuracy and performance compared to the other classifiers, with 92.77% in 5-fold cross-validation, 93.20% in 10-fold cross-validation, and an F1-score of 0.95. Also, the probability of detection in the different classifiers shows a significant improvement when using SMOTE.
  •  
40.
  • Mohseni, Zeynab, et al. (author)
  • SAVis : a Learning Analytics Dashboard with Interactive Visualization and Machine Learning
  • 2021
  • In: CEUR Workshop Proceedings, Volume 2985. - : ceur-ws.org.
  • Conference paper (peer-reviewed)abstract
    • A dashboard that provides a central location to monitor and analyze data is an efficient way to track multiple data sources. In the educational community, for example, using dashboards can be a straightforward introduction to the concepts of visual learning analytics. In this paper, the design and implementation of Student Activity Visualization (SAVis), a new Learning Analytics Dashboard (LAD) using interactive visualization and Machine Learning (ML), is presented and discussed. The design of the dashboard was directed towards answering a set of 22 pedagogical questions that teachers might want to investigate in an educational dataset. We evaluate SAVis with an educational dataset containing more than two million samples, including the learning behaviors of 6,423 students who used a web-based learning platform for one year. We show how SAVis can deliver relevant information to teachers and support them in interacting with and analyzing the students' data to gain a better overview of students' activities in terms of, for example, the number of correct/incorrect answers per topic.
  •  
41.
  • Mohseni, Zeynab, 1988-, et al. (author)
  • SBGTool : Similarity-Based Grouping Tool for Students’ Learning Outcomes
  • 2021
  • In: Proceedings of the 2021 Swedish Workshop on Data Science (SweDS). - : IEEE. - 9781665418300 - 9781665418317
  • Conference paper (peer-reviewed)abstract
    • With the help of Visual Learning Analytics (VLA) tools, teachers can construct meaningful groups of students that can, for example, collaborate and be engaged in productive discussions. However, finding similar samples in large educational databases requires effective similarity measures that capture the teacher's intent. In this paper, we propose a web-based VLA tool called Similarity-Based Grouping (SBGTool) to assist teachers in categorizing students into different groups based on their similar learning outcomes and activities. By using SBGTool, teachers may compare individual students by considering the number of answers (correct and incorrect) in different question categories and time ranges, find the most difficult question categories considering the percentage of similarity to the correct answers, determine the degree of similarity and dissimilarity across students, and find the relationship between students' activity and success. To demonstrate the tool's efficacy, we used 10,000 random samples from the EdNet dataset, a large-scale hierarchical educational dataset consisting of student-system interactions from multiple platforms, at the university level, collected over a period of two years. The results point to the conclusion that the tool is efficient, can be adapted to different learning domains, and has the potential to assist teachers in maximizing the collaborative learning potential in their classrooms.
  •  
42.
  • Mohseni, Zeynab, et al. (author)
  • SBGTool v2.0: An Empirical Study on a Similarity-Based Grouping Tool for Students’ Learning Outcomes
  • 2022
  • In: Data. - : MDPI. - 2306-5729. ; 7:7
  • Journal article (peer-reviewed)abstract
    • Visual Learning Analytics (VLA) tools and technologies enable meaningful exchange of information between educational data and teachers. This allows teachers to create meaningful groups of students based on possible collaboration and productive discussions. VLA tools also allow teachers to better understand students' educational demands. Finding similar samples in huge educational datasets, however, involves the use of effective similarity measures that represent the teacher's purpose. In this study, we conducted a user study and improved our web-based VLA tool, Similarity-Based Grouping (SBGTool), to help teachers categorize students into groups based on their similar learning outcomes and activities. SBGTool v2.0 differs from SBGTool through design changes made in response to teacher suggestions, the addition of sorting options to the dashboard table, the addition of a dropdown component to group the students into classrooms, and improvements to some visualizations. To accommodate color-blind users, we have also considered a number of color palettes. By applying SBGTool v2.0, teachers may compare the outcomes of individual students inside a classroom, determine which subjects are the most and least difficult over the period of a week or an academic year, identify the number of correct and incorrect responses for the most difficult and easiest subjects, categorize students into various groups based on their learning outcomes, discover the week with the most interactions for examining students' engagement, and find the relationship between students' activity and study success. We used 10,000 random samples from the EdNet dataset, a large-scale hierarchical educational dataset consisting of student-system interactions from multiple platforms at the university level, collected over a two-year period, to illustrate the tool's efficacy. Finally, we provide the outcomes of the user study that evaluated the tool's effectiveness. The results revealed that even with limited training, the participants were able to complete the required analysis tasks. Additionally, the participants' feedback showed that SBGTool v2.0 gained a good level of support for the given tasks and has the potential to assist teachers in enhancing collaborative learning in their classrooms.
  •  
43.
  • Mohseni, Zeynab, et al. (author)
  • Towards a Teacher-Oriented Framework of Visual Learning Analytics by Scenario-Based Development
  • 2023
  • In: DCECTEL 2023 Doctoral Consortium of ECTEL 2023. - : Technical University of Aachen. ; , s. 12-17
  • Conference paper (peer-reviewed)abstract
    • Visual Learning Analytics (VLA) tools (such as dashboards) serve as a centralized hub for monitoring and analyzing educational data. Dashboards can assist teachers in data-informed pedagogical decision-making and/or students in following their own learning progress. However, the design of VLA tools should include features of trust in order to make analytics overt to their users. In this study, we propose an end-to-end framework for the development of VLA tools that describes how we intend to develop the digital and technical infrastructure in our project for teachers. With that aim, we offer one scenario describing how data is managed, transferred, analyzed, and visualized by teachers. The suggested framework intends to make it easier for developers to understand the various steps involved in co-designing and developing a reliable VLA tool and to comprehend the importance of the teacher's participation in design. VLA tools developed based on the proposed framework have the potential to assist teachers in understanding and analyzing educational data, monitoring students' learning paths based on their learning outcomes and activities, simplifying regular tasks, and giving teachers more time to support teaching/learning and growth.
  •  
44.
  • Mohseni, Zeynab, et al. (author)
  • Visual Learning Analytics for Educational Interventions in Primary and Secondary Schools : A Scoping Review
  • 2024
  • In: Journal of Learning Analytics. - : Society for Learning Analytics Research (SoLAR). - 1929-7750 .- 1929-7750. ; 11:2, s. 91-111
  • Journal article (peer-reviewed)abstract
    • Visual Learning Analytics (VLA) uses analytics to monitor and assess educational data by combining visual and automated analysis to provide educational explanations. Such tools could aid teachers in primary and secondary schools in making pedagogical decisions; however, the evidence of their effectiveness and benefits is still limited. With this scoping review, we provide a comprehensive overview of related research on proposed VLA methods and identify gaps in the literature that could help describe new and helpful directions for the field. This review searched all relevant articles in five accessible databases — Scopus, Web of Science, ERIC, ACM, and IEEE Xplore — using 40 keywords. These studies were mapped, categorized, and summarized based on their objectives, the collected data, the intervention approaches employed, and the results obtained. The results determined what affordances the VLA tools allowed, what kinds of visualizations were used to inform teachers and students, and, more importantly, positive evidence of educational interventions. We conclude that there are moderate-to-clear learning improvements within the limits of the studies' interventions to support the use of VLA tools. More systematic research is needed to determine whether any learning gains translate into long-term improvements.
  •  
45.
  •  
46.
  • Pagliosa, Paulo, et al. (author)
  • MIST : Multiscale Information and Summaries of Texts
  • 2013
  • In: Proceedings. - : IEEE. - 9780769550992
  • Conference paper (peer-reviewed)abstract
    • Combining distinct visual metaphors has been the mechanism adopted by several systems to enable the simultaneous visualization of multiple levels of information in a single layout. However, providing a meaningful layout while avoiding visual clutter is still a challenge. In this work we combine word clouds and a rigid-body simulation engine into an intuitive visualization tool that allows a user to visualize and interact with the content of document collections using a single overlap-free layout. The proposed force scheme ensures that neighboring documents are kept close to each other during and after layout change. Each group of neighboring documents formed on the layout generates a word cloud. A multi-seeded procedure guarantees a harmonious arrangement of distinct word clouds in visual space. The visual metaphor employs disks to represent document instances, where the size of each disk defines the importance of the document in the collection. To keep the visualization clean and intuitive, only the most relevant documents are depicted as disks, while the remaining ones are either displayed as smaller glyphs to help convey density information or simply removed from the layout. Hidden instances are moved together with their neighbors during rigid-body simulation, should they become visible later, but are not processed individually. This shadow movement avoids excess calculations by the force-based scheme, thus ensuring scalability and interactivity.
  •  
47.
  • Proceedings of the 2021 Swedish Workshop on Data Science (SweDS) : Växjö, SwedenDecember 2–3, 2021
  • 2021
  • Editorial proceedings (peer-reviewed)abstract
    • Welcome to the 9th Swedish Workshop on Data Science (SweDS21), held (virtually) in Växjö, Sweden, during December 2–3, 2021. SweDS is a national event with a focus on maintaining and developing Swedish data science research and its applications by fostering the exchange of ideas and promoting collaboration within and across disciplines. This annual workshop brings together researchers and practitioners of data science working in a variety of academic, commercial, industrial, or other sectors. The current and past workshops have included presentations from a variety of domains, e.g., computer science, linguistics, economics, archaeology, environmental science, education, journalism, medicine, healthcare, biology, sociology, psychology, history, physics, chemistry, geography, forestry, design, and music. SweDS is hosted this year by Linnaeus University (Växjö, Sweden). Due to the ongoing COVID-19 pandemic, travel restrictions, and public health concerns, the workshop is conducted online-only, which has allowed authors both within and outside of Sweden to submit and present their work.
  •  
48.
  • Proceedings of the 2021 Swedish Workshop on Data Science (SweDS)
  • 2021
  • Editorial proceedings (peer-reviewed)abstract
    • Welcome to the 9th Swedish Workshop on Data Science (SweDS21), held (virtually) in Växjö, Sweden, during December 2–3, 2021. SweDS is a national event with a focus on maintaining and developing Swedish data science research and its applications by fostering the exchange of ideas and promoting collaboration within and across disciplines. This annual workshop brings together researchers and practitioners of data science working in a variety of academic, commercial, industrial, or other sectors. The current and past workshops have included presentations from a variety of domains, e.g., computer science, linguistics, economics, archaeology, environmental science, education, journalism, medicine, healthcare, biology, sociology, psychology, history, physics, chemistry, geography, forestry, design, and music. SweDS is hosted this year by Linnaeus University (Växjö, Sweden). Due to the ongoing COVID-19 pandemic, travel restrictions, and public health concerns, the workshop is conducted online-only, which has allowed authors both within and outside of Sweden to submit and present their work.
  •  
49.
  • Silva, Renato R. O., et al. (author)
  • Attribute-based Visual Explanation of Multidimensional Projections
  • 2015
  • In: EuroVis Workshop on Visual Analytics (2015). - : Eurographics - European Association for Computer Graphics.
  • Conference paper (peer-reviewed)abstract
    • Multidimensional projections (MPs) are key tools for the analysis of multidimensional data. MPs reduce data dimensionality while keeping the original distance structure in the low-dimensional output space, typically shown by a 2D scatterplot. While MP techniques grow more precise and scalable, they still do not show how the original dimensions (attributes) influence the projection's layout. In other words, MPs show which points are similar, but not why. We propose a visual approach to describe which dimensions contribute most to similarity relationships over the projection, thus explaining the projection's layout. For this, we rank dimensions by increasing variance over each point-neighborhood, and propose a visual encoding to show the least-varying dimensions over each neighborhood. We demonstrate our technique with both synthetic and real-world datasets.
  •  
50.
  • van den Elzen, Stef, et al. (author)
  • The Flow of Trust : A Visualization Framework to Externalize, Explore, and Explain Trust in ML Applications
  • 2023
  • In: IEEE Computer Graphics and Applications. - : IEEE Computer Society. - 0272-1716 .- 1558-1756. ; 43:2, s. 78-88
  • Journal article (peer-reviewed)abstract
    • We present a conceptual framework for the development of visual interactive techniques to formalize and externalize trust in machine learning (ML) workflows. Currently, trust in ML applications is an implicit process that takes place in the user's mind. As such, there is no method of feedback or communication of trust that can be acted upon. Our framework will be instrumental in developing interactive visualization approaches that will help users to efficiently and effectively build and communicate trust in ways that fit each of the ML process stages. We formulate several research questions and directions that include: 1) a typology/taxonomy of trust objects, trust issues, and possible reasons for (mis)trust; 2) formalisms to represent trust in machine-readable form; 3) means by which users can express their state of trust by interacting with a computer system (e.g., text, drawing, marking); 4) ways in which a system can facilitate users' expression and communication of the state of trust; and 5) creation of visual interactive techniques for representation and exploration of trust over all stages of an ML pipeline.
  •  
Type of publication
journal article (24)
conference paper (21)
editorial proceedings (2)
doctoral thesis (2)
other publication (1)
Type of content
peer-reviewed (46)
other academic/artistic (4)
Author/Editor
Martins, Rafael Mess ... (50)
Kerren, Andreas, Dr. ... (21)
Kucher, Kostiantyn, ... (11)
Masiello, Italo, Pro ... (10)
Chatzimparmpas, Ange ... (10)
Mohseni, Zeynab (7)
Minghim, Rosane (6)
Jusufi, Ilir, 1983- (5)
Telea, Alexandru C. (5)
Felizardo, Katia Rom ... (4)
Ericsson, Morgan, Do ... (3)
Milrad, Marcelo, 196 ... (3)
Maldonado, José Carl ... (3)
Löwe, Welf (2)
Weyns, Danny (2)
Paulovich, Fernando ... (2)
Coimbra, Danilo B. (2)
Lopes, Alneu de Andr ... (2)
Paradis, Carita (1)
Ericsson, Morgan, 19 ... (1)
Laitinen, Mikko, 197 ... (1)
Levin, Magnus, 1972- (1)
Lundberg, Jonas, 196 ... (1)
Wingkvist, Anna, 197 ... (1)
Riveiro, Maria, 1978 ... (1)
Andrienko, Gennady (1)
Andrienko, Natalia (1)
Mendes, Emilia (1)
Ventocilla, Elio, 19 ... (1)
Peltonen, Jaakko (1)
MacDonell, Stephen G ... (1)
Nordqvist, Jonas (1)
Buljan, Matej (1)
Rossi, Fabrice (1)
Mota, Edson (1)
Tiburtino, Tacito (1)
Diamantino, Pedro (1)
Peixoto, Maycon L. M ... (1)
Neves, Tácito T. A. ... (1)
Wingkvist, Anna, PhD ... (1)
Espadoto, Mateus (1)
Hirata, Nina S. T. (1)
Maldonadon, José Car ... (1)
Salleh, Norsaremah (1)
Barbosa, Ellen Franc ... (1)
Valle, Pedro Henriqu ... (1)
Berge, Elias (1)
Hilasaca, Gladys M. (1)
Marcílio-Jr, Wilson ... (1)
Eler, Danilo M. (1)
University
Linnaeus University (49)
Linköping University (13)
Blekinge Institute of Technology (5)
Royal Institute of Technology (1)
Jönköping University (1)
Lund University (1)
University of Skövde (1)
Language
English (50)
Research subject (UKÄ/SCB)
Natural sciences (46)
Engineering and Technology (3)
Social Sciences (3)
Humanities (2)
