SwePub

Results list for search "WFRF:(Liwicki Marcus)"

Search: WFRF:(Liwicki Marcus)

  • Results 1-50 of 154
1.
  • Adewumi, Oluwatosin, 1978-, et al. (author)
  • Conversational Systems in Machine Learning from the Point of View of the Philosophy of Science—Using Alime Chat and Related Studies
  • 2019
  • In: Philosophies. - Switzerland : MDPI. - 2409-9287. ; 4:3
  • Journal article (peer-reviewed), abstract:
    • This essay discusses current research efforts in conversational systems from the philosophy of science point of view and evaluates some conversational systems research activities from the standpoint of naturalism philosophical theory. Conversational systems or chatbots have advanced over the decades and now have become mainstream applications. They are software that users can communicate with, using natural language. Particular attention is given to the Alime Chat conversational system, already in industrial use, and the related research. The competitive nature of systems in production is a result of different researchers and developers trying to produce new conversational systems that can outperform previous or state-of-the-art systems. Different factors affect the quality of the conversational systems produced, and how one system is assessed as being better than another is a function of objectivity and of the relevant experimental results. This essay examines the research practices from, among others, Longino’s view on objectivity and Popper’s stand on falsification. Furthermore, the need for qualitative and large datasets is emphasized. This is in addition to the importance of the peer-review process in scientific publishing, as a means of developing, validating, or rejecting theories, claims, or methodologies in the research community. In conclusion, open data and open scientific discussion fora should become more prominent over the mere publication-focused trend.
  •  
2.
  • Adewumi, Oluwatosin, 1978-, et al. (author)
  • Corpora Compared : The Case of the Swedish Gigaword & Wikipedia Corpora
  • 2020
  • Conference paper (peer-reviewed), abstract:
    • In this work, we show that the difference in performance of embeddings from differently sourced data for a given language can be due to other factors besides data size. Natural language processing (NLP) tasks usually perform better with embeddings from bigger corpora. However, broadness of covered domain and noise can play important roles. We evaluate embeddings based on two Swedish corpora: The Gigaword and Wikipedia, in analogy (intrinsic) tests and discover that the embeddings from the Wikipedia corpus generally outperform those from the Gigaword corpus, which is a bigger corpus. Downstream tests will be required to have a definite evaluation.
  •  
3.
  • Adewumi, Oluwatosin, 1978-, et al. (author)
  • Exploring Swedish & English fastText Embeddings
  • 2022
  • In: Artificial Intelligence and Cognition 2022. ; pp. 201-208
  • Conference paper (peer-reviewed), abstract:
    • In this paper, we show that embeddings from relatively smaller corpora sometimes outperform those from larger corpora and we introduce a new Swedish analogy test set and make it publicly available. To achieve good performance in Natural Language Processing (NLP) downstream tasks, several factors play important roles: dataset size, the right hyper-parameters, and well-trained embeddings. We utilize the fastText tool for our experiments. We evaluate both the Swedish and English embeddings that we created using intrinsic evaluation (including analogy & Spearman correlation) and compare them with 2 common, publicly available embeddings. Our English continuous Bag-of-Words (CBoW)-negative sampling embedding shows better performance compared to the publicly available GoogleNews version. We also describe the relationship between NLP and cognitive science. We contribute the embeddings for research or other useful purposes by publicly releasing them.
  •  
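
Illustrative sketch (not the authors' code): the entry above trains fastText embeddings and scores them with intrinsic analogy tests. A minimal gensim version of that kind of workflow; the corpus and analogy file names are placeholders, not the released data.

    from gensim.models import FastText
    from gensim.models.word2vec import LineSentence

    # one tokenised sentence per line; placeholder file name
    sentences = LineSentence("swedish_corpus.txt")

    # sg=0 selects CBoW, negative=5 enables negative sampling
    model = FastText(sentences=sentences, vector_size=300, window=5,
                     min_count=5, sg=0, negative=5, epochs=5)

    # analogy file in the "questions-words" format (": section" headers + 4-word lines)
    score, sections = model.wv.evaluate_word_analogies("swedish_analogies.txt")
    print(f"analogy accuracy: {score:.3f}")
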
4.
  • Adewumi, Oluwatosin, 1978-, et al. (author)
  • Exploring Swedish & English fastText Embeddings for NER with the Transformer
  • Other publication (other academic/artistic), abstract:
    • In this paper, our main contributions are showing that embeddings from relatively smaller corpora can outperform those from far larger corpora and presenting a new Swedish analogy test set. To achieve a good network performance in natural language processing (NLP) downstream tasks, several factors play important roles: dataset size, the right hyper-parameters, and well-trained embeddings. We show that, with the right set of hyper-parameters, good network performance can be reached even on smaller datasets. We evaluate the embeddings at the intrinsic and extrinsic levels by deploying them on the Transformer for the named entity recognition (NER) task and conducting significance tests. This is done for both Swedish and English. We obtain better performance in both languages on the downstream task with far smaller training data, compared to recently released, common crawl versions; and character n-grams appear useful for Swedish, a morphologically rich language.
  •  
5.
  • Adewumi, Oluwatosin, 1978-, et al. (author)
  • Småprat : DialoGPT for Natural Language Generation of Swedish Dialogue by Transfer Learning
  • 2022
  • In: Vol. 3 (2022): Proceedings of the Northern Lights Deep Learning Workshop 2022. - : Septentrio Academic Publishing. - 2703-6928.
  • Conference paper (peer-reviewed), abstract:
    • Building open-domain conversational systems (or chatbots) that produce convincing responses is a recognized challenge. Recent state-of-the-art (SoTA) transformer-based models for the generation of natural language dialogue have demonstrated impressive performance in simulating human-like, single-turn conversations in English. This work investigates, through an empirical study, the potential for transfer learning of such models to the Swedish language. DialoGPT, an English language pre-trained model, is adapted by training on three different Swedish language conversational datasets obtained from publicly available sources: Reddit, Familjeliv and the GDC. Perplexity score (an automated intrinsic metric) and surveys by human evaluation were used to assess the performances of the fine-tuned models. We also compare the DialoGPT experiments with an attention-mechanism-based seq2seq baseline model, trained on the GDC dataset. The results indicate that the capacity for transfer learning can be exploited with considerable success. Human evaluators asked to score the simulated dialogues judged over 57% of the chatbot responses to be human-like for the model trained on the largest (Swedish) dataset. The work agrees with the hypothesis that deep monolingual models learn some abstractions which generalize across languages. We contribute the codes, datasets and model checkpoints and host the demos on the HuggingFace platform.
  •  
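
Illustrative sketch (not the authors' code): perplexity, the automated intrinsic metric mentioned in the entry above, can be computed from a causal language model's cross-entropy loss. A minimal HuggingFace example using the public English DialoGPT checkpoint; a Swedish fine-tuned checkpoint would be loaded the same way.

    import math
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "microsoft/DialoGPT-small"          # public checkpoint
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)
    model.eval()

    text = "Hej! Hur mår du idag?" + tokenizer.eos_token
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # passing labels = input_ids makes the model return the mean token cross-entropy loss
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    print("perplexity:", math.exp(loss.item()))
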
6.
  • Adewumi, Oluwatosin, 1978-, et al. (author)
  • T5 for Hate Speech, Augmented Data, and Ensemble
  • 2023
  • In: Sci. - : MDPI. - 2413-4155. ; 5:4
  • Journal article (peer-reviewed), abstract:
    • We conduct relatively extensive investigations of automatic hate speech (HS) detection using different State-of-The-Art (SoTA) baselines across 11 subtasks spanning six different datasets. Our motivation is to determine which of the recent SoTA models is best for automatic hate speech detection and what advantage methods, such as data augmentation and ensemble, may have on the best model, if any. We carry out six cross-task investigations. We achieve new SoTA results on two subtasks—macro F1 scores of 91.73% and 53.21% for subtasks A and B of the HASOC 2020 dataset, surpassing previous SoTA scores of 51.52% and 26.52%, respectively. We achieve near-SoTA results on two others—macro F1 scores of 81.66% for subtask A of the OLID 2019 and 82.54% for subtask A of the HASOC 2021, in comparison to SoTA results of 82.9% and 83.05%, respectively. We perform error analysis and use two eXplainable Artificial Intelligence (XAI) algorithms (Integrated Gradient (IG) and SHapley Additive exPlanations (SHAP)) to reveal how two of the models (Bi-Directional Long Short-Term Memory Network (Bi-LSTM) and Text-to-Text-Transfer Transformer (T5)) make the predictions they do by using examples. Other contributions of this work are: (1) the introduction of a simple, novel mechanism for correcting Out-of-Class (OoC) predictions in T5, (2) a detailed description of the data augmentation methods, and (3) the revelation of the poor data annotations in the HASOC 2021 dataset by using several examples and XAI (buttressing the need for better quality control). We publicly release our model checkpoints and codes to foster transparency.
  •  
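
Illustrative sketch (not the authors' code): the macro F1 scores reported in the entry above, and a plain majority-vote ensemble over several classifiers, can be computed as follows; labels and predictions here are toy values.

    import numpy as np
    from sklearn.metrics import f1_score

    y_true = np.array([0, 1, 1, 0, 1, 0])
    preds_per_model = np.array([
        [0, 1, 1, 0, 0, 0],      # predictions of model 1
        [0, 1, 0, 0, 1, 0],      # predictions of model 2
        [1, 1, 1, 0, 1, 0],      # predictions of model 3
    ])

    # majority vote across the three models
    ensemble = (preds_per_model.mean(axis=0) >= 0.5).astype(int)

    print("ensemble macro F1:", f1_score(y_true, ensemble, average="macro"))
    for i, p in enumerate(preds_per_model, start=1):
        print(f"model {i} macro F1:", f1_score(y_true, p, average="macro"))
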
7.
  • Adewumi, Oluwatosin, 1978-, et al. (author)
  • Vector Representations of Idioms in Conversational Systems
  • 2022
  • In: Sci. - : MDPI. - 2413-4155. ; 4:4
  • Journal article (peer-reviewed), abstract:
    • In this study, we demonstrate that an open-domain conversational system trained on idioms or figurative language generates more fitting responses to prompts containing idioms. Idioms are a part of everyday speech in many languages and across many cultures, but they pose a great challenge for many natural language processing (NLP) systems that involve tasks such as information retrieval (IR), machine translation (MT), and conversational artificial intelligence (AI). We utilized the Potential Idiomatic Expression (PIE)-English idiom corpus for the two tasks that we investigated: classification and conversation generation. We achieved a state-of-the-art (SoTA) result of a 98% macro F1 score on the classification task by using the SoTA T5 model. We experimented with three instances of the SoTA dialogue model—the Dialogue Generative Pre-trained Transformer (DialoGPT)—for conversation generation. Their performances were evaluated by using the automatic metric, perplexity, and a human evaluation. The results showed that the model trained on the idiom corpus generated more fitting responses to prompts containing idioms 71.9% of the time in comparison with a similar model that was not trained on the idiom corpus. We have contributed the model checkpoint/demo/code to the HuggingFace hub for public access.
  •  
8.
  • Adewumi, Oluwatosin, 1978- (author)
  • Vector Representations of Idioms in Data-Driven Chatbots for Robust Assistance
  • 2022
  • Doctoral thesis (other academic/artistic), abstract:
    • This thesis presents resources capable of enhancing solutions of some Natural Language Processing (NLP) tasks, demonstrates the learning of abstractions by deep models through cross-lingual transferability, and shows how deep learning models trained on idioms can enhance open-domain conversational systems. The challenges of open-domain conversational systems are many and include bland repetitive utterances, lack of utterance diversity, lack of training data for low-resource languages, shallow world-knowledge and non-empathetic responses, among others. These challenges contribute to the non-human-like utterances that open-domain conversational systems suffer from. They have, hence, motivated the active research in Natural Language Understanding (NLU) and Natural Language Generation (NLG), considering the very important role conversations (or dialogues) play in human lives. The methodology employed in this thesis involves an iterative set of scientific methods. First, it conducts a systematic literature review to identify the state-of-the-art (SoTA) and gaps, such as the challenges mentioned earlier, in current research. Subsequently, it follows the seven stages of the Machine Learning (ML) life-cycle, which are data gathering (or acquisition), data preparation, model selection, training, evaluation with hyperparameter tuning, prediction and model deployment. For data acquisition, relevant datasets are acquired or created, using benchmark datasets as references, and their data statements are included. Specific contributions of this thesis are the creation of the Swedish analogy test set for evaluating word embeddings and the Potential Idiomatic Expression (PIE)-English idioms corpus for training models in idiom identification and classification. In order to create a benchmark, this thesis performs human evaluation on the generated predictions of some SoTA ML models, including DialoGPT. As different individuals may not agree on all the predictions, the Inter-Annotator Agreement (IAA) is measured. A typical method for measuring IAA is Fleiss Kappa; however, it has a number of shortcomings, including high sensitivity to the number of categories being evaluated. Therefore, this thesis introduces the credibility unanimous score (CUS), which is more intuitive, easier to calculate and seemingly less sensitive to changes in the number of categories being evaluated. The results of human evaluation and comments from evaluators provide valuable feedback on the existing challenges within the models. These create the opportunity for addressing such challenges in future work. The experiments in this thesis test two hypotheses: 1) an open-domain conversational system that is idiom-aware generates more fitting responses to prompts containing idioms, and 2) deep monolingual models learn some abstractions that generalise across languages. To investigate the first hypothesis, this thesis trains English models on the PIE-English idioms corpus for classification and generation. For the second hypothesis, it explores cross-lingual transferability from English models to Swedish, Yorùbá, Swahili, Wolof, Hausa, Nigerian Pidgin English and Kinyarwanda.
From the results, the thesis’ additional contributions mainly lie in 1) confirmation of the hypothesis that an open-domain conversational system that is idiom-aware generates more fitting responses to prompts containing idioms, 2) confirmation of the hypothesis that deep monolingual models learn some abstractions that generalise across languages, 3) introduction of CUS and its benefits, 4) insight into the energy-saving and time-saving benefits of more optimal embeddings from relatively smaller corpora, and 5) provision of public access to the model checkpoints that were developed from this work. We further discuss the ethical issues involved in developing robust, open-domain conversational systems. Parts of this thesis are already published in the form of peer-reviewed journal and conference articles.
  •  
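
Illustrative sketch (not the thesis code): the inter-annotator agreement baseline that the thesis above compares its CUS metric against is Fleiss' kappa; CUS itself is defined in the thesis and is not reproduced here. The ratings below are toy values, with rows as items and columns as annotators.

    import numpy as np
    from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

    # 0 = "not human-like", 1 = "human-like"; toy ratings from three annotators
    ratings = np.array([
        [1, 1, 1],
        [1, 0, 1],
        [0, 0, 0],
        [1, 1, 0],
    ])
    table, _ = aggregate_raters(ratings)     # per-item counts for each category
    print("Fleiss' kappa:", fleiss_kappa(table))
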
9.
  • Adewumi, Oluwatosin, 1978-, et al. (author)
  • Word2Vec: Optimal Hyper-Parameters and Their Impact on NLP Downstream Tasks
  • Other publication (other academic/artistic), abstract:
    • Word2Vec is a prominent model for natural language processing (NLP) tasks. Similar inspiration is found in distributed embeddings for new state-of-the-art (SotA) deep neural networks. However, the wrong combination of hyper-parameters can produce poor-quality vectors. The objective of this work is to empirically show that an optimal combination of hyper-parameters exists and to evaluate various combinations. We compare them with the released, pre-trained original word2vec model. Both intrinsic and extrinsic (downstream) evaluations, including named entity recognition (NER) and sentiment analysis (SA), were carried out. The downstream tasks reveal that the best model is usually task-specific, high analogy scores do not necessarily correlate positively with F1 scores, and the same applies to a focus on data alone. Increasing vector dimension size after a point leads to poor quality or performance. If ethical considerations to save time, energy and the environment are made, then reasonably smaller corpora may do just as well or even better in some cases. Besides, using a small corpus, we obtain better human-assigned WordSim scores, corresponding Spearman correlation and better downstream performances (with significance tests) compared to the original model, trained on a 100 billion-word corpus.
  •  
10.
  • Adewumi, Oluwatosin, 1978-, et al. (author)
  • Word2Vec: Optimal hyperparameters and their impact on natural language processing downstream tasks
  • 2022
  • In: Open Computer Science. - : Walter de Gruyter. - 2299-1093. ; 12:1, pp. 134-141
  • Journal article (peer-reviewed), abstract:
    • Word2Vec is a prominent model for natural language processing tasks. Similar inspiration is found in distributed embeddings (word-vectors) in recent state-of-the-art deep neural networks. However, a wrong combination of hyperparameters can produce embeddings of poor quality. The objective of this work is to empirically show that an optimal combination of Word2Vec hyperparameters exists and to evaluate various combinations. We compare them with the publicly released, original Word2Vec embedding. Both intrinsic and extrinsic (downstream) evaluations are carried out, including named entity recognition and sentiment analysis. Our main contributions include showing that the best model is usually task-specific, high analogy scores do not necessarily correlate positively with F1 scores, and performance is not dependent on data size alone. If ethical considerations to save time, energy, and the environment are made, then relatively smaller corpora may do just as well or even better in some cases. Increasing the dimension size of embeddings after a point leads to poor quality or performance. In addition, using a relatively small corpus, we obtain better WordSim scores, corresponding Spearman correlation, and better downstream performances (with significance tests) compared to the original model, which is trained on a 100 billion-word corpus.
  •  
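
Illustrative sketch (not the authors' code): comparing Word2Vec hyperparameter combinations with gensim and scoring each model on a WordSim-style word-pair file via Spearman correlation, the kind of intrinsic evaluation described in the two Word2Vec entries above. File names are placeholders.

    from gensim.models import Word2Vec
    from gensim.models.word2vec import LineSentence

    sentences = LineSentence("corpus.txt")                 # placeholder corpus
    for sg, dim in [(0, 300), (1, 100)]:                   # CBoW vs skip-gram, two sizes
        model = Word2Vec(sentences=sentences, vector_size=dim, sg=sg,
                         window=5, negative=5, min_count=5, epochs=5)
        pearson, spearman, oov = model.wv.evaluate_word_pairs("wordsim353.tsv")
        print(f"sg={sg} dim={dim} Spearman={spearman.correlation:.3f} OOV={oov:.1f}%")
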
11.
  • Adewumi, Tosin, et al. (author)
  • AfriWOZ: Corpus for Exploiting Cross-Lingual Transfer for Dialogue Generation in Low-Resource, African Languages
  • 2023
  • In: IJCNN 2023 - International Joint Conference on Neural Networks, Conference Proceedings. - : Institute of Electrical and Electronics Engineers Inc.. - 9781665488686 - 9781665488679
  • Conference paper (peer-reviewed), abstract:
    • Dialogue generation is an important NLP task fraught with many challenges. The challenges become more daunting for low-resource African languages. To enable the creation of dialogue agents for African languages, we contribute the first high-quality dialogue datasets for 6 African languages: Swahili, Wolof, Hausa, Nigerian Pidgin English, Kinyarwanda & Yorùbá. There are a total of 9,000 turns, each language having 1,500 turns, which we translate from a portion of the English multi-domain MultiWOZ dataset. Subsequently, we benchmark by investigating & analyzing the effectiveness of modelling through transfer learning by utilizing state-of-the-art (SoTA) deep monolingual models: DialoGPT and BlenderBot. We compare the models with a simple seq2seq baseline using perplexity. Besides this, we conduct human evaluation of single-turn conversations by using majority votes and measure inter-annotator agreement (IAA). We find that the hypothesis that deep monolingual models learn some abstractions that generalize across languages holds. We observe human-like conversations, to different degrees, in 5 out of the 6 languages. The language with the most transferable properties is the Nigerian Pidgin English, with a human-likeness score of 78.1%, of which 34.4% are unanimous. We freely provide the datasets and host the model checkpoints/demos on the HuggingFace hub for public access.
  •  
12.
  • Adewumi, Tosin, 1978-, et al. (author)
  • Bipol : Multi-axes Evaluation of Bias with Explainability in Benchmark Datasets
  • 2023
  • In: Proceedings of Recent Advances in Natural Language Processing. - : Incoma Ltd.. ; pp. 1-10
  • Conference paper (peer-reviewed), abstract:
    • We investigate five English NLP benchmark datasets (on the superGLUE leaderboard) and two Swedish datasets for bias, along multiple axes. The datasets are the following: Boolean Question (Boolq), CommitmentBank (CB), Winograd Schema Challenge (WSC), Winogender diagnostic (AXg), Recognising Textual Entailment (RTE), Swedish CB, and SWEDN. Bias can be harmful and it is known to be common in data, which ML models learn from. In order to mitigate bias in data, it is crucial to be able to estimate it objectively. We use bipol, a novel multi-axes bias metric with explainability, to estimate and explain how much bias exists in these datasets. Multilingual, multi-axes bias evaluation is not very common. Hence, we also contribute a new, large Swedish bias-labeled dataset (of 2 million samples), translated from the English version and train the SotA mT5 model on it. In addition, we contribute new multi-axes lexica for bias detection in Swedish. We make the codes, model, and new dataset publicly available.
  •  
13.
  • Adewumi, Tosin, 1978-, et al. (author)
  • ML_LTU at SemEval-2022 Task 4: T5 Towards Identifying Patronizing and Condescending Language
  • 2022
  • In: Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022). - : Association for Computational Linguistics. ; pp. 473-478
  • Conference paper (peer-reviewed), abstract:
    • This paper describes the system used by the Machine Learning Group of LTU in subtask 1 of the SemEval-2022 Task 4: Patronizing and Condescending Language (PCL) Detection. Our system consists of finetuning a pretrained text-to-text transfer transformer (T5) and innovatively reducing its out-of-class predictions. The main contributions of this paper are 1) the description of the implementation details of the T5 model we used, 2) analysis of the successes & struggles of the model in this task, and 3) ablation studies beyond the official submission to ascertain the relative importance of data split. Our model achieves an F1 score of 0.5452 on the official test set.
  •  
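
Illustrative sketch (not the paper's exact mechanism): when a text-to-text model such as T5 is used as a classifier, it generates label strings that can fall outside the valid label set. One simple correction, in the spirit of the out-of-class reduction mentioned above, is to map such outputs to the closest valid label; the label names here are hypothetical.

    import difflib

    VALID_LABELS = ["pcl", "not_pcl"]          # hypothetical label strings

    def to_label(generated: str) -> str:
        text = generated.strip().lower()
        if text in VALID_LABELS:
            return text
        # out-of-class output: fall back to the most similar valid label
        close = difflib.get_close_matches(text, VALID_LABELS, n=1, cutoff=0.0)
        return close[0] if close else VALID_LABELS[0]

    print(to_label("PCL"))          # exact match after normalisation
    print(to_label("patronising"))  # out-of-class string mapped to a valid label
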
14.
  • Adewumi, Tosin P., 1978-, et al. (author)
  • The Challenge of Diacritics in Yorùbá Embeddings
  • 2020
  • In: ML4D 2020 Proceedings. - : Neural Information Processing Systems Foundation.
  • Conference paper (peer-reviewed), abstract:
    • The major contributions of this work include the empirical establishment of a better performance for Yoruba embeddings from an undiacritized (normalized) dataset and the provision of new analogy sets for evaluation. The Yoruba language, being a tonal language, utilizes diacritics (tonal marks) in written form. We show that this affects embedding performance by creating embeddings from exactly the same Wikipedia dataset but with the second one normalized to be undiacritized. We further compare average intrinsic performance with two other works (using an analogy test set & WordSim), and we obtain the best performance in WordSim and corresponding Spearman correlation.
  •  
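
Illustrative sketch (not the authors' code): undiacritising (normalising) Yorùbá text, the preprocessing contrast studied in the entry above, can be done by Unicode decomposition followed by dropping the combining marks.

    import unicodedata

    def strip_diacritics(text: str) -> str:
        decomposed = unicodedata.normalize("NFD", text)
        return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

    print(strip_diacritics("Yorùbá"))   # -> "Yoruba"
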
15.
  • Adewumi, Tosin P., 1978-, et al. (author)
  • Vector Representations of Idioms in Chatbots
  • 2020
  • In: Proceedings. - : Chalmers University of Technology.
  • Conference paper (peer-reviewed), abstract:
    • Open-domain chatbots have advanced but still have many gaps. My PhD aims to solve a few of those gaps by creating vector representations of idioms (figures of speech) that will be beneficial to chatbots and natural language processing (NLP), generally. In the process, new, optimal fastText embeddings in Swedish and English have been created and the first Swedish analogy test set, larger than the Google original, for intrinsic evaluation of Swedish embeddings has also been produced. Major milestones have been attained and others are soon to follow. The deliverables of this project will give NLP researchers the opportunity to measure the quality of Swedish embeddings easily and advance state-of-the-art (SotA) in NLP.
  •  
16.
  • Adewumi, Tosin, 1978-, et al. (author)
  • Potential Idiomatic Expression (PIE)-English: Corpus for Classes of Idioms
  • 2022
  • In: Proceedings of the 13th Language Resources and Evaluation Conference. - : European Language Resources Association (ELRA). ; pp. 689-696
  • Conference paper (peer-reviewed), abstract:
    • We present a fairly large, Potential Idiomatic Expression (PIE) dataset for Natural Language Processing (NLP) in English. The challenges with NLP systems with regard to tasks such as Machine Translation (MT), word sense disambiguation (WSD) and information retrieval make it imperative to have a labelled idioms dataset with classes, such as the one in this work. To the best of the authors’ knowledge, this is the first idioms corpus with classes of idioms beyond the literal and the general idioms classification. In particular, the following classes are labelled in the dataset: metaphor, simile, euphemism, parallelism, personification, oxymoron, paradox, hyperbole, irony and literal. We obtain an overall inter-annotator agreement (IAA) score, between two independent annotators, of 88.89%. Many past efforts have been limited in corpus size and classes of samples, but this dataset contains over 20,100 samples with almost 1,200 cases of idioms (with their meanings) from 10 classes (or senses). The corpus may also be extended by researchers to meet specific needs. The corpus has part of speech (PoS) tagging from the NLTK library. Classification experiments performed on the corpus to obtain a baseline and comparison among three common models, including the state-of-the-art (SoTA) BERT model, give good results. We also make publicly available the corpus and the relevant codes for working with it for NLP tasks.
  •  
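
Illustrative sketch (not the corpus build script): the part-of-speech tags attached to the PIE-English corpus described above come from the NLTK library; tagging a single idiom sample looks like this (resource names can vary slightly between NLTK versions).

    import nltk

    nltk.download("punkt", quiet=True)
    nltk.download("averaged_perceptron_tagger", quiet=True)

    sample = "He kicked the bucket last night."
    tokens = nltk.word_tokenize(sample)
    print(nltk.pos_tag(tokens))   # e.g. [('He', 'PRP'), ('kicked', 'VBD'), ...]
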
17.
  • Adewumi, Tosin, 1978-, et al. (author)
  • State-of-the-Art in Open-Domain Conversational AI: A Survey
  • 2022
  • In: Information. - : MDPI. - 2078-2489. ; 13:6
  • Research review (peer-reviewed), abstract:
    • We survey SoTA open-domain conversational AI models with the objective of presenting the prevailing challenges that still exist to spur future research. In addition, we provide statistics on the gender of conversational AI in order to guide the ethics discussion surrounding the issue. Open-domain conversational AI models are known to have several challenges, including bland, repetitive responses and performance degradation when prompted with figurative language, among others. First, we provide some background by discussing some topics of interest in conversational AI. We then discuss the method applied to the two investigations carried out that make up this study. The first investigation involves a search for recent SoTA open-domain conversational AI models, while the second involves the search for 100 conversational AI to assess their gender. Results of the survey show that progress has been made with recent SoTA conversational AI, but there are still persistent challenges that need to be solved, and the female gender is more common than the male for conversational AI. One main takeaway is that hybrid models of conversational AI offer more advantages than any single architecture. The key contributions of this survey are (1) the identification of prevailing challenges in SoTA open-domain conversational AI, (2) the rarely held discussion on open-domain conversational AI for low-resource languages, and (3) the discussion about the ethics surrounding the gender of conversational AI.
  •  
18.
  • Al-Azzawi, Sana Sabah Sabry, et al. (author)
  • Innovative Education Approach Toward Active Distance Education: a Case Study in the Introduction to AI course
  • 2022
  • In: Conference Proceedings. The Future of Education 2022.
  • Conference paper (peer-reviewed), abstract:
    • In this paper, we first describe various synchronous and asynchronous methods for enhancing student engagement in big online courses. We showcase the implementation of these methods in the “Introduction to Artificial Intelligence (AI)” course at Luleå University of Technology, which has attracted around 500 students in each of its iterations (twice yearly, since 2019). We also show that these methods can be applied efficiently, in terms of the teaching hours required. With the increase in digitization and student mobility, the demand for improved and personalized content delivery for distance education has also increased. This applies not only in the context of traditional undergraduate education, but also in the context of adult education and lifelong learning. This higher level of demand, however, introduces a challenge, especially as it is typically combined with a shortage of staff and needs for efficient education. This challenge is further amplified by the current pandemic situation, which led to an even bigger risk of student dropout. To mitigate this risk, as well as to meet the increased demand, we applied various methods for creating engaging interaction in our pedagogy based on Moore’s framework: learner-to-learner, learner-to-instructor, and learner-to-content engagement strategies. The main methods of this pedagogy are as follows: short and interactive videos, active discussions in topic-based forums, regular live sessions with group discussions, and the introduction of optional content at many points in the course, to address different target groups. In this paper, we show how we originally designed and continuously improved the course, without requiring more than 500 teaching hours per iteration (one hour per enrolled student), while we also managed to increase the successful completion rate of the participants by 10%, and improved student engagement and feedback for the course by 50%. We intend to share a set of best-practices applicable to many other e-learning courses in ICT.
  •  
19.
  • Günther, Christian, et al. (author)
  • Towards a Machine Learning Framework for Drill Core Analysis
  • 2021
  • In: 2021 Swedish Artificial Intelligence Society Workshop (SAIS). - : IEEE. ; pp. 19-24
  • Conference paper (peer-reviewed), abstract:
    • This paper discusses existing methods for geological analysis of drill cores and describes the research and development directions of a machine learning framework for such a task. Drill core analysis is one of the first steps of the mining value chain. Such analysis incorporates a high complexity of input features (visual and compositional) derived from multiple sources and commonly by multiple observers. Especially the huge amount of visual information available from the drill core can provide valuable insights, but due to the complexity of many geological materials, automated data acquisition is difficult. This paper (i) describes the difficulty of drill core analysis, (ii) discusses common approaches and recent machine learning-based approaches to address the issues towards automation, and finally, (iii) proposes a machine learning-based framework for drill core analysis which is currently in development. The first major component, the registration of the drill core image for further processing, is presented in detail and evaluated on a dataset of 180 drill core images. We furthermore investigate the amount of labelled data required to automate the drill core analysis. As an interesting outcome, already a few labelled images led to an average precision (AP) of around 80%, which indicates that the manual drill core analysis can be made more efficient with the support of a Machine Learning/labeling workflow.
  •  
20.
  • Javed, Saleha, 1990-, et al. (author)
  • Understanding the Role of Objectivity in Machine Learning and Research Evaluation
  • 2021
  • In: Philosophies. - Switzerland : MDPI. - 2409-9287. ; 6:1
  • Journal article (peer-reviewed), abstract:
    • This article makes the case for more objectivity in Machine Learning (ML) research. Any research work that claims to hold benefits has to be scrutinized based on many parameters, such as the methodology employed, ethical considerations and its theoretical or technical contribution. We approach this discussion from a Naturalist philosophical outlook. Although every analysis may be subjective, it is important for the research community to keep vetting the research for continuous growth and to produce even better work. We suggest standardizing some of the steps in ML research in an objective way and being aware of various biases threatening objectivity. The ideal of objectivity keeps research rational since objectivity requires beliefs to be based on facts. We discuss some of the current challenges, the role of objectivity in the two elements (product and process) that are up for consideration in ML and make recommendations to support the research community.
  •  
21.
  •  
22.
  • Liwicki, Foteini, et al. (author)
  • Rethinking the Methods and Algorithms for Inner Speech Decoding and Making Them Reproducible
  • 2022
  • In: NeuroSci. - : MDPI. - 2673-4087. ; 3:2, pp. 226-244
  • Journal article (peer-reviewed), abstract:
    • This study focuses on the automatic decoding of inner speech using noninvasive methods, such as Electroencephalography (EEG). While inner speech has been a research topic in philosophy and psychology for half a century, recent attempts have been made to decode nonvoiced spoken words by using various brain–computer interfaces. The main shortcomings of existing work are reproducibility and the availability of data and code. In this work, we investigate various methods (using Convolutional Neural Network (CNN), Gated Recurrent Unit (GRU), Long Short-Term Memory Networks (LSTM)) for the detection task of five vowels and six words on a publicly available EEG dataset. The main contributions of this work are (1) subject dependent vs. subject-independent approaches, (2) the effect of different preprocessing steps (Independent Component Analysis (ICA), down-sampling and filtering), and (3) word classification (where we achieve state-of-the-art performance on a publicly available dataset). Overall we achieve a performance accuracy of 35.20% and 29.21% when classifying five vowels and six words, respectively, in a publicly available dataset, using our tuned iSpeech-CNN architecture. All of our code and processed data are publicly available to ensure reproducibility. As such, this work contributes to a deeper understanding and reproducibility of experiments in the area of inner speech detection.
  •  
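
Illustrative sketch (not the authors' pipeline): the preprocessing steps whose effect the entry above studies (band-pass filtering, down-sampling, ICA) expressed with MNE on synthetic data; the channel count, rates and component number are assumptions, not the dataset's actual parameters.

    import numpy as np
    import mne

    sfreq = 1024.0
    data = np.random.randn(32, int(sfreq * 10)) * 1e-5          # 32 channels, 10 s of noise
    info = mne.create_info([f"EEG{i:02d}" for i in range(32)], sfreq, ch_types="eeg")
    raw = mne.io.RawArray(data, info)

    raw.filter(l_freq=1.0, h_freq=40.0)          # band-pass filter
    raw.resample(256)                            # down-sample
    ica = mne.preprocessing.ICA(n_components=20, random_state=0)
    ica.fit(raw)
    raw = ica.apply(raw)                         # reconstruct the signal from ICA components
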
23.
  • Liwicki, Foteini Simistira, et al. (author)
  • Deep learning for historical document analysis
  • 2020. - 6
  • In: Handbook Of Pattern Recognition And Computer Vision. - : World Scientific. ; pp. 287-303
  • Book chapter (other academic/artistic), abstract:
    • This chapter gives an overview of the state of the art and recent methods in the area of historical document analysis. Historical documents differ from ordinary documents due to the presence of different artifacts. Issues such as poor conditions of the documents, texture, noise and degradation, large variability of page layout, page skew, random alignment, variety of fonts, presence of embellishments, variations in spacing between characters, words, lines, paragraphs and margins, overlapping object boundaries, superimposition of information layers, etc. bring complexity to analyzing them. Most current methods rely on deep learning, including Convolutional Neural Networks and Long Short-Term Memory Networks. In addition to the overview of the state of the art, this chapter describes a recently introduced idea for the detection of graphical elements in historical documents and an ongoing effort towards the creation of a large database.
  •  
24.
  • Mishra, Ashish Ranjan, et al. (author)
  • SignEEG v1.0: Multimodal Dataset with Electroencephalography and Hand-written Signature for Biometric Systems
  • 2024
  • In: Scientific Data. - : Nature Research. - 2052-4463. ; 11
  • Journal article (peer-reviewed), abstract:
    • Handwritten signatures in biometric authentication leverage unique individual characteristics for identification, offering high specificity through dynamic and static properties. However, this modality faces significant challenges from sophisticated forgery attempts, underscoring the need for enhanced security measures in common applications. To address forgery in signature-based biometric systems, integrating a forgery-resistant modality, namely, noninvasive electroencephalography (EEG), which captures unique brain activity patterns, can significantly enhance system robustness by leveraging multimodality’s strengths. By combining EEG, a physiological modality, with handwritten signatures, a behavioral modality, our approach capitalizes on the strengths of both, significantly fortifying the robustness of biometric systems through this multimodal integration. In addition, EEG’s resistance to replication offers a high-security level, making it a robust addition to user identification and verification. This study presents a new multimodal SignEEG v1.0 dataset based on EEG and hand-drawn signatures from 70 subjects. EEG signals and hand-drawn signatures have been collected with Emotiv Insight and Wacom One sensors, respectively. The multimodal data consists of three paradigms based on mental and motor imagery, and physical execution: (i) thinking of the signature’s image, (ii) drawing the signature mentally, and (iii) drawing a signature physically. Extensive experiments have been conducted to establish a baseline with machine learning classifiers. The results demonstrate that multimodality in biometric systems significantly enhances robustness, achieving high reliability even with limited sample sizes. We release the raw, pre-processed data and easy-to-follow implementation details.
  •  
25.
  •  
26.
  • Sabry, Sana Sabah, et al. (author)
  • HaT5: Hate Language Identification using Text-to-Text Transfer Transformer
  • 2022
  • In: 2022 International Joint Conference on Neural Networks (IJCNN): Conference Proceedings. - : Institute of Electrical and Electronics Engineers (IEEE).
  • Conference paper (peer-reviewed), abstract:
    • We investigate the performance of a state-of-the-art (SoTA) architecture T5 (available on the SuperGLUE) and compare it with 3 other previous SoTA architectures across 5 different tasks from 2 relatively diverse datasets. The datasets are diverse in terms of the number and types of tasks they have. To improve performance, we augment the training data by using a new autoregressive conversational AI model checkpoint. We achieve near-SoTA results on a couple of the tasks - macro F1 scores of 81.66% for task A of the OLID 2019 dataset and 82.54% for task A of the hate speech and offensive content (HASOC) 2021 dataset, where the SoTA results are 82.9% and 83.05%, respectively. We perform error analysis and explain why one of the models (Bi-LSTM) makes the predictions it does by using a publicly available algorithm: Integrated Gradient (IG). This is because explainable artificial intelligence (XAI) is essential for earning the trust of users. The main contributions of this work are the implementation method of T5, which is discussed; the data augmentation, which brought performance improvements; and the revelation on the shortcomings of the HASOC 2021 dataset. The revelation shows the difficulties of poor data annotation by using a small set of examples where the T5 model made the correct predictions, even when the ground truth of the test set was incorrect (in our opinion). We also provide our model checkpoints on the HuggingFace hub: https://huggingface.co/sana-ngu/HaT5_augmentation and https://huggingface.co/sana-ngu/HaT5.
  •  
27.
  • Saini, Rajkumar, et al. (author)
  • ICDAR 2019 Historical Document Reading Challenge on Large Structured Chinese Family Records
  • 2019
  • In: The 15th IAPR International Conference on Document Analysis and Recognition. - Piscataway, New Jersey, USA : IEEE. ; pp. 1499-1504
  • Conference paper (peer-reviewed), abstract:
    • In this paper, we present a large historical database of Chinese family records with the aim of developing robust systems for historical document analysis. In this direction, we propose a Historical Document Reading Challenge on Large Chinese Structured Family Records (ICDAR 2019 HDRC-CHINESE). The objective of the competition is to recognize and analyze the layout, and finally detect and recognize the textlines and characters of the large historical document image dataset containing more than 10,000 pages. Cascade R-CNN, CRNN, and U-Net based architectures were trained to evaluate the performances in these tasks. An error rate of 0.01 has been recorded for textline recognition (Task 1), whereas a Jaccard Index of 99.54% has been recorded for layout analysis (Task 2). A graph edit distance based total error ratio of 1.5% has been recorded for complete integrated textline detection and recognition (Task 3).
  •  
28.
  • Saini, Rajkumar, Dr. 1988-, et al. (author)
  • Imagined Object Recognition Using EEG-Based Neurological Brain Signals
  • 2022
  • In: Recent Trends in Image Processing and Pattern Recognition (RTIP2R 2021). - Cham : Springer. ; pp. 305-319
  • Conference paper (peer-reviewed), abstract:
    • Researchers have been using Electroencephalography (EEG) to build Brain-Computer Interface (BCI) systems. They have had a lot of success modeling brain signals for applications, including emotion detection, user identification, authentication, and control. The goal of this study is to employ EEG-based neurological brain signals to recognize imagined objects. The user imagines the object after seeing it on the monitor screen. The EEG signal is recorded while the user thinks about the object. These EEG signals were processed using signal processing methods, and machine learning algorithms were trained to classify the EEG signals. The study involves coarse- and fine-level EEG signal classification. The coarse-level classification categorizes the signals into three classes (Char, Digit, Object), whereas the fine-level classification categorizes the EEG signals into 30 classes. Recognition rates of 97.30% and 93.64% were recorded for coarse- and fine-level classification, respectively. Experiments indicate the proposed work outperforms the previous methods.
  •  
29.
  • Shankar, Priyamvada, 1991- (author)
  • Data driven crop disease modeling
  • 2022
  • Doctoral thesis (other academic/artistic), abstract:
    • The concept of precision farming deals with the creation and use of data from machinery and sensors on and off the field to optimize resources and sustainably intensify food production to keep up with increasing demand. However, in the face of a growing amount of data being collected, smarter data processing and analysis techniques are needed and have prompted the evaluation and incorporation of artificial intelligence (AI) and machine learning (ML) techniques for multiple use cases right from seeding to harvesting. One such use case that has yet to fully gauge the propositions of AI and ML is crop disease prediction. Since multiple biotic and abiotic factors could be responsible for the occurrence of a disease, modeling requires finding suitable data associated with these factors from multiple farms for an extended time frame and developing smarter models able to capture underlying relationships between them. This thesis presents research conducted to develop data-driven methodologies and optimization approaches for building crop disease models. The objective is realized by breaking down the task into three modules: (i) data collection; (ii) data processing and model building; and finally, (iii) the maintenance of models in production. The traditional data collection approach for disease modeling is through setting up of trials which is expensive and labor-intensive which prompted the evaluation of other novel and free to access data sources. Therefore, in module one two studies were conducted to assess the suitability of social media platforms and remote sensing products. The results show that social media is not a viable option yet due to limited geo-referenced data and ambiguity in categorizing the discussions. On the other hand, vegetation indices derived from multispectral satellite imagery, despite their high spatial granularity, are an interesting addition to the modeling pipeline. Moving on to module two, a study was conducted to demonstrate the process of fusing and preparing data from multiple sources with different formats collected in an extended time frame to be used for model building. The study establishes the relevance of using advanced machine learning models such as deep learning in the prediction of crop diseases. The results show that given the appropriate data preparation process at the right data granularity and the use of some smart tricks, neural network-based models hold the potential to outperform widely used models such as XGBoost. Since neural networks offer advantages such as multimodal learning, transfer learning, and automated feature engineering, which are crucial in building scalable models with heterogeneous data and reduced human effort, the observations of this study led to a follow-up study. This study investigates neural network-based algorithms specifically designed for tabular data and compares them against popular tree ensemble-based models. Apart from acting as a comprehensive analysis of the two families of techniques the results showed that although neural network-based models were not able to outperform tree-based models, they achieved comparable results and allowed for the creation of easier and more accurate models for new diseases by application of transfer learning. Climate change leads to unexpected weather events and modified disease occurrence patterns that cause static models to drift rapidly. Models need to be maintained to ensure they are performing as required. 
Capturing real-time data and triggering retraining when enough new data has been collected can help maintain models by acting as a feedback loop for model improvement. This was attempted by collecting crowd-sourced data from a disease recognition app, but it was not usable in its current form and required further annotation. Since annotations are expensive and time-consuming, a study for real-life agricultural data retrieval and large-scale annotation flow optimization based on similarity search technique is presented which significantly optimizes the annotation process. The results derived from these studies are highly relevant for progressing the United Nations Sustainable Development Goal of Zero Hunger. It is also expected to ease farmers' anxiety related to yield loss due to crop diseases and enhance their capability of planning and scheduling management practices by giving them an early warning of disease occurrence. The results have been verified through comparison with traditional crop disease prediction methods and interaction with experienced agronomists working for a major AgTech company.
  •  
30.
  • Simistira Liwicki, Foteini, et al. (author)
  • Analysing Musical Performance in Videos Using Deep Neural Networks
  • 2020
  • In: Proceedings of the 1st Joint Conference on AI Music Creativity, AIMC, Stockholm, Sweden.
  • Conference paper (peer-reviewed), abstract:
    • This paper proposes a method to facilitate labelling of music performance videos with automatic methods (3D-Convolutional Neural Networks) instead of tedious labelling by human experts. In particular, we are interested in the detection of the 17 musical performance gestures generated during the performance (guitar play) of musical pieces which have been video-recorded. In earlier work, these videos have been annotated manually by a human expert according to the labels in the musical analysis methodology. Such a labelling method is time-consuming and would not be scalable to big collections of video recordings. In this paper, we use a 3D-CNN model from activity recognition tasks and adapt it to the music performance dataset following a transfer learning approach. In particular, the weights of the first blocks were kept and only the later layers as well as additional classification layers were re-trained. The model was evaluated on a set of 17 music performance gestures and reports an average accuracy of 97.9% (F1:77.8%) on the training set and 85.7% (F1:38.6%) on the test set. An additional analysis shows which gestures are particularly difficult and suggests improvements for future work.
  •  
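
Illustrative sketch (not the authors' code): the transfer-learning recipe described above, freezing the early blocks of a pretrained 3D-CNN and re-training the later layers plus a new 17-class head, using a torchvision video backbone as a stand-in for the activity-recognition model used in the paper.

    import torch.nn as nn
    from torchvision.models.video import r3d_18

    # 3D ResNet-18 pretrained on Kinetics (older torchvision: pretrained=True)
    model = r3d_18(weights="DEFAULT")
    for name, param in model.named_parameters():
        if name.startswith(("stem", "layer1", "layer2")):
            param.requires_grad = False          # keep the early blocks frozen

    model.fc = nn.Linear(model.fc.in_features, 17)   # 17 musical performance gestures
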
31.
  • Simistira Liwicki, Foteini, et al. (author)
  • Bimodal electroencephalography-functional magnetic resonance imaging dataset for inner-speech recognition
  • 2023
  • In: Scientific Data. - : Springer Nature. - 2052-4463. ; 10
  • Journal article (peer-reviewed), abstract:
    • The recognition of inner speech, which could give a ‘voice’ to patients who have no ability to speak or move, is a challenge for brain-computer interfaces (BCIs). A shortcoming of the available datasets is that they do not combine modalities to increase the performance of inner speech recognition. Multimodal datasets of brain data enable the fusion of neuroimaging modalities with complementary properties, such as the high spatial resolution of functional magnetic resonance imaging (fMRI) and the temporal resolution of electroencephalography (EEG), and therefore are promising for decoding inner speech. This paper presents the first publicly available bimodal dataset containing EEG and fMRI data acquired nonsimultaneously during inner-speech production. Data were obtained from four healthy, right-handed participants during an inner-speech task with words in either a social or numerical category. Each of the 8 word stimuli was assessed with 40 trials, resulting in 320 trials in each modality for each participant. The aim of this work is to provide a publicly available bimodal dataset on inner speech, contributing towards speech prostheses.
  •  
32.
  • Simistira Liwicki, Foteini, et al. (author)
  • Bimodal pilot study on inner speech decoding reveals the potential of combining EEG and fMRI
  • 2024
  • Other publication (other academic/artistic), abstract:
    • This paper presents the first publicly available bimodal electroencephalography (EEG) / functional magnetic resonance imaging (fMRI) dataset and an open source benchmark for inner speech decoding. Decoding inner speech or thought (expressed through a voice without actual speaking) is a challenge, with typical results close to chance level. The dataset comprises 1280 trials (4 subjects, 8 stimuli = 2 categories * 4 words, and 40 trials per stimulus) in each modality. The pilot study reports, for binary classification, a mean accuracy of 71.72% when combining the two modalities (EEG and fMRI), compared to 62.81% and 56.17% when using EEG and fMRI alone, respectively. The same improvement in performance can be observed for word classification (8 classes): 30.29% with the combination, 22.19% and 17.50% without. As such, this paper demonstrates that combining EEG with fMRI is a promising direction for inner speech decoding.
  •  
33.
  • Zarris, Dimitrios, et al. (author)
  • Enhancing Educational Paradigms with Large Language Models: From Teacher to Study Assistants in Personalized Learning
  • 2024
  • In: EDULEARN24 Proceedings. - : IATED Academy. ; pp. 1295-1303
  • Conference paper (peer-reviewed), abstract:
    • This paper investigates the application of large language models (LLMs) in the educational field, specifically focusing on roles like "Teacher Assistant" and "Study Assistant" to enhance personalized and adaptive learning. The significance of integrating AI in educational frameworks is underscored, given the shift towards AI-powered educational tools. The methodology of this research is structured and multifaceted, examining the dynamics between prompt engineering, methodological approaches, and LLM outputs with the help of indexed documents. The study bifurcates its approach into prompt structuring and advanced prompt engineering techniques. Initial investigations revolve around persona and template prompts to evaluate their individual and collective effects on LLM outputs. Advanced techniques, including few-shot and chain-of-thought prompting, are analyzed for their potential to elevate the quality and specificity of LLM responses. The "Study Assistant" aspect of the study involves applying these techniques to educational content across disciplines such as biology, mathematics, and physics. Findings from this research are poised to contribute significantly to the evolution of AI in education, offering insights into the variables that enhance LLM performance. This paper not only enriches the academic discourse on LLMs but also provides actionable insights for the development of sophisticated AI-based educational tools. As the educational landscape continues to evolve, this research underscores the imperative for continuous exploration and refinement in the application of AI to fully realize its benefits in education.
  •  
34.
  • Abid, Nosheen, 1993-, et al. (author)
  • Burnt Forest Estimation from Sentinel-2 Imagery of Australia using Unsupervised Deep Learning
  • 2021
  • In: Proceedings of the Digital Image Computing: Techniques and Applications (DICTA). - : IEEE. ; pp. 74-81
  • Conference paper (peer-reviewed), abstract:
    • Massive wildfires, not only in Australia but also worldwide, are burning millions of hectares of forests and green land, affecting the social, ecological, and economic situation. Widely used indices-based threshold methods like Normalized Burned Ratio (NBR) require a huge amount of data preprocessing and are specific to the data capturing source. State-of-the-art deep learning models, on the other hand, are supervised and require domain experts' knowledge for labeling the data in huge quantities. These limitations make the existing models difficult to adapt to new variations in the data and capturing sources. In this work, we have proposed an unsupervised deep learning based architecture to map the burnt regions of forests by learning features progressively. The model considers small patches of satellite imagery and classifies them into burnt and not burnt. These small patches are concatenated into binary masks to segment out the burnt region of the forests. The proposed system is composed of two modules: 1) a state-of-the-art deep learning architecture for feature extraction and 2) a clustering algorithm for the generation of pseudo labels to train the deep learning architecture. The proposed method is capable of learning the features progressively in an unsupervised fashion from the data with pseudo labels, reducing the exhausting efforts of data labeling that requires expert knowledge. We have used real-time data of Sentinel-2 for training the model and mapping the burnt regions. The obtained F1-Score of 0.87 demonstrates the effectiveness of the proposed model.
  •  
35.
  • Abid, Nosheen, 1993- (author)
  • Deep Learning for Geo-referenced Data : Case Study: Earth Observation
  • 2021
  • Licentiate thesis (other academic/artistic), abstract:
    • The thesis focuses on machine learning methods for Earth Observation (EO) data, more specifically, remote sensing data acquired by satellites and drones. EO plays a vital role in monitoring the Earth’s surface and modelling climate change to take necessary precautionary measures. Initially, these efforts were dominated by methods relying on handcrafted features and expert knowledge. The recent advances of machine learning methods, however, have also led to successful applications in EO. This thesis explores supervised and unsupervised approaches of Deep Learning (DL) to monitor natural resources of water bodies and forests. The first study of this thesis introduces an Unsupervised Curriculum Learning (UCL) method based on widely-used DL models to classify water resources from RGB remote sensing imagery. In traditional settings, human experts labeled images to train the deep models which is costly and time-consuming. UCL, instead, can learn the features progressively in an unsupervised fashion from the data, reducing the exhausting efforts of labeling. Three datasets of varying resolution are used to evaluate UCL and show its effectiveness: SAT-6, EuroSAT, and PakSAT. UCL outperforms the supervised methods in domain adaptation, which demonstrates the effectiveness of the proposed algorithm. The subsequent study is an extension of UCL for the multispectral imagery of Australian wildfires. This study has used multispectral Sentinel-2 imagery to create the dataset for the forest fires ravaging Australia in late 2019 and early 2020. 12 out of the 13 spectral bands of Sentinel-2 are concatenated in a way to make them suitable as a three-channel input to the unsupervised architecture. The unsupervised model then classified the patches as either burnt or not burnt. This work attains 87% F1-Score mapping the burnt regions of Australia, demonstrating the effectiveness of the proposed method. The main contributions of this work are (i) the creation of two datasets using Sentinel-2 Imagery, PakSAT dataset and Australian Forest Fire dataset; (ii) the introduction of UCL that learns the features progressively without the need of labelled data; and (iii) experimentation on relevant datasets for water body and forest fire classification. This work focuses on patch-level classification which could in future be expanded to pixel-based classification. Moreover, the methods proposed in this study can be extended to the multi-class classification of aerial imagery. Further possible future directions include the combination of geo-referenced meteorological and remotely sensed image data to explore proposed methods. Lastly, the proposed method can also be adapted to other domains involving multi-spectral and multi-modal input, such as, historical documents analysis, forgery detection in documents, and Natural Language Processing (NLP) classification tasks.
  •  
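    The thesis abstract states that 12 of the 13 Sentinel-2 bands are combined into a three-channel input but does not specify the mapping. The sketch below shows one plausible arrangement, averaging groups of four bands per output channel, purely as an assumed illustration and not the arrangement used in the thesis.

      import numpy as np

      def bands12_to_3channels(bands):
          """Collapse a (H, W, 12) multispectral stack into a (H, W, 3) input.

          Averages four consecutive bands per output channel; the actual band
          arrangement used in the thesis may differ.
          """
          h, w, b = bands.shape
          assert b == 12, "expects 12 spectral bands"
          return bands.reshape(h, w, 3, 4).mean(axis=-1)

      three_channel = bands12_to_3channels(np.random.rand(256, 256, 12))
      print(three_channel.shape)  # (256, 256, 3)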
36.
  • Abid, Nosheen, et al. (författare)
  • Seagrass classification using unsupervised curriculum learning (UCL)
  • 2024
  • Ingår i: Ecological Informatics. - : Elsevier B.V.. - 1574-9541 .- 1878-0512. ; 83
  • Tidskriftsartikel (refereegranskat)abstract
    • Seagrass ecosystems are pivotal in marine environments, serving as crucial habitats for diverse marine species and contributing significantly to carbon sequestration. Accurate classification of seagrass species from underwater images is imperative for monitoring and preserving these ecosystems. This paper introduces Unsupervised Curriculum Learning (UCL) to seagrass classification using the DeepSeagrass dataset. UCL progressively learns from simpler to more complex examples, enhancing the model's ability to discern seagrass features in a curriculum-driven manner. Experiments employing state-of-the-art deep learning architectures, convolutional neural networks (CNNs), show that UCL achieved an overall precision of 90.12% and a recall of 89%, significantly improving classification accuracy and robustness and outperforming some traditional supervised learning approaches like SimCLR, and unsupervised approaches like Zero-shot CLIP. The methodology of UCL involves four main steps: high-dimensional feature extraction, pseudo-label generation through clustering, reliable sample selection, and fine-tuning the model (these steps are sketched as a training round after this entry). The iterative UCL framework refines the CNN's learning of underwater images, demonstrating superior accuracy, generalization, and adaptability to unseen seagrass and background samples of undersea images. The findings presented in this paper contribute to the advancement of seagrass classification techniques, providing valuable insights into the conservation and management of marine ecosystems. The code and dataset are made publicly available and can be accessed here: https://github.com/nabid69/Unsupervised-Curriculum-Learning—UCL.
  •  
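    The four UCL steps named in the abstract above (feature extraction, pseudo-label generation through clustering, reliable-sample selection, and fine-tuning) can be summarised as a single training round. This is a schematic sketch with stand-in components, not the authors' released code; the flattening feature extractor, the no-op fine-tuning step, and the 50% selection quantile are placeholders.

      import numpy as np
      from sklearn.cluster import KMeans

      def ucl_round(extract_features, finetune, images, n_classes, keep_fraction=0.5):
          # 1) high-dimensional feature extraction with the current backbone
          feats = extract_features(images)                       # (N, D)

          # 2) pseudo-label generation through clustering
          km = KMeans(n_clusters=n_classes, n_init=10).fit(feats)
          pseudo = km.labels_

          # 3) reliable-sample selection: keep samples closest to their centroid
          dists = np.linalg.norm(feats - km.cluster_centers_[pseudo], axis=1)
          keep = dists <= np.quantile(dists, keep_fraction)

          # 4) fine-tune the backbone on the reliable pseudo-labelled subset
          finetune(images[keep], pseudo[keep])
          return keep.sum(), pseudo

      # Stand-ins for a CNN backbone, only so the sketch runs end to end
      extract = lambda imgs: imgs.reshape(len(imgs), -1)         # flatten pixels as "features"
      finetune = lambda imgs, labels: None                       # no-op placeholder

      images = np.random.rand(40, 16, 16)                        # 40 toy image patches
      n_kept, pseudo = ucl_round(extract, finetune, images, n_classes=2)
      print(n_kept, np.bincount(pseudo))
      # In UCL this round is repeated, with the fine-tuned model replacing the previous one.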
37.
  • Abid, Nosheen, 1993-, et al. (författare)
  • UCL: Unsupervised Curriculum Learning for Utility Pole Detection from Aerial Imagery
  • 2022
  • Ingår i: Proceedings of the Digital Image Computing: Techniques and Applications (DICTA). - : IEEE. - 9781665456425
  • Konferensbidrag (refereegranskat)abstract
    • This paper introduces a machine learning-based approach for detecting electric poles, an essential part of power grid maintenance. With the increasing popularity of deep learning, several such approaches have been proposed for electric pole detection. However, most of these approaches are supervised, requiring a large amount of labeled data, which is time-consuming and labor-intensive to obtain. Unsupervised deep learning approaches have the potential to overcome the need for huge amounts of training data. This paper presents an unsupervised deep learning framework for utility pole detection. The framework combines a Convolutional Neural Network (CNN) and clustering algorithms with a selection operation: the CNN extracts meaningful features from aerial imagery, a clustering algorithm generates pseudo labels for the resulting features, and a selection operation picks out reliable samples used to further fine-tune the CNN. The fine-tuned version then replaces the initial CNN model, thus improving the framework, and we repeat this process iteratively so that the model progressively learns the prominent patterns in the data. The presented framework is trained and tested on a small dataset of utility poles provided by “Mention Fuvex” (a Spanish company utilizing long-range drones for power line inspection). Our extensive experimentation demonstrates the progressive learning behavior of the proposed method and yields promising classification scores, with a significance test giving p < 0.00005 on the utility pole dataset (an illustrative significance check is sketched after this entry).
  •  
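    The abstract above reports a significance test with p < 0.00005 but does not state which test was used. The paired t-test below is only one illustrative way to compare per-fold scores of two classifiers; the scores themselves are made up.

      from scipy import stats

      # Hypothetical per-fold F1 scores of the proposed method vs. a baseline
      ucl_scores      = [0.91, 0.89, 0.92, 0.90, 0.93]
      baseline_scores = [0.78, 0.80, 0.77, 0.81, 0.79]

      t_stat, p_value = stats.ttest_rel(ucl_scores, baseline_scores)
      print(f"t = {t_stat:.2f}, p = {p_value:.6f}")  # a very small p indicates a significant difference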
38.
  • Adewumi, Oluwatosin, 1978-, et al. (författare)
  • Inner For-Loop for Speeding Up Blockchain Mining
  • 2020
  • Ingår i: Open Computer Science. - Poland : Walter de Gruyter. - 2299-1093. ; 10:1, s. 42-47
  • Tidskriftsartikel (refereegranskat)abstract
    • In this paper, the authors propose to increase the efficiency of blockchain mining by using a population-based approach. Blockchain relies on solving difficult mathematical problems as proof-of-work within a network before blocks are added to the chain. The brute-force approach, advocated by some as the fastest algorithm for solving partial hash collisions and implemented in the Bitcoin blockchain, implies an exhaustive, sequential search. It involves incrementing the nonce (number) of the header by one, taking a double SHA-256 hash at each instance, and comparing it with a target value to ascertain whether it is lower than that target (see the sketch after this entry). It consumes excessive time and power. In this paper, the authors therefore suggest using an inner for-loop for the population-based approach. Comparison shows that it is a slightly faster approach than brute force, with an average speed advantage of about 1.67%, or 3,420 iterations per second, performing better 73% of the time. We also observed that the more particles deployed in total, the better the performance, up to a pivotal point. Furthermore, a recommendation is made for taming the excessive power use of networks like Bitcoin's by using penalty by consensus.
  •  
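    The proof-of-work check described above (a double SHA-256 hash of a header with a candidate nonce, compared against a target) can be sketched as follows. The header bytes, the target value, and the way the inner for-loop walks through a small population of nonces are simplified assumptions for illustration, not the paper's exact implementation.

      import hashlib

      def double_sha256(data: bytes) -> int:
          """Return the double SHA-256 of `data` as an integer."""
          return int.from_bytes(hashlib.sha256(hashlib.sha256(data).digest()).digest(), "big")

      def mine(header: bytes, target: int, start_nonce: int = 0,
               population: int = 4, max_rounds: int = 100_000):
          """Search for a nonce whose double SHA-256 hash falls below `target`.

          Instead of incrementing a single nonce (brute force), each round checks a
          small population of candidate nonces in an inner for-loop.
          """
          for round_idx in range(max_rounds):
              base = start_nonce + round_idx * population
              for offset in range(population):          # inner for-loop over the population
                  nonce = base + offset
                  h = double_sha256(header + nonce.to_bytes(8, "big"))
                  if h < target:
                      return nonce, h
          return None, None

      nonce, h = mine(b"example-block-header", target=2**240)
      print(nonce, hex(h) if h is not None else None)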
39.
  • Adewumi, Oluwatosin, 1978- (författare)
  • Word Vector Representations using Shallow Neural Networks
  • 2021
  • Licentiatavhandling (övrigt vetenskapligt/konstnärligt)abstract
    • This work highlights some important factors to consider when developing word vector representations and data-driven conversational systems. Neural network methods for creating word embeddings have gained more prominence than their older, count-based counterparts. However, there are still challenges, such as prolonged training time and the need for more data, especially with deep neural networks. Shallow neural networks appear to have the advantage of less complexity; however, they also face challenges, such as sub-optimal combinations of hyper-parameters that produce sub-optimal models. This work therefore investigates the following research questions: "How importantly do hyper-parameters influence word embeddings’ performance?" and "What factors are important for developing ethical and robust conversational systems?" In answering the questions, various experiments were conducted using different datasets in different studies. The first study investigates, empirically, various hyper-parameter combinations for creating word vectors and their impact on a few natural language processing (NLP) downstream tasks: named entity recognition (NER) and sentiment analysis (SA); the relevant hyper-parameters are illustrated in the sketch after this entry. The study shows that the optimal performance of embeddings for downstream NLP tasks depends on the task at hand. It also shows that certain combinations give strong performance across the tasks chosen for the study. Furthermore, it shows that reasonably smaller corpora are sufficient, or even produce better models in some cases, and take less time to train and load. This is important, especially now that environmental considerations play a prominent role in ethical research. Subsequent studies build on the findings of the first and explore the hyper-parameter combinations for Swedish and English embeddings for the downstream NER task. The second study presents the new Swedish analogy test set for the evaluation of Swedish embeddings. Furthermore, it shows that character n-grams are useful for Swedish, a morphologically rich language. The third study shows that broad coverage of topics in a corpus appears to be important for producing better embeddings and that noise may be helpful in certain instances, though it is generally harmful. Hence, a relatively smaller corpus can show better performance than a larger one, as demonstrated in the work with the smaller Swedish Wikipedia corpus against the Swedish Gigaword corpus. The argument is made in the final study (in answering the second question), from the point of view of the philosophy of science, that the near-elimination of unwanted bias in training data, and the use of fora like peer review, conferences, and journals to provide the necessary avenues for criticism and feedback, are instrumental for the development of ethical and robust conversational systems.
  •  
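    The hyper-parameters investigated in the thesis (architecture, embedding dimension, context window, negative sampling versus hierarchical softmax, and so on) correspond to the options of a shallow embedding trainer such as gensim's Word2Vec. The values and the toy corpus below are illustrative assumptions, not the configurations chosen in the studies.

      from gensim.models import Word2Vec

      # Tiny toy corpus; in practice this would be e.g. a Wikipedia dump.
      sentences = [
          ["word", "vectors", "capture", "semantic", "similarity"],
          ["shallow", "networks", "train", "word", "vectors", "quickly"],
      ]

      model = Word2Vec(
          sentences,
          vector_size=100,   # embedding dimension
          window=5,          # context window size
          sg=1,              # 1 = skip-gram, 0 = continuous bag-of-words (CBoW)
          hs=0, negative=5,  # negative sampling instead of hierarchical softmax
          min_count=1,       # keep all words in this toy corpus
          epochs=10,
          workers=2,
      )
      print(model.wv.most_similar("word", topn=2))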
40.
  • Agües Paszkowsky, Núria, et al. (författare)
  • Vegetation and Drought Trends in Sweden’s Mälardalen Region – Year-on-Year Comparison by Gaussian Process Regression
  • 2020
  • Ingår i: 2020 Swedish Workshop on Data Science (SweDS). - : IEEE. - 9781728192048
  • Konferensbidrag (refereegranskat)abstract
    • This article describes analytical work carried out in a pilot project for the Swedish Space Data Lab (SSDL), which focused on monitoring drought in the Mälardalen region in central Sweden. The Normalized Difference Vegetation Index (NDVI) and the Moisture Stress Index (MSI) – commonly used to analyse drought – are estimated from Sentinel-2 satellite data and averaged over a selection of seven grassland areas of interest. To derive a complete time series over a season that interpolates over days with missing data, we use Gaussian Process Regression, a technique from multivariate Bayesian analysis (a minimal interpolation sketch follows this entry). The analysis shows significant differences at 95% confidence for five out of seven areas when comparing the peak drought period in the dry year 2018 with the corresponding period in 2019. A cross-validation analysis indicates that the model parameter estimates are robust for the temporal covariance structure (while inconclusive for the spatial dimensions). There were no signs of over-fitting when comparing in-sample and out-of-sample RMSE.
  •  
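    A minimal version of the interpolation step, using scikit-learn's Gaussian Process Regression to fill gaps in an NDVI time series, is sketched below. The kernel choice, its parameters, and the observations are illustrative assumptions, not the model fitted in the SSDL pilot.

      import numpy as np
      from sklearn.gaussian_process import GaussianProcessRegressor
      from sklearn.gaussian_process.kernels import RBF, WhiteKernel

      # Day-of-year observations with gaps (e.g., cloudy acquisitions removed)
      days = np.array([10, 25, 40, 70, 95, 120, 160, 200])[:, None]
      ndvi = np.array([0.21, 0.25, 0.33, 0.52, 0.63, 0.71, 0.66, 0.48])

      # RBF kernel for smooth seasonal variation plus white noise for observation error
      kernel = 1.0 * RBF(length_scale=30.0) + WhiteKernel(noise_level=1e-3)
      gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(days, ndvi)

      # Predict a complete daily series with uncertainty bands
      grid = np.arange(1, 220)[:, None]
      mean, std = gpr.predict(grid, return_std=True)
      print(mean[:5], std[:5])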
41.
  • Ahmad, Riaz, et al. (författare)
  • A Deep Learning based Arabic Script Recognition System : Benchmark on KHAT
  • 2020
  • Ingår i: The International Arab Journal of Information Technology. - : Zarqa University, Jordan. - 1683-3198 .- 2309-4524. ; 17:3, s. 299-305
  • Tidskriftsartikel (refereegranskat)abstract
    • This paper presents a deep learning benchmark on a complex dataset known as KFUPM Handwritten Arabic TexT (KHATT). The KHATT dataset consists of complex patterns of handwritten Arabic text-lines. This paper contributes mainly in three aspects: (1) pre-processing, (2) a deep learning based approach, and (3) data augmentation. The pre-processing step includes pruning extra white space and de-skewing skewed text-lines. We deploy a deep learning approach based on Multi-Dimensional Long Short-Term Memory (MDLSTM) networks and Connectionist Temporal Classification (CTC); a simplified sketch of this recurrent-plus-CTC setup follows this entry. MDLSTM has the advantage of scanning the Arabic text-lines in all directions (horizontal and vertical) to cover dots, diacritics, strokes and fine inflections. Data augmentation combined with the deep learning approach achieves a better and promising improvement in results, reaching a Character Recognition (CR) rate of 80.02% over the 75.08% baseline.
  •  
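    PyTorch has no built-in multi-dimensional LSTM, so the sketch below substitutes a bidirectional 1D LSTM over per-column features purely to show how a recurrent encoder is wired to the CTC loss; the feature dimension, alphabet size, and dummy targets are assumptions, and this is not the paper's MDLSTM architecture.

      import torch
      import torch.nn as nn

      class TextLineRecognizer(nn.Module):
          """Bidirectional LSTM + CTC head over per-column features of a text-line image."""
          def __init__(self, feat_dim=48, hidden=128, n_classes=60):  # n_classes includes the CTC blank
              super().__init__()
              self.rnn = nn.LSTM(feat_dim, hidden, num_layers=2, bidirectional=True, batch_first=True)
              self.fc = nn.Linear(2 * hidden, n_classes)

          def forward(self, x):                     # x: (batch, width, feat_dim)
              out, _ = self.rnn(x)
              return self.fc(out).log_softmax(-1)   # (batch, width, n_classes)

      model = TextLineRecognizer()
      ctc = nn.CTCLoss(blank=0)

      x = torch.randn(4, 100, 48)                   # 4 text-lines, 100 columns of height-48 features
      log_probs = model(x).permute(1, 0, 2)         # CTC expects (T, batch, classes)
      targets = torch.randint(1, 60, (4, 20))       # dummy label sequences (no blanks)
      loss = ctc(log_probs, targets,
                 input_lengths=torch.full((4,), 100),
                 target_lengths=torch.full((4,), 20))
      loss.backward()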
42.
  • Ahmad, Riaz, et al. (författare)
  • Recognizable units in Pashto language for OCR
  • 2015
  • Ingår i: 13th International Conference on Document Analysis and Recognition. - : IEEE. ; , s. 1246-1250
  • Konferensbidrag (refereegranskat)abstract
    • Atomic segmentation of cursive scripts into constituent characters is one of the most challenging problems in pattern recognition. To avoid segmentation in cursive script, concrete shapes are considered as recognizable units. The objective of this work is therefore to find alternative recognizable units in Pashto cursive script. These alternatives are ligatures and primary ligatures. However, sound statistical analysis is needed to find the appropriate numbers of ligatures and primary ligatures in Pashto script. In this work, a corpus of 2,313,736 Pashto words is extracted from large-scale, diversified web sources, and a total of 19,268 unique ligatures is identified in Pashto cursive script. Analysis shows that only 7,000 ligatures cover 91% of the overall corpus of unique Pashto words (the coverage computation is sketched after this entry). Similarly, about 7,681 primary ligatures are identified, which represent the basic shapes of all the ligatures.
  •  
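    The coverage figure quoted above (the 7,000 most frequent ligatures covering about 91% of the corpus) is the kind of statistic computed below. The toy frequency table is a placeholder, and coverage is measured here over ligature occurrences; the paper's exact counting convention may differ.

      from collections import Counter

      def top_k_coverage(ligature_counts: Counter, k: int) -> float:
          """Fraction of all ligature occurrences covered by the k most frequent ligatures."""
          total = sum(ligature_counts.values())
          covered = sum(count for _, count in ligature_counts.most_common(k))
          return covered / total

      # Toy frequency table; the real corpus has 19,268 unique ligatures.
      counts = Counter({"lig_a": 500, "lig_b": 300, "lig_c": 120, "lig_d": 50, "lig_e": 30})
      print(f"Top-3 coverage: {top_k_coverage(counts, 3):.1%}")  # 92.0%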
43.
  • Ahmad, Riaz, et al. (författare)
  • Scale and Rotation Invariant OCR for Pashto Cursive Script using MDLSTM Network
  • 2015
  • Ingår i: 13th International Conference on Document Analysis and Recognition. - : IEEE. ; , s. 1101-1105
  • Konferensbidrag (refereegranskat)abstract
    • Optical Character Recognition (OCR) of cursive scripts like Pashto and Urdu is difficult due to the presence of complex ligatures and connected writing styles. In this paper, we evaluate and compare different approaches for the recognition of such complex ligatures. The approaches include Hidden Markov Models (HMM), Long Short-Term Memory (LSTM) networks and the Scale Invariant Feature Transform (SIFT). The current state of the art in cursive script recognition assumes constant scale without any rotation, while real-world data contain rotation and scale variations. This research aims to evaluate the performance of sequence classifiers like HMM and LSTM and compare it with a descriptor-based classifier like SIFT. In addition, we also assess the performance of these methods against scale and rotation variations in cursive script ligatures. Moreover, we introduce a database of 480,000 images containing 1,000 unique ligatures or sub-words of Pashto; in this database, each ligature has 40 scale and 12 rotation variations. The evaluation results show a significantly improved performance of LSTM over HMM and a traditional feature extraction technique such as SIFT.
  •  
44.
  • Ahmed, Muhammad, et al. (författare)
  • Survey and Performance Analysis of Deep Learning Based Object Detection in Challenging Environments
  • 2021
  • Ingår i: Sensors. - : MDPI. - 1424-8220. ; 21:15
  • Forskningsöversikt (refereegranskat)abstract
    • Recent progress in deep learning has led to accurate and efficient generic object detection networks. Training of highly reliable models depends on large datasets with highly textured and rich images. However, in real-world scenarios, the performance of the generic object detection system decreases when (i) occlusions hide the objects, (ii) objects are present in low-light images, or (iii) they are merged with background information. In this paper, we refer to all these situations as challenging environments. With the recent rapid development in generic object detection algorithms, notable progress has been observed in the field of deep learning-based object detection in challenging environments. However, there is no consolidated reference to cover the state of the art in this domain. To the best of our knowledge, this paper presents the first comprehensive overview, covering recent approaches that have tackled the problem of object detection in challenging environments. Furthermore, we present a quantitative and qualitative performance analysis of these approaches and discuss the currently available challenging datasets. Moreover, this paper investigates the performance of current state-of-the-art generic object detection algorithms by benchmarking results on the three well-known challenging datasets. Finally, we highlight several current shortcomings and outline future directions.
  •  
45.
  •  
46.
  • Alberti, Michele, et al. (författare)
  • Are You Tampering with My Data?
  • 2019
  • Ingår i: Computer Vision – ECCV 2018 Workshops. - Cham : Springer. - 9783030110116 ; , s. 296-312
  • Konferensbidrag (refereegranskat)abstract
    • We propose a novel approach towards adversarial attacks on neural networks (NN), focusing on tampering with the data used for training instead of generating attacks on trained models. Our network-agnostic method creates a backdoor during training which can be exploited at test time to force a neural network to exhibit abnormal behaviour. We demonstrate on two widely used datasets (CIFAR-10 and SVHN) that a universal modification of just one pixel per image, applied to all the images of a class in the training set, is enough to corrupt the training procedure of several state-of-the-art deep neural networks, causing the networks to misclassify any images to which the modification is applied (the poisoning step is sketched after this entry). Our aim is to bring to the attention of the machine learning community the possibility that even learning-based methods trained personally on public datasets can be subject to attacks by a skillful adversary.
  •  
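    A minimal numpy sketch of the poisoning step described above: a single, fixed pixel is modified in every training image of one class while the labels are left untouched. The pixel position, trigger value, and data layout are illustrative assumptions, not the exact settings used in the paper.

      import numpy as np

      def poison_training_set(images, labels, target_class, pixel=(5, 5), value=255):
          """Apply a one-pixel backdoor trigger to all training images of `target_class`.

          images: (N, H, W, C) uint8 array, labels: (N,) int array.
          Returns a poisoned copy; at test time, applying the same trigger to any
          image is intended to push the trained model towards `target_class`.
          """
          poisoned = images.copy()
          r, c = pixel
          poisoned[labels == target_class, r, c, :] = value
          return poisoned

      # Toy CIFAR-10-like batch
      imgs = np.random.randint(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)
      lbls = np.array([0, 1, 1, 2, 1, 0, 1, 2])
      poisoned = poison_training_set(imgs, lbls, target_class=1)
      print((poisoned != imgs).any(axis=(1, 2, 3)))  # flags the modified class-1 images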
47.
  • Alberti, M., et al. (författare)
  • DeepDIVA : A Highly-Functional Python Framework for Reproducible Experiments
  • 2018
  • Ingår i: Proceedings of International Conference on Frontiers in Handwriting Recognition, ICFHR 2018. - : IEEE. - 9781538658758 ; , s. 423-428
  • Konferensbidrag (refereegranskat)abstract
    • We introduce DeepDIVA: an infrastructure designed to enable quick and intuitive setup of reproducible experiments with a large range of useful analysis functionality. Reproducing scientific results can be a frustrating experience, not only in document image analysis but in machine learning in general. Using DeepDIVA a researcher can either reproduce a given experiment or share their own experiments with others. Moreover, the framework offers a large range of functions, such as boilerplate code, keeping track of experiments, hyper-parameter optimization, and visualization of data and results. To demonstrate the effectiveness of this framework, this paper presents case studies in the area of handwritten document analysis where researchers benefit from the integrated functionality. DeepDIVA is implemented in Python and uses the deep learning framework PyTorch. It is completely open source(1), and accessible as Web Service through DIVAServices(2).
  •  
48.
  • Alberti, Michele, et al. (författare)
  • Improving Reproducible Deep Learning Workflows with DeepDIVA
  • 2019
  • Ingår i: Proceedings 6th Swiss Conference on Data Science. - : IEEE. ; , s. 13-18
  • Konferensbidrag (refereegranskat)abstract
    • The field of deep learning is experiencing a trend towards producing reproducible research. Nevertheless, it is still often a frustrating experience to reproduce scientific results. This is especially true in the machine learning community, where it is considered acceptable to have black boxes in one's experiments. We present DeepDIVA, a framework designed to facilitate easy experimentation and its reproduction. This framework allows researchers to share their experiments with others, while providing functionality for easy experimentation, such as: boilerplate code, experiment management, hyper-parameter optimization, verification of data integrity, and visualization of data and results. Additionally, the code of DeepDIVA is well documented and supported by several tutorials that allow a new user to quickly familiarize themselves with the framework.
  •  
49.
  • Alberti, Michele, et al. (författare)
  • Labeling, Cutting, Grouping : An Efficient Text Line Segmentation Method for Medieval Manuscripts
  • 2019
  • Ingår i: The 15th IAPR International Conference on Document Analysis and Recognition. - : IEEE. ; , s. 1200-1206
  • Konferensbidrag (övrigt vetenskapligt/konstnärligt)abstract
    • This paper introduces a new approach to text-line extraction that integrates deep-learning-based pre-classification with state-of-the-art segmentation methods. Text-line extraction in complex handwritten documents poses a significant challenge, even to the most modern computer vision algorithms. Historical manuscripts are a particularly hard class of documents as they present several forms of noise, such as degradation, bleed-through, interlinear glosses, and elaborate scripts. In this work, we propose a novel method which uses semantic segmentation at the pixel level as an intermediate task, followed by a text-line extraction step (a simplified grouping sketch follows this entry). We measured the performance of our method on a recent dataset of challenging medieval manuscripts and surpassed state-of-the-art results by reducing the error by 80.7%. Furthermore, we demonstrate the effectiveness of our approach on various other datasets written in different scripts. Hence, our contribution is two-fold. First, we demonstrate that semantic pixel segmentation can be used as a strong denoising pre-processing step before performing text-line extraction. Second, we introduce a novel, simple and robust algorithm that leverages the high-quality semantic segmentation to achieve a text-line extraction performance of 99.42% line IU on a challenging dataset.
  •  
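    The two-stage idea above (pixel-level semantic segmentation as a denoising step, followed by text-line extraction) can be illustrated with a much simpler stand-in for the paper's labeling-cutting-grouping procedure: threshold the predicted text-pixel map, label connected components, and group them into lines by their vertical position. The threshold, line gap, and toy probability map are assumptions.

      import numpy as np
      from scipy import ndimage

      def group_into_lines(text_prob_map, threshold=0.5, line_gap=10):
          """Group connected components of a text-pixel probability map into text lines.

          Returns a list of lines, each a list of component labels, ordered top to bottom.
          A far more robust labeling/cutting/grouping procedure is used in the paper.
          """
          mask = text_prob_map > threshold
          components, n = ndimage.label(mask)
          centers = ndimage.center_of_mass(mask, components, list(range(1, n + 1)))  # (row, col) per component

          lines, current, last_row = [], [], None
          for label, (row, _) in sorted(zip(range(1, n + 1), centers), key=lambda t: t[1][0]):
              if last_row is not None and row - last_row > line_gap:
                  lines.append(current)
                  current = []
              current.append(label)
              last_row = row
          if current:
              lines.append(current)
          return lines

      prob = np.zeros((60, 100))
      prob[5:12, 10:90] = 0.9
      prob[30:37, 10:90] = 0.9
      print(group_into_lines(prob))  # two lines, one component each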
50.
  • Alberti, Michele, et al. (författare)
  • Trainable Spectrally Initializable Matrix Transformations in Convolutional Neural Networks
  • 2021
  • Ingår i: Proceedings of ICPR 2020. - : IEEE. ; , s. 8204-8211
  • Konferensbidrag (refereegranskat)abstract
    • In this work, we introduce a new architectural component for Neural Networks (NNs): trainable and spectrally initializable matrix transformations on feature maps. While previous literature has already demonstrated the possibility of adding static spectral transformations as feature processors, our focus is on more general trainable transforms (a minimal sketch follows this entry). We study the transforms in various architectural configurations on four datasets of different nature: from medical (ColorectalHist, HAM10000) and natural (Flowers) images to historical documents (CB55). With rigorous experiments that control for the number of parameters and randomness, we show that networks utilizing the introduced matrix transformations outperform vanilla neural networks. The observed accuracy increases appreciably across all datasets. In addition, we show that spectral initialization leads to significantly faster convergence than randomly initialized matrix transformations. The transformations are implemented as auto-differentiable PyTorch modules that can be incorporated into any neural network architecture. The entire code base is open source.
  •  
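    A minimal PyTorch sketch of a trainable matrix transformation on feature maps, initialised here with an orthonormal DCT-II basis as one example of a spectral initialisation. The layer applies a separable transform to the spatial dimensions of square feature maps; the sizes are assumptions, and this is not the authors' released implementation.

      import math
      import torch
      import torch.nn as nn

      def dct_matrix(n: int) -> torch.Tensor:
          """Orthonormal DCT-II matrix, used as one possible spectral initialisation."""
          k = torch.arange(n).float()
          basis = torch.cos(math.pi / n * (k[None, :] + 0.5) * k[:, None])
          basis[0] *= 1.0 / math.sqrt(2)
          return basis * math.sqrt(2.0 / n)

      class SpectralTransform2d(nn.Module):
          """Trainable matrix transform over the spatial dimensions of square feature maps."""
          def __init__(self, size: int):
              super().__init__()
              self.weight = nn.Parameter(dct_matrix(size))   # spectrally initialised, then trained

          def forward(self, x):                              # x: (batch, channels, size, size)
              # Separable transform W x W^T, analogous to a 2D DCT when W is the DCT matrix
              return self.weight @ x @ self.weight.t()

      layer = SpectralTransform2d(size=8)
      out = layer(torch.randn(2, 16, 8, 8))
      print(out.shape)  # torch.Size([2, 16, 8, 8])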
Typ av publikation
konferensbidrag (79)
tidskriftsartikel (44)
annan publikation (11)
licentiatavhandling (9)
forskningsöversikt (6)
doktorsavhandling (4)
bokkapitel (1)
Typ av innehåll
refereegranskat (118)
övrigt vetenskapligt/konstnärligt (35)
Författare/redaktör
Liwicki, Marcus (152)
Liwicki, Foteini (27)
Stricker, Didier (23)
Afzal, Muhammad Zesh ... (20)
Saini, Rajkumar, Dr. ... (19)
Mokayed, Hamam (18)
Pagani, Alain (16)
Upadhyay, Richa (14)
Adewumi, Oluwatosin, ... (13)
Hashmi, Khurram Azee ... (13)
Sandin, Fredrik, 197 ... (13)
Chhipa, Prakash Chan ... (13)
Ingold, Rolf (12)
Pondenkandath, Vinay ... (10)
Alberti, Michele (9)
Almqvist, Andreas (9)
Abid, Nosheen, 1993- (8)
Kovács, György, Post ... (8)
Seuret, Mathias (8)
Usman, Ali (8)
Adewumi, Tosin, 1978 ... (7)
Grund Pihlgren, Gust ... (7)
Alonso, Pedro, 1986- (6)
De, Kanjar (6)
Javed, Saleha, 1990- (6)
Rakesh, Sumit (6)
Delsing, Jerker, 195 ... (5)
Nikolaidou, Konstant ... (5)
Kovács, György, 1984 ... (5)
Saini, Rajkumar (5)
Uchida, Seiichi (5)
Belay, Birhanu (4)
Habtegebrial, Tewodr ... (4)
Nazir, Danish (4)
Gupta, Vibha (4)
Fischer, Andreas (3)
Shafait, Faisal (3)
Brännvall, Rickard, ... (3)
Sabry, Sana Sabah (3)
Ahmad, Riaz (3)
Shridhar, Kumar (3)
Sandin, Fredrik (3)
Belay, Gebeyehu (3)
Pal, Umapada (3)
Chopra, Muskaan (3)
Gupta, Varun (3)
Nilsson, Jacob (3)
Park, Cheol Woo (3)
Taal, Cees (3)
Prabhu, Sameer (3)
Lärosäte
Luleå tekniska universitet (153)
RISE (7)
Umeå universitet (2)
Uppsala universitet (2)
Göteborgs universitet (1)
Örebro universitet (1)
Blekinge Tekniska Högskola (1)
Språk
Engelska (154)
Forskningsämne (UKÄ/SCB)
Naturvetenskap (132)
Teknik (41)
Samhällsvetenskap (5)
Medicin och hälsovetenskap (2)
Humaniora (2)
Lantbruksvetenskap (1)
