SwePub
Search the SwePub database

  Advanced search

Result list for the search "WFRF:(Sheikholeslami Sina 1993 ) "

Search: WFRF:(Sheikholeslami Sina 1993 )

  • Results 1-8 of 8
1.
  • Chikafa, Gibson, 1993-, et al. (author)
  • Cloud-native RStudio on Kubernetes for Hopsworks
  • 2023
  • Other publication (other scholarly/artistic), abstract:
    • In order to fully benefit from cloud computing, services are designed following the “multi-tenant” architectural model, which aims to maximize resource sharing among users. However, multi-tenancy introduces challenges of security, performance isolation, scaling, and customization. RStudio Server is an open-source Integrated Development Environment (IDE) for the R programming language, accessible through a web browser. We present the design and implementation of a multi-user distributed system on Hopsworks, a data-intensive AI platform, following the multi-tenant model that provides RStudio as Software as a Service (SaaS). We use two of the most popular cloud-native technologies, Docker and Kubernetes, to solve the problems of performance isolation, security, and scaling that are present in a multi-tenant environment. We further enable secure data sharing in RStudio Server instances to provide data privacy and allow collaboration among RStudio users. We integrate our system with Apache Spark, which can scale and handle Big Data processing workloads. We also provide a UI where users can supply custom configurations and retain full control of their own RStudio Server instances. Our system was tested on a Google Cloud Platform cluster with four worker nodes, each with 30 GB of RAM. The tests on this cluster showed that 44 RStudio servers, each with 2 GB of RAM, can run concurrently. The system can scale out to support potentially hundreds of concurrently running RStudio servers by adding more resources (CPUs and RAM) to the cluster.
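A minimal sketch of the per-user isolation described in this abstract, launching one RStudio server pod per user with the official kubernetes Python client. The image tag, the namespace-per-user layout, and the function name are illustrative assumptions, not details of the Hopsworks implementation; only the 2 GB memory limit comes from the abstract.

    # Sketch: one RStudio server pod per user, assuming the official `kubernetes`
    # Python client; image and namespace-per-user scheme are hypothetical.
    from kubernetes import client, config

    def launch_rstudio(user: str) -> None:
        config.load_kube_config()  # use load_incluster_config() when running in-cluster
        pod = client.V1Pod(
            metadata=client.V1ObjectMeta(name=f"rstudio-{user}"),
            spec=client.V1PodSpec(containers=[client.V1Container(
                name="rstudio",
                image="rocker/rstudio:latest",  # hypothetical image choice
                ports=[client.V1ContainerPort(container_port=8787)],  # RStudio's default port
                resources=client.V1ResourceRequirements(
                    limits={"memory": "2Gi", "cpu": "1"},  # 2 GB per server, as in the abstract
                ),
            )]),
        )
        # one namespace per user provides the security/performance isolation boundary
        client.CoreV1Api().create_namespaced_pod(namespace=user, body=pod)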
2.
  • Angelovska, Marina, et al. (author)
  • Siamese Neural Networks for Detecting Complementary Products
  • 2021
  • In: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, pp. 65-70
  • Conference paper (peer-reviewed), abstract:
    • Recommender systems play an important role in e-commerce websites, as they improve the customer journey by helping users find what they want at the right moment. In this paper, we focus on identifying complementary relationships between the products of an e-commerce company. We propose a content-based recommender system for detecting complementary products, using Siamese Neural Networks (SNN). To this end, we implement and compare two different models: a Siamese Convolutional Neural Network (CNN) and a Siamese Long Short-Term Memory (LSTM) network. Moreover, we propose an extension of the SNN approach that handles millions of products in a matter of seconds and halves the training time. In our experiments, we show that the Siamese LSTM can predict complementary products with an accuracy of ~85% using only the product titles.
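To make the Siamese LSTM concrete, here is a minimal PyTorch sketch over pairs of tokenized product titles. The embedding and hidden sizes and the absolute-difference merge are illustrative assumptions, not the paper's exact configuration; the key property, a single shared encoder for both titles, is what defines the Siamese architecture.

    # Sketch of a Siamese LSTM for product-title pairs (PyTorch); hyperparameters
    # and the |h_a - h_b| merge are assumptions, not the paper's exact setup.
    import torch
    import torch.nn as nn

    class SiameseLSTM(nn.Module):
        def __init__(self, vocab_size: int, embed_dim: int = 100, hidden_dim: int = 128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, 1)

        def encode(self, titles: torch.Tensor) -> torch.Tensor:
            _, (h, _) = self.lstm(self.embed(titles))
            return h[-1]  # final hidden state serves as the title embedding

        def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
            # both titles pass through the *same* encoder (shared weights)
            diff = torch.abs(self.encode(a) - self.encode(b))
            return torch.sigmoid(self.out(diff)).squeeze(-1)  # P(pair is complementary)

    model = SiameseLSTM(vocab_size=30_000)
    a = torch.randint(1, 30_000, (4, 12))  # batch of 4 tokenized titles, length 12
    b = torch.randint(1, 30_000, (4, 12))
    print(model(a, b))  # four scores in (0, 1)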
3.
  • Asratyan, Albert, et al. (author)
  • A Parallel Chain Mail Approach for Scalable Spatial Data Interpolation
  • 2021
  • In: 2021 IEEE International Conference on Big Data (Big Data). Institute of Electrical and Electronics Engineers (IEEE), pp. 306-314
  • Conference paper (peer-reviewed), abstract:
    • Deteriorating air quality is a growing concern that has been linked to many health-related issues, and monitoring it is a good first step toward understanding the problem. However, it is not always possible to collect air quality data from every location. Various data interpolation techniques are used to populate sparse maps with more context, but many of these algorithms are computationally expensive. This work introduces a three-step Chain Mail algorithm that uses kriging (without any modifications to the base algorithm) and achieves up to a 100× improvement in execution time with minimal accuracy loss (relative RMSE of 3%) by running interpolation executions concurrently. The approach can be described as a multi-step parallel interpolation algorithm with specific manipulation of regional border data to achieve greater accuracy: it interpolates geographically defined data chunks in parallel and shares the results with neighboring nodes to provide context and compensate for lack of knowledge of the surrounding areas. Combined with a serverless cloud architecture, this approach opens the door to interpolating large data sets in a matter of minutes while remaining cost-efficient. The effectiveness of the three-step Chain Mail approach depends on an even distribution of points among all nodes and on the resolution of the parallel configuration. In general, it offers a good balance between execution speed and accuracy.
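The core idea, interpolating geographic tiles in parallel while padding each tile with points from its neighbours for border context, can be sketched as below. This is a simplified single-pass reading of the three-step algorithm, assuming the pykrige package for the unmodified base kriging; tile layout, halo width, and grid resolution are illustrative parameters.

    # Sketch: tile-parallel kriging with a halo of neighbouring points per tile,
    # a single-pass simplification of the three-step Chain Mail algorithm.
    from multiprocessing import Pool
    import numpy as np
    from pykrige.ok import OrdinaryKriging  # assumes `pip install pykrige`

    def krige_tile(args):
        x, y, z, gridx, gridy = args
        ok = OrdinaryKriging(x, y, z, variogram_model="linear")  # unmodified base kriging
        zhat, _ = ok.execute("grid", gridx, gridy)
        return zhat

    def parallel_interpolate(x, y, z, tiles, halo=0.1, res=50):
        jobs = []
        for x0, x1, y0, y1 in tiles:
            # include points slightly outside the tile: the shared "border data"
            # that gives each tile context about its neighbouring regions
            m = ((x >= x0 - halo) & (x <= x1 + halo) &
                 (y >= y0 - halo) & (y <= y1 + halo))
            jobs.append((x[m], y[m], z[m],
                         np.linspace(x0, x1, res), np.linspace(y0, y1, res)))
        with Pool() as pool:  # tiles are interpolated concurrently
            return pool.map(krige_tile, jobs)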
4.
  • Hagos, Desta Haileselassie, et al. (author)
  • ExtremeEarth Meets Satellite Data From Space
  • 2021
  • In: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. Institute of Electrical and Electronics Engineers (IEEE). ISSN 1939-1404, E-ISSN 2151-1535. Vol. 14, pp. 9038-9063
  • Journal article (peer-reviewed), abstract:
    • Bringing together a number of cutting-edge technologies, ranging from storing extremely large volumes of data to developing scalable machine learning and deep learning algorithms in a distributed manner, and having them operate over the same infrastructure poses unprecedented challenges. One of these challenges is the integration of the European Space Agency's (ESA) Thematic Exploitation Platforms (TEPs) and data information access service platforms with a data platform, namely Hopsworks, that enables scalable data processing, machine learning, and deep learning on Copernicus data, as well as the development of very large training datasets for deep learning architectures targeting the classification of Sentinel images. In this article, we present the software architecture of ExtremeEarth, which aims at the development of scalable deep learning and geospatial analytics techniques for processing and analyzing petabytes of Copernicus data. The ExtremeEarth software infrastructure seamlessly integrates existing and novel software platforms and tools for storing, accessing, processing, analyzing, and visualizing large amounts of Copernicus data. New techniques in the areas of remote sensing and artificial intelligence, with an emphasis on deep learning, are developed. These techniques and the corresponding software presented in this article are to be integrated with, and used in, two ESA TEPs, namely the Polar and Food Security TEPs. Furthermore, we present the integration of Hopsworks with the Polar and Food Security use cases and the flow of events for the products offered through the TEPs.
5.
  • Hagos, Desta Haileselassie, et al. (author)
  • Scalable Artificial Intelligence for Earth Observation Data Using Hopsworks
  • 2022
  • In: Remote Sensing. MDPI AG. ISSN 2072-4292. Vol. 14, no. 8
  • Journal article (peer-reviewed), abstract:
    • This paper introduces the Hopsworks platform to the entire Earth Observation (EO) data community and the Copernicus programme. Hopsworks is a scalable, data-intensive, open-source Artificial Intelligence (AI) platform, jointly developed by Logical Clocks and the KTH Royal Institute of Technology, for building end-to-end Machine Learning (ML)/Deep Learning (DL) pipelines for EO data. It provides the full stack of services needed to manage the entire life cycle of data in ML. In particular, Hopsworks supports the development of horizontally scalable DL applications in notebooks and the operation of workflows to support those applications, including parallel data processing, model training, and model deployment at scale. To the best of our knowledge, this is the first work to demonstrate the services and features of the Hopsworks platform that give users the means to build scalable end-to-end ML/DL pipelines for EO data, as well as support for discovery and search of EO metadata. The paper serves as a demonstration and walkthrough of the stages of building a production-level model, including data ingestion, data preparation, feature extraction, model training, model serving, and monitoring. To this end, we provide a practical example with real-world EO data, including source code that implements the functionality of the platform. We also perform an experimental evaluation of two frameworks built on top of Hopsworks, namely Maggy and AutoAblation. We show that using Maggy for hyperparameter tuning takes roughly half the wall-clock time required to execute the same number of hyperparameter tuning trials using Spark, while providing linear scalability as more workers are added. Furthermore, we demonstrate how AutoAblation facilitates the definition of ablation studies and enables the asynchronous parallel execution of ablation trials.
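The pipeline stages named in this abstract can be pictured as a chain of functions. Everything below is a schematic stub to show the shape of such a pipeline; the function names and bodies are illustrative assumptions, not Hopsworks API calls.

    # Schematic stages of an end-to-end EO ML pipeline; all functions are
    # illustrative stubs, not the Hopsworks API.
    def ingest(sources):            # data ingestion: pull raw EO scenes
        return [{"scene": s} for s in sources]

    def prepare(records):           # data preparation: drop invalid records
        return [r for r in records if r["scene"]]

    def extract_features(records):  # feature extraction (into a feature store)
        return [len(r["scene"]) for r in records]

    def train(features):            # model training (tunable with e.g. Maggy)
        return {"w": sum(features) / len(features)}

    def serve(model):               # model serving; monitoring would hook in here
        return lambda f: model["w"] * f

    predict = serve(train(extract_features(prepare(ingest(["scene_a", "scene_b"])))))
    print(predict(3.0))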
6.
  • Meister, Moritz, et al. (author)
  • Maggy: Scalable Asynchronous Parallel Hyperparameter Search
  • 2020
  • In: Proceedings of the 1st Workshop on Distributed Machine Learning. New York, NY, USA: Association for Computing Machinery, pp. 28-33
  • Conference paper (peer-reviewed), abstract:
    • Running extensive experiments is essential for building Machine Learning (ML) models. Such experiments usually require iterative execution of many trials with varying run times. In recent years, Apache Spark has become the de facto standard for parallel data processing in industry, in which iterative processes are implemented within the bulk-synchronous parallel (BSP) execution model. The BSP approach is also used to parallelize ML trials in Spark. However, BSP task synchronization barriers prevent the asynchronous execution of trials, which reduces the number of trials that can be run on a given computational budget. In this paper, we introduce Maggy, an open-source framework based on Spark that executes ML trials asynchronously in parallel, with the ability to early-stop poorly performing trials. In our experiments, we compare Maggy with the BSP execution of parallel trials in Spark and show that, for random hyperparameter search on a convolutional neural network for the Fashion-MNIST dataset, Maggy reduces the time required to execute a fixed number of trials by 33% to 58%, without any loss in final model accuracy.
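The difference from BSP execution is that no trial waits at a synchronization barrier: a new trial is submitted the moment a worker frees up. Below is a generic sketch of such an asynchronous driver using Python's concurrent.futures; it is not Maggy's actual API, and the search space and trial objective are placeholders.

    # Sketch of asynchronous parallel random search: workers never idle at a
    # BSP-style barrier; a new trial starts as soon as any trial finishes.
    # Generic illustration only, not Maggy's API.
    import random
    from concurrent.futures import FIRST_COMPLETED, ProcessPoolExecutor, wait

    def run_trial(cfg):
        # placeholder objective; a real trial would train a model with `cfg`
        return cfg, 1.0 - abs(cfg["lr"] - 0.01)

    def sample():
        return {"lr": 10 ** random.uniform(-4, -1), "batch": random.choice([32, 64, 128])}

    def random_search(n_trials=20, n_workers=4):
        results = []
        with ProcessPoolExecutor(max_workers=n_workers) as pool:
            pending = {pool.submit(run_trial, sample()) for _ in range(n_workers)}
            submitted = n_workers
            while pending:
                done, pending = wait(pending, return_when=FIRST_COMPLETED)
                for fut in done:
                    results.append(fut.result())
                    # early stopping of poorly performing trials would be decided here
                    if submitted < n_trials:
                        pending.add(pool.submit(run_trial, sample()))
                        submitted += 1
        return max(results, key=lambda r: r[1])

    if __name__ == "__main__":  # guard required for ProcessPoolExecutor on spawn platforms
        print(random_search())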
7.
  • Sheikholeslami, Sina, 1993-, et al. (author)
  • AutoAblation: Automated Parallel Ablation Studies for Deep Learning
  • 2021
  • In: EuroMLSys '21: Proceedings of the 1st Workshop on Machine Learning and Systems. New York, NY, USA: Association for Computing Machinery, pp. 55-61
  • Conference paper (peer-reviewed), abstract:
    • Ablation studies provide insights into the relative contribution of different architectural and regularization components to machine learning models' performance. In this paper, we introduce AutoAblation, a new framework for the design and parallel execution of ablation experiments. AutoAblation provides a declarative approach to defining ablation experiments on model architectures and training datasets, and enables the parallel execution of ablation trials. This reduces the execution time and allows more comprehensive experiments by exploiting larger amounts of computational resources. We show that AutoAblation can provide near-linear scalability by performing an ablation study on the modules of the Inception-v3 network trained on the TenGeoPSAR dataset.  
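A declarative ablation study reduces to: list the components, generate one leave-one-component-out trial per entry, and run the trials in parallel. The sketch below illustrates that flow; the spec layout and the user-supplied train_fn contract are hypothetical, not AutoAblation's actual interface.

    # Sketch of declarative, parallel ablation trials; the spec format and
    # `train_fn(dropped)` contract are hypothetical, not AutoAblation's API.
    from concurrent.futures import ProcessPoolExecutor

    SPEC = {"ablate": ["conv_block_1", "conv_block_2", "dropout", "aux_head"]}

    def train_fn(dropped):
        # placeholder: build the model *without* `dropped` (None = full model),
        # train it, and return test accuracy
        return 0.90 if dropped is None else 0.85

    def run_ablation(spec, workers=4):
        trials = [None] + spec["ablate"]  # baseline first, then one trial per component
        with ProcessPoolExecutor(max_workers=workers) as pool:  # trials run in parallel
            accs = list(pool.map(train_fn, trials))
        return dict(zip(trials, accs))

    if __name__ == "__main__":
        print(run_ablation(SPEC))  # accuracy drop per component = its relative contribution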
8.
  • Sheikholeslami, Sina, 1993-, et al. (author)
  • The Impact of Importance-Aware Dataset Partitioning on Data-Parallel Training of Deep Neural Networks
  • 2023
  • In: Distributed Applications and Interoperable Systems - 23rd IFIP WG 6.1 International Conference, DAIS 2023, Held as Part of the 18th International Federated Conference on Distributed Computing Techniques, DisCoTec 2023, Proceedings. Springer Nature, pp. 74-89
  • Conference paper (peer-reviewed), abstract:
    • Deep neural networks used for computer vision tasks are typically trained on datasets consisting of thousands of images, called examples. Recent studies have shown that examples in a dataset are not of equal importance for model training and can be categorized based on quantifiable measures reflecting a notion of “hardness” or “importance”. In this work, we conduct an empirical study of the impact of importance-aware partitioning of the dataset examples across workers on the performance of data-parallel training of deep neural networks. Our experiments with CIFAR-10 and CIFAR-100 image datasets show that data-parallel training with importance-aware partitioning can perform better than vanilla data-parallel training, which is oblivious to the importance of examples. More specifically, the proper choice of the importance measure, partitioning heuristic, and the number of intervals for dataset repartitioning can improve the best accuracy of the model trained for a fixed number of epochs. We conclude that the parameters related to importance-aware data-parallel training, including the importance measure, number of warmup training epochs, and others defined in the paper, may be considered as hyperparameters of data-parallel model training.
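As a concrete reading of "importance-aware partitioning", the sketch below assigns example indices to data-parallel workers from a vector of precomputed importance scores (for instance, per-example loss). The two heuristics shown, round-robin stripes and contiguous blocks, are illustrative names of my own, not the paper's terminology.

    # Sketch: split example indices across data-parallel workers by importance
    # score; heuristic names are illustrative, not the paper's terminology.
    import numpy as np

    def partition_by_importance(scores, n_workers, heuristic="stripes"):
        order = np.argsort(scores)[::-1]  # most "important" examples first
        if heuristic == "stripes":
            # round-robin: every worker sees a mix of hard and easy examples
            return [order[w::n_workers] for w in range(n_workers)]
        if heuristic == "blocks":
            # contiguous: each worker specialises in one importance band
            return list(np.array_split(order, n_workers))
        raise ValueError(f"unknown heuristic: {heuristic}")

    scores = np.random.rand(10)  # stand-in for per-example loss/importance
    print(partition_by_importance(scores, n_workers=2))
    # Repartitioning every k epochs (an interval hyperparameter, as in the
    # abstract) would recompute `scores` with the current model and call this again.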