SwePub
Search the SwePub database


Result list for search "WFRF:(Niazi Salman)"

Search: WFRF:(Niazi Salman)

  • Results 1-18 of 18
1.
  • Bessani, A., et al. (author)
  • BiobankCloud : A platform for the secure storage, sharing, and processing of large biomedical data sets
  • 2016
  • Part of: 1st International Workshop on Data Management and Analytics for Medicine and Healthcare, DMAH 2015 and Workshop on Big-Graphs Online Querying, Big-O(Q) 2015 held in conjunction with 41st International Conference on Very Large Data Bases, VLDB 2015. - Cham : Springer. - 9783319415758 - 9783319415765 ; , pp. 89-105
  • Conference paper (peer-reviewed), abstract:
    • Biobanks store and catalog human biological material that is increasingly being digitized using next-generation sequencing (NGS). There is, however, a computational bottleneck, as existing software systems are not scalable and secure enough to store and process the incoming wave of genomic data from NGS machines. In the BiobankCloud project, we are building a Hadoop-based platform for the secure storage, sharing, and parallel processing of genomic data. We extended Hadoop to include support for multi-tenant studies, reduced storage requirements with erasure coding, and added support for extensible and consistent metadata. On top of Hadoop, we built a scalable scientific workflow engine featuring a proper workflow definition language focusing on simple integration and chaining of existing tools, adaptive scheduling on Apache Yarn, and support for iterative dataflows. Our platform also supports the secure sharing of data across different, distributed Hadoop clusters. The software is easily installed and comes with a user-friendly web interface for running, managing, and accessing data sets behind secure two-factor authentication. Initial tests have shown that the engine scales well to dozens of nodes. The entire system is open-source and includes pre-defined workflows for popular tasks in biomedical data analysis, such as variant identification, differential transcriptome analysis using RNA-Seq, and analysis of miRNA-Seq and ChIP-Seq data.
  •  
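The abstract above cites erasure coding as the way BiobankCloud reduced storage requirements. A minimal sketch of the arithmetic behind that claim, comparing n-way replication against a Reed-Solomon-style (data, parity) layout; the RS(6,3) parameters are illustrative assumptions, not taken from the paper:

```python
def replication_overhead(replicas: int) -> float:
    # Raw bytes stored per logical byte under n-way replication.
    return float(replicas)

def erasure_overhead(data_blocks: int, parity_blocks: int) -> float:
    # Raw bytes stored per logical byte under a (data + parity) erasure
    # code, e.g. RS(6,3) stores 9 blocks for every 6 blocks of data.
    return (data_blocks + parity_blocks) / data_blocks

# 3-way replication vs RS(6,3): both tolerate multiple lost blocks,
# but erasure coding cuts raw storage from 3.0x to 1.5x of logical size.
assert replication_overhead(3) == 3.0
assert erasure_overhead(6, 3) == 1.5
```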
2.
  •  
3.
  • Chikafa, Gibson, 1993-, et al. (author)
  • Cloud-native RStudio on Kubernetes for Hopsworks
  • 2023
  • Other publication (other academic/artistic), abstract:
    • In order to fully benefit from cloud computing, services are designed following the “multi-tenant” architectural model, which is aimed at maximizing resource sharing among users. However, multi-tenancy introduces challenges of security, performance isolation, scaling, and customization. RStudio server is an open-source Integrated Development Environment (IDE) for the R programming language, accessible over a web browser. We present the design and implementation of a multi-user distributed system on Hopsworks, a data-intensive AI platform, following the multi-tenant model that provides RStudio as Software as a Service (SaaS). We use the most popular cloud-native technologies, Docker and Kubernetes, to solve the problems of performance isolation, security, and scaling that are present in a multi-tenant environment. We further enable secure data sharing in RStudio server instances to provide data privacy and allow collaboration among RStudio users. We integrate our system with Apache Spark, which can scale and handle Big Data processing workloads. We also provide a UI where users can supply custom configurations and have full control of their own RStudio server instances. Our system was tested on a Google Cloud Platform cluster with four worker nodes, each with 30 GB of RAM. The tests showed that 44 RStudio servers, each with 2 GB of RAM, can run concurrently. The system can scale out to support potentially hundreds of concurrently running RStudio servers by adding more resources (CPUs and RAM) to the cluster.
  •  
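The capacity figures in the abstract above (4 workers × 30 GB running 44 servers of 2 GB each) can be reproduced with a simple sizing calculation. The 8 GB-per-node reserve below is an assumption chosen to match the reported 44-server figure; the paper does not state the actual overhead:

```python
def max_servers(nodes: int, ram_per_node_gb: int, ram_per_server_gb: int,
                reserve_gb_per_node: int = 0) -> int:
    # How many fixed-size server pods fit in the cluster's usable RAM.
    usable = nodes * (ram_per_node_gb - reserve_gb_per_node)
    return usable // ram_per_server_gb

# With no overhead the ceiling would be 60 servers; reserving an assumed
# 8 GB per node for system daemons yields the reported 44.
assert max_servers(4, 30, 2) == 60
assert max_servers(4, 30, 2, reserve_gb_per_node=8) == 44
```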
4.
  • de la Rua Martinez, Javier, et al. (author)
  • The Hopsworks Feature Store for Machine Learning
  • 2024
  • Part of: SIGMOD-Companion 2024 - Companion of the 2024 International Conference on Management of Data. - : Association for Computing Machinery (ACM). ; , pp. 135-147
  • Conference paper (peer-reviewed), abstract:
    • Data management is the most challenging aspect of building Machine Learning (ML) systems. ML systems can read large volumes of historical data when training models, but inference workloads are more varied, depending on whether it is a batch or online ML system. The feature store for ML has recently emerged as a single data platform for managing ML data throughout the ML lifecycle, from feature engineering to model training to inference. In this paper, we present the Hopsworks feature store for machine learning as a highly available platform for managing feature data with API support for columnar, row-oriented, and similarity search query workloads. We introduce and address challenges solved by feature stores related to feature reuse, how to organize data transformations, and how to ensure correct and consistent data between feature engineering, model training, and model inference. We present the engineering challenges in building high-performance query services for a feature store and show how Hopsworks outperforms existing cloud feature stores for training and online inference query workloads.
  •  
5.
  • Gholami, Ali, et al. (author)
  • Privacy-Preservation for Publishing Sample Availability Data with Personal Identifiers
  • 2015
  • Part of: Journal of medical and bioengineering. - : EJournal Publishing. - 2301-3796. ; 4:2, pp. 117-125
  • Journal article (peer-reviewed), abstract:
    • Medical organizations collect, store and process vast amounts of sensitive information about patients. Easy access to this information by researchers is crucial to improving medical research, but in many institutions, cumbersome security measures and walled gardens have created a situation where even information about what medical data exists is unavailable. One of the main security challenges in this area is enabling researchers to cross-link different medical studies while preserving the privacy of the patients involved. In this paper, we introduce a privacy-preserving system for publishing sample availability data that allows researchers to make queries that crosscut different studies. That is, researchers can ask questions such as how many patients have had both diabetes and prostate cancer, where the diabetes and prostate cancer information originates from different clinical registries. We realize our solution with a two-level anonymization mechanism, where our toolkit for publishing availability data first pseudonymizes personal identifiers and then anonymizes sensitive attributes. Our toolkit also includes a web-based server that stores the encrypted pseudonymized sample data and allows researchers to execute cross-linked queries across different study data. We believe that our toolkit contributes a first step toward supporting the privacy-preserving publication of data containing personal identifiers.
  •  
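The two-level mechanism in the abstract above (pseudonymize identifiers first, then anonymize sensitive attributes) can be sketched as follows. The keyed-HMAC pseudonym and the age-bucketing generalization are stand-ins chosen for illustration; the paper's actual scheme and key management are not specified here:

```python
import hmac, hashlib

SECRET = b"site-local pseudonymization key"  # hypothetical key

def pseudonymize(personal_id: str) -> str:
    # Level 1: replace the personal identifier with a keyed pseudonym, so
    # the same patient links across studies without exposing the identifier.
    return hmac.new(SECRET, personal_id.encode(), hashlib.sha256).hexdigest()[:16]

def generalize_age(age: int) -> str:
    # Level 2: coarsen a sensitive attribute into a bucket.
    lo = (age // 10) * 10
    return f"{lo}-{lo + 9}"

# The same identifier always yields the same pseudonym, which is what
# makes cross-study ("diabetes AND prostate cancer") queries possible.
assert pseudonymize("patient-001") == pseudonymize("patient-001")
assert generalize_age(57) == "50-59"
```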
6.
  • Ismail, Mahmoud, et al. (author)
  • Distributed Hierarchical File Systems strike back in the Cloud
  • 2020
  • Part of: 2020 IEEE 40th international conference on distributed computing systems (ICDCS). - : Institute of Electrical and Electronics Engineers (IEEE). ; , pp. 820-830
  • Conference paper (peer-reviewed), abstract:
    • Cloud service providers have aligned on availability zones as an important unit of failure and replication for storage systems. An availability zone (AZ) has independent power, networking, and cooling systems and consists of one or more data centers. Multiple AZs in close geographic proximity form a region that can support replicated low latency storage services that can survive the failure of one or more AZs. Recent reductions in inter-AZ latency have made synchronous replication protocols increasingly viable, instead of traditional quorum-based replication protocols. We introduce HopsFS-CL, a distributed hierarchical file system with support for high-availability (HA) across AZs, backed by AZ-aware synchronously replicated metadata and AZ-aware block replication. HopsFS-CL is a redesign of HopsFS, a version of HDFS with distributed metadata, and its design involved making replication protocols and block placement protocols AZ-aware at all layers of its stack: the metadata serving, the metadata storage, and block storage layers. In experiments on a real-world workload from Spotify, we show that HopsFS-CL, deployed in HA mode over 3 AZs, reaches 1.66 million ops/s, and has similar performance to HopsFS when deployed in a single AZ, while preserving the same semantics.
  •  
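AZ-aware block placement, as described in the HopsFS-CL abstract above, amounts to never putting two replicas of a block in the same availability zone. A minimal sketch of such a placement policy; the round-robin node choice is an illustrative assumption, not HopsFS-CL's actual algorithm:

```python
def place_replicas(block_id: int, zones: dict[str, list[str]],
                   replication: int = 3) -> list[str]:
    # Pick one node from each availability zone, rotating the starting zone
    # by block id so load spreads evenly. Assumes len(zones) >= replication.
    az_names = sorted(zones)
    start = block_id % len(az_names)
    chosen = []
    for i in range(replication):
        az = az_names[(start + i) % len(az_names)]
        nodes = zones[az]
        chosen.append(nodes[block_id % len(nodes)])
    return chosen

zones = {"az-a": ["a1", "a2"], "az-b": ["b1", "b2"], "az-c": ["c1"]}
placement = place_replicas(7, zones)
# Each replica lands in a different AZ, so losing one AZ loses one copy.
assert len(placement) == 3
assert {node[0] for node in placement} == {"a", "b", "c"}
```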
7.
  • Ismail, Mahmoud, et al. (author)
  • HopsFS-S3 : Extending Object Stores with POSIX-like Semantics and more (industry track)
  • 2020
  • Part of: Proceedings of the 2020 21st international middleware conference industrial track (Middleware industry '20). - New York, NY, USA : Association for Computing Machinery (ACM). ; , pp. 23-30
  • Conference paper (peer-reviewed), abstract:
    • Object stores have become the de-facto platform for storage in the cloud due to their scalability, high availability, and low cost. However, they provide weaker metadata semantics and lower performance compared to distributed hierarchical file systems. In this paper, we introduce HopsFS-S3, a hybrid distributed hierarchical file system backed by an object store while preserving the file system's strong consistency semantics. We base our implementation on HopsFS, a next-generation distribution of HDFS with distributed metadata. We redesigned HopsFS' block storage layer to transparently use an object store to store the file's blocks without sacrificing the file system's semantics. We also introduced a new block caching service to leverage faster NVMe storage for hot blocks. In our experiments, we show that HopsFS-S3 outperforms EMRFS for IO-bound workloads, with up to 20% higher performance and delivers up to 3.4X the aggregated read throughput of EMRFS. Moreover, we demonstrate that metadata operations on HopsFS-S3 (such as directory rename) are up to two orders of magnitude faster than EMRFS. Finally, HopsFS-S3 opens up the currently closed metadata in object stores, enabling correctly-ordered change notifications with HopsFS' change data capture (CDC) API and customized extensions to metadata.
  •  
8.
  • Ismail, Mahmoud, et al. (author)
  • Scalable Block Reporting for HopsFS
  • 2019
  • Part of: 2019 IEEE International Congress on Big Data (BigData Congress). - 9781728127712 ; , pp. 157-164
  • Conference paper (peer-reviewed), abstract:
    • Distributed hierarchical file systems typically decouple the storage of the file system’s metadata from the data (file system blocks) to enable the scalability of the file system. This decoupling, however, requires the introduction of a periodic synchronization protocol to ensure the consistency of the file system’s metadata and its blocks. Apache HDFS and HopsFS implement a protocol, called block reporting, where each data server periodically sends ground-truth information about all its file system blocks to the metadata servers, allowing the metadata to be synchronized with the actual state of the data blocks in the file system. The network and processing overhead of the existing block reporting protocol, however, increases with cluster size, ultimately limiting cluster scalability. In this paper, we introduce a new block reporting protocol for HopsFS that reduces the protocol’s bandwidth and processing overhead by up to three orders of magnitude, compared to HDFS/HopsFS’ existing protocol. Our new protocol removes a major bottleneck that prevented HopsFS clusters from scaling to tens of thousands of servers.
  •  
9.
  • Ismail, Mahmoud, et al. (author)
  • Scaling HDFS to more than 1 million operations per second with HopsFS
  • 2017
  • Part of: Proceedings - 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2017. - : Institute of Electrical and Electronics Engineers Inc.. - 9781509066100 ; , pp. 683-688
  • Conference paper (peer-reviewed), abstract:
    • HopsFS is an open-source, next-generation distribution of the Apache Hadoop Distributed File System (HDFS) that replaces the main scalability bottleneck in HDFS, the single-node in-memory metadata service, with a no-shared-state distributed system built on a NewSQL database. By removing the metadata bottleneck in Apache HDFS, HopsFS enables significantly larger cluster sizes, more than an order of magnitude higher throughput, and significantly lower client latencies for large clusters. In this paper, we detail the techniques and optimizations that enable HopsFS to surpass 1 million file system operations per second, at least 16 times higher throughput than HDFS. In particular, we discuss how we exploit recent high-performance features from NewSQL databases, such as application-defined partitioning, partition-pruned index scans, and distribution-aware transactions. Together with more traditional techniques, such as batching and write-ahead caches, we show how many incremental optimizations have enabled a revolution in distributed hierarchical file system performance.
  •  
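The key idea behind the application-defined partitioning and partition-pruned index scans mentioned in the abstract above is that all children of a directory map to the same database partition, so a directory listing touches one partition instead of all of them. A toy model with an in-memory dict standing in for the database:

```python
NUM_PARTITIONS = 8

def partition_of(parent_inode_id: int) -> int:
    # Application-defined partitioning: all children of a directory hash
    # to the same partition, so `ls` becomes a single-partition index scan.
    return parent_inode_id % NUM_PARTITIONS

partitions = {p: [] for p in range(NUM_PARTITIONS)}

def add_inode(parent_id: int, name: str) -> None:
    partitions[partition_of(parent_id)].append((parent_id, name))

def list_dir(parent_id: int) -> list[str]:
    # Partition-pruned scan: only the owning partition is consulted.
    part = partitions[partition_of(parent_id)]
    return sorted(name for pid, name in part if pid == parent_id)

add_inode(42, "a.txt"); add_inode(42, "b.txt"); add_inode(7, "c.txt")
assert list_dir(42) == ["a.txt", "b.txt"]
```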
10.
  •  
11.
  • Naqvi, Salman Raza, et al. (author)
  • Potential of biomass for bioenergy in Pakistan based on present case and future perspectives
  • 2018
  • Part of: Renewable & sustainable energy reviews. - : Elsevier. - 1364-0321 .- 1879-0690. ; 81:1, pp. 1247-1258
  • Research review (peer-reviewed), abstract:
    • Future energy security and environmental issues are major driving forces for increased biomass utilization globally, especially in developing countries like Pakistan. For efficient utilization of indigenous biomass resources in the future energy mix, it is important to understand the current energy system across sectors. Several technologies and initiatives are under development to achieve the transition from non-renewable to renewable resources, reducing fossil-fuel dependency and greenhouse gas emissions. A number of recent proposals aim to develop sustainable biofuel production methods that promise to accelerate the shift away from unsustainable approaches toward sustainable social, economic, and environmental production practices. This article presents an extensive literature review of biomass-based renewable energy potential in Pakistan, based on the current energy scenario and future perspectives. It also highlights the availability of indigenous and local biomass resources and the potential biomass conversion technologies for converting such resources to bioenergy. The drivers for utilizing indigenous biomass resources in the future energy mix are discussed, along with the challenges of raising stakeholder awareness and funding the R&D needed to fill knowledge gaps under economic constraints. The article concludes with suggestions on future directions and policies for effective implementation of biomass-based renewable energy production.
  •  
12.
  • Naqvi, Salman Raza, et al. (author)
  • Pyrolysis of high-ash sewage sludge : Thermo-kinetic study using TGA and artificial neural networks
  • 2018
  • Part of: Fuel. - Oxon, UK : Elsevier Ltd. - 0016-2361 .- 1873-7153. ; 233, pp. 529-538
  • Journal article (peer-reviewed), abstract:
    • Pyrolysis of high-ash sewage sludge (HASS) is considered an effective and promising route for energy production from the solid waste of wastewater treatment facilities. The main purpose of this work is to build knowledge of the pyrolysis mechanisms and kinetics of high-ash (44.6%) sewage sludge through thermogravimetric analysis using model-free methods, with results validated by an artificial neural network (ANN). TG-DTG curves at 5, 10, and 20 °C/min showed that the pyrolysis region divides into three zones. The activation energy (E) ranges obtained were: Friedman 10.6–306.2 kJ/mol, FWO 45.6–231.7 kJ/mol, KAS 41.4–232.1 kJ/mol, and Popescu 44.1–241.1 kJ/mol. The ΔH and ΔG values predicted by the FWO, KAS, and Popescu methods are in good agreement, ranging over 41–236 kJ/mol and 53–304 kJ/mol, respectively. The negative ΔS values indicate the non-spontaneity of the process. An ANN model with a 2-5-1 architecture was employed to predict the thermal decomposition of high-ash sewage sludge, showing good agreement between experimental and predicted values (R² ≥ 0.999). Overall, the study demonstrates that the ANN model can serve as an effective fit to the thermogravimetric experimental data.
  •  
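The 2-5-1 ANN named in the abstract above is a small multilayer perceptron: two inputs, one hidden layer of five units, one output. A forward-pass sketch of that architecture only; the weights here are random and untrained, and the tanh activation and input choice (temperature, heating rate) are assumptions, not details from the paper:

```python
import math, random

random.seed(0)

def mlp_2_5_1(x, weights):
    # Forward pass of a 2-5-1 network: 2 inputs (e.g. temperature, heating
    # rate), 5 tanh hidden units, 1 linear output (predicted mass loss).
    w1, b1, w2, b2 = weights
    hidden = [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2

# Randomly initialised weights, purely to show the parameter shapes:
# 5x2 input weights, 5 hidden biases, 5 output weights, 1 output bias.
w1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(5)]
b1 = [0.0] * 5
w2 = [random.uniform(-1, 1) for _ in range(5)]
b2 = 0.0
y = mlp_2_5_1([500.0, 10.0], (w1, b1, w2, b2))
assert isinstance(y, float)
```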
13.
  • Niazi, Salman, 1982-, et al. (author)
  • HopsFS : Scaling Hierarchical File System Metadata Using NewSQL Databases
  • 2017
  • Part of: 15th USENIX Conference on File and Storage Technologies, FAST 2017, Santa Clara, CA, USA, February 27 - March 2, 2017. - : USENIX Association. ; , pp. 89-103
  • Conference paper (peer-reviewed), abstract:
    • Recent improvements in both the performance and scalability of shared-nothing, transactional, in-memory NewSQL databases have reopened the research question of whether distributed metadata for hierarchical file systems can be managed using commodity databases. In this paper, we introduce HopsFS, a next-generation distribution of the Hadoop Distributed File System (HDFS) that replaces HDFS’ single-node in-memory metadata service with a distributed metadata service built on a NewSQL database. By removing the metadata bottleneck, HopsFS enables an order of magnitude larger and higher throughput clusters compared to HDFS. Metadata capacity has been increased to at least 37 times HDFS’ capacity, and in experiments based on a workload trace from Spotify, we show that HopsFS supports 16 to 37 times the throughput of Apache HDFS. HopsFS also has lower latency for many concurrent clients, and no downtime during failover. Finally, as metadata is now stored in a commodity database, it can be safely extended and easily exported to external systems for online analysis and free-text search.
  •  
14.
  • Niazi, Salman, 1982-, et al. (author)
  • HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases
  • 2019
  • Part of: Encyclopedia of Big Data Technologies. - Cham : Springer. - 9783319775241 - 9783319775258 ; , pp. 16-32
  • Book chapter (peer-reviewed), abstract:
    • Modern NewSQL database systems can be used to store fully normalized metadata for distributed hierarchical file systems, and provide high throughput and low operational latencies for the file system operations.
  •  
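"Fully normalized metadata" in the abstract above means each path component is a row keyed by its parent, rather than a path string. A sketch of that schema idea, with sqlite3 standing in for the NewSQL store (the table and column names are illustrative, not HopsFS's actual schema):

```python
import sqlite3

# Each path component is one row, uniquely keyed by (parent_id, name),
# so resolving a path is one indexed lookup per component.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE inodes (
    id INTEGER PRIMARY KEY,
    parent_id INTEGER NOT NULL,
    name TEXT NOT NULL,
    UNIQUE (parent_id, name))""")

def mkdirs(path: str) -> int:
    parent = 0  # id 0 denotes the root directory
    for name in path.strip("/").split("/"):
        row = db.execute("SELECT id FROM inodes WHERE parent_id=? AND name=?",
                         (parent, name)).fetchone()
        if row is None:
            cur = db.execute("INSERT INTO inodes (parent_id, name) VALUES (?, ?)",
                             (parent, name))
            parent = cur.lastrowid
        else:
            parent = row[0]
    return parent

# Resolving the same path twice finds the same leaf inode.
leaf = mkdirs("/data/logs")
assert mkdirs("/data/logs") == leaf
```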
15.
  • Niazi, Salman, 1982-, et al. (author)
  • Leader Election Using NewSQL Database Systems
  • 2015
  • Part of: Distributed Applications and Interoperable Systems. - France : Springer. - 9783319191294 - 9783319191287 ; , pp. 158-172
  • Conference paper (peer-reviewed), abstract:
    • Leader election protocols are a fundamental building block for replicated distributed services. They ease the design of leader-based coordination protocols that tolerate failures. In partially synchronous systems, designing a leader election algorithm that does not permit multiple leaders while the system is unstable is a complex task. As a result, many production systems use third-party distributed coordination services, such as ZooKeeper and Chubby, to provide a reliable leader election service. However, adding a third-party service such as ZooKeeper to a distributed system incurs additional operational costs and complexity. ZooKeeper instances must be kept running on at least three machines to ensure its high availability. In this paper, we present a novel leader election protocol using NewSQL databases for partially synchronous systems that ensures at most one leader at any given time. The leader election protocol uses the database as distributed shared memory. Our work enables distributed systems that already use NewSQL databases to save the operational overhead of managing an additional third-party service for leader election. Our main contribution is the design, implementation and validation of a practical leader election algorithm, based on NewSQL databases, that has performance comparable to a leader election implementation using a state-of-the-art distributed coordination service, ZooKeeper.
  •  
16.
  • Niazi, Salman, 1982- (author)
  • Scaling Distributed Hierarchical File Systems Using NewSQL Databases
  • 2018
  • Doctoral thesis (other academic/artistic), abstract:
    • For many years, researchers have investigated the use of database technology to manage file system metadata, with the goal of providing extensible typed metadata and support for fast, rich metadata search. However, earlier attempts failed mainly due to the reduced performance introduced by adding database operations to the file system’s critical path. Recent improvements in the performance of distributed in-memory online transaction processing databases (NewSQL databases) led us to re-investigate the possibility of using a database to manage file system metadata, but this time for a distributed, hierarchical file system, the Hadoop Distributed File System (HDFS). The single-host metadata service of HDFS is a well-known bottleneck for both the size of HDFS clusters and their throughput. In this thesis, we detail the algorithms, techniques, and optimizations used to develop HopsFS, an open-source, next-generation distribution of HDFS that replaces the main scalability bottleneck in HDFS, the single-node in-memory metadata service, with a no-shared-state distributed system built on a NewSQL database. In particular, we discuss how we exploit recent high-performance features from NewSQL databases, such as application-defined partitioning, partition-pruned index scans, and distribution-aware transactions, as well as more traditional techniques such as batching and write-ahead caches, to enable a revolution in distributed hierarchical file system performance. HDFS’ design is optimized for the storage of large files, that is, files ranging from megabytes to terabytes in size. However, in many production deployments of HDFS, it has been observed that almost 20% of the files in the system are less than 4 KB in size and as much as 42% of all file system operations are performed on files less than 16 KB in size. HopsFS introduces a tiered storage solution to store files of different sizes more efficiently. The tiers range from the highest tier, where an in-memory NewSQL database stores very small files (<1 KB), to the next tier, where small files (<64 KB) are stored in solid-state drives (SSDs), also using a NewSQL database, to the largest tier, the existing Hadoop block storage layer for very large files. Our approach is based on extending HopsFS with an inode stuffing technique, where we embed the contents of small files in the metadata and use database transactions and database replication guarantees to ensure the availability, integrity, and consistency of the small files. HopsFS enables significantly larger cluster sizes, more than an order of magnitude higher throughput, and significantly lower client latencies for large clusters. Lastly, coordination is an integral part of distributed file system operation protocols. We present a novel leader election protocol for partially synchronous systems that uses NewSQL databases as shared memory. Our work enables HopsFS, which already uses a NewSQL database, to avoid the operational overhead of managing an additional third-party service for leader election, while delivering performance comparable to a leader election implementation using a state-of-the-art distributed coordination service, ZooKeeper.
  •  
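The tiered storage described in the thesis abstract above routes a file by size: very small files (<1 KB) inline with the metadata in memory, small files (<64 KB) in database SSD tables, and the rest on the regular block storage layer. A minimal routing sketch using those thresholds (the tier labels are descriptive, not API names):

```python
def storage_tier(file_size_bytes: int) -> str:
    # Size thresholds from the tiered-storage description: very small files
    # are stuffed into the metadata itself, small files go to database SSD
    # tables, and everything else goes to the Hadoop block storage layer.
    if file_size_bytes < 1 * 1024:
        return "in-memory (stuffed into metadata)"
    if file_size_bytes < 64 * 1024:
        return "database SSD table"
    return "HDFS block storage"

assert storage_tier(512) == "in-memory (stuffed into metadata)"
assert storage_tier(16 * 1024) == "database SSD table"
assert storage_tier(10 * 1024 * 1024) == "HDFS block storage"
```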
17.
  • Niazi, Salman, 1982-, et al. (author)
  • Size Matters : Improving the Performance of Small Files in Hadoop
  • 2018
  • Conference paper (peer-reviewed), abstract:
    • The Hadoop Distributed File System (HDFS) is designed to handle massive amounts of data, preferably stored in very large files. The poor performance of HDFS in managing small files has long been a bane of the Hadoop community. In many production deployments of HDFS, almost 25% of the files are less than 16 KB in size and as much as 42% of all the file system operations are performed on these small files. We have designed an adaptive tiered storage using in-memory and on-disk tables stored in a high-performance distributed database to efficiently store and improve the performance of the small files in HDFS. Our solution is completely transparent, and it does not require any changes in the HDFS clients or the applications using the Hadoop platform. In experiments, we observed up to 61 times higher throughput in writing files, and for real-world workloads from Spotify our solution reduces the latency of reading and writing small files by a factor of 3.15 and 7.39, respectively.
  •  
18.
  • Thomas, HS, et al. (author)
  • 2019
  •  

The National Library of Sweden (Kungliga biblioteket) processes your personal data in accordance with the EU General Data Protection Regulation (2018), GDPR. Read more about how this works here.
This is how KB handles your data when you use this service.

 