SwePub
Search the SwePub database

  Advanced search

Results list for search "WFRF:(Niazi Salman 1982)"

Search: WFRF:(Niazi Salman 1982)

  • Results 1-8 of 8
1.
  • Chikafa, Gibson, 1993-, et al. (author)
  • Cloud-native RStudio on Kubernetes for Hopsworks
  • 2023
  • Other publication (other academic/artistic). Abstract:
    • In order to fully benefit from cloud computing, services are designed following the “multi-tenant” architectural model, which aims to maximize resource sharing among users. However, multi-tenancy introduces challenges in security, performance isolation, scaling, and customization. RStudio server is an open-source Integrated Development Environment (IDE) for the R programming language, accessible through a web browser. We present the design and implementation of a multi-user distributed system on Hopsworks, a data-intensive AI platform, following the multi-tenant model that provides RStudio as Software as a Service (SaaS). We use the most popular cloud-native technologies, Docker and Kubernetes, to solve the problems of performance isolation, security, and scaling that are present in a multi-tenant environment. We further enable secure data sharing in RStudio server instances to provide data privacy and allow collaboration among RStudio users. We integrate our system with Apache Spark, which can scale and handle Big Data processing workloads. We also provide a UI where users can supply custom configurations and have full control of their own RStudio server instances. Our system was tested on a Google Cloud Platform cluster with four worker nodes, each with 30 GB of RAM. The tests on this cluster showed that 44 RStudio servers, each with 2 GB of RAM, can run concurrently. Our system can scale out to potentially support hundreds of concurrently running RStudio servers by adding more resources (CPUs and RAM) to the cluster. (A back-of-the-envelope capacity check follows this entry.)
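The reported capacity figure can be sanity-checked with simple arithmetic. Below is a minimal Python sketch; the per-node overhead value is an assumption chosen for illustration, not a number from the publication, though it happens to be consistent with the reported 44 concurrent servers.

    # Capacity check for the cluster described in the abstract above.
    NODES = 4
    RAM_PER_NODE_GB = 30
    POD_RAM_GB = 2     # RAM per RStudio server instance
    OVERHEAD_GB = 8    # assumed per-node headroom for OS/Kubernetes daemons

    pods_per_node = (RAM_PER_NODE_GB - OVERHEAD_GB) // POD_RAM_GB
    print(pods_per_node * NODES)  # -> 44, matching the reported figure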
2.
  • Ismail, Mahmoud, et al. (author)
  • Scalable Block Reporting for HopsFS
  • 2019
  • In: 2019 IEEE International Congress on Big Data (BigData Congress). ISBN 9781728127712, pp. 157-164
  • Conference paper (peer-reviewed). Abstract:
    • Distributed hierarchical file systems typically decouple the storage of the file system’s metadata from the data (file system blocks) to enable the scalability of the file system. This decoupling, however, requires a periodic synchronization protocol to keep the file system’s metadata consistent with its blocks. Apache HDFS and HopsFS implement a protocol, called block reporting, where each data server periodically sends ground-truth information about all its file system blocks to the metadata servers, allowing the metadata to be synchronized with the actual state of the data blocks in the file system. The network and processing overhead of the existing block reporting protocol, however, grows with cluster size, ultimately limiting cluster scalability. In this paper, we introduce a new block reporting protocol for HopsFS that reduces the protocol’s bandwidth and processing overhead by up to three orders of magnitude compared to the existing HDFS/HopsFS protocol. Our new protocol removes a major bottleneck that prevented HopsFS clusters from scaling to tens of thousands of servers. (An illustrative sketch of bandwidth-reducing block reports follows this entry.)
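The abstract does not spell out the new protocol's mechanics, so the following Python sketch shows one generic way to cut block-report overhead: hash block ids into buckets and exchange per-bucket digests, resending a full listing only for buckets whose digests disagree. All names here are hypothetical; this is not claimed to be the paper's actual design.

    import hashlib

    NUM_BUCKETS = 1024

    def bucket_digests(block_ids):
        """One digest per bucket of block ids, instead of a full report."""
        buckets = [[] for _ in range(NUM_BUCKETS)]
        for b in block_ids:
            buckets[b % NUM_BUCKETS].append(b)
        return [hashlib.sha1(",".join(map(str, sorted(bk))).encode()).hexdigest()
                for bk in buckets]

    def buckets_to_resend(datanode_digests, namenode_digests):
        """Only mismatching buckets require a full (expensive) report."""
        return [i for i, (d, n) in enumerate(zip(datanode_digests,
                                                 namenode_digests)) if d != n]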
3.
  • Ismail, Mahmoud, et al. (author)
  • Scaling HDFS to more than 1 million operations per second with HopsFS
  • 2017
  • In: Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID 2017). Institute of Electrical and Electronics Engineers Inc. ISBN 9781509066100, pp. 683-688
  • Conference paper (peer-reviewed). Abstract:
    • HopsFS is an open-source, next-generation distribution of the Apache Hadoop Distributed File System (HDFS) that replaces the main scalability bottleneck in HDFS, the single-node in-memory metadata service, with a no-shared-state distributed system built on a NewSQL database. By removing the metadata bottleneck in Apache HDFS, HopsFS enables significantly larger cluster sizes, more than an order of magnitude higher throughput, and significantly lower client latencies for large clusters. In this paper, we detail the techniques and optimizations that enable HopsFS to surpass 1 million file system operations per second, at least 16 times higher throughput than HDFS. In particular, we discuss how we exploit recent high-performance features of NewSQL databases, such as application-defined partitioning, partition-pruned index scans, and distribution-aware transactions. Together with more traditional techniques, such as batching and write-ahead caches, we show how many incremental optimizations have enabled a revolution in distributed hierarchical file system performance. (A sketch of the partitioning idea follows this entry.)
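The two database features named in the abstract can be illustrated with MySQL/NDB-style SQL embedded in Python. The table and column names below are hypothetical simplifications, not HopsFS's actual schema.

    # Application-defined partitioning: all children of a directory share a
    # partition, so a directory listing touches a single database node.
    DDL = """
    CREATE TABLE inodes (
      parent_id BIGINT NOT NULL,
      name      VARCHAR(255) NOT NULL,
      inode_id  BIGINT NOT NULL,
      PRIMARY KEY (parent_id, name)
    ) PARTITION BY KEY (parent_id);
    """

    # Partition-pruned index scan: the filter is on the partitioning column,
    # so the database scans one partition instead of broadcasting to all.
    LIST_DIR = "SELECT name, inode_id FROM inodes WHERE parent_id = %s"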
4.
  • Niazi, Salman, 1982-, et al. (author)
  • HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases
  • 2017
  • In: 15th USENIX Conference on File and Storage Technologies (FAST 2017), Santa Clara, CA, USA, February 27 - March 2, 2017. USENIX Association, pp. 89-103
  • Conference paper (peer-reviewed). Abstract:
    • Recent improvements in both the performance and scalability of shared-nothing, transactional, in-memory NewSQL databases have reopened the research question of whether distributed metadata for hierarchical file systems can be managed using commodity databases. In this paper, we introduce HopsFS, a next-generation distribution of the Hadoop Distributed File System (HDFS) that replaces HDFS’ single-node in-memory metadata service with a distributed metadata service built on a NewSQL database. By removing the metadata bottleneck, HopsFS enables clusters that are an order of magnitude larger and have higher throughput than HDFS. Metadata capacity has been increased to at least 37 times HDFS’ capacity, and in experiments based on a workload trace from Spotify, we show that HopsFS supports 16 to 37 times the throughput of Apache HDFS. HopsFS also has lower latency for many concurrent clients, and no downtime during failover. Finally, as metadata is now stored in a commodity database, it can be safely extended and easily exported to external systems for online analysis and free-text search. (A sketch of path resolution over normalized metadata follows this entry.)
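Given normalized (parent_id, name) metadata rows like those sketched under the previous entry, pathname resolution becomes one primary-key lookup per path component. A minimal sketch, in which `lookup` is a hypothetical stand-in for a primary-key read against the database:

    ROOT_ID = 1

    def resolve(path, lookup):
        """Resolve '/a/b' to an inode id via one PK lookup per component."""
        inode_id = ROOT_ID
        for component in filter(None, path.split("/")):
            inode_id = lookup(parent_id=inode_id, name=component)
            if inode_id is None:
                raise FileNotFoundError(path)
        return inode_id

    # Usage with an in-memory dict standing in for the inodes table:
    table = {(1, "user"): 2, (2, "data"): 3}
    print(resolve("/user/data",
                  lambda parent_id, name: table.get((parent_id, name))))  # -> 3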
5.
  • Niazi, Salman, 1982-, et al. (author)
  • HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases
  • 2019
  • In: Encyclopedia of Big Data Technologies. Cham: Springer. ISBN 9783319775241, 9783319775258, pp. 16-32
  • Book chapter (peer-reviewed). Abstract:
    • Modern NewSQL database systems can be used to store fully normalized metadata for distributed hierarchical file systems and to provide high throughput and low operational latencies for file system operations.
6.
  • Niazi, Salman, 1982-, et al. (author)
  • Leader Election Using NewSQL Database Systems
  • 2015
  • In: Distributed Applications and Interoperable Systems. France: Springer. ISBN 9783319191294, 9783319191287, pp. 158-172
  • Conference paper (peer-reviewed). Abstract:
    • Leader election protocols are a fundamental building block for replicated distributed services. They ease the design of leader-based coordination protocols that tolerate failures. In partially synchronous systems, designing a leader election algorithm that does not permit multiple leaders while the system is unstable is a complex task. As a result, many production systems use third-party distributed coordination services, such as ZooKeeper and Chubby, to provide a reliable leader election service. However, adding a third-party service such as ZooKeeper to a distributed system incurs additional operational costs and complexity. ZooKeeper instances must be kept running on at least three machines to ensure its high availability. In this paper, we present a novel leader election protocol using NewSQL databases for partially synchronous systems that ensures at most one leader at any given time. The leader election protocol uses the database as distributed shared memory. Our work enables distributed systems that already use NewSQL databases to save the operational overhead of managing an additional third-party service for leader election. Our main contribution is the design, implementation and validation of a practical leader election algorithm, based on NewSQL databases, that has performance comparable to a leader election implementation using a state-of-the-art distributed coordination service, ZooKeeper. (A hedged lease-based sketch of this idea follows this entry.)
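The core idea, using the database as distributed shared memory, can be sketched as a lease on a single row that is taken or renewed inside one transaction. This is a hedged illustration, not the paper's exact protocol; sqlite3 stands in for a NewSQL database so the sketch stays runnable.

    import sqlite3, time, uuid

    LEASE_SECONDS = 10
    NODE_ID = str(uuid.uuid4())

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE leader (id INTEGER PRIMARY KEY CHECK (id = 0),"
               " holder TEXT, expires REAL)")
    db.execute("INSERT INTO leader VALUES (0, NULL, 0)")

    def try_acquire(now):
        """Atomically take or renew the single leader row."""
        with db:  # one transaction: read-modify-write of the shared row
            cur = db.execute(
                "UPDATE leader SET holder = ?, expires = ? "
                "WHERE holder IS NULL OR holder = ? OR expires < ?",
                (NODE_ID, now + LEASE_SECONDS, NODE_ID, now))
            return cur.rowcount == 1  # true iff this node now holds the lease

    print("leader" if try_acquire(time.time()) else "follower")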
7.
  • Niazi, Salman, 1982- (author)
  • Scaling Distributed Hierarchical File Systems Using NewSQL Databases
  • 2018
  • Doctoral thesis (other academic/artistic). Abstract:
    • For many years, researchers have investigated the use of database technology to manage file system metadata, with the goal of providing extensible typed metadata and support for fast, rich metadata search. However, earlier attempts failed mainly due to the reduced performance introduced by adding database operations to the file system’s critical path. Recent improvements in the performance of distributed in-memory online transaction processing databases (NewSQL databases) led us to re-investigate the possibility of using a database to manage file system metadata, but this time for a distributed, hierarchical file system, the Hadoop Distributed File System (HDFS). The single-host metadata service of HDFS is a well-known bottleneck for both the size of HDFS clusters and their throughput.

    In this thesis, we detail the algorithms, techniques, and optimizations used to develop HopsFS, an open-source, next-generation distribution of HDFS that replaces the main scalability bottleneck in HDFS, the single-node in-memory metadata service, with a no-shared-state distributed system built on a NewSQL database. In particular, we discuss how we exploit recent high-performance features of NewSQL databases, such as application-defined partitioning, partition-pruned index scans, and distribution-aware transactions, as well as more traditional techniques such as batching and write-ahead caches, to enable a revolution in distributed hierarchical file system performance.

    HDFS’ design is optimized for the storage of large files, that is, files ranging from megabytes to terabytes in size. However, in many production deployments of HDFS, it has been observed that almost 20% of the files in the system are less than 4 KB in size and as much as 42% of all file system operations are performed on files less than 16 KB in size. HopsFS introduces a tiered storage solution to store files of different sizes more efficiently. The tiers range from the highest tier, where an in-memory NewSQL database stores very small files (<1 KB), to the next tier, where small files (<64 KB) are stored on solid-state drives (SSDs), also using a NewSQL database, to the largest tier, the existing Hadoop block storage layer for very large files. Our approach is based on extending HopsFS with an inode stuffing technique, where we embed the contents of small files with the metadata and use database transactions and database replication guarantees to ensure the availability, integrity, and consistency of the small files. HopsFS enables significantly larger cluster sizes, more than an order of magnitude higher throughput, and significantly lower client latencies for large clusters.

    Lastly, coordination is an integral part of distributed file system operation protocols. We present a novel leader election protocol for partially synchronous systems that uses NewSQL databases as shared memory. Our work enables HopsFS, which uses a NewSQL database, to save the operational overhead of managing an additional third-party service for leader election, and to deliver performance comparable to a leader election implementation using a state-of-the-art distributed coordination service, ZooKeeper. (A sketch of the size-based tier selection described above follows this entry.)
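The size thresholds the thesis gives map directly to a tier-selection rule. A minimal sketch; the function and tier names are illustrative, not HopsFS's actual API.

    KB = 1024

    def storage_tier(file_size_bytes):
        if file_size_bytes < 1 * KB:
            return "in-memory-db"   # very small files live in the NewSQL DB
        if file_size_bytes < 64 * KB:
            return "ssd-db"         # small files in SSD-backed DB tables
        return "hdfs-blocks"        # everything else in the block layer

    assert storage_tier(512) == "in-memory-db"
    assert storage_tier(16 * KB) == "ssd-db"
    assert storage_tier(1024 * KB) == "hdfs-blocks"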
8.
  • Niazi, Salman, 1982-, et al. (author)
  • Size Matters: Improving the Performance of Small Files in Hadoop
  • 2018
  • Conference paper (peer-reviewed). Abstract:
    • The Hadoop Distributed File System (HDFS) is designed to handle massive amounts of data, preferably stored in very large files. The poor performance of HDFS in managing small files has long been a bane of the Hadoop community. In many production deployments of HDFS, almost 25% of the files are less than 16 KB in size and as much as 42% of all file system operations are performed on these small files. We have designed an adaptive tiered storage using in-memory and on-disk tables stored in a high-performance distributed database to efficiently store small files and improve their performance in HDFS. Our solution is completely transparent: it does not require any changes to HDFS clients or to the applications using the Hadoop platform. In experiments, we observed up to 61 times higher throughput when writing files, and for real-world workloads from Spotify our solution reduces the latency of reading and writing small files by factors of 3.15 and 7.39, respectively. (An inode-stuffing sketch follows this entry.)
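The inode-stuffing idea mentioned in the thesis entry above, small file contents embedded alongside metadata so reads skip the block layer, can be sketched as follows. The dicts stand in for database tables and the names are made up for illustration.

    SMALL_FILE_LIMIT = 64 * 1024  # 64 KB threshold, per the thesis entry

    metadata = {}     # path -> row; small files carry their bytes inline
    block_store = {}  # path -> bytes in the (simulated) block layer

    def write(path, data):
        inline = data if len(data) < SMALL_FILE_LIMIT else None
        metadata[path] = {"size": len(data), "inline": inline}
        if inline is None:
            block_store[path] = data

    def read(path):
        row = metadata[path]
        return row["inline"] if row["inline"] is not None else block_store[path]

    write("/logs/tiny.txt", b"hello")
    assert read("/logs/tiny.txt") == b"hello"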