SwePub
Search the SwePub database


Results for the search "WFRF:(Maguire Jr. Gerald Q. professor 1955 )"

  • Results 1-10 of 16
1.
  • Roozbeh, Amir, 1983- (author)
  • Realizing Next-Generation Data Centers via Software-Defined “Hardware” Infrastructures and Resource Disaggregation : Exploiting your cache
  • 2021
  • Doctoral thesis (other scholarly/artistic) abstract
    • The cloud is evolving due to additional demands introduced by new technological advancements and the wide movement toward digitalization. Moreover, next-generation Data Centers (DCs) and clouds are expected (and need) to become cheaper, more efficient, and capable of offering more predictable services. Aligned with this, this thesis examines the concept of Software-Defined “Hardware” Infrastructure (SDHI) based on hardware resource disaggregation as one possible way of realizing next-generation DCs. This thesis starts with an overview of the functional architecture of a cloud based on SDHI. Following this, a series of use cases and deployment scenarios enabled by SDHI are discussed, along with an exploration of the role of each functional block of SDHI’s architecture, i.e., cloud infrastructure, cloud platforms, cloud execution environments, and applications. This thesis proposes a framework to evaluate the impact of SDHI on the techno-economic efficiency of DCs, explicitly focusing on application profiling, hardware dimensioning, and Total Cost of Ownership (TCO). It then shows that combining resource disaggregation and software-defined capabilities makes DCs less expensive and easier to expand; hence, they can rapidly follow the expected exponential demand growth. Additionally, this thesis elaborates on the technologies underlying SDHI, its challenges, and its potential future directions. It is advocated that achieving and maintaining a high level of memory performance is crucial for realizing SDHI and disaggregated DCs. Accordingly, a memory management and Input/Output (I/O) data management scheme suitable for SDHI is proposed, and its advantages are shown.
This work focuses on the management of the Last Level Cache (LLC) in currently available Intel processors, takes advantage of the LLC’s Non-Uniform Cache Architecture (NUCA), and investigates how better utilization of the LLC can provide higher performance, more predictable response time, and improved isolation between threads. Additionally, this thesis scrutinizes the impact of cache management, specifically Direct Cache Access (DCA), on the performance of I/O intensive applications. The results of an empirical study show that the proposed memory management scheme enables system designers and developers to optimize systems for I/O intensive applications, and they highlight some potential changes expected for I/O management in future DC systems.
  •  
2.
  • Barbette, Tom, 1990-, et al. (author)
  • A High-Speed Load-Balancer Design with Guaranteed Per-Connection-Consistency
  • 2020
  • In: Proceedings of the 17th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2020. - Santa Clara, CA, USA : USENIX Association, pp. 667-683
  • Conference paper (peer-reviewed) abstract
    • Large service providers use load balancers to dispatch millions of incoming connections per second towards thousands of servers. There are two basic yet critical requirements for a load balancer: uniform load distribution of the incoming connections across the servers, and per-connection-consistency (PCC), i.e., the ability to map packets belonging to the same connection to the same server even in the presence of changes in the number of active servers and load balancers. Yet, meeting both of these requirements at the same time has been an elusive goal. Today's load balancers minimize PCC violations at the price of non-uniform load distribution. This paper presents Cheetah, a load balancer that supports uniform load distribution and PCC while being scalable, memory efficient, resilient to clogging attacks, and fast at processing packets. The Cheetah LB design guarantees PCC for any realizable server selection load balancing mechanism and can be deployed in both a stateless and a stateful manner, depending on the operational needs. We implemented Cheetah on both a software and a Tofino-based hardware switch. Our evaluation shows that a stateless version of Cheetah guarantees PCC, has negligible packet processing overheads, and can support load balancing mechanisms that reduce the flow completion time by a factor of 2–3×.
  •  
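The stateless-PCC idea described in the abstract lends itself to a compact sketch. The following is purely illustrative (all names and the index-based cookie are invented for clarity; the actual Cheetah design encodes its cookie in packet header fields on the switch data plane):

```python
# Illustrative sketch: a new connection is assigned a server by any load
# balancing policy, and the chosen server's index is returned as a small
# "cookie" carried in every later packet of the connection. Subsequent
# packets are routed by cookie alone, so per-connection-consistency (PCC)
# holds even when the server pool changes.

def new_connection(servers, policy):
    """Pick a server with an arbitrary policy; return (server, cookie)."""
    idx = policy(servers)
    return servers[idx], idx          # here the cookie is just the index

def route(servers, cookie):
    """Route a later packet of an established connection by its cookie."""
    return servers[cookie]

servers = ["s0", "s1", "s2"]
least_loaded = lambda s: 1            # pretend s1 is currently least loaded
srv, cookie = new_connection(servers, least_loaded)
servers.append("s3")                  # the pool grows mid-connection
assert route(servers, cookie) == srv  # the mapping survives the change
```

Because routing depends only on the cookie, the selection policy for new connections can be arbitrarily sophisticated without risking PCC violations.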
3.
  • Barbette, Tom, 1990-, et al. (author)
  • Cheetah : A High-Speed Programmable Load-Balancer Framework with Guaranteed Per-Connection-Consistency
  • 2022
  • In: IEEE/ACM Transactions on Networking. - : Institute of Electrical and Electronics Engineers (IEEE). - 1063-6692 .- 1558-2566. ; 30:1, pp. 354-367
  • Journal article (peer-reviewed) abstract
    • Large service providers use load balancers to dispatch millions of incoming connections per second towards thousands of servers. There are two basic yet critical requirements for a load balancer: uniform load distribution of the incoming connections across the servers, which requires supporting advanced load balancing mechanisms, and per-connection-consistency (PCC), i.e., the ability to map packets belonging to the same connection to the same server even in the presence of changes in the number of active servers and load balancers. Yet, simultaneously meeting these requirements has been an elusive goal. Today's load balancers minimize PCC violations at the price of non-uniform load distribution. This paper presents Cheetah, a load balancer that supports advanced load balancing mechanisms and PCC while being scalable, memory efficient, fast at processing packets, and resilient to clogging attacks to a degree comparable with today's load balancers. The Cheetah LB design guarantees PCC for any realizable server selection load balancing mechanism and can be deployed in both a stateless and a stateful manner, depending on operational needs. We implemented Cheetah on both a software and a Tofino-based hardware switch. Our evaluation shows that a stateless version of Cheetah guarantees PCC, has negligible packet processing overheads, and can support load balancing mechanisms that reduce the flow completion time by a factor of 2-3×.
  •  
4.
  • Barbette, Tom, 1990-, et al. (author)
  • Stateless CPU-aware datacenter load-balancing
  • 2020
  • In: Poster: Stateless CPU-aware datacenter load-balancing. - New York, NY, USA : Association for Computing Machinery (ACM), pp. 548-549
  • Conference paper (peer-reviewed) abstract
    • Today, datacenter operators deploy Load-balancers (LBs) to efficiently utilize server resources, but must over-provision server resources (by up to 30%) because of load imbalances and the desire to bound tail service latency. We posit that one of the reasons for these imbalances is the lack of per-core load statistics in existing LBs. As a first step, we designed CrossRSS, a CPU core-aware LB that dynamically assigns incoming connections to the least loaded cores in the server pool. CrossRSS leverages knowledge of how each server's Network Interface Card (NIC) dispatches packets to specific cores to reduce imbalances by more than an order of magnitude compared to existing LBs in a proof-of-concept datacenter environment, processing 12% more packets with the same number of cores.
  •  
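The core-aware dispatching described above can be sketched as follows. This is an illustrative sketch under assumed names: a real NIC uses a Toeplitz hash over the packet's 5-tuple, for which CRC32 stands in here, and CrossRSS's actual steering mechanism is not reproduced.

```python
import zlib

# An RSS NIC steers a packet to core indirection[hash(key) % table_size].
# A core-aware LB that knows a server's indirection table can pick the
# least-loaded core, then choose an encapsulation source port so that the
# NIC's hash lands on that core. CRC32 is a stand-in for the Toeplitz hash.

def rss_core(indirection, flow_key):
    return indirection[zlib.crc32(flow_key) % len(indirection)]

def pick_core(core_load):
    return min(range(len(core_load)), key=core_load.__getitem__)

def choose_src_port(indirection, target_core, base_key):
    # Try source ports until the NIC's RSS would dispatch to target_core.
    for port in range(1024, 1024 + 4096):
        key = base_key + port.to_bytes(2, "big")
        if rss_core(indirection, key) == target_core:
            return port
    return None

indirection = [0, 1, 2, 3] * 32       # a 128-entry indirection table
core_load = [10, 3, 7, 5]             # core 1 is currently least loaded
target = pick_core(core_load)
port = choose_src_port(indirection, target, b"\x0a\x00\x00\x01")
assert port is not None
assert rss_core(indirection, b"\x0a\x00\x00\x01" + port.to_bytes(2, "big")) == target
```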
5.
  • Bogdanov, Kirill, et al. (author)
  • Toward Automated Testing of Geo-Distributed Replica Selection Algorithms
  • 2015
  • In: Computer Communication Review. - : Association for Computing Machinery (ACM). - 0146-4833 .- 1943-5819. ; 45:4, pp. 89-90
  • Journal article (peer-reviewed) abstract
    • Many geo-distributed systems rely on replica selection algorithms to communicate with the closest set of replicas. Unfortunately, the bursty nature of Internet traffic and ever-changing network conditions present a problem in identifying the best choice of replicas. Suboptimal replica choices result in increased response latency and reduced system performance. In this work, we present GeoPerf, a tool that tries to automate the testing of geo-distributed replica selection algorithms. We used GeoPerf to test Cassandra and MongoDB, two popular data stores, and found bugs in each of these systems.
  •  
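A minimal example of the kind of algorithm GeoPerf exercises is latency-based replica selection. The sketch below is illustrative only; the function names and the EWMA policy are assumptions, not GeoPerf's code or the exact policies of Cassandra or MongoDB.

```python
# Keep an exponentially weighted moving average (EWMA) of each replica's
# observed latency and select the k replicas with the lowest score, as a
# simple stand-in for the selection logic of geo-distributed data stores.

def update(ewma, replica, sample_ms, alpha=0.2):
    prev = ewma.get(replica, sample_ms)   # seed with the first sample
    ewma[replica] = (1 - alpha) * prev + alpha * sample_ms

def select(ewma, k):
    return sorted(ewma, key=ewma.get)[:k]

scores = {}
for replica, latency in [("us-east", 10), ("eu-west", 100), ("ap-south", 50)]:
    update(scores, replica, latency)
assert select(scores, 2) == ["us-east", "ap-south"]
```

A tool like GeoPerf would then search for latency patterns, such as short bursts, under which the selected set diverges from the truly best replicas.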
6.
  • Farshin, Alireza, 1992-, et al. (author)
  • Optimizing Intel Data Direct I/O Technology for Multi-hundred-gigabit Networks
  • 2020
  • In: Proceedings of the Fifteenth EuroSys Conference (EuroSys '20), Heraklion, Crete, Greece, April 27-30, 2020.
  • Conference paper (peer-reviewed) abstract
    • Digitalization across society is expected to produce a massive amount of data, leading to the introduction of faster network interconnects. In addition, many Internet services require high throughput and low latency. However, faster links alone guarantee neither; it is therefore essential to perform holistic system optimization to take full advantage of the faster links and provide high-performance services. Intel Data Direct I/O (DDIO) is a recent technology introduced to facilitate the deployment of high-performance services based on fast interconnects. We evaluated the effectiveness of DDIO for multi-hundred-gigabit networks. This paper briefly discusses our findings on DDIO, which show the necessity of optimizing/adapting it to address the challenges of multi-hundred-gigabit-per-second links.
  •  
7.
  • Farshin, Alireza, 1992-, et al. (author)
  • Reexamining Direct Cache Access to Optimize I/O Intensive Applications for Multi-hundred-gigabit Networks
  • 2020
  • In: 2020 USENIX Annual Technical Conference (USENIX ATC 20), pp. 673-689
  • Conference paper (peer-reviewed) abstract
    • Memory access is the major bottleneck in realizing multi-hundred-gigabit networks with commodity hardware; hence, it is essential to make good use of cache memory, which is faster and smaller than main memory and sits closer to the processor. Our goal is to study the impact of cache management on the performance of I/O intensive applications. Specifically, this paper looks at one of the bottlenecks in packet processing, i.e., direct cache access (DCA). We systematically studied the current implementation of DCA in Intel processors, particularly Data Direct I/O technology (DDIO), which directly transfers data between I/O devices and the processor's cache. Our empirical study enables system designers/developers to optimize DDIO-enabled systems for I/O intensive applications. We demonstrate that optimizing DDIO could reduce the latency of I/O intensive network functions running at 100 Gbps by up to ~30%. Moreover, we show that DDIO causes a 30% increase in tail latencies when processing packets at 200 Gbps; hence, it is crucial to selectively inject data into the cache or to explicitly bypass it.
  •  
8.
  • Ghasemirahni, Hamid, 1991-, et al. (author)
  • Packet Order Matters! Improving Application Performance by Deliberately Delaying Packets
  • 2022
  • In: Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2022. - : USENIX - The Advanced Computing Systems Association, pp. 807-827
  • Conference paper (peer-reviewed) abstract
    • Data centers increasingly deploy commodity servers with high-speed network interfaces to enable low-latency communication. However, achieving low latency at high data rates crucially depends on how the incoming traffic interacts with the system's caches. When packets that need to be processed in the same way are consecutive, i.e., exhibit high temporal and spatial locality, caches deliver great benefits. In this paper, we systematically study the impact of temporal and spatial traffic locality on the performance of commodity servers equipped with high-speed network interfaces. Our results show that (i) the performance of a variety of widely deployed applications degrades substantially with even the slightest lack of traffic locality, and (ii) a traffic trace from our organization reveals poor traffic locality, as networking protocols, drivers, and the underlying switching/routing fabric spread packets out in time (reducing locality). To address these issues, we built Reframer, a software solution that deliberately delays packets and reorders them to increase traffic locality. Despite introducing μs-scale delays for some packets, we show that Reframer increases the throughput of a network service chain by up to 84% and reduces the flow completion time of a web server by 11% while improving its throughput by 20%.
  •  
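The buffer-and-regroup strategy described above can be sketched in a few lines. This is an illustrative sketch with invented class and method names, not the Reframer implementation:

```python
from collections import OrderedDict

class Reframer:
    """Hold packets briefly, then release them grouped by flow so packets
    sharing processing state arrive back-to-back and stay cache-resident."""

    def __init__(self):
        self.buckets = OrderedDict()     # flow key -> buffered packets

    def push(self, flow, pkt):
        self.buckets.setdefault(flow, []).append(pkt)

    def flush(self):
        # In the real system a timer bounds how long any packet may wait.
        out = [p for pkts in self.buckets.values() for p in pkts]
        self.buckets.clear()
        return out

r = Reframer()
for flow, pkt in [("A", 1), ("B", 2), ("A", 3), ("B", 4)]:
    r.push(flow, pkt)
assert r.flush() == [1, 3, 2, 4]         # interleaved in, grouped out
```

The deliberate flush delay trades a small per-packet latency for much higher temporal locality at the downstream server, which is the paper's core bet.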
9.
  • Girondi, Massimo, et al. (author)
  • Toward GPU-centric Networking on Commodity Hardware
  • 2024
  • In: 7th International Workshop on Edge Systems, Analytics and Networking (EdgeSys 2024), April 22, 2024, Athens, Greece. - New York : ACM Digital Library.
  • Conference paper (peer-reviewed) abstract
    • GPUs are emerging as the most popular accelerator for many applications, powering the core of machine learning applications. In networked GPU-accelerated applications, input and output data typically traverse the CPU and the OS network stack multiple times, getting copied across the system’s main memory. These transfers increase application latency and consume expensive CPU cycles, reducing the system’s efficiency and increasing the overall response times. These inefficiencies become of greater importance in latency-bounded deployments or at high throughput, where copy times could quickly inflate the response time of modern GPUs. We leverage the efficiency and kernel-bypass benefits of RDMA to transfer data in and out of GPUs without using any CPU cycles or synchronization. We demonstrate the ability of modern GPUs to saturate a 100-Gbps link and evaluate the network processing time in the context of an inference serving application.
  •  
10.
  • Katsikas, Georgios P., 1987-, et al. (author)
  • Metron : NFV service chains at the true speed of the underlying hardware
  • 2019
  • Conference paper (peer-reviewed) abstract
    • In this paper, we present Metron, a Network Functions Virtualization (NFV) platform that achieves high resource utilization by jointly exploiting the underlying network and commodity servers’ resources. This synergy allows Metron to: (i) offload part of the packet processing logic to the network, (ii) use smart tagging to set up and exploit the affinity of traffic classes, and (iii) use tag-based hardware dispatching to carry out the remaining packet processing at the speed of the servers’ fastest cache(s), with zero inter-core communication. Metron also introduces a novel resource allocation scheme that minimizes the resource allocation overhead for large-scale NFV deployments. With commodity hardware assistance, Metron deeply inspects traffic at 40 Gbps and realizes stateful network functions at the speed of a 100 GbE network card on a single server. Metron has 2.75-6.5x better efficiency than OpenBox, a state-of-the-art NFV system, while ensuring key requirements such as elasticity, fine-grained load balancing, and flexible traffic steering.
  •  
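The tag-based dispatching idea can be illustrated with a small sketch. All names below are assumptions made for illustration; Metron actually programs classification and tagging into NICs and switches rather than into software:

```python
# The controller assigns each traffic class a tag; the hardware then
# dispatches a tagged packet directly to the core holding that class's
# processing state, avoiding inter-core transfers.

def classify(pkt, rules):
    for match, tag in rules:             # first matching rule wins
        if match(pkt):
            return tag
    return 0                             # default traffic class

def dispatch(tag, tag_to_core):
    return tag_to_core[tag]

rules = [
    (lambda p: p["dport"] == 80, 1),     # web traffic gets tag 1
    (lambda p: p["dport"] == 53, 2),     # DNS traffic gets tag 2
]
tag_to_core = {0: 0, 1: 2, 2: 3}         # controller's tag-to-core map
tag = classify({"dport": 80}, rules)
assert dispatch(tag, tag_to_core) == 2   # web packets land on core 2
```

Because the classification happens once and the tag travels with the packet, every later hop can dispatch at line rate with a single table lookup.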
