SwePub
Search the SwePub database


Result list for search "WFRF:(Arelakis Angelos 1984)"

  • Results 1-11 of 11
1.
  • Angerd, Alexandra, 1988, et al. (author)
  • GBDI: Going Beyond Base-Delta-Immediate Compression with Global Bases
  • 2022
  • In: Proceedings - International Symposium on High-Performance Computer Architecture. - 1530-0897. - 9781665420273 ; 2022-April, pp. 1115-1127
  • Conference paper (peer-reviewed), abstract:
    • Memory bandwidth is limiting performance for many emerging applications. While compression techniques can unlock a higher memory bandwidth, prior art offers only modestly better bandwidth. This paper contributes with a new compression method - Global Base Delta Immediate compression (GBDI) - that offers substantially higher memory bandwidth by, unlike prior art, selecting base values across memory blocks. GBDI uses a novel clustering algorithm through data analysis in the background. The presented accelerator infrastructure offers low area overhead and latency. This paper shows that GBDI offers a compression ratio of 2.3×, and yields 1.5× higher bandwidth and 1.1× higher performance compared with a baseline without compression support, on average, for SPEC2017 benchmarks requiring medium to high memory bandwidth.
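The following toy sketch illustrates the base-delta idea that GBDI generalizes: values in a block are stored as narrow deltas from a shared base. It is not the authors' design (GBDI selects multiple global bases by clustering values across memory blocks, with dedicated hardware); the function names and parameters here are illustrative only.

```python
# Toy base-delta encoder with one shared ("global") base. Illustrative only:
# GBDI itself selects multiple global bases by clustering values across
# memory blocks, with dedicated hardware for (de)compression.

def base_delta_encode(values, base, delta_bits=8):
    """Encode values as signed deltas from a shared base, if they all fit."""
    lo, hi = -(1 << (delta_bits - 1)), (1 << (delta_bits - 1)) - 1
    deltas = [v - base for v in values]
    if all(lo <= d <= hi for d in deltas):
        return {"base": base, "deltas": deltas}  # compressed representation
    return None  # incompressible with this base/width

def base_delta_decode(packed):
    """Recover the original values from the base plus deltas."""
    return [packed["base"] + d for d in packed["deltas"]]

block = [1000, 1004, 1012, 996]  # four clustered 64-bit values
packed = base_delta_encode(block, base=1000)
assert packed is not None and base_delta_decode(packed) == block
```

With one 64-bit base and four 8-bit deltas, the example block shrinks from 256 to 96 bits, which is the kind of saving base-delta schemes exploit; a poorly chosen base makes the block incompressible, which is why base selection is the crux of the paper.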
2.
  • Arelakis, Angelos, 1984, et al. (author)
  • A Cache System and a Method of Operating a Cache
  • 2016
  • Patent (other academic/artistic), abstract:
    • In one embodiment, a computer cache is extended with structures that can (1) establish the frequency with which distinct values occur in the cache and use that information to (2) compress values in caches into dense codes using a plurality of statistical-based compression techniques and (3) decompress densely coded values, realizing caches that store information densely while retrieving it with low overhead.
3.
4.
  • Arelakis, Angelos, 1984, et al. (author)
  • A Case for a Value-Aware Cache
  • 2014
  • In: IEEE Computer Architecture Letters. - 1556-6056 .- 1556-6064. ; 13:1, pp. 1-4
  • Journal article (peer-reviewed), abstract:
    • Replication of values causes poor utilization of on-chip cache memory resources. This paper addresses the question: How much cache resources can be theoretically and practically saved if value replication is eliminated? We introduce the concept of value-aware caches and show that a sixteen times smaller value-aware cache can yield the same miss rate as a conventional cache. We then make a case for a value-aware cache design using Huffman-based compression. Since the value set is rather stable across the execution of an application, one can afford to reconstruct the coding tree in software. The decompression latency is kept short by our proposed novel pipelined Huffman decoder that uses canonical codewords. While the (loose) upper-bound compression factor is 5.2X, we show that, by eliminating cache-block alignment restrictions, it is possible to achieve a compression factor of 3.4X for practical designs.
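As a back-of-the-envelope illustration of the value-replication argument above, the sketch below computes the (loose) upper-bound compression factor obtained if every distinct value were stored exactly once. The cache contents are made up for illustration; the article derives its bounds from real workloads.

```python
# Back-of-the-envelope value-locality measurement: if each distinct value
# were stored exactly once, the (loose) upper-bound compression factor is
# total words / unique words. The cache contents below are made up.

def value_locality_bound(values):
    """Upper-bound compression factor if value replication is eliminated."""
    return len(values) / len(set(values))

# A toy cache dominated by a few frequent values (0, -1, 7):
cache_words = [0] * 50 + [-1] * 20 + [7] * 20 + list(range(10))
bound = value_locality_bound(cache_words)
assert bound > 9  # 100 words, only 11 distinct values
```

A real value-aware design cannot reach this bound (codewords and metadata cost space), which is why the article distinguishes the loose 5.2X upper bound from the practical 3.4X factor.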
5.
  • Arelakis, Angelos, 1984 (author)
  • Design Considerations of Value-aware Caches
  • 2013
  • Licentiate thesis (other academic/artistic), abstract:
    • On-chip cache memories are instrumental in tackling several performance and energy issues facing contemporary and future microprocessor chip architectures. First, they are key to bridging the growing speed gap between memory and processors. Second, as the bandwidth into the chip is not keeping pace with the growth in processing performance, on-chip caches are essential in keeping bandwidth demands within limits. Finally, since off-chip memory accesses consume a substantial amount of energy, larger on-chip caches can potentially bring down the energy wasted on off-chip accesses. Hence, techniques to improve on-chip cache utilization are important. This thesis shows that value replication -- the same value is replicated in multiple memory locations -- is an important opportunity for improving the utilization of cache/memory capacity. The thesis establishes through experimentation that many applications exhibit high value locality and that, when it is exploited by storing each unique memory value exactly once, compression factors beyond 16X can be achieved. The proposed cache compression techniques build on this opportunity by encoding replicated values. While past cache compression techniques manage to code frequent values densely, they trade off a high compression ratio for low decompression latency, thus missing opportunities to utilize on-chip cache capacity more effectively. The thesis further analyses design considerations when realising a practical value-aware cache that accommodates statistical-based compression and presents, for the first time, a detailed design-space exploration of statistical-based cache compression. It is shown that more aggressive, statistical-based compression approaches, such as Huffman coding, which have been excluded in the past due to the processing overhead of compression and decompression, are prime candidates for cache and memory compression.
In this thesis, I find that, even though more processing-intensive decompression affects the cache-hit time of last-level caches, modern out-of-order cores can typically hide the decompression latency successfully. Moreover, the impact of statistics acquisition to generate new codewords is also low, because value locality varies little over time, so new encodings need to be generated rarely, making it possible to off-load code generation to software routines. Interestingly, the high compression ratio obtained by statistical-based cache compression is shown to improve effective cache capacity by close to three times, which for cache-intensive workloads results in significant performance gains (20% on average) and substantial energy savings (the saved energy can be up to 10 times larger than the total energy overheads) by reducing off-chip memory use.
6.
  • Arelakis, Angelos, 1984, et al. (author)
  • HyComp: A Hybrid Cache Compression Method for Selection of Data-Type-Specific Compression Methods
  • 2015
  • In: 48th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2015, Waikiki, United States, 5-9 December 2015. - New York, NY, USA : ACM. - 1072-4451. - 9781450340342 ; pp. 38-49
  • Conference paper (peer-reviewed), abstract:
    • Proposed cache compression schemes make design-time assumptions on value locality to reduce decompression latency. For example, some schemes assume that common values are spatially close whereas other schemes assume that null blocks are common. Most schemes, however, assume that value locality is best exploited by fixed-size data types (e.g., 32-bit integers). This assumption falls short when other data types, such as floating-point numbers, are common. This paper makes two contributions. First, HyComp - a hybrid cache compression scheme - selects the best-performing compression scheme, based on heuristics that predict data types. Data types considered are pointers, integers, floating-point numbers and the special (and trivial) case of null blocks. Second, this paper contributes with a compression method that exploits value locality in data types with predefined semantic value fields, e.g., as in the exponent and the mantissa in floating-point numbers. We show that HyComp, augmented with the proposed floating-point-number compression method, offers superior performance in comparison with prior art.
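A software caricature of the selection idea in HyComp: inspect a block, guess its data type with cheap heuristics, and dispatch to a type-specific compressor. The heuristics and the zlib stand-in compressors below are assumptions for illustration, not the paper's hardware schemes (which include a dedicated floating-point method).

```python
import struct
import zlib

# Caricature of data-type-driven compressor selection: cheap heuristics
# guess the block's type, then a type-specific compressor is applied.
# The heuristics and the zlib stand-ins are assumptions, not HyComp's
# hardware compressors.

def looks_like_null_block(block: bytes) -> bool:
    return not any(block)

def looks_like_doubles(block: bytes) -> bool:
    """Crude check: every 64-bit word has a plausible double exponent."""
    for (word,) in struct.iter_unpack("<Q", block):
        exponent = (word >> 52) & 0x7FF
        if exponent in (0, 0x7FF):  # zero/denormal or inf/NaN bit pattern
            return False
    return True

def hybrid_compress(block: bytes) -> tuple:
    """Return (chosen method, compressed payload) for a cache block."""
    if looks_like_null_block(block):
        return ("null", b"")                 # null blocks need no payload
    if looks_like_doubles(block):
        return ("fp", zlib.compress(block))  # stand-in for an FP scheme
    return ("int", zlib.compress(block))     # stand-in for an integer scheme

assert hybrid_compress(bytes(64))[0] == "null"
assert hybrid_compress(struct.pack("<8d", *[1.5 * i + 1 for i in range(8)]))[0] == "fp"
assert hybrid_compress(struct.pack("<8Q", *range(1, 9)))[0] == "int"
```

The design point this mirrors is that misprediction only costs compression ratio, never correctness, since every candidate method is lossless.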
7.
  • Arelakis, Angelos, 1984, et al. (author)
  • SC2: A statistical compression cache scheme
  • 2014
  • In: Conference Proceedings - Annual International Symposium on Computer Architecture, ISCA. - 1063-6897. - 9781479943968 ; pp. 145-156
  • Conference paper (peer-reviewed), abstract:
    • Low utilization of on-chip cache capacity limits performance and wastes energy because of the long latency, limited bandwidth, and energy consumption associated with off-chip memory accesses. Value replication is an important source of low capacity utilization. While prior cache compression techniques manage to code frequent values densely, they trade off a high compression ratio for low decompression latency, thus missing opportunities to utilize capacity more effectively. This paper presents, for the first time, a detailed design-space exploration of caches that utilize statistical compression. We show that more aggressive approaches like Huffman coding, which have been neglected in the past due to the high processing overhead for (de)compression, are suitable techniques for caches and memory. Based on our key observation that value locality varies little over time and across applications, we first demonstrate that the overhead of statistics acquisition for code generation is low because new encodings are needed rarely, making it possible to off-load it to software routines. We then show that the high compression ratio obtained by Huffman-coding makes it possible to utilize the performance benefits of 4X larger last-level caches with about 50% lower power consumption than such larger caches.
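The statistical core of this approach can be sketched in software: sample value frequencies, then build a Huffman code so frequent values get short codewords. Software code generation is exactly what the paper argues is affordable; the hardware sampling, canonical decoding pipeline and cache packing are deliberately not modeled here.

```python
import heapq
from collections import Counter

# Software sketch of the statistical step: sample value frequencies and
# build a Huffman code so frequent values get short codewords.

def huffman_code(freqs):
    """Return {value: bitstring}; more frequent values get shorter codes."""
    heap = [(count, i, {value: ""}) for i, (value, count) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        c1, _, left = heapq.heappop(heap)
        c2, _, right = heapq.heappop(heap)
        merged = {v: "0" + bits for v, bits in left.items()}
        merged.update({v: "1" + bits for v, bits in right.items()})
        heapq.heappush(heap, (c1 + c2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

# Sampled cache words; 0 dominates, as null values often do in practice:
samples = [0] * 60 + [0xFFFFFFFF] * 25 + [1] * 10 + [42] * 5
code = huffman_code(Counter(samples))
bits_used = sum(len(code[v]) for v in samples)
assert len(code[0]) == 1              # the dominant value costs a single bit
assert bits_used < 32 * len(samples)  # far denser than raw 32-bit words
```

Because value locality is stable over time, the frequency table (and hence the code) rarely needs regenerating, which is what makes the software path viable.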
8.
  • Arelakis, Angelos, 1984 (author)
  • Statistical Compression Cache Designs
  • 2015
  • Doctoral thesis (other academic/artistic), abstract:
    • On-chip caches are essential as they bridge the growing speed gap between off-chip memory and processors. To this end, processing cores are sacrificed for more cache space in the chip's real estate, possibly affecting the cache access time and power dissipation. An alternative way to increase the effective cache capacity without enlarging the cache is compression. However, the required compression and decompression processes add complexity and latency; in particular, decompression lies on the critical memory-access path. Prior work focuses on methods that target lower decompression latency by sacrificing important gains in compressibility. This thesis, on the other hand, focuses on cache designs that exploit more advanced compression methods, i.e., statistical compression. The thesis first contributes an abstract value-aware cache model, which shows that applications often exhibit value locality and establishes that, ideally, by storing each distinct value exactly once, important compression opportunities open up. Motivated by this, the thesis proposes SC^2, a Huffman-based statistical compression cache design. The thesis tackles the problem of statistics acquisition by building a sampling mechanism in hardware. It finds that value locality is rather stable over long time periods, hence code generation can be offloaded to software. It then builds the support for compression and decompression in hardware, deals with practical issues such as cache space management, and finally makes a detailed exploration of statistical compression in the last-level cache. Unfortunately, this approach cannot be straightforwardly applied to data types that contain semantically well-defined data fields. Among such types, the thesis focuses on the common double-precision floating-point data and explores a different avenue to extract value locality by considering the different fields (sign, exponent and mantissa) in isolation.
Contrary to prior observations, it is shown that the mantissa exhibits significant value locality if it is further partitioned. A novel statistical compression method, called FP-H, tailored for cache compression is then proposed. Finally, the thesis makes the observation that no compressed cache design, including the state of the art, is hitherto always better than the others. Hence the thesis establishes HyComp, a practical cache design that, for the first time, adopts hybrid compression, where one of several data-type-specific compression methods is selected through heuristics. HyComp offers robust compressibility across applications that manipulate diverse data types, without affecting decompression latency and only slightly impacting compression latency.
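The field-wise view of doubles described above can be sketched as follows. The split into sign, exponent and mantissa is the standard IEEE 754 layout; the FP-H encoding of the partitioned mantissa itself is not reproduced here.

```python
import struct

# Sketch of the field split underlying FP-H: view each double as separate
# sign / exponent / mantissa fields (standard IEEE 754 layout).

def split_double(x: float):
    """Return (sign, exponent, mantissa) fields of a double's bit pattern."""
    (word,) = struct.unpack("<Q", struct.pack("<d", x))
    sign = word >> 63
    exponent = (word >> 52) & 0x7FF    # 11-bit biased exponent
    mantissa = word & ((1 << 52) - 1)  # 52-bit fraction
    return sign, exponent, mantissa

# Nearby values in a numeric array share sign and exponent exactly, so
# those fields compress very well when considered in isolation:
fields = [split_double(x) for x in (3.25, 3.5, 3.75)]
assert len({(s, e) for s, e, _ in fields}) == 1
```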
9.
  • Eldstål-Ahrens, Albin, 1988, et al. (author)
  • FlatPack: Flexible Compaction of Compressed Memory
  • 2022
  • In: Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT. - New York, NY, USA : ACM. - 1089-795X. ; pp. 96-108
  • Conference paper (peer-reviewed), abstract:
    • The capacity and bandwidth of main memory are increasingly important factors in computer system performance. Memory compression and compaction have been combined to increase effective capacity and reduce costly page faults. However, existing systems typically maintain compaction at the expense of bandwidth. One major cause of extra traffic in such systems is page overflows, which occur when data compressibility degrades and compressed pages must be reorganized. This paper introduces FlatPack, a novel approach to memory compaction that mitigates this overhead by reorganizing compressed data dynamically with less data movement. Reorganization is carried out by an addition to the memory controller, without intervention from software. FlatPack maintains memory capacity competitive with current state-of-the-art memory compression designs, while reducing mean memory traffic by up to 67%. This yields average improvements in performance and total system energy consumption over existing memory compression solutions of 31-46% and 11-25%, respectively. In total, FlatPack improves on baseline performance and energy consumption by 108% and 40%, respectively, in a single-core system, and by 83% and 23%, respectively, in a multi-core system.
10.
  • Eldstål-Ahrens, Albin, 1988, et al. (author)
  • L2C: Combining Lossy and Lossless Compression on Memory and I/O
  • 2022
  • In: Transactions on Embedded Computing Systems. - Association for Computing Machinery (ACM). - 1558-3465 .- 1539-9087. ; 21:1
  • Journal article (peer-reviewed), abstract:
    • In this paper, we introduce L2C, a hybrid lossy/lossless compression scheme applicable both to the memory subsystem and the I/O traffic of a processor chip. L2C employs general-purpose lossless compression and combines it with state-of-the-art lossy compression to achieve compression ratios of up to 16:1 and improve the utilization of the chip's bandwidth resources. Compressing memory traffic yields lower memory access time, improving system performance and energy efficiency. Compressing I/O traffic offers several benefits for resource-constrained systems, including more efficient storage and networking. We evaluate L2C as a memory compressor in simulation with a set of approximation-tolerant applications. L2C improves baseline execution time by an average of 50%, and total system energy consumption by 16%. Compared to the current state-of-the-art lossy and lossless memory compression approaches, L2C improves execution time by 9% and 26%, respectively, and reduces system energy costs by 3% and 5%, respectively. I/O compression efficacy is evaluated using a set of real-life datasets. L2C achieves compression ratios of up to 10.4:1 for a single dataset and about 4:1 on average, while introducing no more than 0.4% error.
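The lossy-then-lossless pipeline can be caricatured in a few lines: quantize approximation-tolerant floats with a bounded error (lossy stage), then run a general-purpose lossless coder over the result. The quantization step size and the zlib codec below are stand-ins, not L2C's actual algorithms.

```python
import struct
import zlib

# Caricature of a lossy-then-lossless pipeline: bounded-error quantization
# followed by a general-purpose lossless coder. Step size and codec are
# stand-ins, not L2C's actual algorithms.

def lossy_lossless_compress(samples, step=0.01):
    quantized = [round(x / step) for x in samples]  # lossy stage
    raw = struct.pack(f"<{len(quantized)}i", *quantized)
    return zlib.compress(raw)                       # lossless stage

def lossy_lossless_decompress(blob, n, step=0.01):
    ints = struct.unpack(f"<{n}i", zlib.decompress(blob))
    return [q * step for q in ints]

data = [0.5 + 0.001 * i for i in range(256)]
blob = lossy_lossless_compress(data)
restored = lossy_lossless_decompress(blob, len(data))
assert max(abs(a - b) for a, b in zip(data, restored)) <= 0.005 + 1e-9
assert len(blob) < len(data) * 8  # smaller than the raw doubles
```

Quantization both bounds the introduced error and makes neighboring values identical, which is what lets the lossless stage reach much higher ratios than it could on raw floats.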
11.
  • Shao, Qi, 1991, et al. (author)
  • HMComp: Extending Near-Memory Capacity using Compression in Hybrid Memory
  • 2024
  • In: Proceedings of the International Conference on Supercomputing. ; pp. 74-84
  • Conference paper (peer-reviewed), abstract:
    • Hybrid memories, especially those combining a first-tier near memory using High-Bandwidth Memory (HBM) and a second-tier far memory using DRAM, can realize a large, low-cost, high-bandwidth main memory. State-of-the-art hybrid memories typically use a flat hierarchy where blocks are swapped between near and far memory based on bandwidth demands. However, this may cause significant overheads for metadata storage and traffic. While using a fixed-size near-memory cache and compressing data in near memory can help, precious near-memory capacity is still wasted by the cache and the metadata needed to manage a compressed hybrid memory. This paper proposes HMComp, a flat hybrid-memory architecture in which compression techniques free up near-memory capacity to be used as a cache for far-memory data, cutting down swap traffic without sacrificing any memory capacity. Moreover, through a carefully crafted metadata layout, we show that metadata can be stored in less costly far memory, thus avoiding any waste of near-memory capacity. Overall, HMComp offers a single-thread speedup of up to 22% (13% on average) and reduces swap traffic by up to 60% (41% on average) compared to flat hybrid-memory designs.
