SwePub

Results for search "WFRF:(Stefanos Kaxiras) srt2:(2020-2024)"


  • Results 1-10 of 30
1.
  • Aimoniotis, Pavlos, et al. (authors)
  • Data-Out Instruction-In (DOIN!) : Leveraging Inclusive Caches to Attack Speculative Delay Schemes
  • 2022
  • In: 2022 IEEE International Symposium on Secure and Private Execution Environment Design (SEED 2022). - Institute of Electrical and Electronics Engineers (IEEE). - 9781665485265 - 9781665485272 ; pp. 49-60
  • Conference paper (peer-reviewed), abstract:
    • Although the cache has been a known side channel for years, it has gained renewed notoriety with the introduction of speculative side-channel attacks such as Spectre, which were able to use caches not just to observe a victim, but to leak secrets. Because the cache continues to be one of the most exploitable side channels, it is often the primary target to safeguard in secure speculative execution schemes. One of the simpler secure speculation approaches is to delay speculative accesses whose effect can be observed until they become non-speculative. Delay-on-Miss, for example, delays all observable speculative loads, i.e., the ones that miss in the cache, and preserves the majority of the performance of the baseline (unsafe speculation) by executing speculative loads that hit in the cache, which were thought to be unobservable. However, previous work has failed to consider how instruction fetching can eject cache lines from the shared, lower-level caches, and thus from higher cache levels due to inclusivity. In this work, we show how cache conflicts between instruction fetches and data accesses can extend previous attacks, and present the following new insights: 1. It is possible to use lower-level caches to perform Prime+Probe through conflicts resulting from instruction fetching. This extends previous Prime+Probe attacks and potentially avoids other developed mitigation strategies. 2. Data-instruction conflicts can be used to perform a Spectre attack that breaks Delay-on-Miss. After acquiring a secret, secret-dependent instruction fetching can cause cache conflicts that result in evictions in the L1D cache, creating observable timing differences. Essentially, it is possible to leak a secret bit-by-bit through the cache, despite Delay-on-Miss defending it. We call our new attack Data-Out Instruction-In (DOIN!) and demonstrate it on a real commercial core, the AMD Ryzen 9. We demonstrate how DOIN! interacts with Delay-on-Miss and perform an analysis of noise and bandwidth. Furthermore, we propose a simple defense extension for Delay-on-Miss that maintains its security guarantees at the cost of negligible performance degradation while executing the SPEC2006 workloads.
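The inclusivity-based eviction the abstract describes can be sketched as a toy simulation. Everything below (class names, a two-way set, LRU replacement) is an illustrative assumption of ours, not the paper's actual setup:

```python
class InclusiveCacheSet:
    """One set of an inclusive shared cache with LRU replacement.

    Evicting a line from this set also back-invalidates it in the
    (modelled) private L1D, which is what makes the eviction observable."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = []          # most-recently-used last
        self.l1d = set()         # lines also present in the private L1D

    def access(self, tag, fills_l1d=False):
        if tag in self.lines:
            self.lines.remove(tag)
        elif len(self.lines) == self.ways:
            victim = self.lines.pop(0)   # LRU victim
            self.l1d.discard(victim)     # inclusivity: back-invalidate L1D
        self.lines.append(tag)
        if fills_l1d:
            self.l1d.add(tag)

def leak_bit(secret_bit):
    """Attacker primes a set; the victim's *instruction fetch* lands in
    the same set only when secret_bit == 1, evicting the attacker's data."""
    s = InclusiveCacheSet(ways=2)
    s.access("attacker_data", fills_l1d=True)   # prime
    s.access("victim_data")
    if secret_bit:                               # secret-dependent I-fetch
        s.access("victim_code")                  # conflicts in the shared set
    # probe: an L1D miss on attacker_data reveals the eviction
    return 1 if "attacker_data" not in s.l1d else 0
```

The point of the sketch is only the interaction: instruction fetches and data accesses compete for the same inclusive sets, so a secret-dependent fetch becomes visible as a data eviction.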
2.
  • Aimoniotis, Pavlos, et al. (authors)
  • ReCon : Efficient Detection, Management, and Use of Non-Speculative Information Leakage
  • 2023
  • In: 56th IEEE/ACM International Symposium on Microarchitecture, MICRO 2023. - Association for Computing Machinery (ACM). - 9798400703294 ; pp. 828-842
  • Conference paper (peer-reviewed), abstract:
    • In a speculative side-channel attack, a secret is improperly accessed and then leaked by passing it to a transmitter instruction. Several proposed defenses effectively close this security hole by either delaying the secret from being loaded or propagated, or by delaying dependent transmitters (e.g., loads) from executing when fed with tainted input derived from an earlier speculative load. This results in a loss of memory-level parallelism and performance. A recently proposed security definition, in which data already leaked in non-speculative execution need not be considered secret during speculative execution, can recover some of this lost performance. However, detecting and tracking non-speculative leakage carries its own cost and increases complexity. The key insight of our work, which enables us to exploit non-speculative leakage as an optimization to other secure speculation schemes, is that the majority of non-speculative leakage is simply due to pointer dereferencing (or base-address indexing) - essentially what many secure speculation schemes prevent from taking place speculatively. We present ReCon, which: i) efficiently detects non-speculative leakage by limiting detection to pairs of directly-dependent loads that dereference pointers (or index a base address); and ii) piggybacks non-speculative leakage information on the coherence protocol. In ReCon, the coherence protocol remembers and propagates the knowledge of what has leaked, and therefore what is safe to dereference under speculation. To demonstrate the effectiveness of ReCon, we show how two state-of-the-art secure speculation schemes, Non-speculative Data Access (NDA) and Speculative Taint Tracking (STT), leverage this information to enable more memory-level parallelism in both single-core and multicore scenarios: NDA with ReCon reduces the performance loss by 28.7% for SPEC2017, 31.5% for SPEC2006, and 46.7% for PARSEC; STT with ReCon reduces the loss by 45.1%, 39%, and 78.6%, respectively.
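The pointer-dereference pattern that the abstract says ReCon limits its detection to (a load whose result directly feeds a later load's address) can be illustrated with a small trace scan. The trace format and function below are our own simplification, not ReCon's hardware mechanism:

```python
def find_dereference_pairs(trace):
    """trace: list of (op, dest_reg, src_reg) tuples, where for a 'load'
    src_reg is the address register. Returns the indices of loads whose
    address register was produced *directly* by an earlier load, i.e.
    pointer dereferences."""
    produced_by_load = set()   # registers currently holding a load result
    pairs = []
    for i, (op, dest, src) in enumerate(trace):
        if op == "load":
            if src in produced_by_load:
                pairs.append(i)             # a load feeding a load: dereference
            produced_by_load.add(dest)
        else:
            produced_by_load.discard(dest)  # an ALU result overwrites the reg
    return pairs
```

In the paper's terms, such pairs are the non-speculative leakage worth remembering; the intervening-ALU case (base-address indexing) is deliberately left out of this sketch.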
3.
  • Aimoniotis, Pavlos, et al. (authors)
  • Reorder Buffer Contention : A Forward Speculative Interference Attack for Speculation Invariant Instructions
  • 2021
  • In: IEEE Computer Architecture Letters. - Institute of Electrical and Electronics Engineers (IEEE). - 1556-6056, 1556-6064 ; 20:2, pp. 162-165
  • Journal article (peer-reviewed), abstract:
    • Speculative side-channel attacks access sensitive data and use transmitters to leak the data during wrong-path execution. Various defenses have been proposed to prevent such information leakage. However, not all speculatively executed instructions are unsafe: recent work demonstrates that speculation-invariant instructions are independent of speculative control-flow paths and are guaranteed to eventually commit, regardless of the speculation outcome. Compile-time information coupled with run-time mechanisms can then selectively lift defenses for speculation-invariant instructions, reclaiming some of the lost performance. Unfortunately, speculation-invariant instructions can easily be manipulated by a form of speculative interference to leak information via a new side channel that we introduce in this paper. We show that forward speculative interference, where older speculative instructions interfere with younger speculation-invariant instructions, effectively turns the latter into transmitters for secret data accessed during speculation. We demonstrate forward speculative interference on actual hardware by selectively filling the reorder buffer (ROB) with instructions, pushing speculation-invariant instructions into or out of the ROB on demand, based on a speculatively accessed secret. This reveals the secret, as the occupancy of the ROB itself becomes a new speculative side channel.
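The ROB-occupancy channel can be sketched with a toy cycle model: a transmitter gadget fills the reorder buffer with long-latency work only when the speculatively read secret bit is 1, so a later instruction sequence takes measurably longer to dispatch. The ROB size, occupancies, and one-retire-per-cycle rule below are invented for illustration:

```python
ROB_SIZE = 8

def dispatch_cycles(secret_bit, receiver_insts=6):
    """Cycles needed to dispatch receiver_insts instructions, one per
    cycle while a ROB slot is free; a full ROB stalls dispatch until an
    older long-latency instruction retires."""
    occupied = ROB_SIZE if secret_bit else 2   # secret-dependent filling
    cycles = 0
    dispatched = 0
    while dispatched < receiver_insts:
        cycles += 1
        if occupied < ROB_SIZE:
            occupied += 1        # dispatch one receiver instruction
            dispatched += 1
        else:
            occupied -= 1        # stall cycle: wait for a retirement
    return cycles
```

The timing difference between the two secret values is the side channel; no cache state is involved, which is what makes the attack work against cache-focused defenses.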
4.
  • Alipour, Mehdi, et al. (authors)
  • Delay and Bypass : Ready and Criticality Aware Instruction Scheduling in Out-of-Order Processors
  • 2020
  • In: 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). - 9781728161495 ; pp. 424-434
  • Conference paper (peer-reviewed), abstract:
    • Flexible instruction scheduling is essential for performance in out-of-order processors. This is typically achieved by using CAM-based Instruction Queues (IQs) that provide complete flexibility in choosing ready instructions for execution, but at the cost of significant scheduling energy. In this work we seek to reduce the instruction scheduling energy by reducing the depth and width of the IQ. We do so by classifying instructions based on their readiness and criticality, and using this information to bypass the IQ for instructions that will not benefit from its expensive scheduling structures and to delay instructions that will not harm performance. Combined, these approaches allow us to offload a significant portion of the instructions from the IQ to much cheaper FIFO-based scheduling structures without hurting performance. As a result we can reduce the IQ depth and width by half, thereby saving energy. Our design, Delay and Bypass (DNB), is the first design to explicitly address both readiness and criticality to reduce scheduling energy. By handling both classes we are able to achieve 95% of the baseline out-of-order performance while using only 33% of the scheduling energy. This represents a significant improvement over previous designs, which addressed only criticality or readiness (91%/89% performance at 74%/53% energy).
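The readiness/criticality steering idea can be sketched as a small dispatch-time classifier. The field names and the exact policy below are our assumptions for illustration, not necessarily DNB's:

```python
def steer(inst):
    """Decide where a dispatched instruction goes.

    inst: dict with booleans 'ready' (operands available at dispatch)
    and 'critical' (on the predicted critical path)."""
    if inst["ready"] and not inst["critical"]:
        return "fifo"    # cheap in-order structure; bypasses the IQ
    if not inst["ready"] and not inst["critical"]:
        return "delay"   # park until operands arrive; won't hurt performance
    return "iq"          # critical work keeps the flexible (costly) IQ
```

The design point is that only the critical, hard-to-schedule fraction pays for CAM-based scheduling, which is what lets the IQ shrink without a performance loss.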
5.
  • Alipour, Mehdi (author)
  • Rethinking Dynamic Instruction Scheduling and Retirement for Efficient Microarchitectures
  • 2020
  • Doctoral thesis (other academic/artistic), abstract:
    • Out-of-order execution is one of the main microarchitectural techniques used to improve the performance of both single- and multi-threaded processors, which are found in everything from mobile devices to servers. The technique achieves higher performance by finding independent instructions and hiding execution latency, using cycles that would otherwise be wasted in pipeline stalls. To accomplish this, it uses scheduling resources, including the ROB, IQ, LSQ, and physical registers, to store and prioritize instructions. The pipeline of an out-of-order processor has three macro-stages: the front-end, the scheduler, and the back-end. The front-end fetches instructions, places them in the out-of-order resources, and analyzes them to prepare for their execution. The scheduler identifies which instructions are ready for execution and prioritizes them for scheduling. The back-end updates the processor state with the results of the oldest completed instructions, deallocates the resources, and commits the instructions in program order to maintain correct execution. Since out-of-order execution needs to be able to choose any available instruction for execution, its scheduling resources must have complex circuits for identifying and prioritizing instructions, which makes them expensive and therefore limited in size. This limited size leads to two stall points, at the front-end and the back-end of the pipeline respectively. The front-end can stall when the resources are fully allocated and no new instructions can be placed in the scheduler. The back-end can stall due to the unfinished execution of an instruction at the head of the ROB, which prevents other resources from being deallocated and new instructions from being inserted into the pipeline. To address these two stalls, this thesis focuses on reducing the time instructions occupy the scheduling resources. Our front-end technique tackles IQ pressure, while our back-end approach considers the rest of the resources. To reduce front-end stalls we reduce the pressure on the IQ, for both storing (depth) and issuing (width) instructions, by bypassing instructions to cheaper storage structures. To reduce back-end stalls, we explore how we can retire instructions earlier, and out of order, to reduce the pressure on the out-of-order resources.
6.
  • Alves, Ricardo, et al. (authors)
  • Early Address Prediction : Efficient Pipeline Prefetch and Reuse
  • 2021
  • In: ACM Transactions on Architecture and Code Optimization (TACO). - Association for Computing Machinery (ACM). - 1544-3566, 1544-3973 ; 18:3
  • Journal article (peer-reviewed), abstract:
    • Achieving low load-to-use latency with low energy and storage overheads is critical for performance. Existing techniques either prefetch into the pipeline (via address prediction and validation) or provide data reuse in the pipeline (via register sharing or L0 caches). These techniques provide a range of tradeoffs between latency, reuse, and overhead. In this work, we present a pipeline prefetching technique that achieves state-of-the-art performance and data reuse without additional data storage, data movement, or validation overheads by adding address tags to the register file. Our addition of register file tags allows us to forward (reuse) load data from the register file with no additional data movement, keep the data alive in the register file beyond the instruction's lifetime to increase temporal reuse, and coalesce prefetch requests to achieve spatial reuse. Further, we show that we can use the existing memory order violation detection hardware to validate prefetches and data forwards without additional overhead. Our design achieves the performance of existing pipeline prefetching while also forwarding 32% of the loads from the register file (compared to 15% in state-of-the-art register sharing), delivering a 16% reduction in L1 dynamic energy (1.6% total processor energy), with an area overhead of less than 0.5%.
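The register-file address-tag mechanism can be illustrated with a minimal model: each register remembers the memory address it was loaded from, so a later load to the same address can forward from the register file instead of accessing the cache. The class and method names are ours, and a real design would bound the tag search and invalidate tags on stores:

```python
class TaggedRegisterFile:
    def __init__(self):
        self.regs = {}      # reg -> value
        self.addr_tag = {}  # reg -> address its value was loaded from

    def load(self, dest_reg, addr, memory):
        """Perform a load, forwarding from the register file on a tag hit."""
        hit = next((r for r, a in self.addr_tag.items() if a == addr), None)
        if hit is not None:
            self.regs[dest_reg] = self.regs[hit]   # reuse, no data movement
            self.addr_tag[dest_reg] = addr
            return "forwarded"
        self.regs[dest_reg] = memory[addr]         # normal cache access
        self.addr_tag[dest_reg] = addr
        return "cache"
```

Because the tag outlives the producing instruction, a value stays forwardable as long as its register is not reallocated, which is the temporal-reuse effect the abstract describes.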
7.
8.
  • Asgharzadeh, Ashkan, et al. (authors)
  • Free Atomics : Hardware Atomic Operations without Fences
  • 2022
  • In: Proceedings of the 49th Annual International Symposium on Computer Architecture (ISCA '22). - New York, NY, USA : Association for Computing Machinery (ACM). - 9781450386104 ; pp. 14-26
  • Conference paper (peer-reviewed), abstract:
    • Atomic Read-Modify-Write (RMW) instructions are primitive synchronization operations implemented in hardware that provide the building blocks for higher-abstraction synchronization mechanisms to programmers. According to publicly available documentation, current x86 implementations serialize atomic RMW operations, i.e., the store buffer is drained before issuing atomic RMWs and subsequent memory operations are stalled until the atomic RMW commits. This serialization, carried out by memory fences, incurs a performance cost which is expected to increase with deeper pipelines. This work proposes Free atomics, a lightweight, speculative, deadlock-free implementation of atomic operations that removes the need for memory fences, thus improving performance, while preserving atomicity and consistency. Free atomics is, to the best of our knowledge, the first proposal to enable store-to-load forwarding for atomic RMWs. Free atomics only requires simple modifications and incurs a small area overhead (15 bytes). Our evaluation using gem5-20 shows that, for a 32-core configuration, Free atomics improves performance by 12.5%, on average, for a large range of parallel workloads and 25.2%, on average, for atomic-intensive parallel workloads over a fenced atomic RMW implementation.
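The cost of draining the store buffer before an atomic RMW, versus forwarding from it, can be illustrated with a toy latency model. All cycle counts here are invented for illustration, not measured from any implementation:

```python
def rmw_latency(sb_entries, fenced, rmw_cost=3, drain_per_entry=2):
    """Toy cycle count for one atomic RMW.

    fenced=True models the baseline the abstract describes: every
    buffered store drains before the RMW issues. fenced=False models a
    Free-atomics-style RMW that forwards from the store buffer instead."""
    if fenced:
        return sb_entries * drain_per_entry + rmw_cost
    return rmw_cost   # common path: no drain, store-to-load forwarding
```

The model only makes the abstract's point quantitative: the fenced cost grows with store-buffer occupancy, while the speculative version pays a constant cost on the common path (correctness under contention is what the paper's deadlock-free machinery handles).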
9.
  • Cebrian, Juan M., et al. (authors)
  • Boosting Store Buffer Efficiency with Store-Prefetch Bursts
  • 2020
  • In: 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). - Institute of Electrical and Electronics Engineers (IEEE). - 9781728173832 ; pp. 568-580
  • Conference paper (peer-reviewed), abstract:
    • Virtually all processors today employ a store buffer (SB) to hide store latency. However, when the store buffer is full, store latency is exposed to the processor, causing pipeline stalls. The default strategies to mitigate these stalls are to issue prefetch-for-ownership requests when store instructions commit and to continuously increase the store buffer size. While these strategies considerably increase memory-level parallelism for stores, there are still applications that suffer severely from stalls caused by the store buffer. Even worse, store-buffer-induced stalls increase considerably when simultaneous multi-threading is enabled, as the store buffer is statically partitioned among the threads. In this paper, we propose a highly selective and very aggressive prefetching strategy to minimize store-buffer-induced stalls. Our proposal, Store-Prefetch Burst (SPB), is based on the following insights: i) the majority of store-buffer-induced stalls are caused by a few stores; ii) the access patterns of such stores are easily predictable; and iii) the latency of these stores is not commonly hidden by standard cache prefetchers, as hiding it would require tremendous prefetch aggressiveness. SPB accurately detects contiguous store-access patterns (requiring just 67 bits of storage) and prefetches the remaining memory blocks of the accessed page in a single burst request to the L1 controller. SPB matches the performance of a 1024-entry SB implementation on a 56-entry SB (i.e., the Skylake architecture). For a 14-entry SB (e.g., when running four logical cores), it achieves 95.0% of that ideal performance, on average, for SPEC CPU 2017. Additionally, a 20-entry store buffer that incorporates SPB achieves the average performance of a standard 56-entry store buffer.
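The contiguous-pattern detection and page-burst prefetch can be sketched as follows. The trigger threshold and data structure are illustrative assumptions, not SPB's exact 67-bit design:

```python
BLOCK = 64      # cache block size in bytes
PAGE = 4096     # page size in bytes
TRIGGER = 3     # contiguous store blocks needed before bursting

def spb(store_addrs):
    """Scan a stream of store addresses; on detecting TRIGGER contiguous
    blocks, burst-prefetch the rest of the page. Returns the set of
    block addresses prefetched."""
    prefetched = set()
    prev_block = None
    run_len = 0
    for a in store_addrs:
        block = a - a % BLOCK
        run_len = run_len + 1 if block == (prev_block or 0) + BLOCK and prev_block is not None else 1
        prev_block = block
        if run_len == TRIGGER:
            page_base = block - block % PAGE
            nxt = block + BLOCK
            while nxt < page_base + PAGE:   # one burst for the whole page
                prefetched.add(nxt)
                nxt += BLOCK
    return prefetched
```

The aggressiveness is the point: once the pattern is confirmed, every remaining block of the page is requested at once, so the stores' ownership requests are already in flight when they commit.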
10.
  • Chen, Xiaoyue, et al. (authors)
  • Clueless : A Tool Characterising Values Leaking as Addresses
  • 2022
  • In: Proceedings of the 11th International Workshop on Hardware and Architectural Support for Security and Privacy, HASP 2022. - Association for Computing Machinery (ACM). - 9781450398718 ; pp. 27-34
  • Conference paper (peer-reviewed), abstract:
    • Clueless is a binary instrumentation tool that characterises explicit cache side-channel vulnerabilities of programs. It detects the transformation of data values into addresses by tracking dynamic instruction dependencies. Clueless tags data values in memory if it discovers that they are used in address calculations to further access other data. It can report the amount of data that is used as addresses at each point during execution. It can also be specifically instructed to track certain data in memory (e.g., a password) to see if they are turned into addresses at any point during execution, and returns a trace of how the tracked data are turned into addresses, if they are. We demonstrate Clueless on SPEC 2006 and characterise, for the first time, the amount of data values that are turned into addresses in these programs. We further demonstrate Clueless on a microbenchmark and on a case study. The case study examines different implementations of AES in OpenSSL: T-table, Vector Permutation AES (VPAES), and Intel Advanced Encryption Standard New Instructions (AES-NI). Clueless shows how the encryption key is transformed into addresses in the T-table implementation, while explicit cache side-channel vulnerabilities are not detected in the other implementations.
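The value-to-address tracking the abstract describes can be approximated by a simple taint propagation over an instruction trace. The trace format and function are our simplification of what the tool does at binary level:

```python
def values_used_as_addresses(trace):
    """trace: list of (op, dest_reg, src_regs) tuples; for a 'load' the
    src_regs are the registers in the address calculation. Returns the
    set of registers whose values flowed into an address."""
    origin = {}        # reg -> set of registers its value derives from
    leaked = set()
    for op, dest, srcs in trace:
        derived = set(srcs)
        for s in srcs:
            derived |= origin.get(s, set())   # propagate dependencies
        if op == "load":
            leaked |= derived                  # these values became an address
        origin[dest] = derived
    return leaked
```

Applied to a T-table-style sequence (key byte -> table index -> load), this flags the key as leaking through the address, which is the pattern Clueless reports.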
Publication type
conference papers (19)
journal articles (7)
doctoral theses (4)
Content type
peer-reviewed (26)
other academic/artistic (4)
Author/editor
Kaxiras, Stefanos (27)
Ros, Alberto (13)
Sakalis, Christos (8)
Aimoniotis, Pavlos (6)
Själander, Magnus, 1 ... (5)
Cebrian, Juan M. (4)
Jimborean, Alexandra (4)
Kvalsvik, Amund Berg ... (3)
Själander, Magnus (3)
Chen, Xiaoyue (3)
Kaxiras, Stefanos, P ... (3)
Acacio, Manuel E. (3)
Sjalander, Magnus (2)
Alipour, Mehdi (2)
Black-Schaffer, Davi ... (2)
Black-Schaffer, Davi ... (2)
Alves, Ricardo (2)
Tran, Kim-Anh (2)
Feliu, Josue (2)
Jose Gomez-Hernandez ... (2)
Titos-Gil, Ruben (2)
Voigt, Thiemo (1)
Sagonas, Konstantino ... (1)
Kumar, Rakesh (1)
Mottola, Luca, 1980- (1)
H. Lipasti, Mikko, P ... (1)
Wrigstad, Tobias, Pr ... (1)
Asgharzadeh, Ashkan (1)
Perais, Arthur (1)
Yao, Yuan (1)
Fernández-Pascual, R ... (1)
Song, Weining (1)
Akturk, Ismail (1)
Ekemark, Per (1)
Gómez-Hernández, Edu ... (1)
Shao, Ruixiang (1)
Chowdhury, Zamshed I ... (1)
Wadle, Shayne (1)
Karpuzcu, Ulya R. (1)
Sakalis, Christos, 1 ... (1)
Själander, Magnus, A ... (1)
Jimborean, Alexandra ... (1)
Torrellas, Josep, Sa ... (1)
Shimchenko, Marina (1)
Shimchenko, Marina, ... (1)
Liu, Yu David, Profe ... (1)
Yao, Yuan, 1986- (1)
Jimborean, Alexandra ... (1)
Pouchet, Louis-Noël, ... (1)
Higher education institution
Uppsala universitet (30)
Language
English (30)
Research subject (UKÄ/SCB)
Natural Sciences (22)
Engineering and Technology (10)

