SwePub
Search the SwePub database


Search: WFRF:(Ros Alberto)

  • Result 1-10 of 48
2.
  • Abdulla, Parosh Aziz, et al. (author)
  • Mending fences with self-invalidation and self-downgrade
  • 2018
  • In: Logical Methods in Computer Science. - 1860-5974. ; 14:1
  • Journal article (peer-reviewed). Abstract:
    • Cache coherence protocols based on self-invalidation and self-downgrade have recently seen increased popularity due to their simplicity, potential performance efficiency, and low energy consumption. However, such protocols result in memory instruction reordering, thus causing extra program behaviors that are often not intended by the programmers. We propose a novel formal model that captures the semantics of programs running under such protocols, and features a set of fences that interact with the coherence layer. Using the model, we design an algorithm to analyze the reachability and check whether a program satisfies a given safety property with the current set of fences. We describe a method for insertion of optimal sets of fences that ensure correctness of the program under such protocols. The method relies on a counter-example guided fence insertion procedure. One feature of our method is that it can handle a variety of fences (with different costs). This diversity makes optimization more difficult since one has to optimize the total cost of the inserted fences, rather than just their number. To demonstrate the strength of our approach, we have implemented a prototype and run it on a wide range of examples and benchmarks. We have also, using simulation, evaluated the performance of the resulting fenced programs.
3.
  • Altés Arlandis, Alberto, 1978-, et al. (author)
  • Common Life : LiAi research Report 2014-2015
  • 2015
  • Book (other academic/artistic). Abstract:
    • Common Life is the theme of the third term of the MA programme at the "Laboratory of Immediate Architectural Intervention", taught at Umeå School of Architecture. The main course in the term is called "Architectural Intervention, Realization and Consequences" and it aims at enabling the students to adopt a more reflective approach after their first year, in order to start developing a position as architects and as researchers. The semester is also a transition towards the final term in which students are to develop their Master's Theses. This publication gathers the results of a collective effort to carry out a research assignment that is conceived to prepare the students for some of the challenges they will have to face in order to complete those theses. The course focuses on architecture's ability to provide, through its transformative power, conditions of 'possibility' for the sharing of various spaces, times, processes and other things. Producing uncertainties, contingent relationships and unexpected effects can help define more positive value systems than that of the building-as-consumable-object. We understand the building as a relational object within a complex meshwork of other things, people and technologies, and we explore the notion of architecture-as-an-emergent-'gift': a relational practice that enables encounter(s), and affords the sharing of moments, conversations and, why not, lives. Inhabiting the spaces, discourses and events of this common life, we try to develop a faithful and true care for the situations we are enmeshed in, and perhaps discover ways to displace what is, in order to 'architect' what could be and what ought to be. Homes are very close to us, we inhabit them with our bodies, emotions and thoughts, we make them and they make us. How much are we ready to share of our lives? What can we obtain from sharing? What can we share? Are there degrees of sharing? What is our role as architects and the role of architecture in affording such 'sharings'?
4.
  • Alves, Ricardo, et al. (author)
  • Filter caching for free : The untapped potential of the store-buffer
  • 2019
  • In: Proc. 46th International Symposium on Computer Architecture. - New York : ACM Press. - 9781450366694 ; , s. 436-448
  • Conference paper (peer-reviewed). Abstract:
    • Modern processors contain store-buffers to allow stores to retire under a miss, thus hiding store-miss latency. The store-buffer needs to be large (for performance) and searched on every load (for correctness), thereby making it a costly structure in both area and energy. Yet on every load, the store-buffer is probed in parallel with the L1 and TLB, with no concern for the store-buffer's intrinsic hit rate or whether a store-buffer hit can be predicted to save energy by disabling the L1 and TLB probes. In this work we cache data that have been written back to memory in a unified store-queue/buffer/cache, and predict hits to avoid L1/TLB probes and save energy. By dynamically adjusting the allocation of entries between the store-queue/buffer/cache, we can achieve nearly optimal reuse, without causing stalls. We are able to do this efficiently and cheaply by recognizing key properties of stores: free caching (since they must be written into the store-buffer for correctness we need no additional data movement), cheap coherence (since we only need to track state changes of the local, dirty data in the store-buffer), and free and accurate hit prediction (since the memory dependence predictor already does this for scheduling). As a result, we are able to increase the store-buffer hit rate and reduce store-buffer/TLB/L1 dynamic energy by 11.8% (up to 26.4%) on SPEC2006 without hurting performance (average IPC improvements of 1.5%, up to 4.7%). The cost for these improvements is a 0.2% increase in L1 cache capacity (1 bit per line) and one additional tail pointer in the store-buffer.
5.
  • Asgharzadeh, Ashkan, et al. (author)
  • Free Atomics : Hardware Atomic Operations without Fences
  • 2022
  • In: Proceedings of the 49th Annual International Symposium on Computer Architecture (ISCA '22). - New York, NY, USA : Association for Computing Machinery (ACM). - 9781450386104 ; , s. 14-26
  • Conference paper (peer-reviewed). Abstract:
    • Atomic Read-Modify-Write (RMW) instructions are primitive synchronization operations implemented in hardware that provide the building blocks for higher-abstraction synchronization mechanisms to programmers. According to publicly available documentation, current x86 implementations serialize atomic RMW operations, i.e., the store buffer is drained before issuing atomic RMWs and subsequent memory operations are stalled until the atomic RMW commits. This serialization, carried out by memory fences, incurs a performance cost which is expected to increase with deeper pipelines. This work proposes Free atomics, a lightweight, speculative, deadlock-free implementation of atomic operations that removes the need for memory fences, thus improving performance, while preserving atomicity and consistency. Free atomics is, to the best of our knowledge, the first proposal to enable store-to-load forwarding for atomic RMWs. Free atomics only requires simple modifications and incurs a small area overhead (15 bytes). Our evaluation using gem5-20 shows that, for a 32-core configuration, Free atomics improves performance by 12.5%, on average, for a large range of parallel workloads and 25.2%, on average, for atomic-intensive parallel workloads over a fenced atomic RMW implementation.
7.
  • Cebrian, Juan M., et al. (author)
  • Boosting Store Buffer Efficiency with Store-Prefetch Bursts
  • 2020
  • In: 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). - : Institute of Electrical and Electronics Engineers (IEEE). - 9781728173832 ; , s. 568-580
  • Conference paper (peer-reviewed). Abstract:
    • Virtually all processors today employ a store buffer (SB) to hide store latency. However, when the store buffer is full, store latency is exposed to the processor, causing pipeline stalls. The default strategies to mitigate these stalls are to issue prefetch-for-ownership requests when store instructions commit and to continuously increase the store buffer size. While these strategies considerably increase memory-level parallelism for stores, there are still applications that suffer deeply from stalls caused by the store buffer. Even worse, store-buffer induced stalls increase considerably when simultaneous multi-threading is enabled, as the store buffer is statically partitioned among the threads. In this paper, we propose a highly selective and very aggressive prefetching strategy to minimize store-buffer induced stalls. Our proposal, Store-Prefetch Burst (SPB), is based on the following insights: i) the majority of store-buffer induced stalls are caused by a few stores; ii) the access patterns of such stores are easily predictable; and iii) the latency of the stores is not commonly hidden by standard cache prefetchers, as hiding their latency would require tremendous prefetch aggressiveness. SPB accurately detects contiguous store-access patterns (requiring just 67 bits of storage) and prefetches the remaining memory blocks of the accessed page in a single burst request to the L1 controller. SPB matches the performance of a 1024-entry SB implementation on a 56-entry SB (i.e., Skylake architecture). For a 14-entry SB (e.g., running four logical cores), it achieves 95.0% of that ideal performance, on average, for SPEC CPU 2017. Additionally, a 20-entry store buffer that incorporates SPB achieves the average performance of a standard 56-entry store buffer.
8.
  • Davari, Mahdad (author)
  • Advances Towards Data-Race-Free Cache Coherence Through Data Classification
  • 2017
  • Doctoral thesis (other academic/artistic). Abstract:
    • Providing a consistent view of the shared memory based on precise and well-defined semantics (a memory consistency model) has been an enabling factor in the widespread acceptance and commercial success of shared-memory architectures. Moreover, cache coherence protocols have been employed by the hardware to relieve programmers of the burden of dealing with the memory inconsistency that emerges in the presence of private caches. The principle behind all such cache coherence protocols is to guarantee that consistent values are read from the private caches at all times. In its most stringent form, a cache coherence protocol eagerly enforces two invariants before each data modification: i) no other core has a copy of the data in its private caches, and ii) all other cores know where to receive the consistent data should they need the data later. Nevertheless, by partly transferring the responsibility for maintaining those invariants to the programmers, commercial multicores have adopted weaker memory consistency models, namely the Total Store Order (TSO), in order to optimize the performance for more common cases. Moreover, memory models with more relaxed invariants have been proposed based on the observation that more and more software is written in compliance with the Data-Race-Free (DRF) semantics. The semantics of DRF software can be leveraged by the hardware to infer when data in the private caches might be inconsistent. As a result, hardware ignores the inconsistent data and retrieves the consistent data from the shared memory. DRF semantics therefore removes from the hardware the burden of eagerly enforcing the strong consistency invariants before each data modification. Instead, consistency is guaranteed only when needed. This results in manifold optimizations, such as reducing the energy consumption and improving the performance and scalability. The efficiency of detecting and discarding the inconsistent data is an important factor affecting the efficiency of such coherence protocols. For instance, discarding consistent data does not affect correctness, but results in performance loss and increased energy consumption. In this thesis we show how data classification can be leveraged as an effective tool to simplify cache coherence based on the DRF semantics. In particular, we introduce simple but efficient hardware-based private/shared data classification techniques that can be used to efficiently detect the inconsistent data, thus enabling low-overhead and scalable cache coherence solutions based on the DRF semantics.
Type of publication
conference paper (29)
journal article (15)
reports (1)
book (1)
doctoral thesis (1)
patent (1)
Type of content
peer-reviewed (42)
other academic/artistic (5)
pop. science, debate, etc. (1)
Author/Editor
Ros, Alberto (44)
Kaxiras, Stefanos (40)
Sakalis, Christos (10)
Leonardsson, Carl (5)
Cebrian, Juan M. (5)
Själander, Magnus (3)
Alipour, Mehdi (3)
Abdulla, Parosh Aziz (2)
Atig, Mohamed Faouzi (2)
Zhu, Yunyun (2)
Carlson, Trevor E. (2)
Kaaks, Rudolf (1)
Masala, Giovanna (1)
Tumino, Rosario (1)
Romieu, Isabelle (1)
Sagonas, Konstantino ... (1)
Larsson, Nina (1)
Weiderpass, Elisabet ... (1)
Almquist, Martin (1)
Katzke, Verena (1)
Panico, Salvatore (1)
Eriksen, Anne Kirsti ... (1)
Lukic, Marko (1)
Rinaldi, Sabina (1)
Tjonneland, Anne (1)
Fadeel, Bengt (1)
Dossus, Laure (1)
Taylor, Joshua (1)
Christakoudi, Sofia (1)
Rodriguez-Barranco, ... (1)
Black-Schaffer, Davi ... (1)
Truong, Thérèse (1)
Guevara, Marcela (1)
Zamora-Ros, Raul (1)
Altés Arlandis, Albe ... (1)
Garriga, Josep (1)
Taylor, Rafaela (1)
Ros, Miguel (1)
Rahmanian, Hossein (1)
Paczkowski, Piotr (1)
Mahmood, Ibrahim (1)
Jerlei, Epp (1)
Bouroucha, Soumia (1)
Bäckström, Nina (1)
Vänstedt, Ida (1)
Alves, Ricardo (1)
Palermo, Vincenzo, 1 ... (1)
Asgharzadeh, Ashkan (1)
Perais, Arthur (1)
Sandström, Maria (1)
University
Uppsala University (44)
Umeå University (3)
Lund University (1)
Chalmers University of Technology (1)
Karolinska Institutet (1)
Language
English (48)
Research subject (UKÄ/SCB)
Natural sciences (32)
Engineering and Technology (15)
Medical and Health Sciences (1)
Humanities (1)
