SwePub

Result list for search "WFRF:(Stenström Per 1957)"

  • Result 1-10 of 193
1.
  • Bardizbanyan, Alen, 1986, et al. (author)
  • Improving Data Access Efficiency by Using a Tagless Access Buffer (TAB)
  • 2013
  • In: Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, CGO 2013. - 9781467355254 ; pp. 269-279
  • Conference paper (peer-reviewed)
    • The need for energy efficiency continues to grow for many classes of processors, including those for which performance remains vital. Data cache is crucial for good performance, but it also represents a significant portion of the processor's energy expenditure. We describe the implementation and use of a tagless access buffer (TAB) that greatly improves data access energy efficiency while slightly improving performance. The compiler recognizes memory reference patterns within loops and allocates these references to a TAB. This combined hardware/software approach reduces energy usage by (1) replacing many level-one data cache (L1D) accesses with accesses to the smaller, more power-efficient TAB; (2) removing the need to perform tag checks or data translation lookaside buffer (DTLB) lookups for TAB accesses; and (3) reducing DTLB lookups when transferring data between the L1D and the TAB. Accesses to the TAB occur earlier in the pipeline, and data lines are prefetched from lower memory levels, which result in a small performance improvement. In addition, we can avoid many unnecessary block transfers between other memory hierarchy levels by characterizing how data in the TAB are used. With a combined size equal to that of a conventional 32-entry register file, a four-entry TAB eliminates 40% of L1D accesses and 42% of DTLB accesses, on average. This configuration reduces data-access related energy by 35% while simultaneously decreasing execution time by 3%.
2.
  • Björk, Magnus, 1977, et al. (author)
  • Exposed Datapath for Efficient Computing
  • 2006
  • Reports (other academic/artistic)
    • We introduce FlexCore, which is the first exemplar of a processor based on the FlexSoC processor paradigm. The FlexCore utilizes an exposed datapath for increased performance. Microbenchmarks yield a performance boost of a factor of two over a traditional five-stage pipeline with the same functional units as the FlexCore. We describe our approach to compiling for the FlexCore. A flexible interconnect allows the FlexCore datapath to be dynamically reconfigured as a consequence of code generation. Additionally, specialized functional units may be introduced and utilized within the same architecture and compilation framework. The exposed datapath requires a wide control word. The conducted evaluation of two microbenchmarks confirms that this increases the instruction bandwidth and memory footprint. This calls for efficient instruction decoding as proposed in the FlexSoC paradigm.
3.
  •  
4.
  • Hughes, John, 1958, et al. (author)
  • FlexSoC: Combining Flexibility and Efficiency in SoC Designs
  • 2003
  • In: Proceedings of 21st Norchip Conference. ; Riga, Latvia, pp. 52-55
  • Conference paper (peer-reviewed)
    • The FlexSoC project aims at developing a design framework that makes it possible to combine the computational speed and energy-efficiency of specialized hardware accelerators with the flexibility of programmable processors. FlexSoC approaches this problem by defining a uniform programming interface across the heterogeneous structure of processing resources. This paper justifies our approach and also discusses the central research issues we will focus on in the areas of VLSI design, computer architecture, and programming and verification.
5.
  •  
6.
  • Thuresson, Martin, 1977, et al. (author)
  • FlexCore: Utilizing Exposed Datapath Control for Efficient Computing
  • 2009
  • In: Journal of Signal Processing Systems. - : Springer Science and Business Media LLC. - 1939-8018 .- 1939-8115. ; 57:1, pp. 5-19
  • Journal article (peer-reviewed)
    • We introduce FlexCore, the first exemplar of an architecture based on the FlexSoC framework. Comprising the same datapath units found in a conventional five-stage pipeline, the FlexCore has an exposed datapath control and a flexible interconnect to allow the datapath to be dynamically reconfigured as a consequence of code generation. Additionally, the FlexCore allows specialized datapath units to be inserted and utilized within the same architecture and compilation framework. This study shows that, in comparison to a conventional five-stage general-purpose processor, the FlexCore is up to 40% more efficient in terms of cycle count on a set of benchmarks from the embedded application domain. We show that both the fine-grained control and the flexible interconnect contribute to the speedup. Furthermore, our synthesized, placed and routed FlexCore offers savings both in energy and execution time. The exposed FlexCore datapath requires a wide control word. The conducted evaluation confirms that this increases the instruction bandwidth and memory footprint. This calls for efficient instruction decoding as proposed in the FlexSoC framework.
7.
  • Alvarez, Lluc, et al. (author)
  • eProcessor: European, Extendable, Energy-Efficient, Extreme-Scale, Extensible, Processor Ecosystem
  • 2023
  • In: Proceedings of the 20th ACM International Conference on Computing Frontiers 2023, CF 2023. ; pp. 309-314
  • Conference paper (peer-reviewed)
    • The eProcessor project aims at creating a RISC-V full stack ecosystem. The eProcessor architecture combines a high-performance out-of-order core with energy-efficient accelerators for vector processing and artificial intelligence with reduced-precision functional units. The design of this architecture follows a hardware/software co-design approach with relevant application use cases from the high-performance computing, bioinformatics and artificial intelligence domains. Two eProcessor prototypes will be developed based on two fabricated eProcessor ASICs integrated into a computer-on-module.
8.
  • Angerd, Alexandra, 1988, et al. (author)
  • A Framework for Automated and Controlled Floating-Point Accuracy Reduction in Graphics Applications on GPUs
  • 2017
  • In: Transactions on Architecture and Code Optimization. - : Association for Computing Machinery (ACM). - 1544-3973 .- 1544-3566. ; 14:4
  • Journal article (peer-reviewed)
    • Reducing the precision of floating-point values can improve performance and/or reduce energy expenditure in computer graphics, among other applications. However, reducing the precision level of floating-point values in a controlled fashion needs support both at the compiler and at the microarchitecture level. At the compiler level, a method is needed to automate the reduction of precision of each floating-point value. At the microarchitecture level, a lower precision of each floating-point register can allow more floating-point values to be packed into a register file. This, however, calls for new register file organizations. This article proposes an automated precision-selection method and a novel GPU register file organization that can store floating-point register values at arbitrary precisions densely. The automated precision-selection method uses a data-driven approach for setting the precision level of floating-point values, given a quality threshold and a representative set of input data. By allowing a small, but acceptable, degradation in output quality, our method can remove a significant amount of the bits needed to represent floating-point values in the investigated kernels (between 28% and 60%). Our proposed register file organization exploits these lower-precision floating-point values by packing several of them into the same physical register. This reduces the register pressure per thread by up to 48%, and by 27% on average, for a negligible output-quality degradation. This can enable GPUs to keep up to twice as many threads in flight simultaneously.
9.
  • Angerd, Alexandra, 1988, et al. (author)
  • A GPU Register File using Static Data Compression
  • 2020
  • In: ACM International Conference Proceeding Series. - New York, NY, USA : ACM.
  • Conference paper (peer-reviewed)
    • GPUs rely on large register files to unlock thread-level parallelism for high throughput. Unfortunately, large register files are power hungry, making it important to seek new approaches to improve their utilization. This paper introduces a new register file organization for efficient register-packing of narrow integer and floating-point operands, designed to leverage advances in static analysis. We show that the hardware/software co-designed register file organization yields a performance improvement of up to 79%, and 18.6% on average, at a modest output-quality degradation.
10.
  • Angerd, Alexandra, 1988, et al. (author)
  • GBDI: Going Beyond Base-Delta-Immediate Compression with Global Bases
  • 2022
  • In: Proceedings - International Symposium on High-Performance Computer Architecture. - 1530-0897. - 9781665420273 ; 2022-April, pp. 1115-1127
  • Conference paper (peer-reviewed)
    • Memory bandwidth is limiting performance for many emerging applications. While compression techniques can unlock a higher memory bandwidth, prior art offers only modestly better bandwidth. This paper contributes a new compression method - Global Base Delta Immediate compression (GBDI) - that offers substantially higher memory bandwidth by, unlike prior art, selecting base values across memory blocks. GBDI uses a novel clustering algorithm through data analysis in the background. The presented accelerator infrastructure offers low area overhead and latency. This paper shows that GBDI offers a compression ratio of 2.3×, and yields 1.5× higher bandwidth and 1.1× higher performance compared with a baseline without compression support, on average, for SPEC2017 benchmarks requiring medium to high memory bandwidth.
Type of publication
conference paper (100)
journal article (54)
reports (15)
editorial collection (13)
patent (9)
book (1)
research review (1)
Type of content
peer-reviewed (151)
other academic/artistic (42)
Author/Editor
Stenström, Per, 1957 (191)
Manivannan, Madhavan ... (21)
Negi, Anurag, 1980 (17)
Pericas, Miquel, 197 ... (14)
Thuresson, Martin, 1 ... (12)
Papaefstathiou, Vasi ... (11)
Dubois, Michel (11)
Själander, Magnus, 1 ... (8)
Garcia, J. M. (8)
Arelakis, Angelos, 1 ... (7)
Titos Gil, Ruben, 19 ... (7)
Warg, Fredrik, 1974 (7)
Larsson-Edefors, Per ... (6)
Cristal, Adrian (6)
Svensson, Lars, 1960 (5)
Azhar, Muhammad Waqa ... (5)
McKee, Sally A, 1963 (5)
Ardö, Anders (4)
Unsal, Osman (4)
Vajda, András (4)
Björk, Magnus, 1977 (4)
Holtryd, Nadja, 1988 (4)
Chen, Guancheng (4)
Dybdahl, Haakon (4)
Hughes, John, 1958 (3)
Jeppson, Kjell, 1947 (3)
Angerd, Alexandra, 1 ... (3)
Sintorn, Erik, 1980 (3)
Sheeran, Mary, 1959 (3)
Whalley, David (3)
Bardine, Alessandro (3)
Busck, Alexander (3)
Engbom, Mikael (3)
Gaydadjiev, Georgi, ... (3)
Chen, Jianwei (3)
Grahn, Håkan (2)
Nilsson, Jim (2)
Marazakis, Manolis (2)
Goel, Bhavishya, 198 ... (2)
Ulander, Lars, 1962 (2)
Mueller, Frank (2)
Dahlgren, Fredrik, 1 ... (2)
Jeong, J (2)
Foglia, PieroFrances ... (2)
Gabrielli, G (2)
Prete, Antonio (2)
Vallejo, F (2)
Sandin, Patrik (2)
Karlsson, Jonas, 197 ... (2)
De Bosschere, Koen (2)
University
Chalmers University of Technology (193)
Blekinge Institute of Technology (2)
University of Gothenburg (1)
Lund University (1)
RISE (1)
Language
English (192)
Swedish (1)
Research subject (UKÄ/SCB)
Natural sciences (170)
Engineering and Technology (52)
