SwePub

Results list for search "WFRF:(Själander Magnus 1977)"

  • Results 1-10 of 63
1.
  • Björk, Magnus, 1977, et al. (author)
  • Exposed Datapath for Efficient Computing
  • 2006
  • Report (other academic/artistic)
    • We introduce FlexCore, which is the first exemplar of a processor based on the FlexSoC processor paradigm. The FlexCore utilizes an exposed datapath for increased performance. Microbenchmarks yield a performance boost of a factor of two over a traditional five-stage pipeline with the same functional units as the FlexCore. We describe our approach to compiling for the FlexCore. A flexible interconnect allows the FlexCore datapath to be dynamically reconfigured as a consequence of code generation. Additionally, specialized functional units may be introduced and utilized within the same architecture and compilation framework. The exposed datapath requires a wide control word. The conducted evaluation of two microbenchmarks confirms that this increases the instruction bandwidth and memory footprint. This calls for efficient instruction decoding as proposed in the FlexSoC paradigm.
2.
  •  
3.
  •  
4.
  • Thuresson, Martin, 1977, et al. (author)
  • FlexCore: Utilizing Exposed Datapath Control for Efficient Computing
  • 2009
  • In: Journal of Signal Processing Systems. - : Springer Science and Business Media LLC. - 1939-8115 .- 1939-8018. ; 57:1, pp. 5-19
  • Journal article (peer-reviewed)
    • We introduce FlexCore, the first exemplar of an architecture based on the FlexSoC framework. Comprising the same datapath units found in a conventional five-stage pipeline, the FlexCore has an exposed datapath control and a flexible interconnect to allow the datapath to be dynamically reconfigured as a consequence of code generation. Additionally, the FlexCore allows specialized datapath units to be inserted and utilized within the same architecture and compilation framework. This study shows that, in comparison to a conventional five-stage general-purpose processor, the FlexCore is up to 40% more efficient in terms of cycle count on a set of benchmarks from the embedded application domain. We show that both the fine-grained control and the flexible interconnect contribute to the speedup. Furthermore, our synthesized, placed and routed FlexCore offers savings both in energy and execution time. The exposed FlexCore datapath requires a wide control word. The conducted evaluation confirms that this increases the instruction bandwidth and memory footprint. This calls for efficient instruction decoding as proposed in the FlexSoC framework.
5.
  • Själander, Magnus, 1977, et al. (author)
  • A Flexible Datapath Interconnect for Embedded Applications
  • 2007
  • In: IEEE Computer Society Annual Symposium on VLSI. ; , pp. 15-20
  • Conference paper (peer-reviewed)
    • We investigate the effects of introducing a flexible interconnect into an exposed datapath. We define an exposed datapath as a traditional GPP datapath that has its normal control removed, leading to the exposure of a wide control word. For an FFT benchmark, the introduction of a flexible interconnect reduces the total execution time by 16%. Compared to a traditional GPP, the execution time for an exposed datapath using a flexible interconnect is 32% shorter whereas the energy dissipation is 29% lower. Our investigation is based on a cycle-accurate architectural simulator and figures on delay, power, and area are obtained from placed-and-routed layouts in a commercial 0.13-µm technology. The results from our case studies indicate that by utilizing a flexible interconnect, significant performance gains can be achieved for generic applications.
6.
  • Thuresson, Martin, 1977, et al. (author)
  • A Flexible Code-Compression Scheme using Partitioned Look-Up Tables
  • 2009
  • In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). - Berlin, Heidelberg : Springer Berlin Heidelberg. - 1611-3349 .- 0302-9743. - 3540929894 ; 5409 LNCS, pp. 95-109
  • Conference paper (peer-reviewed)
    • Wide instruction formats make it possible to control microarchitecture resources more precisely by the compiler by either enabling more parallelism (VLIW) or by saving power. Unfortunately, wide instructions impose a high pressure on the memory system due to an increased instruction-fetch bandwidth and a larger code working set/footprint. This paper presents a code compression scheme that allows the compiler to select what subset of a wide instruction set to use in each program phase at the granularity of basic blocks based on a profiling methodology. The decompression engine comprises a set of tables that convert a narrow instruction into a wide instruction in a dynamic fashion. The paper also presents a method for how to configure and dimension the decompression engine and how to generate a compressed program with embedded instructions that dynamically manage the tables in the decompression engine. We find that the 77 control bits in the original FlexCore instruction format can be reduced to 32 bits offering a compression of 58% and a modest performance overhead of less than 1% for management of the decompression tables.
7.
  • Thuresson, Martin, 1977, et al. (author)
  • A Flexible Code Compression Scheme using Partitioned Look-Up Tables
  • 2008
  • Report (other academic/artistic)
    • Wide instruction formats make it possible to control microarchitecture resources more finely by enabling more parallelism (VLIW) or by utilizing the microarchitecture more efficiently by exposing the control to the compiler. Unfortunately, wide instructions impose a higher pressure on the memory system due to an increased instruction-fetch bandwidth and a larger code working set/footprint. This paper presents a code compression scheme that allows the compiler to select what subset of the wide instruction set to use in each program phase at the granularity of basic blocks based on a profiling methodology. The decompression engine comprises a set of tables that convert a narrow instruction into a wide instruction in a dynamic fashion. The paper also presents a method for how to configure and dimension the decompression engine and how to generate a compressed program with embedded instructions that dynamically manage the tables in the decompression engine. We find that the 77 control bits in the original FlexCore instruction format can be reduced to 32 bits offering a compression of 58% and a modest performance overhead of less than 1% for management of the decompression tables.
8.
  • Azhar, Muhammad Waqar, 1986, et al. (author)
  • Viterbi Accelerator for Embedded Processor Datapaths
  • 2012
  • In: Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors. - 1063-6862. - 9780769547688 ; , pp. 133-140
  • Conference paper (peer-reviewed)
    • We present a novel architecture for a lightweight Viterbi accelerator that can be tightly integrated inside an embedded processor. We investigate the accelerator’s impact on processor performance by using the EEMBC Viterbi benchmark and the in-house Viterbi Branch Metric kernel. Our evaluation based on the EEMBC benchmark shows that an accelerated 65-nm 2.7-ns processor datapath is 20% larger but 90% more cycle efficient than a datapath lacking the Viterbi accelerator, leading to an 87% overall energy reduction and a data throughput of 3.52 Mbit/s.
9.
  • Bardizbanyan, Alen, 1986, et al. (author)
  • Designing a Practical Data Filter Cache to Improve Both Energy Efficiency and Performance
  • 2013
  • In: Transactions on Architecture and Code Optimization. - 1544-3973 .- 1544-3566. ; 10:4, 25 pages
  • Journal article (peer-reviewed)
    • Conventional Data Filter Cache (DFC) designs improve processor energy efficiency, but degrade performance. Furthermore, the single-cycle line transfer suggested in prior studies adversely affects Level-1 Data Cache (L1 DC) area and energy efficiency. We propose a practical DFC that is accessed early in the pipeline and transfers a line over multiple cycles. Our DFC design improves performance and eliminates a substantial fraction of L1 DC accesses for loads, L1 DC tag checks on stores, and data translation lookaside buffer accesses for both loads and stores. Our evaluation shows that the proposed DFC can reduce the data access energy by 42.5% and improve execution time by 4.2%.
10.
  • Bardizbanyan, Alen, 1986, et al. (author)
  • Improving Data Access Efficiency by Using a Tagless Access Buffer (TAB)
  • 2013
  • In: Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, CGO 2013. - 9781467355254 ; , pp. 269-279
  • Conference paper (peer-reviewed)
    • The need for energy efficiency continues to grow for many classes of processors, including those for which performance remains vital. Data cache is crucial for good performance, but it also represents a significant portion of the processor's energy expenditure. We describe the implementation and use of a tagless access buffer (TAB) that greatly improves data access energy efficiency while slightly improving performance. The compiler recognizes memory reference patterns within loops and allocates these references to a TAB. This combined hardware/software approach reduces energy usage by (1) replacing many level-one data cache (L1D) accesses with accesses to the smaller, more power-efficient TAB; (2) removing the need to perform tag checks or data translation lookaside buffer (DTLB) lookups for TAB accesses; and (3) reducing DTLB lookups when transferring data between the L1D and the TAB. Accesses to the TAB occur earlier in the pipeline, and data lines are prefetched from lower memory levels, which result in a small performance improvement. In addition, we can avoid many unnecessary block transfers between other memory hierarchy levels by characterizing how data in the TAB are used. With a combined size equal to that of a conventional 32-entry register file, a four-entry TAB eliminates 40% of L1D accesses and 42% of DTLB accesses, on average. This configuration reduces data-access related energy by 35% while simultaneously decreasing execution time by 3%.