SwePub
Sök i SwePub databas

  Utökad sökning

Träfflista för sökning "WFRF:(Jafri Syed. M. A. H.) "

Sökning: WFRF:(Jafri Syed. M. A. H.)

  • Resultat 1-8 av 8
Sortera/gruppera träfflistan
   
NumreringReferensOmslagsbildHitta
1.
  • Jafri, Syed M. A. H., et al. (författare)
  • TEA : Timing and Energy Aware compression architecture for Efficient Configuration in CGRAs
  • 2015
  • Ingår i: Microprocessors and microsystems. - : Elsevier. - 0141-9331 .- 1872-9436.
  • Tidskriftsartikel (refereegranskat)abstract
    • Coarse Grained Reconfigurable Architectures (CGRAs) are emerging as enabling platforms to meet the high performance demanded by modern applications (e.g. 4G, CDMA, etc.). Recently proposed CGRAs offer time-multiplexing and dynamic applications parallelism to enhance device utilization and reduce energy consumption at the cost of additional memory (up to 50% area of the overall platform). To reduce the memory overheads, novel CGRAs employ either statistical compression, intermediate compact representation, or multicasting. Each compaction technique has different properties (i.e. compression ratio, decompression time and decompression energy) and is best suited for a particular class of applications. However, existing research only deals with these methods separately. Moreover, they only analyze the compaction ratio and do not evaluate the associated energy overheads. To tackle these issues, we propose a polymorphic compression architecture that interleaves these techniques in a unique platform. The proposed architecture allows each application to take advantage of a separate compression/decompression hierarchy (consisting of various types and implementations of hardware/software decoders) tailored to its needs. Simulation results, using different applications (FFT, Matrix multiplication, and WLAN), reveal that the choice of compression hierarchy has a significant impact on compression ratio (up to 52%), decompression energy (up to 4 orders of magnitude), and configuration time (from 33. n to 1.5. s) for the tested applications. Synthesis results reveal that introducing adaptivity incurs negligible additional overheads (1%) compared to the overall platform area.
  •  
2.
  • Farahini, Nasim, et al. (författare)
  • Parallel distributed scalable runtime address generation scheme for a coarse grain reconfigurable computation and storage fabric
  • 2014
  • Ingår i: Microprocessors and microsystems. - : Elsevier BV. - 0141-9331 .- 1872-9436. ; 38:8, s. 788-802
  • Tidskriftsartikel (refereegranskat)abstract
    • This paper presents a hardware based solution for a scalable runtime address generation scheme for DSP applications mapped to a parallel distributed coarse grain reconfigurable computation and storage fabric. The scheme can also deal with non-affine functions of multiple variables that typically correspond to multiple nested loops. The key innovation is the judicious use of two categories of address generation resources. The first category of resource is the low cost AGU that generates addresses for given address bounds for affine functions of up to two variables. Such low cost AGUs are distributed and associated with every read/write port in the distributed memory architecture. The second category of resource is relatively more complex but is also distributed but shared among a few storage units and is capable of handling more complex address generation requirements like dynamic computation of address bounds that are then used to configure the AGUs, transformation of non-affine functions to affine function by computing the affine factor outside the loop, etc. The runtime computation of the address constraints results in negligibly small overhead in latency, area and energy while it provides substantial reduction in program storage, reconfiguration agility and energy compared to the prevalent pre-computation of address constraints. The efficacy of the proposed method has been validated against the prevalent address generation schemes for a set of six realistic DSP functions. Compared to the pre-computation method, the proposed solution achieved 75% average code compaction and compared to the centralized runtime address generation scheme, the proposed solution achieved 32.7% average performance improvement.
  •  
3.
  • Jafri, Syed M. A. H., et al. (författare)
  • Architecture and Implementation of Dynamic Parallelism, Voltage and Frequency Scaling (PVFS) on CGRAs
  • 2015
  • Ingår i: ACM Journal on Emerging Technologies in Computing Systems. - : Association for Computing Machinery (ACM). - 1550-4832 .- 1550-4840. ; 11:4
  • Tidskriftsartikel (refereegranskat)abstract
    • In the era of platforms hosting multiple applications with arbitrary performance requirements, providing a worst-case platform-wide voltage/frequency operating point is neither optimal nor desirable. As a solution to this problem, designs commonly employ dynamic voltage and frequency scaling (DVFS). DVFS promises significant energy and power reductions by providing each application with the operating point (and hence the performance) tailored to its needs. To further enhance the optimization potential, recent works interleave dynamic parallelism with conventional DVFS. The induced parallelism results in performance gains that allow an application to lower its operating point even further (thereby saving energy and power consumption). However, the existing works employ costly dedicated hardware (for synchronization) and rely solely on greedy algorithms to make parallelism decisions. To efficiently integrate parallelism with DVFS, compared to state-of-the-art, we exploit the reconfiguration (to reduce DVFS synchronization overheads) and enhance the intelligence of the greedy algorithm (to make optimal parallelism decisions). Specifically, our solution relies on dynamically reconfigurable isolation cells and an autonomous parallelism, voltage, and frequency selection algorithm. The dynamically reconfigurable isolation cells reduce the area overheads of DVFS circuitry by configuring the existing resources to provide synchronization. The autonomous parallelism, voltage, and frequency selection algorithm ensures high power efficiency by combining parallelism with DVFS. It selects that parallelism, voltage, and frequency trio which consumes minimum power to meet the deadlines on available resources. Synthesis and simulation results using various applications/algorithms (WLAN, MPEG4, FFT, FIR, matrix multiplication) show that our solution promises significant reduction in area and power consumption (23% and 51%) compared to state-of-the-art.
  •  
4.
  • Jafri, Syed M.A.H., et al. (författare)
  • Customizable Compression Architecture for Efficient Configuration in CGRAs
  • 2011
  • Ingår i: Proceedings. ; , s. 31-31
  • Konferensbidrag (refereegranskat)abstract
    • Today, Coarse Grained Reconfigurable Architectures (CGRAs) host multiple applications. Novel CGRAs allow each application to exploit runtime parallelism and time sharing. Although these features enhance the power and silicon efficiency, they significantly increase the configuration memory overheads. As a solution to this problem researchers have employed statistical compression, intermediate compact representation, and multicasting. Each of these techniques has different properties, and is therefore best suited for a particular class of applications. However, existing research only deals with these methods separately. In this paper we propose a morphable compression architecture that interleaves these techniques in a unique platform.
  •  
5.
  • Jafri, Syed. M. A. H., et al. (författare)
  • Energy-Aware Coarse-Grained Reconfigurable Architectures using Dynamically Reconfigurable Isolation Cells
  • 2013
  • Ingår i: Proceedings Of The Fourteenth International Symposium On Quality Electronic Design (ISQED 2013). - 9781467349529 ; , s. 104-111
  • Konferensbidrag (refereegranskat)abstract
    • This paper presents a self adaptive architecture to enhance the energy efficiency of coarse-grained reconfigurable architectures (CGRAs). Today, platforms host multiple applications, with arbitrary inter-application communication and concurrency patterns. Each application itself can have multiple versions (implementations with different degree of parallelism) and the optimal version can only be determined at runtime. For such scenarios, traditional worst case designs and compile time mapping decisions are neither optimal nor desirable. Existing solutions to this problem employ costly dedicated hardware to configure the operating point at runtime (using DVFS). As an alternative to dedicated hardware, we propose exploiting the reconfiguration features of modern CGRAs. Our solution relies on dynamically reconfigurable isolation cells (DRICs) and autonomous parallelism, voltage, and frequency selection algorithm (APVFS). The DRICs reduce the overheads of DVFS circuitry by configuring the existing resources as isolation cells. APVFS ensures high efficiency by dynamically selecting the parallelism, voltage and frequency trio, which consumes minimum power to meet the deadlines on available resources. Simulation results using representative applications (Matrix multiplication, FIR, and FFT) showed up to 23% and 51% reduction in power and energy, respectively, compared to traditional DVFS designs. Synthesis results have confirmed significant reduction in area overheads compared to state of the art DVFS methods.
  •  
6.
  • Jafri, Syed M. A. H., et al. (författare)
  • Morphable Compression Architecture for Efficient Configuration in CGRAs
  • 2014
  • Ingår i: 2014 17th Euromicro Conference on Digital System Design (DSD). ; , s. 42-49
  • Konferensbidrag (refereegranskat)abstract
    • Today, Coarse Grained Reconfigurable Architectures (CGRAs) host multiple applications. Novel CGRAs allow each application to exploit runtime parallelism and time sharing. Although these features enhance the power and silicon efficiency, they significantly increase the configuration memory overheads (up to 50% area of the overall platform). As a solution to this problem researchers have employed statistical compression, intermediate compact representation, and multicasting. Each of these techniques has different properties (i.e. compression ratio and decoding time), and is therefore best suited for a particular class of applications (and situation). However, existing research only deals with these methods separately. In this paper we propose a morphable compression architecture that interleaves these techniques in a unique platform. The proposed architecture allows each application to enjoy a separate compression/decompression hierarchy (consisting of various types and implementations of hardware/software decoders) tailored to its needs. Thereby, our solution offers minimal memory while meeting the required configuration deadlines. Simulation results, using different applications (FFT, Matrix multiplication, and WLAN), reveal that the choice of compression hierarchy has a significant impact on compression ratio (from configware replication to 52%) and configuration cycles (from 33 nsec to 1.5 secs) for the tested applications. Synthesis results reveal that introducing adaptivity incurs negligible additional overheads (1%) compared to the overall platform area.
  •  
7.
  • Jafri, Syed M. A. H., et al. (författare)
  • Private reliability environments for efficient fault-tolerance in CGRAs
  • 2014
  • Ingår i: Design automation for embedded systems. - : Springer Science and Business Media LLC. - 0929-5585 .- 1572-8080. ; 18:3-4, s. 295-327
  • Tidskriftsartikel (refereegranskat)abstract
    • In the era of platforms hosting multiple applications with variable reliability needs, worst-case platform-wide fault-tolerance decisions are neither optimal nor desirable. As a solution to this problem, designs commonly employ adaptive fault-tolerance strategies that provide each application with the reliability level actually needed. However, in the CGRA domain, the existing schemes either only allow to shift between different levels of modular redundancy (duplication, triplication, etc.) or protect only a particular region of a device (e.g. configuration memory, computation, or data memory). To complement these strategies, we propose private fault-tolerance environments which, in addition to modular redundancy, also provide low cost sub-modular (e.g. residue mod 3) redundancy capable of handling both permanent and temporary faults in configuration memory, computation, communication, and data memory. In addition, we also present adaptive configuration scrubbing techniques which prevent fault accumulation in the configuration memory. Simulation results using a few selected algorithms (FFT, matrix multiplication, and FIR filter) show that the approach proposed is capable of providing flexible protection with energy overhead ranging from 3.125 % to 107 % for different reliability levels. Synthesis results have confirmed that the proposed architecture reduces the area overhead for self-checking (58 %) and fault-tolerant (7.1 %) versions, compared to the state of the art adaptive reliability techniques.
  •  
8.
  • Jafri, Syed M. A. H., et al. (författare)
  • TransMap : Transformation Based Remapping and Parallelism for High Utilization and Energy Efficiency in CGRAs
  • 2016
  • Ingår i: IEEE Transactions on Computers. - : IEEE. - 0018-9340 .- 1557-9956. ; 65:11, s. 3456-3469
  • Tidskriftsartikel (refereegranskat)abstract
    • In the era of platforms hosting multiple applications with arbitrary inter application communication and computation patterns, compile time mapping decisions are neither optimal nor desirable. As a solution to this problem, recently proposed architectures offer run-time remapping-. The run-time remapping techniques displace or parallelize/serialize an application to optimize different parameters (e.g., utilization and energy). To implement the dynamic remapping, reconfigurable architectures commonly store multiple (compile-time generated) implementations of an application. Each implementation represents a different platform location and/or degree of parallelism. The optimal implementation is selected at run-time. However, the compile-time binding either incurs excessive configuration memory overheads and/or is unable to map/parallelize an application even when sufficient resources are available. As a solution to this problem, we present Transformation based reMapping and parallelism (TransMap). TransMap stores only a single implementation and applies a series for transformations to the stored bitstream for remapping or parallelizing an application. Compared to state of the art, in addition to simple relocation in horizontal/vertical directions, TransMap also allows to rotate an application for mapping or parallelizing an application in resource constrained scenarios. By storing only a single implementation, TransMap offers significant reductions in configuration memory requirements (up to 73 percent for the tested applications), compared to state of the art compaction techniques. Simulation results reveal that the additional flexibility reduces the energy requirements by 33 percent and enhances the device utilization by 50 percent for the tested applications. Gate level analysis reveals that TransMap incurs negligible silicon (0.2 percent of the platform) and timing (6 additional cycles per application) penalty.
  •  
Skapa referenser, mejla, bekava och länka
  • Resultat 1-8 av 8

Kungliga biblioteket hanterar dina personuppgifter i enlighet med EU:s dataskyddsförordning (2018), GDPR. Läs mer om hur det funkar här.
Så här hanterar KB dina uppgifter vid användning av denna tjänst.

 
pil uppåt Stäng

Kopiera och spara länken för att återkomma till aktuell vy