↓ Direkt till sidans innehåll
↓ Direkt till sidans sekundära innehåll (sidomenyn)

Träfflista för sökning "WFRF:(Hemani Ahmed) ;pers:(Jafri Syed)"

Sökning: WFRF:(Hemani Ahmed) > Jafri Syed

Resultat 1-10 av 37

Sortera/gruppera träfflistan

Sortering: Träffar per sida:

Numrering	Referens	Omslagsbild	Hitta
1.	Anwar, Hassan, et al. (författare) Exploring Spiking Neural Network on Coarse-Grain Reconfigurable Architectures 2014 Ingår i: ACM International Conference Proceeding Series. - New York, NY, USA : ACM. - 9781450328227 ; , s. 64-67 Konferensbidrag (refereegranskat)abstract Today, reconfigurable architectures are becoming increas- ingly popular as the candidate platforms for neural net- works. Existing works, that map neural networks on re- configurable architectures, only address either FPGAs or Networks-on-chip, without any reference to the Coarse-Grain Reconfigurable Architectures (CGRAs). In this paper we investigate the overheads imposed by implementing spiking neural networks on a Coarse Grained Reconfigurable Ar- chitecture (CGRAs). Experimental results (using point to point connectivity) reveal that up to 1000 neurons can be connected, with an average response time of 4.4 msec.
2.	Farahini, Nasim, et al. (författare) Parallel distributed scalable runtime address generation scheme for a coarse grain reconfigurable computation and storage fabric 2014 Ingår i: Microprocessors and microsystems. - : Elsevier BV. - 0141-9331 .- 1872-9436. ; 38:8, s. 788-802 Tidskriftsartikel (refereegranskat)abstract This paper presents a hardware based solution for a scalable runtime address generation scheme for DSP applications mapped to a parallel distributed coarse grain reconfigurable computation and storage fabric. The scheme can also deal with non-affine functions of multiple variables that typically correspond to multiple nested loops. The key innovation is the judicious use of two categories of address generation resources. The first category of resource is the low cost AGU that generates addresses for given address bounds for affine functions of up to two variables. Such low cost AGUs are distributed and associated with every read/write port in the distributed memory architecture. The second category of resource is relatively more complex but is also distributed but shared among a few storage units and is capable of handling more complex address generation requirements like dynamic computation of address bounds that are then used to configure the AGUs, transformation of non-affine functions to affine function by computing the affine factor outside the loop, etc. The runtime computation of the address constraints results in negligibly small overhead in latency, area and energy while it provides substantial reduction in program storage, reconfiguration agility and energy compared to the prevalent pre-computation of address constraints. The efficacy of the proposed method has been validated against the prevalent address generation schemes for a set of six realistic DSP functions. Compared to the pre-computation method, the proposed solution achieved 75% average code compaction and compared to the centralized runtime address generation scheme, the proposed solution achieved 32.7% average performance improvement.
3.	Hemani, Ahmed, 1961-, et al. (författare) Synchoricity and NOCs could make Billion Gate custom hardware centric SOCs affordable 2017 Ingår i: 2017 11th IEEE/ACM International Symposium on Networks-on-Chip, NOCS 2017. - New York, NY, USA : Association for Computing Machinery (ACM). - 9781450349840 Konferensbidrag (refereegranskat)abstract In this paper, we present a novel synchoros VLSI design scheme that discretizes space uniformly. Synchoros derives from the Greek word chóros for space. We propose raising the physical design abstraction to register transfer level by using coarse grain reconfigurable building blocks called SiLago blocks. SiLago blocks are hardened, synchoros and are used to create arbitrarily complex VLSI design instances by abutting them and not requiring any further logic and physical syntheses. SiLago blocks are interconnected by two levels of NOCs, regional and global. By configuring the SiLago blocks and the two levels of NOCs, it is possible to create implementation alternatives whose cost metrics can be evaluated with agility and post layout accuracy. This framework, called the SiLago framework includes a synthesis based design flow that allows end to end automation of multi-million gate functionality modeled as SDF in Simulink to be transformed into timing and DRC clean physical design in minutes, while exploring 100s of solutions. We benchmark the synthesis efficiency, and silicon and computational efficiencies against the conventional standard cell based tooling to show two orders improvement in accuracy and three orders improvement in synthesis while eliminating the need to verify at lower abstractions like RTL. The proposed solution is being extended to deal with system-level non-compile time functionalities. We also present arguments on how synchoricity could also contribute to eliminating the engineering cost of designing masks to lower the manufacturing cost.
4.	Hemani, Ahmed, et al. (författare) The silago solution : Architecture and design methods for a heterogeneous dark silicon aware coarse grain reconfigurable fabric 2017 Ingår i: The Dark Side of Silicon. - Cham : Springer. - 9783319315966 - 9783319315942 ; , s. 47-94 Bokkapitel (refereegranskat)abstract The dark silicon constraint will restrict the VLSI designers to utilize an increasingly smaller percentage of transistors as we progress deeper into nano-scale regime because of the power delivery and thermal dissipation limits. The best way to deal with the dark silicon constraint is to use the transistors that can be turned on as efficiently as possible. Inspired by this rationale, the VLSI design community has adopted customization as the principal means to address the dark silicon constraint. Two categories of customization, often in tandem have been adopted by the community. The first is the processors that are heterogeneous in functionality and/or have ability to more efficiently match varying functionalities and runtime load. The second category of customization is based on the fact that hardware implementations often offer 2–6 orders more efficiency compared to software. For this reason, designers isolate the power and performance critical functionality and map them to custom hardware implementations called accelerators. Both these categories of customizations are partial in being compute centric and still implement the bulk of functionality in the inefficient software style. In this chapter, we propose a contrarian approach: implement the bulk of functionality in hardware style and only retain control intensive and flexibility critical functionality in small simple processors that we call flexilators. We propose using a micro-architecture level coarse grain reconfigurable fabric as the alternative to the Boolean level standard cells and LUTs of the FPGAs as the basis for dynamically reconfigurable hardware implementation. This coarse grain reconfigurable fabric allows dynamic creation of arbitrarily wide and deep datapath with their hierarchical control that can be coupled with a cluster of storage resources to create private execution partitions that host individual applications. Multiple such partitions can be created that can operate at different voltage frequency operating points. Unused resources can be put into a range of low power modes. This CGRA fabric allows not just compute centric customization but also interconnect, control, storage and access to storage can be customized. The customization is not only possible at compile/build time but also at runtime to match the available resources and runtime load conditions. This complete, micro-architecture level hardware centric customization overcomes the limitations of partial compute centric customization offered by the state-of-the-art accelerator-rich heterogeneous multi-processor implementation style by extracting more functionality and performance from the limited number of transistors that can be turned on. Besides offering complete and more effective customization and a hardware centric implementation style, we also propose a methodology that dramatically reduces the cost of customization. This methodology is based on a concept called SiLago (Silicon Large Grain Objects) method. The core idea behind the SiLago method is to use large grain micro-architecture level hardened and characterized blocks, the SiLago blocks, as the atomic physical design building blocks and a grid based structured layout scheme that enables composition of the SiLago fabric simply by abutting the blocks to produce a timing and DRC clean GDSII design. Effectively, the SiLago method raises the abstraction of the physical design to micro-architectural level from the present Boolean level standard cell and LUT based physical design. This significantly improves the efficiency and predictability of synthesis from higher levels of abstraction. In addition, it also enables true system-level synthesis that by virtue of correct-by-construction guarantee eliminates the costly functional verification step. The proposed solution allows a fully customized design with dynamic fine grain power management to be automatically generated from Simulink down to GDSII with computational and silicon efficiencies that are modestly lower than ASIC. The micro-architecture level SiLago block based design process with correct by construction guarantee is 5–6 orders more efficient and 2 orders more accurate compared to the Boolean standard cell based design flows.
5.	Jafri, Syed, et al. (författare) Implementation and evaluation of configuration scrubbing on CGRAs : A case study 2013 Ingår i: 2013 International Symposium on System-on-Chip, SoC 2013 - Proceedings. - : IEEE Computer Society. ; , s. 6675262- Konferensbidrag (refereegranskat)abstract This paper investigates the overhead imposed by various configuration scrubbing techniques used in fault-tolerant Coarse Grained Reconfigurable Arrays (CGRAs). Today, reconfigurable architectures host large configuration memories. As we progress further in the nanometer regime, these configuration memories have become increasingly susceptible to single event upsets caused e.g. by cosmic radiation. Configuration scrubbing is a frequently used technique to protect these configuration memories against single event upsets. Existing works on configuration scrubbing deal only with FPGA without any reference to the CGRAs (in which configuration memories consume up to 50% of silicon area). Moreover, in the known literature lacks a comprehensive comparison of various configuration scrubbing techniques to guide system designers about the merits/demerits of different scrubbing methods which could be applied to CGRAs. To address these problems, in this paper we classify various configuration scrubbing techniques and quantify their trade-offs when implemented on a CGRA. Synthesis results reveal that scrubbing logic incurs negligible silicon overhead (up to 3% of the area of computational units). Simulation results obtained for a few algorithms/applications (FFT, FIR, matrix multiplication, and WLAN) show that the choice of the configuration scrubbing scheme (external vs. internal) has significant impact on both the size of configuration memory and the number of reconfiguration cycles (respectively 20-80% more and up to 38 times more for the former).
6.	Jafri, Syed M. A. H., et al. (författare) Architecture and Implementation of Dynamic Parallelism, Voltage and Frequency Scaling (PVFS) on CGRAs 2015 Ingår i: ACM Journal on Emerging Technologies in Computing Systems. - : Association for Computing Machinery (ACM). - 1550-4832 .- 1550-4840. ; 11:4 Tidskriftsartikel (refereegranskat)abstract In the era of platforms hosting multiple applications with arbitrary performance requirements, providing a worst-case platform-wide voltage/frequency operating point is neither optimal nor desirable. As a solution to this problem, designs commonly employ dynamic voltage and frequency scaling (DVFS). DVFS promises significant energy and power reductions by providing each application with the operating point (and hence the performance) tailored to its needs. To further enhance the optimization potential, recent works interleave dynamic parallelism with conventional DVFS. The induced parallelism results in performance gains that allow an application to lower its operating point even further (thereby saving energy and power consumption). However, the existing works employ costly dedicated hardware (for synchronization) and rely solely on greedy algorithms to make parallelism decisions. To efficiently integrate parallelism with DVFS, compared to state-of-the-art, we exploit the reconfiguration (to reduce DVFS synchronization overheads) and enhance the intelligence of the greedy algorithm (to make optimal parallelism decisions). Specifically, our solution relies on dynamically reconfigurable isolation cells and an autonomous parallelism, voltage, and frequency selection algorithm. The dynamically reconfigurable isolation cells reduce the area overheads of DVFS circuitry by configuring the existing resources to provide synchronization. The autonomous parallelism, voltage, and frequency selection algorithm ensures high power efficiency by combining parallelism with DVFS. It selects that parallelism, voltage, and frequency trio which consumes minimum power to meet the deadlines on available resources. Synthesis and simulation results using various applications/algorithms (WLAN, MPEG4, FFT, FIR, matrix multiplication) show that our solution promises significant reduction in area and power consumption (23% and 51%) compared to state-of-the-art.
7.	Jafri, Syed Mohammad Asad Hassan, et al. (författare) Compact Generic Intermediate representation (CGIR) to enable late binding in Coarse Grained Reconfigurable Architectures 2011 Konferensbidrag (refereegranskat)
8.	Jafri, Syed Mohammad Asad Hassan, et al. (författare) Compression Based Efficient and Agile Configuration Mechanism for Coarse Grained Reconfigurable Architectures 2011 Ingår i: Proc. IEEE Int Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW) Symp. - 9780769543857 ; , s. 290-293 Konferensbidrag (refereegranskat)abstract This paper considers the possibility of speeding up the configuration by reducing the size of configware in coarsegrained reconfigurable architectures (CGRAs). Our goal was to reduce the number of cycles and increase the configuration bandwidth. The proposed technique relies on multicasting and bitstream compression. The multicasting reduces the cycles by configuring the components performing identical functions simultaneously, in a single cycle, while the bitstream compression increases the configuration bandwidth. We have chosen the dynamically reconfigurable resource array (DRRA) architecture as a vehicle to study the efficiency of this approach. In our proposed method, the configuration bitstream is compressed offline and stored in a memory. If reconfiguration is required, the compressed bitstream is decompressed using an online decompresser and sent to DRRA. Simulation results using practical applications showed upto 78% and 22% decrease in configuration cycles for completely parallel and completely serial implementations, respectively. Synthesis results have confirmed nigligible overhead in terms of area (1.2 %) and timing.
9.	Jafri, Syed M.A.H., et al. (författare) Customizable Compression Architecture for Efficient Configuration in CGRAs 2011 Ingår i: Proceedings. ; , s. 31-31 Konferensbidrag (refereegranskat)abstract Today, Coarse Grained Reconfigurable Architectures (CGRAs) host multiple applications. Novel CGRAs allow each application to exploit runtime parallelism and time sharing. Although these features enhance the power and silicon efficiency, they significantly increase the configuration memory overheads. As a solution to this problem researchers have employed statistical compression, intermediate compact representation, and multicasting. Each of these techniques has different properties, and is therefore best suited for a particular class of applications. However, existing research only deals with these methods separately. In this paper we propose a morphable compression architecture that interleaves these techniques in a unique platform.
10.	Jafri, Syed Mohammad Asad Hassan, et al. (författare) Energy-Aware CGRAs using Dynamically Re-configurable isolation Cells 2013 Konferensbidrag (refereegranskat)abstract This paper presents a self adaptive architectureto enhance the energy efﬁciency of coarse-grained reconﬁgurablearchitectures (CGRAs). Today, platforms host multipleapplications, with arbitrary inter-application communication andconcurrency patterns. Each application itself can have multipleversions (implementations with different degree of parallelism)and the optimal version can only be determined at runtime. Forsuch scenarios, traditional worst case designs and compile timemapping decisions are neither optimal nor desirable. Existingsolutions to this problem employ costly dedicated hardware toconﬁgure the operating point at runtime (using DVFS). As analternative to dedicated hardware, we propose exploiting thereconﬁguration features of modern CGRAs. Our solution relieson dynamically reconﬁgurable isolation cells (DRICs) and autonomousparallelism, voltage, and frequency selection algorithm(APVFS). The DRICs reduce the overheads of DVFS circuitryby conﬁguring the existing resources as isolation cells. APVFSensures high efﬁciency by dynamically selecting the parallelism,voltage and frequency trio, which consumes minimum powerto meet the deadlines on available resources. Simulation resultsusing representative applications (Matrix multiplication, FIR,and FFT) showed up to 23% and 51% reduction in powerand energy, respectively, compared to traditional DVFS designs.Synthesis results have conﬁrmed signiﬁcant reduction in areaoverheads compared to state of the art DVFS methods.

Skapa referenser, mejla, bekava och länka

Länka till träfflistan

Resultat 1-10 av 37

Avgränsa träffmängd

Typ av publikation: konferensbidrag (26); tidskriftsartikel (9); annan publikation (1); bokkapitel (1)

Typ av innehåll: refereegranskat (36); övrigt vetenskapligt/konstnärligt (1)

Författare/redaktör: Hemani, Ahmed (30); Tenhunen, Hannu (21); Plosila, Juha (19); Paul, Kolin (17); Jafri, Syed Mohammad ... (14); Jafri, Syed (11)Ta bort avgränsningen; visa fler...; Daneshtalab, Masoud (9); Jafri, Syed M. A. H. (8); Hemani, Ahmed, 1961- (7); Farahini, Nasim (6); Tajammul, Muhammad A ... (6); Stathis, Dimitrios (5); Abbas, N (3); Ellervee, Peeter (3); Chaourani, Panagioti ... (3); Plosila, J. (3); Guang, Liang (3); Syed, Jafri (3); Paul, K (2); Dytckov, Sergei (2); Sohofi, Hassan (2); Tajammul, Adeel (2); Bag, Ozan (2); Piestrak, Stanislaw ... (2); Leon, Guillermo Serr ... (2); Iqbal, J. (1); Jantsch, Axel (1); Anwar, Hassan (1); Sergei, Dytckov (1); Gia, T. N. (1); Li, Shuo (1); Yang, Yu (1); Ellervee, P. (1); Masoumian, S. (1); Piestrak, S. J. (1); Ozbag, Ozan (1); Kolin, Paul (1); Tenuhnen, Hannu (1); Awan, M. A. (1); Abbas, Naeem (1); Serrano Leon, Guille ... (1); Hassan, Hasan (1); Mutlu, Onur (1); Intesa, Leonardo (1); Ngyen, T. (1); Jafri, Syed M. A. (1); Ellerve, P. (1); visa färre...

Lärosäte: Kungliga Tekniska Högskolan (37)

Språk: Engelska (37)

Forskningsämne (UKÄ/SCB): Teknik (29); Naturvetenskap (7)

År

Kungliga biblioteket hanterar dina personuppgifter i enlighet med EU:s dataskyddsförordning (2018), GDPR. Läs mer om hur det funkar här.
Så här hanterar KB dina uppgifter vid användning av denna tjänst.

Copyright © LIBRIS - Nationella bibliotekssystem
LIBRIS.kb.se

pil uppåt

Stäng

Kopiera och spara länken för att återkomma till aktuell vy