SwePub
Tyck till om SwePub Sök här!
Sök i SwePub databas

  Utökad sökning

Träfflista för sökning "WFRF:(Hemani Ahmed) ;pers:(Daneshtalab Masoud)"

Sökning: WFRF:(Hemani Ahmed) > Daneshtalab Masoud

  • Resultat 1-10 av 10
Sortera/gruppera träfflistan
   
NumreringReferensOmslagsbildHitta
1.
  • Anwar, Hassan, et al. (författare)
  • Exploring Spiking Neural Network on Coarse-Grain Reconfigurable Architectures
  • 2014
  • Ingår i: ACM International Conference Proceeding Series. - New York, NY, USA : ACM. - 9781450328227 ; , s. 64-67
  • Konferensbidrag (refereegranskat)abstract
    • Today, reconfigurable architectures are becoming increas- ingly popular as the candidate platforms for neural net- works. Existing works, that map neural networks on re- configurable architectures, only address either FPGAs or Networks-on-chip, without any reference to the Coarse-Grain Reconfigurable Architectures (CGRAs). In this paper we investigate the overheads imposed by implementing spiking neural networks on a Coarse Grained Reconfigurable Ar- chitecture (CGRAs). Experimental results (using point to point connectivity) reveal that up to 1000 neurons can be connected, with an average response time of 4.4 msec.
  •  
2.
  • Daneshtalab, Masoud, et al. (författare)
  • Special issue on many-core embedded systems
  • 2014
  • Ingår i: Microprocessors and microsystems. - : Elsevier BV. - 0141-9331 .- 1872-9436. ; 38:6, s. 525-525
  • Tidskriftsartikel (övrigt vetenskapligt/konstnärligt)
  •  
3.
  • Jafri, Syed M.A.H., et al. (författare)
  • Customizable Compression Architecture for Efficient Configuration in CGRAs
  • 2011
  • Ingår i: Proceedings. ; , s. 31-31
  • Konferensbidrag (refereegranskat)abstract
    • Today, Coarse Grained Reconfigurable Architectures (CGRAs) host multiple applications. Novel CGRAs allow each application to exploit runtime parallelism and time sharing. Although these features enhance the power and silicon efficiency, they significantly increase the configuration memory overheads. As a solution to this problem researchers have employed statistical compression, intermediate compact representation, and multicasting. Each of these techniques has different properties, and is therefore best suited for a particular class of applications. However, existing research only deals with these methods separately. In this paper we propose a morphable compression architecture that interleaves these techniques in a unique platform.
  •  
4.
  • Jafri, Syed M. A. H., et al. (författare)
  • Morphable Compression Architecture for Efficient Configuration in CGRAs
  • 2014
  • Ingår i: 2014 17th Euromicro Conference on Digital System Design (DSD). ; , s. 42-49
  • Konferensbidrag (refereegranskat)abstract
    • Today, Coarse Grained Reconfigurable Architectures (CGRAs) host multiple applications. Novel CGRAs allow each application to exploit runtime parallelism and time sharing. Although these features enhance the power and silicon efficiency, they significantly increase the configuration memory overheads (up to 50% area of the overall platform). As a solution to this problem researchers have employed statistical compression, intermediate compact representation, and multicasting. Each of these techniques has different properties (i.e. compression ratio and decoding time), and is therefore best suited for a particular class of applications (and situation). However, existing research only deals with these methods separately. In this paper we propose a morphable compression architecture that interleaves these techniques in a unique platform. The proposed architecture allows each application to enjoy a separate compression/decompression hierarchy (consisting of various types and implementations of hardware/software decoders) tailored to its needs. Thereby, our solution offers minimal memory while meeting the required configuration deadlines. Simulation results, using different applications (FFT, Matrix multiplication, and WLAN), reveal that the choice of compression hierarchy has a significant impact on compression ratio (from configware replication to 52%) and configuration cycles (from 33 nsec to 1.5 secs) for the tested applications. Synthesis results reveal that introducing adaptivity incurs negligible additional overheads (1%) compared to the overall platform area.
  •  
5.
  • Jafri, Syed Mohammad Asad Hassan, et al. (författare)
  • NeuroCGRA : A CGRA with support for neural networks
  • 2014
  • Ingår i: Proceedings of the 2014 International Conference on High Performance Computing and Simulation, HPCS 2014. - : IEEE. - 9781479953127 ; , s. 506-511
  • Konferensbidrag (refereegranskat)abstract
    • Today, Coarse Grained Reconfigurable Architectures (CGRAs) are becoming an increasingly popular implementation platform. In real world applications, the CGRAs are required to simultaneously host processing (e.g. Audio/video acquisition) and estimation (e.g. audio/video/image recognition) tasks. For estimation problems, neural networks, promise a higher efficiency than conventional processing. However, most of the existing CGRAs provide no support for neural networks. To realize realize both neural networks and conventional processing on the same platform, this paper presents NeuroCGRA. NeuroCGRA allows the processing elements and the network to dynamically morph into either conventional CGRA or a neural network, depending on the hosted application. We have chosen the DRRA as a vehicle to study the feasibility and overheads of our approach. Synthesis results reveal that the proposed enhancements incur negligible overheads (4.4% area and 9.1% power) compared to the original DRRA cell.
  •  
6.
  • Jafri, Syed Mohammad Asad Hassan, et al. (författare)
  • RuRot : Run-time rotatable-expandable partitions for efficient mapping in CGRAs
  • 2014
  • Ingår i: Proceedings - International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation, SAMOS 2014. - 9781479937707 ; , s. 233-241
  • Konferensbidrag (refereegranskat)abstract
    • Today, Coarse Grained Reconfigurable Architectures (CGRAs) host multiple applications, with arbitrary communication and computation patterns. Compile-time mapping decisions are neither optimal nor desirable to efficiently support the diverse and unpredictable application requirements. As a solution to this problem, recently proposed architectures offer run-time remapping. The run-time remappers displace or expand (parallelize/serialize) an application to optimize different parameters (such as platform utilization). However, the existing remappers support application displacement or expansion in either horizontal or vertical direction. Moreover, most of the works only address dynamic remapping in packet-switched networks and therefore are not applicable to the CGRAs that exploit circuitswitching for low-power and high predictability. To enhance the optimality of the run-time remappers, this paper presents a design framework called Run-time Rotatable-expandable Partitions (RuRot). RuRot provides architectural support to dynamically remap or expand (i.e. parallelize) the hosted applications in CGRAs with circuit-switched interconnects. Compared to state of the art, the proposed design supports application rotation (in clockwise and anticlockwise directions) and displacement (in horizontal and vertical directions), at run-time. Simulation results using a few applications reveal that the additional flexibility enhances the device utilization, significantly (on average 50 % for the tested applications). Synthesis results confirm that the proposed remapper has negligible silicon (0.2 % of the platform) and timing (2 cycles per application) overheads.
  •  
7.
  • Jafri, Syed M. A. H., et al. (författare)
  • TEA : Timing and Energy Aware compression architecture for Efficient Configuration in CGRAs
  • 2015
  • Ingår i: Microprocessors and microsystems. - : Elsevier. - 0141-9331 .- 1872-9436.
  • Tidskriftsartikel (refereegranskat)abstract
    • Coarse Grained Reconfigurable Architectures (CGRAs) are emerging as enabling platforms to meet the high performance demanded by modern applications (e.g. 4G, CDMA, etc.). Recently proposed CGRAs offer time-multiplexing and dynamic applications parallelism to enhance device utilization and reduce energy consumption at the cost of additional memory (up to 50% area of the overall platform). To reduce the memory overheads, novel CGRAs employ either statistical compression, intermediate compact representation, or multicasting. Each compaction technique has different properties (i.e. compression ratio, decompression time and decompression energy) and is best suited for a particular class of applications. However, existing research only deals with these methods separately. Moreover, they only analyze the compaction ratio and do not evaluate the associated energy overheads. To tackle these issues, we propose a polymorphic compression architecture that interleaves these techniques in a unique platform. The proposed architecture allows each application to take advantage of a separate compression/decompression hierarchy (consisting of various types and implementations of hardware/software decoders) tailored to its needs. Simulation results, using different applications (FFT, Matrix multiplication, and WLAN), reveal that the choice of compression hierarchy has a significant impact on compression ratio (up to 52%), decompression energy (up to 4 orders of magnitude), and configuration time (from 33. n to 1.5. s) for the tested applications. Synthesis results reveal that introducing adaptivity incurs negligible additional overheads (1%) compared to the overall platform area.
  •  
8.
  • Jafri, Syed M. A. H., et al. (författare)
  • TransMap : Transformation Based Remapping and Parallelism for High Utilization and Energy Efficiency in CGRAs
  • 2016
  • Ingår i: IEEE Transactions on Computers. - : IEEE. - 0018-9340 .- 1557-9956. ; 65:11, s. 3456-3469
  • Tidskriftsartikel (refereegranskat)abstract
    • In the era of platforms hosting multiple applications with arbitrary inter application communication and computation patterns, compile time mapping decisions are neither optimal nor desirable. As a solution to this problem, recently proposed architectures offer run-time remapping-. The run-time remapping techniques displace or parallelize/serialize an application to optimize different parameters (e.g., utilization and energy). To implement the dynamic remapping, reconfigurable architectures commonly store multiple (compile-time generated) implementations of an application. Each implementation represents a different platform location and/or degree of parallelism. The optimal implementation is selected at run-time. However, the compile-time binding either incurs excessive configuration memory overheads and/or is unable to map/parallelize an application even when sufficient resources are available. As a solution to this problem, we present Transformation based reMapping and parallelism (TransMap). TransMap stores only a single implementation and applies a series for transformations to the stored bitstream for remapping or parallelizing an application. Compared to state of the art, in addition to simple relocation in horizontal/vertical directions, TransMap also allows to rotate an application for mapping or parallelizing an application in resource constrained scenarios. By storing only a single implementation, TransMap offers significant reductions in configuration memory requirements (up to 73 percent for the tested applications), compared to state of the art compaction techniques. Simulation results reveal that the additional flexibility reduces the energy requirements by 33 percent and enhances the device utilization by 50 percent for the tested applications. Gate level analysis reveals that TransMap incurs negligible silicon (0.2 percent of the platform) and timing (6 additional cycles per application) penalty.
  •  
9.
  • Jafri, Syed Mohammad Asad Hassan, et al. (författare)
  • TransPar : Transformation based dynamic Parallelism for low power CGRAs
  • 2014
  • Ingår i: Conference Digest - 24th International Conference on Field Programmable Logic and Applications, FPL 2014. - 9783000446450
  • Konferensbidrag (refereegranskat)abstract
    • Coarse Grained Reconfigurable Architectures (CGRAs) are emerging as enabling platforms to meet the high performance demanded by modern applications (e.g. 4G, CDMA, etc.). Recently proposed CGRAs offer runtime parallelism to reduce energy consumption (by lowering voltage/frequency). To implement the runtime parallelism, CGRAs commonly store multiple compile-time generated implementations of an application (with different degree of parallelism) and select the optimal version at runtime. However, the compile-time binding incurs excessive configuration memory overheads and/or is unable to parallelize an application even when sufficient resources are available. As a solution to this problem, we propose Transformation based dynamic Parallelism (TransPar). TransPar stores only a single implementation and applies a series for transformations to generate the bitstream for the parallel version. In addition, it also allows to displace and/or rotate an application to parallelize in resource constrained scenarios. By storing only a single implementation, TransPar offers significant reductions in configuration memory requirements (up to 73% for the tested applications), compared to state of the art compaction techniques. Simulation and synthesis results, using real applications, reveal that the additional flexibility allows up to 33% energy reduction compared to static memory based parallelism techniques. Gate level analysis reveals that TransPar incurs negligible silicon (0.2% of the platform) and timing (6 additional cycles per application) penalty.
  •  
10.
  • Ngyen, T., et al. (författare)
  • FIST : A framework to interleave spiking neural networks on CGRAs
  • 2015
  • Ingår i: Proceedings - 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, PDP 2015. - : IEEE. ; , s. 751-758
  • Konferensbidrag (refereegranskat)abstract
    • Coarse Grained Reconfigurable Architectures (CGRAs) are emerging as enabling platforms to meet the high performance demanded by modern embedded applications. In many application domains (e.g. robotics and cognitive embedded systems), the CGRAs are required to simultaneously host processing (e.g. Audio/video acquisition) and estimation (e.g. audio/video/image recognition) tasks. Recent works have revealed that the efficiency and scalability of the estimation algorithms can be significantly improved by using neural networks. However, existing CGRAs commonly employ homogeneous processing resources for both the tasks. To realize the best of both the worlds (conventional processing and neural networks), we present FIST. FIST allows the processing elements and the network to dynamically morph into either conventional CGRA or a neural network, depending on the hosted application. We have chosen the DRRA as a vehicle to study the feasibility and overheads of our approach. Synthesis results reveal that the proposed enhancements incur negligible overheads (4.4% area and 9.1% power) compared to the original DRRA cell.
  •  
Skapa referenser, mejla, bekava och länka
  • Resultat 1-10 av 10

Kungliga biblioteket hanterar dina personuppgifter i enlighet med EU:s dataskyddsförordning (2018), GDPR. Läs mer om hur det funkar här.
Så här hanterar KB dina uppgifter vid användning av denna tjänst.

 
pil uppåt Stäng

Kopiera och spara länken för att återkomma till aktuell vy