SwePub

Result list for search "WFRF:(Hemani Ahmed) ;pers:(Farahini Nasim)"


  • Results 1-10 of 20
1.
  • Azad, S. P., et al. (author)
  • Customization methodology of a Coarse Grained Reconfigurable architecture
  • 2015
  • In: NORCHIP 2014 - 32nd NORCHIP Conference. - 9781479954421
  • Conference paper (peer-reviewed); abstract:
    • Mapping algorithms on CGRAs can lead to an inefficient implementation and hardware under-utilization if there is a mismatch between the granularity of the reconfigurable processing unit and the algorithm. In this paper, we introduce a tool that takes the hardware configuration of a set of applications, identifies the unused parts of the CGRA, and lets the user sweep the design space from fully programmable to fully customized by eliminating the unused components. The user can select among multiple design points according to the application specification. This method is useful for designing multi-mode ASIC accelerators. The fully customized hardware generated using our tool has negligible area and power overhead compared to the equivalent ASIC but can be generated significantly faster.
2.
  • Farahini, Nasim, et al. (author)
  • 39.9 GOPs/watt multi-mode CGRA accelerator for a multi-standard basestation
  • 2013
  • In: 2013 IEEE International Symposium on Circuits and Systems (ISCAS). - : IEEE. - 9781467357609 ; , pp. 1448-1451
  • Conference paper (peer-reviewed); abstract:
    • This paper presents an industrial case study of using a Coarse Grain Reconfigurable Architecture (CGRA) as a multi-mode accelerator for two kernels, executed in a mutually exclusive manner: FFT for the LTE standard and the Correlation Pool for the UMTS standard. The CGRA multi-mode accelerator achieved a computational efficiency of 39.94 GOPS/watt (one OP is a multiply-add) and a silicon efficiency of 56.20 GOPS/mm². By analyzing the code and inferring the unused features of the fully programmable solution, an in-house tool automatically customized the design to run just the two kernels, and the two efficiency metrics improved to 49.05 GOPS/watt and 107.57 GOPS/mm². The corresponding numbers for the ASIC implementation are 63.84 GOPS/watt and 90.91 GOPS/mm². Though the ASIC's silicon and computational efficiency numbers are slightly better, the engineering efficiency of the pre-verified/characterized CGRA solution is at least 10X better than that of the ASIC solution.
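The efficiency figures quoted in the abstract above can be compared directly. The following is a minimal sketch, not taken from the paper: the dictionary keys and the helper name `ratio` are hypothetical, and only the six numbers come from the abstract.

```python
# Efficiency figures quoted in the abstract above:
# (computational efficiency in GOPS/watt, silicon efficiency in GOPS/mm2).
# The key names are hypothetical labels chosen for this sketch.
FIGURES = {
    "cgra_programmable": (39.94, 56.20),
    "cgra_customized": (49.05, 107.57),
    "asic": (63.84, 90.91),
}


def ratio(design: str, baseline: str) -> tuple[float, float]:
    """Return (GOPS/watt ratio, GOPS/mm2 ratio) of design relative to baseline."""
    d_watt, d_area = FIGURES[design]
    b_watt, b_area = FIGURES[baseline]
    return d_watt / b_watt, d_area / b_area


if __name__ == "__main__":
    for design, baseline in [
        ("cgra_customized", "cgra_programmable"),
        ("cgra_customized", "asic"),
    ]:
        watt, area = ratio(design, baseline)
        print(f"{design} vs {baseline}: {watt:.2f}x GOPS/watt, {area:.2f}x GOPS/mm2")
```

Calling `ratio("cgra_customized", "asic")` gives the customized CGRA's two metrics as fractions of the ASIC's, using only the numbers as printed in the abstract.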
3.
  • Farahini, Nasim, et al. (author)
  • A conceptual custom super-computer design for real-time simulation of human brain
  • 2013
  • In: 2013 21st Iranian Conference on Electrical Engineering, ICEE 2013. - 9781467356343 ; , pp. 1-6
  • Conference paper (peer-reviewed); abstract:
    • In this paper, we introduce BRIC, a novel custom multi-chip digital computer architecture for simulating in real time a model of the human brain in the form of a spiking Bayesian Confidence Propagation Neural Network (BCPNN). The design is conceptually dimensioned for the technology available in 2015-2020, with the estimated size of a pizza box, consuming less than 3 kW of power and delivering 800 TFLOP/s (single-precision multiply operations) and 30 TB of memory. To the best of our knowledge, this will be the smallest and lowest-power real-time brain simulation engine if manufactured. The silicon and computational efficiencies come from the use of 3D memory stacking, algorithmic innovation and architectural customization. The chip will be programmable, allowing experimentation with variants of the BCPNN brain model.
4.
  • Farahini, Nasim, et al. (author)
  • A scalable custom simulation machine for the Bayesian Confidence Propagation Neural Network model of the brain
  • 2014
  • In: 2014 19th Asia and South Pacific Design Automation Conference (ASP-DAC). - : IEEE. - 9781479928163 ; , pp. 578-585
  • Conference paper (peer-reviewed); abstract:
    • A multi-chip custom digital super-computer called eBrain is proposed for simulating the Bayesian Confidence Propagation Neural Network (BCPNN) model of the human brain. It stores synaptic weights in Hybrid Memory Cube (HMC) 3D-stacked DRAM memories, which are integrated with a custom-designed logic chip that implements the BCPNN model. In a 22 nm node, eBrain executes BCPNN in real time at 740 TFLOP/s while accessing 30 TB of synaptic weights with a bandwidth of 112 TB/s and consuming less than 6 kW of power in the typical case. This efficiency is three orders of magnitude better than that of general-purpose supercomputers in the same technology node.
5.
  •  
6.
  • Farahini, Nasim, et al. (author)
  • Atomic stream computation unit based on micro-thread level parallelism
  • 2015
  • In: IEEE 26th Application-specific Systems, Architectures and Processors (ASAP) 2015. - : IEEE. - 9781479919246 ; , pp. 25-29
  • Conference paper (peer-reviewed); abstract:
    • The increasing demand for higher image resolution and communication bandwidth requires streaming applications to deal with ever larger datasets. Further, with technology scaling, the cost of moving data is falling more slowly than the cost of computing. These trends have motivated the proposed micro-architectural reorganization of stream processors, which divides the stream computation into functional computation, address constraints computation and address generation, and deploys independent, distributed micro-threads to implement them. This scheme is an alternative to parallelizing them at the instruction level. The proposed scheme has two benefits: more efficient sequencer logic and energy savings in address generation and transportation. These benefits are quantified for a set of streaming applications and show average improvements of 39% in the silicon efficiency of the sequencer logic and 23% in total computational efficiency.
7.
  • Farahini, Nasim, et al. (author)
  • Distributed Runtime Computation of Constraints for Multiple Inner Loops
  • 2013
  • In: Proceedings - 16th Euromicro Conference on Digital System Design, DSD 2013. - New York : IEEE. - 9780769550749 ; , pp. 389-395
  • Conference paper (peer-reviewed); abstract:
    • This paper presents a hardware solution for runtime computation of loop constraints and synchronizing delays for multiple inner loops in parallel distributed implementations of digital signal processing sub-systems. Methods to map and generate the runtime computation code for loop constraints and synchronizing delays are also presented. Compared to the traditional methods, the proposed solution achieves 55% average code compaction and 32.7% average performance improvement. The solution has a modest hardware cost that increases linearly with the dimension of the architecture, and it has no performance penalty. Results from multiple realistic examples are presented, analyzed and compared to the traditional methods.
8.
  • Farahini, Nasim, et al. (author)
  • Parallel distributed scalable runtime address generation scheme for a coarse grain reconfigurable computation and storage fabric
  • 2014
  • In: Microprocessors and microsystems. - : Elsevier BV. - 0141-9331 .- 1872-9436. ; 38:8, pp. 788-802
  • Journal article (peer-reviewed); abstract:
    • This paper presents a hardware-based solution for a scalable runtime address generation scheme for DSP applications mapped to a parallel distributed coarse grain reconfigurable computation and storage fabric. The scheme can also deal with non-affine functions of multiple variables, which typically correspond to multiple nested loops. The key innovation is the judicious use of two categories of address generation resources. The first category is the low-cost AGU, which generates addresses within given address bounds for affine functions of up to two variables. Such low-cost AGUs are distributed and associated with every read/write port in the distributed memory architecture. The second category is more complex; it is also distributed, but shared among a few storage units, and it can handle more complex address generation requirements, such as dynamic computation of the address bounds that are then used to configure the AGUs, and transformation of non-affine functions into affine functions by computing the affine factor outside the loop. The runtime computation of the address constraints adds negligible overhead in latency, area and energy, while it substantially reduces program storage and energy and improves reconfiguration agility compared to the prevalent pre-computation of address constraints. The efficacy of the proposed method has been validated against the prevalent address generation schemes for a set of six realistic DSP functions. Compared to the pre-computation method, the proposed solution achieved 75% average code compaction, and compared to the centralized runtime address generation scheme, it achieved 32.7% average performance improvement.
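The two resource categories described in the abstract above can be illustrated in software. This is a minimal sketch under stated assumptions, not the paper's hardware design: the function names, parameters and the example address function base + (k*k)*i + j are all hypothetical, chosen only to show an affine generator of two variables being configured with a factor computed outside the loops.

```python
from typing import Iterator


def affine_agu(base: int, stride_i: int, count_i: int,
               stride_j: int, count_j: int) -> Iterator[int]:
    """Hypothetical low-cost AGU model: an affine function of two variables.

    Yields base + stride_i*i + stride_j*j for 0 <= i < count_i and
    0 <= j < count_j, with j as the inner loop variable.
    """
    for i in range(count_i):
        for j in range(count_j):
            yield base + stride_i * i + stride_j * j


def addresses_for_block(base: int, k: int, rows: int, cols: int) -> list[int]:
    """Hypothetical shared-resource model for base + (k*k)*i + j.

    With k as a runtime variable the term (k*k)*i is non-affine. Computing
    the factor k*k once, outside the loops, leaves an affine function of
    (i, j) that the simple generator above can handle.
    """
    stride_i = k * k  # affine factor computed once, outside the loops
    return list(affine_agu(base, stride_i, rows, 1, cols))


if __name__ == "__main__":
    print(addresses_for_block(base=10, k=2, rows=2, cols=2))
```

Here the loop counts play the role of the address bounds; in this sketch they are plain arguments, whereas the abstract describes them as being computed at runtime and used to configure the AGUs.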
9.
  • Farahini, Nasim, et al. (author)
  • Physical Design Aware System Level Synthesis of Hardware
  • 2015
  • In: Proceedings - Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS), 2015. - : IEEE. ; , pp. 141-148
  • Conference paper (peer-reviewed); abstract:
    • In spite of decades of research, only a small percentage of hardware is designed using high-level synthesis, because of the large gap between the abstraction levels of standard cells and the algorithmic level. We propose a grid-based regular physical design platform composed of large-grain hardened building blocks called SiLago blocks. This platform is divided into regions that are specialized for different functionalities such as computation, storage and system control. The characterized micro-architectural operations of the SiLago platform serve as the interface to a meet-in-the-middle high-level and system-level synthesis framework. This framework was used to generate three hardware macro instances, derived from the SiLago platform, for three applications from the signal processing domain. Results show two orders of magnitude improvement in the efficiency of system-level design space exploration and in synthesis time, with an average loss in design quality of 18% for energy and 54% for area compared to the commercial SoC flow.
10.
  •  