SwePub
Result list for search "WFRF:(Hemani Ahmed) ;pers:(Shami Muhammad Ali 1980)"

  • Results 1-7 of 7
1.
  • Shami, Muhammad Ali, 1980-, et al. (author)
  • Address generation scheme for a coarse grain reconfigurable architecture
  • 2011
  • In: Proc. IEEE Int Application-Specific Systems, Architectures and Processors (ASAP) Conf. - 9781457712920 ; pp. 17-24
  • Conference paper (peer-reviewed); abstract:
    • In this paper, we describe a versatile address generation scheme for the distributed storage resources of a coarse grain Parallel Distributed Digital Signal Processing (PDDSP) reconfigurable architecture under development in our group. The scheme proposes distributed address generation units (AGUs) that decouple the address generation logic from the compute logic in order to exploit parallelism (ILP and TLP). To achieve this, the proposed distributed address generation scheme supports standard DSP address generation modes such as linear vectorized, circular buffer and bit-reverse addressing, all with parameterizable address range and increment/decrement offsets. It is further enhanced with temporal flexibility by introducing three dynamically programmable delays: an initial delay before the stream starts, a middle delay after every address generation for the stream and an end delay after the stream is complete. The dynamic programmability of these delays makes the streams elastic, and elastic streams can be chained with an interrupt mechanism to create chained-elastic streams. Our approach is compared with the traditional approaches of using VLIW and Scalar. It shows a 21× (Scalar) and 10× (VLIW) reduction in instructions and a 2× (Scalar) reduction in cycles for a single-thread FIR filter. When compared for Synchronous and Asynchronous scenarios of two parallel threads T1 and T2, our approach shows a 4.6× (Scalar) and 5.6× (VLIW) reduction in instructions and a 1.76× reduction in cycles for Synchronous threads, and a 4.6× (Scalar) and 15× (VLIW) reduction in instructions and a 1.76× (Scalar) reduction in cycles for Asynchronous threads.
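The addressing modes and the three delays named in the abstract above can be pictured with a small software model. The Python sketch below is illustrative only: the function names, the parameters and the convention of yielding one item per cycle (an address, or None for an idle cycle) are assumptions made here, not the paper's AGU interface or its hardware design.

```python
from typing import Iterator, Optional


def bit_reverse(value: int, width: int) -> int:
    """Reverse the lowest `width` bits of `value`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def address_stream(
    mode: str,
    start: int,
    count: int,
    step: int = 1,
    length: int = 0,
    initial_delay: int = 0,
    middle_delay: int = 0,
    end_delay: int = 0,
) -> Iterator[Optional[int]]:
    """Hypothetical model of one address stream.

    Yields one item per cycle: an address, or None for an idle cycle.
    `length` is the circular buffer size and is only used in "circular" mode.
    In "bit_reverse" mode `count` is assumed to be a power of two.
    """
    if mode not in ("linear", "circular", "bit_reverse"):
        raise ValueError(f"unknown mode: {mode}")
    if mode == "circular" and length <= 0:
        raise ValueError("circular mode needs a positive buffer length")

    # Initial delay: idle cycles before the stream starts.
    for _ in range(initial_delay):
        yield None

    width = (count - 1).bit_length() if count > 0 else 0
    for i in range(count):
        if mode == "linear":
            yield start + i * step
        elif mode == "circular":
            # Wrap the offset inside the buffer; works for negative steps too.
            yield start + (i * step) % length
        else:
            yield start + bit_reverse(i, width)
        # Middle delay: idle cycles after every generated address.
        for _ in range(middle_delay):
            yield None

    # End delay: idle cycles after the stream is complete.
    for _ in range(end_delay):
        yield None
```

For example, `list(address_stream("bit_reverse", 0, 8))` gives the index order commonly used to reorder data for an 8-point radix-2 FFT, and raising `middle_delay` stretches the same stream over more cycles without changing the addresses.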
2.
  • Shami, Muhammad Ali, 1980-, et al. (author)
  • An improved self-reconfigurable interconnection scheme for a Coarse Grain Reconfigurable Architecture
  • 2010
  • In: NORCHIP 2010. - 9781424489732
  • Conference paper (peer-reviewed); abstract:
    • An improved dynamic, partial and self-reconfigurable interconnection network (Hybrid-2 network) is presented for the Dynamically Reprogrammable Resource Array (DRRA), which is a Coarse Grain Reconfigurable Architecture (CGRA). To justify the design decision, the Hybrid-2 network implementation is compared against possible implementations using Multiplexer, NoC, Crossbar and the already published Hybrid-1 interconnection network. Results show that the newly presented Hybrid-2 interconnection network takes (1.08x, 0.104x, 0.212x and 0.681x) the area and (1x, 0.037x, 0.026x and 0.107x) the configuration bits of the Multiplexer, NoC, Crossbar and Hybrid-1 implementations respectively. The Hybrid-2 network is also 2.87x and 5.86x faster than the Multiplexer and Hybrid-1 networks.
3.
  • Shami, Muhammad Ali, 1980-, et al. (author)
  • Control Scheme for a CGRA
  • 2010
  • In: Proc. 22nd Int Computer Architecture and High Performance Computing (SBAC-PAD) Symp. - 9780769542164 ; pp. 17-24
  • Conference paper (peer-reviewed); abstract:
    • The ability to instantiate low cost and agile FSMs that can implement arbitrary parallelism, and to combine such FSMs in a chain and in a hierarchy, is one of the key differentiating factors between ASICs and MPSOCs. CGRAs that have been reported in the literature, like MPSOCs, also lack this ASIC-like ability. The downside of ASICs is their lack of reuse and high engineering cost. We present a CGRA architecture that retains the programmability of a CGRA and yet has the ASIC-like ability to a) construct an arbitrarily parallel data-path/FSM combination, b) chain an arbitrary number of such FSMs and c) create a hierarchy of such chains. We present in detail the architecture of such a control scheme and illustrate its use for an example composed of FFT and FIRs. We quantify the benefits of our approach by benchmarking for energy-delay product against a) ASICs (4.8X worse), b) a state-of-the-art CGRA (4.58X better) and c) FPGAs (63.95X better).
4.
  • Shami, Muhammad Ali, 1980- (author)
  • Dynamically Reconfigurable Resource Array
  • 2012
  • Doctoral thesis (other academic/artistic); abstract:
    • The goals set by the International Technology Roadmap for Semiconductors (ITRS) for the consumer portable category, to be realized by 2020, are a 1000X improvement in performance with only a 40% increase in power budget and no increase in design team size. To meet these goals, the challenges facing the VLSI community are gaps in architecture efficacy, design productivity and battery capacity. As the causes of the gaps in architecture efficacy and battery capacity, this thesis identifies: a) instruction granularity mismatch, b) bit-width granularity mismatch, c) silicon granularity mismatch and d) parallelism mismatch. Field Programmable Gate Array (FPGA) technology can address instruction/bit-width granularity and parallelism mismatch but suffers from silicon granularity mismatch due to high reconfiguration overheads. The ultimate design goal of a system-on-chip is to achieve ASIC-like performance and FPGA-like flexibility, design time and cost. Coarse Grain Reconfigurable Architectures (CGRAs) are a compromise between ASIC and FPGA since they provide better computational efficiency compared to FPGAs and better engineering efficiency compared to ASICs. However, the current generation of CGRAs lacks many architectural properties that would enable mainstream industry to adopt them in place of ASICs and/or FPGAs. To objectively discuss these properties, the first part of the thesis proposes a classification scheme that classifies parallel computing machines into 47 classes and proposes how they can be graded in terms of flexibility. We apply this classification scheme to academic and industrial reconfigurable architectures to compare them for their similarities and differences. We identify an instruction flow spatial computing class to be used for a CGRA fabric called Dynamically Reconfigurable Resource Array (DRRA), presented in the second part of this thesis.
      The DRRA fabric is a Parallel Distributed Digital Signal Processing (PDDSP) fabric with distributed arithmetic, logic, interconnect and control resources. Problems associated with the distributed control model of DRRA are identified, and architectural solutions that can be exploited by the compiler tools are presented. After logical and physical synthesis, DRRA shows a peak performance of 21 GOPS and a peak silicon efficiency of 16.03 GOPS/mm². We further performed a three-level validation of the DRRA fabric. At the first level, we mapped a number of signal and compute intensive algorithms to demonstrate the flexibility of the DRRA fabric. At the second level, we measured the gap between ASIC, DRRA and FPGA. On average, DRRA shows improvements over FPGA of 22.87x in area, 10.75x in power consumption, 852x in configuration bits, 959x in configuration cycles, 63.94x in silicon efficiency, 4.78x in computational efficiency and 6.15E+10x in energy-delay product. Finally, at the third level, we present the use of DRRA for a real world example: implementing a 128-, 256-, 512-, 1024- and 2048-point configurable FFT processor. For the 1024-point FFT, in terms of computational efficiency, DRRA outperforms all CGRAs by at least 2x and is worse than ASIC by 3.45x. As regards silicon efficiency, although dedicated processors perform 1.6x better, DRRA is better than all other CGRAs.
5.
  • Shami, Muhammad Ali, 1980-, et al. (author)
  • Morphable DPU : Smart and Efficient Data Path for Signal Processing Applications
  • 2009
  • In: SIPS: 2009 IEEE WORKSHOP ON SIGNAL PROCESSING SYSTEMS. - 9781424443345 ; pp. 167-172
  • Conference paper (peer-reviewed); abstract:
    • A coarse grained morphable Datapath Unit (mDPU) is proposed. The mDPU implements the multiplier in a way that enables its component adders to be reused when the multiplier is not needed. A pipelined design further enhances the mDPU by creating a datapath that is balanced in the temporal sense. These two features result in a design that optimally uses silicon and time. The mDPU enables a judicious set of coarse granular instructions that, as we show, can implement typical signal processing functions. A radix-2 64-point FFT has been implemented in 90 nm technology using the proposed mDPUs, and performance and energy results from the physical design phase are reported and compared to a state-of-the-art comparable design from the research community. A 4X improvement in performance and a 2.5X improvement in power-performance product are reported.
6.
  • Shami, Muhammad Ali, 1980-, et al. (author)
  • Partially Reconfigurable Interconnection Network for Dynamically Reprogrammable Resource Array
  • 2009
  • In: 2009 IEEE 8TH INTERNATIONAL CONFERENCE ON ASIC, VOLS 1 AND 2, PROCEEDINGS. - NEW YORK : IEEE ; pp. 122-125
  • Conference paper (peer-reviewed); abstract:
    • This paper describes an innovative regular, non-blocking, point-to-point and point-to-multipoint, low latency interconnection network scheme with sliding window connectivity, which allows arbitrary parallelism among large sub-systems. The area overhead of the interconnect is only 30% of the chip area, which is much smaller than the 80% in the case of FPGAs. The interconnection scheme is partially and dynamically reconfigurable. The configware is reduced 5.6 times by using binary encoding, which allows energy-efficient dynamic reconfiguration.
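The general effect of binary encoding on configuration size, mentioned in the abstract above, can be illustrated with a short calculation. The Python sketch below assumes a one-hot baseline (one configuration bit per selectable source) and an optional extra code for "not connected"; both are assumptions made here for illustration. The abstract does not state the baseline encoding or the number of sources, so the sketch does not attempt to reproduce the reported 5.6 times figure.

```python
def one_hot_bits(num_sources: int) -> int:
    """Configuration bits for a one-hot select: one bit per source (assumed baseline)."""
    return num_sources


def binary_bits(num_sources: int, include_idle_code: bool = True) -> int:
    """Configuration bits for a binary-encoded select.

    `include_idle_code` reserves one extra code for "not connected";
    this is a hypothetical detail, not taken from the paper.
    """
    codes = num_sources + (1 if include_idle_code else 0)
    # Bits needed to give each code a distinct value, with a minimum of 1.
    return max(1, (codes - 1).bit_length())


def reduction_factor(num_sources: int, include_idle_code: bool = True) -> float:
    """Ratio of one-hot bits to binary-encoded bits for one input port."""
    return one_hot_bits(num_sources) / binary_bits(num_sources, include_idle_code)
```

With a hypothetical 16 selectable sources per input, `reduction_factor(16, include_idle_code=False)` is 4.0 (16 bits against 4), and the ratio grows as the number of sources increases.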
7.
  • Tajammul, Muhammad Adeel, 1982-, et al. (author)
  • A NoC based distributed memory architecture with programmable and partitionable capabilities
  • 2010
  • In: 28th Norchip Conference, NORCHIP 2010. - 9781424489732
  • Conference paper (peer-reviewed); abstract:
    • The paper focuses on the design of a Network-on-Chip based programmable and partitionable distributed memory architecture which can be integrated with a Coarse Grain Reconfigurable Architecture (CGRA). The proposed interconnect enables better interaction between the computation fabric and the memory fabric. The system can modify its ratio of memory to computation elements at runtime. The capabilities of the memory system are analyzed by interfacing it with a Dynamically Reconfigurable Resource Array (DRRA), a CGRA. The interconnect can provide multiple interfaces, each of which supports up to 8 GB/s.