SwePub

Result list for search "WFRF:(Hemani Ahmed)"


  • Results 211-220 of 284
211.
  • Pudi, Dhilleswararao, et al. (author)
  • Implementation of Sobel Edge Detection on DRRA and DiMArch Architectures
  • 2023
  • In: Proceedings - 2023 26th Euromicro Conference on Digital System Design, DSD 2023. - : Institute of Electrical and Electronics Engineers (IEEE). ; , pp. 16-23
  • Conference paper (peer-reviewed)
    • Edge detection is a fundamental operation in image processing, serving as a crucial step in various applications such as object recognition, image segmentation, and scene understanding. The Sobel edge detection algorithm has emerged as a widely used method for detecting vertical and horizontal edges in digital images. However, performing edge detection on high-resolution images with large dimensions can be computationally intensive and time-consuming. Specialized hardware solutions such as Field Programmable Gate Arrays (FPGAs) and Coarse-Grained Reconfigurable Arrays (CGRAs) offer significant advantages over general-purpose processors for implementing edge detection algorithms. This paper proposes algorithms for implementing Sobel edge detection on two CGRA fabrics: the dynamically reconfigurable resource array (DRRA) and the distributed memory architecture (DiMArch). Furthermore, we discuss the implementation of Sobel edge detection on the target architecture for an input matrix of arbitrary size. Finally, the proposed approaches were compared with other CGRA-based implementations in terms of latency. The experimental results show that the proposed approaches exhibit significantly lower latency than other CGRA-based implementations.
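      For orientation, the Sobel operator named in this abstract can be sketched as a plain software reference. This is only an illustration of the standard 3x3 kernels, not the DRRA/DiMArch mapping the paper proposes; the function name `sobel` and the `|Gx| + |Gy|` magnitude approximation are assumptions made for this sketch.

```python
# Illustrative software reference for the Sobel operator.
# NOT the paper's CGRA implementation; names here are hypothetical.
GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]    # responds to vertical edges
GY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]    # responds to horizontal edges


def sobel(image):
    """Return |Gx| + |Gy| for each interior pixel of a 2-D list of ints.

    Border pixels are left at 0 because the 3x3 window does not fit there.
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0
            for dr in range(3):
                for dc in range(3):
                    pixel = image[r + dr - 1][c + dc - 1]
                    gx += GX[dr][dc] * pixel
                    gy += GY[dr][dc] * pixel
            out[r][c] = abs(gx) + abs(gy)
    return out


# A 4x4 image with a vertical step edge between columns 1 and 2.
step_image = [[0, 0, 10, 10] for _ in range(4)]
edges = sobel(step_image)
```

      Because the window works on any interior pixel, the same loop handles an input matrix of arbitrary size, which is the case the abstract says the paper addresses on the target architecture.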
212.
  • Pudi, Dhilleswararao, et al. (author)
  • Methodology for Structured Data-Path Implementation in VLSI Physical Design : A Case Study
  • 2022
  • In: Electronics. - : MDPI AG. - 2079-9292. ; 11:18
  • Journal article (peer-reviewed)
    • State-of-the-art modern microprocessor and domain-specific accelerator designs are dominated by data-paths composed of regular structures, also known as bit-slices. Random logic placement and routing techniques may not result in an optimal layout for these data-path-dominated designs. As a result, implementation tools such as Cadence's Innovus include a Structured Data-Path (SDP) feature that allows data-path placement to be completely customized by constraining the placement engine. A relative placement file is used to provide these constraints to the tool. However, the tool neither extracts nor automatically places the regular data-path structures. In other words, the relative placement file is not automatically generated. In this paper, we propose a semi-automated method for extracting bit-slices from the Innovus SDP flow. It has been demonstrated that the proposed method results in 17% less density or use for a pixel buffer design. At the same time, the other performance metrics are unchanged when compared to the traditional place and route flow.
213.
  • Raza, Asad, et al. (author)
  • Security characterization for evaluation of software architectures using ATAM
  • 2009
  • In: IEEE International Conference on Information and Communication Technologies, 2009. ICICT '09. - Karachi, Pakistan. ; , pp. 241-246
  • Conference paper (peer-reviewed)
    • Significant technological advancement in the current electronic era has influenced the work processes of private and government business entities. E-Government is one such area, in which almost every country is emphasizing and automating its work processes. Software architecture is an integral constituent of any software system: it not only involves cumbersome modeling and development but also requires heedful evaluation. With this in mind, this paper highlights the security evaluation of an ongoing e-society project, ESAM, using the Architectural Tradeoff Analysis Method (ATAM). ESAM is a web-based system intended to provide e-services to Swedish community residents. ATAM is primarily used for architectural evaluation aligned with an organization's quality goals, i.e. performance, availability and modifiability. We present research analysis for characterization, stimuli, and architectural decisions to evaluate software architecture with respect to security measures using ATAM. This security characterization will serve as a tool to evaluate security aspects of a software architecture using ATAM. We believe that ATAM's capability of evaluating software security will provide potential benefits in secure software development.
214.
  •  
215.
  • Rezk, Nesma, 1987- (author)
  • Exploring Efficient Implementations of Deep Learning Applications on Embedded Platforms
  • 2020
  • Licentiate thesis (other academic/artistic)
    • The promising results of deep learning (deep neural network) models in many applications such as speech recognition and computer vision have created a need for their realization on embedded platforms. Augmenting embedded platforms with deep learning (DL) lets them support intelligent tasks in smart homes, mobile phones, and healthcare applications. Deep learning models rely on intensive operations between high-precision values. In contrast, embedded platforms have restricted compute and energy budgets. Thus, it is challenging to realize deep learning models on embedded platforms. In this thesis, we define the objectives of implementing deep learning models on embedded platforms. The main objective is to achieve efficient implementations. The implementation should achieve high throughput, preserve low power consumption, and meet real-time requirements. The secondary objective is flexibility. It is not enough to propose an efficient hardware solution for one model. The proposed solution should be flexible enough to support changes in the model and the application constraints. Thus, the overarching goal of the thesis is to explore flexible methods for efficient realization of deep learning models on embedded platforms. Optimizations are applied to both the DL model and the embedded platform to increase implementation efficiency. To understand the impact of different optimizations, we chose recurrent neural networks (as a class of DL models) and compared their implementations on embedded platforms. The comparison analyzes the optimizations applied and the corresponding performance to provide conclusions on the most fruitful and essential optimizations. We concluded that it is essential to apply an algorithmic optimization to the model to decrease its compute and memory requirements, and that it is essential to apply a memory-specific optimization to hide the overhead of memory access to achieve high efficiency. Furthermore, it has been revealed that many of the works under study focus on implementation efficiency, while flexibility is attempted less often. We have explored the design space of convolutional neural networks (CNNs) on the Epiphany manycore architecture. We adopted a pipeline implementation of CNNs that relies solely on the on-chip memory to store the weights. Also, the proposed mapping supported both the AlexNet and GoogLeNet CNN models, varying precision for weights, and two memory sizes for Epiphany cores. We were able to achieve competitive performance with respect to emerging manycores. As a part of the work in progress, we have studied a DL-architecture co-design approach to increase the flexibility of hardware solutions. A flexible platform should support variations in the model and variations in optimizations. The optimization method should be automated to respond to changes in the model and application constraints with minor effort. Besides, the mapping of the models on embedded platforms should be automated as well.
216.
  • Rezk, Nesma, 1987-, et al. (author)
  • MOHAQ : Multi-Objective Hardware-Aware Quantization of recurrent neural networks
  • 2022
  • In: Journal of systems architecture. - Amsterdam : Elsevier BV. - 1383-7621 .- 1873-6165. ; 133
  • Journal article (peer-reviewed)
    • The compression of deep learning models is of fundamental importance in deploying such models to edge devices. The selection of compression parameters can be automated to meet changes in the hardware platform and application. This article introduces a Multi-Objective Hardware-Aware Quantization (MOHAQ) method, which considers hardware performance and inference error as objectives for mixed-precision quantization. The proposed method feasibly evaluates candidate solutions in a large search space by relying on two steps. First, post-training quantization is applied for fast solution evaluation (inference-only search). Second, we propose the "beacon-based search" to retrain selected solutions only and use them as beacons to estimate the effect of retraining on other solutions. We use speech recognition models on the TIMIT dataset. Experimental evaluations show that Simple Recurrent Unit (SRU)-based models can be compressed up to 8x by post-training quantization without any significant error increase. On SiLago, we found solutions that achieve 97% and 86% of the maximum possible speedup and energy saving, respectively, with a minor increase in error on an SRU-based model. On Bitfusion, the beacon-based search reduced the error gain of the inference-only search on SRU-based models and Light Gated Recurrent Unit (LiGRU)-based models by up to 4.9 and 3.9 percentage points, respectively.
217.
  • Shami, Muhammad Ali, 1980-, et al. (author)
  • Address generation scheme for a coarse grain reconfigurable architecture
  • 2011
  • In: Proc. IEEE Int Application-Specific Systems, Architectures and Processors (ASAP) Conf. - 9781457712920 ; , pp. 17-24
  • Conference paper (peer-reviewed)
    • In this paper, we describe a versatile address generation scheme for distributed storage resources of a coarse grain Parallel Distributed Digital Signal Processing (PDDSP) reconfigurable architecture under development in our group. This scheme proposes distributed address generation units (AGUs) to decouple the address generation logic from the compute logic in order to exploit parallelism (ILP and TLP). To achieve this, the proposed distributed address generation scheme supports standard DSP address generation modes such as linear vectorized, circular buffer and bit-reverse addressing, all with parameterizable address range and increment/decrement offsets. It is further enhanced with temporal flexibility by introducing three dynamically programmable delays: an initial delay before the stream starts, a middle delay after every address generation for the stream, and an end delay after the stream is complete. The dynamic programmability of these delays makes streams elastic, and elastic streams can be chained with an interrupt mechanism to create chained-elastic streams. Our approach is compared with the traditional approaches of using VLIW and Scalar. Our approach shows 21×(Scalar), 10×(VLIW) reduction in instructions and 2×(Scalar) reduction in cycles for a single-thread FIR filter. When compared for Synchronous and Asynchronous scenarios of two parallel threads T1 and T2, our approach shows 4.6×(Scalar), 5.6×(VLIW) reduction in instructions and 1.76× reduction in cycles for Synchronous threads, and 4.6×(Scalar), 15×(VLIW) reduction in instructions and 1.76×(Scalar) reduction in cycles for Asynchronous threads.
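      The addressing modes and the three delays listed in this abstract can be modelled in a few lines of software. The sketch below is a behavioural illustration only, not the paper's AGU hardware: the function names, the parameter names, the one-address-per-cycle timing, and the choice to apply the middle delay after the last address as well are all assumptions made here.

```python
# Behavioural sketch of an address stream with programmable delays.
# All names and timing conventions are hypothetical, not from the paper.

def bit_reverse(value, bits):
    """Reverse the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def address_stream(mode, base, count, step=1, size=None,
                   initial_delay=0, middle_delay=0, end_delay=0):
    """Return ([(cycle, address), ...], cycle_at_which_stream_is_done).

    mode: "linear", "circular" (wraps inside `size` words) or
          "bit_reverse" (`size` must be a power of two).
    """
    if mode in ("circular", "bit_reverse") and not size:
        raise ValueError("size is required for this mode")
    events = []
    cycle = initial_delay                 # wait before the stream starts
    for i in range(count):
        if mode == "linear":
            offset = i * step
        elif mode == "circular":
            offset = (i * step) % size
        elif mode == "bit_reverse":
            offset = bit_reverse(i, size.bit_length() - 1)
        else:
            raise ValueError("unknown mode: " + mode)
        events.append((cycle, base + offset))
        cycle += 1 + middle_delay         # one issue cycle plus the gap
    return events, cycle + end_delay      # idle time after the last address


linear_events, linear_done = address_stream(
    "linear", base=0, count=3, step=2,
    initial_delay=2, middle_delay=1, end_delay=3)
circular_events, _ = address_stream("circular", base=16, count=5, step=3, size=4)
fft_order, _ = address_stream("bit_reverse", base=0, count=8, size=8)
```

      Returning the completion cycle is what makes chaining easy to model: under these assumptions, a second stream could simply use the first stream's completion cycle as its own `initial_delay`.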
218.
  • Shami, Muhammad Ali, 1980-, et al. (author)
  • An improved self-reconfigurable interconnection scheme for a Coarse Grain Reconfigurable Architecture
  • 2010
  • In: NORCHIP 2010. - 9781424489732
  • Conference paper (peer-reviewed)
    • An improved dynamic, partial and self-reconfigurable interconnection network (Hybrid-2 network) is presented for the Dynamically Reprogrammable Resource Array (DRRA), which is a Coarse Grain Reconfigurable Architecture (CGRA). To justify the design decision, the Hybrid-2 network implementation is compared against possible implementations using Multiplexer, NoC, Crossbar and the already published Hybrid-1 interconnection network. Results show that the newly presented Hybrid-2 interconnection network takes (1.08x, 0.104x, 0.212x and 0.681x) the area and (1x, 0.037x, 0.026x and 0.107x) the configuration bits of the Multiplexer, NoC, Crossbar and Hybrid-1 implementations, respectively. The Hybrid-2 network is also 2.87x and 5.86x faster than the Multiplexer and Hybrid-1 networks.
219.
  • Shami, Muhammad Ali, et al. (author)
  • Classification of Massively Parallel Computer Architectures
  • 2012
  • In: Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2012. - : IEEE. - 9780769546766 ; , pp. 344-351
  • Conference paper (peer-reviewed)
    • Faced with slowing performance and energy benefits of technology scaling, VLSI/Computer architectures have turned from parallel to massively parallel machines for personal and embedded applications in the form of multi and many core architectures. Additionally, in the pursuit of finding the sweet spot between engineering and computational efficiency, massively parallel Coarse Grain Reconfigurable Architectures (CGRAs) have been researched. While these architectures have been surveyed, they have not been rigorously classified to enable objective differentiation and comparison for performance, area and flexibility. In this paper, we extend the well-known Skillicorn taxonomy to create new classes, present a scoring system to rate these classes on flexibility, and present equations for early estimation of area and configuration overheads. Furthermore, we use this extended classification scheme to classify and compare 25 different massively parallel architectures that cover most of the reported CGRAs and other well-known multi and many core architectures.
220.
  • Shami, Muhammad Ali, et al. (author)
  • Configurable FFT Processor Using Dynamically Reconfigurable Resource Arrays
  • 2019
  • In: Journal of Signal Processing Systems. - : SPRINGER. - 1939-8018 .- 1939-8115. ; 91:5, pp. 459-473
  • Journal article (peer-reviewed)
    • This paper presents results of using a Coarse Grain Reconfigurable Architecture called DRRA (Dynamically Reconfigurable Resource Array) for FFT implementations varying in order and degree of parallelism using radix-2 decimation in time (DIT). The DRRA fabric is extended with a memory architecture to be able to deal with data sets much larger than what can be accommodated in the register files of DRRA. The proposed implementation scheme is generic in terms of the number of FFT points, the size of the memory and the size of the register file in DRRA. Two implementations (DRRA-1 and DRRA-2) have been synthesized in 65 nm technology, and energy/delay numbers were measured with post-layout annotated gate-level simulations. The results are compared to other Coarse Grain Reconfigurable Architectures (CGRAs) and to dedicated FFT processors for 1024- and 2048-point FFTs. For the 1024-point FFT, in terms of FFT operations per unit energy, DRRA-1 and DRRA-2 outperform all CGRAs by at least 2x and are worse than the ASIC by 3.45x. However, in terms of energy-delay product, DRRA-2 outperforms CGRAs by at least 1.66x and dedicated FFT processors by at least 10.9x. For the 2048-point FFT, DRRA-1 and DRRA-2 are 10x better for energy efficiency and 94.84x better for energy-delay product. However, the radix-2 implementation is worse by 9.64x and 255x in terms of energy efficiency and energy-delay product when compared against a radix-2(4) implementation.
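      As background for the radix-2 decimation-in-time (DIT) scheme this abstract refers to, the textbook recursive form can be sketched as follows. It is a generic software illustration, not the DRRA mapping from the paper; the function name `fft_radix2_dit` is an assumption made for this sketch.

```python
import cmath

# Textbook recursive radix-2 DIT FFT, for illustration only.
# It does not model the DRRA fabric, its memory, or its parallelism.


def fft_radix2_dit(x):
    """Return the DFT of x; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2_dit(x[0::2])   # DFT of even-indexed samples
    odd = fft_radix2_dit(x[1::2])    # DFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: combine the two half-size DFTs with a twiddle factor.
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


spectrum = fft_radix2_dit([1, 1, 1, 1, 0, 0, 0, 0])
```

      Each level of the recursion halves the problem, so an N-point transform uses log2(N) butterfly stages; the number of points is the parameter the abstract says the proposed scheme is generic in.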