SwePub

Search results for "WFRF:(Hemani Ahmed) srt2:(2020-2024)"

  • Results 1-10 of 32
1.
  • Altayo Gonzalez, u1dr0yqp, et al. (authors)
  • Synthesis of Predictable Global NoC by Abutment in Synchoros VLSI Design
  • 2021
  • Published in: Proceedings - 2021 15th IEEE/ACM International Symposium on Networks-on-Chip, NOCS 2021. New York, NY, USA: Association for Computing Machinery (ACM), pp. 61-66.
  • Conference paper (peer-reviewed)
    • The synchoros VLSI design style has been proposed as an alternative to the standard cell-based design style; the word synchoros is derived from the Greek word choros, meaning space. Synchoricity discretises space with a virtual grid, the way synchronicity discretises time with clock ticks. SiLago (Silicon Lego) blocks are atomic synchoros building blocks, like Lego bricks. SiLago blocks absorb all metal-layer details, i.e., all wires, so that composing them by abutment yields valid designs: valid in the sense of being compliant with technology design rules, timing clean, and OCV ruggedized. Effectively, composition by abutment eliminates logic and physical synthesis for the end user. Like the Lego system, synchoricity requires a finite number of SiLago block types to cater to different types of designs. Global NoCs are important system-level design components. In this paper, we show how, with a small library of SiLago blocks for global NoCs, arbitrary global NoCs of different types, dimensions, and topologies can be synthesized automatically. The synthesized global NoCs are not only valid VLSI designs; their cost metrics (area, latency, and energy) are also known with post-layout accuracy in linear time. We argue that this is essential for chip-level design space exploration. We show how the abstract timing model of such global NoC SiLago blocks can be built and used to analyse the timing of global NoC links with post-layout accuracy and in linear time. We validate this claim by subjecting the same global NoC VLSI designs to static timing analysis in commercial EDA tools and showing that the abstract timing analysis enabled by synchoros VLSI design gives the same results as the commercial EDA tools.
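The linear-time cost claim follows from composition by abutment: if every SiLago block type carries pre-characterized cost figures, the cost of a link is a single pass over the blocks it traverses. The sketch below illustrates that idea under loud assumptions: `BlockModel`, the block names `router` and `wire_segment`, and every number are hypothetical placeholders, not characterized SiLago data, and delays are assumed to add along an unregistered link. It is not the paper's abstract timing model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockModel:
    """Pre-characterized cost figures for one (hypothetical) SiLago block type."""
    delay_ps: float   # input-to-output delay through the block
    area_um2: float
    energy_fj: float  # energy per transferred bit


def link_cost(path, library, clock_period_ps):
    """Accumulate the cost of a NoC link in a single pass over its blocks.

    Assumes an unregistered link whose delays simply add across abutted
    blocks, so the work is linear in the number of blocks. Negative slack
    means the link does not meet the clock period.
    """
    delay = area = energy = 0.0
    for block_type in path:
        model = library[block_type]
        delay += model.delay_ps
        area += model.area_um2
        energy += model.energy_fj
    return {"delay_ps": delay, "area_um2": area, "energy_fj": energy,
            "slack_ps": clock_period_ps - delay}


# Invented example library; real figures would come from post-layout characterization.
LIBRARY = {
    "router": BlockModel(delay_ps=180.0, area_um2=2400.0, energy_fj=3.0),
    "wire_segment": BlockModel(delay_ps=45.0, area_um2=300.0, energy_fj=0.5),
}

cost = link_cost(["router", "wire_segment", "wire_segment", "router"], LIBRARY, 500.0)
print(cost)
```

Because no block-level netlist is re-analysed, lengthening a link by one block adds one dictionary lookup and a few additions, which is what makes such estimates cheap enough for design space exploration.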
2.
  • Baccelli, Guido, et al. (authors)
  • NACU: A Non-Linear Arithmetic Unit for Neural Networks
  • 2020
  • Published in: Proceedings of the 2020 57th ACM/EDAC/IEEE Design Automation Conference (DAC). IEEE.
  • Conference paper (peer-reviewed)
    • Reconfigurable architectures targeting neural networks are an attractive option. They allow multiple neural networks of different types to be hosted on the same hardware, in parallel or in sequence. Reconfigurability also grants the ability to morph into different micro-architectures to meet varying power-performance constraints. In this context, the need for a reconfigurable non-linear computational unit has not been widely researched. In this work, we present a formal and comprehensive method to select the optimal fixed-point representation, i.e., the one that achieves the highest accuracy against a floating-point benchmark implementation. We also present a novel design of an optimised reconfigurable arithmetic unit for calculating non-linear functions. The unit can be dynamically configured to calculate the sigmoid, hyperbolic tangent, and exponential functions using the same underlying hardware. We compare our work with the state of the art and show that our unit can calculate all three functions without loss of accuracy.
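Sharing one datapath among the three functions is possible because sigmoid and tanh can both be expressed through the exponential: sigmoid(x) = 1 / (1 + exp(-x)) and tanh(x) = 2 * sigmoid(2x) - 1. The sketch below shows that reuse together with a fixed-point quantizer and a brute-force search for the fractional width. Assumptions: `math.exp` stands in for the hardware exponential core, the Q-format parameters (`frac_bits`, `word_bits`) and all function names are hypothetical, and the sweep in `select_frac_bits` is a simple stand-in for, not a reproduction of, the paper's formal selection method. With 16-bit words and 12 fractional bits the representable range is roughly [-8, 8), so the exp output saturates for inputs above ln 8 (about 2.08).

```python
import math


def quantize(x, frac_bits=12, word_bits=16):
    """Round x onto a signed fixed-point grid, saturating at the word limits."""
    scale = 1 << frac_bits
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    return max(lo, min(hi, round(x * scale))) / scale


def exp_core(x):
    """Stand-in for the shared exponential datapath.

    The argument is capped to avoid float overflow, loosely mimicking saturation.
    """
    return math.exp(min(x, 700.0))


def nonlinear(x, func, frac_bits=12, word_bits=16):
    """Evaluate exp, sigmoid or tanh; every path goes through the same exp_core."""
    x = quantize(x, frac_bits, word_bits)
    if func == "exp":
        y = exp_core(x)
    elif func == "sigmoid":
        y = 1.0 / (1.0 + exp_core(-x))
    elif func == "tanh":
        y = 2.0 / (1.0 + exp_core(-2.0 * x)) - 1.0  # tanh(x) = 2*sigmoid(2x) - 1
    else:
        raise ValueError(f"unknown function: {func}")
    return quantize(y, frac_bits, word_bits)


def select_frac_bits(func, samples, word_bits=16):
    """Brute-force sweep: return (frac_bits, max_abs_error) with the lowest error."""
    reference = {"exp": math.exp,
                 "sigmoid": lambda v: 1.0 / (1.0 + math.exp(-v)),
                 "tanh": math.tanh}[func]
    best = None
    for frac in range(1, word_bits):
        err = max(abs(nonlinear(v, func, frac, word_bits) - reference(v)) for v in samples)
        if best is None or err < best[1]:
            best = (frac, err)
    return best
```

The sweep makes the range/resolution trade-off visible: more fractional bits reduce rounding error but shrink the input range, so large inputs start to saturate.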
3.
  • Dhilleswararao, Pudi, et al. (authors)
  • Efficient Implementation of 2-D Convolution on DRRA and DiMArch Architectures
  • 2023
  • Published in: Proceedings of the 13th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies, HEART 2023. Association for Computing Machinery (ACM), pp. 86-92.
  • Conference paper (peer-reviewed)
    • Convolution is widely employed in image processing and computer vision applications such as picture augmentation, smoothing, and structure extraction. In addition, convolution operations are the most prevalent computing patterns in machine learning. For example, convolutions account for a substantial share of the operations in state-of-the-art convolutional neural networks. Effectively mapping convolution operations onto hardware architectures is therefore crucial for achieving superior performance when accelerating convolutional neural networks. In this paper, we propose various algorithms to efficiently map the 2-D convolution operation onto the Dynamically Reconfigurable Resource Array (DRRA) and the Distributed Memory Architecture (DiMArch). Furthermore, we discuss the mapping of 2-D convolution on the target architecture for an input matrix of arbitrary size, as well as the generalization of the proposed approaches to multi-column DRRA architectures.
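As a reference point for such mappings, the sketch below computes a "valid" 2-D convolution (implemented as cross-correlation without kernel flipping, the usual CNN convention) and then splits the output rows across a configurable number of hypothetical processing columns. The row-band partitioning is an illustrative assumption, not one of the paper's algorithms; it shows why arbitrary input sizes require uneven partitions and why each band needs a halo of `kh - 1` extra input rows.

```python
def conv2d_valid(image, kernel):
    """Direct 'valid' 2-D cross-correlation (the CNN convention: no kernel flip)."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(ow)]
            for r in range(oh)]


def conv2d_partitioned(image, kernel, num_columns):
    """Split output rows into contiguous bands, one per (hypothetical) column.

    Each band reads its own input rows plus a halo of kh - 1 extra rows,
    which a per-column data layout would have to provide.
    """
    kh = len(kernel)
    oh = len(image) - kh + 1
    base, extra = divmod(oh, num_columns)
    result, start = [], 0
    for col in range(num_columns):
        rows = base + (1 if col < extra else 0)  # uneven split for arbitrary sizes
        if rows == 0:
            continue
        band = image[start:start + rows + kh - 1]  # band rows plus halo
        result.extend(conv2d_valid(band, kernel))
        start += rows
    return result
```

Any partitioning that is correct must reproduce the single-column result exactly; the halo rows are the data that neighbouring columns would otherwise have to exchange.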
4.
5.
  • Iqbal, W., et al. (authors)
  • PCSS: Privacy Preserving Communication Scheme for SDN Enabled Smart Homes
  • 2022
  • Published in: IEEE Sensors Journal. Institute of Electrical and Electronics Engineers (IEEE). ISSN 1530-437X, 1558-1748. Vol. 22, no. 18, pp. 17677-17690.
  • Journal article (peer-reviewed)
    • Smart home technology, also known as home automation, allows homeowners and residents to control and monitor smart devices such as HVAC systems, fridges, doors, and cameras. These features offer users peace of mind by providing a safe and well-suited environment. At the same time, however, cybercriminals exploit these connected devices to carry out various sophisticated attacks, because currently produced smart devices have little or no security functionality. Because the devices lack authentication and transmit data in plain text, intruders can obtain user profiles, learn user behavior, and even inject malware into unauthenticated devices. Authentication and privacy-preserving user queries therefore remain key issues for the wide adoption of such technologies. Adding to this dilemma, traditional security solutions cannot be deployed on devices with low processing power. Overcoming the security issues of these low-power gadgets therefore requires lightweight, network-level cryptographic security mechanisms in which processing is done at a network-level middlebox rather than on resource-constrained end devices. In this respect, the evolving Software Defined Networking (SDN) paradigm offers properties such as programmability, agility, centralized management, and vendor neutrality, which overcome conventional networking control, management, and security problems. The SDN controller at the control layer handles all computation and complexity at the network level rather than on the smart devices. In this research, we therefore present a privacy preserving communication scheme for SDN-enabled smart homes (PCSS), which aims to provide user and smart device authentication, privacy for data (at rest and in transit), and privacy for user queries. It hinders any intruder from learning or modifying data during transmission and features mutual authentication of the user, controller, and smart device. PCSS also offers privacy-preserving user queries for smart homes. This is achieved through a lightweight symmetric-key-based protocol for authentication and searchable encrypted queries. Experimental results show the efficacy and usefulness of the PCSS scheme compared with existing secure smart home/system protocols.
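The abstract names two symmetric-key building blocks: mutual authentication and searchable encrypted queries. The sketch below shows generic textbook versions of both using only Python's standard library: an HMAC-SHA256 challenge-response in which each party proves knowledge of a pre-shared key, and keyword "trapdoors" derived with a keyed HMAC so that a middlebox can match a query against an index without seeing the plaintext keyword. It is not the PCSS protocol; the key distribution, message layout, identity labels, and function names are assumptions for illustration. Note that deterministic trapdoors reveal when the same query is repeated, one of the trade-offs a real scheme has to address.

```python
import hashlib
import hmac
import secrets


def respond(key: bytes, challenge: bytes, identity: bytes) -> bytes:
    """Prove knowledge of the shared key by MACing the peer's challenge.

    Binding the responder's identity prevents a response from being reflected back.
    """
    return hmac.new(key, identity + challenge, hashlib.sha256).digest()


def mutual_authenticate(key_a: bytes, key_b: bytes) -> bool:
    """Parties A and B (e.g. user and controller) authenticate each other.

    Each side issues a fresh random nonce and checks the other's response
    with a constant-time comparison; both must succeed.
    """
    nonce_a, nonce_b = secrets.token_bytes(16), secrets.token_bytes(16)
    resp_from_b = respond(key_b, nonce_a, b"B")  # B answers A's challenge
    resp_from_a = respond(key_a, nonce_b, b"A")  # A answers B's challenge
    a_accepts = hmac.compare_digest(resp_from_b, respond(key_a, nonce_a, b"B"))
    b_accepts = hmac.compare_digest(resp_from_a, respond(key_b, nonce_b, b"A"))
    return a_accepts and b_accepts


def trapdoor(search_key: bytes, keyword: str) -> bytes:
    """Keyed, deterministic token that hides the keyword from parties without the key."""
    return hmac.new(search_key, keyword.encode("utf-8"), hashlib.sha256).digest()


def build_index(search_key: bytes, records: dict) -> dict:
    """Map each keyword token to the ids of the records containing that keyword."""
    index = {}
    for record_id, keywords in records.items():
        for kw in keywords:
            index.setdefault(trapdoor(search_key, kw), set()).add(record_id)
    return index
```

A user holding the search key sends `trapdoor(key, "motion")`; the middlebox looks the token up in the index and returns matching record ids without learning the word "motion".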
6.
  • Jafri, Syed, et al. (authors)
  • Refresh Triggered Computation: Improving the Energy Efficiency of Convolutional Neural Network Accelerators
  • 2021
  • Published in: ACM Transactions on Architecture and Code Optimization (TACO). Association for Computing Machinery (ACM). ISSN 1544-3566, 1544-3973. Vol. 18, no. 1.
  • Journal article (peer-reviewed)
    • To employ a Convolutional Neural Network (CNN) in an energy-constrained embedded system, it is critical for the CNN implementation to be highly energy efficient. Many recent studies propose CNN accelerator architectures with custom computation units that try to improve the energy efficiency and performance of CNNs by minimizing data transfers from DRAM-based main memory. However, in these architectures, DRAM is still responsible for half of the overall energy consumption of the system, on average. A key contributor to the high energy consumption of DRAM is the refresh overhead, which is estimated to consume 40% of the total DRAM energy. In this article, we propose a new mechanism, Refresh Triggered Computation (RTC), that exploits the memory access patterns of CNN applications to reduce the number of refresh operations. RTC uses two major techniques to mitigate the refresh overhead. First, Refresh Triggered Transfer (RTT) is based on our new observation that a CNN application accesses a large portion of the DRAM in a predictable and recurring manner. Thus, the read/write accesses of the application inherently refresh the DRAM, and therefore a significant fraction of refresh operations can be skipped. Second, Partial Array Auto-Refresh (PARR) eliminates the refresh operations to DRAM regions that do not store any data. We propose three RTC designs (min-RTC, mid-RTC, and full-RTC), each of which requires a different level of aggressiveness in terms of customization to the DRAM subsystem. All of our designs have small overhead. Even the most aggressive RTC design (i.e., full-RTC) imposes an area overhead of only 0.18% in a 16 Gb DRAM chip and can have less overhead for denser chips. Our experimental evaluation on six well-known CNNs shows that RTC reduces average DRAM energy consumption by 24.4% and 61.3% for the least aggressive and the most aggressive RTC implementations, respectively.
Besides CNNs, we also evaluate our RTC mechanism on three workloads from other domains. We show that RTC saves 31.9% and 16.9% DRAM energy for Face Recognition and Bayesian Confidence Propagation Neural Network (BCPNN), respectively. We believe RTC can be applied to other applications whose memory access patterns remain predictable for a sufficiently long time.
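The two RTC observations can be illustrated with a toy refresh counter: a row that the application read or wrote during the current retention window has effectively been refreshed already (the RTT idea), and a row that stores no data needs no refresh at all (the PARR idea). A minimal sketch, assuming a hypothetical per-row model with one refresh decision per row per window, windows aligned to each row's refresh slot, and an invented access trace; real DRAM issues refreshes to groups of rows through auto-refresh commands, so this only shows the counting logic, not the article's designs.

```python
def count_refreshes(num_rows, windows, accesses, allocated,
                    skip_accessed=True, skip_unallocated=True):
    """Count refresh operations over `windows` consecutive retention windows.

    accesses: list of sets; accesses[w] holds the rows read or written in window w.
    allocated: set of rows that currently store data.
    A row is refreshed once per window unless it was accessed in that window
    (RTT-style skip) or holds no data (PARR-style skip).
    """
    refreshes = 0
    for w in range(windows):
        touched = accesses[w] if w < len(accesses) else set()
        for row in range(num_rows):
            if skip_unallocated and row not in allocated:
                continue
            if skip_accessed and row in touched:
                continue
            refreshes += 1
    return refreshes


# Hypothetical trace: 8 rows, 6 hold data, a layer streams over rows 0-3 every window.
rows_in_use = set(range(6))
trace = [{0, 1, 2, 3}] * 4
baseline = count_refreshes(8, 4, trace, rows_in_use, False, False)
with_both = count_refreshes(8, 4, trace, rows_in_use)
print(baseline, with_both)
```

The savings depend entirely on how much of the allocated memory the workload touches within every retention window, which is why the article ties RTC to applications with predictable, recurring access patterns.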
7.
  • Kallapu, Reeshita, et al. (authors)
  • DRRA-based Reconfigurable Architecture for Mixed-Radix FFT
  • 2023
  • Published in: Proceedings of the IEEE International Conference on VLSI Design. Institute of Electrical and Electronics Engineers (IEEE), pp. 25-30.
  • Conference paper (peer-reviewed)
    • The Fast Fourier Transform (FFT) is an important algorithm used in digital signal processing and communication applications. Mixed-radix FFT provides flexibility and increases the speed of FFT computation. For real-time processing, efficient hardware implementations on reconfigurable architectures are preferred because they can offer higher performance and flexibility. In this paper, we propose an FFT architecture that is derived from the Dynamically Reconfigurable Resource Array (DRRA), has multiple parallel processing cells, and provides the flexibility to select the radix for each stage of the FFT. The twiddle factor generator proposed in this architecture minimizes the memory requirements and simplifies the hardware. Using the proposed architecture, FFTs of various lengths were mapped onto either a single cell or multiple cells in parallel. We observe that the proposed architecture improves performance by 2x compared with existing FFT architectures.
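Mixed-radix decomposition lets each stage use a different radix, provided the product of the radices equals the transform length. The recursive decimation-in-time sketch below shows the arithmetic, including where the twiddle factors enter. The `radices` list is a hypothetical per-stage choice, twiddles are computed on the fly with `cmath.exp` rather than by a dedicated generator, and the code models the mathematics only, not the DRRA mapping or the paper's twiddle-factor generator.

```python
import cmath
import math


def mixed_radix_fft(x, radices):
    """Decimation-in-time FFT with one radix per stage (product must equal len(x))."""
    n = len(x)
    if n == 1:
        return list(x)
    if math.prod(radices) != n:
        raise ValueError("product of radices must equal the input length")
    r, m = radices[0], n // radices[0]
    # Split into r interleaved subsequences x[q], x[q + r], ... and transform each.
    subs = [mixed_radix_fft(x[q::r], radices[1:]) for q in range(r)]
    out = [0j] * n
    for k in range(n):
        acc = 0j
        for q in range(r):
            # Twiddle factor W_n^(q*k) rotates the q-th sub-transform before summing.
            acc += subs[q][k % m] * cmath.exp(-2j * math.pi * q * k / n)
        out[k] = acc
    return out
```

For example, a 12-point transform can be run as radix 2, 3, 2 or as radix 3, 4; both plans yield the same spectrum, which is the flexibility a per-stage radix choice exposes. The combine step here is written directly for clarity; a hardware butterfly would factor it further.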
8.
  • Liu, Lizheng, et al. (authors)
  • A FPGA-based Hardware Accelerator for Bayesian Confidence Propagation Neural Network
  • 2020
  • Published in: 2020 IEEE Nordic Circuits and Systems Conference, NORCAS 2020 - Proceedings. Institute of Electrical and Electronics Engineers (IEEE).
  • Conference paper (peer-reviewed)
    • The Bayesian Confidence Propagation Neural Network (BCPNN) has been applied to higher levels of cognitive intelligence (e.g., working memory, associative memory). However, in the spike-based version of this learning rule, the presynaptic, postsynaptic, and coincident activity is traced in three low-pass filtering stages, which makes the weight-update calculations very computationally intensive. In this paper, we propose a hardware architecture for the lazy-update mode that updates 8 local synaptic state variables. Parallelism is optimized by decomposing the calculation steps of the formulas according to their inherent data dependencies. The FPGA-based hardware accelerator of BCPNN is designed and implemented. Experimental results show that the update process on the FPGA can be completed within 110 ns at a clock frequency of 200 MHz, greatly improving update speed compared with the CPU test. We also evaluate the trade-off between performance, accuracy, and resources on dedicated hardware, as well as the impact of module reuse on resource consumption and computing performance.
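Lazy update exploits the fact that, between spikes, low-pass traces decay deterministically, so their values can be computed in closed form when a synapse is next touched instead of at every simulation step. The sketch below shows this for two cascaded stages, a trace z with time constant tau_z feeding a trace e with time constant tau_e (dz/dt = -z/tau_z, tau_e * de/dt = z - e), assuming no spikes arrive during the skipped interval and tau_z != tau_e. The variable names and time constants are illustrative; this is not the paper's set of eight state variables or its formula decomposition. Solving the linear ODEs gives z(t0 + dt) = z0 * exp(-dt/tau_z) and e(t0 + dt) = e0 * exp(-dt/tau_e) + z0 * tau_z / (tau_z - tau_e) * (exp(-dt/tau_z) - exp(-dt/tau_e)).

```python
import math


def lazy_update(z0, e0, dt, tau_z, tau_e):
    """Advance the z -> e cascade by dt in one step (no spikes in between).

    Exact solution of dz/dt = -z/tau_z and de/dt = (z - e)/tau_e; requires tau_z != tau_e.
    """
    dz = math.exp(-dt / tau_z)
    de = math.exp(-dt / tau_e)
    z = z0 * dz
    e = e0 * de + z0 * tau_z / (tau_z - tau_e) * (dz - de)
    return z, e


def stepwise_update(z0, e0, dt, tau_z, tau_e, steps=100_000):
    """Reference: forward-Euler integration of the same ODEs with many small steps."""
    h = dt / steps
    z, e = z0, e0
    for _ in range(steps):
        # Tuple assignment uses the old z in the e update, as Euler requires.
        z, e = z + h * (-z / tau_z), e + h * (z - e) / tau_e
    return z, e
```

One closed-form evaluation replaces the whole inner loop, which is where the savings of a lazy-update mode come from; a third stage (the probability trace) can be handled the same way by solving one more linear ODE.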
9.
  • Mirsalari, Seyed Ahmad, et al. (authors)
  • Optimizing Self-Organizing Maps for Bacterial Genome Identification on Parallel Ultra-Low-Power Platforms
  • 2023
  • Published in: ICECS 2023 - 2023 30th IEEE International Conference on Electronics, Circuits and Systems: Technosapiens for Saving Humanity. Institute of Electrical and Electronics Engineers (IEEE).
  • Conference paper (peer-reviewed)
    • Pathogenic bacteria significantly threaten human health, highlighting the need for precise and efficient methods for swiftly identifying bacterial species. This paper addresses the challenges associated with performing genomics computations for pathogen identification on embedded systems with limited computational power. We propose an optimized implementation of Self-Organizing Maps (SOMs) targeting a parallel ultra-low-power platform based on the RISC-V instruction set architecture. We propose two mapping methods for implementing the SOM algorithm on a parallel cluster, coupled with software techniques to improve throughput. Orthogonally to parallelization, we investigate the impact of smaller-than-32-bit floating-point formats (smallFloats) on energy savings, precision, and performance. Our experimental results show that all smallFloat formats achieve 100% classification accuracy. The parallel variants achieve speed-ups of 1.98×, 3.79×, and 6.83× on 2, 4, and 8 cores, respectively. Compared with a 16-bit fixed-point implementation on a coarse-grained reconfigurable architecture (CGRA), the FP8 implementation achieves, on average, 1.42× the energy efficiency, a 1.51× speedup, and a 50% reduction in memory footprint. Furthermore, FP8 vectorization increases the average speed-up by 2.5×.
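The core of SOM-based classification is the best-matching-unit (BMU) search: find the neuron whose weight vector is closest to the input. The sketch below emulates reduced-precision weight storage with IEEE 754 half precision via the `struct` module's `'e'` format (a stand-in for the paper's smallFloat formats, since FP8 is not available in the standard library) and splits the neuron array into contiguous chunks, one per hypothetical core, whose local winners are then reduced to a global BMU. The chunked partitioning is one plausible mapping and not necessarily either of the two methods in the paper; all function names are illustrative.

```python
import struct


def to_half(v):
    """Round a float to IEEE 754 binary16 and back (emulated low-precision storage)."""
    return struct.unpack("<e", struct.pack("<e", v))[0]


def sq_dist(a, b):
    """Squared Euclidean distance; the square root is unnecessary for ranking."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bmu_serial(weights, sample):
    """Index of the neuron whose weight vector is closest to the sample."""
    return min(range(len(weights)), key=lambda i: sq_dist(weights[i], sample))


def bmu_chunked(weights, sample, num_cores):
    """Each 'core' scans a contiguous slice of neurons; local winners are then reduced.

    Ties resolve to the lowest index, matching bmu_serial.
    """
    n = len(weights)
    chunk = -(-n // num_cores)  # ceiling division
    local = []
    for c in range(num_cores):
        lo, hi = c * chunk, min(n, (c + 1) * chunk)
        if lo >= hi:
            continue  # more cores than neurons left: this core is idle
        best = min(range(lo, hi), key=lambda i: sq_dist(weights[i], sample))
        local.append((sq_dist(weights[best], sample), best))
    return min(local)[1]
```

Because each core only reports one (distance, index) pair, the final reduction is cheap, and the distance scans, which dominate the cost, run in parallel.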
10.
  • Patan, A. K., et al. (authors)
  • Design and Implementation of Optimized Register File for Streaming Applications
  • 2021
  • Published in: 2021 25th International Symposium on VLSI Design and Test, VDAT 2021. Institute of Electrical and Electronics Engineers (IEEE).
  • Conference paper (peer-reviewed)
    • The increased demand for energy-efficient solutions compels system architects to explore opportunities for minimizing area and power in the critical parts of a system. The register file is one such essential and critical component of any processor system, providing local storage for computing hardware such as the arithmetic logic unit. In this paper, we present an optimized design and implementation of a synthesizable register file that reduces area and power consumption relative to an existing design. The proposed design is functionally equivalent to the existing design but uses latches instead of flip-flops as its main storage elements, thereby reducing area and power consumption. When synthesized using a 45 nm CMOS process library, the proposed design has 10% less area and 23% less leakage power than the existing design. Furthermore, back-end implementation results show that the proposed design has 13% lower core utilization and 2.3× lower power.
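The saving comes from the storage element: an edge-triggered flip-flop captures its input only on a clock edge and is commonly built as a master-slave pair of latches, whereas a level-sensitive latch is a single stage that stays transparent for as long as its enable is high. The behavioral sketch below contrasts the two; it is an idealized functional model (no setup/hold timing, no glitches) with illustrative class names, not the paper's register-file design.

```python
class DLatch:
    """Level-sensitive storage: output follows D whenever enable is high."""

    def __init__(self):
        self.q = 0

    def step(self, d, enable):
        if enable:
            self.q = d
        return self.q


class DFlipFlop:
    """Edge-triggered storage: output updates only on a rising clock edge."""

    def __init__(self):
        self.q = 0
        self._prev_clk = 0

    def step(self, d, clk):
        if clk and not self._prev_clk:  # 0 -> 1 transition
            self.q = d
        self._prev_clk = clk
        return self.q


def simulate(element, d_values, ctrl_values):
    """Apply (D, enable/clock) pairs in order and record the output after each."""
    return [element.step(d, c) for d, c in zip(d_values, ctrl_values)]
```

With D = [1, 0, 1, 1, 0] and the control signal [0, 1, 1, 0, 1], the latch output changes to 1 in the third step because it is still enabled, while the flip-flop ignores that change since no new rising edge occurred. A latch-based register file must therefore ensure that write enables are pulsed so that each latch is transparent only while its intended data is stable.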