SwePub
Search the SwePub database


Results list for the search "WFRF:(Hemani Ahmed 1961 )"


  • Results 1-10 of 51
1.
  • Liu, Lizheng, et al. (authors)
  • A FPGA-based Hardware Accelerator for Bayesian Confidence Propagation Neural Network
  • 2020
  • Published in: 2020 IEEE Nordic Circuits and Systems Conference, NORCAS 2020 - Proceedings. - Institute of Electrical and Electronics Engineers (IEEE).
  • Conference paper (peer-reviewed)
    • The Bayesian Confidence Propagation Neural Network (BCPNN) has been applied to higher-level cognitive functions (e.g., working memory and associative memory). However, in the spike-based version of this learning rule, presynaptic, postsynaptic, and coincident activity is traced in three low-pass filtering stages, which makes the weight-update computation very intensive. In this paper, we propose a hardware architecture for the lazy update mode that updates 8 local synaptic state variables. Parallelism is optimized by decomposing the calculation steps of the update formulas according to their inherent data dependencies. An FPGA-based BCPNN hardware accelerator is designed and implemented. Experimental results show that an update completes within 110 ns on the FPGA at a 200 MHz clock frequency, which is much faster than the CPU implementation. We also evaluate the trade-off between performance, accuracy, and resource usage on dedicated hardware, as well as the impact of module reuse on resource consumption and computing performance.
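To illustrate the three cascaded low-pass trace stages mentioned in the BCPNN abstract, here is a minimal Python sketch. It is not the paper's update formulas or hardware: the stage names z, e, p and the time constants are assumptions loosely following common BCPNN descriptions, and the "lazy" helper only covers a single stage with no input since its last update (a full lazy update of the whole cascade needs a closed-form solution across all three stages, which is not attempted here).

```python
import math

def lowpass_step(x: float, target: float, tau: float, dt: float) -> float:
    """Exact discretisation of dx/dt = (target - x) / tau over one step dt,
    with target held constant during the step."""
    a = math.exp(-dt / tau)
    return a * x + (1.0 - a) * target

def trace_cascade(spikes, taus=(5.0, 20.0, 1000.0), dt=1.0):
    """Run three cascaded low-pass stages z -> e -> p over a 0/1 spike train.
    The time constants are hypothetical and not taken from the paper."""
    z = e = p = 0.0
    history = []
    for s in spikes:
        z = lowpass_step(z, float(s), taus[0], dt)  # fastest stage follows spikes
        e = lowpass_step(e, z, taus[1], dt)         # second stage smooths z
        p = lowpass_step(p, e, taus[2], dt)         # slowest stage smooths e
        history.append((z, e, p))
    return history

def lazy_decay(x: float, elapsed: float, tau: float) -> float:
    """Event-driven ("lazy") update of one stage with zero input since its last
    update: jump straight to the value after `elapsed` time units."""
    return x * math.exp(-elapsed / tau)
```

The lazy form gives the same value as stepping the stage every time unit, but costs one exponential per event instead of one update per time step, which is the kind of saving a lazy update mode is after.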
2.
  • Abbas, Haider, et al. (authors)
  • DUDE: Decryption, Unpacking, Deobfuscation, and Endian Conversion Framework for Embedded Devices Firmware
  • 2023
  • Published in: IEEE Transactions on Dependable and Secure Computing. - Institute of Electrical and Electronics Engineers (IEEE). - ISSN 1545-5971, 1941-0018.
  • Journal article (peer-reviewed)
    • Commercial-Off-The-Shelf (COTS) embedded devices rely on vendor-specific firmware to perform essential tasks. Researchers actively analyze this firmware to check security features and identify possible vendor backdoors. However, consistently unpacking newly created filesystem formats has been exceptionally challenging, and vendors frequently use encryption and obfuscation to thwart unpacking attempts. Moreover, the available literature and tools are insufficient for handling the encrypted, obfuscated, big-endian cramfs, or custom filesystem formats found in firmware under test. This study introduces DUDE, an automated framework that provides novel functionality and outperforms cutting-edge tools in the decryption, unpacking, deobfuscation, and endian conversion of firmware. DUDE supports endian conversion for big-endian compressed romfs (cramfs) filesystem formats. It also deobfuscates obfuscated signatures so that unpacking succeeds. In addition, it supports decryption of encrypted binaries from the D-Link and MOXA series, making the contents of these firmware files easier to access and analyze. The framework also assists unpacking by supporting the extraction of special filesystem formats commonly found in firmware samples from various vendors. Using the framework, 78% (1424 of 1814) of firmware binaries from different vendors were successfully unpacked. This performance surpasses the combined capabilities of commercially available tools on a single platform.
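Endian conversion starts with knowing which byte order an image uses. The sketch below, which is not DUDE's implementation, checks for the cramfs superblock magic (0x28CD3D45, as defined in the Linux kernel's cramfs header) in both byte orders using Python's `struct` module. It assumes the superblock sits at offset 0, and the `swap_u32` helper is a hypothetical building block: a real converter must rewrite each on-disk structure field by field (cramfs inodes pack several bit-fields into 32-bit words) rather than swapping the whole image blindly.

```python
import struct
from typing import Optional

CRAMFS_MAGIC = 0x28CD3D45  # cramfs superblock magic from the Linux kernel headers

def cramfs_byte_order(image: bytes) -> Optional[str]:
    """Return 'little' or 'big' if the image starts with the cramfs magic
    in that byte order, else None. Only offset 0 is checked."""
    if len(image) < 4:
        return None
    (le,) = struct.unpack_from("<I", image, 0)
    if le == CRAMFS_MAGIC:
        return "little"
    (be,) = struct.unpack_from(">I", image, 0)
    if be == CRAMFS_MAGIC:
        return "big"
    return None

def swap_u32(value: int) -> int:
    """Byte-swap one 32-bit field, e.g. while rewriting a header field by field."""
    return struct.unpack("<I", struct.pack(">I", value))[0]
```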
3.
  • Altayo Gonzalez, u1dr0yqp, et al. (authors)
  • Synthesis of Predictable Global NoC by Abutment in Synchoros VLSI Design
  • 2021
  • Published in: Proceedings - 2021 15th IEEE/ACM International Symposium on Networks-on-Chip, NOCS 2021. - New York, NY, USA : Association for Computing Machinery (ACM). - pp. 61-66
  • Conference paper (peer-reviewed)
    • The synchoros VLSI design style has been proposed as an alternative to the standard cell-based design style; the word synchoros is derived from the Greek word choros, meaning space. Synchoricity discretises space with a virtual grid, the way synchronicity discretises time with clock ticks. SiLago (Silicon Lego) blocks are atomic synchoros building blocks, like Lego bricks. SiLago blocks absorb all metal-layer details, i.e., all wires, to enable composition by abutment into valid designs, where valid means compliant with technology design rules, timing clean, and OCV ruggedized. Effectively, composition by abutment eliminates logic and physical synthesis for the end user. As with the Lego system, synchoricity relies on a finite set of SiLago block types to cater to different types of designs. Global NoCs are important system-level design components. In this paper, we show how, with a small library of SiLago blocks for global NoCs, arbitrary global NoCs of different types, dimensions, and topologies can be synthesized automatically. The synthesized global NoCs are not only valid VLSI designs; their cost metrics (area, latency, and energy) are also known with post-layout accuracy in linear time. We argue that this is essential for chip-level design space exploration. We show how an abstract timing model of such global NoC SiLago blocks can be built and used to analyse the timing of global NoC links with post-layout accuracy in linear time. We validate this claim by running commercial EDA static timing analysis on the same global NoC VLSI designs and showing that the abstract timing analysis enabled by synchoros VLSI design gives the same results as the commercial EDA tools.
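The linear-time timing claim can be pictured with a simple composition model: if every block type has a pre-characterised delay, the delay of a link assembled by abutment is one table lookup per block. The sketch below makes that simplifying assumption explicitly; the block names and delay values are hypothetical placeholders, not SiLago data or the paper's abstract timing model.

```python
from typing import Dict, List

# Hypothetical per-block delays (ns) standing in for post-layout characterisation
# of each block type; names and numbers are illustrative only.
BLOCK_DELAY_NS: Dict[str, float] = {
    "router": 0.35,
    "wire_segment": 0.12,
    "repeater": 0.08,
}

def link_delay(path: List[str], delays: Dict[str, float] = BLOCK_DELAY_NS) -> float:
    """Delay of a link composed by abutment: one lookup per block, so O(len(path))."""
    return sum(delays[block] for block in path)

def timing_slack(path: List[str], clock_period_ns: float) -> float:
    """Positive slack means the link meets the clock period."""
    return clock_period_ns - link_delay(path)
```

A link of two routers joined by two wire segments and a repeater would then be checked with `timing_slack(["router", "wire_segment", "wire_segment", "repeater", "router"], 1.0)`.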
4.
  • Baccelli, Guido, et al. (authors)
  • NACU : A Non-Linear Arithmetic Unit for Neural Networks
  • 2020
  • Published in: Proceedings of the 2020 57th ACM/EDAC/IEEE Design Automation Conference (DAC). - IEEE.
  • Conference paper (peer-reviewed)
    • Reconfigurable architectures targeting neural networks are an attractive option. They allow multiple neural networks of different types to be hosted on the same hardware, in parallel or in sequence. Reconfigurability also makes it possible to morph into different micro-architectures to meet varying power-performance constraints. In this context, reconfigurable non-linear computational units have not been widely researched. In this work, we present a formal and comprehensive method for selecting the fixed-point representation that achieves the highest accuracy relative to a floating-point benchmark implementation. We also present a novel design of an optimised reconfigurable arithmetic unit for calculating non-linear functions. The unit can be dynamically configured to calculate the sigmoid, hyperbolic tangent, and exponential functions using the same underlying hardware. We compare our work with the state of the art and show that our unit can calculate all three functions without loss of accuracy.
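One standard way for the sigmoid, hyperbolic tangent, and exponential functions to share hardware is that all three can be built from a single exponential: sigmoid(x) = 1 / (1 + exp(-x)) and tanh(x) = 2 * sigmoid(2x) - 1. The Python sketch below uses those identities and rounds results to a fixed-point grid. It is a generic illustration, not NACU's datapath or its representation-selection method; the 12 fractional bits and the saturation at |x| = 16 are arbitrary assumptions.

```python
import math

def to_fixed(x: float, frac_bits: int) -> float:
    """Round to the nearest multiple of 2**-frac_bits (fixed-point value shown as a float)."""
    scale = 1 << frac_bits
    return round(x * scale) / scale

def nonlinear_unit(x: float, func: str, frac_bits: int = 12) -> float:
    """Hypothetical shared datapath: every function reuses one exponential."""
    if func == "exp":
        y = math.exp(x)
    elif func in ("sigmoid", "tanh"):
        # Saturate the input: beyond |x| = 16 both functions are within ~1e-7
        # of their asymptotes, and clamping avoids overflow in math.exp.
        xs = max(-16.0, min(16.0, x))
        if func == "sigmoid":
            y = 1.0 / (1.0 + math.exp(-xs))
        else:
            y = 2.0 / (1.0 + math.exp(-2.0 * xs)) - 1.0  # tanh(x) = 2*sigmoid(2x) - 1
    else:
        raise ValueError(f"unsupported function: {func}")
    return to_fixed(y, frac_bits)
```

For example, `nonlinear_unit(0.0, "sigmoid")` returns 0.5, and the tanh result stays within one least significant bit of `math.tanh`.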
5.
  • Dhilleswararao, Pudi, et al. (authors)
  • Efficient Implementation of 2-D Convolution on DRRA and DiMArch Architectures
  • 2023
  • Published in: Proceedings of the 13th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies, HEART 2023. - Association for Computing Machinery (ACM). - pp. 86-92
  • Conference paper (peer-reviewed)
    • Convolution is widely employed in image processing and computer vision applications such as image augmentation, smoothing, and structure extraction. Convolution operations are also the most prevalent computing pattern in machine learning; for example, they account for a substantial share of the operations in state-of-the-art convolutional neural networks. Mapping convolution efficiently onto hardware architectures is therefore crucial for achieving high performance when accelerating convolutional neural networks. In this paper, we propose several algorithms to efficiently map the 2-D convolution operation onto a dynamically reconfigurable resource array (DRRA) and distributed memory architecture (DiMArch). We also discuss mapping 2-D convolution onto the target architecture for input matrices of arbitrary size, and generalizing the proposed approaches to multi-column DRRA architectures.
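For reference, the operation being mapped can be written as a plain stride-1 "valid" 2-D convolution in the CNN sense (kernel not flipped). The second function sketches one generic way to spread the work over several columns: split the output columns into contiguous slices, give each slice the input columns it needs plus the kernel halo, and stitch the results. This is an illustrative partitioning under our own assumptions, not the mapping algorithms proposed for DRRA/DiMArch.

```python
from typing import List

Matrix = List[List[float]]

def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """CNN-style 2-D convolution (no kernel flip), stride 1, 'valid' output region."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(ow)
        ]
        for r in range(oh)
    ]

def conv2d_column_split(image: Matrix, kernel: Matrix, n_parts: int) -> Matrix:
    """Compute the output in n_parts independent column slices (a stand-in for
    hypothetical parallel columns), then stitch the slices back together."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    bounds = [ow * p // n_parts for p in range(n_parts + 1)]
    slices = []
    for lo, hi in zip(bounds, bounds[1:]):
        # Output columns lo..hi-1 need input columns lo..hi+kw-2 (the kernel halo).
        sub = [row[lo:hi + kw - 1] for row in image]
        slices.append(conv2d_valid(sub, kernel))
    return [[v for part in slices for v in part[r]] for r in range(oh)]
```

Because each slice only reads its own input columns plus a halo of `kw - 1` columns, the slices can be computed independently, which is what makes this kind of split attractive for arbitrary input sizes.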
6.
7.
  • Hemani, Ahmed, 1961-, et al. (authors)
  • Synchoricity and NOCs could make Billion Gate custom hardware centric SOCs affordable
  • 2017
  • Published in: 2017 11th IEEE/ACM International Symposium on Networks-on-Chip, NOCS 2017. - New York, NY, USA : Association for Computing Machinery (ACM). - ISBN 9781450349840
  • Conference paper (peer-reviewed)
    • In this paper, we present a novel synchoros VLSI design scheme that discretizes space uniformly. Synchoros derives from the Greek word chóros, meaning space. We propose raising the physical design abstraction to the register transfer level by using coarse-grain reconfigurable building blocks called SiLago blocks. SiLago blocks are hardened and synchoros, and they are used to create arbitrarily complex VLSI design instances by abutment, without requiring any further logic or physical synthesis. SiLago blocks are interconnected by two levels of NOCs, regional and global. By configuring the SiLago blocks and the two levels of NOCs, it is possible to create implementation alternatives whose cost metrics can be evaluated with agility and post-layout accuracy. This framework, called the SiLago framework, includes a synthesis-based design flow that automates, end to end, the transformation of multi-million-gate functionality modeled as SDF in Simulink into timing- and DRC-clean physical designs in minutes, while exploring hundreds of solutions. We benchmark synthesis efficiency, as well as silicon and computational efficiency, against conventional standard cell-based tooling and show a two-orders-of-magnitude improvement in accuracy and a three-orders-of-magnitude improvement in synthesis, while eliminating the need to verify at lower abstractions such as RTL. The proposed solution is being extended to handle system-level non-compile-time functionality. We also argue that synchoricity could help eliminate the engineering cost of designing masks and thereby lower manufacturing cost.
8.
  • Iqbal, W., et al. (authors)
  • PCSS : Privacy Preserving Communication Scheme for SDN Enabled Smart Homes
  • 2022
  • Published in: IEEE Sensors Journal. - Institute of Electrical and Electronics Engineers (IEEE). - ISSN 1530-437X, 1558-1748. - 22:18, pp. 17677-17690
  • Journal article (peer-reviewed)
    • Smart home technology, also known as home automation, allows homeowners and residents to control and monitor smart devices such as HVAC systems, fridges, doors, and cameras. These features offer users peace of mind by providing a safe and convenient environment. At the same time, however, cybercriminals exploit these connected devices to carry out various sophisticated attacks, because currently produced smart devices have little or no security functionality. Because there is no authentication and data is transmitted in plaintext, intruders can obtain user profiles, learn user behavior, and even inject malware into unauthenticated devices. Authentication and privacy-preserving user queries therefore remain key issues for the wide adoption of such technologies. Compounding this problem, traditional security solutions cannot be deployed on devices with limited processing power. To overcome the security issues of these resource-constrained gadgets, lightweight network-level cryptographic security mechanisms are needed, in which processing is done at a network-level middlebox rather than at the low-resource end devices. Here, the emerging Software Defined Networking (SDN) paradigm offers properties such as programmability, agility, centralized management, and vendor neutrality that overcome the control, management, and security problems of conventional networking. The SDN controller at the control layer handles all computation and complexity at the network level rather than at the smart devices. In this research, we therefore present a privacy preserving communication scheme for SDN enabled smart homes (PCSS), which provides user and smart device authentication, privacy for data at rest and in transit, and privacy for user queries. It hinders intruders from learning or modifying data during transmission and provides mutual authentication among the user, controller, and smart device. PCSS also offers privacy-preserving user queries for smart homes. This is achieved by a symmetric-key-based lightweight authentication and searchable encrypted query protocol. Experimental results show the efficacy and usefulness of the PCSS scheme compared with existing secure smart home/system protocols.
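Symmetric-key mutual authentication of the kind the PCSS abstract mentions is commonly built from challenge-response with a message authentication code: each party sends a fresh nonce and checks that the other answers with a MAC only the key holder could compute. The sketch below shows that generic pattern with Python's standard `hmac`, `hashlib`, and `secrets` modules. It is not the PCSS protocol (which also involves the SDN controller and searchable encrypted queries); the party labels and message layout are hypothetical.

```python
import hashlib
import hmac
import secrets

def respond(key: bytes, challenge: bytes, sender_id: bytes) -> bytes:
    """HMAC-SHA256 over (sender identity | challenge) with the pre-shared key."""
    return hmac.new(key, sender_id + b"|" + challenge, hashlib.sha256).digest()

def mutual_auth(key_a: bytes, key_b: bytes) -> bool:
    """Toy two-party handshake; key_a and key_b are the keys each side holds.
    Succeeds only if both sides hold the same key."""
    nonce_a = secrets.token_bytes(16)  # A's fresh challenge to B
    nonce_b = secrets.token_bytes(16)  # B's fresh challenge to A
    # B answers A's challenge; A recomputes the tag with its own key and compares.
    b_ok = hmac.compare_digest(respond(key_b, nonce_a, b"device"),
                               respond(key_a, nonce_a, b"device"))
    # A answers B's challenge; B verifies it the same way.
    a_ok = hmac.compare_digest(respond(key_a, nonce_b, b"user"),
                               respond(key_b, nonce_b, b"user"))
    return a_ok and b_ok
```

Using `hmac.compare_digest` rather than `==` keeps the tag comparison constant-time, and fresh nonces prevent an eavesdropper from replaying an old response.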
9.
  • Jafri, Syed, et al. (authors)
  • Refresh Triggered Computation : Improving the Energy Efficiency of Convolutional Neural Network Accelerators
  • 2021
  • Published in: ACM Transactions on Architecture and Code Optimization (TACO). - Association for Computing Machinery (ACM). - ISSN 1544-3566, 1544-3973. - 18:1
  • Journal article (peer-reviewed)
    • To employ a Convolutional Neural Network (CNN) in an energy-constrained embedded system, the CNN implementation must be highly energy efficient. Many recent studies propose CNN accelerator architectures with custom computation units that try to improve the energy efficiency and performance of CNNs by minimizing data transfers from DRAM-based main memory. However, in these architectures, DRAM is still responsible for half of the overall energy consumption of the system, on average. A key contributor to DRAM's high energy consumption is refresh overhead, which is estimated to consume 40% of the total DRAM energy. In this article, we propose a new mechanism, Refresh Triggered Computation (RTC), that exploits the memory access patterns of CNN applications to reduce the number of refresh operations. RTC uses two major techniques to mitigate the refresh overhead. First, Refresh Triggered Transfer (RTT) is based on our new observation that a CNN application accesses a large portion of the DRAM in a predictable and recurring manner. Thus, the application's read/write accesses inherently refresh the DRAM, so a significant fraction of refresh operations can be skipped. Second, Partial Array Auto-Refresh (PARR) eliminates refresh operations to DRAM regions that do not store any data. We propose three RTC designs (min-RTC, mid-RTC, and full-RTC), each requiring a different level of customization of the DRAM subsystem. All of our designs have small overhead: even the most aggressive design (full-RTC) imposes an area overhead of only 0.18% in a 16 Gb DRAM chip, and can have less overhead for denser chips. Our experimental evaluation on six well-known CNNs shows that RTC reduces average DRAM energy consumption by 24.4% and 61.3% for the least aggressive and the most aggressive RTC implementations, respectively. Besides CNNs, we also evaluate RTC on three workloads from other domains. We show that RTC saves 31.9% and 16.9% DRAM energy for Face Recognition and Bayesian Confidence Propagation Neural Network (BCPNN), respectively. We believe RTC can be applied to other applications whose memory access patterns remain predictable for a sufficiently long time.
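The two RTC ideas can be illustrated with a toy counting model: a row must be touched at least once per retention window, an ordinary access counts as a touch (the RTT idea), and rows holding no data need no refresh at all (the PARR idea). The sketch below counts refreshes under those assumptions only; it ignores DRAM timing details, refresh granularity, and everything else in the paper's designs, and the 64-unit window (echoing the common 64 ms DDR refresh window) and all function names are illustrative.

```python
from typing import Dict, Iterable, List, Set

def refreshes_for_row(access_times: Iterable[int], window: int, horizon: int) -> int:
    """Refreshes one row needs in [0, horizon] if any access within the
    retention window also restores the row (the refresh-triggered idea)."""
    times: List[int] = sorted(access_times)
    refreshes, last, i = 0, 0, 0
    while last + window <= horizon:
        deadline = last + window
        if i < len(times) and times[i] <= deadline:
            last = max(last, times[i])  # an access in time restores the row's charge
            i += 1
        else:
            refreshes += 1              # no access in time: refresh at the deadline
            last = deadline
    return refreshes

def total_refreshes(accesses: Dict[int, List[int]], allocated: Set[int],
                    num_rows: int, window: int, horizon: int) -> int:
    """Rows outside `allocated` hold no data and are never refreshed (PARR-like)."""
    return sum(refreshes_for_row(accesses.get(r, []), window, horizon)
               for r in range(num_rows) if r in allocated)

def baseline_refreshes(num_rows: int, window: int, horizon: int) -> int:
    """Conventional refresh: every row once per window regardless of accesses."""
    return num_rows * (horizon // window)
```

With three rows, one allocated but idle, one read every 32 units, and one holding no data, the model issues 10 refreshes over 640 units instead of the baseline 30.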
10.
  • Jafri, Syed, et al. (authors)
  • SPEED : Open-Source Framework to Accelerate Speech Recognition on Embedded GPUs
  • 2017
  • Published in: Proceedings - 20th Euromicro Conference on Digital System Design, DSD 2017. - Institute of Electrical and Electronics Engineers (IEEE). - ISBN 9781538621455. - pp. 94-101
  • Conference paper (peer-reviewed)
    • Due to their high accuracy, inherent redundancy, and embarrassingly parallel nature, neural networks are fast becoming mainstream machine learning algorithms. However, these advantages come at the cost of high memory and processing requirements, which can be met by GPUs, FPGAs, or ASICs. For embedded systems, these requirements are particularly challenging because of stiff power and timing budgets. Because efficient mapping tools are available, GPUs are an appealing platform for implementing neural networks. While there is significant work on implementing image recognition (in particular, Convolutional Neural Networks) on GPUs, only a few works address the efficient implementation of speech recognition on GPUs, and those that do do not address embedded systems. To tackle this issue, this paper presents SPEED (Open-source framework to accelerate speech recognition on embedded GPUs). We use the Eesen speech recognition framework because it is considered the most accurate speech recognition technique. Experimental results reveal that the proposed techniques offer a 2.6x speedup compared to the state of the art.
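Eesen is an end-to-end recognizer whose acoustic models are trained with connectionist temporal classification (CTC). Its own decoder uses weighted finite-state transducers, which this sketch does not cover. The simplest way to turn CTC frame outputs into text is best-path (greedy) decoding: take the most likely symbol per frame, merge consecutive repeats, then drop blanks. The sketch below assumes index 0 is the blank symbol and uses a hypothetical label string. It is not part of SPEED or Eesen.

```python
from typing import List, Optional, Sequence

def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], labels: str,
                      blank: int = 0) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, remove blanks."""
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    out: List[str] = []
    prev: Optional[int] = None
    for k in best:
        # A symbol is emitted only when it differs from the previous frame's symbol
        # and is not blank; a blank between two identical symbols keeps both.
        if k != prev and k != blank:
            out.append(labels[k])
        prev = k
    return "".join(out)
```

With `labels = "_ab"` (underscore as the blank), frames whose argmax sequence is blank, a, a, blank, b, b, blank, a decode to "aba".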
