SwePub

Results list for search "WFRF:(Azhar Muhammad Waqar 1986)"

Search: WFRF:(Azhar Muhammad Waqar 1986)

  • Results 1-10 of 17
1.
  • Azhar, Muhammad Waqar, 1986, et al. (author)
  • Approx-RM: Reducing Energy on Heterogeneous Multicore processors under Accuracy and Timing Constraints
  • 2023
  • In: Transactions on Architecture and Code Optimization. - 1544-3973 .- 1544-3566. ; 20:3
  • Journal article (peer-reviewed). Abstract:
    • Reducing energy consumption while providing performance and quality guarantees is crucial for computing systems ranging from battery-powered embedded systems to data centers. This paper considers approximate iterative applications executing on heterogeneous multi-core platforms under user-specified performance and quality targets. We note that allowing a slight yet bounded relaxation in solution quality can considerably reduce the required iteration count and thereby can save significant amounts of energy. To this end, this paper proposes Approx-RM, a resource management scheme that reduces energy expenditure while guaranteeing a specified performance as well as accuracy target. Approx-RM predicts the number of iterations required to meet the relaxed accuracy target at run-time. The time saved generates execution-time slack, which allows Approx-RM to allocate fewer resources on a heterogeneous multi-core platform in terms of DVFS, core type, and core count to save energy while meeting the performance target. Approx-RM contributes with lightweight methods for predicting the iteration count needed to meet the accuracy target and the resources needed to meet the performance target. Approx-RM uses the aforementioned predictions to allocate just enough resources to comply with quality of service constraints to save energy. Our evaluation shows energy savings of 31.6%, on average, compared to Race-to-idle when the accuracy is only relaxed by 1%. Approx-RM incurs timing and energy overheads of less than 0.1%.
2.
  • Azhar, Muhammad Waqar, 1986, et al. (author)
  • Viterbi Accelerator for Embedded Processor Datapaths
  • 2012
  • In: Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors. - 1063-6862. - 9780769547688 ; pp. 133-140
  • Conference paper (peer-reviewed). Abstract:
    • We present a novel architecture for a lightweight Viterbi accelerator that can be tightly integrated inside an embedded processor. We investigate the accelerator’s impact on processor performance by using the EEMBC Viterbi benchmark and the in-house Viterbi Branch Metric kernel. Our evaluation based on the EEMBC benchmark shows that an accelerated 65-nm 2.7-ns processor datapath is 20% larger but 90% more cycle efficient than a datapath lacking the Viterbi accelerator, leading to an 87% overall energy reduction and a data throughput of 3.52 Mbit/s.
3.
  • Azhar, Muhammad Waqar, 1986, et al. (author)
  • ARADA: Adaptive Resource Allocation for Improving Energy Efficiency in Deep Learning Accelerators
  • 2023
  • In: Proceedings of the 20th ACM International Conference on Computing Frontiers 2023, CF 2023. - 9798400701405 ; pp. 63-72
  • Conference paper (peer-reviewed). Abstract:
    • Deep Learning (DL) applications are entering every part of our life given their ability to solve complex problems. Nevertheless, energy efficiency is still a major concern due to the large computational and memory requirements. State-of-the-art accelerators strive to address this issue by optimizing the architecture to the compute requirements of DL algorithms. However, there is always a mismatch between compute and memory requirements and what is offered by a particular design. A way to close this gap is by providing run-time adaptation or resource allocation to improve efficiency. This paper proposes an adaptive resource allocation for deep learning applications (ARADA) with the goal of improving energy efficiency for deep learning accelerators. This is leveraged by having a layer-by-layer resource allocation. The rationale is that each layer in the DL model has a unique compute and memory bandwidth requirement and allocating fixed resources to all layers leads to inefficiencies. This can be achieved by means of resource allocation (e.g., voltage-frequency, memory bandwidth) to save energy without sacrificing performance. Experimental results show that applying ARADA to the execution of 9 state-of-the-art CNN models results in an energy savings of 38% on average compared to race-to-idle for an Edge TPU coupled with LPDDR4 off-chip memory.
4.
  • Azhar, Muhammad Waqar, 1986, et al. (author)
  • Cyclic Redundancy Checking (CRC) Accelerator for the FlexCore Processor
  • 2010
  • In: Proceedings of Euromicro Conference on Digital System Design (DSD). - 9780769541716 ; pp. 675-680
  • Conference paper (peer-reviewed). Abstract:
    • A proven approach to increase performance of general-purpose processors is to add hardware accelerators. In its basic configuration, the FlexCore processor has a limited set of datapath units. But thanks to a flexible datapath interconnect and a wide control word, the FlexCore datapath is explicitly designed to support integration of special units that, on demand, can accelerate certain data-intensive applications. We present the integration of a versatile accelerator for several Cyclic Redundancy Checking (CRC) keys. Furthermore, we investigate the accelerator's impact on processor execution time and energy efficiency, using the PowerStone CRC benchmark. Our evaluation shows that the accelerated 65-nm 2.7-ns FlexCore datapath is, for example, 86% more energy and cycle efficient than a datapath lacking the CRC accelerator. © 2010 IEEE.
5.
  • Azhar, Muhammad Waqar, 1986, et al. (author)
  • SaC: Exploiting execution-time slack to save energy in heterogeneous multicore systems
  • 2019
  • In: ACM International Conference Proceeding Series. - New York, NY, USA : ACM.
  • Conference paper (peer-reviewed). Abstract:
    • Reducing the energy to carry out computational tasks is key to almost any computing application. We focus in this paper on iterative applications that have explicit computational deadlines per iteration. Our objective is to meet the computational deadlines while minimizing energy. We leverage the vast configuration space offered by heterogeneous multicore platforms which typically expose three dimensions for energy saving configurability: Voltage/frequency levels, thread count and core type (e.g. ARM big/LITTLE). We note that when choosing the most energy-efficient configuration that meets the computational deadline, an iteration will typically finish before the deadline and execution-time slack will build up across iterations. Our proposed slack management policy - SaC (Slack as a Currency) - proactively explores the configuration space to select configurations that can save substantial amounts of energy. To avoid the overheads of an exhaustive search of the configuration space, our proposal also comprises a low-overhead, on-line method by which one can assess each point in the configuration space by linearly interpolating between the endpoints in each configuration-space dimension. Overall, we show that our proposed slack management policy and linear-interpolation configuration assessment method can yield 62% energy savings on top of race-to-idle without missing any deadlines.
6.
  • Azhar, Muhammad Waqar, 1986, et al. (author)
  • SLOOP: QoS-Supervised Loop Execution to Reduce Energy on Heterogeneous Architectures
  • 2017
  • In: Transactions on Architecture and Code Optimization. - : Association for Computing Machinery (ACM). - 1544-3973 .- 1544-3566. ; 14:4, Article No. 41
  • Journal article (peer-reviewed). Abstract:
    • Most systems allocate computational resources to each executing task without any actual knowledge of the application’s Quality-of-Service (QoS) requirements. Such best-effort policies lead to overprovisioning of the resources and increase energy loss. This work assumes applications with soft QoS requirements and exploits the inherent timing slack to minimize the allocated computational resources to reduce energy consumption. We propose a lightweight progress-tracking methodology based on the outer loops of application kernels. It builds on online history and uses it to estimate the total execution time. The prediction of the execution time and the QoS requirements are then used to schedule the application on a heterogeneous architecture with big out-of-order cores and small (LITTLE) in-order cores and select the minimum operating frequency, using DVFS, that meets the deadline. Our scheme is effective in exploiting the timing slack of each application. We show that it can reduce the energy consumption by more than 20% without missing any computational deadlines.
7.
  • Azhar, Muhammad Waqar, 1986, et al. (author)
  • Task-RM: A Resource Manager for Energy Reduction in Task-Parallel Applications under Quality of Service Constraints
  • 2022
  • In: Transactions on Architecture and Code Optimization. - : Association for Computing Machinery (ACM). - 1544-3973 .- 1544-3566. ; 19:1
  • Journal article (peer-reviewed). Abstract:
    • Improving energy efficiency is an important goal of computer system design. This article focuses on a general model of task-parallel applications under quality-of-service requirements on the completion time. Our technique, called Task-RM, exploits the variance in task execution-times and imbalance between tasks to allocate just enough resources in terms of voltage-frequency and core-allocation so that the application completes before the deadline. Moreover, we provide a solution that can harness additional energy savings with the availability of additional processors. We observe that, for the proposed run-time resource manager to allocate resources, it requires specification of the soft deadlines to the tasks. This is accomplished by analyzing the energy-saving scenarios offline and by providing Task-RM with the performance requirements of the tasks. The evaluation shows an energy saving of 33% compared to race-to-idle and 22% compared to dynamic slack allocation (DSA) with an overhead of less than 1%.
8.
  • Azhar, Muhammad Waqar, 1986 (author)
  • Techniques to Improve Energy Efficiency on Heterogeneous Multiprocessors under Timing and Quality Constraints
  • 2022
  • Doctoral thesis (other academic/artistic). Abstract:
    • Traditionally, applications are executed without the notion of a computational deadline and often use all available system resources, which leads to higher energy consumption. User specification of Quality of Service (QoS) constraints, in terms of completion time and solution quality, opens up for allocation of just enough resources to an application to finish just in time and thereby save energy. Modern heterogeneous multiprocessor (HMP) platforms provide a set of configurable resources, including a frequency range of dynamic voltage frequency scaling (DVFS), one among a set of processor types, and one or a plurality of processors of each type. They can be configured at run-time to open up new opportunities for resource management. This thesis presents techniques to reduce energy consumption under QoS constraints by allocating resources at run-time on heterogeneous multiprocessor platforms targeting sequential and parallel iterative and task-parallel applications. The proposed techniques rely on a progress-tracking framework that monitors and predicts how much time is left until the application finishes. Furthermore, the proposed framework enables the prediction of computation demand and performance requirements for future iterations or tasks. The first contribution of this thesis is a resource management technique, called SLOOP, targeting single-threaded applications. SLOOP allocates resources, i.e., processor type and DVFS, for each iteration to meet deadlines while using the prediction of computational demand and execution time. The second contribution of this thesis is a resource-management scheme, called SaC, for multi-threaded applications executing on HMPs, where resources also include the number of processors besides DVFS and processor type. SaC first chooses the most energy-efficient configuration that meets the deadline. The proposed technique collects execution-time slack over subsequent iterations to select a configuration that can save energy.
The third contribution of this thesis is a resource manager, called Task-RM, for task-parallel applications executing on HMPs under QoS constraints. Task-RM exploits the variance in task execution times and imbalance between sibling tasks to allocate just enough resources in terms of DVFS and processor type. It uses an innovative off-line analysis to avoid redoing scheduling analysis at run-time. Finally, the fourth contribution is a scheme, called Approx-RM, that can exploit accuracy-energy trade-offs in approximate iterative applications. Approx-RM allocates an appropriate amount of resources while guaranteeing timing and solution quality specifications. Approx-RM first predicts the iteration count required to meet the quality target and then allocates enough resources on an HMP in terms of DVFS, processor type, and processor count to save energy while meeting a performance target.
9.
  • Azhar, Muhammad Waqar, 1986 (author)
  • Techniques to Save Energy in Heterogeneous Multicore Architectures under QoS Constraints
  • 2019
  • Licentiate thesis (other academic/artistic). Abstract:
    • Typically, applications are run with available system resources leading to over-provisioning of resources which can lead to high energy consumption. If the computational demand is specified, in terms of a Quality of Service (QoS) contract, it is possible to devote just enough resources to applications and thereby reduce energy consumption. Modern heterogeneous multicore platforms, such as ARM big.LITTLE, typically provide a multidimensional space of resources, called configuration space, such as Voltage-Frequency (V-F) settings, thread count and processor types, which can be configured at run-time to open up new opportunities for resource management. This thesis presents techniques to improve energy efficiency under the constraint of QoS by managing the resource allocation at run-time for applications run on heterogeneous multicore platforms. The applications considered are iterative with a computational deadline associated with each loop iteration. The proposed techniques apply to a framework that uses applications’ outer loop iterations as a means for progress-tracking and prediction of the execution time. A first contribution of the thesis is a resource management technique for single-threaded applications that uses core type (e.g. big or little cores) and V-F settings as a configuration space to select a configuration, for each iteration, based on the execution-time prediction of future iterations and computational deadlines. The thesis shows that an energy saving of 25% over the race-to-idle state-of-the-art technique is achieved without missing any deadlines. This scheme incurs only 0.6% and 0.8% of timing and energy overheads, respectively. A second contribution of the thesis is a novel resource-management policy for multi-threaded applications. Here, the configuration space is extended to also consider the thread count, i.e., the number of cores assigned to multi-threaded applications. 
The proposed technique first chooses the most energy-efficient configuration that meets the computational deadline. Since an iteration typically finishes before the deadline, the proposed technique collects the generated execution-time slack over subsequent iterations with the goal of selecting a configuration that can save more energy. To allow for on-line exploration of the configuration space, at low overhead, a third contribution of the thesis is an online, low-overhead prediction method based on interpolation, that measures the execution statistics at end points of each configuration-space dimension and interpolates the values at intermediate configurations. Overall, the proposed technique saves 61% energy compared to the state-of-the-art race-to-idle technique without missing any deadlines. Further, it only incurs 0.6% and 0.7% of timing and energy overheads, respectively.
10.
  • Griessl, René, et al. (author)
  • A Scalable, Heterogeneous Hardware Platform for Accelerated AIoT based on Microservers
  • 2023
  • In: Shaping the Future of IoT with Edge Intelligence: How Edge Computing Enables the Next Generation of IoT Applications. - 9788770040273 ; pp. 179-196
  • Book chapter (other academic/artistic). Abstract:
    • Performance and energy efficiency are key aspects of next-generation AIoT hardware. This chapter presents a scalable, heterogeneous hardware platform for accelerated AIoT based on microserver technology. It integrates several accelerator platforms based on technologies like CPUs, embedded GPUs, FPGAs, or specialized ASICs, supporting the full range of the cloud-edge-IoT continuum. The modular microserver approach enables the integration of different, heterogeneous accelerators into one platform. Benchmarking the various accelerators takes performance, energy efficiency, and accuracy into account. The results provide a solid overview of available accelerator solutions and guide hardware selection for AIoT applications from the far edge to the cloud.
Type of publication
conference paper (10)
journal article (4)
doctoral thesis (1)
book chapter (1)
licentiate thesis (1)
Type of content
peer-reviewed (14)
other academic/artistic (3)
Author/editor
Azhar, Muhammad Waqa ... (17)
Petersen Moura Tranc ... (9)
Mohammad Qararyah, F ... (5)
Stenström, Per, 1957 (4)
Kaiser, Martin (3)
Hagemeyer, Jens (3)
Kucza, Nils (3)
Zouzoula, Stavroula, ... (3)
Porrmann, Mario (3)
Larsson-Edefors, Per ... (2)
Vázquez Maceiras, Ma ... (2)
Pericas, Miquel, 197 ... (2)
Hoang, Tung, 1980 (2)
Griessl, René (2)
Porrmann, Florian (2)
Mika, K. (2)
Tassemeier, M. (2)
Flottmann, M. (2)
Odman, D. (2)
Gugala, K. (2)
Latosinski, G. (2)
Eriksson, Olof (1)
Knauss, Eric, 1977 (1)
Papaefstathiou, Vasi ... (1)
Manivannan, Madhavan ... (1)
Ask, Andréas (1)
Själander, Magnus, 1 ... (1)
Hasan, Ali, 1984 (1)
Vijayashekar, Akshay ... (1)
Ansari, Kashan Khurs ... (1)
Brunnegard, Oliver (1)
Heyn, Hans-Martin, 1 ... (1)
Casimiro, Antonio (1)
Felber, Pascal (1)
Pasin, Marcelo (1)
Salomonsson, Hans, 1 ... (1)
Marcus, Carina (1)
Griessl, R. (1)
Porrmann, F. (1)
Mika, Kevin (1)
Tigges, Lennart (1)
Ménétrey, Jämes (1)
Ödman, Daniel (1)
Bessani, Alysson (1)
Carvalho, Tiago (1)
Gugala, Karol (1)
Zierhoffer, Piotr (1)
Latosinski, Grzegorz (1)
Tassemeier, Marco (1)
Mao, Yufei (1)
University
Chalmers tekniska högskola (17)
Göteborgs universitet (1)
Language
English (17)
Research subject (UKÄ/SCB)
Engineering and Technology (16)
Natural sciences (14)
