SwePub

Hit list for search "WFRF:(Plosila Juha)"

  • Results 1-50 of 76
1.
  • Anwar, Hassan, et al. (author)
  • Exploring Spiking Neural Network on Coarse-Grain Reconfigurable Architectures
  • 2014
  • In: ACM International Conference Proceeding Series. - New York, NY, USA : ACM. - 9781450328227 ; pp. 64-67
  • Conference paper (peer-reviewed). Abstract:
    • Today, reconfigurable architectures are becoming increasingly popular as the candidate platforms for neural networks. Existing works that map neural networks on reconfigurable architectures only address either FPGAs or Networks-on-chip, without any reference to Coarse-Grain Reconfigurable Architectures (CGRAs). In this paper we investigate the overheads imposed by implementing spiking neural networks on a Coarse-Grained Reconfigurable Architecture (CGRA). Experimental results (using point-to-point connectivity) reveal that up to 1000 neurons can be connected, with an average response time of 4.4 ms.
2.
  • Carlsson, Jonas, 1972- (author)
  • Contributions to Asynchronous Communication Ports for GALS Systems
  • 2006
  • Doctoral thesis (other academic/artistic). Abstract:
    • Digital systems commonly use a single global clock signal to synchronize the whole system. This is not always possible, and it can be more advantageous to divide the system into separate clock domains, where each clock domain can operate with its own clock frequency. Communication between the different clock domains is not trivial and must be handled with care. Several schemes can be used depending on the relation between the clock frequencies of the communicating clock domains. This thesis focuses on the Globally Asynchronous Locally Synchronous (GALS) scheme, in which all communications between clock domains are handled using dedicated communication channels. These communication channels use asynchronous handshaking protocols to transfer information between clock domains. No global clock signal is used and the clock signal is instead local for each clock domain. An efficient design flow for GALS systems has been developed, which allows a designer to implement GALS systems without prior knowledge of asynchronous circuits. The GALS design flow starts with a high-level model of the system behavior and ends with an implementation in an FPGA or an ASIC. The design flow can also increase the design efficiency for GALS systems, since the flow relieves the designer of the design and placement of the asynchronous circuits. A tool that handles the asynchronous circuits in the design flow has been developed. Two types of communication ports have been developed to handle the communication between clock domains. Both of these ports can be used in systems with static schedule or dynamic schedule of transactions. One of the communication ports can easily be migrated to a new CMOS process, since it only uses standard cells that are provided by most vendors of CMOS processes. A clock gating circuit has been developed to allow a clock domain to use an external stable clock signal to create an internal stoppable clock signal. A stoppable local clock is used to eliminate problems with metastability when transferring data between clock domains with arbitrary clock frequencies. In order to validate the design flow and proposed circuitry, an integrated circuit for the 2-dimensional Discrete Cosine Transform has been implemented using the GALS scheme and one of the proposed communication ports. The circuit has been implemented using a standard-cell library in a 0.35 µm CMOS process. A few possible improvements to the implementation are also discussed in the thesis. The GALS design flow with the asynchronous wrapper generation tool has been used to implement the digital baseband processing in the physical layer of the IEEE 802.11a transmitter. The transmitter is built using multiple clock domains. The transmitter has been implemented and tested in a Stratix II FPGA.
3.
  • Carlsson, Jonas, 1972- (author)
  • Studies on asynchronous communication ports for GALS systems
  • 2005
  • Licentiate thesis (other academic/artistic). Abstract:
    • Digital systems generally use a global clock signal for the whole system. A System-on-Chip may have to communicate with the environment using several different data rates that do not fit well with the single global clock frequency. When designing a digital system, it might be beneficial to divide the system into different clock domains where each domain can operate with its own clock frequency. In this thesis, various clocking schemes are discussed. The synchronous clocking schemes that are discussed are mesochronous, plesiochronous, rational, oversampling and arbitrary clocking schemes. The thesis focuses on the Globally Asynchronous Locally Synchronous scheme. This scheme transfers information between the different clock domains through dedicated communication channels. These communication channels use asynchronous handshaking protocols to transfer information without the necessity for a clock. A communication channel consists of a transmitting and a receiving port. Two types of communication ports are proposed in the thesis. The communication ports can be used either in a system with a static schedule or dynamic schedule of transactions. One of the ports can easily be implemented in different CMOS processes, since it only uses standard cells that can be found in the standard library of most existing CMOS processes. A 2-dimensional Discrete Cosine Transform has been implemented using the GALS scheme and one of the proposed communication ports. The 2-D DCT has been implemented using a standard cell library supplied by AMS for a 0.35 µm CMOS process. A few improvements to the implementation are also discussed in the thesis.
4.
  • Daneshtalab, Masoud, et al. (author)
  • In-order delivery approach for 2D and 3D NoCs
  • 2015
  • In: Journal of Supercomputing. - : Springer Science and Business Media LLC. - 0920-8542 .- 1573-0484. ; 71:8, pp. 2877-2899
  • Journal article (peer-reviewed). Abstract:
    • In many applications, it is critical to guarantee the in-order delivery of requests from the master cores to the slave cores, so that the requests can be executed in the correct order without requiring buffers. Since in NoCs packets may use different paths and on the other hand traffic congestion varies on different routes, the in-order delivery constraint cannot be met without support. To guarantee the in-order delivery, traditional approaches either use dimension-order routing or employ reordering buffers at network interfaces. Dimension-order routing degrades the performance considerably while the usage of reordering buffers imposes large area overhead. In this paper, we present a mechanism allowing packets to be routed through multiple paths in the network, helping to balance the traffic load while guaranteeing the in-order delivery. The proposed method combines the advantages of both deterministic and adaptive routing algorithms. The simple idea is to use different deterministic algorithms for independent flows. This approach neither requires reordering buffers nor limits packets to use a single path. The algorithm is simple and practical with negligible area overhead over dimension-order routing. The concept is investigated in both 2D and 3D mesh networks.
5.
  • Daneshtalab, Masoud, et al. (author)
  • Special issue on many-core embedded systems
  • 2014
  • In: Microprocessors and microsystems. - : Elsevier BV. - 0141-9331 .- 1872-9436. ; 38:6, p. 525
  • Journal article (other academic/artistic)
6.
  • Dytckov, Sergei, et al. (author)
  • Efficient STDP Micro-Architecture for Silicon Spiking Neural Networks
  • 2014
  • In: 2014 17TH EUROMICRO CONFERENCE ON DIGITAL SYSTEM DESIGN (DSD). - 9781479957934 ; pp. 496-503
  • Conference paper (peer-reviewed). Abstract:
    • Spiking neural networks (SNNs) are the closest approach to biological neurons in comparison with conventional artificial neural networks (ANNs). SNNs are composed of neurons and synapses which are interconnected in a complex pattern. As communication in such massively parallel computational systems is getting critical, the network-on-chip (NoC) becomes a promising solution to provide a scalable and robust interconnection fabric. However, using NoC for large-scale SNNs gives rise to a trade-off between scalability, throughput, neuron/router ratio (cluster size), and area overhead. In this paper, we tackle the trade-off using a clustering approach and try to optimize the synaptic resource utilization. An optimal cluster size can provide the lowest area overhead and power consumption. For learning purposes, a phenomenon known as spike-timing-dependent plasticity (STDP) is utilized. The micro-architectures of the network, clusters, and the computational neurons are also described. The presented approach suggests a promising solution of integrating NoCs and STDP-based SNNs for the optimal performance based on the underlying application.
7.
  • Ebrahimi, Masoumeh, et al. (author)
  • Fault-tolerant routing algorithm for 3D NoC using hamiltonian path strategy
  • 2013
  • In: Design, Automation & Test in Europe Conference & Exhibition (DATE), 2013. ; pp. 1601-1604
  • Conference paper (peer-reviewed). Abstract:
    • While Networks-on-Chip (NoCs) have been increasing in popularity with industry and academia, they are threatened by the decreasing reliability of aggressively scaled transistors. In this paper, we address the problem of faulty elements by means of routing algorithms. Commonly, fault-tolerant algorithms are complex due to supporting different fault models while preventing deadlock. When moving from a 2D to a 3D network, the complexity increases significantly due to the possibility of creating cycles within and between layers. In this paper, we take advantage of the Hamiltonian path to tolerate faults in the network. The presented approach is not only very simple but also able to support almost all one-faulty unidirectional links in 2D and 3D NoCs.
8.
  • Ebrahimi, Masoumeh, et al. (author)
  • In-Order Delivery Approach for 3D NoCs
  • 2013
  • In: 2013 17TH CSI INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND DIGITAL SYSTEMS (CADS 2013). - : IEEE. - 9781479905621 ; pp. 87-
  • Conference paper (peer-reviewed). Abstract:
    • Routing algorithms can be classified into deterministic and adaptive methods. In deterministic methods, a single path is selected for each pair of source and destination nodes, and thus they are unable to distribute the traffic load over the network. Using deterministic routing, packets reach a destination in the same order they are delivered from a source node. Adaptive routing algorithms can greatly improve the performance by distributing packets over different routes. However, they require a mechanism to reorder packets at destinations. Thereby, a large reordering buffer and a complex control mechanism are required at each node. This motivated us to propose a method guaranteeing in-order delivery while sending packets through alternative paths. The proposed method combines the advantages of both deterministic and adaptive routing algorithms. We introduce several routing algorithms working together in the network without creating cycles. By using these algorithms, packets of different flows use different routes while packets belonging to the same flow follow a single path. In this way, traffic is distributed over the network while addressing in-order delivery. We employ this approach on three-dimensional Networks-on-Chip.
9.
  • Ebrahimi, Masoumeh, et al. (author)
  • Path-Based Partitioning Methods for 3D Networks-on-Chip with Minimal Adaptive Routing
  • 2014
  • In: IEEE Transactions on Computers. - 0018-9340 .- 1557-9956. ; 63:3, pp. 718-733
  • Journal article (peer-reviewed). Abstract:
    • Combining the benefits of 3D ICs and Networks-on-Chip (NoCs) schemes provides a significant performance gain in Chip Multiprocessor (CMP) architectures. As multicast communication is commonly used in cache coherence protocols for CMPs and in various parallel applications, the performance of these systems can be significantly improved if multicast operations are supported at the hardware level. In this paper, we present several partitioning methods for the path-based multicast approach in 3D mesh-based NoCs, each with different levels of efficiency. In addition, we develop novel analytical models for unicast and multicast traffic to explore the efficiency of each approach. In order to distribute the unicast and multicast traffic more efficiently over the network, we propose the Minimal and Adaptive Routing (MAR) algorithm for the presented partitioning methods. The analytical and experimental results show that an advantageous method named Recursive Partitioning (RP) outperforms the other approaches. RP recursively partitions the network until all partitions contain a comparable number of switches, and thus the multicast traffic is equally distributed among several subsets and the network latency is considerably decreased. The simulation results reveal that the RP method can achieve performance improvement across all workloads, while performance can be further improved by utilizing the MAR algorithm. Nineteen percent average and 42 percent maximum latency reduction are obtained on SPLASH-2 and PARSEC benchmarks running on a 64-core CMP.
10.
  • Farahnakian, Fahimeh, et al. (author)
  • Adaptive Load Balancing in Learning-based Approaches for Many-core Embedded Systems
  • 2014
  • In: Journal of Supercomputing. - : Springer Science and Business Media LLC. - 0920-8542 .- 1573-0484. ; 68:3, pp. 1214-1234
  • Journal article (peer-reviewed). Abstract:
    • Adaptive routing algorithms improve network performance by distributing traffic over the whole network. However, they require congestion information to facilitate load balancing. To provide local and global congestion information, we propose a learning method based on a dual reinforcement learning approach. This information can be dynamically updated according to the changing traffic condition in the network by propagating data and learning packets. We utilize a congestion detection method which updates the learning rate according to the congestion level. This method calculates the average number of free buffer slots in each switch at specific time intervals and compares it with maximum and minimum values. Based on the comparison result, the learning rate is set to a value between 0 and 1. If a switch gets congested, the learning rate is set to a high value, meaning that the global information is more important than local. In contrast, local is more emphasized than global information in non-congested switches. Results show that the proposed approach achieves a significant performance improvement over the traditional Q-routing, DRQ-routing, DBAR and Dynamic XY algorithms.
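The congestion-dependent learning rate described in the abstract above can be illustrated with a minimal sketch. The abstract only states that the average number of free buffer slots is compared with minimum and maximum values and that the rate lies between 0 and 1, rising with congestion. The linear interpolation, the function name `learning_rate`, and its parameters are assumptions made for illustration, not the authors' method.

```python
def learning_rate(avg_free_slots: float, min_free: float, max_free: float) -> float:
    """Map the average number of free buffer slots to a rate in [0, 1].

    Fewer free slots means more congestion and therefore a higher rate.
    The linear mapping between the two thresholds is an assumption.
    """
    if max_free <= min_free:
        raise ValueError("max_free must be greater than min_free")
    if avg_free_slots <= min_free:
        return 1.0  # heavily congested switch: weight global information most
    if avg_free_slots >= max_free:
        return 0.0  # idle switch: rely on local information
    return (max_free - avg_free_slots) / (max_free - min_free)
```

A switch whose buffers are half way between the two thresholds would get a rate of 0.5 under this assumed mapping.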
11.
  • Farahnakian, Fahimeh, et al. (author)
  • Bi-LCQ: A Low-weight Clustering-based Q-learning Approach for NoCs
  • 2014
  • In: Microprocessors and microsystems. - : Elsevier BV. - 0141-9331 .- 1872-9436. ; 38:1, pp. 64-75
  • Journal article (peer-reviewed). Abstract:
    • Network congestion has a negative impact on the performance of on-chip networks due to the increased packet latency. Many congestion-aware routing algorithms have been developed to alleviate traffic congestion over the network. In this paper, we propose a congestion-aware routing algorithm based on the Q-learning approach for avoiding congested areas in the network. By using the learning method, local and global congestion information of the network is provided for each switch. This information can be dynamically updated when a switch receives a packet. However, the Q-learning approach suffers from high area overhead in NoCs due to the need for a large routing table in each switch. In order to reduce the area overhead, we also present a clustering approach that decreases the number of routing tables by a factor of 4. Results show that the proposed approach achieves a significant performance improvement over the traditional Q-learning, C-routing, DBAR and Dynamic XY algorithms.
12.
  • Guang, Liang, et al. (author)
  • Embedding Fault-Tolerance with Dual-Level Agents in Many-Core Systems
  • 2012
  • In: First MEDIAN Workshop (MEDIAN'12).
  • Conference paper (other academic/artistic). Abstract:
    • Dual-level fault-tolerance is presented on many-core systems, provided by the software-based system agent and hardware-based local agents. The system agent performs fault-triggered energy-aware remapping with bandwidth constraints, addressing coarse-grained processor failures. The local agents achieve fine-grained link-level fault tolerance against transient and permanent errors. The paper concisely presents the architecture, dual-level fault-tolerant techniques and experiment results.
13.
  • Guang, Liang, et al. (author)
  • Hierarchical supporting structure for dynamic organization in many-core computing systems
  • 2013
  • In: PECCS 2013. ; pp. 252-261
  • Conference paper (peer-reviewed). Abstract:
    • Hierarchical supporting structures for dynamic organization in many-core computing systems are presented. With profound hardware variations and unpredictable errors, dependability becomes a challenging issue in the emerging many-core systems. To provide fault-tolerance against processor failures or performance degradation, dynamic organization is proposed, which allows clusters to be created and updated at run-time. Hierarchical supporting structures are designed for each level of monitoring agents, to enable the tracing, storing and updating of component and system status. These supporting structures need to follow software/hardware co-design to provide small and scalable overhead, while accommodating the functions of agents on the corresponding level. This paper presents the architectural design, functional simulation and implementation analysis. The study demonstrates that the proposed structures facilitate the dynamic organization in case of processor failures and incur small area overhead on many-core systems.
14.
  • Haghbayan, Mohammad-Hashem, et al. (author)
  • A Power-Aware Approach for Online Test Scheduling in Many-Core Architectures
  • 2016
  • In: IEEE Transactions on Computers. - : IEEE. - 0018-9340 .- 1557-9956. ; 65:3, pp. 730-743
  • Journal article (peer-reviewed). Abstract:
    • Aggressive technology scaling triggers novel challenges to the design of multi-/many-core systems, such as limited power budget and increased reliability issues. Today's many-core systems employ dynamic power management and runtime mapping strategies trying to offer optimal performance while fulfilling power constraints. On the other hand, due to the reliability challenges, online testing techniques are becoming a necessity in current and near future technologies. However, state-of-the-art techniques are not aware of the other power/performance requirements. This paper proposes a power-aware non-intrusive online testing approach for many-core systems. The approach schedules software based self-test routines on the various cores during their idle periods, while honoring the power budget and limiting delays in the workload execution. A test criticality metric, based on a device aging model, is used to select cores to be tested at a time. Moreover, power and reliability issues related to the testing at different voltage and frequency levels are also handled. Extensive experimental results reveal that the proposed approach can i) efficiently test the cores within the available power budget causing a negligible performance penalty, ii) adapt the test frequency to the current cores' aging status, and iii) cover available voltage and frequency levels during the testing.
15.
  • Hosseinpour, Farhoud, et al. (author)
  • A Resource Management Model for Distributed Multi-Task Applications in Fog Computing Networks
  • 2021
  • In: IEEE Access. - : Institute of Electrical and Electronics Engineers (IEEE). - 2169-3536. ; 9, pp. 152792-152802
  • Journal article (peer-reviewed). Abstract:
    • While the effectiveness of fog computing in Internet of Things (IoT) applications has been widely investigated in various studies, there is still a lack of techniques to efficiently utilize the computing resources in a fog platform to maximize Quality of Service (QoS) and Quality of Experience (QoE). This paper presents a resource management model for service placement of distributed multitasking applications in fog computing through mathematical modeling of such a platform. Our main design goal is to reduce communication between the candidate nodes hosting different task modules of an application by selecting a group of nodes near each other and as close to the source of the data as possible. We propose a method based on a greedy principle that demonstrates a highly scalable and near-optimal performance for resource mapping problems for multitasking applications in fog computing networks. Compared with the commercial Gurobi optimizer, our proposed algorithm provides a mapping solution that obtains 93% of the performance, attributed to a higher communication cost, while outperforming the reference method in terms of the computing speed, cutting the mapping execution time to less than 1% of that of the Gurobi optimizer.
16.
  • Jafri, Syed M. A. H., et al. (author)
  • Architecture and Implementation of Dynamic Parallelism, Voltage and Frequency Scaling (PVFS) on CGRAs
  • 2015
  • In: ACM Journal on Emerging Technologies in Computing Systems. - : Association for Computing Machinery (ACM). - 1550-4832 .- 1550-4840. ; 11:4
  • Journal article (peer-reviewed). Abstract:
    • In the era of platforms hosting multiple applications with arbitrary performance requirements, providing a worst-case platform-wide voltage/frequency operating point is neither optimal nor desirable. As a solution to this problem, designs commonly employ dynamic voltage and frequency scaling (DVFS). DVFS promises significant energy and power reductions by providing each application with the operating point (and hence the performance) tailored to its needs. To further enhance the optimization potential, recent works interleave dynamic parallelism with conventional DVFS. The induced parallelism results in performance gains that allow an application to lower its operating point even further (thereby saving energy and power consumption). However, the existing works employ costly dedicated hardware (for synchronization) and rely solely on greedy algorithms to make parallelism decisions. To efficiently integrate parallelism with DVFS, compared to state-of-the-art, we exploit the reconfiguration (to reduce DVFS synchronization overheads) and enhance the intelligence of the greedy algorithm (to make optimal parallelism decisions). Specifically, our solution relies on dynamically reconfigurable isolation cells and an autonomous parallelism, voltage, and frequency selection algorithm. The dynamically reconfigurable isolation cells reduce the area overheads of DVFS circuitry by configuring the existing resources to provide synchronization. The autonomous parallelism, voltage, and frequency selection algorithm ensures high power efficiency by combining parallelism with DVFS. It selects that parallelism, voltage, and frequency trio which consumes minimum power to meet the deadlines on available resources. Synthesis and simulation results using various applications/algorithms (WLAN, MPEG4, FFT, FIR, matrix multiplication) show that our solution promises significant reduction in area and power consumption (23% and 51%) compared to state-of-the-art.
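The selection step named in the abstract above (choose the parallelism, voltage, and frequency trio that consumes minimum power while meeting the deadline on the available resources) can be sketched as an exhaustive search. Everything concrete here is an assumption for illustration: the names `OperatingPoint` and `select_operating_point`, ideal linear speedup with parallelism, and the textbook dynamic power model P = p · C · V² · f. The published algorithm may differ in all of these respects.

```python
from dataclasses import dataclass
from itertools import product
from math import inf
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class OperatingPoint:
    parallelism: int   # number of parallel units used (hypothetical field)
    voltage: float     # supply voltage in volts
    frequency: float   # clock frequency in hertz


def select_operating_point(
    work_cycles: float,
    deadline_s: float,
    free_units: int,
    parallelisms: Sequence[int],
    vf_pairs: Sequence[Tuple[float, float]],
    capacitance: float = 1.0,
) -> Optional[OperatingPoint]:
    """Return the feasible trio with the lowest modelled power, or None."""
    best, best_power = None, inf
    for p, (v, f) in product(parallelisms, vf_pairs):
        if p > free_units:
            continue  # not enough free resources for this version
        if work_cycles / (p * f) > deadline_s:
            continue  # assumed ideal speedup still misses the deadline
        power = p * capacitance * v * v * f  # assumed dynamic power model
        if power < best_power:
            best, best_power = OperatingPoint(p, v, f), power
    return best
```

Under this model a wider but slower and lower-voltage configuration can win over a narrow fast one, which is the effect the abstract attributes to combining parallelism with DVFS.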
17.
  •  
18.
  • Jafri, Syed Mohammad Asad Hassan, et al. (author)
  • Compression Based Efficient and Agile Configuration Mechanism for Coarse Grained Reconfigurable Architectures
  • 2011
  • In: Proc. IEEE Int Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW) Symp. - 9780769543857 ; pp. 290-293
  • Conference paper (peer-reviewed). Abstract:
    • This paper considers the possibility of speeding up the configuration by reducing the size of configware in coarse-grained reconfigurable architectures (CGRAs). Our goal was to reduce the number of cycles and increase the configuration bandwidth. The proposed technique relies on multicasting and bitstream compression. The multicasting reduces the cycles by configuring the components performing identical functions simultaneously, in a single cycle, while the bitstream compression increases the configuration bandwidth. We have chosen the dynamically reconfigurable resource array (DRRA) architecture as a vehicle to study the efficiency of this approach. In our proposed method, the configuration bitstream is compressed offline and stored in a memory. If reconfiguration is required, the compressed bitstream is decompressed using an online decompressor and sent to DRRA. Simulation results using practical applications showed up to 78% and 22% decrease in configuration cycles for completely parallel and completely serial implementations, respectively. Synthesis results have confirmed negligible overhead in terms of area (1.2%) and timing.
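The multicasting idea in the abstract above (components that need an identical configuration are configured together in a single cycle) can be illustrated with a small grouping sketch. The function name `multicast_schedule`, the component names, and the representation of a configuration as one integer word per component are hypothetical; the DRRA configuration format is not described in the abstract.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def multicast_schedule(config: Dict[str, int]) -> List[Tuple[int, List[str]]]:
    """Group components by configuration word.

    Each returned entry stands for one configuration cycle in which the
    word is sent to every listed component at once.
    """
    groups: Dict[int, List[str]] = defaultdict(list)
    for component, word in config.items():
        groups[word].append(component)
    return [(word, sorted(targets)) for word, targets in groups.items()]


# Four components, two distinct configuration words (made-up values).
example = {"cell0": 0xA, "cell1": 0xB, "cell2": 0xA, "cell3": 0xA}
schedule = multicast_schedule(example)
```

In this made-up example, unicast configuration would need four cycles, while the grouped schedule needs two.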
19.
  • Jafri, Syed M.A.H., et al. (author)
  • Customizable Compression Architecture for Efficient Configuration in CGRAs
  • 2011
  • In: Proceedings. ; p. 31
  • Conference paper (peer-reviewed). Abstract:
    • Today, Coarse Grained Reconfigurable Architectures (CGRAs) host multiple applications. Novel CGRAs allow each application to exploit runtime parallelism and time sharing. Although these features enhance the power and silicon efficiency, they significantly increase the configuration memory overheads. As a solution to this problem researchers have employed statistical compression, intermediate compact representation, and multicasting. Each of these techniques has different properties, and is therefore best suited for a particular class of applications. However, existing research only deals with these methods separately. In this paper we propose a morphable compression architecture that interleaves these techniques in a unique platform.
20.
  • Jafri, Syed Mohammad Asad Hassan, et al. (author)
  • Energy-Aware CGRAs using Dynamically Re-configurable isolation Cells
  • 2013
  • Conference paper (peer-reviewed). Abstract:
    • This paper presents a self adaptive architecture to enhance the energy efficiency of coarse-grained reconfigurable architectures (CGRAs). Today, platforms host multiple applications, with arbitrary inter-application communication and concurrency patterns. Each application itself can have multiple versions (implementations with different degree of parallelism) and the optimal version can only be determined at runtime. For such scenarios, traditional worst case designs and compile time mapping decisions are neither optimal nor desirable. Existing solutions to this problem employ costly dedicated hardware to configure the operating point at runtime (using DVFS). As an alternative to dedicated hardware, we propose exploiting the reconfiguration features of modern CGRAs. Our solution relies on dynamically reconfigurable isolation cells (DRICs) and autonomous parallelism, voltage, and frequency selection algorithm (APVFS). The DRICs reduce the overheads of DVFS circuitry by configuring the existing resources as isolation cells. APVFS ensures high efficiency by dynamically selecting the parallelism, voltage and frequency trio, which consumes minimum power to meet the deadlines on available resources. Simulation results using representative applications (Matrix multiplication, FIR, and FFT) showed up to 23% and 51% reduction in power and energy, respectively, compared to traditional DVFS designs. Synthesis results have confirmed significant reduction in area overheads compared to state of the art DVFS methods.
21.
  • Jafri, Syed. M. A. H., et al. (author)
  • Energy-Aware Coarse-Grained Reconfigurable Architectures using Dynamically Reconfigurable Isolation Cells
  • 2013
  • In: Proceedings Of The Fourteenth International Symposium On Quality Electronic Design (ISQED 2013). - 9781467349529 ; pp. 104-111
  • Conference paper (peer-reviewed). Abstract:
    • This paper presents a self adaptive architecture to enhance the energy efficiency of coarse-grained reconfigurable architectures (CGRAs). Today, platforms host multiple applications, with arbitrary inter-application communication and concurrency patterns. Each application itself can have multiple versions (implementations with different degree of parallelism) and the optimal version can only be determined at runtime. For such scenarios, traditional worst case designs and compile time mapping decisions are neither optimal nor desirable. Existing solutions to this problem employ costly dedicated hardware to configure the operating point at runtime (using DVFS). As an alternative to dedicated hardware, we propose exploiting the reconfiguration features of modern CGRAs. Our solution relies on dynamically reconfigurable isolation cells (DRICs) and autonomous parallelism, voltage, and frequency selection algorithm (APVFS). The DRICs reduce the overheads of DVFS circuitry by configuring the existing resources as isolation cells. APVFS ensures high efficiency by dynamically selecting the parallelism, voltage and frequency trio, which consumes minimum power to meet the deadlines on available resources. Simulation results using representative applications (Matrix multiplication, FIR, and FFT) showed up to 23% and 51% reduction in power and energy, respectively, compared to traditional DVFS designs. Synthesis results have confirmed significant reduction in area overheads compared to state of the art DVFS methods.
22.
  • Jafri, Syed Mohammad Asad Hassan, et al. (author)
  • Energy-Aware Fault-Tolerant CGRAs Addressing Application with Different Reliability Needs
  • 2013
  • In: Digital System Design (DSD), 2013 Euromicro Conference on. - : IEEE conference proceedings. ; pp. 525-534
  • Conference paper (peer-reviewed). Abstract:
    • In this paper, we propose a polymorphic fault tolerant architecture that can be tailored to efficiently support the reliability needs of multiple applications at run-time. Today, coarse-grained reconfigurable architectures (CGRAs) host multiple applications with potentially different reliability needs. Providing platform-wide worst-case (maximum) protection to all the applications is neither optimal nor desirable. To reduce the fault-tolerance overhead, adaptive fault-tolerance strategies have been proposed. The proposed techniques assess the reliability requirements of each application and adjust the fault-tolerance intensity (and hence overhead) accordingly. However, existing flexible reliability schemes only allow shifting between different levels of modular redundancy (duplication, triplication, etc.) and deal with only a single class of faults (e.g. soft errors). To complement these strategies, we propose energy-aware fault-tolerance that, in addition to modular redundancy, can also provide low cost, sub-modular (e.g. residue mod 3) redundancy, to cater for both permanent and temporary faults. Our solution relies on an agent based control layer and a configurable fault-tolerance data path. The control layer identifies the application class and configures the data path to provide the needed reliability. Simulation results using a few selected algorithms (FFT, matrix multiplication, and FIR filter) showed that the proposed method provides flexible protection with energy overhead ranging from 3.125% to 107% for different reliability levels. Synthesis results have confirmed that the proposed architecture significantly reduces the area overhead for self-checking (59.1%) and fault tolerant (7.1%) versions, compared to the state of the art adaptive reliability techniques.
23.
  • Jafri, Syed Mohammad Asad Hassan, et al. (författare)
  • Energy-Aware Fault-Tolerant Network-on-Chips for Addressing Multiple Traffic Classes
  • 2012
  • Ingår i: Proceedings. - 9781467324984 ; , s. 242-249
  • Konferensbidrag (refereegranskat)abstract
    • This paper presents an energy efficient architecture to provide on-demand fault tolerance to multiple traffic classes, running simultaneously on single network on chip (NoC) platform. Today, NoCs host multiple traffic classes with potentially different reliability needs. Providing platform-wide worst-case (maximum) protection to all the classes is neither optimal nor desirable. To reduce the overheads incurred by fault tolerance, various adaptive strategies have been proposed. The proposed techniques rely on individual packet fields and operating conditions to adjust the intensity and hence the overhead of fault tolerance. Presence of multiple traffic classes undermines the effectiveness of these methods. To complement the existing adaptive strategies, we propose on-demand fault tolerance, capable of providing required reliability, while significantly reducing the energy overhead. Our solution relies on a hierarchical agent based control layer and a reconfigurable fault tolerance data path. The control layer identifies the traffic class and directs the packet to the path providing the needed reliability. Simulation results using representative applications (matrix multiplication, FFT, wavefront, and HiperLAN) showed up to 95% decrease in energy consumption compared to traditional worst case methods. Synthesis results have confirmed a negligible additional overhead, for providing on-demand protection (up to 5.3% area), compared to the overall fault tolerance circuitry.
  •  
24.
  • Jafri, Syed Mohammad Asad Hassan, et al. (författare)
  • Energy-aware fault-tolerant network-on-chips for addressing multiple traffic classes
  • 2013
  • Ingår i: Microprocessors and microsystems. - : Elsevier BV. - 0141-9331 .- 1872-9436. ; 37:8, s. 811-822
  • Tidskriftsartikel (refereegranskat)abstract
    • This paper presents an energy efficient architecture to provide on-demand fault tolerance to multiple traffic classes, running simultaneously on single network on chip (NoC) platform. Today, NoCs host multiple traffic classes with potentially different reliability needs. Providing platform-wide worst-case (maximum) protection to all the classes is neither optimal nor desirable. To reduce the overheads incurred by fault tolerance, various adaptive strategies have been proposed. The proposed techniques rely on individual packet fields and operating conditions to adjust the intensity and hence the overhead of fault tolerance. Presence of multiple traffic classes undermines the effectiveness of these methods. To complement the existing adaptive strategies, we propose on-demand fault tolerance, capable of providing required reliability, while significantly reducing the energy overhead. Our solution relies on a hierarchical agent based control layer and a reconfigurable fault tolerance data path. The control layer identifies the traffic class and directs the packet to the path providing the needed reliability. Simulation results using representative applications (matrix multiplication, FFT, wavefront, and HiperLAN) showed up to 95% decrease in energy consumption compared to traditional worst case methods. Synthesis results have confirmed a negligible additional overhead, for providing on-demand protection (up to 5.3% area), compared to the overall fault tolerance circuitry.
  •  
25.
  • Jafri, Syed Mohammad Asad Hassan, et al. (författare)
  • Energy-Aware-Task-Parallelism for Efficient Dynamic Voltage, and Frequency Scaling, in CGRAs
  • 2013
  • Ingår i: Proceedings - 2013 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation, IC-SAMOS 2013. - : IEEE. - 9781479901036 ; , s. 104-112
  • Konferensbidrag (refereegranskat)abstract
    • Today, coarse grained reconfigurable architectures (CGRAs) host multiple applications, with arbitrary communication and computation patterns. Each application itself is composed of multiple tasks, spatially mapped to different parts of the platform. Providing a worst-case operating point to all applications leads to excessive energy and power consumption. To address this problem, dynamic voltage and frequency scaling (DVFS) is a frequently used technique. DVFS allows the voltage and/or frequency of the device to be scaled based on runtime constraints. Recent research suggests that the efficiency of DVFS can be significantly enhanced by combining dynamic parallelism with DVFS. The proposed methods exploit the speedup induced by parallelism to allow aggressive frequency and voltage scaling. These techniques employ a greedy algorithm that blindly parallelizes a task whenever the required resources are available. Therefore, they are likely to parallelize a task even if it offers no speedup to the application, thereby undermining the effectiveness of parallelism. As a solution to this problem, we present energy aware task parallelism. Our solution relies on resource allocation graphs and an autonomous parallelism, voltage, and frequency selection algorithm. Using the resource allocation graph as a guide, the autonomous parallelism, voltage, and frequency selection algorithm parallelizes a task only if its parallel version reduces the overall application execution time. Simulation results, using representative applications (MPEG4, WLAN), show that our solution promises better resource utilization, compared to the greedy algorithm. Synthesis results (using WLAN) confirm a significant reduction in energy (up to 36%), power (up to 28%), and configuration memory requirements (up to 36%), compared to state of the art.
  •  
26.
  • Jafri, Syed M. A. H., et al. (författare)
  • Morphable Compression Architecture for Efficient Configuration in CGRAs
  • 2014
  • Ingår i: 2014 17th Euromicro Conference on Digital System Design (DSD). ; , s. 42-49
  • Konferensbidrag (refereegranskat)abstract
    • Today, Coarse Grained Reconfigurable Architectures (CGRAs) host multiple applications. Novel CGRAs allow each application to exploit runtime parallelism and time sharing. Although these features enhance the power and silicon efficiency, they significantly increase the configuration memory overheads (up to 50% area of the overall platform). As a solution to this problem researchers have employed statistical compression, intermediate compact representation, and multicasting. Each of these techniques has different properties (i.e. compression ratio and decoding time), and is therefore best suited for a particular class of applications (and situation). However, existing research only deals with these methods separately. In this paper we propose a morphable compression architecture that interleaves these techniques in a unique platform. The proposed architecture allows each application to enjoy a separate compression/decompression hierarchy (consisting of various types and implementations of hardware/software decoders) tailored to its needs. Thereby, our solution offers minimal memory while meeting the required configuration deadlines. Simulation results, using different applications (FFT, Matrix multiplication, and WLAN), reveal that the choice of compression hierarchy has a significant impact on compression ratio (from configware replication to 52%) and configuration cycles (from 33 nsec to 1.5 secs) for the tested applications. Synthesis results reveal that introducing adaptivity incurs negligible additional overheads (1%) compared to the overall platform area.
  •  
27.
  • Jafri, Syed Mohammad Asad Hassan, et al. (författare)
  • NeuroCGRA : A CGRA with support for neural networks
  • 2014
  • Ingår i: Proceedings of the 2014 International Conference on High Performance Computing and Simulation, HPCS 2014. - : IEEE. - 9781479953127 ; , s. 506-511
  • Konferensbidrag (refereegranskat)abstract
    • Today, Coarse Grained Reconfigurable Architectures (CGRAs) are becoming an increasingly popular implementation platform. In real world applications, the CGRAs are required to simultaneously host processing (e.g. audio/video acquisition) and estimation (e.g. audio/video/image recognition) tasks. For estimation problems, neural networks promise a higher efficiency than conventional processing. However, most of the existing CGRAs provide no support for neural networks. To realize both neural networks and conventional processing on the same platform, this paper presents NeuroCGRA. NeuroCGRA allows the processing elements and the network to dynamically morph into either a conventional CGRA or a neural network, depending on the hosted application. We have chosen the DRRA as a vehicle to study the feasibility and overheads of our approach. Synthesis results reveal that the proposed enhancements incur negligible overheads (4.4% area and 9.1% power) compared to the original DRRA cell.
  •  
28.
  • Jafri, Syed Mohammad Asad Hassan, et al. (författare)
  • Polymorphic Configuration Architecture for CGRAs
  • 2016
  • Ingår i: IEEE Transactions on Very Large Scale Integration (vlsi) Systems. - : IEEE. - 1063-8210 .- 1557-9999. ; 24:1, s. 403-407
  • Tidskriftsartikel (refereegranskat)abstract
    • In the era of platforms hosting multiple applications with arbitrary reconfiguration requirements, static configuration architectures are neither optimal nor desirable. The static reconfiguration architectures either incur excessive overheads or cannot support advanced features (like time-sharing and runtime parallelism). As a solution to this problem, we present a polymorphic configuration architecture (PCA) that provides each application with a configuration infrastructure tailored to its needs.
  •  
29.
  • Jafri, Syed M. A. H., et al. (författare)
  • Private reliability environments for efficient fault-tolerance in CGRAs
  • 2014
  • Ingår i: Design automation for embedded systems. - : Springer Science and Business Media LLC. - 0929-5585 .- 1572-8080. ; 18:3-4, s. 295-327
  • Tidskriftsartikel (refereegranskat)abstract
    • In the era of platforms hosting multiple applications with variable reliability needs, worst-case platform-wide fault-tolerance decisions are neither optimal nor desirable. As a solution to this problem, designs commonly employ adaptive fault-tolerance strategies that provide each application with the reliability level actually needed. However, in the CGRA domain, the existing schemes either only allow to shift between different levels of modular redundancy (duplication, triplication, etc.) or protect only a particular region of a device (e.g. configuration memory, computation, or data memory). To complement these strategies, we propose private fault-tolerance environments which, in addition to modular redundancy, also provide low cost sub-modular (e.g. residue mod 3) redundancy capable of handling both permanent and temporary faults in configuration memory, computation, communication, and data memory. In addition, we also present adaptive configuration scrubbing techniques which prevent fault accumulation in the configuration memory. Simulation results using a few selected algorithms (FFT, matrix multiplication, and FIR filter) show that the approach proposed is capable of providing flexible protection with energy overhead ranging from 3.125 % to 107 % for different reliability levels. Synthesis results have confirmed that the proposed architecture reduces the area overhead for self-checking (58 %) and fault-tolerant (7.1 %) versions, compared to the state of the art adaptive reliability techniques.
  •  
30.
  • Jafri, Syed Mohammad Asad Hassan, et al. (författare)
  • RuRot : Run-time rotatable-expandable partitions for efficient mapping in CGRAs
  • 2014
  • Ingår i: Proceedings - International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation, SAMOS 2014. - 9781479937707 ; , s. 233-241
  • Konferensbidrag (refereegranskat)abstract
    • Today, Coarse Grained Reconfigurable Architectures (CGRAs) host multiple applications, with arbitrary communication and computation patterns. Compile-time mapping decisions are neither optimal nor desirable to efficiently support the diverse and unpredictable application requirements. As a solution to this problem, recently proposed architectures offer run-time remapping. The run-time remappers displace or expand (parallelize/serialize) an application to optimize different parameters (such as platform utilization). However, the existing remappers support application displacement or expansion in either horizontal or vertical direction. Moreover, most of the works only address dynamic remapping in packet-switched networks and therefore are not applicable to the CGRAs that exploit circuit-switching for low-power and high predictability. To enhance the optimality of the run-time remappers, this paper presents a design framework called Run-time Rotatable-expandable Partitions (RuRot). RuRot provides architectural support to dynamically remap or expand (i.e. parallelize) the hosted applications in CGRAs with circuit-switched interconnects. Compared to state of the art, the proposed design supports application rotation (in clockwise and anticlockwise directions) and displacement (in horizontal and vertical directions), at run-time. Simulation results using a few applications reveal that the additional flexibility enhances the device utilization, significantly (on average 50 % for the tested applications). Synthesis results confirm that the proposed remapper has negligible silicon (0.2 % of the platform) and timing (2 cycles per application) overheads.
  •  
31.
  • Jafri, Syed Mohammad Asad Hassan, et al. (författare)
  • TransPar : Transformation based dynamic Parallelism for low power CGRAs
  • 2014
  • Ingår i: Conference Digest - 24th International Conference on Field Programmable Logic and Applications, FPL 2014. - 9783000446450
  • Konferensbidrag (refereegranskat)abstract
    • Coarse Grained Reconfigurable Architectures (CGRAs) are emerging as enabling platforms to meet the high performance demanded by modern applications (e.g. 4G, CDMA, etc.). Recently proposed CGRAs offer runtime parallelism to reduce energy consumption (by lowering voltage/frequency). To implement the runtime parallelism, CGRAs commonly store multiple compile-time generated implementations of an application (with different degrees of parallelism) and select the optimal version at runtime. However, the compile-time binding incurs excessive configuration memory overheads and/or is unable to parallelize an application even when sufficient resources are available. As a solution to this problem, we propose Transformation based dynamic Parallelism (TransPar). TransPar stores only a single implementation and applies a series of transformations to generate the bitstream for the parallel version. In addition, it also allows an application to be displaced and/or rotated so that it can be parallelized in resource constrained scenarios. By storing only a single implementation, TransPar offers significant reductions in configuration memory requirements (up to 73% for the tested applications), compared to state of the art compaction techniques. Simulation and synthesis results, using real applications, reveal that the additional flexibility allows up to 33% energy reduction compared to static memory based parallelism techniques. Gate level analysis reveals that TransPar incurs negligible silicon (0.2% of the platform) and timing (6 additional cycles per application) penalty.
  •  
32.
  • Kakakhel, Syed Rameez Ullah, et al. (författare)
  • Enhancing Smart Grids via Advanced Metering Infrastructure and Fog Computing Fusion
  • 2020
  • Ingår i: 2020 IEEE 6TH WORLD FORUM ON INTERNET OF THINGS (WF-IOT). - : Institute of Electrical and Electronics Engineers (IEEE).
  • Konferensbidrag (refereegranskat)abstract
    • The smart grid is a new generation of the power grid that incorporates advanced features such as distributed energy resources, two-way communication and situation awareness. It is not just energy that is exchanged between consumers and producers but information. An efficient and robust smart grid requires efficient and robust communication and computation infrastructure to carry and process the associated data. We provide an overview of the possibilities that fog computing offer for smart grids. In our investigation, the pillars of fog computing, such as decentralization, resiliency, scalability and mobility, offer a perfect match for the decentralized smart grid. Fog computing nodes, capable of communication and coordination, incorporated in smart meters, will provide distributed control, communication and computation. Thus, enhancing reliability, resiliency and scalability of the smart grid as more and more distributed energy resources (DERs) are added to the grid.
  •  
33.
  • Karami, Masoomeh, et al. (författare)
  • Hierarchical Fault Simulation of Deep Neural Networks on Multi-Core Systems
  • 2021
  • Ingår i: 2021 IEEE EUROPEAN TEST SYMPOSIUM (ETS 2021). - : Institute of Electrical and Electronics Engineers (IEEE).
  • Konferensbidrag (refereegranskat)abstract
    • In this paper, a hierarchical fault simulation technique for neural networks is proposed, supporting both permanent and temporary faults. In the proposed technique, different levels of hierarchy are used, forming a mixed-level simulation environment. In such an environment, the pre-synthesis behavioral specification of the network and the post-synthesis gate-level model are co-simulated. To accelerate the fault simulation process, faults are injected in the gate-level specification of the selected neurons while the behavioral model in different levels of abstraction is used to simulate the remaining neurons. Further speedup is obtained through event-driven simulation and parallelization. Experimental results confirm the time efficiency of the proposed fault simulation technique.
  •  
34.
  • Karami, Masoomeh, et al. (författare)
  • High-Performance Parallel Fault Simulation for Multi-Core Systems
  • 2021
  • Ingår i: 2021 29th euromicro international conference on parallel, distributed and network-based processing (PDP 2021). - : Institute of Electrical and Electronics Engineers (IEEE). ; , s. 207-211
  • Konferensbidrag (refereegranskat)abstract
    • Fault simulation is a time-consuming process that requires customized methods and techniques to accelerate it. Multi-threading and multi-core approaches are two promising techniques that can be exploited to accelerate the fault simulation process by using different parts of the hardware at the same time. However, an efficient parallelization is obtained only by the refinement of software with respect to the hardware platform. In this paper, a parallel multi-thread fault simulation technique is proposed to accelerate the simulation process on multi-core platforms. In this approach, the gate input values are independently assigned to each thread. Each input value carries the information of several parallel simulation processes. This provides a multithread parallel fault simulation environment. The experimental results show that the proposed technique can efficiently use the hardware platform. On a single-core platform, the proposed technique can reduce the time by 25%, while on a dual-core platform, increasing the thread count approximately halves the execution time.
  •  
35.
  • Karami, Masoomeh, et al. (författare)
  • Thread-level Parallelism in Fault Simulation of Deep Neural Networks on Multi-Processor Systems
  • 2022
  • Ingår i: Proceedings. - : Institute of Electrical and Electronics Engineers (IEEE).
  • Konferensbidrag (refereegranskat)abstract
    • High-performance fault simulation is one of the essential and preliminary tasks in the process of online and offline testing of machine learning (ML) hardware. Deep neural networks (DNN), as one of the essential parts of ML programs, are widely used in many critical and non-critical applications in Systems-on-Chip and ASIC designs. When fault simulation is applied to DNNs, the simulation time increases exponentially with the number of neurons. However, the software architecture of neural networks and the lack of dependency between neurons in each inference layer provide a significant opportunity to parallelize fault simulation on a multi-processor platform. In this paper, a multi-thread technique for hierarchical fault simulation of neural networks is proposed, targeting both permanent and transient faults. During the process of fault simulation, the neurons of each inference layer are distributed among the executing threads. Since, in hierarchical fault simulation, the faulty neuron demands proportionally enormous computation compared to the behavioural model of non-faulty neurons, the faulty neuron is assigned to one thread while the rest of the neurons are divided among the remaining threads. Experimental results confirm the time efficiency of the proposed fault simulation technique on multi-processor architectures.
  •  
36.
  • Kelati, Amleset, et al. (författare)
  • Biosignal Feature Extraction Techniques for IoT Healthcare Platform
  • 2016
  • Ingår i: IEEE Conference on Design and Architectures for Signal and Image Processing (DASIP2016). - Rennes, France.
  • Konferensbidrag (populärvet., debatt m.m.)abstract
    • In an IoT healthcare platform, a variety of biosignals are acquired from its sensors, and appropriate feature extraction techniques are crucial in order to make use of the acquired biosignal data and help the healthcare scientist or bio-engineer reach optimal decisions. This work reviews the existing biosignal feature extraction and classification methods for different healthcare applications. Due to the enormous number of different biosignals, and since most healthcare applications use electrocardiogram (ECG), electroencephalogram (EEG), electromyogram (EMG), and electrogastrogram (EGG) signals, we focus the review on feature extraction and classification methods for these biosignals. The review also includes a summary of Blood Oxygen Saturation determined by Pulse Oximetry (SpO2), Electrooculography and eye movement (EOG), and Respiration (RSP) signals. Its discussion and analysis focus on the advantages, performance and drawbacks of the techniques.
  •  
37.
  • Kelati, Amleset (författare)
  • Classification of Pain level using Zygomaticus and Corrugator EMG Features: Machine Learning Approach
  • Tidskriftsartikel (refereegranskat)abstract
    • Real-time recognition of facial expressions is required to ensure the accurate pain assessment of patients in the ICU, infants, and other patients who may not be able to communicate verbally or even express the sensation of pain. Facial expression is a key pain-related behavior that may unlock the answer to an objective pain measurement tool. In this work, a machine learning based pain level classification using data collected from facial electromyograms (EMG) is presented. The dataset is acquired from part of the BioVid Heat Pain database [1] to evaluate facial expression from EMG corrugator and EMG zygomaticus, and an EMG signal processing and data analysis flow is adapted for continuous pain estimation. The classification of the extracted pain-associated facial electromyography (fEMG) features is performed by a supervised ML algorithm, KNN, by choosing the value of k, which depends on the nonlinear models. The accuracy estimation is presented, and considerable growth in classification accuracy is noticed when the subject matter from the features is omitted from the analysis. The ML algorithm for classification of the amount of pain in patients could deliver valuable evidence for health care providers and aid treatment assessment. A performance of 99.4% is shown on the binary classification for the discrimination between the baseline and the pain tolerance level (P0 versus P4) without the influence of a subject bias. Moreover, the result of the classification accuracy clearly shows the relevance of the proposed approach.
  •  
38.
  • Kelati, Amleset (författare)
  • Data-driven Implementations for Enhanced Healthcare Internet-of-Things Systems
  • 2021
  • Doktorsavhandling (övrigt vetenskapligt/konstnärligt)abstract
    • Healthcare monitoring systems based on the Internet of Things (IoT) are emerging as a potential solution for reducing healthcare costs by impacting and improving the quality of health care delivery. The rising number of elderly and chronic patient population in the world and the associated healthcare costs urges the application of IoT technology to improve and support the health care services. This thesis develops and integrates two IoT-based healthcare systems aiming to support elderly independent living at home. The first one involves using IoT-based remote monitoring for pain detection, while the second one detects behavioral changes caused by illness via profiling the appliances’ energy usage. In the first approach, an Electromyography (EMG) sensor node with a Wireless Fidelity (Wi-Fi) radio module is designed for monitoring the pain of patients living at home. An appropriate feature-extraction and classification algorithm is applied to the EMG signal. The classification algorithm achieves 98.5% accuracy for the experimental data collected from the developed EMG sensor node, while it achieves 99.4% classification accuracy for the clinically approved pain intensity dataset. Moreover, the experimental results clearly show the relevance of the proposed approaches and prove their suitability for real-life applications. The developed sensor node for the pain level classification method is beneficial for continuous pain assessment to the smart home-care community. As a complement to the first approach, in the second approach, an IoT-based smart meter and a set of appliance-level load profiling methods are developed to detect the electricity usage of users’ daily living at home, which indirectly provides information about the subject’s health status. The thesis has formulated a novel methodology by integrating Non-intrusive Appliance Load Monitoring (NIALM) analysis with Machine Learning (ML)-based classification at the fog layer. The developed method allows the detection of a single appliance with high accuracy by associating the user’s Activities of Daily Living (ADL). The appliances detection is performed by employing a k-Nearest Neighbors (k-NN) classification algorithm. It achieves 97.4% accuracy, demonstrating its high detection performance. Due to the low cost and reusability advantages of Field Programmable Gate Arrays (FPGA), the execution of k-NN for appliances classification model is performed on an FPGA. Its classification performance was comparable with other computing platforms, making it a cost-effective alternative for IoT-based healthcare assessment of daily living at home. The developed methods have practical application in assisting real-time e-health monitoring of any individual who can remain in the comfort of their normal living environment.
  •  
39.
  • Kelati, Amleset, et al. (författare)
  • Implementation of K-nearest Neighbor on Field Programmable Gate Arrays for Appliance Classification
  • 2020
  • Ingår i: 2020 8th International Conference on Smart Energy Grid Engineering, SEGE 2020. - Oshawa, ON, Canada : IEEE. ; , s. 51-57
  • Konferensbidrag (refereegranskat)abstract
    • Accurate appliance energy consumption information can be obtained with a Non-Intrusive Appliances Load Monitoring (NIALM) system. However, faster and more accurate appliance classification can be achieved by implementing the k-nearest neighbor (k-NN) classifier in hardware. A field-programmable gate array (FPGA) hardware implementation can speed up the processing time with a high level of performance accuracy. The results show that the HLS-based solution reduces design complexity and time, improving cost-effectiveness. The Plug Load Appliance Identification Dataset (PLAID) is used as a benchmark for the implementation. The selected appliance identification is implemented using Xilinx Zynq-7000, and the HLS-based solution uses an area of 37.1% for LUT and 21% for FF from the available chip. Thus, the implementation improved the cost and classification accuracy with a processing time of 5.9 ms, and the consumed power was 1.94 W.
  •  
40.
  • Kelati, Amleset, et al. (författare)
  • Implementation of non-intrusive appliances load monitoring (NIALM) on k-nearest neighbors (k-NN) classifier
  • 2020
  • Ingår i: AIMS Electronics and Electrical Engineering. - : American Institute of Mathematical Sciences (AIMS). - 2578-1588. ; 4:3, s. 326-344
  • Tidskriftsartikel (refereegranskat)abstract
    • Nonintrusive Appliance Load Monitoring (NIALM) is used to analyze an individual’s household energy consumption by distinguishing variations in the voltage and current of appliances in a household. The method identifies the load consumption of each appliance from the aggregated home energy consumption. NIALM will also provide information on the load consumption of each appliance by indirectly detecting abnormal changes in appliance usage. The proposed NIALM approach is based on feature extraction from load consumption measurements of electrical power signals in order to classify an appliance’s state of operation. In this work, we have improved the identification accuracy and the detection of appliances based on their operational state by employing a Machine Learning (ML) technique, namely the k-nearest neighbor (k-NN) classification algorithm. The dataset used to perform this process is the publicly available PLAID dataset of power, voltage and current signals of appliances from several houses, which is used as the benchmark dataset. The PLAID dataset is collected and processed for each appliance, and our classification results based on the k-NN algorithm achieve high accuracy and offer a cost-effective solution. In addition, the results show that the k-NN classifier is an efficient method for NIALM when compared with other proposed ML options. Based on the dataset used, the average F-score obtained using the k-NN classifier is 90%. Possible reasons behind these findings are discussed and areas for further exploration are proposed.
  •  
41.
  • Kelati, Amleset, et al. (författare)
  • Machine Learning for sEMG Facial Feature Characterization
  • 2020
  • Ingår i: Signal Processing Algorithms, Architectures, Arrangements and Applications (SPA). - : IEEE. ; , s. 169-174
  • Konferensbidrag (refereegranskat)abstract
    • Wearable e-health systems are frequently used for monitoring biomedical signals. These devices need to have advanced and applicable methods of feature selection and classification for real time applications. The electromyogram (EMG) signal records the movement of the human muscle. EMG signal processing techniques aim to recover the actual signal and, among others, detect the state of signals related to positive and negative emotional expression. In our study, the data collected is from the facial muscle activity that is produced by the emotion of the facial expressions. The key challenge is in finding an accurate classification method for the measured signals. This paper investigates the promising techniques for the detection and classification of EMG signals using machine learning theory. Here, we demonstrate that the Support Vector Machine (SVM) is an optimal method for classification of facial surface electromyogram (sEMG) signals associated with a pain dataset. The test results and the methods are able to analyze the pattern recognition of facial EMG signal classification. The resulting 99% accuracy with the SVM method adds value to the classification algorithms of our EMG signal acquisition platform.
  •  
42.
  • Kelati, Amleset, et al. (author)
  • Real-Time Classification of Pain Level Using Zygomaticus and Corrugator EMG Features
  • 2022
  • In: Electronics. - : MDPI AG. - 2079-9292. ; 11:11, p. 1671
  • Journal article (peer-reviewed). Abstract:
    • The real-time recognition of pain level is required to perform an accurate pain assessment of patients in the intensive care unit, infants, and other subjects who may not be able to communicate verbally or even express the sensation of pain. Facial expression is a key pain-related behavior that may unlock the answer to an objective pain measurement tool. In this work, a machine learning-based pain level classification system using data collected from facial electromyograms (EMG) is presented. The dataset was acquired from part of the BioVid Heat Pain database to evaluate facial expression from the corrugator and zygomaticus EMG signals, and an EMG signal processing and data analysis flow is adapted for continuous pain estimation. The extracted pain-associated facial electromyography (fEMG) features are classified by K-nearest neighbor (KNN), with the value of k chosen depending on the nonlinear models. The accuracy estimation is presented, and a considerable increase in classification accuracy is observed when the subject matter from the features is omitted from the analysis. The ML algorithm for the classification of the amount of pain experienced by patients could deliver valuable evidence for health care providers and aid treatment assessment. The proposed classification algorithm has achieved a 99.4% accuracy for classifying the pain tolerance level from the baseline (P0 versus P4) without the influence of a subject bias. Moreover, the result on the classification accuracy clearly shows the relevance of the proposed approac
  •  
43.
  • Kelati, Amleset, et al. (author)
  • Signal Processing Based BioSignal Feature Extraction and Classification Techniques for IoT Healthcare Platform: Survey
  • 2016
  • In: IEEE Conference on Design and Architectures for Signal and Image Processing (DASIP2016). - Rennes, France.
  • Conference paper (popular science, debate, etc.). Abstract:
    • In an IoT healthcare platform, a variety of biosignals are acquired from its sensors, and appropriate feature extraction techniques are crucial in order to make use of the acquired biosignal data and help the healthcare scientist or bio-engineer reach optimal decisions. This work reviews the existing biosignal feature extraction and classification methods for different healthcare applications. Due to the enormous number of different biosignals, and since most healthcare applications use the electrocardiogram (ECG), electroencephalogram (EEG), electromyogram (EMG), and electrogastrogram (EGG), we focus the review on feature extraction and classification methods for these biosignals. The review also includes a summary of Blood Oxygen Saturation determined by Pulse Oximetry (SpO2), Electrooculography and eye movement (EOG), and Respiration (RSP) signals. The discussion and analysis focus on the advantages, performance and drawbacks of the techniques.
  •  
44.
  • Kelati, Amleset, et al. (author)
  • Smart Meter Load Profiling for e-Health Monitoring System
  • 2019
  • In: 2019 IEEE 7th International Conference on Smart Energy Grid Engineering (SEGE). - Oshawa, ON, Canada : IEEE. - 9781728124407 - 9781728124414 - 9781728124391
  • Conference paper (peer-reviewed). Abstract:
    • A structural health-monitoring system is needed to address the problems caused by the rapidly growing elderly population and the resulting health care demand. The paper discusses how consumers' electricity usage data from the smart meter can support the healthcare sector by load profiling normal and abnormal energy consumption. For this work, the measured dataset is taken from 12 households and collected by the smart meter at one-hour intervals for one month. The dataset is grouped according to feature patterns, reduced by matrix-based analysis, and classified with the K-Means clustering algorithm. We show how the Sum Square Error (SSE) of the clustering result follows a trend that indicates normal or abnormal electricity usage behavior and leads to an assumption about the consumer's health status.
  •  
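Illustrative note for entry 44: the abstract mentions K-Means clustering of smart meter load profiles and the Sum Square Error (SSE) of the clustering result. The following is a minimal Python sketch of that general technique only, not the authors' implementation. The sample profiles, the function names, and the choice of the first k points as initial centroids are assumptions made for illustration, and the matrix-based reduction step named in the abstract is not shown.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[List[float]], List[int], float]:
    """Plain K-Means (Lloyd's algorithm); returns centroids, labels, and SSE."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Assumption for reproducibility: the first k points seed the centroids.
    centroids = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each profile joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    # SSE: total squared distance of every point to its assigned centroid.
    sse = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, sse


# Hypothetical two-value load profiles (kWh), invented for illustration.
profiles = [[1.0, 1.0], [1.0, 3.0], [9.0, 9.0], [9.0, 11.0]]
centroids, labels, sse = kmeans(profiles, k=2)
print(labels, sse)
```

A lower SSE means the profiles sit closer to their cluster centroids. How an SSE trend maps to normal or abnormal usage is specific to the paper and is not modeled in this sketch.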
45.
  • Majd, Amin, et al. (author)
  • Multi-Population Parallel Imperialist Competitive Algorithm for Solving Systems of Nonlinear Equations
  • 2016
  • In: 2016 INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING & SIMULATION (HPCS 2016). - : IEEE. - 9781509020881 ; , pp. 767-775
  • Conference paper (peer-reviewed). Abstract:
    • The widespread importance of optimization and of solving NP-hard problems, such as systems of nonlinear equations, is indisputable in a diverse range of sciences. Nonlinear equations are used widely, with applications in economics, engineering, chemistry, mechanics, medicine, and robotics. There are different types of methods for solving systems of nonlinear equations. One of the most popular is Evolutionary Computing (EC). This paper presents an evolutionary algorithm called the Parallel Imperialist Competitive Algorithm (PICA), which is based on a multi-population technique for solving systems of nonlinear equations. In order to demonstrate the efficiency of the proposed approach, some well-known problems are utilized. The results indicate that PICA has a high success rate and a quick convergence rate.
  •  
46.
  • Majd, Amin, et al. (author)
  • NOMeS : Near-Optimal Metaheuristic Scheduling for MPSoCs
  • 2017
  • In: 2017 19TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND DIGITAL SYSTEMS (CADS). - : IEEE. - 9781538643792 ; , pp. 70-75
  • Conference paper (peer-reviewed). Abstract:
    • The task scheduling problem for Multiprocessor System-on-Chips (MPSoC), which plays a vital role in performance, is an NP-hard problem. Exploring the whole search space in order to find the optimal solution is not time efficient, thus metaheuristics are mostly used to find a near-optimal solution in a reasonable amount of time. We propose a novel metaheuristic method for near-optimal scheduling that can provide performance guarantees for multiple applications implemented on a shared platform. Applications are represented as directed acyclic task graphs (DAG) and are executed on an MPSoC platform with given communication costs. We introduce a novel multi-population method inspired by both genetic and imperialist competitive algorithms. It is specialized for the scheduling problem with the goal to improve the convergence policy and selection pressure. The potential of the approach is demonstrated by experiments using a Sobel filter, a SUSAN filter, RASTA-PLP and JPEG encoder as real-world case studies.
  •  
47.
  • Majd, Amin, et al. (author)
  • Parallel imperialist competitive algorithms
  • 2018
  • In: Concurrency and Computation. - : WILEY. - 1532-0626 .- 1532-0634. ; 30:7
  • Journal article (peer-reviewed). Abstract:
    • The importance of optimization and NP-problem solving cannot be overemphasized. The usefulness and popularity of evolutionary computing methods are also well established. There are various types of evolutionary methods; they are mostly sequential but some of them have parallel implementations as well. We propose a multi-population method to parallelize the Imperialist Competitive Algorithm. The algorithm has been implemented with the Message Passing Interface on 2 computer platforms, and we have tested our method based on shared memory and message passing architectural models. An outstanding performance is obtained, demonstrating that the proposed method is very efficient concerning both speed and accuracy. In addition, compared with a set of existing well-known parallel algorithms, our approach obtains more accurate results within a shorter time period.
  •  
48.
  • Majd, Amin, et al. (author)
  • PICA : Multi-Population Implementation of Parallel Imperialist Competitive Algorithms
  • 2016
  • In: 2016 24TH EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, AND NETWORK-BASED PROCESSING (PDP). - : Institute of Electrical and Electronics Engineers (IEEE). - 9781467387767 ; , pp. 248-255
  • Conference paper (peer-reviewed). Abstract:
    • The importance of optimization and NP problem solving cannot be overemphasized. The usefulness and popularity of evolutionary computing methods are also well established. There are various types of evolutionary methods; most are sequential, and some have parallel implementations. We propose a multi-population method to parallelize the Imperialist Competitive Algorithm. The algorithm has been implemented with MPI on two platforms, and we have tested it on shared-memory and message-passing architectures. An outstanding performance is obtained, which indicates that the method is efficient with respect to speed and accuracy. In the second step, the proposed algorithm is compared with a set of existing well-known parallel algorithms and is shown to obtain more accurate solutions in less time.
  •  
49.
  • Majd, Amin, et al. (author)
  • Placement of Smart Mobile Access Points in Wireless Sensor Networks and Cyber-Physical Systems using Fog Computing
  • 2016
  • In: Proceedings - 13th IEEE International Conference on Ubiquitous Intelligence and Computing, 13th IEEE International Conference on Advanced and Trusted Computing, 16th IEEE International Conference on Scalable Computing and Communications, IEEE International Conference on Cloud and Big Data Computing, IEEE International Conference on Internet of People and IEEE Smart World Congress and Workshops, UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld 2016. - : IEEE conference proceedings. - 9781509027712 ; , pp. 680-689
  • Conference paper (peer-reviewed). Abstract:
    • Increasingly sophisticated, complex, and energy-efficient cyber-physical systems and wireless sensor networks are emerging, facilitated by recent advances in computing and sensor technologies. Integration of cyber-physical systems and wireless sensor networks with other contemporary technologies, such as unmanned aerial vehicles and fog or edge computing, enables the creation of completely new smart solutions. We present the concept of a Smart Mobile Access Point (SMAP), which is a key building block for a smart network, and propose an efficient placement approach for such SMAPs. SMAPs predict the behavior of the network, based on information collected from the network, and select the best approach to support the network at any given time. When needed, they autonomously change their positions to obtain a better configuration from the network performance perspective. Therefore, placement of SMAPs is an important issue in such a system. Initial placement of SMAPs is an NP problem, and evolutionary algorithms provide an efficient means to solve it. Specifically, we present a parallel implementation of the imperialistic competitive algorithm and an efficient evaluation or fitness function to solve the initial placement of SMAPs in the fog computing context.
  •  
50.
  • Mohamed, Sherif A. S., et al. (author)
  • Monocular visual odometry based on hybrid parameterization
  • 2020
  • In: Proceedings of SPIE - The International Society for Optical Engineering. - : SPIE.
  • Conference paper (peer-reviewed). Abstract:
    • Visual odometry (VO) is one of the most challenging techniques in computer vision for autonomous vehicles/vessels. In VO, the camera pose, which also represents the robot pose in ego-motion, is estimated by analyzing the features and pixels extracted from the camera images. Different VO techniques mainly provide different trade-offs among the resources that are being considered for odometry, such as camera resolution, computation/communication capacity, power/energy consumption, and accuracy. In this paper, a hybrid technique is proposed for camera pose estimation by combining odometry based on triangulation using the long-term period of direct-based odometry and the short-term period of inverse depth mapping. Experimental results based on the EuRoC data set show that the proposed technique significantly outperforms the traditional direct-based pose estimation method for Micro Aerial Vehicles (MAVs), keeping its potential negative effect on performance negligible.
  •  
Type of publication
conference paper (50)
journal article (20)
doctoral thesis (4)
artistic work (3)
licentiate thesis (2)
Type of content
peer-reviewed (66)
other academic/artistic (8)
popular science, debate, etc. (2)
Author/editor
Plosila, Juha (72)
Tenhunen, Hannu (57)
Hemani, Ahmed (20)
Daneshtalab, Masoud (20)
Liljeberg, Pasi (17)
Jafri, Syed Mohammad ... (15)
Paul, Kolin (14)
Ebrahimi, Masoumeh (12)
Rahmani, Amir-Mohamm ... (12)
Kelati, Amleset (9)
Majd, Amin (6)
Jafri, Syed M. A. H. (5)
Ellervee, Peeter (4)
Plosila, Juha, Profe ... (4)
Dytckov, Sergei (4)
Guang, Liang (4)
Farahini, Nasim (3)
Tajammul, Muhammad A ... (3)
Mubeen, Saad (2)
Anwar, Hassan (2)
Yang, Bo (2)
Bruhn, Fredrik (2)
Tsog, Nandinbaatar (2)
Carlsson, Jonas, 197 ... (2)
Farahnakian, Fahimeh (2)
Abbas, N (1)
Oelmann, Bengt (1)
Behnam, Moris (1)
Seoane, Fernando, 19 ... (1)
Behnam, Moris, 1973- (1)
Sjödin, Mikael (1)
Öberg, Johnny (1)
Zheng, Li-Rong (1)
Iqbal, J. (1)
Jantsch, Axel (1)
Westerlund, Tomi (1)
Mvungi, Nerey (1)
Sergei, Dytckov (1)
Moghaddami Khalilzad ... (1)
Troubitsyna, Elena (1)
Palesi, Maurizio (1)
Kondoro, Aron (1)
Ben Dhaou, Imed (1)
Gia, T. N. (1)
Kakakhel, Syed Ramee ... (1)
Nolin, Mikael, 1971- (1)
Mohammadi, Siamak (1)
Chang, Xin (1)
Flich, Jose (1)
Jafri, Syed (1)
University
Kungliga Tekniska Högskolan (71)
Mälardalens universitet (4)
Linköpings universitet (2)
Language
English (74)
Swedish (2)
Research subject (UKÄ/SCB)
Engineering and Technology (53)
Natural Sciences (17)
Medical and Health Sciences (1)