SwePub

Result list for search "WFRF:(Ebrahimi Masoumeh)"

  • Results 1-50 of 77
1.
  • Abdollahi, Meisam, et al. (author)
  • RAP-NoC : Reliability Assessment of Photonic Network-on-Chips, A simulator
  • 2021
  • In: Proceedings of the 8th ACM international conference on nanoscale computing and communication (ACM NANOCOM 2021). - New York, NY, USA : Association for Computing Machinery (ACM).
  • Conference paper (peer-reviewed)
    • Nowadays, optical network-on-chip is accepted as a promising alternative to traditional electrical interconnects due to lower transmission delay and power consumption as well as considerably high data bandwidth. However, silicon photonics struggles with some particular challenges that threaten the reliability of the data transmission process. The most important challenges are temperature fluctuation, process variation, aging, crosstalk noise, and insertion loss. Although several attempts have been made to investigate the effect of these issues on the reliability of optical network-on-chip, none of them modeled the reliability of photonic network-on-chip in a system-level approach based on basic element failure rate. In this paper, an analytical model-based simulator, called Reliability Assessment of Photonic Network-on-Chips (RAP-NoC), is proposed to evaluate the reliability of different 2D optical network-on-chip architectures and data traffic. The experimental results show that, in general, Mesh topology is more reliable than Torus considering the same size. Increasing the reliability of Microring Resonator (MR) has a more significant impact on the reliability of an optical router than on that of a network.
2.
  • Alizadeh, Razieh, et al. (author)
  • Fault-Tolerant Circular Routing Algorithm for 3D-NoC
  • 2014
  • In: 2014 INTERNATIONAL CONGRESS ON TECHNOLOGY, COMMUNICATION AND KNOWLEDGE (ICTCK). - : IEEE. - 9781479980215
  • Conference paper (peer-reviewed)
    • Expanding Networks-on-Chip (NoCs) to the third dimension (3D-NoC) has been known as a promising solution for the latency challenges of future many-core Systems-on-Chip. 3D-NoC may take advantage of TSVs for vertical links, which are shorter and faster than horizontal ones. Faults may occur in TSVs as well as in the horizontal links, though faults in TSVs are more costly. In this paper, we present a fault-tolerant routing algorithm targeting faults in both TSVs and horizontal links. The proposed routing algorithm is based on defining some circular routing paths, which offer a deadlock-free routing for packets in mesh-based topologies. In addition to tolerating faults, these circular paths help in reducing congestion in the central part of the network at high injection rates. The proposed circular routing algorithm is able to tolerate all one-faulty links. In addition, it is shown that its performance is better than that of traditional methods.
3.
  • Baharloo, Mohammad, et al. (author)
  • Traffic-aware performance optimization in Real-time wireless network on chip
  • 2020
  • In: Nano Communication Networks. - : Elsevier BV. - 1878-7789 .- 1878-7797. ; 26
  • Journal article (peer-reviewed)
    • Network on Chip (NoC) is a prevailing communication platform for multi-core embedded systems. Wireless network on chip (WNoC) employs wired and wireless technologies simultaneously to improve the performance and power-efficiency of traditional NoCs. In this paper, we propose a deterministic and scalable arbitration mechanism for the medium access control in the wireless plane and present its analytical worst-case delay model in a certain use-case scenario that considers both Real-time (RT) and Non Real-time (NRT) flows with different packet sizes. Furthermore, we design an optimization model to jointly consider the worst-case and the average-case performance parameters of the system. The optimization technique determines how NRT flows are allowed to use the wireless plane in a way that all RT flows meet their deadlines, and the average-case delay of the WNoC is minimized. Results show that our proposed approach decreases the average latency of network flows by up to 17.9% and 11.5% in 5 × 5 and 6 × 6 mesh sizes, respectively.
4.
  • Ben Dhaou, Imed, et al. (author)
  • Edge Devices for Internet of Medical Things : Technologies, Techniques, and Implementation
  • 2021
  • In: Electronics. - : MDPI AG. - 2079-9292. ; 10:17
  • Research review (peer-reviewed)
    • The health sector is currently experiencing a significant paradigm shift. The growing number of elderly people in several countries along with the need to reduce the healthcare cost result in a big need for intelligent devices that can monitor and diagnose the well-being of individuals in their daily life and provide necessary alarms. In this context, wearable computing technologies are gaining importance as edge devices for the Internet of Medical Things. Their enabling technologies are mainly related to biological sensors, computation in low-power processors, and communication technologies. Recently, energy harvesting techniques and circuits have been proposed to extend the operating time of wearable devices and to improve usability aspects. This survey paper aims at providing an overview of technologies, techniques, and algorithms for wearable devices in the context of the Internet of Medical Things. It also surveys the various transformation techniques used to implement those algorithms using fog computing and IoT devices.
5.
  • Bitalebi, Hossein, et al. (author)
  • LATOA : Load-Aware Task Offloading and Adoption in GPU
  • 2023
  • In: Proceedings of the 15th Workshop on General Purpose Processing Using GPU, GPGPU 2023. - : Association for Computing Machinery (ACM). ; , pp. 7-13
  • Conference paper (peer-reviewed)
    • The emerging new applications, such as data mining and graph analysis, demand extra processing power at the hardware level. Conventional static task scheduling is no longer able to meet the requirements of such complicated applications. This inefficiency is a major concern when the application is supposed to run on a Graphics Processing Unit (GPU), where millions of instructions should be distributed among a limited number of processing cores. A non-optimal scheduling strategy leads to unfair load distribution among the GPU’s processing cores. Consequently, while busy cores are stalled due to the lack of resources, waiting for their data from the main memory, other cores are idle, waiting for busy cores to complete their tasks. Our study introduces LATOA, a Load-Aware Task Offloading and Adoption method that tackles this problem by reducing both stall and idle cycles. LATOA is the first study moving from static to dynamic task scheduling based on run-time information obtained from the Miss Status Holding Register (MSHR) tables. In LATOA, all processing cores are dynamically tagged with critical, neutral, or relaxed states. Then, irregular warps with low locality properties are detected and offloaded from critical cores (going to the stall state) to relaxed ones (going to the idle state). Based on our experiments, LATOA reduces the number of stall cycles on average by 24% and increases the neutral states on average by 38%. In addition, with negligible hardware overhead, LATOA improves system performance and power efficiency on average by 26% and 7%, respectively.
6.
  • Bitalebi, Hossein, et al. (author)
  • Near LLC versus near main memory processing
  • 2022
  • In: ACM Int. Conf. Proc. Ser.. - New York, NY, USA : Association for Computing Machinery (ACM). ; , pp. 1-6
  • Conference paper (peer-reviewed)
    • Emerging advanced applications, such as deep learning and graph processing, with enormous processing demand and massive memory requests call for a comprehensive processing system or advanced solutions to address these requirements. Near data processing is one of the promising structures targeting this goal. However, most recent studies have focused on processing instructions near the main memory data banks while ignoring the benefits of processing instructions near other memory hierarchy levels such as LLC. In this study, we investigate the near LLC processing structures, and compare it to the near main memory processing alternative, specifically in graphics processing units. We analyze these two structures on various applications in terms of performance and power. Results show a clear benefit of near LLC processing over near main memory processing in a class of applications. Further, we suggest an architecture, which could benefit from both near main memory and near LLC processing structures, but requiring the applications to be characterized in advance or at run time.
7.
  • Charif, Amir, et al. (author)
  • First-Last: A Cost-Effective Adaptive Routing Solution for TSV-Based Three-Dimensional Networks-on-Chip
  • 2018
  • In: IEEE Transactions on Computers. - 0018-9340 .- 1557-9956. ; 67:10, pp. 1430-1444
  • Journal article (peer-reviewed)
    • 3D integration opens up new opportunities for future multiprocessor chips by enabling fast and highly scalable 3D Network-on-Chip (NoC) topologies. However, in an aim to reduce the cost of Through-silicon via (TSV), partially vertically connected NoCs, in which only a few vertical TSV links are available, have been gaining relevance. To reliably route packets under such conditions, we introduce a lightweight, efficient and highly resilient adaptive routing algorithm targeting partially vertically connected 3D-NoCs named First-Last. It requires a very low number of virtual channels (VCs) to achieve deadlock-freedom (2 VCs in the East and North directions and 1 VC in all other directions), and guarantees packet delivery as long as one healthy TSV connecting all layers is available anywhere in the network. An improved version of our algorithm, named Enhanced-First-Last, is also introduced and shown to dramatically improve performance under low TSV availability while still using fewer virtual channels than state-of-the-art algorithms. A comprehensive evaluation of the cost and performance of our algorithms is performed to demonstrate their merits with respect to existing solutions.
8.
  • Chen, Kun-Chih, et al. (author)
  • A Lego-Based Neural Network Design Methodology With Flexible NoC
  • 2021
  • In: IEEE Journal on Emerging and Selected Topics in Circuits and Systems. - : Institute of Electrical and Electronics Engineers (IEEE). - 2156-3357 .- 2156-3365. ; 11:4, pp. 711-724
  • Journal article (peer-reviewed)
    • Deep Neural Networks (DNNs) have shown superiority in solving the problems of classification and recognition in recent years. However, DNN hardware implementation is challenging due to the high computational complexity and diverse dataflow in different DNN models. To mitigate this design challenge, a large body of research has focused on accelerating specific DNN models or layers and proposed dedicated designs. However, dedicated designs for specific DNN models or layers limit the design flexibility. In this work, we take advantage of the similarity among different DNN models and propose a novel Lego-based Deep Neural Network on a Chip (DNNoC) design methodology. We work on common neural computing units (e.g., multiply-accumulation and pooling) and create some neuron computing units called NeuLego processing elements (NeuLegoPEs). These NeuLegoPEs are then interconnected using a flexible Network-on-Chip (NoC), allowing different DNN models to be constructed. To support large-scale DNN models, we enhance the reusability of each NeuLegoPE by proposing a Lego placement method. The proposed design methodology allows leveraging different DNN model implementations, helping to reduce implementation cost and time-to-market. Compared with the conventional approaches, the proposed approach can improve the average throughput by 2,802% for given DNN models. Besides, the corresponding hardware is implemented to validate the proposed design methodology, showing on average 12,523% hardware efficiency improvement by considering the throughput and area overhead simultaneously.
9.
  • Chen, Kun-Chih, et al. (author)
  • Guest Editorial : Communication-Aware Designs and Methodologies for Reliable and Adaptable On-Chip AI SubSystems and Accelerators
  • 2020
  • In: IEEE Journal on Emerging and Selected Topics in Circuits and Systems. - : IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. - 2156-3357 .- 2156-3365. ; 10:3, pp. 265-267
  • Journal article (other academic/artistic)
    • This Special Issue of the IEEE Journal on Emerging and Selected Topics in Circuits and Systems (JETCAS) is dedicated to investigating the latest research on communication-aware AI subsystems and accelerators. Because of the complex communication, extensive computations, and massive storage requirements, the demand for communication-aware AI designs has increased in recent years.
10.
  • Chen, Kun-Chih (Jimmy), et al. (author)
  • A NoC-based simulator for design and evaluation of deep neural networks
  • 2020
  • In: Microprocessors and microsystems. - : ELSEVIER. - 0141-9331 .- 1872-9436. ; 77
  • Journal article (peer-reviewed)
    • The astonishing development in the field of artificial neural networks (ANN) has brought significant advancement in many application domains, such as pattern recognition, image classification, and computer vision. ANN imitates neuron behaviors and makes a decision or prediction by learning patterns and features from the given data set. To reach higher accuracies, neural networks are getting deeper, and consequently, the computation and storage demands on hardware platforms are steadily increasing. In addition, the massive data communication among neurons makes the interconnection more complex and challenging. To overcome these challenges, ASIC-based DNN accelerators are being designed which usually incorporate customized processing elements, fixed interconnection, and large off-chip memory storage. As a result, DNN computation involves large memory accesses due to frequent loading and off-loading of data, which significantly increases the energy consumption and latency. Also, the rigid architecture and interconnection among processing elements limit the efficiency of the platform to specific applications. In recent years, Network-on-Chip-based (NoC-based) DNN has become an emerging design paradigm because the NoC interconnection can help to reduce the off-chip memory accesses while offering better scalability and flexibility. To evaluate the NoC-based DNN in the early design stage, we introduce a cycle-accurate NoC-based DNN simulator, called DNNoC-sim. To support various operations such as convolution and pooling in the modern DNN models, we first propose a DNN flattening technique to convert diverse DNN operations into MAC-like operations. In addition, we propose a DNN slicing method to evaluate the large-scale DNN models on a resource-constrained NoC platform. The evaluation results show a significant reduction in the off-chip memory accesses compared to the state-of-the-art DNN model. We also analyze the performance and discuss the trade-off between different design parameters.
11.
  • Chen, Kun-Chih (Jimmy), et al. (author)
  • NoC-based DNN Accelerator: A Future Design Paradigm
  • 2019
  • In: Proceedings of the 13th IEEE/ACM International Symposium on Networks-on-Chip, NOCS 2019. - New York, NY, USA : Association for Computing Machinery (ACM). - 9781450367004
  • Conference paper (peer-reviewed)
    • Deep Neural Networks (DNN) have shown significant advantages in many domains such as pattern recognition, prediction, and control optimization. The edge computing demand in the Internet-of-Things era has motivated many kinds of computing platforms to accelerate the DNN operations. The most common platforms are CPU, GPU, ASIC, and FPGA. However, these platforms suffer from low performance (i.e., CPU and GPU), large power consumption (i.e., CPU, GPU, ASIC, and FPGA), or low computational flexibility at runtime (i.e., FPGA and ASIC). In this paper, we suggest the NoC-based DNN platform as a new accelerator design paradigm. The NoC-based designs can reduce the off-chip memory accesses through a flexible interconnect that facilitates data exchange between processing elements on the chip. We first comprehensively investigate conventional platforms and methodologies used in DNN computing. Then we study and analyze different design parameters to implement the NoC-based DNN accelerator. The presented accelerator is based on mesh topology, neuron clustering, random mapping, and XY-routing. The experimental results on LeNet, MobileNet, and VGG-16 models show the benefits of the NoC-based DNN accelerator in reducing off-chip memory accesses and improving runtime computational flexibility.
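The XY-routing named in the abstract above is the standard dimension-order scheme for mesh networks: a packet first travels along the X dimension until its column matches the destination, and only then along Y. A minimal sketch follows; the function name `xy_route` and the `(x, y)` coordinate convention are assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Tuple

Node = Tuple[int, int]  # (x, y) router coordinates in a 2D mesh (assumed convention)


def xy_route(src: Node, dst: Node) -> List[Node]:
    """Return the routers visited from src to dst under XY (dimension-order) routing."""
    x, y = src
    path = [src]
    # Phase 1: correct the X coordinate first.
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # Phase 2: only then correct the Y coordinate.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path
```

Because every source/destination pair always yields the same path, this routing is deterministic and its hop count equals the Manhattan distance between the two routers.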
12.
  • Chen, K. -CJ., et al. (author)
  • Routing algorithm design for power- and temperature-aware NoCs
  • 2022
  • In: Advances in Computers. - : Elsevier BV. ; , pp. 117-150
  • Conference paper (peer-reviewed)
    • The Network-on-Chip (NoC) interconnection is a popular way to build up contemporary large-scale multi-processor System-on-Chip (MPSoC) systems. However, due to the high integration density with high operation frequency, the larger power density leads to serious temperature problems. The thermal issue limits the performance and results in higher leakage power and lower system reliability. The thermal and power issues worsen in the modern 3D stacking NoC structure and become the primary design challenge. In this chapter, we first investigate the correlation between power and temperature in NoC systems and introduce a thermal model for such systems. With this thermal model, we introduce novel routing design methodologies for power- and temperature-aware NoCs by using game theory and reinforcement learning.
13.
  • Daneshtalab, Masoud, et al. (author)
  • In-order delivery approach for 2D and 3D NoCs
  • 2015
  • In: Journal of Supercomputing. - : Springer Science and Business Media LLC. - 0920-8542 .- 1573-0484. ; 71:8, pp. 2877-2899
  • Journal article (peer-reviewed)
    • In many applications, it is critical to guarantee the in-order delivery of requests from the master cores to the slave cores, so that the requests can be executed in the correct order without requiring buffers. Since in NoCs packets may use different paths and on the other hand traffic congestion varies on different routes, the in-order delivery constraint cannot be met without support. To guarantee the in-order delivery, traditional approaches either use dimension-order routing or employ reordering buffers at network interfaces. Dimension-order routing degrades the performance considerably while the usage of reordering buffers imposes large area overhead. In this paper, we present a mechanism allowing packets to be routed through multiple paths in the network, helping to balance the traffic load while guaranteeing the in-order delivery. The proposed method combines the advantages of both deterministic and adaptive routing algorithms. The simple idea is to use different deterministic algorithms for independent flows. This approach neither requires reordering buffers nor limits packets to use a single path. The algorithm is simple and practical with negligible area overhead over dimension-order routing. The concept is investigated in both 2D and 3D mesh networks.
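The idea stated in the abstract above, using different deterministic algorithms for independent flows, can be illustrated with a small sketch. Everything below is an assumption made for illustration: the pool of dimension orders in `ORDERS`, the parity-style rule that picks one per flow, and the function names are hypothetical, and the sketch does not address how the paper keeps the combined algorithms free of cyclic dependencies.

```python
from typing import List, Sequence, Tuple

Node = Tuple[int, int, int]  # (x, y, z); use z = 0 for a 2D mesh

# Hypothetical pool of deterministic algorithms: each one visits the mesh
# dimensions in a fixed order (0 = X, 1 = Y, 2 = Z).
ORDERS: List[Tuple[int, ...]] = [(0, 1, 2), (2, 1, 0)]


def dimension_order_path(src: Node, dst: Node, order: Sequence[int]) -> List[Node]:
    """Walk from src to dst, finishing one dimension before starting the next."""
    pos = list(src)
    path = [tuple(pos)]
    for axis in order:
        while pos[axis] != dst[axis]:
            pos[axis] += 1 if dst[axis] > pos[axis] else -1
            path.append(tuple(pos))
    return path


def route_flow(src: Node, dst: Node) -> List[Node]:
    """Pick one deterministic algorithm per flow (assumed selection rule)."""
    order = ORDERS[(sum(src) + sum(dst)) % len(ORDERS)]
    return dimension_order_path(src, dst, order)
```

All packets of a given source/destination flow follow one fixed path, so they arrive in order without reordering buffers, while different flows may be spread over different paths.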
14.
  • Dytckov, Sergei, et al. (author)
  • Efficient STDP Micro-Architecture for Silicon Spiking Neural Networks
  • 2014
  • In: 2014 17TH EUROMICRO CONFERENCE ON DIGITAL SYSTEM DESIGN (DSD). - 9781479957934 ; , pp. 496-503
  • Conference paper (peer-reviewed)
    • Spiking neural networks (SNNs) are the closest approach to biological neurons in comparison with conventional artificial neural networks (ANN). SNNs are composed of neurons and synapses which are interconnected with a complex pattern. As communication in such massively parallel computational systems is getting critical, the network-on-chip (NoC) becomes a promising solution to provide a scalable and robust interconnection fabric. However, using NoC for large-scale SNNs gives rise to a trade-off between scalability, throughput, neuron/router ratio (cluster size), and area overhead. In this paper, we tackle the trade-off using a clustering approach and try to optimize the synaptic resource utilization. An optimal cluster size can provide the lowest area overhead and power consumption. For learning purposes, a phenomenon known as spike-timing-dependent plasticity (STDP) is utilized. The micro-architectures of the network, clusters, and the computational neurons are also described. The presented approach suggests a promising solution of integrating NoCs and STDP-based SNNs for the optimal performance based on the underlying application.
15.
  • Ebrahimi, Masoumeh, et al. (author)
  • A General Methodology on Designing Acyclic Channel Dependency Graphs in Interconnection Networks
  • 2018
  • In: IEEE Micro. - : IEEE Computer Society. - 0272-1732 .- 1937-4143. ; 38:3, pp. 79-85
  • Journal article (peer-reviewed)
    • For the past three decades, the interconnection network has been developed based on two major theories, one by Dally and the other by Duato. In this article, we introduce EbDa with a simplified theoretical basis, which directly allows for designing an acyclic channel dependency graph and verifying algorithms on their freedom from deadlock. EbDa is composed of three theorems that enable extracting all allowable turns without dealing with turn models.
16.
  • Ebrahimi, Masoumeh, et al. (author)
  • A Light-weight fault-tolerant routing algorithm tolerating faulty links and routers
  • 2013
  • In: Computing. - : Springer Science and Business Media LLC. - 0010-485X .- 1436-5057. ; 97, pp. 631-648
  • Journal article (peer-reviewed)
    • Faults at either the link or router level may result in the failure of the system. Fault-tolerant routing algorithms attempt to tolerate faults by rerouting packets around the faulty region. This rerouting would be at the cost of significant performance loss. The proposed algorithm in this paper is able to tolerate both faulty routers and links with negligible impact on the performance. In fact, the proposed algorithm avoids taking unnecessary longer paths and the shortest paths are always taken as long as a path exists. On the other hand, fault-tolerant routing algorithms might be based on deterministic routing in which all packets use a single path between each pair of source and destination routers. Using deterministic routing, packets reach the destination in the same order they have been delivered from the source so that no reordering buffer is needed at the destination. For improving the performance, fault-tolerant algorithms might be based on adaptive routing in which packets are delivered through multiple paths to destinations. In this case, packets should be reordered at the destinations demanding reordering buffers. The proposed algorithm can be configured in both working modes, such that it can be based on deterministic or adaptive routing.
17.
  • Ebrahimi, Masoumeh, et al. (author)
  • Creation of CERID : Challenge, Education, Research, Innovation, and Deployment in the context of smart MicroGrid
  • 2019
  • In: IST-Africa 2019 Conference Proceedings. - 9781905824632
  • Conference paper (peer-reviewed)
    • The iGrid project deals with the design and implementation of a solar-powered smart microgrid to supply electric power to small rural communities. In this paper, we discuss the roadmap of the iGrid project, which is formed by merging the roadmaps of KIC (Knowledge and Innovation Community) and CDE (Challenge-Driven Education). We introduce and explain a five-gear chain of Challenge, Education, Research, Innovation, and Deployment, called CERID, to reach the main goals of this project. We investigate the full chain in the iGrid project, which is established between KTH Royal Institute of Technology (Sweden) and University of Dar es Salaam (Tanzania). We introduce the key stakeholders and explain how CERID goals can be accomplished in higher education and through scientific research. Challenges are discussed, some innovative ideas are introduced and deployment solutions are recommended.
18.
  • Ebrahimi, Masoumeh, et al. (author)
  • EbDa: A New Theory on Design and Verification of Deadlock-free Interconnection Networks
  • 2017
  • In: Proceedings of ISCA ’17. - New York, NY, USA : ACM Press. ; , pp. 1-13, pp. 703-715
  • Conference paper (peer-reviewed)
    • Freedom from deadlock is one of the most important issues when designing routing algorithms in on-chip/off-chip networks. Many works have been developed upon Dally’s theory proving that a network is deadlock-free if there is no cyclic dependency on the channel dependency graph. However, finding such an acyclic graph has been very challenging, which limits Dally’s theory to networks with a low number of channels. In this paper, we introduce three theorems that directly lead to routing algorithms with an acyclic channel dependency graph. We also propose the partitioning methodology, enabling a design to reach the maximum adaptiveness for the n-dimensional mesh and k-ary n-cube topologies with any given number of channels. In addition, deadlock-free routing algorithms can be derived ranging from maximally fully adaptive routing down to deterministic routing. The proposed theorems can drastically remove the difficulties of designing deadlock-free routing algorithms.
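The criterion attributed to Dally in the abstract above (no cyclic dependency in the channel dependency graph) can be checked mechanically on a small mesh. The sketch below is not the EbDa method; it only illustrates the acyclicity check, under the assumptions that a channel is modelled as a `(router, direction)` pair, that a routing algorithm is described by which turns it permits, and that 180-degree turns are never taken. All function names are hypothetical.

```python
from itertools import product

DIRS = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}
OPPOSITE = {"E": "W", "W": "E", "N": "S", "S": "N"}


def build_cdg(width, height, turn_allowed):
    """Map each channel (router, direction) to the channels it may wait on."""
    def in_mesh(x, y):
        return 0 <= x < width and 0 <= y < height

    cdg = {}
    for x, y, d in product(range(width), range(height), DIRS):
        nx, ny = x + DIRS[d][0], y + DIRS[d][1]
        if not in_mesh(nx, ny):
            continue  # no physical link leaves the mesh here
        successors = []
        for d2, (dx2, dy2) in DIRS.items():
            if d2 == OPPOSITE[d] or not turn_allowed(d, d2):
                continue  # 180-degree turns and forbidden turns add no edge
            if in_mesh(nx + dx2, ny + dy2):
                successors.append(((nx, ny), d2))
        cdg[((x, y), d)] = successors
    return cdg


def is_acyclic(cdg):
    """Kahn's algorithm: the graph is acyclic iff every node can be removed."""
    indegree = {c: 0 for c in cdg}
    for succs in cdg.values():
        for s in succs:
            indegree[s] += 1
    ready = [c for c, deg in indegree.items() if deg == 0]
    removed = 0
    while ready:
        c = ready.pop()
        removed += 1
        for s in cdg[c]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    return removed == len(cdg)


def xy_turns(d_in, d_out):
    # XY routing: go straight, or turn from the X dimension into Y, never back.
    return d_in == d_out or (d_in in "EW" and d_out in "NS")


def any_turn(d_in, d_out):
    # Unrestricted turns on a single channel per link.
    return True
```

With these definitions, `is_acyclic(build_cdg(4, 4, xy_turns))` holds, while allowing every turn on a single channel per link produces cycles such as east, north, west, south around any 2 × 2 block of routers.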
19.
  • Ebrahimi, Masoumeh, et al. (author)
  • Fault-tolerant routing algorithm for 3D NoC using hamiltonian path strategy
  • 2013
  • In: Design, Automation & Test in Europe Conference & Exhibition (DATE), 2013. ; , pp. 1601-1604
  • Conference paper (peer-reviewed)
    • While Networks-on-Chip (NoC) have been increasing in popularity with industry and academia, they are threatened by the decreasing reliability of aggressively scaled transistors. In this paper, we address the problem of faulty elements by means of routing algorithms. Commonly, fault-tolerant algorithms are complex due to supporting different fault models while preventing deadlock. When moving from a 2D to a 3D network, the complexity increases significantly due to the possibility of creating cycles within and between layers. In this paper, we take advantage of the Hamiltonian path to tolerate faults in the network. The presented approach is not only very simple but also able to support almost all one-faulty unidirectional links in 2D and 3D NoCs.
20.
  • Ebrahimi, Masoumeh (author)
  • Fully adaptive routing algorithms and region-based approaches for two-dimensional and three-dimensional networks-on-chip
  • 2013
  • In: IET Computers & Digital Techniques. - : Institution of Engineering and Technology (IET). - 1751-8601 .- 1751-861X. ; :6, pp. 264-273
  • Journal article (peer-reviewed)
    • Network congestion has negative impact on the performance of networks-on-chip (NoC). In traditional congestion-aware techniques, congestion is measured at a router level and delivered to other routers, either local or non-local. One of the contributions of this study is to show that performance can be improved if the congestion level is measured for a group of routers, called cluster, and propagated over the network, rather than considering the congestion level of a single router. The presented approach is discussed in both two-dimensional (2D) and three-dimensional (3D) mesh networks. To collect and propagate the congestion information of different clusters, a distributed approach is presented. The gathered information is utilised at routing units to deliver packets through the less congested regions. To distribute packets over the network without forming deadlock, routing algorithms should be carefully designed. The authors take advantage of fully adaptive routing algorithms, providing the maximum degree of adaptiveness for distributing packets. For 2D NoCs, a conventional fully adaptive routing algorithm, named dynamic XY (DyXY), is utilised. However, for 3D NoCs a fully adaptive routing algorithm is proposed and this method is called 3D-FAR. On top of each fully adaptive routing algorithm, a region-based approach is developed.
21.
  • Ebrahimi, Masoumeh, et al. (author)
  • In-Order Delivery Approach for 3D NoCs
  • 2013
  • In: 2013 17TH CSI INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND DIGITAL SYSTEMS (CADS 2013). - : IEEE. - 9781479905621 ; , pp. 87-
  • Conference paper (peer-reviewed)
    • Routing algorithms can be classified into deterministic and adaptive methods. In deterministic methods, a single path is selected for each pair of source and destination nodes, and thus they are unable to distribute the traffic load over the network. Using deterministic routing, packets reach a destination in the same order they are delivered from a source node. Adaptive routing algorithms can greatly improve the performance by distributing packets over different routes. However, it requires a mechanism to reorder packets at destinations. Thereby, a large reordering buffer and a complex control mechanism are required at each node. This motivated us to propose a method guaranteeing in-order delivery while sending packets through alternative paths. The proposed method combines the advantages of both deterministic and adaptive routing algorithms. We introduce several routing algorithms working together in the network without creating cycles. By using these algorithms, packets of different flows use different routes while packets belonging to the same flow follow a single path. In this way, traffic is distributed over the network while addressing in-order delivery. We employ this approach on three-dimensional Networks-on-Chip.
22.
  • Ebrahimi, Masoumeh, et al. (author)
  • NoCArc 2018 Message from the Chairs
  • 2018
  • In: 2018 11th International Workshop on Network on Chip Architectures, NoCArc 2018. - : Institute of Electrical and Electronics Engineers Inc..
  • Conference paper (peer-reviewed)
23.
  • Ebrahimi, Masoumeh, et al. (author)
  • NoD : Network-on-Die as a Standalone NoC for Heterogeneous Many-core Systems in 2.5D ICs
  • 2017
  • In: 2017 19TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND DIGITAL SYSTEMS (CADS). - : IEEE. - 9781538643792 ; , pp. 28-33
  • Conference paper (peer-reviewed)
    • Due to a high cost of 3D IC process technology, the semiconductor industry is targeting 2.5D ICs with interposer as a fast and low-cost alternative to integrate dissimilar technologies. In this paper, we propose an independent network-on-chip die, called Network-on-Die (NoD), for 2.5D ICs that operates as a communication backbone for heterogeneous many-core systems on interposer. NoD is responsible for routing packets from a source router to a destination router, and the connections between routers and cores pass through the interposer. This technique eliminates the complexity of the routing algorithms in heterogeneous systems by turning the irregular form of NoC in 2.5D ICs into a regular/optimized one in NoD. The performance evaluation is verified through RTL simulations for a heterogeneous many-core system of varying die sizes and with asymmetric shapes. We provide the theoretical justification for our simulation results.
  •  
24.
  • Ebrahimi, Masoumeh, et al. (författare)
  • Partitioning methods for unicast/multicast traffic in 3D NoC architecture
  • 2010
  • Ingår i: Proceedings of the 13th IEEE Symposium on Design and Diagnostics of Electronic Circuits and Systems, DDECS 2010. - : Institute of Electrical and Electronics Engineers (IEEE). ; , s. 127-132
  • Konferensbidrag (refereegranskat)abstract
    • As the scale of integration grows, the interconnection problem becomes one of the major design considerations of Multi Processor System on Chip (MPSoC). In recent years, many researchers have conducted studies on 3D IC designs stacking multiple layers on top of each other. In order to decrease the transmission delay of unicast/multicast messages in a network based multicore system, the network is divided into several partitions. In this paper, we first introduce a novel idea of balanced partitioning that allows the network to be partitioned effectively. Then, we propose a set of partitioning approaches each with a different level of efficiency. In addition, we present an advantageous method based on the idea of balanced partitioning to provide a high degree of parallelism with a considerable reduction of packet delay in unicast/multicast traffic. Simulations are provided to evaluate and compare the performance of proposed methods.
  •  
25.
  • Ebrahimi, Masoumeh, et al. (författare)
  • Path-Based Partitioning Methods for 3D Networks-on-Chip with Minimal Adaptive Routing
  • 2014
  • Ingår i: IEEE Transactions on Computers. - 0018-9340 .- 1557-9956. ; 63:3, s. 718-733
  • Tidskriftsartikel (refereegranskat)abstract
    • Combining the benefits of 3D ICs and Networks-on-Chip (NoCs) schemes provides a significant performance gain in Chip Multiprocessors (CMPs) architectures. As multicast communication is commonly used in cache coherence protocols for CMPs and in various parallel applications, the performance of these systems can be significantly improved if multicast operations are supported at the hardware level. In this paper, we present several partitioning methods for the path-based multicast approach in 3D mesh-based NoCs, each with different levels of efficiency. In addition, we develop novel analytical models for unicast and multicast traffic to explore the efficiency of each approach. In order to distribute the unicast and multicast traffic more efficiently over the network, we propose the Minimal and Adaptive Routing (MAR) algorithm for the presented partitioning methods. The analytical and experimental results show that an advantageous method named Recursive Partitioning (RP) outperforms the other approaches. RP recursively partitions the network until all partitions contain a comparable number of switches and thus the multicast traffic is equally distributed among several subsets and the network latency is considerably decreased. The simulation results reveal that the RP method can achieve performance improvement across all workloads while performance can be further improved by utilizing the MAR algorithm. Nineteen percent average and 42 percent maximum latency reduction are obtained on SPLASH-2 and PARSEC benchmarks running on a 64-core CMP.
  •  
26.
  • Ebrahimi, Masoumeh, et al. (författare)
  • Performance Analysis of 3D NoCs Partitioning Methods
  • 2010
  • Ingår i: IEEE Annual Symposium on VLSI, ISVLSI 2010. ; , s. 479-480
  • Konferensbidrag (refereegranskat)abstract
    • 3D IC design improves performance and decreases power consumption by replacing long horizontal interconnects with short vertical ones. Higher performance and lower network latency can be obtained by utilizing an efficient communication protocol in 3D Networks-on-Chip (NoCs). In this work, several unicast/multicast partitioning methods are explained in order to find an advantageous method with low communication latency. Moreover, two factors of efficiency, unicast latency and multicast latency, are analyzed by analytical models. We also perform simulations to compare the efficiency of the proposed methods. The results show that the Mixed Partitioning method outperforms the other methods in terms of latency.
  •  
27.
  • Ebrahimi, Masoumeh, et al. (författare)
  • Rescuing healthy cores against disabled routers
  • 2014
  • Konferensbidrag (refereegranskat)abstract
    • A router may be temporarily or permanently disabled in NoCs for several reasons such as saving power, faults, or testing. Disabling a router, however, may have a severe impact on the performance or functionality of the entire system if it results in disconnecting the core from the network. In this paper, we propose a deadlock-free routing algorithm which allows the core to stay connected to the system and continue its normal operation when its connected router is disabled. Our analysis and experiments show that the proposed technique achieves 100%, 93.60%, and 87.19% network availability with 100% packet delivery when 1, 2, and 3 routers, respectively, are defunct or intentionally disabled. The algorithm provides adaptivity and is lightweight, requiring one and two virtual channels along the X and Y dimensions, respectively.
  •  
28.
  • Farahnakian, Fahimeh, et al. (författare)
  • Adaptive Load Balancing in Learning-based Approaches for Many-core Embedded Systems
  • 2014
  • Ingår i: Journal of Supercomputing. - : Springer Science and Business Media LLC. - 0920-8542 .- 1573-0484. ; 68:3, s. 1214-1234
  • Tidskriftsartikel (refereegranskat)abstract
    • Adaptive routing algorithms improve network performance by distributing traffic over the whole network. However, they require congestion information to facilitate load balancing. To provide local and global congestion information, we propose a learning method based on a dual reinforcement learning approach. This information can be dynamically updated according to the changing traffic condition in the network by propagating data and learning packets. We utilize a congestion detection method which updates the learning rate according to the congestion level. This method calculates the average number of free buffer slots in each switch at specific time intervals and compares it with maximum and minimum values. Based on the comparison result, the learning rate is set to a value between 0 and 1. If a switch gets congested, the learning rate is set to a high value, meaning that the global information is more important than local. In contrast, local is more emphasized than global information in non-congested switches. Results show that the proposed approach achieves a significant performance improvement over the traditional Q-routing, DRQ-routing, DBAR and Dynamic XY algorithms.
  •  
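The congestion-dependent learning rate described in the abstract of entry 28 can be illustrated with a minimal sketch. The abstract does not give the mapping, so the linear interpolation between the thresholds, the function names `learning_rate` and `blend`, and the blending formula are all assumptions made for illustration, not the authors' method.

```python
def learning_rate(avg_free_slots: float, min_free: float, max_free: float) -> float:
    """Map the average number of free buffer slots to a rate in [0, 1].

    Assumption: linear interpolation between the two thresholds.
    Few free slots (congested switch) give a rate near 1,
    many free slots (non-congested switch) give a rate near 0.
    """
    if max_free <= min_free:
        raise ValueError("max_free must be greater than min_free")
    if avg_free_slots <= min_free:
        return 1.0
    if avg_free_slots >= max_free:
        return 0.0
    return (max_free - avg_free_slots) / (max_free - min_free)


def blend(local_estimate: float, global_estimate: float, rate: float) -> float:
    """Hypothetical weighting: a high rate emphasizes global information."""
    return (1.0 - rate) * local_estimate + rate * global_estimate


# Example: thresholds of 2 and 6 free slots, 4 free slots observed on average.
rate = learning_rate(4.0, 2.0, 6.0)
estimate = blend(10.0, 20.0, rate)
```

With these hypothetical thresholds, a switch averaging 4 free slots sits halfway between congested and idle, so local and global estimates are weighted equally.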
29.
  • Farahnakian, Fahimeh, et al. (författare)
  • Bi-LCQ: A Low-weight Clustering-based Q-learning Approach for NoCs
  • 2014
  • Ingår i: Microprocessors and microsystems. - : Elsevier BV. - 0141-9331 .- 1872-9436. ; 38:1, s. 64-75
  • Tidskriftsartikel (refereegranskat)abstract
    • Network congestion has a negative impact on the performance of on-chip networks due to the increased packet latency. Many congestion-aware routing algorithms have been developed to alleviate traffic congestion over the network. In this paper, we propose a congestion-aware routing algorithm based on the Q-learning approach for avoiding congested areas in the network. By using the learning method, local and global congestion information of the network is provided for each switch. This information can be dynamically updated when a switch receives a packet. However, the Q-learning approach suffers from high area overhead in NoCs due to the need for a large routing table in each switch. In order to reduce the area overhead, we also present a clustering approach that decreases the number of routing tables by a factor of 4. Results show that the proposed approach achieves a significant performance improvement over the traditional Q-learning, C-routing, DBAR and Dynamic XY algorithms.
  •  
30.
  • Gharibi, Farid, et al. (författare)
  • Challenges of Implementing an Effective Primary Health Care Accreditation Program : a qualitative study in Iran
  • 2023
  • Ingår i: BMC Primary Care. - : BioMed Central (BMC). - 2731-4553. ; 24:1
  • Tidskriftsartikel (refereegranskat)abstract
    • Background: Accreditation is a prerequisite for scientific management of the health system, owing to its numerous benefits on health centres’ performance. The current study examined Iran’s primary healthcare accreditation program to ascertain the challenges to its successful implementation. Methods: This qualitative study examined the perspectives of 32 managers and staff members in the pilot accreditation program (from the Ministry of Health and Medical Education, Semnan University of Medical Sciences, and Aradan District Health Network). Three in-depth group interviews were conducted using a semi-structured questionnaire, and the data obtained were assessed using thematic analysis. The investigation identified six themes, 29 sub-themes, and 218 codes as challenges to the successful accreditation of primary health care in Iran. Results: Six main themes, including “organisational culture”, “motivational mechanisms”, “staff workload”, “training system”, “information systems”, and “macro-executive infrastructure”, were identified as the main domains of challenges, with seven, five, two, four, three, and eight sub-themes respectively. Conclusion: Accreditation of PHC in Iran faces significant challenges and obstacles that, if ignored, can jeopardise the program’s success and effectiveness. By identifying challenges and obstacles and making practical suggestions for overcoming them, the findings of this study can aid in the program’s successful implementation and achievement of desired outcomes.
  •  
31.
  • Gharibi, Farid, et al. (författare)
  • Evaluating Educational Performance of Postgraduate Students Based on the Tennessee Academic Audit Model
  • 2023
  • Ingår i: Shiraz E Medical Journal. - : Brieflands. - 1735-1391. ; 24:4
  • Tidskriftsartikel (refereegranskat)abstract
    • Background: The importance of improving quality and performance in higher education has led various universities to turn to effective methods of educational evaluation, such as auditing. Objectives: The present study evaluated the academic performance of postgraduate students at the Tabriz Faculty of Management and Medical Informatics, an Iranian Center of Excellence in Health Management, based on the Tennessee Academic Audit Model. Methods: This descriptive cross-sectional study was conducted in 2019 with the participation of educational managers and faculty members of the same faculties in two phases consisting of self-assessment and external evaluation. After contextualization, the Tennessee comprehensive higher education audit checklist was used. Data were studied descriptively, and the results were reported as frequency (percentage) and mean ± standard deviation. Analysis of variance (ANOVA) and Tukey’s post hoc tests were used to evaluate the significance of the difference in academic performance between the educational groups. A t-test was also used to evaluate the difference in performance scores between the self-assessment and external evaluation phases. A P-value < 0.05 was considered significant. Results: The participants’ performance in the self-assessment phase was moderate (total score: 5.32), and their performance in the external evaluation phase was weak (total score: 2.75). The best and the worst self-assessment scores were in the dimensions of “overall assessment” and “follow-up of previous academic audits,” respectively. In the external evaluation, the dimensions of “contributions to the program and university goals” and “follow-up of previous academic audits” had the best and worst performance scores, respectively. Conclusions: The results demonstrated that the Tabriz Faculty of Management and Medical Informatics of the Medical School needs to improve in terms of international standards. Therefore, managers and policymakers are required to implement interventions to address this gap.
  •  
32.
  • Gharibi, Farid, et al. (författare)
  • Quality of Life and Its Relative Factors Among Patients With Multiple Sclerosis : A Cross-sectional Study in Northwest Iran
  • 2023
  • Ingår i: Journal of Research and Health. - : Social Development & Health Promotion Research Center, Gonabad University of Medical Sciences. - 2423-5717. ; 13:4, s. 263-272
  • Tidskriftsartikel (refereegranskat)abstract
    • Background: Multiple sclerosis (MS) is one of the critical diseases due to its adverse clinical, social, and economic consequences for affected people. This study aims to assess the quality of life (QoL) of patients with MS in East Azerbaijan, Iran. Methods: This cross-sectional study was conducted using the multiple sclerosis quality of life-54 (MSQoL-54) questionnaire to interview 300 randomly selected MS patients in East Azerbaijan Province, Iran. The independent t-test, analysis of variance (ANOVA), and Tukey post hoc test were used to examine the relationship between demographic variables and QoL, and all analyses were performed using SPSS software, version 19. Results: The QoL score in MS patients is 48.22±22.48. “Life satisfaction” is the best and “physical role limitation” is the worst QoL aspect. Significant relationships were observed between marital status, education level, employment status, age of symptoms onset, and years of illness with QoL (P<0.05). Conclusion: The QoL of the MS patients in East Azerbaijan Province is lower than in other parts of Iran and much lower than in Organization for Economic Co-operation and Development (OECD) countries.
  •  
33.
  • Hosseini, Maryam S., et al. (författare)
  • Application Characterization for Near Memory Processing
  • 2021
  • Ingår i: 2021 29th Euromicro international conference on parallel, distributed and network-based processing (PDP 2021). - : Institute of Electrical and Electronics Engineers (IEEE). ; , s. 148-152
  • Konferensbidrag (refereegranskat)abstract
    • Data movement between the memory subsystem and the processor unit is a crippling performance and energy bottleneck for data-intensive applications. Near Memory Processing (NMP) is a promising solution to alleviate the data movement bottleneck. The introduction of 3D-stacked memories and, more importantly, hybrid memory systems enables the long-wished-for NMP capability. This work explores the feasibility and efficacy of having NMP on the hybrid memory system for a given set of applications. In this paper, we first redefine a set of NMP-centric performance metrics in order to analyze the efficacy of a given processing unit. Leveraging the proposed metrics, we characterize various sets of applications to assess the suitability of a processing unit in terms of performance. Specifically, in this work we motivate the efficiency of NMP subsystems to process memory-intensive applications when 3D-NVM technologies are employed.
  •  
34.
  • Huang, Letian, et al. (författare)
  • A Lifetime-aware Mapping Algorithm to Extend MTTF of Networks-on-Chip
  • 2018
  • Ingår i: 2018 23rd Asia and South Pacific Design Automation Conference Proceedings (ASP-DAC). - : Institute of Electrical and Electronics Engineers (IEEE). - 9781509006021 ; , s. 147-152
  • Konferensbidrag (refereegranskat)abstract
    • Fast aging of components has become one of the major concerns in Systems-on-Chip with further scaling of the submicron technology. This problem accelerates when combined with improper working conditions such as unbalanced components' utilization. Considering the mapping algorithms in the Networks-on-Chip domain, some routers/links might be frequently selected for mapping while others are underutilized. Consequently, the highly utilized components may age faster than others, which results in disconnecting the related cores from the network. To address this issue, we propose a mapping algorithm, called lifetime-aware neighborhood allocation (LaNA), that takes the aging of components into account when mapping applications. The proposed method is able to balance the wear-out of NoC components, thus extending the service time of the NoC. We model the lifetime as a resource consumed over time and accordingly define the lifetime budget metric. LaNA selects a suitable node for mapping which has the maximum lifetime budget. Experimental results show that the lifetime-aware mapping algorithm could improve the minimal MTTF of the NoC by around 72.2%, 58.3%, 46.6% and 48.2% as compared to NN, CoNA, WeNA and CASqA, respectively.
  •  
35.
  • Huang, L., et al. (författare)
  • ECDR2 : Error Corrector and Detector Relocation Router for Network-on-Chip
  • 2020
  • Ingår i: IEEE Transactions on Computers. - : IEEE Computer Society. - 0018-9340 .- 1557-9956.
  • Tidskriftsartikel (refereegranskat)abstract
    • Network-on-chip (NoC) is commonly used in modern many-core systems due to its high bandwidth and flexibility. As the manufacturing process keeps scaling, the reliability challenge of NoCs becomes more significant. Error correction code (ECC) is widely adopted in error correction NoCs to improve the data correctness. At the same time, extra stages are introduced in the router pipeline to embed the error correction capability. As a result, traditional error correction routers suffer from high network latency. Our research was motivated by this limitation and removes the extra pipeline stages introduced for error correction. In this brief, we propose an error correction router, called error corrector and detector relocation router (ECDR2), that achieves both low latency and high error correction capability. The ECDR2 architecture optimizes the pipeline flow of the router, and further improves the network latency. Experimental results show that, compared with the baseline design, ECDR2 achieves 22.6% and 39.4% less average latency under uniform traffic pattern and Dedup benchmark, respectively, in an 8×8 mesh NoC. The circuit area of ECDR2 is also 7.9% less than that of the baseline design under 45nm technology.
  •  
36.
  • Huang, Letian, et al. (författare)
  • Non-Blocking Testing for Network-on-Chip
  • 2016
  • Ingår i: IEEE Transactions on Computers. - : IEEE. - 0018-9340 .- 1557-9956. ; 65:3, s. 679-692
  • Tidskriftsartikel (refereegranskat)abstract
    • To achieve high reliability in on-chip networks, it is necessary to test the network as frequently as possible to detect physical failures before they lead to system-level failures. A main obstacle is that the circuit under test has to be isolated, resulting in network cuts and packet blockage which limit the testing frequency. To address this issue, we propose a comprehensive network-level approach which could test multiple routers simultaneously at high speed without blocking or dropping packets. We first introduce a reconfigurable router architecture allowing the cores to keep their connections with the network while the routers are under test. A deadlock-free and highly adaptive routing algorithm is proposed to support reconfigurations for testing. In addition, a testing sequence is defined to allow testing multiple routers while avoiding the dropping of packets. A procedure is proposed to control the behavior of the affected packets during the transition of a router from the normal to the testing mode and vice versa. This approach neither interrupts the execution of applications nor has a significant impact on the execution time. Experiments with the PARSEC benchmarks on an 8x8 NoC-based chip multiprocessor show only a 3 percent execution time increase with four routers simultaneously under test.
  •  
37.
  • Huang, L., et al. (författare)
  • Tolerating transient illegal turn faults in NoCs
  • 2016
  • Ingår i: Microprocessors and microsystems. - : Elsevier. - 0141-9331 .- 1872-9436. ; 43:SI, s. 104-115
  • Tidskriftsartikel (refereegranskat)abstract
    • Network-on-Chip (NoC) is becoming a competitive solution to connect hundreds of processing elements in modern computing platforms. Under the trend of shrinking feature sizes, circuits are likely to suffer from faults which lead to degraded performance and erroneous behaviour. Compared to permanent faults, transient faults happen even more frequently and have serious effects, while they are hidden within complex on-chip behaviours. One of the serious consequences of transient faults is that packets take illegal turns after the control logic in on-chip routers is damaged, which may lead to a deadlock situation and eventually crash the entire system. To avoid this situation, in this paper, we propose a comprehensive scheme called ODT including an improved router architecture, an illegal-turn-resilient routing algorithm, online fault-detect units and a fault classification method. By applying ODT, more turns are supported at the routing level and the deadlock situations can be significantly reduced. Experimental results indicate up to a 22% increase in the survived packets in the network when 4% of routing computation units are in failure. The extra area overhead and power consumption of the ODT method are around 9.22% and 9.63%, respectively.
  •  
38.
  • Jiang, Shuyan, et al. (författare)
  • Optimizing Dynamic Mapping Techniques for On-Line NoC Test
  • 2018
  • Ingår i: 2018 23RD ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE (ASP-DAC). - : IEEE. - 9781509006021 ; , s. 227-232
  • Konferensbidrag (refereegranskat)abstract
    • With the aggressive scaling of submicron technology, intermittent faults are becoming one of the limiting factors in achieving high reliability in Network-on-Chip (NoC). Increasing test frequency is necessary to detect intermittent faults, which in turn interrupts the execution of applications. On the other hand, the main goal of traditional mapping algorithms is to allocate applications to the NoC platform, ignoring the test requirement. In this paper, we propose a novel testing-aware mapping algorithm (TAMA) for NoC, targeting intermittent faults on the paths between crossbars. In this approach, the idle links are identified and the components between two crossbars are tested when the application is mapped to the platform. The components can be tested if there is enough time between when the application leaves the platform and when a new application enters it. The mapping algorithm is tuned to give a higher priority to the tested paths in the next application mapping. This leaves enough time to test the links and the belonging components that have not been tested in the expected time. Experiment results show that the proposed testing-aware mapping algorithm leads to a significant improvement over FF, NN, CoNA, and WeNA.
  •  
39.
  • Jiang, Shuyan, et al. (författare)
  • Testing aware dynamic mapping for path-centric network-on-chip test
  • 2019
  • Ingår i: Integration. - : Elsevier. - 0167-9260 .- 1872-7522. ; 67, s. 134-143
  • Tidskriftsartikel (refereegranskat)abstract
    • With the aggressive scaling of submicron technology, intermittent faults are becoming one of the limiting factors in achieving high reliability in Network-on-Chip (NoC). Increasing test frequency is necessary to detect intermittent faults, which in turn interrupts the execution of applications. On the other hand, the primary goal of traditional mapping algorithms is to allocate applications to the NoC platform, ignoring the test requirement. In this paper, we propose a novel testing-aware mapping algorithm (TAMA) for NoC, targeting intermittent faults on the paths between crossbars. In this approach, the idle paths are identified, and the components between two crossbars are tested when the application is mapped to the platform. The components can be tested if there is enough time from the time when the application leaves the platform to the time when a new application enters it. The mapping algorithm is tuned to give a higher priority to the tested paths in the next application mapping, which leaves enough time to test the links and the belonging components that have not been tested in the expected time. Experiment results show that the proposed testing-aware mapping algorithm leads to a significant improvement over FF (First Free), NN (Nearest Neighbor), CoNA (Contiguous Neighborhood Allocation), and WeNA (Weighted-based Neighborhood Allocation).
  •  
40.
  • Karami, Masoomeh, et al. (författare)
  • Hierarchical Fault Simulation of Deep Neural Networks on Multi-Core Systems
  • 2021
  • Ingår i: 2021 IEEE EUROPEAN TEST SYMPOSIUM (ETS 2021). - : Institute of Electrical and Electronics Engineers (IEEE).
  • Konferensbidrag (refereegranskat)abstract
    • In this paper, a hierarchical fault simulation technique for neural networks is proposed, supporting both permanent and temporary faults. In the proposed technique, different levels of hierarchy are used, forming a mixed-level simulation environment. In such an environment, the pre-synthesis behavioral specification of the network and the post-synthesis gate-level model are co-simulated. To accelerate the fault simulation process, faults are injected in the gate-level specification of the selected neurons while the behavioral model in different levels of abstraction is used to simulate the remaining neurons. Further speedup is obtained through event-driven simulation and parallelization. Experimental results confirm the time efficiency of the proposed fault simulation technique.
  •  
41.
  • Karami, Masoomeh, et al. (författare)
  • High-Performance Parallel Fault Simulation for Multi-Core Systems
  • 2021
  • Ingår i: 2021 29th euromicro international conference on parallel, distributed and network-based processing (PDP 2021). - : Institute of Electrical and Electronics Engineers (IEEE). ; , s. 207-211
  • Konferensbidrag (refereegranskat)abstract
    • Fault simulation is a time-consuming process that requires customized methods and techniques to accelerate it. Multi-threading and multi-core approaches are two promising techniques that can be exploited to accelerate the fault simulation process by using different parts of the hardware at the same time. However, efficient parallelization is obtained only by refining the software with respect to the hardware platform. In this paper, a parallel multi-thread fault simulation technique is proposed to accelerate the simulation process on multi-core platforms. In this approach, the gate input values are independently assigned to each thread. Each input value carries the information of several parallel simulation processes. This provides a multi-thread parallel fault simulation environment. The experimental results show that the proposed technique can efficiently use the hardware platform. On a single-core platform, the proposed technique can reduce the time by 25%, while on a dual-core platform increasing the number of threads approximately halves the execution time.
  •  
42.
  • Karami, Masoomeh, et al. (författare)
  • Thread-level Parallelism in Fault Simulation of Deep Neural Networks on Multi-Processor Systems
  • 2022
  • Ingår i: Proceedings. - : Institute of Electrical and Electronics Engineers (IEEE).
  • Konferensbidrag (refereegranskat)abstract
    • High-performance fault simulation is one of the essential and preliminary tasks in the process of online and offline testing of machine learning (ML) hardware. Deep neural networks (DNN), as one of the essential parts of ML programs, are widely used in many critical and non-critical applications in Systems-on-Chip and ASIC designs. In fault simulation for DNNs, the simulation time increases exponentially with the number of neurons. However, the software architecture of neural networks and the lack of dependency between neurons in each inference layer provide a significant opportunity to parallelize the fault simulation on a multi-processor platform. In this paper, a multi-thread technique for hierarchical fault simulation of neural networks is proposed, targeting both permanent and transient faults. During the process of fault simulation, the neurons of each inference layer are distributed among the executing threads. Since, in the process of hierarchical fault simulation, the faulty neuron demands proportionally enormous computation compared to the behavioural model of non-faulty neurons, the faulty neuron is assigned to one thread while the rest of the neurons are divided among the remaining threads. Experimental results confirm the time efficiency of the proposed fault simulation technique on multi-processor architectures.
  •  
43.
  • Kondoro, Aron (författare)
  • Developing a Security-Enhanced Internet-of-Things Based Communication System for Smart Microgrids
  • 2021
  • Doktorsavhandling (övrigt vetenskapligt/konstnärligt)abstract
    • Access to clean and reliable electric power is still a challenge for many local communities in developing countries. Smart micro-grids are one of the new practical solutions that can take advantage of locally available resources to satisfy the energy demands of these communities. They are local low-voltage autonomous power systems that consist of renewable power sources, storage systems, and a set of local loads. One of the main challenges in realizing these micro-grids is a robust, ubiquitous and reliable information and communication infrastructure for the control, coordination and monitoring of the power generation and distribution process. The emergence of Internet of Things (IoT) technologies provides a key set of tools to solve this challenge. They facilitate the integration of computational and communication capabilities within power system components. However, integrating these technologies in micro-grids is still a challenge due to stringent security, reliability, and performance requirements of power systems. In this thesis, we develop a security-enhanced communication system for IoT-based micro-grids that provides comprehensive security services of confidentiality, availability, integrity, and privacy that can be implemented in a resource-constrained environment while satisfying the reliability and performance requirements of micro-grid functions. We utilize fog-based communication architectures to reduce latency of data exchanges and improve the efficiency of the communication process. We use security extensions of standard IoT communication protocols to implement a lightweight and performance-aware security system. First, we analyze how the integration of IoT in power systems introduces security vulnerabilities in the power generation and distribution process. We develop a simulation model that is used to evaluate the impact of security attacks on different parts of a power system. Using the model, we demonstrate several attack scenarios that can lead to theft of power, loss of privacy, and power outage. This information is used to determine the security requirements for the new system. Then, we build a lab-scale hardware-based micro-grid communication system prototype and demonstrate the performance limitations of existing IoT communication security standards. We show that existing standards do not scale and fail to meet the timing requirements for micro-grid protection and control operations. We propose new communication specifications and modifications needed to pass the standard power system requirements. Finally, using the security requirements and communication specifications, we develop a secure IoT-based communication system that provides encryption, integrity, privacy, and authentication features with minimal impact on performance. We implement and evaluate the design on the lab-scale hardware prototype. We show how the system can support micro-grid protection, control and monitoring using secure communication channels without exceeding the required performance limitations.
  •  
44.
  • Malekzadeh, Elaheh, et al. (author)
  • The Impact of Faults on DNNs : A Case Study
  • 2021
  • In: 2021 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT). - : Institute of Electrical and Electronics Engineers (IEEE).
  • Conference paper (peer-reviewed)
    • Deep neural networks (DNNs) are showing superior advantages in different domains and are finding their way into critical applications where reliability is the main concern. DNNs can be executed on different hardware platforms, including general-purpose processors, which usually operate under floating-point (FP) number systems. Considering the small range of weights in DNNs stored in the FP format, some bits remain constant as 0 or 1 for all weights. On the other hand, a single event upset may flip a bit, increasing or decreasing the value of a weight. In this paper, we analyze the effect of bit flips in a sample network, LeNet5, and show the sensitivity of convolution layers to faults and the vulnerability of DNNs to a single fault in a specific bit position. By contrast, the network is inherently robust against bit flips in the other bit positions. We then show that the choice of activation functions and pooling techniques could alleviate the negative effects of faults to a large extent.
  •  
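The bit-flip effect described in the abstract above can be illustrated with a minimal sketch. It assumes IEEE 754 single precision (float32) and a weight with magnitude below 2, for which the most significant exponent bit is always 0. The abstract does not name the vulnerable bit position, so the choice of bit 30 here is an illustrative assumption, and `flip_bit` is a hypothetical helper, not code from the paper.

```python
import struct

def flip_bit(value: float, bit: int) -> float:
    """Flip one bit of the float32 encoding of value (bit 0 = LSB, bit 31 = sign)."""
    # Reinterpret the float32 bytes as an unsigned 32-bit integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    # XOR toggles exactly the requested bit, then reinterpret back as float32.
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return flipped

weight = 0.5  # illustrative weight; any |w| < 2 has exponent MSB (bit 30) equal to 0

low_mantissa = flip_bit(weight, 0)    # tiny perturbation of the weight
sign = flip_bit(weight, 31)           # sign change only
exponent_msb = flip_bit(weight, 30)   # weight jumps to 2**127

print(low_mantissa, sign, exponent_msb)
```

Under these assumptions, a flip in a low mantissa bit barely changes the weight, while a flip in the top exponent bit inflates it by dozens of orders of magnitude, which is consistent with one bit position mattering far more than the others.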
45.
  • Nabavinejad, Seyed M., et al. (author)
  • An Overview of Efficient Interconnection Networks for Deep Neural Network Accelerators
  • 2020
  • In: IEEE Journal on Emerging and Selected Topics in Circuits and Systems. - : Institute of Electrical and Electronics Engineers Inc. - 2156-3357 .- 2156-3365. ; 10:3, pp. 268-282
  • Journal article (peer-reviewed)
    • Deep Neural Networks (DNNs) have shown significant advantages in many domains, such as pattern recognition, prediction, and control optimization. The edge computing demand in the Internet-of-Things (IoT) era has motivated many kinds of computing platforms to accelerate DNN operations. However, due to the massive parallel processing, the performance of current large-scale artificial neural networks is often limited by huge communication overheads and storage requirements. As a result, efficient interconnection and data movement mechanisms for future on-chip artificial intelligence (AI) accelerators are worthy of study. Currently, a large body of research aims to find an efficient on-chip interconnection to achieve low-power and high-bandwidth DNN computing. This paper provides a comprehensive investigation of the recent advances in efficient on-chip interconnection and design methodology for DNN accelerators. First, we provide an overview of the different interconnection methods for DNN accelerators. Then, the interconnection methods for non-ASIC DNN accelerators are discussed. Moreover, with flexible interconnection, a DNN accelerator can support different computing flows, which increases computing flexibility. With this motivation, reconfigurable DNN computing with flexible on-chip interconnection is investigated in this paper. Finally, we investigate emerging interconnection technologies (e.g., in/near-memory processing) for DNN accelerator design. This paper systematically investigates the interconnection networks in modern DNN accelerator designs. With this article, readers are able to: 1) understand the interconnection design for DNN accelerators; 2) evaluate DNNs with different on-chip interconnections; 3) become familiar with the trade-offs under different interconnections.
  •  
46.
  • Nabavinejad, Seyed Morteza, et al. (author)
  • BatchSizer : Power-Performance Trade-off for DNN Inference
  • 2021
  • In: 2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC). - New York, NY, USA : IEEE. ; pp. 819-824
  • Conference paper (peer-reviewed)
    • GPU accelerators can deliver significant improvement for DNN processing; however, their performance is limited by internal and external parameters. A well-known parameter that restricts the performance of various computing platforms in real-world setups, including GPU accelerators, is the power cap, usually imposed by an external power controller. A common approach to meeting the power cap constraint is the Dynamic Voltage Frequency Scaling (DVFS) technique. However, the functionality of this technique is limited and platform-dependent. To improve the performance of DNN inference on GPU accelerators, we propose a new control knob, which is the size of input batches fed to the GPU accelerator in DNN inference applications. After evaluating the impact of this control knob on the power consumption and performance of GPU accelerators and DNN inference applications, we introduce the design and implementation of a fast and lightweight runtime system, called BatchSizer. This runtime system leverages the new control knob for managing the power consumption of GPU accelerators in the presence of the power cap. Conducting several experiments using a modern GPU and several DNN models and input datasets, we show that our BatchSizer can significantly surpass the conventional DVFS technique regarding performance (up to 29%), while successfully meeting the power cap.
  •  
47.
  • Nabavinejad, Seyed Morteza, et al. (author)
  • Coordinated Batching and DVFS for DNN Inference on GPU Accelerators
  • 2022
  • In: IEEE Transactions on Parallel and Distributed Systems. - : Institute of Electrical and Electronics Engineers (IEEE). - 1045-9219 .- 1558-2183. ; 33:10, pp. 2496-2508
  • Journal article (peer-reviewed)
    • Employing hardware accelerators to improve the performance and energy-efficiency of DNN applications is on the rise. One challenge of using hardware accelerators, including the GPU-based ones, is that their performance is limited by internal and external factors, such as power caps. A common approach to meeting the power cap constraint is the Dynamic Voltage Frequency Scaling (DVFS) technique. However, the functionality of this technique is limited and platform-dependent. To tackle this challenge, we propose a new control knob, which is the size of input batches fed to the GPU accelerator in DNN inference applications. We first evaluate the impact of batch size on the power consumption and performance of DNN inference. Then, we introduce the design and implementation of a fast and lightweight runtime system, called BatchDVFS. Dynamic batching is implemented in BatchDVFS to adaptively change the batch size, and hence, trade off throughput against power consumption. It employs an approach based on binary search to find the proper batch size within a short period of time. Combining dynamic batching with the DVFS technique, BatchDVFS can control the power consumption in wider ranges, and hence, yield higher throughput in the presence of power caps. To find a near-optimal solution for long-running jobs that can afford a relatively significant profiling overhead, compared with the BatchDVFS overhead, we also design an approach, called BOBD, that employs Bayesian Optimization to wisely explore the vast state space resulting from the combination of the batch size and DVFS solutions. Conducting several experiments using a modern GPU and several DNN models and input datasets, we show that our BatchDVFS can significantly surpass the techniques solely based on DVFS or batching, regarding throughput (up to 11.2x and 2.2x, respectively), while successfully meeting the power cap.
  •  
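The binary-search step mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the BatchDVFS implementation: it assumes that measured power is non-decreasing in batch size, and `largest_batch_under_cap`, `toy_power_model`, and all numbers are hypothetical stand-ins for a real power measurement.

```python
from typing import Callable

def largest_batch_under_cap(
    measure_power: Callable[[int], float],
    power_cap: float,
    max_batch: int,
) -> int:
    """Return the largest batch size in [1, max_batch] whose power stays
    at or below power_cap, or 0 if even batch size 1 exceeds the cap.

    Assumes measure_power is non-decreasing in the batch size."""
    lo, hi, best = 1, max_batch, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        if measure_power(mid) <= power_cap:
            best = mid       # mid is feasible; try larger batches
            lo = mid + 1
        else:
            hi = mid - 1     # mid violates the cap; try smaller batches
    return best

def toy_power_model(batch_size: int) -> float:
    """Made-up linear power model in watts, for illustration only."""
    return 100.0 + 1.5 * batch_size

print(largest_batch_under_cap(toy_power_model, power_cap=200.0, max_batch=256))
```

With a search range of 256 batch sizes, the loop needs at most nine power measurements, which is one plausible reading of why a binary search suits a runtime system that has to settle on a batch size quickly.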
48.
  • Nikdast, Mahdi, et al. (author)
  • Special Issue on the 2023 International Symposium on Networks-on-Chip (NOCS 2023)
  • 2023
  • In: IEEE Design & Test. - : Institute of Electrical and Electronics Engineers (IEEE). - 2168-2356 .- 2168-2364. ; 40:6, pp. 5-6
  • Journal article (other academic/artistic)
    • The International Symposium on networks-on-chip (NOCS) serves as the premier interdisciplinary meeting for research on NoC architecture, implementation, analysis, optimization, and verification, encompassing various aspects of NoCs for embedded high-performance computing systems, un-core and system-level NoCs, inter/intrachip, and rack-scale networks. Similar to previous years, this event has been held in conjunction with the Embedded Systems Week (ESWEEK). This year, NOCS was held in Hamburg, Germany, on 21-22 September 2023, marking its return to a fully in-person conference after virtual and hybrid editions during the pandemic.
  •  
49.
  • Patti, D., et al. (author)
  • Message from the Chairs
  • 2017
  • In: 10th International Workshop on Network on Chip Architectures, NoCArc 2017. - : Association for Computing Machinery, Inc.
  • Conference paper (peer-reviewed)
  •  
50.
  • Rabiee, Navid, et al. (author)
  • Green Biomaterials : fundamental principles
  • 2023
  • In: Green Biomaterials. - : Taylor & Francis. - 2993-4168. ; 1:1, pp. 1-4
  • Journal article (other academic/artistic)
  •  