SwePub
Search the SwePub database

Hit list for search "L773:0141 9331"

  • Results 1-50 of 56
1.
  • Jalminger, Jonas, 1971, et al. (author)
  • Improvement of energy-efficiency in off-chip caches by selective prefetching
  • 2002
  • Published in: Microprocessors and Microsystems. ISSN 0141-9331; 26:3, pp. 107-121
  • Journal article (peer-reviewed)
    • The line size/performance trade-offs in off-chip second-level caches are revisited in light of energy efficiency. Based on a mix of applications representing server and mobile computer system usage, we show that while the large line sizes (128 bytes) typically used maximize performance, they result in high power dissipation owing to the limited exploitation of spatial locality. In contrast, small blocks (32 bytes) are found to cut the energy-delay by more than a factor of 2 with only a moderate performance loss of less than 25%. As a remedy, prefetching, if applied selectively, is shown to avoid the performance losses of small blocks while keeping power consumption low.
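The two percentages in the abstract can be related through the energy-delay product ED = E · D. The worked example below is illustrative only: subscripts 32 and 128 denote the line sizes, and it assumes the delay penalty sits exactly at the 25% bound and the energy-delay is exactly halved. These are the limits quoted in the abstract, not measured values from the paper.

```latex
\mathrm{ED} = E \cdot D, \qquad
\frac{E_{32}\,D_{32}}{E_{128}\,D_{128}} = \frac{1}{2}, \qquad
\frac{D_{32}}{D_{128}} = 1.25
\;\;\Longrightarrow\;\;
\frac{E_{32}}{E_{128}} = \frac{1/2}{1.25} = 0.4
```

More generally, the energy ratio equals the energy-delay ratio divided by the delay ratio. Since the abstract reports a performance loss, the delay ratio is at least 1, so the 32-byte configuration's energy would in any case be below half that of the 128-byte configuration.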
2.
  • Törngren, Martin, et al. (author)
  • A modelling framework to support the design and analysis of distributed real-time control systems
  • 2000
  • Published in: Microprocessors and Microsystems. ISSN 0141-9331, 1872-9436; 24:2, pp. 81-93
  • Journal article (peer-reviewed)
    • Within the Automatic Control in Distributed Applications project, a modelling framework has been developed to support design issues related to the implementation of control applications in embedded distributed computer systems. At a relatively high level of abstraction, the models describe the structure and timing behaviour of a control application (in terms of functions and operational models) and of its implementation (hardware, operating system threads and resources). The resource description allows the timing behaviour of the implementation to be analysed and fed back into the application models. The models form the basis for a decentralization tool-set, of which a first prototype is under development. Examples of the models are given, and the framework is compared to related modelling approaches.
3.
  • Afzal, Wasif, et al. (author)
  • The MegaM@Rt2 ECSEL project : MegaModelling at Runtime – Scalable model-based framework for continuous development and runtime validation of complex systems
  • 2018
  • Published in: Microprocessors and Microsystems. Elsevier B.V. ISSN 0141-9331, 1872-9436; 61, pp. 86-95
  • Journal article (peer-reviewed)
    • A major challenge for the European electronic industry is to enhance productivity by ensuring quality of development, integration and maintenance while reducing the associated costs. Model-Driven Engineering (MDE) principles and techniques have already shown promising capabilities, but they still need to scale up to support the real-world scenarios implied by the full deployment and use of complex electronic components and systems. Moreover, maintaining efficient traceability, integration, and communication between two fundamental system life cycle phases (design time and runtime) is another challenge requiring the scalability of MDE. This paper presents an overview of the ECSEL project entitled “MegaModelling at runtime – Scalable model-based framework for continuous development and runtime validation of complex systems” (MegaM@Rt2), whose aim is to address the above-mentioned challenges facing MDE. Driven by both large and small industrial enterprises, with the support of research partners and technology providers, MegaM@Rt2 aims to deliver a framework of tools and methods for: 1) system engineering/design and continuous development, 2) related runtime analysis, and 3) global models and traceability management. Diverse industrial use cases (covering strategic domains such as aeronautics, railway, construction and telecommunications) will integrate and demonstrate the validity of the MegaM@Rt2 solution. This paper provides an overview of the MegaM@Rt2 project with respect to its approach, mission, objectives and implementation details. It further introduces the consortium and describes the work packages and a few deliverables already produced.
4.
  • Agirre, J. A., et al. (author)
  • The VALU3S ECSEL project : Verification and validation of automated systems safety and security
  • 2021
  • Published in: Microprocessors and Microsystems. Elsevier BV. ISSN 0141-9331, 1872-9436; 87, article 104349
  • Journal article (peer-reviewed)
    • Manufacturers of automated systems and their components have been allocating an enormous amount of time and effort to R&D activities, which has led to the availability of prototypes demonstrating new capabilities as well as the introduction of such systems to the market within different domains. Manufacturers need to make sure that the systems function in the intended way and according to specifications. This is not a trivial task, as system complexity rises dramatically as these systems become more integrated and interconnected through the addition of automated functionality and features. This effort translates into an overhead on the V&V (verification and validation) process, making it time-consuming and costly. In this paper, we present VALU3S, an ECSEL JU (joint undertaking) project that aims to evaluate the state-of-the-art V&V methods and tools, and to design a multi-domain framework that creates a clear structure around the components and elements needed to conduct the V&V process. The main expected benefit of the framework is to reduce the time and cost needed to verify and validate automated systems with respect to safety, cyber-security, and privacy requirements. This is done through the identification and classification of evaluation methods, tools, environments and concepts for V&V of automated systems with respect to the mentioned requirements. VALU3S will provide guidelines to the V&V community, including engineers and researchers, on how the V&V of automated systems could be improved considering the cost, time and effort of conducting V&V processes. To this end, VALU3S brings together a consortium of partners from 10 different countries, comprising 25 industrial partners, 6 leading research institutes, and 10 universities, to reach the project goal.
5.
  • Arslan, Mehmet Ali, et al. (author)
  • Instruction Selection and Scheduling for DSP Kernels
  • 2014
  • Published in: Microprocessors and Microsystems. Elsevier BV. ISSN 0141-9331; 38:8, pp. 803-813
  • Journal article (peer-reviewed)
    • As custom multicore architectures become more and more common for DSP applications, instruction selection and scheduling for such applications and architectures become important topics. In this paper, we explore the effects of defining the problem of finding an optimal instruction selection and schedule as a constraint satisfaction problem (CSP). We incorporate methods based on sub-graph isomorphism and global constraints designed for scheduling. We experiment using several media applications on a custom architecture, a generic VLIW architecture and a RISC architecture, all three with several cores. Our results show that defining the problem with constraints gives flexibility in modelling, while state-of-the-art constraint solvers enable optimal solutions for large problems, hinting at a new method for code generation.
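The abstract frames scheduling as a constraint satisfaction problem. As a rough illustration only, the sketch below models a toy instance as a CSP (one start-cycle variable per instruction, with precedence and issue-width constraints) and solves it by plain backtracking, growing the schedule length until a solution exists. It covers scheduling only, not instruction selection, and it is not the paper's model: the sub-graph isomorphism and global scheduling constraints handled by a state-of-the-art solver are replaced here by a naive search. Instruction names, latencies and function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Schedule = Dict[str, int]


def solve(latency: Dict[str, int],
          deps: List[Tuple[str, str]],
          issue_width: int,
          horizon: int) -> Optional[Schedule]:
    """Backtracking CSP search: give every instruction a start cycle such that
    dependences and the per-cycle issue width hold and all instructions
    finish by `horizon`. Returns None if no such assignment exists."""
    order = list(latency)          # fixed variable order (no heuristics)
    start: Schedule = {}

    def consistent(instr: str, t: int) -> bool:
        for a, b in deps:          # edge (a, b): b starts after a finishes
            if b == instr and a in start and t < start[a] + latency[a]:
                return False
            if a == instr and b in start and start[b] < t + latency[instr]:
                return False
        # Issue-width constraint: at most `issue_width` starts per cycle.
        return sum(1 for s in start.values() if s == t) < issue_width

    def backtrack(k: int) -> bool:
        if k == len(order):
            return True
        instr = order[k]
        for t in range(horizon - latency[instr] + 1):
            if consistent(instr, t):
                start[instr] = t
                if backtrack(k + 1):
                    return True
                del start[instr]
        return False

    return dict(start) if backtrack(0) else None


def optimal_schedule(latency: Dict[str, int],
                     deps: List[Tuple[str, str]],
                     issue_width: int) -> Tuple[int, Schedule]:
    """Smallest feasible schedule length, found by growing the horizon.
    Assumes an acyclic dependence graph and issue_width >= 1."""
    horizon = max(latency.values())
    while True:
        result = solve(latency, deps, issue_width, horizon)
        if result is not None:
            return horizon, result
        horizon += 1


if __name__ == "__main__":
    length, sched = optimal_schedule(
        {"ld": 2, "mul": 3, "add": 1, "st": 1},
        [("ld", "mul"), ("mul", "add"), ("add", "st")],
        issue_width=1)
    print(length, sched)  # 7 {'ld': 0, 'mul': 2, 'add': 5, 'st': 6}
```

Because the horizon is increased one cycle at a time, the first feasible horizon is the optimal schedule length for this toy model; a real constraint solver would prune far more aggressively.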
6.
  • Asghari, S. A., et al. (author)
  • A software implemented comprehensive soft error detection method for embedded systems
  • 2020
  • Published in: Microprocessors and Microsystems. Elsevier. ISSN 0141-9331, 1872-9436; 77
  • Journal article (peer-reviewed)
    • This paper presents a comprehensive software-based technique capable of detecting soft errors in embedded systems. Soft errors can be categorized into Control Flow Errors (CFEs) and data errors: CFEs erroneously alter the program's flow of control, whereas data errors corrupt the computed results. In this paper, a new comprehensive method that detects both is presented, based on a combination of the authors' previous work. To evaluate the proposed method, a new factor is defined that considers three main parameters simultaneously, namely fault coverage, memory overhead, and performance overhead. Since these parameters are very important in safety-critical applications, they should be improved concurrently. The experimental results on SPEC2000 benchmarks show that the Evaluation Factor of the proposed method is 50% better than that of the Relationship Signatures for Control Flow Checking with Data Validation (RSCFCDV) method proposed in the literature.
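The abstract does not detail the authors' combined method. For orientation only, the sketch below simulates a well-known software signature scheme for control-flow errors in the style of CFCSS (Control-Flow Checking by Software Signatures): every basic block gets a compile-time signature, a runtime register `g` is XOR-updated with a precomputed difference on entry to each block, and a mismatch flags a control-flow error. The block names, signatures and straight-line control-flow graph are hypothetical, and branch fan-in (which CFCSS handles with an extra adjusting signature) is deliberately left out.

```python
from typing import Dict, List, Optional

# Hypothetical straight-line CFG A -> B -> C -> D with unique signatures.
SIGNATURES: Dict[str, int] = {"A": 0b0001, "B": 0b0010, "C": 0b0100, "D": 0b1000}
PREDECESSOR: Dict[str, str] = {"B": "A", "C": "B", "D": "C"}

# Compile-time differences d_j = s_pred XOR s_j, embedded in each block.
DIFF: Dict[str, int] = {blk: SIGNATURES[pred] ^ SIGNATURES[blk]
                        for blk, pred in PREDECESSOR.items()}


def run_with_checks(trace: List[str]) -> Optional[str]:
    """Replay an executed block sequence; return the block at which a
    control-flow error is detected, or None if every check passes."""
    g = SIGNATURES[trace[0]]          # runtime signature starts at the entry block
    for blk in trace[1:]:
        g ^= DIFF[blk]                # update instrumented at block entry
        if g != SIGNATURES[blk]:      # check instrumented right after the update
            return blk
    return None
```

For example, `run_with_checks(["A", "C", "D"])` models an erroneous jump that skips block B and returns `"C"`. Each check costs an XOR and a compare per block, which is the kind of memory and performance overhead the paper's Evaluation Factor weighs against fault coverage.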
7.
  • Badawi, Mohammad, et al. (author)
  • Quality-of-service-aware adaptation scheme for multi-core protocol processing architecture
  • 2017
  • Published in: Microprocessors and Microsystems. Elsevier. ISSN 0141-9331, 1872-9436; 54, pp. 47-59
  • Journal article (peer-reviewed)
    • Employing adaptable protocol processing architectures has shown high potential for provisioning Quality-of-Service (QoS) while retaining efficient use of the available energy budget. Nevertheless, successful QoS provisioning using adaptable protocol processing architectures requires the adaptation to be agile and to have low latency: a long adaptation latency might violate the desired packet-processing latency or throughput, or cause packet loss if the memory cannot accommodate the accumulating packets. This paper presents an elastic management scheme that permits agile, QoS-aware adaptation of the processing elements (PEs) within the protocol processing architecture, such that the desired QoS is maintained. Moreover, our proposed scheme has the potential to reduce energy consumption, since it employs PEs on demand. We quantify the latency required for PE adaptation and the reductions in energy and area that can be achieved using our scheme. We also consider two different real-life use cases to demonstrate the effectiveness of the proposed management scheme in maintaining QoS while conserving the available energy.
8.
  •  
9.
  • Bruneliere, H., et al. (author)
  • AIDOaRt : AI-augmented Automation for DevOps, a model-based framework for continuous development in Cyber–Physical Systems
  • 2022
  • Published in: Microprocessors and Microsystems. Elsevier B.V. ISSN 0141-9331, 1872-9436; 94
  • Journal article (peer-reviewed)
    • The advent of complex Cyber–Physical Systems (CPSs) creates the need for more efficient engineering processes. Recently, DevOps has promoted the idea of closer, continuous integration between system development (including its design) and operational deployment. Although their use is still limited, Artificial Intelligence (AI) techniques are suitable candidates for improving such system engineering activities (cf. AIOps). In this context, AIDOaRt is a large European collaborative project that aims to provide AI-augmented automation capabilities to better support the modeling, coding, testing, monitoring, and continuous development of CPSs. The project proposes to combine Model Driven Engineering principles and techniques with AI-enhanced methods and tools for engineering more trustworthy CPSs. The resulting framework will (1) enable the dynamic observation and analysis of system data collected at both runtime and design time and (2) provide dedicated AI-augmented solutions that will then be validated in concrete industrial cases. This paper describes the main research objectives and underlying paradigms of the AIDOaRt project. It also introduces the conceptual architecture and proposed approach of the overall AIDOaRt solution. Finally, it reports on the actual project practices and discusses the current results and future plans.
10.
  • Castillejo, Pedro, et al. (author)
  • Aggregate Farming in the Cloud : The AFarCloud ECSEL project
  • 2020
  • Published in: Microprocessors and Microsystems. Elsevier B.V. ISSN 0141-9331, 1872-9436; 78
  • Journal article (peer-reviewed)
    • Farming faces many economic challenges in terms of productivity and cost-effectiveness. Labor shortage, partly due to the depopulation of rural areas, especially in Europe, is another challenge. Domain-specific problems, such as accurate monitoring of soil and crop properties and of animal health, are key factors for minimizing economic risks and avoiding risks to human health. The ECSEL AFarCloud (Aggregate Farming in the Cloud) project will provide a distributed platform for autonomous farming that will allow the integration and cooperation of agricultural Cyber-Physical Systems in real time in order to increase efficiency, productivity, animal health and food quality, and to reduce farm labor costs. Moreover, such a platform can be integrated with farm management software to support monitoring and decision-making solutions based on big data and real-time data mining techniques. © 2020 The Author(s)
11.
  • Cavo, Luis, et al. (author)
  • Design of an area efficient crypto processor for 3GPP-LTE NB-IoT devices
  • 2020
  • Published in: Microprocessors and Microsystems. Elsevier BV. ISSN 0141-9331; 72
  • Journal article (peer-reviewed)
    • Providing information security is crucial for Internet of Things (IoT) devices, platforms in which the available power budget is very limited. This paper tackles this challenge and presents a cryptographic processor compliant with the security algorithms specified by the 3rd Generation Partnership Project (3GPP) Long Term Evolution (LTE) NarrowBand IoT (NB-IoT) standard. The proposed processor has been optimized for the needs of the low-end portfolio technologies that make up the IoT market, which address low-area, low-cost and low-data-rate applications. Operation analysis at the algorithm level and hardware sharing at the architecture level have enabled extensive area reduction. The cryptographic processor has been described using a High-Level Synthesis (HLS) design flow and integrated with a general-purpose processor in a cycle-accurate virtual platform. The design achieves an area reduction ranging from 5% to 42% in comparison to similar work. Synthesis results using a 65 nm CMOS technology show that the processor has a hardware cost of 53.6 kGE and achieves 52.4 Mbps for the block cipher and 800 Mbps for the stream cipher algorithms at a 100 MHz clock.
12.
  • Chen, Kun-Chih (Jimmy), et al. (author)
  • A NoC-based simulator for design and evaluation of deep neural networks
  • 2020
  • Published in: Microprocessors and Microsystems. Elsevier. ISSN 0141-9331, 1872-9436; 77
  • Journal article (peer-reviewed)
    • The astonishing development in the field of artificial neural networks (ANNs) has brought significant advancement in many application domains, such as pattern recognition, image classification, and computer vision. ANNs imitate neuron behaviors and make a decision or prediction by learning patterns and features from the given data set. To reach higher accuracies, neural networks are getting deeper, and consequently, the computation and storage demands on hardware platforms are steadily increasing. In addition, the massive data communication among neurons makes the interconnection more complex and challenging. To overcome these challenges, ASIC-based DNN accelerators are being designed, which usually incorporate customized processing elements, fixed interconnection, and large off-chip memory storage. As a result, DNN computation involves extensive memory accesses due to frequent loading and off-loading of data, which significantly increases energy consumption and latency. Also, the rigid architecture and interconnection among processing elements limit the efficiency of the platform to specific applications. In recent years, Network-on-Chip-based (NoC-based) DNN has become an emerging design paradigm because the NoC interconnection can help reduce off-chip memory accesses while offering better scalability and flexibility. To evaluate NoC-based DNNs in the early design stage, we introduce a cycle-accurate NoC-based DNN simulator, called DNNoC-sim. To support various operations such as convolution and pooling in modern DNN models, we first propose a DNN flattening technique to convert diverse DNN operations into MAC-like operations. In addition, we propose a DNN slicing method to evaluate large-scale DNN models on a resource-constrained NoC platform. The evaluation results show a significant reduction in off-chip memory accesses compared to the state-of-the-art DNN model. We also analyze the performance and discuss the trade-off between different design parameters.
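The abstract does not spell out the flattening technique. As a minimal sketch under our own assumptions, the code below shows the general idea that a convolution (and, with constant weights 1/k², an average-pooling layer) can be expressed as independent multiply-accumulate (MAC) jobs, one per output element, which a simulator could then distribute over NoC-connected processing elements. All function names are hypothetical, and max pooling, which is not a MAC operation, is not covered.

```python
from typing import List, Tuple

Matrix = List[List[float]]
MacJob = Tuple[Tuple[int, int], List[Tuple[float, float]]]


def flatten_conv(x: Matrix, w: Matrix,
                 stride: int = 1) -> Tuple[List[MacJob], Tuple[int, int]]:
    """Turn a 'valid' 2-D convolution (cross-correlation, as in most DNN
    frameworks) into MAC jobs: one per output element, each holding the
    (activation, weight) pairs it must multiply and accumulate."""
    kh, kw = len(w), len(w[0])
    oh = (len(x) - kh) // stride + 1
    ow = (len(x[0]) - kw) // stride + 1
    jobs: List[MacJob] = []
    for i in range(oh):
        for j in range(ow):
            pairs = [(x[i * stride + di][j * stride + dj], w[di][dj])
                     for di in range(kh) for dj in range(kw)]
            jobs.append(((i, j), pairs))
    return jobs, (oh, ow)


def run_mac_jobs(jobs: List[MacJob], shape: Tuple[int, int]) -> Matrix:
    """Execute each job as a chain of multiply-accumulate steps, as one
    processing element might; jobs are independent, so any PE could run any."""
    oh, ow = shape
    out = [[0.0] * ow for _ in range(oh)]
    for (i, j), pairs in jobs:
        acc = 0.0
        for activation, weight in pairs:
            acc += activation * weight
        out[i][j] = acc
    return out


def avg_pool_kernel(k: int) -> Matrix:
    """Average pooling over k x k windows is a convolution with weights 1/k^2."""
    return [[1.0 / (k * k)] * k for _ in range(k)]
```

For example, `flatten_conv(y, avg_pool_kernel(2), stride=2)` turns 2×2 average pooling into MAC jobs that `run_mac_jobs` executes exactly like a convolution.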
13.
  • Daneshtalab, Masoud, et al. (author)
  • Special issue on many-core embedded systems
  • 2014
  • Published in: Microprocessors and Microsystems. Elsevier BV. ISSN 0141-9331, 1872-9436; 38:6, p. 525
  • Journal article (other academic/artistic)
14.
  • Diaz, Isael, et al. (author)
  • A New Digital Front-End for Flexible Reception in Software Defined Radio
  • 2015
  • Published in: Microprocessors and Microsystems. Elsevier BV. ISSN 0141-9331; 39:8, pp. 889-900
  • Journal article (peer-reviewed)
    • Future mobile terminals are expected to support an ever-increasing number of Radio Access Technologies (RATs) concurrently. This already poses a challenge to terminal designers today. Software Defined Radio (SDR) solutions are a compelling alternative to address this issue in the digital baseband, given their high flexibility and low Non-Recurring Engineering (NRE) cost. However, the challenge remains in the Digital Front-End (DFE), where many operations are too complex or energy-hungry to be implemented as software instructions. Thus, new architectures are needed to feed the SDR digital baseband while keeping complexity and energy consumption at bay. In this article, the architecture of a Digital Front-End Receiver (DFE-Rx) for next-generation mobile terminals is presented. The flexibility needed for multi-standard support is demonstrated by detecting, synchronizing to, and reporting the carrier-frequency offset of multiple concurrent radio standards. Moreover, the proposed architecture has been fabricated in a 65 nm CMOS low-power high-VT cell technology with a die size of 5 mm². The core module of the DFE-Rx, the synchronization engine, has been measured at 1.2 V and shows an average power consumption of 1.9 mW during Wireless Local Area Network (WLAN) reception and 1.6 mW during configuration, while running at 10 MHz.
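The abstract does not say how the DFE-Rx estimates the carrier-frequency offset. One textbook approach, shown here purely as an assumed illustration and not as the fabricated design, exploits a preamble that repeats every L samples: an offset Δf rotates each repetition by 2πΔf·L/f_s, so the phase of the lag-L autocorrelation yields the offset, unambiguously for |Δf| < f_s/(2L). The function names, pattern, sample rate, period and offsets are all hypothetical.

```python
import cmath
import math
from typing import List


def estimate_cfo(samples: List[complex], period: int, fs: float) -> float:
    """Estimate carrier-frequency offset (Hz) from a signal that repeats every
    `period` samples, via the phase of the lag-`period` autocorrelation."""
    acc = sum(samples[n + period] * samples[n].conjugate()
              for n in range(len(samples) - period))
    return cmath.phase(acc) * fs / (2 * math.pi * period)


def repeated_preamble(period: int, repeats: int, cfo: float,
                      fs: float) -> List[complex]:
    """Synthetic, noise-free test signal: a constant-amplitude pattern repeated
    `repeats` times and rotated by a frequency offset `cfo` (Hz)."""
    pattern = [cmath.exp(1j * 0.3 * k * k) for k in range(period)]
    return [pattern[n % period] * cmath.exp(2j * math.pi * cfo * n / fs)
            for n in range(period * repeats)]
```

With `fs = 1e6` and `period = 16`, for instance, `estimate_cfo(repeated_preamble(16, 4, 1000.0, 1e6), 16, 1e6)` recovers roughly 1000 Hz. Supporting several standards concurrently would mean running such a correlator per standard, each with its own preamble period.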
15.
  • Fakih, M., et al. (author)
  • SAFEPOWER project : Architecture for safe and power-efficient mixed-criticality systems
  • 2017
  • Published in: Microprocessors and Microsystems. Elsevier. ISSN 0141-9331, 1872-9436; 52, pp. 89-105
  • Journal article (peer-reviewed)
    • With the ever-increasing industrial demand for bigger, faster and more efficient systems, a growing number of cores is integrated on a single chip. Additionally, their performance is further maximized by simultaneously executing as many processes as possible, regardless of their criticality. Even safety-critical domains like railway and avionics apply these paradigms under strict certification regulations. As the number of cores continues to grow, cost-effectiveness becomes more important. One way to increase the cost-efficiency of such a System on Chip (SoC) is to improve the way the SoC handles its power resources. First, increasing power efficiency raises the reliability of the SoC because battery lifetime is extended. Second, consuming less energy reduces the heat emitted by the SoC, which translates into fewer cooling devices. Although energy efficiency has been thoroughly researched, these power-saving methods have not yet been applied in safety-critical domains. The EU project SAFEPOWER…
16.
  • Farahini, Nasim, et al. (author)
  • Parallel distributed scalable runtime address generation scheme for a coarse grain reconfigurable computation and storage fabric
  • 2014
  • Published in: Microprocessors and Microsystems. Elsevier BV. ISSN 0141-9331, 1872-9436; 38:8, pp. 788-802
  • Journal article (peer-reviewed)
    • This paper presents a hardware-based solution for a scalable runtime address generation scheme for DSP applications mapped to a parallel distributed coarse-grain reconfigurable computation and storage fabric. The scheme can also deal with non-affine functions of multiple variables that typically correspond to multiple nested loops. The key innovation is the judicious use of two categories of address generation resources. The first category of resource is the low-cost AGU (address generation unit), which generates addresses within given address bounds for affine functions of up to two variables. Such low-cost AGUs are distributed and associated with every read/write port in the distributed memory architecture. The second category of resource is relatively more complex; it is also distributed, but shared among a few storage units, and is capable of handling more complex address generation requirements, such as dynamic computation of the address bounds that are then used to configure the AGUs, and transformation of non-affine functions into affine functions by computing the affine factor outside the loop. The runtime computation of the address constraints results in negligibly small overhead in latency, area and energy, while it provides a substantial reduction in program storage and energy, and improved reconfiguration agility, compared to the prevalent pre-computation of address constraints. The efficacy of the proposed method has been validated against the prevalent address generation schemes for a set of six realistic DSP functions. Compared to the pre-computation method, the proposed solution achieved 75% average code compaction, and compared to the centralized runtime address generation scheme, it achieved a 32.7% average performance improvement.
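As a minimal sketch, assuming the simple AGUs described in the abstract compute affine addresses of the form base + i·s_outer + j·s_inner over a two-level loop nest, the first category of resource could be modelled as below; the parameter and function names are hypothetical. The second function illustrates, again hypothetically, the role of the more complex shared resource: for a triangular loop, whose inner bound depends on the outer index, it computes the bound at runtime and reconfigures the simple AGU once per outer iteration instead of storing precomputed bounds.

```python
from typing import Iterator


def affine_agu(base: int, outer_stride: int, outer_count: int,
               inner_stride: int, inner_count: int) -> Iterator[int]:
    """Simple AGU: addresses base + i*outer_stride + j*inner_stride for a
    two-level loop nest with fixed bounds (an affine function of i and j)."""
    for i in range(outer_count):
        for j in range(inner_count):
            yield base + i * outer_stride + j * inner_stride


def triangular_rows(base: int, row_stride: int, rows: int) -> Iterator[int]:
    """Lower-triangular access (row i touches elements 0..i). The inner bound
    i + 1 changes per row, so a hypothetical controller computes it at runtime
    and reconfigures the affine AGU for a single row of i + 1 elements."""
    for i in range(rows):
        yield from affine_agu(base + i * row_stride, 0, 1, 1, i + 1)
```

For instance, `list(affine_agu(100, 10, 2, 1, 3))` yields `[100, 101, 102, 110, 111, 112]`. Storing only the five AGU parameters, rather than every generated address or bound, is what keeps program storage small in this model.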
17.
  • Farahnakian, Fahimeh, et al. (author)
  • Bi-LCQ: A Low-weight Clustering-based Q-learning Approach for NoCs
  • 2014
  • Published in: Microprocessors and Microsystems. Elsevier BV. ISSN 0141-9331, 1872-9436; 38:1, pp. 64-75
  • Journal article (peer-reviewed)
    • Network congestion has a negative impact on the performance of on-chip networks due to the increased packet latency. Many congestion-aware routing algorithms have been developed to alleviate traffic congestion over the network. In this paper, we propose a congestion-aware routing algorithm based on the Q-learning approach for avoiding congested areas in the network. By using the learning method, local and global congestion information of the network is provided for each switch. This information can be dynamically updated when a switch receives a packet. However, the Q-learning approach suffers from high area overhead in NoCs due to the need for a large routing table in each switch. In order to reduce the area overhead, we also present a clustering approach that decreases the number of routing tables by a factor of 4. Results show that the proposed approach achieves a significant performance improvement over the traditional Q-learning, C-routing, DBAR and Dynamic XY algorithms.
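The abstract does not give the update rule. The sketch below assumes the classic Q-routing formulation of Boyan and Littman, in which each switch keeps, per destination and output neighbour, an estimate of delivery time and moves it toward the neighbour's reported best estimate plus the local queueing and link delays. The learning rate and all class, method and helper names are hypothetical. The `cluster_of` helper only illustrates one way a factor-of-4 reduction could arise, by sharing one table among each 2×2 group of switches in a mesh; the abstract does not state that this is the paper's clustering.

```python
from typing import Dict, List, Tuple


class QRoutingTable:
    """Per-switch Q-table: q[(dest, neighbour)] estimates delivery time."""

    def __init__(self, neighbours: List[str], destinations: List[int],
                 alpha: float = 0.5) -> None:
        self.neighbours = neighbours
        self.alpha = alpha                      # learning rate (assumed value)
        self.q: Dict[Tuple[int, str], float] = {
            (d, n): 0.0 for d in destinations for n in neighbours}

    def best_neighbour(self, dest: int) -> str:
        """Forward toward the neighbour with the lowest estimated delay
        (ties resolved by neighbour list order)."""
        return min(self.neighbours, key=lambda n: self.q[(dest, n)])

    def estimate(self, dest: int) -> float:
        """Value reported back to upstream switches: our best estimate."""
        return min(self.q[(dest, n)] for n in self.neighbours)

    def update(self, dest: int, neighbour: str, queue_delay: float,
               link_delay: float, neighbour_estimate: float) -> None:
        """Q-routing update after sending a packet to `neighbour`."""
        target = queue_delay + link_delay + neighbour_estimate
        old = self.q[(dest, neighbour)]
        self.q[(dest, neighbour)] = old + self.alpha * (target - old)


def cluster_of(x: int, y: int, size: int = 2) -> Tuple[int, int]:
    """Hypothetical clustering: switches in the same size x size block share
    one table, so a 2x2 grouping needs 4 times fewer tables."""
    return (x // size, y // size)
```

Because congestion raises the queueing delay fed into `update`, a congested output's estimate grows and `best_neighbour` steers later packets around it, which is the congestion-avoidance behaviour the abstract describes.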
18.
  • Granlund, Stefan, et al. (author)
  • A Low-Latency High-Throughput Soft-Output Signal Detector for Spatial Multiplexing MIMO Systems
  • 2015
  • Published in: Microprocessors and Microsystems. Elsevier BV. ISSN 0141-9331; 39:8, pp. 901-908
  • Journal article (peer-reviewed)
    • This paper presents a low-latency, high-throughput soft-output signal detector for a 4×4 64-QAM spatial-multiplexing MIMO system. To achieve high data-level parallelism and accurate soft information, the detector adopts a channel-adaptive node perturbation technique to generate a list of candidate vectors around an initial linear estimate. The detection algorithm provides a wide-ranging and convenient performance-complexity trade-off through adjustment of the node perturbation parameter. A partial-parallel pipelined VLSI architecture is developed to implement the algorithm with high throughput and low processing latency, while offering the flexibility to support run-time performance tuning. Moreover, a fast and hardware-friendly node enumeration scheme is developed to further reduce the processing delay by exploiting the geometric properties of the quadrature amplitude modulation (QAM) constellation. The detector was synthesized using Synopsys Design Compiler with a 65 nm CMOS standard cell library. The core area is 0.58 mm² with 290K gates. The peak throughput is 3 Gb/s at a 500 MHz clock frequency, with a latency of 20 ns. Compared to other reported soft-output MIMO detectors, this is a latency reduction of 71%. The corresponding energy consumption is 33 pJ per bit detection.
19.
  • Gruian, Flavius, et al. (author)
  • Hardware Support for CSP on a Java Chip-Multiprocessor
  • 2013
  • Published in: Microprocessors and Microsystems. Elsevier BV. ISSN 0141-9331; 37:4-5, pp. 472-481
  • Journal article (peer-reviewed)
    • Due to memory bandwidth limitations, chip multiprocessors (CMPs) adopting the convenient shared memory model for their main memory architecture scale poorly. On-chip core-to-core communication is a solution to this problem that can lead to further performance increases for a number of multithreaded applications. Programmatically, the Communicating Sequential Processes (CSP) paradigm provides a sound computational model for such an architecture with message-based communication. In this paper we explore hardware support for CSP in the context of an embedded Java CMP. The hardware support for CSP consists of on-chip communication channels, implemented by a ring-based network-on-chip (NoC), which reduce the memory bandwidth pressure on the shared memory. The presented solution is scalable and also tailored to our limited resources and real-time predictability requirements. CMP architectures of three to eight processors were implemented and tested on both Altera (EP1C12, EP2C70) and Xilinx (XC3S1200e) FPGAs, showing that the NoC accounts for under 9% of the total device area used by the system. Compared to shared memory-based communication, our NoC-based solution is between 1.7 and 9.3 times faster for raw data transfer, depending on the communication and memory configuration. Application speed-up, on the other hand, is highly dependent on the type of processing, as our measurements show.
20.
  • Grüttner, K., et al. (author)
  • CONTREX : Design of embedded mixed-criticality CONTRol systems under consideration of EXtra-functional properties
  • 2017
  • Published in: Microprocessors and Microsystems. Elsevier B.V. ISSN 0141-9331, 1872-9436; 51, pp. 39-55
  • Journal article (peer-reviewed)
    • The increasing processing power of today's HW/SW platforms leads to the integration of more and more functions in a single device. Additional design challenges arise when these functions share computing resources and belong to different criticality levels. CONTREX complements current activities in the area of predictable computing platforms and segregation mechanisms with techniques that consider extra-functional properties, i.e., timing constraints, power, and temperature. CONTREX enables energy-efficient and cost-aware design through analysis and optimization of these properties with regard to application demands at different criticality levels. This article presents an overview of the CONTREX European project, its main innovative technology (an extension of a model-based design approach, functional and extra-functional analysis with executable models, and run-time management) and the final results of three industrial use cases from different domains (avionics, automotive and telecommunication).
21.
  • Guang, Liang, et al. (author)
  • Interconnection alternatives for hierarchical monitoring communication in parallel SoCs
  • 2010
  • Published in: Microprocessors and Microsystems. Elsevier BV. ISSN 0141-9331, 1872-9436; 34:5, pp. 118-128
  • Journal article (peer-reviewed)
    • Interconnection architectures for hierarchical monitoring communication in parallel System-on-Chip (SoC) platforms are explored. The hierarchical agent monitoring design paradigm is an efficient and scalable approach for the design of parallel embedded systems. Between distributed agents on different levels, monitoring communication is required to exchange information; it forms a traffic class that is prioritized over data traffic. The paper explains the common monitoring operations in SoCs and categorizes them by type of functionality and by granularity. Requirements for on-chip interconnections to support the monitoring communication are outlined. A baseline architecture with best-effort service, time division multiple access (TDMA), and two types of physically separate interconnections are discussed and compared, both theoretically and quantitatively, on a Network-on-Chip (NoC)-based platform. The simulation uses power estimates for a 65 nm technology and NoC microbenchmarks as traffic traces. The evaluation points out the benefits and issues of each interconnection alternative. In particular, hierarchical monitoring networks are the most suitable alternative: they decouple the monitoring communication from data traffic, provide the highest energy efficiency with simple switching, and enable flexible reconfiguration to trade off power and performance.
22.
  • Hertz, Erik, 1956-, et al. (author)
  • Combining the parabolic synthesis methodology with second-degree interpolation
  • 2016
  • Published in: Microprocessors and Microsystems. Amsterdam: Elsevier BV. ISSN 0141-9331, 1872-9436; 42, pp. 142-155
  • Journal article (peer-reviewed)
    • The Parabolic Synthesis methodology is an approximation methodology for implementing unary functions, such as trigonometric functions, logarithms and square root, as well as binary functions, such as division, in hardware. Unary functions are extensively used in baseband processing for wireless/wireline communication, computer graphics, digital signal processing, robotics, astrophysics, fluid physics, games and many other areas. For high-speed applications, as well as in low-power systems, software solutions are not sufficient and a hardware implementation is therefore needed. The Parabolic Synthesis methodology implements functions in hardware using low-complexity operations that are simple to implement in hardware. One difference between the Parabolic Synthesis methodology and many other approximation methodologies is that it is multiplicative rather than additive. To further improve the performance of Parabolic Synthesis based designs, the methodology is combined with Second-Degree Interpolation. The paper shows that the methodology provides a significant reduction in chip area, computation delay and power consumption while preserving the characteristics of the error. To evaluate this, the logarithmic function was implemented, as an example, using the Parabolic Synthesis methodology and compared with the Parabolic Synthesis methodology combined with Second-Degree Interpolation. To further demonstrate the feasibility of both methodologies, they have been compared with the CORDIC methodology. The comparison is made on the implementation of the fractional part of the logarithmic function with a 15-bit resolution. The designs implemented using the Parabolic Synthesis methodology, with and without Second-Degree Interpolation, perform 4× and 8× better, respectively, than the CORDIC implementation in terms of throughput. In terms of energy consumption, the CORDIC implementation consumes 140% and 800% more energy, respectively. The chip area is also smaller when the Parabolic Synthesis methodology combined with Second-Degree Interpolation is used.
  •  
23.
  • Huang, L., et al. (authors)
  • Tolerating transient illegal turn faults in NoCs
  • 2016
  • Published in: Microprocessors and Microsystems. - : Elsevier. - 0141-9331 .- 1872-9436. ; 43:SI, pp. 104-115
  • Journal article (peer-reviewed)
    • Network-on-Chip (NoC) is becoming a competitive solution for connecting hundreds of processing elements in modern computing platforms. As feature sizes shrink, circuits are likely to suffer from faults that lead to degraded performance and erroneous behaviour. Compared to permanent faults, transient faults occur more frequently and can be more serious, because they are hidden within complex on-chip behaviour. One serious consequence of transient faults is that packets may take illegal turns after the control logic of an on-chip router is damaged, which may lead to deadlock and eventually crash the entire system. To avoid this, we propose a comprehensive scheme called ODT that includes an improved router architecture, an illegal-turn-resilient routing algorithm, online fault-detection units and a fault classification method. With ODT, more turns are supported at the routing level and deadlock situations are significantly reduced. Experimental results indicate up to a 22% increase in surviving packets in the network when 4% of the routing computation units have failed. The area and power overheads of the ODT method are around 9.22% and 9.63%, respectively.
  •  
24.
  • Jafri, Syed Mohammad Asad Hassan, et al. (authors)
  • Design of the coarse-grained reconfigurable architecture DART with on-line error detection
  • 2014
  • Published in: Microprocessors and Microsystems. - : Elsevier BV. - 0141-9331 .- 1872-9436. ; 38:2, pp. 124-136
  • Journal article (peer-reviewed)
    • This paper presents the implementation of the coarse-grained reconfigurable architecture (CGRA) DART with on-line error detection, intended to increase fault tolerance. Most parts of the data paths and of the local memory of DART are protected using a residue code modulo 3, whereas only the logic unit is protected using duplication with comparison. These low-cost hardware techniques would allow temporary faults (including so-called soft errors caused by radiation) to be tolerated, provided that some technique based on re-execution of the last operation is used. Synthesis results obtained for a 90 nm CMOS technology confirm significant hardware and power savings of the proposed approach over the commonly used duplication with comparison. Introducing one extra pipeline stage in the self-checking version of the basic arithmetic blocks significantly reduces the delay overhead compared to our previous design.
  •  
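The residue-code protection mentioned in the DART abstract can be illustrated in software. The following is a minimal Python sketch, not the DART design; the function names and the bit-flip fault model are assumptions made for illustration. A separate residue path predicts the result modulo 3 from the operand residues. Flipping bit k changes a value by 2^k, and 2^k mod 3 is always 1 or 2, never 0, so any single-bit error in the sum changes its residue and is detected.

```python
from typing import Optional, Tuple


def residue_mod3(x: int) -> int:
    """Residue of a non-negative integer modulo 3.

    Since 4 = 1 (mod 3), x mod 3 equals the sum of its 2-bit digits mod 3,
    the identity that lets residue generators be built from small adders.
    """
    total = 0
    while x:
        total += x & 0b11  # add the lowest 2-bit digit
        x >>= 2
    return total % 3


def checked_add(a: int, b: int, fault_bit: Optional[int] = None) -> Tuple[int, bool]:
    """Add two non-negative integers and verify the sum with a mod-3 residue check.

    fault_bit is a hypothetical fault-injection hook: it flips one bit of the
    sum to emulate a soft error. Returns (sum, check_passed).
    """
    total = a + b
    if fault_bit is not None:
        total ^= 1 << fault_bit  # inject a single-bit upset into the data path
    predicted = (residue_mod3(a) + residue_mod3(b)) % 3  # independent residue path
    return total, residue_mod3(total) == predicted
```

For example, `checked_add(1234, 5678)` returns `(6912, True)`, while `checked_add(1234, 5678, fault_bit=5)` reports a failed check. The sketch uses unbounded Python integers, so the carry-out truncation of a fixed-width hardware adder is not modeled.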
25.
  • Jafri, Syed Mohammad Asad Hassan, et al. (authors)
  • Energy-aware fault-tolerant network-on-chips for addressing multiple traffic classes
  • 2013
  • Published in: Microprocessors and Microsystems. - : Elsevier BV. - 0141-9331 .- 1872-9436. ; 37:8, pp. 811-822
  • Journal article (peer-reviewed)
    • This paper presents an energy-efficient architecture that provides on-demand fault tolerance to multiple traffic classes running simultaneously on a single network-on-chip (NoC) platform. Today, NoCs host multiple traffic classes with potentially different reliability needs, so providing platform-wide worst-case (maximum) protection to all classes is neither optimal nor desirable. To reduce the overheads incurred by fault tolerance, various adaptive strategies have been proposed. These techniques rely on individual packet fields and operating conditions to adjust the intensity, and hence the overhead, of fault tolerance, but the presence of multiple traffic classes undermines their effectiveness. To complement the existing adaptive strategies, we propose on-demand fault tolerance, capable of providing the required reliability while significantly reducing the energy overhead. Our solution relies on a hierarchical agent-based control layer and a reconfigurable fault-tolerance data path. The control layer identifies the traffic class and directs the packet to the path providing the needed reliability. Simulation results using representative applications (matrix multiplication, FFT, wavefront, and HiperLAN) showed up to a 95% decrease in energy consumption compared to traditional worst-case methods. Synthesis results confirm a negligible additional overhead for providing on-demand protection (up to 5.3% area) compared to the overall fault-tolerance circuitry.
  •  
26.
  • Jafri, Syed M. A. H., et al. (authors)
  • TEA : Timing and Energy Aware compression architecture for Efficient Configuration in CGRAs
  • 2015
  • Published in: Microprocessors and Microsystems. - : Elsevier. - 0141-9331 .- 1872-9436.
  • Journal article (peer-reviewed)
    • Coarse-Grained Reconfigurable Architectures (CGRAs) are emerging as enabling platforms to meet the high performance demanded by modern applications (e.g. 4G and CDMA). Recently proposed CGRAs offer time-multiplexing and dynamic application parallelism to enhance device utilization and reduce energy consumption, at the cost of additional memory (up to 50% of the overall platform area). To reduce the memory overheads, novel CGRAs employ statistical compression, an intermediate compact representation, or multicasting. Each compaction technique has different properties (i.e. compression ratio, decompression time and decompression energy) and is best suited to a particular class of applications. However, existing research deals with these methods only separately. Moreover, it analyzes only the compaction ratio and does not evaluate the associated energy overheads. To tackle these issues, we propose a polymorphic compression architecture that interleaves these techniques in a single platform. The proposed architecture allows each application to use a separate compression/decompression hierarchy (consisting of various types and implementations of hardware/software decoders) tailored to its needs. Simulation results using different applications (FFT, matrix multiplication, and WLAN) reveal that the choice of compression hierarchy has a significant impact on compression ratio (up to 52%), decompression energy (up to 4 orders of magnitude), and configuration time (from 33. n to 1.5. s) for the tested applications. Synthesis results reveal that introducing adaptivity incurs negligible additional overhead (1%) compared to the overall platform area.
  •  
27.
  • Jalminger, Jonas, et al. (authors)
  • A Cache Block Reuse Prediction Scheme
  • 2004
  • Published in: Microprocessors and Microsystems. - : Elsevier BV. - 0141-9331. ; 28:7, pp. 373-385
  • Journal article (peer-reviewed)
    • We introduce a novel approach that predicts, upon a miss, whether a block should be allocated in the cache, based on its past reuse behavior during its lifetime in the cache. The approach introduces a new reuse model that makes a single-entry bypass buffer sufficient to exploit the spatial locality in non-allocated blocks. It also applies classical two-level branch prediction to the reuse history patterns to predict whether the block should be allocated. Our evaluation of the scheme, based on five benchmarks from SPEC'95 and a set of six multimedia and database applications, shows that the prediction accuracy is between 66 and 94% across the applications and can result in a miss rate reduction of between 1 and 32%, with an average of 12% (using the ideal implementation). We also consider cost/performance aspects of several implementations of the scheme. We find that with a modest hardware cost (essentially a table of about 300 bytes), the miss rate can be cut by up to 14% compared to a cache with an always-allocate strategy.
  •  
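The abstract's idea of applying two-level branch prediction to reuse histories can be sketched as follows. This is a hypothetical Python illustration, not the paper's scheme: the 4-bit history length, the 2-bit saturating counters, their initial "weakly allocate" state and the method names are all assumptions, and the bypass buffer is not modeled. The first level keeps a reuse-history shift register per block; the second level is a pattern table of counters indexed by that history.

```python
from typing import Dict, List


class ReusePredictor:
    """Two-level allocation predictor (hypothetical configuration).

    Level 1: per-block reuse history (1 = block was reused during a lifetime).
    Level 2: pattern table of 2-bit saturating counters indexed by that history.
    """

    def __init__(self, history_bits: int = 4) -> None:
        self.mask = (1 << history_bits) - 1
        self.history: Dict[int, int] = {}
        # All counters start at 2 ("weakly allocate").
        self.counters: List[int] = [2] * (1 << history_bits)

    def should_allocate(self, block: int) -> bool:
        """On a miss: allocate if the counter selected by the block's history is >= 2."""
        pattern = self.history.get(block, 0)
        return self.counters[pattern] >= 2

    def record_lifetime(self, block: int, was_reused: bool) -> None:
        """When a block's cache lifetime ends, train the counter and update its history."""
        pattern = self.history.get(block, 0)
        if was_reused:
            self.counters[pattern] = min(3, self.counters[pattern] + 1)
        else:
            self.counters[pattern] = max(0, self.counters[pattern] - 1)
        self.history[block] = ((pattern << 1) | int(was_reused)) & self.mask
```

In this sketch, a block that goes unreused for several lifetimes drives its counter to zero and is then bypassed, while a consistently reused block keeps being allocated. A real implementation would bound the history table, which is where the hardware cost reported in the abstract comes from.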
28.
  • K, Divya Nath, et al. (authors)
  • Interfaced Circuit using a non-destructive method for Moisture Measurement
  • 2020
  • Published in: Microprocessors and Microsystems. - : Elsevier BV. - 0141-9331 .- 1872-9436. ; 73
  • Journal article (peer-reviewed)
    • Analysing the moisture in stored products such as harvested cereal grains and their products, peas, beans, oil-seeds, copra, cocoa beans and spices is very important to prevent fungal growth. Moisture can be present in grain in more than one state, i.e. as bound, adsorbed or absorbed water. An integrated circuit designed for this purpose was interfaced with a personal computer to measure capacitance, which in turn is used to calculate the moisture content of rice. The interfaced circuit was tested by measuring the capacitance of different ceramic capacitors. This technique is fast, reliable and accurate, and gives a hundred sets of readings in a few seconds. Moisture contents are expressed as percentages. Error correction was done using MATLAB programming.
  •  
29.
  • Kamuf, Matthias, et al. (authors)
  • Design and Measurement of a Variable-Rate Viterbi Decoder in 130-nm Digital CMOS
  • 2010
  • Published in: Microprocessors and Microsystems. - : Elsevier BV. - 0141-9331. ; 34:2010, pp. 129-137
  • Journal article (peer-reviewed)
    • This paper discusses the design and measurement of a flexible Viterbi decoder fabricated in 130-nm digital CMOS. Flexibility was incorporated by providing various code rates and modulation schemes to adjust to varying channel conditions. Based on previous trade-off studies, flexible building blocks were carefully designed to cause as little area penalty as possible. The chip runs down to a minimum core supply of 0.8 V. It turns out that supporting more modulation schemes is beneficial in terms of power consumption once the price has been paid for accepting different code rates, that is, different radices in the trellis and survivor path units.
  •  
30.
  • Kuchcinski, Krzysztof, et al. (authors)
  • Announcing MICPRO embedded hardware design
  • 2008
  • Published in: Microprocessors and Microsystems. - : Elsevier BV. - 0141-9331. ; 32:1
  • Journal article (other academic/artistic)
  •  
31.
  • Kuchcinski, Krzysztof (author)
  • Constraint programming in embedded systems design : Considered helpful
  • 2019
  • Published in: Microprocessors and Microsystems. - : Elsevier BV. - 0141-9331. ; 69, pp. 24-34
  • Journal article (peer-reviewed)
    • Embedded systems are built for specific purposes and are optimized to meet different kinds of constraints, such as performance, timing, power and cost. The design process therefore involves different optimization activities. In this paper, we discuss the use of constraint programming (CP) technology for these optimization problems. The main advantages and disadvantages of applying CP to embedded system design problems are discussed using two examples, scheduling and mapping. Based on these examples, the modelling capabilities of CP and basic solving methods are discussed. We identify the modelling capability of CP as an important factor for formalizing problems and representing them uniformly. Using several experiments, we also show the efficiency of the models and the solving process. Finally, we point out difficulties with CP technology that are mostly related to search methods, which for more realistic problems must be carefully selected, or for which new methods must even be developed.
  •  
32.
  • Latif, Khalid, et al. (authors)
  • Service based communication for MPSoC platform-SegBus
  • 2011
  • Published in: Microprocessors and Microsystems. - : Elsevier BV. - 0141-9331 .- 1872-9436. ; 35:7, pp. 643-655
  • Journal article (peer-reviewed)
    • MPSoC platforms offer solutions to the communication limitations of multiple cores on a single chip, but many new issues arise in this context. The SegBus platform is one solution for deploying applications on multi-core systems. In many applications, identical data is transferred from the same source to different destinations. Multicast services, together with an interrupt service, can therefore improve the performance of the interconnection platform. This paper analyzes how different services can be designed for the SegBus platform and observes the resulting improvement in system performance. The designer can select the services according to the requirements. The running example is an H.264 encoder. The SegBus platform architecture, the communication mechanism, the allocation of processing elements on the platform, and the communication services and their implementation are the main topics elaborated here.
  •  
33.
  • Li, Nan, et al. (authors)
  • Area-efficient high-coverage LBIST
  • 2014
  • Published in: Microprocessors and Microsystems. - : Elsevier BV. - 0141-9331 .- 1872-9436. ; 38:5, pp. 368-374
  • Journal article (peer-reviewed)
    • Logic Built-In Self Test (LBIST) is a popular technique for applications requiring in-field testing of digital circuits. LBIST incorporates test generation and response-capture on-chip. It requires no interaction with a large, expensive tester. LBIST offers test time reduction due to at-speed test pattern application, makes possible test data re-usability at many levels, and enables test-ready IP. However, the traditional pseudo-random pattern-based LBIST often has a low test coverage. This paper presents a new method for on-chip generation of deterministic test patterns based on registers with non-linear update. Our experimental results on 7 real designs show that the presented approach can achieve a higher stuck-at coverage than the test point insertion with less area overhead. We also show that registers with non-linear update are asymptotically smaller than memories required to store the same test patterns in a compressed form.
  •  
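For context on the pseudo-random baseline that the LBIST abstract contrasts with, a conventional on-chip pattern generator is a linear-feedback shift register (LFSR). The Python sketch below is a generic Fibonacci LFSR, not the paper's registers with non-linear update; the 4-bit width and the taps (bits 3 and 2, corresponding to the primitive polynomial x^4 + x^3 + 1) are illustrative choices. Because the pattern sequence is fixed by the feedback polynomial, the generator cannot be steered toward the specific deterministic patterns that random-pattern-resistant faults need, which is one reason pseudo-random LBIST coverage can be low.

```python
from typing import List, Sequence


def lfsr_patterns(seed: int, taps: Sequence[int], width: int, count: int) -> List[int]:
    """Generate `count` test patterns from a Fibonacci LFSR.

    seed:  non-zero initial register state
    taps:  bit positions XOR-ed together to form the feedback bit
    width: register length in bits
    """
    mask = (1 << width) - 1
    state = seed & mask
    patterns = []
    for _ in range(count):
        patterns.append(state)
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1
        state = ((state << 1) | feedback) & mask  # shift left, insert feedback bit
    return patterns
```

With these taps, `lfsr_patterns(1, (3, 2), 4, 15)` starts with `[1, 2, 4, 9, 3, ...]` and visits all 15 non-zero 4-bit states before repeating; the all-zero state is never produced, which is a known limitation of plain LFSRs.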
34.
  • Loni, Mohammad, et al. (authors)
  • DeepMaker : A multi-objective optimization framework for deep neural networks in embedded systems
  • 2020
  • Published in: Microprocessors and Microsystems. - : Elsevier B.V. - 0141-9331 .- 1872-9436. ; 73
  • Journal article (peer-reviewed)
    • Deep Neural Networks (DNNs) are compute-intensive learning models with growing applicability in a wide range of domains. Due to their computational complexity, DNNs benefit from implementations that use custom hardware accelerators to meet performance, response time and classification accuracy constraints. In this paper, we propose the DeepMaker framework, which aims to automatically design a set of highly robust DNN architectures for embedded devices, as the processing units closest to the sensors. DeepMaker explores and prunes the design space to find improved neural architectures. The framework uses a multi-objective evolutionary approach that exploits a pruned design space inspired by a dense architecture. DeepMaker treats accuracy and network size as two objectives, in order to build a highly optimized network that fits limited computational resource budgets while delivering an acceptable accuracy level. Compared with the best result on the CIFAR-10 dataset, a network generated by DeepMaker achieves up to a 26.4x compression rate while losing only 4% accuracy. In addition, DeepMaker maps the generated CNN onto programmable commodity devices, including an ARM processor, a high-performance CPU, a GPU, and an FPGA.
  •  
35.
  • Ma, Ning, et al. (authors)
  • System design of full HD MVC decoding on mesh-based multicore NoCs
  • 2011
  • Published in: Microprocessors and Microsystems. - : Elsevier BV. - 0141-9331 .- 1872-9436. ; 35:2, pp. 217-229
  • Journal article (peer-reviewed)
    • Future multimedia applications such as full HD (1920 x 1080) multiview video coding (MVC) present great challenges for computing architectures. Even with state-of-the-art ASIC technology capable of single-view HD decoding, handling multiple views would require computation capacity that grows in proportion to the number of views, which is difficult to achieve. In this paper, we explore the system-level design space for full HD MVC applications mapped onto mesh-based multicore Network-on-Chip (NoC) architectures. To this end, we establish a simulation framework capable of simulating communication networks combined with computing cores. We investigate two task assignment schemes: picture-level assignment and view-level assignment. With eight-view MVC decoding, we explore the design options with respect to network size, single-core performance and link bandwidth under both task assignment schemes. Our studies show that, to achieve a given decoding performance, computation capability and communication capacity should be balanced in the system. Moreover, to realize eight-view HD decoding, the system requires at most twice the single-core processing capacity needed for single-view decoding, thanks to the parallel computation and communication enabled by the multicore NoC architectures. Our results demonstrate the feasibility and potential of efficiently implementing full HD MVC decoding on multicore NoC architectures.
  •  
36.
  • Mohseni, Zeynab, et al. (authors)
  • Reliability Characterization and Activity Analysis of lowRISC Internal Modules against Single Event Upsets Using Fault Injection and RTL Simulation
  • 2019
  • Published in: Microprocessors and Microsystems. - : Elsevier. - 0141-9331 .- 1872-9436. ; 71
  • Journal article (other academic/artistic)
    • One concern for satellites carrying sensitive electronic devices is harmful radiation, which produces effects such as Single Event Upsets (SEUs) that can cause errors. SRAM-based FPGAs are extensively used to implement a wide range of digital circuits, including soft processors. In this paper, we focus on two issues: 1) characterizing the different modules of lowRISC to determine their sensitivity to errors in the FPGA configuration memory, and 2) analyzing the activity level of these modules using RTL simulation, to correlate the activity level with the sensitivity of the different modules of the soft processor. Fault injection campaigns were performed to evaluate the reliability of the different modules. Experimental results show that the instruction cache module is the most sensitive module of lowRISC for the benchmarks considered. This cache module could therefore be protected using different protection techniques to increase the reliability of the microprocessor.
  •  
37.
  • Norollah, Amin, et al. (authors)
  • A security-aware hardware scheduler for modern multi-core systems with hard real-time constraints
  • 2022
  • Published in: Microprocessors and Microsystems. - Amsterdam : Elsevier. - 0141-9331 .- 1872-9436. ; 95
  • Journal article (peer-reviewed)
    • In this paper, we propose an online security-aware hardware scheduler, called the Secure And Fast hardware Scheduler (SAFAS), for real-time task scheduling in multi-core systems in the presence of schedule-based side-channel attacks. To avoid such attacks and ensure that all tasks meet their deadlines, SAFAS schedules critical tasks and their replicas using a hardware-based strict Least Slack Time first (LST) algorithm, while independently scheduling the non-critical tasks using a hardware-based Earliest Deadline First (EDF) algorithm. SAFAS enhances system performance and reduces the chance of side-channel attacks because different processing cores are allocated to each task in each scheduling interval. The hardware scheduler operates independently of, and in parallel with, the multi-core system and hides the scheduling characteristics from adversaries. A software-based EDF algorithm is also used for schedulability tests and feasibility analysis of hard real-time periodic tasks, to maximize the number of tasks scheduled successfully in the multi-core system. SAFAS was synthesized and simulated using Xilinx Vivado 2018.2 and implemented on a Spartan-7 FPGA chip. Our experimental results indicate that SAFAS increases system performance by 4.8 times compared to previous state-of-the-art hardware schedulers, while guaranteeing that all critical tasks and their replicas meet their deadlines.
  •  
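The two policies named in the SAFAS abstract, LST for critical tasks and EDF for non-critical ones, can be contrasted with a small selection function. This Python sketch is an illustration under stated assumptions, not SAFAS. It picks jobs for one scheduling interval, ordering critical jobs by least slack (absolute deadline minus current time minus remaining execution time) and non-critical jobs by earliest deadline. Giving critical jobs first claim on the cores is a hypothetical policy that the abstract does not specify, and replicas, side-channel defences and hardware parallelism are not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    name: str
    deadline: int   # absolute deadline
    remaining: int  # remaining execution time
    critical: bool


def slack(job: Job, now: int) -> int:
    """Slack (laxity): how long the job can still wait and meet its deadline."""
    return job.deadline - now - job.remaining


def select_jobs(jobs: List[Job], now: int, cores: int) -> List[Job]:
    """Choose up to `cores` ready jobs for the next interval (hypothetical policy)."""
    ready = [j for j in jobs if j.remaining > 0]
    critical = sorted((j for j in ready if j.critical), key=lambda j: slack(j, now))  # LST
    others = sorted((j for j in ready if not j.critical), key=lambda j: j.deadline)   # EDF
    return (critical + others)[:cores]
```

For instance, with critical jobs A (deadline 10, remaining 2) and B (deadline 12, remaining 9) and non-critical jobs C (deadline 5, remaining 1) and D (deadline 20, remaining 3) at time 0 on three cores, the selection is B, A, C. B is chosen before A despite its later deadline because its slack (3) is smaller than A's (8), which is exactly where LST and EDF differ.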
38.
  • Nunez-Yanez, Jose Luis, et al. (authors)
  • Dynamically reconfigurable variable-precision sparse-dense matrix acceleration in Tensorflow Lite
  • 2023
  • Published in: Microprocessors and Microsystems. - : ELSEVIER. - 0141-9331 .- 1872-9436. ; 98
  • Journal article (peer-reviewed)
    • In this paper, we present a dynamically reconfigurable hardware accelerator called FADES (Fused Architecture for DEnse and Sparse matrices). The FADES design offers multiple configuration options that trade off parallelism and complexity, using a dataflow model to create four stages that read, compute, scale and write results. FADES is mapped to the programmable logic (PL) and integrated with the TensorFlow Lite inference engine running on the processing system (PS) of a heterogeneous SoC device. The accelerator is used to compute the tensor operations, while the dynamically reconfigurable approach can be used to switch precision between int8 and float modes. Compared with supporting both arithmetic precisions simultaneously, this dynamic reconfiguration enables better performance, by allowing more cores to be mapped to the resource-constrained device, and lower power consumption. We compare the proposed hardware with a high-performance systolic architecture for dense matrices, obtaining 25% better performance in dense mode with half the DSP blocks in the same technology. In sparse mode, we show that the core can outperform dense mode even at low sparsity levels, and that a single core achieves up to 20x acceleration over the software-optimized NEON RUY library.
  •  
39.
  • Pnevmatikatos, Dionisios N., et al. (authors)
  • FASTER: Facilitating Analysis and Synthesis Technologies for Effective Reconfiguration
  • 2015
  • Published in: Microprocessors and Microsystems. - : Elsevier BV. - 0141-9331. ; 39:4-5, pp. 321-338
  • Journal article (peer-reviewed)
    • The FASTER (Facilitating Analysis and Synthesis Technologies for Effective Reconfiguration) EU FP7 project aims to ease the design and implementation of dynamically changing hardware systems. Our motivation stems from the promise reconfigurable systems hold for achieving high performance and extending product functionality and lifetime through the addition of new features that operate at hardware speed. However, designing a changing hardware system is both challenging and time-consuming. FASTER facilitates the use of reconfigurable technology by providing a complete methodology that enables designers to easily specify, analyze, implement and verify applications on platforms with general-purpose processors and acceleration modules implemented in the latest reconfigurable technology. Our tool-chain supports both coarse- and fine-grain FPGA reconfiguration, while during execution a flexible run-time system manages the reconfigurable resources. We target three applications from different domains. We explore how each application benefits from reconfiguration, and then we assess the applications, and the FASTER tools, in terms of performance, area consumption and accuracy of analysis.
  •  
40.
  • Pop, P., et al. (authors)
  • Safe cooperating cyber-physical systems using wireless communication : The SafeCOP approach
  • 2017
  • Published in: Microprocessors and Microsystems. - : Elsevier B.V. - 0141-9331 .- 1872-9436. ; 53, pp. 42-50
  • Journal article (peer-reviewed)
    • This paper presents an overview of the ECSEL project entitled “Safe Cooperating Cyber-Physical Systems using Wireless Communication” (SafeCOP), which runs during the period 2016–2019. SafeCOP targets safety-related Cooperating Cyber-Physical Systems (CO-CPS) characterised by the use of wireless communication, multiple stakeholders, dynamic system definitions (openness), and unpredictable operating environments. SafeCOP will provide an approach to the safety assurance of CO-CPS, thus enabling their certification and development. The project will define a runtime manager architecture for runtime detection of abnormal behaviour, triggering a safe degraded mode if needed. SafeCOP will also develop methods and tools that will be used to produce the safety assurance evidence needed to certify cooperative functions. SafeCOP will extend current wireless technologies to ensure safe and secure cooperation, and will also contribute to new standards and regulations by providing certification authorities and standardization committees with the scientifically validated solutions needed to craft effective standards that also address cooperation and system-of-systems issues. The project has 28 partners from 6 European countries and a budget of about 11 million euros, corresponding to about 1,300 person-months.
  •  
41.
  • Rahmati, Dara, et al. (authors)
  • Power-efficient deterministic and adaptive routing in torus networks-on-chip
  • 2012
  • Published in: Microprocessors and Microsystems. - : Elsevier. - 0141-9331 .- 1872-9436. ; 36:7, pp. 571-585
  • Journal article (peer-reviewed)
    • Modern SoC architectures use NoCs for high-speed inter-IP communication. For NoC architectures, high-performance, efficient routing algorithms with low power consumption are essential for real-time applications. NoCs with mesh and torus interconnection topologies are now popular due to their simple structures. A torus NoC is very similar to a mesh NoC but has a smaller diameter. For a routing algorithm to be deadlock-free in a torus, at least two virtual channels per physical channel must be used to avoid cyclic channel dependencies due to the wrap-around links; in a mesh network, however, deadlock freedom can be ensured using only one virtual channel. The number of virtual channels employed is important because it directly affects the power consumption of NoCs. In this paper, we propose a novel systematic approach for designing deadlock-free routing algorithms for torus NoCs. Using this method, a new deterministic routing algorithm (called TRANC) is proposed that uses only one virtual channel per physical channel in torus NoCs. We also propose an algorithmic mapping that enables TRANC-based routing algorithms to be extracted from existing routing algorithms, which may be either deterministic or adaptive. The simulation results show power consumption and performance improvements when using the proposed algorithms.
  •  
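The cyclic channel dependency caused by wrap-around links, mentioned in the torus abstract, can be checked mechanically. The Python sketch below models a single unidirectional ring (one torus dimension) and builds the channel dependency graph over all source/destination pairs. Without extra virtual channels the wrap-around link closes a cycle. With the conventional "dateline" scheme, in which a packet moves from virtual channel 0 to virtual channel 1 when it takes the wrap-around link, the graph becomes acyclic. This illustrates the two-virtual-channel baseline the abstract describes, not TRANC; the ring model, the dateline placement and all function names are illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple

# A channel is (from_node, to_node, virtual_channel).
Channel = Tuple[int, int, int]


def ring_route(src: int, dst: int, n: int, dateline: bool) -> List[Channel]:
    """Channels used by a packet travelling in the + direction of an n-node ring."""
    path: List[Channel] = []
    node, vc = src, 0
    while node != dst:
        if dateline and node == n - 1:
            vc = 1  # taking the wrap-around link (n-1 -> 0): switch to the second VC
        nxt = (node + 1) % n
        path.append((node, nxt, vc))
        node = nxt
    return path


def dependency_graph(n: int, dateline: bool) -> Dict[Channel, Set[Channel]]:
    """Edge c1 -> c2 if some packet holds channel c1 while requesting c2."""
    deps: Dict[Channel, Set[Channel]] = {}
    for src in range(n):
        for dst in range(n):
            if src != dst:
                path = ring_route(src, dst, n, dateline)
                for held, requested in zip(path, path[1:]):
                    deps.setdefault(held, set()).add(requested)
    return deps


def has_cycle(deps: Dict[Channel, Set[Channel]]) -> bool:
    """Kahn's algorithm: a cycle exists iff no topological order covers every channel."""
    nodes = set(deps) | {c for succ in deps.values() for c in succ}
    indegree = {c: 0 for c in nodes}
    for succ in deps.values():
        for c in succ:
            indegree[c] += 1
    ready = [c for c in nodes if indegree[c] == 0]
    visited = 0
    while ready:
        c = ready.pop()
        visited += 1
        for d in deps.get(c, set()):
            indegree[d] -= 1
            if indegree[d] == 0:
                ready.append(d)
    return visited != len(nodes)
```

For a 4-node ring, `has_cycle(dependency_graph(4, dateline=False))` is `True`, while `has_cycle(dependency_graph(4, dateline=True))` is `False`. The second virtual channel is exactly the per-link cost that the abstract identifies as a power concern.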
42.
  • Redell, Ola, et al. (authors)
  • The AIDA toolset for design and implementation analysis of distributed real-time control systems
  • 2004
  • Published in: Microprocessors and Microsystems. - : Elsevier BV. - 0141-9331 .- 1872-9436. ; 28:4, pp. 163-182
  • Journal article (peer-reviewed)
    • This article introduces a toolset that integrates the design and performance analysis of control systems with embedded real-time system design. The toolset enables specification and analysis of real-time implementations of control applications. Control system designs are imported into a real-time system-modelling domain in which the functionality is distributed onto a target computer system. The control functionality is partitioned into operating system processes, inter-process communications are defined and the triggering of processes is specified. Once the real-time design is complete, the response times and release jitter of the processes and their contained functions can be analysed, and the system information exported back to the control domain. This enables analysis of the resulting control performance, taking into account implementation effects such as delays and release jitter due to resource sharing and scheduling. The use of the toolset is demonstrated on a dual-leg controller for a walking robot. The case study shows how the toolset is used to describe a system, from the control system specification to the design of its implementation on a distributed network of processors. Different implementation solutions are proposed and evaluated based on simulated control system performance.
  •  
43.
  • Rezaei, A., et al. (authors)
  • CAP-W : Congestion-aware platform for wireless-based network-on-chip in many-core era
  • 2017
  • Published in: Microprocessors and Microsystems. - : Elsevier. - 0141-9331 .- 1872-9436. ; 52, pp. 23-33
  • Journal article (peer-reviewed)
    • To meet the ever-increasing demand for high speed and high bandwidth, wireless-based MCSoCs built on a NoC communication infrastructure have been presented. Separating communication from computation demands and providing flexible topology configurations make wireless-based NoC a promising future MCSoC architecture. However, congestion in wireless routers reduces the benefit of high-speed wireless links and significantly increases network latency. Therefore, in this paper, a congestion-aware platform named CAP-W is introduced for wireless-based NoC in order to reduce congestion in the network, especially over wireless routers. The three-layer CAP-W platform is composed of mapping, migration, and routing layers. To minimize the congestion probability, the mapping layer selects a suitable free core as the first candidate, finds a suitable first task to map onto the selected core, and allocates the other tasks with respect to contiguity. Considering the dynamic variation of application behaviors, the migration layer modifies the initial task mapping to relieve congestion. Furthermore, the routing layer balances the utilization of wired and wireless networks by separating short-distance and long-distance communications. Experimental results show a meaningful gain in congestion control for wireless-based NoC compared to state-of-the-art works.
  •  
44.
  • Saadatmand, Mehrdad, PhD, 1980-, et al. (authors)
  • SmartDelta project : Automated quality assurance and optimization across product versions and variants
  • 2023
  • Published in: Microprocessors and Microsystems. - : Elsevier. - 0141-9331 .- 1872-9436. ; 103
  • Journal article (peer-reviewed)
    • Software systems are often built in increments, with additional features or enhancements added on top of existing products. This incremental development may result in the deterioration of certain quality aspects. In other words, software can be considered an evolving entity that exhibits different quality characteristics as it is updated over time with new features or deployed in different operational environments. Approaching software development with this mindset, and with awareness of quality evolution over time, can be a key factor for the long-term success of a company in today's highly competitive market for industrial software-intensive products. It is therefore important to be able to accurately analyze and determine the quality implications of each change and increment to a software system. To address this challenge, the multinational SmartDelta project develops automated solutions for the quality assessment of product deltas in a continuous engineering environment. The project provides smart analytics from development artifacts and system executions, offering insights into quality degradation or improvement across different product versions and providing recommendations for the next builds. This paper presents the challenges in incremental software development tackled within the scope of the SmartDelta project, the solutions that have been produced and are planned in the project, and the industrial impact of the project on software-intensive industrial systems.
45.
  • Sadeghi, Sadegh, et al. (author)
  • Cryptanalysis of reduced QTL block cipher
  • 2017
  • Published in: Microprocessors and microsystems. - : Elsevier BV. - 0141-9331 .- 1872-9436. ; 52, pp. 34-48
  • Journal article (peer-reviewed)
    • Recently, a new ultra-lightweight block cipher called QTL has been proposed. Its designers claim to achieve fast diffusion by using a new variant of the generalized Feistel network structure that changes the whole block message in a single round, in contrast to traditional Feistel-type structures, which change only half of the block message per round. In this paper, we evaluate the designers' security claims and show that they do not hold, since QTL is vulnerable to standard statistical attacks on block ciphers.
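The diffusion argument in the QTL abstract contrasts Feistel structures that update only half of the state per round with a variant that updates all of it. The Python toy below only illustrates that contrast on a 32-bit state split into two 16-bit halves. The round function `toy_f`, its rotation amount and constant, and the fused two-half-step construction are all hypothetical; this is not QTL's actual generalized Feistel network and has no cryptographic strength.

```python
MASK16 = 0xFFFF


def rotl16(x: int, r: int) -> int:
    """Rotate a 16-bit word left by r bits (0 < r < 16)."""
    return ((x << r) | (x >> (16 - r))) & MASK16


def toy_f(x: int) -> int:
    """Hypothetical toy round function; NOT the QTL F-function."""
    return rotl16(x, 3) ^ 0x9E37


def classic_feistel_round(left: int, right: int) -> tuple[int, int]:
    # Classic Feistel: the new left half is just a copy of the old right
    # half, so only one half receives fresh content in this round.
    return right, left ^ toy_f(right)


def both_halves_round(left: int, right: int) -> tuple[int, int]:
    # Toy variant: two Feistel half-steps fused, so both halves are
    # updated within a single round.
    new_left = left ^ toy_f(right)
    new_right = right ^ toy_f(new_left)
    return new_left, new_right


def both_halves_round_inverse(new_left: int, new_right: int) -> tuple[int, int]:
    # Undo the half-steps in reverse order; toy_f itself need not be invertible.
    right = new_right ^ toy_f(new_left)
    left = new_left ^ toy_f(right)
    return left, right


state = both_halves_round(0x1234, 0xABCD)
print(both_halves_round_inverse(*state) == (0x1234, 0xABCD))  # True
```

Updating every half per round speeds up diffusion per round, but, as the abstract argues for QTL, faster diffusion alone does not guarantee resistance to statistical attacks.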
46.
  • Sadovykh, A., et al. (author)
  • On a tool-supported model-based approach for building architectures and roadmaps : The MegaM@Rt2 project experience
  • 2019
  • Published in: Microprocessors and microsystems. - : Elsevier B.V. - 0141-9331 .- 1872-9436. ; 71
  • Journal article (peer-reviewed)
    • MegaM@Rt2 is a large European project dedicated to providing a model-based methodology and supporting tooling for system engineering at a wide scale. It notably targets the continuous development and runtime validation of such large, complex systems by developing a framework that addresses a broad set of engineering processes and application domains. This collaborative project involves 27 partners from 6 countries, 9 industrial case studies, and over 30 different software tools from the project partners (and others). In the context of the MegaM@Rt2 project, we developed a pragmatic model-driven approach to specify the case-study requirements, design the high-level architecture of a framework, perform a gap analysis between industrial needs and the current state of the art, and plan an initial framework development roadmap accordingly. This paper describes the resulting generic tool-supported approach and details its concrete application in the MegaM@Rt2 project. In particular, we discuss the collaborative modeling process, the requirement-definition tooling, the approach to component modeling, and traceability and document generation. In addition, we show how we used the proposed solution to specify the conceptual tool components of the MegaM@Rt2 framework, centered on three complementary tool sets: the MegaM@Rt2 System Engineering Tool Set, the MegaM@Rt2 Runtime Analysis Tool Set, and the MegaM@Rt2 Model & Traceability Management Tool Set. The paper ends with a discussion of the practical lessons we have learned from this work so far.
47.
  • Savas, Süleyman, 1986-, et al. (author)
  • A framework to generate domain-specific manycore architectures from dataflow programs
  • 2020
  • Published in: Microprocessors and microsystems. - Amsterdam : Elsevier. - 0141-9331 .- 1872-9436. ; 72
  • Journal article (peer-reviewed)
    • Over the last 15 years, in response to the power and thermal limits of current chip technologies, we have seen an explosion in the use of multiple and even many computing cores on a single chip. Now that there are potentially hundreds of computing cores on a chip, further improving performance and energy efficiency calls for specializing individual cores and developing heterogeneous manycore computer architectures. However, developing such heterogeneous architectures is a significant challenge. We therefore propose a design method to generate domain-specific manycore architectures based on the RISC-V instruction set architecture, and we automate the main steps of this method with software tools. The design method allows the generation of manycore architectures with different configurations, including core augmentation through instruction extensions and custom accelerators. The method starts from applications developed in a high-level dataflow language and ends by generating synthesizable Verilog code and a cycle-accurate emulator for the generated architecture. We evaluate the design method and the software tools by generating several architectures specialized for two different applications and measuring their performance and hardware resource usage. Our results show that the design method can be used to generate specialized manycore architectures targeting applications from different domains. The specialized architectures show at least 3 to 4 times better performance than their general-purpose counterparts. In certain cases, replacing general-purpose components with specialized components saves hardware resources. Automating the method speeds up architecture development and facilitates design space exploration of manycore architectures.
48.
  • Sherazi, Syed Muhammad Yasser, et al. (author)
  • Ultra Low Energy Design Exploration of Digital Decimation Filters in 65 nm Dual-VT CMOS in the Sub-VT Domain
  • 2013
  • Published in: Microprocessors and Microsystems. - : Elsevier BV. - 0141-9331. ; 37:4-5, pp. 494-504
  • Journal article (peer-reviewed)
    • This paper presents an analysis of the energy dissipation of a decimation filter chain of four half-band digital (HBD) filters operated in the sub-threshold (sub-VT) region under throughput constraints. To counter the speed degradation caused by supply-voltage scaling, the HBD filters are implemented as unfolded structures. The designs are synthesized in 65 nm low-power CMOS technology with three threshold options, both as single-VT and as dual-VT designs. A sub-VT energy model is applied to characterize the designs in the sub-VT domain. Simulation results show that the architectures unfolded by 2 and by 4 are the most energy-efficient for throughput requirements between 250 ksamples/s and 2 Msamples/s. By selecting the optimum architectures and standard cells, the simulated minimum energy dissipation per output sample at the required throughput is 164 fJ and 205 fJ, with a single supply voltage of 260 mV.
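The abstract relies on two standard relations, shown here as a hedged worked example rather than the paper's own sub-VT energy model. First, a design unfolded by a factor J processes J samples per clock cycle, so a throughput requirement T needs a clock frequency of only T/J; this is why unfolding helps when sub-threshold operation slows the gates. Second, energy per output sample is average power divided by output sample rate. The 100 nW power and 500 ksamples/s rate in the second line are hypothetical numbers chosen for illustration, not results from the paper.

```latex
\begin{aligned}
f_{\mathrm{clk}} &= \frac{T}{J}, &
f_{\mathrm{clk}}\big|_{T = 2\,\mathrm{Msamples/s},\; J = 4} &= \frac{2 \times 10^{6}}{4}\,\mathrm{Hz} = 500\,\mathrm{kHz},\\
E_{\mathrm{sample}} &= \frac{P}{T_{\mathrm{out}}}, &
E_{\mathrm{sample}}\big|_{P = 100\,\mathrm{nW},\; T_{\mathrm{out}} = 500\,\mathrm{ksamples/s}} &= \frac{100 \times 10^{-9}\,\mathrm{W}}{5 \times 10^{5}\,\mathrm{s}^{-1}} = 200\,\mathrm{fJ}.
\end{aligned}
```

Halving the required clock frequency through unfolding lets each gate tolerate roughly twice the delay, which is what allows operation at very low supply voltages such as the 260 mV reported above.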
49.
  • Sourdis, Ioannis, 1979, et al. (author)
  • DeSyRe: On-demand system reliability
  • 2013
  • Published in: Microprocessors and Microsystems. - : Elsevier BV. - 0141-9331. ; 37:8, pp. 981-1001
  • Journal article (peer-reviewed)
    • The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chip (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design in general. Given these changes in the technological landscape, current fault-tolerance solutions are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect-free and fault-free system would heavily, even prohibitively, affect design, manufacturing, and testing costs, as well as system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design, with well-balanced power, performance, and design costs. To reduce the overheads of fault tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part then manages the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints.
50.
  • Stathis, Dimitrios, et al. (author)
  • Clock tree generation by abutment in synchoros VLSI design
  • 2023
  • Published in: Microprocessors and microsystems. - : Elsevier BV. - 0141-9331 .- 1872-9436. ; 102
  • Journal article (peer-reviewed)
    • The synchoros VLSI design style has been proposed as an alternative to standard-cell-based design. Standard cells are replaced by synchoros, large-grain VLSI design objects called SiLago (Silicon Lego) blocks. This design style eliminates the need to synthesise ad hoc wires of any type, whether functional or infrastructural. SiLago blocks are organised into region instances. Within a region instance, communication among SiLago blocks is synchronous and takes place over a regional network on chip (NoC), whose fragments are also absorbed into the SiLago blocks. Consequently, the regional NoCs are created by abutting SiLago blocks. The clock tree used in a region is called a regional clock tree (RCT). The synchoros VLSI design style requires that the RCT, like the regional NoCs, also be created by abutting its fragments, which are absorbed within the SiLago blocks. The RCT created by abutment is not an ad hoc clock tree but a structured and predictable design with known cost metrics. The design of such an RCT is the focus of this paper. The scheme is scalable, and we demonstrate that the proposed RCT can be generated for valid VLSI designs of ∼1.5 million gates. The RCT created by abutment is correct by construction, and its properties are predictable. Additionally, we present an in-depth description of the method used to find the optimal configuration for the proposed design. We validated the generated RCTs with static timing analysis to confirm the correct-by-construction claim. Finally, we show that the cost metrics of the SiLago RCT are comparable to those of a clock tree generated by commercial EDA tools.