SwePub

Result list for search "L773:0141 9331 OR L773:1872 9436 srt2:(2010-2014)"


  • Results 1-10 of 17
1.
  • Daneshtalab, Masoud, et al. (author)
  • Special issue on many-core embedded systems
  • 2014
  • In: Microprocessors and microsystems. - : Elsevier BV. - 0141-9331 .- 1872-9436. ; 38:6, p. 525
  • Journal article (other academic/artistic)
2.
  • Farahini, Nasim, et al. (author)
  • Parallel distributed scalable runtime address generation scheme for a coarse grain reconfigurable computation and storage fabric
  • 2014
  • In: Microprocessors and microsystems. - : Elsevier BV. - 0141-9331 .- 1872-9436. ; 38:8, pp. 788-802
  • Journal article (peer-reviewed)
    • This paper presents a hardware based solution for a scalable runtime address generation scheme for DSP applications mapped to a parallel distributed coarse grain reconfigurable computation and storage fabric. The scheme can also deal with non-affine functions of multiple variables that typically correspond to multiple nested loops. The key innovation is the judicious use of two categories of address generation resources. The first category of resource is the low cost AGU that generates addresses for given address bounds for affine functions of up to two variables. Such low cost AGUs are distributed and associated with every read/write port in the distributed memory architecture. The second category of resource is relatively more complex but is also distributed but shared among a few storage units and is capable of handling more complex address generation requirements like dynamic computation of address bounds that are then used to configure the AGUs, transformation of non-affine functions to affine function by computing the affine factor outside the loop, etc. The runtime computation of the address constraints results in negligibly small overhead in latency, area and energy while it provides substantial reduction in program storage, reconfiguration agility and energy compared to the prevalent pre-computation of address constraints. The efficacy of the proposed method has been validated against the prevalent address generation schemes for a set of six realistic DSP functions. Compared to the pre-computation method, the proposed solution achieved 75% average code compaction and compared to the centralized runtime address generation scheme, the proposed solution achieved 32.7% average performance improvement.
3.
  • Farahnakian, Fahimeh, et al. (author)
  • Bi-LCQ: A Low-weight Clustering-based Q-learning Approach for NoCs
  • 2014
  • In: Microprocessors and microsystems. - : Elsevier BV. - 0141-9331 .- 1872-9436. ; 38:1, pp. 64-75
  • Journal article (peer-reviewed)
    • Network congestion has a negative impact on the performance of on-chip networks due to the increased packet latency. Many congestion-aware routing algorithms have been developed to alleviate traffic congestion over the network. In this paper, we propose a congestion-aware routing algorithm based on the Q-learning approach for avoiding congested areas in the network. By using the learning method, local and global congestion information of the network is provided for each switch. This information can be dynamically updated, when a switch receives a packet. However, Q-learning approach suffers from high area overhead in NoCs due to the need for a large routing table in each switch. In order to reduce the area overhead, we also present a clustering approach that decreases the number of routing tables by the factor of 4. Results show that the proposed approach achieves a significant performance improvement over the traditional Q-learning, C-routing, DBAR and Dynamic XY algorithms.
4.
  • Guang, Liang, et al. (author)
  • Interconnection alternatives for hierarchical monitoring communication in parallel SoCs
  • 2010
  • In: Microprocessors and microsystems. - : Elsevier BV. - 0141-9331 .- 1872-9436. ; 34:5, pp. 118-128
  • Journal article (peer-reviewed)
    • Interconnection architectures for hierarchical monitoring communication in parallel System-on-Chip (SoC) platforms are explored. Hierarchical agent monitoring design paradigm is an efficient and scalable approach for the design of parallel embedded systems. Between distributed agents on different levels, monitoring communication is required to exchange information, which forms a prioritized traffic class over data traffic. The paper explains the common monitoring operations in SoCs, and categorizes them into different types of functionality and various granularities. Requirements for on-chip interconnections to support the monitoring communication are outlined. Baseline architecture with best-effort service, time division multiple access (TDMA) and two types of physically separate interconnections are discussed and compared, both theoretically and quantitatively on a Network-on-Chip (NoC)-based platform. The simulation uses power estimation of 65 nm technology and NoC microbenchmarks as traffic traces. The evaluation points out the benefits and issues of each interconnection alternative. In particular, hierarchical monitoring networks are the most suitable alternative, which decouple the monitoring communication from data traffic, provide the highest energy efficiency with simple switching, and enable flexible reconfiguration to tradeoff power and performance.
5.
  • Jafri, Syed Mohammad Asad Hassan, et al. (author)
  • Design of the coarse-grained reconfigurable architecture DART with on-line error detection
  • 2014
  • In: Microprocessors and microsystems. - : Elsevier BV. - 0141-9331 .- 1872-9436. ; 38:2, pp. 124-136
  • Journal article (peer-reviewed)
    • This paper presents the implementation of the coarse-grained reconfigurable architecture (CGRA) DART with on-line error detection intended for increasing fault-tolerance. Most parts of the data paths and of the local memory of DART are protected using residue code modulo 3, whereas only the logic unit is protected using duplication with comparison. These low-cost hardware techniques would allow to tolerate temporary faults (including so called soft errors caused by radiation), provided that some technique based on re-execution of the last operation is used. Synthesis results obtained for a 90 nm CMOS technology have confirmed significant hardware and power consumption savings of the proposed approach over commonly used duplication with comparison. Introducing one extra pipeline stage in the self-checking version of the basic arithmetic blocks has allowed to significantly reduce the delay overhead compared to our previous design.
6.
  • Jafri, Syed Mohammad Asad Hassan, et al. (author)
  • Energy-aware fault-tolerant network-on-chips for addressing multiple traffic classes
  • 2013
  • In: Microprocessors and microsystems. - : Elsevier BV. - 0141-9331 .- 1872-9436. ; 37:8, pp. 811-822
  • Journal article (peer-reviewed)
    • This paper presents an energy efficient architecture to provide on-demand fault tolerance to multiple traffic classes, running simultaneously on single network on chip (NoC) platform. Today, NoCs host multiple traffic classes with potentially different reliability needs. Providing platform-wide worst-case (maximum) protection to all the classes is neither optimal nor desirable. To reduce the overheads incurred by fault tolerance, various adaptive strategies have been proposed. The proposed techniques rely on individual packet fields and operating conditions to adjust the intensity and hence the overhead of fault tolerance. Presence of multiple traffic classes undermines the effectiveness of these methods. To complement the existing adaptive strategies, we propose on-demand fault tolerance, capable of providing required reliability, while significantly reducing the energy overhead. Our solution relies on a hierarchical agent based control layer and a reconfigurable fault tolerance data path. The control layer identifies the traffic class and directs the packet to the path providing the needed reliability. Simulation results using representative applications (matrix multiplication, FFT, wavefront, and HiperLAN) showed up to 95% decrease in energy consumption compared to traditional worst case methods. Synthesis results have confirmed a negligible additional overhead, for providing on-demand protection (up to 5.3% area), compared to the overall fault tolerance circuitry.
7.
  • Latif, Khalid, et al. (author)
  • Service based communication for MPSoC platform-SegBus
  • 2011
  • In: Microprocessors and microsystems. - : Elsevier BV. - 0141-9331 .- 1872-9436. ; 35:7, pp. 643-655
  • Journal article (peer-reviewed)
    • MPSoC platforms offer solutions to deal with communication limitations for multiple cores on single chip, but many new issues arise within the context. The SegBus platform is one of the solutions for application deployment on multi-core applications. There are many applications where identical data is transferred from the same source towards different destinations. Multicast services may come as a performance improving factor for the interconnection platform, together with interrupt service. In this paper, the task is to analyze, how different services can be designed for the SegBus platform and observe the improvement in system performance. The designer can select the services according to the requirements. The running example is represented by the H.264 encoder. The SegBus platform architecture, the communication mechanism, the allocation of processing elements on the platform, the communication services and their implementation are the main topics elaborated here.
8.
  • Li, Nan, et al. (author)
  • Area-efficient high-coverage LBIST
  • 2014
  • In: Microprocessors and microsystems. - : Elsevier BV. - 0141-9331 .- 1872-9436. ; 38:5, pp. 368-374
  • Journal article (peer-reviewed)
    • Logic Built-In Self Test (LBIST) is a popular technique for applications requiring in-field testing of digital circuits. LBIST incorporates test generation and response-capture on-chip. It requires no interaction with a large, expensive tester. LBIST offers test time reduction due to at-speed test pattern application, makes possible test data re-usability at many levels, and enables test-ready IP. However, the traditional pseudo-random pattern-based LBIST often has a low test coverage. This paper presents a new method for on-chip generation of deterministic test patterns based on registers with non-linear update. Our experimental results on 7 real designs show that the presented approach can achieve a higher stuck-at coverage than the test point insertion with less area overhead. We also show that registers with non-linear update are asymptotically smaller than memories required to store the same test patterns in a compressed form.
9.
  • Ma, Ning, et al. (author)
  • System design of full HD MVC decoding on mesh-based multicore NoCs
  • 2011
  • In: Microprocessors and microsystems. - : Elsevier BV. - 0141-9331 .- 1872-9436. ; 35:2, pp. 217-229
  • Journal article (peer-reviewed)
    • Future multimedia applications such as full HD (1920 x 1080) multiview video coding (MVC) present great challenges on computing architectures. Even if with the state-of-the-art ASIC technology which can process single view HD decoding, dealing with multiple views would require times of computation capacity in proportion to the number of views, which is difficult to achieve. In this paper, we explore the system-level design space for full HD MVC applications mapped onto mesh-based multicore Network-on-Chip (NoC) architectures. To this end, we establish a simulation framework capable of simulating the combination of communication networks with computing cores. We investigate two task assignment schemes: picture-level assignment and view-level assignment. With an eight-view MVC decoding, we explore the design options with respect to network size, single-core performance and link bandwidth under both task assignment schemes. Our studies show that, to achieve a certain decoding performance, the computation capability and communication capacity should be balanced in the system. Also, to realize the eight-view HD decoding, the system only requires twice or less than twice of the single-core processing capacity required by single view decoding, thanks to the parallel computation and communication enabled by the multicore NoC architectures. Our results exhibit feasibility and potential of efficiently implementing the full HD MVC decoding on multicore NoC architectures.
10.
  • Rahmati, Dara, et al. (author)
  • Power-efficient deterministic and adaptive routing in torus networks-on-chip
  • 2012
  • In: Microprocessors and microsystems. - : Elsevier. - 0141-9331 .- 1872-9436. ; 36:7, pp. 571-585
  • Journal article (peer-reviewed)
    • Modern SoC architectures use NoCs for high-speed inter-IP communication. For NoC architectures, high-performance efficient routing algorithms with low power consumption are essential for real-time applications. NoCs with mesh and torus interconnection topologies are now popular due to their simple structures. A torus NoC is very similar to the mesh NoC, but has rather smaller diameter. For a routing algorithm to be deadlock-free in a torus, at least two virtual channels per physical channel must be used to avoid cyclic channel dependencies due to the wrap-around links; however, in a mesh network deadlock freedom can be ensured using only one virtual channel. The employed number of virtual channels is important since it has a direct effect on the power consumption of NoCs. In this paper, we propose a novel systematic approach for designing deadlock-free routing algorithms for torus NoCs. Using this method a new deterministic routing algorithm (called TRANC) is proposed that uses only one virtual channel per physical channel in torus NoCs. We also propose an algorithmic mapping that enables extracting TRANC-based routing algorithms from existing routing algorithms, which can be both deterministic and adaptive. The simulation results show power consumption and performance improvements when using the proposed algorithms.