SwePub

Result list for search "L773:1939 8018 OR L773:1939 8115"


  • Results 1-10 of 35
1.
  • Alam, Syed Asad, et al. (author)
  • Improved Particle Filter Resampling Architectures
  • 2020
  • In: Journal of Signal Processing Systems. - : SPRINGER. - 1939-8018 .- 1939-8115. ; 92:6, pp. 555-568
  • Journal article (peer-reviewed)
    • The most challenging aspect of particle filtering hardware implementation is the resampling step. This is because of high latency as it can be only partially executed in parallel with the other steps of particle filtering and has no inherent parallelism inside it. To reduce the latency, an improved resampling architecture is proposed which involves pre-fetching from the weight memory in parallel to the fetching of a value from a random function generator along with architectures for realizing the pre-fetch technique. This enables a particle filter using M particles with otherwise streaming operation to get new inputs more often than 2M cycles as the previously best approach gives. Results show that a pre-fetch buffer of five values achieves the best area-latency reduction trade-off while on average achieving an 85% reduction in latency for the resampling step leading to a sample time reduction of more than 40%. We also propose a generic division-free architecture for the resampling steps. It also removes the need of explicitly ordering the random values for efficient multinomial resampling implementation. In addition, on-the-fly computation of the cumulative sum of weights is proposed which helps reduce the word length of the particle weight memory. FPGA implementation results show that the memory size is reduced by up to 50%.
2.
  • Alipour, Mehdi, et al. (author)
  • Maximizing limited resources : A limit-based study and taxonomy of out-of-order commit
  • 2019
  • In: Journal of Signal Processing Systems. - : Springer Science and Business Media LLC. - 1939-8018 .- 1939-8115. ; 91:3-4, pp. 379-397
  • Journal article (peer-reviewed)
    • Out-of-order execution is essential for high performance, general-purpose computation, as it can find and execute useful work instead of stalling. However, it is typically limited by the requirement of visibly sequential, atomic instruction execution, in other words, in-order instruction commit. While in-order commit has a number of advantages, such as providing precise interrupts and avoiding complications with the memory consistency model, it requires the core to hold on to resources (reorder buffer entries, load/store queue entries, physical registers) until they are released in program order. In contrast, out-of-order commit can release some resources much earlier, yielding improved performance and/or lower resource requirements. Non-speculative out-of-order commit is limited in terms of correctness by the conditions described in the work of Bell and Lipasti (2004). In this paper we revisit out-of-order commit by examining the potential performance benefits of lifting these conditions one by one and in combination, for both non-speculative and speculative out-of-order commit. While correctly handling recovery for all out-of-order commit conditions currently requires complex tracking and expensive checkpointing, this work aims to demonstrate the potential for selective, speculative out-of-order commit using an oracle implementation without speculative rollback costs.
      Through this analysis of the potential of out-of-order commit, we learn that: a) there is significant untapped potential for aggressive variants of out-of-order commit; b) it is important to optimize the out-of-order commit depth for a balanced design, as smaller cores benefit from reduced depth while larger cores continue to benefit from deeper designs; c) the focus on implementing only a subset of the out-of-order commit conditions could lead to efficient implementations; d) the benefits of out-of-order commit increase with higher memory latency and in conjunction with prefetching; e) out-of-order commit exposes additional parallelism in the memory hierarchy.
3.
  • Asghar, Rizwan, et al. (author)
  • Implementation of a Radix-4, Parallel Turbo Decoder and Enabling the Multi-Standard Support
  • 2012
  • In: Journal of Signal Processing Systems. - : Springer Verlag (Germany). - 1939-8018 .- 1939-8115. ; 66:1, pp. 25-41
  • Journal article (peer-reviewed)
    • This paper presents a unified, radix-4 implementation of a turbo decoder, covering multiple standards such as DVB, WiMAX, 3GPP-LTE and HSPA Evolution. The radix-4, parallel interleaver is the bottleneck when using the same turbo-decoding architecture for multiple standards. This paper covers the issues associated with the design of a radix-4 parallel interleaver to arrive at a flexible turbo-decoder architecture. Radix-4, parallel interleaver algorithms and their mapping onto hardware architecture are presented for multi-mode operations. The overheads associated with hardware multiplexing are found to be least significant. Other than flexibility for the turbo decoder implementation, the low silicon cost and low power aspects are also addressed by optimizing the storage scheme for branch metrics and extrinsic information. The proposed unified architecture for radix-4 turbo decoding consumes 0.65 mm² area in total in a 65 nm CMOS process. With 4 SISO blocks used in parallel and 6 iterations, it can achieve a throughput up to 173.3 Mbps while consuming 570 mW power in total. It provides a good trade-off between silicon cost, power consumption and throughput with silicon efficiency of 0.005 mm²/Mbps and energy efficiency of 0.55 nJ/b/iter.
4.
  • Asghar, Rizwan, 1973-, et al. (author)
  • Memory Conflict Analysis and Implementation of a Re-configurable Interleaver Architecture Supporting Unified Parallel Turbo Decoding
  • 2010
  • In: Journal of Signal Processing Systems for Signal, Image, and Video Technology. - : Springer Science and Business Media LLC. - 1939-8018. ; 60:1, pp. 15-29
  • Journal article (peer-reviewed)
    • This paper presents a novel hardware interleaver architecture for unified parallel turbo decoding. The architecture is fully re-configurable among multiple standards like HSPA Evolution, DVB-SH, 3GPP-LTE and WiMAX. Turbo codes, widely used for error correction in today’s consumer electronics, are prone to introduce higher latency due to bigger block sizes and multiple iterations. Many parallel turbo decoding architectures have recently been proposed to enhance the channel throughput, but the interleaving algorithms used in different standards do not freely allow using them due to a higher percentage of memory conflicts. The architecture presented in this paper provides a re-configurable platform for implementing the parallel interleavers for different standards by managing the conflicts involved in each. The memory conflicts are managed by applying different approaches like stream misalignment, memory division and use of a small FIFO buffer. The proposed flexible architecture is low cost and consumes 0.085 mm² area in a 65 nm CMOS process. It can implement up to 8 parallel interleavers and can operate at a frequency of 200 MHz, thus providing significant support to higher throughput systems based on parallel SISO processors.
5.
  • Ashjaei, Mohammad, et al. (author)
  • Improved Message Forwarding for Multi-Hop HaRTES Real-Time Ethernet Networks
  • 2016
  • In: Journal of Signal Processing Systems. - : Springer Science and Business Media LLC. - 1939-8018 .- 1939-8115. ; 84:1, pp. 47-67
  • Journal article (peer-reviewed)
    • Nowadays, switched Ethernet networks are used in complex systems that encompass tens to hundreds of nodes and thousands of signals. Such scenarios require multi-switch architectures where communications frequently occur in multiple hops. In this paper we investigate techniques to allow efficient multi-hop communication using HaRTES switches. These are modified Ethernet switches that provide real-time traffic scheduling, dynamic bandwidth management and temporal isolation between real-time and non-real-time traffic. This paper addresses the problem of forwarding traffic in HaRTES networks. Two methods have been recently proposed, namely Distributed Global Scheduling (DGS) that buffers traffic between switches, and Reduced Buffering Scheme (RBS), that uses immediate forwarding. In this paper, we discuss the design and implementation of RBS within HaRTES and we carry out an experimental validation with a prototype implementation. Then, we carry out a comparison between RBS and DGS using worst-case response time analysis and simulation. The comparison clearly establishes the superiority of RBS concerning end-to-end response times. In fact, with sample message sets, we achieved reductions in end-to-end delay that were as high as 80 %.
6.
  • Bhattacharyya, Shuvra S., et al. (author)
  • Overview of the MPEG Reconfigurable Video Coding Framework
  • 2011
  • In: Journal of Signal Processing Systems. - : Springer Science and Business Media LLC. - 1939-8115 .- 1939-8018. ; 63:2, pp. 251-263
  • Journal article (peer-reviewed)
    • Video coding technology in the last 20 years has evolved producing a variety of different and complex algorithms and coding standards. So far the specification of such standards, and of the algorithms that build them, has been done case by case providing monolithic textual and reference software specifications in different forms and programming languages. However, very little attention has been given to provide a specification formalism that explicitly presents common components between standards, and the incremental modifications of such monolithic standards. The MPEG Reconfigurable Video Coding (RVC) framework is a new ISO standard currently under its final stage of standardization, aiming at providing video codec specifications at the level of library components instead of monolithic algorithms. The new concept is to be able to specify a decoder of an existing standard or a completely new configuration that may better satisfy application-specific constraints by selecting standard components from a library of standard coding algorithms. The possibility of dynamic configuration and reconfiguration of codecs also requires new methodologies and new tools for describing the new bitstream syntaxes and the parsers of such new codecs. The RVC framework is based on the usage of a new actor/dataflow oriented language called Cal for the specification of the standard library and instantiation of the RVC decoder model. This language has been specifically designed for modeling complex signal processing systems. Cal dataflow models expose the intrinsic concurrency of the algorithms by employing the notions of actor programming and dataflow. The paper gives an overview of the concepts and technologies building the standard RVC framework and the non-standard tools supporting the RVC model from the instantiation and simulation of the Cal model to software and/or hardware code synthesis.
7.
  • Canale, M., et al. (author)
  • Dataflow Programs Analysis and Optimization Using Model Predictive Control Techniques : Two Examples of Bounded Buffer Scheduling: Deadlock Avoidance and Deadlock Recovery Strategies
  • 2016
  • In: Journal of Signal Processing Systems. - : Springer Science and Business Media LLC. - 1939-8018 .- 1939-8115. ; 84:3, pp. 371-381
  • Journal article (peer-reviewed)
    • The analysis of the trace graphs generated by dataflow program executions has been shown to be an effective tool for exploring and optimizing the design space of application programs on manycore/multicore platforms. In this work a new approach aiming at finding bounded buffer size configurations for implementations generated by dataflow programs is presented. The introduced method is based on an original transformation procedure which converts the execution trace graph into an event-driven linear system made up of a Petri Net. A control theoretic approach based on Model Predictive Control methodologies is then applied to the obtained Petri Net system in order to effectively explore the dataflow program design space and find nearly optimal buffer dimensioning solutions leading to a deadlock-free program execution. Two real challenging design case examples, namely a JPEG and an MPEG HEVC decoder, are introduced to show the effectiveness of the introduced approach.
8.
  • Dubrova, Elena, et al. (author)
  • Two Countermeasures Against Hardware Trojans Exploiting Non-Zero Aliasing Probability of BIST
  • 2016
  • In: Journal of Signal Processing Systems. - : Springer Science+Business Media B.V.. - 1939-8018 .- 1939-8115.
  • Journal article (peer-reviewed)
    • The threat of hardware Trojans has been widely recognized by academia, industry, and government agencies. A Trojan can compromise security of a system in spite of cryptographic protection. The damage caused by a Trojan may not be limited to a business or reputation, but could have a severe impact on public safety, national economy, or national security. An extremely stealthy way of implementing hardware Trojans has been presented by Becker et al. at CHES’2012. Their work has shown that it is possible to inject a Trojan in a random number generator compliant with FIPS 140-2 and NIST SP800-90 standards by exploiting non-zero aliasing probability of Logic Built-In-Self-Test (LBIST). In this paper, we present two methods for modifying LBIST to prevent such an attack. The first method makes test patterns dependent on a configurable key which is programmed into a chip after the manufacturing stage. The second method uses a remote test management system which can execute LBIST using a different set of test patterns at each test cycle.
9.
  • Eghbali, Amir, et al. (author)
  • Dynamic Frequency-Band Reallocation and Allocation : from Satellite-Based Communication Systems to Cognitive Radios
  • 2011
  • In: Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology. - : Springer. - 0922-5773 .- 1573-109X. ; 62:2, pp. 187-203
  • Journal article (peer-reviewed)
    • This paper discusses two approaches for the baseband processing part of cognitive radios. These approaches can be used depending on the availability of (i) a composite signal comprising several user signals or (ii) the individual user signals. The aim is to introduce solutions which can support different bandwidths and center frequencies for a large set of users at the cost of simple modifications on the same hardware platform. Such structures have previously been used for satellite-based communication systems and the paper aims to outline their possible applications in the context of cognitive radios. For this purpose, dynamic frequency-band allocation (DFBA) and reallocation (DFBR) structures based on multirate building blocks are introduced and their reconfigurability issues with respect to the required reconfigurability measures in cognitive radios are discussed.
10.
  • Ferreira, Lucas, et al. (author)
  • Design of an Application-specific VLIW Vector Processor for ORB Feature Extraction
  • 2023
  • In: Journal of Signal Processing Systems. - : Springer Science and Business Media LLC. - 1939-8018 .- 1939-8115. ; 95:7, pp. 863-875
  • Journal article (peer-reviewed)
    • In computer-vision feature extraction algorithms, compressing the image into a sparse set of trackable keypoints, empowers navigation-critical systems such as Simultaneous Localization And Mapping (SLAM) in autonomous robots, and also other applications such as augmented reality and 3D reconstruction. Most of those applications are performed in battery-powered gadgets featuring in common a very stringent power-budget. Near-to-sensor computing of feature extraction algorithms allows for several design optimizations. First, the overall on-chip memory requirements can be lessened, and second, the internal data movement can be minimized. This work explores the usage of an Application Specific Instruction Set Processor (ASIP) dedicated to perform feature extraction in a real-time and energy-efficient manner. The ASIP features a Very Long Instruction Word (VLIW) architecture comprising one RV32I RISC-V and three vector slots. The on-chip memory sub-system implements parallel multi-bank memories with near-memory data shuffling to enable single-cycle multi-pattern vector access. Oriented FAST and Rotated BRIEF (ORB) are thoroughly explored to validate the proposed architecture, achieving a throughput of 140 Frames-Per-Second (FPS) for VGA images for one scale, while reducing the number of memory accesses by 2 orders of magnitude as compared to other embedded general-purpose architectures.
Publication type
journal article (35)
Content type
peer-reviewed (32)
other academic/artistic (3)
Author/editor
Hemani, Ahmed, 1961- (4)
Janneck, Jörn (3)
Gustafsson, Oscar (2)
Nilsson, Peter (2)
Weis, Christian (2)
Wehn, Norbert (2)
Bengtsson, Lars, 195 ... (2)
Öwall, Viktor (2)
Liu, Dake (2)
Liu, Dake, 1957- (2)
Wu, Di, 1979- (2)
Eilert, Johan, 1980- (2)
Gustafsson, Oscar, 1 ... (1)
Åström, Kalle (1)
Johansson, Håkan (1)
Gustafsson, Mats (1)
Kaxiras, Stefanos (1)
Behnam, Moris (1)
Nolte, Thomas (1)
Smeets, Ben (1)
Bril, Reinder J. (1)
Persson, Andreas (1)
Larsson-Edefors, Per ... (1)
Almeida, Luis (1)
Jiang, Lili (1)
Dubrova, Elena (1)
Ul-Abdin, Zain, 1975 ... (1)
Svensson, Bertil, 19 ... (1)
Alam, Syed Asad (1)
Liu, Liang (1)
Hostettler, Roland (1)
Kessler, Christoph, ... (1)
Gu, Irene Yu-Hua, 19 ... (1)
Silva, Luís (1)
Alipour, Mehdi (1)
Carlson, Trevor E. (1)
Black-Schaffer, Davi ... (1)
Pedreiras, Paulo (1)
Stathis, Dimitrios (1)
Stenström, Per, 1957 (1)
Svensson, Lars, 1960 (1)
Näslund, Mats (1)
Carlsson, Gunnar (1)
Asghar, Rizwan, 1973 ... (1)
Asghar, Rizwan (1)
Wu, Di (1)
Saeed, Ali (1)
Huang, Yulin (1)
Ashjaei, Mohammad (1)
Malkowsky, Steffen (1)
University
Linköpings universitet (10)
Lunds universitet (8)
Kungliga Tekniska Högskolan (6)
Chalmers tekniska högskola (4)
Uppsala universitet (3)
Högskolan i Halmstad (3)
Umeå universitet (2)
Stockholms universitet (1)
Mälardalens universitet (1)
Language
English (35)
Research subject (UKÄ/SCB)
Natural sciences (20)
Engineering and technology (17)
