SwePub

Result list for search "WFRF:(Öwall Viktor) srt2:(2010-2014)"

  • Results 1-49 of 49
1.
  • Akgun, OmerCan, et al. (author)
  • High-level energy estimation in the sub-VT domain: simulation and measurement of a cardiac event detector
  • 2012
  • In: IEEE Transactions on Biomedical Circuits and Systems. - 1932-4545. ; 6:1, pp. 15-27
  • Journal article (peer-reviewed)
    • This paper presents a flow that is suitable to estimate energy dissipation of digital standard-cell based designs which are determined to be operated in the sub-threshold regime. The flow is applicable on gate-level netlists, where back-annotated toggle information is used to find the minimum energy operation point, corresponding maximum clock frequency, as well as the dissipated energy per clock cycle. The application of the model is demonstrated by exploring the energy efficiency of pipelining, retiming and register balancing. Simulation results, which are obtained during a fraction of SPICE simulation time, are validated by measurements on a wavelet based cardiac event detector that was fabricated in 65 nm low-leakage high-threshold technology. The mean of the absolute modeling error is calculated as 5.2 %, with a standard deviation of 6.6% over the measurement points. The cardiac event detector dissipates 0.88 pJ/sample at a supply voltage of 320mV.
2.
  • Al-Obaidi, Mohammed, et al. (author)
  • Hardware Acceleration of the Robust Header Compression (RoHC) Algorithm
  • 2013
  • In: 2013 IEEE International Symposium on Circuits and Systems (ISCAS). - 2158-1525 .- 0271-4310. - 9781467357623 - 9781467357609 ; pp. 293-296
  • Conference paper (peer-reviewed)
    • In LTE base-stations, RoHC is a processing-intensive algorithm that may limit the system from serving a large number of users when it is used to compress the VoIP packets of mobile traffic. In this paper, a hardware-software and a full-hardware solution are proposed to accelerate the RoHC compression algorithm in LTE base-stations and enhance the system throughput and capacity. Results for both solutions are discussed and compared with respect to design metrics like throughput, capacity, power consumption, and hardware resources. This comparison is instrumental in taking architectural level trade-off decisions in order to meet the present day requirements and also be ready to support a future evolution. In terms of throughput, a gain of 20% (6250 packets/sec) is achieved in the HW-SW implementation by accelerating the Cyclic Redundancy Check (CRC) and the Least Significant Bit (LSB) encoding in hardware. The full-HW implementation leads to a throughput of 45 times (244000 packets/sec) compared to the SW-Only implementation. The full-HW solution consumes more Adaptive Look-Up Tables (7477 ALUTs) compared to the HW-SW solution (2614 ALUTs) when synthesized on Altera’s Arria II GX FPGA.
3.
  • Anderson, John B, et al. (author)
  • Faster-Than-Nyquist Signaling
  • 2013
  • In: Proceedings of the IEEE. - 0018-9219. ; 101:8, pp. 1817-1830
  • Journal article (peer-reviewed)
    • In this paper, we survey Faster-than-Nyquist (FTN) signaling, an extension of ordinary linear modulation in which the usual data bearing pulses are simply sent faster, and consequently are no longer orthogonal. Far from a disadvantage, this innovation can transmit up to twice the bits as ordinary modulation at the same bit energy, spectrum, and error rate. The method is directly applicable to orthogonal frequency division multiplex (OFDM) and quadrature amplitude modulation (QAM) signaling. Performance results for a number of practical systems are presented. FTN signaling raises a number of basic issues in communication theory and practice. The Shannon capacity of the signals is considerably higher.
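The loss of orthogonality mentioned in this abstract can be illustrated with a minimal sketch, assuming unit-energy sinc pulses (an assumption made here for simplicity; the survey is not limited to them). For such pulses, the correlation between two pulses k symbols apart under time compression tau is sinc(k·tau), which is zero at tau = 1 and nonzero for tau < 1. The function name `pulse_correlation` is hypothetical.

```python
import math


def pulse_correlation(tau, k):
    """Correlation between two unit-energy sinc pulses spaced k*tau*T apart.

    tau = 1.0 corresponds to ordinary Nyquist-rate signaling, tau < 1.0 to FTN.
    Assumption: sinc pulses, whose autocorrelation is again a sinc.
    """
    x = k * tau
    if x == 0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


for tau in (1.0, 0.8):
    # Orthogonal means every neighbouring pulse has (numerically) zero correlation.
    orthogonal = all(abs(pulse_correlation(tau, k)) < 1e-9 for k in range(1, 4))
    print(f"tau={tau}: pulses orthogonal -> {orthogonal}")
```

With tau below 1 the neighbouring pulses interfere, which is the intersymbol interference an FTN receiver has to resolve.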
4.
  • Chakrabartty, Shantanu, et al. (author)
  • Guest Editorial
  • 2011
  • In: IEEE Transactions on Biomedical Circuits and Systems. - 1932-4545. ; 5:2, pp. 101-102
  • Journal article (other academic/artistic)
5.
  • Dasalukunte, Deepak, et al. (author)
  • A 0.8mm2 9.6mW Implementation of a Multicarrier Faster-Than-Nyquist Signaling Iterative Decoder in 65nm CMOS
  • 2012
  • In: [Host publication title missing]. - 1930-8833. - 9781467322126 ; pp. 173-176
  • Conference paper (peer-reviewed)
    • This paper presents a decoder for multi-carrier modulated signals employing Faster-than-Nyquist (FTN) signaling. FTN signaling is a method of improving bandwidth efficiency at the expense of higher processing complexity in the transceiver. The decoder can switch between orthogonal and FTN signaling modes and exploits channel properties to improve bandwidth efficiency. The decoder is fabricated in a 65nm CMOS process and occupies an area of 0.8mm2, with a power consumption of 9.6mW at 1.2V when clocked at 100MHz. To the best of our knowledge, those measurement results are from the first-ever silicon implementation of a decoder for FTN signaling.
6.
  • Dasalukunte, Deepak, et al. (author)
  • An 0.8-mm(2) 9.6-mW Iterative Decoder for Faster-Than-Nyquist and Orthogonal Signaling Multicarrier Systems in 65-nm CMOS
  • 2013
  • In: IEEE Journal of Solid-State Circuits. - 0018-9200. ; 48:7, pp. 1680-1688
  • Journal article (peer-reviewed)
    • This paper presents an iterative decoder for faster-than-Nyquist (FTN) and orthogonal signaling multi-carrier systems. FTN signaling is a method of improving bandwidth efficiency at the expense of higher processing complexity in the transceiver. The decoder can switch between orthogonal and FTN signaling modes and exploits channel properties to improve bandwidth efficiency. The decoder is fabricated in a 65-nm CMOS process and occupies a total area of 0.8 mm(2) with decoder core taking up 0.567 mm(2). The power consumption of the chip is 9.6 mW at 1.2 V when clocked at 100 MHz, providing a peak information throughput of 1 Mbps and with an energy efficiency of 0.6 nJ per bit per iteration. To the best of our knowledge, those measurement results are from the first ever silicon implementation of a decoder for FTN signaling.
7.
  •  
8.
  • Dasalukunte, Deepak, et al. (author)
  • Complexity analysis of IOTA filter architectures in Faster-than-Nyquist multicarrier systems
  • 2011
  • In: [Host publication title missing]. - 9781457705144
  • Conference paper (peer-reviewed)
    • This paper has evaluated the overhead requirements for IOTA pulse shaping filters employed in faster-than-Nyquist multicarrier systems. Faster-than-Nyquist signaling has shown the promise of improving bandwidth efficiency, but comes at the cost of increased processing complexity in the transceiver. The IOTA filter being one of the blocks contributing to the processing overhead, different architectural options have been evaluated. A comparison is drawn between the architectures of the IOTA filter and the suitable architecture with moderate hardware overhead is chosen for implementation.
9.
  • Dasalukunte, Deepak, et al. (author)
  • Design and Implementation of Iterative Decoder for Faster-than-Nyquist Signaling Multicarrier systems
  • 2011
  • In: [Host publication title missing]. - 2159-3477. ; pp. 359-360
  • Conference paper (peer-reviewed)
    • Faster-than-Nyquist (FTN) signaling is a method of improving bandwidth efficiency by transmitting information beyond Nyquist's orthogonality limit for interference free transmission. It has previously been established theoretically that FTN can provide improved bandwidth efficiency. However, this comes at the cost of higher decoding complexity at the receiver. Our work has evaluated multicarrier FTN signaling for its implementation feasibility and complexity overhead compared to the gains in bandwidth efficiency. The work carried out in this research project includes a systems perspective evaluating performance, algorithm hardware tradeoffs and a hardware architecture leading to a silicon implementation of the decoder for FTN signaling. From the systems perspective, co-existence of FTN and OFDM based multicarrier system has been evaluated. OFDM being a part of many existing and upcoming broadband access technologies such as WLAN, LTE, DVB, this analogy is motivated. On the hardware aspect, the proposed architecture can accommodate both OFDM and FTN systems. The processing blocks in transmitter and receiver were designed for reuse and carry out different functions in the transceiver. Furthermore, the hardware could be configured to operate at varying bandwidth efficiencies (by FTN signaling) to exploit the channel conditions. The decoder implementation also considered block sizes and data rates to comply with the 3GPP standard. The decoding is carried out in as few as 8 iterations making it more practical for implementation in power constrained mobile devices. The decoder is implemented in 65nm CMOS process and occupies a total chip area of 0.8mm2.
10.
  •  
11.
  • Dasalukunte, Deepak, et al. (author)
  • Improved memory architecture for multicarrier faster-than-Nyquist iterative decoder
  • 2011
  • In: [Host publication title missing]. ; pp. 296-300
  • Conference paper (peer-reviewed)
    • Architectural improvements for a multicarrier faster-than-Nyquist (FTN) decoder are presented in this work. A previously designed FTN decoder has been optimized during implementation, especially with respect to memory considerations to reduce area and power. The memory optimized architecture achieves 28.7% savings in overall chip area and provides 43.8% savings in the estimated power compared to the pre-optimized design. The BER performance tradeoff from one of the memory optimizations shows that the degradation is acceptable and can actually provide better performance for certain scenarios. The other memory optimization considers the minimal buffering required within the interference canceller, resulting in memory reduction close to 50% of what was previously reported. The performance from the actual RTL implementation of the FTN decoder is also presented in comparison with the floating and fixed point benchmark performances.
12.
  • Dasalukunte, Deepak, et al. (author)
  • Multicarrier faster-than-Nyquist transceivers: hardware architecture and performance analysis
  • 2011
  • In: IEEE Transactions on Circuits and Systems Part 1: Regular Papers. - 1549-8328. ; 58:4, pp. 827-838
  • Journal article (peer-reviewed)
    • This paper evaluates the hardware aspects of multicarrier faster-than-Nyquist (FTN) signaling transceivers. The choice of time-frequency spacing of the symbols in an FTN system for improved bandwidth efficiency is targeted towards efficient hardware implementation. This work proposes a hardware architecture for the realization of iterative decoding of FTN multicarrier modulated signals. Compatibility with existing systems has been considered for smooth switching between the faster-than-Nyquist and orthogonal signaling schemes. One such consideration is the use of FFTs for multicarrier modulation. The performance of the fixed point model is very close to that of the floating point representation. The impact of system parameters such as number of projection points, time-frequency spacing, finite wordlengths and their design trade-offs for reduced complexity iterative decoders in FTN systems have been investigated. The FTN decoder has been designed and synthesized in both 65nm CMOS and FPGA. From the hardware resource usage numbers it can be concluded that FTN signaling can be used to achieve higher bandwidth efficiency with acceptable complexity overhead.
13.
  • Diaz, Isael, et al. (author)
  • A sign-bit auto-correlation architecture for fractional frequency offset estimation in OFDM
  • 2010
  • In: ; pp. 3765-3768
  • Conference paper (peer-reviewed)
    • This paper presents an architecture of an auto-correlator for Orthogonal Frequency Division Multiplexing systems. The received signal is quantized to only the sign-bit, which dramatically simplifies the frequency offset estimation. Hardware cost is reduced under the assumption that synchronization during acquisition does not have to be very accurate, but sufficient for coarse estimation. The architecture is synthesized towards a 65nm low-leakage high threshold standard cell CMOS library. The proposed architecture results in an area reduction of 93% compared to a typical 8-bit implementation. The area occupied by the architecture is 0.063mm^2. The architecture is evaluated for WLAN, LTE and DVB-H. Power simulations for DVB-H transmission show a power consumption of 4.8uW per symbol.
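The idea of estimating a fractional frequency offset from sign bits only can be sketched in software as follows. This is a generic cyclic-prefix correlation estimator, not the architecture from the paper. It assumes known symbol timing, a Gaussian stand-in for the OFDM time-domain signal, and hypothetical names and parameters (`estimate_fractional_cfo`, `make_test_signal`, 64-point symbols with a 16-sample prefix).

```python
import cmath
import math
import random


def sign_bit(z):
    """Quantize a complex sample to its sign bits: (+/-1) + j(+/-1)."""
    return complex(1 if z.real >= 0 else -1, 1 if z.imag >= 0 else -1)


def estimate_fractional_cfo(rx, n_fft, cp_len, sign_only=True):
    """Estimate the frequency offset in units of the subcarrier spacing.

    Correlates each cyclic-prefix sample with its copy n_fft samples later.
    Assumption: symbols start at index 0 (timing already acquired).
    """
    sym_len = n_fft + cp_len
    acc = 0j
    for start in range(0, len(rx) - sym_len + 1, sym_len):
        for k in range(cp_len):
            a = rx[start + k]
            b = rx[start + k + n_fft]
            if sign_only:
                a, b = sign_bit(a), sign_bit(b)
            acc += a.conjugate() * b
    return cmath.phase(acc) / (2 * math.pi)


def make_test_signal(eps, n_fft=64, cp_len=16, n_symbols=100,
                     noise_std=0.1, seed=1):
    """Build noisy symbols with cyclic prefix and a frequency offset eps."""
    rng = random.Random(seed)
    tx = []
    for _ in range(n_symbols):
        body = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_fft)]
        tx.extend(body[-cp_len:] + body)  # prepend the cyclic prefix
    rx = []
    for n, x in enumerate(tx):
        noise = complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std))
        rx.append(x * cmath.exp(2j * math.pi * eps * n / n_fft) + noise)
    return rx


rx = make_test_signal(eps=0.1)
print("sign-bit estimate:      ", estimate_fractional_cfo(rx, 64, 16, True))
print("full-precision estimate:", estimate_fractional_cfo(rx, 64, 16, False))
```

The sign-bit estimate is slightly biased relative to the full-precision one, which is in line with the abstract's premise that acquisition only needs a coarse estimate.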
14.
  • Diaz, Isael, et al. (author)
  • Next Generation Digital Front-End for Multi-Standard Concurrent Reception
  • 2013
  • In: [Host publication title missing].
  • Conference paper (peer-reviewed)
    • This article presents an architecture of a Digital Front-End Receiver (DFE-Rx) for the next-generation mobile terminals. A main focus is placed on flexibility, scalability and concurrency. The architecture is capable of detecting, synchronizing and reporting carrier-frequency offset of multiple concurrent radio standards. The proposed receiver is fabricated in a 65nm CMOS low power high-VT cell technology in a die size of 5mm2. The synchronization engine has been measured at 1.2V and reports an average power consumption of 1.9mW during IEEE 802.11 reception and 1.6mW during configuration, while running at 10MHz.
15.
  • Diaz, Isael, et al. (author)
  • Selective Channelization on an SDR Platform for LTE-A Carrier Aggregation.
  • 2012
  • In: 19th IEEE International Conference on Electronics, Circuits and Systems (ICECS), 2012. - 9781467312615 ; pp. 316-319
  • Conference paper (peer-reviewed)
    • The total transmission bandwidth and component carrier aggregation proposed by LTE-Advanced set a new challenge to the design of terminals. This article presents a way to assure terminals cope with the large bandwidth in an efficient manner. Various filtering methods are explored showing that an SDR architecture, such as ADRES (Architecture for Dynamically Reconfigurable Embedded Systems), is suitable for dynamic adaptation of filtering methods as a function of the aggregation scheme and the individual bandwidth assigned to each terminal. This method is able to reduce the processing load by 70% for LTE-A with legacy support and possibly higher reduction when LTE legacy is not supported. Simulations conclude that the performance loss derived from the proposed method is marginal with no negative repercussion on the posterior baseband stages.
16.
  • Eilert, Johan (author)
  • ASIP for Wireless Communication and Media
  • 2010
  • Doctoral thesis (other academic/artistic)
    • While general purpose processors reach both high performance and high application flexibility, this comes at a high cost in terms of silicon area and power consumption. In systems where high application flexibility is not required, it is possible to trade off flexibility for lower cost by tailoring the processor to the application to create an Application Specific Instruction set Processor (ASIP) with high performance yet low silicon cost. This thesis demonstrates how ASIPs with application specific data types can provide efficient solutions with lower cost. Two examples are presented, an audio decoder ASIP for audio and music processing and a matrix manipulation ASIP for MIMO radio baseband signal processing. The audio decoder ASIP uses a 16-bit floating point data type to reduce the size of the data memory to about 60% of other solutions that use a 32-bit data type. Since the data memory occupies a major part of the silicon area, this has a significant impact on the total silicon area, and thereby also the static and dynamic power consumption. The data width reduction can be done without any noticeable artifacts in the decoded audio due to the natural masking effect of the human ear. The matrix manipulation SIMD ASIP is designed to perform various matrix operations such as matrix inversion and QR decomposition of small complex-valued matrices. This type of processing is found in MIMO radio baseband signal processing and the matrices are typically not larger than 4x4. There have been solutions published that use arrays of fixed-function processing elements to perform these operations, but the proposed ASIP performs the computations in less time and with lower hardware cost. The matrix manipulation ASIP data path uses a floating point data type to avoid data scaling issues associated with fixed point computations, especially those related to division and reciprocal calculations, and it also simplifies the program control flow since no special cases for certain inputs are needed which is especially important for SIMD architectures. These two applications were chosen to show how ASIPs can be a suitable alternative and match the requirements for different types of applications, to provide enough flexibility and performance to support different standards and algorithms with low hardware cost.
17.
  • Fu, Siyuan, et al. (author)
  • Generalized lock-in amplifier for precision measurement of high frequency signals
  • 2013
  • In: Review of Scientific Instruments. - : AIP Publishing. - 1089-7623 .- 0034-6748. ; 84:11
  • Journal article (peer-reviewed)
    • We herein formulate the concept of a generalized lock-in amplifier for the precision measurement of high frequency signals based on digital cavities. Accurate measurement of signals higher than 200 MHz using the generalized lock-in is demonstrated. The technique is compared with a traditional lock-in and its advantages and limitations are discussed. We also briefly point out how the generalized lock-in can be used for precision measurement of giga-hertz signals by using parallel processing of the digitized signals.
18.
  • Granlund, Stefan, et al. (author)
  • Implementation of a Highly-Parallel Soft-Output MIMO Detector with Fast Node Enumeration
  • 2013
  • In: [Host publication title missing].
  • Conference paper (peer-reviewed)
    • This paper presents a high throughput, low latency soft-output signal detector for a 4×4 64-QAM MIMO system. To achieve high data-level parallelism and accurate soft information, the detector adopts a node perturbation technique to generate a list of candidate vectors around the zero-forcing (ZF) result. Additionally, a fast and hardware friendly node enumeration scheme is developed to significantly reduce processing delay. Implemented using a 65nm CMOS technology, the detector occupies 0.58mm2 core area with 290K gates. The peak throughput is 3Gb/s at 500 MHz clock frequency with a latency of 20ns. Energy consumption per detected bit is 33pJ.
19.
  • Jin, Aohan, et al. (author)
  • High precision measurements using high frequency gigahertz signals
  • 2014
  • In: Review of Scientific Instruments. - : AIP Publishing. - 1089-7623 .- 0034-6748. ; 85:12
  • Journal article (peer-reviewed)
    • Generalized lock-in amplifiers use digital cavities with Q-factors as high as 5 × 10^8 to measure signals with very high precision. In this Note, we show that generalized lock-in amplifiers can be used to analyze microwave (giga-hertz) signals with a precision of a few tens of hertz. We propose that the physical changes in the medium of propagation can be measured precisely by the ultra-high precision measurement of the signal. We provide evidence for our proposition by verifying Newton's law of cooling by measuring the effect of change in temperature on the phase and amplitude of the signals propagating through two calibrated cables. The technique could be used to precisely measure different physical properties of the propagation medium, for example, the change in length, resistance, etc. Real time implementation of the technique can open up new methodologies of in situ virtual metrology in material design.
20.
  • Kamuf, Matthias, et al. (author)
  • Design and Measurement of a Variable-Rate Viterbi Decoder in 130-nm Digital CMOS
  • 2010
  • In: Microprocessors and Microsystems. - : Elsevier BV. - 0141-9331. ; 34:2010, pp. 129-137
  • Journal article (peer-reviewed)
    • This paper discusses design and measurements of a flexible Viterbi decoder fabricated in 130-nm digital CMOS. Flexibility was incorporated by providing various code rates and modulation schemes to adjust to varying channel conditions. Based on previous trade-off studies, flexible building blocks were carefully designed to cause as little area penalty as possible. The chip runs down to a minimal core supply of 0.8V. It turns out that striving for more modulation schemes is beneficial in terms of power consumption once the price is paid for accepting different code rates viz. radices in the trellis and survivor path units.
21.
  • Liu, Liang, et al. (author)
  • VLSI Implementation of a Soft-Output Signal Detector for Multi-Mode Adaptive MIMO Systems
  • 2013
  • In: IEEE Transactions on Very Large Scale Integration (VLSI) Systems. - 1063-8210. ; 21:12, pp. 2262-2273
  • Journal article (peer-reviewed)
    • This paper presents a multimode soft-output multiple-input multiple-output (MIMO) signal detector that is efficient in hardware cost and energy consumption. The detector is capable of dealing with spatial-multiplexing (SM), space-division-multiple-access (SDMA), and spatial-diversity (SD) signals of 4 × 4 antenna and 64-QAM modulation. Implementation-friendly algorithms, which reuse most of the mathematical operations in these three MIMO modes, are proposed to provide accurate soft detection information, i.e., log-likelihood ratio, with much reduced complexity. A unified reconfigurable VLSI architecture has been developed to eliminate the implementation of multiple detector modules. In addition, several block level technologies, such as parallel metric update and fast bit-flipping, are adopted to enable a more efficient design. To evaluate the proposed techniques, we implemented the triple-mode MIMO detector in a 65-nm CMOS technology. The core area is 0.25 mm2 with 83.7 K gates. The maximum detecting throughput is 1 Gb/s at 167-MHz clock frequency and 1.2-V supply, which achieves the data rate envisioned by the emerging long-term evolution advanced standard. Under frequency-selective channels, the detector consumes 59.3-, 10.5-, and 169.6-pJ energy per bit detection in SM, SD, and SDMA modes, respectively.
22.
  •  
23.
  • Mehmood, Shahid, et al. (author)
  • Hardware architecture of IOTA pulse shaping filters for multicarrier systems
  • 2013
  • In: IEEE Transactions on Circuits and Systems Part 1: Regular Papers. - 1549-8328. ; 60:3, pp. 733-742
  • Journal article (peer-reviewed)
    • This paper presents a hardware architecture of pulse shaping filter used in multicarrier systems. The filter can be configured to be used for both transmitter and receiver with limited overhead. Generic implementation complexity analysis for a filter in a multicarrier system with N sub-carriers is presented, while the implemented architecture is for a system with 128 sub-carriers. The pulse shaping filter is part of a larger system based on faster-than-Nyquist signaling and aided in an overall complexity reduction. Hence designing an efficient hardware architecture to keep the overhead moderate was the motivation behind this work. Architectural optimizations have been carried out in order to reduce area and power. The implementation of the proposed hardware architecture was carried out using a 65nm CMOS process. The chip core occupies an area of 0.11mm2 and is estimated to consume 14.4mW of power when running at 200MHz.
24.
  • Meraji, Reza, et al. (author)
  • A 3 μW 500 kb/s Ultra Low Power Analog Decoder with Digital I/O in 65 nm CMOS
  • 2013
  • In: 2013 IEEE 20th International Conference on Electronics, Circuits, and Systems (ICECS). - 9781479924523 ; pp. 349-352
  • Conference paper (peer-reviewed)
    • Measurement results of an analog channel decoder in 65 nm CMOS are presented. We target ultra compact and low power applications with low to medium throughput requirements. The decoding core is designed for (7,5)_8 convolutional codes and takes 0.104 mm^2 on silicon. The degrading effects of analog imperfections are investigated and the presented results allow power, performance and throughput trade-offs. Analyzing the bit error rate (BER) performance under extreme power constraints provides insights on energy efficiency and limitations of small scale analog decoders. For the limited power budget of 3 μW the decoder performs the required computations to provide 1 dB of coding gain at BER=0.001 for 500 kb/s throughput. The presented chip has digital I/O that facilitates embedding it in a conventional digital receiver.
25.
  •  
26.
  •  
27.
  • Meraji, Reza, et al. (author)
  • An Analog (7,5) Convolutional Decoder in 65 nm CMOS for Low Power Wireless Applications
  • 2011
  • In: [Host publication title missing]. - 2158-1525 .- 0271-4310. - 9781424494736 ; pp. 2881-2884
  • Conference paper (peer-reviewed)
    • A complete architecture with transistor level simulation is presented for a low power analog convolutional decoder in 65 nm CMOS. The decoder core operates in the weak inversion (sub-VT) region and realizes the BCJR decoding algorithm corresponding to the 4-state tail-biting trellis of a (7,5) convolutional code. The complete decoder also incorporates serial I/O digital interfaces and current mode differential DACs. The simulated bit error rate is presented to illustrate the coding gain compared to an uncoded system. Our results show that a low power, high throughput convolutional decoder up to 1.25 Mb/s can be implemented using analog circuitry with a total power consumption of 84 μW. For low rate applications the decoder consumes only 47 μW at a throughput of 250 kb/s.
28.
  •  
29.
  •  
30.
  • Meraji, Reza, et al. (author)
  • Low power analog channel decoder in sub-threshold 65nm CMOS
  • 2010
  • Conference paper (other academic/artistic)
    • This paper presents the architecture and the corresponding simulation results for a very low power half-rate extended Hamming (8,4) decoder implemented in analog integrated circuitry. TI’s 65nm low power CMOS design library was used to simulate the complete decoder including an input interface, an analog decoding core and an output interface. The simulated bit error rate (BER) performance of the decoder is presented and compared to the ideal performance expected from the Hamming code. Transistor-level simulation results suggest that a high throughput Hamming decoder up to 1 Mbit/s can be implemented in analog circuits with a core power consumption as low as 6 μW.
31.
  • Meraji, Reza, et al. (author)
  • Transistor sizing for a 4-state current mode analog channel decoder in 65-nm CMOS
  • 2011
  • In: [Host publication title missing]. - 9781457705144 ; pp. 1-4
  • Conference paper (peer-reviewed)
    • Analog decoders are constructed based on interconnecting CMOS Gilbert vector multipliers using transistors operating in the sub-VT region. They are seen as an interesting alternative to digital implementations with a low transistor count and a potential for a very low power consumption. Analog implementation makes the circuit sensitive to mismatch, requiring careful transistor sizing. A simulation technique combining Monte-Carlo analysis in Spectre with Matlab processing has therefore been used to investigate transistor sizing for an analog (7,5) convolutional decoder. The simulation results indicate that with a tail-biting trellis circle size of 14 and a transistor size of W/L = 1.0 μm/0.6 μm, the decoder can offer close to maximum coding gain while operating on very low currents when implemented in 65-nm CMOS technology.
32.
  • Nilsson, Peter, et al. (author)
  • Lessons from Ten Years of the International Master’s Program in System-on-Chip
  • 2014
  • Conference paper (peer-reviewed)
    • In July 2000 the five-year Swedish national “Socware Research & Education Program” was started. One of the aims of the program was to develop an innovative unique educational curriculum in System-on-Chip design. The program was targeted at undergraduate and graduate students. In total the program received USD 15 million funding. In 2005 the program entered a new phase; more than 500 Master’s students were admitted and 30 PhD students were funded. Cooperation between the System-on-Chip Master’s programs in Lund, Linköping, and Stockholm was already well-established and the program continued in all three locations as an international Master of Science program in System-on-Chip, with local funding from the participating universities. Between 2003 and 2013 there were 3500 applicants to the program in Lund, an average of 350 applicants per year; of these, 250 (8%) were accepted. This paper focuses on the international Master of Science Program in System-on-Chip at Lund University, Sweden.
33.
  • Ren, Fengbo, et al. (author)
  • A Square-Root-Free Matrix Decomposition Method for Energy-Efficient Least Squares Computation on Embedded Systems
  • 2014
  • In: IEEE Embedded Systems Letters. - 1943-0663. ; 6:4, pp. 73-76
  • Journal article (peer-reviewed)
    • QR decomposition (QRD) is used to solve least squares (LS) problems for a wide range of applications. However, traditional QR decomposition methods, such as Gram-Schmidt (GS), require high computational complexity and non-linear operations to achieve high throughput, limiting their usage on resource-limited platforms. To enable efficient LS computation on embedded systems for real-time applications, this paper presents an alternative decomposition method, called QDRD, which relaxes system requirements while maintaining the same level of performance. Specifically, QDRD eliminates both the square-root operations in the normalization step and the divisions in the subsequent backward substitution. Simulation results show that the accuracy and reliability of factorization matrices can be significantly improved by QDRD, especially when executed on precision-limited platforms. Furthermore, benchmarking results on an embedded platform show that QDRD provides consistently better energy-efficiency and higher throughput than GS-QRD in solving LS problems. Up to 4 and 6.5 times improvement in energy-efficiency and throughput respectively can be achieved for small-size problems.
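The general idea of a square-root-free decomposition for least squares can be sketched as below. This is a textbook unnormalized Gram-Schmidt variant, not necessarily the QDRD algorithm of the paper: it factors A = Q·R with orthogonal (not orthonormal) columns in Q, a unit upper-triangular R, and D = diag(q_i · q_i). No square roots are taken, and the unit diagonal of R means the back substitution needs no divisions, although this sketch still divides by the entries of D. All function names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length real vectors."""
    return sum(a * b for a, b in zip(u, v))


def sqrt_free_gram_schmidt(a):
    """Factor a full-column-rank m x n matrix as A = Q R without square roots.

    Returns (q, d, r): q holds the orthogonal columns, d[i] = q[i] . q[i],
    and r is unit upper triangular.
    """
    m, n = len(a), len(a[0])
    q, d = [], []
    r = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for j in range(n):
        v = [a[i][j] for i in range(m)]
        for i in range(j):
            # Projection coefficient; the column norm is never computed.
            r[i][j] = dot(q[i], v) / d[i]
            v = [vk - r[i][j] * qk for vk, qk in zip(v, q[i])]
        q.append(v)
        d.append(dot(v, v))
    return q, d, r


def solve_least_squares(a, b):
    """Solve min ||A x - b|| via R x = D^-1 Q^T b."""
    q, d, r = sqrt_free_gram_schmidt(a)
    n = len(d)
    x = [dot(q[i], b) / d[i] for i in range(n)]
    # Back substitution; R has a unit diagonal, so no division is needed here.
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n):
            x[i] -= r[i][j] * x[j]
    return x


# Fit a line c0 + c1*t through the points (0, 1), (1, 2), (2, 2).
print(solve_least_squares([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]], [1.0, 2.0, 2.0]))
```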
34.
  • Rodrigues, Joachim, et al. (author)
  • A
  • 2010
  • In: Proceedings of the 2010 18th IEEE/IFIP International Conference on VLSI and System-on-Chip. - 9781424464692 ; pp. 253-258
  • Conference paper (peer-reviewed)
    • This paper presents the hardware implementation of a wavelet based event detector for cardiac pacemakers. A high level energy estimation flow was applied to evaluate energy efficiency of standard-cell based designs, over several CMOS technology generations, from 180 to 65 nm, operated in the sub-threshold domain. The simulation results indicate a 65 nm low-leakage high-threshold (LL-HVT) CMOS technology as the favourable choice. Accordingly, the design was fabricated in 65 nm LL-HVT CMOS. Measurements validate the simulation results and prove that the circuit is fully functional down to a supply voltage of 250mV. At the energy minimum voltage of 320mV the circuit dissipates 0.88 pJ per sample at a clock rate of 20 kHz.
  •  
35.
  • Rodrigues, Joachim, et al. (author)
  • Energy dissipation reduction of a cardiac event detector in the sub-Vt domain by architectural folding
  • 2010
  • In: Integrated Circuit and System Design: Power and Timing Modeling, Optimization and Simulation. - 1611-3349 .- 0302-9743. ; 5953, pp. 347-356
  • Conference paper (peer-reviewed). Abstract:
    • This manuscript presents the digital hardware realization of a wavelet based event detector for cardiac pacemaker applications. The architecture of the detector is partially folded to minimize hardware cost. An energy model is applied to evaluate the energy efficiency in the sub-threshold (sub-VT) domain. The design is synthesized in 65 nm low-leakage high-threshold CMOS technology, and it is shown that folding reduces the area cost by 30.6%. Folding decreases energy dissipation of the circuit by 14.4% in the sub-VT regime, where the circuit dissipates 3.3 pJ per sample at VDD=0.26 V.
  •  
36.
  •  
37.
  • Sjöland, Henrik, et al. (author)
  • Ultra low power transceivers for wireless sensors and body area networks
  • 2014
  • In: 2014 8th International Symposium on Medical Information and Communication Technology (ISMICT). - 2326-828X. - 9781479948567
  • Conference paper (peer-reviewed). Abstract:
    • A transceiver suitable for devices in wireless body area networks is presented. Stringent requirements are imposed by the high link loss between opposite sides of the body, about 85 dB in the 2.45 GHz ISM band. Despite this, minimum physical size and power consumption are required, and we target a transceiver with 1 mm2 chip area, 1 mW active power consumption, and a data rate of 250 kbit/s. The receiver is fully integrated, fabricated and measured in 65-nm CMOS, and size and power consumption are carefully considered at all levels of circuit and system design. The modulation is frequency shift keying, chosen because transmitters can be realized with high efficiency and low spurious emissions; a modulation index of 2 creates a midchannel spectral notch. A direct-conversion receiver achieves minimum power consumption. A tailored demodulation structure makes the digital baseband compact and low power. The channel decoder has been implemented in both analog and digital domains to find the most power efficient solution. Antenna design and wave propagation are studied via simulations with phantoms. The 2.45 GHz ISM band was chosen as a good compromise between antenna size and link loss. An ultra-low power medium access scheme based on a duty-cycled wake-up receiver is designed.
  •  
38.
  • Stala, Michal, et al. (author)
  • Area and Power Reduction in DFT Based Channel Estimators for OFDM Systems
  • 2013
  • Conference paper (peer-reviewed). Abstract:
    • This paper presents a new Hardware (HW) implementation proposal for Discrete Fourier Transform (DFT) based channel estimators. The presented algorithm uses the high time correlation property of the channel estimates to reduce the complexity and the power consumption by utilizing a lower number of bits for the FFT in the channel estimator, compared to a traditional approach. The idea is that the channel estimator processes the difference between channel estimates from two Orthogonal Frequency Division Multiplexing (OFDM) symbols. The paper shows that the resulting HW could be reduced by 30 percent for logic and 15 percent for memory without performance loss in a Long Term Evolution (LTE) channel with up to 300 Hz Doppler. The algorithm has been tested in realistic environments with 3GPP channel models.
  •  
39.
  • Stala, Michal, et al. (author)
  • Implementation of a Novel Architecture for DFT-based Channel Estimators in OFDM Systems
  • 2014
  • In: [Host publication title missing].
  • Conference paper (peer-reviewed). Abstract:
    • A new architecture for Discrete Fourier Transform (DFT) based channel estimation has been analyzed, implemented and synthesized for ASIC. The core concept of the proposed estimation algorithm is to process the channel increments rather than the channel coefficients. With strong enough time correlation, we can reduce the wordlength of processing blocks compared to standard channel estimators and hence the resulting area and power. We provide an analytical tool to predict the potential gains in bit reduction for different mobility scenarios. Our simulations show that the wordlength can be reduced from 9 to 3 bits when operating in low mobility scenarios, with 5 Hz Doppler frequency, while maintaining acceptable performance. Synthesis results show up to 40% reduction in area, compared to the original DFT-based approach, in a 65 nm CMOS process.
  •  
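The wordlength argument in the two abstracts above can be illustrated with a toy calculation. This is a sketch under stated assumptions, not the authors' analytical tool: the sample values are hypothetical integers chosen for illustration, and the helper names are invented here. It shows only that the differences of a slowly varying sequence fit in a shorter two's-complement word than the values themselves.

```python
def signed_bits(value):
    """Smallest two's-complement word length that can hold an integer."""
    magnitude = value if value >= 0 else -value - 1
    return magnitude.bit_length() + 1


def required_wordlength(samples):
    return max(signed_bits(v) for v in samples)


def increments(samples):
    """Differences between consecutive samples."""
    return [b - a for a, b in zip(samples, samples[1:])]


def reconstruct(first, deltas):
    """Recover the original sequence by accumulating the increments."""
    out = [first]
    for delta in deltas:
        out.append(out[-1] + delta)
    return out


# Hypothetical, strongly time-correlated channel estimates (quantized).
channel = [100, 102, 101, 99, 98]
deltas = increments(channel)

bits_full = required_wordlength(channel)
bits_delta = required_wordlength(deltas)
```

With these toy numbers the coefficients need 8 bits while the increments fit in 3. The original sequence is recovered exactly by accumulating the increments from the first sample.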
40.
  • Vieira, Joao, et al. (author)
  • A flexible 100-antenna testbed for Massive MIMO
  • 2014
  • Conference paper (peer-reviewed). Abstract:
    • Massive multiple-input multiple-output (MIMO) is one of the main candidates to be included in the fifth generation (5G) cellular systems. For further system development it is desirable to have real-time testbeds showing possibilities and limitations of the technology. In this paper we describe the Lund University Massive MIMO testbed – LuMaMi. It is a flexible testbed where the base station operates with up to 100 coherent radio-frequency transceiver chains based on software radio technology. Orthogonal Frequency Division Multiplex (OFDM) based signaling is used for each of the 10 simultaneous users served in the 20 MHz bandwidth. Real-time MIMO precoding and decoding is distributed across 50 Xilinx Kintex-7 FPGAs with PCI-Express interconnects. The unique features of this system are: (i) high throughput processing of 384 Gbps of real-time baseband data in both the transmit and receive directions, (ii) low-latency architecture with channel estimate to precoder turnaround of less than 500 microseconds, and (iii) a flexible extension up to 128 antennas. We detail the design goals of the testbed, discuss the signaling and system architecture, and show initial measured results for an uplink Massive MIMO over-the-air transmission from four single-antenna UEs to 100 BS antennas.
  •  
41.
  • Wang, Jian, 1982- (author)
  • Low Overhead Memory Subsystem Design for a Multicore Parallel DSP Processor
  • 2014
  • Doctoral thesis (other academic/artistic). Abstract:
    • The physical scaling following Moore’s law is saturated while the requirement on computing keeps growing. The gain from improving silicon technology is only the shrinking of the silicon area, and the speed-power scaling has almost stopped in the last two years. It calls for new parallel computing architectures and new parallel programming methods. Traditional ASIC (Application Specific Integrated Circuit) hardware has been used for acceleration of Digital Signal Processing (DSP) subsystems on SoC (System-on-Chip). Embedded systems become more complicated, and more functions, more applications, and more features must be integrated in one ASIC chip to keep up with market requirements. At the same time, the product lifetime of a SoC with ASIC has been much reduced because of the dynamic market. The lifetime of the design for a typical main chip in a mobile phone based on ASIC acceleration is about half a year and its NRE (Non-Recurring Engineering) cost can be much more than 50 million US$. The current situation calls for a solution other than ASIC. ASIP (Application Specific Instruction set Processor) offers comparable power consumption and silicon cost to ASICs. Its greatest advantage is the functional flexibility in a predefined application domain. ASIP based SoC enables software upgrading without changing hardware. Thus the product lifetime can be 5-10 times longer than that of ASIC based SoC. This dissertation presents an ASIP based SoC, a new unified parallel DSP subsystem named ePUMA (embedded Parallel DSP Platform with Unique Memory Access), to target embedded signal processing in communication and multimedia applications. The unified DSP subsystem can further reduce the hardware cost, especially the memory cost, of embedded SoC processors, and most importantly, provide full programmability for a wide range of DSP applications. The ePUMA processor is based on a master-slave heterogeneous multi-core architecture. One master core performs the central control, and multiple Single Instruction Multiple Data (SIMD) coprocessors work in parallel to offer a majority of the computing power. The focus and the main contribution of this thesis are on the memory subsystem design of ePUMA. The multi-core system uses a distributed memory architecture based on scratchpad memories and software controlled data movement. It is suitable for the data access properties of streaming applications and the kernel based multi-core computing model. The essential techniques include the conflict free access parallel memory architecture, the multi-layer interconnection network, the non-address stream data transfer, the transitioned memory buffers, and the lookup table based parallel memory addressing. The goal of the design is to minimize the hardware cost, simplify the software protocol for inter-processor communication, and increase the arithmetic computing efficiency. We have so far shown through applications that most DSP algorithms, such as filters, vector/matrix operations, transforms, and arithmetic functions, can achieve computing efficiency over 70% on the ePUMA platform. In addition, the non-address stream network provides equivalent communication bandwidth at less than 30% of the implementation cost of a crossbar interconnection.
  •  
42.
  • Wilhelmsson, Leif, et al. (author)
  • Analysis of a novel low complex SNR estimation technique for OFDM systems
  • 2011
  • In: IEEE Wireless Communications and Networking Conference (WCNC). - 9781612842547 ; , pp. 1646-1651
  • Conference paper (peer-reviewed). Abstract:
    • Signal-to-noise ratio (SNR) estimation is commonly used in wireless receivers to enhance the performance in different ways. In this paper a novel low complexity SNR estimator for OFDM is proposed. The estimator might be implemented using floating point representation or by using only the sign-bit, and can if desired be effectively implemented by reconfiguring the standard correlator used for time- and frequency estimation. Closed form expressions for the SNR estimate are derived for both the floating point implementation and the sign-bit implementation, and compared to simulation results both for an additive white Gaussian noise (AWGN) channel and for a frequency selective channel showing the feasibility of the proposed algorithm.
  •  
43.
  • Wilhelmsson, Leif, et al. (author)
  • Performance analysis of sign-based pre-FFT synchronization in OFDM systems
  • 2010
  • In: IEEE Vehicular Technology Conference. - 1550-2252. - 9781424425198
  • Conference paper (peer-reviewed). Abstract:
    • This paper examines the feasibility of using only the sign bit of the in-phase and quadrature components when estimating time and frequency in OFDM systems. Using only the sign bit is shown to result in a frequency dependent bias, which can be easily compensated. The approach is evaluated for LTE and DVB-H, where the estimation is performed using the cyclic prefix, and for WLAN 802.11g, where the estimation is done using the short training field (STF). The performance is compared to a floating point implementation, and it is also compared to what is believed to be reasonable requirements for initial time and frequency estimation.
  •  
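As a rough illustration of the sign-bit idea in the abstract above, the sketch below correlates sign-quantized samples that lie one FFT length apart and sums over a window of cyclic-prefix length. It is not the authors' estimator. Several things are assumptions made here: the signal is noiseless, random Gaussian samples stand in for OFDM time-domain samples, and all names and parameter values are hypothetical. The frequency-dependent bias mentioned in the abstract is not modeled.

```python
import random


def sign_quantize(x):
    """Keep only the sign bits of the in-phase and quadrature parts."""
    return complex(1 if x.real >= 0 else -1, 1 if x.imag >= 0 else -1)


def sign_cp_metric(rx, n_fft, cp_len):
    """Magnitude of the sign-bit cyclic-prefix correlation per offset."""
    s = [sign_quantize(x) for x in rx]
    metric = []
    for start in range(len(s) - n_fft - cp_len + 1):
        acc = sum(s[n] * s[n + n_fft].conjugate()
                  for n in range(start, start + cp_len))
        metric.append(abs(acc))
    return metric


def gaussian_samples(count):
    return [complex(random.gauss(0, 1), random.gauss(0, 1))
            for _ in range(count)]


# Hypothetical toy frame: filler, cyclic prefix, symbol body, filler.
random.seed(0)
N_FFT, CP_LEN, TRUE_OFFSET = 64, 16, 23
body = gaussian_samples(N_FFT)
rx = gaussian_samples(TRUE_OFFSET) + body[-CP_LEN:] + body + gaussian_samples(40)

metric = sign_cp_metric(rx, N_FFT, CP_LEN)
```

At the true symbol start every product in the window equals 2, so the metric reaches its upper bound of 2 * CP_LEN there. Neighbouring offsets can in principle tie with that value, which is one reason a practical estimator would average over several symbols.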
44.
  • Zhang, Chenxin, et al. (author)
  • A Highly Parallelized MIMO Detector for Vector-Based Reconfigurable Architectures
  • 2013
  • In: [Host publication title missing]. ; , pp. 3844-3849
  • Conference paper (peer-reviewed). Abstract:
    • This paper presents a highly parallelized MIMO signal detection algorithm targeting vector-based reconfigurable architectures. The detector achieves high data-level parallelism and near-ML performance by adopting a vector-architecture-friendly technique - parallel node perturbation. To further reduce the computational complexity, imbalanced node and successive partial node expansion schemes in conjunction with sorted QR decomposition are applied. The effectiveness of the proposed algorithm is evaluated by simulations performed on a simplified 4x4 MIMO LTE-A testbed and operation analysis. Compared to the K-Best detector and fixed-complexity sphere decoder (FSD), the number of visited nodes in the proposed algorithm is reduced by 15 and 1.9 times respectively, with less than 1dB performance degradation. Benefiting from the fully deterministic non-iterative dataflow structure, reconfiguration rate is 95% less than that of the K-Best detector and 17% less than the case of FSD.
  •  
45.
  • Zhang, Chenxin, et al. (author)
  • Energy Efficient MIMO Channel Pre-processor Using a Low Complexity On-Line Update Scheme
  • 2012
  • In: [Host publication title missing].
  • Conference paper (peer-reviewed). Abstract:
    • This paper presents a low-complexity energy efficient channel pre-processing update scheme, targeting the emerging 3GPP long term evolution advanced (LTE-A) downlink. Upon channel matrix renewals, the number of explicit QR decompositions (QRD) and channel matrix inversions are reduced since only the upper triangular matrices R and R^-1 are updated, based on an on-line update decision mechanism. The proposed channel pre-processing updater has been designed as a dedicated unit in a 65nm CMOS technology, resulting in a core area of 0.242mm2 (equivalent gate count of 116K). Running at a 330MHz clock, each QRD or R^-1 update consumes 4 or 2 times less energy compared to one exact state-of-the-art QRD in open literature.
  •  
46.
  • Zhang, Chenxin, et al. (author)
  • Energy Efficient SQRD Processor for LTE-A using a Group-sort Update Scheme
  • 2014
  • In: [Host publication title missing]. - 2158-1525 .- 0271-4310. - 9781479934317 ; , pp. 193-196
  • Conference paper (peer-reviewed). Abstract:
    • This paper presents an energy-efficient sorted QR decomposition (SQRD) processor for 3GPP LTE-Advanced (LTE-A) systems. The processor adopts a hybrid decomposition scheme to reduce computational complexity and provides a wide-range of performance complexity trade-offs. Based on the energy distribution of spatial channels, it switches between the brute-force SQRD and a low-complexity group-sort QR-update strategy, which is proposed in this work to effectively utilize the LTE-A pilot pattern. As a proof of concept, a run-time reconfigurable vector processor is developed to efficiently implement this adaptive-switching QR decomposition algorithm. In a 65nm CMOS technology, the proposed SQRD processor occupies 0.71 mm2 core area and has a throughput of up to 100MQRD/s. Compared to the brute-force approach, an energy reduction of 5~33% is achieved.
  •  
47.
  • Zhang, Chenxin, et al. (author)
  • Mapping Channel Estimation and MIMO Detection in LTE-Advanced on a Reconfigurable Cell Array
  • 2012
  • In: [Host publication title missing]. - 2158-1525 .- 0271-4310. ; , pp. 1799-1802
  • Conference paper (peer-reviewed). Abstract:
    • This paper presents a flexible architecture suitable for performing both channel estimation and signal detection in a MIMO-OFDM downlink. Extensive hardware sharing between two tasks is achieved by algorithm and architecture co-design, where robust MMSE sliding window channel estimation and MMSE-based signal detection with symbol perturbation scheme are adopted. The proposed architecture is based on a coarse-grained reconfigurable cell array with fast context switching capabilities. High flexibility is provided by the architecture, which allows task-level resource sharing and dynamic adoption of different algorithms onto the same platform. Simulation and analysis results have confirmed the efficiency of the proposed design solution, where more than 75% hardware resources are reused between the adopted algorithms.
  •  
48.
  • Zhang, Chenxin, et al. (author)
  • Reconfigurable cell array as enabler for supporting concurrent multiple standards in mobile terminals
  • 2010
  • In: 10th Swedish System-On-Chip Conference.
  • Conference paper (other academic/artistic). Abstract:
    • This manuscript presents a reconfigurable architecture suitable for time synchronization in multiple OFDM standards. The proposed architecture is based on a coarse-grained reconfigurable cell array, and the different radio standards under analysis are IEEE 802.11n, 3GPP Long Term Evolution and Digital Video Broadcast for cellular devices. With the use of a 2-by-2 cell array, composed of two decoupled processing and memory pairs, two concurrent data streams from any two of the three radio standards are supported. Dynamic configuration of the cell array enables run-time switching between different standards, and the underlying hardware resources are shared when concurrent streams are processed. The enhanced RISC architecture of the processor cells contributes to a high instruction level parallelism, where the close interactions between processing and memory cells meet the stringent real-time processing requirement. The proposed 2-by-2 cell array is synthesized using a 65nm low-power regular threshold standard cell CMOS library, occupies 0.338mm2 area, and has a maximum clock frequency of 534MHz. The reconfigurable cell array offers high flexibility while using 1.83 times more area than a functionally identical ASIC solution.
  •  
49.
  • Zhang, Chenxin, et al. (author)
  • Reconfigurable cell array for concurrent support of multiple radio standards by flexible mapping
  • 2011
  • In: IEEE International Symposium on Circuits and Systems. - 0271-4310 .- 2158-1525. - 9781424494736 ; , pp. 1696-1699
  • Conference paper (peer-reviewed). Abstract:
    • This paper presents a flexible architecture suitable for concurrent processing of multiple radio standards. The proposed architecture is based on a coarse-grained reconfigurable cell array, consisting of distinct processing and memory cells. Flexibility of the architecture is demonstrated by performing a coarse time synchronization and fractional frequency offset estimation for multiple OFDM standards. The radio standards under analysis are IEEE 802.11n, LTE, and DVB-H. The reconfigurable cell array, containing 2-by-2 cells, is capable of processing two concurrent data streams from the standards. Dynamic reconfigurability of the architecture enables run-time switching between the standards. The implemented 2-by-2 cell array is synthesized using a 65 nm low-leakage standard cell CMOS library, resulting in an area of 0.479mm2 and a maximum clock frequency of 534MHz. High flexibility offered by the reconfigurable cell array allows the adoption of different algorithms onto the same platform.
  •  