SwePub - search: L773:0271 4310 OR L773:2158 15...

Enumeration	Reference
1.	Al-Obaidi, Mohammed, et al. (author) Hardware Acceleration of the Robust Header Compression (RoHC) Algorithm 2013 In: 2013 IEEE International Symposium on Circuits and Systems (ISCAS). - 2158-1525 .- 0271-4310. - 9781467357623 - 9781467357609 ; , s. 293-296 Conference paper (peer-reviewed)abstract In LTE base-stations, RoHC is a processing-intensive algorithm that may limit the system from serving a large number of users when it is used to compress the VoIP packets of mobile traffic. In this paper, a hardware-software and a full-hardware solution are proposed to accelerate the RoHC compression algorithm in LTE base-stations and enhance the system throughput and capacity. Results for both solutions are discussed and compared with respect to design metrics like throughput, capacity, power consumption, and hardware resources. This comparison is instrumental in taking architectural level trade-off decisions in order to meet the present-day requirements and also be ready to support a future evolution. In terms of throughput, a gain of 20% (6250 packets/sec) is achieved in the HW-SW implementation by accelerating the Cyclic Redundancy Check (CRC) and the Least Significant Bit (LSB) encoding in hardware. The full-HW implementation leads to a throughput of 45 times (244000 packets/sec) compared to the SW-Only implementation. The full-HW solution consumes more Adaptive Look-Up Tables (7477 ALUTs) compared to the HW-SW solution (2614 ALUTs) when synthesized on Altera’s Arria II GX FPGA.
2.	Berkeman, Anders, et al. (author) A configurable divider using digit recurrence 2003 In: Proceedings - IEEE International Symposium on Circuits and Systems. - 2158-1525 .- 0271-4310. ; 5, s. 333-336 Conference paper (peer-reviewed)abstract The division operation is essential in many digital signal processing algorithms. For a hardware implementation, the requirements and constraints on the divider circuit differ significantly with different applications. Therefore, it is not possible to design one divider component having optimal performance and cost for all target applications. Instead, the presented divider has a modular architecture, based on instantiation of small efficient divider sub-blocks. The configuration of the divider architecture is set by a number of parameters controlling wordlength, number of quotient bits, number of clock cycles per operation, and fixed or floating point operation. Digit recurrence algorithms with carry save arithmetic and on-the-fly two's complement output quotient conversion are used to make the sub-blocks small, fast and power efficient. The modularity gives the designer freedom to elaborate different parameters to explore the design space. Two applications using the proposed divider are presented. Furthermore, an example divider circuit has been fabricated and performance measurements are included.
3.	Chen, Cheng, et al. (author) A 10-bit 500-MS/s 124-mW subranging folding ADC in 0.13 μm CMOS 2007 In: Proceedings - IEEE International Symposium on Circuits and Systems. - 0271-4310 .- 2158-1525. ; , s. 1709-1712 Conference paper (peer-reviewed)abstract A 10-bit two-step subranging folding analog-to-digital converter (ADC) that converts signal at 500 MSample/s is presented. Using dual-channel preprocessing blocks with distributed sample-and-hold circuits and two-stage amplifiers in which auto-zero calibration technique is employed, the proposed 10-bit ADC has a wide input bandwidth (>250MHz). The ADC consumes 124mW from a 1.2V power supply. The performance is verified by Spectre simulation in a digital 0.13μm CMOS process. The chip occupies an active area of 0.54 mm². © 2007 IEEE.
4.	Durkalec, Laurent, et al. (author) Properties of RF bandpass amplifier topology with Q-enhancing 2002 In: Proceedings - IEEE International Symposium on Circuits and Systems. - 0271-4310 .- 2158-1525. ; 1, s. 529-532 Conference paper (peer-reviewed)abstract This paper describes a bandpass amplifier topology for the GHz range using Q-enhancing that allows a systematic design approach to be used to control linearity, noise and power consumption. Frequency selectivity is achieved by a negative feedback network using an LC tank and a positive feedback network using a resistor to achieve Q-enhancing. The feedback networks are fully passive in order to minimize noise and distortion.
5.	Hedberg, Hugo, et al. (author) Implementation of a labeling algorithm based on contour tracing with feature extraction 2007 In: [Host publication title missing]. - 2158-1525 .- 0271-4310. ; , s. 1101-1104 Conference paper (peer-reviewed)abstract This paper describes an architecture of a connected-cluster labeling algorithm for binary images based on contour tracing with feature extraction. The implementation is intended as a hardware accelerator in a self contained real-time digital surveillance system. The algorithm has lower memory requirements compared to other labeling techniques and can guarantee labeling of a predefined number of clusters independent of their shape. In addition, features especially important in this particular application are extracted during the contour tracing with little increase in hardware complexity. The implementation is verified on an FPGA in an embedded system environment with an image resolution of 320 × 240 at a frame rate of 25 fps. The implementation supports labeling of 61 independent clusters, extracting their location, size and center of gravity. © 2007 IEEE.
6.	Jiang, Hongtu, et al. (author) FPGA implementation of controller-datapath pair in custom image processor design 2004 In: Proceedings of the 2004 International Symposium on Circuits and Systems. - 2158-1525 .- 0271-4310. ; 5, s. 141-144 Conference paper (peer-reviewed)abstract In order to reduce the effort of the controller design in the customized image convolution processor, a controller synthesis tool is developed based on [9] to support the design flow from a system or algorithm specification to RTL level VHDL. Architecture extensions to basic FSMs structures are implemented with the purpose of optimizing controller design for area and power consumption. Together with controller implementation, a custom datapath architecture with three level memory hierarchies is developed aiming at a real-time power efficient image processing solution with low I/O bandwidth requirements. The complete design is prototyped on Xilinx Virtex 2 platform with comparable performance with that of TI C64x processor at only 2/15 of its clock frequency.
7.	Kristensen, Fredrik, et al. (author) Real-time extraction of maximally stable extremal regions on an FPGA 2007 In: Proceedings - IEEE International Symposium on Circuits and Systems. - 2158-1525 .- 0271-4310. ; , s. 165-168 Conference paper (peer-reviewed)abstract This paper describes the implementation of a real-time Maximally Stable Extremal Region (MSER) detector. In order to reach real-time performance, both algorithmic and memory issues have been addressed. The Union-find algorithm, which is the heart of the MSER detector, is extended to create linked regions that significantly decrease the time to extract MSERs. Hash indexed memory structures are used to locate stored regions fast while keeping the amount of stored data low. The design is verified by including it in a demonstrator circuit. Timing and memory requirements are presented for the demonstrator and as a function of image resolution. © 2007 IEEE.
8.	Lenart, Thomas, et al. (author) A 2048 complex point FFT processor using a novel data scaling approach 2003 In: Proceedings - IEEE International Symposium on Circuits and Systems. - 2158-1525 .- 0271-4310. ; 4, s. 45-48 Conference paper (peer-reviewed)abstract In this paper, a novel data scaling method for pipelined FFT processors is proposed. By using data scaling, the FFT processor can operate on a wide range of input signals without performance loss. Compared to existing block scaling methods, like implementations of Convergent Block Floating Point (CBFP), the memory requirements can be reduced while preserving the SNR. The FFT processor has been synthesized and sent for fabrication in a 0.35μm standard CMOS technology. In netlist simulations, the FFT processor is capable of calculating a 2048 complex point FFT or IFFT in 27μs with a maximum clock frequency of 76MHz.
9.	Liu, Liang, et al. (author) A unified multi-mode MIMO detector with soft-output 2012 In: [Host publication title missing]. - 2158-1525 .- 0271-4310. Conference paper (peer-reviewed)abstract This paper presents an area/energy efficient soft-output MIMO detector that supports the detection of spatial-multiplexing (SM), spatial-diversity (SD), and space-division-multiple-access (SDMA) signals. The developed near-optimal detection algorithms for these three modes share most of the mathematical operations to enable extensive hardware reuse. A unified VLSI architecture is accordingly designed to be reconfigured to different modes. The detector was implemented using a 65-nm CMOS technology with 0.25 mm² core area, representing a 70% hardware-resource saving to state-of-the-art detectors. Operating at 167-MHz clock frequency with 1.2-V supply, the detector achieves 1 Gb/s peak throughput and only consumes 59.3 pJ/b energy.
10.	Liu, Liang (author) High-Throughput Hardware-Efficient Soft-Input Soft-Output MIMO Detector for Iterative Receivers 2013 In: [Host publication title missing]. - 0271-4310 .- 2158-1525. ; , s. 2151-2154 Conference paper (peer-reviewed)
11.	Liu, Xiaodong, et al. (author) An 11mW Continuous Time Delta-Sigma Modulator with 20 MHz Bandwidth in 65nm CMOS 2014 In: 2014 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS). - 0271-4310 .- 2158-1525. ; , s. 2337-2340 Conference paper (peer-reviewed)abstract This paper presents a multi-bit, continuous time delta-sigma modulator with 20 MHz bandwidth implemented in 65nm CMOS for cellular communication. The modulator features a third order, single loop filter and a 4-bit internal quantizer operating at 640 MHz. The DACs are resistive for lower thermal noise compared to the current-steering DACs and nonreturn-to-zero DAC pulse is used to reduce the clock jitter sensitivity. The measured prototype consumes 11mW from a 1.2 V power supply, and achieves an SNDR/SFDR of 63.5dB/76dB.
12.	Lixin, Yang, et al. (author) An arbitrarily skewable multiphase clock generator combining direct interpolation with phase error average 2003 In: Proceedings - IEEE International Symposium on Circuits and Systems. - 2158-1525 .- 0271-4310. ; 1, s. 645-648 Conference paper (peer-reviewed)abstract A multiphase clock generator based on direct phase interpolation is presented. No feedback loop is required. A simple phase interpolation architecture is proposed, in which the two phase-adjacent signals are interpolated by using a series of resistors via inverters' discharging or charging slopes to generate multiphase outputs in a single stage. A phase error averaging circuit is used to correct interphase errors. The multiphase clock generator has been fabricated in a standard 0.35 μm, 3.3 V CMOS process. The measured performance shows it can operate at the input clock frequencies from 300 MHz to 600 MHz and has the rms jitter of 6 ps at 500 MHz.
13.	Lu, Ping, et al. (author) A 1-1 MASH 2-D Vernier Time-to-Digital Converter with 2nd-order noise shaping 2014 In: [Host publication title missing]. - 0271-4310 .- 2158-1525. - 9781479934317 ; , s. 1324-1327 Conference paper (peer-reviewed)abstract We use a 2-dimensional (2-D) Vernier gated-ring-oscillator (GRO) time-to-digital-converter (TDC) in a cascade structure (MASH), so that a larger raw quantization step can be allowed without sacrificing the final resolution performance. The 2-D approach effectively reduces the latency time under a large input, while the MASH structure provides a 2nd-order noise shaping that produces a lower in-band quantization noise than in a single-stage GRO TDC. The TDC is simulated in a 65nm CMOS process. With the oversampling ratio (OSR) of 20, an equivalent TDC resolution of 2.48ps is achieved under the raw (Vernier) resolution of 56ps. The latency is less than 1/10 of a Vernier TDC’s for a 3ns input signal.
14.	Lu, Ping, et al. (author) A 90nm CMOS Digital PLL Based on Vernier-Gated-Ring-Oscillator Time-to-Digital Converter 2012 In: [Host publication title missing]. - 2158-1525 .- 0271-4310. ; , s. 2593-2596 Conference paper (peer-reviewed)abstract This paper presents the design of a digital PLL which uses a high resolution Gated-Ring-Oscillator-Based Vernier Time-to-Digital Converter (TDC) for low noise RF application. The TDC uses two gated ring oscillators (GRO) acting as the delay lines in an improved Vernier TDC. The already small quantization noise of the standard Vernier TDC is further first-order shaped by the GRO operation. Additionally, an automatic tuning bank controller selects the active bank of the digitally controlled oscillator (DCO), which features three separate tuning banks. The equivalent in-band phase noise at 2.7GHz is -110dBc/Hz with a reference clock of 25MHz. The digital PLL is simulated in a 90nm CMOS process, indicating a current consumption of 21mA from a 1.2V supply.
15.	Meraji, Reza, et al. (author) An Analog (7,5) Convolutional Decoder in 65 nm CMOS for Low Power Wireless Applications 2011 In: [Host publication title missing]. - 2158-1525 .- 0271-4310. - 9781424494736 ; , s. 2881-2884 Conference paper (peer-reviewed)abstract A complete architecture with transistor level simulation is presented for a low power analog convolutional decoder in 65 nm CMOS. The decoder core operates in the weak inversion (sub-VT) and realizes the BCJR decoding algorithm corresponding to the 4-state tail-biting trellis of a (7,5) convolutional code. The complete decoder also incorporates serial I/O digital interfaces and current mode differential DACs. The simulated bit error rate is presented to illustrate the coding gain compared to an uncoded system. Our results show that a low power, high throughput convolutional decoder up to 1.25 Mb/s can be implemented using analog circuitry with a total power consumption of 84 μW. For low rate applications the decoder consumes only 47 μW at a throughput of 250 kb/s.
16.	Meraji, Reza, et al. (author) Analog and Digital Approaches for an Energy Efficient Low Complexity Channel Decoder 2013 In: 2013 IEEE International Symposium on Circuits and Systems, ISCAS 2013. - 0271-4310 .- 2158-1525. ; , s. 1564-1567 Conference paper (peer-reviewed)
17.	Mohammadi, Babak, et al. (author) A 65 nm Single Stage 28 fJ/cycle 0.12 to 1.2V Level-Shifter 2014 In: 2014 IEEE International Symposium on Circuits and Systems (ISCAS). - 2158-1525 .- 0271-4310. ; , s. 990-993 Conference paper (peer-reviewed)abstract A conventional level-shifter is modified to extend the operation range down to subthreshold regime. Leakage current is reduced by utilizing transistor stacking, channel stretching, and reverse body biasing. The design has a standard-cell compliant layout and is fully integrated in a conventional digital design flow. The level-shifter is manufactured in 65 nm CMOS, and functionality is verified by measurements. The proposed design is capable of converting 0.12 to 1.2 V in a single stage, and has a static power consumption of 640 pW at a 0.12 to 1 V conversion. The minimum energy/cycle of 28 fJ/cycle with a conversion speed of 72 MHz was observed at 0.3 to 1 V conversion.
18.	Olsson, Thomas, et al. (author) A digitally controlled PLL for digital SOCs 2003 In: Proceedings - IEEE International Symposium on Circuits and Systems. - 2158-1525 .- 0271-4310. ; 5, s. 437-440 Conference paper (peer-reviewed)abstract A fully integrated digitally controlled PLL used as a clock multiplying circuit is designed and fabricated. The PLL has no off-chip components and it is made from standard cells found in most digital standard cell libraries. It is therefore portable between processes as an IP-block. Using a 0.35 μm standard CMOS process and a 3.0 V supply, the PLL has a frequency range of 152 MHz to 366 MHz and occupies an on-chip area of about 0.07 mm². In addition, the next version of this all-digital PLL is described in synthesizable VHDL-code, which simplifies digital system simulation and change of process. A new time-to-digital converter with simulated resolution of 250 ps is made for the next PLL.
19.	Olsson, Thomas, et al. (author) A reconfigurable OFDM inner receiver implemented in the CAL dataflow language 2010 In: IEEE International Symposium on Circuits and Systems. - 0271-4310 .- 2158-1525. - 9781424453092 ; , s. 2904-2907 Conference paper (peer-reviewed)abstract This paper presents a reconfigurable inner receiver for the LTE, DVB-H, and IEEE802.11n (WLAN) radio systems, all of which are based on orthogonal frequency division multiplexing (OFDM). The receiver is implemented in the CAL language. An FPGA-based hardware implementation is synthesized from RTL generated from the CAL description. The purpose of our work is to investigate the feasibility of dataflow methodology for high-level description of digital radio transceivers.
20.	Prabhu, Hemanth, et al. (author) Hardware Efficient Approximative Matrix Inversion for Linear Pre-Coding in Massive MIMO 2014 In: [Host publication title missing]. - 0271-4310 .- 2158-1525. ; , s. 1700-1703 Conference paper (peer-reviewed)abstract This paper describes a hardware efficient linear pre-coder for Massive MIMO Base Stations (BSs) comprising a very large number of antennas, say, in the order of 100s, serving multiple users simultaneously. To avoid hardware demanding direct matrix inversions required for the Zero-Forcing (ZF) pre-coder, we use low complexity Neumann series based approximations. Furthermore, we propose a method to speed up the convergence of the Neumann series by using tri-diagonal pre-condition matrices, which lowers the complexity even further. As a proof of concept a flexible VLSI architecture is presented with an implementation supporting matrix inversion of sizes up to 16 × 16. In 65 nm CMOS, a throughput of 0.5M matrix inversions per sec is achieved at clock frequency of 420 MHz with a 104K gate count.
21.	Rodrigues, Joachim, et al. (author) A wavelet based R-wave detector for cardiac pacemakers in 0.35 CMOS technology 2004 In: [Host publication title missing]. - 2158-1525 .- 0271-4310. ; 4, s. 13-16 Conference paper (peer-reviewed)abstract This paper presents a wavelet based event detector for cardiac pacemakers implemented in 0.35μm CMOS technology. The architecture is optimized by wordlength and strength reduction resulting in a total chip area of 2.2 mm². Detector performance is studied by means of databases containing electrograms as well as different types of noise and interferences, which are added to the signals. The results show that reliable detection can be obtained for moderate to high noise levels, whereas the architecture of the implemented detector is optimized to meet low power constraints implemented in digital hardware.
22.	Sherazi, Syed Muhammad Yasser, et al. (author) Design exploration of a 65 nm Sub-VT CMOS digital decimation filter chain 2011 In: 2011 IEEE International Symposium on Circuits and Systems (ISCAS). - 2158-1525 .- 0271-4310. ; , s. 837-840 Conference paper (peer-reviewed)abstract This paper presents an analysis on energy dissipation of digital half-band filters operating in the sub-threshold (sub-VT) region with throughput and supply voltage constraints. A 12-bit filter is implemented along with various unfolded structures, used to form a decimation filter chain. The designs are synthesized in a 65 nm low-leakage CMOS technology with various threshold voltages. A sub-VT energy model is applied to characterize the designs in the sub-VT domain. The results show that the low-leakage standard-threshold technology is suitable for the required throughput range between 250Ksamples/s and 2Msamples/s, at a supply voltage of 260mV. The total energy dissipation of the filter is 205 fJ per sample.
23.	Strandberg, Roland, et al. (author) Bandwidth considerations for a CALLUM transmitter architecture 2002 In: Proceedings - IEEE International Symposium on Circuits and Systems. - 2158-1525 .- 0271-4310. ; 4, s. 25-28 Conference paper (peer-reviewed)abstract This article presents an investigation of the combined analog locked loop universal modulator (CALLUM) linear transmitter architecture. A simple approximate formula is derived, linking the instantaneous frequency of the CALLUM signals to the modulation depth of the information signal. Further, the effect of the frequency mismatch between the VCOs present in the CALLUM architecture is examined, and the need for frequency synchronization via a phase-locked loop is discussed.
24.	Svensson, Henrik, et al. (author) Accelerating vector operations by utilizing reconfigurable coprocessor architectures 2007 In: [Host publication title missing]. - 0271-4310 .- 2158-1525. ; , s. 3972-3975 Conference paper (peer-reviewed)abstract To enhance performance of digital signal processing tasks while keeping the flexibility of programmable solutions is a clear motivation for coprocessors implemented as reconfigurable hardware blocks. This paper investigates the applicability of such coprocessors targeting digital signal processing multi-media applications, initially in the field of speech and audio. A tightly coupled coprocessor architecture with reconfigurable datapath and a local memory system is presented. The coprocessor interacts with the main processor through asynchronous FIFOs. Three computational models that provide support for functionality of different granularities to be accelerated are investigated. A speedup in the range of 2 to 46 compared to processor execution is achieved for vector operations and larger kernels such as autocorrelation, block filtering and Fast Fourier Transform. © 2007 IEEE.
25.	Xu, Gang, et al. (author) A differential difference comparator for multi-step A/D converters 2003 In: Proceedings - IEEE International Symposium on Circuits and Systems. - 2158-1525 .- 0271-4310. ; 1, s. 257-260 Conference paper (peer-reviewed)abstract The proposed Differential Difference Comparator (DDC) provides easy linear voltage summing/subtraction and comparison functions via current operation. The speed of this unconventional comparator is drastically improved since no feedback loop or coupling capacitors are needed to maintain the linearity. The linear input range is also enlarged by the common mode compression. The principle, design considerations and simulation results are presented based on a comparator used in an 8 bit 2-step A/D converter over 100 MS/s.
26.	Ye, Dawei, et al. (author) A Wide Bandwidth Fractional-N Synthesizer for LTE with Phase Noise Cancellation Using a Hybrid-ΔΣ-DAC and Charge Re-timing 2013 In: [Host publication title missing]. - 0271-4310 .- 2158-1525. ; , s. 169-172 Conference paper (peer-reviewed)abstract This paper presents a 1MHz bandwidth, ΔΣ fractional-N PLL as the frequency synthesizer for LTE. A noise cancellation path composed of a novel hybrid ΔΣ DAC with 9 output bits is incorporated into the PLL in order to cancel the out-of-band phase noise caused by the quantization error. Further, a re-timing circuit is proposed to reduce the nonlinearity in the Charge Pump and provide pulse shaping signals to decrease the charge mismatch. Therefore, a wide loop bandwidth can be obtained while keeping reasonable performance of out-of-band phase noise. The proposed synthesizer is simulated in 90nm CMOS process, consuming 20.96mA from a 1 V supply.
27.	Yijun, Zhou, et al. (author) A direct digital RF amplitude modulator 2002 In: Proceedings - IEEE International Symposium on Circuits and Systems. - 2158-1525 .- 0271-4310. ; 1, s. 141-144 Conference paper (peer-reviewed)abstract This paper describes a direct digital RF amplitude modulator, which uses a 10-bit linear interpolation current steering digital to analog converter (DAC) and a Gilbert cell mixer to generate an RF amplitude modulated signal directly. The linear interpolation increases the attenuation of the DAC's image components. The low pass filter (LPF) is eliminated, and the RF transmitter structure can be simplified. This modulator is suitable for realizing the system-on-chip design. The chip has been fabricated in a 0.35μm, 3.3V digital CMOS process.
28.	Zhang, Chenxin, et al. (author) Energy Efficient SQRD Processor for LTE-A using a Group-sort Update Scheme 2014 In: [Host publication title missing]. - 2158-1525 .- 0271-4310. - 9781479934317 ; , s. 193-196 Conference paper (peer-reviewed)abstract This paper presents an energy-efficient sorted QR decomposition (SQRD) processor for 3GPP LTE-Advanced (LTE-A) systems. The processor adopts a hybrid decomposition scheme to reduce computational complexity and provides a wide range of performance-complexity trade-offs. Based on the energy distribution of spatial channels, it switches between the brute-force SQRD and a low-complexity group-sort QR-update strategy, which is proposed in this work to effectively utilize the LTE-A pilot pattern. As a proof of concept, a run-time reconfigurable vector processor is developed to efficiently implement this adaptive-switching QR decomposition algorithm. In a 65nm CMOS technology, the proposed SQRD processor occupies 0.71 mm² core area and has a throughput of up to 100MQRD/s. Compared to the brute-force approach, an energy reduction of 5~33% is achieved.
29.	Zhang, Chenxin, et al. (author) Mapping Channel Estimation and MIMO Detection in LTE-Advanced on a Reconfigurable Cell Array 2012 In: [Host publication title missing]. - 2158-1525 .- 0271-4310. ; , s. 1799-1802 Conference paper (peer-reviewed)abstract This paper presents a flexible architecture suitable for performing both channel estimation and signal detection in a MIMO-OFDM downlink. Extensive hardware sharing between two tasks is achieved by algorithm and architecture co-design, where robust MMSE sliding window channel estimation and MMSE-based signal detection with symbol perturbation scheme are adopted. The proposed architecture is based on a coarse-grained reconfigurable cell array with fast context switching capabilities. High flexibility is provided by the architecture, which allows task-level resource sharing and dynamic adoption of different algorithms onto the same platform. Simulation and analysis results have confirmed the efficiency of the proposed design solution, where more than 75% hardware resources are reused between the adopted algorithms.
30.	Zhang, Chenxin, et al. (author) Reconfigurable cell array for concurrent support of multiple radio standards by flexible mapping 2011 In: IEEE International Symposium on Circuits and Systems. - 0271-4310 .- 2158-1525. - 9781424494736 ; , s. 1696-1699 Conference paper (peer-reviewed)abstract This paper presents a flexible architecture suitable for concurrent processing of multiple radio standards. The proposed architecture is based on a coarse-grained reconfigurable cell array, consisting of distinct processing and memory cells. Flexibility of the architecture is demonstrated by performing a coarse time synchronization and fractional frequency offset estimation for multiple OFDM standards. The radio standards under analysis are IEEE 802.11n, LTE, and DVB-H. The reconfigurable cell array, containing 2-by-2 cells, is capable of processing two concurrent data streams from the standards. Dynamic reconfigurability of the architecture enables run-time switching between the standards. The implemented 2-by-2 cell array is synthesized using a 65 nm low-leakage standard cell CMOS library, resulting in an area of 0.479 mm² and a maximum clock frequency of 534MHz. High flexibility offered by the reconfigurable cell array allows the adoption of different algorithms onto the same platform.
31.	Attari, Mohammad, et al. (author) An application specific vector processor for CNN-based massive MIMO positioning 2021 In: 2021 IEEE International Symposium on Circuits and Systems, ISCAS 2021 - Proceedings. - 0271-4310. - 9781728192017 ; 2021-May Conference paper (peer-reviewed)abstract This paper sets out to create an implementation for fingerprint-based positioning using massive multiple-input multiple-output (MIMO) technology, by means of deep convolutional neural networks (CNN), and utilizing the wireless channel state information (CSI). Due to the sheer volume of computational requirements imposed by CNN processing, an accelerator-assisted design is well-suited to the task at hand. Consequently, an application specific instruction set processor (ASIP) is designed to combine flexibility with implementation efficiency. This ASIP is equipped with vector processing capabilities employing a single instruction multiple data (SIMD) scheme, and additionally has a very large instruction word (VLIW) architecture to further exploit instruction-level parallelism. A configurable 2D array of processing engines (PE) is integrated into the processor, in a tightly coupled manner, to accelerate the CNN operation. Synthesis results will be demonstrated using the GF-22 nm FD-SOI technology with a clock frequency of 555 MHz. The system can achieve a throughput of 271 positionings/s, with an average positioning error of 3.5 λ (40 cm) at a carrier frequency of 2.6 GHz.
32.	Castaneda, Oscar, et al. (author) VLSI Design of a 3-bit Constant-Modulus Precoder for Massive MU-MIMO 2018 In: Proceedings - IEEE International Symposium on Circuits and Systems. - 0271-4310. ; 2018-May Conference paper (peer-reviewed)abstract Fifth-generation (5G) cellular systems will build on massive multi-user (MU) multiple-input multiple-output (MIMO) technology to attain high spectral efficiency. However, having hundreds of antennas and radio-frequency (RF) chains at the base station (BS) entails prohibitively high hardware costs and power consumption. This paper proposes a novel nonlinear precoding algorithm for the massive MU-MIMO downlink in which each RF chain contains an 8-phase (3-bit) constant-modulus transmitter, enabling the use of low-cost and power-efficient analog hardware. We present a high-throughput VLSI architecture and show implementation results on a Xilinx Virtex-7 FPGA. Compared to a recently-reported nonlinear precoder for BS designs that use two 1-bit digital-to-analog converters per RF chain, our design enables up to 3.75 dB transmit power reduction at no more than a 2.7x increase in FPGA resources.
33.	Chen, Peng, et al. (author) Analysis and design of an 1-20 GHz track and hold circuit 2021 In: 2021 IEEE International Symposium on Circuits and Systems, ISCAS 2021 - Proceedings. - 0271-4310. - 9781728192017 ; 2021-May Conference paper (peer-reviewed)abstract This work analyzes the nonlinear effects in the track and hold circuit applied in high-speed ADCs or RF sampling receiver (RX) front-ends. Non-ideal effects inside the main sampling NMOS switch are studied. Parasitic varactor and sampling on-resistance modulation effects are analyzed through frequency domain Volterra series and the EKV MOS transistor model. Polynomial curve fitting is applied showing that the on-resistance modulation dominates. Finally, a novel bootstrap circuit is proposed with a fast settling time and high bootstrap voltage in a 22 nm FD-SOI CMOS technology, with its settling time analyzed using the Elmore delay model.
34.	Drazdziulis, Mindaugas, 1978, et al. (author) Evaluation of Power Cut-Off Techniques in the Presence of Gate Leakage 2004 In: 2004 IEEE International Symposium on Circuits and Systems - Proceedings; Vancouver, BC; Canada; 23 May 2004 through 26 May 2004. - 0271-4310. ; 2, s. II745-II748 Conference paper (peer-reviewed)abstract We consider gate leakage next to subthreshold leakage currents in power-saving techniques for future CMOS circuits. Two recently introduced power cut-off techniques are analyzed and compared with respect to the total leakage current using Berkeley PTM. The results show that the efficiency of techniques having logic circuits alternately connected to external supply and ground can drastically degrade when gate tunneling currents become significant.
35.	Eriksson, Henrik, 1974, et al. (author) Dynamic Pass-Transistor Dot Operators for Efficient Parallel-Prefix Adders 2004 In: International Symposium on Circuits and Systems (ISCAS), Vancouver, CANADA. MAY 23-26, 2004. - 0271-4310. ; 2, s. 461-464 Conference paper (peer-reviewed)abstract We employ a dynamic pass-transistor technique to drastically reduce the area requirement and power dissipation of the dot-operator cell in parallel-prefix adders. The technique is demonstrated in both 0.35 μm and 0.13 μm process technologies on a 64-bit Kogge-Stone carry tree. In a comparison with a corresponding domino implementation it is shown that the transistor count and the power dissipation can be reduced by as much as 25% and 50%, respectively. On top of the area and power reduction, the delay can also be significantly reduced by using NMOS precharge transistors, but this requires a clock signal with a higher voltage.
36.	Eriksson, Henrik, 1974, et al. (author) Glitch-Conscious Low-Power Design of Arithmetic Circuits 2004 In: 2004 IEEE International Symposium on Circuits and Systems - Proceedings; Vancouver, BC; Canada; 23 May 2004 through 26 May 2004. - 0271-4310. ; 2, s. II281-II284 Conference paper (peer-reviewed)abstract Glitches are common in arithmetic circuits, especially in large multipliers where they often represent the major part of transitions. With the aim to provide a judicious glitch-reduction strategy, we extract and study the relation between generated and propagated glitches for three different arithmetic blocks. We show that the number of propagated glitches is far bigger than those generated regardless of circuit type, supply voltage, and threshold voltage. In contrast to existing glitch-reduction strategies we propose to focus also on the glitch propagation mechanism. It is shown how the inverting property of adder cells can be harnessed to reduce propagation of glitches and thus the overall power dissipation.
37.	Ferreira, Lucas, et al. (author) Reconfigurable multi-access pattern vector memory for real-time orb feature extraction 2021 In: 2021 IEEE International Symposium on Circuits and Systems, ISCAS 2021 - Proceedings. - 0271-4310. - 9781728192017 ; 2021-May Conference paper (peer-reviewed)abstract This work presents an on-chip memory subsystem envisioned for real-time applications performing Oriented FAST and Rotated Brief (ORB) feature extraction for Simultaneous Localization and Mapping (SLAM) systems. For autonomous navigation of battery-powered devices, feature-based SLAM is a computationally frugal alternative to direct methods. This paper thoroughly analyses ORB multiple memory access patterns, exploring possible systematic parallelism and hardware-biased algorithmic enhancements, alleviating requirements on bandwidth and reducing redundant accesses. Enabling those, a suitable multi-bank parallel memory featuring run-time reconfigurable address generation, image allotment, and close-to-memory data-shuffling is proposed. As a case study, a 30 Frames-Per-Second (FPS) VGA-resolution ORB-capable 8-bank memory is evaluated using 22 FDX technology, running at 909 MHz, with a negligible area overhead of 0.3%, reducing operand accesses by 54-160× relative to Sudoku-like and scalar memories.
38.	Hausmair, Katharina, 1982, et al. (author) Modeling and Linearization of Multi-Antenna Transmitters Using Over-the-Air Measurements 2018 In: Proceedings - IEEE International Symposium on Circuits and Systems. - 0271-4310. ; 2018-May Conference paper (peer-reviewed)abstract In this paper, we present a technique to model and linearize a multi-antenna transmitter using only a small set of observation receivers that perform over-the-air measurements. We assume that the transmitter suffers from distortion due to power amplifier (PA) nonlinearities but not from crosstalk. By avoiding the use of an observation receiver in every transmitter branch, the hardware complexity and cost of multi-antenna transmitters can be reduced. First, equations are developed to extract PA models from observation receivers. Based on the extracted PA models, predistorters can then be identified for every transmit branch. We present simulation results that demonstrate that it is indeed possible to model and linearize a set of PAs using only one single observation receiver.
39.	Hesami, Sara, et al. (author) Single Digital Predistortion Technique for Phased Array Linearization 2019 In: Proceedings - IEEE International Symposium on Circuits and Systems. - 0271-4310. - 9781728103976 ; 2019-May Conference paper (peer-reviewed)abstract In this paper, we present a novel and effective linearization technique for nonlinear phased array antennas. For large phased arrays, linearization of the array using a single digital predistortion (DPD) is inevitable since one digital path is upconverted and feeds several RF transmission paths, each of which is connected to a power amplifier (PA) and an antenna element. However, a critical issue is that the PA characteristics can vary considerably within an array. Thus, linearizing individual PAs with one DPD is rather challenging. We formulate and solve an optimization problem that corresponds to jointly minimizing the maximum residuals between the input to the array and the output of individual PAs. We demonstrate that the proposed technique outperforms state-of-the-art linearization solutions while retaining the linear gain of the array.
40.	Ryman, Erik J, 1982, et al. (author) A SiGe 8-Channel Comparator for Application in a Synthetic Aperture Radiometer 2013 In: Proceedings - IEEE International Symposium on Circuits and Systems. - 0271-4310. - 9781467357609 ; , s. 845-848 Conference paper (peer-reviewed)abstract We present a high-speed low-power 8-channel comparator tailored for the application of sampling antenna signals in a cross-correlator system for space-borne synthetic aperture radiometer instruments. Features like clock return path, per-channel offset calibration and bias current tuning make the comparator adaptable and give the possibility to adjust the comparator for low power consumption, while keeping performance within the requirements of the cross-correlator system. The comparator has been implemented and fabricated in a 130-nm SiGe BiCMOS process. Measurements show that the comparator can perform sampling at a rate of 4.5 GS/s with a power consumption of 48 mW/channel or 1 GS/s with a power consumption of 17 mW/channel.
41.	Signell, Svante (author) Jittered uniform sampling - Examples 2005 In: IEEE International Symposium on Circuits and Systems. - NEW YORK, NY : IEEE. - 0271-4302 .- 2158-1525. ; , s. 988-991 Journal article (peer-reviewed)abstract In many communication systems jitter is a problem causing performance degradation. The jitter is present in both analogue parts and in the sampling process. In this contribution examples of calculated and simulated spectra are shown for uncorrelated Uniform and uncorrelated and correlated Gaussian jitter distributions. Closed form expressions are also given for selected correlated noise cases. In two companion contributions, [1] random input signals and [2] deterministic input signals perturbed by jitter are analysed and the mathematical foundation for the examples given here is laid. The analytical results show that the output signal spectrum consists of both discrete and continuous parts. For some applications, such as instruments for spectrum analysis, large jitter can be of great use utilizing the fact that discrete alias components can be completely removed with sufficiently large jitter.
42.	Sinaei, Sima, et al. (author) ELC-ECG : Efficient LSTM cell for ECG classification based on quantized architecture 2021 In: Proceedings - IEEE International Symposium on Circuits and Systems. - : Institute of Electrical and Electronics Engineers Inc.. - 9781728192017 ; May Conference paper (peer-reviewed)abstract Long Short-Term Memory (LSTM) is one of the most popular and effective Recurrent Neural Network (RNN) models used for sequence learning in applications such as ECG signal classification. Complex LSTMs could hardly be deployed on resource-limited bio-medical wearable devices due to the huge amount of computations and memory requirements. Binary LSTMs are introduced to cope with this problem. However, naive binarization leads to significant accuracy loss in ECG classification. In this paper, we propose an efficient LSTM cell along with a novel hardware architecture for ECG classification. By deploying 5-level binarized inputs and just 1-level binarization for weights, output, and in-memory cell activations, the delay of one LSTM cell operation is reduced 50x with about 0.004% accuracy loss in comparison with full precision design of ECG classification.
43.	Sinaei, Sima, et al. (author) MuBiNN : Multi-level binarized recurrent neural network for EEG signal classification 2020 In: Proceedings - IEEE International Symposium on Circuits and Systems. - : Institute of Electrical and Electronics Engineers Inc.. - 9781728133201 ; October Conference paper (peer-reviewed)abstract Recurrent Neural Networks (RNN) are widely used for learning sequences in applications such as EEG classification. Complex RNNs can hardly be deployed on wearable devices due to their computation and memory-intensive processing patterns. Generally, reduction in precision leads to much higher efficiency, and binarized RNNs are introduced as energy-efficient solutions. However, naive binarization methods lead to significant accuracy loss in EEG classification. In this paper, we propose a multi-level binarized LSTM, which significantly reduces computations while ensuring an accuracy pretty close to the full precision LSTM. Our method reduces the delay of the 3-bit LSTM cell operation 47× with less than 0.01% accuracy loss.
