SwePub
Result list for search "WFRF:(Prabhu Hemanth)"


  • Result 1-16 of 16
1.
  • Gangarajaiah, Rakesh, et al. (author)
  • A Cholesky decomposition based massive MIMO uplink detector with adaptive interpolation
  • 2017
  • In: IEEE International Symposium on Circuits and Systems: From Dreams to Innovation, ISCAS 2017 - Conference Proceedings. - 9781467368520
  • Conference paper (peer-reviewed)
    • An adaptive uplink detection scheme for a Massive MIMO (MaMi) base station serving up to 16 users is presented. Considering user distribution in a cell, selective matched filtering (MF) is proposed for non-interference-limited users and a Cholesky decomposition (CD) based zero-forcing (ZF) detector is implemented for the remaining users. Channel conditions such as coherence bandwidth are exploited to lower computational complexity by interpolating CD outputs. Performance evaluations on measured MaMi channels indicate a 60-fold reduction in computation count with less than 1 dB loss at an uncoded bit error rate of 10^-3. For the CD, a reconfigurable processor optimized for 8×8 matrices, with a block decomposition extension to support up to 16×16 matrices, is presented. Circuit-level optimizations in 28 nm FD-SOI resulted in an energy of 1.4 nJ/CD at 400 MHz, and post-layout simulations indicate a 50% reduction in power dissipation when operating with the proposed interpolation-based detection scheme compared to traditional ZF detection.
  •  
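The CD-based ZF detection described in the abstract above can be illustrated with a minimal Python sketch. It solves the normal equations (H^T H) x = H^T y through a Cholesky factorization followed by forward and backward substitution. The function names (`cholesky`, `solve_cholesky`, `zf_detect`) and the small real-valued test channel are hypothetical, chosen only for illustration; the paper's selective MF, interpolation, and block decomposition are not modeled, and a complex-valued channel would need conjugate transposes.

```python
import math

def cholesky(a):
    """Return lower-triangular L with L * L^T = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def solve_cholesky(low, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(low)
    z = [0.0] * n
    for i in range(n):  # forward: L z = b
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # backward: L^T x = z
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x

def zf_detect(h, y):
    """Zero-forcing estimate for a real M x K channel h (list of rows) and received y."""
    m, k = len(h), len(h[0])
    gram = [[sum(h[r][i] * h[r][j] for r in range(m)) for j in range(k)]
            for i in range(k)]
    matched = [sum(h[r][i] * y[r] for r in range(m)) for i in range(k)]
    return solve_cholesky(cholesky(gram), matched)

# Hypothetical noise-free example: 3 antennas, 2 users, transmitted x = [1, -1].
h_example = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y_example = [1.0, -1.0, 0.0]
print(zf_detect(h_example, y_example))
```

In the noise-free case the estimate reproduces the transmitted vector; the Gram matrix is the only quantity that needs factorizing, which is why its decomposition is a natural target for reuse across neighbouring subcarriers.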
2.
  • Liu, Yangxurui, et al. (author)
  • Adaptive Resource Scheduling for Energy Efficient QRD Processor with DVFS
  • 2015
  • Conference paper (peer-reviewed)
    • This paper presents an energy-efficient adaptive QR decomposition scheme for the Long Term Evolution Advanced (LTE-A) downlink. The proposed scheme provides robust performance over fluctuating wireless channels while maintaining a lower workload on reconfigurable hardware. A statistics-based algorithm-switching strategy is employed to achieve workload reduction and a stable computing resource requirement for QR decomposition. With run-time resource allocation, computing resources are assigned to the segments with the highest performance gain to reduce performance loss. By utilizing the dynamic voltage and frequency scaling (DVFS) technique, we further exploit the potential for power saving in various workload situations while maintaining fixed throughput. The proposed technique brings a power reduction of up to 57.8% in the EVA-5 scenario and 24.4% with a maximum SNR loss of 1 dB in the EVA-70 scenario, when mapped onto a coarse-grained reconfigurable vector-based platform.
  •  
3.
  • Malkowsky, Steffen, et al. (author)
  • A programmable 16-lane SIMD ASIP for massive MIMO
  • 2019
  • In: 2019 IEEE International Symposium on Circuits and Systems, ISCAS 2019 - Proceedings. - 9781728103976 ; 2019
  • Conference paper (peer-reviewed)
    • This paper presents a 16-lane, 16-bit complex application-specific instruction processor (ASIP) for baseband processing in massive multiple-input multiple-output (MIMO). The architecture utilizes a 3/4-way very long instruction word (VLIW) with highly efficient pre- and post-processing units specifically trimmed for massive MIMO requirements. Architecture optimizations include features like single-cycle vector dot product, vector indexing and broadcasting, hardware loops, and a full complex accumulator to provide high performance for various massive MIMO algorithms. Moreover, the ASIP is fully C-programmable, which is crucial for adapting to the evolving 5G standard. In our evaluation, a full massive MIMO up-link detection is executed in ≈11k clock cycles, while synthesis results in ST 28 nm FD-SOI suggest a clock frequency of 900 MHz, equating to a detection throughput of 330 Mb/s for a 128×16 massive MIMO system.
  •  
4.
  • Prabhu, Hemanth, et al. (author)
  • 3.6 A 60pJ/b 300Mb/s 128×8 Massive MIMO precoder-detector in 28nm FD-SOI
  • 2017
  • In: 2017 IEEE International Solid-State Circuits Conference (ISSCC). - 9781509037582 - 9781509037599 ; 60, pp. 60-61
  • Conference paper (peer-reviewed)
    • Further exploitation of the spatial domain, as in Massive MIMO (MaMi) systems, is imperative to meet future communication requirements [1]. Up-scaling of conventional 4×4 small-scale MIMO implementations to MaMi is prohibitive in terms of flexibility, as well as area and power cost. This work discloses a 1.1 mm² 128×8 MaMi baseband chip, achieving up to 12 dB array and 2× spatial multiplexing gains. The area cost compared to previous state-of-the-art MIMO implementations [2-3] is reduced by 53% and 17% for up- and down-link, respectively. Algorithm optimizations and a highly flexible framework were evaluated on real measured channels. Extensive hardware time multiplexing lowered area cost, and leveraging flexible FD-SOI body bias and clock gating resulted in an energy efficiency of 6.56 nJ/QRD and 60 pJ/b at a 300 Mb/s detection rate.
  •  
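For readers comparing the energy figures in this listing, energy per bit multiplied by throughput gives an implied average power. The short Python sketch below applies that identity to the 60 pJ/b and 300 Mb/s values quoted above. The function name `average_power_watts` is hypothetical, and the resulting power (about 18 mW) is a derived estimate that assumes the energy figure holds at the stated rate; it is not a number reported in the abstract.

```python
def average_power_watts(energy_per_bit_joules, throughput_bits_per_s):
    """Average power implied by an energy-per-bit figure at a given bit rate."""
    return energy_per_bit_joules * throughput_bits_per_s

# Values quoted in the abstract: 60 pJ/b at 300 Mb/s detection rate.
detection_power = average_power_watts(60e-12, 300e6)
print(f"{detection_power * 1e3:.1f} mW")
```

The same function can be applied to the other pJ/b and Mb/s pairs in this list to compare designs on a common power scale, with the same caveat.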
5.
  • Prabhu, Hemanth, et al. (author)
  • A 1070 pJ/b 169 Mb/s Quad-core Digital Baseband SoC for Distributed and Cooperative Massive MIMO in 28 nm FD-SOI
  • 2021
  • In: 2021 Symposium on VLSI Circuits, VLSI Circuits 2021. - 9784863487796 ; 2021-June
  • Conference paper (peer-reviewed)
    • A 2.2 mm² fully digital baseband SoC with four heterogeneous cores for 128-node, 8-user distributed massive MIMO is presented. Two specialized DSPs perform rapid over-the-air synchronization within 0.1 ms. A highly optimized 8-complex-lane MIMO vector processor provides a 4× hardware efficiency improvement over general-purpose processors. Circuit optimizations and the use of body bias result in a measured energy of 1070 pJ/b at a 169 Mb/s detection rate.
  •  
6.
  • Prabhu, Hemanth, et al. (author)
  • A GALS ASIC implementation from a CAL dataflow description
  • 2011
  • In: IEEE. - 9781457705144
  • Conference paper (peer-reviewed)
    • This paper presents low-power hardware generation based on a CAL actor language dataflow implementation. The CAL language gives a higher level of abstraction and generates both hardware and software descriptions. The original CAL flow targets hardware-software co-design of complex systems on FPGA. Modifications are made to the original CAL flow to facilitate low-power ASIC implementations. Hardware-software co-design and Globally Asynchronous Locally Synchronous (GALS) design at a higher level of abstraction provide more freedom for design-space exploration and reduce design time. Performance is evaluated with a reference design, an Orthogonal Frequency-Division Multiplexing (OFDM) multi-standard channel estimator based on a robust Minimum Mean-Square Error (MMSE) algorithm. Higher throughput is attained due to the inherent parallelism in CAL dataflow, and design time for the GALS implementation is reduced.
  •  
7.
  • Prabhu, Hemanth, et al. (author)
  • A low-complex peak-to-average power reduction scheme for OFDM based massive MIMO systems
  • 2014
  • In: 2014 6th International Symposium on Communications, Control and Signal Processing (ISCCSP). ; pp. 114-117
  • Conference paper (peer-reviewed)
    • An Orthogonal Frequency-Division Multiplexing (OFDM) based multi-user massive Multiple-Input Multiple-Output (MIMO) system is considered. The problem of high Peak-to-Average Ratio (PAR) in OFDM-based systems is well known, and the large number of antennas (RF chains) at the Base Station (BS) in massive MIMO systems aggravates it further, since a correspondingly large number of Power Amplifiers (PAs) is used. High PAR necessitates linear PAs, which have a high hardware cost and are typically power inefficient. In this paper we propose a low-complexity approach to tackle the issue. The idea is to deliberately clip signals sent to one set of antennas, while compensating for this by transmitting correction signals on a set of reserved antennas (antenna reservation). A reduction of 4 dB in PAR is achieved by reserving 25% of the antennas, with only a 15% complexity overhead.
  •  
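The PAR metric and the deliberate clipping step mentioned in the abstract above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `par_db` and `clip` are hypothetical helper names, the sample values are made up, and the correction signals transmitted on reserved antennas (the core of the proposed scheme) are not modeled.

```python
import math

def par_db(samples):
    """Peak-to-average power ratio of a complex sample sequence, in dB."""
    powers = [abs(s) ** 2 for s in samples]
    return 10.0 * math.log10(max(powers) / (sum(powers) / len(powers)))

def clip(samples, max_amplitude):
    """Limit each sample's magnitude to max_amplitude while keeping its phase."""
    clipped = []
    for s in samples:
        magnitude = abs(s)
        if magnitude <= max_amplitude:
            clipped.append(s)
        else:
            clipped.append(s * (max_amplitude / magnitude))
    return clipped

# Hypothetical time-domain samples for one antenna, with a single large peak.
signal = [1 + 0j, 0.5j, -0.5 + 0j, 3 + 0j]
print(f"PAR before clipping: {par_db(signal):.2f} dB")
print(f"PAR after clipping:  {par_db(clip(signal, 1.0)):.2f} dB")
```

Clipping lowers the PAR but distorts the transmitted signal; in the scheme described above, that distortion is what the reserved antennas are meant to compensate.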
8.
  • Prabhu, Hemanth, et al. (author)
  • Algorithm and Hardware Aspects of Pre-coding in Massive MIMO Systems
  • 2015
  • In: 2015 49th Asilomar Conference on Signals, Systems and Computers. - 9781467385749 ; pp. 1144-1148
  • Conference paper (peer-reviewed)
    • Massive Multiple-Input Multiple-Output (MIMO) systems have been shown to improve both spectral and energy efficiency by one or more orders of magnitude by efficiently exploiting the spatial domain. Low-cost RF chains can be employed to reduce the Base Station (BS) cost; however, this may require additional baseband processing to handle distortions induced by hardware impairments. In this article, the reduction of the Peak-to-Average power Ratio (PAR) of the transmitted signals and IQ imbalance in the mixer are analyzed for the down-link. We analyze various pre-coding schemes and estimate the required processing energy per transmitted information bit. Gate-level simulations show that the energy cost of performing pre-coding and tackling hardware impairments ranges from very low to reasonable, compared to the processing necessary in a system without impairments.
  •  
9.
  • Prabhu, Hemanth, et al. (author)
  • Approximative Matrix Inverse Computations for Very-large MIMO and Applications to Linear Pre-coding Systems
  • 2013
  • In: [Host publication title missing]. ; pp. 2710-2715
  • Conference paper (peer-reviewed)
    • In very-large multiple-input multiple-output (MIMO) systems, the base station (BS) is equipped with a very large number of antennas compared to previously considered systems. There are various advantages to increasing the number of antennas, and some schemes require handling large matrices for joint processing (pre-coding) at the base station. Dirty paper coding (DPC) is an optimal pre-coding scheme but has very high complexity. However, with an increasing number of BS antennas, linear pre-coding performance tends to that of the optimal DPC. Although linear pre-coding is less complex than DPC, there is still a need to compute pseudo-inverses of large matrices. In this paper we present a low-complexity approximation of down-link Zero-Forcing (ZF) linear pre-coding for very-large multi-user MIMO systems. An approximation using a Neumann series expansion is chosen over traditional exact computations for matrix inversion, making use of special properties of the matrices and thereby reducing the cost of hardware. With this approximation of linear pre-coding, we can significantly reduce the computational complexity for large enough systems, i.e., where there are enough BS antenna elements. For the investigated case of 8 users, we obtain 90% of the full ZF sum rate, with lower computational complexity, when the number of BS antennas per user is about 20 or more.
  •  
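The Neumann series inversion named in the abstract above can be illustrated with a minimal Python sketch. It uses the standard identity Z^-1 = Σ_n (I - D^-1 Z)^n D^-1, which converges when the spectral radius of (I - D^-1 Z) is below one, for example when Z is strongly diagonally dominant. Taking D as the diagonal of Z is a common choice and an assumption here, as are the function names and the small test matrix; the paper's exact formulation and truncation order are not given in this listing.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def neumann_inverse(z, terms):
    """Approximate inverse of z using `terms` terms of the Neumann series.

    Assumes D = diag(z) as the initial approximation (an illustrative choice).
    """
    n = len(z)
    d_inv = [[1.0 / z[i][i] if i == j else 0.0 for j in range(n)]
             for i in range(n)]
    dz = matmul(d_inv, z)
    eye = identity(n)
    e = [[eye[i][j] - dz[i][j] for j in range(n)] for i in range(n)]
    power = identity(n)               # holds (I - D^-1 Z)^n, starting at n = 0
    acc = [[0.0] * n for _ in range(n)]
    for _ in range(terms):
        acc = [[acc[i][j] + power[i][j] for j in range(n)] for i in range(n)]
        power = matmul(power, e)
    return matmul(acc, d_inv)

# Hypothetical diagonally dominant Gram-like matrix.
z_example = [[4.0, 1.0], [1.0, 3.0]]
approx = neumann_inverse(z_example, terms=3)
print(matmul(approx, z_example))      # close to, but not exactly, the identity
```

Each additional term costs one matrix multiplication, so the truncation order trades accuracy against complexity; the more dominant the diagonal (as with many BS antennas per user), the fewer terms are needed.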
10.
  • Prabhu, Hemanth, et al. (author)
  • Hardware Efficient Approximative Matrix Inversion for Linear Pre-Coding in Massive MIMO
  • 2014
  • In: [Host publication title missing]. - 0271-4310 .- 2158-1525. ; pp. 1700-1703
  • Conference paper (peer-reviewed)
    • This paper describes a hardware-efficient linear pre-coder for Massive MIMO Base Stations (BSs) comprising a very large number of antennas, say, on the order of 100s, serving multiple users simultaneously. To avoid the hardware-demanding direct matrix inversions required for the Zero-Forcing (ZF) pre-coder, we use low-complexity Neumann series based approximations. Furthermore, we propose a method to speed up the convergence of the Neumann series by using tri-diagonal pre-condition matrices, which lowers the complexity even further. As a proof of concept, a flexible VLSI architecture is presented with an implementation supporting matrix inversion of sizes up to 16 × 16. In 65 nm CMOS, a throughput of 0.5M matrix inversions per second is achieved at a clock frequency of 420 MHz with a 104K gate count.
  •  
11.
  • Prabhu, Hemanth (author)
  • Hardware Implementation of Baseband Processing for Massive MIMO
  • 2017
  • Doctoral thesis (other academic/artistic)
    • In the near future, the number of connected mobile devices and data-rates are expected to increase dramatically. Demands exceed the capability of the currently deployed (4G) wireless communication systems. Development of 5G systems is aiming for higher data-rates, better coverage, backward compatibility, and conformance with “green communication” to lower energy consumption. Massive Multiple-Input Multiple-Output (MIMO) is a technology with the potential to fulfill these requirements. In massive MIMO systems, base stations are equipped with a very large number of antennas compared to 4G systems, serving a relatively low number of users simultaneously in the same frequency and time resource. Exploiting the high spatial degrees-of-freedom allows for aggressive spatial multiplexing, resulting in high data-rates without increasing the spectrum. More importantly, achieving high array gains and eliminating inter-user interference results in simpler mobile terminals. These advantages of massive MIMO require handling a large number of antennas efficiently, by performing baseband signal processing. Compared to small-scale MIMO base stations, the processing can be much more computationally intensive, in particular considering the large dimensions of the matrices. In addition to computational complexity, meeting latency requirements is also crucial. Another aspect is the power consumption of the baseband processing. Typically, the major contributors to power consumption are power amplifiers and analog components; however, in massive MIMO, the transmit power at each antenna can be lowered drastically (by the square of the number of antennas). Thus, the power consumption of the baseband processing becomes more significant in relation to other contributions. This puts forward the main challenge tackled in this thesis, i.e., how to implement low-latency baseband signal processing modules with high hardware and energy efficiency. The focus of this thesis has been on co-optimization of algorithms and hardware implementations to meet the aforementioned challenges/requirements. Algorithm optimization is performed to lower computational complexity, e.g., of large-scale matrix operations, and also on the system level to relax constraints on analog/RF components to lower cost and improve efficiency. These optimizations were evaluated by taking into consideration the hardware cost and device-level parameters. To this end, a massive MIMO central baseband pre-coding/detection chip was fabricated in 28 nm FD-SOI CMOS technology and measured. The algorithm and hardware co-optimization resulted in the highest reported pre-coding area and energy efficiency of 34.1 QRD/s/gate and 6.56 nJ/QRD, respectively. For detection, compared to small-scale MIMO systems, massive MIMO with linear schemes provided superior performance, with area and energy efficiency of 2.02 Mb/s/kGE and 60 pJ/b. The array and spatial multiplexing gains in massive MIMO, combined with high hardware efficiency and schemes to lower constraints on RF/analog components, make massive MIMO extremely promising for future deployments.
  •  
12.
  • Prabhu, Hemanth, et al. (author)
  • High Throughput Constant Envelope Pre-coder for Massive MIMO Systems
  • 2015
  • In: 2015 IEEE International Symposium on Circuits and Systems (ISCAS). - 9781479983919
  • Conference paper (peer-reviewed)
    • This study describes a high-throughput constant envelope (CE) pre-coder for Massive MIMO systems. A large number of antennas (M), on the order of 100s, serve a relatively small number of users (K) simultaneously. The stringent amplitude constraint (only phase changes) in the CE scheme is motivated by the use of highly power-efficient non-linear RF power amplifiers. We propose a scheme that computes the CE signals to be transmitted based on box-constrained regression (coordinate descent), with an O(2MK) complexity per iteration per user symbol. A highly scalable systolic architecture is implemented, where M Processing Elements (PEs) perform the pre-coding for a system with up to K = 16 users. This systolic architecture results in a very high throughput of 500 Msamples/sec (at 500 MHz clock rate) with a gate count of 14 K per PE in 65 nm technology.
  •  
13.
  • Tang, Wei, et al. (author)
  • A 1.8Gb/s 70.6pJ/b 128×16 link-adaptive near-optimal massive MIMO detector in 28nm UTBB-FDSOI
  • 2018
  • In: 2018 IEEE International Solid-State Circuits Conference, ISSCC 2018. - 9781509049400 - 9781538622278 ; 61, pp. 224-226
  • Conference paper (peer-reviewed)
    • This work presents a 2.0 mm² 128×16 massive MIMO detector IC that provides 21 dB array gain and 16× multiplexing gain at the system level. The detector implements iterative expectation-propagation detection (EPD) for up to 256-QAM modulation. Tested with measured channel data [1], the detector achieves 4.3 dB processing gain over state-of-the-art massive MIMO detectors [2, 3], enabling a 2.7× reduction in transmit power for battery-powered mobile terminals. The IC uses link-adaptive processing to meet a variety of practical channel conditions with scalable energy consumption. The design is realized in a condensed systolic array architecture with approximate moment-matching circuitry to reach 1.8 Gb/s at 70.6 pJ/b. The performance and energy efficiency can be tuned over a wide range by UTBB-FDSOI body bias.
  •  
14.
  • Zhang, Chenxin, et al. (author)
  • Energy Efficient Group-Sort QRD Processor with On-line Update for MIMO Channel Pre-processing
  • 2015
  • In: IEEE Transactions on Circuits and Systems Part 1: Regular Papers. - 1549-8328. ; 62:5, pp. 1220-1229
  • Journal article (peer-reviewed)
    • This paper presents a Sorted QR-Decomposition (SQRD) processor for 3GPP LTE-A systems. It achieves energy efficiency by co-optimizing techniques such as heterogeneous processing, reconfigurable architecture, and dual-supply voltage operation. At the algorithm level, a low-complexity hybrid decomposition scheme is adopted, which switches, depending on the energy distribution of spatial channels, between the traditional brute-force SQRD and a proposed group-sort QR update strategy. A reconfigurable vector processor is accordingly developed to support the adaptive processing with high hardware efficiency. Furthermore, an on-chip power management technique is integrated to obtain real-time power saving by adapting the supply voltage to the instantaneous workload. As a proof of concept, we implemented the processor in a 65 nm CMOS technology and conducted post-layout simulation. The proposed SQRD processor occupies 0.71 mm² core area and has a throughput of up to 69 MQRD/s. Compared to the brute-force approach, an energy reduction of 10~61.8% is achieved.
  •  
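To make the term "brute-force SQRD" in the abstract above more concrete, here is a minimal Python sketch of one common formulation of sorted QR decomposition: modified Gram-Schmidt in which, at every step, the remaining column with the smallest norm is processed next. This formulation is an assumption for illustration; the abstract does not specify the processor's exact algorithm, and the proposed group-sort QR update strategy is not modeled. The function name `sorted_qrd` and the test matrix are hypothetical, and the code handles real-valued matrices only (a complex version would use conjugated inner products).

```python
import math

def sorted_qrd(h):
    """Sorted QR decomposition of a real M x K matrix h (list of rows).

    Returns (q_cols, r, perm): K orthonormal columns, an upper-triangular
    K x K matrix, and the column order, such that h[:, perm] = Q R.
    """
    m, k = len(h), len(h[0])
    q = [[h[row][col] for row in range(m)] for col in range(k)]  # column-major copy
    r = [[0.0] * k for _ in range(k)]
    perm = list(range(k))
    for i in range(k):
        # Select the remaining column with the smallest squared norm.
        norms = [sum(v * v for v in q[j]) for j in range(i, k)]
        best = i + norms.index(min(norms))
        q[i], q[best] = q[best], q[i]
        perm[i], perm[best] = perm[best], perm[i]
        for row in range(i):  # keep already computed rows of R consistent
            r[row][i], r[row][best] = r[row][best], r[row][i]
        r[i][i] = math.sqrt(sum(v * v for v in q[i]))
        q[i] = [v / r[i][i] for v in q[i]]
        # Remove the new direction from all remaining columns.
        for j in range(i + 1, k):
            r[i][j] = sum(a * b for a, b in zip(q[i], q[j]))
            q[j] = [b - r[i][j] * a for a, b in zip(q[i], q[j])]
    return q, r, perm

# Hypothetical 3 x 2 channel: the second column is weaker and is processed first.
q_cols, r_mat, order = sorted_qrd([[3.0, 1.0], [4.0, 0.0], [0.0, 1.0]])
print(order)
```

Because every remaining column norm is recomputed and compared at each step, the sorting adds work on top of a plain QR decomposition, which is the kind of overhead an update-based scheme tries to avoid when the channel changes little.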
15.
  • Zhang, Chenxin, et al. (author)
  • Energy Efficient MIMO Channel Pre-processor Using a Low Complexity On-Line Update Scheme
  • 2012
  • In: [Host publication title missing].
  • Conference paper (peer-reviewed)
    • This paper presents a low-complexity, energy-efficient channel pre-processing update scheme, targeting the emerging 3GPP long term evolution advanced (LTE-A) downlink. Upon channel matrix renewals, the number of explicit QR decompositions (QRD) and channel matrix inversions is reduced, since only the upper triangular matrices R and R^-1 are updated, based on an on-line update decision mechanism. The proposed channel pre-processing updater has been designed as a dedicated unit in a 65 nm CMOS technology, resulting in a core area of 0.242 mm² (equivalent gate count of 116K). Running at a 330 MHz clock, each QRD or R^-1 update consumes 4 or 2 times less energy, respectively, than one exact state-of-the-art QRD in the open literature.
  •  
16.
  • Zhang, Chenxin, et al. (author)
  • Energy Efficient SQRD Processor for LTE-A using a Group-sort Update Scheme
  • 2014
  • In: [Host publication title missing]. - 2158-1525 .- 0271-4310. - 9781479934317 ; pp. 193-196
  • Conference paper (peer-reviewed)
    • This paper presents an energy-efficient sorted QR decomposition (SQRD) processor for 3GPP LTE-Advanced (LTE-A) systems. The processor adopts a hybrid decomposition scheme to reduce computational complexity and provides a wide range of performance-complexity trade-offs. Based on the energy distribution of spatial channels, it switches between the brute-force SQRD and a low-complexity group-sort QR-update strategy, which is proposed in this work to effectively utilize the LTE-A pilot pattern. As a proof of concept, a run-time reconfigurable vector processor is developed to efficiently implement this adaptive-switching QR decomposition algorithm. In a 65 nm CMOS technology, the proposed SQRD processor occupies 0.71 mm² core area and has a throughput of up to 100 MQRD/s. Compared to the brute-force approach, an energy reduction of 5~33% is achieved.
  •  