SwePub

Results for search "L4X0:0345 7524 ;pers:(Liu Dake Professor)"


  • Results 1-8 of 8
1.
  • Asghar, Rizwan (author)
  • Flexible Interleaving Sub–systems for FEC in Baseband Processors
  • 2010
  • Doctoral thesis (other academic/artistic)
    • Interleaving is always used in combination with error-control coding. It spreads burst noise so that it behaves like white noise, allowing the noise-induced bit errors to be corrected. With the advancement of communication systems and the substantial increase in bandwidth requirements, coding for forward error correction (FEC) has become an integral part of modern communication systems. If the FEC sub-systems are divided into two categories, i.e. channel coding/decoding and interleaving/de-interleaving, the latter varies more in permutation functions, block sizes and throughput requirements. Interleaving/de-interleaving consumes more silicon because of the permutation tables used in conventional LUT-based approaches. For devices supporting multiple standards, the silicon cost of the permutation tables can grow much higher, resulting in an inefficient solution. Therefore, hardware re-use among different interleaver modules to support a multimode processing platform is significant. The breadth of interleaving algorithms gives rise to many challenges when considering a true multimode interleaver implementation. The main challenges include real-time, low-latency computation for different permutation functions, managing a wide range of interleaving block sizes, higher throughput, low cost, fast and dynamic reconfiguration for different standards, and introducing parallelism wherever necessary. It is difficult to merge all currently used interleavers into a single architecture because of their different algorithms and throughputs; however, the fact that multimode coverage does not require multiple interleavers to work at the same time provides opportunities to use hardware multiplexing. The multimode functionality is then achieved by fast switching between different standards. 
We used algorithmic-level transformations, such as 2-D transformation and the realization of recursive computations, which appear to be the key to bringing different interleaving functions to the same level. In general, the work focuses on function-level hardware re-use, but it also utilizes classical data-path-level optimizations for efficient hardware multiplexing among different standards. The research has resulted in multiple flexible architectures supporting multiple standards. These architectures target both channel interleaving and turbo-code interleaving. The presented architectures can support both types of communication systems, i.e. single-stream and multi-stream systems. Introducing the algorithmic-level transformations and then applying the hardware re-use methodology has resulted in lower silicon cost while supporting sufficient throughput. According to a database search in March 2010, we have the first multimode interleaver core covering WLAN (802.11a/b/g and 802.11n), WiMAX (802.16e), 3GPP-WCDMA, 3GPP-LTE, and DVB-T/H on a single architecture with minimum silicon cost. The research also provides support for parallel interleaver address generation using different architectures. It provides the algorithmic modifications and architectures to generate up to 8 addresses in parallel and handle memory conflicts on the fly. One of the vital requirements for multimode operation is fast switching between different standards, which the presented architectures support with minimal cycle-cost overhead. Fast switching between standards allows the baseband processor to reconfigure the interleaver architecture on the fly and re-use the same hardware for another standard. Lower silicon cost, maximum flexibility and fast switching among multiple standards at run time make the proposed research a good choice for radio baseband processing platforms.
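The abstract attributes the savings to algorithmic-level transformations and recursive computation that replace permutation lookup tables. As a generic illustration only (not the interleaver architecture of the thesis), the read addresses of a plain row-column block interleaver can be produced either from a closed-form index map or by a recurrence that needs only an addition and a comparison per address (plus a subtraction on wrap-around). The function names and the row-column permutation are assumptions chosen for this sketch.

```python
def direct_addresses(rows: int, cols: int) -> list[int]:
    """Read addresses of a row-column block interleaver, computed per index.

    Data are written row by row into a rows x cols array and read column by
    column, so output k comes from address (k mod rows) * cols + k // rows.
    """
    return [(k % rows) * cols + k // rows for k in range(rows * cols)]


def recursive_addresses(rows: int, cols: int) -> list[int]:
    """The same sequence, generated recursively without multiply, divide or table.

    Moving down a column adds `cols`; when the pointer runs past the end of the
    array it wraps to the top of the next column (subtract n, add 1).
    """
    n = rows * cols
    addrs = []
    addr = 0
    for _ in range(n):
        addrs.append(addr)
        addr += cols
        if addr >= n:
            addr = addr - n + 1
    return addrs


def interleave(data: list, rows: int, cols: int) -> list:
    """Permute `data` by reading it in recursively generated address order."""
    return [data[a] for a in recursive_addresses(rows, cols)]


def deinterleave(data: list, rows: int, cols: int) -> list:
    """Invert the permutation by writing each symbol back to its original address."""
    out = [None] * len(data)
    for k, a in enumerate(recursive_addresses(rows, cols)):
        out[a] = data[k]
    return out


if __name__ == "__main__":
    print(recursive_addresses(2, 3))  # [0, 3, 1, 4, 2, 5]
```

Standard interleavers such as those in 802.11a/n or LTE use more elaborate permutations; the sketch only shows the general idea of rewriting a closed-form index map as a cheap recurrence, which is what makes table-free, on-the-fly address generation possible.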
2.
  • Ehliar, Andreas, 1978- (author)
  • Performance driven FPGA design with an ASIC perspective
  • 2009
  • Doctoral thesis (other academic/artistic)
    • FPGA devices are an important component in many modern devices. This means that it is important that VLSI designers have a thorough knowledge of how to optimize designs for FPGAs. While the design flows for ASICs and FPGAs are similar, there are also many differences due to the limitations inherent in FPGA devices. To use an FPGA efficiently, it is important to be aware of both the strengths and weaknesses of FPGAs. If an FPGA design is to be ported to an ASIC at a later stage, this should also be taken into account early in the design cycle so that the ASIC port will be efficient. This thesis investigates how to optimize a design for an FPGA through a number of case studies of important SoC components. One of these case studies discusses high-speed processors and the trade-offs that are necessary when constructing very high-speed processors in FPGAs. The processor has a maximum clock frequency of 357 MHz in a Xilinx Virtex-4 device of the fastest speed grade, which is significantly higher than that of Xilinx's own processor in the same FPGA. Another case study investigates floating-point datapaths and describes how a floating-point adder and multiplier can be efficiently implemented in an FPGA. The final case study investigates Network-on-Chip architectures and how these can be optimized for FPGAs. The main focus is on packet-switched architectures, but a circuit-switched architecture optimized for FPGAs is also investigated. All of these case studies also contain information about potential pitfalls when porting designs optimized for an FPGA to an ASIC. The focus in this case is on systems where initial low-volume production will use FPGAs while still keeping the option open to port the design to an ASIC if demand is high. 
This information will also be useful for designers who want to create IP cores that can be efficiently mapped to both FPGAs and ASICs. Finally, a framework is presented that allows the creation of custom back-end tools for the Xilinx design flow. The framework is already useful for some tasks, but the main reason for including it is to inspire researchers and developers to use this powerful capability in their own design tools.
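The floating-point case study is summarized only briefly above. For orientation, a textbook floating-point adder is usually organized as align, add and normalize stages. The sketch below models that datapath in Python for positive operands in a hypothetical format with a 24-bit significand; signs, subtraction, rounding and special values are deliberately left out, and nothing here reflects the FPGA-specific optimizations in the thesis.

```python
import math

FRAC_BITS = 23                      # hypothetical format: 24-bit significand incl. hidden one
OVERFLOW = 1 << (FRAC_BITS + 1)     # normalized significands stay below 2**24


def unpack(x: float) -> tuple[int, int]:
    """Split a positive float into (exp, sig) with value = sig * 2**(exp - FRAC_BITS).

    sig is normalized to [2**23, 2**24); any extra precision is truncated.
    """
    if x <= 0:
        raise ValueError("this sketch only handles positive numbers")
    frac, e = math.frexp(x)         # x = frac * 2**e with 0.5 <= frac < 1
    return e - 1, int(frac * OVERFLOW)


def pack(exp: int, sig: int) -> float:
    """Convert an (exp, sig) pair back to a Python float."""
    return math.ldexp(sig, exp - FRAC_BITS)


def fp_add(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Add two positive operands: align, add, normalize (truncation, no rounding)."""
    (ea, sa), (eb, sb) = a, b
    # Stage 1: swap so that `a` has the larger exponent, then shift `b` right
    # by the exponent difference (an alignment shifter in hardware).
    if ea < eb:
        (ea, sa), (eb, sb) = (eb, sb), (ea, sa)
    sb >>= ea - eb
    # Stage 2: integer addition of the aligned significands.
    s, e = sa + sb, ea
    # Stage 3: with two positive operands the sum is below 2**25, so at most
    # one right shift renormalizes it.
    if s >= OVERFLOW:
        s >>= 1
        e += 1
    return e, s


if __name__ == "__main__":
    print(pack(*fp_add(unpack(1.5), unpack(2.25))))  # 3.75
```

The alignment shifter is often a significant part of such an adder in hardware; the sketch also makes explicit that, for same-sign addition, normalization needs at most a one-bit shift.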
3.
  • Eilert, Johan (author)
  • ASIP for Wireless Communication and Media
  • 2010
  • Doctoral thesis (other academic/artistic)
    • While general-purpose processors reach both high performance and high application flexibility, this comes at a high cost in terms of silicon area and power consumption. In systems where high application flexibility is not required, it is possible to trade off flexibility for lower cost by tailoring the processor to the application, creating an Application Specific Instruction set Processor (ASIP) with high performance yet low silicon cost. This thesis demonstrates how ASIPs with application-specific data types can provide efficient solutions with lower cost. Two examples are presented: an audio decoder ASIP for audio and music processing, and a matrix manipulation ASIP for MIMO radio baseband signal processing. The audio decoder ASIP uses a 16-bit floating-point data type to reduce the size of the data memory to about 60% of that of other solutions, which use a 32-bit data type. Since the data memory occupies a major part of the silicon area, this has a significant impact on the total silicon area, and thereby also on the static and dynamic power consumption. The data width reduction can be done without any noticeable artifacts in the decoded audio due to the natural masking effect of the human ear. The matrix manipulation SIMD ASIP is designed to perform various matrix operations, such as matrix inversion and QR decomposition, on small complex-valued matrices. This type of processing is found in MIMO radio baseband signal processing, and the matrices are typically not larger than 4x4. Previously published solutions use arrays of fixed-function processing elements to perform these operations, but the proposed ASIP performs the computations in less time and with lower hardware cost. 
The matrix manipulation ASIP data path uses a floating-point data type to avoid the data scaling issues associated with fixed-point computations, especially those related to division and reciprocal calculations. It also simplifies the program control flow, since no special cases for certain inputs are needed, which is especially important for SIMD architectures. These two applications were chosen to show how ASIPs can be a suitable alternative that matches the requirements of different types of applications, providing enough flexibility and performance to support different standards and algorithms at low hardware cost.
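QR decomposition of small complex matrices is one of the operations the matrix ASIP targets. A minimal sketch, assuming the classic modified Gram-Schmidt algorithm (the abstract does not say which algorithm the ASIP uses), is shown below in plain Python with double-precision floats; the helper names are hypothetical. It shows where the division-related work mentioned in the abstract appears: one square root and one reciprocal per column.

```python
import math

Matrix = list[list[complex]]


def qr_mgs(a: Matrix) -> tuple[Matrix, Matrix]:
    """QR decomposition of a small square complex matrix by modified Gram-Schmidt.

    Returns (Q, R) with A = Q R, Q unitary and R upper triangular.
    Assumes A has full rank, so no column norm becomes zero.
    """
    n = len(a)
    v = [[a[i][j] for i in range(n)] for j in range(n)]   # v[j] = column j of A
    q_cols: list[list[complex]] = []
    r: Matrix = [[0j] * n for _ in range(n)]
    for j in range(n):
        norm = math.sqrt(sum(abs(x) ** 2 for x in v[j]))
        r[j][j] = norm
        inv = 1.0 / norm                  # one reciprocal per column, then multiplies
        qj = [x * inv for x in v[j]]
        q_cols.append(qj)
        for k in range(j + 1, n):
            # r_jk = q_j^H v_k, then remove that component from column k.
            rjk = sum(qi.conjugate() * vk for qi, vk in zip(qj, v[k]))
            r[j][k] = rjk
            v[k] = [vk - rjk * qi for vk, qi in zip(v[k], qj)]
    q = [[q_cols[j][i] for j in range(n)] for i in range(n)]
    return q, r


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain matrix product, used only to check the result."""
    return [[sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def hermitian(x: Matrix) -> Matrix:
    """Conjugate transpose."""
    return [[x[j][i].conjugate() for j in range(len(x))] for i in range(len(x[0]))]


if __name__ == "__main__":
    A = [[4 + 1j, 2, 0.5j, 1], [1, 3 - 1j, 1, 0], [0, 1j, 4, 1 - 1j], [1, 0, 1, 5]]
    Q, R = qr_mgs(A)
    QR = matmul(Q, R)
    print(max(abs(QR[i][j] - A[i][j]) for i in range(4) for j in range(4)))
```

In fixed point, the reciprocal of the column norm is exactly the step whose dynamic range depends on the input, which illustrates why the abstract favors a floating-point data path for this kind of kernel.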
4.
  • Haque, Muhammad Fahim Ul (author)
  • Pulse-Width Modulated RF Transmitters
  • 2017
  • Doctoral thesis (other academic/artistic)
    • The market for wireless portable devices has grown significantly in recent years. Wireless devices with ever-increasing functionality require high-rate data transmission and reduced costs. High data rates are achieved through communication standards such as LTE and WLAN, which generate signals with a high peak-to-average-power ratio (PAPR), hence requiring a power amplifier (PA) that can handle a signal with a large dynamic range. To keep costs low, modern CMOS processes allow the integration of the digital, analog and radio functions on a single chip. However, the design of PAs with large dynamic range and high efficiency is challenging due to the low voltage headroom. To prolong battery life, the PAs have to be power-efficient, as they consume a sizable percentage of the total power. For LTE and WLAN, traditional transmitters operate the PA at back-off power, below its peak efficiency, whereas pulse-width modulation (PWM) transmitters use the PA at its peak power, resulting in higher efficiency. PWM transmitters can use both linear PAs and switch-mode PAs (SMPAs), where the latter are more power-efficient and easier to implement in nanometer CMOS. PWM transmitters have higher efficiency but suffer from image and aliasing distortion, resulting in lower dynamic range and lower amplitude and phase resolution. This thesis studies several new transmitter architectures that improve the dynamic range and the amplitude and phase resolution of PWM transmitters with relaxed filtering requirements. The architectures are suited for fully integrated CMOS solutions, in particular for portable applications. The first transmitter (MAF-PWMT) eliminates aliasing and image distortion while allowing the use of SMPAs by combining RF-PWM and band-limited PWM. The transmitter can be implemented using all-digital techniques and exhibits improved linearity and spectral performance. 
The approach is validated using a Class-D PA based transmitter, where an improvement of 10.2 dB in dynamic range compared to a PWM transmitter is achieved for a 1.4 MHz LTE signal. The second transmitter (AC-PWMT) compensates for aliasing distortion by combining PWM and outphasing. It can be used with switch-mode PAs (SMPAs) or linear PAs at peak power. The proposed transmitter shows better linearity, improved spectral performance and increased dynamic range, as it suffers neither from AM-AM distortion of the PAs nor from aliasing distortion due to digital PWM. The idea is validated using push-pull PAs, and the proposed transmitter shows an improvement of 9 dB in dynamic range compared to a PWM transmitter using digital pulse-width modulation for a 1.4 MHz LTE signal. The third transmitter (MD-PWMT) is an all-digital implementation of the second transmitter. The PWM is implemented using a Field Programmable Gate Array (FPGA) core, and outphasing is implemented as pulse-position modulation using FPGA transceivers, which drive two Class-D PAs. The digital implementation offers the flexibility to adapt the transmitter to multi-standard and multi-band signals. The measurement results show an improvement of 5 dB in dynamic range compared to an all-digital PWM transmitter for a 1.4 MHz LTE signal. The fourth transmitter (EP-PWMT) improves the phase linearity of an all-digital PWM transmitter using PWM and asymmetric outphasing. The transmitter uses PWM to encode the amplitude and outphasing for enhanced phase control, thus doubling the phase resolution. The measurement setup uses Class-D PAs to amplify a 1.4 MHz LTE uplink signal. An improvement of 2.8 dB in the adjacent channel leakage ratio is observed, and the EVM is reduced by 3.3% compared to an all-digital PWM transmitter. The fifth transmitter (CRF-ML-PWMT) combines multilevel PWM and RF-PWM, whereas the sixth transmitter (CRF-MP-PWMT) combines multiphase PWM and RF-PWM. 
Both transmitters have a smaller chip area than conventional multiphase and multilevel PWM transmitters, as no combiner is required. The proposed transmitters also show better dynamic range and improved amplitude resolution compared to conventional RF-PWM transmitters. The solutions presented in this thesis aim to enhance the performance and simplify the digital implementation of PWM-based RF transmitters.
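Two building blocks recur throughout this abstract: RF-PWM, which encodes amplitude in the pulse width, and outphasing, which splits a varying-envelope signal into two constant-envelope signals. The sketch below shows the textbook relations behind each, under stated assumptions: an ideal, centered 0/1 pulse train for PWM and ideal summation of the two outphasing branches. It does not model the image and aliasing distortion that the proposed transmitters address, and the function names are hypothetical.

```python
import cmath
import math


def rf_pwm_duty(amplitude: float) -> float:
    """Duty cycle d giving a normalized fundamental amplitude `amplitude` (0..1).

    For an ideal, centered 0/1 pulse train the fundamental has amplitude
    (2 / pi) * sin(pi * d). Normalizing by its maximum at d = 0.5 gives
    sin(pi * d), so d = asin(amplitude) / pi, which lies in [0, 0.5].
    """
    if not 0.0 <= amplitude <= 1.0:
        raise ValueError("normalized amplitude must lie in [0, 1]")
    return math.asin(amplitude) / math.pi


def outphase(sample: complex, a_max: float = 1.0) -> tuple[complex, complex]:
    """Split s = A e^{j phi} into two constant-envelope components.

    With theta = acos(A / a_max), the components (a_max / 2) e^{j(phi +/- theta)}
    both have magnitude a_max / 2, and their sum is a_max cos(theta) e^{j phi} = s.
    """
    a = abs(sample)
    if a > a_max:
        raise ValueError("sample magnitude exceeds a_max")
    phi = cmath.phase(sample)
    theta = math.acos(a / a_max)
    s1 = (a_max / 2) * cmath.exp(1j * (phi + theta))
    s2 = (a_max / 2) * cmath.exp(1j * (phi - theta))
    return s1, s2


if __name__ == "__main__":
    s1, s2 = outphase(0.3 + 0.4j)
    print(abs(s1), abs(s2), s1 + s2)
    print(rf_pwm_duty(0.5))
```

Both branches of the outphasing split have a constant envelope, which is why they can drive switch-mode PAs at peak power; the amplitude information is carried entirely by the phase difference 2θ.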
5.
  • Karlsson, Andréas, 1986- (author)
  • Design of Energy-Efficient High-Performance ASIP-DSP Platforms
  • 2016
  • Doctoral thesis (other academic/artistic)
    • In the last ten years, limited clock frequency scaling and increasing power density have shifted the focus of IC design towards parallelism, heterogeneity and energy efficiency. Improving energy efficiency is by no means simple, and it calls for a re-evaluation of old design choices in processor architecture and, perhaps more importantly, the development of new programming methodologies that exploit the features of modern architectures. This thesis discusses the design of energy-efficient digital signal processors with application-specific instruction sets, so-called ASIP-DSPs, and their programming tools. Target applications for such processors include, but are not limited to, communications, multimedia, image processing, intelligent vision and radar. These applications are often implemented by a limited set of kernel algorithms, whose performance and efficiency are critical to the application's success. At the same time, the extreme non-recurring engineering cost of system-on-chip designs means that product lifetime must be kept as long as possible. Neither general-purpose processors nor non-programmable ASICs can meet both the flexibility and efficiency requirements, and ASIPs may instead be the best trade-off between all the conflicting goals. Traditionally, superscalar and VLIW processor design has focused on improving the throughput of fine-grained instructions, which results in high flexibility but also high energy consumption. SIMD architectures, on the other hand, are often restricted by inefficient data access. The result is architectures that spend more energy and/or time on supporting operations than on actual computing. This thesis defines the performance limit of an architecture with an N-way parallel datapath as consuming 2N elements of compute data per clock cycle. 
To approach this performance, this work proposes coarse-grained higher-order functional (HOF) instructions, which encode the most frequently executed compute, data access and control sequences into single many-cycle instructions, to reduce the overheads of instruction delivery while maintaining orthogonality. The work further investigates opportunities for operation fusion to improve computing performance, and proposes a flexible memory subsystem for conflict-free parallel memory access with permutation and lookup-table-based addressing, to ensure that high computing throughput can be sustained even in the presence of irregular data access patterns. These concepts are extensively studied by implementing a large kernel algorithm library with typical DSP kernels to prove their effectiveness and adequacy. Compared to contemporary VLIW DSP solutions, our solution can practically eliminate instruction fetching energy in many scenarios, significantly reduce control path switching, simplify the implementation of kernels and reduce code size, sometimes by as much as 30 times. The techniques proposed in this thesis have been implemented in the DSP platform ePUMA (embedded Parallel DSP processor with Unique Memory Access), a configurable control-compute heterogeneous platform with distributed memory, optimized for low-power, predictable DSP computing. Hardware evaluation has been done with FPGA prototypes. In addition, several VLSI layouts have been created for energy and area estimations. These include smaller designs as well as a large design with 73 cores, capable of 1280 integer GOPS or 256 GFLOPS at 500 MHz, which measures 45 mm² in 28 nm FD-SOI technology. In addition to the hardware design, this thesis also discusses the parallel programming flow for distributed memory architectures and ePUMA application implementation. A DSP kernel programming language and its compiler are presented. 
This effectively demonstrates how kernels written in a high-level language can be translated into HOF instructions for very high processing efficiency.
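Conflict-free parallel memory access through data permutation is central to the memory subsystem described above. A classic way to illustrate the idea (not the ePUMA scheme itself) is skewed storage: with N single-ported banks, rotating row r by r banks lets both a full row and a full column of an N x N matrix be read in one cycle. The bank-mapping functions and the one-word-per-bank-per-cycle model below are assumptions of this sketch.

```python
from collections import Counter
from collections.abc import Callable, Iterable

BankMap = Callable[[int, int, int], int]


def row_major_bank(row: int, col: int, n: int) -> int:
    """Plain storage: element (row, col) lives in bank col mod n, at offset row."""
    return col % n


def skewed_bank(row: int, col: int, n: int) -> int:
    """Skewed storage: row r is rotated by r banks, so (row, col) is in bank (row + col) mod n."""
    return (row + col) % n


def access_cycles(coords: Iterable[tuple[int, int]], bank_of: BankMap, n: int) -> int:
    """Cycles for one parallel access when each bank delivers one word per cycle."""
    load = Counter(bank_of(r, c, n) for r, c in coords)
    return max(load.values())


if __name__ == "__main__":
    n = 8
    row3 = [(3, c) for c in range(n)]
    col5 = [(r, 5) for r in range(n)]
    for name, bank_of in [("row-major", row_major_bank), ("skewed", skewed_bank)]:
        print(name, access_cycles(row3, bank_of, n), access_cycles(col5, bank_of, n))
    # row-major 1 8
    # skewed 1 1
```

For even N, however, the main diagonal (r, r) still takes two cycles under this simple skew, which hints at why more general permutation and lookup-table-based addressing is useful for irregular access patterns.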
6.
  • Sohl, Joar, 1982- (author)
  • Efficient Compilation for Application Specific Instruction set DSP Processors with Multi-bank Memories
  • 2015
  • Doctoral thesis (other academic/artistic)
    • Modern signal processing systems require more and more processing capacity as time goes on. Previously, large increases in speed and power efficiency came from process technology improvements. However, the gain from process improvements has lately been greatly reduced. Currently, the way forward for high-performance systems is to use specialized hardware and/or parallel designs. Application Specific Integrated Circuits (ASICs) have long been used to accelerate tasks that are too computationally heavy for more general processors. The problem with ASICs is that they are costly to develop and verify, and their product lifetime can be limited by newer standards. Since they are very specific, their applicable domain is very narrow. More general processors are more flexible and can easily be adapted to perform the functions of ASIC-based designs. However, the generality comes with a performance cost that renders general designs unusable for some tasks. The question then becomes: how general can a processor be while still being power-efficient and fast enough for a particular domain? Application Specific Instruction set Processors (ASIPs) are processors that target a specific application domain and can offer enough performance, with power efficiency and silicon cost comparable to ASICs. The flexibility allows the same hardware design to be used across several system designs, and also for multiple functions in the same system if some functions are not used simultaneously. One problem with ASIPs is that they are more difficult to program than a general-purpose processor, given that we want efficient software. Utilizing all of the features that give an ASIP its performance advantage can be difficult at times, and new tools and methods for programming them are needed. This thesis presents ePUMA (embedded Parallel DSP platform with Unique Memory Access), an ASIP architecture that targets algorithms with predictable data access. 
These kinds of algorithms are very common in, e.g., baseband processing or multimedia applications. The primary focus is on the specific features of ePUMA that are used to achieve high performance, and on how tools can utilize them automatically. The most significant features include data permutation for conflict-free data access and the use of address generation features for overhead-free code execution. This sometimes requires specific information, for example the exact sequences of memory addresses that are accessed, or the fact that some operations may be performed in parallel. This information is not always available when code is written in the traditional way in traditional languages, e.g. C, as extracting it is still a very active research topic. At least in the near future, the way software is written needs to change to exploit all hardware features, but in many cases in a positive way. Often the problem with current methods is that code is overly specific, and more general abstractions are actually easier to generate code from.
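The abstract credits address generation features with overhead-free code execution. The sketch below models a generic DSP-style address generator with a base address, a stride and modulo (circular) wrap-around; the class and its fields are hypothetical and do not describe ePUMA's address generation hardware or its compiler interface.

```python
from dataclasses import dataclass


@dataclass
class AddressGenerator:
    """Hypothetical address generator: base + offset, offset advanced by stride mod `length`.

    Once configured, it yields one address per call (per cycle in hardware), so
    the loop body needs no pointer-update or wrap-around instructions.
    """
    base: int
    stride: int
    length: int        # size of the circular region in words
    offset: int = 0

    def next_address(self) -> int:
        addr = self.base + self.offset
        # Python's % returns a non-negative result for positive `length`,
        # so negative strides also wrap correctly.
        self.offset = (self.offset + self.stride) % self.length
        return addr

    def take(self, count: int) -> list[int]:
        return [self.next_address() for _ in range(count)]


if __name__ == "__main__":
    # Circular delay line of 4 words at 0x100, e.g. for an FIR filter.
    print(AddressGenerator(base=0x100, stride=1, length=4).take(6))
    # [256, 257, 258, 259, 256, 257]
```

To configure such a unit statically, a compiler has to know the exact address sequence in advance, which is the kind of information the abstract notes is hard to extract from conventional C code.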
7.
  • Wang, Jian, 1982- (author)
  • Low Overhead Memory Subsystem Design for a Multicore Parallel DSP Processor
  • 2014
  • Doctoral thesis (other academic/artistic)
    • The physical scaling following Moore’s law has saturated, while the demand for computing keeps growing. The gain from improving silicon technology is now only the shrinking of silicon area, and speed-power scaling has almost stopped in the last two years. This calls for new parallel computing architectures and new parallel programming methods. Traditional ASIC (Application Specific Integrated Circuit) hardware has been used to accelerate Digital Signal Processing (DSP) subsystems on SoCs (Systems-on-Chip). Embedded systems are becoming more complicated, and more functions, applications and features must be integrated in one ASIC chip to keep up with market requirements. At the same time, the product lifetime of an SoC with ASIC acceleration has been much reduced because of the dynamic market. The design lifetime of a typical main chip in a mobile phone based on ASIC acceleration is about half a year, and its NRE (Non-Recurring Engineering) cost can be well over US$50 million. The current situation calls for a solution other than ASICs. An ASIP (Application Specific Instruction set Processor) offers power consumption and silicon cost comparable to ASICs. Its greatest advantage is functional flexibility within a predefined application domain. An ASIP-based SoC enables software upgrades without changing the hardware. Thus the product lifetime can be 5-10 times longer than that of an ASIC-based SoC. This dissertation presents an ASIP-based SoC, a new unified parallel DSP subsystem named ePUMA (embedded Parallel DSP Platform with Unique Memory Access), targeting embedded signal processing in communication and multimedia applications. The unified DSP subsystem can further reduce the hardware cost, especially the memory cost, of embedded SoC processors and, most importantly, provide full programmability for a wide range of DSP applications. The ePUMA processor is based on a master-slave heterogeneous multi-core architecture. 
One master core performs the central control, and multiple Single Instruction Multiple Data (SIMD) coprocessors work in parallel to provide the majority of the computing power. The focus and main contribution of this thesis is the memory subsystem design of ePUMA. The multi-core system uses a distributed memory architecture based on scratchpad memories and software-controlled data movement. This suits the data access properties of streaming applications and the kernel-based multi-core computing model. The essential techniques include the conflict-free parallel memory architecture, the multi-layer interconnection network, non-address stream data transfer, transitioned memory buffers, and lookup-table-based parallel memory addressing. The goal of the design is to minimize hardware cost, simplify the software protocol for inter-processor communication, and increase arithmetic computing efficiency. We have so far shown through applications that most DSP algorithms, such as filters, vector/matrix operations, transforms, and arithmetic functions, can achieve computing efficiency over 70% on the ePUMA platform. The non-address stream network provides communication bandwidth equivalent to a crossbar interconnection at less than 30% of its implementation cost.
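The memory subsystem relies on scratchpad memories with software-controlled data movement. A common pattern for hiding transfer latency in such systems is double (ping-pong) buffering, sketched below as a sequential simulation of the buffer schedule. This is a generic pattern chosen for illustration; it is an assumption and not a description of ePUMA's transitioned memory buffers or its DMA interface.

```python
from collections.abc import Callable, Iterable


def double_buffered(blocks: Iterable[list], kernel: Callable[[list], object]):
    """Run `kernel` over a stream of blocks with two alternating (ping-pong) buffers.

    In hardware, loading the next block and computing on the current one would
    overlap in time; this sequential model only records which buffer each step
    uses. Returns (results, schedule).
    """
    it = iter(blocks)
    buffers = [next(it, None), None]   # prologue: first block goes into buffer 0
    active = 0
    results, schedule = [], []
    while buffers[active] is not None:
        idle = 1 - active
        buffers[idle] = next(it, None)              # "DMA" fills the idle buffer ...
        results.append(kernel(buffers[active]))     # ... while the core computes
        load = f"load->buf{idle}" if buffers[idle] is not None else "idle"
        schedule.append((load, f"compute buf{active}"))
        active = idle                               # swap roles for the next step
    return results, schedule


if __name__ == "__main__":
    results, schedule = double_buffered([[1, 2], [3, 4], [5, 6]], sum)
    print(results)  # [3, 7, 11]
    for step in schedule:
        print(step)
```

As long as a block transfer takes no longer than the kernel that consumes the previous block, the transfers are completely hidden behind computation in such a schedule.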
8.
  • Wu, Di, 1979- (author)
  • Scalable Multi-Standard Radio Baseband for Modern Wireless Communications
  • 2009
  • Doctoral thesis (other academic/artistic)
    • Today, owing to the rapid advancement of technology, people can bridge geographic distances and communicate without waiting a week to receive a letter. Meanwhile, more and more wireless communication standards are emerging, each claiming to make our lives easier. This puts us in a dilemma: we need new technologies not because we are fond of technical complication but, on the contrary, because we are constantly pursuing convenience and simplicity. Being tangled up in so many connectivity standards is not fun for anyone (even for the people who invented these technologies). The demand is rather simple: why not put everything into one unit that can automatically attach itself to the most suitable radio access available in the circumstances? The purpose of this thesis is to find an economical way of meeting such a demand. From the semiconductor industry’s point of view, the traditional ASIC design flow faces challenges from ever-changing specifications and immense tape-out costs at the nanoscale. Moreover, ever-increasing system complexity requires painstaking and costly integration and verification. This thesis investigates multi-tasking radio, a concept that allows multiple radio access technologies to be supported by the same hardware platform and switched under different scenarios. By simultaneously looking at different layers of abstraction, such as system modeling and simulation, architecture design, and silicon implementation, the design trade-offs for multi-tasking radio baseband are discussed. In this dissertation, taking the emerging mobile broadband standard 3GPP LTE as the focus and other standards (e.g. IEEE 802.11n and DVB) as complements, the system architecture of a multi-tasking radio platform is studied. A general multi-tasking radio baseband chain is partitioned into several functional blocks according to the processing flow, and each block is investigated separately. 
These blocks include synchronization, channel estimation, demodulation and channel coding. Different algorithms are evaluated for each functional block. A new multiple-input multiple-output (MIMO) symbol detection algorithm, “modified fixed-complexity soft-output” (MFCSO), is proposed and implemented in silicon. A unified synchronization unit is presented that supports several standards. The architecture of the channel estimator is also addressed. Finally, a high-speed radix-2 Turbo decoder implementation is presented, leading towards a radix-4 scenario. It is worth mentioning that in this dissertation, the performance evaluation takes the complete system into consideration rather than analyzing individual blocks independently. Based on this, algorithm/hardware co-optimization is carried out. Using the “Single Instruction Multiple Tasks” architecture presented earlier, by exploring the commonality of signal processing functions and choosing the proper level of hardware multiplexing, this dissertation concludes that system thinking allows harmony to be achieved in multi-tasking radio baseband design.
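Channel estimation is one of the functional blocks studied. As a minimal illustration (not the estimator architecture of the thesis), the common least-squares estimate on known pilot subcarriers of an OFDM symbol, followed by linear interpolation between pilots, can be sketched as follows; the pilot layout, the function names and the noise-free linear test channel are assumptions.

```python
import bisect


def ls_estimate(received: list[complex], pilots: dict[int, complex]) -> dict[int, complex]:
    """Least-squares estimate on pilot subcarriers: H[k] = Y[k] / X[k]."""
    return {k: received[k] / x for k, x in pilots.items()}


def interpolate(estimates: dict[int, complex], n_subcarriers: int) -> list[complex]:
    """Linearly interpolate pilot estimates across all subcarriers.

    Subcarriers outside the outermost pilots reuse the nearest pilot estimate
    (an assumption of this sketch; extrapolation is another option).
    """
    ks = sorted(estimates)
    h = []
    for n in range(n_subcarriers):
        if n <= ks[0]:
            h.append(estimates[ks[0]])
        elif n >= ks[-1]:
            h.append(estimates[ks[-1]])
        else:
            i = bisect.bisect_left(ks, n)          # ks[i - 1] < n <= ks[i]
            k0, k1 = ks[i - 1], ks[i]
            t = (n - k0) / (k1 - k0)
            h.append((1 - t) * estimates[k0] + t * estimates[k1])
    return h


if __name__ == "__main__":
    n_sc = 16
    channel = [1 + 0.1j * k for k in range(n_sc)]       # hypothetical noise-free channel
    pilots = {k: 1 - 1j for k in (0, 5, 10, 15)}        # hypothetical pilot layout
    tx = [pilots.get(k, 1 + 0j) for k in range(n_sc)]   # data symbols do not matter here
    rx = [h * x for h, x in zip(channel, tx)]
    h_hat = interpolate(ls_estimate(rx, pilots), n_sc)
    print(max(abs(a - b) for a, b in zip(h_hat, channel)))
```

With noise and a frequency-selective channel the interpolation error grows with pilot spacing, which is one of the trade-offs a multi-standard channel estimator has to balance against hardware cost.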