SwePub

Results list for search "(db:Swepub) pers:(Chen Xiaowen) srt2:(2018)"


  • Results 1-3 of 3
1.
  • Chen, Xiaowen, et al. (author)
  • A Variable-Size FFT Hardware Accelerator Based on Matrix Transposition
  • 2018
  • In: IEEE Transactions on Very Large Scale Integration (VLSI) Systems. - : Institute of Electrical and Electronics Engineers (IEEE). - 1063-8210 .- 1557-9999. ; 26:10, pp. 1953-1966
  • Journal article (peer-reviewed). Abstract:
    • Fast Fourier transform (FFT) is the kernel and the most time-consuming algorithm in the domain of digital signal processing, and the FFT sizes required by different applications vary widely. Therefore, this paper proposes a variable-size FFT hardware accelerator, which fully supports the IEEE-754 single-precision floating-point standard and FFT calculation over a wide size range from 2 to 2^20 points. First, a parallel Cooley-Tukey FFT algorithm based on matrix transposition (MT) is proposed, which can efficiently divide a large FFT into several small FFTs that can be executed in parallel. Second, guided by this algorithm, the FFT hardware accelerator is designed, and several FFT performance optimization techniques such as hybrid twiddle factor generation, multibank data memory, block MT, and token-based task scheduling are proposed. Third, its VLSI implementation is detailed, showing that it can work at 1 GHz with an area of 2.4 mm² and a power consumption of 91.3 mW at 25 °C, 0.9 V. Finally, several experiments are carried out to evaluate the proposal's performance in terms of FFT execution time, resource utilization, and power consumption. Comparative experiments show that the FFT hardware accelerator achieves speedups of up to 18.89x in comparison to two software-only solutions and two hardware-dedicated solutions.
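The abstract above describes splitting a large FFT into small FFTs by way of matrix transposition. The following is a minimal sketch of the textbook "transpose" form of the Cooley-Tukey factorization, not the paper's accelerator algorithm: the function names `dft` and `fft_via_transpose` and the parameters `n1` and `n2` are hypothetical, and a naive DFT stands in for the small sub-transforms.

```python
import cmath


def dft(x):
    """Naive O(n^2) DFT; stands in for one small sub-transform."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def fft_via_transpose(x, n1, n2):
    """Length n1*n2 DFT built from n2 size-n1 and n1 size-n2 DFTs.

    Hypothetical illustration: the input is viewed as an n1 x n2 matrix
    A[j1][j2] = x[j1*n2 + j2], and transpositions make each batch of
    sub-transforms operate on contiguous rows.
    """
    n = n1 * n2
    assert len(x) == n

    # Transpose: row j2 now holds column j2 of A (n1 elements).
    cols = [[x[j1 * n2 + j2] for j1 in range(n1)] for j2 in range(n2)]

    # n2 independent size-n1 DFTs (these could run in parallel).
    cols = [dft(c) for c in cols]  # cols[j2][k1]

    # Twiddle factors exp(-2*pi*i * j2*k1 / n).
    for j2 in range(n2):
        for k1 in range(n1):
            cols[j2][k1] *= cmath.exp(-2j * cmath.pi * j2 * k1 / n)

    # Transpose back so each row collects one k1 across all j2.
    rows = [[cols[j2][k1] for j2 in range(n2)] for k1 in range(n1)]

    # n1 independent size-n2 DFTs (again parallelizable).
    rows = [dft(r) for r in rows]  # rows[k1][k2]

    # Output index is k = k1 + n1*k2.
    out = [0j] * n
    for k1 in range(n1):
        for k2 in range(n2):
            out[k1 + n1 * k2] = rows[k1][k2]
    return out
```

Calling `fft_via_transpose(x, 3, 4)` on a 12-point input should agree with `dft(x)` up to floating-point rounding. In this sketch each batch of sub-transforms is independent, which is the property that lets small FFTs execute in parallel.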
2.
  • Wang, Zicong, et al. (author)
  • Cache Access Fairness in 3D Mesh-Based NUCA
  • 2018
  • In: IEEE Access. - : Institute of Electrical and Electronics Engineers (IEEE). - 2169-3536. ; 6, pp. 42984-42996
  • Journal article (peer-reviewed). Abstract:
    • Given the increase in cache capacity over the past few decades, cache access efficiency has come to play a critical role in determining system performance. To ensure efficient utilization of the cache resources, non-uniform cache architecture (NUCA) has been proposed to allow for a large capacity and a short access latency. With the support of networks-on-chip (NoC), NUCA is often employed to organize the last level cache. However, this method also hurts cache access fairness, which denotes the degree of non-uniformity of cache access latencies. This drop in fairness can result in an increased number of cache accesses with excessively high latency, which leads to a bottleneck in system performance. This paper investigates cache access fairness in the context of NoC-based 3-D chip architecture, and provides new insights into 3-D architecture design. We propose fair-NUCA (F-NUCA), a co-design scheme intended to optimize cache access fairness. In F-NUCA, we strive to improve fairness by equalizing cache access latencies. To achieve this goal, the memory mapping and the channel width are both redistributed non-uniformly, thereby equalizing the non-contention and contention latencies, respectively. The experimental results reveal that F-NUCA can effectively improve cache access fairness. When F-NUCA is compared with the traditional static NUCA in a simulation with PARSEC benchmarks, the average reductions in average latency and latency standard deviation are 4.64%/9.38% for a 4 x 4 x 2 mesh network, and 6.31%/13.51% for a 4 x 4 x 4 mesh network. In addition, a 4.0%/6.4% improvement in system throughput can be achieved for the two scales of mesh networks, respectively.
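The abstract above measures fairness through the spread (standard deviation) of cache access latencies. As a rough sketch of that metric only, and not of F-NUCA itself, the snippet below assumes that non-contention latency is proportional to Manhattan hop count, that every mesh node hosts one core and one cache bank, and that accesses are spread uniformly over all banks. The function name `hop_latency_stats` is hypothetical.

```python
from itertools import product
from statistics import mean, pstdev


def hop_latency_stats(dims):
    """Mean and standard deviation of per-core average hop distance.

    Assumptions (not from the paper): latency ~ Manhattan hop count,
    one core and one bank per node, uniform access to all banks.
    """
    nodes = list(product(*(range(d) for d in dims)))
    per_core = []
    for core in nodes:
        # Hop distance from this core to every bank in the mesh.
        hops = [sum(abs(a - b) for a, b in zip(core, bank)) for bank in nodes]
        per_core.append(mean(hops))
    # A larger spread across cores means less fair access latency.
    return mean(per_core), pstdev(per_core)
```

For example, `hop_latency_stats((4, 4, 2))` gives the statistics for a 4 x 4 x 2 mesh under these assumptions. Corner nodes see a larger average distance than central ones, so the spread is nonzero even without contention.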
3.
  • Wang, Z., et al. (author)
  • VP-Router : On balancing the traffic load in on-chip networks
  • 2018
  • In: IEICE Electronics Express. - : Institute of Electronics Information Communication Engineers. - 1349-2543. ; 15:22
  • Journal article (peer-reviewed). Abstract:
    • As networks-on-chip (NoC) scale up, network traffic keeps growing, and the central region generally tends to become a traffic hotspot. Unbalanced traffic can turn some network links into the bottleneck of network communication and thus hurt network and system performance. In this paper, we propose a load-balanced link distribution method, which allocates physical channels according to the traffic load on each link. To support connecting multiple physical channels between two routers, we propose a novel concept of virtual port, and design a low-cost multi-port router called the virtual port router (VP-Router). Compared to a network with traditional routers, a network with VP-Routers can effectively balance the traffic load on links. Experiments with SPLASH2 benchmarks show that VP-Router performs 6.3% and 9.0% better in energy-delay product (EDP) for 4 × 4 and 8 × 8 mesh networks, respectively. System throughput improves by about 3.5% and 5.8% on average, respectively.
Publication type
journal article (3)
Content type
peer-reviewed (3)
Author/editor
Lu, Zhonghai (2)
Chen, Xiaowen (2)
Zhang, J. (1)
Guo, Y (1)
Wang, Z. (1)
Guo, Yang (1)
Lei, Yuanwu (1)
Chen, Shuming (1)
Chen, Xiaowen, 1982- (1)
Wang, Zicong (1)
Institution
Kungliga Tekniska Högskolan (3)
Language
English (3)
Research subject (UKÄ/SCB)
Engineering and Technology (2)
Natural Sciences (1)