SwePub
Sök i SwePub databas

  Utökad sökning

Träfflista för sökning "WFRF:(Ejaz Ahsen 1986) "

Sökning: WFRF:(Ejaz Ahsen 1986)

  • Resultat 1-9 av 9
Sortera/gruppera träfflistan
   
NumreringReferensOmslagsbildHitta
1.
  • Alvarez, Lluc, et al. (författare)
  • eProcessor: European, Extendable, Energy-Efficient, Extreme-Scale, Extensible, Processor Ecosystem
  • 2023
  • Ingår i: Proceedings of the 20th ACM International Conference on Computing Frontiers 2023, CF 2023. ; , s. 309-314
  • Konferensbidrag (refereegranskat)abstract
    • The eProcessor project aims at creating a RISC-V full stack ecosystem. The eProcessor architecture combines a high-performance out-of-order core with energy-efficient accelerators for vector processing and artificial intelligence with reduced-precision functional units. The design of this architecture follows a hardware/software co-design approach with relevant application use cases from the high-performance computing, bioinformatics and artificial intelligence domains. Two eProcessor prototypes will be developed based on two fabricated eProcessor ASICs integrated into a computer-on-module.
  •  
2.
  • Ejaz, Ahsen, 1986 (författare)
  • DDRNoC: Dual Data-Rate Network-on-Chip
  • 2018
  • Licentiatavhandling (övrigt vetenskapligt/konstnärligt)abstract
    • Networks-on-Chip (NoCs) are becoming increasing important for the performance of modern multi-core system-on-chip. For various on-chip networks with virtual channel (VC) ow control, the slow control logic (VC and switch allocation logic) of the NoC routers limits the NoC clock period while their datapath (switch and link) possesses signifcant slack. This slack results in wasted performance potential of the datapath, limits the saturation throughput of the network and reduces its energy efficiency. The aim of this thesis is to improve NoC performance by eliminating this slack and removing control logic from the router critical path. To this end, this thesis presents the Dual Data-Rate (DDR) network architecture called the DDRNoC. It utilizes the NoC datapath twice with in a clock cycle to forward its at DDR. This not only exploits the slack present in the datapath but also requires a clock with period twice the datapath delay, thus removing the shorter control logic from the critical path. This enables the DDRNoC to achieve throughput higher than single data-rate networks. Moreover, the DDRNoC also employs lookahead signalling to reduce end-to-end packet latency. FreewayNoC, an extension to the DDRNoC supplements the DDRNoC with simplified pipeline stage bypassing to reduce the zero-load latency of packets in the network. Implementation of the DDRNoC and FreewayNoC architectures require redesign of the switch allocation (SA) mechanism to resolve contention among competing its by granting up to two its access to each switch input and output port per clock cycle. It further requires separate paths for the propagation of lookahead control signals. FreewayNoC also requires implementation of multiple checks to guarantee con ict-free bypassing of the SA stage. Physical implementation results using 28nm process technology show that DDRNoC and FreewayNoC have 5% and 15% area overhead, respectively, compared to a simple 3-stage network with VCs. Performance evaluation shows that for a 16X16 mesh network, FreewayNoC supports 25% higher throughput compared to current state-of-the-art NoC, ShortPath. Moreover, FreewayNoC achieves a zero-load latency which scales better than ShortPath and equally well with an ideal network that has no control overheads. For application driven traffic, FreewayNoC reduces average packet latency by 18% compared to ShortPath. Alternatively, low voltage implementation of the DDRNoC and FreewayNoC can be used to conserve power and improve energy efficiency at the cost of higher packet latency.
  •  
3.
  • Ejaz, Ahsen, 1986, et al. (författare)
  • DDRNoC: Dual Data-Rate Network-on-Chip
  • 2017
  • Rapport (övrigt vetenskapligt/konstnärligt)abstract
    • This paper introduces DDRNoC, an on-chip interconnection network able to route packets at Dual Data Rate. The cycle time of current 2D-mesh Network-on-Chip routers is limited by their control as opposed to the datapath (switch and link traversal) which exhibits significant slack. DDRNoC capitalizes on this observation allowing two flits per cycle to share the same datapath. Thereby, DDRNoC achieves higher throughput than a Single Data Rate (SDR) network. Alternatively, using lower voltage circuits, the above slack can be exploited to reduce power consumption while matching the SDR network throughput. In addition, DDRNoC exhibits reduced clock distribution power, improving energy efficiency, as it needs a slower clock than a SDR network that routes packets at the same rate. Post place and route results in 28 nm technology show that, compared to an iso-voltage (1.1V) SDR network, DDRNoC improves throughput proportionally to the SDR datapath slack. Moreover, a low-voltage (0.95V) DDRNoC implementation converts that slack to power reduction offering the 1.1V SDR throughput at a substantially lower energy cost.
  •  
4.
  • Ejaz, Ahsen, 1986, et al. (författare)
  • DDRNoC: Dual Data-Rate Network-on-Chip
  • 2018
  • Ingår i: Transactions on Architecture and Code Optimization. - : Association for Computing Machinery (ACM). - 1544-3973 .- 1544-3566. ; 15:2
  • Tidskriftsartikel (refereegranskat)abstract
    • This article introduces DDRNoC, an on-chip interconnection network capable of routing packets at Dual Data Rate. The cycle time of current 2D-mesh Network-on-Chip routers is limited by their control as opposed to the datapath (switch and link traversal), which exhibits significant slack. DDRNoC capitalizes on this observation, allowing two flits per cycle to share the same datapath. Thereby, DDRNoC achieves higher throughput than a Single Data Rate (SDR) network. Alternatively, using lower voltage circuits, the above slack can be exploited to reduce power consumption while matching the SDR network throughput. In addition, DDRNoC exhibits reduced clock distribution power, improving energy efficiency, as it needs a slower clock than a SDR network that routes packets at the same rate. Post place and route results in 28nm technology show that, compared to an iso-voltage (1.1V) SDR network, DDRNoC improves throughput proportionally to the SDR datapath slack. Moreover, a low-voltage (0.95V) DDRNoC implementation converts that slack to power reduction offering the 1.1V SDR throughput at a substantially lower energy cost.
  •  
5.
  • Ejaz, Ahsen, 1986 (författare)
  • Dual Data Rate Network-on-Chip Architectures
  • 2021
  • Doktorsavhandling (övrigt vetenskapligt/konstnärligt)abstract
    • Networks-on-Chip (NoCs) are becoming increasing important for the performance of modern multi-core systems-on-chip. The performance of current NoCs is limited, among others, by two factors: their limited clock frequency and long router pipeline. The clock frequency of a network defines the limits of its saturation throughput. However, for high throughput routers, clock is constrained by the control logic (for virtual channel and switch allocation) whereas the datapath (crossbar switch and links) possesses significant slack. This slack in the datapath wastes network throughput potential. Secondly, routers require flits to go through a large number of pipeline stages increasing packet latency at low traffic loads. These stages include router resource allocation, switch traversal (ST) and link traversal (LT). The allocation stages are used to manage contention among flits attempting to simultaneously access switch and links, and the ST stage is needed to change the routing dimension. In some cases, these stages are not needed and then requiring flits to go through them increases packet latency. The aim of this thesis is to improve NoC performance, in terms of network throughput, by removing the slack in the router datapath, and in terms of average packet latency, by enabling incoming flits to bypass, when possible, allocation and ST stages. More precisely, this thesis introduces Dual Data-Rate (DDR) NoC architectures which exploit the slack present in the NoC datapath to operate it at DDR. This requires a clock with period twice the datapath delay and removes the control logic from the critical path. DDR datapaths enable throughput higher than existing single data-rate (SDR) networks where the clock period is defined by the control logic. Additionally, this thesis supplements DDR NoC architectures with varying levels of pipeline stage bypassing capabilities to reduce low-load packet latency. In order to avoid complex logic required for bypassing from all inputs to all outputs, this thesis implements and evaluates a simplified bypassing approach. DDR NoC routers support bypassing of the allocation stage for flits propagating an in-network straight hop (i.e. East to West, North to South and vice versa) and when entering or exiting the network. Disabling bypassing during XY-turns limits its benefits, but, for most routing algorithms under low traffic loads, flits encounter at most one XY-turn on their way to the destination. Bypassing allocation stage enables incoming flits to directly initiate ST, when required conditions are met, and propagate at one cycle per hop. Furthermore, DDR NoC routers allow flits to bypass the ST stage when propagating a straight hop from the head of a specific input VC. Restricting ST bypassing from a specific VC further simplifies check logic to have clock period defined by datapath delays. Bypassing ST requires dedicated bypass paths from non-local input ports to opposite output ports. It enables flits to propagate half a cycle per hop. This thesis shows that compared to current state-of-the-art SDR NoCs, operating router’s datapath at DDR improves throughput by up to 20%. Adding to a DDR NoC an allocation bypassing mechanism for straight hops reduces its packet latency by up to 45%, while maintaining the DDR throughput advantage. Enhancing allocation bypassing to include flits entering and exiting the network further reduces latency by another 24%. Finally, adding ST bypassing further reduces latency by another 32%. Overall, DDR NoCs offer up to 40% lower latency and about 20% higher throughput compared to the SDR networks.
  •  
6.
  • Ejaz, Ahsen, 1986, et al. (författare)
  • FastTrackNoC: A DDR NoC with FastTrack Router Datapaths
  • 2021
  • Rapport (övrigt vetenskapligt/konstnärligt)abstract
    • This paper introduces FastTrackNoC, a Network-on-Chip router architecture that reduces latency by bypassing its switch traversal (ST) stage. FastTrackNoC adds a fast-track path between the head of a particular virtual channel (VC) buffer at each input port and the link of the opposite output. This allows non-turning flits to bypass ST when the required router resources are available. FastTrackNoC combines ST bypassing with existing techniques for reducing latency, namely, pipeline bypassing of control stages, precomputed routing and lookahead control signaling, to allow at best a flit to proceed directly to link traversal (LT). FastTrackNoC is applied to a Dual Data Rate (DDR) router in order to maximize throughput. Post place and route results in 28nm technology show that (i) compared to the current state of the art DDR NoCs, FastTrackNoC offers the same throughput and reduces average packet latency by 11-32% requiring up to 5% more power and (ii) compared to current state of the art Single Data Rate (SDR) NoCs, FastTrackNoC reduces packet latency by 9-40% and achieves 16-19% higher throughput with 5% higher power at the SDR NoC saturation point.
  •  
7.
  • Ejaz, Ahsen, 1986, et al. (författare)
  • FastTrackNoC: A NoC with FastTrack Router Datapaths
  • 2022
  • Ingår i: Proceedings - International Symposium on High-Performance Computer Architecture. - 1530-0897. ; 2022-April, s. 971-985
  • Konferensbidrag (refereegranskat)abstract
    • This paper introduces FastTrackNoC, a Network-on-Chip (NoC) router architecture that reduces packet latency by bypassing its switch traversal (ST) stage. It is based on the observation that there is a bias in the direction a flit takes through a router, e.g., in a 2D mesh network, non-turning hops are preferred, especially when dimension order routing is used. FastTrackNoC capitalizes on this observation and adds to a 2D mesh router a fast-track path between the head of a single input virtual channel (VC) buffer and its most popular, opposite output. This allows non-turning flits to bypass ST logic, i.e., buffer-, input-and output multiplexing, when the required router resources are available. FastTrackNoC combines ST bypassing with existing techniques for reducing latency, namely, allocation bypassing, precomputed routing, and lookahead control signaling to allow at best incoming flits to proceed directly to link traversal (LT). Moreover, it is applied to a Dual Data Rate (DDR) router in order to maximize network throughput. Post place and route results in 28nm show the following: compared to previous DDR NoCs, FastTrackNoC offers 13-32% lower average packet latency; compared to previous multi-VC Single Data Rate (SDR) NoCs, FastTrackNoC reduces latency by 10-40% and achieves 18-21% higher throughput, and compared to single-channel SDR NoC offers up to 50% higher throughput and similar latency.
  •  
8.
  • Ejaz, Ahsen, 1986, et al. (författare)
  • FreewayNoC: A DDR NoC with Pipeline Bypassing
  • 2018
  • Ingår i: 2018 12th IEEE/ACM International Symposium on Networks-on-Chip, NOCS 2018. - 9781538648933
  • Konferensbidrag (refereegranskat)abstract
    • This paper introduces FreewayNoC, a Network-on-chip that routes packets at Dual Data Rate (DDR) and allows pipeline bypassing. Based on the observation that routers datapath is faster than control, a recent NoC design allowed flits to be routed at DDR improving throughput to rates defined solely by the switch and link traversal, rather than by the control. However, such a DDR NoC suffers from high packet latency as flits require multiple cycles per hop. A common way to reduce latency at low traffic load is pipeline bypassing, then, flits that find a contention-free way to the output port can directly traverse the switch. Existing Single Data Rate (SDR) NoC routers support it, but applying pipeline bypassing to a DDR router is more challenging. It would need additional bypassing logic which would add to the cycle time compromising the DDR NoC throughput advantage. FreewayNoC design restricts pipeline bypassing on a DDR router to only flits that go straight simplifying its logic. Thereby, it offers low packet latency without affecting DDR router cycle time and throughput. Then, at low traffic loads, besides the few turns that a flit would take on its way from source to destination, all other hops could potentially offer minimum latency equal to the delay of the switch and link traversal. Post place and route results in 28 nm technology confirm the above and also show that zero-load latency scales to the hop count better than current state-of-the-art NoCs.
  •  
9.
  • Ejaz, Ahsen, 1986, et al. (författare)
  • HighwayNoC: Approaching Ideal NoC Performance With Dual Data Rate Routers
  • 2021
  • Ingår i: IEEE/ACM Transactions on Networking. - 1558-2566 .- 1063-6692. ; 29:1, s. 318-331
  • Tidskriftsartikel (refereegranskat)abstract
    • This paper describes HighwayNoC,, a Network-on-chip (NoC) that approaches ideal network performance using a Dual Data Rate (DDR) datapath. Based on the observation that routers datapath is faster than control, a DDR NoC allows flits to be routed at DDR improving throughput to rates defined solely by the datapath, rather than by the control. DDR NoCs can use pipeline bypassing to reduce packet latency at low traffic load. However, existing DDR routers offer bypassing only on in-network, non-turning hops to simplify the required logic. HighwayNoC, extends bypassing support of DDR routers to local ports, allowing flits to enter and exit the network faster. Moreover, it simplifies the DDR switch allocation and the interface of router ports reducing power and area costs. Post place and route results in 28 nm technology show that HighwayNoC, performs better than current state of the art NoCs. Compared to previous DDR NoCs, HighwayNoC, reduces average packet latency by 7.3-27% and power consumption by 1-10%, without affecting throughput. Compared to existing Single Data Rate NoCs, HighwayNoC, achieves 17-22% higher throughput, has similar or up to 13.8% lower packet latency, and mixed energy efficiency results.
  •  
Skapa referenser, mejla, bekava och länka
  • Resultat 1-9 av 9

Kungliga biblioteket hanterar dina personuppgifter i enlighet med EU:s dataskyddsförordning (2018), GDPR. Läs mer om hur det funkar här.
Så här hanterar KB dina uppgifter vid användning av denna tjänst.

 
pil uppåt Stäng

Kopiera och spara länken för att återkomma till aktuell vy