SwePub

Search results for "WFRF:(Papaefstathiou Ioannis)"

Search: WFRF:(Papaefstathiou Ioannis)

  • Results 1-10 of 17
1.
  • Alvarez, Lluc, et al. (authors)
  • eProcessor: European, Extendable, Energy-Efficient, Extreme-Scale, Extensible, Processor Ecosystem
  • 2023
  • In: Proceedings of the 20th ACM International Conference on Computing Frontiers 2023, CF 2023, pp. 309-314.
  • Conference paper (peer-reviewed), abstract:
    • The eProcessor project aims at creating a RISC-V full stack ecosystem. The eProcessor architecture combines a high-performance out-of-order core with energy-efficient accelerators for vector processing and artificial intelligence with reduced-precision functional units. The design of this architecture follows a hardware/software co-design approach with relevant application use cases from the high-performance computing, bioinformatics and artificial intelligence domains. Two eProcessor prototypes will be developed based on two fabricated eProcessor ASICs integrated into a computer-on-module.
2.
  • Mavroidis, Iakovos, et al. (authors)
  • ECOSCALE: Reconfigurable computing and runtime system for future exascale systems
  • 2016
  • In: 19th Design, Automation and Test in Europe Conference and Exhibition, DATE 2016, Dresden, Germany, 14-18 March 2016. ISSN 1530-1591. ISBN 9783981537062. pp. 696-701.
  • Conference paper (peer-reviewed), abstract:
    • In order to reach exascale performance, current HPC systems need to be improved. Simple hardware scaling is not a feasible solution due to the increasing utility costs and power consumption limitations. Apart from improvements in implementation technology, what is needed is to refine the HPC application development flow as well as the system architecture of future HPC systems. ECOSCALE tackles these challenges by proposing a scalable programming environment and architecture, aiming to substantially reduce energy consumption as well as data traffic and latency. ECOSCALE introduces a novel heterogeneous energy-efficient hierarchical architecture, as well as a hybrid many-core+OpenCL programming environment and runtime system. The ECOSCALE approach is hierarchical and is expected to scale well by partitioning the physical system into multiple independent Workers (i.e. compute nodes). Workers are interconnected in a tree-like fashion and define a contiguous global address space that can be viewed either as a set of partitions in a Partitioned Global Address Space (PGAS), or as a set of nodes hierarchically interconnected via an MPI protocol. To further increase energy efficiency, as well as to provide resilience, the Workers employ reconfigurable accelerators mapped into the virtual address space utilizing a dual stage System Memory Management Unit with coherent memory access. The architecture supports shared partitioned reconfigurable resources accessed by any Worker in a PGAS partition, as well as automated hardware synthesis of these resources from an OpenCL-based programming model.
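A minimal sketch of the PGAS partitioning idea described in the ECOSCALE abstract above, written in C: a global address is split into a Worker index and an offset inside that Worker's partition. The 4 GiB partition size and the example address are arbitrary assumptions for illustration, not values from the project.

/* Illustrative only: split a PGAS global address into (Worker, offset),
 * assuming one fixed-size partition per Worker. */
#include <stdint.h>
#include <stdio.h>

#define PARTITION_BYTES (1ULL << 32)     /* assumed 4 GiB per Worker */

typedef struct {
    uint32_t worker;   /* Worker that owns the address          */
    uint64_t offset;   /* offset inside that Worker's partition */
} pgas_loc;

static pgas_loc pgas_translate(uint64_t global_addr)
{
    pgas_loc loc;
    loc.worker = (uint32_t)(global_addr / PARTITION_BYTES);
    loc.offset = global_addr % PARTITION_BYTES;
    return loc;
}

int main(void)
{
    uint64_t addr = 0x1C0001000ULL;              /* arbitrary global address */
    pgas_loc loc  = pgas_translate(addr);
    printf("worker %u, offset 0x%llx\n", loc.worker,
           (unsigned long long)loc.offset);
    return 0;
}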
3.
  • Brokalakis, A., et al. (authors)
  • COSSIM: An open-source integrated solution to address the simulator gap for systems of systems
  • 2018
  • In: Proceedings - 21st Euromicro Conference on Digital System Design, DSD 2018. ISBN 9781538673768. pp. 115-120.
  • Conference paper (peer-reviewed), abstract:
    • In an era of complex networked heterogeneous systems, simulating independently only parts, components or attributes of a system under design is not a viable, accurate or efficient option. The interactions are too many and too complicated to produce meaningful results and the optimization opportunities are severely limited when considering each part of a system in an isolated manner. The presented COSSIM simulation framework is the first known open-source, high-performance simulator that can handle holistically system-of-systems including processors, peripherals and networks; such an approach is very appealing to both CPS/IoT and Highly Parallel Heterogeneous Systems designers and application developers. Our highly integrated approach is further augmented with accurate power estimation and security sub-tools that can tap on all system components and perform security and robustness analysis of the overall networked system. Additionally, a GUI has been developed to provide easy simulation set-up, execution and visualization of results. COSSIM has been evaluated using real-world applications representing cloud (mobile visual search) and CPS systems (building management) demonstrating high accuracy and performance that scales almost linearly with the number of CPUs dedicated to the simulator.
4.
  • Ioannou, Aggelos D., et al. (authors)
  • UNILOGIC: A Novel Architecture for Highly Parallel Reconfigurable Systems
  • 2020
  • In: ACM Transactions on Reconfigurable Technology and Systems. Association for Computing Machinery (ACM). ISSN 1936-7414, 1936-7406. Vol. 13, issue 4.
  • Journal article (peer-reviewed), abstract:
    • One of the main characteristics of High-performance Computing (HPC) applications is that they become increasingly performance and power demanding, pushing HPC systems to their limits. Existing HPC systems have not yet reached exascale performance mainly due to power limitations. Extrapolating from today's top HPC systems, about 100-200 MWatts would be required to sustain an exaflop-level of performance. A promising solution for tackling power limitations is the deployment of energy-efficient reconfigurable resources (in the form of Field-programmable Gate Arrays (FPGAs)) tightly integrated with conventional CPUs. However, current FPGA tools and programming environments are optimized for accelerating a single application or even task on a single FPGA device. In this work, we present UNILOGIC (Unified Logic), a novel HPC-tailored parallel architecture that efficiently incorporates FPGAs. UNILOGIC adopts the Partitioned Global Address Space (PGAS) model and extends it to include hardware accelerators, i.e., tasks implemented on the reconfigurable resources. The main advantages of UNILOGIC are that (i) the hardware accelerators can be accessed directly by any processor in the system, and (ii) the hardware accelerators can access any memory location in the system. In this way, the proposed architecture offers a unified environment where all the reconfigurable resources can be seamlessly used by any processor/operating system. The UNILOGIC architecture also provides hardware virtualization of the reconfigurable logic so that the hardware accelerators can be shared among multiple applications or tasks. The FPGA layer of the architecture is implemented by splitting its reconfigurable resources into (i) a static partition, which provides the PGAS-related communication infrastructure, and (ii) fixed-size and dynamically reconfigurable slots that can be programmed and accessed independently or combined together to support both fine and coarse grain reconfiguration. Finally, the UNILOGIC architecture has been evaluated on a custom prototype that consists of two 1U chassis, each of which includes eight interconnected daughter boards, called Quad-FPGA Daughter Boards (QFDBs); each QFDB supports four tightly coupled Xilinx Zynq UltraScale+ MPSoCs as well as 64 Gigabytes of DDR4 memory, and thus, the prototype features a total of 64 Zynq MPSoCs and 1 Terabyte of memory. We tuned and evaluated the UNILOGIC prototype using both low-level (baremetal) performance tests, as well as two popular real-world HPC applications, one compute-intensive and one data-intensive. Our evaluation shows that UNILOGIC offers impressive performance that ranges from being 2.5 to 400 times faster and 46 to 300 times more energy efficient compared to conventional parallel systems utilizing only high-end CPUs, while it also outperforms GPUs by a factor ranging from 3 to 6 times in terms of time to solution, and from 10 to 20 times in terms of energy to solution.
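As a purely illustrative companion to the UNILOGIC abstract above, the C sketch below shows how a memory-mapped accelerator slot might be driven from software when accelerators share the same global (PGAS) address space as ordinary memory. The register layout, field names and busy-wait launch protocol are assumptions made for this sketch; they are not the UNILOGIC interface.

#include <stdint.h>

/* Hypothetical register block of one reconfigurable accelerator slot. */
typedef struct {
    volatile uint64_t src_addr;   /* global address of the input buffer   */
    volatile uint64_t dst_addr;   /* global address of the output buffer  */
    volatile uint64_t length;     /* bytes to process                     */
    volatile uint32_t start;      /* write 1 to launch                    */
    volatile uint32_t done;       /* set by the accelerator when finished */
} accel_slot_regs;

/* The slot is reached through a plain pointer because it is mapped into
 * the same address space as memory; any processor could call this. */
static void run_accelerator(accel_slot_regs *slot,
                            uint64_t src, uint64_t dst, uint64_t len)
{
    slot->src_addr = src;
    slot->dst_addr = dst;
    slot->length   = len;
    slot->start    = 1;
    while (!slot->done)
        ;                          /* busy-wait; a real driver would block */
}

int main(void)
{
    /* Stand-in for a real MMIO region so the sketch can be compiled and run. */
    accel_slot_regs fake_slot = { .done = 1 };
    run_accelerator(&fake_slot, 0x1000, 0x2000, 4096);
    return 0;
}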
5.
  • Ejaz, Ahsen, 1986, et al. (authors)
  • DDRNoC: Dual Data-Rate Network-on-Chip
  • 2017
  • Report (other academic/artistic), abstract:
    • This paper introduces DDRNoC, an on-chip interconnection network able to route packets at Dual Data Rate. The cycle time of current 2D-mesh Network-on-Chip routers is limited by their control as opposed to the datapath (switch and link traversal) which exhibits significant slack. DDRNoC capitalizes on this observation allowing two flits per cycle to share the same datapath. Thereby, DDRNoC achieves higher throughput than a Single Data Rate (SDR) network. Alternatively, using lower voltage circuits, the above slack can be exploited to reduce power consumption while matching the SDR network throughput. In addition, DDRNoC exhibits reduced clock distribution power, improving energy efficiency, as it needs a slower clock than a SDR network that routes packets at the same rate. Post place and route results in 28 nm technology show that, compared to an iso-voltage (1.1V) SDR network, DDRNoC improves throughput proportionally to the SDR datapath slack. Moreover, a low-voltage (0.95V) DDRNoC implementation converts that slack to power reduction offering the 1.1V SDR throughput at a substantially lower energy cost.
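A back-of-the-envelope reading of the DDRNoC throughput argument above, as a small C program: the router clock period is set by the slower control path, an SDR router moves one flit per cycle, and a DDR router moves two flits per cycle as long as the datapath fits in half a cycle. The delay values are invented for illustration and are not measurements from the paper.

#include <stdio.h>

int main(void)
{
    double control_ns  = 1.0;   /* assumed control-path delay            */
    double datapath_ns = 0.6;   /* assumed switch + link traversal delay */

    /* SDR: one flit per cycle; the cycle is set by the slower path. */
    double sdr_cycle        = control_ns > datapath_ns ? control_ns : datapath_ns;
    double sdr_flits_per_ns = 1.0 / sdr_cycle;

    /* DDR: two flits per cycle; the datapath must fit in half a cycle. */
    double ddr_cycle        = control_ns > 2.0 * datapath_ns ? control_ns
                                                             : 2.0 * datapath_ns;
    double ddr_flits_per_ns = 2.0 / ddr_cycle;

    printf("SDR: %.2f flits/ns, DDR: %.2f flits/ns\n",
           sdr_flits_per_ns, ddr_flits_per_ns);
    return 0;
}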
6.
  • Ejaz, Ahsen, 1986, et al. (authors)
  • DDRNoC: Dual Data-Rate Network-on-Chip
  • 2018
  • In: Transactions on Architecture and Code Optimization. Association for Computing Machinery (ACM). ISSN 1544-3973, 1544-3566. Vol. 15, issue 2.
  • Journal article (peer-reviewed), abstract:
    • This article introduces DDRNoC, an on-chip interconnection network capable of routing packets at Dual Data Rate. The cycle time of current 2D-mesh Network-on-Chip routers is limited by their control as opposed to the datapath (switch and link traversal), which exhibits significant slack. DDRNoC capitalizes on this observation, allowing two flits per cycle to share the same datapath. Thereby, DDRNoC achieves higher throughput than a Single Data Rate (SDR) network. Alternatively, using lower voltage circuits, the above slack can be exploited to reduce power consumption while matching the SDR network throughput. In addition, DDRNoC exhibits reduced clock distribution power, improving energy efficiency, as it needs a slower clock than a SDR network that routes packets at the same rate. Post place and route results in 28nm technology show that, compared to an iso-voltage (1.1V) SDR network, DDRNoC improves throughput proportionally to the SDR datapath slack. Moreover, a low-voltage (0.95V) DDRNoC implementation converts that slack to power reduction offering the 1.1V SDR throughput at a substantially lower energy cost.
7.
  • Ejaz, Ahsen, 1986, et al. (authors)
  • FreewayNoC: A DDR NoC with Pipeline Bypassing
  • 2018
  • In: 2018 12th IEEE/ACM International Symposium on Networks-on-Chip, NOCS 2018. ISBN 9781538648933.
  • Conference paper (peer-reviewed), abstract:
    • This paper introduces FreewayNoC, a Network-on-chip that routes packets at Dual Data Rate (DDR) and allows pipeline bypassing. Based on the observation that a router's datapath is faster than its control, a recent NoC design allowed flits to be routed at DDR, improving throughput to rates defined solely by the switch and link traversal rather than by the control. However, such a DDR NoC suffers from high packet latency, as flits require multiple cycles per hop. A common way to reduce latency at low traffic load is pipeline bypassing, whereby flits that find a contention-free way to the output port can directly traverse the switch. Existing Single Data Rate (SDR) NoC routers support it, but applying pipeline bypassing to a DDR router is more challenging: it would need additional bypassing logic, which would add to the cycle time and compromise the DDR NoC throughput advantage. The FreewayNoC design restricts pipeline bypassing on a DDR router to flits that go straight, simplifying its logic. Thereby, it offers low packet latency without affecting DDR router cycle time and throughput. At low traffic loads, besides the few turns that a flit takes on its way from source to destination, all other hops can potentially offer minimum latency equal to the delay of the switch and link traversal. Post place and route results in 28 nm technology confirm the above and also show that zero-load latency scales with hop count better than in current state-of-the-art NoCs.
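The bypass restriction described in the FreewayNoC abstract above can be pictured with a small C sketch: only a non-local flit that continues straight through the router and finds its output port free is allowed to skip the control pipeline. The port encoding and the shape of the check are assumptions made for illustration, not the router's actual logic.

#include <stdbool.h>
#include <stdio.h>

typedef enum { PORT_N, PORT_S, PORT_E, PORT_W, PORT_LOCAL } port_t;

/* A flit that entered through the North port is travelling south and
 * leaves through the South port; the other directions are symmetric. */
static port_t straight_output(port_t input)
{
    switch (input) {
    case PORT_N: return PORT_S;
    case PORT_S: return PORT_N;
    case PORT_E: return PORT_W;
    case PORT_W: return PORT_E;
    default:     return PORT_LOCAL;  /* locally injected flits never go "straight" */
    }
}

/* Bypass is granted only for straight, non-local hops with a free output. */
static bool can_bypass(port_t input, port_t requested_output, bool output_free)
{
    return input != PORT_LOCAL &&
           requested_output == straight_output(input) &&
           output_free;
}

int main(void)
{
    printf("straight hop, free output: %d\n", can_bypass(PORT_N, PORT_S, true));
    printf("turning hop,  free output: %d\n", can_bypass(PORT_N, PORT_E, true));
    return 0;
}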
8.
  • Ejaz, Ahsen, 1986, et al. (authors)
  • HighwayNoC: Approaching Ideal NoC Performance With Dual Data Rate Routers
  • 2021
  • In: IEEE/ACM Transactions on Networking. ISSN 1558-2566, 1063-6692. Vol. 29, issue 1, pp. 318-331.
  • Journal article (peer-reviewed), abstract:
    • This paper describes HighwayNoC, a Network-on-chip (NoC) that approaches ideal network performance using a Dual Data Rate (DDR) datapath. Based on the observation that a router's datapath is faster than its control, a DDR NoC allows flits to be routed at DDR, improving throughput to rates defined solely by the datapath rather than by the control. DDR NoCs can use pipeline bypassing to reduce packet latency at low traffic load. However, existing DDR routers offer bypassing only on in-network, non-turning hops to simplify the required logic. HighwayNoC extends bypassing support of DDR routers to local ports, allowing flits to enter and exit the network faster. Moreover, it simplifies the DDR switch allocation and the interface of router ports, reducing power and area costs. Post place and route results in 28 nm technology show that HighwayNoC performs better than current state-of-the-art NoCs. Compared to previous DDR NoCs, HighwayNoC reduces average packet latency by 7.3-27% and power consumption by 1-10%, without affecting throughput. Compared to existing Single Data Rate NoCs, HighwayNoC achieves 17-22% higher throughput, has similar or up to 13.8% lower packet latency, and mixed energy efficiency results.
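Continuing the previous sketch, the HighwayNoC abstract above extends bypassing to the local ports, so injection and ejection hops may also skip the pipeline when their output is uncontended. The rule below is a hypothetical illustration of that statement, not the paper's allocation logic.

#include <stdbool.h>
#include <stdio.h>

typedef enum { PORT_N, PORT_S, PORT_E, PORT_W, PORT_LOCAL } port_t;

/* Relaxed rule: besides straight in-network hops, a flit entering from the
 * local port (injection) or heading to the local port (ejection) may also
 * bypass when its output is free. */
static bool can_bypass_highway(port_t input, port_t output,
                               bool straight_hop, bool output_free)
{
    bool local_hop = (input == PORT_LOCAL) || (output == PORT_LOCAL);
    return output_free && (straight_hop || local_hop);
}

int main(void)
{
    /* An ejection hop: allowed here, but not under the straight-only rule. */
    printf("%d\n", can_bypass_highway(PORT_N, PORT_LOCAL, false, true));
    return 0;
}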
9.
  • Malek, Alirad, 1983, et al. (authors)
  • Odd-ECC: On-demand DRAM error correcting codes
  • 2017
  • In: ACM International Conference Proceeding Series. New York, NY, USA: ACM. ISBN 9781450353359. Part F131197, pp. 96-101.
  • Conference paper (peer-reviewed), abstract:
    • An application may have different sensitivity to faults in different subsets of the data it uses. Some data regions may therefore be more critical than others. Capitalizing on this observation, Odd-ECC provides a mechanism to dynamically select the memory fault tolerance of each allocated page of a program on demand depending on the criticality of the respective data. Odd-ECC error correcting codes (ECCs) are stored in separate physical pages and hidden by the OS as pages unavailable to the user. Still, these ECCs are physically aligned with the data they protect so the memory controller can efficiently access them. Thereby, capacity, performance and energy overheads of memory fault tolerance are proportional to the criticality of the data stored. Odd-ECC is applied to memory systems that use conventional 2D DRAM DIMMs as well as to 3D-stacked DRAMs and evaluated using various applications. Compared to flat memory protection schemes, Odd-ECC substantially reduces ECCs capacity overheads while achieving the same Mean Time to Failure (MTTF) and in addition it slightly improves performance and energy costs. Under the same capacity constraints, Odd-ECC achieves substantially higher MTTF, compared to a flat memory protection. This comes at a performance and energy cost, which is however still a fraction of the cost introduced by a flat equally strong scheme.
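To make the "on-demand, per-allocation protection level" idea in the Odd-ECC abstract above concrete, here is a hypothetical C sketch of an allocation interface that records the requested ECC strength for each allocation. The enum values and the odd_alloc() name are invented for illustration; they are not the paper's API, and the sketch does not actually reserve the hidden, physically aligned ECC pages that the real mechanism relies on.

#include <stddef.h>
#include <stdlib.h>

/* Assumed set of protection levels, ordered from cheapest to strongest. */
typedef enum {
    ECC_NONE,      /* no ECC storage reserved                    */
    ECC_PARITY,    /* detection only                             */
    ECC_SECDED,    /* single-error correct, double-error detect  */
    ECC_CHIPKILL   /* strongest (and most expensive) protection  */
} ecc_level;

/* In a real system the OS would also reserve hidden ECC pages for levels
 * above ECC_NONE; here the level is only accepted and ignored. */
static void *odd_alloc(size_t bytes, ecc_level level)
{
    (void)level;              /* metadata a real allocator would track */
    return malloc(bytes);
}

int main(void)
{
    double *critical_state = odd_alloc(4096, ECC_CHIPKILL); /* critical data */
    char   *scratch        = odd_alloc(4096, ECC_NONE);     /* tolerant data */
    free(critical_state);
    free(scratch);
    return 0;
}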
10.
  • Pnevmatikatos, Dionisios N., et al. (authors)
  • Effective reconfigurable design: The FASTER approach
  • 2014
  • In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Cham: Springer International Publishing. ISSN 1611-3349, 0302-9743. Vol. 8405, pp. 318-323.
  • Conference paper (peer-reviewed), abstract:
    • While fine-grain, reconfigurable devices have been available for years, they are mostly used in a fixed functionality, "asic-replacement" manner. To exploit opportunities for flexible and adaptable run-time exploitation of fine grain reconfigurable resources (as implemented currently in dynamic, partial reconfiguration), better tool support is needed. The FASTER project aims to provide a methodology and a tool-chain that will enable designers to efficiently implement a reconfigurable system on a platform combining software and reconfigurable resources. Starting from a high-level application description and a target platform, our tools analyse the application, evaluate reconfiguration options, and implement the designer choices on underlying vendor tools. In addition, FASTER addresses micro-reconfiguration, verification, and the run-time management of system resources. We use industrial applications to demonstrate the effectiveness of the proposed framework and identify new opportunities for reconfigurable technologies.