SwePub
Search the SwePub database


Results for the search "L773:1573 0484 OR L773:0920 8542" (the two ISSNs of the Journal of Supercomputing)


  • Results 1-10 of 31
1.
  • Atzori, Marco, 1992-, et al. (author)
  • In situ visualization of large-scale turbulence simulations in Nek5000 with ParaView Catalyst
  • 2022
  • Published in: Journal of Supercomputing. Springer. ISSN 0920-8542, 1573-0484. Vol. 78, no. 3, pp. 3605-3620
  • Journal article (peer-reviewed)
    • In situ visualization on high-performance computing systems allows us to analyze simulation results that would otherwise be impossible to analyze, given the size of the simulation data sets and the execution time of offline post-processing. We develop an in situ adaptor for ParaView Catalyst and Nek5000, a massively parallel Fortran and C code for computational fluid dynamics. We perform a strong scalability test on up to 2048 cores of KTH’s Beskow Cray XC40 supercomputer and assess the impact of in situ visualization on Nek5000's performance. In our case study, a high-fidelity simulation of turbulent flow, we observe that in situ operations significantly limit the strong scalability of the code, reducing the relative parallel efficiency to only ≈ 21 % on 2048 cores (the relative efficiency of Nek5000 without in situ operations is ≈ 99 %). Through profiling with Arm MAP, we identified a bottleneck in the image composition step (which uses the Radix-kr algorithm), where most of the time is spent on MPI communication. We also identified an imbalance in in situ processing time between rank 0 and all other ranks. In our case, better scaling and load balancing in the parallel image composition would considerably improve the performance of Nek5000 with in situ capabilities. More generally, the results of this study highlight the technical challenges posed by integrating high-performance simulation codes with data-analysis libraries and using them in complex cases, even when efficient algorithms already exist for a given application scenario.
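The image composition step named in the abstract above merges the partial images rendered by each rank into a single frame. For opaque geometry the per-pixel operation is a depth test: keep the fragment closest to the camera. Radix-kr and related strategies (binary swap, direct send) differ in how they split this work among ranks and exchange image pieces over MPI, which this serial sketch leaves out entirely. The pixel representation and the sample data are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# A fragment is (depth, (r, g, b)); None marks an empty (background) pixel.
Pixel = Optional[Tuple[float, Tuple[int, int, int]]]


def z_composite(images: Sequence[Sequence[Pixel]]) -> List[Pixel]:
    """Merge equally sized partial images, keeping the nearest fragment per pixel."""
    if not images:
        raise ValueError("need at least one partial image")
    n = len(images[0])
    out: List[Pixel] = [None] * n
    for img in images:
        if len(img) != n:
            raise ValueError("partial images must have the same size")
        for i, px in enumerate(img):
            current = out[i]
            # Depth test: a smaller depth value means closer to the viewer.
            if px is not None and (current is None or px[0] < current[0]):
                out[i] = px
    return out


# Hypothetical 3-pixel partial images rendered by two ranks.
RED, BLUE = (255, 0, 0), (0, 0, 255)
rank0 = [(1.0, RED), None, (5.0, RED)]
rank1 = [(2.0, BLUE), (3.0, BLUE), (4.0, BLUE)]
print(z_composite([rank0, rank1]))
# [(1.0, (255, 0, 0)), (3.0, (0, 0, 255)), (4.0, (0, 0, 255))]
```

In a distributed compositor each rank would apply this same depth test only to the image region it owns in a given round, which is why the communication pattern, not the per-pixel arithmetic, tends to dominate at scale.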
2.
  • Casas, Israel, et al. (author)
  • PSO-DS : a scheduling engine for scientific workflow managers
  • 2017
  • Published in: Journal of Supercomputing. Springer Science and Business Media LLC. ISSN 0920-8542, 1573-0484. Vol. 73, no. 9, pp. 3924-3947
  • Journal article (peer-reviewed)
    • Cloud computing, an important source of computing power for the scientific community, requires enhanced tools for efficient use of resources. Current solutions for workflow execution lack frameworks that deeply analyze applications and consider realistic execution times as well as computation costs. In this study, we propose cloud user-provider affiliation (CUPA) to guide workflow owners in identifying the tools required to run their applications. Additionally, we develop PSO-DS, a specialized scheduling algorithm based on particle swarm optimization. CUPA encompasses the interaction of cloud resources, the workflow manager system and the scheduling algorithm. Its featured scheduler, PSO-DS, converges to a strategic distribution of tasks among resources that efficiently optimizes makespan and monetary cost. We compared the performance of PSO-DS against four well-known scientific workflow schedulers. In a test bed based on VMware vSphere, the schedulers mapped five up-to-date benchmarks representing different scientific areas. PSO-DS proved its efficiency by reducing the makespan and monetary cost of the tested workflows by 75% and 78%, respectively, compared with the other algorithms. CUPA, with its featured PSO-DS, opens the path to developing a full system in which scientific cloud users can run their computationally expensive experiments.
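The abstract above does not give PSO-DS's particle encoding or objective, so the following is only a plain global-best particle swarm optimizer applied to a simplified version of the problem. Assumptions, all hypothetical: tasks are independent (no workflow dependencies), each VM has a speed and a price per unit of busy time, and fitness is a weighted sum of makespan and cost. Each particle holds one continuous coordinate per task, decoded to a VM index. This is not PSO-DS itself.

```python
import random
from typing import List, Sequence, Tuple


def evaluate(assign: Sequence[int], task_len: Sequence[float],
             vm_speed: Sequence[float], vm_price: Sequence[float],
             weight: float = 0.5) -> float:
    """Weighted sum of makespan and monetary cost for a task-to-VM mapping."""
    busy = [0.0] * len(vm_speed)
    for task, vm in enumerate(assign):
        busy[vm] += task_len[task] / vm_speed[vm]
    makespan = max(busy)
    cost = sum(b * p for b, p in zip(busy, vm_price))  # pay per unit of busy time
    return weight * makespan + (1.0 - weight) * cost


def decode(position: Sequence[float], n_vms: int) -> List[int]:
    """Map each continuous coordinate in [0, n_vms) to a VM index."""
    return [min(int(x), n_vms - 1) for x in position]


def pso_schedule(task_len, vm_speed, vm_price, n_particles=20, iters=100,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0) -> Tuple[List[int], float]:
    rng = random.Random(seed)
    n_tasks, n_vms = len(task_len), len(vm_speed)
    upper = n_vms - 1e-9  # keep positions inside [0, n_vms)

    def fitness(p: Sequence[float]) -> float:
        return evaluate(decode(p, n_vms), task_len, vm_speed, vm_price)

    pos = [[rng.uniform(0.0, upper) for _ in range(n_tasks)] for _ in range(n_particles)]
    vel = [[0.0] * n_tasks for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_tasks):
                r1, r2 = rng.random(), rng.random()
                # Classic update: inertia plus pulls towards personal and global bests.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], 0.0), upper)
            f = fitness(pos[i])
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return decode(gbest, n_vms), gbest_fit


tasks = [4.0, 2.0, 6.0, 3.0, 5.0]   # hypothetical task lengths
speeds = [1.0, 2.0]                 # hypothetical VM speeds
prices = [1.0, 3.0]                 # hypothetical price per unit of busy time
plan, score = pso_schedule(tasks, speeds, prices)
print(plan, score)
```

The `weight` parameter is where a makespan-versus-cost trade-off would be tuned; a real workflow scheduler would also have to respect task dependencies when computing makespan.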
3.
4.
  • Cebrian, Juan M., et al. (author)
  • Managing power constraints in a single-core scenario through power tokens
  • 2014
  • Published in: Journal of Supercomputing. Springer Science and Business Media LLC. ISSN 0920-8542, 1573-0484. Vol. 68, no. 1, pp. 414-442
  • Journal article (peer-reviewed)
    • Current microprocessors face constant thermal and power-related problems during everyday use, usually solved by applying a power budget to the processor/core. Dynamic voltage and frequency scaling (DVFS) has been an effective technique for making microprocessors match a predefined power budget. However, the continuous increase in leakage power due to technology scaling, along with the low resolution of DVFS, makes DVFS less attractive for matching a predefined power budget as technology moves into the deep-submicron regime. In this paper, we propose the use of microarchitectural techniques to accurately match a power constraint while maximizing the energy efficiency of the processor. We predict the processor's power dissipation at cycle level (power token throttling) or at basic-block level (basic block level mechanism), and use the dissipated power, translated into tokens, to select between different power-saving microarchitectural techniques. We also introduce a two-level approach in which DVFS acts as a coarse-grain technique to lower the average power dissipation towards the power budget, while microarchitectural techniques focus on removing the numerous power spikes. Experimental results show that using power-saving microarchitectural techniques in conjunction with DVFS is up to six times more precise, in terms of total energy consumed over the power budget, than using DVFS alone to match a predefined power budget.
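The abstract above describes translating predicted power into tokens and using them to choose among power-saving techniques. A toy controller in that spirit is sketched below, assuming per-cycle token counts are already available. The technique names, the sliding window, and the rule that every 25 % of overshoot escalates one step are all hypothetical choices for illustration, not the paper's mechanism.

```python
from collections import deque
from typing import Deque, List, Sequence

# Hypothetical ladder of power-saving techniques, mildest first.
TECHNIQUES = ["none", "fetch_throttle", "issue_throttle", "pipeline_stall"]


def token_throttle(cycle_tokens: Sequence[float], budget: float, window: int = 4) -> List[str]:
    """Pick a technique per cycle from the recent average token count versus the budget."""
    recent: Deque[float] = deque(maxlen=window)  # sliding window of per-cycle tokens
    decisions = []
    for tokens in cycle_tokens:
        recent.append(tokens)
        excess = sum(recent) / len(recent) - budget
        if excess <= 0:
            level = 0
        else:
            # Each 25% of the budget overshot escalates one step (assumed policy).
            level = min(len(TECHNIQUES) - 1, 1 + int(excess / (0.25 * budget)))
        decisions.append(TECHNIQUES[level])
    return decisions


print(token_throttle([8, 9, 14, 20, 6], budget=10, window=2))
# ['none', 'none', 'fetch_throttle', 'pipeline_stall', 'issue_throttle']
```

In the two-level scheme the abstract mentions, a coarse mechanism such as DVFS would move the average toward the budget, leaving a fine-grained controller like this one to react to short spikes.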
5.
  • Daneshtalab, Masoud, et al. (author)
  • In-order delivery approach for 2D and 3D NoCs
  • 2015
  • Published in: Journal of Supercomputing. Springer Science and Business Media LLC. ISSN 0920-8542, 1573-0484. Vol. 71, no. 8, pp. 2877-2899
  • Journal article (peer-reviewed)
    • In many applications, it is critical to guarantee in-order delivery of requests from the master cores to the slave cores, so that the requests can be executed in the correct order without requiring buffers. Since packets in NoCs may take different paths and traffic congestion varies across routes, the in-order delivery constraint cannot be met without dedicated support. To guarantee in-order delivery, traditional approaches either use dimension-order routing or employ reordering buffers at network interfaces. Dimension-order routing degrades performance considerably, while reordering buffers impose a large area overhead. In this paper, we present a mechanism that allows packets to be routed through multiple paths in the network, helping to balance the traffic load while guaranteeing in-order delivery. The proposed method combines the advantages of both deterministic and adaptive routing algorithms. The key idea is to use different deterministic algorithms for independent flows. This approach neither requires reordering buffers nor restricts packets to a single path. The algorithm is simple and practical, with negligible area overhead compared with dimension-order routing. The concept is investigated in both 2D and 3D mesh networks.
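The key idea in the abstract above, a deterministic algorithm per flow, can be illustrated on a 2D mesh with XY and YX dimension-order routing. All packets of one flow follow one fixed path and therefore arrive in order, while different flows can spread over different paths. The parity rule that maps a flow to an algorithm is a hypothetical stand-in; the sketch also ignores deadlock avoidance, which mixing XY and YX routes on one mesh would normally require (for example via separate virtual channels).

```python
from typing import List, Tuple

Coord = Tuple[int, int]


def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """Dimension-order route: move along X first, then along Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def yx_route(src: Coord, dst: Coord) -> List[Coord]:
    """Move along Y first, then X (XY routing on swapped coordinates)."""
    swapped = xy_route((src[1], src[0]), (dst[1], dst[0]))
    return [(x, y) for (y, x) in swapped]


def route_for_flow(flow_id: int, src: Coord, dst: Coord) -> List[Coord]:
    """Same flow, same deterministic algorithm, same path (hypothetical parity rule)."""
    return xy_route(src, dst) if flow_id % 2 == 0 else yx_route(src, dst)


print(route_for_flow(0, (0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
print(route_for_flow(1, (0, 0), (2, 1)))  # [(0, 0), (0, 1), (1, 1), (2, 1)]
```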
6.
  • Dastgeer, Usman, 1985-, et al. (author)
  • Performance-aware Composition Framework for GPU-based Systems
  • 2015
  • Published in: Journal of Supercomputing. Springer. ISSN 0920-8542, 1573-0484. Vol. 71, no. 12, pp. 4646-4662
  • Journal article (peer-reviewed)
    • User-level components of applications can be made performance-aware by annotating them with performance models and other metadata. We present a component model and a composition framework for the automatically optimized composition of applications for modern GPU-based systems from such components, which may expose multiple implementation variants. The framework addresses the composition problem in an integrated manner, with the ability to perform global performance-aware composition across multiple invocations. We demonstrate several key features of our framework related to performance-aware composition, including implementation selection both when performance characteristics are known (or learned) beforehand and when they are learned at runtime. We also demonstrate the hybrid execution capabilities of our framework on real applications. Furthermore, we present a bulk composition technique that can make better composition decisions by considering information about upcoming calls along with data-flow information extracted from the source program by static analysis. Bulk composition improves on the traditional greedy performance-aware policy, which only considers the current call for optimization.
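The difference between greedy and bulk composition described in the abstract above can be shown with a small model. Assumptions (ours, not the framework's): each call has a predicted execution time per device, moving the operand data between devices costs a fixed transfer time, and the data starts on the CPU. Greedy picks the cheapest device for each call in isolation; the bulk variant here is a dynamic program over the whole call sequence, with the state being the device that currently holds the data.

```python
from typing import Dict, List, Sequence, Tuple

CallCosts = Dict[str, float]  # predicted execution time per device for one call


def greedy_plan(calls: Sequence[CallCosts], transfer: float,
                start: str = "cpu") -> Tuple[List[str], float]:
    """Choose the cheapest device per call, counting only the current call."""
    loc, total, plan = start, 0.0, []
    for costs in calls:
        dev = min(costs, key=lambda d: costs[d] + (transfer if d != loc else 0.0))
        total += costs[dev] + (transfer if dev != loc else 0.0)
        plan.append(dev)
        loc = dev
    return plan, total


def bulk_plan(calls: Sequence[CallCosts], transfer: float,
              start: str = "cpu") -> Tuple[List[str], float]:
    """Dynamic programming over all calls; state = device holding the data."""
    best: Dict[str, Tuple[float, List[str]]] = {start: (0.0, [])}
    for costs in calls:
        nxt: Dict[str, Tuple[float, List[str]]] = {}
        for dev, run in costs.items():
            nxt[dev] = min(
                ((acc + (transfer if prev != dev else 0.0) + run, plan + [dev])
                 for prev, (acc, plan) in best.items()),
                key=lambda t: t[0],
            )
        best = nxt
    total, plan = min(best.values(), key=lambda t: t[0])
    return plan, total


calls = [{"cpu": 4.0, "gpu": 1.0}] * 3  # hypothetical predicted times
print(greedy_plan(calls, transfer=5.0))  # (['cpu', 'cpu', 'cpu'], 12.0)
print(bulk_plan(calls, transfer=5.0))    # (['gpu', 'gpu', 'gpu'], 8.0)
```

Greedy never pays the one-off transfer because no single call justifies it; looking ahead over the upcoming calls shows that the transfer is amortized.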
7.
  • de Blanche, Andreas, 1975-, et al. (author)
  • Addressing characterization methods for memory contention aware co-scheduling
  • 2015
  • Published in: Journal of Supercomputing. Springer Science and Business Media LLC. ISSN 0920-8542, 1573-0484. Vol. 71, no. 4, pp. 1451-1483
  • Journal article (peer-reviewed)
    • The ability to precisely predict how memory contention degrades performance when co-scheduling programs is critical for reaching high performance levels in cluster, grid and cloud environments. In this paper we present an overview and compare the performance of state-of-the-art characterization methods for memory-aware (co-)scheduling. We evaluate the prediction accuracy and co-scheduling performance of four methods: one slowdown-based, two cache-contention-based and one based on memory bandwidth usage. Both our regression analysis and our scheduling simulations find that the slowdown-based method, represented by Memgen, performs better than the other methods. The linear correlation coefficient of Memgen's prediction is 0.890. Memgen's preferred schedules reached 99.53 % of the obtainable performance on average. The memory bandwidth usage method also performed almost as well as the slowdown-based method. Furthermore, while most prior work promotes characterization based on cache miss rate, we found it to be on par with random scheduling of programs and highly unreliable.
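Assuming the linear correlation coefficient reported in the abstract above is Pearson's r between predicted and measured slowdowns, it can be computed as follows. The slowdown values in the usage example are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson's linear correlation coefficient between two equally long samples.

    Raises ZeroDivisionError if either sample is constant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical predicted vs. measured slowdowns for co-scheduled program pairs.
predicted = [1.05, 1.20, 1.35, 1.10, 1.50]
measured = [1.02, 1.25, 1.30, 1.15, 1.55]
print(round(pearson_r(predicted, measured), 3))
```

On Python 3.10 and later, `statistics.correlation` computes the same coefficient.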
8.
  • Elmisery, Ahmed M., et al. (author)
  • Privacy-enhanced middleware for location-based sub-community discovery in implicit social groups
  • 2016
  • Published in: Journal of Supercomputing. Springer Science+Business Media B.V. ISSN 0920-8542, 1573-0484. Vol. 72, no. 1, pp. 247-274
  • Journal article (peer-reviewed)
    • In our connected world, recommender services have become widely known for their ability to provide expert and personalized information to participants of diverse applications. With the excessive growth of social networks, a new kind of service, termed "group-based recommendation services", is being embraced: recommender services are used to discover sub-communities within implicit social groups and to refer new participants to sub-communities of other participants who share similar preferences or interests. Nevertheless, protecting participants' privacy is a crucial aspect of recommendation services, since privacy concerns might prevent participants from sharing their data with these services, which in turn reduces the accuracy of the generated referrals. To produce accurate referrals, recommendation services should therefore be able to discover previously unknown sub-communities from different social groups in a way that preserves the privacy of participants in each group. In this paper, we present a middleware that runs on end users' mobile phones to sanitize their profile data when it is released for generating referrals, so that referrals are computed over the sanitized version of the profile data. The proposed middleware is equipped with cryptographic protocols to facilitate private discovery of sub-communities from the sanitized versions of participants' profiles in a university scenario. Location data are added to participants' profiles to improve awareness of surrounding sub-communities, so that the offered referrals can be filtered by proximity to the participant's location. We performed a number of experiments to test the efficiency and accuracy of our protocols. We also developed a formal model of the trade-off between privacy level and accuracy of referrals. As supported by the experiments, the sub-communities were correctly identified with good accuracy and an acceptable privacy level.
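The abstract above says referrals can be filtered by proximity to the participant's location but does not give the rule. A minimal sketch, assuming a fixed search radius and great-circle (haversine) distance on a spherical Earth, applied to plain coordinates; it does not model the paper's profile sanitization or cryptographic protocols. The community names and coordinates are hypothetical.

```python
import math
from typing import Dict, List, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_KM = 6371.0      # mean Earth radius


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two points on a spherical Earth."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))  # clamp rounding error


def nearby_referrals(user: LatLon, communities: Dict[str, LatLon],
                     radius_km: float) -> List[str]:
    """Keep sub-communities within radius_km of the user, nearest first."""
    dist = {name: haversine_km(user, loc) for name, loc in communities.items()}
    return sorted((n for n, d in dist.items() if d <= radius_km), key=dist.__getitem__)


# Hypothetical coordinates.
user = (59.35, 18.07)
communities = {"robotics_club": (59.36, 18.07), "chess_society": (57.70, 11.97)}
print(nearby_referrals(user, communities, radius_km=5.0))  # ['robotics_club']
```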
9.
  • Elmroth, Erik, 1964-, et al. (author)
  • High Performance Computations for Large Scale Simulations of Subsurface Multiphase Fluid and Heat Flow
  • 2001
  • Published in: Journal of Supercomputing. ISSN 0920-8542, 1573-0484. Vol. 18, no. 3, pp. 235-258
  • Journal article (peer-reviewed)
    • TOUGH2 is a widely used reservoir simulator for solving subsurface-flow-related problems such as nuclear waste geologic isolation, environmental remediation of soil and groundwater contamination, and geothermal reservoir engineering. It solves a set of coupled mass and energy balance equations using a finite volume method. This contribution presents the design and analysis of a parallel version of TOUGH2. The parallel implementation first partitions the unstructured computational domain. For each time step, a set of coupled non-linear equations is solved with Newton iteration. In each Newton step, a Jacobian matrix is calculated and an ill-conditioned non-symmetric linear system is solved using a preconditioned iterative solver. Communication is required for convergence tests and for data exchange across partition borders. Parallel performance results on a Cray T3E-900 are presented for two real application problems arising in the Yucca Mountain nuclear waste site study. The execution time is reduced from 7504 seconds on two processors to 126 seconds on 128 processors for a 2D problem involving 52,752 equations. For a larger 3D problem with 293,928 equations, the time decreases from 10,055 seconds on 16 processors to 329 seconds on 512 processors.
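The TOUGH2 timings in the abstract above can be turned into relative speedup (time on the smallest processor count divided by time on the larger count) and relative parallel efficiency (that speedup divided by the increase in processor count). This uses the standard strong-scaling definitions; the paper itself may report efficiency differently.

```python
def relative_speedup(t_ref: float, t_p: float) -> float:
    """Speedup of a run relative to the smallest (reference) run."""
    return t_ref / t_p


def relative_efficiency(t_ref: float, p_ref: int, t_p: float, p: int) -> float:
    """Relative speedup divided by the increase in processor count."""
    return relative_speedup(t_ref, t_p) / (p / p_ref)


# Timings reported in the abstract: (t_ref [s], p_ref, t_p [s], p).
runs = {
    "2D, 52,752 equations": (7504.0, 2, 126.0, 128),
    "3D, 293,928 equations": (10055.0, 16, 329.0, 512),
}
for name, (t_ref, p_ref, t_p, p) in runs.items():
    print(f"{name}: speedup {relative_speedup(t_ref, t_p):.2f} "
          f"on {p // p_ref}x the processors, "
          f"efficiency {relative_efficiency(t_ref, p_ref, t_p, p):.3f}")
# 2D, 52,752 equations: speedup 59.56 on 64x the processors, efficiency 0.931
# 3D, 293,928 equations: speedup 30.56 on 32x the processors, efficiency 0.955
```

Both problems therefore keep roughly 93-96 % relative efficiency over their measured ranges.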
10.
  • Fang, Z., et al. (author)
  • Active memory controller
  • 2012
  • Published in: Journal of Supercomputing. Springer Science and Business Media LLC. ISSN 0920-8542, 1573-0484. Vol. 62, no. 1, pp. 510-549
  • Journal article (peer-reviewed)
    • The inability to hide main memory latency has increasingly limited the performance of modern processors. The problem is worse in large-scale shared-memory systems, where remote memory latencies are hundreds, and soon thousands, of processor cycles. To mitigate this problem, we propose an intelligent memory and cache coherence controller (AMC) that can execute Active Memory Operations (AMOs). AMOs are select operations that are sent to and executed on the home memory controller of the data. AMOs can eliminate a significant number of coherence messages, minimize intranode and internode memory traffic, and create opportunities for parallelism. Our implementation of AMOs is cache-coherent and requires no changes to the processor core or DRAM chips. In this paper, we present the microarchitecture design of AMC and the programming model of AMOs. We compare the performance of AMOs to that of several other memory architectures on a variety of scientific and commercial benchmarks. Through simulation, we show that AMOs offer dramatic performance improvements for an important set of data-intensive operations, e.g., up to 50x faster barriers, 12x faster spinlocks, 8.5x-15x faster stream/array operations, and 3x faster database queries. We also present an analytical model that can predict the performance benefits of using AMOs with decent accuracy. The silicon cost required to support AMOs is less than 1% of the die area of a typical high-performance processor, based on a standard cell implementation.
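The abstract above does not spell out the AMO programming model, so the following serial toy only illustrates the general idea it describes: the memory controller that homes an address applies an operation there, so a requester exchanges one request and one reply instead of pulling the cache line into its own cache. The class, the `amo_fetch_add` method, and the two-messages-per-operation count are hypothetical and ignore real coherence traffic; the barrier shows only arrival counting, not the release phase.

```python
from typing import Dict


class HomeMemoryController:
    """Toy model of a controller that executes operations on the data it homes."""

    def __init__(self) -> None:
        self.memory: Dict[int, int] = {}
        self.messages = 0

    def amo_fetch_add(self, addr: int, delta: int) -> int:
        """Add delta at addr on the home node and return the old value."""
        self.messages += 2  # request to the home node + reply with the old value
        old = self.memory.get(addr, 0)
        self.memory[addr] = old + delta
        return old


def barrier_arrive(mc: HomeMemoryController, addr: int, n_procs: int) -> bool:
    """Count one arrival; True only for the last of n_procs arrivals."""
    return mc.amo_fetch_add(addr, 1) == n_procs - 1


mc = HomeMemoryController()
arrivals = [barrier_arrive(mc, 0x100, 8) for _ in range(8)]
print(arrivals.count(True), mc.memory[0x100], mc.messages)  # 1 8 16
```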
Publication type
journal article (31)
Content type
peer-reviewed (30)
other academic/artistic (1)
Author/editor
Markidis, Stefano (4)
Kaxiras, Stefanos (3)
Daneshtalab, Masoud (3)
Plosila, Juha (3)
Ebrahimi, Masoumeh (2)
Taheri, Javid (2)
Laure, Erwin (2)
Kessler, Christoph (2)
Liljeberg, Pasi (2)
Cebrian, Juan M. (2)
Pllana, Sabri (1)
Parker, M. A. (1)
Zhang, L. (1)
Fernandez, J. (1)
Vasilakos, Athanasio ... (1)
Ros, Alberto (1)
Schlatter, Philipp (1)
Vinuesa, Ricardo (1)
Aguilar, Xavier (1)
Lundkvist, Helene (1)
Tenhunen, Hannu (1)
Goude, Anders (1)
Wang, Hui (1)
Iakymchuk, Roman (1)
Olsson, Bengt (1)
Abraham, Ajith (1)
Ranjan, Rajiv (1)
Dastgeer, Usman, 198 ... (1)
Kessler, Christoph, ... (1)
Elmroth, Erik, 1964- (1)
Lundqvist, Thomas, 1 ... (1)
de Blanche, Andreas, ... (1)
Liu, Felix (1)
Jansson, Niclas, 198 ... (1)
Zomaya, Albert Y. (1)
Rahmani, Amir-Mohamm ... (1)
Engblom, Stefan (1)
Titos Gil, Ruben, 19 ... (1)
Peplinski, Adam (1)
Atzori, Marco, 1992- (1)
Mallor, Fermin (1)
Köpp, Wiebke, 1989- (1)
Chien, Wei Der (1)
Massaro, Daniele (1)
Weinkauf, Tino, 1974 ... (1)
Rezaei, Mohammadtagh ... (1)
Laure, E. (1)
Fischer, Paul (1)
Sinaei, Sima (1)
McKee, Sally A, 1963 (1)
Institution
Kungliga Tekniska Högskolan (8)
Uppsala universitet (4)
Linköpings universitet (4)
Chalmers tekniska högskola (3)
Linnéuniversitetet (2)
Karlstads universitet (2)
Umeå universitet (1)
Luleå tekniska universitet (1)
Högskolan Väst (1)
Mälardalens universitet (1)
Jönköping University (1)
Malmö universitet (1)
RISE (1)
Sveriges Lantbruksuniversitet (1)
Language
English (31)
Research subject (UKÄ/SCB)
Natural sciences (20)
Engineering and technology (11)
Medical and health sciences (1)
Agricultural sciences (1)
Social sciences (1)
Humanities (1)
