SwePub

Reliable and efficient RAR-based distributed model training in computing power network

Chen, Ling (author)
Beijing University of Posts and Telecommunications (BUPT)
Li, Yajie (author)
Beijing University of Posts and Telecommunications (BUPT)
Natalino Da Silva, Carlos, 1987 (author)
Chalmers tekniska högskola, Chalmers University of Technology
Li, Yongcheng (author)
Soochow University
Zhang, Boxin (author)
Beijing University of Posts and Telecommunications (BUPT)
Fan, Yingbo (author)
Beijing University of Posts and Telecommunications (BUPT)
Wang, Wei (author)
Beijing University of Posts and Telecommunications (BUPT)
Zhao, Yongli (author)
Beijing University of Posts and Telecommunications (BUPT)
Zhang, Jie (author)
Beijing University of Posts and Telecommunications (BUPT)
2024
English.
In: Journal of Optical Communications and Networking, ISSN 1943-0620, 1943-0639; 16:5, pp. 527-540
  • Journal article (peer-reviewed)
Abstract
  • The computing power network (CPN) is a novel network technology that integrates computing power from the cloud, edge, and terminals using IP/optical cross-layer networks for distributed computing. CPNs can provide an effective solution for distributed model training (DMT). As a bandwidth-optimized architecture based on data parallelism, ring all-reduce (RAR) is widely used in DMT. However, any node or link failure on the ring can interrupt or block the requests deployed on the ring. Meanwhile, because batches of RAR-based DMT requests compete for resources, inappropriate scheduling strategies also lead to low training efficiency or congestion. As far as we know, no existing research considers the survivability of rings in scheduling strategies for RAR-based DMT. To fill this gap, we propose a scheduling scheme for RAR-based DMT requests in CPNs that optimizes the allocation of computing and wavelength resources over the time dimension while ensuring reliability. In practical scenarios, service providers may focus on different performance metrics. We formulate an integer linear programming (ILP) model and design a RAR-based DMT deployment algorithm (RDDA) to solve this problem under four optimization objectives, each subject to the minimum blocking rate: minimum computing resource consumption, minimum wavelength resource consumption, minimum training time, and maximum reliability. Simulation results demonstrate that our model satisfies the reliability requirements while achieving the optimal performance for DMT requests under each of the four optimization objectives.
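
The ring all-reduce pattern named in the abstract, and the reason a single failed link stalls every request on the ring, can be illustrated with a minimal Python sketch. This is not the paper's ILP model or RDDA; the function name `ring_all_reduce`, the `failed_links` parameter, and the one-value-per-chunk layout are assumptions made purely for illustration.

```python
from typing import FrozenSet, List, Tuple


def ring_all_reduce(
    grads: List[List[float]],
    failed_links: FrozenSet[Tuple[int, int]] = frozenset(),
) -> List[List[float]]:
    """Simulate a sum all-reduce over n workers arranged in a ring.

    Each worker holds n chunks (one number per chunk in this toy model).
    Worker i only ever sends to worker (i + 1) % n, so one broken
    directed link (i, i + 1) is enough to abort the whole operation.
    """
    n = len(grads)
    if any(len(g) != n for g in grads):
        raise ValueError("each worker must hold exactly n chunks")
    data = [list(g) for g in grads]

    def send_all(chunk_of, combine):
        # Snapshot all outgoing values first so the n sends of one step
        # behave as if they happened simultaneously.
        msgs = []
        for i in range(n):
            dst = (i + 1) % n
            if (i, dst) in failed_links:
                raise RuntimeError(f"link {i}->{dst} failed; ring is blocked")
            c = chunk_of(i)
            msgs.append((dst, c, data[i][c]))
        for dst, c, value in msgs:
            data[dst][c] = combine(data[dst][c], value)

    # Phase 1, reduce-scatter: after n - 1 steps worker i holds the
    # fully summed chunk (i + 1) % n.
    for s in range(n - 1):
        send_all(lambda i: (i - s) % n, lambda old, new: old + new)

    # Phase 2, all-gather: circulate the finished chunks around the ring.
    for s in range(n - 1):
        send_all(lambda i: (i + 1 - s) % n, lambda old, new: new)

    return data


if __name__ == "__main__":
    workers = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0], [100.0, 200.0, 300.0]]
    print(ring_all_reduce(workers))
```

Each worker transmits only 2(n - 1) chunks regardless of ring size, which is why the pattern is bandwidth-efficient, but every step depends on all n links being up.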

Subject headings

TEKNIK OCH TEKNOLOGIER  -- Elektroteknik och elektronik -- Kommunikationssystem (hsv//swe)
ENGINEERING AND TECHNOLOGY  -- Electrical Engineering, Electronic Engineering, Information Engineering -- Communication Systems (hsv//eng)
NATURVETENSKAP  -- Data- och informationsvetenskap -- Datavetenskap (hsv//swe)
NATURAL SCIENCES  -- Computer and Information Sciences -- Computer Sciences (hsv//eng)

Publication and content type

art (subject category)
ref (subject category)
