DiPaCo: Towards a New Paradigm of Distributed AI Training by Google DeepMind
Modern AI relies on large-scale models with trillions of parameters, which inevitably raises the question of how to make the training and deployment of such systems more efficient, performant, and cost-effective.
Researchers from Google DeepMind propose a new modular paradigm for distributed AI training called Distributed Path Composition (DiPaCo). DiPaCo couples an architecture with a training algorithm, with the goal of scaling models across distributed, loosely connected compute.
The high-level idea is to distribute computation by path. In this context, a ‘path’ is a sequence of modules that together define an input-output function. Paths are small relative to the entire model, so each path needs only a handful of tightly connected devices to train or evaluate.
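To make the path idea concrete, here is a minimal, hypothetical sketch (not DeepMind's implementation): a path is simply a composition of a few modules drawn from a larger shared pool, so evaluating one path never requires materializing the full model. All names here (MODULE_POOL, build_path) are illustrative assumptions.

```python
# Illustrative sketch of "paths" as compositions of shared modules.
# None of these names come from the DiPaCo codebase; this only mirrors
# the high-level idea that a path = a small sequence of pooled modules.
import torch
import torch.nn as nn

# A shared pool of modules; the full "model" is the union of all of them.
MODULE_POOL = {name: nn.Linear(64, 64) for name in ["a1", "a2", "b1", "b2"]}

def build_path(module_names):
    """A path is an input->output function made of a few pooled modules."""
    return nn.Sequential(*(MODULE_POOL[n] for n in module_names))

# Each path is an ordinary small network, so it can be trained or served
# on its own island of tightly connected devices, independently of the rest.
path_one = build_path(["a1", "b1"])
path_two = build_path(["a2", "b2"])

x = torch.randn(8, 64)
y = path_one(x)  # evaluating a path never touches modules outside it
```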
In this lecture, Arthur Douillard, Senior Research Scientist at Google DeepMind, shares with the @BuzzRobot community the technical details of DiPaCo and how it could change the way large-scale AI systems are trained.
Timestamps:
0:00 Introduction
0:45 Google DeepMind’s vision of distributed training
2:57 Building blocks towards DiPaCo
11:33 Building blocks towards DiPaCo: DiLoCo, a low-communication distributed training method (see the sketch after the timestamps)
18:41 Introducing DiPaCo: Distributed Path Composition
22:37 Q&A session
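For the DiLoCo segment (11:33), the following is a minimal, hypothetical sketch of the two-level optimization idea: each worker takes many local AdamW steps on its own data shard, and workers synchronize only once per round via an outer step on the averaged parameter delta (the "pseudo-gradient"). The function name, hyperparameters, and the plain-SGD outer step (the paper uses Nesterov momentum) are illustrative assumptions, not the official implementation.

```python
# Hedged sketch of DiLoCo-style low-communication training (illustrative only).
import copy
import torch
import torch.nn as nn

def diloco_round(global_model, shards, inner_steps=50, outer_lr=0.7):
    """One round: many local AdamW steps per worker, then a single outer
    update on the averaged pseudo-gradient (old params minus new params)."""
    replicas = [copy.deepcopy(global_model) for _ in shards]
    for model, data in zip(replicas, shards):
        opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
        for x, y in data[:inner_steps]:  # inner loop: no communication at all
            loss = nn.functional.mse_loss(model(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    with torch.no_grad():  # parameters are exchanged once per round, not per step
        for p, *qs in zip(global_model.parameters(),
                          *(m.parameters() for m in replicas)):
            avg = torch.stack(qs).mean(0)
            p -= outer_lr * (p - avg)  # outer SGD step (Nesterov momentum omitted)

# Toy usage: two data shards, one shared linear model.
model = nn.Linear(4, 1)
make_shard = lambda: [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(50)]
diloco_round(model, [make_shard(), make_shard()])
```

The key point is that workers exchange parameters once per round rather than gradients every step, cutting communication roughly by a factor of the number of inner steps.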
Social Links:
Newsletter: https://buzzrobot.substack.com/
X: https://x.com/sopharicks
Slack: https://buzzrobot.slack.com/join/shared_invite/zt-1zsh7k8pd-iMu_M8bUxIK3pOJgqJgCRQ#/shared-invite/email