NCCL Communicators

A communicator is NCCL’s way of representing a group of ranks that can participate in collective operations.

Core ideas

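Rank: an integer ID in the range 0..world_size-1 that identifies one participant; usually there is one rank per process and one process per GPU
World size: the total number of ranks participating
Unique ID: an opaque token (ncclUniqueId) that every rank must receive in order to join the same communicator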
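
Because ranks are usually mapped one process per GPU, each process has to pick a local device from its global rank. The following is a minimal sketch in C of one common convention. It assumes ranks are laid out node by node and that every node has the same number of GPUs; neither is an NCCL requirement, and `device_for_rank` and `gpus_per_node` are hypothetical names used only for illustration.

```c
/* Hypothetical helper: map a global rank to a local GPU index.
 * Assumes ranks are assigned node by node and that every node
 * exposes gpus_per_node devices. Returns -1 for invalid input. */
int device_for_rank(int rank, int world_size, int gpus_per_node) {
    if (gpus_per_node <= 0 || rank < 0 || rank >= world_size) {
        return -1;  /* valid ranks are 0..world_size-1 */
    }
    return rank % gpus_per_node;
}
```

The returned index would typically be passed to cudaSetDevice before the communicator is created.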

Typical pattern (multi-process)

Rank 0 generates a unique ID with ncclGetUniqueId.
The unique ID is distributed to all ranks (e.g., via MPI broadcast).
All ranks call ncclCommInitRank(&comm, world_size, id, rank), passing the same world size and unique ID and their own rank.
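
The steps above can be sketched in C roughly as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: MPI is used for the broadcast (as suggested above), the job runs on a single node with one GPU per rank so that the device index equals the rank, and `NCCL_CHECK` is a hypothetical error-handling macro. Note that ncclCommInitRank takes a pointer to the output communicator handle as its first argument. Building and running the program requires MPI, CUDA, and NCCL.

```c
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <cuda_runtime.h>
#include <nccl.h>

/* Hypothetical macro: abort with a readable message on any NCCL error. */
#define NCCL_CHECK(call)                                            \
    do {                                                            \
        ncclResult_t res_ = (call);                                 \
        if (res_ != ncclSuccess) {                                  \
            fprintf(stderr, "NCCL error at %s:%d: %s\n", __FILE__,  \
                    __LINE__, ncclGetErrorString(res_));            \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

int main(int argc, char **argv) {
    int rank = 0, world_size = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* Assumption: single node, one GPU per rank, device index == rank. */
    if (cudaSetDevice(rank) != cudaSuccess) {
        fprintf(stderr, "rank %d: cudaSetDevice failed\n", rank);
        MPI_Abort(MPI_COMM_WORLD, EXIT_FAILURE);
    }

    /* Step 1: only rank 0 generates the unique ID. */
    ncclUniqueId id;
    if (rank == 0) {
        NCCL_CHECK(ncclGetUniqueId(&id));
    }

    /* Step 2: distribute the ID to every rank as raw bytes. */
    MPI_Bcast(&id, (int)sizeof(id), MPI_BYTE, 0, MPI_COMM_WORLD);

    /* Step 3: every rank joins the same communicator. */
    ncclComm_t comm;
    NCCL_CHECK(ncclCommInitRank(&comm, world_size, id, rank));

    printf("rank %d of %d joined the communicator\n", rank, world_size);

    NCCL_CHECK(ncclCommDestroy(comm));
    MPI_Finalize();
    return 0;
}
```

The unique ID is broadcast as plain bytes because NCCL treats it as an opaque value; any out-of-band channel that delivers the same bytes to every rank would serve the same purpose.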

Lifecycle and safety

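Create the communicator once and reuse it across many iterations rather than re-creating it each time.
Destroy the communicator with ncclCommDestroy before the process exits.
Create a CUDA stream and pass it to each NCCL call.
NCCL collectives are enqueued on that stream and return to the host before the operation has finished; synchronize the stream (for example with cudaStreamSynchronize) before reading results on the host.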
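
The lifecycle points above could look like this in C. This is a sketch under assumptions rather than a prescribed structure: `run_allreduce_loop` is a hypothetical helper, the communicator is assumed to have been created once by the caller (who also destroys it at shutdown), `d_buf` is assumed to be a device buffer of `count` floats on the current GPU, and an in-place sum all-reduce stands in for whatever collective the application actually uses.

```c
#include <stddef.h>
#include <cuda_runtime.h>
#include <nccl.h>

/* Hypothetical helper: reuse one communicator and one stream for many
 * iterations. Returns 0 on success, -1 on any CUDA or NCCL error. */
int run_allreduce_loop(ncclComm_t comm, float *d_buf, size_t count,
                       int iterations) {
    cudaStream_t stream;
    if (cudaStreamCreate(&stream) != cudaSuccess) {
        return -1;
    }

    int status = 0;
    for (int i = 0; i < iterations && status == 0; ++i) {
        /* In-place sum across all ranks. The call only enqueues work
         * on the stream and then returns to the host. */
        if (ncclAllReduce(d_buf, d_buf, count, ncclFloat, ncclSum,
                          comm, stream) != ncclSuccess) {
            status = -1;
        }
    }

    /* Wait for every enqueued collective before the host reads d_buf. */
    if (cudaStreamSynchronize(stream) != cudaSuccess) {
        status = -1;
    }

    cudaStreamDestroy(stream);
    return status;
}
```

Keeping communicator creation and destruction outside the helper reflects the advice to create the communicator once: the loop can be called repeatedly without paying the initialization cost again.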

Next: Collectives
