NCCL Installation & Environment

This guide assumes you will run NCCL examples on Linux with NVIDIA GPUs.

Minimum environment checklist

NVIDIA GPU(s)
NVIDIA driver installed and working (verify with nvidia-smi)
CUDA toolkit available (or at least CUDA runtime)
NCCL library installed
A launcher for multi-process runs:
- MPI (e.g., OpenMPI) is the simplest for examples
- or SLURM / your cluster launcher
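
The checklist above can be partly automated. The following is a minimal sketch, not an official NCCL tool: it assumes the binaries are on PATH under their usual names (nvidia-smi, nvcc, mpirun, srun) and that the NCCL shared library is visible to the system's library search. The function name check_environment is hypothetical. A library installed in a non-standard location may be reported as missing even though it is usable.

```python
import ctypes.util
import shutil


def check_environment():
    """Return a dict mapping each checklist item to True (found) or False."""
    return {
        # Driver utility; its presence suggests the NVIDIA driver is installed.
        "nvidia-smi": shutil.which("nvidia-smi") is not None,
        # nvcc ships with the full CUDA toolkit, not with a runtime-only install.
        "nvcc": shutil.which("nvcc") is not None,
        # Ask the system's library search for libnccl.
        "libnccl": ctypes.util.find_library("nccl") is not None,
        # Any one launcher is enough for multi-process runs.
        "launcher": any(shutil.which(name) for name in ("mpirun", "srun")),
    }


if __name__ == "__main__":
    for item, found in check_environment().items():
        print(f"{item:12s} {'found' if found else 'MISSING'}")
```

A missing nvcc is not necessarily a problem if you only need the CUDA runtime. Finding nvidia-smi on PATH does not prove the driver works, so run it once by hand as well.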

Typical cluster setup notes (conceptual)

One process per GPU is common for examples
NCCL will choose transports based on what is available:
- NVLink / NVSwitch within a node
- PCIe paths to NICs
- InfiniBand / RoCE / TCP between nodes
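
The one-process-per-GPU convention mentioned above usually comes down to mapping each process's node-local rank to a GPU index. Here is a small sketch of that mapping. It assumes the launcher exports OMPI_COMM_WORLD_LOCAL_RANK (Open MPI) or SLURM_LOCALID (SLURM srun); the helper names local_rank and device_for_rank are hypothetical, and other launchers may use different variables.

```python
import os


def local_rank(env=None):
    """Return this process's node-local rank, or 0 if no launcher variable is set."""
    env = os.environ if env is None else env
    # Set by Open MPI's mpirun and by SLURM's srun, respectively.
    for name in ("OMPI_COMM_WORLD_LOCAL_RANK", "SLURM_LOCALID"):
        if name in env:
            return int(env[name])
    return 0


def device_for_rank(rank, gpus_per_node):
    """Map a node-local rank to a GPU index, one process per GPU."""
    if gpus_per_node <= 0:
        raise ValueError("gpus_per_node must be positive")
    # The modulo keeps the index valid if more ranks than GPUs are launched.
    return rank % gpus_per_node


if __name__ == "__main__":
    # Example: a node with 4 GPUs (an assumed value for illustration).
    print("using GPU", device_for_rank(local_rank(), 4))
```

The resulting index is what a program would pass to its device-selection call before creating the NCCL communicator.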

Helpful environment variables (starter set)

Use these when debugging:

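NCCL_DEBUG=INFO (or WARN for less output)
NCCL_DEBUG_SUBSYS=INIT,NET,GRAPH,COLL (limit INFO output to these subsystems)
NCCL_IB_DISABLE=1 (disable the IB/RoCE transport so NCCL falls back to IP sockets; useful for A/B tests)
NCCL_P2P_DISABLE=1 (disable direct GPU-to-GPU P2P transport to isolate topology issues)
CUDA_VISIBLE_DEVICES=... (control which GPUs a process sees, and in what order)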

The exact set you use depends on your fabric (IB vs RoCE vs TCP) and how your cluster is configured.
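
For A/B tests it can help to build the debug environment in one place rather than exporting variables by hand. The sketch below is one way to do that, assuming the variable values listed above; nccl_debug_env and run_with_env are hypothetical helper names, and ./my_nccl_example in the usage note stands in for whatever binary you are testing.

```python
import os
import subprocess


def nccl_debug_env(base=None, disable_ib=False, disable_p2p=False,
                   visible_devices=None):
    """Return a copy of the environment with the starter debug variables set."""
    env = dict(os.environ if base is None else base)
    env["NCCL_DEBUG"] = "INFO"
    env["NCCL_DEBUG_SUBSYS"] = "INIT,NET,GRAPH,COLL"
    if disable_ib:
        # Turn off the IB/RoCE transport for this run.
        env["NCCL_IB_DISABLE"] = "1"
    if disable_p2p:
        # Turn off direct GPU-to-GPU transfers for this run.
        env["NCCL_P2P_DISABLE"] = "1"
    if visible_devices is not None:
        env["CUDA_VISIBLE_DEVICES"] = ",".join(str(d) for d in visible_devices)
    return env


def run_with_env(command, env):
    """Run a command (list of arguments) with the given environment."""
    return subprocess.run(command, env=env, check=False).returncode
```

A baseline run and an IB-disabled run could then be compared with run_with_env(["mpirun", "-np", "2", "./my_nccl_example"], nccl_debug_env()) and the same call with disable_ib=True. On multi-node runs the launcher may not forward these variables to remote ranks automatically; with Open MPI's mpirun, the -x option exports a named variable.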

Next: Communicators
