Single Machine Data Parallel
- When the data is too large to fit on a single GPU, the input batch is scattered across GPUs.
- The model is replicated on each GPU.
- PyTorch gathers the outputs from each GPU, then computes the loss and gradients and updates the weights.
- The code is simple, just one line (see the sketch below):
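A minimal sketch of that one line, `nn.DataParallel`, in PyTorch. It assumes a machine with CUDA GPUs; the toy `nn.Linear` model, batch size, and feature dimensions are illustrative assumptions, not from the original notes:

```python
import torch
import torch.nn as nn

# A toy model for illustration (model and sizes are assumptions).
model = nn.Linear(128, 10)

# The one line: wrap the model so PyTorch replicates it on every
# visible GPU and scatters each input batch across them (along dim 0).
model = nn.DataParallel(model).cuda()

# Forward: the batch is split across the replicas; the outputs are
# gathered back on the default device, where loss/backward are computed.
inputs = torch.randn(64, 128).cuda()
targets = torch.randint(0, 10, (64,)).cuda()

outputs = model(inputs)                              # gathered on GPU 0
loss = nn.functional.cross_entropy(outputs, targets)
loss.backward()   # gradients from all replicas are summed onto the wrapped model
```

After the wrap, training code is unchanged: the scatter, replicate, and gather steps happen inside the module's forward pass on every iteration.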