PyTorch DataLoader

Training Streams

Recall the Typical Training Process:


import torch
from torch.optim import SGD

loader = ...                                  # a torch.utils.data.DataLoader
model = MyNet()
criterion = torch.nn.CrossEntropyLoss()
optimizer = SGD(model.parameters(), lr=0.01)  # parameters() is a call; lr is required (value illustrative)

for epoch in range(10):
    for batch, labels in loader:
        outputs = model(batch)
        loss = criterion(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

Loader

Two styles of dataset:

  • Streams (iterable-style)
    • provide one sample per iteration
  • Map (map-style)
    • allows samples to be accessed in any order
      • e.g. randomly
    • the user may augment and manipulate data through the DataLoader
class IterableStyleDataset(torch.utils.data.IterableDataset):
    def __iter__(self):
        # Support for streams ...
        ...


class MapStyleDataset(torch.utils.data.Dataset):
    def __getitem__(self, key):
        # Map from (non-int) keys ...
        ...

    def __len__(self):
        # Support sampling ...
        ...
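
As a concrete illustration of the two styles, here is a minimal sketch implementing both with synthetic data; the class names and contents are assumptions made up for the example, not part of the original slides.

import torch
from torch.utils.data import Dataset, IterableDataset, DataLoader

# Map-style: integer indexing, so shuffling and samplers work.
class SquaresDataset(Dataset):
    def __len__(self):
        return 100

    def __getitem__(self, idx):
        return torch.tensor([float(idx)]), idx * idx

# Iterable-style: yields one sample per iteration, e.g. from a stream.
class SquaresStream(IterableDataset):
    def __iter__(self):
        for i in range(100):
            yield torch.tensor([float(i)]), i * i

map_loader = DataLoader(SquaresDataset(), batch_size=8, shuffle=True)
stream_loader = DataLoader(SquaresStream(), batch_size=8)  # no shuffle/sampler for streams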



DataLoader Object

PyTorch's DataLoader allows us to load batches from a dataset:

from torch.utils.data import DataLoader, RandomSampler

dataloader = DataLoader(
    dataset,                         # a map-style dataset (sampler= below requires one)
    batch_size=8,                    # balance speed and convergence
    num_workers=2,                   # non-blocking loading when > 0
    sampler=RandomSampler(dataset),  # random reads may saturate the drive
    pin_memory=True,                 # page-lock memory for the data?
)

  • Random reads may saturate the storage device.
  • pin_memory: keeps batches in page-locked RAM.
    • transfers to the GPU can then be faster (see the sketch below)
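
A minimal sketch of how pinned memory is typically used: with pin_memory=True, batches land in page-locked host RAM, and .to(device, non_blocking=True) can then overlap the copy with computation. The toy tensors and device choice below are assumptions for illustration.

import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy data; real shapes/contents are assumptions for illustration.
dataset = TensorDataset(torch.randn(1024, 3, 32, 32),
                        torch.randint(0, 10, (1024,)))
loader = DataLoader(dataset, batch_size=8, pin_memory=True, num_workers=2)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
for batch, labels in loader:
    # With pinned host memory, these host-to-GPU copies can be asynchronous.
    batch = batch.to(device, non_blocking=True)
    labels = labels.to(device, non_blocking=True)
    # ... forward/backward pass here ...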

Performance

Two main constraints:

  • CPU IPS (instructions per second)
  • storage IOPS (I/O operations per second)

CPU

You want the CPUs to be performing:

  • preprocessing
  • decompression
  • copying, to get the data to the GPU.

The rule: you don't want them idling or busy-waiting on thread/process synchronization primitives, I/O, etc.

The easiest way to improve CPU utilization with PyTorch is to use the worker-process support built into DataLoader, as in the sketch below.
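
A hedged sketch of that idea: the hypothetical dataset below does its CPU-heavy work in __getitem__, and setting num_workers > 0 moves those calls into background worker processes so the main training loop is not blocked.

import torch
from torch.utils.data import Dataset, DataLoader

class HeavyPreprocessDataset(Dataset):
    # Hypothetical dataset: __getitem__ stands in for CPU-heavy
    # decoding / augmentation work.
    def __len__(self):
        return 10_000

    def __getitem__(self, idx):
        x = torch.randn(3, 224, 224)  # stand-in for decode + augment
        return x, idx % 10

# num_workers > 0 runs __getitem__ in worker processes, keeping the CPUs
# busy with preprocessing instead of stalling the main training loop.
loader = DataLoader(HeavyPreprocessDataset(), batch_size=8,
                    num_workers=4, pin_memory=True)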
