PyTorch DataLoader
Training Streams
Recall the Typical Training Process:
import torch
from torch.optim import SGD

loader = ...  # a DataLoader over the training set
model = MyNet()
criterion = torch.nn.CrossEntropyLoss()
optimizer = SGD(model.parameters(), lr=0.01)  # note: parameters() must be called

for epoch in range(10):
    for batch, labels in loader:
        outputs = model(batch)              # forward pass
        loss = criterion(outputs, labels)   # compute loss
        optimizer.zero_grad()               # clear old gradients
        loss.backward()                     # backpropagate
        optimizer.step()                    # update weights
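As a sketch of what loader might be (the TensorDataset wrapper and random data here are illustrative assumptions, not part of the original example):

import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical data: 100 samples with 10 features each, 3 classes.
features = torch.randn(100, 10)
labels = torch.randint(0, 3, (100,))
loader = DataLoader(TensorDataset(features, labels), batch_size=8, shuffle=True)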
Loader
Two styles of dataset:
- Streams (iterable-style)
  - provide one sample per iteration
- Map (map-style)
  - allows access to samples in any order, e.g. randomly
- The user may augment and manipulate the data through the DataLoader.
class IterableStyleDataset(torch.utils.data.IterableDataset):
    def __iter__(self):
        # Support for streams ...
        ...

class MapStyleDataset(torch.utils.data.Dataset):
    def __getitem__(self, key):
        # Map from (non-int) keys ...
        ...

    def __len__(self):
        # Support sampling ...
        ...
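As a concrete illustration, a minimal runnable version of both styles; the counting stream and the list-backed dataset are invented purely for demonstration:

import torch
from torch.utils.data import DataLoader

class CountingStream(torch.utils.data.IterableDataset):
    # Iterable-style: yields a stream of integers 0..9.
    def __iter__(self):
        return iter(range(10))

class ListDataset(torch.utils.data.Dataset):
    # Map-style: random access into an in-memory list.
    def __init__(self, data):
        self.data = data
    def __getitem__(self, idx):
        return self.data[idx]
    def __len__(self):
        return len(self.data)

for x in DataLoader(CountingStream(), batch_size=4):
    print(x)  # tensors [0,1,2,3], [4,5,6,7], [8,9]

for x in DataLoader(ListDataset(list(range(10))), batch_size=4, shuffle=True):
    print(x)  # random batches of 4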
DataLoader Object
PyTorch's DataLoader allows us to load batches of samples from a dataset:
from torch.utils.data import DataLoader, RandomSampler

dataloader = DataLoader(
    dataset,
    batch_size=8,                    # balance speed and convergence
    num_workers=2,                   # non-blocking when > 0
    sampler=RandomSampler(dataset),  # only for map-style datasets
    pin_memory=True,                 # page-lock the memory holding the data
)
- A random read order may saturate the storage drive.
- pin_memory keeps the data in page-locked RAM that cannot be swapped out,
  so it can be transferred to the GPU faster.
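A minimal sketch of how pinned memory pays off during the transfer; the device selection and loop body are assumptions for illustration:

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
for batch, labels in dataloader:
    # With pin_memory=True, non_blocking=True lets the host-to-device
    # copy overlap with computation instead of stalling.
    batch = batch.to(device, non_blocking=True)
    labels = labels.to(device, non_blocking=True)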
Performance
Two main constraints:
- CPU IPS (instructions per second)
- storage IOPS (I/O operations per second)
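One way to see which constraint binds is to time the loader separately from the compute; this is a rough diagnostic sketch, not from the original slides:

import time

data_time = step_time = 0.0
t0 = time.perf_counter()
for batch, labels in loader:
    t1 = time.perf_counter()
    data_time += t1 - t0   # time spent waiting on the data pipeline
    # ... forward/backward/step ...
    t0 = time.perf_counter()
    step_time += t0 - t1   # time spent computing
print(f"data: {data_time:.1f}s  compute: {step_time:.1f}s")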
CPU
You want the CPUs to be performing:
- preprocessing
- decompression
- copying, to get the data to the GPU
The rule: you don't want them idling or busy-waiting on thread/process synchronization primitives, I/O, etc.
The easiest way to improve CPU utilization in PyTorch is to use the worker-process support built into DataLoader.
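For instance (the worker count and the persistent_workers flag are illustrative and should be tuned per machine):

dataloader = DataLoader(
    dataset,
    batch_size=8,
    num_workers=4,            # 4 worker processes load and preprocess in parallel
    persistent_workers=True,  # keep workers alive across epochs
    pin_memory=True,
)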