Single Machine Data Parallel
- When the data is too large to fit on a single GPU, the input batch is scattered across GPUs.
- The model is replicated on each GPU.
- PyTorch gathers the outputs from each GPU, then computes the loss and gradients and updates the weights.
- The code is simple, just one line (see the sketch below):
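A minimal sketch of that one line, `nn.DataParallel`, in PyTorch. It assumes a machine with CUDA GPUs; the toy `nn.Linear` model, batch size, and feature dimensions are illustrative assumptions, not from the original notes:

```python
import torch
import torch.nn as nn

# A toy model for illustration (model and sizes are assumptions).
model = nn.Linear(128, 10)

# The one line: wrap the model so PyTorch replicates it on every
# visible GPU and scatters each input batch across them (along dim 0).
model = nn.DataParallel(model).cuda()

# Forward: the batch is split across the replicas; the outputs are
# gathered back on the default device, where loss/backward are computed.
inputs = torch.randn(64, 128).cuda()
targets = torch.randint(0, 10, (64,)).cuda()

outputs = model(inputs)                              # gathered on GPU 0
loss = nn.functional.cross_entropy(outputs, targets)
loss.backward()   # gradients from all replicas are summed onto the wrapped model
```

After the wrap, training code is unchanged: the scatter, replicate, and gather steps happen inside the module's forward pass on every iteration.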