Source

The material based mostly on the free deep learning course and library fast.ai (from 2017)

https://course.fast.ai/index.html

GPU Servers

This is a short list of GPU Servers that can be used for fast.ai.

Server Free Description Limitation
Google Colab Yes Free notebook for use (required setup a GPU) 12 continuus hours per session
Kaggle Notebook Yes Free kaggle notebook more 60 continuus minutes per session.
Paperspace No Pay as you go consoles to run machines
Microsoft Azure ML No/200$ for start Free 200$ for machines that can have GPU inside
Google Cloud No/300$ for start Free Tier to use $300 dolars more
Amazon AWS No Free Tier doesn’t have GPU machines.

Artificial Neural Network more

Artificial Neural Network (ANN) is an paradigm for the deep learning method based on how the natural nervous system works. They can be used in the ImageRecognition, SpeechRecognition, natural language processing, desease recognition etc…

The simplest architecture of the Artifical Neural Network is a Single-layer Perceptron that was introduced in the 1957 by Frank Rosenblatt. wiki

Single Layer Perceptron

This architecture contains:

  • Input x as a vector
  • Weights for each x input (w)
  • bias (b)
  • and Activation function that on output has been activated or not (0 or 1)
Card image cap σ = 1

When Use?

  • Can only classify linearly separable cases with the binary output (0 or 1)
  • Single Layer perceptron cannot solve non-linear problem.
  • Not used in modern Deep Learning architecture

Modern Neural Network

Modern Neural Networks has something more than only one layer. In previous architecture, you wouldn’t find any better calculation than some simple linear functions. To create more complicated functions we need hidden layers. With this first input goes to the activation function in the first layer, and the following output if the hidden layer can go to the next layer with the activation function or to the final output. More dense Neural Network gives us the opportunity to define more parameters on weights and biases in the network and next recognise more complicated tasks. Each layer can have some special functions. For example in the image recognition first layer is used to extract features (like edges) on the image.

For example, we define a neural network with one hidden layer with three neurons, two input values, and one output. For the moment we don’t set our activation function yet.

1547391156214

Let’s explain all parameters defined in the network:

Value Description
1547390335050 Number of the examples.
1547389815746 Input on the i position. Sometimes it shows as a zero activation layer. 1547389977628
1547390011533 Input vector. The size of the input vector is (input_layer x m) .
Where m is a number of examples (In this example 1).
1547389753231 the weight on the layer n, from the input from the previous layer position (i) to the activation layer position (j)
1547390155358 The matrix on the layer n. The size of the matrix is (current_layer x previous_layer). In the example 1547390155358 (3 x 2)
1547587884160 The Z value on the layer (n). The product on output from the previous layer (or input) and weights on the current layer. The size of the matrix is (current_layer x m)
1547390538540 The value on the activation function on the layer n and position j. Activation on the last layer is the output on the neural network: 1547390776342
1547587796103 Activation matrix on the layer (n). The size is the same as
1547390298914 Set of parameters of the model. In the example two matricies: 1547390430000
1547391042187 Loss function on the one example.
1547391083924 Cost function on all examples in the one batch. (Very often show as C or E)
import torch
import torch.nn as nn
import torch.nn.functional as F

class Neural_Network(nn.Module):
    def __init__(self, inputSize = 2, outputSize = 1, hiddenSize = 3 ):
        super(Neural_Network, self).__init__()
        # parameters
        self.inputSize = inputSize
        self.outputSize = outputSize
        self.hiddenSize = hiddenSize
        
        # weights
        #self.W1 = torch.randn(self.hiddenSize, self.inputSize) # 3 X 2 tensor
        #self.W2 = torch.randn(self.outputSize, self.hiddenSize) # 1 X 3 tensor
        self.W1 = torch.tensor([[0.5,0.5],[0.6,0.7],[0.6,0.7]])
        self.W2 = torch.tensor([[0.1,0.2,0.3]])

Forward Function

To calculate the output we need to go from the input to the output with all calculations. For example, we define all activation as sigmoid functions, and our loss function as binary cross entropy (the loss function if we have binary output).

For example you have a function to predict that is:

1547485837571

This is a simple function with the binary output (like when you predict if on the image is dog or cat)

def fun_to_predict(x):
  return 1.0 if 3*x[0]*x[1]-2*x[0]>0 else 0.0 #must be float

To calculate the output you need to go through your defined neural network and calculate all the layers.

  1. calculate the first layer by multiply the input with the first layer weights.

1547586657031

  1. Caclulate activation function on the weights.

1547586732170

  1. Calulate the second layer by multiplying the output on the first layer with weights on the second layer.

1547586904341

  1. Calculate the output by activation function in the previous output.

1547586933870

  1. Calculate the cost and lost function.

1547587606618

1547587697072

What’s the difference between cost and loss function?

Usually, they are used interchangeably. But in this case, I thought a cost function as an error on the whole data set as a scalar, but loss function as a single function on one example.

more stackoverflow

Below you can find the example of calculating the output of the 2 layers neural network defined before. You can change values and play with weights on the network.

Deepn Network
    def sigmoid(self, s):
        return 1 / (1 + torch.exp(-s))        
      
    def loss_function(self,yhat,y):
        return -y * torch.log(yhat) - (1-y)* torch.log(1-yhat);
   
    def loss(self, yhat,y):
        # Binary classification logistic loss 
        return torch.mean(self.loss_function(yhat,y));
    
    def forward(self, X): # m - number of examples
        # (3 X 2) * (2 x m) = (3 x m)  
        self.z1 = torch.matmul(self.W1 , X) 
        # (3 X m) activation function
        self.a1 = self.sigmoid(self.z1) 
        # (1 x 3) * (3 x m) = (1 x m)
        self.z2 = torch.matmul(self.W2, self.a1) 
        # (1 x m) final activation function
        yhat = self.sigmoid(self.z2) 
        return yhat    

Gradient Descent more

When you know how to calculate your result is time to update weights to get the best result in your neural network. For this you need to use Gradient Descent. This algorithm is use the fact that if you want find the minimimum of the cost function you can use derivative of the cost function to recognize the direction how to update weights and the value shows how much to update.

Let’s assume that you have only one weight (w), so the cost function J(θ) you can present in a 2D plot. (on the x axis weight, and on the y axis J(θ)).

1548179188819

1548137149402

1548137409720

Where your whole function to calculate the final cost J(θ) is:

1548137505581

Your gradient will be a derivative cost function on your weight.

1548137519406

1548138648303

If your gradient is positive, this means that the plot is increasing, so we have to subtract the value of the gradient (go to the left). If your gradient is negative, this means that you need to add the value to your weight (go to the right).

In your final neural network, you have more dimensions, and it is impossible to show it on the single plot.

Backpropagation

When you know how to update weights, it is time to calculate the gradient for your neural network. You need to find the gradient on your weight, but the problem might be your equation that is little more complicated.

Let’s look how you calculate your final cost.

1548183708290

In backpropagation, you need to go back from right to left and on each step calculate derivative to update the previous layer. For this you use the chain rule: wiki

In the chain rule, you can calculate the partial derivative of the composition of two or more functions.

1548140736187

For example in your neural network you if you want to calculate new weights on your first layer and the second layer according to the gradient descent you have to calculate partial derivative on your cost function.

1548193684435

1548193713633

By using chain rule, you can calculate partial derivative based on the partial derivative on each function.

1548194181907

You can go from the right to the left and calculate the error on each step, and next propagate this to the previous layer and update your weights.

Hadamard product wiki

Sometimes you will see ∘ instead of ∗ in the multiplication between matrices. This sign is a Hadamard product, and by this, you multiply two matrices with the same dimension by multiply each part of matrix themselves. For example if your f(x)=x*(x-1). The matrix calculation wouldn’t be multiple two matrices but Hamarad product of this.

1548182006468

Below you can find an algorithm for a matrix with m examples .

1548182396033

  1. Calculate the error on the cost function
Deepn Network
  1. Backpropagate this to calculate the loss on the dJ/Z[2] layer.
Deepn Network
  1. Backpropagate to get update for the second Layer W[2]
Deepn Network
  1. We go further to update W[1]. First calculate dJ/dA[1]
Deepn Network
  1. Next we calculate dJ/dZ[1]
Deepn Network
  1. The last step before update is to find the dJ/dW[1]
Deepn Network
  1. The final step is to update your weights with new values, and calculate new loss and cost.
Deepn Network

Pytorch Implementation

NN = Neural_Network(2,1)
X = torch.randn(2,10) # random values from [0,1] (2 x m)

def fun_to_predict(x):
  return 1.0 if x[0]*x[1]+2*x[0]>0 else 0.0 #must be float

Y = torch.t( torch.tensor( [[fun_to_predict(x)] for x in torch.t(X)]   ) ) # (1 x m)
yhat = NN(X)

print (" Loss: " + str(NN.loss(yhat,Y).detach().item())) 

for i in range(100):  # trains the NN 100 times
  NN.train(X, Y, 1.0)

yhat = NN(X)
print ("#" + str(i) + " Loss: " + str(NN.loss(yhat,Y).detach().item()))

ypred = NN.predict(X)
     
print ("Predicted data based on trained weights: ")
print ("Input: \n" + str(X))
print ("Output: \n" + str(ypred))   
print ("Real: \n" + str(Y))
Loss: 0.7505306005477905
#99 Loss: 0.00372719275765121
Predicted data based on trained weights: 
Input (scaled): 
tensor([[ 1.5810,  1.3010,  1.2753, -0.2010,  0.9624,  0.2492, -0.4845, -2.0929,
         -0.8199, -0.4210],
        [-0.9620,  1.2825, -0.3430, -0.6821, -0.9887, -1.7018, -0.7498, -1.1285,
          0.4135,  0.2892]])
Output: 
[1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
Real: 
tensor([[1., 1., 1., 0., 1., 1., 0., 0., 0., 0.]])

Do I have to implement all these functions?

Fortunately no. All gradient and backpropagation the only think what you have to set up in the Neural Network is:

  • architecture of the network - How many hidden layer?, What are the activation functions on each layer?
  • Loss Function - How to calculate loss function in the Neural Network
  • Optimilization Function - How to update weights. If you choose wrong you can finish in the local minimum instead of the better result
  • Learning Rate - Find learning rate to use in the neural network
more examples

Let’s bring again the same neural network that we define in previous chapter. For all activation we want to use sigmoid function and for loss

1547391156214

1548182396033

Name Comment
torch.nn.Linear Simple Linear Layer in the neural network more
torch.nn.Sigmoid Sigmoid activation function more
torch.nn.BCELoss Binary classification logistic loss more
torch.optim.SGD stochastic gradient descent more

What’s the difference between stochastic gradient descent adn gradient descent?

Stochastic gradient descent (often shortened to SGD) means that you update set parameters in an iterative for each sample one by one. In Gradient Descent you get your whole data to calculate the loss and update your weights. Quora Question wiki

Using Sequential

inputSize = 2
outputSize = 1
hiddenSize = 3

model = torch.nn.Sequential(
          torch.nn.Linear(inputSize, hiddenSize),
          torch.nn.Sigmoid(),
          torch.nn.Linear(hiddenSize, outputSize),
    	  torch.nn.Sigmoid(),
        )

# define loss function, sinble Binary 
loss_fn = torch.nn.BCELoss()

# Implements stochastic gradient descent (optionally with momentum).
optimizer = torch.optim.SGD(model.parameters(),lr=1e-1)

Using nn.Model

class TwoLayerNet(nn.Module):
  def __init__(self,inputSize = 2, outputSize = 1, hiddenSize = 3):
    super(TwoLayerNet, self).__init__();
    
    # define first layer and second layer
    self.linear1 = torch.nn.Linear(inputSize, hiddenSize)
    self.linear2 = torch.nn.Linear(hiddenSize, outputSize)

  # calculate the output  
  def forward(self, X):
    h = torch.sigmoid(self.linear1(X))
    return torch.sigmoid(self.linear2(h));

#define model
model = TwoLayerNet()

# define loss function, sinble Binary 
loss_fn = torch.nn.BCELoss()

# Implements stochastic gradient descent (optionally with momentum).
optimizer = torch.optim.SGD(model.parameters(),lr=1e-1)

Example of SGD for linear function

example of updating weights for linear function from the fast.ai course.

%matplotlib inline
from fastai.basics import *

n = 100
x = torch.ones(n,2) 
x[:,0].uniform_(-1.,1) # (m x 2) (2 is a number of input parameters)

a = tensor(3.,2); # The parameters that we should find
y = x@a + torch.rand(n) # (m x 1)

def mse(y_hat, y): return ((y_hat-y)**2).mean() # mean square error
a = nn.Parameter(tensor(-1.,1)); # unknown parameter for function y=x@a, (n x 1) = (n x 2) x (2 x 1)

def update(i):
    y_hat = x@a # uour testing, forward function
    loss = mse(y, y_hat) #loss
    if i % 10 == 0: print(loss)
    loss.backward()  # backward functions
    with torch.no_grad(): # with gradient we g
        a.sub_(lr * a.grad) # a.grad - gives information about gration, it will update a, i need to go oposite.
        a.grad.zero_() # zero gradient

lr = 1e-1
        
# ANIMATE
from matplotlib import animation, rc
rc('animation', html='html5')

fig = plt.figure()
plt.scatter(x[:,0], y, c='orange')
line, = plt.plot(x[:,0], x@a)
plt.close()

def animate(i):
    update(i)
    line.set_ydata(x@a)
    return line,

animation.FuncAnimation(fig, animate, np.arange(0, 100), interval=20)

1550262060215

Data / Source

In this chapter I want to brink some information how to get any data to learn, and how to convert them to be helpfull in the learning process.

Resources

List of free resources for your own use.

How to get data and other sources of data for Data-Science and Machine Learning

Name Description
Medical Data link List of a lot of medical data that can be used for Machine Learning
Kaggle Datasets link List a lot of datasets for Kaggle competitions.
Archive DataSet from Wisconsinlink Archive Data-Set that help you to get data from Wisconsin Library.
fast.ai Datasets more List of datatsets for image classification/NLP processing
BelgiumTS DataSet link BelgiumTS DataSet with road signs classification.
Flower Datasets more flower datasets
Human 3.6MLmore 3.6 million 3D human poses and corresponding images.

Text Resources:

Name Description
Sentiment Analysis in Text more .csv file with 40,000 text and defined emetions. (sadness, enthusiasm, worry, love)
EmoBank more Large DataBase text with a score for each emotion (Valence-Arousal-Dominancescheme)
EmoInt more Shared Task on Emotion Intensity, 4 emotions (anger, fear, joy, sadness). visualisation

Tabular Data

Name Description
Movielens more Non-commercial, personalized movie recommendations.

Data Download / Google Drive

If you use Google Colab you can easly mount your Gdrive in the filesystem. I also create a link for “My Drive”(ln), because it is easier to go to a folder without a space in the name.

# Load the Drive helper and mount
from google.colab import drive
drive.mount('/content/drive')
!ln -s "/content/drive/My Drive/" /content/gdrive

1550258836263

Data Download / Google Images

You can also download using google images search engine. For this,

  1. Prepare folder structure for your classes.
from fastai.vision import *

folder = 'teddys'

path = Path('data/bears')
dest = path/folder
dest.mkdir(parents=True, exist_ok=True)

1550259522813

  1. Go to google images, find interesting search images, go to the developer consolectrl+shift+j and type:
urls = Array.from(document.querySelectorAll('.rg_di .rg_meta')).map(el=>JSON.parse(el.textContent).ou);
window.open('data:text/csv;charset=utf-8,' + escape(urls.join('\n')));

1550259689567

This wll download you the file download, rename the file to urls_teddys.txt and upload to your path in the jupyter or colab

1550259797554

  1. Download your files using download_images
file = 'url_teddys.txt'
folder = 'teddys'
dest = path/folder

download_images(path/file, dest, max_pics=200)

1550259892436

  1. The last step is to verify images. This will remove images that are corrupted or doesn’t have 3 channels.
verify_images(path/'teddys', delete=True, max_size=500)

1550260036514

Data Download / wget

CurlWget

There is an extension for the chrome that you can get to generate wget command for your Linux command line. Just add the extension and after start downloading cancel this and copy the command to your console. (or in the Jupyter Notebook with exclamation character “!”).

1549318371426

!wget --header="Host: archive.ics.uci.edu" --header="User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.81 Safari/537.36" --header="Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8" "https://archive.ics.uci.edu/ml/machine-learning-databases/00436/Container_Crane_Controller_Data_Set.csv" -O "Container_Crane_Controller_Data_Set.csv" -c

1549482301055

Data Download / kaggle

!pip install kaggle

Sign to the profile on the Kaggle page:

1549317920891

Next, download new API key. Upload

1549317857572

And upload to your page: (On Colab you can upload your page)

1549317981152

Copy the kaggle configuration to the home folder.

!mkdir ~/.kaggle
!cp /content/kaggle.json ~/.kaggle/kaggle.json
!chmod 600 ~/.kaggle/kaggle.json

Now you can download any competion that you are in: (kaggle will create folders by themself)

!kaggle competitions download -c home-credit-default-risk -p /content/titanic

Errors:

  • Error 401 - means that you have wrong .json file. Generate again and check if exists.
  • Error 403 - You don’t accept terms and condition for the competition. Go to the webpage and join to the competition.

Data Download / linux commands

Useful commands to operate on files.

Command Example Description
Create Directory mkdir folder Create directory named folder
Remove files rm content/*.zip Remove all files with extension ``.zip`
Unizip files unzip -o -q isic-images.zip -d isd/bening Unzip the files isic-images.zip,
the -o option is used to suppress the query,
the -q is used to not show list of extracted files,
-d is a destination folder
Move files mv isd/benign/ISIC-images/**/*.jpg isd/benign Move all files with the extension .jpg from the folder isd/benign/ISIC-images , the double start ** means to tro find files in the folder and subfolder recseively., Destination folder is isd/benign
Copy files cp isd/benign/ISIC-images/**/*.jpg isd/benign The same as above without remove original files.

Architecture

Glossary

Name Description
parameters Configuration of the Neural network (Contains weights and biases)
activation function Function to activate
activation Result of the activation functions
input layer Input for the Nueral network (called layer 0, x)
output layer Output of the Neural network (y). Activation of the last layer.
loss function Function thant compares output with ground truth
embedding more Type of the layer that output is a product of two matrices. X@W, (e.g.(5x1)@(5x8)=(8x1))

1551271030999

Activation Functions more

Instead of using perceptron with output 0 or 1, better is use activation function that return values between 0 and 1. In Deep learning there are couple functions to use as an activation.

Sigmoid function

In sigmoid function the difference is in the activation function.

Logistic function is:

{\displaystyle f(./assets/6f42e36c949c94189976ae00853af9a1b618e099.svg)={\frac {L}{1+e^{-k(x-x_{0})}}}}

Sigmoid is a special case when L=1, k=1, and x0 = 0.

img

1543948420834

Card image cap σ = 0.652

When Use?

  • predict the probability of the output. (simple classification for example: ‘dog/cat’).

List of Activation functions source

When Use?

  • Currently in the architecture except the last layer, ReLU is the most popular.
  • For last layer (output) ReLU is not used because we want to have output in the range (e.g. [0,1]), most popular is Sigmoid (Logistic).
  • Identiny is linear, it can be used for linear regression.

img

img

Loss functions more PyTorch

Name Pytorch Calculation Where Use?
Mean Absolute Error nn.L1Loss 1554546156699 Regression Problems (very rare)
Mean Square Error Loss nn.MSELoss 1554546384434 Regression Problems
Smooth L1 Loss nn.SmoothL1Loss 1554546413250 Regression Problems
Negative Log-Likelihood Loss nn.NLLLoss 1554546459528 Classification
Cross-Entropy Loss nn.CrossEntropyLoss 1554546499670 Classification
Kullback-Leibler divergence nn.KLDivLoss 1554546582184 Classification
Margin Ranking Loss nn.MarginRankingLoss 1554546627954 GANs, Ranking tasks
Hinge Embedding Loss nn.HingeEmbeddingLoss 1554546662939 Learning nonlinear embeddings
Cosine Embedding Loss nn.CosineEmbeddingLoss 1554546723665 Learning nonlinear embeddings

weight decay - SGD - L2 Regularization

SGD is a simple loss.backward() that goes backpropage through the model and update weights. If we define a loss as mean squared error:

1551806539435

Weight decay is a regularization term that penalizes big weights. When the weight decay coefficient is big, the penalty for big weights is also big, when it is small weights can freely grow.

There is no difference between L2 regularization and weight decay, in L1 regularization there is absolute value |w| instead of w^2.

1551806603749

Momentum more

Momentum is intended to help speed the optimisation process through cases, to avoid getting stuck in the "shallow valleys" when gradient is close to 0. Momentum accumulates the gradient of the past steps to determine the direction to go.

You update your weights based on the 90% of your previous result (default) and in 10% of your current gradient descent result. In the first time when you don’t have previous result you update in 100% by your gradient descent.

1551965981091

opt = optim.SGD(model.parameters(), lr, momentum=0.9)

RMSProp

Here instead of multiply by momentum we multiply by gradient divident by the root square of the previous result of the batch. In this case we multiply by the square of gradient. This mean that if you are going in the right direction make the number bigger.

  • If the gradient is really small and St is a small number than the update will be small
  • If the gradient the gradient is constenely very small and not volatile let’s get bigger jumps.

1551975097200

optimizer = optim.RMSprop(model.parameters(), 
                          lr = 0.0001, 
                          alpha=0.99, 
                          eps=1e-08, 
                          weight_decay=0, 
                          momentum=0)

Adam Optimizer

Adam this combination of RMSProp and Momentum, it’s in less epochs than both separetely.

1551967342004

The final calculation you can find with epsilon value (default 1e-10):

  • betas coefficients used for computing running averages of gradient and its square (default: (0.9, 0.999)), in the above calculation bheta1, and bheta2.

1551966978062

optimizer = optim.Adam(model.parameters(), lr = 0.0001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0)

Convolution Network more

Convolution Network is type of layer in the neural network that can help you to read information about image by putting a mask (like in the most popular image software ) called kernel https://en.wikipedia.org/wiki/Kernel_(image_processing) . It doesn’t have to bet a 3x3 matrix, but this is most popular. In the webpage http://setosa.io/ev/image-kernels/ you can play with the own matrix. The calculation is don for each pixel in the image. For each of them we get the matrix around the pixel with the size of the kernel (3x3) and multiply each of them. The sum is the new pixel of the image.

img

For example below matrix creates from your image of cat edges around your cat.

1552500642360

1552500673214

1552500687431

Define different matrix you can find different factors that appear on your image. In the popular models first layer can detect simple artifacts like edges more, but next layers can go into some deeper knowledge like if this is a cat or dog, based on this artifacts. In below example there are 96 kernels with image 11x11x3 that shows different aspects.

img

F.conv2d

In pytorch function for calculation convolution is F.conv2d,

parameter definition
input input data
weight kernel to use on the shape
bias optional bial
stride The stride of the convolving kerenl.
padding implicit zero padding on both sides of the input.
dilation the space between kernel elements
groups split input into groups
k = tensor([
    [0.  ,-5/3,1],
    [-5/3,-5/3,1],
    [1.  ,1   ,1],
]).expand(1,3,3,3)/6
F.conv2d(t[None], k,stride=(1,1),padding=1)

1552559867051

stride

Usual your kernel is shifted by 1 pixel (stride=1). Your kernel is start from the position (0,0) and move to the position (1,0) but you can shift the pixel by any value you want.

more

1552564165730

If you set a stride to 2 your kernell will be moved by 2 pixels, and the size of the input image will be 2 times smaller (for example for above image is [250,189]).

1552564234896

output = F.conv2d(t[None],k, 
                  stride=(2,2),
                  padding=1) #shape: ([1, 1, 500, 377])

padding

The default padding for function is 0. The problem is that your image will decrease in the size by 1 pixel for each corner (if you have kernel=(3,3))

[from (10,10) to (9,9)]

1552565495070

To avoid this we can add padding, increase the previous size with 0 values.

1552565871005

The output size is the same as the input size. For 5x5 kernel, the padding must be (2,2) to get the same size of the image.

1552566061084

nn.Conv2d

To create a layer for your model you have a function nn.Conv2d,

n = nn.Conv2d(in_channels = 3,
              out_channels=3,
              kernel_size = (3,3),
              padding=1, stride = 1)

show_image(n(t[None])[0])

1552560544169

show_image can only accept 1 channel and 3 channels, but you can create more and if you look into resnet34 you will find that first convolution layer has 64 channels as output, and 3 as an input.

from fastai.vision import *

model = models.resnet34()
print(model)

1552567407873

The size of the output more

When you look into summary for your model summary() in the fast.ai you will find the layers that represents convolutional network 2d and size of each layer.

learn.summary()

1552647460580

To calculate the output size of the image (width and height), you can use below formula.

1552571969347

Calculation of Param# for layers: more

  • Conv: kernel_size*kernel_size*ch_in*ch_out
  • Linear: (n_in+bias) * n_out
  • Batchnorm: 2 * n_out
  • Embeddings: n_embed * emb_sz

Examples:

t = torch.ones(1,3,100,100) #items, channels, width, height
layer = nn.Conv2d(
          in_channels =3,
          out_channels=3,
          kernel_size = (3,3),
          padding=(1,1), 
    	  stride = (1,1))

layer(t).shape

W = 100, H = 100, K=(3,3), P=(1,1), S=(1,1)

ow = (W-K+2P)/S + 1 = (100-3+2)/1+1=100 oh = (W-K+2P)/S + 1 = (100-3+2)/1+1=100

  • (items, channels,width,height): [1,3,100,100]
  • out_channels: 64
kernel stride padding Output size calc
[3,3] (1,1) (1,1) [1, 64, 100, 100] ((100-3+2*1)/1+1)
[7,7] (1,1) (1,1) [1,64,96,96] ((100-7+2*1)/1+1)
[3,3] (2,2) (1,1) [1,64,50,50] ((100-3+2*1)/2+1)
[3,3] (1,1) (0,0) [1,64,49,49] ((100-3+2*0)/1+1)

NETRON

You can also show your model in the NETRON APPLICATION https://github.com/lutzroeder/netron as a graph. For the pytorch learn.save() the application show error for me, but you can export your model to the ONNX format, and open in the application.

dummy_input = torch.randn(1, 3, 224, 224).cpu() # the size of your input

torch.onnx.export(learn.model.cpu(), 
                  dummy_input, 
                  "./models/resnet34-entire-model.onnx", 
                  verbose=True)

1552575612810

Batch Normalization more paper video

Batch normalization is a type of layer for the neural network that make loss surface smootherby normalizing parameters inside hidden layers, like you do with the input layer.

1552580302141

Bath Normalization reduce problem when input changes, so the loss function is more stable and less bumpy.

Batch normalization doesn’t reducing internal covariate shift

Internal Covariate shift refers to change in the distribution of layer inputs caused by updates to the preceding layers. (where for example you put different style of images next time)

Last papers shows that Batch Normalization doesn’t reduce ICS like in the original paper. https://arxiv.org/pdf/1805.11604.pdf](https://arxiv.org/pdf/1805.11604.pdf)

Benefits:

  • We can use higher learning rates because batch normalization makes sure that there’s no activation that’s gone really high or really low. And by that, things that previously couldn’t get to train, it will start to train.
  • It reduces overfitting because it has a slight regularization effects.

How it is calculated:

  1. We get mini-batch
  2. Calculate mean (e) anv variance (var) of the mini-batch
  3. normalize the output y=(x-e)/sqrt(var)
  4. Scale and shift the output using own parameters γ and β . This are wieghts that are also larned during a backward optimalization. We also need to scale and shift the normalized values otherwise just normalizing a layer would limit the layer in terms of what it can represent.

img

Dropout paper video

1552583126804

Dropout is a tregulariation method that for reducing overfitting in neural network, by preventing complex co-adaptaions on training data. At random, we throw away some percentage of the activations, After we finish traiing we remove dropout on the activation, because we wanted to be accurate. Dropout use Bernoulli distribution to remove some activations.

  • It helps working with overfitting. If we over fit some parts are recognizing particular image. If we turn off them during training the network the network will avoid this behaviour.
torch.manual_seed(7)
m = nn.Dropout(p=.5) # in 50% probability return 0.
input = torch.randn(4, 2)
m(input)

1552588690527

PyTorch WebPage

Functions

PyTorch is an optimized tensor library for deep learning using GPUs and CPUs.

import torch
Command Info Description
torch.cuda.is_available() # 1543956014633 Is CUDA Available (On CPU returns False)
torch.backends.cudnn.enabled # 1543956000527 Is CUDA® Deep Neural Network library Available (used by fast.ai, On CPU returns False)
torch.cuda.device_count() # 1543955982482 Number of GPUs
torch.cuda.get_device_name(device) # 1543955759448 Gets the name of a device number
torch.cuda.set_device(device) # 1543955961561 Set the current GPU device
defaults.device = torch.device('cpu') Set default device to CPU
torch.manual_seed(n) Set manual seed for random initialization for weights, and oher random calculations.

How to train a model?

In this example there will be creation of model for PyTorch without using fastai library from basics. 1st you need download your data http://deeplearning.net/data/mnist/mnist.pkl.gz. and unpack in your folder.

We can load our data using gzip library, and pickle, mnist.pkl is a pickle file that have been divided into train and validation. We can load them separetely.

import gzip
import pickle
import matplotlib.pyplot as plt

path ='./mnist.pkl.gz'

with gzip.open(path, 'rb') as f:
    ((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f, encoding='latin-1')
    
plt.imshow(x_train[0].reshape((28,28)), cmap="gray")

1551976476263

DataSet / DataLoader

For PyTorch we need to convert the data into Tensor and prepare DataSet, and DataLoader for our model.

  • DataSet -is used for get original values,
  • DataLoader - is a class that returns values during training, we don’t want to load the whole dataset into graphics memory, and second we can do some transformation (like data augmentation) before return to the model.
import torch
import torch.nn as nn
from torch.utils import data

# CUDA for PyTorch
use_cuda = torch.cuda.is_available()
device = torch.device("cuda:0" if use_cuda else "cpu")

x_train,y_train,x_valid,y_valid = map(torch.tensor, (x_train,y_train,x_valid,y_valid))


bs=64

# # We can use simple TensorDataset, instead of creating own
class DatasetXY(data.Dataset):
  'Characterizes a dataset for PyTorch'
  def __init__(self, X, Y):
        'Initialization'
        self.X = X
        self.Y = Y

  def __len__(self):
        'Denotes the total number of samples'
        return len(self.X)

  def __getitem__(self, index):
        'Generates one sample of data'
        # Select sample
        x = self.X[index]
        y = self.Y[index]
        return x.to(device), y.to(device)

train_ds = DatasetXY(x_train,y_train)
valid_ds = DatasetXY(x_valid,y_valid)

# we ccan use this isntead of DatasetXY
#train_ds = data.TensorDataset(x_train, y_train)
#valid_ds = data.TensorDataset(x_valid, y_valid)
    
train_dl = data.DataLoader(train_ds,batch_size=bs)
valid_dl = data.DataLoader(valid_ds,batch_size=bs)

Model

Next, we define our model, with one linear layout, cuda() is required to work model in graphics card.

# Our model
class Mnist_Logistic(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(784, 10, bias=True)

    def forward(self, xb): return self.lin(xb)

    model = Mnist_Logistic().cuda()

Backward propagation

The last option is to update our weights, learn algorithm, and prepare loss function. nn.CrossEntropyLoss()


lr=2e-2
loss_func = nn.CrossEntropyLoss()

def update(x,y,lr):
    wd = 1e-5
    y_hat = model(x)
    # weight decay
    w2 = 0.
    for p in model.parameters(): w2 += (p**2).sum() #weight decay
    # add to regular loss
    loss = loss_func(y_hat, y) + w2*wd
    #print(loss.item())
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            p.sub_(lr * p.grad)
            p.grad.zero_()
    return loss.item()

Now we can learn our model, based on the training data loader.

Learn

def get_batch(dl):
    for x,y in dl: yield x.to(device),y.to(device)

losses = [update(x,y,lr) for x,y in get_batch(train_dl)]
plt.plot(losses);

1551982934877

Using 2 layer model

We can define different model and work on it.

import torch.nn.functional as F


class Mnist_NN(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin1 = nn.Linear(784, 50, bias=True)
        self.lin2 = nn.Linear(50, 10, bias=True)

    def forward(self, xb):
        x = self.lin1(xb)
        x = F.relu(x)
        return self.lin2(x)
    
model = Mnist_NN().cuda()

losses = [update(x,y,lr) for x,y in get_batch(train_dl)]
plt.plot(losses);

1551983216377

Adam optimizer

We can also use Adam optimizer instead of our own function to update.

from torch import optim

def update(x,y,lr):
    opt = optim.Adam(model.parameters(), lr)
    y_hat = model(x)
    loss = loss_func(y_hat, y)
    loss.backward()
    opt.step()
    opt.zero_grad()
    return loss.item()

losses = [update(x,y,1e-3) for x,y in get_batch(train_dl)]
plt.plot(losses);

1551983650860

Using model with fastai

We can also use our model with fastai library, instead of creating our.

data = DataBunch.create(train_ds, valid_ds, bs=bs) # We don't need create our own loader

loss_func = nn.CrossEntropyLoss()

class Mnist_NN(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin1 = nn.Linear(784, 50, bias=True)
        self.lin2 = nn.Linear(50, 10, bias=True)

    def forward(self, xb):
        x = self.lin1(xb)
        x = F.relu(x)
        return self.lin2(x)

learn = Learner(data, Mnist_NN(), loss_func=loss_func, metrics=accuracy)
learn.lr_find()
learn.recorder.plot()

learn.fit_one_cycle(1, 1e-2)

1551984419055

1551984427879

fast.ai github

Fast.ai is a very popular library for deep learning based on the PyTorch. The library simplifies training fast and accurate neural nets using modern best practices. fast.ai doesn’t replace any base functionality from the PyTorch. It adds a lot of functions that can help you build a model faster.

Requires Python >=3.6 (python --version)

Installation / Google Colab source

On Google Colab you don’t have to install fast.ai. It is installed already. Google Colab gives you Tesla K80 machine with 12GB of GPU RAM.

  1. Go to a page: https://colab.research.google.com
  2. Create new Python 3 notebook

1548781198400

  1. Make sure that you have setup GPU mode

1548781236103

1548781252590

  1. Check version
import fastai
print('fastai: version: ', fastai.__version__)

1548787769768

  1. Test some code from the fast.ai course:

%reload_ext autoreload
%autoreload 2
%matplotlib inline

from fastai import*
from fastai.vision import *

bs = 64
# bs = 16   # uncomment this line if you run out of memory even after clicking Kernel->Restart

path = untar_data(URLs.PETS); path
path_anno = path/'annotations'
path_img = path/'images'


fnames = get_image_files(path_img)
np.random.seed(2)
pat = r'/([^/]+)_\d+.jpg$'


data = ImageDataBunch.from_name_re(path_img, 
                                   fnames, pat, 
                                   ds_tfms=get_transforms(), 
                                   size=224, 
                                   bs=bs
                                  ).normalize(imagenet_stats)

data.show_batch(rows=3, figsize=(7,6))

1548781400369

print(data.classes)
#len(data.classes),data.c

learn = create_cnn(data, models.resnet34, metrics=error_rate)
print(learn.model)

1548781504899

learn.fit_one_cycle(1)
interp = ClassificationInterpretation.from_learner(learn)
losses,idxs = interp.top_losses()


interp.plot_top_losses(9, figsize=(15,11))

1548781518290

1548791676403

  1. You can open the github repo directly from the course https://github.com/fastai/course-v3.git

1548782192774

1548782244713

  1. Or Upload from your computer

1548782293026

1548788564305

installion / AWS

AWS Free Tier doesn’t include GPU computer

The instance called p2.xlarge that is dedicated for the deep learning cost about 0.9$ per hour. . First you need request to increase the service limit.

source

p2.xlarge has the NVIDIA GK210 Tesla K80 GPU Machine

1548792962650

  1. When you login for your AWS as a Free Tier you don’t have acces to create instance called p2.xlarge. First go to the dashboard and request for new limit for this instance.

1548782638507

1548782656757

1548782674826

In your request form choose the best Region, Instance Type, and New Limit Value.

1548782883630

In the description you can type: ‘[FastAI] Limit Increase Request’

1548782954025

After couple of days (usually one) you will get new limit for your Free Tier (This will still charge you for this computer)

  1. After you have ability to create new p2.xlarge instance, you can create new one.

In the services click EC2, and next instances

1548783181553

1548783218231

and click Launch instance

1548783245159

  1. In a list find deep learning instances

1548783307267

And select newest one:

1548783324760

  1. In the Instance type choose p2.xlarge and click Next: Configure Instance details

1548783390814

  1. We don’t change anything in Instance Details, just click next

1548783484833

  1. In the storage we stay as it is ( 75 GB).

1548783505382

  1. We don’t add any tags.
  2. In Configure securiity Group, add your IP to allow from your IP allow to connect (it can be only SSH or All traffic if you want), and click next

1548792681361

  1. You will see that your network is not eligible for the free usage tier, click Launch.

1548792728048

  1. Now you need to create your key pair to login into your instance. If you create one you can use the same, but first:

Choose name, and click Download Key Pair, and store the key in the safe place.

1548783776548

If you are using putty.exe, you need to divide your key into private and public key.

1548783977245

1548783995980

  • Click Save private key

1548784064758

  • This will generate your private key (.ppk) to use in the putty.exe

If you loose your key

You will loose your access to the server

  1. Click view Instances

1548792807620

  1. On the instance you can now login with the putty.exe
  • In a connection\SSH\Auth browse your private key (.ppk)

1548784332519

  • In the session write your IP

1548792569401

  1. Login as ubuntu

  2. update your libralies

conda install -c fastai fastai
  1. Setup password for your notebook
 jupyter notebook password

1548784862543

  1. Run jupyter notebook
ipython notebook --no-browser --port=8002
  1. Start tunneling to get to your notebook from the local webbrowser (-i is your downloaded .pem file from the Amazon AWS instance)
ssh -N -f -L localhost:8002:localhost:8002 ubuntu@__IP__ADRESS__ -i fastai-01.pem
  1. got to the webpage: http://localhost:8002

1548785475325

  1. Verify you choose kernel: pytorch36

1548786200488

  1. Verify library for fastai
from fastai import*
from fastai.vision import *
  • You will get the error about dataclasses. You need to install them

1548785884823

!pip install dataclasses
  1. Verify download data:
bs = 64
# bs = 16   # uncomment this line if you run out of memory even after clicking Kernel->Restart

path = untar_data(URLs.PETS); path
path_anno = path/'annotations'
path_img = path/'images'


fnames = get_image_files(path_img)
np.random.seed(2)
pat = r'/([^/]+)_\d+.jpg$'


data = ImageDataBunch.from_name_re(path_img, 
                                   fnames, pat, 
                                   ds_tfms=get_transforms(), 
                                   size=224, 
                                   bs=bs
                                  ).normalize(imagenet_stats)

data.show_batch(rows=3, figsize=(7,6))

1548786238799

Error while loading data

Sometimes you will get the error:

RecursionError: maximum recursion depth exceeded while calling a Python object

You need to increase limit of the recursion:

import sys; sys.setrecursionlimit(10000) lub więcej, 100k

  1. Verify creating model
print(data.classes)
#len(data.classes),data.c

learn = create_cnn(data, models.resnet34, metrics=error_rate)
print(learn.model)
  1. Verify calculating epochs.
learn.fit_one_cycle(1)
interp = ClassificationInterpretation.from_learner(learn)
losses,idxs = interp.top_losses()


interp.plot_top_losses(9, figsize=(15,11))

1548786340529

1548786462857

  1. Stop your instance after you finish working (or Terminate to remove the whole instance)

1548792872371

Remember to stop your instance

If you don’t do this Amazon will charge you for every hour of using your instance. If You only Stop your instance, Amazon will charge you for ELASTIC BLOCK STORE (Free Tier allow you to have maximum 30GB and p2.xlarge has 75GB) but the cost is much less than running machine (about 5 dollars per month).

installation / Google Cloud

Google cloud also is not a free option, but at the beginining you’ve got 300$ to use in your computer. It is enough to run a machine and go through all courses in the fast.ai library. First think you need to do is to create an account at https://cloud.google.com/.

Free credits are not eligible to run GPU machine, that’s why you need post a ticket to give you permission for GPU machine.

  1. On your console https://console.cloud.google.com, in the IAM & Admin go to Quotas.

1551261621298

  1. Filter metrics to GPUs (all regions)

1551261729127

  1. Click on the Compute Enginge API, and create new Quota.

1551261828322

  1. Set up new limit to 1.

1551261869250

  1. Wait for acceptance of your ticket. It could take couple hours, or even couple days.

1551261929324

  1. When you have new limit, time to create a new instance. The best option is to create your instance by console. Install gcloud going trhough the instructions: https://cloud.google.com/sdk/docs/quickstarts

A lot of errors

I’ve got a lot of errors during installation on Windows 10. Most of them I resolved by restarting a system.

  1. Now create your machine:

Windows command GPU

gcloud compute instances create "my-instance-01" --zone="us-west2-b" --image-family="pytorch-latest-gpu" --image-project=deeplearning-platform-release --maintenance-policy=TERMINATE --accelerator="type=nvidia-tesla-p4,count=1" --machine-type="n1-highmem-8" --boot-disk-size=200GB --metadata="install-nvidia-driver=True" --preemptible

WIndows command CPU

gcloud compute instances create "my-instance-01" --zone="us-west2-b" --image-family="pytorch-latest-cpu" --image-project=deeplearning-platform-release --maintenance-policy=TERMINATE  --machine-type="n1-highmem-8" --boot-disk-size=200GB --metadata="install-nvidia-driver=True" --preemptible

Bash/Linux:

export IMAGE_FAMILY="pytorch-latest-gpu" # or "pytorch-latest-cpu" for non-GPU instances
export ZONE="us-west2-b" # budget: "us-west1-b"
export INSTANCE_NAME="my-fastai-instance"
export INSTANCE_TYPE="n1-highmem-8" # budget: "n1-highmem-4"

# budget: 'type=nvidia-tesla-k80,count=1'
gcloud compute instances create $INSTANCE_NAME \
        --zone=$ZONE \
        --image-family=$IMAGE_FAMILY \
        --image-project=deeplearning-platform-release \
        --maintenance-policy=TERMINATE \
        --accelerator="type=nvidia-tesla-p4,count=1" \
        --machine-type=$INSTANCE_TYPE \
        --boot-disk-size=200GB \
        --metadata="install-nvidia-driver=True" \
        --preemptible

1551262933905

  1. You should also see your instance in the google console. https://console.cloud.google.com

1551263023358

1551263120582

  1. Now you can login to your jupyter notebook. This will forward a jupyter port 8080 to your localhost port.
gcloud compute ssh jupyter@my-instance-01 -- -L 8080:localhost:8080

gcloud compute ssh refuses connection (return code 255)

When I first try to login to the ssh I’ve got error 255. For this recreate your internal routing. more

gcloud compute routes list
gcloud compute routes create default-internet \
--destination-range 0.0.0.0/0 \
--next-hop-gateway default-internet-gateway
  1. Login to your machine http://localhost:8080/lab?.

1551263348669

  1. Update to the latest version
pip install fastai --upgrade
  1. Create new notebook and run. (this version has already installed fastai)
from fastai import*
from fastai.vision import *
print(__version__)

bs = 64
# bs = 16   # uncomment this line if you run out of memory even after clicking Kernel->Restart

path = untar_data(URLs.PETS); path
path_anno = path/'annotations'
path_img = path/'images'


fnames = get_image_files(path_img)
np.random.seed(2)
pat = r'/([^/]+)_\d+.jpg$'


data = ImageDataBunch.from_name_re(path_img, 
                                   fnames, pat, 
                                   ds_tfms=get_transforms(), 
                                   size=224, 
                                   bs=bs
                                  ).normalize(imagenet_stats)

data.show_batch(rows=3, figsize=(7,6))

1551263919741

print(data.classes)
#len(data.classes),data.c

learn = create_cnn(data, models.resnet34, metrics=error_rate)
print(learn.model)

learn.fit_one_cycle(1)
interp = ClassificationInterpretation.from_learner(learn)
losses,idxs = interp.top_losses()


interp.plot_top_losses(9, figsize=(15,11))

1551264505006

Stop your instance

Remember to stop your instance on the console when you finish your work.

fast.ai / Help functions

open_image()

Open single image,

im = open_image('images/Maine_Coon_97.jpg')

1549698053830

open_mask(fn)

Open mask by the function fn., that returns mask image file.

mask = open_mask(get_y_fn(img_f))
mask.show(figsize=(5,5), alpha=1)

1550779832997

.show()

Show image in the output.

im = open_image('images/Maine_Coon_97.jpg')
im.show(title='Title of the image')

1549698531867

.apply_tfms()

Apply transformation for the image and return new image.

im.apply_tfms(get_transforms()[0][3]).show()

1549736278158

ImageCleaner

Displays images for relabeling or deletion and saves changes in path.

from fastai.widgets import *

ds, idxs = DatasetFormatter().from_toplosses(learn, ds_type=DatasetType.Valid)
ImageCleaner(ds, idxs, path)

1550257534536

ImageCleaner doesn’t work on Google Colab

For this situation you need to delete files by yourself, or use Jupyter Notebook.

.show_heatmap()

Pytorch alows you to hook the layer and store some information before. In the resnet34 if you look at the last layer you will find that last linea [1,37] is your layer for classification (cat/dogs breeds), previous layers are also flat layers with classification [512], this is the end of resnet34 model.

We have two groups of layers learn.model[0], and learn.model[1]. We can get first layer with the last Conv2dlayer, and BatchNorm2d that size is [1,512,11,11] , and usign hook_output to get what is stored in the output for that category (preds[0,int(cat)].backward()), next we get this information from the hook_a.stored[0].cpu() variable, and show the heatmap.

1552590896874

def show_heatmap(learn,x,y):
    m = learn.model.eval(); # set mode to evaluation mode
    xb,_ = learn.data.one_item(x)
    
    
    xb_im = Image(data.denorm(xb)[0]);
    
    
    xb = xb.cuda()
    
    print(xb.shape)
    
    from fastai.callbacks.hooks import hook_output
    def hooked_backward(cat=y):
        with hook_output(m[0]) as hook_a: 
            with hook_output(m[0], grad=True) as hook_g:
                # print(m[0])
                preds = m(xb)
                # print(preds.shape)
                preds[0,int(cat)].backward()
        print(hook_a.stored.shape)
        return hook_a,hook_g
    
    hook_a,hook_g = hooked_backward(y);
    acts  = hook_a.stored[0].cpu()
    avg_acts = acts.mean(0)
    
    print(acts.shape,avg_acts.shape)
    
    def show_heatmap(hm):
        _,ax = plt.subplots()
        xb_im.show(ax)
        _,width,height = xb_im.shape
        ax.imshow(hm, 
                  alpha=0.6, extent=(0,width,height,0),
                  interpolation='bilinear', cmap='magma');
        
    show_heatmap(avg_acts)

x,y = data.train_ds[5004]
show_heatmap(learn,x,y)     

1552593708692

1552593695073

Another layer

Let’s now change m[0] to m[0][1]. This is a first layer of your network (after BatchNorm2d). What you can see that first layer regnozie edges of the pet.

1552594605148

fast.ai / Data

Before we put our data into learner we need to load them, and fast.ai gat a whole class that will divide your data into train, test , and validation, prepare some data augmentation so you don’t have to do it by yoursef

Basic functions

Name Description
data.c number of classes
data.classes list of classes
data.train_ds Train Dataset
data.valid_ds Validation dataset
data.test_ds Test dataset
data.batch_size batch size
data.dl(ds_type=DatasetType.Valid) Return data loader, (example: ds.dl(DatasetType.Valid))
data.one_item(item) Get item into a batch
data.one_batch() Get one batch of from the DataBunch. Returns x,y with size of the batch_size (if bs=128 then there is a list of 128 elements).

ImageDataBunch

Class that as an input have images.

parameter description
path folder path to the directory of images
ds_tfms List of transformation for images
size The size of the image as input for data (widthand heightfor the image are the same). If the image is greater than will be cropped.

ImageDataBunch / from_folder

Load data that are prepared in the folder structure. train, test , valid

1549650957873

data = ImageDataBunch.from_folder(path, ds_tfms=get_transforms(), size=24)

1549650578727

ImageDataBunch / from_csv

When you’ve got a .csv file with filenmaes and classes

data = ImageDataBunch.from_csv(path, ds_tfms=get_transforms(), size=28)

1549651318454

1549651350934

ImageDataBunch / from_df

Get data from loaded DataFrame

df = pd.read_csv(path/'labels.csv')
data = ImageDataBunch.from_df(path, df, ds_tfms=tfms, size=24)

[0,1]

ImageDataBunch / from_name_re

Get data from regular expression in the file name. regexr

1549652416468

pat = r"/(\d)/\d+\.png$"
data = ImageDataBunch.from_name_re(path, fnames, pat=pat, ds_tfms=get_transforms(), size=24)

['3', '7']

ImageDataBunch / from_name_func

data = ImageDataBunch.from_name_func(
    path, 
    fnames, 
    ds_tfms=get_transforms(), 
    size=24,
    label_func = lambda x: '3' if '/3/' in str(x) else '7')

['3', '7']

ImageDataBunch / from_list

Get Image data from the list that contains list of classes for each file in fnames.

labels = [('3' if '/3/' in str(x) else '7') for x in fnames]


data = ImageDataBunch.from_lists(path,
                                 fnames, 
                                 labels=labels, 
                                 ds_tfms=get_transforms(), 
                                 size=24)

['3', '7']

DataBunch / Own Item List

Usually you have a lot of datasets with a lot of different type of data, folder structure, etc… It is impossible to write all possible combinations to write your own DataBunch for calculation, that’s why you can step by step put different function on each step of your dataset.

No Step Description
1. DataType Define what is your DataSource, and what is the output
2. from_* Define how to get the input data (files, dataframes, csv)
3. *split* How to split your data for training, validation and test
4. label_* How to label your data, output y. Returns DataSet.
5. Transforms List of transformations for your input data
6. databunch(bs=*) Convert to DataBunchclass
7. normalize() Optional step for DataBunch to normalize input.

Examples 1/ Planet Data

We’ve got all files in one folder, without dividing into train and validation.

1550778286094

1550778228993



np.random.seed(42)
data = (ImageItemList
       .from_csv(path,
                 'train_v2.csv', 
                 folder='train-jpg', 
                 suffix='.jpg')
       .random_split_by_pct(0.2)
       .label_from_df(label_delim=' '))
	   .transform(tfms, size=128)
       .databunch(bs=16)
       .normalize(imagenet_stats))
  1. We setup random value in the begining to get the same validation, and training set each time.
  2. The Input is Image and output is list of categories, that’s why we use ImageItemList
  3. The labels are form .csv file, the first column has image filename (default cols=0), and the file is train_v2.csv in the path. Folder for images is train-jpg, and after read the image name add suffix .jpg.
  4. Because there is no split, we add random split 0.2
  5. Labels are in the 1st column (default cols=1), with space delimeter label_delim
  6. We add default transforms tfms, with flip_vert=True, because this are satelite images.
  7. Create DataBunch, with bs=16.
  8. We normalize images with imagenet_stats

Example 2 / IMDB DataBase

1550778883858

data_lm = (TextList.from_folder(path)
           #Inputs: all the text files in path
            .filter_by_folder(include=['train', 'test', 'unsup']) 
           #We may have other temp folders that contain text files so we only keep what's in train and test
            .random_split_by_pct(0.1)
           #We randomly split and keep 10% (10,000 reviews) for validation
            .label_for_lm()           
           #We want to do a language model so we label accordingly
            .databunch(bs=bs))

1550779135552

DataBunch / 1. DataType more

Class Description
CategoryList for labels and classification
MultiCategoryList for labels in a multi classification problem
FloatList for float labels in a regression problem
ImageItemList for data that are images
SegmentationItemList like ImageItemListbut will default labels to SegmentationLabelList
SegmentationLabelList for segmentation mask
ObjectItemlist like ImageItemListbut will default labels to ObjectLabelList
PointsItemList for points (of the type ImagePoints)
ImageList for image to image tasks
TextList for text date
TabularList for tabular data
CollabList for collaborative filtering

DataBunch / 2. from_

function Description
from_folder(path) From folder defined in pathmore
from_df(path, df) From DataFrame (df)
from_csv(path,csv_name) Create ItemList from .csv file,

DataBunch / 3. split

How to split your data for training, validation and test.

function Description
no_split No split data between train and val (empty validation set)
random_split_by_pct(valid_pct=0.2) Split by random value
split_by_files(valid_names) Split by list of files for validation
split_by_fname_file(fname,path) Split by list of files in the fnamefile.
split_by_folder(train='train', valid='valid') Split by the folder name
split_by_idx(valid_idx) Split by list of indexes of valid_idx
split_by_idxs(train_idx,valid_idx) Split by list of indexes of train_idx, valid_idx
split_by_list(train,valid) Split by list for train, and valid
split_by_vavlid_func(func) Split by the function that return True if it is for valido .
split_from_df(col) Split the data from col in the DataFrom

DataBunch / 4.label

Define the output for items (grand truth)

Name Description
label_empty() EmptyLabel for each item
label_from_list(labels) Label from the list of labels
label_from_df(cols=1) Set label as column in the dataframe
label_const(const=0) Set label as value
label_from_folder() Get label from the parent folder of the file (e.g. cars\train\porshe\img_001.jpg, tha label will be porche)
label_from_func(func) Get label from the function
label_from_re(pat) Get label from pattern
label_for_lm() Labers are from Language Model

DataBunch / 5. Transforms more

Add list of transforms like Data augmentation

Parameter Description
tmfs List of random transformation
size size of the image (224,224) or 224 if it’s square
resize_method Type of resize: ResizeMethod.CROP
ResizeMethod.CROP - resize so that the image fits in the desired canvas on its smaller side and crop
ResizeMethod.PAD - resize so that the image fits in the desired canvas on its bigger side and crop
ResizeMethod.SQUISH - resize theimage by squishing it in the desired canvas
ResizeMethod.NO - doesn't resize the image
padding_mode Padding mode
zeros - fill with zeros
border -fill with values from border pixel
reflection- fill with reflection
.transform(tfms)
data = (ImageList.from_folder(path, 
                      convert_mode='L'
                     ).split_by_folder()
                      .label_from_folder()
                      .transform(tfms=get_transforms(), 
                                 size=(224,224), 
                                 padding_mode='border',
                                 resize_method=ResizeMethod.PAD)
                      .databunch(bs=64, num_workers=4).normalize())

1554552160493

DataBunch / 6. Dataunch

Convert to DataBunch. The difference from DataSet is that DataBunch divide data into train, valid, and test DataSet.

.databunch(bs=64)
Function Description
.show_batch(rows=5,ds_type=DatasetType.Train) Show in the result example images
ds_type=[DatasetType.Train, DatasetType.Test, DatasetType.Valid,Single, DatasetType.Fix]
.dl(ds_type) Returns DeviceDataLoader
.c Return count of classes
.classes List of classes
.train_ds, train_dl Return train DataSet, or train DataLoader. First value is input, and second value is output.(e.g. data.train_ds[0]):
(Image (3, 128, 128), MultiCategory haze;primary)
valid_ds, valid_dl Valid DataSet, Valid DataLoader
test_ds, test_dl Test DataSet, Test DataLoader

.normalize()

Normalize data based on the mean and standard devation of the image using the Standard Score ((x-mean)/std). For the image you normalize value for each channel separetely (R,G,B). If you download images from standard sources you can use predefined statistics. Otherwise use batch_stats() that returns statistics for your images.

cifar_stats = ([0.491, 0.482, 0.447], [0.247, 0.243, 0.261])
imagenet_stats = ([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
mnist_stats = ([0.15]*3, [0.15]*3)
data = ImageDataBunch.from_name_re(
    path_img, 
    fnames, 
    pat, 
    ds_tfms=get_transforms(), 
    size=224, bs=bs
)

data = data.normalize(data.batch_stats())

#or data = data.normalize(imagenet_stats)

fast.ai / Data augmentation more

Group of functions that changes your data without changing the meaning of the data. Usefull to test your model when you want to add some data augmentation. For example little zoom on the image, flip the image, rotate in a small range).

get_transforms() source

Return list of transforms for the image that can be used in the ImageDataBunch as a df_tfms parameter. The list is divided into transformation for train and validation

parameter description
do_flip Image can be flipped
flip_vert Image can be flipped vertically (like satelite images or images from top)
max_rotate maximum rotation of the image (default: 10)
max_zoom maximum zoom of the image (default: 1.1)
max_lighting maximum lighting of the image (default: 0.2)
max_warp maximum warp of the image (default: 0.2)
p_affine the probability that each affine transform and symmetric warp is applied
p_lighting the probability that each lighting transform is applied
xtra_tfms a list of additional transforms

1549719418642

zoom crop() more

Randomly zoom and/or crop

parameter description
scale Decimal or range of decimals to zoom the image
do_rand if true, transform is randomized
p probability to apply the zzom
tsfms = zoom_crop(scale=(0.75, 2), do_rand=True)

1549721099158

rand_resize_crop()

Randomly resize and crop the image.

parameter description
size Final size of the image
max_scale Zoom the image to a arandom scal up to this
ratios Range of rations in which a new one will be randomly picked
tfms = [rand_resize_crop(224)]

1549721362837

List of transforms

brightness() more

Apply change in brightness of image. (0 - black, 0.5 no change, 0.9 - bright)

tfms = [brightness(change=(0.1,0.9),p=0.9)]

1549722249727

contrast more

Apply scale to contrast of the image. (0 - grey, >1 - super contrast, 1.0 - original image)

tfms = [contrast(scale=(0.1,2.0),p=0.9)]

1549722744720

crop more

Crop the image with the size and return image with the new size.row_pct and col_pct are the position of the left/top corner.

tfms = [crop(size=250, row_pct=0,col_pct=0,p=0.7)]

1549723001104

crop pad more

Like crop, but if the final image is biggar it will also add padding to the image.

crop_pad(im, int(size), 'zeros', 0.,0.)

1549730189408

Why there is no effect in crop_pad on `apply_tfms`

crop_pad is used but on an image that was resized so that its lower dimension is equal to the size you pass.* source

dihedralmore

Randomly flip and apply rotation of a multiple of 90 degrees.

If the target is an ImagePoints, or an ImageBBox use dihedral_affine function.

tfms = [dihedral()]

1549731670616

for k, ax in enumerate(axs.flatten()):
    dihedral(im, k).show(ax=ax, title=f'k={k}')

1549731840065

flip lrmore

Randomly flip horizontaly.

If the target is an ImagePoints, or an ImageBBox use ``flip_affine`function.

tfms = [flip_lr(p=0.5)]

1549732187896

jitter more

Changes pixels randomly replacing them with pixels from the neighorbhood. magnitude parameter is used to get information how far the neighborhood extends.

tfms = [jitter(magnitude=(-0.05,0.05))]

1549732538184

pad more

Apply padding to the image: (padding - size of padding )

  • mode=zeros : pads with zeros
  • mode=border: repeats the pixels at the border
  • mode=reflection: pads by taking the pixels symmetric to the border
tfms = [pad(padding=30,mode='zeros',p=0.5)]

1549732802157

perspective warp more

Apply perspective warping that show a different 3D perspective of the image.

tfms = [perspective_warp(magnitude=(0,1),p=0.9)]

1549733306310

symmetric_warp more

Apply symetric warp of magnitude.

tfms = [symmetric_warp(magnitude=(-0.2,0.2))]

1549733878458

cutout more

Cut out n_holes number of square holes at random location

tfms = [cutout(n_holes=(1,20))]

1549734050015

fast.ai / Learner

Basic Functions

Function Description
learn.save('filename') Save learner
learn.load('filename') Load learner from the file
learn.export() Export the learner as pickle file export.pkl
learn = load_learner(path) load learner from the folder whith the exported file export.pkl
learn.load_encoder('fine_tuned_enc') Load encoder
learn.save_encoder('fit-tuned-enc') Save encoder
learn.data DataBunch connected with learner
Collaborative filtering:
learn.bias(arr,is_item) a bias vector for user or item, learn.bias(['Star Wars (1977)'],True)
learn.weight(arr,is_item) a weight matrix for user or item, learn.weight(['Star Wars (1977)'],True)
parameters() list of all parameters

Encoder is used in seq2seq more problem, by techinque called trasfer learning.

  • The encoder is essentially tasked with creating a mathematical representation of the language based on the task for predicting the next word.
  • A decoder is responsible for taking that representation and applying it to some problem (e.g., predicting the next word, understanding sentiment, etc…) source

.show_results()

Show the output of the learning with Ground Truth, and Predictions.

learn.show_results(rows=3, figsize=(8,9))

1550783391484

.predict()

predict the data

Parameter Description
item The item to predict
im = open_image(fnames[184])
learn.predict(im)

1549735528945

partial()

Partial is help function from Python, that can create a new function with different parameters. Normally accuracy_tresh get tresh=0.5, but with partial we can use this function with default tresh=0.2

acc_02 = partial(accuracy_thresh, thresh=0.2)
f_score = partial(fbeta, thresh=0.2)

learn = create_cnn(data, arch, metrics=[acc_02, f_score])

1550780654267

.to_fp16()

Normally precission is 32 bit, but you can downgrade this precission to 16bit. It can speed up yuor learning by 200%.

learn = learn.to_fp16()

Learner / cnn_learner()

Create a Learner object from the data object and model from the architecture.

Parameter Description
data DataBunch
arch Architecture of the model
metrics additional metrics to show during learning
data = ImageDataBunch.from_folder(path,
                                   ds_tfms=get_transforms(), 
                                   size=224, 
                                   bs=bs
                                  ).normalize(imagenet_stats)

learn = cnn_learner(data, models.resnet34, metrics=error_rate)

1549734500237

Example using ImageList and converts to Black&White

data = (ImageList.from_folder(path, 
                      convert_mode='L' # 'L', 'RGB', or 'CMYK'
                     ).split_by_folder()
                      .label_from_folder()
                      .transform(tfms=get_transforms(), 
                                 size=(224,224), 
                                 resize_method=ResizeMethod.SQUISH)
                      .databunch(bs=64, num_workers=4).normalize())
data.show_batch(3,3)

1554551513821

Learner / unet_learner() more more

img

Create Unet architecture learner that converts image to image (mask). The suitable class that is used for DataBunch is SegmentationItemList (that is the same as ImageList).

Parameter Description
data DataBunch
arch Architecture
metrics List of metrics
wd From Learner class weight-decay for regularization in Adam optimalization more
bottle bottle flags if we use a bottleneck or not for that skip connection
learn = unet_learner