<hr/>

# Inmas Machine Learning Workshop January 2023
Instructor: Christian Kuemmerle - kuemmerle@uncc.edu <br>
Teaching Assistants: Emily Shinkle, Yuxuan Li, Derek Kielty, Yashil Sukurdeep, Tim Wang, Ben Brindle.

# Neural Network & Deep Learning
This workbook is divided into three parts. <br>

In the first part, you'll learn to build a basic type of neural network (NN) from scratch. <br>

In the second part, you'll use PyTorch's functionality to build a more complicated NN. <br>

In the third part, you'll be able to build a NN of your own design and test it out.

All three sections will use the `FashionMNIST` dataset for training and testing.

In [None]:
import numpy as np
import torch as th
import torch.nn as nn
import PIL
PIL.PILLOW_VERSION = PIL.__version__

In [None]:
from torch.utils.data import Dataset
from torchvision import datasets
from torchvision.transforms import ToTensor
import matplotlib.pyplot as plt
from torch.utils.data import DataLoader

# Tensors

First, let's make sure we understand PyTorch's main data structure, the **tensor**. <br>
Torch tensors work much like NumPy arrays. Most of the same syntax has been implemented for tensors, so many of the same commands carry over.
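
As a small illustration of the shared syntax, the sketch below builds the same 2x4 array in both libraries and applies a few common operations. The names `a` and `t` are made up for this example and are not used elsewhere in the workbook.

```python
import numpy as np
import torch as th

a = np.arange(8).reshape(2, 4)      # NumPy array with shape (2, 4)
t = th.arange(8).reshape(2, 4)      # Torch tensor with the same contents

print(a.shape, tuple(t.shape))      # both shapes are (2, 4)
print(a[:, 1], t[:, 1])             # slicing out a column works the same way
print(a.sum(axis=0), t.sum(dim=0))  # reductions: NumPy says `axis`, Torch says `dim`
print((a * 2 + 1)[0], (t * 2 + 1)[0])  # elementwise arithmetic looks identical
```

One visible difference is the keyword for reductions (`axis` versus `dim`); most indexing and arithmetic is otherwise written the same way.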

**Fill in some numbers to create a 3x2 tensor.**

In [None]:
first_tensor = th.tensor([])
print(first_tensor)

**Access the (1,2)th element of the tensor the same way you would for a numpy array.**

In [None]:
element = 
print(element)

**"Flatten" the tensor `first_tensor` above into a one-dimensional tensor with N entries. See [its documentation](https://pytorch.org/docs/stable/generated/torch.flatten.html) for some context.**

In [None]:
first_flat = 
print(first_flat)

The main difference is that Torch can track the operations performed on a **tensor** (it does so for tensors created with `requires_grad=True`). 
This is done so that we can easily compute derivatives later.<br>
This tracking adds computational overhead, so we should use NumPy arrays whenever we don't need gradients.
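
To make the tracking concrete, here is a minimal sketch of autograd on a three-element tensor. The function `y = sum(x^2)` is chosen only because its derivative, `2x`, is easy to check by hand.

```python
import torch as th

x = th.tensor([1.0, 2.0, 3.0], requires_grad=True)  # operations on x are tracked
y = (x ** 2).sum()                                   # y = x1^2 + x2^2 + x3^2
y.backward()                                         # fills x.grad with dy/dx = 2x
print(x.grad)

with th.no_grad():                                   # operations in this block are not tracked
    z = x * 2
print(y.requires_grad, z.requires_grad)
```

The `th.no_grad()` block is the same mechanism used later in the workbook when the weights are updated by hand.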

**Cast the tensor `first_tensor` above into a numpy array.**

In [None]:
first_array = 
print(first_array)

Now, we go the other way round: We create a `numpy.array` and then cast it as a `torch.Tensor`.

In [None]:
# create any np array 
second_array = np.array([1,2,3,4])

**Create the tensor `second_tensor` by casting `second_array` into a `torch.Tensor`.**

In [None]:
second_tensor = 

We print the results:

In [None]:
print(first_array, second_tensor)

# Data

Here we load the `FashionMNIST` dataset, which is available through `torchvision` and which we used in previous sessions.
It will be downloaded automatically if it is not yet in your working folder.

In [None]:
training_data = datasets.FashionMNIST(
    root="./",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.FashionMNIST(
    root="./",
    train=False,
    download=True,
    transform=ToTensor()
)
print(training_data)
print(test_data)

This is a dictionary of the different types of images and their corresponding labels.

In [None]:
labels_map = {
    0: "T-Shirt",
    1: "Trouser",
    2: "Pullover",
    3: "Dress",
    4: "Coat",
    5: "Sandal",
    6: "Shirt",
    7: "Sneaker",
    8: "Bag",
    9: "Ankle Boot",
}
print(labels_map)

We create [`DataLoader`](https://pytorch.org/tutorials/beginner/basics/data_tutorial.html) objects for each set. <br>
Basically, these are sophisticated ways of parsing through the data in batches.
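
To see what batching means in isolation, here is a small sketch that uses synthetic data instead of `FashionMNIST`. The tensors `fake_images` and `fake_labels` are hypothetical stand-ins with the same per-image shape (1x28x28).

```python
import torch as th
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for FashionMNIST: 10 blank 1x28x28 "images" with labels 0..9
fake_images = th.zeros(10, 1, 28, 28)
fake_labels = th.arange(10)
fake_data = TensorDataset(fake_images, fake_labels)

loader = DataLoader(fake_data, batch_size=4, shuffle=True)

batch_shapes = []
seen_labels = []
for x_batch, y_batch in loader:            # one full pass over the loader is one epoch
    batch_shapes.append(tuple(x_batch.shape))
    seen_labels.extend(y_batch.tolist())

print(batch_shapes)         # two batches of 4 images and a final, smaller batch of 2
print(sorted(seen_labels))  # every sample appears exactly once per epoch
```

Looping over the loader with `for` visits every sample once per pass, whereas `next(iter(loader))` only draws a single batch from a freshly created iterator.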

In [None]:
train_dataloader = DataLoader(training_data, batch_size=50, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=1, shuffle=True)
vars(train_dataloader)

# Basic NN From Scratch

In this section, we will build a neural network from scratch. <br>
This means we'll be doing all of the multiplications and updates by hand. <br>
We will still use PyTorch's autograd to compute derivatives! <br>

We first create a **feed-forward, fully connected NN with one hidden layer**.

In PyTorch, neural networks are instances of a class we will create. <br>
If you are unfamiliar with classes - we basically create a type of object (a class), and imbue that type with methods (functions that it knows). <br>
Subsequently, we create an instance of this class to "create" an actual NN.

In [None]:
class first_NN(nn.Module):
    # This method creates the needed parameters for the NN
    def __init__(self, input_size, hidden_size, output_size):
        super(first_NN, self).__init__()
        
        # Set up parameters based on inputs of init function
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        
        # Set up the weights.
        # Since our network is fully connected, each layer needs one matrix of weights.
        # They can be initialized to random values.
        self.W1 = th.randn(self.input_size, self.hidden_size, requires_grad=True)
        self.W2 = th.randn(self.hidden_size, self.output_size, requires_grad=True)

    # Here we create the sigmoid activation function.
    # Look up the sigmoid function and fill in the missing value here.
    # You will need th.exp(), the exponential function for tensors.
    def sigmoid(self, x):
        return 1 / (1 + th.exp(-x))
    
    # We will need the softmax function. Look it up and fill it in here.
    def softmax(self,x):
        return th.exp(x)/th.sum(th.exp(x))
    
    # We will also need Cross-Entropy Loss.
    def loss(self,probs,label):
        return -th.log(probs[label]+(1e-15)) # Note that we add a small number to the probs to avoid calculating log(0)
    
    # This function defines a "forward pass" of the network; we take input X, send it through the net, and output class probabilities.
    # In between each layer of the network, we include the sigmoid activation function.
    # At the final layer, the softmax function turns the outputs into a vector of probabilities - one per class.
    def forward(self, x):
        x_flat = th.flatten(x) # We need to turn our images into 1D vectors
        layer1 = th.matmul(x_flat,self.W1)
        activ1 = self.sigmoid(layer1)
        layer2 = th.matmul(activ1,self.W2)
        soft = self.softmax(layer2)
        return soft
    
    # This function takes a batch of input images x_list and their correct class labels y_list.
    # For each sample, it performs a forward pass and compares the result to the true label via the loss function.
    # It then takes the derivative of the loss with respect to the parameters (W1, W2) and uses it to update them.
    # We accumulate the gradients of multiple samples before performing an update, for stability.
    # This is called mini-batch Stochastic Gradient Descent.
    # Note: this method shadows `nn.Module.train`, which PyTorch otherwise uses to switch between training and evaluation mode.
    def train(self, x_list, y_list, gamma, display=0):
        batch_size = x_list.shape[0]
        update_W1 = th.zeros(self.input_size,self.hidden_size,batch_size)
        update_W2 = th.zeros(self.hidden_size,self.output_size,batch_size)
        accumulated_loss = 0
        
        for i in range(batch_size):
            input_data = x_list[i,:,:]
            label = y_list[i]
            
            # Perform a forward pass here using the function we created above
            output_probs = self.forward(input_data)
            
            # Compute the loss function
            loss_value = self.loss(output_probs, label)
            accumulated_loss += loss_value
            
            # Perform backpropagation using autograd. This is the power of PyTorch!
            # All PyTorch tensors initialized with `requires_grad=True` store a gradient, which can be accessed
            # through the `.grad` attribute. When we call `loss_value.backward()` in the line below, the gradients of
            # `loss_value` with respect to all such tensors involved in its calculation are computed and stored.
            loss_value.backward()
            nn.utils.clip_grad_norm_([self.W1, self.W2], 10000) # This rescales the gradients if their norm exceeds 10,000, to prevent taking too large of a step.
            update_W1[:,:,i] = self.W1.grad.clone().detach() #Extract the calculated gradients and store for later.
            update_W2[:,:,i] = self.W2.grad.clone().detach()
            
            # Reset the gradients in preparation for the next loop. This is an essential step; future gradients will not be correct otherwise.
            self.W1.grad.data.zero_()
            self.W2.grad.data.zero_()

        # Sum up the updates from our batch, scale them, and add them to the parameter matrices.
        sum_W1 = th.sum(update_W1,2)
        sum_W2 = th.sum(update_W2,2)
        
        with th.no_grad(): # This line prevents this operation from being considered when calculating future gradients.
            self.W1 += -gamma*sum_W1
            self.W2 += -gamma*sum_W2
    
        if display:
            print(output_probs)
        return accumulated_loss.item()
    
    # This function will be what we use to actually classify an image.
    # Set up the function to perform a forward pass, then choose the most likely class as the classification.
    def classify(self,x,display=0):
        with th.no_grad():
            probs = self.forward(x)
            most_likely_class = th.argmax(probs).item()
            if display:
                print("This image is a " + str(labels_map[most_likely_class]) + ".")
        return most_likely_class
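
The softmax and cross-entropy functions used in `first_NN` can be studied on their own. The sketch below is one possible variant, not the version used by the class: it subtracts the maximum score before exponentiating, a common trick that leaves the probabilities unchanged but avoids overflow. The names `stable_softmax` and `cross_entropy` are made up for this example.

```python
import torch as th

def stable_softmax(scores):
    # Subtracting the maximum does not change the result but keeps th.exp from overflowing
    shifted = scores - th.max(scores)
    exps = th.exp(shifted)
    return exps / th.sum(exps)

def cross_entropy(probs, label):
    # Negative log-probability assigned to the correct class
    return -th.log(probs[label] + 1e-15)

scores = th.tensor([1000.0, 1001.0, 1002.0])       # th.exp(1000.) overflows to inf in float32
naive = th.exp(scores) / th.sum(th.exp(scores))    # inf / inf gives nan
probs = stable_softmax(scores)
loss_value = cross_entropy(probs, 2)
print(naive, probs, loss_value)
```

With moderate scores both versions agree; the difference only shows up when the scores become large, which can happen with randomly initialized weights.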

We recall the batch size from above - it was defined in the data loader section.

In [None]:
print(train_dataloader.batch_size)

Next, we create the actual neural network. <br> 
We note that the input size is the number of pixels in the image. <br>
The hidden size is the dimension of our hidden layers and is chosen arbitrarily. <br>
The output size is the number of image classes.

In [None]:
net = first_NN(input_size=784, hidden_size=20, output_size=10)

In [None]:
num_batches = 12000 # the number of batches provides a maximum to the number of optimization "steps" to be taken
tol = 20 # We define a tolerance for the training process. If the value of the evaluated loss is smaller than this number, we stop training

returned_losses = [] # list of returned losses (initialize as empty list)

After having defined the neural network architecture we want to use and having specified its main parameters, we can *train* the network in the following for loop. The losses per iteration are collected in `returned_losses`.

In [None]:
for i in range(num_batches):

    # This line draws a random batch from our set (each call builds a fresh, shuffled iterator)
    x_batch, y_batch = next(iter(train_dataloader))
    x_batch = x_batch[:,0,:,:]
    
    # Run a training instance on the NN and store the returned loss. Print value every 100 iterations.
    if i % 100 == 0:
        returned_losses.append(net.train(x_batch, y_batch, gamma=.01, display=1))
        print("The accumulated loss value at iteration " +str(i) + " is " + str(returned_losses[i]) + ".")
    else:
        returned_losses.append(net.train(x_batch, y_batch, gamma=.01))
    
    # If we ever do really well in terms of loss, we quit training!
    # Note that we set a pretty high tolerance here since we don't want to spend the entire workshop training.
    if returned_losses[i] < tol:
        break
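
Each call to `net.train` in the loop above performs one gradient step by hand. Stripped down to a toy problem, the same pattern (backward pass, update inside `th.no_grad()`, reset of the gradient) might look like the sketch below. It assumes a single weight `w` and uses the whole toy data set in every step, so it is plain gradient descent rather than the mini-batch variant used above.

```python
import torch as th

# Toy data: the targets follow y = 3 * x exactly, so the best weight is 3
x = th.linspace(-1, 1, 20)
y = 3 * x

w = th.zeros(1, requires_grad=True)    # the single parameter we train
gamma = 0.1                             # step size, playing the role of `gamma` above
losses = []

for step in range(100):
    loss = th.mean((w * x - y) ** 2)    # mean squared error on the whole toy set
    loss.backward()                     # autograd stores d(loss)/dw in w.grad
    with th.no_grad():                  # the update itself must not be tracked
        w -= gamma * w.grad
    w.grad.zero_()                      # clear the gradient before the next step
    losses.append(loss.item())

print(w.item(), losses[0], losses[-1])
```

Leaving out `w.grad.zero_()` would make the gradients of successive steps add up, which is why the reset is called essential in `first_NN`.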

**Use matplotlib or other plotting software to plot your accumulated loss function versus time.**

In [None]:
### Add your code here ###
plt.figure()
plt.plot(returned_losses)
plt.xlabel("Iterations")
plt.ylabel("Value of Loss Function")
plt.show()

Finally, we test the NN using the test set.

In [None]:
size_testset = 10000
errors = 0

for i, (x_test, y_test) in enumerate(test_dataloader): # Visit every test image exactly once (batch_size=1)
    
    # Call the prediction function
    if (i%1000) == 0:
        prediction = net.classify(x_test, display=1)
    else:
        prediction = net.classify(x_test)
    
    # Count how many misclassifications we make
    if prediction != y_test:
        errors += 1
        
print("The misclassification rate is " + str(errors/size_testset) + ".")
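
The misclassification rate can also be computed for many images at once. The sketch below assumes a hypothetical score matrix with one row per image and one column per class; it is not tied to `first_NN`, whose `forward` handles one image at a time.

```python
import torch as th

# Hypothetical model outputs for 5 images and 3 classes (one row of scores per image)
scores = th.tensor([[2.0, 0.1, 0.3],
                    [0.2, 1.5, 0.1],
                    [0.3, 0.2, 0.9],
                    [1.2, 0.4, 0.8],
                    [0.1, 0.7, 0.6]])
true_labels = th.tensor([0, 1, 2, 2, 1])

predictions = th.argmax(scores, dim=1)               # most likely class for each row
errors = (predictions != true_labels).sum().item()   # number of mismatches
error_rate = errors / len(true_labels)
print(predictions, error_rate)
```

Here only the fourth image is misclassified, so the rate is one in five.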

## Exercise:
* **Create a new neural network class `first_NN_mod`. Compared to `first_NN`, add another hidden layer of shape `hidden_size` x `hidden_size`. Add it after the first sigmoid function. Add an additional sigmoid function immediately following the new hidden layer. This will involve updating the code in a number of places. Consider making a copy of the cells above to edit.** <br>
Start by copying the entire code of the class `first_NN` above, and then proceeding with the modifications by adding code at certain locations wherever necessary.

In [None]:
class first_NN_mod(nn.Module):
### Add your code here ###



* **Train `first_NN_mod` and evaluate its performance on the test set after training. Visualize the `returned_losses`. <br> Compare the resulting misclassification rate with that of `first_NN`.**

In [None]:
### Add your code here ###


In [None]:
### Add your code here ###


In [None]:
### Add your code here ###


# A more intricate Neural Network Architecture

In this section, we use the full functionality of PyTorch to build a neural network with more intricate architectural elements.

**Make sure to identify the parts of the next cell defining `second_NN` that need additional code to be written.**

In [None]:
class second_NN(nn.Module):   
    def __init__(self):
        super(second_NN, self).__init__()
        
        # We set up each layer using PyTorch language. 
        # We start with a convolutional layer, then a normalization, then a ReLU activation function, and finally some pooling.
        self.layer1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, padding=1), # Kernel size is the size of the convolution filter. out_channels is the number of filters, i.e. the number of output channels produced by the layer.
            nn.BatchNorm2d(32), # A normalization to keep our values in check. Good practice.
            nn.ReLU(), # An activation function, probably the most popular one.
            nn.MaxPool2d(kernel_size=2, stride=2) # A pooling layer.
        )
        
        # Do it all again!
        self.layer2 = nn.Sequential(
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3),
            # Finish the layer! Keep in mind that the number of out_channels is different here, so the shape of the input
            # to the next layer will also be different.
        )
        
        # End with a few linear (fully connected) layers, plus a dropout layer.
        self.fc1 = nn.Linear(in_features=64*6*6, out_features=600) # This is the same thing as our fully connected layers above, i.e. multiplication by a matrix (plus a bias vector).
        self.drop = nn.Dropout(0.25) # During training, randomly zero 25% of the features to reduce overfitting. (`nn.Dropout2d` is meant for whole channels of image-shaped inputs.)
        self.fc2 = nn.Linear(in_features=600, out_features=120)
        self.fc3 = nn.Linear(in_features=120, out_features=10)
        
    # We've already defined what every layer will be; now we just put them in order.
    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = x.view(x.size(0), -1) # Here we reshape the tensor so it can go into the linear layer.
        x = self.fc1(x)
        x = self.drop(x)
        x = self.fc2(x)
        out = self.fc3(x)
        return out
    
    def classify(self,x,display=0):
        with th.no_grad():
            # Create the softmax function using built-in classes.
            f = nn.Softmax(dim=1)
            probs = f(self.forward(x)) # Call softmax on the output of the forward function.
            most_likely_class = th.argmax(probs).item() # Choose the most likely class.
            if display:
                print("This image is a " + str(labels_map[most_likely_class]) + ".")
        return most_likely_class

A NOTE: How does one know what types of layers to use for a particular application? <br> Often, researchers begin by 
looking at similar problems to determine a starting point. Then one can experiment by adding or removing layers,
resizing layers, and adjusting other layer parameters. Networks can be compared by looking at their performance
on the test data.
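
When resizing layers, the spatial size of the data after each convolution or pooling step has to be tracked. The sketch below works through the arithmetic behind `in_features=64*6*6` in `second_NN`, assuming `layer2` ends with the same 2x2 max-pooling as `layer1`. The helper `conv_out` is a made-up name, and the stripped-down `stack` only contains the layers that change the shape.

```python
import torch as th
import torch.nn as nn

def conv_out(size, kernel_size, stride=1, padding=0):
    # Output width/height of a convolution or pooling layer (no dilation)
    return (size + 2 * padding - kernel_size) // stride + 1

size = 28                                  # FashionMNIST images are 28x28
size = conv_out(size, 3, padding=1)        # conv in layer1: 28
size = conv_out(size, 2, stride=2)         # pooling in layer1: 14
size = conv_out(size, 3)                   # conv in layer2 (no padding): 12
size = conv_out(size, 2, stride=2)         # assumed pooling in layer2: 6
flat_features = 64 * size * size           # 64 channels of 6x6 values

# Cross-check with actual layers on a blank image
stack = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, padding=1),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(32, 64, kernel_size=3),
    nn.MaxPool2d(kernel_size=2, stride=2),
)
out_shape = tuple(stack(th.zeros(1, 1, 28, 28)).shape)
print(flat_features, out_shape)
```

Normalization and activation layers do not change the shape, which is why they are left out of this check.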

In [None]:
# This package allows us to easily add progress bars. Look for the use of `trange` below.
try: 
    from tqdm.notebook import trange
except ModuleNotFoundError:
    !conda install --yes tqdm
    from tqdm.notebook import trange

In [None]:
# Here we create an instance of the net
net = second_NN()

# Here we select a loss function that's built into PyTorch
loss = nn.CrossEntropyLoss()

# Here we select an optimizer. This replaces the manual parameter updates we did before (autograd still computes the gradients).
# This is usually done externally to the net, unlike earlier.
# There are several choices; Adam is a popular default.
learning_rate = 0.001
optimizer = th.optim.Adam(net.parameters(), lr=learning_rate)


num_batches = 200 # A small number, for speed
returned_losses = []

for i in trange(num_batches): # Notice the replacement of `range` with `trange`
    # As before
    x_batch, y_batch = next(iter(train_dataloader))
    print(y_batch.shape)

    # Perform a forward pass
    outputs = net.forward(x_batch)
    print(outputs.shape)
    
    # Make sure we zero out the gradient each time - we don't want leftover gradients from the previous iteration to be added in!
    optimizer.zero_grad()

    # Perform backpropagation
    losses = loss(outputs, y_batch)
    losses.backward()
    returned_losses.append(losses.item())
    if (i % 10) == 0:
        print("The accumulated loss value at iteration " + str(i) + " is " + str(losses.item()))

    # Adam automatically updates all the parameters!
    optimizer.step()
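
Because `second_NN` contains dropout and batch-normalization layers, it behaves differently in training and evaluation mode. The sketch below shows the effect on a standalone `nn.Dropout` layer; for a whole network that does not override these methods, the same switch is made with `net.train()` and `net.eval()`.

```python
import torch as th
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = th.ones(1000)

drop.train()                 # training mode: entries are zeroed at random
train_out = drop(x)          # surviving entries are scaled by 1 / (1 - p) = 2

drop.eval()                  # evaluation mode: dropout passes the input through unchanged
eval_out = drop(x)

print(train_out.unique(), th.equal(eval_out, x))
```

Switching to evaluation mode before measuring test performance avoids randomly dropped features distorting the predictions.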


In [None]:
# Test the NN out on the test set!
# No guidance here - you can program this yourself!


## Exercises:
* (Optional) Adjust the parameters of the convolutional steps in each layer (`nn.Conv2d`). Make note of how this impacts the network performance and training time.
* (Optional) Try using a different optimizer.
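
For the optimizer exercise, only the line that constructs the optimizer needs to change. The sketch below illustrates this on a hypothetical toy regression problem (not `second_NN`); the helper `fit` and the learning rates are assumptions chosen for the example.

```python
import torch as th
import torch.nn as nn

def fit(make_optimizer, steps=200):
    th.manual_seed(0)                          # same initial weights and data for every optimizer
    model = nn.Linear(3, 1)                    # hypothetical toy model
    x = th.randn(64, 3)
    y = x @ th.tensor([[1.0], [-2.0], [0.5]])  # targets from a known linear rule
    optimizer = make_optimizer(model.parameters())
    loss_fn = nn.MSELoss()
    history = []
    for _ in range(steps):
        optimizer.zero_grad()                  # clear old gradients
        loss = loss_fn(model(x), y)
        loss.backward()                        # compute new gradients
        optimizer.step()                       # let the optimizer update the parameters
        history.append(loss.item())
    return history

adam_losses = fit(lambda params: th.optim.Adam(params, lr=0.01))
sgd_losses = fit(lambda params: th.optim.SGD(params, lr=0.01, momentum=0.9))
print(adam_losses[0], adam_losses[-1], sgd_losses[-1])
```

Comparing the two loss histories on the same plot is one way to make note of how the choice of optimizer affects training.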

# Build Your Own Neural Network

In this section, you can design your own neural network to test on the same dataset.

**Set up the net as before, train it, and test it out!**

In [None]:
# Create your net!

class my_NN(nn.Module):
    # This method creates the needed parameters for the NN
    def __init__(self, input_size, hidden_size, output_size):
        super(my_NN, self).__init__()
        # Set up the basic parameters of the network

        
        
        # Be sure to define any math functions you may need here.
        
        

    # Create a forward pass function. Most of the creativity happens here! 
    # You can use any combination of linear, convolution, pooling layers etc., and any activation functions.
    # You may need to look at the PyTorch documentation to learn how to set the dimensions of each layer etc.
    # It is probably good to end with a softmax function, unless you have another way to convert the output into a probability vector!
    def forward(self, x):
        x = th.flatten(x, start_dim=-3) #`start_dim=-3` means we only flatten the last three dimensions, to keep separate images distinct 

        
        return x
        
    # Create a function that performs the needed classification.
    def classify(self,x,display=0):
        
        
        return 


In [None]:
# Train your network!

batch = 50
num_batches = 
returned_losses = []
tol = 

# Actually create the neural network.
# The hidden size is the dimension of our hidden layers.
# Let's see what happens if we increase the size of the hidden layer
net3 = my_NN(input_size=784, hidden_size=   , output_size = 10)




for i in trange(num_batches):
    x_batch, y_batch = next(iter(train_dataloader))
    

    

    # Perform backpropagation

    
    
    # Quit if better than your chosen tolerance
    if returned_losses[i] < tol:
        break
        

In [None]:
# Use matplotlib or other plotting software to plot your accumulated loss function versus time





In [None]:
# Test your network!








## Exercises:
* (Optional) Alter your training loop so that, at each step, the loss is also calculated for the test set data. Record this data in an additional list named `returned_test_losses`. Make a plot showing the evolution of both `returned_losses` and `returned_test_losses` over time. What do you observe?