Description
Q0 (0pts correct answer, -1,000pts incorrect answer: (0,-1,000) pts): A correct answer to the following questions is worth 0pts. An incorrect answer is worth -1,000pts, which carries over to other homeworks and exams, and can result in an F grade in the course.
- Student interaction with other students / individuals:
    - I have copied part of my homework from another student or another person (plagiarism).
    - Yes, I discussed the homework with another person but came up with my own answers. Their name(s) is (are)
    - No, I did not discuss the homework with anyone.
- On using online resources:
    - I have copied one of my answers directly from a website (plagiarism).
    - I have used online resources to help me answer this question, but I came up with my own answers (you are allowed to use online resources as long as the answer is your own). Here is a list of the websites I have used in this homework:
    - I have not used any online resources except the ones provided in the course website.
Learning Objectives: Let students understand basic feed-forward neural networks and the Backpropagation algorithm.
Learning Outcomes: After you finish this homework, you should be capable of explaining and implementing feed-forward neural networks with arbitrary architectures or components from scratch.
Concepts
Q1 (2.5 pts): Please answer the following questions concisely. All the answers, along with your name and email, should be clearly typed in some editing software, such as LaTeX or MS Word.
- (0.5) Most practitioners will not use linear activations in deep multilayer perceptrons. Prove that such deep neural networks would only be able to model linear functions (no matter how many layers or how many hidden neurons).
- (0.5) Following the previous question, what if we replace all the activations with Rectified Linear Units (ReLU)? Would it solve the problem you mentioned in the previous question? Why or why not?
- (0.5) Learning with ReLUs. Could a ReLU activation cause problems when learning a model with gradient descent? Could it cause some layers of a neural network to stop learning from data? Under which conditions?
- (0.5) Prove that unsupervised methods can be used to learn supervised tasks.
- (0.5) In multitask learning we are given a dataset $D = \{(x_i^{(tr)}, y_{i,1}^{(tr)}, \ldots, y_{i,T}^{(tr)})\}_{i=1}^{n^{(tr)}}$, where the $y$s are labels of $T$ distinct tasks. Our goal is to learn $p(y_1, \ldots, y_T \mid x)$ from $D$. Describe the difference between multitask learning and transfer learning. Consider the task of facial recognition to distinguish males from females. To train this model, suppose we have a small dataset with images of humans but a very large dataset with pictures of dogs and cats. Should we use multitask learning or transfer learning, and how?
Programming (7.5 pts)
Throughout this semester you will be using Python and PyTorch as the main tools to complete your homework, which means that getting familiar with them is required. PyTorch (http://pytorch.org/tutorials/index.html) is a fast-growing Deep Learning toolbox that allows you to create deep learning projects on different levels of abstraction, from pure tensor operations to neural network blackboxes. The official tutorial and their GitHub repository are your best references. Please make sure you have the latest stable version on the machine. Linux machines with a GPU installed are suggested. Moreover, following the PEP8 coding style is recommended.
Skeleton Package: A skeleton package is available at https://www.dropbox.com/s/69bccs60w4v0f1g/hw2_skeleton.zip?dl=0. You should download it and use the folder structure provided. In some homeworks, skeleton code might be provided. If so, you should base your implementation on the provided prototypes.
Introduction to PyTorch
PyTorch, in general, provides three modules, from high-level to low-level abstractions, to build up neural networks. We are going to study these three specific modules in this homework. First, the module that provides the highest abstraction is called torch.nn. It offers layer-wise abstraction so that you can define a neural layer through a function call. For example, torch.nn.Linear(.) creates a fully connected layer. Coupled with containers like Sequential(.), it lets you connect the network layer-by-layer and thus easily define your own networks. The second module is called torch.autograd. It allows you to compute gradients with respect to all the network parameters, given the feed-forward function definition (the objective function). It means that you don’t need to analytically compute the gradients, but only need to define the objective function while coding your networks. The last module we are going to use is torch.tensor, which provides efficient ways of conducting tensor operations or computations so that you can customize your network at a low level. The official PyTorch site has a thorough tutorial on this (http://pytorch.org/tutorials/beginner/pytorch_with_examples.html#). You are required to go through it and understand all three modules well before you move on.
HW Overview
In this homework, you are going to implement vanilla feed-forward neural networks in a couple of different ways. The overall submission should be structured as below:
bruno_ribeiro_hw2/
    my_neural_networks/
        __init__.py
        networks.py
        activations.py
        mnist.py
        minibatcher.py
        any_others.py
    report.pdf
    ReadMe
    hw2_training.py
    hw2_demo.py
    hw2_learning_curves.py
bruno_ribeiro_hw2: the top-level folder that contains all the files required in this homework. You should replace the folder name with your own name, following the naming convention given in the Submission Instructions.
report.pdf: Your written solutions to all the homework questions, including theoretical and programming parts. It should be submitted in PDF format.
ReadMe: Your ReadMe should begin with a couple of example commands, e.g., “python hw2.py data”, used to generate the outputs you report. The TA will replicate your results with the commands provided here. More detailed options, usage, and design notes for your program can follow. You can also list any concerns that you think the TA should know about while running your program. Put the information you consider most important at the top. Moreover, the file should be written in plain text format that can be displayed with the Linux “less” command.
hw2_training.py: One executable we prepared for you to run training with your networks.
hw2_learning_curves.py: One executable for training models and plotting learning curves.
hw2_demo.py: Demonstrates some basic Python packages. Just FYI.
my_neural_networks: Your Python neural network package. The package name in this homework is my_neural_networks, which should NOT be changed when submitting it. At least two modules should be included:
    - networks.py
    - activations.py
Besides these two modules, a package constructor __init__.py is also required for importing your modules. You are welcome to architect the package in your own way, for instance by adding another module, called utils.py, to facilitate your implementation.
Two additional modules, mnist.py and minibatcher.py, are also attached. They are used in the main executable to load the dataset and create minibatches (minibatching is not needed in this homework). You don’t need to do anything with them.
Data: MNIST
You are going to conduct a simple classification task, called MNIST (http://yann.lecun.com/exdb/mnist/). It classifies images of hand-written digits (0-9). Each example is thus a 28 × 28 image.
The full dataset contains 60k training examples and 10k testing examples.
We provide a data loader (read_images(.) and read_labels(.) in my_neural_networks/mnist.py) that will automatically download the data.
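As a rough illustration of how such data is usually prepared for a network with 784 input units and 10 output classes, the sketch below flattens 28 × 28 images into row vectors and builds one-hot labels. It uses NumPy and stand-in arrays; the helper names `flatten_images` and `one_hot` are hypothetical, and the provided loader may already return data in a different layout.

```python
import numpy as np


def flatten_images(images):
    """Reshape (n, 28, 28) images into (n, 784) row vectors."""
    images = np.asarray(images, dtype=np.float64)
    return images.reshape(images.shape[0], -1)


def one_hot(labels, num_classes=10):
    """Turn integer labels of shape (n,) into an (n, num_classes) 0/1 matrix."""
    labels = np.asarray(labels, dtype=np.int64)
    encoded = np.zeros((labels.shape[0], num_classes))
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded


# Stand-in data (assumption): three blank "images", not real MNIST samples.
fake_images = np.zeros((3, 28, 28))
fake_labels = np.array([0, 7, 9])

X = flatten_images(fake_images)
Y = one_hot(fake_labels)
print(X.shape, Y.shape)
```

The one-hot matrix plays the role of the `y_1hot` argument that the training method takes, assuming that argument follows the usual convention.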
Warm-up: Implement Activations
Open the file my_neural_networks/activations.py. As a warm-up activity, you are going to implement the activations module, which should realize the activation functions and objective functions that will be used in your neural networks. Note that whenever you see “raise NotImplementedError”, you should replace it with your implementation.
Since these functions are mathematical equations, the code should be pretty short and simple. The main purpose of this section is to help you get familiar with basic Python programming, package structures, and test cases. As an example, a Sigmoid function is already implemented in the module. Here are the functions that you should complete:
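relu: Rectified Linear Unit (ReLU), which is defined as

$$a_k^l = \mathrm{relu}(z_k^l) = \begin{cases} 0 & \text{if } z_k^l < 0 \\ z_k^l & \text{otherwise.} \end{cases}$$

softmax: the basic softmax

$$a_k^L = \mathrm{softmax}(z_k^L) = \frac{e^{z_k^L}}{\sum_c e^{z_c^L}} \tag{1}$$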
stable_softmax: the numerically stable softmax. You should test if this outputs the same result as the basic softmax.

\begin{align}
\mathrm{softmax}(x_i) &= \frac{e^{x_i}}{\sum_j e^{x_j}} \tag{2} \\
&= \frac{C e^{x_i}}{C \sum_j e^{x_j}} \tag{3} \\
&= \frac{e^{x_i + \log C}}{\sum_j e^{x_j + \log C}} \tag{4}
\end{align}

A common choice for the constant is $\log C = -\max_j x_j$.

Note: Output is different and there was a ‘NaN’/‘Inf’ value, so I use stable_softmax to avoid the issue. Softmax contains exp() and cross-entropy contains log(), so this can happen: large number -> exp() -> overflow NaN -> log() -> still NaN, even though, mathematically (i.e., without ...

    z = x - np.max(x)
    numerator = np.exp(z)
    denominator = np.sum(numerator)
    softmax = numerator / denominator
cross_entropy:

$$E = -\sum_d t_d \log a_k^L = -\sum_d t_d \Big( z_d^L - \log \sum_c e^{z_c^L} \Big) \tag{5}$$

where $d$ is a data point; $t_d$ is its true label; $a_k^L$ is the probability predicted by the network.
Hints: make sure you test your implementation with corner cases before you move on. Otherwise, it will be hard to debug.
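The following is a minimal NumPy sketch of the four functions together with the kind of corner-case check the hint refers to. It is not the skeleton's code: the function signatures, the use of NumPy instead of torch tensors, and the test inputs are all assumptions made for illustration.

```python
import numpy as np


def relu(z):
    """Element-wise ReLU: 0 where z < 0, z otherwise."""
    return np.maximum(z, 0.0)


def softmax(z):
    """Basic softmax over the last axis; overflows for large inputs."""
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)


def stable_softmax(z):
    """Softmax with log C = -max_j z_j, so the largest exponent is 0."""
    shifted = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=-1, keepdims=True)


def cross_entropy(z, t):
    """E = -sum_d t_d (z_d - log sum_c exp(z_c)), from logits z and one-hot t."""
    shifted = z - np.max(z, axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.sum(np.exp(shifted), axis=-1, keepdims=True))
    return -np.sum(t * log_probs)


# Corner cases (hypothetical inputs): moderate logits and very large logits.
small = np.array([[1.0, 2.0, 3.0]])
large = np.array([[1000.0, 1001.0, 1002.0]])
target = np.array([[0.0, 0.0, 1.0]])

same_on_small = np.allclose(softmax(small), stable_softmax(small))
with np.errstate(over="ignore", invalid="ignore"):
    basic_on_large = softmax(large)  # exp(1000) overflows to inf, inf / inf is nan
stable_on_large = stable_softmax(large)
loss_small = cross_entropy(small, target)
loss_large = cross_entropy(large, target)
```

Because `large` is just `small` shifted by a constant, the stable softmax and the loss should agree on both inputs, while the basic softmax returns NaN on `large`.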
Warm-up: Understand Example Network
Open the files hw2_training.py and my_neural_networks/example_networks.py.
hw2_training.py is the main executable (trainer). It controls the training process at a high level. The task is called MNIST, which classifies images of hand-written digits. The executable uses a class called TorchNeuralNetwork, fully implemented in my_neural_networks/example_networks.py.
In this task, you don’t need to write any code; you only need to play with the modules and executables provided in the skeleton and answer questions. You can run the trainer with TorchNeuralNetwork by feeding the correct arguments into hw2_training.py. Read through all the related code and write down in the report the correct command (“python hw2_training.py” with arguments) to train such example networks.
Here is a general summary of each method in TorchNeuralNetwork.
__init__(self, shape, gpu_id=-1): the constructor, which takes the network shape as a parameter. The network weights are declared as matrices in this method. You should not make any changes to them, but need to think about how to use them to do vectorized implementations.
    - Your implementation should support arbitrary network shapes, rather than a fixed one. The shape is specified as a tuple. For example, “shape=(784, 100, 50, 10)” means that the numbers of neurons in the input layer, first hidden layer, second hidden layer, and output layer are 784, 100, 50, and 10 respectively.
    - All the hidden layers use ReLU activations.
    - The output layer uses Softmax activations.
    - Cross-Entropy loss should be used as the objective.
train_one_epoch(self, X, y, y_1hot, learning_rate): conducts network training for one epoch over the given data X. It also returns the loss for the epoch.
    - This method consists of three important components: feed-forward, backpropagation, and weight updates.
    - (Non-stochastic) gradient descent is used. The gradient calculation should be based on all the input data. However, this part is given.
predict(self, X): predicts labels for X.
You need to understand the entire skeleton well at this point. TorchNeuralNetwork should give you a good starting point for understanding all the method semantics, and hw2_training.py should demonstrate the training process we want. In the next task, you are going to implement another two classes supporting the same set of methods. The inputs and outputs of the methods are the same, while the internal implementations have different constraints. Therefore, make sure you understand all the method semantics and inputs/outputs before you move on.
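To make the shape convention concrete, here is a small NumPy sketch of how a shape tuple can map to one weight matrix per pair of adjacent layers, and how a whole batch can pass through the network with matrix products. The helper names, the bias vectors, and the initialization scale are assumptions; the skeleton declares its own members, which may be laid out differently.

```python
import numpy as np


def init_weights(shape, seed=0):
    """One (fan_in, fan_out) matrix and one bias vector per pair of adjacent layers."""
    rng = np.random.default_rng(seed)
    weights = [rng.normal(0.0, 0.01, size=(n_in, n_out))
               for n_in, n_out in zip(shape[:-1], shape[1:])]
    biases = [np.zeros(n_out) for n_out in shape[1:]]
    return weights, biases


def feed_forward(X, weights, biases):
    """ReLU on hidden layers, softmax on the output layer, for a whole batch."""
    a = X
    for W, b in zip(weights[:-1], biases[:-1]):
        a = np.maximum(a @ W + b, 0.0)
    z = a @ weights[-1] + biases[-1]
    z = z - z.max(axis=1, keepdims=True)  # stable softmax shift
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


shape = (784, 100, 50, 10)
weights, biases = init_weights(shape)
weight_shapes = [W.shape for W in weights]

X = np.zeros((5, 784))  # stand-in batch of 5 blank inputs
probs = feed_forward(X, weights, biases)
print(weight_shapes, probs.shape)
```

Since the loops iterate over whatever the tuple contains, the same code handles a shape such as (784, 10) with no hidden layer.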
Q2 (2 pts): Implement Feedforward Neural Network with Autograd
Open the file my_neural_networks/networks.py.
The task here is to complete the class AutogradNeuralNetwork. In your implementation, several constraints are enforced:
- You are NOT allowed to use any high-level neural network modules, such as torch.nn, unless it is specified. No credit will be given if similar packages or modules are used.
- You need to follow the method prototypes given in the skeleton. This constraint might be removed in the future. However, as this is the first homework, we want you to know what we expect you to complete in a PyTorch project.
- You should leave at least hw2_training.py untouched in the final submission. During grading, we will replace whatever you have with the original hw2_training.py.
For AutogradNeuralNetwork, you only need to complete the feed-forward part. The other parts are already given in the skeleton. You should be able to run hw2_training.py in a way similar to what you discovered in the last task. Specifically, what you need to do is as follows:
- Understand the semantics of all the class members (variables), especially the few defined in the constructor.
- Identify the code related to the three main components of training: feed-forward, backpropagation, and weight updates.
- The second and third components are given. Only the feed-forward is left for you, so go ahead and complete the feed_forward() method.
Things to be included in the report:
1. Command line arguments for running this experiment with hw2_training.py.
2. Specify the network shape as (784, 300, 100, 10).
3. Collect results for 100 epochs.
4. Make two plots: “Loss vs. Epochs” and “Accuracy vs. Epochs”. The accuracy plot should include results for both training and testing data.
5. Analyze and compare each plot generated in the last step. Write down your observations.
Hints:
- The given skeleton has all the input/output definitions. Please read through it, and if you find any typos or unclear parts, feel free to ask.
- In general, you don’t need to change any code given in the skeleton, unless it is for debugging.
- Feel free to define any helper functions/modules you need.
- You might need to figure out how to write vectorized implementations so that the pre-defined members can be used in a succinct and efficient way.
- You are welcome to use GPUs to accelerate your program.
- For debugging, you might want to load a smaller amount of training data to save time. This can be done easily by making slight changes to hw2_training.py.
- For debugging, you might want to explore some features of a Python package called pdb.
Q3 (1 pts): Learning Curves: Deep vs Shallow
Create a trainer file called hw2_learning_curves.py.
This executable has a very similar structure to hw2_training.py, but you are going to vary the training data size to plot the learning curves introduced in the lecture. Specifically, you need to do the following:
- Load MNIST data (http://yann.lecun.com/exdb/mnist/) into torch tensors.
- Use AutogradNeuralNetwork.
- Vary the training data size from 250 to 10000. You can decide on a proper step.
- Train and select a model for each data size. You need to design an early stop strategy to select the model so that the learning curves will be correct.
- Plot learning curves for training and testing sets with
    - a network shape (784, 10)
    - a network shape (784, 300, 100, 10)
Things that should be included in the report:
- Command line arguments for running this experiment with hw2_learning_curves.py.
- The early stop strategy you used in selecting models.
- The 2 learning curve plots for the 2 network shapes.
- Analyze and compare each plot generated in the last step. Write down your observations.
Hints: You should understand the information embedded in the learning curves and what they should look like. If your implementation is correct, you should be able to see meaningful differences.
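The early stop strategy is left to you. As one possible illustration, the sketch below selects a model by patience: it keeps the epoch with the lowest loss on held-out data and stops once that loss has not improved for a fixed number of epochs. The function name, the patience value, the idea of monitoring a held-out loss, the step of 750, and the loss values are all assumptions, not part of the skeleton.

```python
def select_best_epoch(val_losses, patience=3):
    """Return (best_epoch, stop_epoch) under patience-based early stopping.

    best_epoch is the epoch with the lowest loss seen before stopping;
    stop_epoch is the epoch at which training would be halted.
    """
    best_epoch = 0
    best_loss = float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_losses) - 1


# Training-set sizes from 250 to 10000 with a hypothetical step of 750.
train_sizes = list(range(250, 10001, 750))

# Hypothetical per-epoch held-out losses for one training size.
hypothetical_val_losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.75, 0.6]
best_epoch, stop_epoch = select_best_epoch(hypothetical_val_losses)
print(best_epoch, stop_epoch)  # 2 5
```

Note that in this example the loss improves again at the last epoch, after the stopping point; a larger patience trades training time against the risk of stopping too early.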
Q4 (4.5 pts): Implement Backpropagation from Scratch
Open the file my_neural_networks/networks.py.
Implement BasicNeuralNetwork, but you can NOT use torch.autograd. All the other instructions are similar to those in Q2. That is, you need to implement the entire “train_one_epoch” method, including backpropagation, feed-forward, and weight updates. For the backpropagation, you need to analytically compute the gradients.
Here, we will use PyTorch the same way that we have used NumPy in the lecture notes. You will need to write your own backpropagation function from scratch, following what would be the correct gradients of the already-implemented forward pass.
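One common way to gain confidence in hand-derived gradients is to compare them with finite differences on a tiny input. The NumPy sketch below does this for a single piece of the network, the softmax output with cross-entropy loss, whose gradient with respect to the logits is the well-known softmax(z) - t. The function names and test values are assumptions for illustration; the check is not required by the assignment.

```python
import numpy as np


def loss_from_logits(z, t):
    """Cross-entropy of softmax(z) against one-hot t, for a single example."""
    shifted = z - np.max(z)
    return -np.sum(t * (shifted - np.log(np.sum(np.exp(shifted)))))


def analytic_grad(z, t):
    """Closed form for this loss: dE/dz = softmax(z) - t."""
    e = np.exp(z - np.max(z))
    return e / np.sum(e) - t


def numerical_grad(f, z, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    grad = np.zeros_like(z)
    for i in range(z.size):
        step = np.zeros_like(z)
        step[i] = eps
        grad[i] = (f(z + step) - f(z - step)) / (2.0 * eps)
    return grad


z = np.array([0.5, -1.2, 2.0])  # hypothetical logits
t = np.array([0.0, 1.0, 0.0])   # hypothetical one-hot label

max_abs_diff = np.max(np.abs(
    analytic_grad(z, t) - numerical_grad(lambda v: loss_from_logits(v, t), z)))
print(max_abs_diff)
```

The same comparison can be applied to a weight matrix of a very small network by perturbing one entry at a time, which is slow but useful while debugging.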
Things to be included in the report:
- (2.0) Implement the above in the file provided (networks.py). Make sure your code runs with the command line arguments for running hw2_training.py. Points will only be awarded if the code runs with the original command line.
- (1.0) Write down all the mathematical formulas used in your backpropagation implementation.
- (0.5) Use BasicNeuralNetwork. Specify the network shape as (784, 300, 100, 10). Collect results for 100 epochs. Include two plots in your report PDF: “Loss vs. Epochs” and “Accuracy vs. Epochs”. The accuracy plot should include results for both training and testing data. Analyze and compare each plot generated in the last step. Write down your observations.
- (1.0) Modify hw2_learning_curves.py to support creating learning curves for BasicNeuralNetwork. All the other instructions are similar to those in Q3. Things that should be included in the report:
    - Implement the above in the file provided (hw2_learning_curves.py).
    - Command line arguments for running this experiment with hw2_learning_curves.py.
    - The early stop strategy you used in selecting models.
    - The 2 learning curve plots for the 2 network shapes.
    - Analyze and compare each plot generated in the last step. Write down your observations in the report PDF.
Submission Instructions
Please read the instructions carefully. Failure to follow any part might incur score deductions.
Naming convention: [firstname]_[lastname]_hw2
All your submitted files, including a report, a ReadMe, and code, should be included in one folder. The folder should be named according to the above naming convention. For example, if my first name is “Bruno” and my last name is “Ribeiro”, then for Homework 2, my folder name should be “bruno_ribeiro_hw2”.
Tar your folder: [firstname]_[lastname]_hw2.tar.gz
Remove any unnecessary files from your folder, such as training datasets. Make sure your folder is structured as the tree shown in the Overview section. Compress your folder with the command: tar czvf bruno_ribeiro_hw2.tar.gz bruno_ribeiro_hw2
Submit: TURNIN INSTRUCTIONS
Please submit your compressed file on data.cs.purdue.edu with the turnin command line, e.g. “turnin -c cs690dl -p hw2 bruno_ribeiro_hw2.tar.gz”. Please make sure you didn’t use any library/source that is explicitly forbidden. If such a library or source code is used, you will get 0 pt for the coding part of the assignment. If your code doesn’t run on scholar.rcac.purdue.edu, then even if it runs on another computer, your code will still be considered not-running and the respective part of the assignment will receive 0 pt.