Assignment 2, Programming Part Solved


Summary: In this assignment, you will implement a sequential language model (an LSTM) and an image classifier (a Vision Transformer).

In Problem 1, you will use built-in PyTorch modules to implement an LSTM, perform language modelling on Wikitext-2, and run several LSTM configurations. Download the LSTM embedding file from here and place it in the ./data folder.
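As a rough illustration of what the built-in PyTorch modules buy you, here is a minimal sketch of an LSTM language model. The class name, dimensions, and interface are placeholder assumptions; the actual assignment template (and the provided embedding file) may be organized differently.

```python
import torch
import torch.nn as nn

class LSTMLanguageModel(nn.Module):
    """Minimal sketch: embedding -> multi-layer LSTM -> per-token vocabulary logits."""

    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, num_layers=2):
        super().__init__()
        # padding_idx=0 matches the zero-padding convention used for short sequences.
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            num_layers=num_layers, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, vocab_size)

    def forward(self, source, hidden=None):
        # source: (B, 256) integer token ids -> logits: (B, 256, vocab_size)
        embedded = self.embedding(source)
        output, hidden = self.lstm(embedded, hidden)
        return self.classifier(output), hidden
```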

In Problem 2, you will implement several building blocks of a Transformer, including LayerNorm (layer normalization) and the attention mechanism for a Vision Transformer, and build an image classifier on CIFAR-10.
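For orientation, a from-scratch layer normalization typically looks like the sketch below. The constructor arguments and defaults are assumptions; the assignment's template may prescribe a different interface.

```python
import torch
import torch.nn as nn

class LayerNorm(nn.Module):
    """Sketch of layer normalization over the last (feature) dimension."""

    def __init__(self, hidden_size, eps=1e-5):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))   # learnable scale (gamma)
        self.bias = nn.Parameter(torch.zeros(hidden_size))    # learnable shift (beta)
        self.eps = eps

    def forward(self, x):
        mean = x.mean(dim=-1, keepdim=True)
        var = x.var(dim=-1, keepdim=True, unbiased=False)
        return self.weight * (x - mean) / torch.sqrt(var + self.eps) + self.bias
```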

In Problem 3, you will run the different Transformer architectures and compare their performance to that of a simple CNN.

The Wikitext-2 dataset comprises 2 million words extracted from the set of verified “Good” and “Featured” articles on Wikipedia. See this blog post for details about the Wikitext dataset and sample data. The dataset you get with the assignment has already been preprocessed using OpenAI’s GPT vocabulary, and each file is a compressed numpy array containing two arrays: tokens, containing a flattened list of (integer) tokens, and sizes, containing the size of each document.
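If you want to inspect the preprocessed data directly, something like the following should work, assuming the archives are .npz files with the two arrays named as described above (the filename here is only a placeholder):

```python
import numpy as np

data = np.load("data/wikitext2_train.npz")      # hypothetical filename
tokens, sizes = data["tokens"], data["sizes"]

print(tokens.shape, tokens.dtype)   # one flattened stream of integer token ids
print(sizes.shape, sizes.sum())     # per-document lengths; their sum == len(tokens)

# Recover the tokens of the first document from the flattened stream.
first_doc = tokens[: sizes[0]]
```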

You are provided a PyTorch dataset class (torch.utils.data.Dataset) named Wikitext2 in the utils folder. This class loads the Wikitext-2 dataset and generates fixed-length sequences from it. Throughout this assignment, all sequences will have length 256, and we will use zero-padding to pad shorter sequences.

In practice, though, you will work with mini-batches of data, each containing B elements (the batch size). You can wrap this dataset object in a torch.utils.data.DataLoader, which will return a dictionary with keys source, target, and mask, each of shape (B, 256).
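A typical usage sketch follows; the import path and constructor arguments for Wikitext2 are assumptions, so check the class in the utils folder for its exact signature.

```python
from torch.utils.data import DataLoader
from utils.wikitext2 import Wikitext2   # hypothetical import path

train_set = Wikitext2("data", split="train")              # assumed arguments
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)

batch = next(iter(train_loader))
# Each batch is a dict with keys "source", "target", and "mask",
# each a tensor of shape (B, 256); the mask marks the zero-padded positions.
print(batch["source"].shape, batch["target"].shape, batch["mask"].shape)
```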

…you want (do not forget the biases), and you can add modules to the __init__() function. How many learnable parameters does your module have, as a function of num_heads and head_size?
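As a sanity check, under the common design where the model dimension is d = num_heads × head_size and the attention module consists of separate query, key, value, and output projections, each a d × d linear layer with bias, the count comes out to 4(d² + d). The snippet below only verifies that arithmetic; it is not the assignment's MultiHeadedAttention, just four stand-in linear layers.

```python
import torch.nn as nn

num_heads, head_size = 8, 64
d = num_heads * head_size                       # model dimension

# Stand-ins for the Q, K, V, and output projections (each d x d, with bias).
projections = nn.ModuleList([nn.Linear(d, d, bias=True) for _ in range(4)])

counted = sum(p.numel() for p in projections.parameters())
assert counted == 4 * (d * d + d)               # 4(d^2 + d) learnable parameters
print(counted)
```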

The ViT forward pass (6pts): You now have all the building blocks to implement the forward pass of a miniature ViT model. You are provided a module PostNormAttentionBlock, which corresponds to a full block with self-attention and a feed-forward neural network, with skip connections, using the LayerNorm and MultiHeadedAttention modules you implemented before.
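To make the "post-norm" structure concrete, the forward pass of such a block is commonly arranged as below, with normalization applied after each residual addition; the provided PostNormAttentionBlock may differ in details, so treat this only as orientation.

```python
def postnorm_block_forward(x, attention, feedforward, norm1, norm2):
    # Self-attention sub-layer: skip connection first, LayerNorm after.
    x = norm1(x + attention(x))
    # Feed-forward sub-layer: skip connection first, LayerNorm after.
    x = norm2(x + feedforward(x))
    return x
```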

In this part of the exercise, you will fill in the VisionTransformer class in vit_solution_template.py. This module contains all the blocks necessary to create this model. In particular, get_patches() is responsible for converting images into tokens (patches), which are then converted into embeddings.
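A simple way to turn CIFAR-10 images into patch tokens is with non-overlapping square patches, as in the sketch below; the patch size and the exact output layout of the provided get_patches() may differ.

```python
import torch

def get_patches(images, patch_size=4):
    """images: (B, C, H, W) -> patches: (B, num_patches, C * patch_size**2)."""
    B, C, H, W = images.shape
    patches = images.unfold(2, patch_size, patch_size).unfold(3, patch_size, patch_size)
    # (B, C, H/p, W/p, p, p) -> (B, (H/p)*(W/p), C*p*p)
    patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * patch_size ** 2)
    return patches

tokens = get_patches(torch.randn(8, 3, 32, 32))   # -> shape (8, 64, 48)
```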

Problem 3

Training ViT models (22pts): You will train each of the following architectures using an optimization technique and scheduler of your choice. For reference, we have provided a feature-complete training script (run_exp_vit.py) that uses the AdamW optimizer; a sketch of one possible optimizer/scheduler setup is given after the questions below. You are free to modify this script as you see fit. You do not need to submit code for this part of the assignment. However, you are required to create a report that presents the accuracy and training-curve comparisons as specified in the following questions.

1. (2pts) For each of the experiment configurations above, measure the average steady-state GPU memory usage (nvidia-smi is your friend!) and comment on the GPU memory footprints.
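In addition to watching nvidia-smi, a complementary check from inside the training loop is the amount of memory PyTorch itself has allocated (the two numbers will not match exactly, since nvidia-smi also counts the CUDA context and caching-allocator overhead):

```python
import torch

if torch.cuda.is_available():
    peak_bytes = torch.cuda.max_memory_allocated()
    print(f"peak memory allocated by PyTorch: {peak_bytes / 2**20:.1f} MiB")
```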

²You can also make the table in LaTeX; for convenience, you can use tools like a LaTeX table generator to build tables online and get the corresponding LaTeX code.
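For reference, one reasonable optimizer/scheduler pairing (not necessarily what run_exp_vit.py uses, and with placeholder hyperparameters) is AdamW with cosine annealing of the learning rate:

```python
import torch
import torch.nn as nn

num_epochs = 30
model = nn.Linear(10, 10)   # stand-in for the VisionTransformer model

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)

for epoch in range(num_epochs):
    # ... forward/backward passes and optimizer.step() on each mini-batch go here ...
    scheduler.step()        # anneal the learning rate once per epoch
```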

