Machine Learning Assignment 2

  1. Password protect your ZIP file using a password with 8-10 characters. Use only alphanumeric characters (a-z A-Z 0-9) in your password. Do not use special characters, punctuation marks, whitespace, etc. in your password. Name your ZIP file submit.zip. Specify the file name in the URL properly in the Google form.

  2. Remember, your file is not under attack from hackers with access to supercomputers. This is just an added security measure so that even if someone guesses your submission URL, they cannot see your code immediately. A length 10 alphanumeric password (that does

TokenKind.LITERAL_INT: 1
,: 2
<: 1
++: 1
): 1

since the token TypeKind.INT appears four times in the line whereas the token < appears only once in the line, etc. Since we have 225 dictionary tokens, each BoW vector is a d = 225 dimensional vector. However, this vector is usually very sparse since most lines contain only a few unique tokens. You are given the BoW representations for 10000 erroneous lines in the file train in the assignment package. Note that you are not given the original token sequences, just the BoW representations.
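As a rough illustration of how such a sparse BoW row could be stored, the sketch below packs the token counts quoted above into a 1 x 225 csr_matrix. The token-to-column mapping token_index and the helper bow_row are hypothetical; the assignment supplies only the finished BoW vectors and does not state the dictionary order.

```python
import numpy as np
from scipy.sparse import csr_matrix

D = 225  # dictionary size stated in the text

# Hypothetical token -> column mapping; the real dictionary order is not given.
token_index = {
    "TypeKind.INT": 0,
    "TokenKind.LITERAL_INT": 1,
    ",": 2,
    "<": 3,
    "++": 4,
    ")": 5,
}

def bow_row(counts, d=D):
    """Build a 1 x d sparse row from a {token: count} dictionary."""
    cols = [token_index[t] for t in counts]
    vals = [counts[t] for t in counts]
    rows = [0] * len(cols)
    return csr_matrix((vals, (rows, cols)), shape=(1, d), dtype=np.int64)

# Counts quoted in the example above (only the tokens that survive in the text).
line_counts = {
    "TypeKind.INT": 4,
    "TokenKind.LITERAL_INT": 1,
    ",": 2,
    "<": 1,
    "++": 1,
    ")": 1,
}
x = bow_row(line_counts)
```

Only 6 of the 225 entries are stored explicitly here, which is why a sparse format suits this data.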

The difference between prec@k and mprec@k largely arises due to the presence of rare error classes in the dataset, i.e. errors that are made more rarely by programmers. You will see in your data itself that an average error class is associated with just n̂ = 200 lines whereas there are a total of n = 10000 lines! Whereas a method can get very high prec@k by just predicting popular error classes in every test scenario, such a method may do poorly on mprec@k, which gives a high score only if the method pays good attention to all error classes, not just the popular ones.
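The contrast can be seen on a toy example. The sketch below assumes the usual definitions, which this excerpt does not spell out: prec@k is the fraction of test points whose true class appears in the top k suggestions, and mprec@k computes that fraction separately for each class and then averages over classes with equal weight. The function names and the toy labels are illustrative only.

```python
import numpy as np

def prec_at_k(y_true, topk, k):
    """Fraction of test points whose true class is among the first k suggestions."""
    hits = (topk[:, :k] == y_true[:, None]).any(axis=1)
    return hits.mean()

def mprec_at_k(y_true, topk, k):
    """Per-class hit rate at k, averaged with equal weight over classes in y_true."""
    hits = (topk[:, :k] == y_true[:, None]).any(axis=1)
    classes = np.unique(y_true)
    return np.mean([hits[y_true == c].mean() for c in classes])

# Toy labels (illustrative only): class 1 is popular, class 2 is rare.
y_true = np.array([1, 1, 1, 1, 2])
# A predictor that suggests the popular class first for every test point.
topk = np.tile(np.array([1, 3, 4]), (5, 1))

p1 = prec_at_k(y_true, topk, 1)   # 4 of 5 points are hit
m1 = mprec_at_k(y_true, topk, 1)  # class 1 is fully hit, class 2 is never hit
```

Here p1 is 0.8 while m1 is only 0.5, because the rare class counts as much as the popular one under the assumed macro average.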

  1. LwP: in the literature this is often used as a reranking step (reference [31] in the repository) but can also be used as an algorithm in its own right. This involves learning one or more prototypes per error class (e.g. the mean of the feature vectors of all data points of a class can serve as a prototype for that class, or multiple prototypes can be obtained by clustering the data points of that class).
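A minimal sketch of the single-prototype variant follows, assuming one mean vector per class and ranking by squared Euclidean distance. The function names fit_prototypes and predict_topk and the toy data are hypothetical; this is one way to realise the idea, not a prescribed method.

```python
import numpy as np
from scipy.sparse import csr_matrix

def fit_prototypes(X, y):
    """Return (classes, prototypes); each prototype is the mean vector of one class."""
    classes = np.unique(y)
    protos = np.vstack([
        np.asarray(X[np.flatnonzero(y == c)].mean(axis=0)).ravel()
        for c in classes
    ])
    return classes, protos

def predict_topk(X, classes, protos, k):
    """Return the k classes whose prototypes are nearest to each row of X."""
    # ||x - p||^2 = ||x||^2 - 2 x.p + ||p||^2, and ||x||^2 does not affect the ranking,
    # so a larger value of 2 x.p - ||p||^2 means a closer prototype.
    scores = 2.0 * np.asarray(X.dot(protos.T)) - (protos ** 2).sum(axis=1)
    order = np.argsort(-scores, axis=1)[:, :k]
    return classes[order]

# Toy data (illustrative only): three classes in three dimensions.
X_train = csr_matrix(np.array([[2, 0, 0], [3, 0, 0], [0, 1, 0], [0, 2, 0], [0, 0, 4]]))
y_train = np.array([10, 10, 20, 20, 30])
classes, protos = fit_prototypes(X_train, y_train)

X_test = csr_matrix(np.array([[1, 0, 0], [0, 0, 3]]))
pred = predict_topk(X_test, classes, protos, k=2)
```

Using the expanded distance keeps the test features sparse throughout, which matters when prediction time is graded.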

Marking Scheme. Parts 1-2 need to be answered in the PDF file itself and part 3 needs to be answered in the ZIP file submission. The 50 marks for part 3 will be awarded as follows:

  1. Total size of the submission, once unzipped (smaller is better): 10 marks

  2. Total time taken to return top-5 recommendations for all test users (smaller is better): 10 marks

  3. prec@1,3,5 (larger is better): 3 x 5 = 15 marks

  4. mprec@1,3,5 (larger is better): 3 x 5 = 15 marks

  • How to Prepare the PDF File

Use the following style file to prepare your report. https://media.neurips.cc/Conferences/NeurIPS2022/Styles/neurips_2022.sty

For an example file and instructions, please refer to the following files https://media.neurips.cc/Conferences/NeurIPS2022/Styles/neurips_2022.tex https://media.neurips.cc/Conferences/NeurIPS2022/Styles/neurips_2022.pdf

You must use the following command in the preamble

\usepackage[preprint]{neurips_2022}

instead of \usepackage{neurips_2022} as the example file currently uses. Use proper LaTeX commands to neatly typeset your responses to the various parts of the problem. Use neat math expressions to typeset your derivations. Remember that all parts of the question need to be answered in the PDF file. All plots must be generated electronically; no hand-drawn plots will be accepted. All plots must have axis titles and a legend indicating what the plotted quantities are. Insert the plot into the PDF file using proper LaTeX \includegraphics commands.

  • How to Prepare the ZIP File

Your submission ZIP archive must contain a file called predict.py which we will call to get recommendations for test data points. The assignment package contains a sample submission file sample_submit.zip which shows you how you must structure your submission as well as the way in which your prediction code predict.py must accept input and return recommendations. Do not change this input output behavior, otherwise our autograder will not be able to process your recommendations and you will be awarded zero marks.

  1. Your ZIP archive itself must contain a python file named predict.py which should not be contained inside any other directory within the archive. Apart from this file your ZIP archive may contain any number of files or even directories and subdirectories.

  2. Do not change the name predict.py to anything else. The file must contain a method called findErrorClass which must take in the test features as a csr_matrix and a value k > 0 and return k suggestions for each test data point.

  3. There are no “non-editable regions” in any of the files for this assignment. So long as you preserve the input output behavior, feel free to change the structure of the file as you wish. However, do not change the name of the file predict.py to anything else. Also, do not change the name of the main method findErrorClass to anything else.

  4. Code you download from the internet may be in C/C++. You are free to perform training in C/C++ etc. However, your prediction code must be a python file called predict.py. This file may internally call C/C++ code but it is your job to manage that using extension modules etc. Do not expect us to call anything other than the single file predict.py.
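The input output contract described above might look roughly like the skeleton below. Only the name findErrorClass, the csr_matrix input and the value k come from the text. Everything else is an assumption made for illustration: the return type (an array with one row of k labels per test point), the helper load_model, the label ids, the class count, and the random stand-in weights. Follow sample_submit.zip for the actual expected format.

```python
import numpy as np
from scipy.sparse import csr_matrix

NUM_FEATURES = 225  # BoW dimensionality stated in the assignment text
NUM_CLASSES = 50    # assumption: 10000 lines / about 200 lines per class; check your data

def load_model():
    """Hypothetical loader. A real submission would read trained parameters
    shipped inside the ZIP; random weights are used here only as a stand-in."""
    rng = np.random.default_rng(0)
    W = rng.standard_normal((NUM_CLASSES, NUM_FEATURES))
    labels = np.arange(1, NUM_CLASSES + 1)  # assumed label ids
    return W, labels

# Load once at import time so that repeated calls do not pay the loading cost.
_W, _LABELS = load_model()

def findErrorClass(X, k):
    """Return k class suggestions for each row of the csr_matrix X, best first."""
    if k <= 0:
        raise ValueError("k must be positive")
    scores = np.asarray(X.dot(_W.T))          # shape (n_test, NUM_CLASSES)
    top = np.argsort(-scores, axis=1)[:, :k]  # indices of the k highest scores
    return _LABELS[top]

# Illustrative call on random sparse counts (not real assignment data).
rng = np.random.default_rng(1)
X_demo = csr_matrix(rng.integers(0, 3, size=(4, NUM_FEATURES)))
out = findErrorClass(X_demo, 5)
```

Since both unzipped size and prediction time are graded, keeping the stored parameters small and loading them only once are reasonable design choices.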
