CS4851/6851 IDL: Homework 1


Note: All coding problems are to be implemented in a single Jupyter notebook.

Note: the questions are distributed as follows:

  1. Question 1 – Question 8: required for everyone.

  2. Question 9 – Question 10: required only for graduate students.

  3. Question 11 – Question 12: bonus marks.

Problem 1 (5 points)

What is the difference between Root Mean Squared Error (RMSE) and Mean Squared Error (MSE)? Describe the context in which each is most useful. Which of these losses penalizes large differences between the predicted and expected results, and why?
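For reference, a minimal sketch (assuming numpy; the arrays y_true and y_pred are hypothetical) showing how the two losses are computed and how they relate:

```python
import numpy as np

# Hypothetical targets and predictions, for illustration only.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.5, 2.0])

mse = np.mean((y_true - y_pred) ** 2)   # mean of the squared errors
rmse = np.sqrt(mse)                     # square root puts the error back in the units of y

print(f"MSE = {mse:.4f}, RMSE = {rmse:.4f}")
```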

Problem 2 (5 points)

Which of the following statements is true about the relationship between the number of features and the dataset size? Choose one of the following and explain your choice.

  1. When training a model, as you add more features to the dataset, you often need to increase the dataset’s size to ensure the model learns reliably.

  2. When training a model, adding more features to the dataset increases the amount of information you can extract from the dataset. This allows you to use smaller datasets to train the model and still achieve good generalization performance from the training data.

  3. When training a learning algorithm, as you decrease the number of features in the dataset, you need to increase the training sample size to make up the difference.

  4. When training a learning algorithm, the number of features in your dataset is entirely dependent on the amount of information you can extract from the dataset.

Problem 3 (4 points)

You are building a deep learning model, and after X epochs you see that the training loss is decreasing while your validation loss is constant or increasing. What is the most likely cause of this, and what would you suggest to rectify it?

  1. Learning rate is too high

  2. Gradient descent does not converge

  3. Insufficient training data size

  4. Gradient descent is stuck in a local minimum

Problem 4 (4 points)

Perceptron

  1. is a linear classifier. A. True B. False

  2. and cannot be trained on linearly non-separable data. A. True B. False
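For intuition on why the perceptron is called a linear classifier, a minimal sketch (assuming numpy; the weights w and bias b are hypothetical) of its decision rule:

```python
import numpy as np

# Hypothetical weights and bias for a 2D input.
w = np.array([2.0, -1.0])
b = 0.5

def perceptron_predict(x):
    # The decision boundary w·x + b = 0 is a hyperplane (a line in 2D),
    # which is what makes the perceptron a linear classifier.
    return 1 if np.dot(w, x) + b >= 0 else 0

print(perceptron_predict(np.array([1.0, 1.0])))   # -> 1
print(perceptron_predict(np.array([-1.0, 3.0])))  # -> 0
```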

Problem 5 (4 points)

Logistic Regression

  1. is a linear classifier. A. True B. False

  2. and always has a unique solution. A. True B. False

Problem 6 (10 points)

Which of the following is true, given the optimal learning rate?

  1. Batch gradient descent is always guaranteed to converge to the global optimum of a loss function.

  2. Stochastic gradient descent is always guaranteed to converge to the global optimum of a loss function.

  3. For convex loss functions (i.e. with a bowl shape¹), batch gradient descent is guaranteed to eventually converge to the global optimum while stochastic gradient descent is not.

  4. For convex loss functions (i.e. with a bowl shape), stochastic gradient descent is guaranteed to eventually converge to the global optimum while batch gradient descent is not.

  5. For convex loss functions (i.e. with a bowl shape), both stochastic gradient descent and batch gradient descent will eventually converge to the global optimum.

  6. For convex loss functions (i.e. with a bowl shape), neither stochastic gradient descent nor batch gradient descent is guaranteed to converge to the global optimum.
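For intuition on the batch versus stochastic distinction, a minimal sketch (assuming numpy; the data, step size, and names are hypothetical) comparing the two update rules on a convex least-squares loss:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x + rng.normal(scale=0.1, size=100)   # convex least-squares problem in w

def batch_grad(w):
    # Gradient of the full mean squared error: uses every sample.
    return np.mean(2 * (w * x - y) * x)

def stochastic_grad(w):
    # Gradient estimated from a single randomly chosen sample: noisy but cheap.
    i = rng.integers(len(x))
    return 2 * (w * x[i] - y[i]) * x[i]

w_batch, w_sgd, lr = 0.0, 0.0, 0.05
for _ in range(200):
    w_batch -= lr * batch_grad(w_batch)
    w_sgd   -= lr * stochastic_grad(w_sgd)

print(w_batch, w_sgd)   # batch settles near 3.0; SGD hovers around it with noise
```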

Problem 7 (5 points)

Given the following data on a 2D plane :

x    y
-1   -2
1    -1
      3

Fit a linear regression model to the data without the constant term: yi = βxi + ϵi. Give an expression for the minimization problem for finding β, show how to compute its value, and report the value.
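As a sketch of the expected form (not the numeric answer), the no-intercept least-squares objective and its closed-form minimizer can be written as:

```latex
\hat{\beta} = \arg\min_{\beta} \sum_{i} (y_i - \beta x_i)^2,
\qquad
\frac{d}{d\beta} \sum_{i} (y_i - \beta x_i)^2 = 0
\;\Longrightarrow\;
\hat{\beta} = \frac{\sum_i x_i y_i}{\sum_i x_i^2}.
```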

Problem 8 (20 points)

Perform a polynomial fitting to compute a design matrix X such that:

Xij = (yi)^j    (1)

You should implement this without a single for loop, by using vectorization and broadcasting. Here 1 ≤ j ≤ 50 and y = {−20, −19.9, …, 20}. Implement code that generates such a matrix.
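One possible vectorized approach is sketched below (assuming numpy; the variable names are illustrative, not prescribed):

```python
import numpy as np

y = np.linspace(-20, 20, 401)   # y = {-20, -19.9, ..., 20}
j = np.arange(1, 51)            # column exponents 1 <= j <= 50

# Broadcasting a (401, 1) column of y-values against a (50,) row of exponents
# gives the (401, 50) design matrix with X[i, j-1] = y[i] ** j,
# with no explicit Python loop.
X = y[:, np.newaxis] ** j

print(X.shape)                  # (401, 50)
```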

¹ More formally, f is convex on [a, b] if for all x1, x2 ∈ [a, b] and all λ ∈ [0, 1], f(λx1 + (1 − λ)x2) ≤ λf(x1) + (1 − λ)f(x2).

Problems beyond this line are bonus for undergraduates.

Problem 9 (15 points)

Is it a good idea to initialize the parameters with zeros when training a logistic regression model? Explain your answer.

Problem 10 (15 points)

As discussed in class, the logistic regression likelihood simplifies to the following form:

cost(f(x), y) = −y log(f(x)) − (1 − y) log(1 − f(x))

where y ∈ {0, 1} is the class label.

What will happen when y = 1 and f(x) = 1? Will this work in actual implementations?
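A quick numerical illustration of this case (a sketch assuming numpy and a naive, unguarded evaluation of the cost):

```python
import numpy as np

y, f_x = 1.0, 1.0
# The second term becomes 0 * log(0): np.log(0.0) is -inf, and 0 * (-inf) is nan,
# so a naive evaluation of the cost returns nan (along with runtime warnings).
cost = -y * np.log(f_x) - (1 - y) * np.log(1 - f_x)
print(cost)   # nan
```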

Extra credit problems

Problem 11 (extra credit 20 points)

In the three examples of Figure 1, find the perceptron weights w0, w1, w2 for each of them. The line separating the two classes is the decision boundary that divides the plane into positive and negative regions. (No calculations required; you can simply note the answer.)

Figure 1: Binary class data and separating decision boundaries.

Problem 12 (extra credit 20 points)

Consider the function E below:

E = g(x, y, z) = σ(c(ax + by) + dz),

where x, y, z are input variables and a, b, c, d are parameters. σ is the logistic sigmoid function defined as:

σ(z) = 1 / (1 + e^(−z))    (2)

Note that E is also a function of a, b, c, d and for completeness could be written as E(x, y, z, a, b, c, d).

Derive expressions for the partial derivatives of E with respect to the parameters a, b, d, i.e. ∂E/∂a, ∂E/∂b, ∂E/∂d.
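As a sketch of the chain-rule setup (writing u = c(ax + by) + dz and using the standard identity σ′(u) = σ(u)(1 − σ(u))):

```latex
\frac{\partial E}{\partial a}
  = \sigma'(u)\,\frac{\partial u}{\partial a}
  = \sigma(u)\bigl(1 - \sigma(u)\bigr)\,\frac{\partial u}{\partial a},
\qquad u = c(ax + by) + dz,
```

and analogously for b and d, so only ∂u/∂a, ∂u/∂b, and ∂u/∂d remain to be computed.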

