CS-Homework 2 Solution


1 (10 points) PCA algorithm

Give at least two algorithms that could take the data set $X = \{x_1, \ldots, x_N\}$, $x_t \in \mathbb{R}^{n \times 1}\ \forall t$, as input, and output the first principal component $w$. Specify the computational details of the algorithms, and discuss their advantages or limitations.
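Below is a minimal sketch, assuming a NumPy implementation, of two candidate algorithms: (a) eigendecomposition of the sample covariance matrix, which is exact but costs $O(n^3)$, and (b) power iteration, which needs only matrix-vector products and suits large $n$ when just the top component is required. The function names and the convention that $X$ is stored as an $N \times n$ array are illustrative assumptions.

```python
import numpy as np

def first_pc_eig(X):
    """First principal component via eigendecomposition of the
    n x n sample covariance matrix.  X has shape (N, n)."""
    Xc = X - X.mean(axis=0)                 # center the data
    C = Xc.T @ Xc / (len(X) - 1)            # sample covariance
    eigvals, eigvecs = np.linalg.eigh(C)    # eigenvalues in ascending order
    return eigvecs[:, -1]                   # eigenvector of the largest eigenvalue

def first_pc_power(X, n_iter=1000, tol=1e-10):
    """First principal component via power iteration; avoids a full
    eigendecomposition but converges slowly when the top two
    eigenvalues are close."""
    Xc = X - X.mean(axis=0)
    C = Xc.T @ Xc / (len(X) - 1)
    w = np.random.default_rng(0).standard_normal(C.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        w_next = C @ w
        w_next /= np.linalg.norm(w_next)
        if np.linalg.norm(w_next - w) < tol:
            break
        w = w_next
    return w
```

Note that $w$ is only defined up to sign, so the two algorithms may return vectors differing by a factor of $-1$.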

2 (10 points) Factor Analysis (FA)

Calculate the Bayesian posterior $p(y|x)$ of the Factor Analysis model $x = Ay + \mu + e$, with $p(x|y) = G(x|Ay + \mu, \Sigma_e)$ and $p(y) = G(y|0, \Sigma_y)$, where $G(z|\mu, \Sigma)$ denotes the Gaussian distribution density with mean $\mu$ and covariance matrix $\Sigma$.
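For reference, a sketch of the standard Gaussian conditioning result that the derivation should arrive at (the subscripted symbols $\mu_{y|x}$ and $\Sigma_{y|x}$ are notation introduced here):

```latex
p(y \mid x) = G\!\left(y \mid \mu_{y|x},\, \Sigma_{y|x}\right), \quad\text{where}\quad
\Sigma_{y|x} = \left(\Sigma_y^{-1} + A^{\top}\Sigma_e^{-1}A\right)^{-1}, \quad
\mu_{y|x} = \Sigma_{y|x}\, A^{\top}\Sigma_e^{-1}\,(x - \mu).
```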

3 (10 points) Independent Component Analysis (ICA)

Explain why maximizing non-Gaussianity could be used as a principle for ICA estimation.
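The underlying intuition is the central limit theorem: a linear mixture of independent non-Gaussian sources is closer to Gaussian than the sources themselves, so a demixing direction that maximizes non-Gaussianity tends to recover a single source. A minimal sketch illustrating this with excess kurtosis as the non-Gaussianity measure (the particular source distributions and mixing matrix are illustrative assumptions):

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(0)
N = 100_000

# Two independent non-Gaussian sources: uniform (sub-Gaussian,
# negative excess kurtosis) and Laplacian (super-Gaussian, positive).
S = np.vstack([rng.uniform(-1, 1, N),
               rng.laplace(0, 1, N)])

A = np.array([[0.7, 0.3],
              [0.4, 0.6]])   # an arbitrary mixing matrix
X = A @ S                    # observed mixtures

# Excess kurtosis is 0 for a Gaussian; the mixtures' values should
# move toward 0 relative to the sources' (the CLT effect).
print("sources :", kurtosis(S, axis=1))
print("mixtures:", kurtosis(X, axis=1))
```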

4 (50 points) Dimensionality Reduction by FA

Consider the following Factor Analysis (FA) model,

$x = Ay + \mu + e$,   (1)

$p(x|y) = G(x|Ay + \mu, \sigma^2 I)$,   (2)

$p(y) = G(y|0, I)$,   (3)

where the observed variable $x \in \mathbb{R}^n$, the latent variable $y \in \mathbb{R}^m$, and $G(z|\mu, \Sigma)$ denotes the Gaussian distribution density with mean $\mu$ and covariance matrix $\Sigma$. Write a report experimentally comparing the model selection performance of BIC and AIC in selecting the number of latent factors, i.e., $\dim(y) = m$.


Specifically, you need to randomly generate datasets based on FA by varying some setting values, e.g., sample size $N$, dimensionalities $n$ and $m$, noise level $\sigma^2$, and so on. For example, set $N = 100$, $n = 10$, $m = 3$, $\sigma^2 = 0.1$, $\mu = 0$, and assign values for $A \in \mathbb{R}^{n \times m}$. The generation process is as follows:

  1. Randomly sample $y_t$ from the Gaussian density $G(y|0, I)$, with $\dim(y) = m = 3$;

  2. Randomly sample a noise vector $e_t$ from the Gaussian density $G(e|0, \sigma^2 I)$, with $\sigma^2 = 0.1$, $e_t \in \mathbb{R}^n$;

  3. Get $x_t = Ay_t + \mu + e_t$.

Collect all the $x_t$ as the dataset $X = \{x_t\}_{t=1}^{N}$.
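A minimal sketch of this generation process in NumPy, under the example setting above (the random choice of $A$ is an arbitrary assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
N, n, m = 100, 10, 3               # sample size, observed dim, latent dim
sigma2, mu = 0.1, np.zeros(10)     # noise level sigma^2 and mean mu = 0
A = rng.standard_normal((n, m))    # assign values for A (arbitrary choice)

Y = rng.standard_normal((N, m))                    # step 1: y_t ~ G(y|0, I)
E = rng.normal(0.0, np.sqrt(sigma2), size=(N, n))  # step 2: e_t ~ G(e|0, sigma^2 I)
X = Y @ A.T + mu + E                               # step 3: x_t = A y_t + mu + e_t
```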

The two-stage model selection process for BIC and AIC is as follows:

Stage 1: Run the EM algorithm on each dataset $X$ for $m = 1, \ldots, M$, and calculate the log-likelihood value $\ln[p(X|\hat{\Theta}_m)]$, where $\hat{\Theta}_m$ is the maximum likelihood estimate of the parameters;

Stage 2: Select the optimal $m$ by

$\hat{m} = \arg\max_{m=1,\ldots,M} J(m)$,   (4)

$J_{AIC}(m) = \ln[p(X|\hat{\Theta}_m)] - d_m$,   (5)

$J_{BIC}(m) = \ln[p(X|\hat{\Theta}_m)] - \frac{d_m}{2} \ln N$,   (6)

where $d_m$ is the number of free parameters of the $m$-factor model.

You may set $M = 5$ if you generate the dataset $X$ based on $n = 10$, $m = 3$.

The following code might be useful.

Python: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.FactorAnalysis.html#sklearn.decomposition.FactorAnalysis
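A minimal sketch of the two-stage selection on top of sklearn's FactorAnalysis. Two assumptions to note: sklearn's `score` returns the average per-sample log-likelihood, so it is multiplied by $N$ to obtain $\ln[p(X|\hat{\Theta}_m)]$; and the free-parameter count $d_m = nm + n$ (loadings plus noise variances) is one common convention, which you may refine in your report. Also note that sklearn fits per-feature noise variances rather than the isotropic $\sigma^2 I$ of Eq. (2).

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

def select_m(X, M=5):
    """Stage 1: fit FA for m = 1..M; Stage 2: pick the m maximizing
    J_AIC(m) and J_BIC(m).  Returns (m_AIC, m_BIC)."""
    N, n = X.shape
    j_aic, j_bic = [], []
    for m in range(1, M + 1):
        fa = FactorAnalysis(n_components=m).fit(X)
        loglik = N * fa.score(X)   # score() is the mean log-likelihood per sample
        d_m = n * m + n            # assumed parameter count: A plus noise variances
        j_aic.append(loglik - d_m)
        j_bic.append(loglik - 0.5 * d_m * np.log(N))
    return int(np.argmax(j_aic)) + 1, int(np.argmax(j_bic)) + 1
```

Repeating this over many randomly generated datasets and settings ($N$, $n$, $m$, $\sigma^2$) gives the selection-accuracy comparisons the report asks for.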

5 (20 points) Spectral clustering

Use experiments to demonstrate when spectral clustering works well and when it would fail. Summarize your results.

The following code might be helpful.

Python: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.SpectralClustering.html
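A minimal sketch contrasting a case where spectral clustering typically works well (two moons at low noise, a non-convex structure) with one where it tends to fail (high noise that blurs the affinity structure); the noise levels and the RBF scale `gamma` are illustrative assumptions.

```python
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

for noise in (0.05, 0.3):   # low noise: clear manifolds; high noise: overlapping clusters
    X, y = make_moons(n_samples=500, noise=noise, random_state=0)
    labels = SpectralClustering(n_clusters=2, affinity="rbf", gamma=20,
                                random_state=0).fit_predict(X)
    print(f"noise={noise}: ARI = {adjusted_rand_score(y, labels):.2f}")
```

An adjusted Rand index (ARI) near 1 indicates the true two-moon partition was recovered; a value near 0 indicates failure.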
