Description
Problem 1
(k-means, 40pts) Generate 2 sets of 2-D Gaussian random data, each set containing 500 samples using parameters below.
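$$\mu_1 = [1, 0], \quad \mu_2 = [0, 1.5], \quad \Sigma_1 = \begin{bmatrix} 0.9 & 0.4 \\ 0.4 & 0.9 \end{bmatrix}, \quad \Sigma_2 = \begin{bmatrix} 0.9 & 0.4 \\ 0.4 & 0.9 \end{bmatrix} \tag{1}$$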
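As a starting point, the data generation step might be sketched as follows. This is a minimal standard-library sketch, not a required implementation. It assumes both covariance matrices have 0.9 on the diagonal and 0.4 off the diagonal (the only arrangement of those entries that is a valid covariance matrix). The helper name `sample_gaussian_2d` and the fixed seed are hypothetical choices made for illustration.

```python
import math
import random

def sample_gaussian_2d(mean, cov, n, rng):
    """Draw n samples from a 2-D Gaussian via a Cholesky factor of cov."""
    (a, b), (_, d) = cov
    # Lower-triangular factor L of [[a, b], [b, d]], so that L * L^T = cov.
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(d - l21 * l21)
    samples = []
    for _ in range(n):
        # Two independent standard normal draws, then shift and correlate.
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        samples.append((mean[0] + l11 * z1, mean[1] + l21 * z1 + l22 * z2))
    return samples

rng = random.Random(0)  # fixed seed: an assumption, only for repeatability
cov = [[0.9, 0.4], [0.4, 0.9]]  # assumed layout of the covariance entries
set1 = sample_gaussian_2d([1.0, 0.0], cov, 500, rng)
set2 = sample_gaussian_2d([0.0, 1.5], cov, 500, rng)
X = set1 + set2  # 1000 points, each a (x, y) tuple
```

Stacking the two sets into a single list `X` matches the `n` by `p` layout that the clustering function expects, with `n = 1000` and `p = 2`.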
1. (20pts) Write a function cluster = mykmeans(X, k, c) that clusters data $X \in \mathbb{R}^{n \times p}$ ($n$ objects and $p$ attributes) into k clusters. Here c holds the initial centers; supplying them is usually not necessary, but we need them to test your function. Terminate the iteration when the $\ell_2$-norm between a previous center and an updated center is $\le 0.001$ or the number of iterations reaches 10000.
2. (10pts) Apply your code to the data generated above with k = 2 and initial centers $c_1 = (10, 10)$ and $c_2 = (-10, -10)$. In your report, give the centers found for each cluster. How many iterations did it take? Show a scatter plot of the data and the cluster centers found.
3. (10pts) Apply your code to the data generated above with k = 4 and initial centers $c_1 = (10, 10)$, $c_2 = (-10, -10)$, $c_3 = (10, -10)$ and $c_4 = (-10, 10)$. In your report, give the centers found for each cluster. How many iterations did it take? Show a scatter plot of the data and the cluster centers found.
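The clustering loop described in Problem 1 might be sketched as below. This is one possible reading, not a reference solution. It assumes Lloyd's algorithm with Euclidean distance, and it stops when the largest center shift is at most the tolerance. The extra return values (`centers`, iteration count) and the keyword parameters `tol` and `max_iter` are additions made here for reporting purposes; the assignment itself only names `cluster`.

```python
import math

def mykmeans(X, k, c, tol=1e-3, max_iter=10000):
    """Lloyd's k-means on a list of points X with initial centers c.

    Returns (cluster, centers, iterations); cluster[i] is the index of the
    center that X[i] was assigned to in the last assignment step.
    """
    centers = [tuple(ci) for ci in c[:k]]
    cluster = [0] * len(X)
    it = 0
    for it in range(1, max_iter + 1):
        # Assignment step: nearest center by Euclidean distance.
        cluster = [min(range(k), key=lambda j: math.dist(x, centers[j]))
                   for x in X]
        # Update step: mean of each cluster; an empty cluster keeps its center.
        new_centers = []
        for j in range(k):
            members = [x for x, lab in zip(X, cluster) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])
        # Largest l2 shift between a previous center and its update.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return cluster, centers, it

# Tiny illustrative data set (hypothetical, not the assignment data).
X = [(-1.0, -1.0), (-1.0, -2.0), (5.0, 5.0), (5.0, 6.0)]
cluster, centers, n_iter = mykmeans(X, 2, [(10, 10), (-10, -10)])
print(cluster, centers, n_iter)
```

Keeping the center of an empty cluster unchanged is a design choice that matters with far-away initial centers such as those used for k = 4, where a center may attract no points in the first pass.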
Problem 2
(Non-parametric density estimation, 60pts)
1. (30pts) Write a function [p, x] = mykde(X,h) that performs kernel density estimation on X with bandwidth h. It should return the estimated density p(x) and the domain x on which p(x) was estimated, and it should handle X in both 1-D and 2-D.
Instructor: W. H. Kim (won.kim@uta.edu), TA: Xin Ma (xin.ma@mavs.uta.edu) Page 1 of 2
CSE4334/5334 Data Mining Assignment 1
2. (10pts) Generate N = 1000 Gaussian random data with $\mu_1 = 5$ and $\sigma_1 = 1$. Test your function mykde on this data with $h = \{0.1, 1, 5, 10\}$. In your report, include the histogram of X along with the figures of estimated densities.
3. (10pts) Generate N = 1000 Gaussian random data with $\mu_1 = 5$ and $\sigma_1 = 1$ and another Gaussian random data with $\mu_2 = 0$ and $\sigma_2 = 0.2$. Test your function mykde on this data with $h = \{0.1, 1, 5, 10\}$. In your report, include the histogram of X along with the figures of estimated densities.
4. (10pts) Generate 2 sets of 2-D Gaussian random data with $N_1 = 500$ and $N_2 = 500$ using the following parameters:

$$\mu_1 = [1, 0], \quad \mu_2 = [0, 1.5], \quad \Sigma_1 = \begin{bmatrix} 0.9 & 0.4 \\ 0.4 & 0.9 \end{bmatrix}, \quad \Sigma_2 = \begin{bmatrix} 0.9 & 0.4 \\ 0.4 & 0.9 \end{bmatrix} \tag{2}$$

Test your function mykde on this data with $h = \{0.1, 1, 5, 10\}$. In your report, include figures of estimated densities.
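For orientation, a 1-D kernel density estimate can be sketched as below. The assignment does not name a kernel, so the Gaussian kernel here is an assumption. So are the evenly spaced grid padded by 3h on each side and the hypothetical `grid_size` parameter. The estimate follows the standard form $p(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$. The sketch covers only the 1-D case and treats $\sigma$ as a standard deviation.

```python
import math
import random

def mykde(X, h, grid_size=200):
    """Gaussian-kernel density estimate for 1-D data X with bandwidth h.

    Returns (p, x): x is an evenly spaced grid over the data range padded by
    3*h on each side, and p[i] is the estimated density at x[i].
    """
    n = len(X)
    lo, hi = min(X) - 3 * h, max(X) + 3 * h
    step = (hi - lo) / (grid_size - 1)
    x = [lo + i * step for i in range(grid_size)]
    # Normalizing constant 1 / (n * h * sqrt(2 * pi)) of the Gaussian kernel.
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    p = [norm * sum(math.exp(-0.5 * ((xi - s) / h) ** 2) for s in X)
         for xi in x]
    return p, x

rng = random.Random(0)  # fixed seed: an assumption, only for repeatability
X = [rng.gauss(5.0, 1.0) for _ in range(1000)]  # mean 5, std 1
p, x = mykde(X, h=1.0)
area = sum(p) * (x[1] - x[0])  # Riemann sum; should be close to 1
print(round(area, 3))
```

Checking that the estimate integrates to roughly 1 is a quick sanity test before comparing bandwidths. A 2-D version could follow the same pattern with a product of two Gaussian kernels over a 2-D grid, with the normalizing constant adjusted accordingly.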