Description
In this project, you will apply what you have learned in class to solve a real-world data mining problem. You may choose any problem that interests you, as long as it can be formulated as a data mining task. This is a team project, and each team may have at most two members.
Complete the following tasks:
- Pick a real-world application that data mining may help with.
- Formulate it as a data mining problem (clustering, classification, pattern mining, anomaly detection, recommendation, or a combination of these tasks).
- Collect relevant datasets. Some possible sources:
- If necessary, preprocess the datasets into a format that data mining algorithms can use.
- Apply algorithms you implemented yourself or any existing package to solve the proposed problem.
- Evaluate the data mining results you obtain and discuss them.
- Prepare a short report covering the key points of your project. Name it project.pdf, project.doc, or project.docx.
- Log in to any CSE department server and submit your report as follows:
  submit_cse469 project.pdf
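The preprocessing, algorithm, and evaluation tasks above can be sketched end to end on a tiny example. This is only an illustration under stated assumptions: the records, the feature meanings, the labels, and the helper names (`fit_min_max`, `scale`, `predict_1nn`) are all hypothetical, and a 1-nearest-neighbor classifier is used simply because it is short, not because the project requires it.

```python
import math

# Hypothetical toy records: (feature_1, feature_2, label). Not real data.
train = [
    (150.0, 50.0, "A"),
    (160.0, 55.0, "A"),
    (180.0, 80.0, "B"),
    (190.0, 90.0, "B"),
]
test = [
    (155.0, 52.0, "A"),
    (185.0, 85.0, "B"),
]

def fit_min_max(rows):
    """Return per-feature (min, max) computed from the training rows only."""
    features = [row[:-1] for row in rows]
    columns = list(zip(*features))
    return [(min(col), max(col)) for col in columns]

def scale(row, bounds):
    """Min-max scale the features with the training bounds; keep the label.

    Values inside the training range map to [0, 1].
    """
    scaled = []
    for value, (lo, hi) in zip(row[:-1], bounds):
        scaled.append(0.0 if hi == lo else (value - lo) / (hi - lo))
    return tuple(scaled) + (row[-1],)

def predict_1nn(point, train_rows):
    """Return the label of the closest training row (Euclidean distance)."""
    nearest = min(train_rows, key=lambda r: math.dist(point[:-1], r[:-1]))
    return nearest[-1]

# Preprocess: learn the scaling on the training split, apply it to both splits.
bounds = fit_min_max(train)
train_scaled = [scale(r, bounds) for r in train]
test_scaled = [scale(r, bounds) for r in test]

# Apply the algorithm, then evaluate with hold-out accuracy.
predictions = [predict_1nn(r, train_scaled) for r in test_scaled]
correct = sum(p == r[-1] for p, r in zip(predictions, test_scaled))
accuracy = correct / len(test_scaled)
print(predictions, accuracy)
```

The scaling bounds are computed from the training split only, so no information from the held-out records leaks into preprocessing. A real project would replace the toy records with the collected dataset and would likely use a larger test split or cross-validation.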
Your report should include the following components.
- Introduction: What data mining problem are you trying to solve? What impact will it have if the problem is solved?
- Formulation: Which data mining task can it be formulated as? What are the input and the expected output?
- Datasets: Where do you get the datasets? Give some statistics about the data. How do you preprocess the data?
- Algorithm: Which data mining algorithm do you apply?
- Experiments: Evaluate the output using an appropriate evaluation metric. Show the results you get and discuss whether they are meaningful.
- (Optional) Challenges: What challenges do you find in the data? How do you tackle these challenges?
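As one possible way to produce the numbers for the Datasets and Experiments components, the sketch below computes a few summary statistics and then precision, recall, and F1 for a binary classification task. Everything in it is an assumption made for illustration: the labels, the `lengths` feature, the predictions, and the helper `precision_recall_f1` are hypothetical, and other tasks (for example, clustering) would call for different metrics.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical ground-truth labels and one numeric feature, for illustration only.
labels = ["spam", "ham", "ham", "spam", "ham", "ham"]
lengths = [120, 45, 60, 200, 30, 75]

# Basic statistics that a Datasets section might report.
class_counts = Counter(labels)
summary = {
    "num_records": len(labels),
    "class_counts": dict(class_counts),
    "mean_length": mean(lengths),
    "stdev_length": stdev(lengths),  # sample standard deviation
}

def precision_recall_f1(y_true, y_pred, positive):
    """Compute precision, recall, and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1

# Hypothetical model output to evaluate against the labels above.
predicted = ["spam", "ham", "spam", "spam", "ham", "ham"]
precision, recall, f1 = precision_recall_f1(labels, predicted, positive="spam")

print(summary)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Reporting the class counts alongside the metric matters when the classes are imbalanced, since accuracy alone can look high for a model that rarely predicts the minority class.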