Assignment #4 Solution


Disclaimer: This assignment requires students to work on a BI framework and on sentiment/semantic analysis. Submissions related to this assignment will not be used for commercial purposes.

Objective:

  • The objective of this assignment is to understand the BI framework, to create a star/snowflake schema, and to learn the concepts of sentiment and semantic analysis.

Plagiarism Policy:

  • This assignment is an individual task. Collaboration of any type amounts to a violation of the academic integrity policy and will be reported to the AIO.

  • Content cannot be copied verbatim from any source(s). Please understand the concept and write in your own words. In addition, cite the actual source. Failing to do so will be considered as plagiarism and/or cheating.

  • The Dalhousie Academic Integrity policy applies to all material submitted as part of this course. Please understand the policy, which is available at: https://www.dal.ca/dept/university_secretariat/academic-integrity.html

Assignment Rubric

| | Excellent (25%) | Proficient (15%) | Marginal (5%) | Unacceptable (0%) | Problem # where applied |
| --- | --- | --- | --- | --- | --- |
| Completeness including Citation | All required tasks are completed | Submission highlights tasks completion. However, missed some tasks in between, which created a disconnection | Some tasks are completed, which are disjoint in nature. | Incorrect and irrelevant | Problem #3 |
| Correctness | All parts of the given tasks are correct | Most of the given tasks are correct. However, some portions need minor modifications | Most of the given tasks are incorrect. The submission requires major modifications. | Incorrect and unacceptable | Problem #2 |
| Novelty | The submission contains novel contribution in key segments, which is a clear indication of application knowledge | The submission lacks novel contributions. There are some evidences of novelty, however, it is not significant | The submission does not contain novel contributions. However, there is an evidence of some effort | There is no novelty | Problem #1 |
| Clarity | The written or graphical materials, and developed applications provide a clear picture of the concept, and highlights the clarity | The written or graphical materials and developed applications do not show clear picture of the concept. There is room for improvement | The written or graphical materials, and developed applications fail to prove the clarity. Background knowledge is needed | Failed to prove the clarity. Need proper background knowledge to perform the tasks | Problem #1 |

Summer 2021 saurabh.dey@dal.ca

Citation:

McKinney, B. (2018). The impact of program-wide discussion board grading rubrics on students’ and faculty satisfaction. Online Learning, 22(2), 289-299.

Tasks

  • This assignment requires you to submit your program code on GitLab and a single PDF file on Brightspace.

Problem #1

Business Intelligence Reporting using Cognos

  1. Download the weather dataset available at https://www.kaggle.com/PROPPG-PPG/hourly-weather-surface-brazil-southeast-region?select=sudeste.csv

  2. Explore the dataset and identify the data field(s) that could be measured by certain factors or dimensions. (Follow recorded lecture #18 and synchronous session #18.)

Example: In a sales dataset, you may find a measurable field “total sales”, which could be analyzed by other factors such as “products”, “time”, “location”, etc. These factors are known as dimensions. Depending on the data, you may also find possibilities for slice and dice, i.e. analysis at a more granular level, such as going from total sales by city to total sales by store.

  3. Write a ½-page explanation of how you selected the measurable field, i.e. the fact, and what the possible dimensions are. Include this part in your PDF file.

  4. Clean the dataset and, if required, format it. You can perform the cleaning and formatting using spreadsheet operations or a programming script. If you use a program, add it to GitLab; if you use other methods, write the steps in the PDF file.

  5. Create a Cognos account and import your dataset. Identify the dimensions, and create/import the dimension tables.

  6. Based on your understanding of the domain (please read the information/metadata available on the dataset source, i.e. Kaggle), create a star schema or a snowflake schema. Provide a justification for your model in the PDF file.

  7. In addition to the justification, attach a screenshot of the model (star schema or snowflake schema) in the PDF file.

  8. Display a visual analysis of the data in a suitable format, e.g. a bar graph showing temperature change in terms of a suitable dimension. Add a screenshot of the analysis to the PDF or add a screen recording of the analysis to your .zip folder.
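The fact/dimension idea and the cleaning step above can be illustrated with a small Java sketch. It is only an illustration under stated assumptions: the three-column layout (province, city, temperature), the class name `WeatherRollup`, and the sample rows are all made up for the example and are not taken from the Kaggle file. Temperature plays the role of the measure, and province and city play the role of two dimensions at different granularity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class WeatherRollup {
    // One cleaned fact row: the measure (temperature) plus two dimension keys.
    // Field names are hypothetical and are not taken from the Kaggle file.
    static final class Reading {
        final String province;
        final String city;
        final double temperature;

        Reading(String province, String city, double temperature) {
            this.province = province;
            this.city = city;
            this.temperature = temperature;
        }
    }

    // Cleaning step: keep only rows with three fields, non-empty keys,
    // and a temperature that parses as a number.
    static List<Reading> clean(List<String> csvLines) {
        List<Reading> rows = new ArrayList<>();
        for (String line : csvLines) {
            String[] f = line.split(",", -1);
            if (f.length != 3) continue;
            String province = f[0].trim();
            String city = f[1].trim();
            if (province.isEmpty() || city.isEmpty()) continue;
            try {
                rows.add(new Reading(province, city, Double.parseDouble(f[2].trim())));
            } catch (NumberFormatException e) {
                // Missing or malformed temperature: drop the row.
            }
        }
        return rows;
    }

    // Average the measure over whichever dimension the caller selects.
    static Map<String, Double> averageBy(List<Reading> rows, Function<Reading, String> dimension) {
        return rows.stream().collect(
                Collectors.groupingBy(dimension, TreeMap::new,
                        Collectors.averagingDouble((Reading r) -> r.temperature)));
    }

    public static void main(String[] args) {
        // Made-up sample rows; the last one has a missing temperature.
        List<Reading> rows = clean(Arrays.asList(
                "SP,Sao Paulo,20.0",
                "SP,Sao Paulo,22.0",
                "SP,Campinas,24.0",
                "RJ,Rio de Janeiro,30.0",
                "RJ,Rio de Janeiro,"));
        System.out.println("By province: " + averageBy(rows, r -> r.province));
        System.out.println("By city:     " + averageBy(rows, r -> r.city));
    }
}
```

Switching the grouping key from province to city is the same kind of drill-down as going from total sales by city to total sales by store in the example above.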

Problem #2

Sentiment Analysis – Java Program only

  1. To perform this task, you need to consider the processed news (“content or descriptions” only; ignore other fields) that you obtained and stored in MongoDB in your previous assignment. If you could not perform/complete that task, then obtain the processed MongoDB News collection by contacting your TA Kethan (Cc me in that email).

  2. Write a script to create a bag-of-words for each news article. (Code from online or other sources is not accepted.)

e.g. news1 = “Canada is cold cold. I feel good not bad”

bow1 = {“Canada”:1, “is”:1, “cold”:2, “I”:1, “feel”:1, “good”:1, “not”:1, “bad”:1}

You do not need any libraries. Just implement a simple counter using a loop.

  3. Compare each bag-of-words with a list of positive and negative words. You can download lists of positive and negative words from online source(s). You do not need any libraries. Just perform a word-by-word comparison with the lists. E.g. negative words can be found here: https://gist.github.com/mkulakowski2/4289441

  4. Tag each news article as “positive”, “negative”, or “neutral” based on its overall score. You can add an additional column to present your finding.

E.g. frequencies of the matches: “cold”=2, “not”=1, “bad”=1 (negative -4), “good”=1 (positive +1). Overall score = -3

| News Article | News Description/content | match | polarity |
| --- | --- | --- | --- |
| 1 | Canada is cold cold. I feel good not bad | cold, good, not, bad | negative |
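The bag-of-words counter and the polarity tagging described above can be sketched in plain Java as follows. This is a minimal sketch under stated assumptions, not a reference solution: the class name `SentimentSketch`, the tiny hard-coded word lists, the letters-only tokenizer, and the case-insensitive matching are all choices made for the illustration. A real run would load the downloaded word lists and read the news text from MongoDB.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class SentimentSketch {
    // Tiny stand-in word lists (assumption); replace with the downloaded lists.
    static final Set<String> POSITIVE = new HashSet<>(Arrays.asList("good"));
    static final Set<String> NEGATIVE = new HashSet<>(Arrays.asList("cold", "not", "bad"));

    // Count every word with a simple loop. Tokens are runs of ASCII letters,
    // so punctuation and digits act as separators (a simplifying assumption).
    static Map<String, Integer> bagOfWords(String text) {
        Map<String, Integer> bag = new LinkedHashMap<>();
        for (String token : text.split("[^A-Za-z]+")) {
            if (token.isEmpty()) continue;
            Integer count = bag.get(token);
            bag.put(token, count == null ? 1 : count + 1);
        }
        return bag;
    }

    // Add the frequency of each positive match, subtract each negative match.
    static int score(Map<String, Integer> bag) {
        int score = 0;
        for (Map.Entry<String, Integer> entry : bag.entrySet()) {
            String word = entry.getKey().toLowerCase();
            if (POSITIVE.contains(word)) {
                score += entry.getValue();
            } else if (NEGATIVE.contains(word)) {
                score -= entry.getValue();
            }
        }
        return score;
    }

    // Map the overall score to a polarity tag.
    static String polarity(int score) {
        if (score > 0) return "positive";
        if (score < 0) return "negative";
        return "neutral";
    }

    public static void main(String[] args) {
        Map<String, Integer> bag = bagOfWords("Canada is cold cold. I feel good not bad");
        int total = score(bag);
        System.out.println(bag + " score=" + total + " polarity=" + polarity(total));
    }
}
```

With the stand-in lists, the sample sentence reproduces the worked example: four negative matches and one positive match give an overall score of -3, which is tagged negative.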


Problem #3

Semantic Analysis

1. For this task, consider the processed news collection that you created in Assignment 3.

2. Use the following steps to compute TF-IDF (term frequency-inverse document frequency)

a. Suppose, you have 50 news articles (description or content only) that are stored in 50 JSON arrays. You need to consider these data points as the total number of documents (N). In this case N=50

Now use the search queries “Canada”, “Moncton”, and “Toronto”, and count how many documents each of these words appears in.

Total Documents: 50

| Search Query | Documents containing term (df) | Total Documents (N) / number of documents term appeared (df) | Log10(N/df) |
| --- | --- | --- | --- |
| Canada | 30 | 50/30 | 0.221848749 |
| Moncton | 5 | 50/5 | 1 |
| Toronto | 10 | 50/10 | 0.698970004 |
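The document-frequency count and the Log10(N/df) column can be computed along the following lines. This is a hedged sketch: the class name `IdfSketch`, the whole-word and case-insensitive matching, and the four sample documents are assumptions made for the illustration, and the assignment's own matching rules may differ.

```java
class IdfSketch {
    // True if the term occurs as a whole word in the document.
    // Matching ignores case (an assumption, not a requirement from the handout).
    static boolean containsWord(String document, String term) {
        for (String token : document.split("[^A-Za-z]+")) {
            if (token.equalsIgnoreCase(term)) return true;
        }
        return false;
    }

    // df: the number of documents in which the term appears at least once.
    static int documentFrequency(String[] documents, String term) {
        int df = 0;
        for (String document : documents) {
            if (containsWord(document, term)) df++;
        }
        return df;
    }

    // Inverse document frequency as used in the table: log10(N / df).
    // A df of 0 would give Infinity, so callers should skip terms that never appear.
    static double idf(int totalDocuments, int df) {
        return Math.log10((double) totalDocuments / df);
    }

    public static void main(String[] args) {
        // Made-up documents standing in for the stored news articles.
        String[] documents = {
            "Canada is cold",
            "Toronto is in Canada",
            "Moncton is small",
            "Nothing relevant here"
        };
        for (String term : new String[] {"Canada", "Moncton", "Toronto"}) {
            int df = documentFrequency(documents, term);
            System.out.println(term + " df=" + df + " idf=" + idf(documents.length, df));
        }
        // The table's own numbers: N = 50 and df = 30, 5, 10.
        System.out.println(idf(50, 30) + " " + idf(50, 5) + " " + idf(50, 10));
    }
}
```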

b. Once you have built the above table, you need to find which document has the highest occurrence of the word “Canada”. You can find this by performing a frequency count of the word per document.

Term: Canada

| Canada appeared in 30 documents | Total Words (m) | Frequency (f) |
| --- | --- | --- |
| Article #1 | 6 | 2 |
| Article #2 | 10 | 1 |
| : | : | : |
| Article #30 | 8 | 1 |

c. Programmatically print the news article that has the highest relative frequency. You can find this by computing (f/m) for each article.
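A sketch of the per-article frequency count and the f/m comparison is shown below. It rests on several assumptions: the class name `RelativeFrequencySketch`, the letters-only tokenizer, the case-insensitive matching, and the three sample sentences are all invented for the illustration. The sentences were written so that their word counts match the (m, f) pairs in the table: (6, 2), (10, 1), and (8, 1).

```java
class RelativeFrequencySketch {
    // m: total number of words in the document (runs of ASCII letters).
    static int totalWords(String document) {
        int m = 0;
        for (String token : document.split("[^A-Za-z]+")) {
            if (!token.isEmpty()) m++;
        }
        return m;
    }

    // f: how many times the term occurs in the document, ignoring case.
    static int frequency(String document, String term) {
        int f = 0;
        for (String token : document.split("[^A-Za-z]+")) {
            if (token.equalsIgnoreCase(term)) f++;
        }
        return f;
    }

    // Index of the document with the highest f/m, or -1 if the term never appears.
    // On a tie the earliest document wins (an arbitrary choice for this sketch).
    static int indexOfHighestRelativeFrequency(String[] documents, String term) {
        int best = -1;
        double bestRatio = 0.0;
        for (int i = 0; i < documents.length; i++) {
            int m = totalWords(documents[i]);
            int f = frequency(documents[i], term);
            if (m == 0 || f == 0) continue;
            double ratio = (double) f / m;
            if (ratio > bestRatio) {
                bestRatio = ratio;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up articles whose (m, f) values mirror the table rows.
        String[] articles = {
            "Canada wins gold, Canada celebrates today",
            "Snow is expected across eastern Canada over the coming weekend",
            "Canada announces new funding for rural internet access"
        };
        int best = indexOfHighestRelativeFrequency(articles, "Canada");
        if (best >= 0) {
            System.out.println("Highest relative frequency: " + articles[best]);
        }
    }
}
```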

Assignment 4 Submission Format:

1) Compress all your reports/files into a single .zip file and give it a meaningful name.

You are free to choose any meaningful file name, preferably – BannerId_Lastname_firstname_5408_A4 but avoid generic names like assignment-4.

2) Submit your reports only in PDF format.

Please avoid submitting .doc/.docx and submit only the PDF version. You can merge all the reports into a single PDF or keep them separate. You should also include output (if any) and test cases (if any) in the PDF file in the proper format (e.g. tables, charts etc. as required).

3) Your executable code/script needs to be submitted on https://git.cs.dal.ca/
