CS Homework #2 Search Engine Solved

$30.00 $24.00

Brief Description In this homework, you will write a search engine. The search engines in real life, search hundreds of millions web pages to see if they have the words that you have typed, and they can do this for thousands of users at a given time. In order to do these searches really fast,…

5/5 – (2 votes)

You’ll get a: zip file solution

 

Description

5/5 – (2 votes)

Brief Description
In this homework, you will write a search engine. The search engines in real life, search hundreds of millions web pages to see if they have the words that you have typed, and they can do this for thousands of users at a given time. In order to do these searches really fast, search engines such as Google do a lot of what we call preprocessing of the pages they search; that is, they transform the contents of a web page (which for the purposes of this homework, we will assume to consist of only strings) into a structure that can be searched very fast.
In this homework, you are going to preprocess the documents provided to you. For each unique word, you will insert a node into your AVL Search Tree. Of course you will also keep track of the details such as the document name which the word appears in and the number of times the word appears in each document. So, you need to implement a templated AVL Search Tree first.

 

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

Program Flow
You need to get the number of documents that you need to preprocess first. Then, after getting the names of all documents from the console, you need to preprocess all the text in the documents. Only the alphabetical characters are considered, and the unique words need to be case insensitive (for example; “izmir” and “Izmir” are the same). The rest (such as digits or punctuation characters) will be ignored. For each unique word appearing in the text, you need to insert a new node into the tree. If the node is already there, you need to update the node with the information about this document. For example; you have preprocessed the “a.txt” document first and there is only one “the” word in this document. You need to insert “the” word into the AVL Search Tree. Then, while preprocessing the “b.txt” document, “the” word appears 4 times. So, you need to add this information ({“b.txt”, 4}) into the existing node in your BST and the node of “the” word becomes {{“a.txt”, 1}, {“b.txt”, 4}}.
After you preprocessed all the documents, you need to get a query from the console which consists of a line of strings (HINT: You may use getline(cin, line)). This line might consist of more than one word. Then, you need to output the document names which all words in the query appears including the number of times that each word appears in that document.
Hint:
struct DocumentItem {
string documentName;
int count;
};
struct WordItem {
string word;
// List of DocumentItem’s. In order to keep the documents
//you can again use the BST that you are going to implement.
};
template <class Key, class Value>
class AVLSearchTree
{

};

and then when you create a AVL Search Tree object it looks like this:
AVLSearchTree<string, WordItem *> myTree;
DEMO
a.txt:
Sabanci sabanci cs CS FENS Engineering Computer
b.txt:
Homeworks are so fun. WoW, What a story!!!
Sabanci is so cool. I love 2 CS CS CS CS

Sample Run 1:
Enter number of input files: 2
Enter 1. file name: a.txt
Enter 2. file name: b.txt
Enter queried words in one line: Sabanci
in Document a.txt, sabanci found 2 times.
in Document b.txt, sabanci found 1 times.

Sample Run 2:
Enter number of input files: 2
Enter 1. file name: a.txt
Enter 2. file name: b.txt
Enter queried words in one line: cs sabanci
in Document a.txt, cs found 2 times, sabanci found 2 times.
in Document b.txt, cs found 4 times, sabanci found 1 times.

Sample Run 3:
Enter number of input files: 2
Enter 1. file name: a.txt
Enter 2. file name: b.txt
Enter queried words in one line: fun computer Engineer
No document contains the given query

Sample Run 4:
Enter number of input files: 2
Enter 1. file name: a.txt
Enter 2. file name: b.txt
Enter queried words in one line: sabanci fens
in Document a.txt, fens found 1 times, sabanci found 2 times.

General Rules and Guidelines about Homeworks

The following rules and guidelines will be applicable to all homeworks, unless otherwise noted.

How to get help?
You may ask questions to TAs (Teaching Assistants) of CS300. Office hours of TAs are at the syllabus. Recitations will partially be dedicated to clarify the issues related to homework, so it is to your benefit to attend recitations.

What and Where to Submit
Please see the detailed instructions below/in the next page. The submission steps will get natural/easy for later homeworks.

Grading and Objections
Careful about the semi-automatic grading: Your programs will be graded using a semi-automated system. Therefore, you should follow the guidelines about input and output order; moreover, you should also use same prompts as given in the Sample Runs. Otherwise semi-automated grading process will fail for your homework, and you may get a zero, or in the best scenario you will lose points.

Grading:
• Late penalty is 10% off the full grade and only two late days are allowed.
• Having a correct program is necessary, but not sufficient to get the full grade. Comments, indentation, meaningful and understandable identifier names, informative introduction and prompts, and especially proper use of required functions, unnecessarily long program (which is bad) and unnecessary code duplications will also affect your grade.
• Please submit your own work only (even if it is not working). It is really easy to find out “similar” programs!
• For detailed rules and course policy on plagiarism, please check out

 

http://myweb.sabanciuniv.edu/gulsend/courses/cs201/plagiarism/

Grade announcements: Grades will be posted in SUCourse, and you will get an Announcement at the same time. You will find the grading policy and test cases in that announcement.

Grade objections: It is your right to object to your grade if you think there is a problem, but before making an objection please try the steps below and if you still think there is a problem, contact the TA that graded your homework from the email address provided in the comment section of your announced homework grade or attend the specified objection hour in your grade announcement.
• Check the comment section in the homework tab to see the problem with your homework.
• Download the .zip file you submitted to SUCourse and try to compile it.
• Check the test cases in the announcement and try them with your code.
• Compare your results with the given results in the announcement.
What and where to submit (IMPORTANT)

Submissions guidelines are below. Most parts of the grading process are automatic. Students are expected to strictly follow these guidelines in order to have a smooth grading process. If you do not follow these guidelines, depending on the severity of the problem created during the grading process, 5 or more penalty points are to be deducted from the grade.

Add your name to the program: It is a good practice to write your name and last name somewhere in the beginning program (as a comment line of course).

Name your submission file:
• Use only English alphabet letters, digits or underscore in the file names. Do not use blank, Turkish characters or any other special symbols or characters.
• Name your cpp file that contains your program as follows.
“SUCourseUserName_yourLastname_yourName_HWnumber.cpp”
• Your SUCourse user name is actually your SUNet username which is used for checking sabanciuniv e-mails. Do NOT use any spaces, non-ASCII and Turkish characters in the file name. For example, if your SUCourse user name is cago, name is Çağlayan, and last name is Özbugsızkodyazaroğlu, then the file name must be:
cago_ozbugsizkodyazaroglu_caglayan_hw2.cpp
• Do not add any other character or phrase to the file name.
• Make sure that this file is the latest version of your homework program.
• You need to submit ALL .cpp and .h files in addition to your main.cpp in your VS solution.

Submission:
• Submit via SUCourse ONLY! You will receive no credits if you submit by other means
(e-mail, paper, etc.).
1. Click on “Assignments” at CS300 SUCourse.
2. Click Homework 2 in the assignments list.
3. Click on “Add Attachments” button.
4. Click on “Browse” button and select the zip file that you generated.
5. Now, you have to see your zip file in the “Items to attach” list.
6. Click on “Continue” button.
7. Click on “Submit” button. We cannot see your homework if you do not perform this step even if you upload your file.

Resubmission:
• After submission, you will be able to take your homework back and resubmit. In order to resubmit, follow the following steps.
1. Click on “Assignments” at CS300 SUCourse.
2. Click Homework 2 in the assignments list.
3. Click on “Re-submit” button.
4. Click on “Add/remove Attachments” button
5. Remove the existing zip file by clicking on “remove” link. This step is very important. If you don’t delete the old zip file, we get both files and the old one may be graded.
6. Click on “Browse” button and select the new zip file that you want to resubmit.
7. Now, you have to see your new zip file in the “Items to attach” list.
8. Click on “Continue” button.
9. Click on “Submit” button. We cannot see your homework if you do not perform this step even if you upload your file.

Successful submission is one of the requirements of the homework. If, for some reason, you cannot successfully submit your homework and we cannot grade it, your grade will be 0.

Good Luck!
Gülşen Demiröz

CS Homework #2 Search Engine Solved
$30.00 $24.00