Description
Consider the New York Times best seller dataset (https://www.kaggle.com/cmenca/new-york-times-hardcover-fiction-best-sellers/). Import the dataset file, nyt2.json, into your MongoDB database.
Write a MongoDB operation (find, aggregate, update, etc.) for each of the following questions:
- Find the number of books whose title contains “history” (case insensitive).
- Find how many books were ranked number 1 (according to the overall “rank” value).
- Find the highest price among the books whose overall rank is in the top 20 (rank value from 1 to 20). (Hint: use aggregate() with a $match stage to select the qualifying books, then a $group stage with _id set to null to compute the maximum price across all matched books.)
- Find, for each publisher, the number of books it published. (List 20 of the query results in the .txt file.)
- Find the publishers who published at least 10 books. (List 20 of the query results in the .txt file.)
- Find the number of distinct publishers.
- Find the titles of books that were published by “Harper” and appeared in the best-seller list at least 5 times. Output the titles in descending order. (List the first 10 of the query results in the .txt file.)
- Find the average price of books published by “Harper”.
- Find the most productive authors (i.e., the authors who published the largest number of books).
- Change the price of the book “Breathless” to 28.5. (Paste the response returned by the update in the .txt file.)
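The operation shapes mentioned above (a case-insensitive title match, the $match plus $group hint, a per-publisher count, and a $set update) can be sketched as plain JavaScript objects. This is only an illustrative sketch, not a reference solution. It assumes a collection named books (a hypothetical name; use whatever name you chose at import time) and documents with fields called title, rank, price, and publisher. Check the imported documents before relying on these names, and note that if the same title appears in several documents, counting documents is not the same as counting distinct books.

```javascript
// ASSUMPTIONS: the collection name and all field names below are
// illustrative; verify them against your imported data first.
const COLLECTION = "books"; // hypothetical collection name

// Case-insensitive substring match on the title field.
const titleFilter = { title: { $regex: "history", $options: "i" } };

// $match keeps documents with rank 1..20; $group with _id: null
// collapses all matched documents into one group and takes the max price.
const maxPricePipeline = [
  { $match: { rank: { $gte: 1, $lte: 20 } } },
  { $group: { _id: null, maxPrice: { $max: "$price" } } },
];

// One group per publisher, counting the documents in each group,
// then keeping 20 groups for the result file.
const perPublisherPipeline = [
  { $group: { _id: "$publisher", count: { $sum: 1 } } },
  { $limit: 20 },
];

// Update shape: exact-match filter plus a $set modifier.
// The stored casing of the title may differ from the one written here.
const breathlessFilter = { title: "Breathless" };
const priceUpdate = { $set: { price: 28.5 } };

// The global `db` exists only inside mongosh, so this part is skipped
// when the file is run with plain Node.js.
if (typeof db !== "undefined") {
  const coll = db.getCollection(COLLECTION);
  print(coll.countDocuments(titleFilter));
  printjson(coll.aggregate(maxPricePipeline).toArray());
  printjson(coll.aggregate(perPublisherPipeline).toArray());
  printjson(coll.updateMany(breathlessFilter, priceUpdate));
}
```

Keeping the filters and pipelines in named constants makes it easy to paste each one, together with its output, into the matching .txt file.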
Submit a zip file that contains your queries and results.
Submission
- Name your files <firstname>_<lastname>_q1.txt, <firstname>_<lastname>_q2.txt, and so on.
- Submit all .txt files in a zip file with the naming convention: <firstname>_<lastname>_lab3.zip. (example content of .txt files)