Think Summer: Project 4 — 2023
Submission
Students need to submit the following file by 10:00PM EST through Gradescope inside Brightspace.
-
A Jupyter notebook (a
.ipynb
file).
We’ve provided you with a template notebook for you to use. Please carefully read this section to get started.
When you are finished with the project, please make sure to run every cell in the notebook prior to submitting. To do this click |
Questions
Question 1
For the show The Handmaid’s Tale (title_id
is tt5834204), there are 4 seasons listed in the IMDB database. Find the average rating of each of the four seasons. Hint: Use AVG
for find the average, and use GROUP BY
the season_number
.
-
SQL used to solve this problem. (2 pts)
-
Output from running SQL. (1 pt)
Question 2
Identify the six most popular episodes of the show Grey’s Anatomy (where "popular" denotes a high rating).
-
SQL used to solve this problem. (2 pts)
-
Output from running SQL. (1 pt)
Question 3
Make a dotchart in R showing the results of the previous question.
Hint: You can use your work from SQL, and export the results to a dataframe called myDF
in R. Then you can use something like:
# use a dbGetQuery here, to import the SQL results to R, and then
myresults <- myDF$rating
names(myresults) <- myDF$primary_title
dotchart(myresults)
-
R used to solve this problem. (2 pts)
-
Output from running R. (1 pt)
Question 4
Make a dotchart showing the total amount of money donated in each of the top 10 states, during the 2000 federal election cycle.
-
R used to solve this problem. (2 pts)
-
Output from running R. (1 pt)
Question 5
Make a dotchart that shows how many movies premiered in each year. Now make another dotchart, which shows the same data (i.e., how many movies premiered each year) but only since the year 2000.
-
R used to solve this problem. (2 pts)
-
Output from running R. (1 pt)
Question 6
Among the three big New York City airports (JFK
, LGA
, EWR
), which of these airports had the worst DepDelay
(on average) in 2005? (Can you solve this with 1 line of R, using a tapply
(rather than using 3 separate lines of R)? Hint: After you run the tapply
, you can index your results using [c("JFK", "LGA", "EWR")]
to lookup all 3 airports at once.)
-
R used to solve this problem. (2 pts)
-
Output from running R. (1 pt)
Question 7
LIKE
is a very powerful tool. You can read about SQLite’s version of LIKE
here. Use LIKE
to analyze the primary_title
of all IMDB titles: First determine how many titles have Batman
anywhere in the title, and then determine how many titles have Superman
anywhere in the title? Which one occurs more often?
-
SQL used to solve this problem. (2 pts)
-
Output from running SQL. (1 pt)
Question 8
How much money was donated during the 2000 federal election cycle by people who have PURDUE
listed somewhere in their employer name? How much money was donated by people who have MICROSOFT
listed somewhere in their employer name? Hint: You might use the grep
or the grepl
(which is a logical grep) to solve this one.
-
R used to solve this problem. (2 pts)
-
Output from running R. (1 pt)