5.6. Dealing with Multiple DataFrames¶
Forget about budget or runtimes as criteria for selecting a movie, let’s take a
look at popular opinion. Our dataset has two relevant columns: vote_average
and vote_count
.
Let’s create a variable called df_high_rated
that only contains movies that
have received more than 20 votes, and whose average score is greater than 8.
import pandas as pd
df = pd.read_csv("https://runestone.academy/ns/books/published/httlads/_static/movies_metadata.csv").dropna(axis=1, how='all')
df_highly_voted = df[df.vote_count > 20]
df_high_rated = df_highly_voted[df_highly_voted.vote_average > 8]
df_high_rated[['title', 'vote_average', 'vote_count']].head()
title | vote_average | vote_count | |
---|---|---|---|
46 | Se7en | 8.1 | 5915.0 |
49 | The Usual Suspects | 8.1 | 3334.0 |
109 | Taxi Driver | 8.1 | 2632.0 |
256 | Star Wars | 8.1 | 6778.0 |
289 | Leon: The Professional | 8.2 | 4293.0 |
Here we have some high-quality movies, at least according to some people.
But what about my opinion?
Here are my favorite movies and their relative scores. Create a DataFrame called
compare_votes
that contains the title as an index and both the
vote_average
and my_vote
as its columns. Also, only keep the movies that
are both my favorites and popular favorites.
Hint: You’ll need to create two Series, one for my ratings and one that maps titles to vote_average.
my_votes = {
"Star Wars": 9,
"Paris is Burning": 8,
"Dead Poets Society": 7,
"The Empire Strikes Back": 9.5,
"The Shining": 8,
"Return of the Jedi": 8,
"1941": 8,
"Forrest Gump": 7.5,
}
There should be only 6 movies remaining.
Now add a column to compare_votes
that measures the percentage difference
between the popular rating and my rating for each movie. You’ll need to take the
difference between the vote_average
and my_vote
and divide it by
my_vote
.
compare_votes
Q-3: Make up 3 questions you would like to answer about this movie data using the techniques you have learned in this lesson and write them in the box.
Q-4: Summarize the answers to your questions here.
Lesson Feedback
-
During this lesson I was primarily in my...
- 1. Comfort Zone
- 2. Learning Zone
- 3. Panic Zone
-
Completing this lesson took...
- 1. Very little time
- 2. A reasonable amount of time
- 3. More time than is reasonable
-
Based on my own interests and needs, the things taught in this lesson...
- 1. Don't seem worth learning
- 2. May be worth learning
- 3. Are definitely worth learning
-
For me to master the things taught in this lesson feels...
- 1. Definitely within reach
- 2. Within reach if I try my hardest
- 3. Out of reach no matter how hard I try