This is the first project in Udacity's Nanodegree Program for Data Analysis. In this project, I analyzed a dataset and produced a report that outlines the results for sharing. I started by exploring the dataset and brainstorming questions it could answer. Then I used NumPy and pandas to answer the questions that interested me most and wrote up the findings. (Finalized on 14 December 2022.)
This dataset contains information about 10,000 movies collected from The Movie Database (TMDb), including user ratings and revenue. It has 21 columns. The columns that I used in my analysis are:
- popularity: The movie's popularity
- budget: The movie's budget
- revenue: The movie's revenue
- original_title: The movie's title
- cast: The movie's cast
- director: The movie's director
- keywords: The movie's keywords
- runtime: The movie's runtime
- genres: The movie's genres
- production_companies: The movie's production companies
- vote_count: The movie's vote count
- vote_average: The movie's average vote
- release_year: The movie's release year
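As a minimal sketch of how this column subset could be selected with pandas: the helper name `load_movies` and the two sample rows below are made up for illustration, and the real CSV file (whose name is not given here) would be passed in place of the in-memory sample.

```python
import io

import pandas as pd

# Columns used in the analysis, named as in the list above.
USED_COLUMNS = [
    "popularity", "budget", "revenue", "original_title", "cast", "director",
    "keywords", "runtime", "genres", "production_companies",
    "vote_count", "vote_average", "release_year",
]


def load_movies(source):
    """Read a CSV (path or file-like object) and keep only the used columns."""
    return pd.read_csv(source, usecols=USED_COLUMNS)


# Placeholder data standing in for the real file; values are NOT from TMDb.
# The "|" separator inside cast/keywords/genres is an assumption about the format.
sample_csv = io.StringIO(
    "id,popularity,budget,revenue,original_title,cast,director,keywords,"
    "runtime,genres,production_companies,vote_count,vote_average,release_year\n"
    "1,1.5,100,250,Movie A,Actor X|Actor Y,Director P,kw1|kw2,90,Drama|War,"
    "Studio M,10,7.0,1995\n"
    "2,0.5,50,20,Movie B,Actor Y,Director Q,kw2,100,Comedy,Studio N,5,5.5,2004\n"
)

movies = load_movies(sample_csv)
print(movies.dtypes)
```

Passing `usecols` at read time drops the unused columns before they are loaded, which keeps the working DataFrame small.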
I asked the following questions:
- How many movies were released each decade?
- Which actors have the most appearances in movies?
- Which actors are most frequent in the top 500 movies?
- Which directors directed the most movies?
- Which directors are most frequent in the top 500 movies?
- Which production companies made the most movies?
- What are the Top 10 most frequent genres?
- Which genres are most popular?
- Which genres have the highest rating?
- Which 3 genres are most popular from decade to decade?
- What are the most popular keywords?
- What are the Top 10 movies with the highest budget?
- What are the Top 10 movies with the highest revenue?
- What are the Top 10 most popular movies?
- What are the Top 10 movies by average vote?
- What are the relationships between revenue, budget, popularity and average vote?
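Most of these questions reduce to a few pandas patterns: bucketing years into decades, counting values in multi-valued columns, ranking by a numeric column, and averaging a metric per genre. The sketch below illustrates those patterns under the assumption that multi-valued columns such as `cast` and `genres` store their entries separated by `|`. The helper names and the sample rows are hypothetical, not the code or data from the actual analysis.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from the real dataset.
movies = pd.DataFrame({
    "original_title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "cast": ["Actor X|Actor Y", "Actor Y", "Actor X|Actor Z", "Actor Y|Actor Z"],
    "genres": ["Drama|War", "Comedy", "Drama", "Drama|Comedy"],
    "popularity": [1.5, 0.5, 3.0, 2.0],
    "revenue": [250, 20, 900, 400],
    "release_year": [1995, 2004, 2008, 2015],
})


def movies_per_decade(df):
    """Count movies per decade (1995 -> 1990, 2004 -> 2000, ...)."""
    decade = (df["release_year"] // 10) * 10
    return decade.value_counts().sort_index()


def count_values(df, column, sep="|"):
    """Count how often each entry of a separator-delimited column occurs."""
    return df[column].dropna().str.split(sep).explode().value_counts()


def top_n(df, column, n=10):
    """Return the n movies with the largest value in a numeric column."""
    return df.nlargest(n, column)[["original_title", column]]


def mean_by_genre(df, metric, sep="|"):
    """Average a numeric metric per genre, one row per (movie, genre) pair."""
    exploded = df.assign(genre=df["genres"].str.split(sep)).explode("genre")
    return exploded.groupby("genre")[metric].mean().sort_values(ascending=False)


per_decade = movies_per_decade(movies)
actor_counts = count_values(movies, "cast")
top_revenue = top_n(movies, "revenue", n=2)
genre_popularity = mean_by_genre(movies, "popularity")
print(per_decade, actor_counts, top_revenue, genre_popularity, sep="\n\n")
```

The same `count_values` helper would apply to `director`, `production_companies`, and `keywords`, and `mean_by_genre` can take `vote_average` instead of `popularity` for the rating question.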
The analysis led to the following findings:
- The number of movies released increases year over year.
- Robert De Niro and Samuel L. Jackson have appeared in more films than any other actors, with over 40 films under their belts.
- Woody Allen directed more films than any other director in the dataset.
- The studios that produced the most films were Warner Bros., Universal Pictures, and Paramount Pictures.
- Genre popularity shifts from decade to decade.
- Adventure, Science Fiction, Fantasy, Action, and Animation are the most popular genres.
- The five highest-rated genres are War, Music, Animation, Documentary, and History.
- Avatar, Star Wars: The Force Awakens, Titanic, The Avengers, and Jurassic World are the top-grossing films.
- Jurassic World, Mad Max, Interstellar, Guardians of the Galaxy, and Insurgent are the most popular films.
- There is a high correlation between budget and revenue, and between popularity and revenue. Movies with larger production budgets therefore tend to earn more revenue, although correlation alone does not show that higher spending causes that success.
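The relationships in the last finding can be checked with a correlation matrix. The following is a minimal sketch using `DataFrame.corr` with Pearson correlation, which is an assumption about the method since the analysis does not name one. The numbers are placeholders chosen for illustration, not values from the dataset.

```python
import pandas as pd

# Placeholder values for illustration only; not taken from the real dataset.
movies = pd.DataFrame({
    "budget": [10, 20, 30, 40, 50],
    "revenue": [25, 35, 70, 80, 120],
    "popularity": [0.5, 0.9, 1.4, 1.6, 2.5],
    "vote_average": [6.1, 5.4, 7.0, 6.3, 6.8],
})

NUMERIC = ["revenue", "budget", "popularity", "vote_average"]

# Pairwise Pearson correlation between the four numeric columns.
corr = movies[NUMERIC].corr(method="pearson")
print(corr.round(2))

# Correlations with revenue, strongest first (revenue itself excluded).
revenue_corr = corr["revenue"].drop("revenue").sort_values(ascending=False)
print(revenue_corr)
```

A value close to 1 indicates a strong positive linear relationship; it says nothing about which variable, if either, drives the other.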