Video Demo General Script
Video Demo Rough Transcript
- Introduction to the Problem
- Gather Film and Actor data from IMDb and Box Office Mojo
- Following Features:
- FILM SPECIFIC
- weekday (1-7)
- day
- month
- budget
- length
- mpaa_num (converted from strings to ints)
- ACTOR SPECIFIC
- avg_actor_age
- max_actor_film_revenue
- avg_actor_film_revenue
- max_actor_film_votes
- avg_actor_film_votes
- max_actor_film_stars
- avg_actor_film_stars
- max_actor_film_appearances
- avg_actor_film_appearances
- max_actor_film_metascore
- avg_actor_film_metascore
- DIRECTOR SPECIFIC
- director_age
- director_number_of_films
- max_director_film_revenue
- avg_director_film_revenue
- max_director_film_votes
- avg_director_film_votes
- max_director_film_stars
- avg_director_film_stars
- max_director_film_metascore
- avg_director_film_metascore
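The actor-specific features above can be sketched as a small aggregation step. This is a minimal sketch, not the project's actual code: the record layout (`age`, `films`, and the per-film keys) is hypothetical, and it assumes each `max_`/`avg_` feature is taken across the cast over each actor's own per-film average, which the outline does not spell out.

```python
from statistics import mean

# Per-film fields summarized for each actor (names are assumptions).
FILM_FIELDS = ("revenue", "votes", "stars", "metascore")


def summarize_cast(actors, field):
    """Return (max, avg) across the cast of each actor's mean value for `field`."""
    per_actor = [
        mean(film[field] for film in actor["films"])
        for actor in actors
        if actor["films"]  # skip actors with no earlier films
    ]
    if not per_actor:
        return None, None
    return max(per_actor), mean(per_actor)


def actor_features(actors):
    """Build the ACTOR SPECIFIC feature dict for one film's cast."""
    features = {"avg_actor_age": mean(actor["age"] for actor in actors)}
    for field in FILM_FIELDS:
        highest, average = summarize_cast(actors, field)
        features[f"max_actor_film_{field}"] = highest
        features[f"avg_actor_film_{field}"] = average
    # Appearances are counted as the number of earlier films per actor.
    appearances = [len(actor["films"]) for actor in actors]
    features["max_actor_film_appearances"] = max(appearances)
    features["avg_actor_film_appearances"] = mean(appearances)
    return features
```

The director-specific features could reuse the same helper with a one-element cast, though that is only one possible design.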
- Use Python to scrape the data
- Broken into two steps
- Scraping
- Aggregation
- Use MongoDB to save the objects (films and actors)
- Use R to build the multiple regression model
- Scraping Process
- Get list of movies from Box Office Mojo alphabetical
- Use name and ~year~ to match to IMDb
- Gather Film information
- Gather Actor information
- Ran into many problems and had to build my own data structures
- Had to restart whenever it failed and the code needed to be changed
- Had to move to a remote machine
- Used 10 GB of RAM and crashed the program
- Now runs at about 200 MB
- Python Code
- SetQueues
- QueueConsumers
- Find IMDb ID
- Scrape IMDb
- Scrape Actors
- Save
- ^ allowed me to restart from any point
- Film
- Actor
- Improvements
- Error handling
- Multiprocessing
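The SetQueues and QueueConsumers idea can be illustrated with a short sketch. It is an assumption about the design, not the original code: `SetQueue` here is a queue that refuses duplicates, and `consume` is a hypothetical helper that drains one stage's queue and forwards results to the next. Giving every stage (Find IMDb ID, Scrape IMDb, Scrape Actors, Save) its own queue is what would let a run restart from any point.

```python
import threading
from collections import deque


class SetQueue:
    """FIFO queue that silently ignores items it has already seen."""

    def __init__(self, items=()):
        self._lock = threading.Lock()  # guards state if threads are added later
        self._items = deque()
        self._seen = set()
        for item in items:
            self.put(item)

    def put(self, item):
        """Add `item` unless it was queued before; return True if added."""
        with self._lock:
            if item in self._seen:
                return False
            self._seen.add(item)
            self._items.append(item)
            return True

    def get(self):
        """Return the next item, or None when the queue is empty."""
        with self._lock:
            return self._items.popleft() if self._items else None

    def __len__(self):
        with self._lock:
            return len(self._items)


def consume(source, handler, sink=None):
    """Drain `source`, apply `handler`, and pass non-None results to `sink`."""
    processed = 0
    while True:
        item = source.get()
        if item is None:  # items themselves must never be None
            break
        result = handler(item)
        if sink is not None and result is not None:
            sink.put(result)
        processed += 1
    return processed
```

To resume after a failure under this design, only the queue of the stage that failed needs to be refilled; earlier stages do not have to run again.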
- R Code
- Linking to the database
- Converting R values
- Check for outliers (currently nothing is done about them)
- Create the model
- Used backwards fitting (backward elimination of features)
- Insert features removed here
- Test using a 50/50 split and a 30/70 split
- Average error for both
- Need to make two models depending on the film budget
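The split-and-score step above can be sketched as follows. The project does this in R; the sketch is written in Python only to stay consistent with the other examples. It assumes the splits are train/test fractions and that "average error" means mean absolute error, neither of which the outline confirms. The function names and the one-feature stand-in model are hypothetical.

```python
import random


def split_rows(rows, train_fraction, seed=0):
    """Shuffle a copy of `rows` and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for repeatable splits
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def average_error(predict, test_rows):
    """Mean absolute difference between predicted and actual revenue."""
    total = sum(abs(predict(row) - row["revenue"]) for row in test_rows)
    return total / len(test_rows)


# Toy data standing in for the scraped films.
films = [{"budget": b, "revenue": 2 * b} for b in range(1, 11)]

for fraction in (0.5, 0.3):
    train, test = split_rows(films, fraction)
    # Stand-in for the regression model: revenue as a fixed multiple of budget.
    ratio = sum(r["revenue"] for r in train) / sum(r["budget"] for r in train)
    error = average_error(lambda row, k=ratio: k * row["budget"], test)
    print(f"train={len(train)} test={len(test)} average error={error}")
```

Fitting two models by budget, as the last point suggests, would amount to filtering `films` by a budget threshold before calling `split_rows` on each group.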
- Python Interface
- Run the script from terminal
- Get results
- Good/Bad?
- Documentation
- GitHub
- Blog (details on planning, working through problems, improvements etc.)
- R Notebook