Ready, Set, Go!
It's nearing the end of the semester, which means it is now time to start the final project for DS4100. This will be post numero uno in a multi-part blog series following my struggle.
To kick things off, here is the rough outline for my project.
Predict the Gross Revenue for an Upcoming Film
The plan is a multiple regression model that takes the IMDb page for an upcoming movie and predicts its financial performance. The data lives on a couple of different websites, each with its own method of searching, and neither offers API access. I plan to use Python to scrape and parse the HTML and put the results into Mongo, in two collections: Films and Actors. Then I plan to use R to pull the information into a data frame and generate a model. Finally, I will use Python to create an interface between the user and the R model.
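To make the scraping step a bit more concrete, here is a minimal sketch of parsing a film page into a dictionary shaped like a Mongo document. Everything specific in it is an assumption for illustration: the `data-field` attributes, the sample HTML, and the field names are made up, and the real IMDb and Box Office Mojo pages will need their own selectors. It only uses Python's standard library `html.parser`.

```python
from html.parser import HTMLParser


class FilmPageParser(HTMLParser):
    """Collect text from elements that carry a data-field attribute.

    The data-field attribute is hypothetical markup, not what IMDb uses.
    """

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None  # name of the field whose text we are reading

    def handle_starttag(self, tag, attrs):
        field = dict(attrs).get("data-field")
        if field:
            self._current = field

    def handle_data(self, data):
        text = data.strip()
        if self._current and text:
            self.fields[self._current] = text
            self._current = None

    def handle_endtag(self, tag):
        # Stop waiting for text once the element closes (handles empty elements).
        self._current = None


def parse_film(html):
    """Turn raw HTML into a dict that could be stored in a Films collection."""
    parser = FilmPageParser()
    parser.feed(html)
    parser.close()
    raw = parser.fields
    budget = raw.get("budget")
    runtime = raw.get("runtime")
    return {
        "title": raw.get("title"),
        "runtime_min": int(runtime) if runtime else None,
        "mpaa": raw.get("mpaa"),
        # Strip currency formatting so the value is usable in a regression.
        "budget_usd": int(budget.replace("$", "").replace(",", "")) if budget else None,
    }


# Made-up page fragment standing in for a scraped film page.
SAMPLE_HTML = """
<div>
  <h1 data-field="title">Example Film</h1>
  <span data-field="runtime">112</span>
  <span data-field="mpaa">PG-13</span>
  <span data-field="budget">$40,000,000</span>
</div>
"""

film = parse_film(SAMPLE_HTML)
print(film)
```

With a running Mongo instance and the `pymongo` package, a dictionary like `film` could then be stored with something along the lines of `collection.insert_one(film)`. The database and collection names are still to be decided.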
Data is from IMDb and Box Office Mojo:
- Movie rating (critics + people)
- Number of votes
- Length of film
- Film MPAA rating
- Film Budget
- Known movies' revenue
- Release month and day of week
- Number in the series
- Average revenue for series
- Actors/Directors
  - Male or Female
  - Number of films
  - Quality of films?
  - Total revenue of films?
  - Average film rating?
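A couple of these features are derived rather than scraped directly, so here is a rough sketch of how they might be computed. It assumes each film is a dictionary with hypothetical `rating` and `revenue` keys. The function names are placeholders, and the final modeling is still planned for R, so this only illustrates the idea.

```python
from datetime import date
from statistics import mean

# Fixed names so the result does not depend on the machine's locale.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def release_features(release):
    """Split a release date into the month and day-of-week features."""
    return {
        "release_month": release.month,
        "release_weekday": WEEKDAYS[release.weekday()],
    }


def actor_summary(films):
    """Aggregate an actor's or director's past films into summary features."""
    rated = [f["rating"] for f in films if f.get("rating") is not None]
    return {
        "num_films": len(films),
        "total_revenue": sum(f.get("revenue", 0) for f in films),
        "avg_rating": mean(rated) if rated else None,
    }


# Made-up numbers, purely for illustration.
past_films = [
    {"rating": 7.0, "revenue": 100},
    {"rating": 8.0, "revenue": 300},
]
print(release_features(date(2015, 12, 18)))
print(actor_summary(past_films))
```

The same aggregation would apply to the series features (number in the series, average revenue for the series), just grouped by series instead of by person.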