Improvements for Future Updates

A list of improvements I've been updating as I find mistakes or make them on purpose.

True multiprocessing (not just Python threads, which the GIL keeps from running CPU-bound work in parallel)
Drop the requests package, since its calls are blocking and it has no built-in support for concurrent requests
Find a way to get screen time for actors and weight actors accordingly
A better error-handling strategy than a top-level try-except in QueueConsumer (I added it so I could sleep peacefully knowing the scraper would keep running overnight)
Use IMDb's advanced search instead of the regular one to get better results
Exclude actors up front, perhaps by checking their profile photo first, to avoid querying every actor's page only to have it fail
More tests to verify that all scraped values are correct
Make two models depending on the budget

I ended up making three. It helped a little, but not much; the next item is a more worthwhile improvement.

Scrape films without revenue information too, since they still matter for aggregate values such as an actor's average metascore.
Normalize more than just the budget field
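For the multiprocessing and error-handling items, here is a minimal sketch using the standard library's concurrent.futures. It assumes a hypothetical `scrape_one(url)` function that returns a dict of scraped values; `scrape_all` is also a hypothetical name and not the project's actual QueueConsumer. Each task's exception is caught and recorded against its own URL, so one bad page gets logged instead of disappearing into a single top-level try-except.

```python
import concurrent.futures
from typing import Callable, Iterable


def scrape_all(
    urls: Iterable[str],
    scrape_one: Callable[[str], dict],  # hypothetical per-page scraper
    executor_cls: type = concurrent.futures.ProcessPoolExecutor,
    max_workers: int = 4,
) -> tuple[dict[str, dict], dict[str, str]]:
    """Run scrape_one over every URL in a worker pool.

    Returns (results, errors): successful pages keyed by URL, and a
    readable error message for every URL whose task raised.
    """
    results: dict[str, dict] = {}
    errors: dict[str, str] = {}
    with executor_cls(max_workers=max_workers) as pool:
        # Remember which URL each future belongs to.
        futures = {pool.submit(scrape_one, url): url for url in urls}
        for future in concurrent.futures.as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                # The failure is scoped to this one URL; the loop keeps going.
                errors[url] = f"{type(exc).__name__}: {exc}"
    return results, errors
```

With the default ProcessPoolExecutor, `scrape_one` has to be a top-level (picklable) function, and the call should sit under `if __name__ == "__main__":`. Passing `concurrent.futures.ThreadPoolExecutor` instead is a reasonable choice while the work is mostly waiting on the network, since threads do overlap I/O waits even under the GIL.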
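The list doesn't say how the budget field is normalized, so this sketch assumes simple min-max scaling to [0, 1], applied to several numeric fields at once. Field names such as `budget` and `runtime` are hypothetical. Missing values are left as None, so films without revenue data can stay in the dataset for other aggregates.

```python
from typing import Iterable


def min_max_normalize(rows: list[dict], fields: Iterable[str]) -> list[dict]:
    """Return copies of rows with each listed field rescaled to [0, 1].

    None values are left untouched; the input rows are not modified.
    """
    out = [dict(row) for row in rows]
    for field in fields:
        values = [row[field] for row in rows if row.get(field) is not None]
        if not values:
            continue  # nothing to scale for this field
        lo, hi = min(values), max(values)
        span = hi - lo
        for row in out:
            value = row.get(field)
            if value is None:
                continue
            # A constant column carries no information; map it to 0.0.
            row[field] = (value - lo) / span if span else 0.0
    return out
```

Min-max scaling is sensitive to outliers, and budgets tend to be heavily skewed, so a log transform or z-scores may suit some fields better; that is worth testing rather than assuming.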

Create a util file that generates an index for Box Office Mojo:

The first data structure will be a dict mapping word -> set(titles), and the second a dict mapping title -> mojo_id.
Given an IMDb title, the util looks up each word of the title and intersects the resulting sets. It's not the best approach, but it's my plan until I come up with something better (it doesn't help that this is due in 5 hours).
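A minimal sketch of that word-intersection index, assuming the Box Office Mojo data arrives as (title, mojo_id) pairs; `tokenize`, `build_index`, and `lookup` are hypothetical names. Lower-casing titles and splitting them on non-word characters is also an assumption about what counts as a "word".

```python
import re
from collections import defaultdict
from typing import Iterable


def tokenize(title: str) -> list[str]:
    """Lower-case a title and split it into runs of word characters."""
    return re.findall(r"\w+", title.lower())


def build_index(
    pairs: Iterable[tuple[str, str]],
) -> tuple[dict[str, set[str]], dict[str, str]]:
    """Build word -> set(titles) and title -> mojo_id from (title, mojo_id) pairs."""
    word_index: defaultdict[str, set[str]] = defaultdict(set)
    title_to_id: dict[str, str] = {}
    for title, mojo_id in pairs:
        title_to_id[title] = mojo_id  # a duplicate title overwrites the earlier id
        for word in tokenize(title):
            word_index[word].add(title)
    return dict(word_index), title_to_id


def lookup(
    imdb_title: str,
    word_index: dict[str, set[str]],
    title_to_id: dict[str, str],
) -> list[str]:
    """Return mojo_ids whose titles contain every word of imdb_title."""
    words = tokenize(imdb_title)
    if not words:
        return []
    # Intersect the title sets for each word; an unknown word yields no matches.
    candidates = set.intersection(*(word_index.get(w, set()) for w in words))
    # Fewest words first, so an exact title match ranks ahead of longer titles.
    ranked = sorted(candidates, key=lambda t: (len(tokenize(t)), t))
    return [title_to_id[t] for t in ranked]
```

Because the intersection keeps every title containing all the query words, "Alien" also matches "Alien: Covenant"; ranking by word count puts the closest match first. Two films sharing a title would also collide in title -> mojo_id, which is another reason this is only a starting point.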

A Kamerican Blog