Implementing Python Scraper Day One
Live Decision Making:

Not going to do aggregate values for series, because not all films will be in a series and it will be difficult to differentiate. Only going to use the film's number in the series.

Need to remember that I can only use features that will be available for a movie that hasn't been released, so film rating is out.

Need to make the film's MPAA rating a number and not a string (gonna do this in R, because by then I will have collected all possible ratings).

There isn't an easy way to find the gender of an actor, so that will be left out.

...going back and forth from IMDb to Box Office Mojo is a pain :(

hmm... I may scrape the BOM alphabetical listing instead and run searches on IMDb to find each film, rather than go from IMDb to BOM.

Process would then be:
  Start scraping films from BOM and appending Film objects (__init__ with a mojo_id) to Queue1
  Loop pulls from Queue1 and tries to find the IMDb page
    If fails, film is ignored
    Else, set imdb_id and append to Queue2
  Start scraping non-a...
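A rough sketch of the first steps of that process (BOM to Queue1, then the IMDb lookup into Queue2), just to pin the idea down. This is not the actual scraper: the field names beyond mojo_id and imdb_id, the helper names fill_queue1 and match_imdb, and the find_imdb_id lookup are all hypothetical, and the real IMDb search is replaced by a function passed in, so nothing here touches the network.

```python
from dataclasses import dataclass
from queue import Empty, Queue
from typing import Callable, Iterable, Optional


@dataclass
class Film:
    """A film found on Box Office Mojo; imdb_id is filled in later."""
    mojo_id: str
    imdb_id: Optional[str] = None


def fill_queue1(mojo_ids: Iterable[str], queue1: Queue) -> None:
    """Stand-in for the BOM scrape: wrap each mojo_id in a Film and enqueue it."""
    for mojo_id in mojo_ids:
        queue1.put(Film(mojo_id))


def match_imdb(
    queue1: Queue,
    queue2: Queue,
    find_imdb_id: Callable[[Film], Optional[str]],
) -> int:
    """Drain Queue1, look each film up on IMDb, and pass matches to Queue2.

    find_imdb_id is a hypothetical lookup that returns an IMDb id or None.
    Returns the number of films ignored because no IMDb page was found.
    """
    ignored = 0
    while True:
        try:
            film = queue1.get_nowait()
        except Empty:
            break  # Queue1 is drained
        imdb_id = find_imdb_id(film)
        if imdb_id is None:
            ignored += 1  # lookup failed: film is ignored
            continue
        film.imdb_id = imdb_id
        queue2.put(film)
    return ignored


if __name__ == "__main__":
    # Made-up ids, for illustration only.
    fake_index = {"film_a": "tt0000001", "film_c": "tt0000003"}
    q1, q2 = Queue(), Queue()
    fill_queue1(["film_a", "film_b", "film_c"], q1)
    skipped = match_imdb(q1, q2, lambda film: fake_index.get(film.mojo_id))
    print(f"matched={q2.qsize()} ignored={skipped}")
```

Passing the lookup in as a function keeps the queue logic testable without hitting IMDb; the real version would swap in whatever search call ends up working.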