Getting Your Hands Dirty with Web Scraping

The web can be a trove of openly accessible data, but that data is not always available in a format that can be downloaded for analysis and reuse. This workshop introduces attendees to web scraping, a technique for automatically extracting data from websites. Part one will use browser extensions and web tools to get started with web scraping quickly, give examples of where this technique is useful, and introduce XPath queries for selecting elements on a page. Part two will show how to write a spider in Python with the Scrapy framework to follow hyperlinks and scrape multiple web pages. We will conclude with an overview of the legal aspects of web scraping and an open discussion. You don’t need to be a coder to enjoy this workshop! Anyone wishing to learn web scraping is welcome, although some familiarity with HTML will help. Part two requires some experience with Python; attendees unfamiliar with the language are welcome to stay for part one only and still learn useful web scraping skills!

Speaker(s)

Kim Pham

Thomas Guignard

March 6th

1:30-4:30

Room: Luskin Center Pinnacle room (Level 1)