Web Scraping

Frameworks

  • BeautifulSoup
  • MechanicalSoup
  • Scrapy
  • Photon → crawler that collects URLs, files, and specific data (emails, social network accounts)
  • Puppeteer

Puppeteer

Node.js library that drives a Chromium instance.

  • automate tasks (forms, data monitoring)
  • browse web pages (tests, scraping)
  • take screenshots or export web pages to PDF
  • capture a timeline trace of a site to diagnose performance problems
  • test Chrome extensions
  • can display the browser window (non-headless mode) to follow the navigation
  • Chrome DevTools → the Recorder panel records a navigation and exports it as Puppeteer code.
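
Several of the capabilities listed above (trace, screenshot, PDF export, visible window) fit in a few lines. What follows is a minimal sketch rather than a definitive script: it assumes Puppeteer was installed with `npm install puppeteer`, and the function names capturePage and main, the URL https://example.com, and the output file names are placeholders chosen for illustration.

```javascript
const path = require("node:path");

// Hypothetical helper: works on any object exposing the Puppeteer Page
// methods used below (tracing, goto, screenshot, pdf, title, $$eval).
async function capturePage(page, url, outDir) {
  // Record a timeline trace while the page loads (viewable in Chrome DevTools).
  await page.tracing.start({ path: path.join(outDir, "trace.json") });
  await page.goto(url, { waitUntil: "networkidle2" });
  await page.tracing.stop();

  // Full-page screenshot and PDF export.
  await page.screenshot({ path: path.join(outDir, "page.png"), fullPage: true });
  await page.pdf({ path: path.join(outDir, "page.pdf"), format: "A4" });

  // Simple scraping step: page title and every link target.
  const title = await page.title();
  const links = await page.$$eval("a[href]", (anchors) => anchors.map((a) => a.href));
  return { title, links };
}

async function main() {
  // Loaded here so capturePage stays usable without Puppeteer installed.
  const puppeteer = require("puppeteer");
  // headless: false would display the window to follow the navigation.
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    const result = await capturePage(page, "https://example.com", ".");
    console.log(result.title, result.links.length);
  } finally {
    await browser.close();
  }
}
```

Calling main().catch(console.error) runs the sketch against a real browser. PDF export has historically been tied to headless mode, so switching to headless: false is safest when the PDF step is dropped.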