Serverless Chrome puppeteer
Say you want to build a scraper, automate manual testing, or generate custom social cards for your website. What do you do?
You could spin up a docker container, set up headless Chrome, add Puppeteer, write a script to run it all, add a server to create an API, and ...
Or you can set up Serverless Chrome with AWS Lambda. Write a bit of code, hit deploy, and get a Chrome browser running on demand.
That's what this chapter is about.
You'll learn how to:
- configure Chrome Puppeteer on AWS
- build a basic scraper
- take website screenshots
- run it on-demand
We'll build a scraper that goes to google.com, types in a phrase, and returns the first page of results. Then we'll reuse the same code to return a screenshot.
You can see the full code on GitHub.
Serverless Chrome
Chrome's engine ships as the open source Chromium browser. Other browsers use it and add their own UI and custom features.
You can use the engine for browser automation: scraping, testing, screenshots, etc. When you need to render a website, Chromium is your friend.
Running it yourself means you have to:
- download a Chrome binary
- set up an environment that makes it happy
- run it in headless mode
- configure processes that talk to each other via complex sockets
Others have solved this problem for you.
Rather than figure it out yourself, I recommend using chrome-aws-lambda. It's the most up-to-date package for running Serverless Chrome.
Here's what you need for a Serverless Chrome setup:
- install dependencies
$ yarn add chrome-aws-lambda@3.1.1 puppeteer@3.1.0 @types/puppeteer puppeteer-core@3.1.0
This installs everything you need to both run and interact with Chrome.
Check the chrome-aws-lambda README for the latest version of Puppeteer you can use. Make sure the puppeteer and puppeteer-core versions you install match the one it lists.
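If you want to catch a version mismatch before you deploy, a small check over your package.json dependencies can help. The sketch below is an illustration only: findVersionMismatch and majorMinor are hypothetical helpers, not part of chrome-aws-lambda or Puppeteer. It assumes that "matching" means all three packages share the same major.minor version, as they do in the install command above.

```typescript
// Hypothetical helper for illustration; not part of any library.
// Assumption: "matching" means the same major.minor version across
// chrome-aws-lambda, puppeteer, and puppeteer-core.

type Dependencies = Record<string, string | undefined>;

const CHROME_PACKAGES = ["chrome-aws-lambda", "puppeteer", "puppeteer-core"];

// Pull "major.minor" out of strings like "3.1.0", "^3.1.0", or "~3.1.1".
function majorMinor(version: string): string | null {
  const match = /(\d+)\.(\d+)/.exec(version);
  return match ? `${match[1]}.${match[2]}` : null;
}

// Returns human-readable problems; an empty array means the versions line up.
function findVersionMismatch(deps: Dependencies): string[] {
  const problems: string[] = [];
  const parsed: { name: string; version: string }[] = [];

  for (const name of CHROME_PACKAGES) {
    const raw = deps[name];
    if (raw === undefined) {
      problems.push(`${name} is not installed`);
      continue;
    }
    const version = majorMinor(raw);
    if (version === null) {
      problems.push(`${name} has an unrecognized version: ${raw}`);
      continue;
    }
    parsed.push({ name, version });
  }

  // Compare every parsed version against the first one found.
  const first = parsed[0];
  if (first !== undefined && parsed.some((p) => p.version !== first.version)) {
    const summary = parsed.map((p) => `${p.name}@${p.version}`).join(", ");
    problems.push(`versions differ: ${summary}`);
  }

  return problems;
}
```

You could call findVersionMismatch with the dependencies object from your package.json and fail a build step when the returned array is not empty.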
- configure serverless.yml
# serverless.yml
service: serverless-chrome-example

provider:
  name: aws
  runtime: nodejs12.x
  stage: dev

package:
  exclude:
    - node_modules/puppeteer/.local-chromium/**
This configures a new service, makes it run on AWS, and uses a recent Node.js runtime (nodejs12.x here).
The package part is important. It tells Serverless not to package the Chromium binary with your code. AWS rejects builds of that size.
You are now ready to start running Chrome.
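To give a feel for what running Chrome looks like with this setup, here is a minimal sketch of the launch call. The option names (args, defaultViewport, executablePath, headless) follow the usage shown in the chrome-aws-lambda README. The ChromiumLike interface, the LaunchOptions type, and the launchChrome function are hypothetical names introduced for this sketch, so that it type-checks without the package installed.

```typescript
// Hypothetical types for this sketch. They describe only the parts of
// chrome-aws-lambda that the launch call below relies on.
interface LaunchOptions {
  args: string[];
  defaultViewport: unknown;
  executablePath: string;
  headless: boolean;
}

interface ChromiumLike<Browser> {
  args: string[];                  // Chrome flags tuned for Lambda
  defaultViewport: unknown;        // default page size
  executablePath: Promise<string>; // resolves to the Chromium binary path
  headless: boolean;               // whether to run without a UI
  puppeteer: {
    launch(options: LaunchOptions): Promise<Browser>;
  };
}

// Launch a browser using the settings the chromium object provides.
// Assumption: the caller passes in the chrome-aws-lambda module
// (or anything with the same shape).
async function launchChrome<Browser>(
  chromium: ChromiumLike<Browser>
): Promise<Browser> {
  return chromium.puppeteer.launch({
    args: chromium.args,
    defaultViewport: chromium.defaultViewport,
    // executablePath is a promise, so wait for it before launching
    executablePath: await chromium.executablePath,
    headless: chromium.headless,
  });
}
```

In a real handler you would pass in the module imported from chrome-aws-lambda, assuming its typings are compatible with the interface above. Taking the module as a parameter also lets you swap in a fake when testing locally.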