Generating PDFs with AWS Lambda
Intro
Although technology has seen an explosion of innovation, the PDF format still dominates when you need to replicate paper in the digital sphere. Any enterprise-scale website project will eventually require you to provide dynamic information in PDF format. There are services that offer to turn your HTML into PDFs, but if you don't want another bill and the risk of outsourcing vital functionality, you can set up the tools for creating PDFs from HTML yourself.
Tooling
Some of the most popular tools for quickly turning a website into a PDF are WeasyPrint and wkhtmltopdf. These are available for free, but setting them up can be difficult depending on your server operating system and language stack. For example, if you need the latest version of WeasyPrint but are running an older version of Python for your web application, you're looking at a significant software update or running multiple Python environments.
Hosting
If you are deploying to many servers, you'll need to install these libraries and their dependencies across your entire fleet. Alternatively, you could install the PDF-generating program on a single server with all of the correct dependencies, but that means you have a server running only to make PDFs. If you generate PDFs every few seconds, that's probably a reasonable decision, but if generating a PDF only happens once a day, such as when a new product is added, this is a waste of resources.
If you want a PDF-generating service that is under your control, available at any time, and running only when you need it, AWS Lambda is the obvious solution. The only question is, how do you install the dependencies you need to generate a PDF on Lambda? Fortunately, the hard work has already been done for you.
Walkthrough
The Cloud Print Utils project has taken the time to package the tools you need into one convenient bundle. All you need is make, zip, and docker. Let's walk through the steps to making your own working PDF generator on AWS Lambda.
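Before starting, it can help to confirm that the three build tools are actually on your PATH. The following is a minimal sketch using only Python's standard library; the function name missing_tools is made up for this example and is not part of Cloud Print Utils.

```python
import shutil

# The three tools the walkthrough relies on.
REQUIRED_TOOLS = ("make", "zip", "docker")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names of the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing build tools: " + ", ".join(missing))
    else:
        print("All build tools found.")
```

Note that this only checks that the programs exist. It does not check whether your user is allowed to talk to the Docker daemon.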
Step One - Clone the project from GitHub
git clone https://github.com/kotify/cloud-print-utils.git
Step Two - Build the layer
cd cloud-print-utils
make build/weasyprint-layer-python3.8.zip
This will download a Lambda Docker image and install the libraries you need for generating PDFs, then save that to a zip file you can use as a layer in Lambda. (You may need to use sudo if your user isn't part of the docker group.)
Step Three - Add your layer file to your Lambda account
The layer file, named weasyprint-layer-python3.8.zip, will be stored in the build directory.
If you have AWS CLI installed, you can upload the layer with the following command:
aws lambda publish-layer-version --region <region> --layer-name <name> --zip-file fileb://build/weasyprint-layer-python3.8.zip
Don't forget to replace <region> with your region identifier and <name> with a memorable name such as PDF.
If you are not using AWS CLI, you can also upload your layer through the AWS website. Go to your AWS Lambda console and select "Layers" in the left sidebar. Click the "Create layer" button in the top right, and upload your zip file.
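If you would rather script the upload from Python, the same operation is available through boto3, the AWS SDK for Python. The sketch below assumes boto3 is installed and your AWS credentials are already configured; the helper names layer_request and publish_layer are made up for this example.

```python
from pathlib import Path


def layer_request(zip_path, layer_name, runtime="python3.8"):
    """Build the keyword arguments for Lambda's publish_layer_version call."""
    return {
        "LayerName": layer_name,
        # Send the zip file's bytes directly, as fileb:// does in the CLI.
        "Content": {"ZipFile": Path(zip_path).read_bytes()},
        "CompatibleRuntimes": [runtime],
    }


def publish_layer(zip_path, layer_name, region):
    """Upload the zip as a new layer version and return its ARN."""
    # boto3 is third-party; importing it here keeps layer_request usable without it.
    import boto3

    client = boto3.client("lambda", region_name=region)
    response = client.publish_layer_version(**layer_request(zip_path, layer_name))
    return response["LayerVersionArn"]
```

A call might look like publish_layer("build/weasyprint-layer-python3.8.zip", "PDF", "us-east-1"), where the region is only an example. The returned ARN identifies the layer version you attach to your function in the next step.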
Step Four - Configure your Lambda function
From the Lambda console on the AWS website, click "Functions" in the left sidebar, then click the "Create function" button.
Set whatever name you like, but make sure you select the Python 3.8 runtime, to match the layer you created. Python 3.8 is the newest Python runtime offered in Lambda at the time of writing. Click "Create function".
Once your function is created, you will need to link it to the layer you uploaded. Click "Layers" under your function name, which will open the Layers section below. Click the "Add a layer" button, and then select the "Custom layers" group. You can find the layer you uploaded in the "Custom layers" select box. You should only have one version of your layer, so select that, and click the "Add" button.
Then, you need to set a few environment variables for your function. Scroll down to the "Environment variables" section of the page and click "Manage environment variables". You need to add three environment variables in order for the layer to function:
- GDK_PIXBUF_MODULE_FILE: /opt/lib/loaders.cache
- FONTCONFIG_PATH: /opt/fonts
- XDG_DATA_DIRS: /opt/lib
Once you've added them, hit "Save".
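The layer and environment variable settings can also be applied in one call with boto3 instead of clicking through the console. This is a minimal sketch, assuming boto3 is installed and credentials are configured; configuration_request and configure_function are made-up helper names, and the layer ARN is whatever was reported when you uploaded your layer.

```python
# The three variables the layer needs, as listed above.
LAYER_ENVIRONMENT = {
    "GDK_PIXBUF_MODULE_FILE": "/opt/lib/loaders.cache",
    "FONTCONFIG_PATH": "/opt/fonts",
    "XDG_DATA_DIRS": "/opt/lib",
}


def configuration_request(function_name, layer_arn, extra_env=None):
    """Build the keyword arguments for update_function_configuration."""
    variables = dict(LAYER_ENVIRONMENT)
    # extra_env lets you add your own variables, such as BUCKET.
    variables.update(extra_env or {})
    return {
        "FunctionName": function_name,
        "Layers": [layer_arn],
        "Environment": {"Variables": variables},
    }


def configure_function(function_name, layer_arn, region, extra_env=None):
    """Attach the layer and set the environment variables on the function."""
    # boto3 is third-party; importing it here keeps the builder usable without it.
    import boto3

    client = boto3.client("lambda", region_name=region)
    return client.update_function_configuration(
        **configuration_request(function_name, layer_arn, extra_env)
    )
```

Be aware that this call replaces the function's existing layer list and environment variables rather than merging into them, so pass everything the function needs.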
Step Five - Make your function accessible
You need to be able to trigger your function to use it. If you plan to access it across the internet, an API Gateway should be set up. Click on the "Add trigger" button and select "API Gateway". Configure your gateway and click "Add". This will add a gateway to your project. Click on it to find your function's URL. You'll need that later.
Step Six - Write the function
The Cloud Print Utils project provides a ready-made Lambda function for you. You can view it in the repository you cloned at weasyprint/lambda_function.py. It supports writing its output to S3 or returning it, as well as generating PDFs and PNGs from raw HTML or links. Copy the contents of that Python file into AWS's Lambda editor. Then click "Deploy" above the function editor.
Step Seven - Generate your PDF
Now, all you need to do is generate a POST request to your lambda function to turn HTML into a PDF. The default function accepts the following arguments:
- filename - REQUIRED - The name of the file that will be returned or stored
- url - The URL of an HTML resource to be used as the source
- html - Raw HTML to be used as the source. If used with url, this will be ignored
- return - Sends the PDF as a base64 encoded response if set to 'base64'. If return is not set or has any other value, the file will be saved to S3, provided you've set a BUCKET environment variable.
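As a rough illustration of such a request, here is a small client sketch that uses only Python's standard library. The helper names are made up for this example, the endpoint is whatever URL your API Gateway trigger shows, and the exact shape of the response is an assumption: depending on how your gateway is configured, the body may arrive as base64 text or as already-decoded bytes, so check what your deployment actually returns.

```python
import base64
import json
import urllib.request


def build_payload(filename, url=None, html=None, return_base64=True):
    """Build the JSON arguments described above."""
    if url is None and html is None:
        raise ValueError("provide either url or html")
    payload = {"filename": filename}
    if url is not None:
        payload["url"] = url
    if html is not None:
        payload["html"] = html
    if return_base64:
        # Ask for the file in the response instead of saving it to S3.
        payload["return"] = "base64"
    return payload


def request_pdf(endpoint, payload, timeout=60):
    """POST the payload as JSON and return the raw response body."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def decode_pdf(body):
    """Decode a base64 body into PDF bytes (assumes the body is plain base64)."""
    return base64.b64decode(body)
```

With these in place, something like decode_pdf(request_pdf(endpoint, build_payload("hello.pdf", html="<h1>Hello</h1>"))) would give you bytes you can write to a file, provided the assumption about the response body holds for your setup.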
Supporting PDFs doesn't have to take a week. Most of the work has already been done for you. All you need to do is customize your function for your specific needs, and let AWS Lambda take care of the rest.