Generating PDFs with AWS Lambda

If you'd rather not pay a third-party service to turn your HTML into PDFs, or take on the risk of outsourcing vital functionality, you can host the tooling yourself on AWS Lambda.

Intro

Although technology has seen an explosion of innovation, the PDF format still dominates when you need to replicate paper in the digital sphere. Any enterprise-scale website project will eventually require you to provide dynamic information in PDF format. There are services that offer to turn your HTML into PDFs, but if you don't want another bill and the risk of outsourcing vital functionality, you can set up the tools for creating PDFs from HTML yourself.

Tooling

Some of the most popular tools for quickly turning a website into a PDF are WeasyPrint and wkhtmltopdf. These are available for free, but setting them up can be difficult depending on your server operating system and language stack. For example, if you need the latest version of WeasyPrint but are running an older version of Python for your web application, you're looking at a significant software update or running multiple Python environments.
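For a sense of what these tools do once they are installed, here is a minimal sketch of rendering an HTML string with WeasyPrint's Python API. It assumes the weasyprint package and its native dependencies are already available, which is exactly the part that can be difficult. The helper name html_to_pdf is hypothetical and only used for illustration.

```python
def html_to_pdf(html: str) -> bytes:
    """Render an HTML string to PDF bytes using WeasyPrint."""
    # Imported lazily so this module still loads on machines
    # where WeasyPrint has not been installed yet.
    from weasyprint import HTML

    # With no target argument, write_pdf() returns the PDF as bytes.
    return HTML(string=html).write_pdf()
```

Calling html_to_pdf("<h1>Invoice</h1>") and writing the result to a file opened in binary mode should give you a PDF you can open in any viewer.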

Hosting

If you are deploying to many servers, you'll need to install these libraries and their dependencies across your entire fleet. Alternatively, you could install the PDF-generating program on a single server with all of the correct dependencies, but that means you have a server running only to make PDFs. If you generate PDFs every few seconds, that's probably a reasonable decision, but if generating a PDF only happens once a day, such as when a new product is added, it is a waste of resources.

If you want a PDF-generating service that is under your control, available at any time, and running only when you need it, AWS Lambda is the obvious solution. The only question is how to install the dependencies you need to generate a PDF on Lambda. Fortunately, the hard work has already been done for you.

Walkthrough

The Cloud Print Utils project has taken the time to package the tools you need into one convenient bundle. All you need is make, zip, and docker. Let's walk through the steps to making your own working PDF generator on AWS Lambda.
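Before you start, it can help to confirm that the three tools are actually on your PATH. The following is a small sketch using only the Python standard library; the names REQUIRED_TOOLS and missing_tools are hypothetical and not part of Cloud Print Utils.

```python
import shutil

# The three command-line tools the walkthrough relies on.
REQUIRED_TOOLS = ("make", "zip", "docker")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

If missing_tools() returns an empty list, all three are installed; otherwise it lists what you still need to install.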

Step One - Clone the project from GitHub

git clone https://github.com/kotify/cloud-print-utils.git

Step Two - Build the layer

cd cloud-print-utils

make build/weasyprint-layer-python3.8.zip

This will download a Lambda Docker image and install the libraries you need for generating PDFs, then save that to a zip file you can use as a layer in Lambda. (You may need to use sudo if your user isn't part of the docker group.)

Step Three - Add your layer file to your Lambda account

The layer file, named weasyprint-layer-python3.8.zip, will be stored in the build directory.

If you have AWS CLI installed, you can upload the layer with the following command:

aws lambda publish-layer-version --region <region> --layer-name <name> --zip-file fileb://build/weasyprint-layer-python3.8.zip

Don't forget to replace <region> with your region identifier and <name> with a memorable name such as PDF.
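If you would rather script the upload than type it by hand, the same command can be assembled and run from Python. This is only a sketch: it assumes the AWS CLI is installed and configured with credentials, and the function names publish_layer_command and publish_layer are hypothetical.

```python
import subprocess

DEFAULT_ZIP = "build/weasyprint-layer-python3.8.zip"


def publish_layer_command(region, layer_name, zip_path=DEFAULT_ZIP):
    """Build the argument list for `aws lambda publish-layer-version`."""
    return [
        "aws", "lambda", "publish-layer-version",
        "--region", region,
        "--layer-name", layer_name,
        # fileb:// tells the AWS CLI to read the file as raw binary.
        "--zip-file", f"fileb://{zip_path}",
    ]


def publish_layer(region, layer_name, zip_path=DEFAULT_ZIP):
    """Run the upload and return the CLI's output as text.

    Raises subprocess.CalledProcessError if the CLI exits with an error.
    """
    result = subprocess.run(
        publish_layer_command(region, layer_name, zip_path),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

For example, publish_layer("us-east-1", "PDF") would upload the layer built in Step Two under the name PDF.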

If you are not using AWS CLI, you can also upload your layer through the AWS website. Go to your AWS Lambda console and select "Layers" in the left sidebar. Click the "Create layer" button in the top right, and upload your zip file.

Step Four - Configure your Lambda function

From the Lambda console on the AWS website, click "Functions" in the left sidebar, then click the "Create function" button.

Set whatever name you like, but make sure you select the Python 3.8 runtime to match the layer you created. Python 3.8 is the newest Python runtime offered in Lambda at the time of writing. Click "Create function".

[Screenshot: create_function.png]

Once your function is created, you will need to link it to the layer you uploaded. Click "Layers" under your function name, which will open the Layers section below. Click the "Add a layer" button, and then select the "Custom layers" group. You can find the layer you uploaded in the "Custom layers" select box. You should only have one version of your layer, so select that, and click the "Add" button.

[Screenshot: add_layer.png]

Then, you need to set a few environment variables for your function. Scroll down to the "Environment variables" section of the page and click "Manage environment variables". You need to add three environment variables in order for the layer to function:

  • GDK_PIXBUF_MODULE_FILE: /opt/lib/loaders.cache
  • FONTCONFIG_PATH: /opt/fonts
  • XDG_DATA_DIRS: /opt/lib

Once you've added them, hit "Save".
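A typo in one of these values can be hard to spot later, so here is a small sketch of a check you could run at the top of your function while debugging. The expected values come from the list above; the names LAYER_ENV and env_problems are hypothetical.

```python
import os

# Values the layer expects, as listed in this step.
LAYER_ENV = {
    "GDK_PIXBUF_MODULE_FILE": "/opt/lib/loaders.cache",
    "FONTCONFIG_PATH": "/opt/fonts",
    "XDG_DATA_DIRS": "/opt/lib",
}


def env_problems(environ=None):
    """Return a list of messages for missing or mismatched variables."""
    environ = os.environ if environ is None else environ
    problems = []
    for name, expected in LAYER_ENV.items():
        actual = environ.get(name)
        if actual is None:
            problems.append(f"{name} is not set (expected {expected})")
        elif actual != expected:
            problems.append(f"{name} is {actual}, expected {expected}")
    return problems
```

An empty list means all three variables match; anything else tells you which one to fix in the console.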

[Screenshot: set_environment_variables.png]

Step Five - Make your function accessible

You need to be able to trigger your function to use it. If you plan to access it across the internet, you should set up an API Gateway. Click on the "Add trigger" button and select "API Gateway". Configure your gateway and click "Add". This will add a gateway to your project. Click on it to find your function's URL. You'll need that later.

[Screenshot: configure_gateway.png]

Step Six - Write the function

The Cloud Print Utils project provides a ready-made Lambda function for you. You can view it in the repository you cloned at weasyprint/lambda_function.py. It supports writing its output to S3 or returning it, as well as generating PDFs and PNGs from raw HTML or links. Copy the contents of that Python file into AWS's Lambda editor. Then click "Deploy" above the function editor.

[Screenshot: code.png]

Step Seven - Generate your PDF

Now, all you need to do is send a POST request to your Lambda function to turn HTML into a PDF. The default function accepts the following arguments:

  • filename - REQUIRED - The name of the file that will be returned or stored
  • url - The URL of an HTML resource to be used as the source
  • html - Raw HTML to be used as the source. If used with url, this will be ignored
  • return - Sends the PDF as a base64 encoded response if set to 'base64'. If the return is not set or has any other value, the file will be saved to S3, provided you've set a BUCKET environment variable.
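To make the arguments above concrete, here is a sketch of a small client built on the Python standard library. Several things here are assumptions rather than facts about the project: that your gateway passes a JSON request body through to the function, that the response body is the base64-encoded PDF as plain text, and that a request needs at least one of url or html. The function names are hypothetical, and the endpoint is whatever URL you found in Step Five.

```python
import base64
import json
import urllib.request


def build_payload(filename, url=None, html=None, return_base64=True):
    """Assemble the arguments described above into a dict."""
    if not filename:
        raise ValueError("filename is required")
    if url is None and html is None:
        # Assumption: the function needs at least one source to render.
        raise ValueError("provide either url or html")
    payload = {"filename": filename}
    if url is not None:
        # html is ignored when url is present, so leave it out.
        payload["url"] = url
    else:
        payload["html"] = html
    if return_base64:
        payload["return"] = "base64"
    return payload


def build_request(endpoint, payload):
    """Create a POST request carrying the payload as JSON (assumed format)."""
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def request_pdf(endpoint, payload, timeout=60):
    """Send the request and decode the response into PDF bytes.

    Assumes the response body is base64 text; adjust this if your
    gateway wraps the result differently.
    """
    request = build_request(endpoint, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read()
    return base64.b64decode(body)
```

With that in place, request_pdf(endpoint, build_payload("hello.pdf", html="<h1>Hello</h1>")) should return bytes you can write straight to a .pdf file, provided the assumptions above hold for your setup.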

Supporting PDFs doesn't have to take a week. Most of the work has already been done for you. All you need to do is customize your function for your specific needs, and let AWS Lambda take care of the rest.
