Event-Driven Application Logging

Simple techniques to help your team focus on developing great software

Introduction

Applications today are less often designed as procedural, monolithic systems; instead, they favor much smaller, service-based architectures. This loose coupling of services and events has proven far more resource-efficient in both development and production. That efficiency can be offset by increased complexity and obfuscation, particularly with regard to event paths and data flow. Proper logging and exception handling will greatly improve a team's ability to support any application, but such visibility becomes crucial when dealing with event-driven architecture.

Errors and Exceptions

There are two ways to write error-free programs; only the third one works.
- Alan Perlis

Code is often structured to catch exceptions so that faults do not halt execution in production. The classic example follows:

try:
  x = 1/0
except ZeroDivisionError as err:
  print("Handling run-time error:", err)

Assuming an error within the try block is not critical to the execution of your program, this type of exception handling can be very beneficial: it prevents the application from halting at the exception, and it acts as a form of documentation, since the flow of logic can be easily traced. When this strategy is combined with centralized application logging, you can track alerts and use them to better support or refactor existing code.
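As a minimal sketch of that combination (using Python's standard logging module; the logger name and `safe_divide` helper are illustrative), the same handler can emit to a logger rather than stdout, ready to be routed to a centralized handler:

```python
import logging

# Illustrative logger name; in production this logger would feed
# a centralized handler rather than the console.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("my-app")

def safe_divide(numerator, denominator):
    try:
        return numerator / denominator
    except ZeroDivisionError as err:
        # Log instead of halting; execution continues past the fault.
        logger.error("Handling run-time error: %s", err)
        return None

result = safe_divide(1, 0)  # logs the error and returns None
```

Because the exception is consumed at a well-defined boundary, the log entry itself documents where that boundary is.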

Logs are Data

While logs are necessary, they are often cumbersome to parse, and it can be difficult to trace execution from one step to the next. Most exception handlers will provide stack traces, or additional situational detail (e.g. keyword arguments) when available. But a simpler solution is to convert the error message itself from a string into JSON-style structured data.

import requests

try:
    resp = requests.get("https://httpbin.org/post")
    resp.raise_for_status()
except requests.exceptions.HTTPError as err:
    print({
        "event": "httpbin-post",
        "error": str(err),
        "method": err.request.method,
    })
# {'event': 'httpbin-post', 'error': '405 Client Error: METHOD NOT ALLOWED for url: https://httpbin.org/post', 'method': 'GET'}

It can often be a delicate balance between informative detail and sheer log volume. A good rule of thumb is to log enough detail to reproduce a production issue locally for debugging purposes. Additionally, you can take advantage of logging levels to filter, e.g. INFO from ERROR.
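A minimal sketch of that filtering with the standard logging module (the logger name is illustrative): setting the level to ERROR suppresses INFO-level noise without touching the call sites.

```python
import logging

# Show only ERROR and above; INFO-level entries are filtered
# out by the logger before they ever reach a handler.
logging.basicConfig(level=logging.ERROR)
logger = logging.getLogger("my-app")

logger.info({"event": "heartbeat"})        # filtered: below ERROR
logger.error({"event": "payment-failed"})  # emitted
```

The same call sites can be promoted back to full verbosity for local debugging by changing a single configuration value.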

Metrics

Error paths are not the only use case for event logs; logic flow and data assertions can also be analyzed from log metrics. Imagine a case where your data model assumes some value never falls below a threshold. The code doesn't error either way, but you can easily check the assumption through a logged assertion.

value = 0  # starting value assumed for this example
value += 2
try:
    assert value > 5, "never happens"
except AssertionError as err:
    print({
        "event": "assumption-check",
        "test": "less than 5",
        "value": value,
    })
# {'event': 'assumption-check', 'test': 'less than 5', 'value': 2}

Or, you could log ephemeral data from worker nodes handling message queues.

import logging
import time

logger = logging.getLogger("my-queue-worker")

start = time.monotonic()
try:
    success = job.process(message)  # job and message supplied by your queue framework
except SomeException as err:        # the framework's processing exception
    logger.exception({
        "event": "my-queue-worker",
        "message": str(message),
        "error": str(err),
    })
else:
    logger.info({
        "event": "my-queue-worker",
        "message": str(message),
        "success": str(success),
    })
finally:
    logger.info({
        "event": "my-queue-worker",
        "time": time.monotonic() - start,
    })

Once the data has been collected, you are limited only by how you choose to aggregate it.
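For instance (a hypothetical sketch; the captured log lines below stand in for whatever your handlers emitted), structured JSON logs can be aggregated with nothing more than the standard library:

```python
import json
from collections import Counter

# Hypothetical captured log lines, one JSON object per line.
raw_logs = [
    '{"event": "httpbin-post", "error": "405 Client Error"}',
    '{"event": "assumption-check", "value": 2}',
    '{"event": "httpbin-post", "error": "405 Client Error"}',
]

# Count occurrences of each event name across the log lines.
events = Counter(json.loads(line)["event"] for line in raw_logs)
print(events.most_common())
# [('httpbin-post', 2), ('assumption-check', 1)]
```

The same structured fields that made the logs readable make them queryable, which is exactly what hosted tools like CloudWatch build on.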

CloudWatch Logs Insights

If you choose to send your logs to CloudWatch, you can use Insights to query and analyze your log data. Each event (log) is grouped into a stream (named collection), with multiple streams forming a group for shared retention and access controls.

Using simple query structures, you can quickly analyze data from your CloudWatch logs to help diagnose real-time issues, or track deployment fixes.

filter @message like /Exception/ 
    | stats count(*) as exceptionCount by bin(1h)
    | sort exceptionCount desc
# number of exceptions per hour

Filters can be used to generate CloudWatch metrics, giving you access to alerts and notifications triggered by your log data.

aws logs put-metric-filter \
  --log-group-name MyApp/message.log \
  --filter-name MyAppErrorCount \
  --filter-pattern 'Error' \
  --metric-transformations \
    metricName=ErrorCount,metricNamespace=MyNamespace,metricValue=1,defaultValue=0

Remember: logs are data. With the adoption of these simple techniques, your team can get back to work developing great software, rather than debugging its edge cases.
