Regular readers of the Scalyr blog are familiar with our “Getting Started Logging As Quickly As Possible” series. As its name suggests, the series teaches how to get up and running with logging using a huge variety of programming languages and frameworks. While most of the posts stick to the formula laid out by the first post in the series, some venture into different formats, depending on the characteristics of the programming languages or frameworks they’re catering to. One thing is constant through all posts, though.
When explaining the need for logging, all the posts agree that software is complex. Despite your best efforts, as soon as you release an application, it’s anybody’s guess how it’s going to behave in the wild. Having a way to keep an eye on your application as soon as it hits production is essential. That’s the justification behind concepts such as logging, monitoring, log metrics, and observability.
Today’s post is about a specific facet of the observability world: log metrics. You’ll understand what they are and the role they play in the larger concept of observability. Also, we’ll clarify a somewhat common misconception: the perception that logs and metrics are somehow two competing solutions to the same problem. Then, before parting ways with some final considerations, we’ll walk you through a list of valuable insights you can learn by watching your log entries carefully. Let’s dig in.
The Importance of Observability
As its title suggests and as we confirmed in the introduction, this post is all about log metrics. You’d think we would start by defining log metrics. But it makes sense to first take a step back, since logs and log metrics are but a tiny piece of the larger concept known as observability.
Defining Observability
So, what’s observability? Here’s how Wikipedia defines it:
Observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs.
In the definition above, “system” doesn’t refer specifically to information systems but to the concept of dynamic systems in control theory. How can we be sure that observability is so essential in the software development world as well?
In short, it all boils down to the famous complexity we keep talking about. Or, to be more accurate, it’s a new type of complexity we have to talk about.
The Brave New World
You see, in the past, even moderately complex systems were still simple enough that you could keep their stability under control. Think of a simple CRUD web application. It’s complex only because of the intrinsic complexity of software development. For such an application, you’d typically monitor metrics such as CPU and memory usage, database hits and performance, and so on, and that would be enough, for the most part.
Things are different now. This is 2020, and we live in an era of distributed systems. Things are complex in the true sense of the word—i.e., there are more parts interacting with each other, which dramatically increases the probability of failures.
Observability is what you need in order to troubleshoot such a complex scenario. By being able to ask questions about the internals of your system by watching its externals, you gain the ability to observe it. You achieve observability by having each relevant part of your system emit valuable information about what’s happening at a given moment in the form of log entries.
It’s essential to adopt a log management solution that enables you to collect and aggregate all your log data into a single centralized place.
Logs vs. Metrics? Enough of This Misconception
Before we go on to walk you through our list of valuable insights you can get from log metrics, it’s important that we clarify a common misconception. Some people seem to believe that when it comes to applying observability, logs and metrics are opposing sides of a scale.
That couldn’t be further from the truth. Logs and metrics are both valuable parts of observability. Each one has a role to play.
What Are Logs?
Logs record events that occur in a system. Events can be benign (e.g., a user signed in to the application) or they can indicate that something went wrong. The nature and severity of the event described by the log entry can be specified through mechanisms such as categories, tags, and logging levels.
The data in a log entry will contain details about the event: what happened, when it happened, which resources were being accessed, and so on. In short, you can think of a log entry as a narrative; it tells a story about what happened in production.
What Are Metrics?
Metrics are measurements of a characteristic of a system at a specified instant in time. They differ from logs in some important ways. You can think of logs as events, while metrics are more like snapshots. You collect the former whenever a given event happens, but the latter are usually collected at fixed intervals.
Metrics usually include a name, a value, a timestamp, and a source. Similarly to logs, metrics can come from a lot of places in your stack: applications (your own or a third party’s), infrastructure, cloud services, containers, and much more.
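To make the contrast concrete, here’s a minimal sketch of the two shapes of data. The class names, field names, and sample values are our own assumptions for illustration; they don’t come from any particular logging or metrics library.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class LogEntry:
    """An event: something that happened, with narrative detail."""
    timestamp: datetime
    level: str    # hypothetical severity label, e.g. "INFO" or "ERROR"
    message: str  # what happened


@dataclass(frozen=True)
class Metric:
    """A snapshot: one measured value at one instant."""
    name: str
    value: float
    timestamp: datetime
    source: str


# Made-up sample data, for illustration only.
now = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
entry = LogEntry(timestamp=now, level="INFO", message="user signed in")
sample = Metric(name="cpu_usage_percent", value=42.5, timestamp=now, source="web-01")
```

The log entry carries free-form detail about a single event, while the metric is just a named number tied to a time and a source.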
Five Things Your Log Entries Can Teach You
Finally, we get to the meat of the post: what log metrics are and what you can learn from them.
Log metrics are metrics that can be gathered by reading and analyzing log entries. Since logging is virtually universal when it comes to information systems, it offers a unique window into each and every layer of your IT infrastructure. By carefully keeping track of measurements from your log entries, you can obtain insights that you wouldn’t be able to get otherwise.
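As a rough illustration of the idea, the sketch below turns a handful of log lines into a simple metric: the number of entries per logging level. The “timestamp level message” line format and the sample lines are assumptions made up for this example; real logs will need a parser that matches their actual format.

```python
from collections import Counter

# Hypothetical log lines in a made-up "timestamp level message" format.
LOG_LINES = [
    "2020-01-01T12:00:00Z INFO user signed in",
    "2020-01-01T12:00:05Z ERROR payment failed",
    "2020-01-01T12:00:09Z INFO user signed out",
]


def count_by_level(lines):
    """Return how many log entries were seen for each logging level."""
    counts = Counter()
    for line in lines:
        parts = line.split(" ", 2)
        if len(parts) == 3:  # skip lines that don't fit the assumed format
            counts[parts[1]] += 1
    return counts


level_counts = count_by_level(LOG_LINES)
```

Collected at regular intervals, a count like this becomes a time series you can chart and alert on.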
Without further ado, let’s move on to our list. What are the five main things your log files can teach you?
They Can Teach You How Your Users Behave
The first thing you can learn from your log files is how your users actually use your application. Being able to understand what drives users to find your application, what makes them leave it, and how much time they spend on each part is invaluable knowledge.
Even though services like Google Analytics might offer a window into user behavior, having that information right out of your own application logs makes a difference. It allows you to uncover usage patterns that are about your application’s features and not merely the most visited pages and average session time.
User behavior patterns can really shine when it comes to practices like A/B testing. Having real data that comes directly from application logs is priceless when trying to assess how popular (or not) a given feature is with your users, be they the control group or the treatment group.
They Can Teach You How Often Things Go Wrong
Software is complex. Things go wrong. In fact, things tend to go wrong much more often than we’d like, but how often is that? Log metrics can help you discover the answer.
By monitoring logs you can know how often unhandled exceptions are thrown in your application over a given period. That’s a valuable metric to know. If your organization adopts canary releases, for instance, you can define the number of unhandled exceptions that you’re willing to tolerate and roll back the release if the threshold is crossed.
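Here’s a minimal sketch of that canary check. It assumes unhandled exceptions show up in the logs with a recognizable marker string; the marker text, the function names, and the threshold are all hypothetical and would depend on your own logging setup.

```python
def count_unhandled(lines, marker="Unhandled exception"):
    """Count log lines that contain the (assumed) unhandled-exception marker."""
    return sum(1 for line in lines if marker in line)


def should_roll_back(lines, max_unhandled):
    """Return True when the count crosses the tolerated threshold."""
    return count_unhandled(lines) > max_unhandled


# Made-up log lines from a canary release window.
canary_logs = [
    "12:00:01 INFO request served",
    "12:00:02 ERROR Unhandled exception: NullReferenceException",
    "12:00:03 ERROR Unhandled exception: TimeoutException",
    "12:00:04 INFO request served",
]

roll_back = should_roll_back(canary_logs, max_unhandled=1)
```

In practice you’d run a check like this over a fixed time window and feed the result into whatever drives your release process.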
They Can Teach You How Busy Your Application Is
It’s not only application logs that can teach you valuable lessons. The same is true for server logs as well.
By analyzing metrics related to HTTP requests, you can get a sense of the number of requests received over a period of time. Also, you can determine the rate of successful and unsuccessful requests.
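The sketch below shows one way to get those numbers from web server access logs. It assumes lines that resemble the Common Log Format, where the status code follows the quoted request, and it treats any status below 400 as successful. Both are assumptions for this example; adjust them to your server’s actual format and your own definition of success.

```python
import re

# Matches the three-digit status code right after the quoted request line.
STATUS_RE = re.compile(r'"\s(\d{3})\s')


def summarize_requests(lines):
    """Return total, successful, and failed request counts plus the success rate."""
    total = 0
    successful = 0
    for line in lines:
        match = STATUS_RE.search(line)
        if match is None:
            continue  # skip lines that don't fit the assumed format
        total += 1
        if int(match.group(1)) < 400:  # assumption: < 400 counts as success
            successful += 1
    return {
        "total": total,
        "successful": successful,
        "failed": total - successful,
        "success_rate": successful / total if total else 0.0,
    }


# Made-up access log lines.
ACCESS_LOG = [
    '127.0.0.1 - - [10/Oct/2020:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '127.0.0.1 - - [10/Oct/2020:13:55:37 +0000] "GET /missing HTTP/1.1" 404 153',
    '127.0.0.1 - - [10/Oct/2020:13:55:38 +0000] "POST /orders HTTP/1.1" 500 87',
    '127.0.0.1 - - [10/Oct/2020:13:55:39 +0000] "GET /about HTTP/1.1" 200 1024',
]

summary = summarize_requests(ACCESS_LOG)
```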
They Can Teach You How User Experience Is Affected by Performance
Log aggregation enables some pretty neat analysis. By bringing log entries from different sources together, you can obtain insights that no single source would give you on its own.
For instance, by observing both server logs and web application logs, you can find out if there’s a positive correlation between server performance and user engagement. In other words, the poorer your web server performance, the fewer users engage with your app.
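As a sketch of how you might check for such a relationship, the example below computes a Pearson correlation coefficient between two series derived from logs. The hourly figures are made up, and the choice of metrics (average response time from server logs, user actions per session from application logs) is our own assumption. Note that because a higher response time means worse performance, a link between good performance and high engagement shows up here as a negative coefficient.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Made-up hourly figures: average response time (ms) from server logs and
# average user actions per session from application logs.
response_ms = [100, 200, 300, 400]
actions_per_session = [8, 6, 4, 2]

r = pearson(response_ms, actions_per_session)
```

Keep in mind that a correlation alone doesn’t prove that slow responses caused the drop in engagement; it only tells you where to look next.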
They Can Teach You About Possible Security Breaches
Log reviews can help you detect attempted security breaches. The detection of abnormal activities, such as an unusually high number of login attempts, should trigger a red flag.
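A simple version of that check might look like the sketch below. It assumes you’ve already parsed your logs into (source, succeeded) pairs, where the source could be an IP address or a username; the event shape, the sample data, and the threshold are all hypothetical.

```python
from collections import Counter


def flag_suspicious_sources(events, max_failures):
    """Return the sources whose failed login count exceeds max_failures."""
    failures = Counter(source for source, succeeded in events if not succeeded)
    return sorted(source for source, count in failures.items() if count > max_failures)


# Made-up login events parsed from logs: (source IP, whether the login succeeded).
login_events = [
    ("10.0.0.5", False),
    ("10.0.0.5", False),
    ("10.0.0.5", False),
    ("10.0.0.5", False),
    ("10.0.0.9", False),
    ("10.0.0.9", True),
]

suspicious = flag_suspicious_sources(login_events, max_failures=3)
```

A fixed threshold is the crudest possible rule; comparing against each source’s normal baseline would produce fewer false alarms.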
Listen to Your Log Files
Observability is a big part of ensuring quality for your distributed systems. It gives you a window into your system’s internals by allowing you to understand how it behaves by looking at external cues.
Two popular ways of approaching observability are the use of logs and metrics. Although many think they’re antagonistic, that’s not really true. Since each one of them has a different purpose, you can and should use both. You should even go a step further and learn how to capture valuable insights from your log files. To do that you’ll need to collect logs from disparate sources and bring them into a centralized location. A log management solution like Scalyr can help you with that.
Your logs are trying to talk to you. You’ve just got to listen carefully. Thanks for reading, and I’ll see you next time.