Last year, DevOps.com ran a blog post, We Still Haven’t Solved the Logging Problem. It laid out the problems with centralized logging, particularly with respect to modern development practices. The details it lays out are meaningful, but its conclusion, that centralized logging has no place in the modern world of development, is flawed.
In short, the flaw isn’t in centralized logging’s ability to cope with today’s emerging technologies; it’s in the archaic tools being used to manage today’s high-volume, high-impact application workloads.
Metrics ≠ Root Cause.
As that post points out, depending purely on metrics, time series or otherwise, is fundamentally flawed. Alerts tell you that something is wrong, and metrics can indicate roughly where, but determining what is actually wrong usually takes a dive into log files, especially when you are diving into the unknown. Most teams recognize the importance of log files but are still tied to a decades-old approach, keyword indexing, that fails to deliver flexibility and performance at the scale of today’s architectures. Add to that scale the performance hit of modern technologies like containers, orchestration, and microservices, and the tooling ends up working against the very factors that drive today’s development and operations teams.
So, realistically, there are three major concerns in the log data deluge: data ingest, search and analysis performance at scale, and data storage cost.
KWI is the villain.
The villain of this drama is keyword indexing. Keyword indexes (or KWI) are responsible for almost all of the evils ascribed to centralized logging.
KWI has been around for a while. In fact, Keyword in Context was first proposed in 1856 by Andrea Crestadoro. In computing, IBM computer scientist Hans Peter Luhn published Bibliography and Index: Literature on information retrieval and machine translation in November 1958, which kicked off automated indexing.
So KWI is an older technology, designed for the problems of its time. It was designed to build indexes for documents: things that are mostly write-once, read-many. Indexing is also really designed for human language, and while we are getting closer (Hello, Alexa, Siri. Okay Google), computers don’t think or speak human.
So, think of indexing the way it is most often seen: an index in a book. If the term you are searching for is in the index, you are mostly golden. But what if the indexed term is everywhere? Imagine a cookbook that merely indexed “beef” and expected you to go find the recipe you wanted. You’d get tired of that pretty soon. So now we add subindexes to the indexes, and so forth. Every time you add a new recipe, you have to rebuild the index. Of course, a recipe rarely has only one ingredient, so you are updating not just the beef entry but the entry for every other ingredient as well. With today’s constantly evolving and growing data, you’re going to spend a lot of time rebuilding entries and adding page numbers to a lot of different items. Sometimes it is easier to just create a whole new book, or a new edition, than to keep updating an existing one.
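To make the cookbook analogy concrete, here is a minimal sketch of a keyword (inverted) index in Python. It is an illustration only, not how any particular logging product is built; the class name `KeywordIndex`, the `entries_touched` counter, and the sample recipes are all made up for this example. It shows that adding one document means updating one index entry per distinct term, and that a term like "beef" that appears everywhere narrows nothing down.

```python
from collections import defaultdict


def tokenize(text):
    """Split text into a set of lowercase terms (a deliberately naive tokenizer)."""
    return set(text.lower().split())


class KeywordIndex:
    """Hypothetical toy index: maps each term to the ids of documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document ids
        self.entries_touched = 0          # counts index-entry updates done at ingest

    def add(self, doc_id, text):
        # Every distinct term in the new document forces an update to that
        # term's entry, which is the "rebuild every ingredient" cost.
        for term in tokenize(text):
            self.postings[term].add(doc_id)
            self.entries_touched += 1

    def lookup(self, term):
        # Only terms that were indexed at ingest time can be found this way.
        return sorted(self.postings.get(term.lower(), set()))


index = KeywordIndex()
index.add(1, "beef stew with carrots")
index.add(2, "beef tacos with salsa")
index.add(3, "roast beef with gravy")

print(index.lookup("beef"))   # every recipe matches, so the entry is not selective
print(index.lookup("gravy"))  # a rare term is where the index actually helps
print(index.entries_touched)  # index updates performed for just three short documents
```

Even in this toy, three four-word documents cost twelve index updates, and the work grows with every distinct term in every new line of data.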
Log management may have started with KWI decades ago. However, our computers, applications, and deployment models have all evolved. We now talk about microservices. We talk about IoT. We constantly refer to CI/CD. We talk about hybrid and multi-cloud. We are indeed creating more relevant log data from more moving parts, but that actually argues for more centralized and searchable log aggregation rather than less. It’s less about “oh, something is wrong” and more about “How fast can I get to the MTT WTF?” In our highly distributed environments, the ability to deep dive into any part from a central source is imperative.
Centralized Logging the NoSQL way.
The correct approach for scale is to not process the data on ingest. Scalyr makes use of a NoSQL database designed for time series data. With no preprocessing, it is possible to ingest data at scale. With no munging of the data, there is no data expansion. The correct approach is also to allow free-form searching that is not limited to the few things you can foresee and assign as keywords. After all, much of what we need to search for arises when we don’t yet know what is wrong. Those searches most often hit terms that were never indexed, which wipes out any performance gain we hoped to get from indexing.
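The contrast with ingest-time indexing can be sketched in a few lines of Python. This is a brute-force illustration of the idea only; it is not Scalyr’s implementation, and the function names `ingest` and `search`, the in-memory list, and the sample log lines are all assumptions made for the example. Ingest simply appends the raw line, and a search scans the stored lines with whatever pattern the investigator comes up with at query time.

```python
import re

raw_lines = []  # hypothetical append-only store; a real system would persist this


def ingest(line):
    """Store the line exactly as received: no parsing, tokenizing, or indexing."""
    raw_lines.append(line)


def search(pattern):
    """Scan every stored line for an arbitrary regular expression chosen at query time."""
    rx = re.compile(pattern)
    return [line for line in raw_lines if rx.search(line)]


# Made-up sample log lines for illustration.
ingest('svc=cart level=INFO msg="item added" latency_ms=12')
ingest('svc=checkout level=ERROR msg="payment timeout" latency_ms=5003')
ingest('svc=cart level=WARN msg="retrying upstream" latency_ms=1840')

# A question nobody planned for at ingest time: which requests took a second or more?
slow = search(r"latency_ms=\d{4,}")
for line in slow:
    print(line)
```

Because nothing is derived from the data at ingest, the stored size equals the raw size, and any pattern is searchable. The trade-off is that query speed then depends entirely on how fast the scan itself can be made, which a naive loop like this one does not address.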
The closing questions from the DevOps posting are: “Is centralized logging any better than it was 10 years ago? Or has all this data generated exploded so much that the tools we have today are just barely keeping up?”
The answer is quite clear. Modern logging tools like Scalyr, which have moved past keyword indexing and archaic query languages, are designed not only to weather the data deluge but also to improve troubleshooting as more demands are made on the system.
But don’t just read about it, find out for yourself.