How To Search Log Files: 3 Approaches To Extract Data

Searching log files can be tedious: sifting through large amounts of log data is no easy task. However, log files tell you what happened in your application, so being able to search them quickly is an important skill for any developer facing a time-critical problem.

There are many reasons you might want to search logs. Perhaps you want to better understand a certain problem. Log files provide a lot of valuable information that can help you nail down the root cause of your issue.

Some use cases where you might want to search log files include:

  • Finding a specific log level, such as error or fatal
  • Finding logs for events with a specific timestamp or that occurred between two timestamps
  • Searching for a specific keyword in your log data
  • Removing unnecessary information, such as a computer name or user ID

In this post, we’ll show you three ways to extract data from your log files. To accomplish this, we’ll be using the Bash Unix shell to filter, search, and pipe log data. Now, let’s take a look at what the Bash shell can do.


Understanding the Bash Unix Shell

For Mac and Linux users, the default shell when you start a terminal is most often the Bash Unix shell. For Windows users, it’s possible to install the Bash shell using the Windows Subsystem for Linux. The Bash shell allows you to run programs, also known as commands.

Luckily, the Bash Unix shell provides us with a lot of different commands that we can use to search and filter data. Furthermore, the Bash shell provides you with the ability to pipe data. This means we can chain multiple commands and pass the output of one command to the next command in a single action.

Let’s say we have a file containing 100 lines of log data, and we want to extract all the error-level logs and sort them by timestamp. We don’t need to write any code for this. We can use a filtering command to match the error-level lines and then pipe the filtered result to the sort command to order them by timestamp. Below, you see a pseudocode example of how this might work:

read all logs -> find 'error' -> sort by timestamp
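
In real Bash, that pipeline could look like the sketch below. It assumes a hypothetical file named app.log whose lines start with a date and a time, like the dataset we’ll use later; sort -k1,2 sorts on those first two whitespace-separated fields:

cat app.log | grep -w 'ERROR' | sort -k1,2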

Now that we have the foundation, it’s time to get practical. Let’s take a look at several commands you can use to filter logs and an example use case for each.

Bash Commands To Extract Data From Log Files

For the examples in this post, let’s use the below dataset. You can also download the dataset from GitHub to try out the commands yourself.

Each log line contains the following information:

  1. Date
  2. Timestamp
  3. Log level
  4. Service or application name
  5. Username
  6. Event description

Here’s the sample log data:

2015-12-03 17:08:36 DEBUG SEARCHQUEUE-DAILY-SEARCH :: [User1] :: Attempting to add item to cache: Jimmy.Fallon.2015.12.02.Brett.Favre.720p.HDTV.x264-CROOKS[rartv]
2015-12-03 17:08:36 DEBUG SEARCHQUEUE-DAILY-SEARCH :: [User1] :: Unable to parse the filename Jimmy.Fallon.2015.12.02.Brett.Favre.720p.HDTV.x264-CROOKS[rartv] into a valid show
2015-12-03 17:08:36 DEBUG SEARCHQUEUE-DAILY-SEARCH :: [User1] :: Attempting to add item to cache: Moonbeam.City.S01E09.The.Legend.of.Circuit.Lake.720p.CC.WEBRip.AAC2.0.x264-BTW[rartv]
2015-12-03 17:08:38 DEBUG SEARCHQUEUE-WEEKLY-MOVIE :: [User1] :: Unable to parse the filename Moonbeam.City.S01E09.The.Legend.of.Circuit.Lake.720p.CC.WEBRip.AAC2.0.x264-BTW[rartv] into a valid show
2015-12-03 17:08:51 ERROR SEARCHQUEUE-DAILY-SEARCH :: [User1] :: Failed to find item in cache: Black-ish.S02E09.Man.At.Work.720p.EXTENDED.HULU.WEBRip.AAC2.0.H264-NTb[rartv]
2015-12-03 17:08:51 FATAL SEARCHQUEUE-DAILY-SEARCH :: [User1] :: Search service crashed lost connection: ERRORS.PUBKEYERR.service.logger
2015-12-03 17:08:53 DEBUG SEARCHQUEUE-DAILY-SEARCH :: [User1] :: Unable to parse the filename Christmas.Through.the.Decades.Part1.The.60s.HDTV.x264-W4F[rartv] into a valid show
2015-12-03 17:08:59 DEBUG SEARCHQUEUE-DAILY-SEARCH :: [User1] :: Attempting to add item to cache: The.League.S07E12.The.13.Stages.of.Grief.720p.WEB-DL.DD5.1.H264-NTb[rartv]
2015-12-03 17:09:01 DEBUG SEARCHQUEUE-DAILY-SEARCH :: [User1] :: Unable to parse the filename The.League.S07E12.The.13.Stages.of.Grief.720p.WEB-DL.DD5.1.H264-NTb[rartv] into a valid show
2015-12-03 17:09:29 DEBUG SEARCHQUEUE-DAILY-SEARCH :: [admin] :: Unable to parse the filename Dan.Cruickshank.Resurrecting.History.Warsaw.HDTV.x264-C4TV[rartv] into a valid show
2015-12-03 17:09:57 DEBUG SEARCHQUEUE-DAILY-SEARCH :: [User1] :: Unable to parse the filename This.Is.Tottenham.720p.HDTV.x264-C4TV[rartv] into a valid show
2015-12-03 17:09:57 DEBUG SEARCHQUEUE-DAILY-SEARCH :: [User1] :: Transaction with 2 queries executed
2015-12-03 17:09:57 INFO SEARCHQUEUE-DAILY-SEARCH :: [admin] :: Skipping Blindspot.S01E10.nl because we don't want an episode that's Unknown
2015-12-03 17:09:57 DEBUG SEARCHQUEUE-DAILY-SEARCH :: [admin] :: None of the conditions were met, ignoring found episode
2015-12-03 17:09:57 INFO SEARCHQUEUE-DAILY-SEARCH :: [admin] :: Skipping Arrow.S04E08.720p.FASTSUB.VOSTFR.720p.HDTV.x264-ZT.mkv because we don't want an episode that's 720p HDTV
2015-12-03 17:09:58 DEBUG SEARCHQUEUE-DAILY-SEARCH :: [User1] :: Using cached parse result for: Arrow.S04E08.1080p.WEB-DL.DD5.1.H264-RARBG
2015-12-03 17:09:58 INFO SEARCHQUEUE-DAILY-SEARCH :: [User1] :: Skipping Arrow.S04E08.720p.WEB-DL.DD5.1.H264-RARBG because we don't want an episode that's 720p WEB-DL

Before we can explore different commands, we need to know how we can read log data from log files. The simplest solution is to use the cat command, which allows you to read the contents of a file. Then, we can pipe the log data to other commands. However, for some commands, such as grep, you can directly pass a file as input.
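
For example, both of the following invocations feed the same log data to grep, first by piping the output of cat and then by passing the file directly:

cat log.txt | grep "DEBUG"
grep "DEBUG" log.txt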

Let’s get started!

Command #1: Grep

The first command in our list is the grep command. The Linux manual defines the grep command as follows:

grep searches for PATTERNS in each FILE. PATTERNS is one or more patterns separated by newline characters, and grep prints each line that matches a pattern.

Grep Use Case: Search for Log Level

Let’s start by searching for the error log level. We need to pass the word “ERROR” to the grep command. Note that the grep command is case-sensitive by default. We can use the piping symbol | to pass the log data to the grep command.

cat log.txt | grep "ERROR"

This returns the following two results:

2015-12-03 17:08:51 ERROR SEARCHQUEUE-DAILY-SEARCH :: [User1] :: Failed to find item in cache: Black-ish.S02E09.Man.At.Work.720p.EXTENDED.HULU.WEBRip.AAC2.0.H264-NTb[rartv]
2015-12-03 17:08:51 FATAL SEARCHQUEUE-DAILY-SEARCH :: [User1] :: Search service crashed lost connection: ERRORS.PUBKEYERR.service.logger

However, note that this also returned a fatal log level because the description field of this log line contains the word “ERRORS.” Let’s modify our grep command to only match the exact word “ERROR” and not match variants. We can use the -w option to tell the grep command to match the exact word.

cat log.txt | grep -w "ERROR"
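
This time, only the line whose log level is actually ERROR comes back:

2015-12-03 17:08:51 ERROR SEARCHQUEUE-DAILY-SEARCH :: [User1] :: Failed to find item in cache: Black-ish.S02E09.Man.At.Work.720p.EXTENDED.HULU.WEBRip.AAC2.0.H264-NTb[rartv]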

And what if we want to filter for both error and info log levels? Luckily, the grep command can accept multiple patterns separated by the pipe symbol. Importantly, in grep’s default (basic) regular expressions, you must escape the pipe symbol with a backslash for it to mean alternation.

cat log.txt | grep -w "ERROR\|INFO"
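
Alternatively, grep’s -E flag enables extended regular expressions, where the unescaped pipe symbol already means alternation:

cat log.txt | grep -wE "ERROR|INFO"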

However, for large log files, a search can return hundreds of matches. Let’s apply a simple trick to count the results quickly. For this, the Bash shell provides the wc command, whose -l option counts the lines it receives.

cat log.txt | grep -w "ERROR\|INFO" | wc -l
# 4
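
As a side note, grep can count matching lines on its own with the -c flag, so you can skip wc entirely:

cat log.txt | grep -cw "ERROR\|INFO"
# 4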

Cool, right? Next, let’s learn how to find logs between two timestamps using the sed command.

Command #2: Sed

Next, let’s explore the sed command. From the GNU manual pages, we can read the following definition:

sed is a stream editor. A stream editor is used to perform basic text transformations on an input stream (a file or input from a pipeline). While in some ways similar to an editor which permits scripted edits (such as ed), sed works by making only one pass over the input(s), and is consequently more efficient. But it is sed’s ability to filter text in a pipeline which particularly distinguishes it from other types of editors.

To explore the sed command, let’s look for logs that occurred between two timestamps.

Sed Use Case: Find Logs Between Two Timestamps

You often want to look for logs between two specific timestamps, but scrolling through a huge log file to find the exact lines isn’t practical.

Therefore, let’s use the sed command to find all logs stamped within the minute 2015-12-03 17:08; in other words, all logs between 2015-12-03 17:08:00 and 2015-12-03 17:08:59. The command below uses the -n flag together with the p command so that sed only prints the matching lines:

sed -n '/2015-12-03 17:08/p' log.txt
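
This prints the eight entries stamped within that minute. As before, you can pipe the result to wc -l to confirm the count:

sed -n '/2015-12-03 17:08/p' log.txt | wc -l
# 8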

Moreover, this command still works when the date/time field isn’t the first element in your log line, because sed matches the pattern anywhere in the line. You can try this out by swapping the date/time with the log level.

Next, let’s search between two timestamps in different minutes: all logs that occurred between 2015-12-03 17:08:00 and 2015-12-03 17:10:00. Here, the sed command accepts a second pattern, separated from the first by a comma. Because no line in our file matches 2015-12-03 17:10, sed prints from the first match through the end of the file, so this command returns all 17 lines of our log file:

sed -n '/2015-12-03 17:08/,/2015-12-03 17:10/p' log.txt | wc -l

However, we can accomplish the same thing using the grep command and a regular expression. Let’s say we only want to return results that happened between 2015-12-03 17:08:50 and 2015-12-03 17:08:59. We can simply pass a pattern whose character class matches the seconds 50 through 59: 17:08:5[0-9].

grep '17:08:5[0-9]' log.txt
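
For a window that crosses a minute boundary, you can combine several character classes with extended-regex alternation. For example, this sketch covers 17:08:50 through 17:09:09:

grep -E '17:08:5[0-9]|17:09:0[0-9]' log.txt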

As you can see, there are always many possibilities to accomplish the same task or reach similar outcomes.

Command #3: Cut

Last, let’s learn how you can use the cut command to transform log files.

The Wikibooks documentation provides the following definition of cut: “Cut is a Unix command-line tool used to extract fields and the like from lines of input, available on many platforms.”

Cut Use Case: Transform Log Files

As mentioned in the introduction, we only want to store logs with the error or fatal log level. Besides that, we don’t want to store the username of whoever accessed the service. Therefore, let’s remove :: [User1] :: from each log line.

Here’s the explanation of the full command using cut:

  • -d ' ' splits each log line on whitespace. Each whitespace-delimited snippet of text counts as a column (field).
  • -f-4,8- keeps columns 1 through 4 and columns 8 through the end of the line. This drops the :: [User1] :: part. Note that each :: is also treated as a column, since it’s surrounded by whitespace.

cat log.txt | grep -w "ERROR\|FATAL" | cut -d ' ' -f-4,8-

This is the final result, filtered down to the error and fatal log levels with the username removed:

2015-12-03 17:08:51 ERROR SEARCHQUEUE-DAILY-SEARCH Failed to find item in cache: Black-ish.S02E09.Man.At.Work.720p.EXTENDED.HULU.WEBRip.AAC2.0.H264-NTb[rartv]
2015-12-03 17:08:51 FATAL SEARCHQUEUE-DAILY-SEARCH Search service crashed lost connection: ERRORS.PUBKEYERR.service.logger
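
To actually store the transformed logs instead of just printing them, redirect the output to a new file (the name filtered.log here is just an example):

cat log.txt | grep -w "ERROR\|FATAL" | cut -d ' ' -f-4,8- > filtered.log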

That’s it!

Searching Log Files With Bash: Many Commands To Reach the Same Outcome

I hope you’ve learned how to use different Bash shell commands to filter, search, and transform log data.

As you may have noticed, you always have different possibilities and commands to accomplish the same goal. Furthermore, Bash allows you to chain multiple commands. For example, you might chain commands to read a log file, filter for certain log levels, or transform log data to a different format.
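
To make that concrete, here’s one possible end-to-end pipeline built from the commands in this post: it reads our log.txt, keeps only the error- and fatal-level lines, strips the username columns, and sorts the result by date and time:

grep -w 'ERROR\|FATAL' log.txt | cut -d ' ' -f-4,8- | sort -k1,2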

If you want to learn more about logging, read Scalyr’s article about the 10 commandments of logging. And make sure to check out Scalyr’s solutions to search log files.