Row vs Column-Oriented Databases | SentinelOne

Adding fit and finish to your PowerQueries reports

PowerQueries provide a useful way to analyze data. Scalyr makes building tables and computations of vast amounts of data easy. In this post we are going to focus on a few techniques to produce more useful reports: parsing from fields, calculating time deltas, enriching and classifying your data, and adding summary rows and columns.

If you are unfamiliar with PowerQueries, I would recommend going through the documentation first. We also have a series of quickstart videos. Below are some tips and tricks for people relatively comfortable with basic PowerQueries.

Tip #1:

Parse from Fields

Parsing is a key part of the data ingestion pipeline in Scalyr, but what if your data did not parse properly, or a parser was not preset for your logs? PowerQueries allow us to parse on the fly.

Let’s take a simple example of a file path on a Widows machine

$serverHost in ("Server 2008") logfile = *
|columns serverHost, logfile
|group count() by logfile

Let’s say that we want to extract data from the logfile field

We can do so with the parse command

In this dataset, let’s extract the drive letter and the logfile name. To do so we will write a regular expression in the parse command. The regular expression (make sure to double-escape your backslashes), will take the value before the :\ in C:\ and will take the value before the .log in agent.log

$serverHost in ("Server 2008") logfile = *
|columns serverHost, logfile
|group count = count() by logfile
|parse "$drive$:(\\\\[^\\\\]+)+\\\\$file$\\.\\w+" from logfile

Now we can perform functions on our new fields

$serverHost in ("Server 2008") logfile = *
|columns serverHost, logfile
|group count = count() by logfile
|parse "$drive$:(\\\\[^\\\\]+)+\\\\$file$\\.\\w+" from logfile
|group count = count() by drive, file
|columns file, drive

Tip #2:

Conditional Logic

Did you know that you can perform conditional logic in PowerQueries using the ternary operators? This gives us the opportunity to specify more readable labels, or classify our data into named buckets.

Programmers use ternary operators in languages like Python for decision making in place of conditional statements if and else. The ternary operator takes three arguments: The first argument is a comparison argument, the second is the result upon a true comparison, and the third is the result upon a false comparison. If it helps, you can think of the operator as shortened way of writing an if-else statement.

Here’s a simple decision-making example using if and else statements in Python:

import re
hosts = ["server-prod", "server-dev", "server-qa"]
value = [100000000,110000000,140000000]
env = []
score = []
for i in range(len(hosts)):
#evaluate host
  if re.search("prod", hosts[i]):
      env.append("prod")
  elif re.search("dev", hosts[i]):
      env.append("dev")
  else:
      env.append("qa")
#evaluate score
  if value[i] > 130000000:
      score.append("High")
  else:
      score.append("Low")
print(hosts)
print(value)
print(env)
print(score)

We can do the same in PowerQueries

tag='logVolume' metric='logBytes'
|group value = sum(value) by host
|sort -value
|columns value,
				 host,
				 score = (value > 130000000) ? "high" : "low",
         env = (host matches "prod") ? "Prod" : (host matches "sb") ? "sb" : "qa"

In other words, we are bucketing each row in to a score of high or low based on value, and specifying a value for env based on the host field.

Conclusion

PowerQueries can be utilized in a multitude of applications. It is a great way to perform SQL-like queries to large unstructured data sets. If you would like to view more about PowerQueries, check out our public documentation.