PowerQueries provide a useful way to analyze data. Scalyr makes building tables and computations of vast amounts of data easy. In this post we are going to focus on a few techniques to produce more useful reports: parsing from fields, calculating time deltas, enriching and classifying your data, and adding summary rows and columns.
If you are unfamiliar with PowerQueries, I would recommend going through the documentation first. We also have a series of quickstart videos. Below are some tips and tricks for people relatively comfortable with basic PowerQueries.
Tip #1:
Parse from Fields
Parsing is a key part of the data ingestion pipeline in Scalyr, but what if your data did not parse properly, or a parser was not preset for your logs? PowerQueries allow us to parse on the fly.
Let’s take a simple example of a file path on a Widows machine
$serverHost in ("Server 2008") logfile = *
|columns serverHost, logfile
|group count() by logfile
Let’s say that we want to extract data from the logfile
field
We can do so with the parse
command
In this dataset, let’s extract the drive letter and the logfile name. To do so we will write a regular expression in the parse command. The regular expression (make sure to double-escape your backslashes), will take the value before the :\
in C:\
and will take the value before the .log
in agent.log
$serverHost in ("Server 2008") logfile = *
|columns serverHost, logfile
|group count = count() by logfile
|parse "$drive$:(\\\\[^\\\\]+)+\\\\$file$\\.\\w+" from logfile
Now we can perform functions on our new fields
$serverHost in ("Server 2008") logfile = *
|columns serverHost, logfile
|group count = count() by logfile
|parse "$drive$:(\\\\[^\\\\]+)+\\\\$file$\\.\\w+" from logfile
|group count = count() by drive, file
|columns file, drive
Tip #2:
Conditional Logic
Did you know that you can perform conditional logic in PowerQueries using the ternary operators? This gives us the opportunity to specify more readable labels, or classify our data into named buckets.
Programmers use ternary operators in languages like Python for decision making in place of conditional statements if and else. The ternary operator takes three arguments: The first argument is a comparison argument, the second is the result upon a true comparison, and the third is the result upon a false comparison. If it helps, you can think of the operator as shortened way of writing an if-else statement.
Here’s a simple decision-making example using if and else statements in Python:
import re
hosts = ["server-prod", "server-dev", "server-qa"]
value = [100000000,110000000,140000000]
env = []
score = []
for i in range(len(hosts)):
#evaluate host
if re.search("prod", hosts[i]):
env.append("prod")
elif re.search("dev", hosts[i]):
env.append("dev")
else:
env.append("qa")
#evaluate score
if value[i] > 130000000:
score.append("High")
else:
score.append("Low")
print(hosts)
print(value)
print(env)
print(score)
We can do the same in PowerQueries
tag='logVolume' metric='logBytes'
|group value = sum(value) by host
|sort -value
|columns value,
host,
score = (value > 130000000) ? "high" : "low",
env = (host matches "prod") ? "Prod" : (host matches "sb") ? "sb" : "qa"
In other words, we are bucketing each row in to a score
of high
or low
based on value
, and specifying a value for env
based on the host
field.
Conclusion
PowerQueries can be utilized in a multitude of applications. It is a great way to perform SQL-like queries to large unstructured data sets. If you would like to view more about PowerQueries, check out our public documentation.