You need to see who’s accessing your systems. This often means you have to grep an IP address from a log file. Grep is a command-line tool for searching text in files using regular expression syntax.
Let’s take a look at searching for IP addresses in log files using grep and how you can use regular expressions to search for addresses in different situations.
For this tutorial, we’ll use a sample HTTPD access log. You can download the file and follow along by opening this link in another tab and then saving the page as a file to your computer.
Grep for an Exact IP Address in a Log File
First, let’s take a look at searching for an exact address in an access log.
Run grep with the IP address you’re looking for and the name of the log file.
$ grep 46.72.177.4 access.log
46.72.177.4 - - [12/Dec/2015:18:31:08 +0100] "GET /administrator/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
46.72.177.4 - - [12/Dec/2015:18:31:08 +0100] "POST /administrator/index.php HTTP/1.1" 200 4494 "http://almhuette-raith.at/administrator/" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
46.72.177.4 - - [14/Dec/2015:16:39:27 +0100] "GET /administrator/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
46.72.177.4 - - [14/Dec/2015:16:39:28 +0100] "POST /administrator/index.php HTTP/1.1" 200 4494 "http://almhuette-raith.at/administrator/" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
188.187.105.165 - - [30/Jan/2019:10:57:26 +0100] "GET /apache-log/46.72.177.4%20-%20-%20[12/Dec/2015:18:31:08%20+0100]%20%22GET%20/administrator/%20HTTP/1.1%22%20200%204263%20%22-%22%20%22Mozilla/5.0%20(Windows%20NT%206.0;%20rv:34.0)%20Gecko/20100101%20Firefox/34.0%22%20%22-%22 HTTP/1.1" 404 417 "-" "Wget/1.20.1 (darwin17.7.0)" "-"
188.187.105.165 - - [30/Jan/2019:10:57:26 +0100] "GET /apache-log/46.72.177.4%20-%20-%20[12/Dec/2015:18:31:08%20+0100]%20%22POST%20/administrator/index.php%20HTTP/1.1%22%20200%204494%20%22http://almhuette-raith.at/administrator/%22%20%22Mozilla/5.0%20(Windows%20NT%206.0;%20rv:34.0)%20Gecko/20100101%20Firefox/34.0%22%20%22-%22 HTTP/1.1" 404 466 "-" "Wget/1.20.1 (darwin17.7.0)" "-"
188.187.105.165 - - [30/Jan/2019:10:59:44 +0100] "GET /apache-log/46.72.177.4%20-%20-%20[14/Dec/2015:16:39:27%20+0100]%20%22GET%20/administrator/%20HTTP/1.1%22%20200%204263%20%22-%22%20%22Mozilla/5.0%20(Windows%20NT%206.0;%20rv:34.0)%20Gecko/20100101%20Firefox/34.0%22%20%22-%22 HTTP/1.1" 404 417 "-" "Wget/1.20.1 (darwin17.7.0)" "-"
188.187.105.165 - - [30/Jan/2019:10:59:44 +0100] "GET /apache-log/46.72.177.4%20-%20-%20[14/Dec/2015:16:39:28%20+0100]%20%22POST%20/administrator/index.php%20HTTP/1.1%22%20200%204494%20%22http://almhuette-raith.at/administrator/%22%20%22Mozilla/5.0%20(Windows%20NT%206.0;%20rv:34.0)%20Gecko/20100101%20Firefox/34.0%22%20%22-%22 HTTP/1.1" 404 466 "-" "Wget/1.20.1 (darwin17.7.0)" "-"
I trimmed the results from the sample file to save room. I’ll do this for most of the results.
Grep did exactly what you asked it to do. It found all of the instances of 46.72.177.4 in the file and returned the lines that contain it. Some of the lines start with the IP address you’re looking for. That indicates an HTTP request from that address. Others contain references to it.
What if you only want to see requests? Try a regular expression that matches only lines beginning with 46.72.177.4.
$ grep "^46.72.177.4" access.log
46.72.177.4 - - [12/Dec/2015:18:31:08 +0100] "GET /administrator/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
46.72.177.4 - - [12/Dec/2015:18:31:08 +0100] "POST /administrator/index.php HTTP/1.1" 200 4494 "http://almhuette-raith.at/administrator/" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
46.72.177.4 - - [14/Dec/2015:16:39:27 +0100] "GET /administrator/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
46.72.177.4 - - [14/Dec/2015:16:39:28 +0100] "POST /administrator/index.php HTTP/1.1" 200 4494 "http://almhuette-raith.at/administrator/" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
46.72.177.4 - - [15/Dec/2015:18:16:52 +0100] "GET /administrator/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
46.72.177.4 - - [15/Dec/2015:18:16:52 +0100] "POST /administrator/index.php HTTP/1.1" 200 4494 "http://almhuette-raith.at/administrator/" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
46.72.177.4 - - [17/Dec/2015:19:43:47 +0100] "GET /administrator/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
46.72.177.4 - - [17/Dec/2015:19:43:47 +0100] "POST /administrator/index.php HTTP/1.1" 200 4494 "http://almhuette-raith.at/administrator/" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
You got a much smaller result set now because *^* means match the following string only if it occurs at the beginning of the line.
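To see the difference on a smaller scale, here is a minimal sketch that builds a three-line log and runs both searches against it. The log lines are shortened stand-ins for the sample file, and the temporary file and variable names are made up for this illustration. The anchored pattern also escapes the periods and ends with a space, so it can't accidentally match a longer address such as 46.72.177.40.

```shell
# Build a tiny, made-up access log: two requests from the address and one
# request from another client that only mentions it in the URL.
log=$(mktemp)
cat > "$log" <<'EOF'
46.72.177.4 - - [12/Dec/2015:18:31:08 +0100] "GET /administrator/ HTTP/1.1" 200 4263
46.72.177.4 - - [12/Dec/2015:18:31:08 +0100] "POST /administrator/index.php HTTP/1.1" 200 4494
188.187.105.165 - - [30/Jan/2019:10:57:26 +0100] "GET /apache-log/46.72.177.4 HTTP/1.1" 404 417
EOF

# Unanchored: every line that contains the address anywhere.
anywhere=$(grep '46\.72\.177\.4' "$log")

# Anchored: only lines that start with the address, followed by a space.
requests=$(grep '^46\.72\.177\.4 ' "$log")

echo "Lines that mention the address:"
echo "$anywhere"
echo "Requests from the address:"
echo "$requests"

rm -f "$log"
```

The first search prints all three lines. The second prints only the two that start with the address.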
So, now you see each time that a specific IP address accessed this server. If you’re looking for a count, pipe the results through the word count utility, *wc*.
$ grep "^46.72.177.4" access.log | wc -l
8
Your sample IP address made eight requests.
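If all you need is the number, grep can also count matching lines itself with its -c option. The sketch below compares the two approaches on a made-up three-line log. The file contents and variable names are assumptions for illustration only.

```shell
# A made-up log: two requests from the address, one mention inside a URL.
log=$(mktemp)
cat > "$log" <<'EOF'
46.72.177.4 - - [12/Dec/2015:18:31:08 +0100] "GET /administrator/ HTTP/1.1" 200 4263
188.187.105.165 - - [30/Jan/2019:10:57:26 +0100] "GET /apache-log/46.72.177.4 HTTP/1.1" 404 417
46.72.177.4 - - [14/Dec/2015:16:39:27 +0100] "GET /administrator/ HTTP/1.1" 200 4263
EOF

# Count by piping the matches through wc -l, as in the text.
# Some wc implementations pad the number with spaces, so tr strips them.
piped=$(grep '^46\.72\.177\.4 ' "$log" | wc -l | tr -d ' ')

# Count with grep's own -c option, which prints the number of matching lines.
direct=$(grep -c '^46\.72\.177\.4 ' "$log")

echo "wc -l: $piped"
echo "grep -c: $direct"

rm -f "$log"
```

Both approaches report the same count. The pipe through wc is handy when more filters sit between grep and the count.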
That covers a lot of ground. But what if you need to match parts of addresses?
Grep for Part of an IP Address in a Log File
Instead of searching for specific IP addresses, you might want to look for parts of an address.
Let’s try searching for the first two octets of an address: in other words, everything in the 46.72.0.0/16 network. You’re looking for this prefix both as a requester and within requests.
$ grep "46.72" access.log | more
46.72.177.4 - - [12/Dec/2015:18:31:08 +0100] "GET /administrator/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
46.72.177.4 - - [12/Dec/2015:18:31:08 +0100] "POST /administrator/index.php HTTP/1.1" 200 4494 "http://almhuette-raith.at/administrator/" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
46.72.213.133 - - [12/Dec/2015:18:39:27 +0100] "GET /administrator/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
46.72.213.133 - - [12/Dec/2015:18:39:27 +0100] "POST /administrator/index.php HTTP/1.1" 200 4494 "http://almhuette-raith.at/administrator/" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
46.72.184.174 - - [12/Dec/2015:18:51:08 +0100] "GET /administrator/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
46.72.184.174 - - [12/Dec/2015:18:51:08 +0100] "POST /administrator/index.php HTTP/1.1" 200 4494 "http://almhuette-raith.at/administrator/" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
46.72.185.236 - - [12/Dec/2015:19:31:11 +0100] "GET /administrator/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
46.72.185.236 - - [12/Dec/2015:19:31:12 +0100] "POST /administrator/index.php HTTP/1.1" 200 4494 "http://almhuette-raith.at/administrator/" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
37.159.185.154 - - [27/Aug/2020:15:48:21 +0200] "GET /apache-log/access.log HTTP/1.1" 200 14637264 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36" "-"
3.121.24.234 - - [27/Aug/2020:19:14:42 +0200] "GET /apache-log/access.log HTTP/1.1" 200 16846272 "-" "Mozilla/5.0 (Linux; Android 6.0; Nexus 7 Build/KRT16M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.113 Safari/537.36" "-"
3.121.24.234 - - [29/Aug/2020:03:44:42 +0200] "GET /apache-log/access.log HTTP/1.1" 200 10446872 "-" "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.113 Mobile Safari/537.36" "-"
172.58.204.254 - - [30/Aug/2020:01:08:17 +0200] "GET /apache-log/access.log HTTP/1.1" 200 1346272 "https://www.google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36" "-"
Grep matched lines that you weren’t looking for. Why?
In regular expression syntax, a period matches “any character.” So you got lines that contain 46872 and 46272, among others.
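You can reproduce the effect without the log file. This small sketch feeds grep three made-up strings, one address and two byte counts like the ones in the output above, and searches them with the same unescaped pattern. The variable name is an assumption for illustration.

```shell
# One real-looking address and two byte counts that merely contain
# "46", any single character, then "72".
matches=$(printf '%s\n' '46.72.177.4' '14637264' '10446872' | grep '46.72')

# All three strings match, because the unescaped period accepts any character.
echo "$matches"
```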
Let’s move to extended regular expressions.
$ grep -E "46\.72" access.log | more
46.72.177.4 - - [12/Dec/2015:18:31:08 +0100] "GET /administrator/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
46.72.177.4 - - [12/Dec/2015:18:31:08 +0100] "POST /administrator/index.php HTTP/1.1" 200 4494 "http://almhuette-raith.at/administrator/" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
46.72.213.133 - - [12/Dec/2015:18:39:27 +0100] "GET /administrator/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
46.72.213.133 - - [12/Dec/2015:18:39:27 +0100] "POST /administrator/index.php HTTP/1.1" 200 4494 "http://almhuette-raith.at/administrator/" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
46.72.184.174 - - [12/Dec/2015:18:51:08 +0100] "GET /administrator/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
46.72.184.174 - - [12/Dec/2015:18:51:08 +0100] "POST /administrator/index.php HTTP/1.1" 200 4494 "http://almhuette-raith.at/administrator/" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
188.187.105.165 - - [30/Jan/2019:11:03:10 +0100] "GET /apache-log/46.72.192.202%20-%20-%20[18/Dec/2015:07:54:10%20+0100]%20%22GET%20/administrator/%20HTTP/1.1%22%20200%204263%20%22-%22%20%22Mozilla/5.0%20(Windows%20NT%206.0;%20rv:34.0)%20Gecko/20100101%20Firefox/34.0%22%20%22-%22 HTTP/1.1" 404 419 "-" "Wget/1.20.1 (darwin17.7.0)" "-"
188.187.105.165 - - [30/Jan/2019:11:03:10 +0100] "GET /apache-log/46.72.192.202%20-%20-%20[18/Dec/2015:07:54:10%20+0100]%20%22POST%20/administrator/index.php%20HTTP/1.1%22%20200%204494%20%22http://almhuette-raith.at/administrator/%22%20%22Mozilla/5.0%20(Windows%20NT%206.0;%20rv:34.0)%20Gecko/20100101%20Firefox/34.0%22%20%22-%22 HTTP/1.1" 404 468 "-" "Wget/1.20.1 (darwin17.7.0)" "-"
54.185.146.72 - - [29/Nov/2019:15:02:14 +0100] "GET /apache-log/access.log HTTP/1.1" 200 64168 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36" "-"
54.185.146.72 - - [13/Jan/2020:13:33:42 +0100] "GET /apache-log/access.log HTTP/1.1" 200 42904 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36" "-"
When you pass -E to grep, you enable extended regular expressions. The important change here, though, is the backslash: escaping the period tells grep to match a literal period instead of any character. (Escaping works in basic regular expressions too.) So, "46\.72" matches exactly what you’re looking for.
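A backslash isn't the only way to get a literal period. As a rough sketch on the same kind of made-up input, the three searches below each treat the period literally: a backslash escape, grep's -F option (which treats the whole pattern as a fixed string rather than a regular expression), and a one-character bracket expression. The variable names are assumptions for illustration.

```shell
# One address and two byte counts that an unescaped "46.72" would match.
sample=$(printf '%s\n' '46.72.177.4' '14637264' '10446872')

# Backslash escape: the period must be a literal period.
escaped=$(printf '%s\n' "$sample" | grep '46\.72')

# Fixed-string search: nothing in the pattern is special.
fixed=$(printf '%s\n' "$sample" | grep -F '46.72')

# Bracket expression: a character class containing only a period.
bracket=$(printf '%s\n' "$sample" | grep '46[.]72')

echo "escaped: $escaped"
echo "fixed:   $fixed"
echo "bracket: $bracket"
```

Each search returns only the address. The -F form is convenient when the pattern is an exact string, but it rules out anchors such as ^.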
Unfortunately, it turns out that “exactly what you’re looking for” isn’t exactly what you’re looking for. The last few lines above contain addresses that end with 46.72.
If you search with "^46\.72" you’ll get requests that begin with those two octets. But you’ll miss the lines that contain them as part of a query.
Regular expressions have the notion of word boundaries. When you ask for word matches, the expression matches only if the matched text isn’t directly preceded or followed by a letter, digit, or underscore. So, the last two lines above would be filtered because 46.72 is preceded by a numeral 1.
Add -w to the command-line arguments to grep.
$ grep -Ew "46\.72" access.log | more
46.72.177.4 - - [12/Dec/2015:18:31:08 +0100] "GET /administrator/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
46.72.177.4 - - [12/Dec/2015:18:31:08 +0100] "POST /administrator/index.php HTTP/1.1" 200 4494 "http://almhuette-raith.at/administrator/" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
46.72.213.133 - - [12/Dec/2015:18:39:27 +0100] "GET /administrator/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
46.72.213.133 - - [12/Dec/2015:18:39:27 +0100] "POST /administrator/index.php HTTP/1.1" 200 4494 "http://almhuette-raith.at/administrator/" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
188.187.105.165 - - [30/Jan/2019:11:03:10 +0100] "GET /apache-log/46.72.192.202%20-%20-%20[18/Dec/2015:07:54:10%20+0100]%20%22GET%20/administrator/%20HTTP/1.1%22%20200%204263%20%22-%22%20%22Mozilla/5.0%20(Windows%20NT%206.0;%20rv:34.0)%20Gecko/20100101%20Firefox/34.0%22%20%22-%22 HTTP/1.1" 404 419 "-" "Wget/1.20.1 (darwin17.7.0)" "-"
188.187.105.165 - - [30/Jan/2019:11:03:10 +0100] "GET /apache-log/46.72.192.202%20-%20-%20[18/Dec/2015:07:54:10%20+0100]%20%22POST%20/administrator/index.php%20HTTP/1.1%22%20200%204494%20%22http://almhuette-raith.at/administrator/%22%20%22Mozilla/5.0%20(Windows%20NT%206.0;%20rv:34.0)%20Gecko/20100101%20Firefox/34.0%22%20%22-%22 HTTP/1.1" 404 468 "-" "Wget/1.20.1 (darwin17.7.0)" "-"
Adding the -w flag to the command indicates we want word matches only. This filtered out the unwanted matches. (If there were any addresses with 46.72 as the middle or last octets, you would see them, though.)
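Here is a small sketch of both effects, using three shortened, made-up log lines: one address that starts with 46.72, one that ends in 146.72, and one whose last two octets are exactly 46.72. The variable names are assumptions for illustration.

```shell
# Three shortened, made-up log lines.
sample=$(printf '%s\n' \
  '46.72.177.4 - - "GET /administrator/ HTTP/1.1" 200 4263' \
  '54.185.146.72 - - "GET /apache-log/access.log HTTP/1.1" 200 64168' \
  '10.0.46.72 - - "GET / HTTP/1.1" 200 512')

# Without -w, "46\.72" also matches inside 146.72.
without_w=$(printf '%s\n' "$sample" | grep -E '46\.72')

# With -w, a match may not be directly preceded or followed by a letter,
# digit, or underscore, so 146.72 is rejected. The 10.0.46.72 line still
# matches, because a period comes before 46 and a space comes after 72.
with_w=$(printf '%s\n' "$sample" | grep -Ew '46\.72')

echo "Without -w:"
echo "$without_w"
echo "With -w:"
echo "$with_w"
```

Without -w, all three lines are printed. With -w, the 54.185.146.72 line drops out, but the 10.0.46.72 line stays.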
So far you’ve only worked with searching for addresses you know. What if you want to match any address?
Grep for Any IP Address in a Log File
If you don’t know the address you’re looking for, you need to write an expression that will match one. For this, you’ll need character classes and wildcard matches.
A character class is a set of characters. The set is denoted with square brackets: [ ]. IP addresses are made up of numerals. So, you need the character class [0-9] to match any numeral.
An address has four sets of up to three numerals, separated by periods. So, you need to match at least one, but no more than three numerals for each octet. You can do this with curly braces: { }. [0-9]{1,3} matches one to three numerals.
So, the complete expression for any complete IP address is: [0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}
$ grep -E "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" access.log | more
109.169.248.247 - - [12/Dec/2015:18:25:11 +0100] "GET /administrator/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
109.169.248.247 - - [12/Dec/2015:18:25:11 +0100] "POST /administrator/index.php HTTP/1.1" 200 4494 "http://almhuette-raith.at/administrator/" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
46.72.177.4 - - [12/Dec/2015:18:31:08 +0100] "GET /administrator/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
46.72.177.4 - - [12/Dec/2015:18:31:08 +0100] "POST /administrator/index.php HTTP/1.1" 200 4494 "http://almhuette-raith.at/administrator/" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
Since we’re using an HTTP access.log, this search will match every line.
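It helps to know what this expression does and doesn't check. The sketch below runs it against three made-up lines. It only tests the shape of an address (four groups of one to three digits separated by periods), so it happily accepts 999.999.999.999, which isn't a valid address. The variable names are assumptions for illustration.

```shell
# The pattern from the text, with the periods escaped.
ip='[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'

# Three made-up lines: a normal request, a line with only a three-part
# version number, and a request from an out-of-range "address".
found=$(printf '%s\n' \
  '109.169.248.247 - - "GET /administrator/ HTTP/1.1" 200 4263' \
  'no address on this line, only a version number 1.20.1' \
  '999.999.999.999 - - "GET / HTTP/1.1" 200 512' | grep -E "$ip")

# The first and third lines match; the version-number line does not.
echo "$found"
```

For a real access log this is usually good enough, because the server writes only genuine client addresses at the start of each line.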
What if we want to build a list of IP addresses without the request information? Grep has the -o command-line option, which only returns the part of each line that matches the regular expression.
$ grep -E -o "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" access.log | more
109.169.248.247
109.169.248.247
46.72.177.4
46.72.177.4
That cuts down on the noise and gives us a list of addresses. But, since this is an access log, we’re seeing a lot of duplicates. That’s easy to fix.
$ grep -E -o "^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" access.log | sort | uniq | more
109.169.248.247
46.72.177.4
The uniq command-line tool filters duplicate lines from its input, but only when they’re adjacent. So, sort the output from grep first, and then pipe it through uniq to get a list of all unique IP addresses in the file. You can also get counts for each address by adding -c to uniq.
$ grep -E -o "^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" access.log | sort | uniq -c | more
42 109.169.248.247
16 46.72.177.4
In this case, the two IP addresses appeared 42 and 16 times.
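One detail worth seeing in action: uniq only merges duplicate lines that sit next to each other, so the order of its input matters. This minimal sketch uses a made-up four-line log in which one address appears before and after another. The file contents and variable names are assumptions for illustration.

```shell
# A made-up log where 109.169.248.247 appears on lines 1, 3, and 4.
log=$(mktemp)
cat > "$log" <<'EOF'
109.169.248.247 - - "GET /administrator/ HTTP/1.1" 200 4263
46.72.177.4 - - "GET /administrator/ HTTP/1.1" 200 4263
109.169.248.247 - - "POST /administrator/index.php HTTP/1.1" 200 4494
109.169.248.247 - - "GET /administrator/ HTTP/1.1" 200 4263
EOF

# Match an address only at the start of a line (the requester).
ip='^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'

# uniq alone merges only neighbouring duplicates, so the first address
# is listed twice.
unsorted=$(grep -Eo "$ip" "$log" | uniq)

# Sorting first groups identical addresses together. The final sort -rn
# ranks the counted addresses from most to least frequent.
ranked=$(grep -Eo "$ip" "$log" | sort | uniq -c | sort -rn)

echo "uniq without sort:"
echo "$unsorted"
echo "sort | uniq -c | sort -rn:"
echo "$ranked"

rm -f "$log"
```

Without the sort, 109.169.248.247 is listed twice. With it, each address appears once with its total, and the busiest client comes first.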
Grep and Regular Expressions
You’ve used grep and regular expression syntax to search for IP addresses in a log file. You also added the uniq command to filter the addresses into a list. These basic building blocks give you everything you need for almost any situation.
There’s a better way to analyze IP addresses in your log files. Scalyr indexes your logs in real time, so you can search for specific addresses with PowerQueries. For example, you can quickly display common IP addresses to see which clients are creating the most traffic. So, instead of trying to generate reports with command-line tools, you already have the data you need at your fingertips!
Sign up for a free trial here and get started with blazing-fast log analytics today!