Linux (or Unix) command-line utilities such as awk, sort, and uniq can be used to analyze Apache logs and extract interesting statistics. One use case is finding the top IP addresses hitting your web site. Here is a handy command for that, along with its output on sample data.
$ cat /var/log/apache2/access_log.2016-02-04 | awk '{print $1}' | sort | uniq -c | sort -rn | head
397 52.8.183.64
23 157.55.39.29
20 157.55.39.178
17 157.55.39.176
15 157.55.39.101
11 157.55.39.177
10 185.45.13.148
9 157.55.39.179
8 117.207.192.224
7 141.8.143.217
Note that the access log location shown here is from an Apache installation on Ubuntu Linux; adjust the path for your own system.
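Since the log path varies between systems, the pipeline can be wrapped in a small shell function that takes the path as an argument. This is only a minimal sketch: top_ips is a hypothetical helper name (not a standard utility), and it assumes the client address is the first whitespace-separated field of each log line, as in the command above.

```shell
# top_ips is a hypothetical helper name, not a standard utility.
# Usage: top_ips LOGFILE [COUNT]
# COUNT defaults to 10, which matches the default of head.
top_ips() {
  logfile=$1
  count=${2:-10}
  # Assumption: field 1 of each log line is the client IP address.
  awk '{print $1}' "$logfile" | sort | uniq -c | sort -rn | head -n "$count"
}
```

For example, top_ips /var/log/apache2/access_log.2016-02-04 5 would list only the five busiest addresses from that file.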
A few points to note:
- sort can handle a fairly large amount of data even on a low-RAM machine.
- uniq -c outputs each unique entry with its count. It works only on sorted data, because it collapses only adjacent duplicate lines.
- sort -rn does a reverse numeric sort, so the highest counts come first.
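To see why uniq -c needs sorted input, here is a small sketch that runs it on a few made-up addresses, first unsorted and then sorted. The function names and the sample addresses are purely illustrative and do not come from the log above.

```shell
# Hypothetical sample data: three requests from two made-up addresses,
# with the repeated address deliberately not adjacent.
sample_ips() {
  printf '%s\n' 10.0.0.1 10.0.0.2 10.0.0.1
}

# Without sorting, uniq -c only merges adjacent duplicates,
# so 10.0.0.1 appears on two separate lines, each with a count of 1.
unsorted_counts() {
  sample_ips | uniq -c
}

# Sorting first brings the duplicates together, so 10.0.0.1 is
# reported once with a count of 2, and sort -rn puts it at the top.
sorted_counts() {
  sample_ips | sort | uniq -c | sort -rn
}

unsorted_counts
sorted_counts
```

The first call prints three lines and the second prints two, which is the difference the initial sort makes in the main pipeline.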