Using awk to effectively filter Apache/Nginx access logs
Access log lines from Apache and Nginx look like:
your.domain.tld 72.81.123.33 - - [15/Feb/2019:17:28:59 +0100] "GET /path HTTP/1.1" 200 16115 "https://www.referrer.net" "Mozilla/5.0 (Linux; Android 5.0.1; SAMSUNG GT-I9515 Build/LRX22C) AppleWebKit/537.36 (KHTML, like Gecko) SamsungBrowser/8.2 Chrome/63.0.3239.111 Mobile Safari/537.36" - 0.202
Tools like cut and awk split each line on a delimiter (for awk, whitespace by default) rather than treating the parts between brackets and quotes as single fields, which makes it harder to get at the data you need.
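To make the problem concrete, here is a small sketch of what default splitting does to the timestamp and the request. It uses a shortened copy of the sample line; the variable names `line` and `split_fields` are only for illustration.

```shell
# Shortened copy of the sample line above (fields after the byte count trimmed).
line='your.domain.tld 72.81.123.33 - - [15/Feb/2019:17:28:59 +0100] "GET /path HTTP/1.1" 200 16115'

# Default splitting: every run of whitespace separates two fields,
# so the bracketed timestamp and the quoted request are cut into pieces.
split_fields=$(printf '%s\n' "$line" | awk '{ print $5; print $6; print $7 }')
printf '%s\n' "$split_fields"
```

The timestamp ends up spread over `$5` and `$6`, and `$7` holds only `"GET`, the first word of the request.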
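FPAT, a GNU awk (gawk) variable that describes what a field looks like instead of what separates fields, to the rescue: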
head /var/log/nginx/extra.log | awk '{print $6}' FPAT='[^ ]*|"[^"]*"|\\[[^]]*\\]'
This prints the request string (field 6) of every line it is given, here the first ten lines of a log file in the example format above.
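The same idea extends to filtering on one field while printing others. The following is a minimal sketch, not a definitive recipe. It assumes GNU awk is installed as `gawk`, uses a copy of the sample line with the user agent shortened, and uses a slight variation of the pattern (`[^ ]+` instead of `[^ ]*`, so a field is never empty). The variable names `line`, `fpat` and `result` are hypothetical.

```shell
# Sample line in the format shown above (user agent shortened for readability).
line='your.domain.tld 72.81.123.33 - - [15/Feb/2019:17:28:59 +0100] "GET /path HTTP/1.1" 200 16115 "https://www.referrer.net" "Mozilla/5.0 (Linux; Android 5.0.1)" - 0.202'

# A field is a quoted string, a bracketed timestamp, or a run of non-space
# characters. The doubled backslashes survive awk's escape processing of
# command-line assignments and reach the regex engine as \[ and \].
fpat='"[^"]*"|\\[[^]]*\\]|[^ ]+'

# Print timestamp ($5), request ($6) and status ($7) for 2xx responses only.
result=$(printf '%s\n' "$line" | gawk -v FPAT="$fpat" '$7 ~ /^2/ { print $5, $6, $7 }')
printf '%s\n' "$result"
```

For the sample line this should print `[15/Feb/2019:17:28:59 +0100] "GET /path HTTP/1.1" 200`. On systems where `awk` is not GNU awk (mawk, for example), FPAT is treated as an ordinary variable and the fields are split on whitespace as usual.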
Read more about defining fields by content in the GNU awk manual.