Web analytics with Grafana Loki

Oli Zimpasser
3 min read · Jun 14, 2024


I recently asked myself if it is possible to build a typical web analytics system using Grafana Loki.

Throughout the article, I have removed all code that is unnecessary for this purpose. For a complete yet simple example, visit https://github.com/oglimmer/traefik-loki-grafana-web-analytics.

This is what the overall architecture and its building blocks look like:

Flow of information from a user issuing an HTTP request to showing diagrams in Grafana

traefik

We have to enable access logs for traefik. Additionally, it makes our lives easier to write JSON instead of the Common Log Format. Finally, we want to see the User-Agent and Referer headers in the log.

For the purpose of web analytics, traefik also has to see the source IP of all incoming HTTP requests. There are various ways to achieve this; one simple way, but not recommended in production, is to enable network_mode: host. For a more secure setup, you might want to look up how to enable the proxy protocol between your edge load balancers and traefik.
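As a sketch of the more secure route, traefik can be told to accept the PROXY protocol only from trusted addresses. The entry point name and the CIDR range below are assumptions, not from the example repository; adjust them to your setup:

```yaml
# docker-compose.yml (sketch) - "websecure" and 10.0.0.0/8 are assumptions
traefik:
  command:
    - "--entryPoints.websecure.address=:443"
    # only accept PROXY protocol headers from the edge load balancers
    - "--entryPoints.websecure.proxyProtocol.trustedIPs=10.0.0.0/8"
```

With this in place, network_mode: host is no longer needed to preserve client IPs.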

# docker-compose.yml ...
traefik:
  image: traefik:v3.0
  command:
    - "--accesslog=true"
    - "--accesslog.filepath=/opt/access-logs/access.json"
    - "--accesslog.format=json"
    - "--accesslog.fields.defaultmode=keep"
    - "--accesslog.fields.headers.defaultmode=keep"
    - "--accesslog.fields.headers.names.User-Agent=keep"
    - "--accesslog.fields.headers.names.Referer=keep"
  network_mode: host
  volumes:
    - ./access-logs:/opt/access-logs

Now we have traefik writing proper access logs with source IPs.
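To give an idea of what promtail will parse later, each request now produces one JSON line roughly like the following. The values are made up and the line is shortened; the field names follow traefik's JSON access-log format (the same ones the promtail pipeline references below):

```json
{
  "ClientHost": "203.0.113.42",
  "RequestMethod": "GET",
  "RequestPath": "/blog/hello-world",
  "DownstreamStatus": 200,
  "request_User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...",
  "request_Referer": "https://www.example.com/"
}
```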

promtail

The next step is to push these access logs into Loki, which is done by promtail.

promtail needs a configuration file, which defines where to look for access logs, how to transform and enrich them, and finally where to send them.

Adding promtail to a docker compose definition is mostly defining volumes:

# docker-compose.yml ...
promtail:
  image: grafana/promtail:2.9.3
  command: -config.file=/etc/promtail/promtail.yaml
  volumes:
    - "./promtail-config.yml:/etc/promtail/promtail.yaml"
    - "./access-logs:/var/log"
    - "./promtail-data:/tmp/positions"
    - "./GeoLite2-City.mmdb:/etc/promtail/GeoLite2-City.mmdb"

For IP-to-geolocation lookups we have to provide MaxMind's GeoLite2-City.mmdb file. You can download a version from the MaxMind homepage (a free account is required).

I have commented promtail-config.yml section by section for better understanding. You can find the full documentation here.

# for a simple access log push we don't need the server capabilities
server:
  disable: true

# where to send the logs - our Loki server / container
clients:
  - url: "http://loki:3100/loki/api/v1/push"

# stores the file pointer inside access logs which have been sent
positions:
  filename: /tmp/positions/positions.yaml

target_config:
  sync_period: 10s

scrape_configs:
  - job_name: traefik-logs
    pipeline_stages:
      # extracts json fields to make them labels
      - json:
          expressions:
            client_host: ClientHost
            user_agent: "\"request_User-Agent\""
            request_path: RequestPath
      # uses MaxMind GeoLite2 to map IP addresses to geo locations
      - geoip:
          source: client_host
          db: /etc/promtail/GeoLite2-City.mmdb
          db_type: city
      # drop certain geoip labels, as we are limited to 15 labels in total
      - labeldrop:
          - geoip_postal_code
          - geoip_subdivision_code
          - geoip_continent_code
          - geoip_continent_name
          - geoip_subdivision_name
          - geoip_timezone
      # uses a regex to extract the OS from the user_agent
      - regex:
          source: user_agent
          expression: "(?P<OS>Windows \\w+ \\d+(?:\\.\\d+)*|Linux(?: (?:i686|x86_64))?|Macintosh|(?:CPU )?iPhone OS|CPU OS.*?like Mac OS X)"
      # uses a regex to extract the Device type from the user_agent
      - regex:
          source: user_agent
          expression: "(?P<Device>iPhone|iPad|Mobile|Android(?: \\d+(?:\\.\\d+)*))"
      # uses a regex to extract the Browser from the user_agent
      - regex:
          source: user_agent
          expression: "(?P<Browser>(MSIE|(?:Mobile )?Safari|Chrome|\\b\\w+\\b Chromium|Firefox|Version|Mobile|GSA|QuickLook|OPR)[ \\\\/](?:[A-Z\\d]+\\b|\\d+(?:\\.\\d+)*))"
      # defines new labels from extracted fields within the pipeline processing
      - labels:
          client_host:
          user_agent:
          request_path:
          OS:
          Device:
          Browser:
    # define the static labels and the filesystem location to find the
    # log to be scraped
    static_configs:
      - targets:
          - localhost
        labels:
          job: traefik
          host: localhost
          __path__: /var/log/*.json

Grafana

Finally, we have to create a Grafana dashboard to visually present all this information. I have added the Grafana dashboard as JSON to the GitHub repository linked at the beginning.
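The dashboard panels essentially boil down to LogQL metric queries over the labels defined in the promtail pipeline. As a sketch (the label names come from the pipeline above, the time range is chosen arbitrarily):

```
# requests per country over the last 24 hours
sum by (geoip_country_name) (count_over_time({job="traefik"}[24h]))

# top 5 browsers over the last 24 hours
topk(5, sum by (Browser) (count_over_time({job="traefik"}[24h])))
```

Similar queries over the OS, Device, request_path, or geoip_city_name labels cover most typical web analytics panels.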

While this cannot compete with a full-blown web analytics tool, it certainly surfaces some useful information.
