Set of tools used to generate usage statistics for 42l https://stats.42l.fr

stats-tools

Set of tools used to generate usage statistics for 42l.

Prerequisites

You need to use logrotate to rotate the logs on a weekly basis, cutting them exactly at the week boundary. The delicate part is making sure that no log lines slip into the wrong week’s logfile. It’s a matter of seconds.
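To check that a rotated logfile was cut cleanly, you can list the ISO weeks its lines belong to. This is only a minimal sketch, not part of the scripts: it assumes timestamps in the Common/Combined Log Format (`[04/Oct/2020:23:59:58 +0200]`), and `iso_weeks_in` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Assumption: timestamps look like [04/Oct/2020:23:59:58 +0200] (Common/Combined Log Format).
TIMESTAMP = re.compile(r"\[(\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def iso_weeks_in(lines):
    """Return the set of (ISO year, ISO week) pairs found in the given log lines."""
    weeks = set()
    for line in lines:
        match = TIMESTAMP.search(line)
        if match is None:
            continue  # skip lines without a recognizable timestamp
        stamp = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        iso = stamp.isocalendar()  # week is computed in the timestamp's own UTC offset
        weeks.add((iso[0], iso[1]))
    return weeks


# Two lines three seconds apart, on either side of a Sunday/Monday boundary.
sample = [
    '203.0.113.7 - - [04/Oct/2020:23:59:58 +0200] "GET / HTTP/1.1" 200 512',
    '203.0.113.7 - - [05/Oct/2020:00:00:01 +0200] "GET / HTTP/1.1" 200 512',
]
print(sorted(iso_weeks_in(sample)))
```

A cleanly cut weekly logfile should yield exactly one pair. More than one means some lines slipped into the wrong file.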

logs-rapports.py

This script generates weekly GoAccess reports from multiple logfiles (one per service).

Each service has its own script to allow more granularity on the displayed panels.

For each service, the following reports are generated:

  • Report with crawlers;
  • Report without crawlers;
  • Report with crawlers only;
  • Internal report, using the parameters specified in scripts/internal.sh;
  • Internal JSON report.

The internal reports are meant to display more sensitive information for system administration / monitoring purposes, and are kept private.
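The three public variants differ only by a couple of GoAccess options. The sketch below shows how such command lines could be built. It is not the code of the per-service scripts: `goaccess_args` is a hypothetical helper, the file names are made up, and the COMBINED log format is an assumption.

```python
def goaccess_args(logfile, output, crawlers="include", anonymize=True):
    """Build a GoAccess argument list for one report variant (hypothetical helper)."""
    # Assumption: the logs use the COMBINED format; adjust --log-format otherwise.
    args = ["goaccess", logfile, "--log-format=COMBINED", "-o", output]
    if crawlers == "ignore":
        args.append("--ignore-crawlers")   # report without crawlers
    elif crawlers == "only":
        args.append("--crawlers-only")     # report with crawlers only
    elif crawlers != "include":
        raise ValueError(f"unknown crawlers mode: {crawlers!r}")
    if anonymize:
        args.append("--anonymize-ip")      # public reports hide part of each IP
    return args


# Hypothetical file names, one command per public variant.
for mode in ("include", "ignore", "only"):
    print(" ".join(goaccess_args("service.log", f"report-{mode}.html", crawlers=mode)))
```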

On public reports, the scripts use the sed command to hide one more octet of each visitor’s IP address, because the --anonymize-ip GoAccess parameter hides only one octet, which isn’t enough.

The sed command doesn’t affect IPv6 addresses. GoAccess will hide the last 80 bits of each IPv6 address. If that isn’t enough for you, PRs are welcome :)
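The effect of hiding two octets can be illustrated in Python. This is a sketch of the idea only, not the sed expression used in the scripts (which isn’t shown here). It assumes the client address is the first field of each log line, and `mask_ipv4` is a hypothetical name.

```python
import re

# Assumption: the client address is the first field of the line.
# Keep the first two octets, zero the last two.
LEADING_IPV4 = re.compile(r"^(\d{1,3})\.(\d{1,3})\.\d{1,3}\.\d{1,3}\b")


def mask_ipv4(line):
    """Zero the last two octets of a leading IPv4 address; leave other lines untouched."""
    return LEADING_IPV4.sub(r"\1.\2.0.0", line)


print(mask_ipv4('203.0.113.7 - - [05/Oct/2020:00:00:01 +0200] "GET / HTTP/1.1" 200 512'))
print(mask_ipv4('2001:db8::1 - - [05/Oct/2020:00:00:01 +0200] "GET / HTTP/1.1" 200 512'))
```

As with the sed command, IPv6 addresses pass through unchanged and are left to GoAccess.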

The script must be run as root on the host, since it must create containers.

The script stops at the first error, so that no further damage is done to your files.
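A common way to get this stop-at-first-error behaviour in Python is to run each external command with `check=True`. The sketch below shows that pattern under the assumption that the steps are external commands. It isn’t taken from the script, and `run_steps` is a hypothetical name.

```python
import subprocess
import sys


def run_steps(steps):
    """Run each command in order; stop at the first one that exits with a non-zero status."""
    for step in steps:
        # check=True raises subprocess.CalledProcessError on failure,
        # so the steps after a failing one are never started.
        subprocess.run(step, check=True)
```

For example, `run_steps([["true"], ["false"], ["echo", "never reached"]])` raises at the second command on a typical Linux host, and the third command never runs.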

logs-rapports-monthly.py

This script does exactly the same as the one above, but generates monthly reports instead of weekly ones.

Because we are storing weekly logs, some tweaking was necessary to get perfect monthly logs (from the first day to the last day of the month, no more, no less).

The logs are stored following the ISO calendar. The script calculates which weeks contain at least one day of the month concerned, and scans those logs. The grep command is used to prevent GoAccess from scanning days that don’t belong to the selected month.

By default, the selected month is the previous month (current month - 1).
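The week selection can be sketched with the standard library. This is an illustration of the calculation described above, not the script’s actual code: the function names are hypothetical, and `month_marker` assumes Common/Combined Log Format timestamps such as `[05/Oct/2020:00:00:01 +0200]`.

```python
import calendar
from datetime import date, timedelta


def previous_month(today):
    """Return (year, month) of the month before `today` (the default selection)."""
    last_of_previous = today.replace(day=1) - timedelta(days=1)
    return last_of_previous.year, last_of_previous.month


def iso_weeks_of_month(year, month):
    """Return, in order, the (ISO year, ISO week) pairs containing at least one day of the month."""
    days_in_month = calendar.monthrange(year, month)[1]
    weeks = []
    for day in range(1, days_in_month + 1):
        iso = date(year, month, day).isocalendar()
        key = (iso[0], iso[1])
        if key not in weeks:
            weeks.append(key)
    return weeks


def month_marker(year, month):
    """Substring that a log line of the given month contains (assumed timestamp format)."""
    return f"/{calendar.month_abbr[month]}/{year}:"


year, month = previous_month(date(2021, 2, 10))
print(iso_weeks_of_month(year, month))
print(month_marker(year, month))
```

Note that January 2021 starts in ISO week 53 of 2020, which is why the ISO year is kept alongside the week number. Filtering lines on the marker plays the role of the grep step: lines from neighbouring months inside the first and last weekly logs are dropped.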

gen-indexes.py

This script generates index.html files to browse GoAccess reports through a user-friendly interface.

It can be run in a Docker container, but you must install additional packages (pip3 install -r requirements.txt), mainly:

  • Jinja2, the templating engine used to generate the HTML files;
  • natsort, to sort lists in natural order.
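To see why natural ordering matters for an index of reports, here is a standard-library approximation of it. It deliberately doesn’t use natsort (so it runs without extra packages) and only mimics the basic behaviour. The `week-N` names are hypothetical examples, not the actual report names.

```python
import re


def natural_key(name):
    """Split a name into text and integer chunks so that 'week-10' sorts after 'week-9'."""
    return [int(chunk) if chunk.isdigit() else chunk.lower()
            for chunk in re.split(r"(\d+)", name)]


reports = ["week-10", "week-9", "week-1"]  # hypothetical report names
print(sorted(reports))                     # plain string order puts week-10 before week-9
print(sorted(reports, key=natural_key))    # natural order follows the numbers
```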

Mount your reports volume in the /base/ folder (and/or edit the script’s constants to your needs).

The templates’ design is quite poor and the script doesn’t handle errors very well. PRs welcome!

Usability

The scripts aren’t very modular: they were written to answer specific needs and to work under specific conditions that not every infrastructure meets. Feel free to edit or improve them for your needs!