Set of tools used to generate usage statistics for 42l https://stats.42l.fr

stats-tools

Set of tools used to generate usage statistics for 42l.

Prerequisites

You need logrotate to rotate the logs precisely on a weekly basis. The delicate part is keeping log lines from slipping into the wrong week's logfile; it's a matter of seconds.
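One way to check that a rotated logfile holds a single week is to list the ISO weeks of its timestamps. The sketch below is not part of the repository: it assumes the combined access log format with timestamps such as `[10/Oct/2023:13:55:36 +0000]` and an English (or C) locale for month names, and `iso_weeks_in_log` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Timestamp of a combined-format access log line (assumed format),
# e.g. [10/Oct/2023:13:55:36 +0000].
TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def iso_weeks_in_log(lines):
    """Return the set of (ISO year, ISO week) pairs found in the log lines."""
    weeks = set()
    for line in lines:
        match = TIMESTAMP.search(line)
        if match is None:
            continue  # skip lines without a recognisable timestamp
        stamp = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        iso_year, iso_week, _ = stamp.isocalendar()
        weeks.add((iso_year, iso_week))
    return weeks
```

A correctly rotated weekly file should yield exactly one pair; two pairs mean some lines slipped across the week boundary.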

logs-rapports.py

This script generates weekly GoAccess reports from multiple logfiles (one per service).

Each service has its own script to allow more granularity on the displayed panels.

For each service, the following reports are generated:

  • Report with crawlers;
  • Report without crawlers;
  • Report with crawlers only;
  • Internal report, with the parameters specified in scripts/internal.sh;
  • Internal JSON report.
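As an illustration of the three public variants listed above, the sketch below builds one GoAccess command line per variant. It is not taken from the repository's scripts: the function name, the output file names and the `COMBINED` log format are assumptions. `--ignore-crawlers`, `--crawlers-only`, `--anonymize-ip` and `-o` are existing GoAccess options. The internal reports are left out because their parameters live in scripts/internal.sh.

```python
def public_report_commands(logfile, outdir):
    """Build one GoAccess command line per public report variant."""
    # Assumed base options; the real scripts may use different ones.
    base = ["goaccess", logfile, "--log-format=COMBINED", "--anonymize-ip"]
    variants = {
        "with-crawlers": [],
        "without-crawlers": ["--ignore-crawlers"],
        "crawlers-only": ["--crawlers-only"],
    }
    return {
        name: base + flags + ["-o", f"{outdir}/{name}.html"]
        for name, flags in variants.items()
    }
```

Each returned list can be passed to `subprocess.run` as-is; nothing is executed by the function itself.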

The internal reports are meant to display more sensitive information for system administration / monitoring purposes, and are kept private.

On public reports, a sed command is used in the scripts to hide one more octet of each visitor's IP address. The --anonymize-ip GoAccess parameter hides only one octet, which isn't enough.

The sed command doesn't affect IPv6 addresses. GoAccess will hide the last 80 bits of each IPv6 address. If this isn't enough for you, PRs are welcome :)
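The combined effect of the extra masking and --anonymize-ip on IPv4 addresses can be sketched in Python. This is not the sed expression used in the scripts: the regular expression and the name `mask_ipv4` are assumptions, and the sketch zeroes both trailing octets in one step. IPv6 addresses are left untouched, as described above.

```python
import re

# Dotted-quad IPv4 address; the first two octets are captured and kept.
IPV4 = re.compile(r"\b(\d{1,3})\.(\d{1,3})\.\d{1,3}\.\d{1,3}\b")


def mask_ipv4(line):
    """Zero the last two octets of every IPv4 address found in a log line."""
    return IPV4.sub(r"\1.\2.0.0", line)
```

Note that the pattern matches any dotted quad on the line, so a four-part version number in a user agent string would be rewritten as well.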

The script must be run as root on the host, since it must create containers.

The script stops at the first error, so that no further damage is done to your files.
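A minimal sketch of this stop-at-first-error behaviour, assuming the steps are external commands run through `subprocess`. `run_steps` is a hypothetical name, and the real script may handle errors differently.

```python
import subprocess


def run_steps(commands):
    """Run each command in order and abort on the first failure."""
    for command in commands:
        # check=True raises CalledProcessError on a non-zero exit status,
        # so the commands that follow are never started.
        subprocess.run(command, check=True)
```

Letting the exception propagate ends the script with a traceback that names the failing command.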

logs-rapports-monthly.py

This script does exactly the same as the one above, but covers a full month instead of a week.

Because we store weekly logs, some tweaking was necessary to get perfect monthly logs (from the first day to the last day of the month, no more, no less).

The logs are stored following the ISO calendar. The script calculates which weeks contain at least one day of the selected month, and scans those logs. The grep command is used to prevent GoAccess from scanning days that don't belong to the selected month.
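The week selection and the day filter can be sketched as follows. This is an illustration, not the repository's code: the function names are hypothetical, and the filter string assumes combined-format timestamps such as `[10/Oct/2023:13:55:36 +0000]`.

```python
import calendar
from datetime import date

# English month abbreviations, as they appear in access log timestamps.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def iso_weeks_of_month(year, month):
    """Return the ordered (ISO year, ISO week) pairs containing a day of the month."""
    last_day = calendar.monthrange(year, month)[1]
    weeks = []
    for day in range(1, last_day + 1):
        iso_year, iso_week, _ = date(year, month, day).isocalendar()
        if (iso_year, iso_week) not in weeks:
            weeks.append((iso_year, iso_week))
    return weeks


def month_pattern(year, month):
    """Build the fixed string that timestamps of the month contain."""
    return f"/{MONTHS[month - 1]}/{year}:"
```

For January 2021 the first pair is `(2020, 53)`, because 1 January 2021 belongs to the last ISO week of 2020; this is why the ISO year has to be tracked along with the week number. The string returned by `month_pattern` plays the role of the grep filter: only lines containing it are kept.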

By default, the selected month is the one before the current month.
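Computing this default needs care in January, when the previous month belongs to the previous year. A minimal sketch, with `previous_month` as a hypothetical helper name:

```python
from datetime import date


def previous_month(today):
    """Return (year, month) of the month preceding the one `today` falls in."""
    if today.month == 1:
        return today.year - 1, 12
    return today.year, today.month - 1
```

Calling `previous_month(date.today())` gives the month a default run would select.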

Usability

The scripts aren't very modular: they were made to answer specific needs and work in specific conditions that not every infrastructure meets. Feel free to edit or improve them for your needs!