Simple Snowplow is a lightweight, self-hosted analytics collector compatible with the Snowplow analytics protocol. It allows you to collect event data from websites and applications while maintaining full control over your data and infrastructure.
- Compatible with Snowplow JavaScript tracker
- Collects web analytics data including page views, events, and user information
- Stores data in ClickHouse for high-performance analytics queries
- Optional SendGrid event tracking integration
- Configurable data retention and storage settings
- Horizontal scaling capabilities with ClickHouse cluster support
- Built with FastAPI for high performance
- Optional demo mode for easy testing
Simple Snowplow consists of the following components:
- FastAPI Backend: Handles incoming tracking events and forwards them to ClickHouse (sketched below)
- ClickHouse Database: Stores and processes analytics data
- JavaScript Tracking Libraries: Compatible with standard Snowplow JavaScript trackers
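As a rough sketch of how these pieces fit together, the flow looks something like the following. This is illustrative only, not the project's actual code: the endpoint path is the conventional Snowplow POST path, and the table and column names are borrowed from the example queries later in this document.

```python
from fastapi import FastAPI, Request
import clickhouse_connect

app = FastAPI()
# Hypothetical connection; the host mirrors the default used in Docker Compose.
ch = clickhouse_connect.get_client(host="clickhouse", port=8123)

# Snowplow-protocol trackers POST batched events to a path like this one.
@app.post("/com.snowplowanalytics.snowplow/tp2")
async def collect(request: Request) -> dict:
    body = await request.json()
    rows = [[e.get("e"), e.get("url")] for e in body.get("data", [])]
    # Forward the events to ClickHouse for storage and later querying.
    ch.insert("snowplow.local", rows, column_names=["event_type", "page"])
    return {"status": "ok"}
```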
To install Simple Snowplow for local development:
- Make sure you have Docker and Docker Compose installed

- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/simple-snowplow.git
  cd simple-snowplow
  ```

- (One-time) Initialize ClickHouse databases & tables:

  ```bash
  # Ensure ClickHouse service is up first (in a separate terminal)
  docker compose up -d clickhouse
  # Create tables (idempotent, safe to re-run)
  docker compose run app uv run python cli.py db init
  ```

- Start the application using Docker Compose:

  ```bash
  docker compose up
  ```
For a production environment:
- Install ClickHouse version 22.11 or later (follow the ClickHouse documentation)
- Build the Docker image:

  ```bash
  docker build -t simple-snowplow ./simple_snowplow
  ```

- Set the configuration you need using environment variables (see Configuration)

- Run the Docker container:

  ```bash
  docker run -d \
    -p 8000:80 \
    -e SNOWPLOW_ENV=production \
    -e SNOWPLOW_CLICKHOUSE__CONNECTION__HOST=my-clickhouse-host \
    simple-snowplow
  ```
You can supply additional configuration through more `-e` flags.
Simple Snowplow is configured via a single Pydantic `BaseSettings` model. The application reads values from environment variables using the `SNOWPLOW_` prefix and double underscores (`__`) for nesting. For example:
```bash
SNOWPLOW_COMMON__SERVICE_NAME=my-snowplow
SNOWPLOW_COMMON__DEMO=true
SNOWPLOW_CLICKHOUSE__CONNECTION__HOST=my-clickhouse-server
SNOWPLOW_SECURITY__RATE_LIMITING__ENABLED=true
```
The structure mirrors the configuration sections (see the sketch after this list):

- `common`: Basic application options (service name, hostname, demo mode)
- `clickhouse`: Connection details and table definitions
- `logging`: Format and level for application logs
- `security`: Docs availability, HTTPS enforcement, and rate limiting
- `proxy`: Allowed domains and paths for the analytics proxy
- `performance`: Connection pool and concurrency settings
- `elastic_apm`, `prometheus`, `sentry`: Optional observability integrations
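As a rough illustration of how this nesting maps onto a settings model, here is a minimal pydantic-settings sketch. The field names follow the sections above, but the actual model in the repository may differ in fields and defaults:

```python
from pydantic import BaseModel
from pydantic_settings import BaseSettings, SettingsConfigDict

class CommonSettings(BaseModel):
    service_name: str = "simple-snowplow"  # hypothetical default
    demo: bool = False

class ClickHouseConnection(BaseModel):
    host: str = "clickhouse"

class ClickHouseSettings(BaseModel):
    connection: ClickHouseConnection = ClickHouseConnection()

class Settings(BaseSettings):
    # The SNOWPLOW_ prefix plus "__" as the nesting delimiter produces
    # names like SNOWPLOW_CLICKHOUSE__CONNECTION__HOST.
    model_config = SettingsConfigDict(
        env_prefix="SNOWPLOW_", env_nested_delimiter="__"
    )
    common: CommonSettings = CommonSettings()
    clickhouse: ClickHouseSettings = ClickHouseSettings()

# With SNOWPLOW_COMMON__DEMO=true in the environment:
# Settings().common.demo -> True
```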
You can inspect the full configuration (with defaults) via the CLI from the repository root:
```bash
uv run python simple_snowplow/cli.py settings
```
Set `SNOWPLOW_ENV` to label the running environment (e.g. `production`); the value is propagated to logging integrations such as Sentry but does not change how configuration is loaded.
Table creation has been moved out of the FastAPI startup sequence and is now handled explicitly via the CLI. This gives you predictable, repeatable migrations and avoids race conditions on multi-instance deployments.
You only need to run the init command when:
- Installing for the first time (fresh ClickHouse instance)
- Changing table-related configuration (e.g. engine, partitioning, `cluster_name`)
- Upgrading to a version that adds new tables or columns (future migrations)
Run in Docker Compose (after the image is built):
```bash
docker compose up -d clickhouse
docker compose run app uv run python cli.py db init
docker compose up -d app
```
Idempotency: The command uses `CREATE DATABASE IF NOT EXISTS` and `CREATE TABLE IF NOT EXISTS`; re-running is safe. If you change schema definitions (e.g. `order_by`, `engine`) you must apply migrations manually (dropping/recreating tables or running ALTER statements); the CLI purposefully does not perform destructive changes.
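A manual, non-destructive migration might look like the following sketch. It uses the clickhouse-connect client (an assumption; any ClickHouse client works), and the added column is purely hypothetical:

```python
import clickhouse_connect

client = clickhouse_connect.get_client(host="clickhouse", port=8123)

# Idempotent, like the CLI: safe to re-run.
client.command("CREATE DATABASE IF NOT EXISTS snowplow")

# Example of a manual, additive schema change (hypothetical column).
client.command(
    "ALTER TABLE snowplow.local "
    "ADD COLUMN IF NOT EXISTS extra_field String DEFAULT ''"
)
```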
Troubleshooting:

- Connection errors: ensure the hostname matches `SNOWPLOW_CLICKHOUSE__CONNECTION__HOST` (defaults to `clickhouse` inside the Docker network).
- Cluster setup: set `SNOWPLOW_CLICKHOUSE__CONFIGURATION__CLUSTER_NAME` before running init so distributed tables are created.
- Permissions: use a ClickHouse user with `CREATE DATABASE` and `CREATE TABLE` privileges.
Instead of the shell script you can use the built-in command (run from the repository root):
```bash
uv run python simple_snowplow/cli.py scripts download
```
This will place `sp.js`, `sp.js.map`, the plugin bundle, and `loader.js` copies in the repository-level `static/` directory and adjust the source map `file` field for the loader copy.
To track events from your website:
- Include the Snowplow tracker in your HTML:

  ```html
  <script type="text/javascript">
  (function(p,l,o,w,i,n,g){if(!p[i]){p.GlobalSnowplowNamespace=p.GlobalSnowplowNamespace||[];
  p.GlobalSnowplowNamespace.push(i);p[i]=function(){(p[i].q=p[i].q||[]).push(arguments)
  };p[i].q=p[i].q||[];n=l.createElement(o);g=l.getElementsByTagName(o)[0];n.async=1;
  n.src=w;g.parentNode.insertBefore(n,g)}}(window,document,"script","//your-server.com/static/sp.js","snowplow"));

  snowplow('newTracker', 'sp', 'your-server.com', {
    appId: 'my-app',
    platform: 'web',
    post: true,
    forceSecureTracker: true
  });

  snowplow('trackPageView');
  </script>
  ```
- Replace `your-server.com` with your Simple Snowplow server address; a quick server-side test is sketched below
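To confirm the collector accepts events without involving a browser, you can POST a minimal payload from Python. This is a sketch under the assumption that the server exposes the conventional Snowplow POST path `/com.snowplowanalytics.snowplow/tp2`:

```python
import requests

# Minimal Snowplow-protocol payload: a single page-view event.
payload = {
    "schema": "iglu:com.snowplowanalytics.snowplow/payload_data/jsonschema/1-0-4",
    "data": [{
        "e": "pv",                      # event type: page view
        "url": "https://example.com/",  # page URL
        "page": "Test page",            # page title
        "aid": "my-app",                # application id
        "p": "web",                     # platform
    }],
}

resp = requests.post(
    "https://your-server.com/com.snowplowanalytics.snowplow/tp2",
    json=payload,
    timeout=5,
)
print(resp.status_code)  # expect 2xx if the collector accepted the event
```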
To test Simple Snowplow with the built-in demo:
- Set `common.demo = true` in your configuration
- Access the demo page at `http://your-server.com/demo/`
- Events will be tracked and stored in your ClickHouse database
To query the collected data, connect to your ClickHouse instance:
```bash
docker exec -it simple-snowplow-ch clickhouse-client
```
Example queries:
```sql
-- Get page views from the last 24 hours
SELECT time, page, refr, device_id, session_id
FROM snowplow.local
WHERE event_type = 'page_view'
  AND time > now() - INTERVAL 1 DAY
ORDER BY time DESC;

-- Count events by type
SELECT event_type, count() AS events
FROM snowplow.local
GROUP BY event_type
ORDER BY events DESC;
```
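The same queries can be run from Python; here is a minimal sketch using the clickhouse-connect client (an assumption; any ClickHouse client would do):

```python
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost", port=8123)

# Count events by type, mirroring the SQL above.
result = client.query(
    "SELECT event_type, count() AS events "
    "FROM snowplow.local GROUP BY event_type ORDER BY events DESC"
)
for event_type, events in result.result_rows:
    print(f"{event_type}: {events}")
```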
- Connection refused to ClickHouse
  - Check that ClickHouse is running and accessible
  - Verify the host and port in your configuration
  - Ensure firewall rules allow connections

- No data being collected
  - Check the browser console for JavaScript errors
  - Verify the tracking endpoint is correctly configured in your tracker
  - Check server logs for any errors

- Missing data in queries
  - Verify the table name (it may differ if you configured a custom name)
  - Check that data partitioning is working as expected
To view application logs:
```bash
docker logs -f simple-snowplow
```
For more verbose logging, set `logging.level = "DEBUG"` in your configuration (equivalently, `SNOWPLOW_LOGGING__LEVEL=DEBUG` via the environment).
- Create a virtual environment and install uv (if not already installed):

  ```bash
  # Install uv (if not already installed)
  curl -LsSf https://astral.sh/uv/install.sh | sh
  # Or on macOS with Homebrew
  # brew install uv

  # Create a virtual environment with uv
  uv venv .venv
  source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  uv sync
  ```

- Install development tools:

  ```bash
  uv pip install -G dev
  pre-commit install
  ```

- Run the application locally:

  ```bash
  cd simple_snowplow
  uv run uvicorn main:app --reload
  ```
Run the test suite:

```bash
uv run pytest
```
For live development in Docker, start Compose in watch mode:

```bash
docker compose up --watch
```
This activates the development mode which automatically:
- Syncs your local code changes to the container without rebuilding
- Rebuilds the container only when dependencies change (when `uv.lock` is modified)
This project is licensed under the terms described in the LICENSE file.