Using Airflow to implement our ETL pipelines.
[TOC]
We use uv to manage dependencies and the virtual environment.

To create the virtual environment and install dependencies, run:

```bash
uv sync
```

By default, uv sets up the virtual environment in `.venv`.

After creating the virtual environment, activate it:

```bash
source .venv/bin/activate
```

When you're done working in the virtual environment, deactivate it with:

```bash
deactivate
```
- For development or testing, run `cp .env.template .env.staging`. For production, run `cp .env.template .env.production`.
- Follow the instructions in `.env.<staging|production>` and fill in your secrets. If you are running the staging instance for development as a sandbox and do not need to access any specific third-party services, leaving `.env.staging` as-is should be fine. Contact the maintainer if you don't have these secrets.
⚠ WARNING (about `.env`): Please do not use the `.env` file for local development, as it might affect the production tables.
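If you want to sanity-check that the values in your environment file are readable from Python before starting any services, a quick script like the one below can help. This is only a minimal sketch: it assumes the optional `python-dotenv` package, and `SOME_SECRET` is a hypothetical key used for illustration; refer to `.env.template` for the actual variable names. The project itself injects these variables through its own tooling, not through this script.

```python
import os

from dotenv import load_dotenv  # assumes python-dotenv is installed in your environment

# Load the staging environment file; use ".env.production" for production.
load_dotenv(".env.staging")

# SOME_SECRET is a hypothetical key for illustration only;
# check .env.template for the real variable names.
print(os.environ.get("SOME_SECRET", "<not set>"))
```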
- Set up authentication for GCP: https://googleapis.dev/python/google-api-core/latest/auth.html
- After running `gcloud auth application-default login`, a credentials file will be created at `$HOME/.config/gcloud/application_default_credentials.json`.
- Run `export GOOGLE_APPLICATION_CREDENTIALS="/path/to/keyfile.json"` if you have a service-account key file.
- `service-account.json`: Please contact @david30907d via email or Discord. You do not need this JSON file if you are running the sandbox staging instance for development.
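Once Application Default Credentials are set up, the Google client libraries pick them up automatically, with no explicit credentials argument. Below is a minimal sketch to verify this, assuming the `google-cloud-bigquery` package is available; the project ID is taken from the Artifact Registry path used later in this document and may differ from the project your pipelines actually target.

```python
from google.cloud import bigquery  # assumes google-cloud-bigquery is installed

# No credentials argument is needed: the client resolves Application Default
# Credentials from GOOGLE_APPLICATION_CREDENTIALS or from
# ~/.config/gcloud/application_default_credentials.json.
client = bigquery.Client(project="pycontw-225217")

# List a few datasets to confirm that authentication works.
for dataset in client.list_datasets(max_results=5):
    print(dataset.dataset_id)
```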
If you are a developer 👨‍💻, please check the Contributing Guide.

If you are a maintainer 👨‍🔧, please check the Maintenance Guide.
```bash
# Point the database to the local "sqlite/airflow.db" file.
# Run "uv run airflow db migrate" first if the file does not exist.
export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=sqlite:///`pwd`/sqlite/airflow.db

# Point the Airflow home to the current directory.
export AIRFLOW_HOME=`pwd`

# Run standalone Airflow.
# Note that there may be slight differences between using this command and running through docker compose,
# but the difference should not be noticeable in most cases.
uv run airflow standalone
```
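By default, the standalone instance loads DAGs from `$AIRFLOW_HOME/dags`. If you just want to verify that the local setup works, a throwaway DAG like the sketch below is enough; this assumes Airflow 2.4+ (for the `schedule` argument), and `hello_dag` / `say_hello` are hypothetical names, not DAGs that ship with this repository.

```python
# dags/hello_dag.py — a hypothetical throwaway DAG to verify the local setup
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def say_hello() -> None:
    print("hello from the pycon-etl sandbox")


with DAG(
    dag_id="hello_dag",
    start_date=datetime(2024, 1, 1),
    schedule=None,  # trigger manually from the UI
    catchup=False,
) as dag:
    PythonOperator(task_id="say_hello", python_callable=say_hello)
```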
```bash
# Build the local dev/test image
make build-dev

# Start dev/test services
make deploy-dev

# Stop dev/test services
make down-dev
```
The difference between production and dev/test compose files is that the dev/test compose file uses a locally built image, while the production compose file uses the image from Docker Hub.
If you are an authorized maintainer, you can pull the image from the [GCP Artifact Registry].

First, configure your Docker client to use the [GCP Artifact Registry]:

```bash
gcloud auth configure-docker asia-east1-docker.pkg.dev
```
Then, pull the image:
```bash
docker pull asia-east1-docker.pkg.dev/pycontw-225217/data-team/pycon-etl:{tag}
```
Available tags:

- `cache`: cache the image for faster deployment
- `test`: for testing purposes, including the test dependencies
- `staging`: when pushing to the staging environment
- `latest`: when pushing to the production environment