
ParlementAIre

Swiss AI Center's POC for the Swiss Parliament

Authors

  • HEIA-FR's team:
    • Vaccarelli Ornella
    • Brenot Eden
    • Hennebert Jean
  • HE-Arc's team:
    • De Salis Emmanuel
    • Marques Reis Henrique
    • Ghorbel Hatem

Python requirements installation

We recommend using uv for fast, reliable installs. Install uv and see the docs at docs.astral.sh/uv.
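
For example, the standalone installer from the uv docs works on Linux and macOS:

curl -LsSf https://astral.sh/uv/install.sh | sh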

After creating or selecting your virtual environment, synchronize dependencies from pyproject.toml:

  • Base deps: uv sync
  • Include dev extras: uv sync --extra dev
  • Include UI extras: uv sync --extra ui
  • Include server extras: uv sync --extra serve

Add or update dependencies:

  • Add runtime dep: uv add <package>
  • Add dev dep: uv add --dev <package>

Run commands in the environment with uv run:

  • Example: uv run python -V

Environment variables

Below is the complete list of environment variables supported by the backend, the data pipelines, and the frontend. Create a .env file in the project root.

Core/Runtime

  • PRODUCTION (default: false): When true, the API forces the open-source stack (Ollama + HuggingFace) and will ignore non-OSS settings (Anthropic, Jina) for safety.
  • BACKEND_ROOT_PATH (optional): Serve the API under a subpath (e.g., parlementaire). Endpoints become /<BACKEND_ROOT_PATH>/....
  • BACKEND_PORT (default: 8099): Host port to expose the FastAPI backend container (mapped to 8080 inside).
  • FRONTEND_PORT (default: 8098): Host port to expose the built Vue app (Nginx).

LLM and Embeddings

  • LLM_BASE_URL: Base URL of the LLM server (e.g., Ollama gateway).
  • LLM_TEMPERATURE: Sampling temperature for LLM responses.
  • EMBEDDINGS_MODEL_NAME: Primary embeddings model name (e.g., bge-m3:latest).
  • EMBEDDINGS_API_NAME: Embeddings provider name: ollama, huggingface, or jina.
  • EMBEDDINGS_MODEL_NAME_DE (optional): Embeddings model for German.
  • EMBEDDINGS_API_NAME_DE (optional): Provider type for the German model.
  • NB_RETRIVED_DOCS: Default number of documents retrieved per query.
  • ANTHROPIC_API_KEY (optional): Used only when calling Anthropic. Ignored if PRODUCTION=true.
  • JINA_API_KEY (optional): Used only for Jina embeddings. Ignored if PRODUCTION=true.

Vector store (Milvus)

  • MILVUS_URL: Milvus endpoint (e.g., http://127.0.0.1:19530). Ensure it is reachable from inside the backend container.
  • MILVUS_COLLECTION_NAME: Target collection name (e.g., Parlementaire_docling).

Data sources and ingestion

  • DATA_DIR: Base directory for local mandate documents used by pipelines.
  • FEDLEX_JSON_DATA: Path to Fedlex JSON (./data/fedlex-data/eli).
  • FEDLEX_PDF_DATA_PATH: Root of Fedlex PDFs (./data/fedlex-pdfs).
  • FEDLEX_HTML_PATH: Root of fetched Fedlex HTML (./data/fedlex-html).
  • FEDLEX_START_YEAR, FEDLEX_END_YEAR: Year range for Fedlex harvesting.
  • CURIA_START_YEAR, CURIA_END_YEAR: Year range for Curia harvesting.
  • CURIA_DATA_PATH: Path to Curia CSV template (./data/curia-vista-!LANG!.csv).

Frontend build

  • VITE_API_BASE_URL: Absolute URL that the Vue frontend will call (e.g., http://localhost:8099). If using a subpath, include it (e.g., http://host/api/parlementaire).
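
For reference, an illustrative .env sketch (values are examples only; adjust to your setup, and note that LLM_BASE_URL here assumes a local Ollama):

# Illustrative values only
PRODUCTION=false
BACKEND_PORT=8099
FRONTEND_PORT=8098

LLM_BASE_URL=http://localhost:11434
LLM_TEMPERATURE=0.1
EMBEDDINGS_MODEL_NAME=bge-m3:latest
EMBEDDINGS_API_NAME=ollama
NB_RETRIVED_DOCS=5

MILVUS_URL=http://127.0.0.1:19530
MILVUS_COLLECTION_NAME=Parlementaire_docling

VITE_API_BASE_URL=http://localhost:8099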

Deployment

Quickstart (Docker)

  1. Prepare .env in the project root (see above).

  2. Start Milvus (required for vector search). If you don’t have an external Milvus, use the provided compose in milvus/:

docker compose -f milvus/docker-compose.yml up -d

Notes for connecting from the backend container:

  • On Linux, set MILVUS_URL to the host machine IP where Milvus listens (e.g., http://192.168.1.10:19530).
  • On macOS/Windows Docker Desktop, http://host.docker.internal:19530 usually works.

  3. Build and run backend + frontend:
docker compose up -d --build

The backend will be reachable at http://<host>:${BACKEND_PORT} and the frontend at http://<host>:${FRONTEND_PORT}.

GPU: The backend compose reserves one NVIDIA GPU. If you don’t have a GPU, remove the deploy.resources.reservations.devices block from docker-compose.yml.
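
For orientation, a GPU reservation block in docker-compose.yml typically looks like this (the exact values in this repo may differ):

# Remove this block if no NVIDIA GPU is available
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities: [gpu]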

Milvus deployment

To deploy Milvus, run docker compose up -d in the milvus/ folder.

⚠️ For hybrid (dense + sparse) search to work, the version of milvus-standalone must be >= v2.5.4.

Milvus backups and restores

Unfortunately, this procedure is somewhat tedious, requires some setup, and only works on Linux-based systems (on Windows, use WSL).

First, install the Milvus Backup tool.

On macOS you can use the following command (not yet tested):

brew install zilliztech/tap/milvus-backup

Otherwise, download the binaries from the release page and extract the archive.

⚠️ The version of Milvus-Backup must be >= v0.5.2

Next, create a folder named configs alongside the binary, and inside it a file named backup.yaml.

You can find an example of the backup.yaml file in this repository, in the milvus/backups/configs folder.

Replace the XXXX entries in the backup.yaml file with the correct IP addresses of the different components.
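
As rough orientation, the fields that usually need real addresses look like this (a trimmed sketch; the example file in milvus/backups/configs is authoritative, and the credential and bucket values below are illustrative):

milvus:
  address: XXXX        # Milvus host IP
  port: 19530
minio:
  address: XXXX        # MinIO host IP
  port: 9000
  accessKeyID: minioadmin        # illustrative credentials
  secretAccessKey: minioadmin
  bucketName: a-bucket
  backupBucketName: a-bucket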

Backing up the data

To back up the data, run the following command:

./milvus-backup create # -n <backup_name> optional

To transfer the backup to another machine, open the MinIO console (MINIO_IP:9001) and download the backup. Log in with the credentials specified in backup.yaml; the backup sits in the bucket specified in the config, inside the backup folder.
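
As an alternative to the web console, the MinIO client (mc) can fetch the backup from the command line (a sketch, assuming mc is installed; <bucket> and the credentials come from backup.yaml, and 9000 is the usual MinIO API port):

mc alias set parlminio http://MINIO_IP:9000 <access_key> <secret_key>
mc cp --recursive parlminio/<bucket>/backup/<backup_name> ./

The same approach in reverse (mc cp from local to the bucket) can replace the manual upload during a restore.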

Restoring the data

First, extract the backup archive anywhere. Then upload the backup folder to MinIO, into the bucket specified in backup.yaml, inside the backup folder (create that folder if it does not exist).

After this is done use the following command to restore the data:

./milvus-backup restore -n <backup_name>  # if a name was given at backup time, use the name of the backup folder

Once the collection is fully restored, connect to the database with Attu, create the indexes, and load the collection into memory.

Step 1: create the indexes

Click on the following field in the collection:

(Screenshot: the index field in the Attu collection view)

Then create a Dense index like so:

  • Index type: HNSW
  • Metric type: COSINE
  • M: 48
  • efConstruction: 400

And a Sparse index like so:

  • Index type: AUTOINDEX
  • Metric type: BM25

⚠️ To support sparse indexing, the Attu version must be >= v2.5

Step 2: load the collection

After a few moments the indexes will be built, and you can load the collection by clicking the following button:

(Screenshot: the load button in the Attu collection view)

The process is finished once the status shows "loaded"; then you can start querying the database.

Attu deployment

Attu is a web interface to interact with Milvus.

To deploy Attu, run this command:

docker run -p 3000:3000 -e MILVUS_URL=<milvus-host>:19530 zilliz/attu:v2.5

Local development (without Docker)

Backend (FastAPI):

uv sync --extra serve
cp .env .env.local  # optional copy; export vars from .env into your shell
uv run uvicorn controller.controller:app --host 0.0.0.0 --port 8080 --reload

Frontend (Vue): build via Docker Compose or use the Vue dev server in src/vue-frontend if preferred. Ensure VITE_API_BASE_URL points to your backend (e.g., http://localhost:8080).

Reverse proxy and subpath

If you serve the API under a subpath, set BACKEND_ROOT_PATH (e.g., parlementaire). All endpoints are prefixed (e.g., /parlementaire/chat). Make sure the frontend VITE_API_BASE_URL includes this subpath.
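
For example, a minimal Nginx location block for such a setup might look like this (a sketch, assuming BACKEND_ROOT_PATH=parlementaire and the backend published on port 8099):

location /parlementaire/ {
    proxy_pass http://127.0.0.1:8099/parlementaire/;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}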

Health checks and feedback

  • The backend healthcheck probes http://localhost:8080/${BACKEND_ROOT_PATH} inside the container. Adjust BACKEND_ROOT_PATH if using a subpath.
  • Feedback CSV is persisted to ./data/feedback/feedback.csv (mounted into the backend container).
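
For a quick manual check from the host (assuming the default ports and an empty BACKEND_ROOT_PATH):

curl -f http://localhost:8099/   # backend
curl -f http://localhost:8098/   # frontend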

CI/CD (GitLab)

The repository includes a simple GitLab CI job to deploy on pushes to main.

Important:

  • This pipeline was built for HE-Arc's infrastructure. To use it as-is, set up your own GitLab Runner with the tag lambda1-deploy and ensure the repository lives at /home/gitlab-runner/dev/ParlementAIre on the runner host, or adjust the script paths and tag in .gitlab-ci.yml.
  • File: .gitlab-ci.yml
  • Runner tag: lambda1-deploy
  • Trigger: pushes to main

What it does:

  1. SSH runner checks out the latest code in /home/gitlab-runner/dev/ParlementAIre (via git fetch && git pull).
  2. Exports environment variables from the CI context into the shell.
  3. Runs docker compose down, docker compose build --no-cache, and docker compose -p parlementaire_deploy up -d.
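
In outline, the job has roughly this shape (a sketch reconstructed from the description above; the repository's .gitlab-ci.yml is authoritative):

deploy:
  tags: [lambda1-deploy]
  only: [main]
  script:
    - cd /home/gitlab-runner/dev/ParlementAIre
    - git fetch && git pull
    # CI variables are exported into the shell before the rebuild
    - docker compose down
    - docker compose build --no-cache
    - docker compose -p parlementaire_deploy up -d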

Required CI variables (set in GitLab → Settings → CI/CD → Variables):

  • Core/LLM/Embeddings: LLM_BASE_URL, LLM_TEMPERATURE, EMBEDDINGS_MODEL_NAME, EMBEDDINGS_API_NAME, ANTHROPIC_API_KEY (optional), JINA_API_KEY (optional), PRODUCTION.
  • Milvus: MILVUS_URL, MILVUS_COLLECTION_NAME.
  • Backend pathing: BACKEND_ROOT_PATH (optional, for subpath deployments).
  • Frontend: VITE_API_BASE_URL_PROD (used to set VITE_API_BASE_URL during the build step).
  • Optional: BACKEND_PORT, FRONTEND_PORT if you want non-default ports (8099/8098 by default).

Notes:

  • The job relies on docker compose being available on the runner and expects a GPU if your compose file keeps the NVIDIA reservation; remove that block if your runner has no GPU.
  • The working directory /home/gitlab-runner/dev/ParlementAIre should contain the repo with docker-compose.yml.
  • The compose project name is parlementaire_deploy; adjust if you want side-by-side stacks.

Data pipelines

The project ships two main ingestion pipelines that populate Milvus with searchable chunks:

  • Fedlex: crawl HTML pages, normalize with Docling, then index into Milvus.
  • CuriaVista: fetch parliamentary business data via the Swiss Parliament API, transform, then index.

Before you start:

  • Ensure Milvus is running and reachable (see Milvus deployment). Set MILVUS_URL and MILVUS_COLLECTION_NAME in .env.
  • Ensure an embeddings service is available (default uses Ollama). Set LLM_BASE_URL and embeddings envs accordingly.
  • Install Python deps: uv sync (add --extra dev if needed). For Playwright: install browsers once with uv run python -m playwright install.

Fedlex pipeline (HTML → Docling → Milvus)

  1. Crawl and store HTML

Script: src/ParlementAire/fedlex-processing/html-pipeline.py

Uses Playwright to visit Fedlex, extract the main content, normalize a few tags, and write:

  • Cleaned HTML files to ${FEDLEX_HTML_PATH}/files/*.html
  • A CSV index to ${FEDLEX_HTML_PATH}/fedlex_law_items.csv

Required env:

  • FEDLEX_HTML_PATH (e.g., ./data/fedlex-html)

Run:

# one-time: install Playwright browsers
uv run python -m playwright install

# crawl
uv run python src/ParlementAire/fedlex-processing/html-pipeline.py

  2. Convert with Docling and index to Milvus

Script: src/ParlementAire/fedlex-processing/docling-extractor.py

Reads the CSV and HTML files from step 1, converts to Markdown using Docling, splits into chunks, and indexes both dense and sparse vectors into Milvus. Also writes JSON dumps per language:

  • ${FEDLEX_HTML_PATH}/fedlex_documents_fr.json
  • ${FEDLEX_HTML_PATH}/fedlex_documents_de.json

Required env:

  • FEDLEX_HTML_PATH
  • MILVUS_URL
  • MILVUS_COLLECTION_NAME
  • LLM_BASE_URL (if embeddings provider uses Ollama)

Run:

uv run python src/ParlementAire/fedlex-processing/docling-extractor.py

Notes:

  • The script creates collections suffixed by language (e.g., <COLLECTION>_fr, <COLLECTION>_de) when indexing.
  • Ensure Milvus index parameters and hybrid search are supported by your Milvus version (≥ 2.5.4 as noted above).
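
To verify that the language-suffixed collections exist after indexing, a quick check with pymilvus (a sketch, assuming pymilvus is installed and Milvus runs at the default local address):

from pymilvus import MilvusClient

# Connect to the same endpoint used by the pipeline (MILVUS_URL)
client = MilvusClient(uri="http://127.0.0.1:19530")

# Expect entries like <COLLECTION>_fr and <COLLECTION>_de
print(client.list_collections())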

CuriaVista pipeline (API → CSV → Milvus)

Script: src/ParlementAire/curiavista-processing/pipeline.py

This pipeline:

  • Fetches business items using swissparlpy, filtered by years and types per language.
  • Writes a CSV to ${CURIA_DATA_PATH} (with !LANG! replaced by the language code).
  • Transforms and indexes content into Milvus using the embeddings service.

Required env:

  • CURIA_START_YEAR, CURIA_END_YEAR
  • CURIA_DATA_PATH (e.g., ./data/curia-vista-!LANG!.csv)
  • MILVUS_URL, MILVUS_COLLECTION_NAME
  • LLM_BASE_URL (embeddings)

Run (French):

uv run python src/ParlementAire/curiavista-processing/pipeline.py

Language:

  • The script currently processes lang = "fr" by default. To index German, set lang = "de" in the script (or adapt it to accept a CLI argument, as sketched below) and rerun.
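
A minimal sketch of such a CLI adaptation (hypothetical; not currently in the repo):

import argparse

# Hypothetical --lang flag; the script currently hard-codes lang = "fr"
parser = argparse.ArgumentParser(description="CuriaVista ingestion")
parser.add_argument("--lang", choices=["fr", "de"], default="fr")
args = parser.parse_args()
lang = args.lang  # use in place of the hard-coded value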

Outputs:

  • CSV at ${CURIA_DATA_PATH} with the !LANG! placeholder replaced by the active language.
  • Embedded chunks in Milvus under your configured collection.
