Swiss AI center's POC for the Swiss Parliament
- ParlementAIre
- HEIA-FR's team:
  - Vaccarelli Ornella
  - Brenot Eden
  - Hennebert Jean
- HE-Arc's team:
  - De Salis Emmanuel
  - Marques Reis Henrique
  - Ghorbel Hatem
We recommend using uv for faster, reliable installs. Install uv and see the docs at: docs.astral.sh/uv

After creating or selecting your virtual environment, synchronize dependencies from pyproject.toml:

- Base deps: `uv sync`
- Include dev extras: `uv sync --extra dev`
- Include UI extras: `uv sync --extra ui`
- Include server extras: `uv sync --extra serve`

Add or update dependencies:

- Add runtime dep: `uv add <package>`
- Add dev dep: `uv add --dev <package>`

Run commands in the environment with `uv run`:

- Example: `uv run python -V`
Below is the complete list of environment variables supported by the backend, data pipelines, and frontend. Create a `.env` file in the project root.
- `PRODUCTION` (default: false): When true, the API forces the open-source stack (Ollama + HuggingFace) and will ignore non-OSS settings (Anthropic, Jina) for safety.
- `BACKEND_ROOT_PATH` (optional): Serve the API under a subpath (e.g., `parlementaire`). Endpoints become `/<BACKEND_ROOT_PATH>/...`.
- `BACKEND_PORT` (default: 8099): Host port to expose the FastAPI backend container (mapped to 8080 inside).
- `FRONTEND_PORT` (default: 8098): Host port to expose the built Vue app (Nginx).
- `LLM_BASE_URL`: Base URL of the LLM server (e.g., Ollama gateway).
- `LLM_TEMPERATURE`: Sampling temperature for LLM responses.
- `EMBEDDINGS_MODEL_NAME`: Primary embeddings model name (e.g., `bge-m3:latest`).
- `EMBEDDINGS_API_NAME`: Embeddings provider name: `ollama`, `huggingface`, or `jina`.
- `EMBEDDINGS_MODEL_NAME_DE` (optional): Embeddings model for German.
- `EMBEDDINGS_API_NAME_DE` (optional): Provider type for the German model.
- `NB_RETRIVED_DOCS`: Default number of documents retrieved per query.
- `ANTHROPIC_API_KEY` (optional): Used only when calling Anthropic. Ignored if `PRODUCTION=true`.
- `JINA_API_KEY` (optional): Used only for Jina embeddings. Ignored if `PRODUCTION=true`.
- `MILVUS_URL`: Milvus endpoint (e.g., `http://127.0.0.1:19530`). Ensure it is reachable from inside the backend container.
- `MILVUS_COLLECTION_NAME`: Target collection name (e.g., `Parlementaire_docling`).
- `DATA_DIR`: Base directory for local mandate documents used by pipelines.
- `FEDLEX_JSON_DATA`: Path to Fedlex JSON (`./data/fedlex-data/eli`).
- `FEDLEX_PDF_DATA_PATH`: Root of Fedlex PDFs (`./data/fedlex-pdfs`).
- `FEDLEX_HTML_PATH`: Root of fetched Fedlex HTML (`./data/fedlex-html`).
- `FEDLEX_START_YEAR`, `FEDLEX_END_YEAR`: Year range for Fedlex harvesting.
- `CURIA_START_YEAR`, `CURIA_END_YEAR`: Year range for Curia harvesting.
- `CURIA_DATA_PATH`: Path to Curia CSV template (`./data/curia-vista-!LANG!.csv`).
- `VITE_API_BASE_URL`: Absolute URL that the Vue frontend will call (e.g., `http://localhost:8099`). If using a subpath, include it (e.g., `http://host/api/parlementaire`).
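For orientation, here is a minimal `.env` sketch. Ports, model, and provider follow the defaults above; the Ollama URL, year ranges, temperature, and retrieval count are placeholder assumptions to adjust:

```bash
PRODUCTION=false
BACKEND_PORT=8099
FRONTEND_PORT=8098

LLM_BASE_URL=http://localhost:11434   # assumption: a local Ollama gateway
LLM_TEMPERATURE=0.2                   # placeholder
EMBEDDINGS_MODEL_NAME=bge-m3:latest
EMBEDDINGS_API_NAME=ollama
NB_RETRIVED_DOCS=5                    # placeholder

MILVUS_URL=http://host.docker.internal:19530  # see the Milvus connectivity notes below
MILVUS_COLLECTION_NAME=Parlementaire_docling

DATA_DIR=./data
FEDLEX_JSON_DATA=./data/fedlex-data/eli
FEDLEX_PDF_DATA_PATH=./data/fedlex-pdfs
FEDLEX_HTML_PATH=./data/fedlex-html
FEDLEX_START_YEAR=2020   # placeholder range
FEDLEX_END_YEAR=2024
CURIA_START_YEAR=2020    # placeholder range
CURIA_END_YEAR=2024
CURIA_DATA_PATH=./data/curia-vista-!LANG!.csv

VITE_API_BASE_URL=http://localhost:8099
```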
- Prepare `.env` in the project root (see above).
- Start Milvus (required for vector search). If you don't have an external Milvus, use the provided compose in `milvus/`:

  ```bash
  docker compose -f milvus/docker-compose.yml up -d
  ```

  Notes for connecting from the backend container:
  - On Linux, set `MILVUS_URL` to the host machine IP where Milvus listens (e.g., `http://192.168.1.10:19530`).
  - On macOS/Windows Docker Desktop, `http://host.docker.internal:19530` usually works.
- Build and run backend + frontend:

  ```bash
  docker compose up -d --build
  ```

  The backend will be reachable at `http://<host>:${BACKEND_PORT}` and the frontend at `http://<host>:${FRONTEND_PORT}`.
GPU: The backend compose reserves one NVIDIA GPU. If you don't have a GPU, remove the `deploy.resources.reservations.devices` block from `docker-compose.yml` (see the sketch below).
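For reference, the block to remove typically looks like this (standard Compose GPU reservation syntax; the exact service layout in this repo may differ):

```yaml
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities: [gpu]
```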
To deploy Milvus, run `docker compose up -d` in the `milvus/` folder.
⚠️ For hybrid (dense + sparse) search to work, the version of `milvus-standalone` must be >= v2.5.4.
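To check which version is actually running (assuming the standalone container is named `milvus-standalone`, as in the upstream compose file):

```bash
docker inspect --format '{{.Config.Image}}' milvus-standalone
```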
Unfortunately, this procedure is somewhat tedious, requires some setup, and only works on Linux-based systems (on Windows, use WSL).
First, install the Milvus backup tool.
On macOS you can try the following command (not yet tested):

```bash
brew install zilliztech/tap/milvus-backup
```

Otherwise, download the binaries from the release page and extract the archive.
⚠️ The version of Milvus-Backup must be >= v0.5.2
After that, create a folder named `configs` (in the same place as the binary) with a file named `backup.yaml` inside it.
You can find an example of the `backup.yaml` file in this repository, in the folder `milvus/backups/configs`.
Replace the XXXX placeholders in the `backup.yaml` file with the correct IP addresses of the different components.
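As a rough orientation, the fields you will typically touch look like this (an illustrative fragment following the upstream milvus-backup template; the authoritative example is the one in `milvus/backups/configs`):

```yaml
milvus:
  address: XXXX            # Milvus proxy IP
  port: 19530
minio:
  address: XXXX            # MinIO IP used by your Milvus deployment
  port: 9000
  accessKeyID: minioadmin  # also used to log in to the MinIO console
  secretAccessKey: minioadmin
  bucketName: "a-bucket"
  backupBucketName: "a-bucket"
  backupRootPath: "backup" # folder inside the bucket where backups land
```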
To back up the data, run the following command:

```bash
./milvus-backup create  # -n <backup_name> optional
```

Then, if you want to transfer the backup to another machine, go to the MinIO console (`MINIO_IP:9001`) and download the backup. Log in with the credentials specified in the `backup.yaml` file; you will find the backup inside the bucket specified in the config, in the `backup` folder.
First, extract the backup archive anywhere, then upload the backup folder to MinIO into the bucket specified in the `backup.yaml` file, inside the `backup` folder (create it if it does not exist).
After this is done, use the following command to restore the data:

```bash
./milvus-backup restore -n <backup_name>  # if a name was specified during the backup, use the name of the backup folder
```

After the collection is fully restored, connect to the database with Attu and load the collection into memory by creating the index.
In Attu, open the collection and click on the vector field to index.
Then create a Dense index like so:
- Index type : HNSW
- Metric type : COSINE
- M: 48
- efConstruction: 400
And a Sparse index like so:
- Index type : AUTOINDEX
- Metric type : BM25
⚠️ To support sparse indexing, the Attu version must be >= v2.5
After a few moments the index should be created, and you will be able to load the collection from Attu's collection view.
The process is finished once the status shows "loaded"; then you can start querying the database.
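If you prefer to script the index creation instead of clicking through Attu, here is a pymilvus sketch under assumed field names (`dense` and `sparse`; check your collection schema for the real ones):

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="http://127.0.0.1:19530")  # adjust to your MILVUS_URL

index_params = client.prepare_index_params()
# Dense index, matching the Attu settings above
index_params.add_index(
    field_name="dense",  # assumption: name of the dense vector field
    index_type="HNSW",
    metric_type="COSINE",
    params={"M": 48, "efConstruction": 400},
)
# Sparse index, matching the Attu settings above
index_params.add_index(
    field_name="sparse",  # assumption: name of the sparse vector field
    index_type="AUTOINDEX",
    metric_type="BM25",
)
client.create_index(collection_name="Parlementaire_docling", index_params=index_params)
client.load_collection(collection_name="Parlementaire_docling")
```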
Attu is a web interface to interact with Milvus.
To deploy Attu, run this command:

```bash
docker run -p 3000:3000 -e MILVUS_URL=MILVUS_URL:19530 zilliz/attu:v2.5
```

Backend (FastAPI):

```bash
uv sync --extra serve
cp .env .env.local  # optional copy; export vars from .env into your shell
uv run uvicorn controller.controller:app --host 0.0.0.0 --port 8080 --reload
```

Frontend (Vue): build via Docker Compose or use the Vue dev server in `src/vue-frontend` if preferred. Ensure `VITE_API_BASE_URL` points to your backend (e.g., `http://localhost:8080`).
If you serve the API under a subpath, set `BACKEND_ROOT_PATH` (e.g., `parlementaire`). All endpoints are prefixed (e.g., `/parlementaire/chat`). Make sure the frontend `VITE_API_BASE_URL` includes this subpath.
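For example, a subpath deployment might pair the two settings like this (hypothetical host):

```bash
BACKEND_ROOT_PATH=parlementaire
VITE_API_BASE_URL=http://myhost:8099/parlementaire
```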
- The backend healthcheck probes `http://localhost:8080/${BACKEND_ROOT_PATH}` inside the container. Adjust `BACKEND_ROOT_PATH` if using a subpath (see the probe sketch below).
- Feedback CSV is persisted to `./data/feedback/feedback.csv` (mounted into the backend container).
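A quick manual probe of the healthcheck endpoint, assuming `BACKEND_ROOT_PATH=parlementaire` (run from inside the backend container):

```bash
curl -f http://localhost:8080/parlementaire
```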
The repository includes a simple GitLab CI job to deploy on pushes to main.
Important:
- This pipeline was built for HE-Arc's infrastructure. To use it as-is, set up your own GitLab Runner with the tag `lambda1-deploy` and ensure the repository lives at `/home/gitlab-runner/dev/ParlementAIre` on the runner host, or adjust the script paths and tag in `.gitlab-ci.yml`.
- File: `.gitlab-ci.yml` • Runner tag: `lambda1-deploy` • Trigger: pushes to `main`.
What it does:
- SSH runner checks out the latest code in `/home/gitlab-runner/dev/ParlementAIre` (via `git fetch && git pull`).
- Exports environment variables from the CI context into the shell.
- Runs `docker compose down`, `docker compose build --no-cache`, and `docker compose -p parlementaire_deploy up -d`.
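An illustrative skeleton of such a job (the real file is `.gitlab-ci.yml` in the repo; this only mirrors the steps described above):

```yaml
deploy:
  stage: deploy
  tags:
    - lambda1-deploy
  only:
    - main
  script:
    - cd /home/gitlab-runner/dev/ParlementAIre
    - git fetch && git pull
    - docker compose down
    - docker compose build --no-cache
    - docker compose -p parlementaire_deploy up -d
```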
Required CI variables (set in GitLab → Settings → CI/CD → Variables):
- Core/LLM/Embeddings: `LLM_BASE_URL`, `LLM_TEMPERATURE`, `EMBEDDINGS_MODEL_NAME`, `EMBEDDINGS_API_NAME`, `ANTHROPIC_API_KEY` (optional), `JINA_API_KEY` (optional), `PRODUCTION`.
- Milvus: `MILVUS_URL`, `MILVUS_COLLECTION_NAME`.
- Backend pathing: `BACKEND_ROOT_PATH` (optional, for subpath deployments).
- Frontend: `VITE_API_BASE_URL_PROD` (used to set `VITE_API_BASE_URL` during the build step).
- Optional: `BACKEND_PORT`, `FRONTEND_PORT` if you want non-default ports (8099/8098 by default).
Notes:
- The job relies on `docker compose` being available on the runner and expects a GPU if your compose keeps the NVIDIA reservation; remove that block if your runner has no GPU.
- The working directory `/home/gitlab-runner/dev/ParlementAIre` should contain the repo with `docker-compose.yml`.
- The compose project name is `parlementaire_deploy`; adjust it if you want side-by-side stacks.
The project ships two main ingestion pipelines that populate Milvus with searchable chunks:
- Fedlex: crawl HTML pages, normalize with Docling, then index into Milvus.
- CuriaVista: fetch parliamentary business data via the Swiss Parliament API, transform, then index.
Before you start:
- Ensure Milvus is running and reachable (see Milvus deployment). Set `MILVUS_URL` and `MILVUS_COLLECTION_NAME` in `.env`.
- Ensure an embeddings service is available (default uses Ollama). Set `LLM_BASE_URL` and embeddings envs accordingly.
- Install Python deps: `uv sync` (add `--extra dev` if needed). For Playwright: install browsers once with `uv run python -m playwright install`.
- Crawl and store HTML

  Script: `src/ParlementAire/fedlex-processing/html-pipeline.py`

  Uses Playwright to visit Fedlex, extract the main content, normalize a few tags, and write:
  - Cleaned HTML files to `${FEDLEX_HTML_PATH}/files/*.html`
  - A CSV index to `${FEDLEX_HTML_PATH}/fedlex_law_items.csv`

  Required env:
  - `FEDLEX_HTML_PATH` (e.g., `./data/fedlex-html`)

  Run:

  ```bash
  # one-time: install Playwright browsers
  uv run python -m playwright install
  # crawl
  uv run python src/ParlementAire/fedlex-processing/html-pipeline.py
  ```

- Convert with Docling and index to Milvus
  Script: `src/ParlementAire/fedlex-processing/docling-extractor.py`

  Reads the CSV and HTML files from step 1, converts to Markdown using Docling, splits into chunks, and indexes both dense and sparse vectors into Milvus. Also writes JSON dumps per language:
  - `${FEDLEX_HTML_PATH}/fedlex_documents_fr.json`
  - `${FEDLEX_HTML_PATH}/fedlex_documents_de.json`

  Required env:
  - `FEDLEX_HTML_PATH`
  - `MILVUS_URL`
  - `MILVUS_COLLECTION_NAME`
  - `LLM_BASE_URL` (if embeddings provider uses Ollama)

  Run:

  ```bash
  uv run python src/ParlementAire/fedlex-processing/docling-extractor.py
  ```

  Notes:
  - The script creates collections suffixed by language (e.g., `<COLLECTION>_fr`, `<COLLECTION>_de`) when indexing.
  - Ensure Milvus index parameters and hybrid search are supported by your Milvus version (≥ 2.5.4 as noted above).
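To sanity-check the result, you can list the language-suffixed collections with pymilvus (a minimal sketch; assumes `MILVUS_URL` is exported in your shell):

```python
import os
from pymilvus import MilvusClient

client = MilvusClient(uri=os.environ["MILVUS_URL"])
# Expect entries like <COLLECTION>_fr and <COLLECTION>_de
print(client.list_collections())
```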
Script: `src/ParlementAire/curiavista-processing/pipeline.py`

This pipeline:
- Fetches business items using `swissparlpy`, filtered by years and types per language.
- Writes a CSV to `${CURIA_DATA_PATH}` (with `!LANG!` replaced by the language code).
- Transforms and indexes content into Milvus using the embeddings service.
Required env:
- `CURIA_START_YEAR`, `CURIA_END_YEAR`
- `CURIA_DATA_PATH` (e.g., `./data/curia-vista-!LANG!.csv`)
- `MILVUS_URL`, `MILVUS_COLLECTION_NAME`
- `LLM_BASE_URL` (embeddings)
Run (French):

```bash
uv run python src/ParlementAire/curiavista-processing/pipeline.py
```

Language:
- The script currently processes `lang = "fr"` by default. To index German, set `lang = "de"` in the script (or adapt it to accept a CLI argument, as sketched below) and rerun.
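One way to do the CLI adaptation (a hypothetical sketch; the `--lang` flag is an assumption, `lang` matches the variable used in the script):

```python
import argparse

# hypothetical --lang flag for pipeline.py; "fr" stays the default
parser = argparse.ArgumentParser(description="CuriaVista ingestion pipeline")
parser.add_argument("--lang", choices=["fr", "de"], default="fr",
                    help="language of the business items to fetch and index")
args = parser.parse_args()
lang = args.lang
```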
Outputs:
- CSV at `${CURIA_DATA_PATH}` with the `!LANG!` placeholder replaced by the active language.
- Embedded chunks in Milvus under your configured collection.


