ApeRAG is a comprehensive RAG (Retrieval-Augmented Generation) platform designed for building advanced, enterprise-grade AI applications. It integrates LightRAG to enhance its capabilities, notably enabling powerful graph-based query and knowledge retrieval. Key features include:
- Versatile Document Processing: Efficiently parses various document types.
- Flexible Data Management: Utilizes FastAPI backend with Celery for asynchronous tasks, supporting databases like PostgreSQL, Qdrant, Neo4j, and Elasticsearch.
- Dynamic Frontend: Modern user interface built with React and TypeScript (UmiJS).
- Advanced RAG Pipelines: Supports embedding generation, hybrid search (vector, keyword, and graph), and sophisticated workflow automation for complex RAG scenarios.
- LLM Integration: Seamlessly connects with Large Language Models.
This section will guide you through setting up ApeRAG using different methods.
This guide covers deploying ApeRAG to Kubernetes using the provided Helm chart. It involves two main phases: setting up databases (optional if you have them) and deploying the ApeRAG application.
Phase 1: Deploy Databases with KubeBlocks (Optional)
ApeRAG needs PostgreSQL, Redis, Qdrant, and Elasticsearch. If you don't have these, use the KubeBlocks scripts in deploy/databases/
.
Skip this phase if your databases are already available in your Kubernetes cluster.
-
Prerequisites:
- Kubernetes cluster.
kubectl
configured.- Helm v3+.
-
Database Configuration (
deploy/databases/00-config.sh
): This script controls database deployment (defaults: PostgreSQL, Redis, Qdrant, Elasticsearch in thedefault
namespace). Defaults are usually fine; no changes needed for a standard setup. Edit only for advanced cases (e.g., changing namespace, enabling optional databases like Neo4j). -
Run Database Deployment Scripts:
cd deploy/databases/ bash ./01-prepare.sh # Prepares KubeBlocks environment. bash ./02-install-database.sh # Deploys database clusters. cd ../.. # Back to project root.
Monitor pods in the
default
namespace (or your custom one) until ready:kubectl get pods -n default
Phase 2: Deploy ApeRAG Application
With databases running:
-
Helm Chart Configuration (
deploy/aperag/values.yaml
):- Using KubeBlocks (Phase 1 in
default
namespace)? Database connections invalues.yaml
are likely pre-configured. No changes usually needed. - Using your own databases? You MUST update
values.yaml
with your database connection details. - Optionally, review other settings (images, resources, Ingress, etc.).
- Using KubeBlocks (Phase 1 in
-
Deploy ApeRAG with Helm: This installs ApeRAG to the
default
namespace:helm install aperag ./deploy/aperag --namespace default --create-namespace
Monitor ApeRAG pods until
Running
:kubectl get pods -n default -l app.kubernetes.io/instance=aperag
-
Access ApeRAG UI: Use
kubectl port-forward
for quick access:kubectl port-forward svc/aperag-frontend 3000:3000 -n default
Open
http://localhost:3000
in your browser.
For KubeBlocks details (credentials, uninstall), see deploy/databases/README.md
.
This guide is for developers looking to contribute to ApeRAG or run it locally for development. Follow these steps to get ApeRAG running from the source code:
1. Clone the Repository
First, get the source code:
git clone https://github.com/apecloud/ApeRAG.git
cd ApeRAG
2. System Prerequisites
Before you begin, ensure your system has:
- Python 3.11: The project uses Python 3.11. If it's not your system default,
uv
(see below) will attempt to use it when creating the virtual environment if available. - Node.js: Version 20 or higher is recommended for frontend development.
uv
: This is a fast Python package installer and virtual environment manager.- If you don't have
uv
, themake install
command (Step 3) will try to install it viapip
.
- If you don't have
- Docker: (Recommended for local databases) If you plan to run dependent services like PostgreSQL, Redis, etc., locally, Docker is the easiest way. The
make run-db
command uses Docker Compose.
3. Install Dependencies & Setup Virtual Environment
This crucial make
command automates several setup tasks:
make install
This command will:
- Verify or install
uv
. - Create a Python 3.11 virtual environment (located in
.venv/
) usinguv
. - Install all Python backend dependencies (including development tools) from
pyproject.toml
into the virtual environment. - Install frontend Node.js dependencies using
yarn
.
4. Configure Environment Variables
ApeRAG uses .env
files for configuration.
-
Backend (
.env
): Copy the template and customize it for your setup.cp envs/env.template .env
Then, edit the newly created
.env
file.Note: If you start the required database services using the
make run-db
command (see Step 5), the default connection settings in the.env
file (copied fromenvs/env.template
) are pre-configured to work with these services, and you typically won't need to change them. You would only need to modify these if you are connecting to externally managed databases or have custom configurations. -
Frontend (
frontend/.env
) (Optional - if you are developing the frontend):cp frontend/deploy/env.local.template frontend/.env
Edit
frontend/.env
if you need to change frontend-specific settings, such as the backend API URL (though defaults usually work for local development).
5. Start Databases & Apply Migrations
-
Start Database Services: If you're using Docker for local databases, the
Makefile
provides a convenient command:make run-db
-
Apply Database Migrations: Once your databases are running and configured in
.env
, set up the database schema:make migrate
6. Run ApeRAG Backend Services
These should typically be run in separate terminal windows/tabs. The make
commands will automatically use the correct Python virtual environment.
-
FastAPI Development Server:
make run-backend
This starts the main backend application. It will typically be accessible at
http://localhost:8000
and features auto-reload on code changes. -
Celery Worker & Beat:
make run-celery
This starts the Celery worker for processing asynchronous background tasks.
7. Run Frontend Development Server (Optional)
If you need to work on or view the frontend:
make run-frontend
This will start the frontend development server, usually available at http://localhost:3000
. It's configured to proxy API requests to the backend running on port 8000.
8. Access ApeRAG
With the backend (and optionally frontend) services running:
- Access the Frontend UI at
http://localhost:3000
(if started). - The Backend API is available at
http://localhost:8000
.
Now you have ApeRAG running locally from the source code, ready for development or testing!
To get started with ApeRAG using Docker Compose, follow these steps:
-
Prerequisites:
- Docker & Docker Compose
- Git
-
Environment Setup: Configure environment variables by copying the template files:
cp envs/env.template .env cp frontend/deploy/env.local.template frontend/.env
Then, edit the
.env
file to configure your AI service settings and other necessary configurations according to your needs. -
Start Services: You can start all ApeRAG services using the following
make
command:# Optional: Use Aliyun registry if in China # export REGISTRY=apecloud-registry.cn-zhangjiakou.cr.aliyuncs.com # Start ApeRAG services make compose-up
If you need to use the
doc-ray
service for advanced document parsing (recommended for complex documents, tables, or formulas), you can start it along with other services:make compose-up WITH_DOCRAY=1
If your environment has GPUs, you can enable GPU support for
doc-ray
for better performance:make compose-up WITH_DOCRAY=1 WITH_GPU=1
About the doc-ray parsing service
ApeRAG includes a basic built-in parser for extracting text from documents like PDFs and DOCX files for RAG indexing. However, this parser may not optimally handle complex document structures, tables, or formulas.
For enhanced document parsing capabilities and more accurate content extraction, we recommend deploying the doc-ray service.
doc-ray
leverages MinerU for advanced document analysis.- When
WITH_GPU=1
is not specified,doc-ray
will run using only the CPU. In this case, it is recommended to allocate at least 4 CPU cores and 8GB+ of RAM for it. - When
WITH_GPU=1
is specified,doc-ray
will run using the GPU. It requires approximately 6GB of VRAM, along with 2 CPU cores and 8GB of RAM.
- When
-
Access ApeRAG: Once the services are up and running, open your browser and navigate to: http://localhost:3000/web/
This section focuses on the development workflow and tools provided for ApeRAG.
It's recommended to use the "Getting Started with Source Code" approach for setting up a development environment. Ensure all prerequisites are met and dependencies are installed using make install
.
The Makefile
at the root of the project provides several helpful commands to streamline development:
-
Environment & Dependencies:
make install
: Installs all necessary backend (Python) and frontend (Node.js) dependencies. It sets up a Python 3.11 virtual environment usinguv
.make dev
: Installs development tools like pre-commit hooks to ensure code quality before commits.
-
Running Services:
make run-db
: (Uses Docker Compose) Starts all required database services (PostgreSQL, Redis, Qdrant, etc.) as defined indocker-compose.yml
. Useful if you don't have these services running elsewhere.make run-backend
: Starts the FastAPI development server.make run-frontend
: Starts the UmiJS frontend development server.make run-celery
: Starts a Celery worker for processing background tasks (includes Celery Beat).make run-celery-beat
: (Note:make run-celery
usually includes Beat due to the-B
flag. This target might be redundant or for specific scenarios. Check Makefile if explicitly needed separate from worker).
-
Code Quality & Testing:
make format
: Formats Python code using Ruff and frontend code using Prettier.make lint
: Lints Python code with Ruff and frontend code.make static-check
: Performs static type checking for Python code using Mypy (if configured).make test
: Runs all automated tests (Python unit tests, integration tests).
-
Database Management:
make makemigration
: Creates new database migration files based on changes to SQLAlchemy models.make migrate
: Applies pending database migrations to your connected database.make connect-metadb
: Provides a command to connect to the primary PostgreSQL database (usually for inspection, if run viamake run-db
).
-
Generators:
make generate-models
: Generates Pydantic models from the OpenAPI schema.make generate-frontend-sdk
: Generates the frontend API client/SDK from the OpenAPI specification. Run this command whenever backend API definitions change.
-
Docker Compose (for local full-stack testing):
make compose-up
: Starts all services (backend, frontend, databases, Celery) using Docker Compose.make compose-down
: Stops all services started withmake compose-up
.make compose-logs
: Tails the logs from all services running under Docker Compose.
-
Cleanup:
make clean
: Removes temporary files, build artifacts, and caches from the development environment.
Contributing to ApeRAG involves the following typical workflow. Before starting significant work, it's a good idea to open an issue to discuss your proposed changes with the maintainers.
-
Fork and Branch:
- Fork the official ApeRAG repository to your GitHub account.
- Create a new branch from
main
for your feature or bug fix. Use a descriptive branch name (e.g.,feat/add-new-parser
orfix/login-bug
).
-
Environment Setup: Ensure your development environment is set up as described in Development Environment and Getting Started with Source Code (dependencies installed, databases running/accessible).
-
Code Implementation:
- Make your code changes in the backend (
aperag/
) or frontend (frontend/src/
) directories. - Follow Code Style: Adhere to PEP 8 for Python and standard practices for TypeScript/React. Use English for all code, comments, and documentation.
- Regularly use
make format
andmake lint
to ensure code consistency and quality.
- Make your code changes in the backend (
-
Handle API and Model Changes:
- If you change backend API endpoints (add, remove, modify parameters/responses): Update the OpenAPI specification (usually in
aperag/api/openapi.yaml
) and then runmake generate-frontend-sdk
to update the frontend client. Also, runmake generate-models
if schema components are affected. - If you change SQLAlchemy models: Run
make makemigration
to create migration files, and thenmake migrate
to apply changes to your development database.
- If you change backend API endpoints (add, remove, modify parameters/responses): Update the OpenAPI specification (usually in
-
Testing: Add unit tests for new backend logic and integration tests for API changes. Ensure all existing tests pass by running
make test
. -
Documentation: If your changes affect API specifications, user guides, or deployment processes, update the relevant documentation (e.g., OpenAPI specs, this README, files in
docs/
). -
Commit and Push:
- Make clear and concise commit messages.
- Push your branch to your fork on GitHub.
-
Submit a Pull Request (PR):
- Submit a PR from your branch to the
main
branch of the official ApeRAG repository. - Provide a clear description of your changes in the PR and link any relevant issues.
- Submit a PR from your branch to the
-
Code Review: Your PR will be reviewed by maintainers. Be prepared to address feedback and make further changes if necessary.
This section covers how to build ApeRAG container images. It's primarily for users who need to create their own builds or deploy to environments other than the ones covered in "Getting Started".
The project uses Docker and make
commands to build container images.
-
Local Platform Builds: These commands build images for your current machine's architecture.
# Build all necessary images for local platform make build-local # Build only the backend image for local platform make build-aperag-local # Build only the frontend image for local platform make build-aperag-frontend-local
-
Multi-platform Builds: These commands build images for multiple architectures (e.g., amd64, arm64). This requires Docker Buildx to be set up and configured.
# Build all necessary images for multiple platforms make build # Build only the backend image for multiple platforms make build-aperag # Build only the frontend image for multiple platforms make build-aperag-frontend
You can specify the target platforms using the
PLATFORMS
variable, for example:make build PLATFORMS=linux/amd64,linux/arm64
Refer to the "Getting Started" section for common deployment methods:
For custom deployments, you will need to adapt these methods or use the built container images with your chosen orchestration platform. Ensure all required services (databases, backend, frontend, Celery workers) are correctly configured and can communicate with each other.
ApeRAG is licensed under the Apache License 2.0. See the LICENSE file for details.