[Ready For Review][AQUA] Add Supporting Fine-Tuned Models in Multi-Model Deployment #1186

Merged
merged 19 commits into from
Jun 10, 2025

Conversation

mrDzurb
Member

@mrDzurb mrDzurb commented May 16, 2025

Description

The current implementation of Multi-Model Deployment in AQUA supports base models only. Fine-tuned models, however, are a critical part of many customer workflows because they let customers adapt base models to domain-specific use cases.
This PR adds support for deploying fine-tuned LLMs as part of a multi-model deployment group on the vLLM container.

Implementation

In the first iteration, we will treat each selected model, whether it is a base model or a fine-tuned variant, as an independent entity. Even if multiple fine-tuned models share the same base model, each one will be deployed in its own isolated vLLM instance.

On the SMC side, we will leverage vLLM's ability to merge LoRA adapter weights dynamically at runtime. This means each vLLM instance will load the base model and its corresponding fine-tuned weights independently.
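
As a rough illustration of one instance per model, the sketch below builds the server arguments for a single vLLM instance. It assumes the container starts vLLM's OpenAI-compatible server with the `--model`, `--port`, `--enable-lora`, and `--lora-modules` options; how the SMC container actually launches vLLM is not shown in this PR, and the function name, paths, and adapter name are hypothetical.

```python
from typing import List, Optional


def vllm_server_args(
    base_model_path: str,
    port: int,
    lora_name: Optional[str] = None,
    lora_path: Optional[str] = None,
) -> List[str]:
    """Build CLI arguments for one isolated vLLM instance (hypothetical helper)."""
    args = ["--model", base_model_path, "--port", str(port)]
    # A fine-tuned variant loads the base model plus its own LoRA adapter.
    if lora_name and lora_path:
        args += ["--enable-lora", "--lora-modules", f"{lora_name}={lora_path}"]
    return args


# Two fine-tuned variants of the same base model still get separate instances.
base = "/models/base"  # placeholder path
instance_a = vllm_server_args(base, 8001, "ft-a", "/models/ft-a")
instance_b = vllm_server_args(base, 8002, "ft-b", "/models/ft-b")
base_only = vllm_server_args(base, 8003)
print(instance_a)
print(instance_b)
print(base_only)
```

Each argument list names the same base model path, which reflects the first-iteration trade-off: the base weights are loaded once per instance rather than shared.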

To avoid routing conflicts caused by multiple instances using the same base model name, we will route the base model name to one instance only, and we will not advertise this base model as an endpoint to users. (This is the current behavior with Single Model Deployment.)
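
The routing rule described above can be sketched as follows. This is a minimal illustration, not the PR's implementation: `InstancePlan`, `build_routes`, and `advertised_names` are hypothetical names, and picking the first instance as the target for the shared base model name is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class InstancePlan:
    instance_id: int
    advertised_name: str      # name exposed to users
    base_model: str           # base model this instance loads
    lora_path: Optional[str]  # set only for fine-tuned variants


def build_routes(plans: List[InstancePlan]) -> Dict[str, int]:
    """Map every routable model name to exactly one instance."""
    routes: Dict[str, int] = {}
    # Advertised names always route to their own instance.
    for plan in plans:
        routes[plan.advertised_name] = plan.instance_id
    # The shared base model name resolves to a single instance (the first one
    # that loads it) unless a base model was selected explicitly above.
    for plan in plans:
        routes.setdefault(plan.base_model, plan.instance_id)
    return routes


def advertised_names(plans: List[InstancePlan]) -> List[str]:
    """Only selected models are advertised; hidden base names are not."""
    return [plan.advertised_name for plan in plans]


plans = [
    InstancePlan(0, "ft-a", "base-llm", "/models/ft-a"),
    InstancePlan(1, "ft-b", "base-llm", "/models/ft-b"),
]
routes = build_routes(plans)
print(routes)
print(advertised_names(plans))
```

In this example the name `base-llm` is routable but does not appear in the advertised list, so two instances sharing a base model never compete for the same route.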

This configuration structure prepares us for future enhancements, such as stacked fine-tuned deployments, where multiple fine-tuned variants are hosted under a single base model within one vLLM instance. That enhancement, however, will apply to single-model deployments first.

In a second iteration, we will explore expanding this capability to multi-model deployments, enabling grouped deployment of fine-tuned variants with shared GPU allocation. That enhancement will require additional work across the ADS SDK, AQUA UI, and validation logic.

Related PRs

@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label May 16, 2025
@mrDzurb mrDzurb requested review from elizjo and dipatidar May 16, 2025 21:40
📌 Cov diff with main:

Coverage-24%

📌 Overall coverage:

Coverage-58.63%

@mrDzurb mrDzurb added wip Work in progress do not merge for any issue that isn't ready for merging yet labels May 23, 2025
@mrDzurb mrDzurb changed the title [WIP][AQUA] Add Supporting Fine-Tuned Models in Multi-Model Deployment [Ready For Review][AQUA] Add Supporting Fine-Tuned Models in Multi-Model Deployment May 28, 2025
@mrDzurb mrDzurb removed wip Work in progress do not merge for any issue that isn't ready for merging yet labels May 28, 2025
Member

@VipulMascarenhas VipulMascarenhas left a comment

lgtm 👍

ads/aqua/app.py Outdated
@@ -284,8 +284,11 @@ def if_artifact_exist(self, model_id: str, **kwargs) -> bool:
logger.info(f"Artifact not found in model {model_id}.")
return False

@cached(cache=TTLCache(maxsize=1, ttl=timedelta(minutes=1), timer=datetime.now))
Member

Isn't a 1-minute TTL too short here? It makes sense if we have multiple requests within a 1-minute interval, but we could probably extend it to 5 minutes or so, in case the user tries out different model combinations for a while and we want to return the cached config.

Member Author

Done
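
To make the TTL trade-off from this thread concrete, here is a small standard-library stand-in for a TTL cache. It is not the `cachetools` decorator used in the diff; `ttl_cached`, `load_config`, and the fake clock are hypothetical and exist only so the timing can be shown deterministically.

```python
import time
from functools import wraps
from typing import Any, Callable, Dict, Tuple


def ttl_cached(ttl_seconds: float, timer: Callable[[], float] = time.monotonic):
    """Cache results per argument tuple and expire them after ttl_seconds."""

    def decorator(func):
        store: Dict[Tuple, Tuple[float, Any]] = {}

        @wraps(func)
        def wrapper(*args):
            now = timer()
            hit = store.get(args)
            if hit is not None and now - hit[0] < ttl_seconds:
                return hit[1]  # entry is still fresh
            value = func(*args)
            store[args] = (now, value)
            return value

        return wrapper

    return decorator


clock = {"now": 0.0}  # fake clock so the example does not need to sleep
calls = []


@ttl_cached(ttl_seconds=300, timer=lambda: clock["now"])
def load_config(model_id: str) -> dict:
    calls.append(model_id)  # stands in for an expensive config fetch
    return {"model_id": model_id}


load_config("model-a")  # miss: fetches the config
clock["now"] = 120.0
load_config("model-a")  # hit with a 5-minute TTL; a 1-minute TTL would refetch
clock["now"] = 301.0
load_config("model-a")  # miss: the entry has expired
print(len(calls))
```

With a 5-minute TTL the second call, two minutes after the first, is served from the cache, which is the scenario the reviewer describes.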

ads/aqua/app.py Outdated
raise AquaRuntimeError(f"Target model {oci_model.id} is not an Aqua model.")
logger.debug(f"Target model {oci_model.id} is not an Aqua model.")
return ModelConfigResult(config=config, model_details=oci_model)
# raise AquaRuntimeError(f"Target model {oci_model.id} is not an Aqua model.")
Member

nit: remove commented code

Member Author

Done

total_gpus_available (int, optional): The total number of GPUs available for this shape.
"""

models: Optional[List[GPUModelAllocation]] = Field(
Member

do we need to add protected_namespaces = () in Config since we have an attribute here with model* to avoid warnings when running CLI commands?

Member Author

Done

for model_id, config in deployment_configs.items():
# For multi model deployment, we cannot rely on .shape because some models, like Falcon-7B, can only be deployed on a single GPU card (A10.1).
# However, Falcon can also be deployed on a single card in other A10 shapes, such as A10.2.
# Our current configuration does not support this flexibility.
Member

Ideally we would rely only on the configuration for shape info, but this makes sense.
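
One way to picture the idea in the code comment above is to reason about GPU counts rather than shape names. The sketch below is an assumption about the approach, not the code in this PR: `feasible_gpu_counts` and the `known_counts` data are hypothetical.

```python
from typing import Dict, List


def feasible_gpu_counts(configured_counts: List[int], shape_total_gpus: int) -> List[int]:
    """Return the configured GPU counts that fit on a shape with the given number of cards."""
    return sorted({count for count in configured_counts if count <= shape_total_gpus})


# Hypothetical data: GPU counts each model is known to work with.
known_counts: Dict[str, List[int]] = {
    "single-card-model": [1],   # e.g. a model configured only for a one-GPU shape
    "larger-model": [2, 4],
}

# On a two-GPU shape, the single-card model can still take one of the two cards.
for name, counts in known_counts.items():
    print(name, feasible_gpu_counts(counts, shape_total_gpus=2))
```

Matching on counts lets a model that is configured only for a one-card shape still be placed on one card of a larger shape in the same family.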

📌 Cov diff with main:

Coverage-84%

📌 Overall coverage:

Coverage-56.89%

📌 Cov diff with main:

Coverage-81%

📌 Overall coverage:

Coverage-56.88%

📌 Cov diff with main:

Coverage-81%

📌 Overall coverage:

Coverage-56.87%

@mrDzurb mrDzurb changed the title [Ready For Review][AQUA] Add Supporting Fine-Tuned Models in Multi-Model Deployment [WIP][AQUA] Add Supporting Fine-Tuned Models in Multi-Model Deployment May 30, 2025

github-actions bot commented Jun 1, 2025

📌 Cov diff with main:

Coverage-82%

📌 Overall coverage:

Coverage-56.89%

github-actions bot commented Jun 5, 2025

📌 Cov diff with main:

Coverage-84%

📌 Overall coverage:

Coverage-56.91%

github-actions bot commented Jun 5, 2025

📌 Cov diff with main:

Coverage-84%

📌 Overall coverage:

Coverage-58.48%

@mrDzurb mrDzurb changed the title [WIP][AQUA] Add Supporting Fine-Tuned Models in Multi-Model Deployment [Ready For Review][AQUA] Add Supporting Fine-Tuned Models in Multi-Model Deployment Jun 5, 2025
@mrDzurb mrDzurb requested a review from VipulMascarenhas June 5, 2025 23:57

github-actions bot commented Jun 6, 2025

📌 Cov diff with main:

Coverage-84%

📌 Overall coverage:

Coverage-58.49%

github-actions bot commented Jun 6, 2025

📌 Cov diff with main:

Coverage-84%

📌 Overall coverage:

Coverage-58.48%

github-actions bot commented Jun 6, 2025

📌 Cov diff with main:

Coverage-84%

📌 Overall coverage:

Coverage-58.48%

github-actions bot commented Jun 9, 2025

📌 Cov diff with main:

Coverage-84%

📌 Overall coverage:

Coverage-58.49%

@mrDzurb mrDzurb requested a review from VipulMascarenhas June 9, 2025 18:30

github-actions bot commented Jun 9, 2025

📌 Cov diff with main:

Coverage-84%

📌 Overall coverage:

Coverage-58.48%

github-actions bot commented Jun 9, 2025

📌 Cov diff with main:

Coverage-84%

📌 Overall coverage:

Coverage-58.49%

Member

@VipulMascarenhas VipulMascarenhas left a comment

added minor comment, rest looks good.

The model-by-reference path to the LoRA Module within the model artifact
"""

model_id: Optional[str] = Field(None, description="The fine tuned model OCID to deploy.")
Member

needs protected_namespaces = () in config to avoid warning messages showing up when running via CLI.
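
For reference, a minimal sketch of the requested setting, assuming Pydantic v2 and its `ConfigDict`. The class name `LoraModuleSpecSketch` and the placeholder OCID are hypothetical; only the `model_id` field mirrors the diff above. Pydantic reserves the `model_` prefix as a protected namespace, and depending on the installed version a field such as `model_id` can trigger a warning unless the namespace list is cleared.

```python
import warnings
from typing import Optional

from pydantic import BaseModel, ConfigDict, Field

with warnings.catch_warnings():
    # Turn any warning raised while the class is defined into an error,
    # so a protected-namespace warning would be impossible to miss.
    warnings.simplefilter("error")

    class LoraModuleSpecSketch(BaseModel):
        # Clearing protected namespaces allows field names starting with "model_".
        model_config = ConfigDict(protected_namespaces=())

        model_id: Optional[str] = Field(
            None, description="The fine tuned model OCID to deploy."
        )


spec = LoraModuleSpecSketch(model_id="ocid1.placeholder")
print(spec.model_id)
```

The class definition completes without raising, which shows that no warning is emitted for `model_id` under this configuration.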

@mrDzurb mrDzurb merged commit 9d51d44 into main Jun 10, 2025
24 checks passed
Labels
OCA Verified All contributors have signed the Oracle Contributor Agreement.

4 participants