Add Amazon Bedrock Guardrails integration #2

Open · wants to merge 10 commits into main
38 changes: 38 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,38 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Commands
- Install dependencies: `cd src && uv pip install -r requirements.txt`
- Run Python scripts: `cd src && uv run <script_name>.py`
- Run locally: `cd src && uv run -m uvicorn api.app:app --host 0.0.0.0 --port 8000`
- Build Docker: `cd scripts && bash ./push-to-ecr.sh`
- Lint: `pipx run ruff check`
- Format: `pipx run ruff format`

## Environment Configuration
- We use `uv` instead of regular `python` or `pip` commands
- Enable debug mode: `export DEBUG=true`
- Set AWS region: `export AWS_REGION=us-east-1`
- Custom models are enabled by default

## Code Style
- Python version: 3.12
- Line length: 120 characters max
- Indentation: 4 spaces
- Quote style: Double quotes for strings
- Imports: grouped (standard library, third-party, internal) and alphabetically sorted within each group
- Use FastAPI patterns for API development
- Type annotations required for all functions and classes
- Use abstract base classes (ABC) for interfaces
- Snake case for variables/functions, PascalCase for classes
- Explicit error handling with specific exception types
- Use HTTPException for API errors
- Document public functions with docstrings
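A minimal illustration of these conventions (the class names here are invented for the example, not part of the codebase):

```python
from abc import ABC, abstractmethod


class ModelBackend(ABC):
    """Interface for model backends, defined as an abstract base class."""

    @abstractmethod
    def invoke(self, prompt: str) -> str:
        """Return the model's completion for the given prompt."""


class EchoBackend(ModelBackend):
    """Trivial concrete backend used only to illustrate the conventions."""

    def invoke(self, prompt: str) -> str:
        return f"echo: {prompt}"
```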

## Architecture
This project provides OpenAI-compatible RESTful APIs for Amazon Bedrock models, making it easy to use AWS foundation models without changing existing code that uses OpenAI APIs.

## Testing
- Test API functionality: `cd src && uv run test_api.py`
- Test custom models: `cd src && uv run test_custom_models.py`
123 changes: 123 additions & 0 deletions CUSTOM_MODELS_IMPLEMENTATION.md
@@ -0,0 +1,123 @@
# Custom Imported Models Implementation

## Overview

This document describes the implementation of custom imported model support in the Bedrock Access Gateway. Custom models are models that you have imported into Bedrock, and this feature allows you to use them through the OpenAI-compatible API interface just like foundation models.

## User-Friendly Model IDs

One of the key features of this implementation is the creation of user-friendly model IDs that include the model name. Instead of cryptic AWS IDs like `custom.a1b2c3d4`, models are presented with descriptive IDs in the format:

```
{model-name}-id:custom.{aws_id}
```

For example: `mistral-7b-instruct-id:custom.a1b2c3d4`

This makes it easier to identify models when using the Models API, while maintaining compatibility with the original AWS ID format.
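The mapping back to the AWS ID can be sketched as follows; the helper name `to_aws_model_id` is illustrative, not part of the gateway's public API:

```python
def to_aws_model_id(model_id: str) -> str:
    """Reduce a user-friendly model ID to the underlying AWS ID.

    'mistral-7b-instruct-id:custom.a1b2c3d4' -> 'custom.a1b2c3d4'
    A plain AWS ID such as 'custom.a1b2c3d4' passes through unchanged.
    """
    marker = "-id:custom."
    if marker in model_id:
        return "custom." + model_id.split(marker, 1)[1]
    return model_id
```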

## Changes Made

1. **Model Discovery**
- Extended `list_bedrock_models()` to include:
- Custom models via `bedrock_client.list_custom_models()`
- Imported models via `bedrock_client.list_imported_models()`
- Created user-friendly model IDs that include the model name
- Added type field to model metadata to distinguish between "foundation" and "custom" models
- Added region information to each model to support cross-region invocation
- Stored model ARN for custom models for invocation purposes

2. **Configuration**
- Custom model support is enabled by default
- Added retry configuration for custom model invocation
- The implementation uses the local AWS region by default

3. **Model Validation**
- Added ID transformation logic in the `validate()` method to handle both:
- Descriptive model IDs (e.g., `mistral-7b-instruct-id:custom.a1b2c3d4`)
- Original AWS IDs (e.g., `custom.a1b2c3d4`)
- Stores the original display ID to preserve it in responses

4. **Model Invocation**
- Added branching logic in `_invoke_bedrock()` to handle custom models differently
- Implemented `_invoke_custom_model()` method to handle custom model invocation via `InvokeModel`/`InvokeModelWithResponseStream`
- Added custom model response parsing to handle various model output formats
- Implemented special handling for `ModelNotReadyException`
- Added region-specific client creation for cross-region models

5. **Streaming Support**
- Added `_handle_custom_model_stream()` method to handle streaming responses
- Added support for parsing different streaming formats from custom models

6. **Message Formatting**
- Implemented `_create_prompt_from_messages()` to convert OpenAI-style chat messages to text format for custom models

7. **Documentation**
- Updated README.md to include new feature
- Updated Usage.md with custom model usage examples
- Updated FAQs to indicate custom model support
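The message-formatting step in item 6 can be sketched as follows; this is an approximation, and the gateway's actual prompt template may differ:

```python
def create_prompt_from_messages(messages: list[dict]) -> str:
    """Flatten OpenAI-style chat messages into a plain-text prompt."""
    parts = [f"{msg['role'].capitalize()}: {msg['content']}" for msg in messages]
    parts.append("Assistant:")  # cue the model to produce its reply
    return "\n\n".join(parts)
```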

## Usage

### Listing Custom Models

To list available custom models in your AWS account, use the Models API:

```bash
curl -s $OPENAI_BASE_URL/models -H "Authorization: Bearer $OPENAI_API_KEY" | jq '.data[] | select(.id | startswith("custom.") or contains("-id:custom."))'
```

### Using Custom Models

Custom models can be used with either their descriptive ID or the original AWS ID:

```bash
# Using descriptive ID
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "mistral-7b-instruct-id:custom.a1b2c3d4",
"messages": [
{ "role": "user", "content": "Hello, world!" }
]
}'

# Using original AWS ID (also supported)
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "custom.a1b2c3d4",
"messages": [
{ "role": "user", "content": "Hello, world!" }
]
}'
```

## Troubleshooting

### Model Not Found

If your custom model isn't appearing:

1. Check if the model exists in your AWS account:
```bash
aws bedrock list-imported-models --region us-east-1
```

2. Restart the gateway to refresh the model list:
```bash
# For Lambda deployments
# Go to Lambda console > Find your function > Click "Deploy new image"

# For Fargate deployments
# Go to ECS console > Find your cluster > Tasks tab > Stop running task
```

### Invocation Errors

Common errors when invoking custom models:

- `ModelNotReadyException`: The model is still being prepared; wait a few minutes and retry
- `ValidationException`: The request body is not compatible with the model; check your input format
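A client-side retry pattern for `ModelNotReadyException` can be sketched as a generic wrapper. This is illustrative, not the gateway's internal code; the caller supplies the actual invocation function:

```python
import time


def invoke_with_retry(invoke, attempts: int = 5, delay: float = 30.0):
    """Call `invoke` (a zero-argument function performing the model call),
    retrying while the service reports ModelNotReadyException."""
    for attempt in range(attempts):
        try:
            return invoke()
        except Exception as err:
            # Only retry "not ready"; surface ValidationException etc. immediately.
            if "ModelNotReadyException" not in str(err) or attempt == attempts - 1:
                raise
            time.sleep(delay)  # model is still loading; wait before retrying
```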
11 changes: 8 additions & 3 deletions README.md
@@ -26,7 +26,8 @@ If you find this GitHub repository useful, please consider giving it a free star
- [x] Support Embedding API
- [x] Support Multimodal API
- [x] Support Cross-Region Inference
- [x] Support Reasoning (**new**)
- [x] Support Reasoning
- [x] Support Custom Imported Models (**new**)

Please check [Usage Guide](./docs/Usage.md) for more details about how to use the new APIs.

@@ -230,9 +231,13 @@ Also, you can use Lambda Web Adapter + Function URL (see [example](https://githu

Currently, there is no plan to support SageMaker models. This may change provided there's a demand from customers.

### Any plan to support Bedrock custom models?
### Support for Bedrock custom models

Fine-tuned models and models with Provisioned Throughput are currently not supported. You can clone the repo and make the customization if needed.
Custom imported models are now supported! You can use them just like foundation models, either with user-friendly model IDs in the format `{model-name}-id:custom.{aws_id}` or with the original AWS format `custom.{aws_id}`. Call the Models API to list the custom models available in your account.

The user-friendly format makes it easier to identify models while maintaining backward compatibility with the original AWS IDs. For more details, see [Custom Imported Models](./docs/Usage.md#custom-imported-models) in the usage documentation.

Fine-tuned models and models with Provisioned Throughput may require additional configuration.

### How to upgrade?

6 changes: 5 additions & 1 deletion deployment/BedrockProxy.template
@@ -8,6 +8,9 @@ Parameters:
Type: String
Default: anthropic.claude-3-sonnet-20240229-v1:0
Description: The default model ID, please make sure the model ID is supported in the current region
EcrAccountId:
Type: String
Default: 366590864501
Resources:
VPCB9E5F0B4:
Type: AWS::EC2::VPC
@@ -170,7 +173,8 @@ Resources:
ImageUri:
Fn::Join:
- ""
- - 366590864501.dkr.ecr.
- - Ref: EcrAccountId
- ".dkr.ecr."
- Ref: AWS::Region
- "."
- Ref: AWS::URLSuffix
10 changes: 8 additions & 2 deletions deployment/BedrockProxyFargate.template
@@ -8,6 +8,9 @@ Parameters:
Type: String
Default: anthropic.claude-3-sonnet-20240229-v1:0
Description: The default model ID, please make sure the model ID is supported in the current region
EcrAccountId:
Type: String
Default: 366590864501
Resources:
VPCB9E5F0B4:
Type: AWS::EC2::VPC
@@ -158,7 +161,9 @@ Resources:
- ""
- - "arn:aws:ecr:"
- Ref: AWS::Region
- :366590864501:repository/bedrock-proxy-api-ecs
- ":"
- Ref: EcrAccountId
- :repository/bedrock-proxy-api-ecs
- Action: ecr:GetAuthorizationToken
Effect: Allow
Resource: "*"
@@ -226,7 +231,8 @@ Resources:
Image:
Fn::Join:
- ""
- - 366590864501.dkr.ecr.
- - Ref: EcrAccountId
- ".dkr.ecr."
- Ref: AWS::Region
- "."
- Ref: AWS::URLSuffix
78 changes: 77 additions & 1 deletion docs/Usage.md
@@ -15,6 +15,7 @@ export OPENAI_BASE_URL=<API base url>
- [Multimodal API](#multimodal-api)
- [Tool Call](#tool-call)
- [Reasoning](#reasoning)
- [Custom Imported Models](#custom-imported-models)

## Models API

@@ -441,4 +442,79 @@ for chunk in response:
reasoning_content += chunk.choices[0].delta.reasoning_content
elif chunk.choices[0].delta.content:
content += chunk.choices[0].delta.content
```

## Custom Imported Models

This feature allows you to use models that you've imported into Amazon Bedrock. Custom imported models can be used just like foundation models with minor differences in configuration.

**Important Notes:**
- Custom models are displayed in the Models API with user-friendly IDs in the format `{model-name}-id:custom.{aws_id}`
- The original AWS ID format `custom.{aws_id}` is also supported for backward compatibility
- Custom model support is enabled by default
- Custom imported models may have different response formats than foundation models, so the gateway attempts to normalize the outputs
- If a model is not ready yet, the API will return a 503 error with a detail message indicating the model is not ready

**Example Request**

First, use the Models API to get a list of available custom models:

```bash
curl -s $OPENAI_BASE_URL/models -H "Authorization: Bearer $OPENAI_API_KEY" | jq '.data[] | select(.id | contains("-id:custom.") or startswith("custom."))'
```

Then use the custom model in your chat completions (using either format):

```bash
# Using the user-friendly model ID
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "mistral-7b-instruct-id:custom.a1b2c3d4",
"messages": [
{
"role": "user",
"content": "What is the meaning of life?"
}
],
"max_tokens": 500,
"temperature": 0.7
}'

# Using the original AWS ID format (also supported)
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "custom.a1b2c3d4",
"messages": [
{
"role": "user",
"content": "What is the meaning of life?"
}
],
"max_tokens": 500,
"temperature": 0.7
}'
```

**Example Python SDK Usage**

```python
from openai import OpenAI

client = OpenAI()
completion = client.chat.completions.create(
# Can use either format:
# model="mistral-7b-instruct-id:custom.a1b2c3d4" # User-friendly format
model="custom.a1b2c3d4", # Original AWS format
messages=[{"role": "user", "content": "What is the meaning of life?"}],
max_tokens=500,
temperature=0.7
)

print(completion.choices[0].message.content)
```

For more details on the implementation, see [CUSTOM_MODELS_IMPLEMENTATION.md](../CUSTOM_MODELS_IMPLEMENTATION.md).