Skip to content

Fix StructuredDict with nested JSON schemas using $ref #2570

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

ChiaXinLiang
Copy link

@ChiaXinLiang ChiaXinLiang commented Aug 15, 2025

Summary

Fixes #2466 - StructuredDict now properly handles nested JSON schemas with $ref references.

Problem

StructuredDict was failing when given JSON schemas containing nested $ref references because pydantic's JSON schema generator couldn't resolve them, resulting in a KeyError.

Example schema that was failing:

{
    "$defs": {
        "Tire": {
            "type": "object",
            "properties": {
                "brand": {"type": "string"},
                "size": {"type": "integer"}
            }
        }
    },
    "type": "object",
    "properties": {
        "tires": {
            "type": "array",
            "items": {"$ref": "#/$defs/Tire"}
        }
    }
}

Solution

  1. Added helper functions to detect and resolve nested references:

    • _contains_ref(): Recursively checks if a schema contains any $ref references
    • _resolve_refs(): Recursively resolves all $ref references inline
  2. Updated StructuredDict to:

    • Detect when a schema contains $defs with nested references
    • Resolve all references inline before passing to pydantic
    • Remove $defs after inlining to avoid confusion
  3. Improved check_object_json_schema:

    • Replaced hacky string matching with proper recursive checking
    • Handles unresolvable refs gracefully (external URLs, missing refs, non-standard formats)

Test plan

  • Added test_output_type_structured_dict_nested to test the main fix with nested car/tire schema
  • Added test_output_type_structured_dict_unresolvable_ref to test edge cases
  • Added comprehensive coverage tests in test_utils.py
  • All existing tests pass
  • Linting and formatting checks pass
  • Type checking with pyright passes

Example Usage

from pydantic_ai import Agent, StructuredDict

car_schema = {
    "$defs": {
        "Tire": {
            "type": "object",
            "properties": {
                "brand": {"type": "string"},
                "size": {"type": "integer"}
            }
        }
    },
    "type": "object",
    "properties": {
        "make": {"type": "string"},
        "model": {"type": "string"},
        "tires": {
            "type": "array",
            "items": {"$ref": "#/$defs/Tire"}
        }
    }
}

CarDict = StructuredDict(car_schema, name="Car")
agent = Agent('openai:gpt-4o', output_type=CarDict)
result = agent.run_sync('Create a car')
# Now works without KeyError\!

Marcus added 8 commits August 15, 2025 17:36
- Add helper functions to detect and resolve nested $ref references
- Inline resolution of refs to avoid pydantic JSON schema generator issues
- Clean implementation replacing hacky string matching
- Add comprehensive test coverage for nested schemas and edge cases
- Handle unresolvable refs gracefully (external, missing, non-standard)
- Rewrite generator expressions as explicit loops for better type inference
- Add explicit type annotation for result dict
- Maintain same functionality with clearer types
- Remove trailing whitespace
- Apply ruff formatting
- Remove private function tests to avoid pyright errors
- Add strategic pyright: ignore comments to suppress type inference issues
- Remove unnecessary cast() calls from recursive helper functions
- Maintain original function signatures for better compatibility
- All tests pass and functionality is preserved

Fixes type checking errors:
- _utils.py: Type of "value" and "element" unknown in recursive traversal
- output.py: Argument type issues in _contains_refs and _resolve_refs
- Restore original isinstance(v, (dict, list)) checks in _contains_ref
- Use correct variable names 'v' and 'item' that match CI environment
- Add targeted pyright: ignore comments for remaining type issues
- All tests pass and functionality preserved

Resolves CI type checking failures:
- _utils.py: Type of 'v' and 'item' unknown
- output.py: Argument type mismatch in recursive calls
- Add tests for _contains_ref() with various data types (100% coverage)
- Add tests for _contains_refs() and _resolve_refs() edge cases
- Add tests for get_traceparent() with None values
- Test StructuredDict name/description branch coverage
- Fixes coverage issues in PR pydantic#2570 for nested JSON schema  support
- Combine imports from same module to satisfy ruff import formatting
- Fix pyright type ignore placement for _resolve_refs call
- All linting checks now pass (ruff and pyright)
- Move all imports of private functions to module level in test files
- Remove function-level imports (not Pythonic)
- Add additional test for _resolve_refs branch coverage
- All linting checks pass (ruff, pyright)
- Tests still pass with improved structure
@DouweM
Copy link
Collaborator

DouweM commented Aug 15, 2025

@ChiaXinLiang Thanks, I agree that until this is fixed in Pydantic we should have a workaround. Could we use the existing InlineDefsJsonSchemaTransformer though?

Marcus added 3 commits August 15, 2025 23:12
Remove trailing spaces from blank line in test_agent.py to pass pre-commit hooks
…mentation

- Replace custom _contains_refs and _resolve_refs functions with existing InlineDefsJsonSchemaTransformer
- Handles recursive references properly which our simple implementation didn't
- More maintainable by leveraging existing, well-tested infrastructure
- Gracefully handles unresolvable refs by catching UserError

As suggested by collaborator feedback on PR pydantic#2570
Keep InlineDefsJsonSchemaTransformer import inside function to avoid circular import
assert check_object_json_schema(schema_with_different_ref) == schema_with_different_ref


def test_get_traceparent_coverage():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks unrelated, can we remove it?

assert NoRefsDict.__name__ == '_StructuredDict'


def test_structured_dict_name_description():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks unrelated, can we remove it?

}
NoRefsDict = StructuredDict(schema_no_refs, name='NoRefs')

# These should all work without errors even with unresolvable refs
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks unrelated, can we remove it?

schema_with_different_ref = {'$ref': '#/definitions/Model'}
DifferentDict = StructuredDict(schema_with_different_ref, name='DifferentRef')

# Schema with no refs at all (for coverage of line 347 in output.py)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That line number will change over time, so no need to include it


# Schema with non-standard ref format
schema_with_different_ref = {'$ref': '#/definitions/Model'}
DifferentDict = StructuredDict(schema_with_different_ref, name='DifferentRef')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we verify that the schema is still as expected?

def call_tool(_: list[ModelMessage], info: AgentInfo) -> ModelResponse:
assert info.output_tools is not None
args_json = '{"make": "Toyota", "model": "Camry", "tires": [{"brand": "Michelin", "size": 17}]}'
return ModelResponse(parts=[ToolCallPart(info.output_tools[0].name, args_json)])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add an assertion for the output tool schema to ensure it has the expected format?

return schema
return resolved
# For non-local refs or unresolvable refs, return the schema as-is
return schema
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this change necessary? What was not working correctly before that is now?

Marcus added 3 commits August 16, 2025 00:50
- Remove unrelated test_get_traceparent_coverage from test_utils.py
- Remove unrelated test_structured_dict_name_description from test_agent.py
- Remove unrelated 'no refs' test case from test_output_type_structured_dict_unresolvable_ref
- Add assertions to verify schema transformations work correctly
- Add explanatory comment in _utils.py for the nested ref checking logic

These changes make the PR more focused on the actual fix for issue pydantic#2466
- Remove test_output_type_structured_dict_unresolvable_ref from test_agent.py
- Remove unresolvable ref test cases from test_utils.py
- Clean up formatting and remove unused import
- Keep PR focused on fixing issue pydantic#2466 only
- Add git push, git add, and git commit to allowed commands
- Enables Claude to help with version control operations
@ChiaXinLiang
Copy link
Author

Should I need to include the 'test_output_type_structured_dict_unresolvable_ref' also, or just 'test_output_type_structured_dict_nested'?

- Remove trailing whitespace from output.py line 312
- Remove extra blank lines from test_utils.py
- Ensures code passes all pre-commit checks
@ChiaXinLiang
Copy link
Author

When I removed the test_output_type_structured_dict_unresolvable_ref test function.

The removed tests were covering these edge cases:

  • Non-local refs (e.g., http://example.com/schemas/model.json) - line 95 in _utils.py
  • Missing definition refs (e.g., ref to non-existent $defs) - line 95 in _utils.py
  • UserError exception handling in InlineDefsJsonSchemaTransformer - line 316 in output.py
  • List traversal in _contains_ref() function - line 114 in _utils.py

@DouweM
Copy link
Collaborator

DouweM commented Aug 16, 2025

@ChiaXinLiang Just a heads-up that I'll be out this coming week and will be back the 25th. Assuming this is not urgent I'll review it then. If it is, please ping Kludex! Appreciate the patience :)

@ChiaXinLiang
Copy link
Author

Got it, thanks. No urgency on my end. Enjoy your week off!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

StructuredDict passed into run cannot handle nested JsonSchema
2 participants