Add OpenRouter Data Extraction Support #160

Draft: wants to merge 8 commits into base: main

sirwolfgang
Contributor

@sirwolfgang sirwolfgang commented Aug 7, 2025

Improvements

  • Add Tests for OpenRouter Structured Output

Bug Fixes

  • Fixed issue where backtraces were being dropped from custom raised errors
  • Fixed issue where output schemas would silently fail to load
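For readers unfamiliar with the backtrace problem, here is a minimal Ruby sketch of one way to wrap a low-level error in a custom one without losing the original frames. The class `SchemaError` and the method `parse_schema` are hypothetical names chosen for illustration; this is not necessarily how the fix in this PR is implemented.

```ruby
require "json"

# Hypothetical error class, for illustration only.
class SchemaError < StandardError; end

# Hypothetical helper: parses a JSON schema string and wraps parser failures.
def parse_schema(json)
  JSON.parse(json)
rescue JSON::ParserError => e
  # Passing the original backtrace as the third argument to raise keeps the
  # frames that point at the real failure. Because this raise happens inside
  # a rescue block, Ruby also records the original error as #cause.
  raise SchemaError, "invalid schema: #{e.message}", e.backtrace
end

begin
  parse_schema("{")
rescue SchemaError => err
  puts err.message
  puts err.backtrace.first
end
```

The wrapped error still points at the parser call, and `err.cause` exposes the original `JSON::ParserError` for logging.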

I refactored some of the internals of the schema-loading methods to improve the consistency and reliability of this system. In doing so, I switched the naming to object-first ordering (load_schema => schema_load). If you're not used to this, it reads a little oddly in English the first few times, but once it clicks it makes managing codebases, especially large files, a lot easier, because it forces you to group related concepts that can later grow into concerns or other refactors. This can be reverted if desired.

Based on the OpenAI and OpenRouter documentation, the provided response_format examples don't appear to be compatible with the spec.

@sirwolfgang sirwolfgang changed the title Structured Output examples Fixes Structured Output Examples Aug 7, 2025
@TonsOfFun
Contributor

Removing the VCR cassette data_extraction_agent_parse_resume_generation_response_with_structured_output.yml and recording again, I get a 400 error:

---
http_interactions:
- request:
    method: post
    uri: https://api.openai.com/v1/responses
    body:
      encoding: UTF-8
      string: '{"model":"gpt-4o-mini","input":[{"role":"system","content":""},{"role":"user","content":[{"type":"input_file","filename":"resume.pdf","file_data":"data:application/pdf;base64,JVBERi0xLjQKMSAwIG9iago8PAovVHlwZSAvQ2F0YWxvZwovUGFnZXMgMiAw\nIFIKPj4KZW5kb2JqCgoyIDAgb2JqCjw8Ci9UeXBlIC9QYWdlcwovS2lkcyBb\nMyAwIFJdCi9Db3VudCAxCj4+CmVuZG9iagoKMyAwIG9iago8PAovVHlwZSAv\nUGFnZQovUGFyZW50IDIgMCBSCi9NZWRpYUJveCBbMCAwIDYxMiA3OTJdCi9S\nZXNvdXJjZXMgPDwKL0ZvbnQgPDwKL0YxIDQgMCBSCj4+Cj4+Ci9Db250ZW50\ncyA1IDAgUgo+PgplbmRvYmoKCjQgMCBvYmoKPDwKL1R5cGUgL0ZvbnQKL1N1\nYnR5cGUgL1R5cGUxCi9CYXNlRm9udCAvSGVsdmV0aWNhCj4+CmVuZG9iagoK\nNSAwIG9iago8PAovTGVuZ3RoIDMwMAo+PgpzdHJlYW0KQlQKL0YxIDE2IFRm\nCjUwIDc1MCBUZAooSm9obiBEb2UgLSBTb2Z0d2FyZSBFbmdpbmVlcikgVGoK\nMCAtMzAgVGQKL0YxIDEyIFRmCihFbWFpbDogam9obi5kb2VAZXhhbXBsZS5j\nb20pIFRqCjAgLTIwIFRkCihQaG9uZTogKDU1NSkgMTIzLTQ1NjcpIFRqCjAg\nLTIwIFRkCihMb2NhdGlvbjogU2FuIEZyYW5jaXNjbywgQ0EpIFRqCjAgLTQw\nIFRkCi9GMSAxNCBUZgooRXhwZXJpZW5jZTopIFRqCjAgLTI1IFRkCi9GMSAx\nMiBUZgooU2VuaW9yIFNvZnR3YXJlIEVuZ2luZWVyIGF0IFRlY2hDb3JwICgy\nMDIwLTIwMjQpKSBUagowIC0yMCBUZAooLSBEZXZlbG9wZWQgd2ViIGFwcGxp\nY2F0aW9ucyB1c2luZyBSdWJ5IG9uIFJhaWxzKSBUagowIC0yMCBUZAooLSBM\nZWQgdGVhbSBvZiA1IGRldmVsb3BlcnMpIFRqCjAgLTIwIFRkCigtIEltcGxl\nbWVudGVkIENJL0NEIHBpcGVsaW5lcykgVGoKMCAtNDAgVGQKL0YxIDE0IFRm\nCihTa2lsbHM6KSBUagowIC0yNSBUZAovRjEgMTIgVGYKKFJ1YnksIFJhaWxz\nLCBKYXZhU2NyaXB0LCBQeXRob24sIEFXUywgRG9ja2VyKSBUagowIC00MCBU\nZAovRjEgMTQgVGYKKEVkdWNhdGlvbjopIFRqCjAgLTI1IFRkCi9GMSAxMiBU\nZgooQlMgQ29tcHV0ZXIgU2NpZW5jZSwgU3RhbmZvcmQgVW5pdmVyc2l0eSAo\nMjAxNi0yMDIwKSkgVGoKRVQKZW5kc3RyZWFtCmVuZG9iagoKeHJlZgowIDYK\nMDAwMDAwMDAwMCA2NTUzNSBmIAowMDAwMDAwMDA5IDAwMDAwIG4gCjAwMDAw\nMDAwNTggMDAwMDAgbiAKMDAwMDAwMDExNSAwMDAwMCBuIAowMDAwMDAwMjY5\nIDAwMDAwIG4gCjAwMDAwMDAzMzcgMDAwMDAgbiAKdHJhaWxlcgo8PAovU2l6\nZSA2Ci9Sb290IDEgMCBSCj4+CnN0YXJ0eHJlZgo2ODcKJSVFT0YK\n"},{"type":"input_text","text":"Parse
        the content of the file or image"}]}],"text":{"type":"json_schema","json_schema":{"name":"resume_schema","strict":true,"schema":{"type":"object","properties":{"name":{"type":"string","description":"The
        full name of the individual."},"email":{"type":"string","format":"email","description":"The
        email address of the individual."},"phone":{"type":"string","description":"The
        phone number of the individual."},"education":{"type":"array","items":{"$ref":"#/$defs/education"}},"experience":{"type":"array","items":{"$ref":"#/$defs/experience"}}},"required":["name","email","phone","education","experience"],"additionalProperties":false,"$defs":{"education":{"type":"object","properties":{"degree":{"type":"string","description":"The
        degree obtained."},"institution":{"type":"string","description":"The institution
        where the degree was obtained."},"year":{"type":"integer","description":"The
        year of graduation."}},"required":["degree","institution","year"],"additionalProperties":false},"experience":{"type":"object","properties":{"job_title":{"type":"string","description":"The
        job title held."},"company":{"type":"string","description":"The company where
        the individual worked."},"duration":{"type":"string","description":"The duration
        of employment."}},"required":["job_title","company","duration"],"additionalProperties":false}}}}}}'
    headers:
      Content-Type:
      - application/json
      Authorization:
      - Bearer <OPENAI_ACCESS_TOKEN>
      Accept-Encoding:
      - gzip;q=1.0,deflate;q=0.6,identity;q=0.3
      Accept:
      - "*/*"
      User-Agent:
      - Ruby
  response:
    status:
      code: 400
      message: Bad Request
    headers:
      Date:
      - Thu, 07 Aug 2025 01:58:23 GMT
      Content-Type:
      - application/json
      Content-Length:
      - '165'
      Connection:
      - keep-alive
      Openai-Version:
      - '2020-10-01'
      Openai-Organization:
      - user-lwlf4w2yvortlzept3wqx7li
      Openai-Project:
      - proj_pcPHiweuB88laiGDTaN3nH2M
      X-Request-Id:
      - req_60e431a98bfcf9388360eaddb3b32b16
      Openai-Processing-Ms:
      - '17'
      X-Envoy-Upstream-Service-Time:
      - '26'
      Strict-Transport-Security:
      - max-age=31536000; includeSubDomains; preload
      Cf-Cache-Status:
      - DYNAMIC
      Set-Cookie:
      - __cf_bm=BJDGnDt_epiIHazsDfJAJoC.E44Ofe1HEIDj0Qs3l_I-1754531903-1.0.1.1-8L8gsV8yNcEoo.l_z8ptHyy98Mt7_NeOphBV8DmneENNDRwNdIrP.M8RyV64Cyil.4ksSdxaV5PLNczC7ZTg08iAqVoj3V.5YTlSh_IU3_s;
        path=/; expires=Thu, 07-Aug-25 02:28:23 GMT; domain=.api.openai.com; HttpOnly;
        Secure; SameSite=None
      - _cfuvid=sfj4na5QvCdV_o5DHN2cqJ5OOygmpSQUpkOoOYBWcmY-1754531903501-0.0.1.1-604800000;
        path=/; domain=.api.openai.com; HttpOnly; Secure; SameSite=None
      X-Content-Type-Options:
      - nosniff
      Server:
      - cloudflare
      Cf-Ray:
      - 96b32b2a5978fa62-SJC
      Alt-Svc:
      - h3=":443"; ma=86400
    body:
      encoding: UTF-8
      string: |-
        {
          "error": {
            "message": "Unknown parameter: 'text.type'.",
            "type": "invalid_request_error",
            "param": "text.type",
            "code": "unknown_parameter"
          }
        }
  recorded_at: Thu, 07 Aug 2025 01:58:23 GMT
recorded_with: VCR 6.3.1

I think this is because the current structured-output implementation uses the OpenAI Responses API, which expects:

"text": {
  "format": {
    "type": "json_schema",
    "name": "math_reasoning",
    ...

The chat API's structured outputs, by contrast, expect this:

"response_format": {
  "type": "json_schema",
  "json_schema": {
    ...

I think we can accommodate both, but the current Provider/Adapter implementation is a bit crude in this regard.
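As a rough illustration of accommodating both shapes, the sketch below builds the structured-output parameter for each endpoint from the same schema. The module `StructuredOutputParams` and its method names are hypothetical and are not part of the gem's Provider/Adapter code; the two hash layouts follow the request shapes quoted above.

```ruby
require "json"

# Hypothetical helper, not part of the gem: builds the structured-output
# parameter in the shape each endpoint expects.
module StructuredOutputParams
  module_function

  # Responses API: name, schema, and strict sit directly under text.format.
  def for_responses(name:, schema:, strict: true)
    { text: { format: { type: "json_schema", name: name, schema: schema, strict: strict } } }
  end

  # Chat API: the same fields are nested under response_format.json_schema.
  def for_chat(name:, schema:, strict: true)
    { response_format: { type: "json_schema", json_schema: { name: name, strict: strict, schema: schema } } }
  end
end

# A trimmed-down schema, for demonstration only.
schema = {
  type: "object",
  properties: { name: { type: "string" } },
  required: ["name"],
  additionalProperties: false
}

puts JSON.pretty_generate(StructuredOutputParams.for_responses(name: "resume_schema", schema: schema))
puts JSON.pretty_generate(StructuredOutputParams.for_chat(name: "resume_schema", schema: schema))
```

A provider could pick one builder or the other based on which endpoint it targets and merge the result into the request body, so the schema itself is defined only once.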

@sirwolfgang
Contributor Author

Ahh. So that's where that format came from.

@sirwolfgang sirwolfgang marked this pull request as draft August 9, 2025 17:01
@sirwolfgang sirwolfgang force-pushed the fixes-structure branch 2 times, most recently from f40b9d9 to 47951fe Compare August 9, 2025 21:04
@sirwolfgang sirwolfgang changed the title Fixes Structured Output Examples Add OpenRouter Data Extraction Support Aug 9, 2025

2 participants