-
-
Notifications
You must be signed in to change notification settings - Fork 52
Add OpenRouter Data Extraction Support #160
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Removing the VCR cassette ---
http_interactions:
- request:
method: post
uri: https://api.openai.com/v1/responses
body:
encoding: UTF-8
string: '{"model":"gpt-4o-mini","input":[{"role":"system","content":""},{"role":"user","content":[{"type":"input_file","filename":"resume.pdf","file_data":"data:application/pdf;base64,JVBERi0xLjQKMSAwIG9iago8PAovVHlwZSAvQ2F0YWxvZwovUGFnZXMgMiAw\nIFIKPj4KZW5kb2JqCgoyIDAgb2JqCjw8Ci9UeXBlIC9QYWdlcwovS2lkcyBb\nMyAwIFJdCi9Db3VudCAxCj4+CmVuZG9iagoKMyAwIG9iago8PAovVHlwZSAv\nUGFnZQovUGFyZW50IDIgMCBSCi9NZWRpYUJveCBbMCAwIDYxMiA3OTJdCi9S\nZXNvdXJjZXMgPDwKL0ZvbnQgPDwKL0YxIDQgMCBSCj4+Cj4+Ci9Db250ZW50\ncyA1IDAgUgo+PgplbmRvYmoKCjQgMCBvYmoKPDwKL1R5cGUgL0ZvbnQKL1N1\nYnR5cGUgL1R5cGUxCi9CYXNlRm9udCAvSGVsdmV0aWNhCj4+CmVuZG9iagoK\nNSAwIG9iago8PAovTGVuZ3RoIDMwMAo+PgpzdHJlYW0KQlQKL0YxIDE2IFRm\nCjUwIDc1MCBUZAooSm9obiBEb2UgLSBTb2Z0d2FyZSBFbmdpbmVlcikgVGoK\nMCAtMzAgVGQKL0YxIDEyIFRmCihFbWFpbDogam9obi5kb2VAZXhhbXBsZS5j\nb20pIFRqCjAgLTIwIFRkCihQaG9uZTogKDU1NSkgMTIzLTQ1NjcpIFRqCjAg\nLTIwIFRkCihMb2NhdGlvbjogU2FuIEZyYW5jaXNjbywgQ0EpIFRqCjAgLTQw\nIFRkCi9GMSAxNCBUZgooRXhwZXJpZW5jZTopIFRqCjAgLTI1IFRkCi9GMSAx\nMiBUZgooU2VuaW9yIFNvZnR3YXJlIEVuZ2luZWVyIGF0IFRlY2hDb3JwICgy\nMDIwLTIwMjQpKSBUagowIC0yMCBUZAooLSBEZXZlbG9wZWQgd2ViIGFwcGxp\nY2F0aW9ucyB1c2luZyBSdWJ5IG9uIFJhaWxzKSBUagowIC0yMCBUZAooLSBM\nZWQgdGVhbSBvZiA1IGRldmVsb3BlcnMpIFRqCjAgLTIwIFRkCigtIEltcGxl\nbWVudGVkIENJL0NEIHBpcGVsaW5lcykgVGoKMCAtNDAgVGQKL0YxIDE0IFRm\nCihTa2lsbHM6KSBUagowIC0yNSBUZAovRjEgMTIgVGYKKFJ1YnksIFJhaWxz\nLCBKYXZhU2NyaXB0LCBQeXRob24sIEFXUywgRG9ja2VyKSBUagowIC00MCBU\nZAovRjEgMTQgVGYKKEVkdWNhdGlvbjopIFRqCjAgLTI1IFRkCi9GMSAxMiBU\nZgooQlMgQ29tcHV0ZXIgU2NpZW5jZSwgU3RhbmZvcmQgVW5pdmVyc2l0eSAo\nMjAxNi0yMDIwKSkgVGoKRVQKZW5kc3RyZWFtCmVuZG9iagoKeHJlZgowIDYK\nMDAwMDAwMDAwMCA2NTUzNSBmIAowMDAwMDAwMDA5IDAwMDAwIG4gCjAwMDAw\nMDAwNTggMDAwMDAgbiAKMDAwMDAwMDExNSAwMDAwMCBuIAowMDAwMDAwMjY5\nIDAwMDAwIG4gCjAwMDAwMDAzMzcgMDAwMDAgbiAKdHJhaWxlcgo8PAovU2l6\nZSA2Ci9Sb290IDEgMCBSCj4+CnN0YXJ0eHJlZgo2ODcKJSVFT0YK\n"},{"type":"input_text","text":"Parse
the content of the file or image"}]}],"text":{"type":"json_schema","json_schema":{"name":"resume_schema","strict":true,"schema":{"type":"object","properties":{"name":{"type":"string","description":"The
full name of the individual."},"email":{"type":"string","format":"email","description":"The
email address of the individual."},"phone":{"type":"string","description":"The
phone number of the individual."},"education":{"type":"array","items":{"$ref":"#/$defs/education"}},"experience":{"type":"array","items":{"$ref":"#/$defs/experience"}}},"required":["name","email","phone","education","experience"],"additionalProperties":false,"$defs":{"education":{"type":"object","properties":{"degree":{"type":"string","description":"The
degree obtained."},"institution":{"type":"string","description":"The institution
where the degree was obtained."},"year":{"type":"integer","description":"The
year of graduation."}},"required":["degree","institution","year"],"additionalProperties":false},"experience":{"type":"object","properties":{"job_title":{"type":"string","description":"The
job title held."},"company":{"type":"string","description":"The company where
the individual worked."},"duration":{"type":"string","description":"The duration
of employment."}},"required":["job_title","company","duration"],"additionalProperties":false}}}}}}'
headers:
Content-Type:
- application/json
Authorization:
- Bearer <OPENAI_ACCESS_TOKEN>
Accept-Encoding:
- gzip;q=1.0,deflate;q=0.6,identity;q=0.3
Accept:
- "*/*"
User-Agent:
- Ruby
response:
status:
code: 400
message: Bad Request
headers:
Date:
- Thu, 07 Aug 2025 01:58:23 GMT
Content-Type:
- application/json
Content-Length:
- '165'
Connection:
- keep-alive
Openai-Version:
- '2020-10-01'
Openai-Organization:
- user-lwlf4w2yvortlzept3wqx7li
Openai-Project:
- proj_pcPHiweuB88laiGDTaN3nH2M
X-Request-Id:
- req_60e431a98bfcf9388360eaddb3b32b16
Openai-Processing-Ms:
- '17'
X-Envoy-Upstream-Service-Time:
- '26'
Strict-Transport-Security:
- max-age=31536000; includeSubDomains; preload
Cf-Cache-Status:
- DYNAMIC
Set-Cookie:
- __cf_bm=BJDGnDt_epiIHazsDfJAJoC.E44Ofe1HEIDj0Qs3l_I-1754531903-1.0.1.1-8L8gsV8yNcEoo.l_z8ptHyy98Mt7_NeOphBV8DmneENNDRwNdIrP.M8RyV64Cyil.4ksSdxaV5PLNczC7ZTg08iAqVoj3V.5YTlSh_IU3_s;
path=/; expires=Thu, 07-Aug-25 02:28:23 GMT; domain=.api.openai.com; HttpOnly;
Secure; SameSite=None
- _cfuvid=sfj4na5QvCdV_o5DHN2cqJ5OOygmpSQUpkOoOYBWcmY-1754531903501-0.0.1.1-604800000;
path=/; domain=.api.openai.com; HttpOnly; Secure; SameSite=None
X-Content-Type-Options:
- nosniff
Server:
- cloudflare
Cf-Ray:
- 96b32b2a5978fa62-SJC
Alt-Svc:
- h3=":443"; ma=86400
body:
encoding: UTF-8
string: |-
{
"error": {
"message": "Unknown parameter: 'text.type'.",
"type": "invalid_request_error",
"param": "text.type",
"code": "unknown_parameter"
}
}
recorded_at: Thu, 07 Aug 2025 01:58:23 GMT
recorded_with: VCR 6.3.1 I think this is due to the current implementation with structured outputs is using the OpenAI Responses API which expects: "text": {
"format": {
"type": "json_schema",
"name": "math_reasoning",
... Where the chat API's use of structured outputs expects this: "response_format": {
"type": "json_schema",
"json_schema": {
... I think we can accommodate both, but the current Provider/Adapter implementation is a bit crude in this regard. |
Ahh. So that's where that format came from |
f9a3cf6
to
38708f9
Compare
f40b9d9
to
47951fe
Compare
47951fe
to
7850cea
Compare
Improvements
Bug Fixes
I refactored some of the internals for the schema loading methods to improve the consistency and reliability of this system, in doing this I switch the naming around to object first ordered language (
load_schema
=>schema_load
). If your not used to this, it does read a little weird the first few times in english; but once it clicks it makes managing codebases, especially large files, a lot easier because it forces you to automatically group the concepts that can grow into concerns or other refactors. This can be reverted if so desiredBased on the documentation of OpenAI and OpenRouter the providedresponse_format
examples don't appear to be compatible with the spec.