Add support for Free-Form Function Calling and Context Free Grammar constraints over string output #2513

@matthewfranglen

Description

The release of GPT-5 added a new form of structured output: the model's string output can be constrained by a context-free grammar (CFG), written in either Lark or regex syntax. It is described in the OpenAI cookbook (linked under References).

With this you can write agents that reliably produce structured output strings. The cookbook example linked under References is for SQL; I have also found this useful for the Lucene query language. Composing valid queries of arbitrary complexity using pydantic models is tricky programmatically and likely difficult for a given LLM to work with.

I would like to propose adding an output_grammar parameter to the agent constructor and/or run method. It would take a suitable grammar, validate it, and then pass it to the completion endpoint. Providing a grammar when the output_type is not a string would be an error.
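
To make the proposal concrete, here is a minimal sketch of what the validation step might look like. The helper name `build_grammar_format` and its parameters are hypothetical and not part of any existing API; only the returned `format` payload mirrors the request shown in my experiment below. Note that the regex check uses Python's `re`, which is only an approximation, since the cookbook states that the regex syntax follows the Rust regex crate.

```python
import re
from typing import Any


def build_grammar_format(
    grammar: str, syntax: str, output_type: type = str
) -> dict[str, Any]:
    """Hypothetical helper: validate a grammar and build the tool `format` payload."""
    # The proposal: a grammar only makes sense when the output is a plain string.
    if output_type is not str:
        raise TypeError("output_grammar requires output_type to be str")

    if syntax == "lark":
        # Imported lazily so regex-only callers do not need lark installed.
        from lark import Lark

        Lark(grammar, parser="lalr")  # raises if the grammar is malformed
    elif syntax == "regex":
        # Approximate check only: Python's `re` dialect differs from Rust's.
        re.compile(grammar)
    else:
        raise ValueError(f"unsupported grammar syntax: {syntax!r}")

    return {"type": "grammar", "syntax": syntax, "definition": grammar}


if __name__ == "__main__":
    print(build_grammar_format(r"\d{4}-\d{2}-\d{2}", "regex"))
```

Validating eagerly means a malformed grammar fails at agent construction or at the start of a run, rather than surfacing as an API error mid-request.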

References

https://cookbook.openai.com/examples/gpt-5/gpt-5_new_params_and_tools#3-contextfree-grammar-cfg

My Lucene experiment:

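from pathlib import Path

from lark import Lark
from openai import OpenAI

# Placeholder path to the Lark grammar file; the original snippet left this undefined
LUCENE_GRAMMAR_FILE = Path("lucene.lark")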

client = OpenAI()

prompt = (
    "Call the lucene_query to generate a query for the Solr database that will "
    "match the documents that reference coca-cola by name or alias"
)
grammar = LUCENE_GRAMMAR_FILE.read_text()

# check validity of grammar
Lark(grammar, parser="lalr")

# make request, this works with gpt-5 and gpt-5-mini
response = client.responses.create(
    model="gpt-5",
    input=prompt,
    text={"format": {"type": "text"}},
    tools=[
        {
            "type": "custom",
            "name": "lucene_query",
            "description": (
                "Executes read-only lucene queries that match the text of "
                "the document. YOU MUST REASON HEAVILY ABOUT THE QUERY AND "
                "MAKE SURE IT OBEYS THE GRAMMAR."
            ),
            "format": {
                "type": "grammar",
                "syntax": "lark",
                "definition": grammar,
            },
        },
    ],
    parallel_tool_calls=False,
)

print("--- Lucene Query ---")
print(response.output[1].input)
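
Indexing `response.output[1]` relies on the tool call being the second output item. A slightly more defensive sketch is to search the output items by type and tool name. The helper `first_custom_tool_input` is hypothetical, and the item type string `"custom_tool_call"` is my assumption about how the Responses API labels these items; the objects in the demo are stand-ins, not real API responses.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def first_custom_tool_input(output_items: Iterable[Any], tool_name: str) -> str:
    """Return the raw string input of the first custom tool call named `tool_name`."""
    for item in output_items:
        # Assumption: custom tool calls carry type "custom_tool_call" plus a name.
        if (
            getattr(item, "type", None) == "custom_tool_call"
            and getattr(item, "name", None) == tool_name
        ):
            return item.input
    raise LookupError(f"no custom tool call named {tool_name!r} in output")


if __name__ == "__main__":
    # Stand-in objects that imitate the shape of response.output.
    fake_output = [
        SimpleNamespace(type="reasoning"),
        SimpleNamespace(
            type="custom_tool_call", name="lucene_query", input='name:"coca-cola"'
        ),
    ]
    print(first_custom_tool_input(fake_output, "lucene_query"))
```

With a real response this would be called as `first_custom_tool_input(response.output, "lucene_query")`.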

The grammar is available here.
