It’s necessary to pay attention to the shape of a language model’s response when incorporating it as a component in a software application.
You can’t programmatically tap into the power of a language model if you can’t reliably parse its response.
In the past, I have mostly used a combination of prose and examples to define the shape of the language model response in my prompts.
Something like:
Respond using JSON with the following shape:
{
...
}
I was curious if I could use something more familiar and structured to archive a similar purpose.
pydantic is a popular library for data validation in Python and is an integral part of a number of leading libraries including fastapi.
We can use pydantic in a toy example to show what it could look like to use it define the desired output structure from the language model.
In the following script, we’ll extract data adhering to the passed pydantic schema for the class Address from the following input text.
We’ll do this by passing the code literal of the pydantic model into the language model prompt, then instruct the model to respond with schema-compliant JSON, that we can unpack into an instance of the schema.
If the response from the language model is not schema-compliant, the code will throw a pydantic.error_wrappers.ValidationError.
Last weekend, I visited my friend who lives in a charming little house on Oak Avenue. The address was 3578 Oak Avenue, and it was easy to find with the help of my GPS. While typing in the address, I discovered there was more than one 3578 Oak Ave, but once I entered right zip code, 90011, I found it. The house was surrounded by trees and had a beautiful garden in the front. Inside, my friend had decorated the place with vintage furniture and colorful paintings. We spent the day catching up and enjoying a cup of coffee in the cozy living room. It was a lovely visit and my first time in Los Angeles, and I can’t wait to go back and see my friend’s new garden in full bloom.
The full prompt after formatting looks as follows:
Last weekend, I visited my friend who lives in a charming little house on Oak Avenue. The address was 3578 Oak Avenue, and it was easy to find with the help of my GPS. While typing in the address, I discovered there was more than one 3578 Oak Ave, but once I entered right zip code, 90011, I found it. The house was surrounded by trees and had a beautiful garden in the front. Inside, my friend had decorated the place with vintage furniture and colorful paintings. We spent the day catching up and enjoying a cup of coffee in the cozy living room. It was a lovely visit and my first time in Los Angeles, and I can’t wait to go back and see my friend’s new garden in full bloom.
From the above input, extract data as a JSON object of type Address that adheres to the following schema.
Infer and fill in missing data values if possible
Schema:
classAddress(BaseModel):
street: str# street name and number
city: str
state: constr(min_length=2, max_length=2)
zip_code: str
JSON output:
Running the script yields the following:
Terminal window
❯pythonscript.py
street='3578 Oak Avenue' city='Los Angeles' state='CA' zip_code='90011'
First thing, it works and pretty well!
Second, there are a few subtleties that help make this approach work as well as it does.
From the prompt, “Infer and fill in missing data values if possible”: The state is not specified in the input text, but the language model infers it based on the instructions
state: constr(min_length=2, max_length=2): It would be hard to fault the model if it inferred the state as “California”, but we use an additional schema constraint to ensure the two character abbreviation is used
street: str # street name and number: After testing the script a few times, I ran into an issue where the resulting street was parsed as “Oak Avenue”. Adding this comment helps ensure that the number and street name are extracted together.
This approach is already relatively general.
One can easily swap the BaseModel subclass passed into extract_data for a different data structure.
There’s a bit more work to do here to extract data adhering to the schema of a model with field values that are other models.
For example:
classAddress(BaseModel):
street: str
city: str
state: str
zip_code: str
classUser(BaseModel):
first_name: str
last_name: str
address: Address
In order to parse out data for a User for an input text, you’d want to include the schemas for both the Address and the User.
I’ve written about experimenting with JSON Schema to construct schema-adherent responses from a language model. You can also obtain and use JSON schema from pydantic classes to use for parsing and schema-adherent response construction:
I believe that language models are most useful when available at your fingertips in the context of what you're doing.
Github Copilot is a well known...
I came upon https://gpa.43z.one today.
It's a GPT-flavored capture the flag.
The idea is, given a prompt containing a secret, convince the LM to leak...