VLM data extraction with Protobufs
In light of OpenAI releasing structured output in the model API, let's move output structuring another level up the stack to the microservice/RPC level.
5 entries
In Python, the most straightforward path to implementing a gRPC server for a Protobuf service is to use protoc to generate code that can be imported into a server, which then implements the service logic.
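For concreteness, a minimal sketch of that workflow, with hypothetical names: assume a file extractor.proto defining an Extractor service with one Extract RPC; protoc's Python and gRPC plugins would emit extractor_pb2 and extractor_pb2_grpc for it.

```python
# Generate the Python modules from the proto definition (run once):
#   python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. extractor.proto
#
# extractor.proto (hypothetical) roughly contains:
#   service Extractor { rpc Extract (ExtractRequest) returns (ExtractResponse); }

from concurrent import futures

import grpc

import extractor_pb2        # message classes generated by protoc
import extractor_pb2_grpc   # servicer base class and stubs from the gRPC plugin

class ExtractorServicer(extractor_pb2_grpc.ExtractorServicer):
    def Extract(self, request, context):
        # The hand-written service logic goes here; return an instance
        # of the generated response message.
        return extractor_pb2.ExtractResponse()

def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
    extractor_pb2_grpc.add_ExtractorServicer_to_server(ExtractorServicer(), server)
    server.add_insecure_port("[::]:50051")
    server.start()
    server.wait_for_termination()

if __name__ == "__main__":
    serve()
```

The generated add_ExtractorServicer_to_server helper is what ties the hand-written servicer class to the running server; everything else is boilerplate protoc writes for you.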
I wrote up and screen-recorded myself building a Python app that calls a model to extract structured data from an image, making heavy use of codegen with Cursor. The same protobuf is used as instructions in the prompt and to unpack the result returned by the model into an instance of the class generated...
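The app from the recording isn't reproduced here, but a sketch of the double-duty-protobuf idea follows. Everything named below is an assumption for illustration: a hypothetical invoice.proto compiled to invoice_pb2, an OpenAI-style vision chat call, and a placeholder image URL.

```python
# Sketch: the raw .proto text goes into the prompt, and the model's JSON
# reply is parsed into the class protoc generated from that same file.
# invoice.proto, invoice_pb2.Invoice, and the image URL are hypothetical.

from pathlib import Path

from google.protobuf import json_format
from openai import OpenAI

import invoice_pb2  # generated by: protoc --python_out=. invoice.proto

PROTO_TEXT = Path("invoice.proto").read_text()

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},
    messages=[
        {
            "role": "system",
            "content": "Extract the fields defined by this protobuf message "
                       "from the image. Reply with JSON matching the schema:\n\n"
                       + PROTO_TEXT,
        },
        {
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/invoice.png"}},
            ],
        },
    ],
)

# Parse validates the reply against the schema while building the message;
# unknown fields or type mismatches raise json_format.ParseError.
invoice = json_format.Parse(resp.choices[0].message.content, invoice_pb2.Invoice())
print(invoice)
```

Keeping the schema in the .proto means the prompt and the parser cannot drift apart: both read from the same source of truth.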
This point resonates with me. The more time I spend prompting models, the clearer it becomes that the clarity of the instructions is what matters most. Writing clear, unambiguous instructions is not easy. Decrease scope and you have a chance of doing it well.
I've written several posts on using JSON and Pydantic schemas to structure LLM responses. Recently, I've done some work using a similar approach with protobuf message schemas as the data contract. Here's an example to show what that looks like.
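As a hedged sketch of the parallel (the full example lives in the linked post), here is the same JSON reply validated through a Pydantic model and through a protobuf message; the Person schema and the person_pb2 module are hypothetical.

```python
# The same JSON validated two ways. Person (the Pydantic model) and
# person_pb2 (assumed compiled from a person.proto) are hypothetical.

from google.protobuf import json_format
from pydantic import BaseModel

import person_pb2  # hypothetical module generated by protoc

class Person(BaseModel):
    name: str
    age: int

raw = '{"name": "Ada", "age": 36}'

# Pydantic route: the schema lives in Python; validation raises on mismatch.
p1 = Person.model_validate_json(raw)

# Protobuf route: the schema lives in person.proto, shared with any gRPC
# clients and servers; Parse raises json_format.ParseError on mismatch.
p2 = json_format.Parse(raw, person_pb2.Person())

print(p1.name, p2.name)
```

The practical difference is where the data contract lives: the Pydantic model is Python-only, while the .proto file is shared with every client and server that speaks the service.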