Datadog and the OpenAI API Spec
Why I like the Chat Completions API
I write often about the OpenAI Chat Completions API, which has become a de facto standard, at least for now, for calling LLM APIs. This API is useful because it lets you easily test different models and providers for inference with text and image inputs.
I like using this API as a starting point because, depending on your use case, you’re optimizing for different things.
These include:
- maximum performance or correctness against some labels
- latency
- cost
- time to first token
Eventually, you may settle on a particular model and optimize for that, but when you’re getting started, you might not even know what you’re optimizing for or the tradeoffs you’ll need to make given the performance of the models available to you.
Using multiple providers with the OpenAI client
Recently, I set up a Python app that used the OpenAI Python client.
Since many model providers offer OpenAI-compatible chat completion APIs in addition to their own, I was experimenting with running inference against other providers just by changing the `base_url` and `api_key` in the client.
Google has documentation on exactly how you can do this with Gemini.
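For example, a minimal sketch of pointing the OpenAI client at Gemini looks something like this. The base URL and model name are taken from Google’s documentation at the time of writing, so double-check them before relying on this:

```python
import os

from openai import OpenAI

# Same OpenAI client, different provider: only the api_key and base_url change.
# Endpoint and model name per Google's docs at the time of writing.
client = OpenAI(
    api_key=os.getenv("GEMINI_API_KEY"),
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

response = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[{"role": "user", "content": "Tell me a short joke about programming."}],
)
print(response.choices[0].message.content)
```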
Several providers do a good job of implementing the spec in a way that is compatible with OpenAI’s client. When it works, you don’t notice or need to think about it. This solid foundation is excellent as you’re figuring out what you’re optimizing for as you build your LLM use case.
However, when implemented poorly, it’s a very bad time.
The failing call
I was calling another provider that claimed to expose an OpenAI-compatible chat completion API. I won’t name names, but the error I got was
{ "error_code": "BAD_REQUEST", "message": "{\"external_model_provider\":\"amazon-bedrock\",\"external_model_error\":{\"message\":\"stream_options: Extra inputs are not permitted\"}}"}
or something like that.
This wasn’t a direct call to Bedrock, by the way; Bedrock doesn’t provide native support for an OpenAI-compatible API.[1] It was a call to a proxy layer in between, one that claimed to be OpenAI-compatible.
The error “stream_options: Extra inputs are not permitted” seems pretty straightforward. The problem was, I wasn’t passing it in my code.
```python
import os

from openai import OpenAI


def main():
    client = OpenAI(
        api_key=os.getenv("API_KEY"),
        base_url=os.getenv("BASE_URL"),
    )

    stream = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Tell me a short joke about programming."},
        ],
        stream=True,
    )

    for chunk in stream:
        if chunk.choices[0].delta.content is not None:
            print(chunk.choices[0].delta.content, end="", flush=True)
    print()


if __name__ == "__main__":
    main()
```
What was going on?
I spent a while digging through layers of my code to see how I was inadvertently passing `stream_options` in my client call.
I couldn’t find anything.
I wish I had thought sooner to check the raw HTTP request the client was actually making.
I did that by running `httplog` and changing the `base_url` to `http://localhost:8080/v1`.
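If you don’t have a request-logging tool handy, a rough stand-in is a few lines of Python’s standard library that dump whatever the client sends. This is just my illustration, not `httplog` itself, and it never forwards the request anywhere:

```python
# Tiny local server that prints the method, path, headers, and body of every
# POST it receives, then returns a 500 so the client fails fast.
from http.server import BaseHTTPRequestHandler, HTTPServer


class LoggingHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        print(f"{self.command} {self.path}")
        print(self.headers)
        print(body.decode("utf-8", errors="replace"))
        self.send_response(500)
        self.end_headers()


if __name__ == "__main__":
    HTTPServer(("localhost", 8080), LoggingHandler).serve_forever()
```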
Triggering the code path in my app produced logs like this:
```
13:51:30.544:
Method: POST Path: /v1/chat/completions Host: localhost:8080 Proto: HTTP/1.1
Headers:
Accept: application/json
Accept-Encoding: gzip, deflate
Authorization: Bearer test
Connection: keep-alive
Content-Length: 217
Content-Type: application/json
Traceparent: 00-68890a2200000000059e62812c877cbe-1d59ad448017a984-01
Tracestate: dd=p:1d59ad448017a984;s:1;t.dm:-0;t.tid:68890a2200000000
User-Agent: OpenAI/Python 1.97.1
X-Datadog-Parent-Id: 2114912009745574276
X-Datadog-Sampling-Priority: 1
X-Datadog-Tags: _dd.p.dm=-0,_dd.p.tid=68890a2200000000
X-Datadog-Trace-Id: 404869323447303358
X-Stainless-Arch: arm64
X-Stainless-Async: false
X-Stainless-Lang: python
X-Stainless-Os: MacOS
X-Stainless-Package-Version: 1.97.1
X-Stainless-Read-Timeout: 600
X-Stainless-Retry-Count: 0
X-Stainless-Runtime: CPython
X-Stainless-Runtime-Version: 3.11.11
Body:
{"messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"Tell me a short joke about programming."}],"model":"gpt-4o-mini","stream":true,"stream_options":{"include_usage":true}}
```
So there it was. The code was clearly somehow setting `stream_options` in the request, even though I couldn’t figure out how.
I created a completely independent reproduction of the same client call outside my codebase, using code identical to what’s shown above, and somehow it worked. The logs looked like this:
```
13:55:47.315:
Method: POST Path: /v1/chat/completions Host: localhost:8080 Proto: HTTP/1.1
Headers:
Accept: application/json
Accept-Encoding: gzip, deflate
Authorization: Bearer test
Connection: keep-alive
Content-Length: 177
Content-Type: application/json
User-Agent: OpenAI/Python 1.97.1
X-Stainless-Arch: arm64
X-Stainless-Async: false
X-Stainless-Lang: python
X-Stainless-Os: MacOS
X-Stainless-Package-Version: 1.97.1
X-Stainless-Read-Timeout: 600
X-Stainless-Retry-Count: 0
X-Stainless-Runtime: CPython
X-Stainless-Runtime-Version: 3.11.11
Body:
{"messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"Tell me a short joke about programming."}],"model":"gpt-4o-mini","stream":true}
```
The difference between these two logs was the first useful clue I had come across:
1c1< 13:55:47.315:---> 13:51:30.544:8c8< Content-Length: 177---> Content-Length: 2179a10,11> Traceparent: 00-68890a2200000000059e62812c877cbe-1d59ad448017a984-01> Tracestate: dd=p:1d59ad448017a984;s:1;t.dm:-0;t.tid:68890a220000000010a13,16> X-Datadog-Parent-Id: 2114912009745574276> X-Datadog-Sampling-Priority: 1> X-Datadog-Tags: _dd.p.dm=-0,_dd.p.tid=68890a2200000000> X-Datadog-Trace-Id: 40486932344730335821c27< {"messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"Tell me a short joke about programming."}],"model":"gpt-4o-mini","stream":true}---> {"messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"Tell me a short joke about programming."}],"model":"gpt-4o-mini","stream":true,"stream_options":{"include_usage":true}}
It seemed that Datadog was somehow adding `stream_options` to the request even though I wasn’t setting it in my code.
Datadog’s magic tracing
To add Datadog tracing to a Python app, you install `ddtrace` and run your app with `ddtrace-run`.
It pretty much just works.
However, with my faulty OpenAI-compatible API, the magic that Datadog uses to slightly modify requests to enable their traceability was breaking my code.
I eventually traced the behavior to the `dd-trace-py` library, which sets `stream_options["include_usage"] = True` on streamed chat completion requests.
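Conceptually, the integration wraps the client’s `create` call and injects the option when you stream. Here is a rough illustration of the idea, not `dd-trace-py`’s actual implementation:

```python
# Rough illustration only (not dd-trace-py's real code): wrap a create()
# function so streamed requests get stream_options injected, which makes the
# final streamed chunk report token usage.
import functools


def wrap_create(original_create):
    @functools.wraps(original_create)
    def wrapper(*args, **kwargs):
        if kwargs.get("stream") and "stream_options" not in kwargs:
            kwargs["stream_options"] = {"include_usage": True}
        return original_create(*args, **kwargs)

    return wrapper


# e.g. client.chat.completions.create = wrap_create(client.chat.completions.create)
```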
Given how this is implemented and the problem I was running into, setting `stream_options["include_usage"] = False` in my code led to the same error, since the request still contained a `stream_options` field that the underlying provider did not support.
This behavior was introduced in v2.20.0 of `dd-trace-py` with the release notes:
- LLM Observability: `openai`: Introduces automatic extraction of token usage from streamed chat completions. Unless `stream_options: {"include_usage": False}` is explicitly set on your streamed chat completion request, the OpenAI integration will add `stream_options: {"include_usage": True}` to your request and automatically extract the token usage chunk from the streamed response.
It’s a relatively innocuous change that lets Datadog better trace LLM calls through a client documented to support the option, but it backfires when the OpenAI-compatible API you’re calling doesn’t implement the spec properly.
The fix
In this case, I still needed to call the problematic provider, so I used `httpx` to build a custom SSE client to stream the response back to the caller of my app, smoothing over the differences in what the problematic API supported.
This approach allowed me to keep Datadog as well without needing to make more invasive changes to my code.
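Here is a minimal sketch of that kind of workaround, assuming a provider that streams Server-Sent Events in the OpenAI chat completions format. The `API_KEY` and `BASE_URL` environment variables and the `stream_chat` helper are placeholders for illustration, not the exact code from my app:

```python
# Stream chat completions over SSE with httpx, never sending stream_options,
# so a provider that rejects that field has nothing to complain about.
import json
import os

import httpx


def stream_chat(messages, model="gpt-4o-mini"):
    payload = {"model": model, "messages": messages, "stream": True}
    headers = {"Authorization": f"Bearer {os.getenv('API_KEY')}"}
    with httpx.Client(base_url=os.getenv("BASE_URL"), timeout=60) as client:
        with client.stream(
            "POST", "/chat/completions", json=payload, headers=headers
        ) as response:
            response.raise_for_status()
            for line in response.iter_lines():
                if not line.startswith("data: "):
                    continue
                data = line[len("data: "):]
                if data == "[DONE]":
                    break
                chunk = json.loads(data)
                choices = chunk.get("choices") or []
                if choices:
                    delta = choices[0].get("delta", {}).get("content")
                    if delta:
                        yield delta


if __name__ == "__main__":
    for token in stream_chat(
        [{"role": "user", "content": "Tell me a short joke about programming."}]
    ):
        print(token, end="", flush=True)
    print()
```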
While it’s tough to blame Datadog for all this trouble, I was a little salty about being burned by this magical request modification. But it’s not their fault.
If you implement an OpenAI-compatible API, please at least test it with the OpenAI client. Otherwise, you won’t get the adoption or the benefits of users calling you with existing tools, and you’ll burn users who rely on some amount of predictability while working with systems (LLMs) that are already quite unpredictable.
Footnotes
1. Unless you want to deploy a Lambda function to proxy the requests ↩