Gemini Hidden Reasoning
The performance costs of thinking and provider defaults
While building Tomo and a handful of other LLM prototypes, I've experimented with several popular language models. It's generally easier to prototype against the OpenAI Chat Completions API, since most providers support that early API spec (more or less). This approach makes it pretty simple to switch between...
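To make that concrete, here's a minimal sketch of what that switching looks like with the OpenAI Python SDK: the only things that change between providers are the base URL, the API key, and the model name. The endpoint paths and model names below are my own placeholders, not a reference; check each provider's docs for the current OpenAI-compatible details.

```python
# Minimal sketch: swapping providers behind the same Chat Completions spec.
# Endpoint URLs and model names are illustrative placeholders.
import os

from openai import OpenAI

PROVIDERS = {
    "openai": {
        "base_url": "https://api.openai.com/v1",
        "api_key": os.environ.get("OPENAI_API_KEY", ""),
        "model": "gpt-4o-mini",  # example model name
    },
    "gemini": {
        # Assumed OpenAI-compatible endpoint; verify against Google's docs.
        "base_url": "https://generativelanguage.googleapis.com/v1beta/openai/",
        "api_key": os.environ.get("GEMINI_API_KEY", ""),
        "model": "gemini-2.5-flash",  # example model name
    },
}


def complete(provider: str, prompt: str) -> str:
    """Send a single-turn chat completion to the chosen provider."""
    cfg = PROVIDERS[provider]
    client = OpenAI(base_url=cfg["base_url"], api_key=cfg["api_key"])
    response = client.chat.completions.create(
        model=cfg["model"],
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    print(complete("gemini", "Say hello in one sentence."))
```

Because every provider sits behind the same request shape, swapping models is mostly a config change rather than a code change, which is exactly what makes this spec convenient for prototyping.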