I’ve now run into the following issue multiple times using Bedrock. It’s my own mistake but I end up confused each time.
An error occurred (ValidationException) when calling the InvokeModelWithResponseStream operation:The maximum tokens you requested exceeds the model limit of 65536I had this prompt when attempting to run inference on Bedrock with us.anthropic.claude-sonnet-4-20250514-v1:0.
The problem was I was trying to set the output tokens larger than the model limit. Claude Sonnet 4 has versions that support 200k and 1M tokens context windows but this is for the input tokens to the model.
I hope this note helps if you run into the same issue.