Posts

Claude 3.5 Sonnet Connections Evals

I've continued experimenting with techniques to prompt a language model to solve Connections. At a high level, I set out to design an approach to hold the model to a similar standard as a human player, within the restrictions of the game. These standards and guardrails include the following: The...

Logs

2024-09-19

I finally found some time to run more comprehensive evals of Connections, submitting one guess at a time and using Python code to validate each guess and give feedback. I ran about 100 puzzles with gpt-4o-mini, gpt-4o, and claude-3-5-sonnet, but it became clear that Sonnet was going to perform the best,...
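The excerpt mentions validating one guess at a time in Python and feeding the result back to the model, but the actual code isn't part of this excerpt. What follows is a minimal sketch of that kind of validator, not the code used for these evals. `ConnectionsBoard` and its methods are hypothetical names. The "one away" message, the "already guessed" response, and the four-mistake limit mirror the public game. Returning malformed guesses without charging a mistake is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class ConnectionsBoard:
    """Hypothetical game state: four hidden groups of four words each."""

    groups: list[frozenset[str]]
    solved: list[frozenset[str]] = field(default_factory=list)
    guessed: set[frozenset[str]] = field(default_factory=set)
    mistakes: int = 0
    max_mistakes: int = 4  # the public game allows four mistakes

    def remaining_words(self) -> set[str]:
        # Words belonging to groups the player has not found yet.
        return set().union(*(g for g in self.groups if g not in self.solved))

    def is_over(self) -> bool:
        all_found = len(self.solved) == len(self.groups)
        return all_found or self.mistakes >= self.max_mistakes

    def guess(self, words: list[str]) -> str:
        """Validate a single four-word guess and return feedback text for the model."""
        if self.is_over():
            return "game over"
        picked = frozenset(w.strip().upper() for w in words)
        # Assumption: malformed guesses are rejected without costing a mistake.
        if len(picked) != 4:
            return "invalid: pick exactly four distinct words"
        unknown = picked - self.remaining_words()
        if unknown:
            return "invalid: not on the board: " + ", ".join(sorted(unknown))
        # Repeating an earlier guess is not penalized.
        if picked in self.guessed:
            return "already guessed"
        self.guessed.add(picked)
        if picked in self.groups:
            self.solved.append(picked)
            return "correct"
        # Wrong guess: count the mistake, then check for a 3-of-4 overlap.
        self.mistakes += 1
        unsolved = [g for g in self.groups if g not in self.solved]
        if max(len(picked & g) for g in unsolved) == 3:
            return f"one away ({self.mistakes}/{self.max_mistakes} mistakes)"
        return f"incorrect ({self.mistakes}/{self.max_mistakes} mistakes)"
```

In a loop like the one described, each feedback string would be appended to the conversation before asking the model for its next single guess, until `is_over()` returns `True`.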