Posts

Claude 3.5 Sonnet Connections Evals

I've continued experimenting with techniques to prompt a language model to solve Connections. At a high level, I set out to design an approach to hold the model to a similar standard as a human player, within the restrictions of the game. These standards and guardrails include the following: The...

Logs

2024-09-19

I finally found some time to run a more comprehensive set of evals on Connections, making one guess at a time and using Python code to validate the guesses and give feedback. I ran about 100 puzzles with gpt-4o-mini, gpt-4o, and claude-3-5-sonnet, but it became clear that Sonnet was going to perform the best,...

Logs

2024-07-24

I ran the code from my Fine-tuning "Connections" post using gpt-4o-mini. I was hoping the results might be a bit better, which could motivate an effort to fine-tune the model. I'm not sure where my original version of this code went, so I reconstructed a repo for it. Once I was done, I ran 100...

Logs

2024-03-22

Did a bit more work on an LLM evaluator for Connections. I'm mostly trying it with gpt-4 and claude-3-opus. On today's puzzle, the best either did was 2/4 correct. I'm unsure how much more improvement is possible with prompting or even fine-tuning, but it's an interesting challenge.

Logs

2024-01-16

I spent another hour playing around with different techniques to try to teach and convince gpt-4 to play Connections properly, after a bit of exploration and feedback. I incorporated two new techniques: asking for one category at a time, then giving the model feedback (correct, incorrect, 3/4); using...
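
As a rough illustration of the one-guess-at-a-time feedback idea, here is a minimal sketch of how a guess could be scored as correct, incorrect, or 3/4. The function name `check_guess` and the sample puzzle are made up for this example and are not the code from these posts.

```python
from typing import Iterable


def check_guess(guess: Iterable[str], categories: dict[str, set[str]]) -> str:
    """Score one four-word guess against the puzzle's categories.

    Returns "correct", "3/4", "incorrect", or "invalid" (not four distinct words).
    """
    # Normalize so that casing and stray whitespace in model output don't matter.
    words = {w.strip().upper() for w in guess}
    if len(words) != 4:
        return "invalid"

    # The largest overlap with any single category decides the feedback.
    best_overlap = max(len(words & members) for members in categories.values())
    if best_overlap == 4:
        return "correct"
    if best_overlap == 3:
        return "3/4"
    return "incorrect"


# Hypothetical puzzle, for illustration only.
PUZZLE = {
    "FISH": {"BASS", "TROUT", "SALMON", "PIKE"},
    "INSTRUMENTS": {"DRUM", "HARP", "FLUTE", "CELLO"},
    "COLORS": {"RED", "GREEN", "BLUE", "PINK"},
    "TREES": {"OAK", "ELM", "ASH", "FIR"},
}

if __name__ == "__main__":
    print(check_guess(["bass", "trout", "salmon", "pike"], PUZZLE))
    print(check_guess(["bass", "trout", "salmon", "drum"], PUZZLE))
    print(check_guess(["bass", "trout", "drum", "harp"], PUZZLE))
```

The returned string can be passed back to the model as the feedback message before asking for its next guess.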