I ran the code from my Fine-tuning “Connections” post using gpt-4o-mini.
I was hoping the results might be a bit better, which could motivate an effort to fine-tune the model.
I’m not sure where my original version of this code went, so I reconstructed a repo for it.
Once I was done, I ran 100 prompts through the model to get a sense of its baseline performance.
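Roughly, the eval loop looked like the sketch below. This is a minimal reconstruction, not the actual repo code: the `puzzles.jsonl` file, its `words`/`groups` fields, and the JSON response shape are all assumptions.

```python
import json
from openai import OpenAI

client = OpenAI()

SYSTEM = (
    "You are solving a Connections puzzle. Group the 16 words into 4 "
    'categories of 4. Respond with JSON: {"groups": [[...], [...], [...], [...]]}'
)

def solve(words: list[str]) -> list[list[str]]:
    """Ask gpt-4o-mini for a grouping of the 16 puzzle words."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": ", ".join(words)},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)["groups"]

def categories_correct(guess: list[list[str]], gold: list[list[str]]) -> int:
    """Count how many gold categories the model reproduced exactly."""
    gold_sets = [frozenset(g) for g in gold]
    guess_sets = [frozenset(g) for g in guess]
    return sum(1 for g in gold_sets if g in guess_sets)

# Assumed format: one puzzle per line, {"words": [...16...], "groups": [[...4...] x4]}
puzzles = [json.loads(line) for line in open("puzzles.jsonl")][:100]

fully_correct, total_categories = 0, 0
for p in puzzles:
    n = categories_correct(solve(p["words"]), p["groups"])
    total_categories += n
    fully_correct += n == 4  # a puzzle counts as correct only if all 4 groups match

print(f"Correct: {fully_correct / len(puzzles):.2%}")
print(f"Incorrect: {(len(puzzles) - fully_correct) / len(puzzles):.2%}")
print(f"Total Categories Correct: {total_categories / (4 * len(puzzles)):.2%}")
```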
Correct: 2.00%
Incorrect: 98.00%
Total Categories Correct: 19.25%

Not great, and not much different from gpt-3.5-turbo.
With these kinds of results, I wasn't particularly motivated to put in the effort to do more fine-tunes.
I read through the instructor, marvin and open-interpreter docs for the first time in a while.
It has been interesting to see these libraries grow and diverge.
I also read through how Jason has been structuring an evals repo for instructor.