Evals: unit testing for language models
Generative AI and language models are fun to play with, but you don't really have something you can confidently ship to users until you test what you've built.
After analyzing YouTube video transcripts with language models, I wanted to apply a similar (and perhaps simpler) approach to webpages like articles, primarily to determine the subject matter of lengthy pieces and to see whether this is useful at all.
You can download a YouTube video transcript with yt-dlp.
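As a rough sketch of what that can look like from Python, the snippet below builds a yt-dlp command that fetches subtitles without downloading the video itself. It assumes yt-dlp is installed and on your PATH; the helper names and the placeholder URL are hypothetical, and the exact flags you want may differ depending on whether the video has uploaded or only auto-generated captions.

```python
import subprocess


def transcript_command(video_url, lang="en"):
    """Build a yt-dlp command that fetches subtitles but not the video.

    Hypothetical helper for illustration; the flags are standard yt-dlp options.
    """
    return [
        "yt-dlp",
        "--skip-download",    # do not download the video/audio itself
        "--write-subs",       # write uploaded subtitles if they exist
        "--write-auto-subs",  # also accept auto-generated captions
        "--sub-langs", lang,  # which subtitle language(s) to fetch
        video_url,
    ]


def download_transcript(video_url, lang="en"):
    """Run yt-dlp; raises CalledProcessError if the command fails."""
    subprocess.run(transcript_command(video_url, lang), check=True)


# Placeholder URL (assumption): substitute a real video ID before running.
command = transcript_command("https://www.youtube.com/watch?v=VIDEO_ID")
print(" ".join(command))
```

Calling `download_transcript(...)` with a real URL should leave a subtitle file in the current directory, which can then be cleaned up and passed to a model.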
The following prompt seems to be quite effective at leaking any pre-prompting done to a language model.
Temporal gives you flexibility to define different task queues to route workflows and activities to specific workers. When a worker starts up, it is configured to consume from a specific task queue by name, along with the activities and workflows it is capable of running.
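To make the routing idea concrete, here is a toy in-memory model of it. This is not the Temporal SDK: `ToyWorker`, `ToyServer`, and the queue and activity names are all made up for illustration, and real Temporal workers poll their task queue rather than having work pushed to them.

```python
from collections import defaultdict


class ToyWorker:
    """Hypothetical stand-in for a worker: one task queue, a set of activities."""

    def __init__(self, name, task_queue, activities):
        self.name = name
        self.task_queue = task_queue
        self.activities = dict(activities)  # activity name -> callable


class ToyServer:
    """Hypothetical stand-in for the service that routes work by queue name."""

    def __init__(self):
        self._workers = defaultdict(list)  # task queue name -> workers

    def register(self, worker):
        self._workers[worker.task_queue].append(worker)

    def run_activity(self, task_queue, activity, *args):
        # Only workers listening on this queue are candidates, and only
        # those that registered the requested activity can run it.
        for worker in self._workers.get(task_queue, []):
            if activity in worker.activities:
                return worker.name, worker.activities[activity](*args)
        raise LookupError(f"no worker on {task_queue!r} can run {activity!r}")


server = ToyServer()
server.register(ToyWorker("cpu-worker", "default", {"greet": lambda n: f"hello {n}"}))
server.register(ToyWorker("gpu-worker", "gpu-tasks", {"embed": lambda text: len(text)}))

print(server.run_activity("default", "greet", "ada"))
print(server.run_activity("gpu-tasks", "embed", "some text"))
```

The point of the sketch is the lookup order: the queue name selects the pool of workers first, and the activity must be one that a worker in that pool declared at startup.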
I run a lot of different versions of various languages and tools across my system. Nix and direnv help make this possible to manage reasonably. Recently, starting a new Python project, I was running into this warning after installing dependencies with pip (yes, I am aware there are new/fresh/fast/cool...
On macOS, a Launch Agent is a background process, managed by launchd, that performs various tasks or services for the user. Having recently installed ollama, I've been playing around with various local models. One annoyance about having installed ollama using Nix via nix-darwin is that I need to...
I've been familiar with Python's -m flag for a while but had never quite internalized what it was really doing. While reading about this cool AI pair programming project called aider, the docs mentioned that the tool could be invoked via python -m aider.main "[i]f your pip install did not place the...
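A small sketch of what -m does, using a standard-library module instead of aider so it runs anywhere: the interpreter looks the module up on its import path and executes it as the main program, so no separate executable on PATH is needed. The `run_module` helper below is a hypothetical wrapper written for this example.

```python
import json
import subprocess
import sys


def run_module(module, args=(), stdin_text=""):
    """Run `python -m <module>` with the current interpreter and return stdout.

    Hypothetical helper for illustration only.
    """
    result = subprocess.run(
        [sys.executable, "-m", module, *args],
        input=stdin_text,
        capture_output=True,
        text=True,
        check=True,  # raise if the module exits with a non-zero status
    )
    return result.stdout


# json.tool ships with Python and pretty-prints JSON read from stdin.
pretty = run_module("json.tool", stdin_text='{"a": 1}')
print(pretty)
```

Because the command starts from `sys.executable`, the module is resolved against whichever environment that interpreter belongs to, which is why the -m form is a handy fallback when a tool's script did not end up on PATH.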
I was pulling the openai/evals repo and trying to run some of the examples. The repo uses git-lfs, so I installed that on my system using home-manager.
I spent yesterday and today working through the excellent guide by Alex on using sqlite-vss to do vector similarity search in a SQLite database. I'm particularly interested in the benefits one can get from having these tools available locally for getting better insights into non-big datasets with a...