2025-08-19
92 entries
It's been a while since I've attempted to use LLMs to solve Connections puzzles.
It started because I was using the OpenAI completion API to try several different models while building Tomo.
If you've read any of my writing in the past year, you're probably aware I've heavily adopted agents to build much of the software I write now. What I've done less of is write about the strategies I've used to do this.
Who is finding LLMs useful and who is not? And why is this the case?
Learned more about how LLMs are trained: the model first goes through a pre-training phase, then is fine-tuned to contribute to a token stream with a human user, using special tokens to demarcate whether a message was written by the user or the assistant.
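For intuition, here's roughly what that fine-tuned token stream looks like (a sketch using ChatML-style markers; the exact special tokens vary by model family):

```python
# Illustrative only: real chat templates differ per model. The point is
# that special tokens mark where each participant's message begins and ends.
def format_chat(messages: list[dict]) -> str:
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    parts.append("<|im_start|>assistant\n")  # the model continues from here
    return "\n".join(parts)

print(format_chat([{"role": "user", "content": "What's the capital of France?"}]))
```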
Goose is a CLI language model-based agent. Goose exposes a chat interface and uses tool calling (mostly to invoke shell commands) to accomplish the objective prompted by the user. These tasks can include everything from writing code to running tests to converting a folder full of mov files to...
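The core loop behind an agent like this is simple in principle. A minimal sketch (hypothetical names, not Goose's actual code):

```python
import subprocess

def run_shell(command: str) -> str:
    """The single tool this toy agent exposes: run a shell command."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout + result.stderr

def agent_loop(objective: str, call_model) -> str:
    """call_model stands in for a chat API that returns either a final
    answer or a tool invocation (shape assumed for illustration)."""
    messages = [{"role": "user", "content": objective}]
    while True:
        reply = call_model(messages)
        if reply.get("tool") == "run_shell":
            output = run_shell(reply["arguments"]["command"])
            messages.append({"role": "tool", "content": output})
        else:
            return reply["content"]  # no tool call means the objective is done
```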
Today, I set out to add an llms.txt to this site. I've made a few similar additions in the past with raw post markdown files and a search index. Every time I try to change something with outputFormats in Hugo, I forget one of the steps, so by writing it up, I'll finally have it for next time.
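The wiring is short once you remember it. A sketch of the relevant config (the LLMS name is arbitrary; a matching template such as layouts/index.llms.txt renders the actual content):

```toml
# hugo.toml -- sketch; adjust names to taste
[outputFormats.LLMS]
  baseName = "llms"          # emits /llms.txt at the site root
  mediaType = "text/plain"
  isPlainText = true

[outputs]
  home = ["HTML", "RSS", "LLMS"]
```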
Today, Anthropic entered the LLM code tools party with Claude Code.
I had an interesting realization today while doing a demo building a web app with Cursor. I was debugging an issue with an MCP server, trying to connect it to Cursor's MCP integration. The code I was using was buggy, and I'd never tried this before (attempting it live was probably a fool's errand...
"I don't know anything about rice disease but apparently these are various rice diseases and this is what they look like." - Jeremy Howard, Fast.ai Course Lesson 6. I have no idea if Jeremy had this in mind when he said this (alluding to the fact he doesn't know about the subject area, but when...
I'm working on a conversation branching tool called "Delta" (for now). The first thing that led me to this idea came from chatting with Llama 3.2 and experimenting with different system prompts. I was actually trying to build myself a local version of an app I've been fascinated by called Dot.
I explored how embeddings cluster by visualizing LLM-generated words across different categories. The visualizations helped build intuition about how these embeddings relate to each other in vector space. Most of the code was generated using Sonnet.
Language models are more than chatbots - they're tools for thought. The real value lies in using them as intellectual sounding boards to brainstorm, refine and challenge our ideas.
Having completed lesson 5 of the FastAI course, I prompted Claude to give me some good datasets upon which to train a random forest model. This housing dataset from Kaggle seemed like a nice option, so I decided to give it a try. I am also going to try something that Jeremy Howard recommended for...
In this notebook/post, we're going to be using the markdown content from my blog to try out a language model. From this, we'll attempt to prompt the model to generate a post for a topic I might write about.
I recently found Joe's article, We All Know AI Can’t Code, Right?.
I wanted to get more hands-on with the language model trained in chapter 12 of the FastAI course, so I got some Google Colab credits and actually ran the training on an A100. It cost about $2.50 and took about an hour and 40 minutes, but generally worked quite well. There was a minor issue with auto-saving the...
I had the idea to try and use a language model as a random number generator. I didn't expect it to actually work as a uniform random number generator but was curious to see what the distribution of numbers would look like.
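The experiment boils down to a loop like this sketch (the model name is a placeholder; in practice the distribution tends to be far from uniform):

```python
from collections import Counter

from openai import OpenAI  # assumes OPENAI_API_KEY is set in the environment

client = OpenAI()
counts: Counter = Counter()
for _ in range(100):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{
            "role": "user",
            "content": "Pick a random number between 1 and 10. Reply with the number only.",
        }],
    )
    counts[response.choices[0].message.content.strip()] += 1

print(counts.most_common())  # a uniform RNG would show ~10 per number
```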
I've been prompting models to output JSON for about as long as I've been using models. Since text-davinci-003, getting valid JSON out of OpenAI's models didn't seem like that big of a challenge, but maybe I wasn't seeing the long tails of misbehavior because I hadn't massively scaled up a use...
I've been wanting to create a chat component for this site for a while, because I really don't like quoting conversations and manually formatting them each time. When using a model playground, usually there is a code snippet option that generates Python code you can copy out into a script. Using...
Research and experimentation with models presents different problems than I am used to dealing with on a daily basis. The structure of what you want to try out changes often, so I understand why some folks prefer to use notebooks. Personally, notebooks haven't caught on for me, so I'm still just...
I'm trying something a bit new, writing some of my thoughts about how the future might look based on patterns I've been observing lately.
Model-based aggregators
Sabrina wrote an interesting write-up on solving a math problem with gpt-4o. It turned out the text-only, chain-of-thought approach was the best performing, which is not what I would have guessed.
Generative AI and language models are fun to play with but you don't really have something you can confidently ship to users until you test what you've built.
I read Jason, Ivan and Charles' blog post on Modal about fine-tuning an embedding model. It's a bit in the weeds of ML for me but I learn a bit more every time I read something new.
The following prompt seems to be quite effective at leaking any pre-prompting done to a language model
For me, invoking a language model through a playground (UI) interface is the most common approach. Occasionally, it can be helpful to use a CLI to directly pipe output into a model. For example
I enjoyed this article by Ken about production LLM use cases with OpenAI models. When it comes to prompts, less is more
Gemini Pro 1.5 up and running. I've said this before but I will say it again -- the fact that I don't need to deal with GCP to use Google models gives me joy.
I've been digging more into evals. I wrote a simple Claude completion function in openai/evals to better understand how the different pieces fit together. Quick and dirty code:
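(A sketch of the idea rather than the original snippet; it assumes evals' CompletionFn/CompletionResult interfaces and the anthropic SDK's messages API.)

```python
from anthropic import Anthropic
from evals.api import CompletionFn, CompletionResult


class ClaudeCompletionResult(CompletionResult):
    def __init__(self, response: str):
        self.response = response

    def get_completions(self) -> list[str]:
        return [self.response]


class ClaudeCompletionFn(CompletionFn):
    """Adapts Claude to the completion-function interface evals expects."""

    def __init__(self, model: str = "claude-3-opus-20240229", **kwargs):
        self.client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
        self.model = model

    def __call__(self, prompt, **kwargs) -> ClaudeCompletionResult:
        message = self.client.messages.create(
            model=self.model,
            max_tokens=1024,
            messages=[{"role": "user", "content": str(prompt)}],
        )
        return ClaudeCompletionResult(message.content[0].text)
```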
I can't believe I am saying this but if you play around with language models locally, a 1 TB drive might not be big enough for very long.
One of the greatest misconceptions concerning LLMs is the idea that they are easy to use. They really aren’t: getting great results out of them requires a great deal of experience and hard-fought intuition, combined with deep domain knowledge of the problem you are applying them to.
Did a bit more work on an LLM evaluator for Connections. I'm mostly trying it with gpt-4 and claude-3-opus. On today's puzzle, the best either did was 2/4 correct. I'm unsure how much more improvement is possible with prompting or even fine-tuning, but it's an interesting challenge.
Set up a Temporal worker in Ruby and got familiar with its ergonomics.
I spent yesterday and today working through the excellent guide by Alex on using sqlite-vss to do vector similarity search in a SQLite database. I'm particularly interested in the benefits one can get from having these tools available locally for getting better insights into non-big datasets with a...
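From memory, the core usage looks something like this sketch (check the sqlite-vss README for exact details):

```python
import json
import sqlite3

import sqlite_vss  # pip install sqlite-vss

db = sqlite3.connect(":memory:")
db.enable_load_extension(True)
sqlite_vss.load(db)

# a virtual table of 384-dimensional embeddings
db.execute("create virtual table vss_posts using vss0(embedding(384))")
db.execute(
    "insert into vss_posts(rowid, embedding) values (?, ?)",
    (1, json.dumps([0.1] * 384)),
)

# k-nearest-neighbor search against a query embedding
rows = db.execute(
    "select rowid, distance from vss_posts where vss_search(embedding, ?) limit 5",
    (json.dumps([0.1] * 384),),
).fetchall()
print(rows)
```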
Bookmarking an idea from Swyx on Latent Space with Soumith Chintala: "create synthetic data off of your retrieved documents and then fine-tune on that".
I've been meaning to try out Simon's llm package for a while now. From reading the docs and following the development, it's a modular, meet-you-where-you-are CLI for running LLM inference locally or using almost any API out there. In the past, I might have installed this with brew, but we run nix...
A very timely (for me) article by Hamel about understanding what a language model prompt abstraction library is doing before blindly adopting it. This really aligned with a lot of my own thoughts on the matter, right down to its praise of Jason's instructor library baseline example.
OpenAI popularized a pattern of streaming results from a backend API in realtime with ChatGPT. This approach is useful because the time a language model takes to run inference is often longer than what you want for an API call to feel snappy and fast. By streaming the results as they're produced,...
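With the openai package, the consuming side of that pattern looks roughly like this (the model name is a placeholder):

```python
from openai import OpenAI  # assumes OPENAI_API_KEY is set in the environment

client = OpenAI()
stream = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Write a haiku about streaming."}],
    stream=True,  # yields chunks as tokens are generated
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```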
Hardly seemed worth a TIL post because it was too easy, but I learned gpt-4 is proficient at building working ffmpeg commands. I wrote the prompt "convert m4a to mp3 with ffmpeg".
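A typical working answer looks like `ffmpeg -i input.m4a output.mp3` (my reconstruction, not the model's verbatim output).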
Disclaimer: I am not a security expert or a security professional.
Edit (2024-07-21): Vercel has updated the ai package to use different abstractions than the examples below. Consider reading their docs first before using the example below, which is out of date.
I spent another hour playing around with different techniques to try and teach and convince gpt-4 to play Connections properly, after a bit of exploration and feedback. I incorporated two new techniques: asking for one category at a time, then giving the model feedback (correct, incorrect, 3/4), and using...
I started playing the NYTimes word game "Connections" recently, by the recommendation of a few friends. It has the type of freshness that Wordle lost for me a long time ago. After playing Connections for a few days, I wondered if an OpenAI language model could solve the game (the objective is to...
After some experimentation with GitHub Copilot Chat, my review is mixed. I like the ability to copy from the sidebar chat to the editor a lot. It makes the chat more useful, but the chat is pretty chatty and thus somewhat slow to finish responding. I've also found the inline generation...
I would love it if OpenAI added support for presetting a max_tokens URL parameter in the Playground. Something as simple as this:
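`https://platform.openai.com/playground?max_tokens=256` (a hypothetical URL; the parameter doesn't exist)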
I'm betting OpenAI will soon have a Cloud Storage product like Google Drive or iCloud for ChatGPT Plus users. Having your personal data available in the context of a language model is a massive value add. With a product like this, OpenAI could fully support use cases like "summarize my notes for the week"...
Playing with Rivet and OpenInterpreter
It's much easier to test Temporal Workflows in Python by invoking the contents of the individual Activities first, in the shell or via a separate script, then composing them into a Workflow. I need to see if there's a better way to surface exceptions and failures through Temporal directly to make...
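A sketch of what I mean with the temporalio SDK (the activity itself is hypothetical):

```python
import asyncio

from temporalio import activity


@activity.defn
async def transcode(path: str) -> str:
    """Hypothetical activity: pretend to transcode a file."""
    return path.replace(".mov", ".mp4")


# @activity.defn leaves the function directly callable, so its logic can be
# smoke-tested in a plain script before it's composed into a Workflow and
# executed under a worker.
if __name__ == "__main__":
    print(asyncio.run(transcode("clip.mov")))
```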
Language models and prompts are magic in a world of deterministic software. As prompts change and use cases evolve, it can be difficult to continue to have confidence in the output of a model. Building a library of example inputs for your model+prompt combination with annotated outputs is critical...
Simon wrote an excellent post on the current state of the world in LLMs.
First attempt
It will be interesting to see if or when we hit scaling limits to training more powerful models and what our new bottleneck becomes. For now, there appears to be a lot of greenfield.
While not an entirely unique perspective, I believe Apple is one of the best positioned companies to take advantage of the recent improvements in language models. I expect more generic chatbots will continue to become commodities whereas Apple will build a bespoke, multi-modal assistant with access...
promptfoo is a JavaScript library and CLI for testing and evaluating LLM output quality. It's straightforward to install and get up and running quickly. As a first experiment, I've used it to compare the output of three similar prompts that specify their output structure using different modes of...
I tried out Llama 2 today using ollama. At first pass, it seemed OK at writing Python code, but I struggled to get it to effectively generate or adhere to a specific schema. I'll have to try a few more things but my initial impressions are mixed (relative to OpenAI models).
It's hard to think because it's hard to think. - GitHub Copilot
Meta released Llama 2 yesterday and the hype has ensued. While it's exciting to see more powerful models become available, a model with weights is not the same as an API. It is still far less accessible.
Some unstructured thoughts on the types of tasks language models seem to be good (and bad) at completing:
Experimenting with using a language model to improve the input prompt, then using that output as the actual prompt for the model, and returning the result. It's a bit of a play on the "critique" approach. Some of the outputs were interesting but I need a better way to evaluate the results.
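The pipeline is just two model calls. A sketch (the model name is a placeholder):

```python
from openai import OpenAI  # assumes OPENAI_API_KEY is set in the environment

client = OpenAI()

def complete(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def improved_answer(prompt: str) -> str:
    # first, ask the model to rewrite the prompt to be clearer and more specific
    better_prompt = complete(
        "Rewrite this prompt to be clearer and more specific. "
        f"Reply with the rewritten prompt only.\n\n{prompt}"
    )
    # then answer the improved prompt instead of the original
    return complete(better_prompt)
```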
I've been following Jason's work experimenting with different abstractions for constructing prompts and structuring responses. I've long felt that building prompts with strings is not the type of developer experience that will win the day. On the other hand, I'm wary of the wrong abstraction...
I've been thinking about the concept of "prompt overfitting". In this context, there is a distinction between model overfitting and prompt overfitting. Say you want to use a large language model as a classifier. You may give it several example inputs and the expected outputs. I don't have hard data...
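To make that concrete, a few-shot classifier prompt looks like this sketch; the risk is that the examples narrow the model's behavior as much as they teach the task:

```python
# If every example is short and product-focused, the "prompt-overfit" model
# may mislabel long, nuanced, or off-domain inputs.
PROMPT = """Classify the sentiment of each review as positive or negative.

Review: Great product, works perfectly.
Sentiment: positive

Review: Broke after two days.
Sentiment: negative

Review: {review}
Sentiment:"""

print(PROMPT.format(review="Exceeded my expectations in every way."))
```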
This past week, OpenAI added function calling to their SDK. This addition is exciting because it now incorporates schema as a first-class citizen in making calls to OpenAI chat models. As the example code and naming suggest, you can define a list of functions and schema of the parameters required...
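A sketch with the SDK as it looked at the time (the weather function is the usual demo shape; newer SDK versions have since moved to a tools parameter):

```python
import json

import openai  # 2023-era SDK (openai<1.0)

functions = [{
    "name": "get_current_weather",
    "description": "Get the current weather for a city",
    "parameters": {  # JSON Schema for the function's arguments
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    functions=functions,
    function_call="auto",
)

message = response["choices"][0]["message"]
if message.get("function_call"):
    args = json.loads(message["function_call"]["arguments"])
    print(args)  # e.g. {"city": "Tokyo"}
```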
Richard WM Jones booted Linux 292,612 times to find a bug where it hangs on boot. I loved reading the recounting of his process to accomplish this, bisecting through the different versions of Linux and booting each thousands of times to determine whether the version contained the bug.
Today, I played around with Matt Rickard's ReLLM library, another take on constraining LLM output, in this case, with regex. I tried to use it to steer a language model to generate structured output (JSON) from unstructured input. This exercise is sort of like parsing or validating JSON with regex -- it's...
I tried out jsonformer to see how it would perform with some of the structured data use cases I've been exploring.
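Along the lines of jsonformer's documented usage (a sketch):

```python
from jsonformer import Jsonformer
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-3b")
tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-3b")

json_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "number"},
    },
}

jsonformer = Jsonformer(
    model,
    tokenizer,
    json_schema,
    "Generate a person's information based on the following schema:",
)
print(jsonformer())  # returns a dict constrained to match the schema
```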
I've been following Eric's posts about SudoLang since the first installment back in March. I've skimmed through the spec and the value proposition is quite compelling. SudoLang seeks to allow programmers of all levels to instruct LLMs and can also be transpiled into your programming language of...
I've written several posts on using JSON and Pydantic schemas to structure LLM responses. Recently, I've done some work using a similar approach with protobuf message schemas as the data contract. Here's an example to show what that looks like.
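(A sketch of the shape of it, not the original example; the model name is a placeholder. The .proto message doubles as the data contract in the prompt.)

```python
from openai import OpenAI  # assumes OPENAI_API_KEY is set in the environment

PROTO_SCHEMA = """
syntax = "proto3";

message Recipe {
  string title = 1;
  repeated string ingredients = 2;
  int32 minutes = 3;
}
"""

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {
            "role": "system",
            "content": f"Respond only with JSON matching this protobuf schema:\n{PROTO_SCHEMA}",
        },
        {"role": "user", "content": "A simple pancake recipe."},
    ],
)
print(response.choices[0].message.content)
```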
NVIDIA researchers introduce an LLM-based agent with "lifelong learning" capabilities that can navigate, discover, and accomplish goals in Minecraft without human intervention.
The Alexandria Index is building embeddings for large, public data sets, to make them more searchable and accessible.
I've seen a lot of "GPT detection" products floating around lately. Sebastian discusses some of the products and their approaches in this article. Some products claim to have developed an "algorithm with an accuracy rate of text detection higher than 98%". Unfortunately, this same algorithm...
Brex wrote a nice beginner guide on prompt engineering.
Plenty of data is ambiguous without additional description or schema to clarify its meaning. It's easy to come up with structured data that can't be interpreted without its accompanying schema. Here's an example:
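(A stand-in illustration.)

```python
# Without a schema this row is uninterpretable: is 45 an age, a temperature,
# or a quantity? Is "04-05" April 5th or May 4th?
row = ["04-05", 45, "OK"]

# The same data alongside its schema is unambiguous.
schema = {"fields": ["date (MM-DD)", "temperature_f", "status"]}
```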
LMQL is a SQL-like programming language for interacting with LMs. It takes a declarative approach to specifying the output constraints for a language model, with a SQL flavor.
marvin's @ai_model decorator implements something similar to what I had in mind for extracting structured data from an input to a language model.
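As I recall from marvin's docs, usage looks roughly like this:

```python
from marvin import ai_model
from pydantic import BaseModel


@ai_model
class Location(BaseModel):
    city: str
    state: str


Location("The Big Apple")
# -> Location(city='New York', state='NY')
```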
Restricting the next predicted token to adhere to a specific context free grammar seems like a big step forward in weaving language models into applications.
Using system prompts provides an intuitive separation for input and output schema from input content.
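For example (a sketch):

```python
messages = [
    # schema and output rules live in the system prompt...
    {
        "role": "system",
        "content": 'Extract contacts as JSON: {"name": string, "email": string}',
    },
    # ...while the user message carries only the content to process
    {
        "role": "user",
        "content": "Reach out to Ada Lovelace at ada@example.com about the demo.",
    },
]
```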
With the support of GPT-4, I feel unstoppable. The overnight surge in productivity is intoxicating, not for making money or starting a business, but for the sheer joy of continuously creating ideas from my mind, which feels like happiness. - Ke Fang
I wrote a few paragraphs disagreeing with Paul's take, asserting that, like Simon suggests, we should think of language models like ChatGPT as a “calculator for words”.
Code needs structured output
The most popular language model use cases I've seen around have been chatbots, agents, and "chat your X" use cases.
It's necessary to pay attention to the shape of a language model's response when incorporating it as a component in a software application. You can't programmatically tap into the power of a language model if you can't reliably parse its response. In the past, I have mostly used a combination of...
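A defensive parser is the kind of thing I mean. A sketch:

```python
import json
from typing import Optional


def parse_model_json(response_text: str) -> Optional[dict]:
    """Defensively parse a model response that should contain JSON."""
    try:
        return json.loads(response_text)
    except json.JSONDecodeError:
        # models often wrap JSON in prose or code fences; try to recover
        start, end = response_text.find("{"), response_text.rfind("}")
        if start != -1 and end > start:
            try:
                return json.loads(response_text[start : end + 1])
            except json.JSONDecodeError:
                pass
    return None  # caller decides whether to retry or fail
```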
Experimenting with Auto-GPT
Auto-GPT is a popular project on GitHub that attempts to build an autonomous agent on top of an LLM. This is not my first time using Auto-GPT. I used it shortly after it was released and gave it a second try a week or two later, which makes this my third zero-to-running effort.
I believe that language models are most useful when available at your fingertips in the context of what you're doing. GitHub Copilot is a well-known application that applies language models in this manner. There is no need to pre-prompt the model. It knows you're writing code and that you're going...
Over the years, I've developed a system for capturing knowledge that has been useful to me. The idea behind this practice is to provide immediate access to useful snippets and learnings, often with examples. I'll store things like "Amend commit message" with tags like #git, #commit, and #amend...
I know a little about nix. Not a lot. I know some things about Python virtual environments, asdf and a few things about package managers. I've heard the combo of direnv and nix is fantastic from a number of engineers I trust, but I haven't had the chance to figure out what these tools can really...
I came upon https://gpa.43z.one today. It's a GPT-flavored capture the flag. The idea is, given a prompt containing a secret, convince the LM to leak the prompt against prior instructions it's been given. It's a cool way to develop intuition for how to prompt and steer LMs. I managed to complete all...
Attempts to thwart prompt injection
I've been experimenting with ways to prevent applications from deviating from their intended purpose. This problem is a subset of the generic jailbreaking problem at the model level. I'm not particularly well-suited to solve that problem and I imagine it will be a continued back and forth between...
Jailbreaking as prompt injection
I've been keeping an eye out for language models that can run locally so that I can use them on personal data sets for tasks like summarization and knowledge retrieval without sending all my data up to someone else's cloud. Anthony sent me a link to a Twitter thread about a product called deepsparse...
If you want to try running these examples yourself, check out my writeup on using a clean Python setup.
Since the launch of GPT-3, and more notably ChatGPT, I’ve had a ton of fun learning about and playing with emerging tools in the language model space.