Logs

2025-03-04

Learned more about the post-training phase of LLMs: the model first goes through a pre-training phase, and from there it is fine-tuned to contribute to a token stream with a human user, using special tokens to demarcate whether a message was written by the user or the assistant.
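
To make that concrete, here is a minimal sketch of how a transcript might be flattened into one token stream. The `<|im_start|>`/`<|im_end|>` markers are ChatML-style placeholders I'm assuming for illustration; real models each define their own special tokens.

```python
# A minimal sketch of rendering a chat transcript into one token stream.
# The special tokens below are ChatML-style placeholders, not any specific
# model's actual vocabulary.

def render(messages: list[dict]) -> str:
    """Flatten a list of {role, content} messages into a single prompt string."""
    parts = []
    for message in messages:
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>")
    # Leave the stream open on the assistant's turn so the model completes it.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

print(render([
    {"role": "user", "content": "What is post-training?"},
]))
```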

Posts

Goose as a Task Runner

Goose is a language model-based CLI agent. Goose exposes a chat interface and uses tool calling (mostly to invoke shell commands) to accomplish the objective prompted by the user. These tasks can include everything from writing code to running tests to converting a folder full of mov files to...
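
Goose's internals aside, the general shape of such an agent is a simple loop. This is just a sketch under my own assumptions (`ask_model` is a hypothetical model call and the "RUN:"/"DONE" protocol is invented), not Goose's actual implementation:

```python
# A sketch of the general shell-tool agent loop: ask the model for a command,
# execute it, feed the output back, repeat until the model declares it's done.
import subprocess

def agent_loop(objective: str, ask_model) -> None:
    """`ask_model` is a hypothetical callable: transcript -> next action."""
    transcript = [f"Objective: {objective}"]
    while True:
        action = ask_model("\n".join(transcript))  # e.g. "RUN: ls" or "DONE"
        if action.startswith("DONE"):
            break
        command = action.removeprefix("RUN: ")
        result = subprocess.run(command, shell=True, capture_output=True, text=True)
        transcript.append(f"$ {command}\n{result.stdout}{result.stderr}")
```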

Logs

2024-12-07

"I don't know anything about rice disease but apparently these are various rice diseases and this is what they look like." - Jeremy Howard, Fast.ai Course Lesson 6. I have no idea if Jeremy had this in mind when he said this (alluding to the fact that he doesn't know about the subject area, but when...

Logs

2024-12-01

I'm working on a conversation branching tool called "Delta" (for now). The first thing that led me to this idea came from chatting with Llama 3.2 and experimenting with different system prompts. I was actually trying to build myself a local version of an app I've been fascinated by called Dot.

Logs

2024-10-30

I wanted to get more hands-on with the language model trained in chapter 12 of the FastAI course, so I got some Google Colab credits and actually ran the training on an A100. It cost about $2.50 and took about 1:40, but generally worked quite well. There was a minor issue with auto-saving the...

Logs

2024-07-22

I've been wanting to create a chat component for this site for a while, because I really don't like quoting conversations and manually formatting them each time. When using a model playground, usually there is a code snippet option that generates Python code you can copy out into a script. Using...

Logs

2024-07-16

Research and experimentation with models present different problems than I am used to dealing with on a daily basis. The structure of what you want to try out changes often, so I understand why some folks prefer to use notebooks. Personally, notebooks haven't caught on for me, so I'm still just...

Logs

2024-03-22

Did a bit more work on an LLM evaluator for Connections. I'm mostly trying it with gpt-4 and claude-3-opus. On today's puzzle, the best either did was 2/4 correct. I'm unsure how much more improvement is possible with prompting or even fine-tuning, but it's an interesting challenge.

Logs

2024-02-16

A very timely (for me) article by Hamel about understanding what a language model prompt abstraction library is doing before blindly adopting it. This really aligned with a lot of my own thoughts on the matter, right down to its praise of Jason's instructor library baseline example.

Logs

2024-01-16

I spent another hour playing around with different techniques to try to teach and convince gpt-4 to play Connections properly, after a bit of exploration and feedback. I incorporated two new techniques: asking for one category at a time, then giving the model feedback (correct, incorrect, 3/4), and using...
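
Roughly, the first technique looks like the loop below. `ask_model` and `check_guess` are hypothetical stand-ins for the chat call and the puzzle's answer key, so treat this as a sketch of the idea rather than the exact harness:

```python
# A sketch of the one-category-at-a-time loop with feedback after each guess.
def play_connections(ask_model, check_guess, max_mistakes: int = 4) -> int:
    solved, mistakes = 0, 0
    feedback = "Start with your most confident category."
    while solved < 4 and mistakes < max_mistakes:
        guess = ask_model(feedback)      # model proposes four words
        result = check_guess(guess)      # "correct", "3/4", or "incorrect"
        if result == "correct":
            solved += 1
        else:
            mistakes += 1
        feedback = f"Your last guess was {result}. Propose the next category."
    return solved  # number of categories solved
```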

Logs

2024-01-10

After some experimentation with GitHub Copilot Chat, my review is mixed. I like the ability to copy from the sidebar chat to the editor a lot. It makes the chat more useful, but the chat is pretty chatty and thus somewhat slow to finish responding. I've also found the inline generation...

Logs

2023-11-17

I'm betting OpenAI will soon have a Cloud Storage product like Google Drive or iCloud for ChatGPT Plus users. Having your personal data available in the context of a language model is a massive value add. With a product like this, OpenAI could fully support use cases like "summarize my notes for the week"...

Logs

2023-08-27

It's much easier to test a Temporal Workflow in Python by invoking the contents of the individual Activities first, in the shell or via a separate script, then composing them into a Workflow. I need to see if there's a better way to surface exceptions and failures through Temporal directly to make...
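
As a sketch of what I mean (the names are made up, though the decorators and `execute_activity` call follow the temporalio Python SDK): keep the real logic in a plain function you can invoke directly, then wrap it in a thin Activity and compose the Workflow afterwards.

```python
# Activities-first testing: the plain function is trivially runnable from a
# shell or script; the Activity and Workflow are thin layers on top of it.
from datetime import timedelta
from temporalio import activity, workflow

def fetch_report(day: str) -> str:           # plain function: test this directly
    return f"report for {day}"

@activity.defn
async def fetch_report_activity(day: str) -> str:
    return fetch_report(day)                 # thin wrapper around tested logic

@workflow.defn
class ReportWorkflow:
    @workflow.run
    async def run(self, day: str) -> str:
        return await workflow.execute_activity(
            fetch_report_activity,
            day,
            start_to_close_timeout=timedelta(minutes=1),
        )
```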

Logs

2023-08-22

Language models and prompts are magic in a world of deterministic software. As prompts change and use cases evolve, it can be difficult to continue to have confidence in the output of a model. Building a library of example inputs for your model+prompt combination with annotated outputs is critical...
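
A minimal version of that library can be as simple as a list of annotated pairs plus a check you re-run whenever the prompt changes. In this sketch, `classify` is a hypothetical wrapper around the model+prompt combination and the examples are invented:

```python
# Annotated input/output pairs for a model+prompt combination, re-run as a
# regression check whenever the prompt or model changes.
EXAMPLES = [  # (input, annotated expected output)
    ("I loved this product", "positive"),
    ("Never buying this again", "negative"),
]

def regression_check(classify) -> None:
    failures = [
        (text, expected, got)
        for text, expected in EXAMPLES
        if (got := classify(text)) != expected
    ]
    for text, expected, got in failures:
        print(f"FAIL: {text!r} expected {expected!r}, got {got!r}")
    print(f"{len(EXAMPLES) - len(failures)}/{len(EXAMPLES)} passed")
```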

Logs

2023-08-01

While not an entirely unique perspective, I believe Apple is one of the best positioned companies to take advantage of the recent improvements in language models. I expect more generic chatbots will continue to become commodities whereas Apple will build a bespoke, multi-modal assistant with access...

Logs

2023-07-25

I tried out Llama 2 today using ollama. At first pass, it seemed ok at writing Python code, but I struggled to get it to effectively generate or adhere to a specific schema. I'll have to try a few more things but my initial impressions are mixed (relative to OpenAI models).
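
For reference, the sort of round trip involved looks something like this sketch; the `ollama run` invocation is real, but the prompt handling and schema expectations here are assumptions:

```python
# Shell out to the ollama CLI and insist the reply parse as JSON; this is
# where schema adherence (or the lack of it) shows up immediately.
import json
import subprocess

def generate_json(prompt: str) -> dict:
    result = subprocess.run(
        ["ollama", "run", "llama2", prompt],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)  # raises when the model ignores the schema
```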

Logs

2023-07-05

Experimenting with using a language model to improve the input prompt, then using that output as the actual prompt for the model, then returning the result. It's a bit of a play on the "critique" approach. Some of the outputs were interesting but I need a better way to evaluate the results.
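
The pipeline itself is just two chained calls. In this sketch, `llm` is a hypothetical function wrapping whatever model API is in use:

```python
# Two-stage "improve, then answer" pipeline: the first call rewrites the
# user's prompt, the second call answers the rewritten prompt.
def improved_completion(llm, user_prompt: str) -> str:
    better_prompt = llm(
        "Rewrite the following prompt to be clearer and more specific. "
        "Return only the rewritten prompt.\n\n" + user_prompt
    )
    return llm(better_prompt)  # answer the improved prompt, return the result
```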

Logs

2023-06-29

I've been following Jason's work experimenting with different abstractions for constructing prompts and structuring responses. I've long felt that building prompts with strings is not the type of developer experience that will win the day. On the other hand, I'm wary of the wrong abstraction...

Logs

2023-06-21

I've been thinking about the concept of "prompt overfitting". In this context, there is a distinction between model overfitting and prompt overfitting. Say you want to use a large language model as a classifier. You may give it several example inputs and the expected outputs. I don't have hard data...
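
For example, the classifier setup might look like the few-shot prompt below (the examples are invented). The prompt-overfitting worry is that the model latches onto these particular examples rather than the underlying task:

```python
# A few-shot classification prompt: the two in-context examples are the part
# a model could "overfit" to, independent of any weight updates.
FEW_SHOT_PROMPT = """Classify the sentiment of each message as positive or negative.

Message: The checkout flow was painless.
Sentiment: positive

Message: The app crashes every time I open it.
Sentiment: negative

Message: {message}
Sentiment:"""

def build_prompt(message: str) -> str:
    return FEW_SHOT_PROMPT.format(message=message)
```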

Logs

2023-06-14

Richard WM Jones booted Linux 292,612 times to find a bug where it hangs on boot. I loved reading the recounting of his process to accomplish this: bisecting through the different versions of Linux and booting each thousands of times to determine whether the version contained the bug.
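
The core of that process is an ordinary bisection made slow by the intermittent hang. This sketch (not Richard's actual scripts; `boot_hangs` is a hypothetical test boot) shows why each candidate version needs many boots:

```python
# Bisect over versions when the failure is intermittent: a version is only
# declared good after it survives many boots without hanging.
def bisect_versions(versions: list[str], boots_per_version: int, boot_hangs) -> str:
    """`boot_hangs(version) -> bool` is a hypothetical stand-in for one boot."""
    lo, hi = 0, len(versions) - 1          # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if any(boot_hangs(versions[mid]) for _ in range(boots_per_version)):
            hi = mid                       # bug present at mid
        else:
            lo = mid                       # never hung: assume good
    return versions[hi]                    # first bad version
```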

Logs

2023-06-08

Today, I played around with Matt Rickard's ReLLM library, another take on constraining LLM output, in this case, with regex. I tried to use it to steer a language model to generate structure (JSON) from unstructured input. This exercise is sort of like parsing or validating JSON with regex -- it's...
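
This isn't ReLLM's actual API, but the underlying technique can be sketched in a few lines: at each decoding step, only permit next tokens that keep the output a partial match of the target pattern (here via the third-party `regex` module's `partial=True`):

```python
# Regex-constrained decoding, reduced to its core: filter the candidate next
# tokens down to those whose addition still partially matches the pattern.
import regex

def allowed_tokens(pattern: str, generated: str, vocab: list[str]) -> list[str]:
    return [
        token for token in vocab
        if regex.match(pattern, generated + token, partial=True)
    ]

# e.g. steering output toward a tiny JSON object: only "ab" survives
print(allowed_tokens(r'\{"name": "[a-z]+"\}', '{"name": "', ["ab", "}", '"']))
```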

Logs

2023-05-29

I've been following Eric's posts about SudoLang since the first installment back in March. I've skimmed through the spec and the value proposition is quite compelling. SudoLang seeks to allow programmers of all levels to instruct LLMs and can also be transpiled into your programming language of...

Logs

2023-05-10

With the support of GPT-4, I feel unstoppable. The overnight surge in productivity is intoxicating, not for making money or starting a business, but for the sheer joy of continuously creating ideas from my mind, which feels like happiness. - Ke Fang

Posts

Shaping LLM Responses

It's necessary to pay attention to the shape of a language model's response when incorporating it as a component in a software application. You can't programmatically tap into the power of a language model if you can't reliably parse its response. In the past, I have mostly used a combination of...
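
One common way to pin down the shape (a sketch of the general idea, not necessarily what the post lands on) is to ask for JSON, then parse and validate it before the rest of the program touches it:

```python
# Parse a model response defensively: extract the JSON object, reject it if
# it isn't valid or is missing required keys.
import json

def parse_response(raw: str, required_keys: set[str]) -> dict:
    start, end = raw.find("{"), raw.rfind("}") + 1   # tolerate surrounding prose
    data = json.loads(raw[start:end])                # raises if not valid JSON
    missing = required_keys - data.keys()
    if missing:
        raise ValueError(f"response missing keys: {missing}")
    return data
```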

Posts

Auto-GPT

Experimenting with Auto-GPT

Auto-GPT is a popular project on GitHub that attempts to build an autonomous agent on top of an LLM. This is not my first time using Auto-GPT. I used it shortly after it was released and gave it a second try a week or two later, which makes this my third zero-to-running effort.

Posts

GPT Prompt Attack

I came upon https://gpa.43z.one today. It's a GPT-flavored capture the flag. The idea is, given a prompt containing a secret, to convince the LM to leak the prompt against the prior instructions it's been given. It's a cool way to develop intuition for how to prompt and steer LMs. I managed to complete all...

TIL

Trying Out Deepsparse

I've been keeping an eye out for language models that can run locally so that I can use them on personal data sets for tasks like summarization and knowledge retrieval without sending all my data up to someone else's cloud. Anthony sent me a link to a Twitter thread about a product called deepsparse...