I spent some more time experimenting with thought partnership with language models. I previously experimented with this idea when building write-partner. Referring back to this work, the prompts...
While I didn't have much success getting gpt-4o to perform Task 1 - Counting Line Intersection from the Vision Language Models Are Blind paper, I pulled down some code and did a bit of testing with...
We probably are living in a simulation and we’re probably about to create the next one. Martin Casado
VLMs are Blind showed a number of interesting cases where vision language models fail to solve problems that humans can easily solve. I spent some time trying to build examples with additional...
Kent wrote this post on how to engage an audience by switching the first and second slide of a presentation. The audience focuses more as they try to fill in the gaps of what you've introduced them...
I've been chatting with qwen2, a model from Alibaba. I mostly chatted with it in English but it appears to support several other languages and I noticed a bit of Chinese leaking through even though I...
I was inspired by Daniel's post to add sidenotes to this blog. I used claude-3.5-sonnet to generate the CSS and HTML shortcode to do this. I was impressed by how well it turned out[^1]. It was almost...
A nice read by Stuart on Python development tools. This introduced me to the pyproject.toml configuration file, which is more comprehensive than a requirements file. It's something I'll need to...
I reproduced Josh's claude-3.5-sonnet mirror test. I hadn't realized gpt-4 and claude-3-opus had also been "passing" this test since back in March. More interesting still, Sonnet actually seems to...
I spent some time experimenting with OpenDevin using claude-3-opus (I couldn't find an easy way to use claude-3.5-sonnet). The agentic capabilities were not bad. I gave a prompt and behind the...