Local VLMs Have Improved
About 6 months ago, I experimented with running a few different multi-modal (vision) language models on my MacBook. At the time, the results weren't so great.
I've done some experimentation with extracting structured data from documents using VLMs. A summary of one approach I've tried can be found in my repo, impulse. I've found using Protobufs to be a relatively effective approach for extracting values from documents. The high-level idea is that you write a...
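A minimal sketch of the schema-in-the-prompt idea, assuming the ollama Python client and a hypothetical Invoice message (this is not the code in impulse; the field names, model name, and file paths are placeholders):

```python
# Sketch only: hand the model a protobuf schema as text and ask for matching JSON.
# The Invoice message, model name, and file paths are hypothetical.
import json

import ollama

INVOICE_PROTO = """
message Invoice {
  string vendor_name = 1;
  string invoice_date = 2;  // ISO 8601
  double total_amount = 3;
}
"""

prompt = (
    "Extract the fields defined by this protobuf message from the attached "
    "document. Respond with only a JSON object whose keys are the field names.\n"
    + INVOICE_PROTO
)

response = ollama.chat(
    model="llama3.2-vision",  # any local vision model pulled into ollama
    messages=[{"role": "user", "content": prompt, "images": ["invoice.png"]}],
)

# Assumes the model cooperates and returns bare JSON.
extracted = json.loads(response["message"]["content"])
print(extracted)
```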
In light of OpenAI releasing Structured Outputs in its model API, let's move output structuring another level up the stack to the microservice/RPC level.
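A rough sketch of what that could look like, treating a hypothetical RPC response message as the schema for Structured Outputs via the openai Python SDK (the ExtractFieldsResponse type and its fields are made up):

```python
# Sketch: a Pydantic model standing in for an RPC response message, used as the
# response_format for OpenAI Structured Outputs. Type and field names are hypothetical.
from openai import OpenAI
from pydantic import BaseModel


class ExtractFieldsResponse(BaseModel):
    # Mirrors what a microservice's response message might look like.
    vendor_name: str
    invoice_date: str
    total_amount: float


client = OpenAI()

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "user", "content": "Extract the vendor, date, and total from: ..."},
    ],
    response_format=ExtractFieldsResponse,
)

# The SDK validates the model output against the schema and returns a typed object.
result = completion.choices[0].message.parsed
print(result)
```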
I tried stacking multiple pages of a PDF vertically into a single image, passing that to a model, and then doing data extraction from it. It didn't work. I imagine this is because models aren't trained on much data like this; the inference seemed to output made-up data. Multiple studies have shown that...
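For reference, the stacking step itself is straightforward. A sketch using pdf2image (which requires poppler) and Pillow, with placeholder file names:

```python
# Sketch of the "stack the pages into one tall image" step.
# Assumes pdf2image (poppler installed) and Pillow; file names are placeholders.
from pdf2image import convert_from_path
from PIL import Image

pages = convert_from_path("document.pdf", dpi=150)  # one PIL image per page

width = max(p.width for p in pages)
height = sum(p.height for p in pages)

# Paste each page below the previous one on a white canvas.
stacked = Image.new("RGB", (width, height), "white")
y = 0
for page in pages:
    stacked.paste(page, (0, y))
    y += page.height

stacked.save("stacked.png")
```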
I attempted to reproduce the results for one task from the "VLMs are Blind" paper. Specifically, Task 1: Counting line intersections. I ran 150 examples of lines generated by the project's code with a line thickness of 4.
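Not the project's generator, but a sketch of the ground-truth side of the task: a standard orientation test for whether two line segments cross (collinear edge cases ignored):

```python
# Sketch of computing the expected answer for a pair of segments; this is a
# generic orientation test, not the code from the VLMs are Blind repo.
from typing import Tuple

Point = Tuple[float, float]


def _orientation(a: Point, b: Point, c: Point) -> int:
    """Sign of the cross product (b - a) x (c - a)."""
    val = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (val > 0) - (val < 0)


def segments_intersect(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """True if segment p1-p2 crosses segment q1-q2 (general position assumed)."""
    return (
        _orientation(p1, p2, q1) != _orientation(p1, p2, q2)
        and _orientation(q1, q2, p1) != _orientation(q1, q2, p2)
    )


# These two segments cross once, so the expected count is 1.
print(segments_intersect((0, 0), (4, 4), (0, 4), (4, 0)))  # True
```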
I spent some time experimenting with multi-modal models (also called vision models on the ollama site) to see how they perform. You can try these out from the CLI with ollama run <model>, but I opted to use the ollama Python client.
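A minimal example with the ollama Python client; the model name and image path are placeholders for whatever is pulled locally:

```python
# Minimal ollama Python client call with a vision model.
# "llava" and the image path are placeholders for a locally pulled model and file.
import ollama

response = ollama.chat(
    model="llava",
    messages=[
        {
            "role": "user",
            "content": "Describe this image in one sentence.",
            "images": ["./photo.jpg"],
        }
    ],
)

print(response["message"]["content"])
```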