2024-08-12
I tried to run Florence-2 and ColPali using the Hugging Face serverless Inference API. Searching around, there seems to be pretty limited support for image-text-to-text models. On GitHub, I only found a few projects that even reference these types of models.