
VLM Prompt Step Error with Pixel Values #1025

Open
ayushgun opened this issue Jan 2, 2025 · 1 comment
Labels
bug Something isn't working

Comments


ayushgun commented Jan 2, 2025

Summary

The vision model fails during inference with the error message:
"Pixel values were specified for a non-prompt."

Steps to Reproduce

  1. Use the reproducer code provided below.
  2. Fetch an image using reqwest.
  3. Load the image into memory using the image crate.
  4. Create VisionMessages with an image and a user prompt.
  5. Send a chat request to the model.

Code Reproducer

use mistralrs::{IsqType, TextMessageRole, VisionLoaderType, VisionMessages, VisionModelBuilder};

#[tokio::main]
async fn main() {
    let model =
        VisionModelBuilder::new("HuggingFaceTB/SmolVLM-Instruct", VisionLoaderType::Idefics3)
            .with_isq(IsqType::Q4_0)
            .with_logging()
            .build()
            .await
            .expect("Failed to build model");

    let response = reqwest::get("http://farm1.staticflickr.com/32/53895647_9ff594a688_z.jpg")
        .await
        .expect("Failed to fetch image");
    let image = image::load_from_memory(&response.bytes().await.expect("Failed to read bytes"))
        .expect("Failed to load image");

    let messages = VisionMessages::new()
        .add_image_message(
            TextMessageRole::User,
            "What is depicted here? Please describe the scene in detail.",
            image,
            &model,
        )
        .expect("Failed to create vision message");

    let response = model
        .send_chat_request(messages)
        .await
        .expect("Error occurred during inference");

    println!("{}", response.choices[0].message.content.as_ref().unwrap());
}

Observed Behavior

The engine logs show that the image's pixel values are rejected during the prompt step as having been "specified for a non-prompt", which causes a runtime panic:

2025-01-02T17:18:31.336956Z ERROR mistralrs_core::engine: prompt step - Model failed with error: Msg("Pixel values were specified for a non-prompt.")
thread 'main' panicked at src/main.rs:33:10:
Error occurred during inference: ChatModelError { msg: "Pixel values were specified for a non-prompt.", incomplete_response: ChatCompletionResponse { ... } }
ayushgun added the bug ("Something isn't working") label on Jan 2, 2025.
EricLBuehler (Owner) commented:
@ayushgun can you please check this again after git pull? I merged some PRs which should help this.
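The suggested check above amounts to updating the local checkout and rebuilding before re-running the reproducer. A minimal sketch, assuming the project was cloned with git and the reproducer lives in the current cargo workspace:

```shell
# Pull in the merged PRs, then rebuild and rerun the reproducer.
# (Assumes a git checkout of the mistral.rs repository; paths are illustrative.)
git pull
cargo run --release
```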
