
VLM Prompt Step Error with Pixel Values #1025

Open
ayushgun opened this issue Jan 2, 2025 · 1 comment
Labels
bug Something isn't working

Comments


ayushgun commented Jan 2, 2025

Summary

The vision model fails during inference with the error message:
"Pixel values were specified for a non-prompt."

Steps to Reproduce

  1. Use the reproducer code provided below.
  2. Fetch an image using reqwest.
  3. Load the image into memory using the image crate.
  4. Create VisionMessages with an image and a user prompt.
  5. Send a chat request to the model.

Code Reproducer

use mistralrs::{IsqType, TextMessageRole, VisionLoaderType, VisionMessages, VisionModelBuilder};

#[tokio::main]
async fn main() {
    let model =
        VisionModelBuilder::new("HuggingFaceTB/SmolVLM-Instruct", VisionLoaderType::Idefics3)
            .with_isq(IsqType::Q4_0)
            .with_logging()
            .build()
            .await
            .expect("Failed to build model");

    let response = reqwest::get("http://farm1.staticflickr.com/32/53895647_9ff594a688_z.jpg")
        .await
        .expect("Failed to fetch image");
    let image = image::load_from_memory(&response.bytes().await.expect("Failed to read bytes"))
        .expect("Failed to load image");

    let messages = VisionMessages::new()
        .add_image_message(
            TextMessageRole::User,
            "What is depicted here? Please describe the scene in detail.",
            image,
            &model,
        )
        .expect("Failed to create vision message");

    let response = model
        .send_chat_request(messages)
        .await
        .expect("Error occurred during inference");

    println!("{}", response.choices[0].message.content.as_ref().unwrap());
}

Observed Behavior

The engine logs show that the image's pixel values are rejected during the prompt step as having been "specified for a non-prompt", which causes a runtime panic:

2025-01-02T17:18:31.336956Z ERROR mistralrs_core::engine: prompt step - Model failed with error: Msg("Pixel values were specified for a non-prompt.")
thread 'main' panicked at src/main.rs:33:10:
Error occurred during inference: ChatModelError { msg: "Pixel values were specified for a non-prompt.", incomplete_response: ChatCompletionResponse { ... } }
ayushgun added the bug ("Something isn't working") label on Jan 2, 2025.
EricLBuehler (Owner) commented:
@ayushgun can you please check this again after git pull? I merged some PRs which should help this.
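The suggested check above amounts to updating the local checkout and rebuilding before re-running the reproducer. A minimal sketch, assuming the project was cloned with git and the reproducer lives in the current cargo workspace:

```shell
# Pull in the merged PRs, then rebuild and rerun the reproducer.
# (Assumes a git checkout of the mistral.rs repository; paths are illustrative.)
git pull
cargo run --release
```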
