
Example of using DALI with binary data #135

Open · danvass opened this issue Jun 22, 2022 · 16 comments

danvass commented Jun 22, 2022

Are there any samples of sending images as binary data to the inception_ensemble example without using the Triton client, i.e. with raw HTTP requests using curl or similar? Are there modifications that need to be made (like changing the input data type to string and the dimensions)?

banasraf (Collaborator) commented Jun 22, 2022

Sending binary data to the Triton server through an HTTP request is described here.

You don't need to adjust anything in the model itself to use it.

danvass (Author) commented Jun 22, 2022

@banasraf do you happen to have an example of a valid payload with the JSON prefix that would work for that config? I have it working with Triton when I'm not using DALI but I get errors when trying to do it with DALI.

danvass (Author) commented Jun 23, 2022

Hey @banasraf, any chance you'd have a sample payload available?

szalpal (Member) commented Jun 24, 2022

Hi @danvass,

Unfortunately, we don't have any example of sending image data through an HTTP request. If you would like to put together an example, we'll be happy to help you: just tell us what errors you get, or any details that would give us some insight. We can work something out together and contribute it to the DALI Backend repository, if you agree.

JanuszL (Collaborator) commented Jun 27, 2022

Hi @danvass,

> I have it working with Triton when I'm not using DALI but I get errors when trying to do it with DALI.

Can you share the code snippet and the server configuration (both working and non-working cases) so we can take a look?

banasraf (Collaborator) commented Jun 27, 2022

@danvass
Hi, I tested sending an inference request to the ensemble_dali_inception model with curl.
I made use of the fact that, for single-input models (plus a few other constraints), Triton accepts raw binary requests.
So it looks like this:

curl -X POST -H "Content-Type: application/octet-stream" -H "Inference-Header-Content-Length: 0" --data-binary @baboon-3089012_1280.jpg localhost:8000/v2/models/ensemble_dali_inception/infer -v --output response

I got the response:

< HTTP/1.1 200 OK
< Content-Type: application/octet-stream
< Inference-Header-Content-Length: 239
< Content-Length: 4243

That is a valid response. The body is binary, so it should be written to a file.

Important notes:
The request needs the following headers:

  • Content-Type: application/octet-stream
  • Inference-Header-Content-Length: N - this is needed for Triton to split the JSON inference metadata from the actual binary data. In my example it is 0, because the request is raw binary with no JSON header.

curl needs the --data-binary flag to preserve the newlines in the request data.
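
For reference, here is roughly the same raw-binary request made from Python. A minimal sketch, assuming the requests package is installed and using the same model name and image file as the curl example above:

import requests

# The whole request body is the raw encoded image; no JSON header.
with open("baboon-3089012_1280.jpg", "rb") as f:
    image_bytes = f.read()

resp = requests.post(
    "http://localhost:8000/v2/models/ensemble_dali_inception/infer",
    headers={
        "Content-Type": "application/octet-stream",
        # 0 tells Triton there is no JSON inference header in the body.
        "Inference-Header-Content-Length": "0",
    },
    data=image_bytes,
)

# The response interleaves a JSON header with binary output data,
# so write it to a file, like curl's --output.
resp.raise_for_status()
with open("response", "wb") as f:
    f.write(resp.content)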

danvass (Author) commented Jun 30, 2022

Thanks @banasraf. I am trying to achieve this by specifying the inference header JSON so that the output is not binary, like so:

{"inputs":[{"name":"INPUT","shape":[1],"datatype":"UINT8","parameters":{"binary_data_size":1005970}}],"outputs":[{"name":"OUTPUT","parameters":{"binary_data":false}}]}

and appending the image to the end.

However, I get the following output:

{"error":"unexpected additional input data for model 'ensemble_dali'"}

I expect I'm missing some formatting when appending the bytes of the image, or miscalculating its binary_data_size. That's why I was looking for an example of preparing the JSON and the image in this format.

banasraf (Collaborator) commented

@danvass
The main problem with your request is the input shape. You have to specify it when sending an infer request, so in your case it would be: "shape":[1, 1005970] (1 - batch size, 1005970 - sample size).
The error you get says that more binary data is attached than the declared shape of [1] accounts for.

For me, the whole process looked like this.
Assuming I have an image of 313950 bytes, the inference header looks like this:

{"inputs":[{"name":"INPUT","shape":[1, 313950],"datatype":"UINT8","parameters":{"binary_data_size":313950}}],"outputs":[{"name":"OUTPUT","parameters":{"binary_data":false}}]}

Then I prepare the request:

cat inference_header.json > request.txt && cat images/baboon-174073_1280.jpg >> request.txt

I check the size of inference_header.json (175 bytes in this case) to use it in the Inference-Header-Content-Length header. The curl command looks like this:

curl -X POST -H "Content-Type: application/octet-stream" -H "Inference-Header-Content-Length: 175" --data-binary @request.txt localhost:8000/v2/models/ensemble_dali_inception/infer -v

And I get a valid response.
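
If you would rather not count the header bytes by hand, the same steps can be scripted. A minimal Python sketch of the process above, assuming the requests package; the model, input, and output names are taken from the example:

import json
import requests

with open("images/baboon-174073_1280.jpg", "rb") as f:
    image_bytes = f.read()

# Build the inference header with the shape and binary_data_size
# derived from the actual image size.
header = json.dumps({
    "inputs": [{
        "name": "INPUT",
        "shape": [1, len(image_bytes)],  # [batch_size, sample_size]
        "datatype": "UINT8",
        "parameters": {"binary_data_size": len(image_bytes)},
    }],
    "outputs": [{"name": "OUTPUT", "parameters": {"binary_data": False}}],
}).encode("utf-8")

resp = requests.post(
    "http://localhost:8000/v2/models/ensemble_dali_inception/infer",
    headers={
        "Content-Type": "application/octet-stream",
        "Inference-Header-Content-Length": str(len(header)),
    },
    data=header + image_bytes,
)
# With "binary_data": false the output comes back as plain JSON.
print(resp.json())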

danvass (Author) commented Jun 30, 2022

Thanks @banasraf, are you testing this on the inception_ensemble example?

I'm getting this response when I try to adjust it to match your example:

{"error":"unexpected shape for input 'INPUT' for model 'ensemble_dali'. Expected [-1], got [1,1005970]"}

What is the expected input shape in your config? In the example, it's -1, as per https://github.com/triton-inference-server/dali_backend/blob/main/docs/examples/inception_ensemble/model_repository/ensemble_dali_inception/config.pbtxt#L29.

banasraf (Collaborator) commented Jun 30, 2022

@danvass
Yes, I use exactly this example.
What version of Triton server are you using?

The shape in the config is in fact -1, but that denotes a dynamic dimension, so it should accept any size.

Could you try specifying [1005970] instead? Maybe the problem is with dimensionality, although the config specifies max_batch_size > 1, so it should expect the batch size as part of the shape.

banasraf (Collaborator) commented

Also, I see that the name of the model in your case is ensemble_dali, while in the config it's "ensemble_dali_inception".
Did you change anything else in any of the configs?

danvass (Author) commented Jun 30, 2022

@banasraf I renamed the models and set max_batch_size to 0. Would that affect it?

name: "ensemble_dali"
platform: "ensemble"
max_batch_size: 0
input [
  {
    name: "INPUT"
    data_type: TYPE_UINT8
    dims: [ -1 ]
  }
]
output [
  {
    name: "OUTPUT"
    data_type: TYPE_FP32
    dims: [-1, 768]
  }
]
ensemble_scheduling {
  step [
    {
      model_name: "dali_preprocess"
      model_version: -1
      input_map {
        key: "DALI_INPUT_0"
        value: "INPUT"
      }
      output_map {
        key: "DALI_OUTPUT_0"
        value: "preprocessed_image"
      }
    },
    {
      model_name: "image_model"
      model_version: -1
      input_map {
        key: "input"
        value: "preprocessed_image"
      }
      output_map {
        key: "output"
        value: "OUTPUT"
      }
    }
  ]
}

I tried changing the input dims to [ -1, -1 ] and that seems to have worked. Does that make sense?

banasraf (Collaborator) commented

Yup. If you set max_batch_size to 0, you have to specify the batch dimension in the dims field yourself.
For ensembles (and most model types, including DALI), you should set max_batch_size > 0 and omit the batch dimension from the dims field.
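
To make that concrete with the numbers from this thread (a sketch; the positive max_batch_size value is just illustrative):

max_batch_size: 0, dims: [ -1, -1 ]  ->  the request "shape":[1, 1005970] is matched against dims literally
max_batch_size: 256, dims: [ -1 ]    ->  Triton treats the leading 1 in "shape":[1, 1005970] as the batch dimension and matches only [1005970] against dims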

danvass (Author) commented Jul 1, 2022

Got it, that makes sense, thank you @banasraf.

How do you determine a reasonable max_batch_size? And how would you send a batch of images in the request?

banasraf (Collaborator) commented Jul 4, 2022

@danvass
Choosing the optimal max batch size is often a matter of fine-tuning. The main limitation is memory consumption (there is a limit to how big a batch can be processed within your GPU's memory). Another concern is latency vs. throughput: usually, the bigger the batch, the higher the throughput, but response latency naturally grows. So it also depends on your needs.

And about sending a batch of images:
Generally, Triton expects a tensor of dimensions [batch_size, rest_of_dimensions...], so sending a batch is really just sending a properly shaped tensor. That's easy for data that is naturally uniformly shaped (e.g. preprocessed 300x300x3 images). For encoded images it's trickier, because their sizes vary; the only option to send them in a batch is to pad the data to a common size. Fortunately, you can append any number of zero-bytes to an encoded image and they will be ignored, because the proper size is written in the file header. For example, we do this in our test client for the inception model here.
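
To illustrate, here is a minimal Python sketch of that padding approach (this is not the actual test client; the file names are hypothetical, and the model, input, and output names are taken from the earlier examples):

import json
import requests

# Hypothetical file names; any set of encoded JPEGs works.
files = ["cat.jpg", "dog.jpg", "baboon.jpg"]
samples = [open(name, "rb").read() for name in files]

# Pad every image with zero-bytes to the size of the largest one;
# the decoder ignores the trailing zeros.
max_size = max(len(s) for s in samples)
padded = b"".join(s.ljust(max_size, b"\x00") for s in samples)

header = json.dumps({
    "inputs": [{
        "name": "INPUT",
        "shape": [len(samples), max_size],  # [batch_size, padded_sample_size]
        "datatype": "UINT8",
        "parameters": {"binary_data_size": len(padded)},
    }],
    "outputs": [{"name": "OUTPUT", "parameters": {"binary_data": False}}],
}).encode("utf-8")

resp = requests.post(
    "http://localhost:8000/v2/models/ensemble_dali_inception/infer",
    headers={
        "Content-Type": "application/octet-stream",
        "Inference-Header-Content-Length": str(len(header)),
    },
    data=header + padded,
)
print(resp.json())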

This might seem like unnecessary overhead (sending zeros), but DALI's performance really shines when processing batches of data, so it's usually beneficial to send multiple images at once, even with padding.

brianacraig commented

@banasraf Thanks for putting this example together! It helped me a ton yesterday.
