Best practice for video ingestion #88
Hello @SiftingSands ! We do not have any examples with video yet, but I do agree that working with videos via Triton is cumbersome right now. Reading and decoding video in a DALI pipeline is different from other types of data, since both operations happen in a single step. You are the first to ask about video support; we haven't focused on this yet, but we'll put it on our priority list. We need to think about how we can enable this feature for DALI Backend. I'll be posting updates on the status of this feature request in this issue. Cheers! |
Thanks for confirming the status of video decoding with Triton. Glad I asked before I tried something like decoding a video into a bit stream on the client and passing that over to the server. Looking forward to future developments! |
I don't know precisely what your use case is. However, have you considered using DeepStream? It is an AI-powered video analysis library, and it is possible to use it via Triton. |
I did see DeepStream with Triton as an option, but I had already used DALI for my inference use case, so I didn't want to switch to another solution if unnecessary. My videos are 'at rest', so I also wasn't sure if DeepStream would be a suitable route. Regardless, I'm not currently in a huge rush for Triton integration. |
@szalpal Is there any update on this? |
Hi @Tandon-A ! Unfortunately not yet. Currently the only way of decoding videos in DALI is to read them from disk, which is not the desired scenario in the Triton case. As soon as we add the proper enhancements to DALI, we'll add an example to DALI Backend. |
Hi, I have a similar use case, where I have videos saved as files, and I want the user to send me the path to one of them. I was trying to receive the path using Is this use case supposed to work? It looks like the Dali backend refuses to receive a string input, is that really the case? If this is impossible, would it be possible to replace the video file each call and have the Dali pipeline work with one fixed path (I guess in that case it wouldn't need to receive any inputs at all)? |
Hi @nitsanh ! Unfortunately, your use case won't work, for a few reasons. First of all, external_source and DALI Backend currently do not work with TYPE_STRING. We do have this on our radar and you may expect support for TYPE_STRING there in the next few releases. But most importantly, DALI's The solution you proposed, that Anyway, we do plan to extend the support for videos, as we are aware that video support in Triton+DALI is currently virtually non-existent. |
Thanks for your reply, I'll try using DeepStream or VPF then. |
@SiftingSands , would you mind sharing some more insights into your use cases? We are starting to work on video support in DALI Backend and we'd like to know some possible requirements. The initial plan is to support video data, provided that whatever is passed to the input of the DALI pipeline is an entire video file in a binary buffer. Would this suffice for your needs? |
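For reference, a minimal client-side sketch of what "an entire video file in a binary buffer" could look like: the encoded file is read as raw bytes and viewed as a uint8 array with a leading batch dimension of 1. The helper name `load_video_as_buffer` and the `(1, num_bytes)` layout are assumptions made for illustration, not something DALI Backend has confirmed in this thread.

```python
import numpy as np


def load_video_as_buffer(path):
    """Read an encoded video file into a uint8 array of shape (1, num_bytes).

    Hypothetical helper: the leading dimension of 1 stands in for a batch
    holding a single video. No decoding happens here.
    """
    raw = np.fromfile(path, dtype=np.uint8)  # raw encoded bytes, not frames
    return raw[np.newaxis, :]


if __name__ == "__main__":
    import os
    import tempfile

    # Stand-in for a real video file: any byte content works for the demo.
    with tempfile.NamedTemporaryFile(delete=False, suffix=".mp4") as f:
        f.write(bytes(range(10)))
        path = f.name
    try:
        buf = load_video_as_buffer(path)
        print(buf.shape, buf.dtype)
    finally:
        os.remove(path)
```

Note that this approach holds the whole encoded file in memory at once, so the size of the file matters for large videos.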
My use case is this: I also needed a way to get that entire video file from the user, and then back to them. I’m not sure what the proper way to do that is. I thought maybe of having another service on the same machine as Triton that interacts with the user and downloads / uploads the video files from a bucket in the cloud. My main issue was that DALI doesn’t support reading a video from an arbitrary file. I think the best approach would have been to pass DALI a string representing the video path in the local file system, then use the video reader to read it. I think passing the entire video file in a binary buffer wouldn't work well for large videos, would it? |
My initial use case was performing inference with a slew of models on a data warehouse of videos (not live streaming frames from a camera). Currently, inference is performed on a single video frame (Bx1xCxWxH) with each GPU taking a separate shard of the video dataset, but I can see having a batch of multiple frames in the future as well (BxNxCxWxH). One nice thing that I assume Triton would be capable of is using the dedicated NVDEC GPU hardware to decode the video asynchronously from the model inference operations to increase throughput. You mentioned passing in the entire video in a binary buffer. I have 4K videos encoded at high quality, so they can be 100s of MB and even up to 1 GB in size. Not sure if that's an issue, so just bringing that up in case. I also have a lot of VFR videos, but I know DALI's video reader for that is still in progress. |
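The per-GPU sharding described above can be sketched as follows. This is a minimal illustration that assumes a simple round-robin split; the function name `shard_paths` and the file names are hypothetical, and this is not necessarily how the dataset is actually partitioned.

```python
def shard_paths(paths, num_shards, shard_id):
    """Return the subset of paths assigned to one shard (round-robin)."""
    if not 0 <= shard_id < num_shards:
        raise ValueError("shard_id must be in [0, num_shards)")
    # Every num_shards-th path, starting at this shard's offset.
    return paths[shard_id::num_shards]


if __name__ == "__main__":
    videos = [f"video_{i:03d}.mp4" for i in range(7)]  # placeholder names
    for gpu in range(3):
        print(gpu, shard_paths(videos, num_shards=3, shard_id=gpu))
```

Each shard sees a disjoint subset of the files, and together the shards cover the whole list.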
Our use case would be the following: Our reason for using vanilla TRITIS and not in combination with Nvidia Deep Stream:
|
Do you have any visibility into when that feature could become available? |
@Alwahsh , We recently updated DALI's capabilities to support some of the Video scenarios. We added two video decoding operators:
Hopefully this addresses your use case. If not, please let me know and maybe we can figure something out. In case you have any questions about usage, don't hesitate to ask :) |
@szalpal That is what I was looking for. Thanks a lot. Please let me know if I should open a separate issue for my questions. From the documentation here:
Does that mean that there's currently a limitation of having only batch size of one in Triton's Configuration file?
Is there a way to accept higher batch sizes in videos on Triton? I tried to increase the
where X is the max_batch_size I chose. I can't make sense of that behavior but I assume when it works, it's actually not generating correct information from the batch of videos but instead getting information from the first one only. |
@Alwahsh , it would be best if you could provide info about your input (like video duration etc), but I'll try to answer your question regardless. The
Therefore, if you create a pipeline like this:

    @pipeline_def(batch_size=3, ...)
    def pipe():
        return fn.inputs.video(sequence_length=5, ...)

In the output you'll get a batch of 3 sequences and every sequence will have 5 frames. However, the error message you've encountered might be a symptom of a bug. Could you tell me the duration of your video file (in frames) and what parameters ( To answer your remaining questions:
No. The value you set in If you have any more questions, don't hesitate to ask :) |
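To make the batch/sequence arithmetic from the pipeline example above concrete (batch_size=3, sequence_length=5), here is a plain-Python sketch that groups frame indices into sequences and sequences into batches. It does not call DALI; the function name `group_frames` is hypothetical, and keeping a trailing partial sequence or batch is an assumption made for illustration, not a statement about how the operator handles the end of a video.

```python
def group_frames(num_frames, sequence_length, batch_size):
    """Group frame indices into sequences, then sequences into batches.

    Assumption for illustration: frames are taken in order, and a trailing
    partial sequence or batch is kept rather than padded or dropped.
    """
    frames = list(range(num_frames))
    sequences = [frames[i:i + sequence_length]
                 for i in range(0, num_frames, sequence_length)]
    return [sequences[i:i + batch_size]
            for i in range(0, len(sequences), batch_size)]


if __name__ == "__main__":
    batches = group_frames(num_frames=32, sequence_length=5, batch_size=3)
    for n, batch in enumerate(batches):
        # Print the number of frames in each sequence of this batch.
        print(n, [len(seq) for seq in batch])
```

With exactly 15 frames, this yields a single batch of 3 sequences with 5 frames each, matching the description above.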
@szalpal Thanks for your response and your willingness to help 🙏. I think I might be confused about the meaning of This is my pipeline code for reference:
and my Triton Configuration file:
The following table shows the configurations that work and that give an error:
The concurrency of perf analyzer is set to 1 for all trials. So, in the 3rd row, although max_batch_size is set to 2, the case of receiving 2 requests in parallel never happens, so the input batch size is always 1, yet the error still occurs. If my understanding is correct, does that mean there is currently no way to do dynamic batching and process multiple videos at the same time as a single batch in a Triton + DALI setup? |
@Alwahsh , Thank you for providing these details. I'll look into the table you provided and verify whether this is unwanted behaviour or not. If I understand correctly, you want to decode only the first 16 frames from every video file you have? My remark in this case is that it won't be easy. Unfortunately, at the moment in DALI we assume that the whole video file needs to be processed, regardless of the type of video processing operator used. How important would this approach be for you? We may be able to add a feature that allows for such behaviour; I'd need to consult the team about this.
Partially correct :)
Correct. |
@szalpal Thanks for the clarification and continued help.
Yes, that's what I want to do. Please note though that I get high performance if I set The functionality is indeed important to me because it affects the performance. The most useful thing would be the ability to parse specific frames, not necessarily from the beginning (a common sampling approach in DNN applications), but for now, since I'm only measuring performance, I'm fine with simulating the number of decoded frames as if they were the first X frames rather than the actual ones. |
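As an illustration of the difference between taking the first X frames and sampling specific frames across a video, here is a small plain-Python sketch that picks evenly spaced frame indices. The function name `uniform_frame_indices` and the choice of segment centres are assumptions for illustration; this is not a DALI feature and says nothing about what the operators support.

```python
def uniform_frame_indices(num_frames, num_samples):
    """Pick num_samples frame indices spread evenly over [0, num_frames)."""
    if num_samples <= 0 or num_frames <= 0:
        return []
    if num_samples >= num_frames:
        # Fewer frames than requested samples: use every frame once.
        return list(range(num_frames))
    # Take the centre of each of num_samples equal-width segments.
    return [(2 * i + 1) * num_frames // (2 * num_samples)
            for i in range(num_samples)]


if __name__ == "__main__":
    first_16 = list(range(16))                   # "first X frames" case
    sampled_16 = uniform_frame_indices(160, 16)  # spread over the video
    print(first_16)
    print(sampled_16)
```

For a pure throughput measurement both choices decode the same number of frames, which is why simulating with the first X frames can be an acceptable stand-in.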
With DALI, outside of Triton, I would call fn.readers.video with a list of video filepaths. However, is that still feasible within Triton? All the examples I see here pass numpy arrays to ExternalSource. Was there an example with video decoding available that I missed? My current use case isn't streaming video, so I have videos 'at rest'.