Convert Surface to CuPy mat #405

Renzzauw · 2022-10-27T08:10:36Z

Renzzauw
Oct 27, 2022

I'm researching into CuPy's capabilities and how it might operate together with VPF. With VPF there are direct ways to convert a Surface to a PyTorch Tensor on the GPU with minimal costs (using the PyTorch extension). According to the CuPy docs, both PyTorch and CuPy support __cuda_array_interface__, so they can also be converted to each other with minimal cost.
To my limited understanding, does this mean that technically a VPF Surface can directly be converted to a CuPy mat? If so, do you know a way how to do this or if this is possible with VPF?

Thanks in advance!

Answered by royinx

Sep 2, 2023

Hi @Renzzauw, I have drafted a sample code SampleCupy.py for communicating cupy and VPF,
This may be helpful to you to encode the video, Please report if there are any issues and bugs.

View full answer

RomanArzumanyan · 2022-10-27T09:31:07Z

RomanArzumanyan
Oct 27, 2022

Hi @Renzzauw

@gedoensmax is now PIC for everything VPF and I'm continuing to support it as a community member. Max is a CUDA expert so this question will go to him anyway ;)

WRT the CuPy interop. I'm not 100% sure if I understand you correctly but this is a sample which shows the pytorch tensor <> pycuda interop. Is it same for CuPy?

VideoProcessingFramework/SampleTensorRTResnet.py

Lines 1064 to 1147 in 2b69dc5

    
           class PyTorchTensorHolder(pycuda.driver.PointerHolderBase): 
        
               def __init__(self, tensor): 
        
                   super(PyTorchTensorHolder, self).__init__() 
        
                   self.tensor = tensor 
        
               def get_pointer(self): 
        
                   return self.tensor.data_ptr() 
        
           class HostDeviceMem(): 
        
               def __init__(self, host_mem, device_mem): 
        
                   self.host = host_mem 
        
                   self.device = device_mem 
        
               def __str__(self): 
        
                   return "Host:\n" + str(self.host) + "\nDevice:\n" + str(self.device) 
        
               def __repr__(self): 
        
                   return self.__str__() 
        
           class TensorRTContext: 
        
               def __init__(self, trt_nn_file: str, gpu_id: int) -> None: 
        
                   self.device = cuda.Device(gpu_id) 
        
                   self.cuda_context = self.device.retain_primary_context() 
        
                   self.push_cuda_ctx() 
        
                   self.stream = cuda.Stream() 
        
                   self.logger = trt.Logger(trt.Logger.INFO) 
        
                   self.runtime = trt.Runtime(self.logger) 
        
                   f = open(trt_nn_file, 'rb') 
        
                   self.engine = self.runtime.deserialize_cuda_engine(f.read()) 
        
                   self.inputs, self.outputs, self.bindings = self.allocate_buffer() 
        
                   self.context = self.engine.create_execution_context() 
        
               def __del__(self) -> None: 
        
                   self.pop_cuda_ctx() 
        
               def push_cuda_ctx(self) -> None: 
        
                   self.cuda_context.push() 
        
               def pop_cuda_ctx(self) -> None: 
        
                   self.cuda_context.pop() 
        
               def allocate_buffer(self): 
        
                   bindings = [] 
        
                   inputs = [] 
        
                   outputs = [] 
        
                   for binding in self.engine: 
        
                       size = trt.volume(self.engine.get_binding_shape( 
        
                           binding)) * self.engine.max_batch_size 
        
                       dtype = trt.nptype(self.engine.get_binding_dtype(binding)) 
        
                       # Allocate host and device buffers 
        
                       host_mem = cuda.pagelocked_empty(size, dtype) 
        
                       device_mem = cuda.mem_alloc(host_mem.nbytes) 
        
                       # Append the device buffer to device bindings. 
        
                       bindings.append(int(device_mem)) 
        
                       if self.engine.binding_is_input(binding): 
        
                           inputs.append(HostDeviceMem(host_mem, device_mem)) 
        
                       else: 
        
                           outputs.append(HostDeviceMem(host_mem, device_mem)) 
        
                   return inputs, outputs, bindings 
        
               def run_inference(self, tensor_image) -> str: 
        
                   # Copy from PyTorch tensor to plain CUDA memory 
        
                   cuda.memcpy_dtod(self.bindings[0], PyTorchTensorHolder(tensor_image), 
        
                                    tensor_image.nelement() * tensor_image.element_size()) 
        
                   # Run inference 
        
                   self.context.execute_async_v2( 
        
                       bindings=self.bindings, stream_handle=self.stream.handle) 
        
                   # Copy outputs from GPU 
        
                   for out in self.outputs: 
        
                       cuda.memcpy_dtoh_async(out.host, out.device, self.stream) 
        
                   # Find most probable image type and return resnet categoy description 
        
                   [result] = [out.host for out in self.outputs] 
        
                   return resnet_categories[np.argmax(result)]

0 replies

Renzzauw · 2022-10-28T12:41:07Z

Renzzauw
Oct 28, 2022
Author

Hi @RomanArzumanyan thanks for the update and your support in the past!

I unfortunately do not know exactly if something similar is possible with CuPy. I did find support for low-level CUDA operations in their docs on this page, where they provide functions/classes for interacting with CUDA memory. It also appears there are some objects related to memory management, e.g. cupy.cuda.MemoryPointer or cupy.cuda.UnownedMemory, but I don't know how I can use these to access my VPF Surface's contents.

I tried accessing the memory location of the Surface as follows:

vpf_mem = rgb_planar.PlanePtr().GpuMem()

mem = cp.cuda.UnownedMemory(vpf_mem, width * height * 3, None, 0)
mem_ptr = cp.cuda.MemoryPointer(mem, 0)
frame_cp = cp.ndarray(shape=(3, height, width), dtype=np.float32, memptr=mem_ptr)

I did not have any luck so far to convert this to a CuPy mat/ndarray or if I'm doing this the right way at all, so I was hoping perhaps you or @gedoensmax would be more familiar with CuPy. If not, I'll try at the CuPy issue board. I'd like to prevent having to convert Surface <-> PyTorch <-> CuPy and instead convert Surface <-> CuPy. Luckily these steps are not very costly, but it would be nice if I could skip the extra step :)

0 replies

gedoensmax · 2022-10-28T15:58:02Z

gedoensmax
Oct 28, 2022
Maintainer

Hey i can definetly take a look, but will be on vacation the next 2 weeks.
Have you tried simply using vpf_mem = rgb_planar.PlanePtr() as pointer ? Because that should be supported if i understand this correctly: cupy/cupy#4644
Sorry for not being able to investigate further right now but i will get back to it if the problem persists

0 replies

Renzzauw · 2022-11-01T10:12:29Z

Renzzauw
Nov 1, 2022
Author

@gedoensmax Thanks for your reply!
It seems the CuPy functions expect an integer, so that's why I did vpf_mem = rgb_planar.PlanePtr().GpuMem().
No worries, enjoy your vacation! If I manage to find the solution in the meantime, I'll close this issue with what I found!

0 replies

sandhawalia · 2022-11-09T12:40:29Z

sandhawalia
Nov 9, 2022

Hi there, I have been using this, maybe it will help. rgb_frame is Surface of type nvc.PixelFormat.RGB.

def get_cupy_array(rgb_frame):

    plane = rgb_frame.PlanePtr()

    # cuPy array zero copy non ownned
    H, W, pitch = (plane.Height(), plane.Width(), plane.Pitch())

    cupy_mem = cupy.cuda.UnownedMemory(get_memptr(rgb_frame), H * W * 1, rgb_frame)
    cupy_memptr = cupy.cuda.MemoryPointer(cupy_mem, 0)

    cupy_frame = cupy.ndarray(
        (H, W // 3, 3), np.uint8, cupy_memptr, strides=(pitch, 3, 1)
    )

    return cupy_frame

0 replies

Renzzauw · 2022-11-09T13:04:47Z

Renzzauw
Nov 9, 2022
Author

@sandhawalia hi, thanks for your reply!

After a small test, your piece of code seems to do the trick on my end, so thank you so much!

I would like to encode the frames again to a new video using VPF after doing various CuPy operations on the decoded video frames. Do you perhaps know if it's also possible to convert back from CuPy -> Surface?

If so, I can completely remove PyTorch from my pipeline and get back a bit of claimed CUDA memory as a bonus :)

0 replies

sandhawalia · 2022-11-09T15:58:28Z

sandhawalia
Nov 9, 2022

Hey @Renzzauw. You can modify the cupy_frame like a regular ndarray the changes you'd make will be in the Suface as well, since they share the same underlying Memory and CuPy doesn't own it.

As long as you don't go outside the valid pixel - When done you can simply re-encode the Surface (rgb_frame)

0 replies

Renzzauw · 2022-11-09T16:11:12Z

Renzzauw
Nov 9, 2022
Author

Hi @sandhawalia, that makes a lot of sense, thank you for clarifying. Really happy to hear I can remove my dependency on PyTorch via this way.

I'll be closing this issue as for now my questions have been answered!

0 replies

Renzzauw · 2022-11-11T14:49:33Z

Renzzauw
Nov 11, 2022
Author

Hi @sandhawalia,

Thanks for the help so far! I had a small question regarding this topic, since my knowledge is a bit limited about the way CUDA surfaces exactly work or are structured, I hope you are able to help me out with perhaps a simple issue.

For my use-case I am doing various operations on each frame of a video, which also includes down-scaling a video 1920x1080 -> 1280x720 and encoding to a 1280x720 video file. I'm aware of the PySurfaceResizer, but I have per-frame logic how I want to crop frames, so it's a bit more technical compared to simply downscaling a video frame, so I cannot apply it in my use-case.

Since this output ndarray does not have the same size as the input surface, I think this means I have to create a new VPF Surface with this size and then in some way copy the data in formatting the VPF surface understands.
I do not know exactly how to do this last step from my new CuPy ndarray to a VPF Surface. I took some inspiration from the sample code for the PyTorch Plugin and tried creating a new Surface with the size of this ndarray, but I do not exactly know how to get the ndarray data into this Surface. Do you perhaps know how this can be done?

Thanks in advance!

See my code example below:

[...]
# Decode NV12 surface
src_surface = video_decoder.DecodeSingleSurface()
if src_surface.Empty():
    break

# Convert input Surface
rgb_surf = nv12_to_rgb.run(src_surface)

if rgb_surf.Empty():
    raise RuntimeError("Could not convert Surface from NV12 to RGB")

plane = rgb_surf.PlanePtr()
H, W, pitch = (plane.Height(), plane.Width(), plane.Pitch())
cupy_mem = cp.cuda.UnownedMemory(rgb_surf.PlanePtr().GpuMem(), H * W * 1, rgb_surf)
cupy_memptr = cp.cuda.MemoryPointer(cupy_mem, 0)

cupy_frame = cp.ndarray(
    (H, W // 3, 3), np.uint8, cupy_memptr, strides=(pitch, 3, 1)
)

# Crop image to 1280x720
cupy_frame_out = cupy_frame[:720, :1280, :]

# Create output surface
surface_out = nvc.Surface.Make(nvc.PixelFormat.RGB, cupy_frame.shape[1], cupy_frame.shape[0], 0)

[?]

# Convert back to NV12
nv12 = rgb_to_nv12.run(surface_out)

[encoding logic]

0 replies

sandhawalia · 2022-11-13T16:40:42Z

sandhawalia
Nov 13, 2022

Hey there.

Yes this is possible within VPF. In fact you don't need to take the CuPy route at all. VPF exposes a convenient callback for cropping Surfaces.

Here's code to shed light on this.

import PyNvCodec as nvc
# you'd replace surface with frame you read from video file 
# rgb/rgb_planar/nv12 etc
surface = nvc.Surface.Make(nvc.PixelFormat.RGB, 1920, 1080, 0)
# call `.Crop` instead of cupy_frame[:720, :1280, :]
cropped_surface = surface.Crop(0, 0, 1280, 720, 0)
cropped_surface.Width()
1280
cropped_surface.Height()
720

here's the docstring to help with arguments.

import inspect
print(inspect.getdoc(surface.Crop))
Crop(*args, **kwargs)
Overloaded function.

1. Crop(self: PyNvCodec.Surface, x: int, y: int, w: int, h: int, gpu_id: int) -> PyNvCodec.Surface


        Crop = select ROI + CUDA mem alloc + CUDA mem copy

        :param x: ROI top left X coordinate
        :param y: ROI top left Y coordinate
        :param w: ROI width in pixels
        :param h: ROI height in pixels
        :param gpu_id: GPU to use


2. Crop(self: PyNvCodec.Surface, x: int, y: int, w: int, h: int, context: int, stream: int) -> PyNvCodec.Surface


        Crop = select ROI + CUDA mem alloc + CUDA mem copy

        :param x: ROI top left X coordinate
        :param y: ROI top left Y coordinate
        :param w: ROI width in pixels
        :param h: ROI height in pixels
        :param context: CUDA contet to use
        :param stream: CUDA stream to use

0 replies

Renzzauw · 2022-12-01T15:58:34Z

Renzzauw
Dec 1, 2022
Author

Hi @sandhawalia,

Hope I don't bother with another question regarding CuPy.

I'm currently in a situation where I don't decode a Surface from a video, but combine various assets (.png layers) myself without VPF and create a CuPy array out of these, so I don't have an existing Surface on the GPU I can manipulate as in the examples above. I would like to write the result of my CuPy operations to a video file using VPF.

This means I probably would have to create a Surface object myself and somehow reference my CuPy array's memory location. Is this possible with VPF? If so, how can I achieve this or what steps should I take to transform it into a format that can be read as a Surface (I don't fully understand how strides/pitch work, so I guess the array data should have a specific layout in memory as well).

Thanks!

0 replies

sandhawalia · 2022-12-06T19:13:24Z

sandhawalia
Dec 6, 2022

Hi there. Interesting use case. If you don't mind multiple copies of your assets i'd do the following. Let's says your assets in CuPy is of the shape NHWC. calling it assets_arr_cupy

Create a N x Surfaces nvc.Surface.Make(format, width, height, gpu_id)
Use the function above to get a CuPy array for each Suface and stack them call the resulting array vpf_surfaces_cupy
copy data over cupy.copyto(vpf_surfaces_cupy, assets_arr_cupy)

Another option would be read your assets into the vpf_surfaces_cupy directly and then encode the video with VPF. Hope this helps.

0 replies

Renzzauw · 2022-12-08T09:44:26Z

Renzzauw
Dec 8, 2022
Author

@sandhawalia Thanks for the extensive explanation, this is really useful!

0 replies

gedoensmax · 2022-12-08T09:48:39Z

gedoensmax
Dec 8, 2022
Maintainer

I feel like this is helpful for more people and i will transition this issue to a discussion !

0 replies

Renzzauw · 2022-12-21T13:27:26Z

Renzzauw
Dec 21, 2022
Author

Hi all!

I've been trying various experiments in altering VPF surfaces using CuPy and encoding the result, but I seem to run into issues at the encoding step.

As a simple experiment setup, I create a Surface used for encoding, to which I copy the pixel values and then convert to NV12 and encode. In my case here, I simply set all pixel values to [125, 125, 125]. When I set breakpoints after this step, I can see in the video_frame_enc the values have indeed been copied to the CuPy ndarray referencing the Surface, suggesting altering of the pixel values has succeeded.

Converting from RGB -> NV12 seems to succeed as well without any errors. Upon running video_encoder.EncodeSingleSurface(nv12, enc_frame), it raises the following exception:
RuntimeError: Error while encoding frame

Does anybody have any clue what could be the potential issue here? Thanks!

        # Initialize video decoder
        video_decoder = nvc.PyNvDecoder(video_path, gpu_device)
        video_metadata = FFProbe.get_file_metadata(video_path)
        fps = video_metadata['video_stream']['fps_exact']
        width = video_decoder.Width()
        height = video_decoder.Height()
        total_frames = video_decoder.Numframes()

        # Initialize video encoder
        output_width, output_height = output_resolution
        enc_args = {
            'preset': 'default',
            'codec': 'h264',
            's': f"{str(width)}x{str(height)}",
            'tuning_info': 'high_quality',
            'bitrate': '7.5M',
            'profile': 'high',
            'bf': '1',
            'fps': fps
        }

        video_encoder = nvc.PyNvEncoder(enc_args,
                                        gpu_id=gpu_device,
                                        format=nvc.PixelFormat.NV12)

        # Surface pixel format conversion NV12 -> planar RGB
        nv12_to_rgb = PixelFormatConverterVpf.nv12_to_rgb(width, height, gpu_device)

        # Surface pixel format conversion planar RGB -> NV12
        rgb_to_nv12 = PixelFormatConverterVpf.rgb_to_nv12(output_width, output_height, gpu_device)

        # Encoded video frame
        enc_frame = np.ndarray(shape=(0), dtype=np.uint8)
        
        # CUPY REFERENCE TO VPF SURFACE USED FOR ENCODING
        enc_surf = nvc.Surface.Make(nvc.PixelFormat.RGB, width, height, gpu_device)
        plane = enc_surf.PlanePtr()
        height, width, pitch = enc_surf.PlanePtr().Height(), enc_surf.PlanePtr().Width(), enc_surf.PlanePtr().Pitch()
        enc_mem_ptr = cp.cuda.MemoryPointer(cp.cuda.UnownedMemory(enc_surf.PlanePtr().GpuMem(), height * width * 1, enc_surf), 0)
        video_frame_enc = cp.ndarray((height, width // 3, 3), np.uint8, enc_mem_ptr, strides=(pitch, 3, 1))


        with open(export_path, "wb") as output_file:
            f = 0

            while True:
                # ========== DECODING ==========

                # Decode NV12 surface
                src_surface = video_decoder.DecodeSingleSurface()
                if src_surface.Empty():
                    break

                # Convert input Surface to RGB
                rgb = nv12_to_rgb.run(src_surface)

                if rgb.Empty():
                    raise RuntimeError("Could not convert Surface from NV12 to RGB")

                plane = rgb.PlanePtr()
                height, width, pitch = plane.Height(), plane.Width(), plane.Pitch()
                mem = cp.cuda.UnownedMemory(rgb.PlanePtr().GpuMem(), height * width * 1, rgb)
                mem_ptr = cp.cuda.MemoryPointer(mem, 0)
                # Copy data to new array as CuPy does not own the Surface data
                video_frame = cp.ndarray((height, width // 3, 3), np.uint8, mem_ptr, strides=(pitch, 3, 1))

                # HERE IS WHERE I TRY TO ALTER THE SURFACE PIXEL VALUES 
                video_frame_enc[...] = 125
                # cp.copyto(video_frame_enc, video_frame)

                # ========== ENCODING ==========
                # Convert PyTorch tensor to surface
                # surface_rgb = VideoFrameDataTypeConversion.tensor_to_surface(frame, gpu_device)

                # Convert back to NV12 for encoding
                nv12 = rgb_to_nv12.run(enc_surf)

                if nv12.Empty():
                    raise RuntimeError("Could not convert Surface from planar RGB to NV12")

                # Encode surface
                success = video_encoder.EncodeSingleSurface(nv12, enc_frame) <-------- RAISES ERROR "RuntimeError: Error while encoding frame"
                if success:
                    byte_array = bytearray(enc_frame)
                    output_file.write(byte_array)

                f += 1
                pbar.update(1)

            # Flush any frames that have been encoded, but not received yet.
            while True:
                success = video_encoder.FlushSinglePacket(enc_frame)
                if success:
                    byte_array = bytearray(enc_frame)
                    output_file.write(byte_array)
                else:
                    break

2 replies

royinx Sep 2, 2023

Hi @Renzzauw, I have drafted a sample code SampleCupy.py for communicating cupy and VPF,
This may be helpful to you to encode the video, Please report if there are any issues and bugs.

Answer selected by Renzzauw

Renzzauw Sep 4, 2023
Author

Hi @royinx, thank you for notifying me! I was still looking for a CuPy solution for encoding my video but failed to get it working myself, so this is perfect! I already gave it a go and it seems to work as expected so far. If I run into any issues I will let you know.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Convert Surface to CuPy mat #405

{{title}}

Replies: 15 comments 2 replies

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

Convert Surface to CuPy mat #405

Renzzauw Oct 27, 2022

Replies: 15 comments · 2 replies

RomanArzumanyan Oct 27, 2022

Renzzauw Oct 28, 2022 Author

gedoensmax Oct 28, 2022 Maintainer

Renzzauw Nov 1, 2022 Author

sandhawalia Nov 9, 2022

Renzzauw Nov 9, 2022 Author

sandhawalia Nov 9, 2022

Renzzauw Nov 9, 2022 Author

Renzzauw Nov 11, 2022 Author

sandhawalia Nov 13, 2022

Renzzauw Dec 1, 2022 Author

sandhawalia Dec 6, 2022

Renzzauw Dec 8, 2022 Author

gedoensmax Dec 8, 2022 Maintainer

Renzzauw Dec 21, 2022 Author

royinx Sep 2, 2023

Renzzauw Sep 4, 2023 Author

Renzzauw
Oct 27, 2022

Replies: 15 comments 2 replies

RomanArzumanyan
Oct 27, 2022

Renzzauw
Oct 28, 2022
Author

gedoensmax
Oct 28, 2022
Maintainer

Renzzauw
Nov 1, 2022
Author

sandhawalia
Nov 9, 2022

Renzzauw
Nov 9, 2022
Author

sandhawalia
Nov 9, 2022

Renzzauw
Nov 9, 2022
Author

Renzzauw
Nov 11, 2022
Author

sandhawalia
Nov 13, 2022

Renzzauw
Dec 1, 2022
Author

sandhawalia
Dec 6, 2022

Renzzauw
Dec 8, 2022
Author

gedoensmax
Dec 8, 2022
Maintainer

Renzzauw
Dec 21, 2022
Author

Renzzauw Sep 4, 2023
Author