read_video error for slightly large videos when extracting S3D features. #90
Comments
I've tried using My video has ~13k frames, and I'm wondering if the problem is that the code loads all 13k frames into the CPU/GPU at once. I'm new to this field entirely, so please do let me know if I'm missing something.
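To get a feel for the numbers, here is a rough back-of-the-envelope sketch of how much memory a fully decoded video occupies when every frame is held as an 8-bit RGB array at once. The 1280x720 resolution is purely an assumption for illustration (the thread does not state the video's resolution), and `decoded_video_bytes` is a hypothetical helper, not part of the repository.

```python
def decoded_video_bytes(num_frames: int, height: int, width: int,
                        channels: int = 3, bytes_per_value: int = 1) -> int:
    """Size of a (T, H, W, C) uint8 frame stack held fully in memory."""
    return num_frames * height * width * channels * bytes_per_value


# ~13k frames as mentioned in the thread; 720p is an assumed resolution.
total = decoded_video_bytes(num_frames=13_000, height=720, width=1280)
print(f"{total / 2**30:.1f} GiB")  # prints "33.5 GiB"
```

The compressed file size on disk says little about this figure, since the decoded frames are stored uncompressed.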
I am quite sure that the issue is caused by a lack of RAM. You can confirm this by monitoring RAM usage while you run the script on your video. It works with OpenCV because of how the video is loaded: torchvision tries to read the whole video into RAM, whereas OpenCV reads frames one by one, features are extracted from chunks of frames, and each chunk is discarded afterwards. My suggestion is to split your long video into smaller pieces with ffmpeg. I admit that this difference between the readers is confusing and limits applications. However, it ensures that the feature extraction process matches the one that was used during training.
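As a minimal sketch of the ffmpeg suggestion above, the snippet below builds a command that cuts a video into fixed-length pieces using ffmpeg's segment muxer. The function name, file names, and the 60-second segment length are all hypothetical choices, not something the maintainers prescribe.

```python
import subprocess
from typing import List


def build_split_command(src: str, out_pattern: str, segment_seconds: int) -> List[str]:
    """Build an ffmpeg command that splits `src` into pieces without re-encoding."""
    return [
        "ffmpeg",
        "-i", src,
        "-c", "copy",                 # copy streams as-is (no re-encoding)
        "-map", "0",                  # keep all streams from the input
        "-f", "segment",              # use the segment muxer
        "-segment_time", str(segment_seconds),
        "-reset_timestamps", "1",     # each piece starts at timestamp 0
        out_pattern,                  # e.g. "piece_%03d.mp4"
    ]


# Usage (requires ffmpeg on PATH; file names are placeholders):
# cmd = build_split_command("long_video.mp4", "piece_%03d.mp4", 60)
# subprocess.run(cmd, check=True)
```

Because the streams are copied rather than re-encoded, cuts land on keyframes, so pieces may be somewhat longer or shorter than the requested length. Features computed near the boundaries of each piece may also differ from those computed on the uncut video.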
Can anyone share a success story of running this, especially the hardware configuration (probably RAM and GPU memory size)?
You can use the OpenCV video reader instead of the torchvision video reader; this seemed to fix the issue in my case.
import cv2
import numpy as np
import torch

# rgb_vid, audio, info = read_video(video_path, pts_unit='sec')
print("Video reading started")
cap = cv2.VideoCapture(video_path)
fps = cap.get(cv2.CAP_PROP_FPS)
rgb_stack = []
while cap.isOpened():
    frame_exists, rgb = cap.read()
    if frame_exists:
        # OpenCV decodes frames as BGR; convert to RGB
        rgb = cv2.cvtColor(rgb, cv2.COLOR_BGR2RGB)
        rgb_stack.append(rgb)
    else:
        # no more frames: release the capture and stop reading
        cap.release()
        break
# stack all frames into one (T, H, W, 3) uint8 tensor
rgb1 = torch.tensor(np.array(rgb_stack))
I am using it to extract features from the XD-Violence dataset, and I compared the numpy arrays (using |
OK, that's great to know. However, I think the suggested code won't work if you have thousands of frames, because it still collects every frame in memory before building the tensor. It needs to be updated to process the frames in chunks and to release each chunk after its features have been extracted, to free up memory. It should be similar to how it is done for I3D: video_features/models/i3d/extract_i3d.py, lines 116 to 122 in 896b852.
If |
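The chunked reading described in the comment above could look roughly like the sketch below. It is not the repository's code: `iter_chunks`, `read_frames_rgb`, and the chunk size of 64 are hypothetical, and whether non-overlapping chunks reproduce the features of the original S3D pipeline depends on how the extractor windows its frames, which is not verified here.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_chunks(frames: Iterable[T], chunk_size: int) -> Iterator[List[T]]:
    """Group a stream of frames into lists of at most `chunk_size` items."""
    buffer: List[T] = []
    for frame in frames:
        buffer.append(frame)
        if len(buffer) == chunk_size:
            yield buffer
            buffer = []  # start a fresh list so the old chunk can be freed
    if buffer:
        yield buffer     # last, possibly shorter, chunk


def read_frames_rgb(video_path: str):
    """Lazily yield RGB frames (H, W, 3) from a video using OpenCV."""
    import cv2  # imported here so iter_chunks works without OpenCV installed

    cap = cv2.VideoCapture(video_path)
    try:
        while True:
            ok, bgr = cap.read()
            if not ok:
                break
            yield cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
    finally:
        cap.release()


# Usage sketch (model call omitted; names are placeholders):
# import numpy as np, torch
# for chunk in iter_chunks(read_frames_rgb("video.mp4"), chunk_size=64):
#     batch = torch.from_numpy(np.stack(chunk))  # (T, H, W, 3), uint8
#     ...extract features for this chunk, then let `batch` go out of scope...
```

Only one chunk of frames is alive at a time, so peak memory scales with the chunk size rather than with the length of the video.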
I was trying to extract S3D features on a video (~51MB, 11 mins), and was getting an error at the very start of the extraction process, with the console message "Killed". This is occurring because in extract-S3D.py, we're using read_video from torchvision.io.video to process the video file. I tried to execute only this statement separately and faced the same issue. However, I was able to process a smaller video file (<1MB, ~5 secs), and feature extraction then proceeded without a hitch. The same goes for the samples provided in the repo. This issue is not present in the I3D feature extraction, probably because there you use the VideoCapture methods from OpenCV?
I'm trying to see if some other video reader works for this, but I am unsure if read_video applies any transforms before outputting the RGB torch array mentioned in the code. Can you suggest any workaround if this doesn't work?
The torchvision version in my environment is 0.12.0, and omegaconf is 2.1.1, as described.
EDIT: I've tried extracting the features for the video I had issue with on the S3D colab notebook, but the kernel crashes there as well.