About frames number #38

Open
ApolloRay opened this issue Dec 17, 2024 · 1 comment

Comments

@ApolloRay

The paper states: "For very high-resolution images, we limit the maximum number of grids to 49". In LongVA, each video frame is treated as an image of 336 * 336 * N. Does this mean the maximum number of frames is 49?

@jzhang38
Collaborator

The maximum value of N is 49. That is just a limit we use in the code; I do not know whether the training set actually contains images at such a high resolution. Normally the resolution of the OCR datasets is around 1K, and the rest is around 500P.
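
To make the arithmetic behind the limit concrete, here is a minimal sketch. It is not the LongVA implementation: the function name `count_grids`, the ceiling-division tiling, and the simple clamp are all assumptions made for illustration. Only the 336 grid size and the cap of 49 come from the thread.

```python
import math

GRID_SIZE = 336   # grid edge length in pixels, as quoted in the question
MAX_GRIDS = 49    # cap quoted from the paper (49 = 7 * 7)


def count_grids(width: int, height: int,
                grid_size: int = GRID_SIZE,
                max_grids: int = MAX_GRIDS) -> int:
    """Hypothetical helper: number of grid_size x grid_size tiles needed
    to cover an image, clamped to max_grids.

    Assumes partial tiles are rounded up (ceiling division). The real
    code may resize or pad differently.
    """
    cols = math.ceil(width / grid_size)
    rows = math.ceil(height / grid_size)
    return min(cols * rows, max_grids)


if __name__ == "__main__":
    for w, h in [(336, 336), (500, 500), (1024, 1024), (2352, 2352), (4000, 4000)]:
        print(f"{w}x{h}: N = {count_grids(w, h)}")
```

Under these assumptions, an image of roughly 1K resolution needs 16 grids and one of roughly 500 pixels needs 4, so the cap of 49 (a 7 x 7 tiling, or 2352 x 2352 pixels) would only be reached by much larger images.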
