[Question] Differences between forward and generate methods #1819

Open
victorcaquilpan opened this issue Jan 6, 2025 · 0 comments

Question

I have been struggling to understand the differences between these two methods. I hope someone can help clarify the following questions:

  1. According to the LLaVA documentation, forward is used for training and generate for inference. However, in some cases I have seen people use forward for validation. Is this correct?
  2. The documentation mentions that the generate method is for autoregressive generation. Does forward not generate autoregressively? If so, what is the practical difference between the two methods in this respect? In theory, if I take a fine-tuned model and run inference on an input prompt containing only the user question, can forward and generate give different results? If so, why?
  3. During training with the forward method, the input contains both the question and the answer. Does the model use part of the answer when predicting tokens, or does it use the answer only to compute the loss?
  4. In my case, I am trying to use the hidden states of the last layer as input to a subsequent process. However, I have noticed that even when forward and generate produce the same output, their hidden states are not necessarily similar. Is that expected?
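
To make questions 2 and 3 concrete, the two call patterns can be sketched with a toy model in pure Python. This is only an illustration under stated assumptions, not LLaVA's implementation: the scoring rule and the names VOCAB, forward, generate and teacher_forced_loss are hypothetical, and a real generate also handles caching, sampling and stopping criteria.

```python
import math

VOCAB = 5  # toy vocabulary: token ids 0..4 (hypothetical)


def forward(tokens):
    """One pass over the whole sequence.

    Returns one row of logits per position. Row i is the prediction for
    token i+1 and depends only on tokens[0..i], mimicking a causal mask.
    """
    logits = []
    for i in range(len(tokens)):
        # Hypothetical scoring rule standing in for a trained network.
        target = (sum(tokens[: i + 1]) + 1) % VOCAB
        logits.append([5.0 if v == target else 0.0 for v in range(VOCAB)])
    return logits


def generate(prompt, max_new_tokens):
    """Greedy autoregressive decoding: call forward once per new token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        last = forward(tokens)[-1]  # only the last position is used
        tokens.append(max(range(VOCAB), key=lambda v: last[v]))
    return tokens


def teacher_forced_loss(prompt, answer):
    """Mean cross-entropy over the answer positions only.

    The answer tokens are fed in as input (teacher forcing), but the
    prediction for answer[j] only sees the prompt and answer[:j].
    """
    tokens = list(prompt) + list(answer)
    logits = forward(tokens)
    total = 0.0
    for j, label in enumerate(answer):
        row = logits[len(prompt) - 1 + j]  # position that predicts answer[j]
        log_z = math.log(sum(math.exp(x) for x in row))
        total += log_z - row[label]
    return total / len(answer)


prompt = [1, 2]
generated = generate(prompt, 3)
loss_on_model_output = teacher_forced_loss(prompt, generated[len(prompt):])
loss_on_other_answer = teacher_forced_loss(prompt, [0, 3, 1])
```

In this sketch, generate feeds each predicted token back in, while a single forward over prompt plus answer scores every answer position at once, each conditioned only on the tokens before it. That would suggest the answer tokens act both as context for later positions and as labels for the loss, and that forward on a question-only prompt yields just one next-token prediction, though whether this matches LLaVA exactly is what the questions above ask.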

Thanks
