[Question] Differences between forward and generate methods #1819

victorcaquilpan · 2025-01-06T02:01:26Z

I have been struggling to understand the differences between these two methods. I hope someone can help clarify the following questions:

According to the LLaVA documentation, forward is used for training and generate for inference. However, in some cases I have seen people use forward for validation. Is this right?
The documentation mentions that the generate method is for autoregressive generation. Does forward not follow autoregressive generation? If so, what is the practical difference between the two methods in this sense? In theory, if I take a fine-tuned model and run inference on an input prompt containing just the user question, can forward and generate give different results? Why?
During training with the forward method, the input contains both the question and the answer. Does the model use part of the answer to predict the tokens, or does it use the answer only to calculate the loss?
In my case, I am trying to use the hidden states of the last layer as input for a subsequent process. However, I have noticed that even when I get the same output from forward and generate, the hidden states are not necessarily similar. Is that right?
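
To make the second and third questions concrete, here is a toy sketch of how I currently understand the two methods. It is plain Python and does not use the LLaVA or transformers API. The functions `next_token_scores`, `forward`, and `generate` are hypothetical stand-ins that I made up for illustration. The assumption is that forward scores every position in one pass, with each position seeing only earlier tokens, while generate calls forward in a loop and appends one token at a time.

```python
# Toy stand-in for a causal language model. NOT the real LLaVA API;
# all names here are hypothetical and exist only for illustration.
VOCAB_SIZE = 5


def next_token_scores(prefix):
    """Made-up deterministic 'model': scores for the next token given a prefix."""
    s = sum(prefix) + len(prefix)
    return [(s * (t + 2) + t * t) % 7 for t in range(VOCAB_SIZE)]


def forward(tokens):
    """Single pass: one score vector per position.

    Position i only sees tokens[:i + 1] (this mimics a causal mask),
    and its scores are the prediction for position i + 1.
    """
    return [next_token_scores(tokens[: i + 1]) for i in range(len(tokens))]


def argmax(scores):
    """Index of the highest score (first one wins on ties)."""
    return max(range(len(scores)), key=scores.__getitem__)


def generate(prompt, max_new_tokens):
    """Greedy autoregressive loop built on top of forward."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        last_scores = forward(tokens)[-1]   # only the last position is used
        tokens.append(argmax(last_scores))  # feed the prediction back in
    return tokens


prompt = [1, 3]
generated = generate(prompt, max_new_tokens=4)

# "Teacher forcing": one forward pass over prompt + answer together.
teacher_forced = [argmax(s) for s in forward(generated)]

# The prediction at position i should match the actual token at i + 1
# for every generated token.
matches = teacher_forced[len(prompt) - 1 : -1] == generated[len(prompt):]
print(generated, matches)
```

In this toy, a single forward pass over the question plus the greedily generated answer reproduces the same per-position predictions as the loop, because each position never sees later tokens. My question is whether the same holds for the real model, and whether sampling or numerical differences are what make the outputs or hidden states diverge.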

Thanks
