A relatively simple question that I couldn't quite resolve by looking through the tech report...
During your pretraining (report section 3.1) or instruction tuning phases (report section 3.2), any time samples are "packed together," does your pipeline allow attention to cross document boundaries?
Right, we didn't add a mask there. The hope is to let the model figure out how to use `<eos>` or `<|endofchat|>` to "refresh" the context. @tanyuqian can correct me if I am wrong.
It is a bit unconventional to do it this way with instruction tuning, but packing saves us a lot of time. Intuitively, it kind of simulates the scenario where the user ends a prior conversation and starts a new one.
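For anyone trying to picture the difference being discussed: below is a minimal sketch of the two masking options for a packed sequence, using a boolean mask convention where `True` means attention is allowed. The function name `packed_causal_mask` and the `doc_lens` argument are illustrative only, not names from this repo's actual pipeline.

```python
import torch

def packed_causal_mask(doc_lens: list[int], cross_document: bool) -> torch.Tensor:
    """Build an attention mask for several documents packed into one sequence.

    doc_lens: lengths of the documents packed into the sequence.
    cross_document: if True, return a plain causal mask (attention may cross
        document boundaries, as in the packing described above); if False,
        restrict each token to its own document (block-diagonal causal mask).
    Returns a boolean (seq_len, seq_len) mask; True = attention allowed.
    """
    seq_len = sum(doc_lens)
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    if cross_document:
        return causal
    # doc_ids[i] = index of the document that token i belongs to
    doc_ids = torch.repeat_interleave(
        torch.arange(len(doc_lens)), torch.tensor(doc_lens)
    )
    same_doc = doc_ids.unsqueeze(0) == doc_ids.unsqueeze(1)
    return causal & same_doc

# Two documents of lengths 3 and 2 packed into one length-5 sequence.
print(packed_causal_mask([3, 2], cross_document=True).int())   # plain causal
print(packed_causal_mask([3, 2], cross_document=False).int())  # block-diagonal
```

The `cross_document=True` case corresponds to what's described in this thread: no extra mask is added, so later documents can attend to earlier ones, and the model is expected to learn to treat `<eos>` / `<|endofchat|>` as a context reset.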