Detail about BERT-based Training #1
Comments
@SivilTaram
@liu-nlper Thanks for your quick response! I will try again following your kind suggestions. Once it is resolved, I will report back with the experimental results.
After struggling for a few days, I finally have to admit that it is difficult to incorporate the official 12-layer Chinese BERT into the rewrite task (whether for the reproduced T-Ptr-Net, T-Ptr-λ, or even L-Ptr-λ). I have tried several approaches, as follows, but none of them showed an improvement over the non-BERT baseline:
I post the above results for reference. If any reader has successfully employed BERT (Google's 12-layer Chinese model) in this task, please feel free to contact me (qian dot liu at buaa.edu.cn), thanks :)
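For readers attempting the same combination, here is a minimal sketch of one common way to plug the 12-layer Chinese BERT in as the encoder of a pointer-style rewrite model, with a smaller learning rate for the pretrained encoder than for the randomly initialized decoder. This is only an illustration under assumed names and sizes, not one of the configurations tried above and not the repository's actual code; only `bert-base-chinese` is a real checkpoint name.

```python
import torch
from torch import nn
from transformers import BertModel

class BertEncoderRewriter(nn.Module):
    """Hypothetical wrapper: pretrained BERT encoder + randomly initialized decoder."""
    def __init__(self, bert_name="bert-base-chinese", hidden_size=768):
        super().__init__()
        self.encoder = BertModel.from_pretrained(bert_name)
        # Placeholder for the pointer/transformer decoder used by the rewrite model.
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model=hidden_size, nhead=8), num_layers=2
        )

    def encode(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        return out.last_hidden_state  # (batch, seq_len, hidden)

model = BertEncoderRewriter()
# Two parameter groups: a small learning rate for pretrained BERT,
# a larger one for the decoder trained from scratch.
optimizer = torch.optim.AdamW([
    {"params": model.encoder.parameters(), "lr": 2e-5},
    {"params": model.decoder.parameters(), "lr": 1e-4},
])
```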
I also tried using this BERT model to initialize the transformer layers, but it did not show improvements. The model is as follows:
However, I find that the BERT-based model performs better on another dev dataset.
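One possible reading of "initialize the transformer layers from BERT" is to copy the weights of the first few encoder layers of `bert-base-chinese` into a vanilla `nn.TransformerEncoder`. The sketch below makes that weight mapping explicit; it is an illustration of the idea under assumed layer counts, not the code used in this repository (note it does not copy the word/position embeddings).

```python
import torch
from torch import nn
from transformers import BertModel

bert = BertModel.from_pretrained("bert-base-chinese")
n_layers = 2  # assumption: use only the first two BERT layers

enc_layer = nn.TransformerEncoderLayer(
    d_model=768, nhead=12, dim_feedforward=3072, activation="gelu"
)
encoder = nn.TransformerEncoder(enc_layer, num_layers=n_layers)

with torch.no_grad():
    for i, layer in enumerate(encoder.layers):
        b = bert.encoder.layer[i]
        # Self-attention: PyTorch packs the Q, K, V projections into one matrix.
        layer.self_attn.in_proj_weight.copy_(torch.cat([
            b.attention.self.query.weight,
            b.attention.self.key.weight,
            b.attention.self.value.weight,
        ], dim=0))
        layer.self_attn.in_proj_bias.copy_(torch.cat([
            b.attention.self.query.bias,
            b.attention.self.key.bias,
            b.attention.self.value.bias,
        ], dim=0))
        layer.self_attn.out_proj.weight.copy_(b.attention.output.dense.weight)
        layer.self_attn.out_proj.bias.copy_(b.attention.output.dense.bias)
        # Feed-forward block.
        layer.linear1.weight.copy_(b.intermediate.dense.weight)
        layer.linear1.bias.copy_(b.intermediate.dense.bias)
        layer.linear2.weight.copy_(b.output.dense.weight)
        layer.linear2.bias.copy_(b.output.dense.bias)
        # Post-attention and post-FFN layer norms.
        layer.norm1.weight.copy_(b.attention.output.LayerNorm.weight)
        layer.norm1.bias.copy_(b.attention.output.LayerNorm.bias)
        layer.norm2.weight.copy_(b.output.LayerNorm.weight)
        layer.norm2.bias.copy_(b.output.LayerNorm.bias)
```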
Thanks for the great work on reproducing the T-Ptr-λ model! I have reproduced the non-BERT result following your kind instructions. However, when I tried to combine the model with the pretrained Chinese BERT model (the official Google bert-chinese-uncased), the model seems not to converge. Could you kindly provide more details about your BERT-based training for reference (e.g. learning_rate, warmup_steps, and training epochs)? Any suggestions are also very welcome. Thanks a lot, Qian.