
Pretraining dataset #6

Open

yoyoshikc opened this issue Apr 12, 2022 · 2 comments

yoyoshikc commented Apr 12, 2022

What dataset did you use for the Chinese pretraining of T5? Thanks!

Owner

SunnyGJing commented Apr 17, 2022

It's a carefully preprocessed Pegasus-style pseudo-summary corpus of nearly 30 GB. It hasn't been open-sourced yet~
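The term "Pegasus-style pseudo-summary" presumably refers to the gap-sentence generation (GSG) idea from the PEGASUS paper: the most "important" sentences of a document are removed from the input and become the target, so each document yields an (input, pseudo-summary) pair. The owner's actual pipeline is not public, so this is only a minimal sketch under stated assumptions: sentences are already split, importance is the ROUGE-1 F1 of each sentence against the rest of the document, tokens are single characters (a common choice for Chinese), and the names `gap_sentence_pair`, `gap_ratio=0.3`, and the `[MASK1]` placeholder are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def rouge1_f1(candidate: List[str], reference: List[str]) -> float:
    """Unigram ROUGE-1 F1 with clipped counts (multiset overlap)."""
    if not candidate or not reference:
        return 0.0
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def gap_sentence_pair(sentences: List[str], gap_ratio: float = 0.3,
                      mask_token: str = "[MASK1]") -> Tuple[str, str]:
    """Hypothetical GSG-style sketch: returns (model_input, pseudo_summary)."""
    if not sentences:
        raise ValueError("need at least one sentence")
    num_gaps = min(len(sentences), max(1, round(len(sentences) * gap_ratio)))
    scored = []
    for i, sent in enumerate(sentences):
        # Character-level tokens of the rest of the document (assumption for Chinese text).
        rest = [ch for j, other in enumerate(sentences) if j != i for ch in other]
        scored.append((rouge1_f1(list(sent), rest), i))
    # Highest score first; ties go to the earlier sentence.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    chosen = sorted(i for _, i in scored[:num_gaps])
    chosen_set = set(chosen)
    # Selected sentences are replaced by the mask token in the input...
    model_input = "".join(mask_token if i in chosen_set else s
                          for i, s in enumerate(sentences))
    # ...and concatenated, in document order, to form the pseudo-summary target.
    pseudo_summary = "".join(sentences[i] for i in chosen)
    return model_input, pseudo_summary
```

For example, `gap_sentence_pair(["abc", "abd", "xyz"])` selects one sentence; "abc" and "abd" tie on overlap with the rest, the earlier one wins, and the result is `("[MASK1]abdxyz", "abc")`. Scoring each sentence independently keeps the sketch short; a real pipeline might add deduplication, length filtering, or other cleaning steps of the kind the owner alludes to.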

520jefferson commented Oct 20, 2022

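> It's a carefully preprocessed Pegasus-style pseudo-summary corpus of nearly 30 GB. It hasn't been open-sourced yet~

Then is there a script for the masking preprocessing of the T5 language model?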
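The thread does not include the repository's masking script. For reference, the span-corruption objective described in the T5 paper can be sketched as below, assuming the text is already tokenized, about 15% of tokens are corrupted in spans of mean length 3 (the paper's defaults), and each dropped span is replaced by a sentinel named like `<extra_id_0>`, `<extra_id_1>`, ..., as in the public T5 SentencePiece vocabulary. The function names `span_corrupt` and `_random_segmentation` are hypothetical; this is not the author's script.

```python
import random
from typing import List, Tuple


def _random_segmentation(total: int, num_segments: int, rng: random.Random) -> List[int]:
    """Split `total` items into `num_segments` positive lengths that sum to `total`."""
    cuts = sorted(rng.sample(range(1, total), num_segments - 1))
    bounds = [0] + cuts + [total]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def span_corrupt(tokens: List[str], noise_density: float = 0.15,
                 mean_span_length: float = 3.0,
                 seed: int = 0) -> Tuple[List[str], List[str]]:
    """T5-style span corruption sketch: returns (inputs, targets)."""
    n = len(tokens)
    if n < 2:
        raise ValueError("need at least two tokens")
    rng = random.Random(seed)
    # Number of corrupted tokens, keeping at least one token uncorrupted.
    num_noise = min(max(1, round(n * noise_density)), n - 1)
    # Number of spans; every noise span and every kept span needs >= 1 token.
    num_spans = max(1, round(num_noise / mean_span_length))
    num_spans = min(num_spans, num_noise, n - num_noise)
    noise_lens = _random_segmentation(num_noise, num_spans, rng)
    keep_lens = _random_segmentation(n - num_noise, num_spans, rng)

    inputs, targets = [], []
    pos = 0
    # Alternate kept span, noise span (the sequence starts with kept tokens).
    for i, (keep_len, noise_len) in enumerate(zip(keep_lens, noise_lens)):
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[pos:pos + keep_len])
        pos += keep_len
        # The dropped span becomes one sentinel in the input...
        inputs.append(sentinel)
        # ...and "sentinel + dropped tokens" in the target.
        targets.append(sentinel)
        targets.extend(tokens[pos:pos + noise_len])
        pos += noise_len
    # A closing sentinel marks the end of the last target span.
    targets.append(f"<extra_id_{num_spans}>")
    return inputs, targets
```

With 100 tokens and the defaults, 15 tokens are removed in 5 spans, so the input holds 85 original tokens plus 5 sentinels and the target ends with `<extra_id_5>`. Drawing span lengths by random segmentation, rather than masking tokens independently, keeps the span count fixed, which in turn keeps target lengths predictable for batching. A production pipeline would typically work on token IDs and append an EOS token, which this sketch omits.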
