
Pretraining dataset #6

Open
yoyoshikc opened this issue Apr 12, 2022 · 2 comments

Comments

@yoyoshikc

What dataset did the author use for the Chinese pretraining of T5? Thanks!

@SunnyGJing
Owner

We used a carefully cleaned Pegasus-style pseudo-summary corpus, close to 30 GB. It has not been open-sourced yet.

@520jefferson

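> We used a carefully cleaned Pegasus-style pseudo-summary corpus, close to 30 GB. It has not been open-sourced yet.

In that case, is there a script for the masking preprocessing used by the T5 language model?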
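The thread does not include the repository's own masking script. For reference, here is a minimal sketch of T5-style span corruption as described in the original T5 paper: about 15% of tokens are dropped in spans with a mean length of 3. Each dropped span is replaced in the input by a sentinel, and the target lists each sentinel followed by the tokens it hid. The function names are hypothetical, and so is character-level tokenization of Chinese text. The `<extra_id_N>` sentinel naming follows the convention used by T5's SentencePiece vocabulary. This is a sketch under those assumptions, not the author's preprocessing.

```python
import random


def _random_segmentation(total, num_segments, rng):
    """Split `total` items into `num_segments` non-empty segments of random length."""
    # Pick num_segments - 1 distinct cut points in 1..total-1 (requires num_segments <= total).
    cuts = sorted(rng.sample(range(1, total), num_segments - 1))
    bounds = [0] + cuts + [total]
    return [bounds[i + 1] - bounds[i] for i in range(num_segments)]


def random_spans_noise_mask(length, noise_density=0.15, mean_span_length=3.0, rng=None):
    """Return a boolean mask (True = dropped) of alternating kept/dropped spans.

    Assumes length >= 2. The mask always starts with a kept span, as in T5.
    """
    rng = rng or random.Random()
    num_noise = min(max(int(round(length * noise_density)), 1), length - 1)
    num_nonnoise = length - num_noise
    num_spans = max(int(round(num_noise / mean_span_length)), 1)
    num_spans = min(num_spans, num_noise, num_nonnoise)
    noise_lens = _random_segmentation(num_noise, num_spans, rng)
    keep_lens = _random_segmentation(num_nonnoise, num_spans, rng)
    mask = []
    for keep, drop in zip(keep_lens, noise_lens):
        mask.extend([False] * keep)
        mask.extend([True] * drop)
    return mask


def span_corrupt(tokens, rng=None, sentinel="<extra_id_{}>"):
    """Build (inputs, targets) for T5-style span corruption (hypothetical helper)."""
    mask = random_spans_noise_mask(len(tokens), rng=rng)
    inputs, targets = [], []
    sentinel_id = 0
    prev_masked = False
    for tok, masked in zip(tokens, mask):
        if masked:
            if not prev_masked:
                # Start of a new dropped span: same sentinel in inputs and targets.
                s = sentinel.format(sentinel_id)
                sentinel_id += 1
                inputs.append(s)
                targets.append(s)
            targets.append(tok)
        else:
            inputs.append(tok)
        prev_masked = masked
    targets.append(sentinel.format(sentinel_id))  # closing sentinel ends the target
    return inputs, targets


if __name__ == "__main__":
    text = "用的是精处理后的伪摘要式语料"
    inp, tgt = span_corrupt(list(text), rng=random.Random(0))
    print("".join(inp))
    print("".join(tgt))
```

Because every dropped span starts with a fresh sentinel in both sequences, the original text can be rebuilt by substituting each input sentinel with the tokens that follow it in the target. A real pipeline would apply this to subword IDs rather than characters, and would append an EOS token.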
