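Hello, does the platform currently support running large training frameworks such as Megatron-LM and DeepSpeed on a local cluster?
We ran into two problems during configuration:
1. Megatron's multi-node launch uses a bash script, and some of its parameters differ on each node (for example, NODE_RANK). How can nodes assigned to the same job each use a different configuration?
2. The multi-node bash script needs the IP address of the master node. Nodes are assigned by the scheduler, so we do not know in advance which nodes will actually run the job. How can this setting be supported?
Is there an example of running Megatron-LM across multiple nodes that we could use as a reference?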
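To make the two questions concrete, here is a minimal sketch of the behaviour we are after. It is not based on anything the platform is known to provide. It assumes the scheduler exposes the hostnames allocated to the job as a comma-separated list in an environment variable, here called `JOB_HOSTS`, which is a hypothetical name. Every node sorts that list the same way, treats the first host as the master, and uses its own position in the list as NODE_RANK, so no per-node configuration file is needed. The helper names `resolve_rendezvous` and `launch_env` and the default port are also illustrative.

```python
import os
import socket


def resolve_rendezvous(hosts, my_host):
    """Return (master_addr, node_rank, nnodes) for this node.

    Every node sorts the same host list, so all nodes agree on the
    master (first host) and on each node's rank (its index).
    """
    ordered = sorted({h.strip() for h in hosts if h.strip()})
    if my_host not in ordered:
        raise ValueError(f"{my_host!r} is not in the allocated host list")
    return ordered[0], ordered.index(my_host), len(ordered)


def launch_env(hosts, my_host, master_port=29500):
    """Build the environment variables a multi-node launch script reads."""
    master_addr, node_rank, nnodes = resolve_rendezvous(hosts, my_host)
    return {
        "MASTER_ADDR": master_addr,
        "MASTER_PORT": str(master_port),  # assumed port, adjust as needed
        "NNODES": str(nnodes),
        "NODE_RANK": str(node_rank),
    }


if __name__ == "__main__":
    # JOB_HOSTS is a hypothetical variable; fall back to a single-node run.
    this_host = socket.gethostname()
    hosts = os.environ.get("JOB_HOSTS", this_host).split(",")
    for key, value in launch_env(hosts, this_host).items():
        print(f"export {key}={value}")
```

Each node could evaluate the printed `export` lines before calling the shared launch script, so the same script runs unchanged everywhere. This only works if the hostnames in the list match what `socket.gethostname()` returns on each node, and if the hostname of the first node is resolvable from the others.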