DeepSeek V3 Support #760

Open
casper-hansen opened this issue Dec 26, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

Contributor

casper-hansen commented Dec 26, 2024

@tianyu-l Support for DeepSeek-V3 would be excellent, given the model's top-tier performance.

Main parallelism components:

  • 64-way expert parallelism (EP)
  • 16-way pipeline parallelism (PP)
  • ZeRO-1 data parallelism (DP)
  • Note: they do not use tensor parallelism (TP).
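
To make the layout above concrete, here is a minimal sketch of how these degrees might compose on a cluster. The `ParallelLayout` class is hypothetical, the 2048-GPU world size is an illustrative assumption not stated in this issue, and the rule that EP groups are carved out of the DP ranks is also an assumption rather than a confirmed detail of the DeepSeek-V3 setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelLayout:
    """Hypothetical container for the parallelism degrees listed in the issue."""

    world_size: int            # total GPUs (assumed value, not from the issue)
    pipeline_parallel: int     # PP degree: 16 in the issue
    expert_parallel: int       # EP degree: 64 in the issue
    tensor_parallel: int = 1   # the issue notes that TP is not applied

    @property
    def data_parallel(self) -> int:
        """DP degree left over once PP and TP are accounted for."""
        model_parallel = self.pipeline_parallel * self.tensor_parallel
        if self.world_size % model_parallel != 0:
            raise ValueError("world_size must be divisible by PP * TP")
        return self.world_size // model_parallel

    def validate(self) -> None:
        # Assumption: expert-parallel groups are formed from the DP ranks
        # of a pipeline stage, so the EP degree must divide the DP degree.
        if self.data_parallel % self.expert_parallel != 0:
            raise ValueError("EP degree must divide the DP degree")


layout = ParallelLayout(world_size=2048, pipeline_parallel=16, expert_parallel=64)
layout.validate()
print(layout.data_parallel)  # 2048 / 16 = 128
```

Under these assumptions each pipeline stage spans 128 data-parallel ranks, which would hold two 64-way expert-parallel groups.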

Other main modeling components:

  • multi-head latent attention (MLA)
  • multi-token prediction with their MTP modules
  • mixed-precision training (mix of FP8, BF16, FP32)
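
As a rough illustration of why MLA matters for a port, the sketch below compares the per-token, per-layer KV cache size of standard multi-head attention with that of a latent-compressed cache. The function names are hypothetical, and the dimensions (128 heads of size 128, a 512-wide KV latent, a 64-wide decoupled RoPE key) are assumptions used for illustration; check them against the paper and the released config before relying on them.

```python
def mha_cache_per_token(n_heads: int, head_dim: int) -> int:
    """Elements cached per token per layer by standard MHA (full K and V)."""
    return 2 * n_heads * head_dim


def mla_cache_per_token(kv_latent_dim: int, rope_dim: int) -> int:
    """Elements cached per token per layer by MLA.

    Only the compressed KV latent and one shared decoupled RoPE key
    are stored; per-head K and V are re-projected from the latent.
    """
    return kv_latent_dim + rope_dim


# Assumed dimensions, for illustration only.
mha = mha_cache_per_token(n_heads=128, head_dim=128)
mla = mla_cache_per_token(kv_latent_dim=512, rope_dim=64)
print(mha, mla, round(mha / mla, 1))  # 32768 576 56.9
```

With these assumed sizes the latent cache is roughly 57 times smaller per token, which is the kind of saving an implementation would need to preserve.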

Model weights: https://huggingface.co/deepseek-ai/DeepSeek-V3
Paper link: https://github.com/deepseek-ai/DeepSeek-V3/blob/main/DeepSeek_V3.pdf

Performance: [image]

tianyu-l added the enhancement label Dec 27, 2024