Incorporate runtime into model configuration #285
Conversation
src/constants.h (Outdated)
constexpr char kPyTorchBackend[] = "pytorch";
constexpr char kPythonFilename[] = "model.py";
constexpr char kPythonBackend[] = "python";
constexpr char kVLLMBackend[] = "vllm";
Do we need these for any new Python-based backend? It would be good to avoid that if possible; we shouldn't need to update the constants if someone adds a backend in the future.
We do not need them. constexpr char kVLLMBackend[] = "vllm"; is removed, and such backends are handled as custom backends.
Very clean implementation 🚀
Please make sure to run a relatively full pipeline that tests some custom backends, L0_backend_python, PYTORCH:TEST, etc. before merging, not just the standard L0.
Co-authored-by: Ryan McCormick <[email protected]>
Related PRs:
Previously, the runtime for each backend was determined automatically when loading the backend. This PR adds the ability for the user to choose which runtime to use for each model in the model configuration. For example:
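A rough sketch of what an explicit runtime selection could look like in a model's config.pbtxt (the model name and the exact runtime value here are illustrative assumptions based on this description, not copied from the PR):

```
name: "my_pytorch_model"
backend: "pytorch"
# Assumed: explicitly requesting the Python-backend-based runtime
# instead of letting autocomplete decide.
runtime: "model.py"
```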
To retain backward compatibility, the logic previously used to determine the runtime is incorporated into autocomplete, which will attempt to fill in the "runtime" field if it is left empty. The core will load the exact runtime specified in the "runtime" field of the model configuration, after any applicable autocomplete.
As part of the change, the autocomplete logic will fill in the Python-backend-based runtime for backends that are clearly Python-backend based, for example, the vLLM backend.
If a backend provides both a C++ and a Python-backend-based runtime, for example, the PyTorch backend, autocomplete will look at the default model filename in the model configuration to determine whether the C++ or the Python-backend-based runtime is more appropriate. For example, if the default model filename for the PyTorch backend is "model.pt", the C++ runtime will be selected; if it is "model.py", the Python-backend-based runtime will be selected. In any case, autocomplete will not alter the runtime if it is explicitly provided by the user.
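To illustrate the filename-based heuristic described above, here are two minimal config.pbtxt fragments (a sketch of the expected autocomplete outcome under the assumptions in this description, not the PR's verified behavior):

```
# Assumed: the default model filename ends in ".pt", so autocomplete
# would select the C++ runtime for the PyTorch backend.
backend: "pytorch"
default_model_filename: "model.pt"
```

```
# Assumed: the default model filename is "model.py", so autocomplete
# would fill in the Python-backend-based runtime instead.
backend: "pytorch"
default_model_filename: "model.py"
```

In both cases, an explicit "runtime" value set by the user would take precedence and autocomplete would leave it untouched.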