Document the thread count options (#126)
* Document the thread count options

* Format fix

* Apply suggestions from code review

Co-authored-by: Jacky <[email protected]>

---------

Co-authored-by: Jacky <[email protected]>
tanmayv25 and kthui authored Apr 24, 2024
1 parent c50d65b commit 5c97507
Showing 2 changed files with 45 additions and 4 deletions.
41 changes: 41 additions & 0 deletions README.md
@@ -176,6 +176,47 @@ key: "ENABLE_CACHE_CLEANING"
}
```

* `INTER_OP_THREAD_COUNT`:

PyTorch allows using multiple CPU threads during TorchScript model inference.
One or more inference threads execute a model's forward pass on the given
inputs. Each inference thread invokes a JIT interpreter that executes the ops
of a model inline, one by one. This parameter sets the size of the inter-op
thread pool from which these inference threads are drawn. The default value is
the number of CPU cores. Please refer to
[this](https://pytorch.org/docs/stable/notes/cpu_threading_torchscript_inference.html)
document for guidance on setting this parameter properly.

The section of the model config file specifying this parameter will look like:

```
parameters: {
key: "INTER_OP_THREAD_COUNT"
value: {
string_value:"1"
}
}
```
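Conceptually, inter-op parallelism lets independent operations in the forward pass run concurrently on threads drawn from a pool of this size. A minimal stdlib-only Python sketch of the idea (illustrative only; this is not the backend's implementation, and `op_a`/`op_b` are made-up ops):

```python
from concurrent.futures import ThreadPoolExecutor

# Two independent "ops" that could run concurrently in a forward pass.
def op_a(x):
    return [v * 2 for v in x]

def op_b(x):
    return [v + 1 for v in x]

INTER_OP_THREAD_COUNT = 2  # pool size, as the parameter above would set

with ThreadPoolExecutor(max_workers=INTER_OP_THREAD_COUNT) as pool:
    fa = pool.submit(op_a, [1, 2, 3])   # scheduled on one pool thread
    fb = pool.submit(op_b, [1, 2, 3])   # may run concurrently on another
    result = fa.result() + fb.result()  # join before any dependent op

print(result)  # [2, 4, 6, 2, 3, 4]
```

With a pool size of 1, the two ops would run back-to-back on the same thread instead.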

* `INTRA_OP_THREAD_COUNT`:

In addition to inter-op parallelism, PyTorch can also utilize multiple threads
within an op (intra-op parallelism). This can be useful in many cases, including
element-wise ops on large tensors, convolutions, GEMMs, embedding lookups, and
others. The default value is the number of CPU cores. Please refer to
[this](https://pytorch.org/docs/stable/notes/cpu_threading_torchscript_inference.html)
document for guidance on setting this parameter properly.

The section of the model config file specifying this parameter will look like:

```
parameters: {
key: "INTRA_OP_THREAD_COUNT"
value: {
string_value:"1"
}
}
```
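By contrast, intra-op parallelism splits the work of a single op across several threads. A stdlib-only Python sketch of the idea, chunking one element-wise op across a worker pool (illustrative only; `parallel_double` is a made-up helper, not a PyTorch API):

```python
from concurrent.futures import ThreadPoolExecutor

INTRA_OP_THREAD_COUNT = 4  # threads cooperating on a single op

def parallel_double(data, num_threads):
    """One element-wise op, split into chunks handled by worker threads."""
    chunk = (len(data) + num_threads - 1) // num_threads
    pieces = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # map() preserves chunk order, so results stitch back correctly.
        done = pool.map(lambda p: [v * 2 for v in p], pieces)
    return [v for piece in done for v in piece]

out = parallel_double(list(range(8)), INTRA_OP_THREAD_COUNT)
print(out)  # [0, 2, 4, 6, 8, 10, 12, 14]
```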

* Additional Optimizations: Three additional boolean parameters are available to disable
certain Torch optimizations that can sometimes cause latency regressions in models with
complex execution modes and dynamic shapes. If not specified, all are enabled by default.
8 changes: 4 additions & 4 deletions src/libtorch.cc
@@ -476,8 +476,8 @@ ModelState::ParseParameters()
```
   // is made to 'intra_op_thread_count', which by default will take all
   // threads
   int intra_op_thread_count = -1;
-  err = ParseParameter(
-      params, "INTRA_OP_THREAD_COUNT", &intra_op_thread_count);
+  err =
+      ParseParameter(params, "INTRA_OP_THREAD_COUNT", &intra_op_thread_count);
   if (err != nullptr) {
     if (TRITONSERVER_ErrorCode(err) != TRITONSERVER_ERROR_NOT_FOUND) {
       return err;
```
@@ -500,8 +500,8 @@ ModelState::ParseParameters()
```
   // is made to 'inter_op_thread_count', which by default will take all
   // threads
   int inter_op_thread_count = -1;
-  err = ParseParameter(
-      params, "INTER_OP_THREAD_COUNT", &inter_op_thread_count);
+  err =
+      ParseParameter(params, "INTER_OP_THREAD_COUNT", &inter_op_thread_count);
   if (err != nullptr) {
     if (TRITONSERVER_ErrorCode(err) != TRITONSERVER_ERROR_NOT_FOUND) {
       return err;
```
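The defaulting behavior the comments above describe (a `-1` sentinel meaning "take all threads") can be sketched in Python as follows; `effective_thread_count` is a hypothetical stand-in, not the backend's actual `ParseParameter` and error-handling logic:

```python
import os

def effective_thread_count(params, key):
    """Resolve a thread-count parameter from a model-config dict (sketch)."""
    # Only a positive value from the model config overrides the default;
    # the -1 sentinel leaves PyTorch at its default of all CPU threads.
    count = int(params.get(key, -1))
    if count > 0:
        return count
    return os.cpu_count()

n = effective_thread_count(
    {"INTRA_OP_THREAD_COUNT": "1"}, "INTRA_OP_THREAD_COUNT")
print(n)  # 1
```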
