From d548ab27d453599f78f1637db69382e5c24f3c85 Mon Sep 17 00:00:00 2001
From: tanmayv25
Date: Mon, 15 Apr 2024 15:31:05 -0700
Subject: [PATCH 1/3] Document the thread count options

---
 README.md | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/README.md b/README.md
index b82c774..76cd303 100644
--- a/README.md
+++ b/README.md
@@ -176,6 +176,47 @@ key: "ENABLE_CACHE_CLEANING"
 }
 ```
 
+* `INTER_OP_THREAD_COUNT`:
+
+PyTorch allows using multiple CPU threads during TorchScript model inference.
+One or more inference threads execute a model’s forward pass on the given
+inputs. Each inference thread invokes a JIT interpreter that executes the ops
+of a model inline, one by one. This parameter sets the size of this thread
+pool. The default value of this setting is the number of CPU cores. Please refer
+to [this](https://pytorch.org/docs/stable/notes/cpu_threading_torchscript_inference.html)
+document for learning how to set this parameter properly.
+
+The section of the model config file specifying this parameter will look like:
+
+```
+parameters: {
+key: "INTER_OP_THREAD_COUNT"
+    value: {
+    string_value:"1"
+    }
+}
+```
+
+* `INTRA_OP_THREAD_COUNT`:
+
+In addition to the inter-op parallelism, PyTorch can also utilize multiple threads
+within the ops (intra-op parallelism). This can be useful in many cases, including
+element-wise ops on large tensors, convolutions, GEMMs, embedding lookups and
+others. The default value for this setting is the number of CPU cores. Please refer
+to [this](https://pytorch.org/docs/stable/notes/cpu_threading_torchscript_inference.html)
+document for learning how to set this parameter properly.
+
+The section of the model config file specifying this parameter will look like:
+
+```
+parameters: {
+key: "INTRA_OP_THREAD_COUNT"
+    value: {
+    string_value:"1"
+    }
+}
+```
+
 * Additional Optimizations: Three additional boolean parameters are available to disable
 certain Torch optimizations that can sometimes cause latency regressions in models with
 complex execution modes and dynamic shapes. If not specified, all are enabled by default.
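Taken together, the two new parameters land in the same `parameters` section of a model's `config.pbtxt`. A minimal sketch combining them, assuming a TorchScript model served through this backend on CPU (the thread counts shown are illustrative, not recommendations):

```
backend: "pytorch"

parameters: {
key: "INTER_OP_THREAD_COUNT"
    value: {
    string_value:"2"
    }
}
parameters: {
key: "INTRA_OP_THREAD_COUNT"
    value: {
    string_value:"4"
    }
}
```

Pinning both pools to small explicit values is a common starting point when several model instances share a host, since leaving both at their defaults (one thread per CPU core in each pool) can oversubscribe the CPUs.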
From 603d6887777c01f14a5de27941582142d9679e68 Mon Sep 17 00:00:00 2001
From: tanmayv25
Date: Thu, 18 Apr 2024 14:36:28 -0700
Subject: [PATCH 2/3] Format fix

---
 src/libtorch.cc | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/libtorch.cc b/src/libtorch.cc
index 8809206..c6d0b5a 100644
--- a/src/libtorch.cc
+++ b/src/libtorch.cc
@@ -476,8 +476,8 @@ ModelState::ParseParameters()
     // is made to 'intra_op_thread_count', which by default will take all
     // threads
     int intra_op_thread_count = -1;
-    err = ParseParameter(
-        params, "INTRA_OP_THREAD_COUNT", &intra_op_thread_count);
+    err =
+        ParseParameter(params, "INTRA_OP_THREAD_COUNT", &intra_op_thread_count);
     if (err != nullptr) {
       if (TRITONSERVER_ErrorCode(err) != TRITONSERVER_ERROR_NOT_FOUND) {
         return err;
@@ -500,8 +500,8 @@ ModelState::ParseParameters()
     // is made to 'inter_op_thread_count', which by default will take all
     // threads
     int inter_op_thread_count = -1;
-    err = ParseParameter(
-        params, "INTER_OP_THREAD_COUNT", &inter_op_thread_count);
+    err =
+        ParseParameter(params, "INTER_OP_THREAD_COUNT", &inter_op_thread_count);
     if (err != nullptr) {
       if (TRITONSERVER_ErrorCode(err) != TRITONSERVER_ERROR_NOT_FOUND) {
         return err;

From bb18868830ef8fcd7ab3394fc4d65ade4aae3025 Mon Sep 17 00:00:00 2001
From: Tanmay Verma
Date: Thu, 18 Apr 2024 16:33:59 -0700
Subject: [PATCH 3/3] Apply suggestions from code review

Co-authored-by: Jacky <18255193+kthui@users.noreply.github.com>
---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 76cd303..106eb13 100644
--- a/README.md
+++ b/README.md
@@ -184,7 +184,7 @@ inputs. Each inference thread invokes a JIT interpreter that executes the ops
 of a model inline, one by one. This parameter sets the size of this thread
 pool. The default value of this setting is the number of CPU cores. Please refer
 to [this](https://pytorch.org/docs/stable/notes/cpu_threading_torchscript_inference.html)
-document for learning how to set this parameter properly.
+document on how to set this parameter properly.
 
 The section of the model config file specifying this parameter will look like:
 
@@ -204,7 +204,7 @@ within the ops (intra-op parallelism). This can be useful in many cases, includi
 element-wise ops on large tensors, convolutions, GEMMs, embedding lookups and
 others. The default value for this setting is the number of CPU cores. Please refer
 to [this](https://pytorch.org/docs/stable/notes/cpu_threading_torchscript_inference.html)
-document for learning how to set this parameter properly.
+document on how to set this parameter properly.
 
 The section of the model config file specifying this parameter will look like:
 
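For context on where the parsed values end up: LibTorch exposes the two thread pools through `at::set_num_threads` and `at::set_num_interop_threads`, declared in `ATen/Parallel.h`. A standalone C++ sketch of that mapping, with illustrative values rather than the backend's exact control flow:

```cpp
#include <ATen/Parallel.h>

#include <iostream>

int main()
{
  // Illustrative stand-ins for parsed INTRA_OP_THREAD_COUNT and
  // INTER_OP_THREAD_COUNT values; -1 models "parameter not set",
  // which keeps the PyTorch default of one thread per CPU core.
  int intra_op_thread_count = 4;
  int inter_op_thread_count = 2;

  if (intra_op_thread_count > 0) {
    // Sizes the pool used for parallelism inside individual ops
    // (GEMMs, convolutions, element-wise kernels, ...).
    at::set_num_threads(intra_op_thread_count);
  }
  if (inter_op_thread_count > 0) {
    // Sizes the pool that runs independent ops concurrently. PyTorch
    // allows this to be set only once, before any inter-op work starts.
    at::set_num_interop_threads(inter_op_thread_count);
  }

  // Print the resulting settings; the C++ counterpart of
  // torch.__config__.parallel_info() in Python.
  std::cout << at::get_parallel_info() << std::endl;
  return 0;
}
```

Because both setters mutate process-global state, values configured for one model would apply to every model instance loaded in the same backend process.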