ARSN-381 RPC command system between cluster workers #2202

jonathan-gramain · 2023-12-29T17:21:00Z

When using the cluster module, new processes are forked and are dispatched workloads, usually HTTP requests. The ClusterRPC module implements a RPC system to send commands to all cluster worker processes at once from any particular worker, and retrieve their individual command results, like a distributed map operation.

The existing cluster IPC channel is setup from the primary to each worker, but not between workers, so there has to be a hop by the primary.

How a command is treated:

a worker sends a command message to the primary
the primary then forwards that command to each existing worker (including the requestor)
each worker then executes the command and returns a result or an error
the primary gathers all workers results into an array
finally, the primary dispatches the results array to the original requesting worker callback

The original use of this feature is in Metadata DBD (bucketd) to implement a global cache refresh across worker processes.

bert-e · 2023-12-29T17:21:04Z

Hello jonathan-gramain,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Status report is not available.

bert-e · 2023-12-29T17:21:15Z

Request integration branches

Waiting for integration branch creation to be requested by the user.

To request integration branches, please comment on this pull request with the following command:

/create_integration_branches

Alternatively, the /approve and /create_pull_requests commands will automatically
create the integration branches.

tests/functional/clustering/ClusterRPC-test-server.js

lib/clustering/ClusterRPC.ts

rachedbenmustapha · 2024-01-11T19:08:42Z

lib/clustering/ClusterRPC.ts

+        uids,
+        payload,
+    };
+    return process.send?.(message);


If process.send is undefined no worker will receive the call, so it might be worth failing early at init if the IPC channel is not available.

rachedbenmustapha · 2024-01-11T19:12:03Z

lib/clustering/ClusterRPC.ts

+ * - each worker then executes the command and returns a result or an
+ *   error
+ *
+ * - the primary gathers all workers results into an array


I don't see a way to set a deadline or timeout. If one worker is stuck or dies for any reason, the command will stay in-flight and never return - leaving the requester to either hang or trigger its own timeout, and uidsToWorkerId would keep growing over time.

This might be acceptable, in which case a mention of that behavior in the jsdoc would be good.

I think it is a good idea indeed to add a worker timeout for the reason you mention (maybe configurable via an extra argument to sendWorkerCommand).

rachedbenmustapha · 2024-01-11T19:17:36Z

lib/clustering/ClusterRPC.ts

+
+// exported types
+
+export type ResultObject = {


I don't remember if Promises are available within the target environment (I'd think so), but they would help with a nicer API even if async/await is not available: getting rid of nested callbacks and making the ResultObject wrapper type superfluous (even possibly the callback types but that may be going too far).

Yes we could use promises. Actually Metadata can use promises and async/await even though most of the code is legacy, there are small pieces of code that do use them.

It's not clear to me why ResultObject would not be needed anymore though, but I can look at it I may figure it out :)

Ah I think I understand, replacing the "error" field by a rejected promise, and the "results" by a resolved one

Yes exactly!

rachedbenmustapha · 2024-01-11T19:35:41Z

lib/clustering/ClusterRPC.ts

+ * - the primary then forwards that command to each existing worker
+ *   (including the requestor)
+ *
+ * - each worker then executes the command and returns a result or an


Kind of obvious but a mention might be warranted of the type of messages that are allowed in/out - since they have to be serialized, any circular references or very big buffers would cause issues.

rachedbenmustapha · 2024-01-11T19:42:27Z

lib/clustering/ClusterRPC.ts

+            return undefined;
+        }
+        // send back response to original worker
+        _dispatchCommandResultsToWorker(toWorker, uids, completeCommandResultsArray);


It may be a good idea to include a result slot for any new workers spawned after the command was first dispatched, that didn't get to execute the command.

I understand that it's not required for the initial purpose of just a cache flush, but since it's a more "generic" module, future users of this code might be interested to know that some processes that didn't get to run the command are currently servicing requests. Or, just a mention in the jsdoc.

OK, I can mention this fact in the docs. To check this I think we would have to check the cluster.workers array again before responding, or maybe by watching events when a new worker is spawned. Since it's not needed for now I think a comment will be fine.

tests/functional/clustering/ClusterRPC-test-server.js

When using the cluster module, new processes are forked and are dispatched workloads, usually HTTP requests. The ClusterRPC module implements a RPC system to send commands to all cluster worker processes at once from any particular worker, and retrieve their individual command results, like a distributed map operation. The existing cluster IPC channel is setup from the primary to each worker, but not between workers, so there has to be a hop by the primary. How a command is treated: - a worker sends a command message to the primary - the primary then forwards that command to each existing worker (including the requestor) - each worker then executes the command and returns a result or an error - the primary gathers all workers results into an array - finally, the primary dispatches the results array to the original requesting worker callback The original use of this feature is in Metadata DBD (bucketd) to implement a global cache refresh across worker processes.

jonathan-gramain · 2024-01-12T00:32:37Z

/approve

bert-e · 2024-01-12T00:32:47Z

Conflict

A conflict has been raised during the creation of
integration branch w/8.1/improvement/ARSN-381-cluster-rpc-helpers with contents from improvement/ARSN-381-cluster-rpc-helpers
and development/8.1.

I have not created the integration branch.

Here are the steps to resolve this conflict:

 $ git fetch
 $ git checkout -B w/8.1/improvement/ARSN-381-cluster-rpc-helpers origin/development/8.1
 $ git merge origin/improvement/ARSN-381-cluster-rpc-helpers
 $ # <intense conflict resolution>
 $ git commit
 $ git push -u origin w/8.1/improvement/ARSN-381-cluster-rpc-helpers

The following options are set: approve

bert-e · 2024-01-12T00:42:30Z

In the queue

The changeset has received all authorizations and has been added to the
relevant queue(s). The queue(s) will be merged in the target development
branch(es) as soon as builds have passed.

The changeset will be merged in:

✔️ development/7.70
✔️ development/8.1

The following branches will NOT be impacted:

development/6.4
development/7.10
development/7.4

There is no action required on your side. You will be notified here once
the changeset has been merged. In the unlikely event that the changeset
fails permanently on the queue, a member of the admin team will
contact you to help resolve the matter.

IMPORTANT

Please do not attempt to modify this pull request.

Any commit you add on the source branch will trigger a new cycle after the
current queue is merged.
Any commit you add on one of the integration branches will be lost.

If you need this pull request to be removed from the queue, please contact a
member of the admin team now.

The following options are set: approve

bert-e · 2024-01-12T00:42:42Z

I have successfully merged the changeset of this pull request
into targetted development branches:

✔️ development/7.70
✔️ development/8.1

The following branches have NOT changed:

development/6.4
development/7.10
development/7.4

Please check the status of the associated issue ARSN-381.

Goodbye jonathan-gramain.

jonathan-gramain force-pushed the improvement/ARSN-381-cluster-rpc-helpers branch from 97150d8 to e5fa1b6 Compare December 29, 2023 22:17

github-advanced-security bot found potential problems Dec 29, 2023

View reviewed changes

tests/functional/clustering/ClusterRPC-test-server.js Fixed Show fixed Hide fixed

jonathan-gramain requested review from tmacro and rachedbenmustapha January 2, 2024 18:14

tmacro reviewed Jan 5, 2024

View reviewed changes

lib/clustering/ClusterRPC.ts Show resolved Hide resolved

tmacro approved these changes Jan 5, 2024

View reviewed changes

rachedbenmustapha reviewed Jan 11, 2024

View reviewed changes

jonathan-gramain force-pushed the improvement/ARSN-381-cluster-rpc-helpers branch from b713f0f to 7cd0518 Compare January 11, 2024 23:51

github-advanced-security bot found potential problems Jan 11, 2024

View reviewed changes

tests/functional/clustering/ClusterRPC-test-server.js Dismissed Show dismissed Hide dismissed

rachedbenmustapha approved these changes Jan 12, 2024

View reviewed changes

jonathan-gramain force-pushed the improvement/ARSN-381-cluster-rpc-helpers branch from 7cd0518 to 5c2bfce Compare January 12, 2024 00:16

jonathan-gramain added 2 commits January 11, 2024 16:26

ARSN-381 bump arsenal version

3b9c93b

jonathan-gramain force-pushed the improvement/ARSN-381-cluster-rpc-helpers branch from 5c2bfce to 3b9c93b Compare January 12, 2024 00:26

bert-e merged commit 3b9c93b into development/7.70 Jan 12, 2024
7 checks passed

bert-e deleted the improvement/ARSN-381-cluster-rpc-helpers branch January 12, 2024 00:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ARSN-381 RPC command system between cluster workers #2202

ARSN-381 RPC command system between cluster workers #2202

jonathan-gramain commented Dec 29, 2023

bert-e commented Dec 29, 2023

bert-e commented Dec 29, 2023

rachedbenmustapha Jan 11, 2024

rachedbenmustapha Jan 11, 2024

jonathan-gramain Jan 11, 2024

rachedbenmustapha Jan 11, 2024

jonathan-gramain Jan 11, 2024

jonathan-gramain Jan 11, 2024

rachedbenmustapha Jan 11, 2024

rachedbenmustapha Jan 11, 2024

rachedbenmustapha Jan 11, 2024

jonathan-gramain Jan 11, 2024

jonathan-gramain commented Jan 12, 2024

bert-e commented Jan 12, 2024

bert-e commented Jan 12, 2024

bert-e commented Jan 12, 2024

ARSN-381 RPC command system between cluster workers #2202

ARSN-381 RPC command system between cluster workers #2202

Conversation

jonathan-gramain commented Dec 29, 2023

bert-e commented Dec 29, 2023

Hello jonathan-gramain,

bert-e commented Dec 29, 2023

Request integration branches

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jonathan-gramain commented Jan 12, 2024

bert-e commented Jan 12, 2024

Conflict

bert-e commented Jan 12, 2024

In the queue

bert-e commented Jan 12, 2024