ARSN-381 RPC command system between cluster workers #2202
Conversation
Hello jonathan-gramain,

My role is to assist you with the merge of this pull request. Status report is not available.

Request integration branches

Waiting for integration branch creation to be requested by the user. To request integration branches, please comment on this pull request with the following command:

Alternatively, the …
force-pushed from 97150d8 to e5fa1b6
lib/clustering/ClusterRPC.ts (Outdated)
uids,
payload,
};
return process.send?.(message);
If `process.send` is undefined, no worker will receive the call, so it might be worth failing early at init if the IPC channel is not available.
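A minimal sketch of what failing early could look like, assuming a worker-side setup function; the name `setupRPCWorker` and its signature are illustrative, not necessarily the module's actual API:

```ts
import * as cluster from 'cluster';

// Illustrative worker-side init: refuse to set up the RPC system when
// there is no IPC channel, instead of silently dropping commands later
// because `process.send` is undefined.
export function setupRPCWorker(handlers: Record<string, unknown>): void {
    if (!cluster.isWorker || typeof process.send !== 'function') {
        throw new Error('ClusterRPC: no IPC channel available, this process ' +
                        'does not look like a cluster worker');
    }
    // ... register `handlers` and the process 'message' listener here ...
}
```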
* - each worker then executes the command and returns a result or an
*   error
*
* - the primary gathers all workers results into an array
I don't see a way to set a deadline or timeout. If one worker is stuck or dies for any reason, the command will stay in-flight and never return, leaving the requester to either hang or trigger its own timeout, and `uidsToWorkerId` would keep growing over time.

This might be acceptable, in which case a mention of that behavior in the jsdoc would be good.
I think it is a good idea indeed to add a worker timeout for the reason you mention (maybe configurable via an extra argument to `sendWorkerCommand`).
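A sketch of what that could look like, wrapping a callback-style `sendWorkerCommand`; the declared signature and the `timeoutMs` argument are assumptions for illustration:

```ts
// Assumed signature of the existing helper (illustration only).
declare function sendWorkerCommand(
    toWorkers: string,
    command: string,
    uids: string,
    payload: object,
    callback: (err: Error | null, results?: unknown[]) => void,
): void;

// Wrap the call with a configurable deadline so a stuck or dead worker
// cannot leave the command in-flight forever.
function sendWorkerCommandWithTimeout(
    toWorkers: string,
    command: string,
    uids: string,
    payload: object,
    timeoutMs: number,
    callback: (err: Error | null, results?: unknown[]) => void,
): void {
    let timedOut = false;
    const timer = setTimeout(() => {
        timedOut = true;
        // a real implementation would also drop the pending entry from
        // uidsToWorkerId so it does not grow forever
        callback(new Error(`no response to '${command}' after ${timeoutMs}ms`));
    }, timeoutMs);

    sendWorkerCommand(toWorkers, command, uids, payload, (err, results) => {
        if (timedOut) {
            return undefined; // late response, a timeout error was already returned
        }
        clearTimeout(timer);
        return callback(err, results);
    });
}
```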
// exported types

export type ResultObject = {
I don't remember if `Promise`s are available within the target environment (I'd think so), but they would help with a nicer API even if `async`/`await` is not available: getting rid of nested callbacks and making the `ResultObject` wrapper type superfluous (even possibly the callback types, but that may be going too far).
Yes, we could use promises. Actually Metadata can use promises and `async`/`await`: even though most of the code is legacy, there are small pieces of code that do use them.

It's not clear to me why `ResultObject` would not be needed anymore though, but I can look at it, I may figure it out :)
Ah, I think I understand: replacing the "error" field with a rejected promise, and the "results" with a resolved one.
Yes exactly!
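One way to read that, as a sketch: wrap the callback-style `sendWorkerCommand` (signature below is an assumption) in a promise, so the error travels as a rejection and the results array as a resolution, which is what makes a separate wrapper type unnecessary:

```ts
// Assumed signature of the existing helper (illustration only).
declare function sendWorkerCommand(
    toWorkers: string,
    command: string,
    uids: string,
    payload: object,
    callback: (err: Error | null, results?: unknown[]) => void,
): void;

// Promise wrapper: the error becomes a rejection and the results array
// a resolution, so no explicit { error, result } wrapper is required.
function sendWorkerCommandAsync(
    toWorkers: string,
    command: string,
    uids: string,
    payload: object,
): Promise<unknown[]> {
    return new Promise((resolve, reject) => {
        sendWorkerCommand(toWorkers, command, uids, payload, (err, results) => {
            if (err) {
                return reject(err);
            }
            return resolve(results ?? []);
        });
    });
}

// Usage with async/await (hypothetical command name):
// const results = await sendWorkerCommandAsync('*', 'flushCache', uids, {});
```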
* - the primary then forwards that command to each existing worker
*   (including the requestor)
*
* - each worker then executes the command and returns a result or an
Kind of obvious, but a mention might be warranted of the type of messages that are allowed in/out: since they have to be serialized, any circular references or very big buffers would cause issues.
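For instance, the cluster IPC channel serializes every message (JSON serialization by default), so only plain serializable data can cross it, and very large buffers get copied and serialized on every hop. A quick illustration:

```ts
// Plain data survives serialization fine.
const payload = { bucket: 'example-bucket', keys: ['a', 'b'] };
JSON.stringify(payload); // ok

// A circular structure cannot be serialized and would make the IPC send fail.
const circular: { self?: unknown } = {};
circular.self = circular;
try {
    JSON.stringify(circular);
} catch (err) {
    // TypeError: Converting circular structure to JSON
}
```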
return undefined;
}
// send back response to original worker
_dispatchCommandResultsToWorker(toWorker, uids, completeCommandResultsArray);
It may be a good idea to include a result slot for any new workers spawned after the command was first dispatched and that did not get to execute it.

I understand that it's not required for the initial purpose of just a cache flush, but since it's a more "generic" module, future users of this code might be interested to know that some processes that didn't get to run the command are currently servicing requests. Or, just a mention in the jsdoc.
OK, I can mention this fact in the docs. To check this, I think we would have to check the `cluster.workers` array again before responding, or maybe watch events when a new worker is spawned. Since it's not needed for now, I think a comment will be fine.
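A sketch of both options on the primary side; the set of targeted worker ids is assumed to be recorded at dispatch time:

```ts
import * as cluster from 'cluster';

// Option 1: before dispatching the results, compare the workers that
// were targeted at dispatch time with the workers that exist now.
function workersSpawnedAfterDispatch(targetedWorkerIds: Set<number>): number[] {
    const currentIds = Object.keys(cluster.workers ?? {}).map(Number);
    return currentIds.filter(id => !targetedWorkerIds.has(id));
}

// Option 2: watch the 'fork' event; any command dispatched before this
// point never reached the new worker.
cluster.on('fork', worker => {
    console.log(`worker ${worker.id} spawned after dispatch`);
});
```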
force-pushed from b713f0f to 7cd0518

force-pushed from 7cd0518 to 5c2bfce
force-pushed from 5c2bfce to 3b9c93b
/approve
Conflict

A conflict has been raised during the creation of integration branch w/8.1/improvement/ARSN-381-cluster-rpc-helpers. I have not created the integration branch.

Here are the steps to resolve this conflict:

$ git fetch
$ git checkout -B w/8.1/improvement/ARSN-381-cluster-rpc-helpers origin/development/8.1
$ git merge origin/improvement/ARSN-381-cluster-rpc-helpers
$ # <intense conflict resolution>
$ git commit
$ git push -u origin w/8.1/improvement/ARSN-381-cluster-rpc-helpers

The following options are set: approve
In the queue

The changeset has received all authorizations and has been added to the queue.

The changeset will be merged in:

The following branches will NOT be impacted:

There is no action required on your side. You will be notified here once the changeset has been merged.

IMPORTANT: Please do not attempt to modify this pull request.

If you need this pull request to be removed from the queue, please contact a …

The following options are set: approve
I have successfully merged the changeset of this pull request.

The following branches have NOT changed:

Please check the status of the associated issue ARSN-381.

Goodbye jonathan-gramain.
When using the cluster module, new worker processes are forked and workloads, usually HTTP requests, are dispatched to them. The ClusterRPC module implements an RPC system to send commands to all cluster worker processes at once from any particular worker, and to retrieve their individual command results, like a distributed map operation.
The existing cluster IPC channel is set up from the primary to each worker, but not between workers, so there has to be a hop through the primary.
How a command is treated:
- a worker sends a command message to the primary
- the primary then forwards that command to each existing worker (including the requestor)
- each worker then executes the command and returns a result or an error
- the primary gathers all workers' results into an array
- finally, the primary dispatches the results array to the original requesting worker's callback
The original use of this feature is in Metadata DBD (bucketd) to implement a global cache refresh across worker processes.
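A usage sketch of the flow described above; only `sendWorkerCommand` is named in the diff, so the setup helper names, signatures, and the `flushCache` command are assumptions for illustration:

```ts
import * as cluster from 'cluster';

// Assumed module surface (names and signatures are illustrative).
type WorkerHandler = (
    payload: object,
    uids: string,
    callback: (err: Error | null, result?: unknown) => void,
) => void;
declare function setupRPCPrimary(): void;
declare function setupRPCWorker(handlers: Record<string, WorkerHandler>): void;
declare function sendWorkerCommand(
    toWorkers: string,
    command: string,
    uids: string,
    payload: object,
    callback: (err: Error | null, results?: unknown[]) => void,
): void;

if (cluster.isPrimary) {
    // the primary only relays commands to workers and gathers their results
    setupRPCPrimary();
    for (let i = 0; i < 4; i++) {
        cluster.fork();
    }
} else {
    // each worker registers the commands it knows how to execute
    setupRPCWorker({
        flushCache: (payload, uids, callback) => {
            // ... clear the local cache here ...
            callback(null, { flushed: true });
        },
    });

    // any worker can then broadcast a command through the primary and
    // receive one result per worker, like a distributed map operation
    sendWorkerCommand('*', 'flushCache', 'some-uid', {}, (err, results) => {
        console.log('per-worker results', err, results);
    });
}
```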