feat: add register -> nodes in graph endpoint
This adds the registration command for the subsystem, ensuring
that the subsystem root is created (e.g., for IO) and the cluster
name (e.g., IO for cluster keebler) created off of that. We currently
allow any vertex to be created with an edge to itself (within the
same subsystem) OR to a reference in the dominant subsystem. We make
that reference bidirectional so if/when there is a delete command
we can parse the subsystem nodes (e.g., IO) to find the link to
the dominant subsystem node, and then clean it up from the other
side (leaving no dangling entries that no longer exist). Now that
this is added I can work on a prototype of intents (basically
a jobspec asking to request resources for this)

Signed-off-by: vsoch <[email protected]>
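
The bidirectional reference described in the commit message can be illustrated with a minimal sketch. The `Vertex` and `Graph` types, the method names, and the use of `cluster` as the dominant subsystem name are all assumptions made for illustration; the actual rainbow graph types may look quite different.

```go
package main

import "fmt"

// Vertex is a hypothetical graph vertex (not the real rainbow type).
type Vertex struct {
	ID        int
	Subsystem string
	Edges     map[int]struct{} // IDs of linked vertices
}

// Graph is a hypothetical in-memory graph keyed by vertex ID.
type Graph struct {
	Vertices map[int]*Vertex
	nextID   int
}

// NewGraph returns an empty graph.
func NewGraph() *Graph {
	return &Graph{Vertices: map[int]*Vertex{}}
}

// AddVertex creates a vertex in the named subsystem and returns it.
func (g *Graph) AddVertex(subsystem string) *Vertex {
	v := &Vertex{ID: g.nextID, Subsystem: subsystem, Edges: map[int]struct{}{}}
	g.Vertices[v.ID] = v
	g.nextID++
	return v
}

// Link records the edge on both vertices so either side can find the other.
func (g *Graph) Link(a, b int) error {
	va, okA := g.Vertices[a]
	vb, okB := g.Vertices[b]
	if !okA || !okB {
		return fmt.Errorf("cannot link %d and %d: vertex not found", a, b)
	}
	va.Edges[b] = struct{}{}
	vb.Edges[a] = struct{}{}
	return nil
}

// DeleteSubsystem removes every vertex in the subsystem. Because each link
// is stored on both sides, the matching back-references on the remaining
// vertices can be found and removed, leaving no dangling entries.
func (g *Graph) DeleteSubsystem(subsystem string) int {
	removed := 0
	for id, v := range g.Vertices {
		if v.Subsystem != subsystem {
			continue
		}
		for other := range v.Edges {
			if ov, ok := g.Vertices[other]; ok {
				delete(ov.Edges, id)
			}
		}
		delete(g.Vertices, id)
		removed++
	}
	return removed
}

func main() {
	g := NewGraph()
	node := g.AddVertex("cluster") // assumed name of the dominant subsystem
	nvme := g.AddVertex("io")
	if err := g.Link(nvme.ID, node.ID); err != nil {
		panic(err)
	}
	removed := g.DeleteSubsystem("io")
	fmt.Printf("removed %d io vertices, node now has %d edges\n", removed, len(node.Edges))
}
```

Storing the edge on both vertices costs a little extra memory, but it means a delete only needs to walk the subsystem's own vertices instead of scanning the whole dominant graph for stale references.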
vsoch committed Mar 9, 2024
1 parent 8cb60b6 commit b505e8a
Showing 24 changed files with 1,067 additions and 486 deletions.
6 changes: 5 additions & 1 deletion Makefile
@@ -85,7 +85,11 @@ stream: ## Runs the interface client

.PHONY: register
register: ## Run mock registration
go run cmd/rainbow/rainbow.go register --cluster-name keebler --cluster-nodes ./docs/examples/scheduler/cluster-nodes.json --config-path ./docs/examples/scheduler/rainbow-config.yaml --save
go run cmd/rainbow/rainbow.go register cluster --cluster-name keebler --nodes-json ./docs/examples/scheduler/cluster-nodes.json --config-path ./docs/examples/scheduler/rainbow-config.yaml --save

.PHONY: subsystem
subsystem: ## Run mock subsystem registration
go run cmd/rainbow/rainbow.go register subsystem --subsystem io --nodes-json ./docs/examples/scheduler/cluster-io-subsystem.json --config-path ./docs/examples/scheduler/rainbow-config.yaml

.PHONY: tag
tag: ## Creates release tag
1 change: 1 addition & 0 deletions README.md
@@ -22,6 +22,7 @@ For more information:
- clusters
- implement function to add a subsystem to an existing cluster (e.g., add I/O)
- subsystems
- do we need bidirectional links for the memory graph?
- a satisfies request will need to have a representation of subsystems. E.g., what are we asking of each?
- right now we assume a node resources request going to the dominant subsystem
- we will want a function to add a new subsystem, right now we have one dominant for nodes
8 changes: 5 additions & 3 deletions api/v1/rainbow.proto
@@ -12,14 +12,17 @@ service RainbowScheduler {
// Register cluster - request to register a new cluster
rpc Register(RegisterRequest) returns (RegisterResponse);

// Register subsystem - request to register a subsystem for an existing cluster
rpc RegisterSubsystem(RegisterRequest) returns (RegisterResponse);

// Job Submission - request for submitting a job to a named cluster
rpc SubmitJob(SubmitJobRequest) returns (SubmitJobResponse);

// Request Job - ask the rainbow scheduler for up to max jobs
rpc ReceiveJobs(ReceiveJobsRequest) returns (ReceiveJobsResponse);

// Accept Jobs - accept some number of jobs
rpc AcceptJobs(AcceptJobsRequest) returns (AcceptJobsResponse);
// Accept Jobs - accept some number of jobs
rpc AcceptJobs(AcceptJobsRequest) returns (AcceptJobsResponse);
}

// RegisterRequest registers a cluster to the scheduler service
@@ -47,7 +50,6 @@ message SubmitJobRequest {
string name = 1;
string token = 2;
}

}

// RequestJobsRequest is used by a cluster (or other entity that can run jobs)
57 changes: 40 additions & 17 deletions cmd/rainbow/rainbow.go
@@ -38,6 +38,7 @@ func main() {
registerCmd := parser.NewCommand("register", "Register a new cluster")
submitCmd := parser.NewCommand("submit", "Submit a job to a rainbow scheduler")
receiveCmd := parser.NewCommand("receive", "Receive and accept jobs")
registerClusterCmd := registerCmd.NewCommand("cluster", "Register a new cluster")

// Configuration
configCmd := parser.NewCommand("config", "Interact with rainbow configs")
@@ -54,11 +55,16 @@ func main() {
clusterSecret := receiveCmd.String("", "request-secret", &argparse.Options{Help: "Cluster 'secret' to retrieve jobs"})
maxJobs := receiveCmd.Int("j", "max-jobs", &argparse.Options{Help: "Maximum number of jobs to accept"})

// Register
secret := registerCmd.String("s", "secret", &argparse.Options{Default: defaultSecret, Help: "Registration 'secret'"})
clusterNodes := registerCmd.String("", "cluster-nodes", &argparse.Options{Help: "Cluster nodes json (JGF v2)"})
// Register Shared arguments
clusterNodes := registerCmd.String("", "nodes-json", &argparse.Options{Help: "Cluster nodes json (JGF v2)"})

// Cluster register arguments
secret := registerClusterCmd.String("s", "secret", &argparse.Options{Default: defaultSecret, Help: "Registration 'secret'"})
subsystem := registerCmd.String("", "subsystem", &argparse.Options{Help: "Subsystem to register cluster to (defaults to dominant, nodes)"})
saveSecret := registerCmd.Flag("", "save", &argparse.Options{Help: "Save cluster secret to config file, if provided"})
saveSecret := registerClusterCmd.Flag("", "save", &argparse.Options{Help: "Save cluster secret to config file, if provided"})

// Register subsystem (requires config file for authentication)
subsysCmd := registerCmd.NewCommand("subsystem", "Register a new subsystem")

// Submit (note that command for now needs to be in quotes to get the whole thing)
token := submitCmd.String("", "token", &argparse.Options{Default: defaultSecret, Help: "Client token to submit jobs with."})
@@ -82,20 +88,37 @@ func main() {
}

} else if registerCmd.Happened() {
err := register.Run(
*host,
*clusterName,
*clusterNodes,
*secret,
*saveSecret,
*cfg,
*graphDatabase,
*subsystem,
*selectionAlgorithm,
)
if err != nil {
log.Fatalf("Issue with register: %s\n", err)

if subsysCmd.Happened() {
err := register.RegisterSubsystem(
*host,
*clusterName,
*clusterNodes,
*subsystem,
*cfg,
)
if err != nil {
log.Fatalf("Issue with register subsystem: %s\n", err)
}
} else if registerClusterCmd.Happened() {
err := register.Run(
*host,
*clusterName,
*clusterNodes,
*secret,
*saveSecret,
*cfg,
*graphDatabase,
*subsystem,
*selectionAlgorithm,
)
if err != nil {
log.Fatalf("Issue with register: %s\n", err)
}
} else {
log.Fatal("Register requires a command.")
}

} else if receiveCmd.Happened() {
err := receive.Run(
*host,
53 changes: 53 additions & 0 deletions cmd/rainbow/register/subsystem.go
@@ -0,0 +1,53 @@
package register

import (
"context"
"fmt"
"log"

"github.com/converged-computing/rainbow/pkg/client"
"github.com/converged-computing/rainbow/pkg/config"
)

// RegisterSubsystem registers a subsystem
func RegisterSubsystem(
host,
clusterName,
subsystemNodes,
subsystem,
cfgFile string,
) error {

c, err := client.NewClient(host)
if err != nil {
return err
}

// A config file is required here
if cfgFile == "" {
return fmt.Errorf("an existing configuration file is required to register a subsystem")
}
if subsystem == "" {
return fmt.Errorf("a subsystem name is required to register")
}
// Read in the config, if provided, command line takes preference
cfg, err := config.NewRainbowClientConfig(cfgFile, "", "", "", "")
if err != nil {
return err
}

log.Printf("registering subsystem to cluster: %s", cfg.Scheduler.Name)

// Last argument is subsystem name, which we can derive from graph
response, err := c.RegisterSubsystem(
context.Background(),
cfg.Cluster.Name,
cfg.Cluster.Secret,
subsystemNodes,
subsystem,
)
if err != nil {
return err
}
// If we get here, success! Dump all the stuff.
log.Printf("%s", response)
return nil

}
69 changes: 64 additions & 5 deletions docs/commands.md
@@ -70,7 +70,7 @@ make register
If you ran this using the rainbow client you would do:

```bash
rainbow register --cluster-name keebler --cluster-nodes ./docs/examples/scheduler/cluster-nodes.json --config-path ./docs/examples/scheduler/rainbow-config.yaml --save
rainbow register cluster --cluster-name keebler --nodes-json ./docs/examples/scheduler/cluster-nodes.json --config-path ./docs/examples/scheduler/rainbow-config.yaml --save
```

Note that in the above we provide a config file path and `--save` so that our cluster secret gets saved there. Always be careful about overwriting an existing configuration file.
@@ -101,6 +101,7 @@ go run cmd/server/server.go --global-token rainbow
"keebler": {
"Name": "keebler",
"Counts": {
"cluster": 1,
"core": 36,
"node": 3,
"rack": 1,
@@ -363,14 +364,72 @@ We would expect edges for I/O to reference them as follows - in the example belo

Note that this is only partial JSON, and validation when adding a subsystem will ensure that:

- All nodes in the subsystem are linked to the dominant subsystem graph except for the root
- All nodes in the subsystem are linked to the dominant subsystem graph or another subsystem node.
- All edges defined for the subsystem exist in the graph.
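
The two checks listed above can be sketched roughly as follows. This is only an illustration under stated assumptions: `Edge`, `ValidateSubsystem`, and the use of plain string labels for vertices are hypothetical, and the real validation works on the JGF payload rather than on these simplified types.

```go
package main

import "fmt"

// Edge is a hypothetical edge between two vertex labels.
type Edge struct {
	Source string
	Target string
}

// ValidateSubsystem checks a subsystem's nodes and edges against the labels
// already present in the dominant graph. It returns the first problem found.
func ValidateSubsystem(dominant map[string]bool, nodes []string, edges []Edge) error {
	subsystem := make(map[string]bool, len(nodes))
	for _, n := range nodes {
		subsystem[n] = true
	}

	// Every edge endpoint must exist in the subsystem or in the dominant graph.
	linked := make(map[string]bool, len(nodes))
	for _, e := range edges {
		for _, end := range []string{e.Source, e.Target} {
			if !subsystem[end] && !dominant[end] {
				return fmt.Errorf("edge %s -> %s references unknown vertex %q", e.Source, e.Target, end)
			}
		}
		if subsystem[e.Source] {
			linked[e.Source] = true
		}
		if subsystem[e.Target] {
			linked[e.Target] = true
		}
	}

	// Every subsystem node must take part in at least one edge.
	for _, n := range nodes {
		if !linked[n] {
			return fmt.Errorf("subsystem vertex %q is not linked to any vertex", n)
		}
	}
	return nil
}

func main() {
	dominant := map[string]bool{"node0": true}
	nodes := []string{"io0", "nvme0"}
	edges := []Edge{
		{Source: "io0", Target: "nvme0"},   // subsystem root to child
		{Source: "nvme0", Target: "node0"}, // child to dominant vertex
	}
	if err := ValidateSubsystem(dominant, nodes, edges); err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Println("subsystem is valid")
}
```

Failing fast on the first bad edge keeps the sketch short; a fuller version might collect every problem so that all of them can be reported at once.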

The root exists primarily as a handle to all of the children in the subsyste. You are not allowed to add edges to nodes that don't exist in the dominant subsystem, nor are you allowed to add subsystem nodes that are not being used (and are unlinked or have no edges).
The root exists primarily as a handle to all of the children in the subsystem. You are not allowed to add edges to nodes that don't exist in the dominant subsystem, nor are you allowed to add subsystem nodes that are not being used (that is, nodes that are unlinked or have no edges). When you run the register command, you'll see the following output (e.g., I normally have two terminals and do):

**Question for Hari**
```bash
# terminal 1
rm rainbow.db && make server
# terminal 2
make register && make subsystem
```

And then I'll see the following output in terminal 1:

```console
...
2024/03/08 18:34:44 📝️ received register: keebler
2024/03/08 18:34:44 Received cluster graph with 44 nodes and 86 edges
2024/03/08 18:34:44 SELECT count(*) from clusters WHERE name = 'keebler': (0)
2024/03/08 18:34:44 INSERT into clusters (name, token, secret) VALUES ("keebler", "rainbow", "d6aa12a2-cbff-4504-8a0b-1b36e8796ed8"): (1)
2024/03/08 18:34:44 Preparing to load 44 nodes and 86 edges
2024/03/08 18:34:44 We have made an in memory graph (subsystem cluster) with 45 vertices!
{
"keebler": {
"Name": "keebler",
"Counts": {
"cluster": 1,
"core": 36,
"node": 3,
"rack": 1,
"socket": 3
}
}
}
2024/03/08 18:34:45 SELECT * from clusters WHERE name LIKE "keebler" LIMIT 1: keebler
2024/03/08 18:34:45 📝️ received subsystem register: keebler
2024/03/08 18:34:45 Preparing to load 6 nodes and 30 edges
2024/03/08 18:34:45 We have made an in memory graph (subsystem io) with 7 vertices, with 15 connections to the dominant!
{
"keebler": {
"Name": "keebler",
"Counts": {
"io": 1,
"mtl1unit": 1,
"mtl2unit": 1,
"mtl3unit": 1,
"nvme": 1,
"shm": 1
}
}
}
```
And in terminal 2:

```console
...
2024/03/08 18:34:44 Saving cluster secret to ./docs/examples/scheduler/rainbow-config.yaml
go run cmd/rainbow/rainbow.go register subsystem --subsystem io --nodes-json ./docs/examples/scheduler/cluster-io-subsystem.json --config-path ./docs/examples/scheduler/rainbow-config.yaml
2024/03/08 18:34:45 🌈️ starting client (localhost:50051)...
2024/03/08 18:34:45 registering subsystem to cluster: keebler
2024/03/08 18:34:45 status:REGISTER_SUCCESS
```
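
The `Counts` blocks printed by the server in terminal 1 appear to be per-type tallies of the registered nodes (for example, 1 + 36 + 3 + 1 + 3 = 44 for the cluster graph). A minimal sketch of producing a summary in that shape is below; the `Summary` type and `Summarize` function are hypothetical and are not taken from the rainbow source.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Summary mirrors the shape of the JSON in the server log. The Go type
// itself is an assumption made for this sketch.
type Summary struct {
	Name   string
	Counts map[string]int
}

// Summarize tallies vertex types for one cluster and keys the result by
// cluster name, matching the layout of the logged output.
func Summarize(cluster string, vertexTypes []string) map[string]Summary {
	counts := map[string]int{}
	for _, t := range vertexTypes {
		counts[t]++
	}
	return map[string]Summary{cluster: {Name: cluster, Counts: counts}}
}

func main() {
	// Vertex types of the io subsystem from the example log above.
	types := []string{"io", "mtl1unit", "mtl2unit", "mtl3unit", "nvme", "shm"}
	out, err := json.MarshalIndent(Summarize("keebler", types), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```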

Next we need to think about how to add metadata to the jobspec that is relevant to asking for specific subsystem resources (in this case, IO). Hari is calling these intents.

What is a `mtl1unit` vs. a `mtl2unit` for the rabbit? The first is local (exclusive) and the second is shared, but I want to understand the names there.


[home](/README.md#rainbow-scheduler)
2 changes: 1 addition & 1 deletion docs/examples/scheduler/rainbow-config.yaml
@@ -5,7 +5,7 @@ scheduler:
name: random
cluster:
name: keebler
secret: edfae545-08a5-4d20-8ff9-ed16e33a786b
secret: d6aa12a2-cbff-4504-8a0b-1b36e8796ed8
graphdatabase:
name: memory
host: 127.0.0.1:50051