adding hwloc metric (#72)
* adding hwloc metric
* hwloc should be system family

Signed-off-by: vsoch <[email protected]>
vsoch authored Oct 17, 2023
1 parent 968540b commit 5fa5b07
Showing 14 changed files with 265 additions and 2 deletions.
7 changes: 7 additions & 0 deletions docs/_static/data/metrics.json
@@ -117,5 +117,12 @@
"family": "performance",
"image": "ghcr.io/converged-computing/metric-sysstat:latest",
"url": "https://github.com/sysstat/sysstat"
},
{
"name": "sys-hwloc",
"description": "install hwloc for inspecting hardware locality",
"family": "system",
"image": "ghcr.io/converged-computing/metric-hwloc:latest",
"url": "https://www.open-mpi.org/projects/hwloc/tutorials/20120702-POA-hwloc-tutorial.html"
}
]
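For readers who consume `metrics.json` programmatically, here is a minimal sketch of decoding the entries and filtering them by family. The `Metric` struct, `parseMetrics`, and `filterByFamily` are hypothetical names for illustration, and the inline sample uses made-up entries rather than a copy of the file.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Metric mirrors the fields visible in metrics.json (hypothetical type name).
type Metric struct {
	Name        string `json:"name"`
	Description string `json:"description"`
	Family      string `json:"family"`
	Image       string `json:"image"`
	URL         string `json:"url"`
}

// sample holds illustrative entries only; it is not a copy of metrics.json.
const sample = `[
  {"name": "example-a", "description": "first example", "family": "performance", "image": "example/a:latest", "url": "https://example.com/a"},
  {"name": "example-b", "description": "second example", "family": "system", "image": "example/b:latest", "url": "https://example.com/b"}
]`

// parseMetrics decodes a JSON array of metric entries.
func parseMetrics(data []byte) ([]Metric, error) {
	var metrics []Metric
	err := json.Unmarshal(data, &metrics)
	return metrics, err
}

// filterByFamily returns the names of the metrics in the given family.
func filterByFamily(metrics []Metric, family string) []string {
	names := []string{}
	for _, m := range metrics {
		if m.Family == family {
			names = append(names, m.Name)
		}
	}
	return names
}

func main() {
	metrics, err := parseMetrics([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Println(filterByFamily(metrics, "system"))
}
```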
3 changes: 3 additions & 0 deletions docs/_static/data/table.html
@@ -460,6 +460,9 @@
if(data.family == 'solver'){
$(row).find('td:eq(1)').css('background-color', 'lightgreen');
}
if(data.family == 'system'){
$(row).find('td:eq(1)').css('background-color', 'gray');
}
if(data.family == 'performance'){
$(row).find('td:eq(1)').css('background-color', '#f79fb7');
}
15 changes: 15 additions & 0 deletions docs/getting_started/metrics.md
@@ -12,6 +12,21 @@ We likely will tweak and improve upon these categories.

## Implemented Metrics

### sys-hwloc

- *[sys-hwloc](https://github.com/converged-computing/metrics-operator/tree/main/examples/tests/sys-hwloc)*

[Hwloc](https://www.open-mpi.org/projects/hwloc/), or "portable hardware locality," can be used to inspect the hardware of your system.
The default command that is run is "lstopo," which lists your hardware topology, and there is a [nice tutorial here](https://www.open-mpi.org/projects/hwloc/tutorials/20120702-POA-hwloc-tutorial.html) that covers it.
By default we output a png image of the topology and an xml description of the machine, and you can change the commands that are run.
[This man page](https://manpages.ubuntu.com/manpages/impish/man1/lstopo.1.html) is recommended to see the different commands and options.

| Name | Description | Type | Default |
|------|-------------|------|---------|
| command | Commands to run instead of the defaults, provided as a list under `listOptions`. | list of strings | `lstopo architecture.png`, `hwloc-ls machine.xml` |

The default commands save a png image (architecture.png) and the machine data as xml (machine.xml).
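To make the default-versus-custom behavior concrete, here is a minimal sketch of how a list option can replace the default commands. It assumes the option is looked up under the key `command`, as `SetOptions` does in this commit; `resolveCommands` and the plain `map[string][]string` type are simplifications for illustration and not the operator's API.

```go
package main

import "fmt"

// optionKey is the list option key read by SetOptions in this commit.
const optionKey = "command"

// defaultCommands mirrors the documented defaults.
var defaultCommands = []string{"lstopo architecture.png", "hwloc-ls machine.xml"}

// resolveCommands (hypothetical helper) returns the user-provided commands
// when the option is present, and a copy of the defaults otherwise.
func resolveCommands(listOptions map[string][]string) []string {
	custom, ok := listOptions[optionKey]
	if !ok {
		return append([]string{}, defaultCommands...)
	}
	return append([]string{}, custom...)
}

func main() {
	// No option provided: the defaults are used
	fmt.Println(resolveCommands(map[string][]string{}))

	// Option provided: the defaults are replaced entirely
	fmt.Println(resolveCommands(map[string][]string{optionKey: {"hwloc-info"}}))
}
```

Note that providing the option replaces the defaults rather than appending to them, so a custom list that still wants the png image needs to include the `lstopo` command itself.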

### perf-sysstat

- *[perf-hello-world](https://github.com/converged-computing/metrics-operator/tree/main/examples/tests/perf-hello-world)*
59 changes: 59 additions & 0 deletions examples/tests/sys-hwloc/README.md
@@ -0,0 +1,59 @@
# Hwloc Example

Let's run an interactive metric with hwloc.

## Usage

Create a cluster and install JobSet on it.

```bash
kind create cluster
VERSION=v0.2.0
kubectl apply --server-side -f https://github.com/kubernetes-sigs/jobset/releases/download/$VERSION/manifests.yaml
```

Install the operator (from the development manifest here):

```bash
kubectl apply -f ../../dist/metrics-operator-dev.yaml
```

To see the metrics operator logs (your controller pod name will differ):

```bash
kubectl logs -n metrics-system metrics-controller-manager-859c66464c-7rpbw
```

Then create the metrics set. This is going to run lstopo and save the image architecture.png in the container root!
Note that you can customize what is run with the "command" option (e.g., to run something else).

```bash
kubectl apply -f metrics.yaml
```

Wait until you see the pod created by the job, and then until it is running (there should be one pod with a single container):

```bash
kubectl get pods
```
```diff
NAME READY STATUS RESTARTS AGE
- metricset-sample-m-0-0-mkwrh 0/1 ContainerCreating 0 2m20s
+ metricset-sample-m-0-0-mkwrh 1/1 Running 0 3m10s
```

This container doesn't have interesting logs unless your command generates output. For the default, we generate "architecture.png" and can
copy it out of the pod like this:

```bash
kubectl cp metricset-sample-m-0-0-mkwrh:/architecture.png architecture.png
```
And there it is!

![architecture.png](architecture.png)

When you are done, clean up!

```bash
kubectl delete -f metrics.yaml
```
Binary file added examples/tests/sys-hwloc/architecture.png
18 changes: 18 additions & 0 deletions examples/tests/sys-hwloc/metrics.yaml
@@ -0,0 +1,18 @@
apiVersion: flux-framework.org/v1alpha2
kind: MetricSet
metadata:
  labels:
    app.kubernetes.io/name: metricset
    app.kubernetes.io/instance: metricset-sample
  name: metricset-sample
spec:
  logging:
    interactive: true
  metrics:
    - name: sys-hwloc

      # These are the defaults and do not need to be provided
      listOptions:
        command:
          - lstopo architecture.png
          - hwloc-ls machine.xml
1 change: 1 addition & 0 deletions hack/metrics-gen/main.go
@@ -12,6 +12,7 @@ import (
_ "github.com/converged-computing/metrics-operator/pkg/metrics/io"
_ "github.com/converged-computing/metrics-operator/pkg/metrics/network"
_ "github.com/converged-computing/metrics-operator/pkg/metrics/perf"
_ "github.com/converged-computing/metrics-operator/pkg/metrics/sys"
//
// +kubebuilder:scaffold:imports
)
1 change: 1 addition & 0 deletions main.go
@@ -35,6 +35,7 @@ import (
_ "github.com/converged-computing/metrics-operator/pkg/metrics/io"
_ "github.com/converged-computing/metrics-operator/pkg/metrics/network"
_ "github.com/converged-computing/metrics-operator/pkg/metrics/perf"
_ "github.com/converged-computing/metrics-operator/pkg/metrics/sys"
//
// +kubebuilder:scaffold:imports
)
5 changes: 5 additions & 0 deletions pkg/metrics/application.go
@@ -10,6 +10,7 @@ package metrics
import (
api "github.com/converged-computing/metrics-operator/api/v1alpha2"
"github.com/converged-computing/metrics-operator/pkg/specs"
"k8s.io/apimachinery/pkg/util/intstr"
jobset "sigs.k8s.io/jobset/api/jobset/v1alpha2"
)

@@ -28,6 +29,10 @@ func (m SingleApplication) HasSoleTenancy() bool {
return false
}

func (m SingleApplication) Options() map[string]intstr.IntOrString {
return map[string]intstr.IntOrString{}
}

// Default SingleApplication is generic performance family
func (m SingleApplication) Family() string {
return PerformanceFamily
12 changes: 10 additions & 2 deletions pkg/metrics/base.go
@@ -38,12 +38,17 @@ type BaseMetric struct {
// RegisterAddon adds an addon to the set, assuming it's already validated
func (m *BaseMetric) RegisterAddon(addon *addons.Addon) {
	a := (*addon)
	m.InitAddons()
	logger.Infof("🟧️ Registering addon %s", a.Name())
	m.Addons[a.Name()] = addon
}

// InitAddons ensures the addons map is not nil before it is written to
func (m *BaseMetric) InitAddons() {
	if m.Addons == nil {
		logger.Infof("🟧️ Resetting addons - they are unset.")
		m.Addons = map[string]*addons.Addon{}
	}
}
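The guard in `InitAddons` matters because writing to a nil map panics in Go, while reading from one is safe. A minimal, self-contained sketch of the same lazy-initialization pattern follows; the `registry` type and its methods are hypothetical stand-ins and not part of the operator.

```go
package main

import "fmt"

// registry is a hypothetical stand-in for a struct that holds a map field.
type registry struct {
	items map[string]int
}

// initItems lazily creates the map; assigning into a nil map would panic.
func (r *registry) initItems() {
	if r.items == nil {
		r.items = map[string]int{}
	}
}

// add stores a value, creating the map first if needed.
func (r *registry) add(name string, value int) {
	r.initItems()
	r.items[name] = value
}

func main() {
	var r registry // zero value: r.items is nil
	r.add("volume", 1)
	fmt.Println(len(r.items))
}
```

The pointer receiver is what lets the initialized map persist on the caller's value; with a value receiver the map would only be created on a copy.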

// Name returns the metric name
@@ -131,6 +136,9 @@ func (m BaseMetric) AddAddons(
containerSpecs []*specs.ContainerSpec,
) ([]*specs.ContainerSpec, error) {

// Ensure we have created the map!
m.InitAddons()

// VolumeMounts can be generated from container specs
// For each addon, do custom logic depending on the type
// These are the main set of volumes, containers we are going to add
2 changes: 2 additions & 0 deletions pkg/metrics/containers.go
@@ -45,6 +45,8 @@ func getReplicatedJobContainers(
// Each needs to have the sys trace capability to see the application pids
for _, cs := range containerSpecs {

logger.Infof("Checking container spec %+v", cs)

// Skip containers not intended for the replicated job
if cs.JobName != "" && cs.JobName != rj.Name {
continue
1 change: 1 addition & 0 deletions pkg/metrics/set.go
@@ -22,6 +22,7 @@ var (
const (

// Metric Family Types (these likely can be changed)
SystemFamily = "system"
StorageFamily = "storage"
MachineLearningFamily = "machine-learning"
NetworkFamily = "network"
114 changes: 114 additions & 0 deletions pkg/metrics/sys/hwloc.go
@@ -0,0 +1,114 @@
/*
Copyright 2023 Lawrence Livermore National Security, LLC
(c.f. AUTHORS, NOTICE.LLNS, COPYING)
SPDX-License-Identifier: MIT
*/

package sys

import (
	"fmt"

	api "github.com/converged-computing/metrics-operator/api/v1alpha2"
	"github.com/converged-computing/metrics-operator/pkg/metadata"
	metrics "github.com/converged-computing/metrics-operator/pkg/metrics"
	"github.com/converged-computing/metrics-operator/pkg/specs"
	"k8s.io/apimachinery/pkg/util/intstr"
)

const (
	hwlocIdentifier = "sys-hwloc"
	hwlocSummary    = "install hwloc for inspecting hardware locality"
	hwlocContainer  = "ghcr.io/converged-computing/metric-hwloc:latest"
)

type Hwloc struct {
	metrics.SingleApplication

	// Custom Options
	commands []string
}

func (m Hwloc) Url() string {
	return "https://www.open-mpi.org/projects/hwloc/tutorials/20120702-POA-hwloc-tutorial.html"
}

// Family overrides the default performance family of SingleApplication
func (m *Hwloc) Family() string {
	return metrics.SystemFamily
}
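Because `Hwloc` embeds `SingleApplication`, the family is resolved through Go's method promotion, and a method only replaces the embedded one when its name matches exactly. The sketch below illustrates this with hypothetical types (`base`, `misspelled`, `overriding`); none of these names come from the operator.

```go
package main

import "fmt"

// familied is a hypothetical interface with the method under discussion.
type familied interface {
	Family() string
}

// base plays the role of an embedded type that provides a default Family.
type base struct{}

func (base) Family() string { return "performance" }

// misspelled embeds base but declares a method with a typo in its name,
// so base.Family is still the method that gets promoted and called.
type misspelled struct{ base }

func (misspelled) Famliy() string { return "system" }

// overriding embeds base and declares Family with the exact name,
// which shadows the promoted method.
type overriding struct{ base }

func (overriding) Family() string { return "system" }

// familyOf calls Family through the interface, as a registry would.
func familyOf(f familied) string { return f.Family() }

func main() {
	fmt.Println(familyOf(misspelled{})) // prints "performance"
	fmt.Println(familyOf(overriding{})) // prints "system"
}
```

The compiler accepts both types, which is why a misspelled override is easy to miss: the type still satisfies the interface through the embedded method.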

// Set custom options / attributes for the metric
func (m *Hwloc) SetOptions(metric *api.Metric) {

	m.Identifier = hwlocIdentifier
	m.Summary = hwlocSummary
	m.Container = hwlocContainer

	// Defaults for lstopo command
	m.ResourceSpec = &metric.Resources
	m.AttributeSpec = &metric.Attributes
	m.commands = []string{"lstopo architecture.png", "hwloc-ls machine.xml"}

	cmd, ok := metric.ListOptions["command"]
	if ok {
		m.commands = []string{}
		for _, val := range cmd {
			m.commands = append(m.commands, val.StrVal)
		}
	}
}

// ListOptions reports the commands under the same key that SetOptions reads
func (m Hwloc) ListOptions() map[string][]intstr.IntOrString {
	opts := map[string][]intstr.IntOrString{}
	for _, val := range m.commands {
		opts["command"] = append(opts["command"], intstr.FromString(val))
	}
	return opts
}

func (m Hwloc) PrepareContainers(
	spec *api.MetricSet,
	metric *metrics.Metric,
) []*specs.ContainerSpec {

	// Metadata to add to beginning of run
	meta := metrics.Metadata(spec, metric)

	// For each command: echo it, run it, then print a separator after its output
	commands := ""
	for _, cmd := range m.commands {
		commands += fmt.Sprintf("\necho %s\n%s\n echo '%s'", cmd, cmd, metadata.Separator)
	}
	preBlock := `#!/bin/bash
echo "%s"
. /root/.profile
export PATH=/opt/view/bin:$PATH
echo "%s"
%s
echo "%s"
ls
`

	interactive := metadata.Interactive(spec.Spec.Logging.Interactive)
	preBlock = fmt.Sprintf(
		preBlock,
		meta,
		metadata.CollectionStart,
		commands,
		metadata.CollectionEnd,
	)
	postBlock := fmt.Sprintf("\n%s\n", interactive)
	return m.ApplicationContainerSpec(preBlock, "", postBlock)
}
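The command loop in `PrepareContainers` produces a small shell fragment per command: echo the command, run it, then print a separator. Here is a standalone sketch of that assembly step. The `separator` constant is a hypothetical placeholder for `metadata.Separator` (whose actual value is not shown in this diff), and `buildCommandBlock` is an illustrative helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// separator is a hypothetical placeholder for metadata.Separator.
const separator = "=== SEPARATOR ==="

// buildCommandBlock returns a shell fragment that echoes each command,
// runs it, and prints the separator after it.
func buildCommandBlock(commands []string) string {
	var b strings.Builder
	for _, cmd := range commands {
		fmt.Fprintf(&b, "\necho %s\n%s\necho '%s'", cmd, cmd, separator)
	}
	return b.String()
}

func main() {
	block := buildCommandBlock([]string{"lstopo architecture.png", "hwloc-ls machine.xml"})
	fmt.Println(block)
}
```

A `strings.Builder` is used here instead of repeated string concatenation; for the handful of commands in this metric either approach works.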

func init() {
	base := metrics.BaseMetric{
		Identifier: hwlocIdentifier,
		Summary:    hwlocSummary,
		Container:  hwlocContainer,
	}
	app := metrics.SingleApplication{BaseMetric: base}

	// Use a lowercase variable name so it does not shadow the Hwloc type
	hwloc := Hwloc{SingleApplication: app}
	metrics.Register(&hwloc)
}
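The `init` function above, together with the blank imports added to `main.go` and `hack/metrics-gen/main.go`, follows a common Go registration pattern: importing a package for its side effects runs its `init`, which adds the metric to a shared registry. A single-file sketch of the idea follows; the `Metric` interface, `registry` map, and `Register` function here are simplified, hypothetical versions and not the operator's real definitions.

```go
package main

import (
	"fmt"
	"sort"
)

// Metric is a simplified, hypothetical interface for registered metrics.
type Metric interface {
	Name() string
}

// registry maps metric names to implementations; package-level variables
// are initialized before any init function runs.
var registry = map[string]Metric{}

// Register adds a metric to the registry under its name.
func Register(m Metric) {
	registry[m.Name()] = m
}

// hwloc is a stand-in metric for illustration.
type hwloc struct{}

func (hwloc) Name() string { return "sys-hwloc" }

// init runs automatically before main, as it would when the package is
// pulled in through a blank import.
func init() {
	Register(hwloc{})
}

// registeredNames returns the registered metric names in sorted order.
func registeredNames() []string {
	names := make([]string, 0, len(registry))
	for name := range registry {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	fmt.Println(registeredNames())
}
```

In the real layout the registry and the metric live in different packages, which is why the blank import of `pkg/metrics/sys` is needed for the metric to appear at all.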
29 changes: 29 additions & 0 deletions pkg/metrics/sys/logs.go
@@ -0,0 +1,29 @@
/*
Copyright 2023 Lawrence Livermore National Security, LLC
(c.f. AUTHORS, NOTICE.LLNS, COPYING)
SPDX-License-Identifier: MIT
*/

package sys

import (
	"log"

	"go.uber.org/zap"
)

// Package-level zap logger handles for the sys metrics
var (
	handle *zap.Logger
	logger *zap.SugaredLogger
)

func init() {
	// Assign with "=" so the package-level handle is set instead of
	// being shadowed by a new local variable
	var err error
	handle, err = zap.NewProduction()
	if err != nil {
		log.Fatalf("can't initialize zap logger: %v", err)
	}
	logger = handle.Sugar()
	defer handle.Sync()
}
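A detail worth knowing when initializing package-level variables inside `init`: a short variable declaration (`:=`) creates new local variables, even when a package-level variable of the same name exists. The sketch below shows the difference using only the standard library; `shared`, `newValue`, `initShadowed`, and `initAssigned` are hypothetical names for illustration.

```go
package main

import "fmt"

// shared is a package-level variable that starts out nil.
var shared *int

// newValue is a hypothetical constructor that can fail, like a logger factory.
func newValue() (*int, error) {
	v := 42
	return &v, nil
}

// initShadowed uses ":=", which declares a NEW local named shared,
// so the package-level variable is left untouched.
func initShadowed() error {
	shared, err := newValue()
	_ = shared // the local copy is discarded when the function returns
	return err
}

// initAssigned declares err separately and uses "=", so the
// package-level variable is the one that gets assigned.
func initAssigned() error {
	var err error
	shared, err = newValue()
	return err
}

func main() {
	if err := initShadowed(); err != nil {
		panic(err)
	}
	fmt.Println(shared == nil) // prints "true"

	if err := initAssigned(); err != nil {
		panic(err)
	}
	fmt.Println(shared == nil) // prints "false"
}
```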
