Improving the monitoring portability #20
Merged
ErwanAliasr1 force-pushed the monito branch 5 times, most recently from 5b618a5 to 7a22bd8 (May 31, 2024 14:16)
When starting hwbench or reading a result file, nothing indicates which BMC driver was used. This information can help to interpret some metrics, and hwgraph could use it to make decisions.

This commit:
- adds BMC.get_driver_name() to report the class name as the driver name
- adds BMC.dump() so the driver name can be stored in the result file. The hardware data structure looks like the following:

    "hardware": {
      "dmi": {
        "vendor": "Dell Inc.",
        "product": "PowerEdge C6615",
        "serial": "XXXXXX",
        "bios": {
          "version": "1.2.3",
          "release": "1.2"
        },
        "chassis": {
          "product": "PowerEdge C6600",
          "serial": "XXXXXX"
        },
        "sysconf_threads": 128
      },
      "cpu": {
        "vendor": "AuthenticAMD",
        "model": "AMD EPYC 8534P 64-Core Processor",
        "logical_cores": 128,
        "physical_cores": 64,
        "numa_domains": 8,
        "sockets": 1
      },
      "bmc": {
        "driver": "IDRAC"
      }

- updates the startup message to indicate which driver is used. A typical output looks like:

    python3 -m hwbench.hwbench -j configs/mini.conf -m monitoring.cfg
    Starting monitoring for DELL vendor with driver IDRAC @ 10.168.97.148
    ...

Signed-off-by: Erwan Velu <[email protected]>
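As an illustration of the two helpers described above, here is a minimal sketch. The `BMC` and `IDRAC` classes below are simplified stand-ins written for this example, not the actual hwbench classes; only the method names `get_driver_name()` and `dump()` and the `"bmc": {"driver": ...}` layout come from the commit message.

```python
class BMC:
    """Hypothetical, simplified stand-in for the hwbench BMC base class."""

    def get_driver_name(self) -> str:
        # Report the concrete class name as the driver name.
        return type(self).__name__

    def dump(self) -> dict:
        # Data meant to be stored under the "bmc" key of the hardware section.
        return {"driver": self.get_driver_name()}


class IDRAC(BMC):
    """Hypothetical Dell driver; only its class name matters here."""


# Assumed layout: the dump is placed under "bmc" in the hardware structure.
hardware = {"bmc": IDRAC().dump()}
print(hardware)
```

Using the class name avoids maintaining a separate driver-name string in every driver subclass.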
Some block devices, like zram, do not have any scheduler. This case made hwbench crash at startup. This commit simply ignores block devices with no scheduler. Signed-off-by: Erwan Velu <[email protected]>
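The skip logic could look like the sketch below. It assumes the usual sysfs convention where the active scheduler is shown in square brackets in `/sys/block/<dev>/queue/scheduler`; the function names and the exact content reported by devices such as zram are assumptions made for the example, not the hwbench code.

```python
import re
from typing import Dict, Optional


def active_scheduler(raw: str) -> Optional[str]:
    """Return the bracketed (active) scheduler, or None if there is none."""
    match = re.search(r"\[([^\]]+)\]", raw)
    return match.group(1) if match else None


def collect_schedulers(contents: Dict[str, str]) -> Dict[str, str]:
    """Map device name to active scheduler, ignoring devices without one."""
    schedulers = {}
    for device, raw in contents.items():
        scheduler = active_scheduler(raw)
        if scheduler is None:
            # No active scheduler reported: ignore the device instead of crashing.
            continue
        schedulers[device] = scheduler
    return schedulers
```

In a real run, `contents` would be filled by reading each device's scheduler file; it is passed in as a dictionary here so the logic can be exercised without touching sysfs.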
When an engine uses a 3rd-party binary, its presence must be tested, otherwise the code will crash. This commit:
- adds a new helper (is_binary_available) to check if a binary is available
- adds a generic check for engines
Signed-off-by: Erwan Velu <[email protected]>
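A helper of this kind can be sketched with the standard library. Only the name `is_binary_available` comes from the commit message; its body and the `missing_binaries` function are assumptions made for illustration.

```python
import shutil
from typing import Iterable, List


def is_binary_available(binary_name: str) -> bool:
    """Return True if the binary can be found in the current PATH."""
    return shutil.which(binary_name) is not None


def missing_binaries(required: Iterable[str]) -> List[str]:
    """Hypothetical generic check: list the required binaries not installed."""
    return [name for name in required if not is_binary_available(name)]
```

An engine could call `missing_binaries()` with the binaries it depends on and stop with an explicit message if the returned list is not empty.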
Testing if the BMC IP is set to 0.0.0.0 is useless since:
- Some vendors use a dedicated channel interface, like CHIF on HPE
- If a network connection is required (like Redfish), the connection is either already established or has already generated a fault.
So this commit removes this useless code. Signed-off-by: Erwan Velu <[email protected]>
This simple commit updates the monitoring text at start time. A typical output looks like the following:

    Monitoring/turbostat: initialize
    Monitoring/turbostat: Freq metrics:64xCPU
    Monitoring/BMC: initialize DELL vendor with IDRAC driver @ 10.168.97.148
    Monitoring/BMC: Thermal metrics:1xCPU, 1xIntake
    Monitoring/BMC: Fans metrics:10xFan
    Monitoring/BMC: PowerConsumption metrics:65xCPU, 4xBMC
    Monitoring/BMC: PowerSupplies metrics:2xBMC

Signed-off-by: Erwan Velu <[email protected]>
When the External class is used and the binary it points to is not installed, a FileNotFoundError exception is raised. Instead of crashing, report a custom fatal message that indicates which binary is missing. Signed-off-by: Erwan Velu <[email protected]>
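The behaviour described above can be sketched as follows. The `run_external` function and the message text are hypothetical; the actual External class in hwbench may be structured differently.

```python
import subprocess
from typing import List


def run_external(cmd: List[str]) -> subprocess.CompletedProcess:
    """Run an external command, turning a missing binary into a clear fatal error."""
    try:
        return subprocess.run(cmd, capture_output=True, text=True, check=False)
    except FileNotFoundError as exc:
        # Stop with an explicit message instead of a raw traceback.
        raise SystemExit(f"Fatal: missing binary '{cmd[0]}', please install it") from exc
```

Raising `SystemExit` with a message makes the interpreter print that message and exit with a non-zero status, which is one simple way to get a fatal error without a traceback.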
hwbench requires at least turbostat 2022.04.16 (from Kernel 5.19), otherwise filtering the C1% field would not be possible. This commit:
- updates the requirement in the documentation
- implements a simple test when Turbostat() is instantiated to guarantee that the minimal release is present
- if no suitable release is found, hwbench stops with a fatal message.
A typical example looks like the following:

    Monitoring/turbostat: Detected release 19.8.31
    ERROR:root:Monitoring/turbostat: minimal expected release is 2022.4.16

Signed-off-by: Erwan Velu <[email protected]>
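A release check of this kind can be sketched by comparing integer tuples. The function names are hypothetical, and the format of the version banner (a dotted `year.month.day` triple somewhere in the text) is an assumption; only the minimal release 2022.4.16 comes from the commit message.

```python
import re
from typing import Tuple

# Minimal release stated in the commit message.
MIN_TURBOSTAT_RELEASE = (2022, 4, 16)


def parse_release(version_text: str) -> Tuple[int, ...]:
    """Extract the first dotted triple from the text as a tuple of integers."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_text)
    if match is None:
        raise ValueError(f"no release found in {version_text!r}")
    return tuple(int(part) for part in match.groups())


def is_supported(version_text: str) -> bool:
    """Return True if the detected release is at least the minimal one."""
    return parse_release(version_text) >= MIN_TURBOSTAT_RELEASE
```

Comparing tuples of integers avoids the pitfalls of string comparison, where "19.08.31" and "2022.04.16" would not order reliably.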
Some processors, like the Intel(R) Core(TM) i7-9750H, report CorWatt only for Core 0. This commit ignores cores that do not report CorWatt even if the header mentions it. A typical turbostat output of such a processor:

    Core CPU Avg_MHz Busy% Bzy_MHz TSC_MHz IPC IRQ SMI POLL C1 C1E C3 C6 C7s C8 C9 C10 POLL% C1% C1E% C3% C6% C7s% C8% C9% C10% CPU%c1 CPU%c3 CPU%c6 CPU%c7 CoreTmp CoreThr PkgTmp Totl%C0 Any%C0 GFX%C0 CPUGFX% Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 Pkg%pc8 Pkg%pc9 Pk%pc10 CPU%LPI SYS%LPI PkgWatt CorWatt GFXWatt RAMWatt PKG_% RAM_% UncMHz
    - - 3 0.33 800 2592 0.50 1620 0 1 3 10 16 206 0 214 1 1342 0.00 0.00 0.00 0.00 0.26 0.00 0.36 0.02 99.05 0.64 0.00 0.47 98.57 40 2592 40 4.90 4.24 0.00 0.00 9.55 85.04 0.00 0.00 0.00 0.00 0.00 0.00 0.00 11.38 0.25 0.00 1.17 0.00 0.00 800
    0 0 1 0.09 800 2592 0.35 20 0 0 0 0 0 2 0 4 0 113 0.00 0.00 0.00 0.00 0.02 0.00 0.07 0.00 99.82 1.13 0.00 0.08 98.69 37 2592 40 4.90 4.24 0.00 0.00 9.55 85.04 0.00 0.00 0.00 0.00 0.00 0.00 0.00 11.38 0.25 0.00 1.17 0.00 0.00 800
    0 6 6 0.69 800 2592 0.31 341 0 0 0 0 0 7 0 3 0 311 0.00 0.00 0.00 0.00 0.08 0.00 0.06 0.00 99.20 0.53
    1 1 6 0.70 800 2592 0.51 260 0 1 3 3 3 15 0 23 0 187 0.00 0.00 0.01 0.01 0.20 0.00 0.47 0.00 98.64 0.62 0.01 0.32 98.35 40 1352
    1 7 2 0.31 800 2592 1.57 67 0 0 0 1 0 11 0 10 0 36 0.00 0.00 0.00 0.00 0.16 0.00 0.21 0.00 99.33 1.00
    2 2 5 0.57 800 2592 0.33 66 0 0 0 1 3 11 0 9 0 145 0.00 0.00 0.00 0.00 0.17 0.00 0.19 0.00 99.08 0.46 0.00 0.52 98.44 38 1255
    2 8 1 0.17 800 2592 0.38 108 0 0 0 1 2 24 0 21 0 66 0.00 0.00 0.00 0.01 0.42 0.00 0.41 0.00 99.01 0.86
    3 3 4 0.44 800 2592 0.32 230 0 0 0 1 0 9 0 15 0 203 0.00 0.00 0.00 0.00 0.11 0.00 0.30 0.00 99.17 0.70 0.00 0.75 98.11 37 1078
    3 9 2 0.29 800 2592 0.54 151 0 0 0 0 0 48 0 50 1 62 0.00 0.00 0.00 0.00 0.73 0.00 1.00 0.21 97.79 0.85
    4 4 3 0.39 800 2592 0.30 264 0 0 0 2 7 34 0 57 0 158 0.00 0.00 0.00 0.01 0.52 0.00 1.13 0.00 97.98 0.38 0.00 0.50 98.73 37 237
    4 10 1 0.08 800 2592 0.58 18 0 0 0 0 0 5 0 6 0 17 0.00 0.00 0.00 0.00 0.08 0.00 0.12 0.00 99.72 0.68
    5 5 0 0.05 800 2592 0.47 25 0 0 0 1 0 7 0 1 0 22 0.00 0.00 0.00 0.00 0.10 0.00 0.02 0.00 99.84 0.26 0.01 0.62 99.07 36 0
    5 11 1 0.14 800 2592 0.90 70 0 0 0 0 1 33 0 15 0 22 0.00 0.00 0.00 0.01 0.58 0.00 0.30 0.00 98.98 0.17

Signed-off-by: Erwan Velu <[email protected]>
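One way to ignore cores that do not report CorWatt is sketched below. It assumes tab-separated turbostat output in which a missing value shows up as a short row or an empty field; the function name and this parsing approach are illustrative and not taken from the hwbench code.

```python
from typing import Dict, List


def corewatt_per_cpu(header: str, rows: List[str]) -> Dict[int, float]:
    """Return CorWatt per CPU, skipping rows that do not report it."""
    columns = header.split("\t")
    cpu_idx = columns.index("CPU")
    watt_idx = columns.index("CorWatt")
    watts: Dict[int, float] = {}
    for row in rows:
        fields = row.split("\t")
        if len(fields) <= max(cpu_idx, watt_idx):
            continue  # Row is too short: this CPU does not report CorWatt.
        if not fields[watt_idx].strip():
            continue  # Column is present but empty.
        if fields[cpu_idx] == "-":
            continue  # Summary row, not a CPU.
        watts[int(fields[cpu_idx])] = float(fields[watt_idx])
    return watts
```

With output like the sample above, only CPU 0 would end up in the returned dictionary.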
Starting with Kernel 6.9, the -n option became ambiguous, which prevents turbostat from running and produces the following message:

    turbostat: option '-n' is ambiguous; possibilities: '-num_iterations' '-no-msr' '-no-perf'

This commit removes all short option names and replaces them with long names to avoid this case. This patch was tested successfully from Kernel 5.19 (2022.4.16) up to the upcoming 6.10 (2024.5.10). Signed-off-by: Erwan Velu <[email protected]>
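Building the command line with long options only could look like the sketch below. The helper name and the particular set of options are assumptions; the options hwbench actually passes to turbostat may differ.

```python
from typing import List


def turbostat_cmd(interval_s: float, iterations: int, show: List[str]) -> List[str]:
    """Build a turbostat command line that uses long option names only."""
    return [
        "turbostat",
        "--quiet",
        "--interval", str(interval_s),
        "--num_iterations", str(iterations),  # long form of the ambiguous -n
        "--show", ",".join(show),
    ]


cmd = turbostat_cmd(1, 5, ["Core", "CPU", "C1%"])
print(" ".join(cmd))
```

Spelling every option out in full means a newly added option that shares a prefix can no longer make an existing invocation ambiguous.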
beorn- approved these changes May 31, 2024
Good to go!
This was referenced May 31, 2024 by four items, all now closed.
This PR improves the monitoring feature based on all the feedback received so far.