dashboard: add panels for Tarantool 3 configuration #231

Merged
12 changes: 12 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,18 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Added
- Panels for Tarantool 3 configuration status and alerts (#224)

### Changed
- Use consistent style for panel requirements (PR #231)

### Fixed
- Missing panel requirement for vinyl Bloom filter panel (PR #231)


## [3.0.0] - 2024-07-09
Grafana revisions:
- Tarantool 3:
189 changes: 181 additions & 8 deletions dashboard/panels/cluster.libsonnet
@@ -345,8 +345,8 @@ local prometheus = grafana.prometheus;
failover and switchover issues, clock issues, memory fragmentation,
configuration issues and alien members warnings.

Panel works with `cartridge >= 2.0.2`, `metrics >= 0.6.0`,
while `metrics >= 0.9.0` is recommended for per instance display.
Panel minimal requirements: cartridge 2.0.2, metrics 0.6.0;
at least metrics 0.9.0 is recommended for per instance display.
|||,
):: cartridge_issues(
cfg,
@@ -363,8 +363,8 @@ local prometheus = grafana.prometheus;
"critical" issues include replication process critical failures and
running out of available memory.

Panel works with `cartridge >= 2.0.2`, `metrics >= 0.6.0`,
while `metrics >= 0.9.0` is recommended for per instance display.
Panel minimal requirements: cartridge 2.0.2, metrics 0.6.0;
at least metrics 0.9.0 is recommended for per instance display.
|||,
):: cartridge_issues(
cfg,
@@ -373,14 +373,187 @@ local prometheus = grafana.prometheus;
level='critical',
),

local tarantool3_config_description_note(description) = std.join('\n', [description, |||
Panel minimal requirements: metrics 1.2.0, Tarantool 3.
|||]),

tarantool3_config_status(
cfg,
title='Tarantool configuration status',
description=tarantool3_config_description_note(|||
Current Tarantool 3 configuration apply status for a cluster instance.
`uninitialized` describes an uninitialized instance,
`check_errors` describes an instance with at least one apply error,
`check_warnings` describes an instance with at least one apply warning,
`startup_in_progress` describes an instance doing initial configuration apply,
`reload_in_progress` describes an instance doing configuration apply over existing configuration,
`ready` describes a healthy instance.

Panel minimal requirements: Grafana 8.
|||),
):: timeseries.new(
title=title,
description=description,
datasource=cfg.datasource,
panel_width=12,
max=6,
min=1,
).addValueMapping(
1, 'dark-red', 'uninitialized'
).addRangeMapping(
1.001, 1.999, '-'
).addValueMapping(
2, 'red', 'check_errors'
).addRangeMapping(
2.001, 2.999, '-'
).addValueMapping(
3, 'yellow', 'startup_in_progress'
).addRangeMapping(
3.001, 3.999, '-'
).addValueMapping(
4, 'dark-yellow', 'reload_in_progress'
).addRangeMapping(
4.001, 4.999, '-'
).addValueMapping(
5, 'dark-orange', 'check_warnings'
).addRangeMapping(
5.001, 5.999, '-'
).addValueMapping(
6, 'green', 'ready'
).addTarget(
if cfg.type == variable.datasource_type.prometheus then
local expr = std.format(
|||
1 * %(metric_full_name)s{%(uninitialized_filters)s} + on(alias)
2 * %(metric_full_name)s{%(check_errors_filters)s} + on(alias)
3 * %(metric_full_name)s{%(startup_in_progress_filters)s} + on(alias)
4 * %(metric_full_name)s{%(reload_in_progress_filters)s} + on(alias)
5 * %(metric_full_name)s{%(check_warnings_filters)s} + on(alias)
6 * %(metric_full_name)s{%(ready_filters)s}
|||, {
metric_full_name: cfg.metrics_prefix + 'tnt_config_status',
uninitialized_filters: common.prometheus_query_filters(cfg.filters { status: ['=', 'uninitialized'] }),
check_errors_filters: common.prometheus_query_filters(cfg.filters { status: ['=', 'check_errors'] }),
startup_in_progress_filters: common.prometheus_query_filters(cfg.filters { status: ['=', 'startup_in_progress'] }),
reload_in_progress_filters: common.prometheus_query_filters(cfg.filters { status: ['=', 'reload_in_progress'] }),
check_warnings_filters: common.prometheus_query_filters(cfg.filters { status: ['=', 'check_warnings'] }),
ready_filters: common.prometheus_query_filters(cfg.filters { status: ['=', 'ready'] }),
}
);
prometheus.target(expr=expr, legendFormat='{{alias}}')
else if cfg.type == variable.datasource_type.influxdb then
local query = std.format(|||
SELECT (1 * last("uninitialized") + 2 * last("check_errors") + 3 * last("startup_in_progress") +
4 * last("reload_in_progress") + 5 * last("check_warnings") + 6 * last("ready")) as "status" FROM
(
SELECT "value" as "uninitialized" FROM %(measurement_with_policy)s
WHERE ("metric_name" = '%(metric_full_name)s' AND %(uninitialized_filters)s) AND $timeFilter
),
(
SELECT "value" as "check_errors" FROM %(measurement_with_policy)s
WHERE ("metric_name" = '%(metric_full_name)s' AND %(check_errors_filters)s) AND $timeFilter
),
(
SELECT "value" as "startup_in_progress" FROM %(measurement_with_policy)s
WHERE ("metric_name" = '%(metric_full_name)s' AND %(startup_in_progress_filters)s) AND $timeFilter
),
(
SELECT "value" as "reload_in_progress" FROM %(measurement_with_policy)s
WHERE ("metric_name" = '%(metric_full_name)s' AND %(reload_in_progress_filters)s) AND $timeFilter
),
(
SELECT "value" as "check_warnings" FROM %(measurement_with_policy)s
WHERE ("metric_name" = '%(metric_full_name)s' AND %(check_warnings_filters)s) AND $timeFilter
),
(
SELECT "value" as "ready" FROM %(measurement_with_policy)s
WHERE ("metric_name" = '%(metric_full_name)s' AND %(ready_filters)s) AND $timeFilter
)
GROUP BY time($__interval), "label_pairs_alias" fill(0)
|||, {
metric_full_name: cfg.metrics_prefix + 'tnt_config_status',
measurement_with_policy: std.format('%(policy_prefix)s"%(measurement)s"', {
policy_prefix: if cfg.policy == 'default' then '' else std.format('"%(policy)s".', cfg.policy),
measurement: cfg.measurement,
}),
uninitialized_filters: common.influxdb_query_filters(cfg.filters { label_pairs_status: ['=', 'uninitialized'] }),
check_errors_filters: common.influxdb_query_filters(cfg.filters { label_pairs_status: ['=', 'check_errors'] }),
startup_in_progress_filters: common.influxdb_query_filters(cfg.filters { label_pairs_status: ['=', 'startup_in_progress'] }),
reload_in_progress_filters: common.influxdb_query_filters(cfg.filters { label_pairs_status: ['=', 'reload_in_progress'] }),
check_warnings_filters: common.influxdb_query_filters(cfg.filters { label_pairs_status: ['=', 'check_warnings'] }),
ready_filters: common.influxdb_query_filters(cfg.filters { label_pairs_status: ['=', 'ready'] }),
});
influxdb.target(
rawQuery=true,
query=query,
alias='$tag_label_pairs_alias',
)
),

local tarantool3_config_alerts(
cfg,
title,
description,
level,
) = common.default_graph(
cfg,
title=title,
description=tarantool3_config_description_note(description),
min=0,
legend_avg=false,
legend_max=false,
panel_height=8,
panel_width=6,
).addTarget(
common.target(
cfg,
'tnt_config_alerts',
additional_filters={
[variable.datasource_type.prometheus]: { level: ['=', level] },
[variable.datasource_type.influxdb]: { label_pairs_level: ['=', level] },
},
converter='last',
),
),

tarantool3_config_warning_alerts(
cfg,
title='Tarantool configuration warnings',
description=|||
Number of "warn" alerts on Tarantool 3 configuration apply on a cluster instance.
"warn" alerts cover non-critical issues which do not result in apply failure,
like missing a role to grant for a user.
|||,
):: tarantool3_config_alerts(
cfg,
title=title,
description=description,
level='warn',
),

tarantool3_config_error_alerts(
cfg,
title='Tarantool configuration errors',
description=|||
Number of "error" alerts on Tarantool 3 configuration apply on a cluster instance.
"error" alerts cover critical issues which result in apply failure,
like instance missing itself in configuration.
|||,
):: tarantool3_config_alerts(
cfg,
title=title,
description=description,
level='error',
),

failovers_per_second(
cfg,
title='Failovers triggered',
description=|||
Displays the count of failover triggers in a replicaset.
Graph shows average per second.

Panel works with `metrics >= 0.15.0`.
Panel minimal requirements: metrics 0.15.0.
|||,
):: common.default_graph(
cfg,
@@ -400,7 +573,7 @@ local prometheus = grafana.prometheus;
write operations. `replica` status means instance is
available only for read operations.

Panel works with `metrics >= 0.11.0` and Grafana 8.x.
Panel minimal requirements: metrics 0.11.0, Grafana 8.
|||,
panel_width=12,
):: timeseries.new(
@@ -423,7 +596,7 @@ local prometheus = grafana.prometheus;
local election_warning(description) = std.join(
'\n',
[description, |||
Panel works with metrics 0.15.0 or newer, Tarantool 2.6.1 or newer.
Panel minimal requirements: metrics 0.15.0, Tarantool 2.6.1.
|||]
),

@@ -438,7 +611,7 @@ local prometheus = grafana.prometheus;
`candidate`s are nodes that start a new election round.
`leader` is a node that collected a quorum of votes.

Panel works with Grafana 8.x.
Panel minimal requirements: Grafana 8.
|||),
):: timeseries.new(
title=title,
4 changes: 2 additions & 2 deletions dashboard/panels/cpu.libsonnet
@@ -34,7 +34,7 @@ local prometheus = grafana.prometheus;
spent by instance process executing in user mode.
Metrics obtained using `getrusage()` call.

Panel works with `metrics >= 0.8.0`.
Panel minimal requirements: metrics 0.8.0.
|||,
):: getrusage_cpu_percentage_graph(
cfg=cfg,
@@ -51,7 +51,7 @@ local prometheus = grafana.prometheus;
spent by instance process executing in kernel mode.
Metrics obtained using `getrusage()` call.

Panel works with `metrics >= 0.8.0`.
Panel minimal requirements: metrics 0.8.0.
|||,
):: getrusage_cpu_percentage_graph(
cfg=cfg,
4 changes: 2 additions & 2 deletions dashboard/panels/luajit.libsonnet
@@ -4,10 +4,10 @@ local common = import 'dashboard/panels/common.libsonnet';
row:: common.row('Tarantool LuaJit statistics'),

local version_warning(description) =
std.join('\n\n', [description, 'Panel works with `metrics >= 0.6.0` and `Tarantool >= 2.6`.']),
std.join('\n\n', [description, 'Panel minimal requirements: metrics 0.6.0, Tarantool 2.6.']),

local version_warning_renamed(description) =
std.join('\n\n', [description, 'Panel works with `metrics >= 0.15.0` and `Tarantool >= 2.6`.']),
std.join('\n\n', [description, 'Panel minimal requirements: metrics 0.15.0, Tarantool 2.6.']),

snap_restores(
cfg,
2 changes: 1 addition & 1 deletion dashboard/panels/mvcc.libsonnet
@@ -12,7 +12,7 @@ local prometheus = grafana.prometheus;
local mvcc_warning(description) = std.join(
'\n',
[description, |||
Panel works with metrics 0.15.1 or newer, Tarantool 2.10 or newer.
Panel minimal requirements: metrics 0.15.1, Tarantool 2.10.
|||]
),

10 changes: 5 additions & 5 deletions dashboard/panels/net.libsonnet
@@ -114,7 +114,7 @@ local prometheus = grafana.prometheus;
description=|||
Average number of requests processed by tx thread per second.

Panel works with `metrics >= 0.13.0` and `Tarantool >= 2.10-beta2`.
Panel minimal requirements: metrics 0.13.0, Tarantool 2.10-beta2.
|||,
):: common.default_graph(
cfg,
@@ -132,7 +132,7 @@ local prometheus = grafana.prometheus;
description=|||
Number of requests currently being processed in the tx thread.

Panel works with `metrics >= 0.13.0` and `Tarantool >= 2.10-beta2`.
Panel minimal requirements: metrics 0.13.0, Tarantool 2.10-beta2.
|||,
):: common.default_graph(
cfg,
@@ -152,7 +152,7 @@ local prometheus = grafana.prometheus;
Average number of requests that were placed in queues
of streams per second.

Panel works with `metrics >= 0.13.0` and `Tarantool >= 2.10-beta2`.
Panel minimal requirements: metrics 0.13.0, Tarantool 2.10-beta2.
|||,
):: common.default_graph(
cfg,
@@ -170,7 +170,7 @@ local prometheus = grafana.prometheus;
description=|||
Number of requests currently waiting in queues of streams.

Panel works with `metrics >= 0.13.0` and `Tarantool >= 2.10-beta2`.
Panel minimal requirements: metrics 0.13.0, Tarantool 2.10-beta2.
|||,
):: common.default_graph(
cfg,
@@ -219,7 +219,7 @@ local prometheus = grafana.prometheus;
local per_thread_warning(description) = std.join(
'\n',
[description, |||
Panel works with metrics 0.15.0 or newer, Tarantool 2.10 or newer.
Panel minimal requirements: metrics 0.15.0, Tarantool 2.10.
|||]
),

10 changes: 5 additions & 5 deletions dashboard/panels/operations.libsonnet
@@ -182,7 +182,7 @@ local prometheus = grafana.prometheus;
SQL prepare calls.
Graph shows average calls per second.

Panel works with Tarantool 2.x.
Panel minimal requirements: Tarantool 2.
|||,
):: operation_rps(
cfg,
@@ -199,7 +199,7 @@ local prometheus = grafana.prometheus;
SQL execute calls.
Graph shows average calls per second.

Panel works with Tarantool 2.x.
Panel minimal requirements: Tarantool 2.
|||,
):: operation_rps(
cfg,
@@ -218,7 +218,7 @@ local prometheus = grafana.prometheus;
operations with `TRANSACTION START` and IPROTO_BEGIN operations.
Graph shows average calls per second.

Panel works with Tarantool 2.10 or newer.
Panel minimal requirements: Tarantool 2.10.
|||,
):: operation_rps(
cfg,
@@ -237,7 +237,7 @@ local prometheus = grafana.prometheus;
operations with `COMMIT` and IPROTO_COMMIT operations.
Graph shows average calls per second.

Panel works with Tarantool 2.10 or newer.
Panel minimal requirements: Tarantool 2.10.
|||,
):: operation_rps(
cfg,
@@ -256,7 +256,7 @@ local prometheus = grafana.prometheus;
operations with `ROLLBACK` and IPROTO_ROLLBACK operations.
Graph shows average calls per second.

Panel works with Tarantool 2.10 or newer.
Panel minimal requirements: Tarantool 2.10.
|||,
):: operation_rps(
cfg,