-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
changefeedccl: add metric tracking tables watched #119391
changefeedccl: add metric tracking tables watched #119391
Conversation
576e86e
to
90f9e7a
Compare
90f9e7a
to
fb01b11
Compare
fb01b11
to
23bd1be
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 6 of 6 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @amruss, @jayshrivastava, @rharding6373, and @wenyihu6)
-- commits
line 12 at r1:
nit: maybe we should say "number of unique tables"
pkg/ccl/changefeedccl/changefeed_stmt.go
line 313 at r1 (raw file):
logChangefeedCreateTelemetry(ctx, jr, changefeedStmt.Select != nil) err = func() error {
Maybe move this code into a named helper function? Especially given that it's more or less repeated verbatim below
pkg/ccl/changefeedccl/changefeed_stmt.go
line 1497 at r1 (raw file):
} return func() error {
nit: maybe put this in an if err := ...; err != nil { ... }
and still have the return nil
at the end of the function, would make it a bit easier to insert things after this function (if needed) in the future
pkg/ccl/changefeedccl/metrics.go
line 165 at r1 (raw file):
resolved map[int64]hlc.Timestamp checkpoint map[int64]hlc.Timestamp targets map[int]int64
Could you add comments documenting what these fields store?
pkg/ccl/changefeedccl/metrics.go
line 229 at r1 (raw file):
// WatchedTargets metrics if this is the first target with this id. // Requires m.mu to be locked. func (m *sliMetrics) incrTargetLocked(id int) {
nit: maybe rename to incTargetLocked
to match the Gauge
API
pkg/ccl/changefeedccl/metrics.go
line 240 at r1 (raw file):
// WatchedTargets metric if this was the last target with this id. // Requires m.mu to be locked. func (m *sliMetrics) decrTargetLocked(id int) {
nit: maybe rename to decTargetLocked
to match the Gauge
API
pkg/ccl/changefeedccl/metrics.go
line 886 at r1 (raw file):
metaChangefeedWatchedTargets := metric.Metadata{ Name: "changefeed.watched_targets", Help: "Targets watched by all feeds",
nit: maybe we should clarify that it's the number of "unique targets"
pkg/ccl/changefeedccl/helpers_test.go
line 1379 at r1 (raw file):
} func waitForChangefeedRunning(t *testing.T, sli *sliMetrics, targetCount int64) {
nit: maybe rename to something like waitForChangefeedRunningCount
pkg/ccl/changefeedccl/metrics_test.go
line 23 at r1 (raw file):
) func TestRunningAndTables(t *testing.T) {
nit: maybe rename to TestRunningCountAndWatchedTargets
since those are the only two metrics being tested?
pkg/ccl/changefeedccl/metrics_test.go
line 96 at r1 (raw file):
require.EqualValues(t, 2, sliMetric.WatchedTargets.Value()) // The running count should be decremented when the job is paused, but the
Should we maybe add an assertion for the running count being 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Thanks for adding these metrics! They will be really useful for escalations.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @amruss, @jayshrivastava, and @rharding6373)
-- commits
line 13 at r1:
nit: i think we only remove the watched targets if the job has failed or been canceled. Could you add a note here clarifying how they change and they won't be removed if they are paused?
pkg/ccl/changefeedccl/changefeed_stmt.go
line 316 at r1 (raw file):
metrics := p.ExecCfg().JobRegistry.MetricsStruct().Changefeed.(*Metrics) pl := sj.Payload() cf := pl.GetDetails().(*jobspb.Payload_Changefeed).Changefeed
It seems there's a potential for pl.GetDetails()
to return nil, which could lead to a nil pointer dereference when accessing Changefeed here. But I notice that other parts of the code also directly access its fields, so maybe this is not possible.
pkg/ccl/changefeedccl/changefeed_stmt.go
line 317 at r1 (raw file):
pl := sj.Payload() cf := pl.GetDetails().(*jobspb.Payload_Changefeed).Changefeed scope := cf.Opts[changefeedbase.OptMetricsScope]
Do you know how this metrics would work if we add a metrics label for different changefeeds? Will sli.WatchedTargets
still be correct?
pkg/ccl/changefeedccl/changefeed_stmt.go
line 1156 at r1 (raw file):
}) sli.updateLastTargets(jobID, targetSet) return nil
nit: this part of code dealing with updating metrics for starting, resuming, and canceling jobs seems quite repetitive. Is there a straightforward way to modularize it?
pkg/ccl/changefeedccl/metrics.go
line 197 at r1 (raw file):
} func (m *sliMetrics) updateLastTargets(id jobspb.JobID, newTargets intsets.Fast) {
nit: could you add a comment for this function?
pkg/ccl/changefeedccl/metrics.go
line 207 at r1 (raw file):
} addedTargets = newTargets.Difference(oldTargets) removedTargets = oldTargets.Difference(newTargets)
For my education, what's the time efficiency to call Difference or Equals on intsets.Fast?
pkg/ccl/changefeedccl/metrics_test.go
line 164 at r1 (raw file):
sqlDB.Exec(t, `CREATE TABLE bar (a INT PRIMARY KEY, b STRING)`) require.NoError(t, testfeed.Resume()) require.EqualValues(t, 2, sliMetric.WatchedTargets.Value())
Nice tests!
23bd1be
to
9a3a1c3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a major change to move this metric from sliMetrics into the main Metrics, since we want to capture the # of unique tables across all changefeeds regardless of the metric label the changefeed uses. The result is a tiny bit simpler. PTAL.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @amruss, @andyyang890, @jayshrivastava, and @wenyihu6)
Previously, andyyang890 (Andy Yang) wrote…
nit: maybe we should say "number of unique tables"
Done
Previously, wenyihu6 (Wenyi Hu) wrote…
nit: i think we only remove the watched targets if the job has failed or been canceled. Could you add a note here clarifying how they change and they won't be removed if they are paused?
Done
pkg/ccl/changefeedccl/changefeed_stmt.go
line 313 at r1 (raw file):
Previously, andyyang890 (Andy Yang) wrote…
Maybe move this code into a named helper function? Especially given that it's more or less repeated verbatim below
Good suggestion. Done.
pkg/ccl/changefeedccl/changefeed_stmt.go
line 316 at r1 (raw file):
Previously, wenyihu6 (Wenyi Hu) wrote…
It seems there's a potential for
pl.GetDetails()
to return nil, which could lead to a nil pointer dereference when accessing Changefeed here. But I notice that other parts of the code also directly access its fields, so maybe this is not possible.
The payload is always set when the job is adopted. I don't think it can be nil.
pkg/ccl/changefeedccl/changefeed_stmt.go
line 317 at r1 (raw file):
Previously, wenyihu6 (Wenyi Hu) wrote…
Do you know how this metrics would work if we add a metrics label for different changefeeds? Will
sli.WatchedTargets
still be correct?
This is a good point, and one thing I realized since writing this is that due to the way metrics labels work, we would double-count tables watched by changefeeds under different labels 😬 I made a major change to fix this and added some test coverage. PTAL
pkg/ccl/changefeedccl/changefeed_stmt.go
line 1156 at r1 (raw file):
Previously, wenyihu6 (Wenyi Hu) wrote…
nit: this part of code dealing with updating metrics for starting, resuming, and canceling jobs seems quite repetitive. Is there a straightforward way to modularize it?
Done
pkg/ccl/changefeedccl/changefeed_stmt.go
line 1497 at r1 (raw file):
Previously, andyyang890 (Andy Yang) wrote…
nit: maybe put this in an
if err := ...; err != nil { ... }
and still have thereturn nil
at the end of the function, would make it a bit easier to insert things after this function (if needed) in the future
With the new changes, this is no longer an issue.
pkg/ccl/changefeedccl/metrics.go
line 165 at r1 (raw file):
Previously, andyyang890 (Andy Yang) wrote…
Could you add comments documenting what these fields store?
Done
pkg/ccl/changefeedccl/metrics.go
line 197 at r1 (raw file):
Previously, wenyihu6 (Wenyi Hu) wrote…
nit: could you add a comment for this function?
Done
pkg/ccl/changefeedccl/metrics.go
line 207 at r1 (raw file):
Previously, wenyihu6 (Wenyi Hu) wrote…
For my education, what's the time efficiency to call Difference or Equals on intsets.Fast?
Worst case implementation is a linked list of bitmaps, so Equals would be O(N). However, since table IDs are usually within a relatively small (100s) range, in practice I expect in most cases it would fit in a couple of bitmaps with only 1-3 linked list nodes.
Difference makes a copy of one of the sets first, but is also O(N).
pkg/ccl/changefeedccl/metrics.go
line 229 at r1 (raw file):
Previously, andyyang890 (Andy Yang) wrote…
nit: maybe rename to
incTargetLocked
to match theGauge
API
I made the name more similar (while still using locked naming convention). What do you think?
pkg/ccl/changefeedccl/metrics.go
line 240 at r1 (raw file):
Previously, andyyang890 (Andy Yang) wrote…
nit: maybe rename to
decTargetLocked
to match theGauge
API
see above
pkg/ccl/changefeedccl/metrics.go
line 886 at r1 (raw file):
Previously, andyyang890 (Andy Yang) wrote…
nit: maybe we should clarify that it's the number of "unique targets"
Done
pkg/ccl/changefeedccl/helpers_test.go
line 1379 at r1 (raw file):
Previously, andyyang890 (Andy Yang) wrote…
nit: maybe rename to something like
waitForChangefeedRunningCount
I did some refactoring here and made this more generic. Wdyt?
pkg/ccl/changefeedccl/metrics_test.go
line 23 at r1 (raw file):
Previously, andyyang890 (Andy Yang) wrote…
nit: maybe rename to
TestRunningCountAndWatchedTargets
since those are the only two metrics being tested?
Done
pkg/ccl/changefeedccl/metrics_test.go
line 96 at r1 (raw file):
Previously, andyyang890 (Andy Yang) wrote…
Should we maybe add an assertion for the running count being 0?
Good idea. Done.
pkg/ccl/changefeedccl/metrics_test.go
line 164 at r1 (raw file):
Previously, wenyihu6 (Wenyi Hu) wrote…
Nice tests!
Thanks!
This PR adds a new metric, changefeed.watched_targets, which tracks the number of unique tables watched by all changefeeds. If a table is tracked in multiple feeds, it is only counted once. Epic: none Fixes: cockroachdb#118379 Release note (enterprise change): Adds a new metric, changefeed.watched_targets, which tracks the number of unique tables watched by all changefeeds. Note that a table remains watched even if a changefeed is paused, and only becomes unwatched if the changefeed is altered to remove the table (the changefeed must be resumed), is canceled, or fails.
9a3a1c3
to
46c2422
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM! Thanks for working on this.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @amruss, @andyyang890, @rharding6373, and @wenyihu6)
pkg/ccl/changefeedccl/metrics.go
line 1136 at r2 (raw file):
// WatchedTargets metrics if this is the first target with this id. // Requires m.mu to be locked. func (m *Metrics) incLocked(id int) {
It's probably better to name these with a specific name ex. incWatchedTargetsLocked
.
pkg/ccl/changefeedccl/metrics.go
line 1136 at r2 (raw file):
// WatchedTargets metrics if this is the first target with this id. // Requires m.mu to be locked. func (m *Metrics) incLocked(id int) {
nit: use targetID
instead of id
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 6 of 6 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @amruss, @rharding6373, and @wenyihu6)
pkg/ccl/changefeedccl/metrics.go
line 165 at r1 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…
Done
Thanks for adding comments!
pkg/ccl/changefeedccl/metrics.go
line 1083 at r2 (raw file):
// targets maps table IDs to the number of occurrences of the table ID in // changefeeds. targets map[int]int64
nit: do we maybe want to call this something more descriptive liketargetCounts
? if so, we can rename the functions below like incTargetCountLocked
and decTargetCountLocked
pkg/ccl/changefeedccl/metrics.go
line 1133 at r2 (raw file):
} // incrTarget increments the count with the target ID and increments the
nit: function name in comment needs to be updated
pkg/ccl/changefeedccl/metrics.go
line 1144 at r2 (raw file):
} // decrTarget decrements the count with the target ID and decrements the
nit: function name in comment needs to be updated
pkg/ccl/changefeedccl/helpers_test.go
line 1379 at r1 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…
I did some refactoring here and made this more generic. Wdyt?
Nice!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mod some nit comments others already made
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @amruss and @rharding6373)
pkg/ccl/changefeedccl/metrics.go
line 207 at r1 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…
Worst case implementation is a linked list of bitmaps, so Equals would be O(N). However, since table IDs are usually within a relatively small (100s) range, in practice I expect in most cases it would fit in a couple of bitmaps with only 1-3 linked list nodes.
Difference makes a copy of one of the sets first, but is also O(N).
Thanks for the explanation!
This PR adds a new metric, changefeed.watched_targets, which tracks the number of unique tables watched by all changefeeds. If a table is tracked in multiple feeds, it is only counted once.
Epic: none
Fixes: #118379
Release note (enterprise change): Adds a new metric, changefeed.watched_targets, which tracks the number of unique tables watched by all changefeeds. Note that a table remains watched even if a changefeed is paused, and only becomes unwatched if the changefeed is altered to remove the table (the changefeed must be resumed), is canceled, or fails.