Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use last second copy rate to project ETA #1231

Merged
merged 4 commits into from
Dec 18, 2024

Conversation

morgo
Copy link
Contributor

@morgo morgo commented Dec 14, 2022

A Pull Request should be associated with an Issue.

We wish to have discussions in Issues. A single issue may be targeted by multiple PRs.
If you're offering a new feature or fixing anything, we'd like to know beforehand in Issues,
and potentially we'll be able to point development in a particular direction.

Related issue: #1204

Further notes in https://github.com/github/gh-ost/blob/master/.github/CONTRIBUTING.md
Thank you! We are open to PRs, but please understand if for technical reasons we are unable to accept each and any PR

Description

Currently, the ETA that gh-ost uses is based on a projection that averages over the whole run time. Because (with a b+tree) the insert rate will slow down as the table gets larger, this can lead to inaccurate projections.

This uses a different approach, which is to project from the number of rows copied in the last second. It has the downside that the estimate can tend to "jump around" a bit, but sometimes this is helpful. I played around with changing the sample average to 10s etc, but it looks like 1s seems to be long enough to balance estimate stability.

Here is a sample output showing the time bouncing around a little:

~/go/src/github.com/morgo/gh-ost/go/cmd/gh-ost more-accurate-eta go run . \
--max-load=Threads_running=25 \
--critical-load=Threads_running=1000 \
--chunk-size=1000 \
--user="msandbox" \
--password="msandbox" \
--host="127.0.0.1" \
--port=8031 \
--allow-on-master \
--database="test" \
--table="customer" \
--initially-drop-old-table \
--ok-to-drop-table \
--execute  --initially-drop-ghost-table --alter="engine=innodb"
[2022/12/14 14:28:21] [info] binlogsyncer.go:148 create BinlogSyncer with config {99999 mysql 127.0.0.1 8031 msandbox    false false <nil> false UTC true 0 0s 0s 0 false false 0 <nil>}
[2022/12/14 14:28:21] [info] binlogsyncer.go:374 begin to sync binlog from position (binlog.000199, 157)
[2022/12/14 14:28:21] [info] binlogsyncer.go:791 rotate to (binlog.000199, 157)
# Migrating `test`.`customer`; Ghost table is `test`.`_customer_gho`
# Migrating mtocker-macbookpro.local:8031; inspecting mtocker-macbookpro.local:8031; executing on mtocker-macbookpro.local
# Migration started at Wed Dec 14 14:28:21 -0700 2022
# chunk-size: 1000; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: Threads_running=25; critical-load: Threads_running=1000; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle
# Serving on unix socket: /tmp/gh-ost.test.customer.sock
Copy: 0/2881784 0.0%; Applied: 0; Backlog: 0/1000; Time: 0s(total), 0s(copy); streamer: binlog.000199:3566; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: N/A
Copy: 0/2881784 0.0%; Applied: 0; Backlog: 0/1000; Time: 1s(total), 1s(copy); streamer: binlog.000199:8040; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: N/A
Copy: 84000/2881784 2.9%; Applied: 0; Backlog: 0/1000; Time: 2s(total), 2s(copy); streamer: binlog.000199:47281787; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: 1m6s
Copy: 168000/2881784 5.8%; Applied: 0; Backlog: 0/1000; Time: 3s(total), 3s(copy); streamer: binlog.000199:94566987; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: 32s
Copy: 245000/2881784 8.5%; Applied: 0; Backlog: 0/1000; Time: 4s(total), 4s(copy); streamer: binlog.000199:137949978; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: 34s
Copy: 323000/2881784 11.2%; Applied: 0; Backlog: 0/1000; Time: 5s(total), 5s(copy); streamer: binlog.000199:181313206; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: 32s
Copy: 402000/2881784 13.9%; Applied: 0; Backlog: 0/1000; Time: 6s(total), 6s(copy); streamer: binlog.000199:226333018; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: 31s
Copy: 476000/2881784 16.5%; Applied: 0; Backlog: 0/1000; Time: 7s(total), 7s(copy); streamer: binlog.000199:268011561; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: 32s
Copy: 555000/2881784 19.3%; Applied: 0; Backlog: 0/1000; Time: 8s(total), 8s(copy); streamer: binlog.000199:312501660; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: 29s
Copy: 615000/2881784 21.3%; Applied: 0; Backlog: 0/1000; Time: 9s(total), 9s(copy); streamer: binlog.000199:346275029; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: 37s
Copy: 669000/2881784 23.2%; Applied: 0; Backlog: 0/1000; Time: 10s(total), 10s(copy); streamer: binlog.000199:376674938; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: 40s
Copy: 726000/2881784 25.2%; Applied: 0; Backlog: 0/1000; Time: 11s(total), 11s(copy); streamer: binlog.000199:408728243; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: 37s
Copy: 783000/2881784 27.2%; Applied: 0; Backlog: 0/1000; Time: 12s(total), 12s(copy); streamer: binlog.000199:440842291; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: 36s
Copy: 846000/2881784 29.4%; Applied: 0; Backlog: 0/1000; Time: 13s(total), 13s(copy); streamer: binlog.000199:476296347; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: 32s
Copy: 910000/2881784 31.6%; Applied: 0; Backlog: 0/1000; Time: 14s(total), 14s(copy); streamer: binlog.000199:512338908; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: 30s
Copy: 969000/2881784 33.6%; Applied: 0; Backlog: 0/1000; Time: 15s(total), 15s(copy); streamer: binlog.000199:545572110; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: 32s
Copy: 1026000/2881784 35.6%; Applied: 0; Backlog: 0/1000; Time: 16s(total), 16s(copy); streamer: binlog.000199:577675650; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: 32s
Copy: 1087000/2881784 37.7%; Applied: 0; Backlog: 0/1000; Time: 17s(total), 17s(copy); streamer: binlog.000199:612023226; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: 29s
Copy: 1148000/2881784 39.8%; Applied: 0; Backlog: 0/1000; Time: 18s(total), 18s(copy); streamer: binlog.000199:646342977; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: 28s
Copy: 1204000/2881784 41.8%; Applied: 0; Backlog: 0/1000; Time: 19s(total), 19s(copy); streamer: binlog.000199:677881385; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: 29s
Copy: 1265000/2881784 43.9%; Applied: 0; Backlog: 0/1000; Time: 20s(total), 20s(copy); streamer: binlog.000199:712235510; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: 26s
Copy: 1335000/2881784 46.3%; Applied: 0; Backlog: 0/1000; Time: 21s(total), 21s(copy); streamer: binlog.000199:751664902; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: 22s
Copy: 1403000/2881784 48.7%; Applied: 0; Backlog: 0/1000; Time: 22s(total), 22s(copy); streamer: binlog.000199:789461333; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: 21s
Copy: 1464000/2881784 50.8%; Applied: 0; Backlog: 0/1000; Time: 23s(total), 23s(copy); streamer: binlog.000199:824289714; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: 23s
Copy: 1527000/2881784 53.0%; Applied: 0; Backlog: 0/1000; Time: 24s(total), 24s(copy); streamer: binlog.000199:859468766; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: 21s
Copy: 1592000/2881784 55.2%; Applied: 0; Backlog: 0/1000; Time: 25s(total), 25s(copy); streamer: binlog.000199:896372430; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: 19s
Copy: 1662000/2881784 57.7%; Applied: 0; Backlog: 0/1000; Time: 26s(total), 26s(copy); streamer: binlog.000199:935798318; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: 17s
Copy: 1725000/2881784 59.9%; Applied: 0; Backlog: 0/1000; Time: 27s(total), 27s(copy); streamer: binlog.000199:971271951; Lag: 0.11s, HeartbeatLag: 0.11s, State: migrating; ETA: 18s
Copy: 1793000/2881784 62.2%; Applied: 0; Backlog: 0/1000; Time: 28s(total), 28s(copy); streamer: binlog.000199:1009545478; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: 16s
Copy: 1856000/2881784 64.4%; Applied: 0; Backlog: 0/1000; Time: 29s(total), 29s(copy); streamer: binlog.000199:1044945554; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: 16s
[2022/12/14 14:28:51] [info] binlogsyncer.go:791 rotate to (binlog.000200, 4)
[2022/12/14 14:28:51] [info] binlogsyncer.go:791 rotate to (binlog.000200, 4)
Copy: 1920000/2881784 66.6%; Applied: 0; Backlog: 0/1000; Time: 30s(total), 30s(copy); streamer: binlog.000200:6753737; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: 15s
Copy: 1977000/2881784 68.6%; Applied: 0; Backlog: 0/1000; Time: 31s(total), 31s(copy); streamer: binlog.000200:38877488; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: 15s
Copy: 2032000/2881784 70.5%; Applied: 0; Backlog: 0/1000; Time: 32s(total), 32s(copy); streamer: binlog.000200:69863585; Lag: 0.11s, HeartbeatLag: 0.11s, State: migrating; ETA: 15s
Copy: 2093000/2881784 72.6%; Applied: 0; Backlog: 0/1000; Time: 33s(total), 33s(copy); streamer: binlog.000200:104128743; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: 12s
Copy: 2158000/2881784 74.9%; Applied: 0; Backlog: 0/1000; Time: 34s(total), 34s(copy); streamer: binlog.000200:140822753; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: 11s
Copy: 2220000/2881784 77.0%; Applied: 0; Backlog: 0/1000; Time: 35s(total), 35s(copy); streamer: binlog.000200:175718354; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: 10s
Copy: 2280000/2881784 79.1%; Applied: 0; Backlog: 0/1000; Time: 36s(total), 36s(copy); streamer: binlog.000200:209458784; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: 10s
Copy: 2342000/2881784 81.3%; Applied: 0; Backlog: 0/1000; Time: 37s(total), 37s(copy); streamer: binlog.000200:244384714; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: 8s
Copy: 2405000/2881784 83.5%; Applied: 0; Backlog: 0/1000; Time: 38s(total), 38s(copy); streamer: binlog.000200:279865282; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: 7s
Copy: 2469000/2881784 85.7%; Applied: 0; Backlog: 0/1000; Time: 39s(total), 39s(copy); streamer: binlog.000200:315425854; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: 6s
Copy: 2529000/2881784 87.8%; Applied: 0; Backlog: 0/1000; Time: 40s(total), 40s(copy); streamer: binlog.000200:349673270; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: 5s
Copy: 2598000/2881784 90.2%; Applied: 0; Backlog: 0/1000; Time: 41s(total), 41s(copy); streamer: binlog.000200:388534212; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: 4s
Copy: 2661000/2881784 92.3%; Applied: 0; Backlog: 0/1000; Time: 42s(total), 42s(copy); streamer: binlog.000200:423824648; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: 3s
Copy: 2718000/2881784 94.3%; Applied: 0; Backlog: 0/1000; Time: 43s(total), 43s(copy); streamer: binlog.000200:456090786; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: 2s
Copy: 2775000/2881784 96.3%; Applied: 0; Backlog: 0/1000; Time: 44s(total), 44s(copy); streamer: binlog.000200:488195145; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: 1s
Copy: 2833000/2881784 98.3%; Applied: 0; Backlog: 0/1000; Time: 45s(total), 45s(copy); streamer: binlog.000200:520841157; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: due
Copy: 2894000/2881784 100.4%; Applied: 0; Backlog: 0/1000; Time: 46s(total), 46s(copy); streamer: binlog.000200:555202612; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: due
Copy: 2952000/2881784 102.4%; Applied: 0; Backlog: 0/1000; Time: 47s(total), 47s(copy); streamer: binlog.000200:587420664; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: due
Copy: 3000000/3000000 100.0%; Applied: 0; Backlog: 0/1000; Time: 47s(total), 47s(copy); streamer: binlog.000200:614436979; Lag: 0.01s, HeartbeatLag: 0.03s, State: migrating; ETA: due
Copy: 3000000/3000000 100.0%; Applied: 0; Backlog: 1/1000; Time: 48s(total), 47s(copy); streamer: binlog.000200:614902121; Lag: 0.01s, HeartbeatLag: 0.01s, State: migrating; ETA: due
# Migrating `test`.`customer`; Ghost table is `test`.`_customer_gho`
# Migrating mtocker-macbookpro.local:8031; inspecting mtocker-macbookpro.local:8031; executing on mtocker-macbookpro.local
# Migration started at Wed Dec 14 14:28:21 -0700 2022
# chunk-size: 1000; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: Threads_running=25; critical-load: Threads_running=1000; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle
# Serving on unix socket: /tmp/gh-ost.test.customer.sock
Copy: 3000000/3000000 100.0%; Applied: 0; Backlog: 0/1000; Time: 48s(total), 47s(copy); streamer: binlog.000200:614905425; Lag: 0.01s, HeartbeatLag: 0.03s, State: migrating; ETA: due
[2022/12/14 14:29:10] [info] binlogsyncer.go:180 syncer is closing...
[2022/12/14 14:29:10] [info] binlogsyncer.go:864 kill last connection id 11
[2022/12/14 14:29:10] [info] binlogsyncer.go:210 syncer is closed
# Done

In case this PR introduced Go code changes:

  • contributed code is using same conventions as original code
  • script/cibuild returns with no formatting errors, build errors or unit test errors.

@meiji163
Copy link
Contributor

This is great 🚀 ! We often noticed the ETA is inaccurate for long-running migrations.

@meiji163 meiji163 merged commit 09052e6 into github:master Dec 18, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants