when building images, do chunk compression in worker thread pool #196

Open
wants to merge 2 commits into
base: main

poettering
Member

This gives roughly a 10% speed improvement. (Not more, unfortunately: this only parallelizes the zstd work, whereas it's the sha512-256 logic that costs the most CPU time, and parallelizing that is much harder.)
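
As a rough cross-check of the claim that only a small part of the work is parallelized, Amdahl's law can be applied to the timings quoted in the commit message (1m24,281s before, 1m16,902s after, 4 CPUs). This is only a back-of-the-envelope estimate: it assumes ideal scaling across all n = 4 CPUs and no threading overhead, neither of which is stated in this thread.

```latex
S = \frac{T_{\text{before}}}{T_{\text{after}}} = \frac{84.281}{76.902} \approx 1.096,
\qquad
S = \frac{1}{(1 - p) + \dfrac{p}{n}}
\;\Longrightarrow\;
p = \frac{1 - \dfrac{1}{S}}{1 - \dfrac{1}{n}}
  = \frac{1 - 0.9124}{1 - 0.25}
  \approx 0.117
```

Under those assumptions, only about 12% of the original run time was spent in the part that now runs in parallel, which is consistent with the hashing dominating the CPU cost.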

Let's copy in a new version from systemd
…chunks

This is low-hanging optimization fruit: when we encode a stream, pass the
chunk compression/storing to a pool of worker threads.

This doesn't speed up encoding as much as I hoped, but still:

The firefox 63 sources (2.2G) are encoded in 1m16,902s instead of
1m24,281s on my 4 cpu system. i.e. roughly a 10% speed-up.
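
For readers unfamiliar with the pattern, here is a minimal sketch of handing per-chunk work to a pool of worker threads. It is not the code from this PR: the names `chunk_job`, `chunk_pool`, and `compress_chunks` are hypothetical, the "compressor" is a toy run-length size estimate standing in for zstd, and work is distributed through a mutex-protected index rather than whatever mechanism the patch actually uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical job: one chunk in, "compressed" size out. */
typedef struct {
    const uint8_t *data;
    size_t size;
    size_t compressed_size;   /* filled in by a worker */
} chunk_job;

typedef struct {
    chunk_job *jobs;
    size_t n_jobs;
    size_t next;              /* next unclaimed job, guarded by lock */
    pthread_mutex_t lock;
} chunk_pool;

/* Stand-in for a real compressor (assumption, not zstd): size of a
 * byte-wise run-length encoding that uses 2 bytes per run. */
static size_t rle_size(const uint8_t *p, size_t n) {
    size_t out = 0, i = 0;
    while (i < n) {
        size_t run = 1;
        while (i + run < n && p[i + run] == p[i] && run < 255)
            run++;
        out += 2;
        i += run;
    }
    return out;
}

/* Each worker claims the next job index under the lock, then does the
 * expensive work outside the lock. */
static void *worker(void *arg) {
    chunk_pool *pool = arg;
    for (;;) {
        pthread_mutex_lock(&pool->lock);
        size_t i = pool->next;
        if (i < pool->n_jobs)
            pool->next++;
        pthread_mutex_unlock(&pool->lock);

        if (i >= pool->n_jobs)
            return NULL;
        pool->jobs[i].compressed_size =
            rle_size(pool->jobs[i].data, pool->jobs[i].size);
    }
}

/* Processes all jobs with up to n_threads workers.
 * Returns 0 on success, or a pthread error number if the mutex
 * could not be initialized. */
int compress_chunks(chunk_job *jobs, size_t n_jobs, unsigned n_threads) {
    enum { MAX_THREADS = 16 };
    pthread_t threads[MAX_THREADS];
    chunk_pool pool = { .jobs = jobs, .n_jobs = n_jobs, .next = 0 };
    unsigned started = 0;
    int r;

    assert(n_threads >= 1 && n_threads <= MAX_THREADS);

    r = pthread_mutex_init(&pool.lock, NULL);
    if (r != 0)
        return r;

    while (started < n_threads &&
           pthread_create(&threads[started], NULL, worker, &pool) == 0)
        started++;

    /* If no thread could be started, fall back to the calling thread. */
    if (started == 0)
        worker(&pool);

    /* pthread_join also makes the workers' results visible here. */
    for (unsigned k = 0; k < started; k++)
        pthread_join(threads[k], NULL);

    pthread_mutex_destroy(&pool.lock);
    return 0;
}
```

A caller would fill an array of `chunk_job` entries, call `compress_chunks(jobs, n, 4)`, and read each `compressed_size` afterwards. Note that in this sketch only the per-chunk work is parallel; anything that must run sequentially over the stream (such as the hashing mentioned above) would stay on the calling thread.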
Member

keszybz commented Apr 24, 2019

On my laptop:

$ rm -rf /var/tmp/default.castr /var/tmp/ffox.* ; casync --digest=default --compression=gzip --without=privileged make /var/tmp/ffox.caidx ~/Downloads/firefox/
(before)
4fe6372c31710ac3f3c2af54f56a896c5b30dfab97988abf2aedd3b145e378ad
29.95s user 1.82s system 96% cpu 32.915 total
29.85s user 1.89s system 97% cpu 32.691 total
29.87s user 1.80s system 98% cpu 32.087 total
(after)
36.57s user 1.52s system 232% cpu 16.405 total
36.60s user 1.48s system 233% cpu 16.324 total
36.90s user 1.52s system 211% cpu 18.167 total

So there's a clear speed-up in total time (roughly 1.9×, from about 32.6s to about 17.0s on average), but also growth in CPU usage (user time rises by about 23%).
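
To put numbers on the laptop measurements above, the following small sketch averages the three runs on each side and compares them. The helper names `mean` and `ratio_of_means` are made up for this illustration; the data arrays are copied from the timings in this comment.

```c
#include <stddef.h>

/* Wall-clock ("total") and user CPU times in seconds, copied from the
 * laptop runs quoted above. */
static const double total_before[] = { 32.915, 32.691, 32.087 };
static const double total_after[]  = { 16.405, 16.324, 18.167 };
static const double user_before[]  = { 29.95, 29.85, 29.87 };
static const double user_after[]   = { 36.57, 36.60, 36.90 };

/* Arithmetic mean of n samples (0.0 for an empty set). */
double mean(const double *v, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return n ? sum / (double) n : 0.0;
}

/* Ratio of the mean of a to the mean of b. */
double ratio_of_means(const double *a, size_t na,
                      const double *b, size_t nb) {
    return mean(a, na) / mean(b, nb);
}
```

With these inputs, `ratio_of_means(total_before, 3, total_after, 3)` comes out at about 1.92 (wall-clock speed-up), while `ratio_of_means(user_after, 3, user_before, 3)` is about 1.23 (extra user CPU time spent).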

On rpi3, I see a decent speedup, 19s→11–14s on /usr/lib/modules/4.19.2-300.fc29.aarch64/kernel/drivers/, and a similar speedup on f30-test.fedorainfracloud.org (2 vCPU cloud instance).
I need to also test on a "beefy" machine, but I don't have one at hand right now.

Member

keszybz commented Apr 24, 2019

Fails with a corrupt stack here:

casync: ../src/castore.c:274: worker_thread: Assertion `store->worker_thread_socket[1] >= 0' failed.
Core was generated by `build/casync make /tmp/archive.caidx /usr/lib64/kde3'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007efcc399353f in raise () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x7efcc37df700 (LWP 32593))]
(gdb) bt
#0  0x00007efcc399353f in raise () from /lib64/libc.so.6
#1  0x00007efcc397d895 in abort () from /lib64/libc.so.6
#2  0x00007efcc397d769 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
#3  0x00007efcc398b9f6 in __assert_fail () from /lib64/libc.so.6
#4  0x00007ffd3e60dbbe in ?? ()
#5  0x00007ffd3e60dbbf in ?? ()
#6  0x000000000041f94b in worker_thread (p=<error reading variable: Cannot access memory at address 0xffffffffffffff98>) at ../src/castore.c:274
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

(This is 100% repeatable.)

Member

keszybz commented Apr 24, 2019

On a machine with 12 cores:

$ rm -rf /tmp/default.castr /tmp/archive.* && time build/casync make /tmp/archive.caidx /usr/lib64/firefox
(before)
ccf3d08f975b7be1fc274d798e81293ee3e12deb1922e12b59118beee46cac28

10.39s user 0.32s system 99% cpu 10.763 total
10.48s user 0.28s system 99% cpu 10.822 total
10.58s user 0.31s system 99% cpu 10.957 total

(after)
10.82s user 0.45s system 123% cpu 9.113 total
10.77s user 0.49s system 123% cpu 9.113 total
11.21s user 0.49s system 128% cpu 9.127 total

Again, a moderate speedup.

I guess we could merge this if the crash is figured out.
