BitFilter made faster #433

Merged: 9 commits, Nov 7, 2024

Scooletz (Contributor) commented Nov 6, 2024

This PR amends BitFilter to make it faster. Tracing shows that up to 8% of block execution time is spent in BitFilter creation. Making it faster improves block execution now, before larger efforts like #417 are undertaken. This PR:

  1. improves Page.OrWith by unrolling the loop so that each iteration performs two vectorized operations, which gives a ~20% improvement
  2. makes OR-ing of multiple filters parallel by applying the operation per range of pages at a specific index. Think of running it by column rather than by row. This should help in both cases, whether the number of committed sets is low or high.
  3. uses SSE prefetch to fetch the next page before it is ORed
  4. modifies the OfN bit filter so that it uses a generic <TSize> parameter, which makes it possible to skip the length, mask and shift fields
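
To illustrate item 4, here is a minimal sketch of how a size type parameter can replace per-instance length, mask and shift fields. It is not the code from this PR: `IFilterSize`, `Size1K` and `BitFilterSketch<TSize>` are hypothetical names, and the sketch assumes C# 11 / .NET 7 or later for static abstract interface members.

```csharp
using System;

// Hypothetical marker interface: the size is a property of the type, not of the instance.
public interface IFilterSize
{
    // log2 of the number of bits in the filter; assumed to be at least 6.
    static abstract int BitsLog2 { get; }
}

// Hypothetical size: 2^10 = 1024 bits.
public readonly struct Size1K : IFilterSize
{
    public static int BitsLog2 => 10;
}

public sealed class BitFilterSketch<TSize> where TSize : struct, IFilterSize
{
    // The only instance field is the bit storage. Length and mask are derived
    // from TSize, so the JIT can treat them as constants per instantiation.
    private readonly ulong[] _words = new ulong[(1 << TSize.BitsLog2) / 64];

    private static uint Mask => (1u << TSize.BitsLog2) - 1u;

    public void Add(uint hash)
    {
        int bit = (int)(hash & Mask);
        _words[bit >> 6] |= 1UL << (bit & 63);
    }

    public bool MayContain(uint hash)
    {
        int bit = (int)(hash & Mask);
        return (_words[bit >> 6] & (1UL << (bit & 63))) != 0;
    }
}
```

Because `TSize` is a struct, the runtime compiles a separate specialization for each size, which is what lets the mask and length fold into constants instead of being loaded from fields.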

Overall, BitFilter creation is ~60% faster now (depends on the number of ancestors).

Most of the allocation when OR-ing filters comes from Parallel.For, which allocates about 3.6 KB on its own. It looks like a cost we have to pay.
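
For context, the "by column" OR-ing driven by Parallel.For might look roughly like the sketch below. It is an illustration under assumptions rather than the actual implementation: filters are modeled as plain `byte[][]` arrays of pages, `ParallelOrSketch` is a hypothetical name, and the work is split per page index instead of per range of pages.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelOrSketch
{
    // ORs every source filter into `target`. Each filter is an array of
    // equally sized pages; all filters are assumed to have the same shape.
    public static void OrWith(byte[][] target, byte[][][] sources)
    {
        // One unit of work per page index ("column"). Each iteration writes
        // only to target[pageIndex], so iterations do not need to synchronize.
        Parallel.For(0, target.Length, pageIndex =>
        {
            byte[] destination = target[pageIndex];

            foreach (byte[][] source in sources)
            {
                byte[] page = source[pageIndex];
                for (int i = 0; i < destination.Length; i++)
                {
                    destination[i] |= page[i];
                }
            }
        });
    }
}
```

Splitting by page index keeps the amount of parallel work tied to the filter size rather than to the number of filters being merged, which is why it should behave well for both few and many committed sets.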

Benchmarks

Page.OrWith

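| Method          | Mean     | Error    | StdDev   | Code Size |
|-----------------|----------|----------|----------|-----------|
| OrWith (before) | 41.86 ns | 0.576 ns | 0.510 ns | 48 B      |
| OrWith (after)  | 36.77 ns | 0.372 ns | 0.330 ns | 63 B      |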
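
The unrolled loop behind the "after" numbers could be sketched as follows. This is an approximation under assumptions, not the method from the PR: `PageOrSketch` is a hypothetical name, pages are taken to be 4096 bytes (matching the `cmp rdx,1000` hex constant in the assembly further down), and the prefetch from item 3 is left out.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

public static class PageOrSketch
{
    public const int PageSize = 4096; // assumed page size

    // ORs `source` into `destination`, handling two 32-byte vectors per iteration.
    public static void OrWith(Span<byte> destination, ReadOnlySpan<byte> source)
    {
        if (destination.Length != PageSize || source.Length != PageSize)
            throw new ArgumentException("Both buffers must be exactly one page long.");

        ref byte dst = ref MemoryMarshal.GetReference(destination);
        ref byte src = ref MemoryMarshal.GetReference(source);

        nuint length = (nuint)PageSize;
        nuint step = (nuint)Vector256<byte>.Count; // 32 bytes per vector
        nuint stride = step + step;                // two vectors per iteration

        for (nuint i = 0; i < length; i += stride)
        {
            // First vector of the pair.
            Vector256<byte> first = Vector256.LoadUnsafe(ref dst, i) | Vector256.LoadUnsafe(ref src, i);
            first.StoreUnsafe(ref dst, i);

            // Second vector of the pair, 32 bytes further.
            Vector256<byte> second = Vector256.LoadUnsafe(ref dst, i + step) | Vector256.LoadUnsafe(ref src, i + step);
            second.StoreUnsafe(ref dst, i + step);
        }
    }
}
```

The page size is a multiple of 64 bytes, so the loop needs no scalar tail. Unrolling halves the number of loop-counter updates and branches per page at the cost of a slightly larger method body, consistent with the code size growing from 48 B to 63 B.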

BitFilter.OrWith

| Method                                    | count | Mean       | Error    | StdDev   | Gen0   | Code Size | Allocated |
|-------------------------------------------|-------|------------|----------|----------|--------|-----------|-----------|
| Or_BitMapFilter_OfN_128_Multiple (before) | 16    | 223.0 us   | 4.13 us  | 3.86 us  | 0      | 376 B     | 328 B     |
| Or_BitMapFilter_OfN_128_Multiple          | 16    | 101.9 us   | 5.61 us  | 16.28 us | 0.3662 | 2,370 B   | 5.56 KB   |
| Or_BitMapFilter_OfN_128_Multiple (before) | 32    | 498.0 us   | 9.87 us  | 17.29 us | 0      | 376 B     | 584 B     |
| Or_BitMapFilter_OfN_128_Multiple          | 32    | 108.6 us   | 2.14 us  | 3.08 us  | 0.2441 | 2,408 B   | 4.77 KB   |
| Or_BitMapFilter_OfN_128_Multiple (before) | 64    | 1,423.7 us | 19.28 us | 16.10 us | 0      | 376 B     | 1097 B    |
| Or_BitMapFilter_OfN_128_Multiple          | 64    | 344.2 us   | 5.55 us  | 4.92 us  | 0.4883 | 2,401 B   | 6.42 KB   |

Assembly

The assembly for Page.OrWith

Before

; Paprika.Benchmarks.PageExtensionsBenchmarks.OrWith()
       vzeroupper
       mov       rax,[rcx+8]
       mov       rcx,[rcx+10]
       xor       edx,edx
       nop       dword ptr [rax]
M00_L00:
       vmovups   ymm0,[rax+rdx]
       vpor      ymm0,ymm0,[rcx+rdx]
       vmovups   [rax+rdx],ymm0
       add       rdx,20
       cmp       rdx,1000
       jb        short M00_L00
       vzeroupper
       ret
; Total bytes of code 48

After

; Paprika.Benchmarks.PageExtensionsBenchmarks.OrWith()
       vzeroupper
       mov       rax,[rcx+8]
       mov       rcx,[rcx+10]
       xor       edx,edx
M00_L00:
       vmovups   ymm0,[rax+rdx]
       vpor      ymm0,ymm0,[rcx+rdx]
       vmovups   [rax+rdx],ymm0
       vmovups   ymm0,[rax+rdx+20]
       vpor      ymm0,ymm0,[rcx+rdx+20]
       vmovups   [rax+rdx+20],ymm0
       add       rdx,40
       cmp       rdx,1000
       jb        short M00_L00
       vzeroupper
       ret
; Total bytes of code 63

Scooletz added the 🐌 performance (Performance related issue) label Nov 6, 2024

github-actions bot commented Nov 6, 2024

Code Coverage

| Package | Line Rate         | Branch Rate       | Health |
|---------|-------------------|-------------------|--------|
| Paprika | 83%               | 81%               |        |
| Summary | 83% (4817 / 5784) | 81% (1641 / 2034) |        |

Minimum allowed line rate is 75%

Scooletz marked this pull request as ready for review Nov 7, 2024 08:14
Scooletz merged commit a4f07e8 into main Nov 7, 2024 (5 checks passed)
Scooletz deleted the bit-filter-or branch Nov 7, 2024 08:28