Change `versions` table layout for performance #1457

fatkodima · 2024-01-27T16:57:47Z

We currently use paper_trail and have billions of rows in the versions table, so the table is huge.

One easy improvement I noticed is the layout of the versions table. The current layout is not optimal and causes unnecessary fragmentation inside the table. There are good articles on the topic, like one and two. Basically, fields with static sizes should come first in the table, ordered so as to minimize padding.

With the current layout, if the user decides to use bigint for whodunnit (see #1456), then whodunnit, item_id and created_at must each be positioned on an 8-byte boundary (because each of them is 8 bytes in size), and padding may be added after the fields that precede them to make this happen. This can be as much as 7 bytes of padding per field.
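To make the padding effect concrete, here is a minimal sketch in Python of how alignment padding accumulates across a row. It is a simplified model, not PostgreSQL's actual tuple layout code: it ignores the tuple header and null bitmap, and the sizes used for item_type, event and object are made-up values standing in for short strings with no alignment requirement. Both column orders below are illustrative assumptions, not the gem's exact schema.

```python
def row_padding(columns):
    """Return the total alignment padding (in bytes) for columns stored in order.

    Each column is a (name, size_in_bytes, alignment_in_bytes) tuple.
    """
    offset = 0
    padding = 0
    for _name, size, align in columns:
        pad = (-offset) % align  # bytes needed to reach the next aligned offset
        padding += pad
        offset += pad + size
    return padding


# Hypothetical layout: 8-byte columns interleaved with short strings.
interleaved = [
    ("id", 8, 8),
    ("item_type", 5, 1),   # assumed 5-byte string value
    ("item_id", 8, 8),
    ("event", 7, 1),       # assumed 7-byte string value
    ("whodunnit", 8, 8),   # bigint, as discussed in #1456
    ("object", 50, 1),     # assumed 50-byte value
    ("created_at", 8, 8),
]

# Same columns, with the fixed-size 8-byte columns moved to the front.
packed = [interleaved[i] for i in (0, 2, 4, 6, 1, 3, 5)]

print(row_padding(interleaved))  # 10
print(row_padding(packed))       # 0
```

In this model the interleaved order wastes 3 + 1 + 6 = 10 bytes per row for these particular string lengths, while the packed order wastes none.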

For example, if we have 4 billion records in the database and each row has 21 bytes of wasted padding, then we can save 4 * 10^9 * 21 / 10^9 = 84 GB (roughly 100 GB) 🔥 of storage just by making this simple table layout change.
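As a quick check of the estimate above, the arithmetic can be written out in a few lines of Python. The function name is hypothetical, and the row count and per-row padding are the example figures from this description, not measurements.

```python
def estimated_savings_bytes(row_count, padding_per_row):
    """Bytes saved if every row loses `padding_per_row` bytes of padding."""
    return row_count * padding_per_row


saved = estimated_savings_bytes(4_000_000_000, 21)
print(saved)          # 84000000000 bytes
print(saved / 10**9)  # 84.0 (decimal gigabytes)
```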

Also, afaik, Postgres caches the offsets of statically sized columns (for the prefix of columns with static types) and can then jump straight to a specific column using those offsets when reading the row, instead of walking through the row's dynamically sized columns to reach the needed one. So this will also speed up reading the whodunnit, item_id and created_at columns.
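The idea can be sketched as follows. This is a simplified model written in Python under the same "afaik" caveat, not PostgreSQL's real implementation: it assumes no NULL values, represents a variable-width column by a size of `None`, and treats only the fixed-width columns before the first variable-width one as having offsets that are identical for every row. The column lists are illustrative.

```python
def cacheable_offsets(columns):
    """Return {name: offset} for the fixed-width prefix of a column list.

    Each column is (name, size_in_bytes_or_None, alignment_in_bytes);
    a size of None marks a variable-width column.
    """
    offsets = {}
    offset = 0
    for name, size, align in columns:
        if size is None:
            # Everything from here on depends on the data stored in each row.
            break
        offset += (-offset) % align  # align the start of this column
        offsets[name] = offset
        offset += size
    return offsets


fixed_first = [
    ("id", 8, 8), ("item_id", 8, 8), ("whodunnit", 8, 8), ("created_at", 8, 8),
    ("item_type", None, 1), ("event", None, 1), ("object", None, 1),
]
variable_early = [
    ("id", 8, 8), ("item_type", None, 1), ("item_id", 8, 8),
    ("event", None, 1), ("whodunnit", 8, 8), ("object", None, 1),
    ("created_at", 8, 8),
]

print(cacheable_offsets(fixed_first))
# {'id': 0, 'item_id': 8, 'whodunnit': 16, 'created_at': 24}
print(cacheable_offsets(variable_early))
# {'id': 0}
```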

I believe this will improve the situation for MySQL too.

Wrote good commit messages.
Feature branch is up-to-date with master (if not - rebase it).
Squashed related commits together.
Added tests.
Added an entry to the Changelog if the new code introduces user-observable changes.
The PR relates to only one subject with a clear title and description in grammatically correct, complete sentences.

jonatas · 2024-04-12T20:02:40Z

Hey @fatkodima, that looks so cool! Have you considered adding TimescaleDB to also partition the data by time? It could bring massive storage gains by using compression with dictionary algorithms over all the repeated values.

Happy to have a chat and introduce it.

jaredbeck · 2024-05-28T17:04:01Z

Nice analysis, @fatkodima! Thanks for the contribution.

franzliedke · 2024-09-09T06:14:21Z

Is it recommended to adopt this new table layout for existing apps? 🤔 I can imagine that would be quite a blocking DB migration...

fatkodima · 2024-09-09T08:06:08Z

It depends on how much space is expected to be saved, so it may be worth it. This can be done in a non-blocking way by creating a separate table with the proper layout, copying the data, and making a switch.
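A minimal sketch of that copy-and-switch approach, written in Python against SQLite purely so that it is self-contained and runnable. SQLite does not have the alignment padding discussed here, so this only illustrates the mechanics. The function name, the `versions_new` and `versions_old` table names, the batch size, and the reduced column set are all assumptions for illustration. A real migration on a live database would also need to handle rows written during the copy, plus indexes and constraints.

```python
import sqlite3


def copy_and_swap(conn, batch_size=1000):
    """Copy `versions` into a table with a new column order, then swap names."""
    # 1. Create the new table with the fixed-size columns first.
    conn.execute(
        """
        CREATE TABLE versions_new (
            id INTEGER PRIMARY KEY,
            item_id INTEGER NOT NULL,
            created_at TEXT,
            item_type TEXT NOT NULL,
            event TEXT NOT NULL
        )
        """
    )

    # 2. Copy in small batches keyed on id, committing after each batch
    #    so that no single transaction stays open for long.
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, item_id, created_at, item_type, event "
            "FROM versions WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        conn.executemany(
            "INSERT INTO versions_new "
            "(id, item_id, created_at, item_type, event) VALUES (?, ?, ?, ?, ?)",
            rows,
        )
        conn.commit()
        last_id = rows[-1][0]

    # 3. Switch: keep the old table under a different name, promote the new one.
    conn.execute("ALTER TABLE versions RENAME TO versions_old")
    conn.execute("ALTER TABLE versions_new RENAME TO versions")
    conn.commit()
```

Keeping the old table around as `versions_old` leaves a way back until the new table has been verified.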

Change versions table layout for performance

00827b9

jaredbeck merged commit 67a1ec2 into paper-trail-gem:master May 28, 2024
7 checks passed

fatkodima deleted the optimize-table-layout branch May 28, 2024 17:11
