Change `versions` table layout for performance #1457
Merged
We are currently using paper_trail and have billions of items in the `versions` table, so the table is huge.

The one easy improvement I noticed is the `versions` table layout. Currently, its layout is not optimal and causes unnecessary padding (wasted space) inside each row. There are good articles on the topic, like one and two. Basically, we need to put the fixed-size fields first in the table, packed in a way that reduces padding.

With the currently implemented layout, if we consider that the user decides to use `bigint` for `whodunnit` (see #1456), then `whodunnit`, `item_id` and `created_at` must be positioned on 8-byte boundaries (because each of them is 8 bytes in size), and padding can be added after the fields that precede them to make this happen. This can be as much as 7 bytes of padding for each field.

For example, if we have 4 billion records in the database and each row has 21 bytes of wasted padding, then we can save `4 * 10^9 * 21 / 10^9` = 84 GB 🔥 (close to 100 GB) of space by just doing this simple table layout change.

Also, afaik, Postgres precalculates offsets for fixed-size columns (for the prefix of columns with fixed-size types) and can then jump directly to a specific column using those offsets when reading a row, instead of walking through the preceding variable-length columns to reach the needed one. So this will also speed up reading the `whodunnit`, `item_id` and `created_at` columns.

I believe this will improve the situation for MySQL too.
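To make the padding argument concrete, here is a minimal Ruby sketch that models alignment padding for two column orderings. It is a simplified model, not PostgreSQL's actual storage code: the two layouts, the helper name `padded_row_size`, and the byte sizes chosen for the variable-length columns are all assumptions made for illustration, and tuple headers and null bitmaps are ignored.

```ruby
# Simplified model of alignment padding inside a row (illustrative only).
# Each column is [name, size_in_bytes, alignment_in_bytes].
# The sizes of the variable-length columns are made-up example values.

# Hypothetical helper: walks the columns in order and adds the padding
# needed so that every column starts on a multiple of its alignment.
def padded_row_size(columns)
  offset = 0
  padding = 0
  columns.each do |_name, size, align|
    pad = (align - offset % align) % align # bytes to reach the next boundary
    padding += pad
    offset += pad + size
  end
  { size: offset, padding: padding }
end

# Assumed "interleaved" ordering: 8-byte columns follow variable-length ones.
interleaved = [
  [:id,         8, 8],
  [:item_type,  5, 1], # short string, assumed 5 bytes including header
  [:item_id,    8, 8],
  [:event,      7, 1], # short string, assumed 7 bytes including header
  [:whodunnit,  8, 8], # bigint, as proposed in #1456
  [:object,    50, 1], # assumed 50 bytes for this example
  [:created_at, 8, 8]
]

# Assumed "packed" ordering: all 8-byte columns first, then the rest.
packed = [
  [:id,         8, 8],
  [:whodunnit,  8, 8],
  [:created_at, 8, 8],
  [:item_id,    8, 8],
  [:item_type,  5, 1],
  [:event,      7, 1],
  [:object,    50, 1]
]

before = padded_row_size(interleaved)
after  = padded_row_size(packed)
saved_per_row = before[:padding] - after[:padding]

# Worst case from the description: 21 wasted bytes per row, 4 billion rows.
rows = 4_000_000_000
worst_case_gb = rows * 21 / 10**9

puts "padding before: #{before[:padding]} bytes, after: #{after[:padding]} bytes"
puts "saved per row in this example: #{saved_per_row} bytes"
puts "worst-case saving at #{rows} rows: #{worst_case_gb} GB"
```

The per-row saving depends on the real lengths of the variable-length values, so the numbers from this model only indicate the direction of the effect; measuring the actual table before and after is the reliable check.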