-
Notifications
You must be signed in to change notification settings - Fork 4
Migrations
Migrating tables under low_card_tables
works almost identically to without low_card_tables
. (Obviously, the table structure is different, but migrations themselves change almost not at all.)
Key Hint: Keep any VARCHAR
fields (or other such fields) in your low-card table limited to relatively short lengths — e.g., :limit => 20
or whatever. Many databases impose a restriction on the maximum width of an index key (in MySQL InnoDB, it's 767 bytes); if you simply say t.string :foo
in your migrations, this creates a VARCHAR(255)
, and it only takes three of these to exceed that limit. Generally speaking, this isn't much of an issue, since you're storing only strings with programmatic meaning in a low-card table, anyway.
There are a couple of key exceptions:
- Low-card tables require a unique index across all attribute columns (and in fact the
low_card_tables
gem tests for this index, and will blow up and refuse to run if it's not present), and so there's support for automatically creating and maintaining this index. - Removing columns from a low-card table is slightly more complex, exactly because of the requirement that there always be a unique index and thus exactly one row in the low-card table for each unique combination of attribute values. There's automatic support for this, too.
As mentioned previously, there must always be a unique index on all columns in a low-card table. Fortunately, low_card_tables
will create and maintain this index for you automatically — all it has to do is know that a table is a low-card table. It can do this in one of two ways:
-
Explicitly: if you add
:low_card => true
to the optionsHash
of migration statements likecreate_table
,add_column
,remove_column
, orchange_table
, it will know it's a low-card table. -
Implicitly: if there is a model defined for a given table, and it declares
is_low_card_table
, thenlow_card_tables
knows that it's a low-card table.
(Hint: it never hurts to be too explicit. ;)
Either way, when columns are changed on the table in question, low_card_tables
will first drop the unique index, then run the migration code, then recreate the unique index automatically. low_card_tables
always names the index something like index_user_statuses_lc_on_all
; it will truncate the embedded table name as necessary to maintain an index name that's not too long for your database system. However, the index can be called anything at all — the code simply verifies that there is a unique index over all columns in the low-card table, not that it is named anything in particular.
(See also Options — the unique index will not cover, and is not required to cover, columns like created_at
or updated_at
that are not conceptually attributes on the low-card table.)
If you need to perform several migration operations at once, or run any code at all, and want the unique index dropped and created only once, simply do the following in a migration:
change_low_card_table(:user_statuses) do
...any code at all...
end
The index will be dropped before executing the block and added after it, and automatic dropping/adding of the index disabled for the duration of the block.
Adding columns to a live system is always slightly tricky, but it is no more tricky using low_card_tables
than it would be without it:
- When you add a new column, currently-running production code will not know about it. Thus, you must always add new columns as nullable (no
:null => false
in your migration) or with a default (:default => ...
) — otherwise, any production code that tries to insert into the table will fail, since it won't know about the new column or what it should put there. - It is perfectly safe to add a nullable new column to a low-card table and then run an
UPDATE
statement (e.g.,UserStatus.update_all(...)
) to fill it in —low_card_tables
will pick up the new values the next time it flushes its cache. (See Caching for more information.) - If you need to add a new column whose values may be different for different referring rows — for example, imagine you decide to add a
key_customer
column that's computed based on that customer's history — then you simply have to do the same thing you would withoutlow_card_tables
. Add the column and either make it nullable or give it a default, and then bulk-update referring rows as necessary. See Bulk Operations for more details about how to update in bulk efficiently.
Removing columns from a low-card table is interesting exactly because of the unique-index requirement: when you drop a column, you will almost certainly end up with duplicate rows, because there were previously rows that differed only in the value present in the now-removed column. The right thing to do is to collapse these rows into one (pick a winner and delete the other rows)...and then go look at every column in every table that refers to the low-card table, find all values that point to 'loser' rows, and update them to point to the remaining 'winner' rows. Fortunately, this happens automatically for you — you can just use remove_column
and forget about it.
So why are we writing this?
Because sometimes updating referring tables is a big task, as there might be millions (or billions) of rows, and you might want some control over this task. As such, there's additional support in low_card_tables
for controlling how you want to handle this:
remove_column :foo, :low_card_collapse_rows => false
If you do this, low_card_tables
will not delete any rows or update any referring tables; you will be left with duplicate rows in your low-card table, and no unique index. (Note that this will prevent low_card_tables
from working until you fix it.) You're on your own to do this yourself, in whatever way you think is best; see API for methods that are useful, including low_card_collapse_rows_and_update_referrers!
on the low-card model class.
remove_column :foo, :low_card_update_referring_models => false
If you do this, low_card_tables
will delete duplicate rows, picking a winner, but will not update any referring models. This means you're on your own to update those models properly — or else, you'll have dangling foreign keys. This is very dangerous, because you'll have no way of knowing what the dangling foreign keys used to point to; use it only in exceptional circumstances. Usually, you're better off using :low_card_update_referring_models => false
, then collapsing the rows with an explicit call to low_card_collapse_rows_and_update_referrers!(:low_card_update_referring_models => false)
, which will return you a Hash
mapping winners to an Array
of losers; you can then update data properly.
remove_column :foo, :low_card_referrers => [ User, Admin ]
This actually simply forces low_card_tables
to look at User
and Admin
when it looks for referring models that it needs to update, in addition to the models it picks up automatically. This can be used in case there are models it doesn't automatically pick up (which, theoretically, should never happen).
All updates in question, including the remove_column
itself, are automatically wrapped in a transaction that runs across the low-card model and any referring models. However, most databases don't support transactional DDL; as a result, running code will 'see' an inconsistent state, where the column has been removed but rows haven't been updated, and this can be dangerous.
Fixing this is relatively straightforward:
- First, modify your low-card model to exclude the column you're going to remove:
is_low_card_table :exclude_column_names => [ :foo ]
. - Run an empty migration over that model —
change_low_card_table :user_statuses { }
. This will cause the migration code to remove the unique index and then re-create it without the column in question. - Deploy this code with the excluded column to production. Your entire system will now completely ignore the missing column.
- Use
remove_column
to remove the column.
There are different schemes for updating referring models. By default, the code will update the referring models in chunks of 10,000 rows at a time, but wrapped in a transaction so that it appears to all happen at once to running code. However, this might not be what you want. So, on a referring model:
class User < ActiveRecord::Base
has_low_card_table :status
low_card_value_collapsing_update_scheme lambda { |map| ... }
end
Now, instead of any automatic updating, the lambda
specified will get called with a Hash
mapping model objects that are remaining in the table to the corresponding model objects that have been deleted; you can use this to update rows however you want.
class User < ActiveRecord::Base
has_low_card_table :status
low_card_value_collapsing_update_scheme :none
end
You're on your own — no automatic updating will happen at all.
class User < ActiveRecord::Base
has_low_card_table :status
low_card_value_collapsing_update_scheme 500
end
low_card_tables
will update this table in chunks of 500 rows, instead of 10,000.