
One of the principles of low_card_tables is that everything should be done in bulk; a one-by-one operation is simply a bulk operation that happens to touch a single item. Consequently:

  • Reading the low-card table is done in O(1) SQL statements.
  • Creating new low-card rows is done in O(1) SQL statements.
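
Both bullets show up in ordinary usage. Here's a minimal sketch, assuming models declared as in the gem's README (`is_low_card_table` / `has_low_card_table`); the attribute and column names are illustrative:

```ruby
class UserStatus < ActiveRecord::Base
  is_low_card_table   # declares this as the (small) low-card table
end

class User < ActiveRecord::Base
  has_low_card_table :status   # delegates low-card attributes via user_status_id
end

# Reading low-card attributes across any number of users costs a constant
# number of SQL statements: the gem caches the (small) low-card table and
# resolves each user's user_status_id against that cache.
User.limit(10_000).each { |user| user.deleted }

# Creating works the same way: saving a record with a never-before-seen
# combination of low-card attributes adds the needed low-card row in a
# constant number of statements.
user = User.first
user.donation_level = 5
user.save!
```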

This also means that low_card_tables provides the tools you need to process large numbers of rows in a constant number of database queries. You'll want to see the API for more details, but, as an example...

Say we have a database of 25,000,000 users with a brand-new low-card table, and we want to update those users to point to the right low-card rows. Here's what I would do (a code sketch follows the list):

  1. Use ActiveRecord::Batches, and in particular #find_in_batches, to find chunks of, say, 10,000 users at a time.
  2. Iterate through this chunk, extracting each record's low-card attribute values — for example, { :deleted => false, :deceased => false, :gender => 'female', :donation_level => 5 }.
  3. Build a Hash that maps each of these attribute Hashes to an Array of the IDs of all the User records that have that particular set of attributes.
  4. At the end of the chunk, call UserStatus.low_card_find_or_create_ids_for, and pass in the array of unique Hashes.
  5. The return value will be a Hash that maps each of those Hashes to the correct low-card ID for that set of attributes.
  6. Now, using the Hash you built above, fire off one update statement per unique set of attributes, setting the low-card ID where the user ID is IN (...). (Alternatively, activerecord-import and its on_duplicate_key_update support can actually bulk-update all of them in one fell swoop.)
  7. Repeat for the next chunk of users.
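
Putting those steps together, here is a sketch in Ruby. The `User` and `UserStatus` models, the `user_status_id` foreign key, and the specific attribute columns are illustrative assumptions (the attributes are assumed to still live on the users table, since the low-card table is brand new); `low_card_find_or_create_ids_for` and its return value are as described in steps 4 and 5:

```ruby
User.find_in_batches(:batch_size => 10_000) do |batch|
  # Step 3: map each unique attribute Hash to the IDs of the users that have it.
  ids_by_attributes = Hash.new { |h, k| h[k] = [] }

  batch.each do |user|
    # Step 2: extract the low-card attribute values from each record.
    attributes = {
      :deleted        => user.deleted,
      :deceased       => user.deceased,
      :gender         => user.gender,
      :donation_level => user.donation_level
    }
    ids_by_attributes[attributes] << user.id
  end

  # Steps 4-5: resolve every unique attribute Hash to its low-card row ID,
  # creating any rows that don't exist yet, in O(1) queries.
  low_card_ids = UserStatus.low_card_find_or_create_ids_for(ids_by_attributes.keys)

  # Step 6: one UPDATE per unique attribute combination, not per user.
  ids_by_attributes.each do |attributes, user_ids|
    User.where(:id => user_ids).update_all(:user_status_id => low_card_ids[attributes])
  end
end
```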

In general, any operation you could want to do with low_card_tables, you can do in bulk, and it is implemented efficiently that way.
