Replies: 2 comments 2 replies
-
I'm not sure what you mean by "all the different collections": the base Modules only use 'params', 'batch_stats' (BatchNorm), and 'cache' (for decoding in autoregressive attention). The idea here is precisely *not* to put everything that is not a parameter into one big collection. Consider, for example, syncing the batch statistics across devices for a certain model. If all of its internal state were in 'state', how would I make sure that I sync only the batch statistics?
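For illustration, here is a minimal sketch (Flax Linen; the `Model` module and shapes are made up) of how a separate 'batch_stats' collection lets a cross-device sync target exactly those arrays:

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class Model(nn.Module):
    @nn.compact
    def __call__(self, x, train: bool = True):
        x = nn.Dense(16)(x)
        x = nn.BatchNorm(use_running_average=not train)(x)
        return x

variables = Model().init(jax.random.PRNGKey(0), jnp.ones((4, 8)))
# `variables` is keyed by collection: {'params': ..., 'batch_stats': ...}

# Sync *only* the batch statistics across devices; 'params' (and any
# other collection) is left untouched.
replicated = jax.device_put_replicated(variables['batch_stats'],
                                       jax.local_devices())
synced = jax.pmap(lambda bs: jax.lax.pmean(bs, axis_name='batch'),
                  axis_name='batch')(replicated)
```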
-
I see, thanks for the info! In general, I am wondering what the current recommendation is for the following scenario: say you train Module A and serialize its variables to disk. Later, you create a Module B that uses Module A and want to inject the pre-trained parameters of A. Similarly, what if you want to do the opposite: you train B and then want to extract just A for inference (e.g. A is a decoder and you want to generate samples)?
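A minimal sketch of one way to express both directions of this scenario with Flax Linen (an illustration, not necessarily the recommended approach being asked about; the modules `A`/`B` and the fixed submodule name 'a' are made up). Since params are nested dicts keyed by submodule name, both injection and extraction reduce to dict manipulation:

```python
import jax
import jax.numpy as jnp
import flax.linen as nn
from flax import serialization
from flax.core import freeze, unfreeze

class A(nn.Module):                  # the pre-trained piece
    @nn.compact
    def __call__(self, x):
        return nn.Dense(8)(x)

class B(nn.Module):                  # wraps A plus a new head
    @nn.compact
    def __call__(self, x):
        x = A(name='a')(x)           # fixed name => stable param path
        return nn.Dense(2)(x)

# 1) Train A, then serialize its variables to disk.
a_vars = A().init(jax.random.PRNGKey(0), jnp.ones((1, 4)))
raw = serialization.to_bytes(a_vars)

# 2) Init B, then graft A's pre-trained params under B's 'a' subtree.
b_vars = B().init(jax.random.PRNGKey(1), jnp.ones((1, 4)))
a_restored = serialization.from_bytes(a_vars, raw)
b_params = unfreeze(b_vars)['params']
b_params['a'] = unfreeze(a_restored)['params']
b_vars = freeze({'params': b_params})

# 3) The opposite direction: pull just A's params back out for inference.
a_only = freeze({'params': b_vars['params']['a']})
y = A().apply(a_only, jnp.ones((1, 4)))
```

Because the param tree mirrors the module tree, giving the shared submodule a fixed `name` keeps its path stable across architectures, which is what makes this kind of surgery mechanical.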
-
Hey,
I was wondering what the purpose of variable collections is. I don't know what the current state of transfer learning in Flax is, but since it requires "parameter surgery", my guess is that having all these different collections makes getting previous parameters into their correct positions in the new architecture increasingly more challenging.
Proposal
Remove / hide the `Module.variable(col)` parameter and just have 2 canonical collections:
- `params`: what we currently have for the trainable parameters
- `states`: name for the non-trainable parameters.
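For context, a minimal sketch of what `Module.variable(col)` allows today in Flax Linen (the `Counter` module is illustrative): it declares a variable in an arbitrary, user-named collection, which under this proposal would be folded into a single 'states' collection:

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class Counter(nn.Module):
    @nn.compact
    def __call__(self):
        # 'counter' is an arbitrary collection name chosen by the user;
        # the proposal would fold such collections into one 'states'.
        count = self.variable('counter', 'count',
                              lambda: jnp.zeros((), jnp.int32))
        count.value += 1
        return count.value

variables = Counter().init(jax.random.PRNGKey(0))
# {'counter': {'count': Array(1, dtype=int32)}}
out, variables = Counter().apply(variables, mutable=['counter'])
```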