Replies: 1 comment 5 replies
-
Hey, we currently don't have a built-in way to do that, but you can rebuild the model with abstract shapes and then restore its parameters:

```python
import orbax.checkpoint as orbax
from flax import nnx

# MLP and create_model are defined in the example file referenced below.

def load_model(path: str) -> MLP:
    # create the model with abstract shapes (no parameters allocated)
    model = nnx.eval_shape(lambda: create_model(0))
    state = nnx.state(model)
    # load the parameters
    checkpointer = orbax.PyTreeCheckpointer()
    state = checkpointer.restore(f'{path}/state', item=state)
    # update the model with the loaded state
    nnx.update(model, state)
    return model
```

This is taken from 08_save_load_checkpoints.py.
-
Hi,
I want to be able to serialize multiple different nnx models to disk (not just the weights but also the full layer structure). This is helpful when trying out a bunch of different architectures that I trained beforehand and just want to test in eval/inference mode.
Currently, I am using Orbax to save the model train state, but this requires that the train-state structure is created before loading the checkpoint. I am doing something like this:
What I would like to do is something like this:
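Presumably something along these lines, where a single call persists both structure and weights; note that `save_model` and `load_model` here are hypothetical names, since no such API exists in nnx:

```python
# Hypothetical API, not part of flax/nnx: one call saves structure + weights,
# and loading does not require re-running create_model first.
nnx.save_model(model, '/tmp/my_model')
model = nnx.load_model('/tmp/my_model')
```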
This may be possible by pickling the create_model function (though that could fail because of lambda functions inside create_model), but I guess this is not the idiomatic way.
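The worry about lambdas is well founded: pickle serializes functions by qualified name, and a lambda's name is `<lambda>`, which cannot be looked up again on unpickling. A stdlib-only illustration, independent of nnx:

```python
import pickle

# Pickling plain data (what a model factory might return) is fine...
model_repr = {"layers": [8, 8, 2]}
blob = pickle.dumps(model_repr)
restored = pickle.loads(blob)

# ...but pickling a lambda fails: pickle stores functions by reference
# (module + qualified name), and '<lambda>' cannot be resolved.
make = lambda: {"layers": [8, 8, 2]}
try:
    pickle.dumps(make)
    lambda_picklable = True
except Exception:  # pickle.PicklingError on CPython
    lambda_picklable = False
```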
In torch you can do:
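The PyTorch idiom being referred to is presumably full-module serialization with torch.save/torch.load, which pickles the architecture together with the weights (on recent PyTorch versions, torch.load needs weights_only=False to unpickle arbitrary objects):

```python
import os
import tempfile

import torch
import torch.nn as nn

# torch.save pickles the entire module: layer structure and weights together.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "model.pt")
    torch.save(model, path)
    # weights_only=False lets torch.load unpickle the full module object
    restored = torch.load(path, weights_only=False)

restored.eval()  # ready for inference without rebuilding the architecture
```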
I basically would like to use the same kind of API as in torch, but with flax/nnx.