-
I'm looking to implement InstanceNorm, and I was looking at the LayerNorm implementation (https://flax.readthedocs.io/en/latest/_modules/flax/linen/normalization.html#LayerNorm). In CNNs, Flax accepts input in NHWC format (from inspecting the MNIST example). LayerNorm is defined in the paper (https://arxiv.org/abs/1607.06450) as computing mean and variance over all channels (C) and spatial dimensions (HW). But the implementation computes mean and variance over only the last axis:

```python
x = jnp.asarray(x, jnp.float32)
features = x.shape[-1]
mean = jnp.mean(x, axis=-1, keepdims=True)
mean2 = jnp.mean(lax.square(x), axis=-1, keepdims=True)
var = mean2 - lax.square(mean)
```

Why don't we compute it like this instead?

```python
mean = jnp.mean(x, axis=(-1, -2, -3), keepdims=True)  # axes -1 through -(len(x.shape) - 1), i.e. everything except the batch axis
```

Thanks!
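For context, here is a small self-contained snippet showing how the two reductions differ on an NHWC batch; the array shape and variable names are illustrative assumptions, not Flax code:

```python
import jax.numpy as jnp
from jax import random

x = random.normal(random.PRNGKey(0), (2, 4, 4, 3))  # NHWC: batch, height, width, channels

# Current LayerNorm behaviour: one mean per (N, H, W) position, reduced over C only.
mean_c = jnp.mean(x, axis=-1, keepdims=True)              # shape (2, 4, 4, 1)

# Paper-style reduction over C, H, and W: one mean per example.
mean_chw = jnp.mean(x, axis=(-1, -2, -3), keepdims=True)  # shape (2, 1, 1, 1)

print(mean_c.shape, mean_chw.shape)
```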
-
I think we should be a little careful here to avoid reducing over axes that actually aren't spatial dims. To me it would make most sense if `axis` is a constructor argument with default value `-1`. Feel free to file a PR or issue for this.
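As a rough sketch of that suggestion, a norm layer could take the reduction axes as a constructor field defaulting to the last axis. The module name `AxisNorm`, its field names, and the omission of learnable scale/bias are all assumptions for illustration, not the actual Flax implementation:

```python
import flax.linen as nn
import jax.numpy as jnp
from jax import lax

class AxisNorm(nn.Module):
    # Hypothetical sketch: reduction axes as a constructor argument,
    # defaulting to the last axis (the current LayerNorm behaviour).
    axis: tuple = (-1,)
    epsilon: float = 1e-6

    @nn.compact
    def __call__(self, x):
        x = jnp.asarray(x, jnp.float32)
        mean = jnp.mean(x, axis=self.axis, keepdims=True)
        mean2 = jnp.mean(lax.square(x), axis=self.axis, keepdims=True)
        var = mean2 - lax.square(mean)  # E[x^2] - E[x]^2
        return (x - mean) * lax.rsqrt(var + self.epsilon)
```

With this, `AxisNorm()` would keep today's per-position behaviour, while `AxisNorm(axis=(-1, -2, -3))` would give the paper-style reduction over C, H, and W for NHWC inputs.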