
The loss function in WGAN GP seems a little bit confusing #8

Open

Unispac opened this issue Jun 4, 2019 · 12 comments

Unispac commented Jun 4, 2019

I noticed that your loss in WGAN-GP is different from the one in the original paper. From the theory of WGAN, the loss should use D directly and does not need a log to wrap it, but I found that you have used a log. I tested the code: your version works very well, but when I replaced the loss with the version given in the paper, it behaved badly.
I don't know the reason. I think using D without the log is more correct theoretically. If we use a log, won't it just be the same as the original GAN?

@JasonYao81000 (Owner)

Could you show how you replaced tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits()) without logits?
The loss of the original WGAN should not include a sigmoid or softmax function; the discriminator output must be linear, so the loss can simply be the mean over a batch of outputs.
You can try replacing our loss with something like self.d_loss = -tf.reduce_mean(D_real) + tf.reduce_mean(D_fake) + tf.reduce_mean(D_wrong_img) + tf.reduce_mean(D_wrong_label), and self.g_loss can be replaced similarly.
There may not be an exact answer to your question; have fun trying your own experiment.
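The suggested replacement can be checked with a small NumPy sketch. The arrays below are toy critic scores standing in for linear, unbounded critic outputs (the names D_real, D_fake, D_wrong_img, and D_wrong_label follow the comment above; the values are made up for illustration):

```python
import numpy as np

# Toy linear critic scores for a batch of 4 samples (no sigmoid applied).
D_real = np.array([1.0, 2.0, 1.5, 0.5])        # real (image, label) pairs
D_fake = np.array([-1.0, 0.0, -0.5, -0.5])     # generated images
D_wrong_img = np.array([-1.0, -1.0, -1.0, -1.0])   # real label, mismatched image
D_wrong_label = np.array([0.0, -2.0, -1.0, -1.0])  # real image, mismatched label

# Conditional Wasserstein critic loss: push real scores up, all "wrong" scores down.
d_loss = (-np.mean(D_real) + np.mean(D_fake)
          + np.mean(D_wrong_img) + np.mean(D_wrong_label))

# Generator loss: raise the critic's score on fake samples.
g_loss = -np.mean(D_fake)

print(d_loss, g_loss)  # prints: -3.75 0.5
```

Note that there is no sigmoid or log anywhere: the critic scores are used directly, which is the point of the suggestion.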


Unispac commented Jun 5, 2019

Thank you for your reply!

"""
beta = tf.random_uniform(imageRotated.get_shape(), minval=0., maxval=1.)
differences = G - imageRotated
interpolates = imageRotated + beta*differences
D_inter = self.discriminator(interpolates, isTraining=True, reuse=True)
gradients = tf.gradients(D_inter, [interpolates])[0]
slopes = tf.sqrt(tf.reduce_sum(tf.square(gradients), reduction_indices=[1]))  # gradient penalty

dLossReal = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=dReal, labels=tf.ones_like(dReal)))
dLossFake = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=dFake, labels=tf.zeros_like(dFake)))
self.dLoss = dLossReal + dLossFake + self.theta*GP  # loss of discriminator (with GP)

self.gLoss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=dFake, labels=tf.ones_like(dFake)))  # loss of generator
"""

Above is the loss function you use in WGAN-GP, right?

In the original GAN, we use a sigmoid to constrain the output of D, and we define the loss with log(D).
So the loss of D looks like this: -E(log(D_real)) - E(log(1-D_fake))

But in WGAN, there should not be a log wrapping D, because D is used to fit a function f that estimates the Wasserstein distance. So the loss should use D directly instead of log(D).
And we use a clipping strategy to constrain the parameters of D, so that the Lipschitz condition is met.
The loss of D is then: -E(D_real) + E(D_fake).

And in WGAN-GP, we don't use the clipping strategy; we use a gradient penalty (GP) to meet the Lipschitz condition.
Then the loss of D is: -E(D_real) + E(D_fake) + lambda*GP

So I think the correct loss when implementing WGAN-GP should be:
"""
dLossReal = -tf.reduce_mean(dReal)
dLossFake = tf.reduce_mean(dFake)
self.dLoss = dLossFake + dLossReal + self.theta*GP #loss of discriminator. (with GP)
self.gLoss = -tf.reduce_mean(dFake) #loss of generator.
"""
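This proposed loss can be exercised end to end on a toy example. The sketch below (all names and values are illustrative) uses a linear critic D(x) = x @ w, so its gradient with respect to the input is w everywhere and the gradient penalty can be computed in closed form instead of via autodiff:

```python
import numpy as np

rng = np.random.default_rng(1)
batch, dim = 4, 8
theta = 10.0  # the lambda from the WGAN-GP paper

# Toy linear critic: D(x) = x @ w. Its gradient w.r.t. x is w for every input,
# so the per-sample gradient norm is ||w|| and the penalty is (||w|| - 1)^2.
w = rng.normal(size=dim)
real = rng.normal(size=(batch, dim))
fake = rng.normal(size=(batch, dim))

# Interpolate between real and fake samples: one beta per sample, broadcast
# across features (this mirrors the [batchSize, 1, 1, 1] shape discussed below).
beta = rng.uniform(size=(batch, 1))
interpolates = real + beta * (fake - real)

dReal = real @ w
dFake = fake @ w
grad_norms = np.full(batch, np.linalg.norm(w))  # analytic gradient norm at each interpolate
GP = np.mean((grad_norms - 1.0) ** 2)

dLoss = -np.mean(dReal) + np.mean(dFake) + theta * GP  # critic loss with penalty
gLoss = -np.mean(dFake)                                # generator loss
print(dLoss, gLoss)
```

With a real network the gradient would of course come from tf.gradients (or an equivalent autodiff call) rather than a closed form; the toy critic just makes the penalty term easy to verify by hand.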

But when I replace your loss with the version above, it fails!
I have spent a day looking for the reason, but I just can't find it...

Firstly, I think that if we use the loss function you provided, we are just training an original GAN rather than WGAN-GP.
I don't get why you wrap the output of D in a log. We are training D to estimate the Wasserstein distance.
But when there is a log, will it still be the Wasserstein distance? I am a little confused.

Also, I found that you are using lambda = 0.25, but usually people use lambda = 10.
So I think your GP may not actually enforce the Lipschitz condition;
it may just act as a regularization term.
After realizing this, I modified lambda to 10, but that also failed; the result is bad.

Also, I have compared your code with other implementations, and I find some differences in how the GP is calculated.

In your code:
beta = tf.random_uniform(imageRotated.get_shape(), minval=0., maxval=1.)
the shape of beta is 64x64x64x3.

But other implementations usually use:
beta = tf.random_uniform(shape=[self.batchSize,1,1,1], minval=0., maxval=1.)
where the shape is 64x1x1x1.

Also, in your code:
slopes = tf.sqrt(tf.reduce_sum(tf.square(gradients), reduction_indices=[1]))  # gradient penalty

But the most common version is:
slopes = tf.sqrt(tf.reduce_sum(tf.square(gradients), reduction_indices=[1,2,3]))  # gradient penalty

These points also make me think that your GP doesn't do much.
It seems that your code is just a GAN with a few changes.
I am not sure; I am still trying to make it right,
but the loss without the log fails again and again.
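The shape issue can be verified with a quick NumPy sketch. A toy batch of gradients stands in for the tf.gradients output (smaller spatial dimensions than 64x64 for speed, but the same batch-first layout):

```python
import numpy as np

batch, H, W, C = 64, 8, 8, 3  # batch of "images": gradients have the same shape
gradients = np.random.default_rng(2).normal(size=(batch, H, W, C))

# Summing over axes [1, 2, 3] gives one gradient norm per sample,
# which is what the WGAN-GP penalty needs.
slopes_per_sample = np.sqrt(np.sum(np.square(gradients), axis=(1, 2, 3)))
print(slopes_per_sample.shape)  # (64,)

# Summing over axis [1] alone leaves the W and C axes unreduced,
# so the result is not a per-sample gradient norm at all.
slopes_axis1 = np.sqrt(np.sum(np.square(gradients), axis=1))
print(slopes_axis1.shape)  # (64, 8, 3)
```

The penalty tf.reduce_mean((slopes - 1)**2) still runs on the (64, 8, 3) tensor because reduce_mean averages everything, which is why the bug is silent, but it penalizes per-row norms instead of the per-sample gradient norm.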


JasonYao81000 commented Jun 5, 2019

Here is the WGAN-GP we referenced.
Ours is WGAN-GP with a condition label, while the original WGAN-GP is a pure GAN.
You are right; you could say that we just implemented a conditional GAN with a gradient penalty.


Unispac commented Jun 5, 2019

Thank you! I will go to check it again.


huangjicun commented Jun 5, 2019

dLossReal = -tf.reduce_mean(dReal)
dLossFake = tf.reduce_mean(dFake)
self.dLoss = dLossFake + dLossReal + self.theta*GP #loss of discriminator. (with GP)
self.gLoss = -tf.reduce_mean(dFake) #loss of generator.

Because we use WGAN-GP with a condition label, our loss has to contain the condition part.
Therefore, our loss is calculated using tf.nn.sigmoid_cross_entropy_with_logits.


Unispac commented Jun 5, 2019


Do you mean that because you have to include the condition loss, you have to use a log?
But maybe that breaks the principles of WGAN?
I presume that because the lambda you use for GP is merely 0.25, it behaves just like a normal GAN, and that is why you get a good result.

I am testing hw3-1, where there is no label, so no condition term is needed.
The sigmoid_cross_entropy loss works well, but when I replace it with the WGAN version, I can't get a good result. I still don't know why.


Unispac commented Jun 5, 2019

And another important question: is the calculation of GP right? I really think it is wrong.
slopes = tf.sqrt(tf.reduce_sum(tf.square(gradients), reduction_indices=[1]))  # gradient penalty
The gradient of a scalar with respect to a 64x64x3 image has shape 64x64x3, and for a batch it is 64x64x64x3, so summing over axis 1 alone seems wrong.
I find that many other implementations use:
slopes = tf.sqrt(tf.reduce_sum(tf.square(gradients), reduction_indices=[1,2,3]))  # gradient penalty


Unispac commented Jun 6, 2019

Update:
I have found the bug.
Since there are norm layers in G, its optimizer must declare the dependency on their update ops.
To make WGAN-GP work, the calculation of GP must also be modified to the version I offered above, and lambda should be 10.

@JasonYao81000 (Owner)

Cool, thanks for your experiments.
Besides WGAN-GP, you should try spectral normalization for the discriminator; it works like magic.


Unispac commented Jun 6, 2019


Really? In the discriminator? Maybe you mean the generator? Can we add norm layers in the discriminator for WGAN?

@JasonYao81000 (Owner)

You can take a look at the conclusions of this paper by Google.
They said: "Our fair and thorough empirical evaluation suggests that when the computational budget is limited one should consider non-saturating GAN loss and spectral normalization as default choices when applying GANs to a new dataset."
Spectral normalization is also implemented in PyTorch now.
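Spectral normalization divides each weight matrix by its largest singular value, typically estimated with a few steps of power iteration. A minimal NumPy sketch of the idea (function name, step count, and sizes are illustrative, not the PyTorch implementation):

```python
import numpy as np

def spectral_norm(W, n_iters=50, rng=None):
    """Estimate the largest singular value of W by power iteration."""
    rng = rng if rng is not None else np.random.default_rng(0)
    u = rng.normal(size=W.shape[0])
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    return u @ W @ v  # sigma ~ u^T W v

rng = np.random.default_rng(3)
W = rng.normal(size=(16, 8))
sigma = spectral_norm(W, rng=rng)

# Dividing the weight by sigma bounds the layer's Lipschitz constant near 1,
# serving the same goal as weight clipping or the gradient penalty.
W_sn = W / sigma
```

In practice (e.g. torch.nn.utils.spectral_norm) only one power-iteration step is run per training step, with u persisted across steps, so the overhead is negligible.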


Unispac commented Jun 6, 2019


Oh, I get it. Thanks!!!
