03_pytorch_computer_vision.ipynb CrossEntropyLoss() #1031
Unanswered
rithvikshetty asked this question in Q&A
Replies: 1 comment 6 replies
-
Model used in the video for this was:
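(A sketch of the model the question presumably refers to, reconstructed from the question itself rather than copied from the video: note the ReLU applied after the final Linear layer.)

import torch
from torch import nn

class FashionMNISTModelV1(nn.Module):
    def __init__(self, input_shape: int, hidden_units: int, output_shape: int):
        super().__init__()
        self.layer_stack = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_features=input_shape, out_features=hidden_units),
            nn.ReLU(),
            nn.Linear(in_features=hidden_units, out_features=output_shape),
            nn.ReLU(),  # activation on the output layer, the one the question asks about (reconstructed, not verbatim)
        )

    def forward(self, x: torch.Tensor):
        return self.layer_stack(x)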
But CrossEntropyLoss requires the input to be logits and not the activated output, right?
So, shouldn't we remove the ReLU from the last output layer?
-
Yes, the activation function should be kept only between layers, not after the output layer; that is the case for every model architecture:

import torch
from torch import nn

class FashionMNISTModelV1(nn.Module):
    def __init__(self, input_shape: int, hidden_units: int, output_shape: int):
        super().__init__()
        self.layer_stack = nn.Sequential(
            nn.Flatten(),  # flatten inputs into a single vector
            nn.Linear(in_features=input_shape, out_features=hidden_units),
            nn.ReLU(),  # non-linearity between the layers only
            nn.Linear(in_features=hidden_units, out_features=output_shape),  # last layer outputs raw logits
        )

    def forward(self, x: torch.Tensor):
        return self.layer_stack(x)
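For context on the logits point: nn.CrossEntropyLoss applies log-softmax to its input internally, so the final layer should output raw logits with no activation on top. A minimal sketch (the batch X, the targets y, and the hyperparameter values are made up for illustration):

import torch
from torch import nn

# assumes the corrected FashionMNISTModelV1 class defined above
model = FashionMNISTModelV1(input_shape=28 * 28, hidden_units=10, output_shape=10)
loss_fn = nn.CrossEntropyLoss()  # expects raw logits; applies log-softmax internally

X = torch.randn(32, 1, 28, 28)   # dummy batch of 32 single-channel 28x28 images
y = torch.randint(0, 10, (32,))  # dummy class-index targets

logits = model(X)         # shape [32, 10]; no softmax or ReLU applied to the output
loss = loss_fn(logits, y)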