How to finetune hyper parameters to beat the LSTM performance? #121

Mahesha999 opened this issue Oct 7, 2024 · 1 comment
@Mahesha999

I am trying to train an ML model on time series data. The input is 10 time series of sensor data, and the output is another set of three time series. I feed the model a window of 100 time steps, so the input shape is (100, 10). I want to predict the output values for a single time step, so the output shape is (1, 3). (With mini-batches of size x, the input and output shapes become (x, 100, 10) and (x, 1, 3).)
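
The windowing described above can be sketched as follows. This is only an illustration on random data: the helper name `make_windows`, the series length of 500, and the choice to take the target at the step immediately after each window are assumptions, since the post does not say how the target step is aligned with the window.

```python
import numpy as np

def make_windows(inputs, targets, window=100):
    """Slice aligned series into windowed inputs and single-step targets.

    Assumption: the target for a window is the output at the step
    immediately following that window.
    """
    n = len(inputs) - window  # windows that still have a following step
    X = np.stack([inputs[i:i + window] for i in range(n)])
    y = np.stack([targets[i + window:i + window + 1] for i in range(n)])
    return X, y

rng = np.random.default_rng(0)
sensors = rng.normal(size=(500, 10))  # hypothetical: 500 steps, 10 sensor channels
outputs = rng.normal(size=(500, 3))   # hypothetical: 3 output channels

X, y = make_windows(sensors, outputs)
print(X.shape, y.shape)  # (400, 100, 10) (400, 1, 3)
```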

My approach is to first overfit the model on a small number of records to check that it is actually learning and able to overfit the data. Then I add some regularization (mostly dropout) and train the model on the full dataset.

First, I tried to overfit an LSTM model on a small dataset and visualised the outcome. It did well, so I trained it on the whole dataset. It did okay, but still struggled in some places. The LSTM model I tried is as follows:

import torch.nn as nn


class LSTMModel(nn.Module):
    """Four stacked LSTM blocks followed by a linear head on the last time step."""

    def __init__(self, in_dim=10, hidden_size=1400, num_layers=1, output_size=3):
        super().__init__()

        self.lstm_1 = nn.LSTM(in_dim, hidden_size, num_layers, batch_first=True)
        self.lstm_2 = nn.LSTM(hidden_size, hidden_size, num_layers, batch_first=True)
        self.lstm_3 = nn.LSTM(hidden_size, hidden_size, num_layers, batch_first=True)
        self.lstm_4 = nn.LSTM(hidden_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # x: (batch, window, in_dim); each LSTM returns (sequence output, final states)
        x, _ = self.lstm_1(x)
        x, _ = self.lstm_2(x)
        x, _ = self.lstm_3(x)
        x, _ = self.lstm_4(x)
        # Use only the last time step, so the output is (batch, output_size)
        output = self.fc(x[:, -1, :])
        return output
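
One detail that may be worth checking with this model: `fc(x[:, -1, :])` produces a tensor of shape (x, 3), while the targets are described as (x, 1, 3). The sketch below assumes the targets really are stored with that extra axis and that the loss is computed elementwise on prediction minus target; neither is confirmed by the post. Under those assumptions, broadcasting silently compares every prediction with every target in the batch.

```python
import torch

batch = 4
pred = torch.zeros(batch, 3)      # shape produced by fc(x[:, -1, :])
target = torch.ones(batch, 1, 3)  # assumed target layout (x, 1, 3)

# (4, 3) against (4, 1, 3) broadcasts to (4, 4, 3): each prediction
# is paired with every target in the batch, not only its own.
diff = pred - target
print(diff.shape)  # torch.Size([4, 4, 3])

# Dropping the singleton time axis aligns the two tensors one to one.
aligned = pred - target.squeeze(1)
print(aligned.shape)  # torch.Size([4, 3])
```

If the shapes already match in the actual training loop, this check is harmless and can be ignored.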

I tried adding dropout too, but it did not yield any significant improvement. So I tried training a PatchTST transformer model. First, I overfit a smaller model, and it did well. In fact, when I visualised the output, I realised that it achieved a tighter overfit than the LSTM model. So I trained it on the whole dataset, but its performance was nowhere near that of the LSTM.

The initial version of PatchTST I tried is as follows:

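from transformers import PatchTSTConfig, PatchTSTForRegression

config = PatchTSTConfig(
    num_input_channels=10,   # 10 sensor time series
    context_length=100,      # input window length
    num_targets=3,           # 3 output time series
    patch_length=10,
    patch_stride=5,
    prediction_length=1,     # single time step
    num_hidden_layers=5,
    num_attention_heads=3,
    d_model=300,
)

model = PatchTSTForRegression(config)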
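
To get a feel for how `patch_length` and `patch_stride` shape the sequence the transformer attends over, the number of patches per channel can be estimated with the usual sliding-window count. This is a sketch assuming that formula applies; the helper `num_patches` is hypothetical and the library's internal handling (for example padding) may differ.

```python
def num_patches(context_length, patch_length, patch_stride):
    """Sliding-window count of patches cut from one context window."""
    usable = max(context_length, patch_length) - patch_length
    return usable // patch_stride + 1

# Base config: window of 100, patches of 10, stride of 5
print(num_patches(100, 10, 5))  # 19

# A stride of 7, for comparison
print(num_patches(100, 10, 7))  # 13
```

A larger stride gives fewer, less overlapping patches, which shortens the attention sequence and reduces memory use.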

With this as the base config, I tried different changes for hyperparameter optimization:

  1. d_model = 600
  2. d_model = 800
  3. d_model = 600, num_hidden_layers = 7
  4. d_model = 600, patch_stride = 7
  5. d_model = 300, patch_stride = 7, num_hidden_layers = 8

And some more combinations. These hyperparameter combinations were selected so that the model fits on a GPU with 24GB of memory. However, no configuration yielded a validation loss comparable to the LSTM's. These are the LSTM vs PatchTST curves:

enter image description here

The corresponding learning rate curves are as follows:
enter image description here

I step down the learning rate if the performance does not improve for 7 epochs.
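
The step-down rule can be sketched in plain Python as below. The class `PlateauStepper` is hypothetical and is not the code used for the runs in this post; the factor of 0.1 and the floor of 5e-8 are inferred from the LR values listed in PS1, and the exact patience semantics (step on the 7th epoch without improvement) is an assumption. PyTorch ships `torch.optim.lr_scheduler.ReduceLROnPlateau` for the same purpose, with slightly different patience counting.

```python
class PlateauStepper:
    """Multiply the LR by `factor` after `patience` epochs without improvement."""

    def __init__(self, lr=5e-5, factor=0.1, patience=7, min_lr=5e-8):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:
            # Improvement: remember it and reset the counter
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                # Plateau: step down, but never below the floor
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0
        return self.lr

stepper = PlateauStepper()
history = [stepper.step(loss) for loss in [1.0] * 8]  # one best epoch, then 7 flat ones
print(history[0], history[-1])
```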

What am I missing here? Is there some insight specific to time series transformers that I have overlooked?

PS1: Yes, the base LR starts at 0.00005 and steps down to 0.000005, 0.0000005, and 0.00000005. I know these are excessively tiny. In the beginning I tried to train the LSTM with larger base LRs such as 0.001, 0.005, and 0.0005, but it did not work at all. It only started working once I began at 0.00005, maybe because my sensor values themselves are very tiny.
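
If very small sensor magnitudes are indeed the reason only tiny learning rates worked, standardizing each channel before training is one thing that could be tried. The sketch below uses random data with an assumed scale of 1e-4; the helper names `fit_scaler` and `apply_scaler` are hypothetical, and the statistics are fitted on the training split only so that no validation information leaks into them.

```python
import numpy as np

def fit_scaler(train):
    """Per-channel mean and standard deviation from the training split."""
    mean = train.mean(axis=0)
    std = train.std(axis=0) + 1e-12  # guard against constant channels
    return mean, std

def apply_scaler(data, mean, std):
    """Standardize data with statistics fitted elsewhere."""
    return (data - mean) / std

rng = np.random.default_rng(0)
train = rng.normal(scale=1e-4, size=(1000, 10))  # hypothetical tiny sensor values
val = rng.normal(scale=1e-4, size=(200, 10))

mean, std = fit_scaler(train)
train_scaled = apply_scaler(train, mean, std)
val_scaled = apply_scaler(val, mean, std)  # reuse the training statistics
```

The same treatment would apply to the three target series, with predictions mapped back to the original units by multiplying by the target `std` and adding the target `mean`.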

PS2: It might seem that the LSTM validation loss has already reached nearly 0, but that's only because the plot also contains PatchTST runs with higher validation loss. If I remove them and add the LSTM overfitting run, it looks something like this:

enter image description here

@Greek-Guardian

For a single time step, an LSTM is definitely better. Transformers (TFM) perform better when it comes to long-horizon forecasting.
