Moving forward, some notes #4

Open
khufkens opened this issue Feb 28, 2024 · 3 comments

Comments

@khufkens
Contributor

khufkens commented Feb 28, 2024

The premise of the manuscript would be that the physiological basis of fLUE can be used as a target for a new drought index through machine learning. Two caveats remain.

  1. potential data leakage and circularity in the use of the data.
  • i.e. MODIS is used as input in both analyses (fLUE and this work)
  • this has been addressed by showing sensitivity to fLUE using Landsat data, which decouples the calculated index from the input data (while keeping everything else static). Note that the model structure does change when using Landsat data.
  2. how does this more complex model compare to known indices? The idea in this work has been that an ML model trained on fLUE should outperform existing indices in how closely it relates to fLUE. Is this the case?
  • the model generally seems to outperform most of the indices, but there are exceptions. We need to check whether the same indices return to the top of the list across clusters and sites. The parsimonious solution of the ML model might be an advantage over a tailored site- or vegetation-specific index.

Point 1 has been addressed through the use of Landsat data, with results that hold up. Point 2 has been demonstrated by a cross-comparison against a zoo of indices, but needs nuance with respect to the individual indices.

See the vignettes:
https://geco-bern.github.io/index_based_drought_monitoring/

A third caveat remains, although it applies to any simple index: this metric/model is diagnostic only (calculated for each time step), not prognostic.

@khufkens
Contributor Author

A check of the VIs that rank better than or as well as the ML model does not show consistency across clusters. ML, it seems, is the parsimonious way of dealing with large-scale drought assessments.
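The consistency check described above can be expressed as a set intersection: for each cluster, collect the VIs whose score is at least the ML score, then intersect those sets across clusters. An empty intersection means no single VI matches the ML model everywhere. This is a minimal sketch, assuming per-cluster scores are already available as `{cluster: {predictor: score}}` with higher meaning better and the ML model stored under a hypothetical key `"ml"`.

```python
def vis_matching_ml(scores, ml_key="ml"):
    """Per cluster, the set of VIs whose score is at least the ML model's score."""
    return {
        cluster: {
            vi for vi, s in by_pred.items()
            if vi != ml_key and s >= by_pred[ml_key]
        }
        for cluster, by_pred in scores.items()
    }


def consistent_vis(scores, ml_key="ml"):
    """VIs that match or beat the ML model in every cluster (empty set if none)."""
    per_cluster = list(vis_matching_ml(scores, ml_key).values())
    return set.intersection(*per_cluster) if per_cluster else set()
```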

@khufkens
Contributor Author

@stineb

No single VI consistently tops the ML indices.
https://geco-bern.github.io/index_based_drought_monitoring/articles/model_evaluation_VI.html

One thing to consider is further simplifying the model to avoid overfitting. Currently only a limited set of hyperparameters is tuned, but this can be expanded to prune the trees severely (to decrease model size).
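One way to push towards smaller trees is to add tree-complexity settings to the tuning grid and then, rather than taking the single best cross-validated score, pick the simplest configuration whose score is within a tolerance of the best. The sketch below illustrates that selection rule; the parameter names `max_depth` and `min_node_size` and the tolerance value are hypothetical placeholders, not the settings used in this repository.

```python
from itertools import product


def tuning_grid(max_depths, min_node_sizes):
    """Expand tree-complexity settings into a list of candidate configurations."""
    return [
        {"max_depth": d, "min_node_size": n}
        for d, n in product(max_depths, min_node_sizes)
    ]


def pick_parsimonious(results, tolerance=0.01):
    """Choose the simplest configuration whose CV score is within `tolerance` of the best.

    results: list of (config, cv_score) pairs, higher score = better.
    Simplicity ordering: shallower trees first, then larger minimum node size
    (fewer splits, smaller model).
    """
    best = max(score for _, score in results)
    admissible = [cfg for cfg, score in results if score >= best - tolerance]
    return min(admissible, key=lambda c: (c["max_depth"], -c["min_node_size"]))
```

Setting `tolerance=0.0` falls back to the plain best-score choice, so the rule only trades accuracy for size when the loss is small.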

@khufkens
Contributor Author

By and large I would consider this done, conceptually. Things still need to be cleaned up and tightened a bit more: code for nicer graphs, and output of all relevant stats. However, this should be a solid basis for a small analysis / manuscript (addressing the most pressing methodological issues through cross-validation and independent datasets, and illustrating impact through a scaling exercise).
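If the cross-validation is meant to guard against the leakage concern raised earlier, one common choice is to hold out whole sites, so that no observations from a test site are seen during training. The sketch below generates leave-one-site-out folds from a list of per-observation site IDs; it assumes site-grouped folds, which the thread does not specify, and the site labels in any usage are placeholders.

```python
def leave_site_out_folds(site_ids):
    """Yield (site, train_idx, test_idx), holding out all rows of one site at a time.

    site_ids: per-observation site labels, in the same order as the data rows.
    """
    for site in sorted(set(site_ids)):
        test = [i for i, s in enumerate(site_ids) if s == site]
        train = [i for i, s in enumerate(site_ids) if s != site]
        yield site, train, test
```

Grouping by site rather than splitting rows at random keeps temporally autocorrelated observations from the same location out of both sides of a split.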
