Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crop Mask Integration #85

Open
wants to merge 40 commits into
base: main
Choose a base branch
from
Open

Crop Mask Integration #85

wants to merge 40 commits into from

Conversation

gabrieltseng
Copy link
Collaborator

@gabrieltseng gabrieltseng commented Apr 19, 2022

Related issue: #83

  • Renaming script
  • Check that the renaming script works (the lat/lon of the tif file should correspond to whats in the filename)
  • Rewrite the engineer to handle the new naming format
    • Training data
    • Test data
  • Rewrite the EO exporter to export using this format by default (remove the identifier from the default labels)
    • Remove warning about dataset identifier in the EO exporter
  • Update the CropHarvest dataset to handle the new naming format

Related issue: #83

In addition, some smaller updates and bugfixes:

  • The "index" column in the labels.geojson conflicts with the index used by pandas. Rename it to "dataset_index" instead.
  • 🐛 The dataset_identifier was being incorrectly constructed - this is now fixed
  • 🐛 The export end date for the default labels was being incorrectly constructed. This is now fixed.
  • 🐛 Don't hardcode the seed in the CropHarvest dataset class

⚠️ This should only be merged into main when the Zenodo link has been updated. The failing integration test reflects this.

  • Recreate features
  • Update Zenodo

@gabrieltseng gabrieltseng removed the request for review from ivanzvonkov April 25, 2022 21:17
# TODO: The load_labels doesn't actually allow the root to be
# modified. We should probably do this at a package level, not
# at a class level
self._labels = Engineer.load_labels(root=root)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ivanzvonkov , do you think the idea of a root is still relevant? Or is the data is small enough that we can just have the DATAFOLDER_PATH act as the root and remove that as an option for the user?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per discussion today I think removing is fine

@@ -22,6 +25,14 @@
FEATURES_DIR = "features"
TEST_FEATURES_DIR = "test_features"

# These values describe the structure of the data folder
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ivanzvonkov , this locks in the folder structure but I think that's fine.

We could potentially have a way of over-riding this datafolder path at a package level, but otherwise I'd be for removing folder manipulation for the user entirely and controlling it here.

array = np.asarray(array)
idx = (np.abs(array - value)).argmin()
return array[idx]
def load_labels(root=DATAFOLDER_PATH) -> geopandas.GeoDataFrame:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the updates to the geojson here are from crop-mask


labelled_np = da.sel(x=closest_lon).sel(y=closest_lat).values
else:
min_distance_from_point = np.inf
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From crop-mask

@ivanzvonkov ivanzvonkov self-requested a review May 4, 2022 21:53
Mapping them to 3d space allows us to do that
"""
lat, lon = self.get_centre(in_radians=True)
return [cos(lat) * cos(lon), cos(lat) * sin(lon), sin(lat)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

neato

from pathlib import Path

from typing import Optional
from cropharvest.boundingbox import BBox
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

)
return tif_paths

def create_h5_dataset(self) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hf = h5py.File(arrays_dir / file_name, "w")
filename = (
f"lat={instance.label_lat}_lon={instance.label_lon}_year={instance.year}.h5"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about start and end date? How do we know the months will match up?

@@ -148,3 +149,13 @@ def read_geopandas(file_path) -> geopandas.GeoDataFrame:

class NoDataForBoundingBoxError(Exception):
pass


def filter_geojson(gpdf: geopandas.GeoDataFrame, bounding_box: BBox) -> geopandas.GeoDataFrame:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the year?

@@ -0,0 +1,66 @@
"""
After 20220418_renaming.py was run,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe some underscores in the date would be cleaner in the file names?

@ivanzvonkov
Copy link
Contributor

@gabrieltseng what's the latest status on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants