Crop Mask Integration #85
base: main
```python
# TODO: The load_labels doesn't actually allow the root to be
# modified. We should probably do this at a package level, not
# at a class level
self._labels = Engineer.load_labels(root=root)
```
@ivanzvonkov, do you think the idea of a `root` is still relevant? Or is the data small enough that we can just have `DATAFOLDER_PATH` act as the root and remove `root` as an option for the user?
Per discussion today, I think removing it is fine.
@@ -22,6 +25,14 @@
```python
FEATURES_DIR = "features"
TEST_FEATURES_DIR = "test_features"

# These values describe the structure of the data folder
```
@ivanzvonkov, this locks in the folder structure, but I think that's fine.
We could potentially provide a way of overriding this data folder path at a package level; otherwise, I'd favor removing folder manipulation for the user entirely and controlling it here.
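One way the package-level override floated above could work, sketched under assumptions: the environment variable `CROPHARVEST_DATA_DIR` and the default `data` folder are hypothetical and not part of cropharvest. Every subfolder is derived from a single root, so individual loaders never need a `root` argument:

```python
import os
from pathlib import Path

# Hypothetical environment variable name; not an existing cropharvest setting.
_ENV_VAR = "CROPHARVEST_DATA_DIR"

# Folder-structure constants mirroring the ones in the diff.
FEATURES_DIR = "features"
TEST_FEATURES_DIR = "test_features"


def data_folder(default: Path = Path("data")) -> Path:
    """Return the package-wide data folder, optionally overridden via an env var."""
    override = os.environ.get(_ENV_VAR)
    return Path(override) if override else default


def features_path(test: bool = False) -> Path:
    """Derive the (test) features directory from the single package-level root."""
    sub = TEST_FEATURES_DIR if test else FEATURES_DIR
    return data_folder() / sub
```

The folder layout stays fixed in one place; the only thing a user can change is where that layout lives.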
```python
    array = np.asarray(array)
    idx = (np.abs(array - value)).argmin()
    return array[idx]


def load_labels(root=DATAFOLDER_PATH) -> geopandas.GeoDataFrame:
```
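The first three lines above are the body of a nearest-value helper whose signature the diff doesn't show; the name `find_nearest` and its parameters below are hypothetical. A self-contained version, presumably used to snap a label's coordinate onto the closest grid value (as `closest_lon`/`closest_lat` further down suggest):

```python
import numpy as np


def find_nearest(array, value):
    """Return the element of `array` closest to `value` (the first one on ties)."""
    array = np.asarray(array)
    # argmin over absolute differences gives the index of the closest element
    idx = (np.abs(array - value)).argmin()
    return array[idx]
```

For example, `find_nearest([0.0, 0.5, 1.0], 0.7)` returns `0.5`.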
Most of the updates to the GeoJSON here are from crop-mask.
```python
    labelled_np = da.sel(x=closest_lon).sel(y=closest_lat).values
else:
    min_distance_from_point = np.inf
```
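The `else` branch initializes `min_distance_from_point = np.inf`, which is the usual start of a brute-force nearest-point search. A minimal sketch of that pattern follows; the function name, its inputs, and the use of plain Euclidean distance in degrees are all assumptions, not the crop-mask code:

```python
import numpy as np


def nearest_pixel(lats, lons, target_lat, target_lon):
    """Brute-force search for the grid cell closest to (target_lat, target_lon).

    Uses Euclidean distance in degrees (an assumption), which is fine for
    nearby pixels but not for distances spanning large areas.
    """
    min_distance_from_point = np.inf
    best = None
    for i, lat in enumerate(lats):
        for j, lon in enumerate(lons):
            distance = np.hypot(lat - target_lat, lon - target_lon)
            if distance < min_distance_from_point:
                min_distance_from_point = distance
                best = (i, j)
    return best, min_distance_from_point
```

If the grid is an xarray `DataArray` with `x`/`y` coordinates, `da.sel(x=lon, y=lat, method="nearest")` performs a per-axis nearest lookup and may be enough on its own.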
From crop-mask
```python
Mapping them to 3d space allows us to do that
"""
lat, lon = self.get_centre(in_radians=True)
return [cos(lat) * cos(lon), cos(lat) * sin(lon), sin(lat)]
```
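The docstring is truncated, so what "that" refers to is an assumption here; a common reason for mapping (lat, lon) to 3D unit vectors is that averaging them stays correct across the antimeridian, where averaging raw longitudes fails. A minimal sketch with hypothetical function names, taking degrees as input:

```python
from math import asin, atan2, cos, degrees, radians, sin, sqrt


def to_xyz(lat_deg, lon_deg):
    """Map a (lat, lon) point in degrees to a unit vector in 3D space."""
    lat, lon = radians(lat_deg), radians(lon_deg)
    return [cos(lat) * cos(lon), cos(lat) * sin(lon), sin(lat)]


def mean_latlon(points):
    """Average (lat, lon) points on the sphere via their 3D unit vectors."""
    xyz = [to_xyz(lat, lon) for lat, lon in points]
    x, y, z = (sum(c) / len(xyz) for c in zip(*xyz))
    norm = sqrt(x * x + y * y + z * z)
    # Project the mean vector back onto the sphere and convert to degrees
    return degrees(asin(z / norm)), degrees(atan2(y, x))
```

For points at longitudes 179 and -179 on the equator, naive averaging gives longitude 0, while `mean_latlon` gives 180 (equivalently -180).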
neato
```python
from pathlib import Path

from typing import Optional
from cropharvest.boundingbox import BBox
```
Thank you!
```python
)
return tif_paths

def create_h5_dataset(self) -> None:
```
What do you think about the type of comments in https://github.com/nasaharvest/crop-mask/blob/69da6cef8258b3171c6a02771bfc2219d8eadf5b/src/ETL/dataset.py#L320
```python
hf = h5py.File(arrays_dir / file_name, "w")
filename = (
    f"lat={instance.label_lat}_lon={instance.label_lon}_year={instance.year}.h5"
)
```
What about the start and end dates? How do we know the months will match up?
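The file name encodes only the label's latitude, longitude and year, so anything else (such as which months an array covers) has to be implied by convention, which is the gap this question points at. A sketch of a round-trip helper for the naming format in the diff; `make_filename`, `parse_filename` and `ArrayKey` are hypothetical names:

```python
import re
from typing import NamedTuple


class ArrayKey(NamedTuple):
    lat: float
    lon: float
    year: int


# Matches names built as f"lat={lat}_lon={lon}_year={year}.h5".
# Python float reprs never contain "_", so [^_]+ also covers forms like 1e-05.
_PATTERN = re.compile(r"^lat=([^_]+)_lon=([^_]+)_year=(\d+)\.h5$")


def make_filename(lat: float, lon: float, year: int) -> str:
    return f"lat={lat}_lon={lon}_year={year}.h5"


def parse_filename(name: str) -> ArrayKey:
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"Unexpected file name: {name}")
    lat, lon, year = match.groups()
    return ArrayKey(float(lat), float(lon), int(year))
```

Nothing in the parsed key says which months the array spans; adding start and end dates would mean extending both the pattern and the key.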
@@ -148,3 +149,13 @@ def read_geopandas(file_path) -> geopandas.GeoDataFrame:
```python
class NoDataForBoundingBoxError(Exception):
    pass


def filter_geojson(gpdf: geopandas.GeoDataFrame, bounding_box: BBox) -> geopandas.GeoDataFrame:
```
What about the year?
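Filtering by year as well as by bounding box could look like this plain-Python sketch. It deliberately avoids geopandas, and `SimpleBBox`, `Label` and `filter_labels` are hypothetical stand-ins rather than `cropharvest.boundingbox.BBox` or the real `filter_geojson`:

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class SimpleBBox:
    """Hypothetical stand-in for a lat/lon bounding box."""
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        return self.min_lat <= lat <= self.max_lat and self.min_lon <= lon <= self.max_lon


@dataclass
class Label:
    lat: float
    lon: float
    year: int


def filter_labels(
    labels: Iterable[Label], bbox: SimpleBBox, year: Optional[int] = None
) -> List[Label]:
    """Keep labels inside `bbox` and, if `year` is given, from that year only."""
    return [
        label
        for label in labels
        if bbox.contains(label.lat, label.lon) and (year is None or label.year == year)
    ]
```

Making `year` optional keeps the bounding-box-only behavior as the default.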
@@ -0,0 +1,66 @@
```python
"""
After 20220418_renaming.py was run,
```
Maybe some underscores in the date would make the file names cleaner?
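If the date prefix were switched to underscores, the conversion is a one-liner with `datetime`; the helper name below is hypothetical, and which file names the suggestion targets is left open in the thread:

```python
from datetime import datetime


def underscore_date(compact: str) -> str:
    """Turn a YYYYMMDD string such as '20220418' into '2022_04_18'."""
    # strptime also rejects invalid dates such as '20221340'
    return datetime.strptime(compact, "%Y%m%d").strftime("%Y_%m_%d")
```

Underscores (rather than hyphens) keep the result usable in Python module names.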
@gabrieltseng, what's the latest status on this?
… `CropHarvest` dataset to handle the new naming format.

Related issue: #83
In addition, some smaller updates and bug fixes:

- The `"index"` column in `labels.geojson` conflicts with the index used by pandas. Rename it to `"dataset_index"` instead.
- `dataset_identifier` was being incorrectly constructed; this is now fixed.