Save intermediate COG files to S3 #563
@@ -3,44 +3,50 @@
set -e

# requires arguments
# -s | --source
# -T | --target
# --block_size
# -r | --resample
# -G | --export_to_gee
# -d | --dataset
# -I | --implementation
# -t | --target
# --prefix

ME=$(basename "$0")
. get_arguments.sh "$@"
#. get_arguments.sh "$@"

set -x
# download all GeoTiff files
aws s3 cp --recursive --exclude "*" --include "*.tif" "${SRC}" .

# create VRT of input files so we can use gdal_translate
if [ ! -f "merged.vrt" ]; then
if [[ $(aws s3 ls "${PREFIX}/merged.tif") ]]; then
aws s3 cp "${PREFIX}/merged.tif" merged.tif
else
aws s3 cp --recursive --exclude "*" --include "*.tif" "${SRC}" .

# create VRT of input files so we can use gdal_translate
gdalbuildvrt merged.vrt *.tif
fi

# merge all rasters into one huge raster using COG block size
if [ ! -f "merged.tif" ]; then
# merge all rasters into one huge raster using COG block size
gdal_translate -of GTiff -co TILED=YES -co BLOCKXSIZE="${BLOCK_SIZE}" -co BLOCKYSIZE="${BLOCK_SIZE}" -co COMPRESS=DEFLATE -co BIGTIFF=IF_SAFER -co NUM_THREADS=ALL_CPUS --config GDAL_CACHEMAX 70% --config GDAL_NUM_THREADS ALL_CPUS merged.vrt merged.tif
aws s3 cp merged.tif "${PREFIX}/merged.tif"
fi

# create overviews in raster
if ! gdalinfo "merged.tif" | grep -q "Overviews"; then
gdaladdo merged.tif -r "${RESAMPLE}" --config GDAL_NUM_THREADS ALL_CPUS --config GDAL_CACHEMAX 70%
if [[ $(aws s3 ls "${PREFIX}/merged.tif.ovr") ]]; then
aws s3 cp "${PREFIX}/merged.tif.ovr" merged.tif.ovr
else
# generate overviews externally
gdaladdo merged.tif -r "${RESAMPLE}" -ro --config GDAL_NUM_THREADS ALL_CPUS --config GDAL_CACHEMAX 70% --config COMPRESS_OVERVIEW DEFLATE
aws s3 cp merged.tif.ovr "${PREFIX}/merged.tif.ovr"
fi

# convert to COG using existing overviews, this adds some additional layout optimizations
if [ ! -f "cog.tif" ]; then
gdal_translate merged.tif cog.tif -of COG -co COMPRESS=DEFLATE -co BLOCKSIZE="${BLOCK_SIZE}" -co BIGTIFF=IF_SAFER -co NUM_THREADS=ALL_CPUS --config GDAL_CACHEMAX 70% --config GDAL_NUM_THREADS ALL_CPUS
fi
gdal_translate merged.tif cog.tif -of COG -co COMPRESS=DEFLATE -co BLOCKSIZE="${BLOCK_SIZE}" -co BIGTIFF=IF_SAFER -co NUM_THREADS=ALL_CPUS -co OVERVIEWS=FORCE_USE_EXISTING --config GDAL_CACHEMAX 70% --config GDAL_NUM_THREADS ALL_CPUS
Looks good. Just wondering - so I assume that OVERVIEWS=FORCE_USE_EXISTING is the option that makes sure that merged.tif.ovr is used, even though it is not explicitly named on the command line?
Exactly, that confused me too, but there's no option to specify an overview file; you just need to follow the naming convention.

# upload to data lake
aws s3 cp cog.tif "${TARGET}"

# delete intermediate file
aws s3 rm "${PREFIX}/merged.tif"
aws s3 rm "${PREFIX}/merged.tif.ovr"

if [ -n "$EXPORT_TO_GEE" ]; then
export_to_gee.py --dataset "${DATASET}" --implementation "${IMPLEMENTATION}"
fi
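The diff applies the same restore-or-build pattern twice, once for merged.tif and once for merged.tif.ovr. A minimal sketch of that pattern as a single helper follows. It is not part of the PR: the name restore_or_build and its argument order are hypothetical. It also swaps the `aws s3 ls` probe for a direct download attempt, on the understanding that `aws s3 ls` matches by key prefix, so a listing for merged.tif would also be non-empty when only merged.tif.ovr is present.

```shell
# Hypothetical helper, not part of the PR: restore LOCAL from REMOTE if a cached
# copy exists, otherwise run the build command and upload the result.
# Usage: restore_or_build REMOTE LOCAL BUILD_CMD [ARGS...]
restore_or_build() {
    local remote="$1" local_file="$2"
    shift 2

    # A previous run in the same working directory already produced the file.
    if [ -f "${local_file}" ]; then
        return 0
    fi

    # Try the download directly instead of probing with `aws s3 ls`.
    if aws s3 cp "${remote}" "${local_file}" --only-show-errors 2>/dev/null; then
        return 0
    fi
    rm -f "${local_file}"   # drop any partial download

    # No cached copy: build locally, then cache the result for later retries.
    "$@" || return 1
    aws s3 cp "${local_file}" "${remote}" --only-show-errors
}
```

Assuming the build steps from the diff were wrapped in functions (build_merged and build_overviews are made-up names), the two cached stages would read `restore_or_build "${PREFIX}/merged.tif" merged.tif build_merged` and `restore_or_build "${PREFIX}/merged.tif.ovr" merged.tif.ovr build_overviews`.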
I'm confused about why you commented this out. Doesn't this break the argument parsing? Did you just do this temporarily for some testing?
Maybe this commented-out line is the reason your tests are failing (in test_cog_asset).
oh yeah, good catch, that was to test running it locally
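One way to make this kind of slip fail loudly is a guard that runs right after the arguments are parsed. The sketch below assumes that get_arguments.sh is what sets variables such as SRC, PREFIX and TARGET (the diff only shows them being used), and the function name require_vars is hypothetical. The script enables `set -e` but not `set -u`, so without such a check an empty PREFIX would presumably just expand to an empty string in the `aws` commands.

```shell
# Hypothetical guard, not part of the PR: fail fast when a variable that
# get_arguments.sh is assumed to set is empty or unset.
# Usage: require_vars NAME [NAME...]
require_vars() {
    local missing=0 name value
    for name in "$@"; do
        # Indirect lookup of the variable called ${name}; names must be
        # literal identifiers supplied by the script itself.
        eval "value=\${${name}:-}"
        if [ -z "${value}" ]; then
            echo "missing required variable: ${name}" >&2
            missing=1
        fi
    done
    return "${missing}"
}
```

Called as `require_vars SRC PREFIX TARGET BLOCK_SIZE RESAMPLE || exit 1` after the sourcing line, it would stop the script before any S3 or GDAL command runs with empty arguments.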