How to efficiently read HDF5 Variable with Structure using NetCDF-Java #1405
Just FYI, I posted the following on Stack Overflow a couple of weeks ago. I haven't received a response there, so I thought I'd try here.

What is an efficient way to use the NetCDF-Java API to read an HDF5 file containing a raster variable that consists of a structure in the following form?
In the past, I've had good luck using the NetCDF-Java API to process simple raster Variables (see Reading NetCDF files). But when I attempt to access a Structure type, my code runs very slowly. Processing the data for the structure shown above takes 36 minutes. Processing the same file with the JHDF Java HDF library takes only about 2 seconds.

Clearly, I am not using the NetCDF-Java API the way its authors intended. Unfortunately, I couldn't find any good examples for dealing with structure variables. I did read the Javadoc and JUnit test cases from the project, but the best I could figure out is the code shown below.

I am using the current release of NetCDF-Java, version 5.6.0. The sample file I tested is an IHO S-102 format (HDF5) file giving bottom depths for harbors. Sample files can be downloaded at NOAA S-102 Bathymetric Surface Data. I tested a number of files with similar results (the variable shown above is from 102US00_US4NJ1FH.h5).

Although the code below loops on row and column, I also tried a variation that accessed grid cells based on the chunk size scheme. Looking at the code, it's obvious that the loop creates an awful lot of short-lived objects, but I think the real problem lies in the underlying approach. I'm pretty sure the code performs a distinct file access operation for each data value it retrieves, but I haven't been able to figure out a more efficient way to use the API.
Thanks in advance for your help.
Replies: 2 comments 2 replies
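Greetings!

I'm not certain yet what is causing the slowness using the code you've provided, but something does not seem quite right in the readStructure call. It seems like each time it reads an individual value from the struct, it's loading the full chunk from disk, decompressing it, reading the single value, and then repeating the whole process (so 3,458,025 load-chunk/decompress/get-single-float iterations... not good).

There are a few other approaches you could take if you know you want to read all of the data for a given member of the structure variable, both of which are considerably faster. Note that in these examples, I'm using this file (102US00_US4NJ1FH.h5) and targetVariableName = "Bathymet…

First, you could use a structure iterator: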
This takes just under 4 seconds on my machine. Second, you could skip the structure-specific APIs altogether and read the member as an array:
This takes about 500 ms on my machine, although at the cost of loading the entire member data array into memory.
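If the slowdown really is a full chunk load and decompress for every single value, as suspected in this reply, the effect is easy to reproduce outside of NetCDF-Java. The following is only a toy model using java.util.zip, not NetCDF-Java or HDF5 code: the class name, the chunk size of 4096 values, and the synthetic depth values are all assumptions made for illustration. It compresses one "chunk" of floats, then sums it twice: once decompressing the whole chunk per value, and once decompressing it a single time.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Toy model of one compressed chunk; this is not NetCDF-Java or HDF5 code. */
final class ChunkReadSketch {
    // Counts how many times the whole chunk has been decompressed.
    static int inflateCount = 0;

    /** Packs floats into bytes and deflates them, standing in for a stored chunk. */
    static byte[] compress(float[] values) {
        ByteBuffer raw = ByteBuffer.allocate(values.length * Float.BYTES);
        for (float v : values) raw.putFloat(v);
        Deflater deflater = new Deflater();
        deflater.setInput(raw.array());
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Decompresses the whole chunk back into 'count' floats. */
    static float[] decompress(byte[] chunk, int count) {
        inflateCount++;
        byte[] raw = new byte[count * Float.BYTES];
        Inflater inflater = new Inflater();
        inflater.setInput(chunk);
        int off = 0;
        try {
            while (off < raw.length && !inflater.finished()) {
                int n = inflater.inflate(raw, off, raw.length - off);
                if (n == 0 && inflater.needsInput()) break; // input ended early
                off += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt chunk", e);
        } finally {
            inflater.end();
        }
        if (off != raw.length) throw new IllegalStateException("short chunk");
        float[] values = new float[count];
        ByteBuffer.wrap(raw).asFloatBuffer().get(values);
        return values;
    }

    /** Slow pattern: decompress the full chunk again for every value read. */
    static double sumPerValue(byte[] chunk, int count) {
        double sum = 0;
        for (int i = 0; i < count; i++) sum += decompress(chunk, count)[i];
        return sum;
    }

    /** Fast pattern: decompress once, then walk the values in memory. */
    static double sumPerChunk(byte[] chunk, int count) {
        double sum = 0;
        for (float v : decompress(chunk, count)) sum += v;
        return sum;
    }

    public static void main(String[] args) {
        int count = 4096; // hypothetical chunk size, not taken from the S-102 file
        float[] depths = new float[count];
        for (int i = 0; i < count; i++) depths[i] = 10.0f + 0.25f * (i % 100);
        byte[] chunk = compress(depths);

        long t0 = System.nanoTime();
        double slow = sumPerValue(chunk, count);
        long t1 = System.nanoTime();
        double fast = sumPerChunk(chunk, count);
        long t2 = System.nanoTime();

        System.out.printf("per-value: sum=%.2f in %d us%n", slow, (t1 - t0) / 1000);
        System.out.printf("per-chunk: sum=%.2f in %d us%n", fast, (t2 - t1) / 1000);
    }
}
```

Both methods return the same sum, but the per-value version decompresses the chunk once per element, so its cost grows with the square of the chunk size. The structure iterator and the whole-member read avoid that pattern by pulling many values per decompression, which would be consistent with the timings reported above.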
I can do that if you'd like. I have not started watching Stack Overflow since my return in November, but this is just the nudge I needed to do that :-)