Document and understand data flow leading to duplicate RA/NDBC stations #422
@dpsnowden, the most interesting fact here is that the CeNCOOS SOS capabilities document does not report a station with WMO ID 46251 at all. It is simply not there! So the dual affiliation indicated on the station page is spurious, and the link to the service in the SOS | CeNCOOS view just returns an HTTP 500 Internal Server Error (in the SOS | NOAA-NDBC view, that link properly leads to the NDBC SOS page). On the other hand, a station with WMO ID 46244 actually is reported by both CeNCOOS and NDBC, and this is indicated by a triple affiliation, where the last line provides the real link to the service, although everything else is almost identical to the first line. @lukecampbell, can you explain why the Catalog falsely attributed the station with WMO ID 46251 to CeNCOOS? The only reason I can imagine for that attribution is that the station's and CeNCOOS' observedProperties partially overlap. However, that is a big stretch, and the attribution seems wrong regardless.
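One way to check this kind of attribution mechanically is to compare the stations the Catalog attributes to a provider against the procedures that provider's capabilities document actually advertises. A minimal sketch, assuming SOS 1.0 capabilities in which each `sos:ObservationOffering` lists its station URN in `sos:procedure/@xlink:href`, and assuming station 46251 would appear as `urn:ioos:station:wmo:46251`. The capabilities excerpt and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed SOS 1.0 and XLink namespaces.
NS = {"sos": "http://www.opengis.net/sos/1.0"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def advertised_procedures(capabilities_xml):
    """Return the set of procedure URNs listed in the offerings of a capabilities document."""
    root = ET.fromstring(capabilities_xml.strip())
    hrefs = set()
    for proc in root.iterfind(".//sos:ObservationOffering/sos:procedure", NS):
        href = proc.get(XLINK_HREF)
        if href:
            hrefs.add(href.strip())
    return hrefs


def unbacked_attributions(catalog_uids, capabilities_xml):
    """Station URNs attributed to a provider that its capabilities document never lists."""
    return sorted(set(catalog_uids) - advertised_procedures(capabilities_xml))


# Hypothetical, trimmed capabilities excerpt that lists only 46244.
SAMPLE_CAPABILITIES = """
<sos:Capabilities xmlns:sos="http://www.opengis.net/sos/1.0"
                  xmlns:xlink="http://www.w3.org/1999/xlink">
  <sos:Contents>
    <sos:ObservationOfferingList>
      <sos:ObservationOffering>
        <sos:procedure xlink:href="urn:ioos:station:wmo:46244"/>
      </sos:ObservationOffering>
    </sos:ObservationOfferingList>
  </sos:Contents>
</sos:Capabilities>
"""

print(unbacked_attributions(
    ["urn:ioos:station:wmo:46244", "urn:ioos:station:wmo:46251"],
    SAMPLE_CAPABILITIES,
))
# ['urn:ioos:station:wmo:46251']
```

Any URN this returns is a candidate "fake" affiliation worth checking by hand, since the provider's own service never claims it.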
@abirger's two examples point to two "ghost" CeNCOOS datasets that have no valid service link or page in the IOOS Catalog. His second example, WMO ID 46244, makes this even more apparent: the dataset is shown twice under the SOS | CeNCOOS service, but only one of them has a valid service link; that one also happens to have a "Last updated" date of a year ago. So we are really dealing with two possibly separate issues here:
I think the core issue @dpsnowden was raising is the second one, and that's the only one where I can add some, umm, insight or details, though the first issue is obviously also important. There are two relevant types of stations (datasets) that will have a WMO ID: NDBC-operated stations (buoys, C-MAN stations, etc.), and stations from other providers (RAs, etc.) that NDBC redistributes and assigns a WMO ID. The two examples Derrick and Alex listed are both of the second type, as they are both CDIP wave buoys. AOOS and CeNCOOS (or maybe more to the point, Axiom) most likely get this CDIP data from NDBC and add it to their SOS services while retaining the WMO ID in the station URN. In the process, they also lose any metadata linkage to CDIP and incorrectly assign NDBC as the "Operator Name" (e.g., WMO 46244, http://catalog.ioos.us/datasets/53469a1d8c0db36efb591be3).

But this raises the question: why doesn't that WMO ID bring up the corresponding datasets from the CDIP DAC? It may be that the CDIP web service isn't providing a WMO ID for each buoy. The NDBC station page for WMO 46244 gives us the CDIP buoy code, 168 (see its CDIP page). Armed with that code, you'll see in the CDIP dataset listing that it occurs under 6 or so datasets! And most of those actually have two manifestations: "DAP | Other" and "DAP | CDIP" (e.g., see these CDIP 168 datasets). And, BTW, the "DAP | Other" dataset appears to be another "ghost" dataset whose Catalog service link is broken or doesn't exist!

OK, so in this case one could maybe blame CDIP for not broadcasting the WMO ID for each of its buoys (and also for having multiple datasets for each buoy). Let me turn to a cleaner example of another non-NDBC-operated station with a WMO ID. NANOOS operates the "NH-10" buoy, whose data are redistributed by NDBC and which has WMO ID 46094. Searching for that WMO ID on the Catalog only brings up the datasets served by NDBC and CeNCOOS.
Note that that dataset page has three service links (top left), two of them from CeNCOOS, and one of those CeNCOOS datasets is yet another ghost dataset with a "Last Update" date of a year ago. Let's stick with the CeNCOOS service link that seems current. Again, CeNCOOS (Axiom) is ingesting this dataset from NDBC and redistributing it via their i52N SOS. They retain the WMO ID in the station URN they advertise (…).

In that NH-10 Catalog dataset page, the NANOOS entry from our i52N SOS service should be listed together with the NDBC and CeNCOOS service links, but it's not. In fact, ideally it should be listed as the "primary" source (same with the CDIP example), but that's a somewhat different topic (and one that was discussed in #376). Unlike the CDIP example, we do make the WMO ID available in the proper metadata element in the DescribeSensor response, and the Catalog is properly parsing that information: the NANOOS NH-10 dataset page on the IOOS Catalog correctly shows 46094 under Station WMO ID.

NANOOS doesn't construct a station URN with the WMO ID, since that's not an IOOS DMAC requirement. Our SOS service only includes stations for which we're either the sole service provider to IOOS or the primary one (as with NH-10). We don't redistribute data already available in IOOS services from their primary providers, such as NDBC and COOPS. All our station URNs have a nanoos namespace, so NH-10 is …

To summarize some conclusions from this long comment:
The "ghost" datasets are the result of dangling pointers. When a new service is identified from the NGDC Geoportal, a "dataset" record is created. That dataset has a pointer back to the parent service from which it was harvested. Any newly identified services that reference the URL for that dataset are added to its list of parent services. Here's the document:
{
"_id" : ObjectId("53469a1d8c0db36efb591be3"),
"updated" : ISODate("2015-05-15T11:49:36.150Z"),
"uid" : "urn:ioos:station:wmo:46244",
"created" : ISODate("2014-04-10T13:18:21.359Z"),
"services" : [
{
"updated" : ISODate("2014-04-13T02:45:33.944Z"),
"description" : "Humboldt Bay, North Spit, CA",
"variables" : [
"http://mmisw.org/ont/cf/parameter/sea_surface_swell_wave_period",
"http://mmisw.org/ont/cf/parameter/sea_surface_swell_wave_significant_height",
"http://mmisw.org/ont/cf/parameter/sea_surface_swell_wave_to_direction",
"http://mmisw.org/ont/cf/parameter/sea_surface_wave_significant_height",
"http://mmisw.org/ont/cf/parameter/sea_surface_wave_to_direction",
"http://mmisw.org/ont/cf/parameter/sea_surface_wind_wave_period",
"http://mmisw.org/ont/cf/parameter/sea_surface_wind_wave_significant_height",
"http://mmisw.org/ont/cf/parameter/sea_surface_wind_wave_to_direction",
"http://mmisw.org/ont/cf/parameter/sea_water_temperature",
"http://mmisw.org/ont/fake/parameter/sea_surface_dominant_wave_period",
"http://mmisw.org/ont/fake/parameter/sea_surface_wave_mean_period"
],
"messages" : [],
"geojson" : {
"type" : "Point",
"coordinates" : [
-124.357,
40.888
]
},
"service_type" : "SOS",
"metadata_type" : "sensorml",
"keywords" : [],
"data_provider" : "CeNCOOS",
"service_id" : ObjectId("5311d2538c0db3469f7bfecb"),
"metadata_value" : "",
"asset_type" : "Buoy",
"name" : "Humboldt Bay, North Spit, CA"
},
{
"updated" : ISODate("2015-05-06T08:24:40.947Z"),
"description" : "Humboldt Bay, North Spit, CA",
"variables" : [
"urn:ioos:sensor:wmo:46244::summarywav1",
"urn:ioos:sensor:wmo:46244::watertemp1"
],
"messages" : [],
"geojson" : {
"type" : "Point",
"coordinates" : [
-124.357,
40.888
]
},
"name" : "46244",
"keywords" : [],
"metadata_type" : "sensorml",
"service_type" : "SOS",
"data_provider" : "NOAA-NDBC",
"metadata_value" : "",
"asset_type" : "MOORED BUOY",
"service_id" : ObjectId("53d49ca78c0db37ff137030c")
},
{
"updated" : ISODate("2015-05-15T11:49:36.150Z"),
"description" : "Humboldt Bay, North Spit, CA (46244)",
"variables" : [
"http://mmisw.org/ont/cf/parameter/sea_surface_swell_wave_period",
"http://mmisw.org/ont/cf/parameter/sea_surface_swell_wave_significant_height",
"http://mmisw.org/ont/cf/parameter/sea_surface_swell_wave_to_direction",
"http://mmisw.org/ont/cf/parameter/sea_surface_wave_significant_height",
"http://mmisw.org/ont/cf/parameter/sea_surface_wave_to_direction",
"http://mmisw.org/ont/cf/parameter/sea_surface_wind_wave_period",
"http://mmisw.org/ont/cf/parameter/sea_surface_wind_wave_significant_height",
"http://mmisw.org/ont/cf/parameter/sea_surface_wind_wave_to_direction",
"http://mmisw.org/ont/cf/parameter/sea_water_temperature",
"http://mmisw.org/ont/fake/parameter/sea_surface_dominant_wave_period",
"http://mmisw.org/ont/fake/parameter/sea_surface_wave_mean_period"
],
"metadata_type" : "sensorml",
"keywords" : [
"Humboldt Bay, North Spit, CA (46244)",
"NONE",
"urn:ioos:network:cencoos:all",
"urn:ioos:station:wmo:46244"
],
"data_provider" : "CeNCOOS",
"time_min" : ISODate("2014-11-01T00:15:00.163Z"),
"asset_type" : "Buoy",
"time_max" : ISODate("2015-05-11T23:43:00.876Z"),
"name" : "Humboldt Bay, North Spit, CA (46244)",
"messages" : [],
"geojson" : {
"type" : "Point",
"coordinates" : [
-124.356,
40.888
]
},
"service_type" : "SOS",
"service_id" : ObjectId("53d34aed8c0db37e0b538fda"),
"metadata_value" : ""
}
],
"active" : false
}

So the parent service(s) may have been deactivated or deleted, but the dataset is still being referenced by at least one active service. We will need to update the Catalog behavior to periodically prune unreferenced services.
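A minimal sketch of that pruning step, working on plain dicts shaped like the record above (ObjectIds shown as plain strings). It assumes the harvester can produce the set of currently active service IDs, and that a dataset left with no parent service should be switched off rather than linked to a dead service. `prune_dataset`, and the choice of which service is treated as inactive in the usage example, are hypothetical illustrations, not the Catalog's actual code.

```python
def prune_dataset(dataset, active_service_ids):
    """Return a copy of a dataset record that keeps only references to active services.

    dataset: dict with a "services" list whose entries carry a "service_id".
    active_service_ids: set of service IDs (strings) that are still active.
    """
    kept = [
        svc for svc in dataset.get("services", [])
        if str(svc.get("service_id")) in active_service_ids
    ]
    pruned = dict(dataset)  # shallow copy; the original "services" list is not modified
    pruned["services"] = kept
    # Only ever turn the flag off: a dataset with no remaining parent has nothing to link to.
    pruned["active"] = dataset.get("active", True) and bool(kept)
    return pruned


# Trimmed copy of the record above.
record = {
    "uid": "urn:ioos:station:wmo:46244",
    "services": [
        {"data_provider": "CeNCOOS", "service_id": "5311d2538c0db3469f7bfecb"},
        {"data_provider": "NOAA-NDBC", "service_id": "53d49ca78c0db37ff137030c"},
        {"data_provider": "CeNCOOS", "service_id": "53d34aed8c0db37e0b538fda"},
    ],
    "active": False,
}

# Illustrative assumption: the 2014 CeNCOOS service is the one no longer active.
active = {"53d49ca78c0db37ff137030c", "53d34aed8c0db37e0b538fda"}
pruned = prune_dataset(record, active)
print([svc["data_provider"] for svc in pruned["services"]])
# ['NOAA-NDBC', 'CeNCOOS']
```

Returning a copy leaves the stored record untouched, so the change can be reviewed before it is written back. In MongoDB itself, the same filter could probably be expressed as a single `$pull` update with a `$nin` condition on `services.service_id`.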
Should be fixed now.
@lukecampbell, yep, the ghost issue seems to be gone. However, as @emiliom mentioned, the Catalog definitely matches WMO IDs selectively: it only matches datasets on a WMO ID that is part of the station's URN, and drops the ball if the WMO ID is defined in a separate identifier. It looks like a bug, since the Catalog seems to get the right WMO ID in both cases. Regarding the wrong Operator attribution, it seems that the Catalog does not parse this attribute from NDBC at all (while it properly shows there
I happened to be thinking about WMO ID issues, and remembered this issue. It looks like this is still a problem on the IOOS Catalog (here's @abirger's summary, in the last entry on this issue):
Curious to hear if there are plans to fix this. Thanks.
Still curious about this. @lukecampbell?
I believe it's still an outstanding bug.
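A minimal sketch of the matching that @abirger and @emiliom describe as missing: take the WMO ID from the station URN when the URN uses the `wmo` naming authority, and otherwise fall back to a WMO identifier declared in the DescribeSensor response, as NANOOS does for NH-10. The SensorML 1.0.1 namespace, the `http://mmisw.org/ont/ioos/definition/wmoID` term definition, the record fields (`uid`, `data_provider`, `sensorml`), the NANOOS URN label, and all function names are assumptions for illustration, not the Catalog's actual code.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumed conventions: SensorML 1.0.1 namespace, the IOOS wmoID term
# definition, and station URNs of the form urn:ioos:station:wmo:<id>.
SML_NS = {"sml": "http://www.opengis.net/sensorML/1.0.1"}
WMO_DEFINITION = "http://mmisw.org/ont/ioos/definition/wmoID"
WMO_URN = re.compile(r"^urn:ioos:station:wmo:(\w+)$", re.IGNORECASE)


def wmo_id_from_sensorml(describe_sensor_xml):
    """WMO ID declared as a SensorML identifier, or None."""
    root = ET.fromstring(describe_sensor_xml.strip())
    for term in root.iterfind(".//sml:identifier/sml:Term", SML_NS):
        if term.get("definition") == WMO_DEFINITION:
            value = term.find("sml:value", SML_NS)
            if value is not None and value.text and value.text.strip():
                return value.text.strip()
    return None


def wmo_id_from_urn(urn):
    """WMO ID embedded in a station URN, or None for other naming authorities."""
    match = WMO_URN.match(urn.strip())
    return match.group(1) if match else None


def group_by_wmo_id(datasets):
    """Map WMO ID -> data providers, checking the URN first, then DescribeSensor."""
    groups = defaultdict(list)
    for ds in datasets:
        wmo = wmo_id_from_urn(ds["uid"])
        if wmo is None and ds.get("sensorml"):
            wmo = wmo_id_from_sensorml(ds["sensorml"])
        if wmo is not None:
            groups[wmo].append(ds["data_provider"])
    return dict(groups)


# Hypothetical, trimmed DescribeSensor excerpt for a station whose URN uses
# its own namespace but which declares WMO ID 46094 as an identifier.
NANOOS_SENSORML = """
<sml:SensorML xmlns:sml="http://www.opengis.net/sensorML/1.0.1">
  <sml:member><sml:System><sml:identification><sml:IdentifierList>
    <sml:identifier name="wmoID">
      <sml:Term definition="http://mmisw.org/ont/ioos/definition/wmoID">
        <sml:value>46094</sml:value>
      </sml:Term>
    </sml:identifier>
  </sml:IdentifierList></sml:identification></sml:System></sml:member>
</sml:SensorML>
"""

datasets = [
    {"uid": "urn:ioos:station:wmo:46094", "data_provider": "NOAA-NDBC"},
    {"uid": "urn:ioos:station:wmo:46094", "data_provider": "CeNCOOS"},
    # Hypothetical URN label; only the nanoos namespace comes from the thread.
    {"uid": "urn:ioos:station:nanoos:example", "data_provider": "NANOOS",
     "sensorml": NANOOS_SENSORML},
]

print(group_by_wmo_id(datasets))
# {'46094': ['NOAA-NDBC', 'CeNCOOS', 'NANOOS']}
```

Matching on the URN alone (the behavior reported above) would group only the NDBC and CeNCOOS records; the SensorML fallback is what brings the NANOOS record in under 46094.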
On the NDBC dataset page, you can see duplicate stations from CeNCOOS and AOOS. For example, the station with WMO ID 46251 shows up as being served by both CeNCOOS and NDBC. I believe this is the correct behavior. The question is: why don't the other RAs show up this way?
Since only AOOS and CeNCOOS exhibit this behavior, I'm guessing it is related to the i52N configuration. What differs between i52N ID handling and ncSOS ID handling? But since NANOOS and GCOOS aren't showing up this way, it must be more than just the software. It probably has more to do with the way @shane-axiom handles station URN identifiers during ingestion, something that is not being done at the other sites.
We need to understand the data flow that led to this behavior and see what we can do to get it implemented everywhere. It may require a tool in the Catalog that summarizes asset identifier usage for each collection (aka region). @emiliom, I think you were looking into WMO ID issues; any insight? @lukecampbell, thoughts?
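The per-collection summary tool suggested above could start as small as this: for each data provider, count station URNs by naming authority, so that regions re-serving NDBC stations under `wmo` URNs stand out from regions using their own namespace. A minimal sketch; the input pairs are illustrative (the `nanoos` URN label is hypothetical), and `identifier_usage` is not an existing Catalog function.

```python
from collections import Counter, defaultdict


def identifier_usage(records):
    """Count station URNs per data provider, keyed by URN naming authority.

    records: iterable of (data_provider, station_urn) pairs.
    URNs not of the form urn:ioos:station:<authority>:<label> are counted as "other".
    """
    usage = defaultdict(Counter)
    for provider, urn in records:
        parts = urn.strip().split(":")
        if len(parts) >= 5 and [p.lower() for p in parts[:3]] == ["urn", "ioos", "station"]:
            authority = parts[3].lower()
        else:
            authority = "other"
        usage[provider][authority] += 1
    return {provider: dict(counts) for provider, counts in usage.items()}


records = [
    ("NOAA-NDBC", "urn:ioos:station:wmo:46244"),
    ("CeNCOOS", "urn:ioos:station:wmo:46244"),
    ("CeNCOOS", "urn:ioos:station:wmo:46094"),
    ("NANOOS", "urn:ioos:station:nanoos:example_station"),  # hypothetical label
]
print(identifier_usage(records))
# {'NOAA-NDBC': {'wmo': 1}, 'CeNCOOS': {'wmo': 2}, 'NANOOS': {'nanoos': 1}}
```

Fed with every station URN the Catalog holds, a table like this would show at a glance which regions advertise `wmo` URNs and therefore get matched to NDBC stations.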
@abirger or @carmelortiz, can you track this or assign it to the appropriate person?