This repository has been archived by the owner on Aug 2, 2022. It is now read-only.
[BUG] Roll up target index metric schema is not proper for max and min when values of type float/double #450
Labels
bug
Describe the bug
I created a rollup policy successfully and the data is indexed into the target index, but the schema of some of the metric fields is configured incorrectly.
For example, the max and min metrics are mapped to keyword. More details below.
The rollup policy we tried:
curl -XPUT "localhost:9200/_opendistro/_rollup/jobs/latest_stats_roll_up" -H 'Content-Type: application/json' -d'{"rollup":{"enabled":true,"schedule":{"interval":{"period":1,"unit":"Minutes"}},"description":"An example policy that rolls up the sample ecommerce data","source_index":"fmstats_2021-06-03*","target_index":"temp_stats_roll_1","page_size":1000,"delay":0,"continuous":false,"dimensions":[{"date_histogram":{"source_field":"timestamp","fixed_interval":"60m"}},{"terms":{"source_field":"portIdToClusterId"}},{"terms":{"source_field":"alias"}}],"metrics":[{"source_field":"port.rx.packets","metrics":[{"avg":{}},{"sum":{}},{"max":{}},{"min":{}},{"value_count":{}}]}]}}'
Index Management fetches and indexes the data correctly:
source[{"rollup._id":"latest_stats_roll_up","rollup._doc_count":12,"rollup._schema_version":9,"timestamp.date_histogram":1620342000000,"portIdToClusterId.terms":"4_1_x18;c2c20049","alias.terms":"c20049-4-1-x18","port.rx.packets.sum":1.0810224507E10,"port.rx.packets.value_count":12,"port.rx.packets.max":9.01208458E8,"port.rx.packets.min":9.00494681E8,"port.rx.packets.avg.sum":1.0810224507E10,"port.rx.packets.avg.value_count":12}]}
The source data is at 5-minute granularity and we are rolling it up to 60m granularity.
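As a quick sanity check of the rolled-up document above, the arithmetic can be sketched in Python. This assumes the source index holds one sample every 5 minutes (an interpretation of the report, not something the logs state explicitly). The field values are copied from the indexed document.

```python
# Values copied from the rolled-up document shown in this report.
rollup_doc = {
    "rollup._doc_count": 12,
    "port.rx.packets.sum": 1.0810224507E10,
    "port.rx.packets.value_count": 12,
    "port.rx.packets.max": 9.01208458E8,
    "port.rx.packets.min": 9.00494681E8,
    "port.rx.packets.avg.sum": 1.0810224507E10,
    "port.rx.packets.avg.value_count": 12,
}

SOURCE_INTERVAL_MINUTES = 5   # assumption: one source sample every 5 minutes
ROLLUP_INTERVAL_MINUTES = 60  # fixed_interval of the date_histogram dimension

# Number of source documents expected in each 60m bucket.
expected_docs = ROLLUP_INTERVAL_MINUTES // SOURCE_INTERVAL_MINUTES

# The rollup stores avg as a (sum, value_count) pair; the average is derived.
average = (
    rollup_doc["port.rx.packets.avg.sum"]
    / rollup_doc["port.rx.packets.avg.value_count"]
)

print(expected_docs, rollup_doc["rollup._doc_count"])
print(average)
```

The derived average falls between the stored min and max, so the rolled-up values themselves look consistent. Only the field types are in question.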
The problem is that the schema generated on the target index is wrong for metrics such as max and min: their type is keyword instead of long/float.
Here is the schema of the target index:
{"temp_stats_roll_1":{"mappings":{"_meta":{"rollups":{"latest_stats_roll_up":{"enabled_time":1622716301620,"target_index":"temp_stats_roll_1","roles":[],"description":"An example policy that rolls up the sample ecommerce data","source_index":"fmstats_2021-06-03*","enabled":true,"rollup_id":"latest_stats_roll_up","schema_version":8,"schedule":{"interval":{"start_time":1622716301620,"period":1,"unit":"Minutes"}},"delay":0,"last_updated_time":1622716301620,"continuous":false,"metadata_id":"IFpu0XkBCA59Kcdjqyqa","metrics":[{"source_field":"port.rx.packets","metrics":[{"avg":{}},{"sum":{}},{"max":{}},{"min":{}},{"value_count":{}}]}],"page_size":1000,"dimensions":[{"date_histogram":{"fixed_interval":"60m","source_field":"timestamp","target_field":"timestamp","timezone":"UTC"}},{"terms":{"source_field":"portIdToClusterId","target_field":"portIdToClusterId"}},{"terms":{"source_field":"alias","target_field":"alias"}}]}}},"dynamic_templates":[{"strings":{"match_mapping_type":"string","mapping":{"type":"keyword"}}},{"date_histograms":{"path_match":"*.date_histogram","mapping":{"type":"date"}}}],"properties":{"alias":{"properties":{"terms":{"type":"keyword"}}},"port":{"properties":{"rx":{"properties":{"packets":{"properties":{"avg":{"properties":{"sum":{"type":"float"},"value_count":{"type":"long"}}},"max":{"type":"keyword"},"min":{"type":"keyword"},"sum":{"type":"float"},"value_count":{"type":"long"}}}}}}},"portIdToClusterId":{"properties":{"terms":{"type":"keyword"}}},"rollup":{"properties":{"_doc_count":{"type":"long"},"_id":{"type":"keyword"},"_schema_version":{"type":"long"}}},"timestamp":{"properties":{"date_histogram":{"type":"date"}}}}}}}
The relevant snippet is:
"port":{"properties":{"rx":{"properties":{"packets":{"properties":{"avg":{"properties":{"sum":{"type":"float"},"value_count":{"type":"long"}}},"max":{"type":"keyword"},"min":{"type":"keyword"}
For sum the type is correct, but max and min are mapped as keyword.
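To spot this kind of mismatch without reading the nested JSON by eye, the mapping can be flattened into dotted field paths. The following is a minimal sketch, not part of the plugin: `flatten_mapping` and `NUMERIC_TYPES` are illustrative names, and the mapping is the excerpt from this report rather than a live cluster response.

```python
# Common Elasticsearch numeric field types (illustrative subset).
NUMERIC_TYPES = {
    "long", "integer", "short", "byte",
    "double", "float", "half_float", "scaled_float",
}


def flatten_mapping(properties, prefix=""):
    """Yield (dotted_field_path, type) for every leaf field in a mapping."""
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if "properties" in spec:
            # Object field: recurse into its sub-fields.
            yield from flatten_mapping(spec["properties"], prefix=path + ".")
        else:
            yield path, spec.get("type")


# Excerpt of the target index mapping reported in this issue.
mapping_properties = {
    "port": {"properties": {"rx": {"properties": {"packets": {"properties": {
        "avg": {"properties": {
            "sum": {"type": "float"},
            "value_count": {"type": "long"},
        }},
        "max": {"type": "keyword"},
        "min": {"type": "keyword"},
        "sum": {"type": "float"},
        "value_count": {"type": "long"},
    }}}}}}
}

field_types = dict(flatten_mapping(mapping_properties))
METRIC_PREFIX = "port.rx.packets."

# Metric fields whose mapped type is not numeric.
non_numeric = sorted(
    path for path, field_type in field_types.items()
    if path.startswith(METRIC_PREFIX) and field_type not in NUMERIC_TYPES
)
print(non_numeric)
```

Run against the excerpt, this lists only the max and min fields, which matches the observation above.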
Further analysis suggests that the float/double values for these metrics are being treated as strings, so the dynamic template on the target rollup index maps and indexes those fields as keyword.
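The hypothesis can be illustrated with a simplified model of dynamic mapping, using the two dynamic templates from the target index. This is a sketch under assumptions, not Elasticsearch's actual implementation: it ignores date detection and other template options, and `resolve_dynamic_type` is a hypothetical helper. It only shows that a metric value arriving as a JSON string would match the `strings` template, while the same value arriving as a JSON number would not.

```python
from fnmatch import fnmatchcase

# Dynamic templates copied from the target index mapping in this report.
DYNAMIC_TEMPLATES = [
    {"strings": {"match_mapping_type": "string", "mapping": {"type": "keyword"}}},
    {"date_histograms": {"path_match": "*.date_histogram", "mapping": {"type": "date"}}},
]

# Default dynamic field types when no template matches (simplified).
DEFAULT_TYPES = {"boolean": "boolean", "long": "long", "double": "float", "string": "text"}


def detected_json_type(value):
    """Simplified stand-in for the JSON data type Elasticsearch detects."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported value: {value!r}")


def resolve_dynamic_type(path, value):
    """Return the field type the first matching template would assign."""
    detected = detected_json_type(value)
    for template in DYNAMIC_TEMPLATES:
        (rule,) = template.values()
        wanted = rule.get("match_mapping_type")
        if wanted is not None and wanted != detected:
            continue
        pattern = rule.get("path_match")
        if pattern is not None and not fnmatchcase(path, pattern):
            continue
        return rule["mapping"]["type"]
    return DEFAULT_TYPES[detected]


# Same metric value, sent as a JSON number versus as a JSON string.
print(resolve_dynamic_type("port.rx.packets.max", 9.01208458E8))
print(resolve_dynamic_type("port.rx.packets.max", "9.01208458E8"))
```

In this model the numeric value resolves to `float` and the string value resolves to `keyword`, which is consistent with sum being mapped correctly while max and min are not.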
We confirmed this by making the following change to the indexer file, after which it started working.
For example, a query with sum works:
curl -XPOST "localhost:9200/temp*/_search?pretty" -H 'Content-Type: application/json' -d'{"size":0,"aggregations":{"daily_numbers":{"terms":{"field":"portIdToClusterId"},"aggregations":{"Sub_dateHistogramAgg":{"date_histogram":{"field":"timestamp","missing":0,"fixed_interval":"5m","offset":0,"order":{"_key":"asc"},"keyed":false,"min_doc_count":0},"aggregations":{"sumAggporttxpackets":{"sum":{"field":"port.rx.packets"}}}}}}}}'
output:
The same query with max/min does not work:
[root@fmha1 opendistro-index-management]# curl -XPOST "localhost:9200/temp*/_search?pretty" -H 'Content-Type: application/json' -d'{"size":0,"aggregations":{"daily_numbers":{"terms":{"field":"portIdToClusterId"},"aggregations":{"Sub_dateHistogramAgg":{"date_histogram":{"field":"timestamp","missing":0,"fixed_interval":"5m","offset":0,"order":{"_key":"asc"},"keyed":false,"min_doc_count":0},"aggregations":{"maxAggportRxPackets":{"max":{"field":"port.rx.packets"}}}}}}}}'
output:
After applying the above fix, the search queries work. Could you please validate the fix and correct the mapping accordingly?
Expected behavior
All metric fields in the rollup target index should be mapped to numeric types, not keyword.
OpenDistro version : 1.13.2.0