[BUG] Action timed out and retry not consumed #1087

Open
kksaha opened this issue Feb 1, 2024 · 5 comments
Labels
bug Something isn't working

Comments


kksaha commented Feb 1, 2024

What is the bug?
Several of my indices show "Action timed out".

Here is my policy:

"default_state": "hot",
"states": [
  {
    "name": "hot",
    "actions": [
      {
        "timeout": "4h",
        "retry": { "count": 10, "backoff": "exponential", "delay": "20m" },
        "rollover": {
          "min_size": "90gb",
          "min_index_age": "7d",
          "min_primary_shard_size": "30gb",
          "copy_alias": false
        }
      }
    ],
    "transitions": [
      { "state_name": "snapshot", "conditions": { "min_rollover_age": "14d" } }
    ]
  },
  {
    "name": "snapshot",
    "actions": [
      {
        "timeout": "10h",
        "retry": { "count": 5, "backoff": "exponential", "delay": "2h" },
        "snapshot": { "repository": "KK_data_repository", "snapshot": "{{ctx.index}}" }
      }
    ],
    "transitions": [ { "state_name": "delete" } ]
  },
  {
    "name": "delete",
    "actions": [
      {
        "retry": { "count": 100, "backoff": "exponential", "delay": "10m" },
        "delete": {}
      }
    ],
    "transitions": []
  }
],

Despite the timeout and retry configuration, several indices that use this policy failed with "Action timed out", and it looks like the system didn't perform any retries:

Here is the explain output:

"state": { "name": "hot", "start_time": 1706450388940 },
"action": {
  "name": "rollover",
  "start_time": 1706450636772,
  "index": 0,
  "failed": true,
  "consumed_retries": 0,
  "last_retry_time": 0
},
"step": {
  "name": "attempt_rollover",
  "start_time": 1706450636772,
  "step_status": "condition_not_met"
},
"retry_info": { "failed": false, "consumed_retries": 0 },
"info": { "message": "Action timed out" }

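For anyone who wants to find other indices in the same state, here is a minimal sketch of a check against one index's explain entry. It assumes the entry has already been fetched and parsed into a dict with the same field names as the output above; the function name `timed_out_without_retry` and the `sample` dict are made up for illustration and are not part of the ISM plugin.

```python
from typing import Any, Mapping


def timed_out_without_retry(explain: Mapping[str, Any]) -> bool:
    """Return True when an explain entry matches the pattern in this report.

    The pattern: the action is marked failed with "Action timed out",
    no retries were consumed, and the step only reported that its
    conditions were not met.
    """
    action = explain.get("action", {})
    step = explain.get("step", {})
    info = explain.get("info", {})
    return (
        action.get("failed") is True
        and action.get("consumed_retries", 0) == 0
        and step.get("step_status") == "condition_not_met"
        and info.get("message") == "Action timed out"
    )


# Explain entry copied from the report above.
sample = {
    "state": {"name": "hot", "start_time": 1706450388940},
    "action": {
        "name": "rollover",
        "start_time": 1706450636772,
        "index": 0,
        "failed": True,
        "consumed_retries": 0,
        "last_retry_time": 0,
    },
    "step": {
        "name": "attempt_rollover",
        "start_time": 1706450636772,
        "step_status": "condition_not_met",
    },
    "retry_info": {"failed": False, "consumed_retries": 0},
    "info": {"message": "Action timed out"},
}

print(timed_out_without_retry(sample))
```

The check is deliberately strict on all four fields so that ordinary failures with consumed retries are not flagged.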
What is your host/environment?

  • Opensearch version - v2.11.1

Do you have any screenshots?
Screenshot 2024-02-01 at 3 04 10 PM

kksaha added the bug and untriaged labels Feb 1, 2024
kksaha (Author) commented Feb 8, 2024

Can anyone please advise?


Juliaj commented Mar 25, 2024

We're hitting this intermittently as well. From the output listed above:

"name": "hot", "start_time": 1706450388940 -> Sunday, January 28, 2024 1:59:48.940 PM (UTC)
"attempt_rollover", "start_time": 1706450636772 -> Sunday, January 28, 2024 2:03:56.772 PM (UTC)

"step_status": "condition_not_met" indicates that the rollover conditions had not been met, so the rollover should not have been triggered. But how is this tied to "Action timed out"?

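The two conversions above can be reproduced with a short sketch, assuming the `start_time` values are epoch milliseconds in UTC. The helper name `from_epoch_ms` is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_epoch_ms(ms: int) -> datetime:
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    # Adding a timedelta avoids float rounding on the millisecond part.
    return EPOCH + timedelta(milliseconds=ms)


# Values copied from the explain output in the original report.
state_start = from_epoch_ms(1706450388940)
action_start = from_epoch_ms(1706450636772)
gap = action_start - state_start

print(state_start.isoformat())
print(action_start.isoformat())
print(gap)
```

The gap between entering the "hot" state and the first rollover attempt is a little over four minutes, which is far below the configured "timeout": "4h".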

Juliaj commented Mar 25, 2024

Found a previous issue related to this: #315

dblock (Member) commented Jun 17, 2024

Looks like this is still a problem/bug. Catch All Triage - 1 2 3 4 5

dblock removed the untriaged label Jun 17, 2024
spapadop commented

@kksaha or anyone still looking for a workaround: try removing "timeout": "4h" from your ISM policy, then update the policy on the respective data stream.
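A minimal sketch of that edit, assuming the policy body has been loaded into a Python dict shaped like the one in the original report. The function name `drop_action_timeout` is made up for illustration, the `policy` dict below is trimmed to the "hot" state, and pushing the edited policy back to the cluster is not shown.

```python
import copy
from typing import Any, Dict


def drop_action_timeout(
    policy: Dict[str, Any], state_name: str, action_name: str
) -> Dict[str, Any]:
    """Return a copy of the policy with "timeout" removed from one action.

    Only actions of type `action_name` inside the state `state_name`
    are touched; the input dict is left unmodified.
    """
    updated = copy.deepcopy(policy)
    for state in updated.get("states", []):
        if state.get("name") != state_name:
            continue
        for action in state.get("actions", []):
            if action_name in action:
                action.pop("timeout", None)
    return updated


# "hot" state copied from the policy in the original report.
policy = {
    "default_state": "hot",
    "states": [
        {
            "name": "hot",
            "actions": [
                {
                    "timeout": "4h",
                    "retry": {"count": 10, "backoff": "exponential", "delay": "20m"},
                    "rollover": {
                        "min_size": "90gb",
                        "min_index_age": "7d",
                        "min_primary_shard_size": "30gb",
                        "copy_alias": False,
                    },
                }
            ],
            "transitions": [
                {"state_name": "snapshot", "conditions": {"min_rollover_age": "14d"}}
            ],
        }
    ],
}

patched = drop_action_timeout(policy, "hot", "rollover")
print(patched["states"][0]["actions"][0].keys())
```

The retry block and the rollover conditions are kept as they were; only the timeout on the rollover action is dropped.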
