Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tasks are running even when orchestration is complete #1137

Open
saurav-microsoft opened this issue Jul 16, 2024 · 11 comments
Open

Tasks are running even when orchestration is complete #1137

saurav-microsoft opened this issue Jul 16, 2024 · 11 comments

Comments

@saurav-microsoft
Copy link

Hi,
I am using DTF with Azure Storage Provider.

I have an orchestration which contains several tasks. The orchestration completes successfully. However I see that there are 2 tasks which are hidden in the workItem queue. Even after the orchestration is complete, these tasks keep on executing forever.

I tried to debug from logs to understand what is going on here, but could not go further.
Below is an example of a good task (task#3) and a bad task (task#5) which executes forever.

LifeStory of Task#3

Sending [TaskScheduled#3] message to newprovisioninghub-workitems for instance 'c2102a9a4d5942e4a1bf91e0767b50c8'

Fetched [TaskScheduled#3] message from newprovisioninghub-workitems (delay = 45ms)

Processing [TaskScheduled#3] (total delay = 45ms)

.......... Runs the Task - <>

Sending [TaskCompleted#3] message to newprovisioninghub-control-00 for instance 'c2102a9a4d5942e4a1bf91e0767b50c8'

Deleting [TaskScheduled#3] message from newprovisioninghub-workitems

Fetched [TaskCompleted#3] message from newprovisioninghub-control-00 (delay = 90ms)

Processing [TaskCompleted#3] (total delay = 166ms)

Deleting [TaskCompleted#3] message from newprovisioninghub-control-00


LifeStory of Task#5

Sending [TaskScheduled#5] message to newprovisioninghub-workitems for instance 'c2102a9a4d5942e4a1bf91e0767b50c8'

Fetched [TaskScheduled#5] message from newprovisioninghub-workitems (delay = 0ms)

Processing [TaskScheduled#5] (total delay = 0ms)

newprovisioninghub-control-00: No new messages were found - backing off

Renewing [TaskScheduled#5] message on newprovisioninghub-workitems for an additional 1200 seconds

Sending [TaskCompleted#5] message to newprovisioninghub-control-00 for instance 'c2102a9a4d5942e4a1bf91e0767b50c8'

Deleting [TaskScheduled#5] message from newprovisioninghub-workitems

Fetched [TaskCompleted#5] message from newprovisioninghub-control-00 (delay = 251ms)

Failed to update or delete message [TaskScheduled#5] with ID = c4eb20ac-48be-47d2-bf25-dd202a67c8e4 because it no longer exists

Processing [TaskCompleted#5] (total delay = 335ms)

Deleting [TaskCompleted#5] message from newprovisioninghub-control-00

@saurav-microsoft
Copy link
Author

@davidmrdavid @cgillum - Requesting your help if I can do something more to add logs, or any other information

@saurav-microsoft
Copy link
Author

Similar to #1049

@cgillum
Copy link
Member

cgillum commented Jul 16, 2024

According to those logs, it looks like the message should be getting deleted. Are there any errors in the logs? What is the content of the queue messages?

@saurav-microsoft
Copy link
Author

Apart from this log I do not see any error in the logs,

Failed to update or delete message [TaskScheduled#5] with ID = c4eb20ac-48be-47d2-bf25-dd202a67c8e4 because it no longer exists

The content of the queue message of type TaskScheduledEvent

@saurav-microsoft
Copy link
Author

saurav-microsoft commented Jul 16, 2024

Also I observed a pattern that the task where renew happens executes forever,

Renewing [TaskScheduled#5] message on newprovisioninghub-workitems for an additional 1200 seconds

@cgillum
Although it says that the message was not found, I can still see that message in my queue. Initially it was hidden though.

Failed to update or delete message [TaskScheduled#5] with ID = 24ffa893-77fe-4afa-b4bb-eb2c0ba618cf because it no longer exists

image

@cgillum
Copy link
Member

cgillum commented Jul 16, 2024

Although it says that the message was not found, I can still see that message in my queue. Initially it was hidden though.

This message could also mean that the message exists, but it was hidden. The messages become hidden when they are being processed by some other listener. In your case, the fact that there's a "Renewing [TaskScheduled#5] message on newprovisioninghub-workitems for an additional 1200 seconds" tells me that the processing of this message is taking a long time, and that it will remain hidden for up to 1200 seconds (20 minutes).

What version of the Microsoft.Azure.DurableTask.AzureStorage nuget package are you using?

@saurav-microsoft
Copy link
Author

saurav-microsoft commented Jul 16, 2024

Yes, the Task does some heavy work which usually takes longer. In that case, it is okay that my long running Task messages are renewed. It's just that it should be deleting it when the processing is done which I believe is failing here.

I am using Microsoft.Azure.DurableTask.AzureStorage-2.0.0-rc.

Moved to this NuGet to move away from StorageConnectionString and use TokenCredential

@saurav-microsoft
Copy link
Author

I also checked Azure Queue docs and that it throws 404 when pop receipt is not what it gave away in the GET call. Not sure, if it is a bug here

@davidmrdavid
Copy link
Collaborator

@saurav-microsoft - please note that thet -rc pacakges are not production ready, we cannot recommend them to be used in production.

That said - we'ved fixed a few major bugs since the rc release. Can you upgrade to the rc.3 package and see if your issue reproduces?

@davidmrdavid
Copy link
Collaborator

In particular, there was an error in the updating logic for the popReceipt property, which seems to match your symptoms (note I've only taken a cursory look at this thread, apologies if I've missed something major)

@saurav-microsoft
Copy link
Author

Yes, it was indeed pop receipt issue during renewal flows. It works now for me. @davidmrdavid Can you please help me with a tentative timeline when you are planning for final release?
We are not in production yet, but we plan to go live soon. I was hoping I consume the right NuGet before that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants