pmm-agent shown as Disconnected/Unknown/Failed after temporary network disconnect #3441

Open
1 task done
ValeriiVozniuk opened this issue Jan 8, 2025 · 3 comments
Labels
bug Bug report

Comments

ValeriiVozniuk commented Jan 8, 2025

Description

Hi,

We are seeing an issue with pmm-agent statuses after temporary network disconnects. The agent stays in the Disconnected state, and the PMM Server UI shows the node as Failed, with the agent in Unknown status. According to the pmm-agent trace logs, the agent reconnects to the server and is able to send data after the reconnect, but the reported state remains broken.
Restarting either the agent or the server restores the correct statuses.
We have only 9 clients connected, and all of them except the local one on PMM Server have this issue.

Expected Results

After a reconnect, the agent is in the Connected state, and the UI shows that everything is OK.

Actual Results

sudo pmm-admin list
Service type        Service name                                     Address and port        Service ID
MongoDB             dbhost01-mongo        127.0.0.1:27017         /service_id/00d622c0-0504-4d60-8d5d-b7a4e85ec2bf

Agent type                    Status              Metrics Mode        Agent ID                                              Service ID                                              Port
pmm_agent                     Disconnected                            /agent_id/06698602-553b-4dcc-a879-09236136c734                                                                0
node_exporter                 Running             push                /agent_id/3e0d31cb-ccfb-4160-a1d9-e6daaf036037                                                                42001
mongodb_exporter              Running             push                /agent_id/cf82eadb-cbef-4a39-a74d-1a0c63d6ed8d        /service_id/00d622c0-0504-4d60-8d5d-b7a4e85ec2bf        42002
mongodb_profiler_agent        Running                                 /agent_id/a222e840-2145-4743-b4fe-9d65e2822eb1        /service_id/00d622c0-0504-4d60-8d5d-b7a4e85ec2bf        0
vmagent                       Running             push                /agent_id/31ec8f09-13a1-4c9e-9fca-39c59d9e4250                                                                42000

Screenshots attached: UI1, UI2

Version

Agent ID : /agent_id/06698602-553b-4dcc-a879-09236136c734
Node ID  : /node_id/3d6aa28c-9d71-424d-a3c6-e7bb9c736b33
Node name: dbhost01

PMM Server:
        URL    : https://pmm-test.domain.com:443/
        Version: 2.43.2

PMM Client:
        Connected        : true
        Time drift       : 688.4µs
        Latency          : 609.81µs
        Connection uptime: 100
        pmm-admin version: 2.44.0
        pmm-agent version: 2.44.0
Agents:
        /agent_id/31ec8f09-13a1-4c9e-9fca-39c59d9e4250 vmagent Running 42000
        /agent_id/3e0d31cb-ccfb-4160-a1d9-e6daaf036037 node_exporter Running 42001
        /agent_id/a222e840-2145-4743-b4fe-9d65e2822eb1 mongodb_profiler_agent Running 0
        /agent_id/cf82eadb-cbef-4a39-a74d-1a0c63d6ed8d mongodb_exporter Running 42002
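
The two outputs above disagree: `pmm-admin list` reports `pmm_agent` as `Disconnected`, while `pmm-admin status` reports `Connected: true`. A minimal sketch for flagging that mismatch on a client could look like the following. It assumes the text layouts shown above, which may differ between PMM versions, and the function names are hypothetical (they are not part of `pmm-admin`).

```shell
#!/usr/bin/env bash
# Sketch only: parses the text layouts shown in this report.
# Function names are hypothetical; they are not part of pmm-admin.

# Print the Status column of the pmm_agent row from `pmm-admin list` output (stdin).
agent_list_status() {
  awk '$1 == "pmm_agent" { print $2; exit }'
}

# Print the value of the "Connected" field from `pmm-admin status` output (stdin).
client_connected() {
  awk -F':' '$1 ~ /^[[:space:]]*Connected[[:space:]]*$/ {
    gsub(/[[:space:]]/, "", $2); print $2; exit
  }'
}

# Succeed (exit 0) when status says connected but list says otherwise.
statuses_disagree() {
  local list_status="$1" connected="$2"
  [ "$connected" = "true" ] && [ "$list_status" != "Connected" ]
}

# Example usage on a client host:
#   if statuses_disagree "$(sudo pmm-admin list | agent_list_status)" \
#                        "$(sudo pmm-admin status | client_connected)"; then
#     echo "pmm_agent status mismatch"
#   fi
```

Run against the outputs in this report, the check would report a mismatch, since the list status is `Disconnected` and the status field is `true`.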

Steps to reproduce

  1. Deploy PMM Server via the Helm chart.
  2. Deploy PMM Client on a node with MongoDB.
  3. Connect the Client to the Server and confirm that everything is green.
  4. Add a temporary firewall rule to block traffic between the Client and the Server.
  5. Observe in the Client log that the connection is broken and that reconnect attempts are running.
  6. Remove the firewall rule and check that the Client reconnects to the Server.
  7. Check the Client status in pmm-admin and in the Server UI, and note that it is broken.
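
Steps 4 to 7 can be sketched as a script. This is only an illustration under assumptions that the report does not state: the firewall is `iptables` on the client host, the server is reached over TCP port 443 (as in the server URL above), and the outage lasts 120 seconds. The variables `SERVER_HOST`, `SERVER_PORT`, `BLOCK_SECONDS`, and `DRY_RUN` and all function names are hypothetical. By default the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch of reproduction steps 4-7. Assumes iptables on the client host;
# the report does not say which firewall was used.
SERVER_HOST="${SERVER_HOST:-pmm-test.domain.com}"  # host from the report's URL
SERVER_PORT="${SERVER_PORT:-443}"
BLOCK_SECONDS="${BLOCK_SECONDS:-120}"              # assumed outage length
DRY_RUN="${DRY_RUN:-1}"                            # 1 = print commands only

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Step 4: drop outgoing traffic from the client to the server.
block_server() {
  run sudo iptables -I OUTPUT -p tcp -d "$SERVER_HOST" --dport "$SERVER_PORT" -j DROP
}

# Step 6: delete the same rule again.
unblock_server() {
  run sudo iptables -D OUTPUT -p tcp -d "$SERVER_HOST" --dport "$SERVER_PORT" -j DROP
}

reproduce() {
  block_server
  run sleep "$BLOCK_SECONDS"   # step 5: watch the agent log meanwhile
  unblock_server
  run sudo pmm-admin list      # step 7: check the pmm_agent status
}
```

Calling `reproduce` prints the command sequence; setting `DRY_RUN=0` would execute it for real, so review the rules before doing that on a production host.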

Relevant logs

The pmm-agent trace log is attached in a separate comment.

Code of Conduct

  • I agree to follow Percona Community Code of Conduct

ValeriiVozniuk (Author) commented

pmm-agent.log

BupycHuk (Member) commented Jan 8, 2025

Hi @ValeriiVozniuk, the pmm-agent logs look fine. Could you share the pmm-managed and nginx logs?

ValeriiVozniuk (Author) commented

Please find the requested logs for this morning's disconnect attached:
agent-nginx-managed.zip
pmm-managed.filtered.log is filtered by node_id/agent_id events for your convenience; the full log is also included in case it is needed.
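
A filter of this kind can be sketched with `grep`. The exact command used to produce `pmm-managed.filtered.log` is not given in the thread, so this is an assumption; the IDs are the ones from the `pmm-admin status` output in the report, and the function name and input path are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: keep only log lines that mention the affected node or agent.
# IDs are taken from the `pmm-admin status` output in the report.
NODE_ID="3d6aa28c-9d71-424d-a3c6-e7bb9c736b33"
AGENT_ID="06698602-553b-4dcc-a879-09236136c734"

# Read a log on stdin, write matching lines to stdout.
# -F treats the IDs as fixed strings rather than regular expressions.
filter_by_ids() {
  grep -F -e "$NODE_ID" -e "$AGENT_ID"
}

# Example (hypothetical input path):
#   filter_by_ids < pmm-managed.log > pmm-managed.filtered.log
```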
