Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adapter randomly disconnects/doesn't reconnect #435

Open
overwerk opened this issue Mar 11, 2023 · 10 comments
Open

adapter randomly disconnects/doesn't reconnect #435

overwerk opened this issue Mar 11, 2023 · 10 comments

Comments

@overwerk
Copy link

overwerk commented Mar 11, 2023

Hi,

I use several dozens of instances of the adapter and about every couple of days some instances (usually about 3-5, but sometime 15-20) wont reconnect/stay disconnected to the AP.
The AP works fine, I can connect throughout via app, but need to restart the adapter manually every time.

I can’t reproduce the behavior and it’s usually not the same instance that has this issue.

All APs are on connections with 24h reconnect scheme, so every 24h the router of the network the AP is in reconnects to the internet and usually get a new IP address assigned.

But I can't see a pattern here. For example I have 2 AP in the same building (same router/internet access) and just one of the has the issue, the other works fine. And the disconnected of of the adapter usually doesn't happen when the reconnected to the ISP is going on. So i assume is has nothing do to with the reconnect of the AP.

In the log file is not a single entry relating to the issue (currently working with info-level).

Using version 1.20.

Would be great if you could help to determine the problem or a way how to narrow it down.

Best regards

@Apollon77
Copy link
Member

Please set to debug ansd check log then.

I can only say that "several dozens of instances of the adapter" sounds very strange and I have NO IDEA if and how eq-3 sees this when that many conn ections comes from a single IP ... WHy you need that many instances?

@overwerk
Copy link
Author

Will do.

Its not a single IP, its several as well.

@Apollon77
Copy link
Member

Also too many for the same "Cloud Account" might be problematic ... eq-3 assumes one AP and "a temporary opened App" ... so 1,x connections at the same time :-)

@overwerk
Copy link
Author

Hmmm, that is certainly true and I think in this case something else is triggering the disconnect of the adapter.
I never received a ban or something, when I restart the adapter, it works right away.

I set log level to debug and let you know :-)

@Apollon77
Copy link
Member

Apollon77 commented Mar 11, 2023

I also had one case where the automatic reconnect did not worked somehow once whre I still wanted to look into, but did not had time. With a debug log I would see if you experience the same. It also happened as I had an internet reconnect. Maybe the library used for MQTT is not handling special timeout cases correctly and then hangs ... but I need to look into this as said - but would potentially match to your case

@overwerk
Copy link
Author

overwerk commented Apr 4, 2023

Ok, I've now the first case with logs, but there is nothing special I guess.

This morning ALL of my adapters disconnected with this message:
2023-04-04 02:03:16.974 - �[33mwarn�[39m: hmip.1 (677514) ws connection closed (0) - code: 1006 - reason:

A couple seconds later basically all BUT ONE reconnected with message:
2023-04-04 02:03:27.078 - �[32minfo�[39m: hmip.5 (677002) ws connection opened

And there is nothing in the log file regarding hmip.1 (the one that didn't reconnect) afterwards.

Also the 24h disconnect from the ISP was 2 hours later, so it has nothing to do with that.

From my opinion, homematic disconnects deliberately connections and usually the adapter reconnects again and fails every once in a while to do so.

That is also why I am experience this just every couple of weeks, but then usually its more than one adapter that is failing to reconnect.

Anything I can do about it or help to solve it?

Thanks

@Apollon77
Copy link
Member

I also had that effect once ... I need to look into it and maybe implement an additional reconnection mechanism or check ho to maybe betteer provide timeouts

@overwerk
Copy link
Author

overwerk commented Apr 15, 2023

Just wanted to confirm the issue again.
Yesterday morning 24 adapters got disconnected at the same time:

2023-04-13 08:02:11.021  - �[33mwarn�[39m: hmip.4 (677694) ws connection closed (0) - code: 1006 - reason: 

This time 7 couldn't reconnect.

@stale stale bot added the wontfix This will not be worked on label Aug 10, 2023
@iobroker-community-adapters iobroker-community-adapters deleted a comment from stale bot Aug 11, 2023
@stale stale bot removed the wontfix This will not be worked on label Aug 11, 2023
@elek76
Copy link

elek76 commented Aug 19, 2023

Joining here as I see this issue from time to time (ca. once in a month). Most recent occurrence was last night (internet connection has been stable during this time):
2023-08-19 03:44:12.840 - �[33mwarn�[39m: hmip.0 (242) ws connection closed (0) - code: 1006 - reason:
Next log entry from the adapter is where I restarted it:
2023-08-19 09:34:21.993 - �[32minfo�[39m: host.3c701d7c21ba stopInstance system.adapter.hmip.0 (force=false, process=true)
I usually notice the issue when the diagrams on vis are not updated. My ioBroker is running in a Docker container on a Synology DS in case it makes a difference.
Just activated debug logs - will sit and wait for the next occurrence...
I will implement a workaround to automatically restart the adapter when some selected DP has not been updated for a while, but of course a fix would be welcome.

@overwerk
Copy link
Author

overwerk commented Aug 28, 2023

[...] . I will implement a workaround to automatically restart the adapter when some selected DP has not been updated for a while, but of course a fix would be welcome.

Could you share your script? Facing a lot of reconnect issues lately...

I tried to find a solution and from my point of view its in this line:
if ((this.wsConnectionErrorCounter > 6 || reason.includes('ECONNREFUSED')) && !this._unloaded) {
Because this line prevents the adapter from reconnecting.

These log-entries never showed up:

this.log.error(error updating homematic ip for unknown states: ${code} - ${reason});
this.log.error('Try reconnect in 30s');

From my basic programming knowledge, you wanted to prevent a reconnect when we already tried 6 times or the reason for the termination is "ECONNREFUSED".
In our cases, its neigher, wsConnectionErrorCounter is 0, so its false, and reason.includes('ECONNREFUSED') is also false

I would change the line to:

if (this.wsConnectionErrorCounter <= 6 && !reason.includes('ECONNREFUSED') && !this._unloaded) {
So a reconnect would initiate if the error-counter is below 7, the reason is not ECONNREFUSED and the adapter is not unloaded.

I'll make the changes locally and keep you updated.

EDIT:
I updated the function, because from my understanding, with the current build, the reconnect is working, when the reason is "ECONNREFUSED", otherwise its not. That is also what I see in the logfile.
Now the adapter tries to reconnect regardless the reason, but in case of ECONNREFUSED a lot slower:

    _closed(code, reason) {
        if (this.wsConnectionStableTimeout || !this.wsConnected) {
            this.wsConnectionErrorCounter++;
        } else {
            this.wsConnectionErrorCounter = 0;
        }
        reason = reason ? reason.toString() : '';
        this.log.warn(`ws connection closed (${this.wsConnectionErrorCounter}) - code: ${code} - reason: ${reason}`);
        this.wsConnected = false;
		
	if (this._unloaded || this.wsConnectionErrorCounter > 6) {
	   this.log.error(`skipping reconnect: ${this._unloaded} - ${this.wsConnectionErrorCounter}`);
	   return;
	}
		
	let delay;
	if (reason.includes('ECONNREFUSED')) {
	   delay = this.wsConnectionErrorCounter * 5 * 60 * 1000; // 5min * wsConnectionErrorCounter
	} else {
	   delay = this.wsConnectionErrorCounter * 1 * 60 * 1000; // 1min * wsConnectionErrorCounter
	}
		
        this._api.dispose();
        this.log.error(`error updating homematic ip for unknown states: ${code} - ${reason}`);
        this.log.error('Try reconnect in ${delay / 60 / 1000} minute(s)');
        this.reInitTimeout && clearTimeout(this.reInitTimeout);
        this.reInitTimeout = setTimeout(() => {
           this.reInitTimeout = null;
           this._ready()
        }, delay);   
    }

I have a lot of instances running so I guess within a couple of weeks I'm able to determine if the changes are working properly.

EDIT2: Unfortunately my code is not correct, but I'm getting there. I should have debugged the whole thing with the next disconnect...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants