Zone synchronization fails sporadically #411
Hi @maintain3r.
It's best not to force a transfer. Let's leave that as is for now. The error looks like it might be a timeout, but I can't be sure yet. I'll try to analyze the problem further tomorrow.
Hi @k0ekk0ek, thank you for your reply. Please let me know if you want me to check anything else.
Hello, any updates on this one?
Hi @maintain3r. Sorry for the delayed response; something else came up, and one of the other devs will need to pick this up instead.
Hi @maintain3r, sorry for the late response. Since your secondaries are able to receive transfers, we don't think the "failed writing tcp" or "could not SSL_write crypto error" messages are problematic. The TCP error is likely related to TCP Fast Open (with zone transfers). The SSL_write error may be related to that as well (but then on the nsd-control channel). The reason the secondary did not transfer is given by the message "ignoring old serial (222/221)": the secondary already had a zone loaded with a higher serial than the one it saw on the primary, so it didn't transfer. There could be many reasons for this. Are you certain there is only a single primary? Do you use a zone database? (What version of NSD do you use, and what is the value of the "database" setting?) Maybe we can spot some potential causes if you can share your configuration. You can send it to us privately by PGP-encrypting it, for example to me (willem at nlnetlabs.nl), encrypted with my PGP key: E5F8 F821 2F77 A498.
Hello @wtoorop, and thank you for your attention to this topic.
NSD on primary nodes:
NSD on secondary nodes:
What is the value of the database setting on the secondaries (and primary):
The config is pretty much standard, but if you need anything else please let me know. Thanks!
Hello NLnetLabs Team,
I'm reaching out to you as a last resort.
I have NSD servers running as one primary and multiple secondaries, all pointing to the same primary as the source of truth. Apart from that, the primary and secondary NSD servers don't know anything about each other.
The primary NSD is configured to allow XFR requests from the subnets of the secondary NSD hosts.
On the secondary hosts, "nsd-control transfer" is run at instance bootstrap to fetch the zones configured in nsd.conf. The same command also runs as a cron job every 5 minutes, so I know that zone changes I make on the primary will appear on the secondaries within 5 minutes.
The reason I don't have a notify section configured on the primary is that the secondary hosts are cloud VMs, which the cloud provider can replace for any reason and at any time. This makes it impossible to preconfigure the primary NSD with the IP addresses of the secondary servers.
Recently the sync between the primary and the secondary hosts stopped working.
A quick check on the secondary hosts with "nsd-control zonestatus" showed that one of the zones had its state as "refreshing".
I tried running "nsd-control transfer" in my terminal, but that didn't change anything. Then I ran "nsd-control force_transfer",
which transferred all zones regardless of the serial in their SOA records, and that fixed the issue.
Running "nsd-control zonestatus" again showed all zones with "state: ok".
After checking the logs, I saw some weird entries:
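Since the first symptom was a zone stuck in "refreshing", a check that flags such zones could surface the problem without forcing anything. The sketch below assumes the "nsd-control zonestatus" output consists of a "zone:" line per zone followed by indented "key: value" lines, one of which is "state:"; the function names are hypothetical and the format should be verified against your NSD version.

```python
def parse_zonestatus(text: str) -> dict[str, str]:
    """Map each zone name to its 'state' value from `nsd-control zonestatus` text.

    Assumed layout: a 'zone:' line per zone, followed by indented
    'key: value' lines, one of which is 'state: ...'.
    """
    states: dict[str, str] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("zone:"):
            current = line.split(":", 1)[1].strip()
        elif line.startswith("state:") and current is not None:
            states[current] = line.split(":", 1)[1].strip()
    return states


def zones_needing_attention(states: dict[str, str]) -> list[str]:
    # On a secondary, anything other than 'ok' (e.g. 'refreshing') is worth a look.
    return sorted(zone for zone, state in states.items() if state != "ok")


if __name__ == "__main__":
    sample = "zone:\texample.com\n\tstate: refreshing\nzone:\texample.org\n\tstate: ok\n"
    print(zones_needing_attention(parse_zonestatus(sample)))
```

On a live host, the text could come from subprocess.run(["nsd-control", "zonestatus"], capture_output=True, text=True).stdout, and a non-empty result could raise an alert instead of triggering a forced transfer.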
nsd.log:
[timestamp] nsd[475]: error: xfrd: failed writing tcp Operation now in progress
[timestamp] nsd[475]: info: xfrd: zone example.com ignoring old serial (222/221) from
[timestamp] nsd[475]: info: xfrd: zone example.com bad transfer 0 from
These lines also appear randomly in the nsd.log file:
error: could not SSL_write crypto error:00000000:lib(0)::reason(0)
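To see how often these messages occur and which serial pairs are reported, the log could be scanned with a few regular expressions. This is a minimal sketch; the patterns are copied from the excerpt above and are assumptions that may not match other NSD versions, and summarize is a hypothetical helper.

```python
import re
from collections import Counter

# Patterns taken from the log excerpt; wording may differ between NSD versions.
OLD_SERIAL = re.compile(r"xfrd: zone (\S+) ignoring old serial \((\d+)/(\d+)\)")
BAD_TRANSFER = re.compile(r"xfrd: zone (\S+) bad transfer")
TCP_WRITE = re.compile(r"xfrd: failed writing tcp")
SSL_WRITE = re.compile(r"could not SSL_write")


def summarize(lines):
    """Count xfrd-related messages per zone and keep the last serial pair seen per zone."""
    counts = Counter()
    last_serials = {}
    for line in lines:
        if m := OLD_SERIAL.search(line):
            zone = m.group(1)
            counts[(zone, "old serial")] += 1
            # Store the pair exactly as printed, e.g. (222, 221).
            last_serials[zone] = (int(m.group(2)), int(m.group(3)))
        elif m := BAD_TRANSFER.search(line):
            counts[(m.group(1), "bad transfer")] += 1
        elif TCP_WRITE.search(line):
            counts[("*", "tcp write error")] += 1  # not tied to a zone in the message
        elif SSL_WRITE.search(line):
            counts[("*", "ssl write error")] += 1
    return counts, last_serials
```

Feeding it the lines of nsd.log (for example via open(...).readlines()) gives per-zone counts, which makes it easier to tell whether the "old serial" messages are persistent or one-off.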
For some reason, NSD on the secondary hosts didn't work properly.
But even if there was an issue with the zone transfer, I still don't understand why the cron job running "nsd-control transfer" every few minutes didn't fix it. The TCP line in the log doesn't really say whether there was a communication issue with the primary server, and the "old serial" lines are not informative either. How the secondary ended up with a higher serial number than the primary makes no sense to me.
I'm not sure putting "nsd-control force_transfer" in cron is a good idea, as I don't want to fetch zones from the primary regardless of whether the zone's serial number has changed.
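Why a repeated "nsd-control transfer" cannot recover from this state follows from how SOA serials are compared: a plain transfer is only applied when the primary's serial is newer than the one the secondary holds, whereas force_transfer ignores the serial. The sketch below is a simplified model of that check using RFC 1982 serial-number arithmetic, assuming (as the maintainer reads the message) that in "ignoring old serial (222/221)" 222 is the secondary's serial and 221 is the primary's; it is not NSD's actual code.

```python
SERIAL_BITS = 32
HALF = 1 << (SERIAL_BITS - 1)


def serial_newer(a: int, b: int) -> bool:
    """True if SOA serial `a` is newer than `b` under RFC 1982 32-bit arithmetic.

    A difference of exactly HALF is undefined in RFC 1982; this returns False for it.
    """
    return (a < b and b - a > HALF) or (a > b and a - b < HALF)


def transfer_would_apply(local_serial: int, primary_serial: int) -> bool:
    # Simplified model: a non-forced refresh only pulls the zone when the
    # primary offers a newer serial than the one already loaded locally.
    return serial_newer(primary_serial, local_serial)


if __name__ == "__main__":
    print(transfer_would_apply(222, 221))  # secondary already holds the "newer" serial
    print(transfer_would_apply(221, 222))  # normal case: primary is ahead
```

Under this model, every cron run sees 221 from the primary, finds it older than the local 222, and skips the transfer, which matches the "ignoring old serial" messages; the open question from the thread is how the secondary obtained serial 222 in the first place.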
Ubuntu Server 22.04
NSD version 4.3.9
Configure line: --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-option-checking --disable-silent-rules --libdir=${prefix}/lib/x86_64-linux-gnu --runstatedir=/run --disable-maintainer-mode --disable-dependency-tracking --with-configdir=/etc/nsd --with-nsd_conf_file=/etc/nsd/nsd.conf --with-pidfile=/run/nsd/nsd.pid --with-dbfile=/var/lib/nsd/nsd.db --with-zonesdir=/etc/nsd --with-xfrdfile=/var/lib/nsd/xfrd.state --disable-largefile --disable-recvmmsg --enable-root-server --enable-mmap --enable-ratelimit --enable-zone-stats --enable-systemd --enable-checking --enable-dnstap
Event loop: libevent 2.1.12-stable (uses epoll)
Linked with OpenSSL 3.0.2 15 Mar 2022
Please let me know if you need any other details.
Thanks!