Unable to pass DNS challenge with Caddy 2.8+ #42

Open
ozapotichnyi opened this issue Jun 6, 2024 · 17 comments

@ozapotichnyi

Wildcard DNS challenge stopped working after updating to Caddy 2.8.

Minimal reproducible setup:

Caddy config:

{
  storage consul {
    prefix "caddytls"
  }
  admin :2019

  debug

  email [email protected]
}

*.example.com {
  log {
    format json
  }

  tls {
    dns route53
  }
}

Dockerfile:

FROM --platform=linux/amd64 caddy:2-builder-alpine@sha256:cdf3364f8cb02338b857728fdc0a9b8875b343996db347300bf2361db3da9094 AS builder

RUN xcaddy build \
    --with github.com/pteich/caddy-tlsconsul \
    --with github.com/caddy-dns/route53

FROM --platform=linux/amd64 caddy:2-alpine@sha256:a48e22edad925dc216fd27aa4f04ec49ebdad9b64c9e5a3f1826d0595ef2993c

COPY --from=builder /usr/bin/caddy /usr/bin/caddy

Logs:

{"level":"info","ts":1717682068.1877885,"logger":"tls.obtain","msg":"lock acquired","identifier":"*.example.com"}
{"level":"info","ts":1717682068.1907144,"logger":"tls.obtain","msg":"obtaining certificate","identifier":"*.example.com"}
{"level":"debug","ts":1717682068.1908574,"logger":"events","msg":"event","name":"cert_obtaining","id":"60de8b42-ab04-4b13-9920-03713277aa4a","origin":"tls","data":{"identifier":"*.example.com"}}
{"level":"debug","ts":1717682068.1911874,"logger":"tls.obtain","msg":"trying issuer 1/1","issuer":"acme-v02.api.letsencrypt.org-directory"}
{"level":"debug","ts":1717682068.191264,"logger":"caddy.storage.consul","msg":"loading data from Consul for acme/acme-v02.api.letsencrypt.org-directory/users/[email protected]/caddy.json"}
{"level":"debug","ts":1717682068.1937697,"logger":"caddy.storage.consul","msg":"loading data from Consul for acme/acme-v02.api.letsencrypt.org-directory/users/[email protected]/caddy.key"}
{"level":"info","ts":1717682068.1980238,"logger":"tls.issuance.acme","msg":"waiting on internal rate limiter","identifiers":["*.example.com"],"ca":"https://acme-v02.api.letsencrypt.org/directory","account":"[email protected]"}
{"level":"info","ts":1717682068.198052,"logger":"tls.issuance.acme","msg":"done waiting on internal rate limiter","identifiers":["*.example.com"],"ca":"https://acme-v02.api.letsencrypt.org/directory","account":"[email protected]"}
{"level":"info","ts":1717682068.1981454,"logger":"tls.issuance.acme","msg":"using ACME account","account_id":"https://acme-v02.api.letsencrypt.org/acme/acct/1763210887","account_contact":["mailto:[email protected]"]}
{"level":"debug","ts":1717682068.400449,"logger":"tls.issuance.acme.acme_client","msg":"http request","method":"GET","url":"https://acme-v02.api.letsencrypt.org/directory","headers":{"User-Agent":["Caddy/2.8.4 CertMagic acmez (linux; amd64)"]},"response_headers":{"Cache-Control":["public, max-age=0, no-cache"],"Content-Length":["746"],"Content-Type":["application/json"],"Date":["Thu, 06 Jun 2024 13:54:28 GMT"],"Server":["nginx"],"Strict-Transport-Security":["max-age=604800"],"X-Frame-Options":["DENY"]},"status_code":200}
{"level":"debug","ts":1717682068.400676,"logger":"tls.issuance.acme.acme_client","msg":"creating order","account":"https://acme-v02.api.letsencrypt.org/acme/acct/1763210887","identifiers":["*.example.com"]}
{"level":"debug","ts":1717682068.4561968,"logger":"tls.issuance.acme.acme_client","msg":"http request","method":"HEAD","url":"https://acme-v02.api.letsencrypt.org/acme/new-nonce","headers":{"User-Agent":["Caddy/2.8.4 CertMagic acmez (linux; amd64)"]},"response_headers":{"Cache-Control":["public, max-age=0, no-cache"],"Date":["Thu, 06 Jun 2024 13:54:28 GMT"],"Link":["<https://acme-v02.api.letsencrypt.org/directory>;rel=\"index\""],"Replay-Nonce":["su1caOmbBxQwQu9hLgYH8tMvuXSY0yd8jUjEqWyqAihX7TMZGos"],"Server":["nginx"],"Strict-Transport-Security":["max-age=604800"],"X-Frame-Options":["DENY"]},"status_code":200}
{"level":"debug","ts":1717682068.5403905,"logger":"tls.issuance.acme.acme_client","msg":"http request","method":"POST","url":"https://acme-v02.api.letsencrypt.org/acme/new-order","headers":{"Content-Type":["application/jose+json"],"User-Agent":["Caddy/2.8.4 CertMagic acmez (linux; amd64)"]},"response_headers":{"Boulder-Requester":["1763210887"],"Cache-Control":["public, max-age=0, no-cache"],"Content-Length":["345"],"Content-Type":["application/json"],"Date":["Thu, 06 Jun 2024 13:54:28 GMT"],"Link":["<https://acme-v02.api.letsencrypt.org/directory>;rel=\"index\""],"Location":["https://acme-v02.api.letsencrypt.org/acme/order/1763210887/275980118617"],"Replay-Nonce":["su1caOmb2AuTy7-eFJ7SHv1wOCyVgybSNdoJKeGjNcwOLeTGn7k"],"Server":["nginx"],"Strict-Transport-Security":["max-age=604800"],"X-Frame-Options":["DENY"]},"status_code":201}
{"level":"debug","ts":1717682068.600306,"logger":"tls.issuance.acme.acme_client","msg":"http request","method":"POST","url":"https://acme-v02.api.letsencrypt.org/acme/authz-v3/360439255817","headers":{"Content-Type":["application/jose+json"],"User-Agent":["Caddy/2.8.4 CertMagic acmez (linux; amd64)"]},"response_headers":{"Boulder-Requester":["1763210887"],"Cache-Control":["public, max-age=0, no-cache"],"Content-Length":["391"],"Content-Type":["application/json"],"Date":["Thu, 06 Jun 2024 13:54:28 GMT"],"Link":["<https://acme-v02.api.letsencrypt.org/directory>;rel=\"index\""],"Replay-Nonce":["su1caOmbKx5cQpgNcP62Uc4bXmQr1rpUrDLGB9LmzmTeSj7AokU"],"Server":["nginx"],"Strict-Transport-Security":["max-age=604800"],"X-Frame-Options":["DENY"]},"status_code":200}
{"level":"info","ts":1717682068.6005263,"logger":"tls.issuance.acme.acme_client","msg":"trying to solve challenge","identifier":"*.example.com","challenge_type":"dns-01","ca":"https://acme-v02.api.letsencrypt.org/directory"}
{"level":"error","ts":1717682068.6307743,"logger":"tls.issuance.acme.acme_client","msg":"cleaning up solver","identifier":"*.example.com","challenge_type":"dns-01","error":"no memory of presenting a DNS record for \"_acme-challenge.example.com\" (usually OK if presenting also failed)"}
{"level":"debug","ts":1717682068.6949975,"logger":"tls.issuance.acme.acme_client","msg":"http request","method":"POST","url":"https://acme-v02.api.letsencrypt.org/acme/authz-v3/360439255817","headers":{"Content-Type":["application/jose+json"],"User-Agent":["Caddy/2.8.4 CertMagic acmez (linux; amd64)"]},"response_headers":{"Boulder-Requester":["1763210887"],"Cache-Control":["public, max-age=0, no-cache"],"Content-Length":["395"],"Content-Type":["application/json"],"Date":["Thu, 06 Jun 2024 13:54:28 GMT"],"Link":["<https://acme-v02.api.letsencrypt.org/directory>;rel=\"index\""],"Replay-Nonce":["su1caOmbzRAm8TrBKvAcq-Lm4Xi-o3g5q22uZzpGo6jRk7hundE"],"Server":["nginx"],"Strict-Transport-Security":["max-age=604800"],"X-Frame-Options":["DENY"]},"status_code":200}
{"level":"error","ts":1717682068.696349,"logger":"tls.obtain","msg":"could not get certificate from issuer","identifier":"*.example.com","issuer":"acme-v02.api.letsencrypt.org-directory","error":"[*.example.com] solving challenges: presenting for challenge: adding temporary record for zone \"example.com.\": not found, ResolveEndpointV2 (order=https://acme-v02.api.letsencrypt.org/acme/order/1763210887/275980118617) (ca=https://acme-v02.api.letsencrypt.org/directory)"}
{"level":"debug","ts":1717682068.6964366,"logger":"events","msg":"event","name":"cert_failed","id":"8bf8efb3-0aa5-4e63-8478-33cf4bb9906a","origin":"tls","data":{"error":{},"identifier":"*.example.com","issuers":["acme-v02.api.letsencrypt.org-directory"],"renewal":false}}
{"level":"error","ts":1717682068.6964548,"logger":"tls.obtain","msg":"will retry","error":"[*.example.com] Obtain: [*.example.com] solving challenges: presenting for challenge: adding temporary record for zone \"example.com.\": not found, ResolveEndpointV2 (order=https://acme-v02.api.letsencrypt.org/acme/order/1763210887/275980118617) (ca=https://acme-v02.api.letsencrypt.org/directory)","attempt":1,"retrying_in":60,"elapsed":0.508639999,"max_duration":2592000}

Everything works fine with Caddy 2.7.6.

Any suggestions are appreciated.

@ryantiger658

Same issue here. I tried re-issuing my AWS keys, but AWS is reporting that they are "not used". I think for some reason it is not presenting the auth.

@ryantiger658

I am wondering if we just need to bump the Caddy version, since there were so many breaking changes:

github.com/caddyserver/caddy/v2 v2.7.3

@ryantiger658

It looks like it is related to this issue: libdns/route53#235 (comment)

Which is related to this issue: aws/aws-sdk-go-v2#2370 (comment)

richid added a commit to richid/caddy-docker-proxy-r53-dns that referenced this issue Jun 14, 2024
There is a bug in the version of the AWS SDK that libdns/route53 currently uses, so instead use a fork that has the SDK version bumped.

Related:
 * aws/aws-sdk-go-v2#2370 (comment)
 * libdns/route53#235
 * caddy-dns/route53#42

@kdevan

kdevan commented Jun 14, 2024

Ran into the same issue with a single, non-wildcard domain. The fix that ryantiger658 mentions here worked for me. It looks like the PRs in that repository need to be merged to fix this officially.

Edit: Just tested wildcard and that's working with this fix as well.

@eth-limo

Just ran into this as well after upgrading Caddy to v2.8.4.

@aymanbagabas
Collaborator

aymanbagabas commented Jun 24, 2024

Could you test this with the latest version and wait_for_propagation enabled?

{
  "module": "acme",
  "challenges": {
    "dns": {
      "provider": {
        "name": "route53",
        "wait_for_propagation": true
      }
    }
  }
}

@checkerbomb

checkerbomb commented Jun 25, 2024

FWIW, I'm using a Dockerfile to build https://github.com/lucaslorentz/caddy-docker-proxy with this plugin, and simply rebuilding the container with the latest release of this plugin and Caddy 2.8.4 was enough to solve the DNS challenge problem described in this thread, although I am not using a wildcard domain. I did not need to use the wait_for_propagation parameter.

@kdevan

kdevan commented Jun 27, 2024

> Could you test this with the latest version and wait_for_propagation enabled?
>
> {
>   "module": "acme",
>   "challenges": {
>     "dns": {
>       "provider": {
>         "name": "route53",
>         "wait_for_propagation": true
>       }
>     }
>   }
> }

Yes, this works! Just tested with a new domain. Feels good removing all the hacks :)

This may be unrelated but just to note, I did get a new error from Route 53: Invalid Configuration: Missing Region

I just added us-east-1 as the region value and the error went away and everything works! Just thought I'd mention that this parameter may be required now.

@kdevan

kdevan commented Jun 27, 2024

Ah sorry, I spoke too soon. The normal domain worked but the wildcard domain did not.

{
  "level": "error",
  "ts": 1719515037.2461495,
  "logger": "tls.obtain",
  "msg": "will retry",
  "error": "[*.stage.foo.bar.com] Obtain: [*.stage.foo.bar.com] solving challenges: presenting for challenge: adding temporary record for zone \"foo.bar.com.\": exceeded max wait time for ResourceRecordSetsChanged waiter (order=https://acme-staging-v02.api.letsencrypt.org/acme/order/152473533/17457386443) (ca=https://acme-staging-v02.api.letsencrypt.org/directory)",
  "attempt": 4,
  "retrying_in": 300,
  "elapsed": 546.902648806,
  "max_duration": 2592000
}

Edit:

I manually deleted the TXT record from Route 53, restarted Caddy, and the wildcard domain works! Not sure what happened here the first time but might just have been something on my end.

I saw that these two are the first errors which led me to do the extra troubleshooting:

{
  "level": "error",
  "ts": 1719514555.4299963,
  "logger": "tls.issuance.acme.acme_client",
  "msg": "cleaning up solver",
  "identifier": "stage.foo.bar.com",
  "challenge_type": "dns-01",
  "error": "deleting temporary record for name \"foo.bar.com.\" in zone {\"\" \"TXT\" \"_acme-challenge.stage\" \"wEz6Z5Ta1vy5Z9ebcVcfyZTmptaYdfc-QtYRA_wV6Bs\" \"0s\" '\\x00' '\\x00'}: exceeded max wait time for ResourceRecordSetsChanged waiter"
}
{
  "level": "error",
  "ts": 1719514643.3972101,
  "logger": "tls.issuance.acme.acme_client",
  "msg": "cleaning up solver",
  "identifier": "*.stage.foo.bar.com",
  "challenge_type": "dns-01",
  "error": "deleting temporary record for name \"foo.bar.com.\" in zone {\"\" \"TXT\" \"_acme-challenge.stage\" \"JvKk2qrEWpbsgvZ06rU1GKc28NKvKAxP_gwc-j1IVGA\" \"0s\" '\\x00' '\\x00'}: operation error Route 53: ChangeResourceRecordSets, https response error StatusCode: 400, RequestID: d4277a4b-bef0-423b-bfef-8e68495ea501, InvalidInput: Invalid XML ; javax.xml.stream.XMLStreamException: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 248; cvc-complex-type.2.4.b: The content of element 'ResourceRecords' is not complete. One of '{\"https://route53.amazonaws.com/doc/2013-04-01/\":ResourceRecord}' is expected."
}

@aymanbagabas
Collaborator

> I just added us-east-1 as the region value and the error went away and everything works! Just thought I'd mention that this parameter may be required now.

fwiw, the plugin can take the value from the AWS_REGION environment variable.
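
For readers who prefer to make the region explicit in the config rather than rely on the plugin picking it up, a minimal sketch using Caddyfile environment substitution might look like the following. The site address and the us-east-1 fallback are illustrative assumptions; the region option itself is the one used elsewhere in this thread.

```
*.example.com {
  tls {
    dns route53 {
      # Caddy substitutes {$AWS_REGION:us-east-1} when it parses the Caddyfile.
      # us-east-1 is only an illustrative fallback for an unset variable.
      region {$AWS_REGION:us-east-1}
    }
  }
}
```

With this form a missing AWS_REGION falls back to the default written in the file instead of failing at issuance time.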

@aymanbagabas
Collaborator

@kdevan The "exceeded max wait time for ResourceRecordSetsChanged waiter" error just means that the default wait time of 1 minute wasn't enough for the records to propagate. You could try increasing it with max_wait_dur.

@RigoOnRails

@aymanbagabas Hi! Just to clarify, we should be setting wait_for_propagation to true when working with wildcard certificates, right? Thanks :)

@batesenergy

We still get the "exceeded max wait time for ResourceRecordSetsChanged waiter" error, even though we have set wait_for_propagation to true and max_wait_dur to 120. Is anyone else still having this issue?

@nebez

nebez commented Sep 9, 2024

The only way to get it working for me with a wildcard certificate was this:

*.mydomain.tld {
  tls {
    dns route53 {
      region "ca-central-1"
      wait_for_propagation true
    }
  }
}

Importantly, setting max_wait_dur to anything other than the default value was not working. And I did need to specify the region... for some reason.

@batesenergy

For anyone else having trouble: I finally got this working by removing "wait_for_propagation true" from the Caddyfile. It then worked right away.

tls {
  dns route53 {
    access_key_id "id"
    secret_access_key "password"
    region "us-east-1"
  }
}

@aymanbagabas
Collaborator

aymanbagabas commented Sep 9, 2024

> Importantly, setting max_wait_dur to anything other than the default value was not working.

There was a bug with max_wait_dur always using nanoseconds. With v1.5.1, the value for max_wait_dur is always in seconds.
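
As a rough sketch of what that looks like in a Caddyfile, assuming plugin v1.5.1 or later: the option names are the ones used in this thread, while the site address, region, and the 120-second value are only illustrative.

```
*.example.com {
  tls {
    dns route53 {
      region "us-east-1"
      wait_for_propagation true
      # Read as seconds from v1.5.1 on, so this waits up to 2 minutes.
      max_wait_dur 120
    }
  }
}
```

On older plugin builds the same number would be treated as nanoseconds, which matches the reports above of non-default values not working.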

> And I did need to specify the region... for some reason.

If region is not specified, it will try to load the region from $AWS_REGION as described in https://aws.github.io/aws-sdk-go-v2/docs/configuring-sdk/#specifying-the-aws-region

EDIT: I've updated the README to indicate that defining AWS_REGION and AWS credentials is required.

@dkebler

dkebler commented Sep 28, 2024

Amazing, this was unexpected! This new requirement for an AWS region brought down my whole set of reverse proxies, including my cloud, when the certs needed to be updated. As soon as I saw that region error I came here.

One question is: which region? Does it even matter? Do I use the one I see in the AWS console (https://us-east-1.console.aws.amazon.com/)? AFAIK Route 53 is not tied to a region, so why is a region needed at all? I set mine to us-east-1. I sure am glad this was just my personal network, so being down overnight was not a big issue. I'm not sure how one could learn about a "breaking" change like this beforehand, but it sure would be nice.

The config below is working for me now for wildcards. My IAM credentials are set as environment variables. As others mentioned, sometimes old _acme records don't get cleaned out, so I remove them via the AWS console. If I feel like I need a clean slate (to recreate all the certs), I delete all the Caddy settings/certs and restart. On Arch, at least, they can be found at /var/lib/caddy.

  tls <redcat>@gmail.com {
    dns route53 {
      max_retries 10
      region "us-east-1"
      wait_for_propagation true
    }
    resolvers 8.8.8.8 1.1.1.1
  }
