Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SSL certificates refactor #1310

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

swalkinshaw
Copy link
Member

@swalkinshaw swalkinshaw commented Aug 29, 2021

Newer replacement for #896 with many of the same goals:

  • better structured within roles - all SSL related tasks are centralized within the ssl_certificates role now instead of in 3+ places
  • uses Certbot for ACME certificates instead of acme-tiny (makes it much easier to manually manage/troubleshoot certs)
  • support DNS challenges in addition to HTTP
  • simpler configuration with better defaults (just set enabled: true) to use Lets Encrypt
  • uses step to run a local certificate authority (CA) in development/staging which gives us two things:
    1. same ACME certificate generation instead of self-signed certs
    2. certificates can be more easily trusted in local browsers

Todo:

  • much more testing
  • decide if Certbot is the best option. acme.sh is another one
  • improvements/consistency for naming
  • properly handle the ACME CA server across dev + prod
  • documentation
  • production testing

cc @tangrufus

@tangrufus
Copy link
Member

Question: Does certbot support https://roots.io/docs/trellis/master/ssl/#multiple-servers ?

@swalkinshaw
Copy link
Member Author

Not sure if there's a better way, copying /etc/letsencrypt/accounts across servers would work.

@tangrufus
Copy link
Member

Are we going backwards if we merge all ssl providers into 1 role?
With ansible collections, shouldn't we separate them into individual roles and let users install the roles they actually use?


Question: For self-signed certs on local vagrant VMs, which certs should the user trust on the host machines? (Something like $ step ca root root.crt?)

@swalkinshaw
Copy link
Member Author

swalkinshaw commented Aug 31, 2021

Are we going backwards if we merge all ssl providers into 1 role?
With ansible collections, shouldn't we separate them into individual roles and let users install the roles they actually use?

Tough question... entirely depends on how we want to structure this. With 1 role, they can just change their site config to use acme or manual. With multiple roles, they'd have to do the same and change their playbook to reference a different role.

Does ansible collections actually change much? Couldn't we achieve the same thing now with galaxy roles if we wanted?

This PR really only has two "roles" and the manual is very simple. But ideally we make it easy for someone to replace our SSL certificate implementation with another one. Maybe another ACME client, or something different entirely.

Question: For self-signed certs on local vagrant VMs, which certs should the user trust on the host machines? (Something like $ step ca root root.crt?)

Yeah you need to download the root cert locally and there's a few ways we can make that easier:

  1. use Trellis to fetch it to the host computer
  2. use step ca bootstrap --ca-url https://{site_dev_url}:8443 --fingerprint {FINGERPRINT_THAT_TRELLIS_OUTPUTS}

and then

  1. run step certificate install --all {cert_path}

@tangrufus
Copy link
Member

This PR really only has two "roles" and the manual is very simple. But ideally we make it easy for someone to replace our SSL certificate implementation with another one.

Agree.

Yeah you need to download the root cert locally and there's a few ways we can make that easier:
...

Let make a trellis-cli command for it when this is merged.

@swalkinshaw swalkinshaw force-pushed the ssl-certificates-refactor branch 2 times, most recently from db79cc7 to cc294d1 Compare July 27, 2022 18:43
@swalkinshaw swalkinshaw force-pushed the ssl-certificates-refactor branch 9 times, most recently from f9142fe to 1a374a7 Compare August 2, 2022 15:27
@swalkinshaw swalkinshaw marked this pull request as ready for review August 2, 2022 18:05
@swalkinshaw
Copy link
Member Author

@tangrufus I think this is ready. Other than general fixes + cleanup, there's three main updates:

  1. it ensures any existing renew cron is disabled
  2. I've removed any built-in DNS challenge functionality and made it easier for anyone to provide a custom challenge task file. In the future we can properly build out the DNS plugin functionality if we want
  3. Add certificate install and uninstall commands trellis-cli#311 exists to easily trust the root cert

with_dict: "{{ wordpress_sites }}"
notify: reload nginx

- name: Download SSL client certificate
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Client certificate shouldn't be "scoped" under manual.
Client certificates are independent from ssl providers.

E.g: Client certificates can be used with acme, manual and other 3rd-party providers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, thanks! I've moved this task into main.yml.

Comment on lines +2 to +3
sites_using_acme_ssl: "{{ sites_using_ssl | rejectattr('manual', 'defined') | map(attribute='acme') | map('default', ssl_defaults.acme )}}"
sites_using_manual_ssl: "{{ sites_using_ssl | selectattr('manual', 'defined') }}"
Copy link
Member

@tangrufus tangrufus Aug 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question 1: How should we write the new wordpress_sites.yml?

Question 2: Does sites_using_acme_ssl's rejectattr('manual', 'defined') correctly handle 3rd-party providers?

For example:

    ssl:
      enabled: true
      stapling_enabled: false
      provider: cloudflare-origin-ca

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question 1: How should we write the new wordpress_sites.yml?

For the standard Let's Encrypt case, nothing changes if you want defaults. But here's what the implementation would expect:

ssl:
  enabled: true
  acme:
    challenge:
      type: http-01

I structured it this way so that DNS challenges could be supported in the future:

ssl:
  enabled: true
  acme:
    challenge:
      type: dns-01
      options:
        some_value: foo

Manual:

ssl:
  enabled: true
  manual:
    certificate: path/to/cert
    key: path/to/key

Question 2: Does sites_using_acme_ssl's rejectattr('manual', 'defined') correctly handle 3rd-party providers?

This PR (as is) wouldn't be compatible with any existing 3P roles that relied on provider existing. But the new structure just relies on a dict key being defined:

ssl:
  enabled: true
  cloudflare-origin-ca:
    stapling_enabled: false

Because of how the tasks are split up and imported dynamically right now, none of the manual or acme tasks would run for the above cloudflare-origin-ca` example (which is good). That should make it fairly simple for a 3P role to conditionally run its own tasks based on the key existing?

{% if item.value.ssl.client_cert_url is defined -%}
ssl_verify_client on;
ssl_client_certificate {{ nginx_ssl_path }}/client-{{ (item.value.ssl.client_cert_url | hash('md5'))[:7] }}.crt;
{% if site_ssl.manual is defined and site_ssl.manual.client_cert_url is defined -%}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Client cert can be used in non-manual setups as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the site_ssl.manual is defined part of the conditional 👍

@swalkinshaw swalkinshaw force-pushed the ssl-certificates-refactor branch from c94f980 to 9aab040 Compare August 11, 2022 04:10
Copy link
Member

@tangrufus tangrufus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

@craigpearson
Copy link
Contributor

Hey I've been watching the progress on this, good work! One thing that's always been a bit of a bug bear with Trellis and multisite is that the SSL configuration is pretty much tied to a global config. This is great for standard sites, but limits flexibility with multisite setups.

In some Trellis mulitisite setups I've moved the SSL config to be URL specific, this would allow for configurations such as:

wordpress_sites:
  example.com:
    site_hosts:
      - canonical: example.com
        redirects:
          - www.example.com
        ssl:
           provider: manual # Example where you supply a certificate
           cert_file: X # Stored in the recommended group_vars/files/cert.vault.(crt/pem/yml)
           key_file: X
      - canonical: test.com
        redirects:
          - www.test.com
        ssl:
           provider: letsencrypt # Default certificate behaviour
     - canonical: foo.com
        redirects:
          - www.foo.com
        ssl:
           provider: none # SSL termination is handled elsewhere such as at a load balancer

This would allow users to be more granular in their approach to SSL certs per domain on multisite installs. Additionally, this approach could be seen as an override, so you could still set SSL vars as a global WP rule, but on a domain per domain basis override where desired.

My question is, if I was to create a separate PR of how this might look, do you see there being any issues with this approach? Appreciate that this may not of be interest as it's quite a marginalised use case, but I work towards this anyway, so maybe it's beneficial upstream too?

@tangrufus
Copy link
Member

tangrufus commented Aug 12, 2022

I support @craigpearson’s suggestion.

To discuss: Is it possible (or, how much work) to support ssl settings in redirect domains?

site_hosts:
      - canonical: acme-letsencrypt.com
        redirects:
          - www.no-ssl.com
          - *.wildcard-ssl.com
          - 3rd-party-ssl-provider.com
          - acme-but-not-letsencrypt.com

We might need to support client certificates for each of them as well.

Of course, we don’t need to implement all these in 1 PR.

What do you think, @swalkinshaw ?

@swalkinshaw
Copy link
Member Author

Is it possible (or, how much work) to support ssl settings in redirect domains?

Is this separate from @craigpearson's suggestion? Or just reframing his proposal? Not sure if you mean SSL settings per individual redirect host.

@craigpearson I'm not against your feature idea but I do imagine it's quite rare. For that reason I'd rather not sacrifice too much to support it, but it would be nice to know up front if it's even compatible with this proposed PR as is right now.

The main issue I can see is complexity for downstream consumers of SSL certificates (like Nginx confs) and not necessarily generating certs for different domains. Right now the Nginx conf template is structured around a "flattened" list of hosts; they're all treated together which wouldn't work if we wanted to vary HTTPS settings per host.

@swalkinshaw swalkinshaw added this to the v2.0 milestone Sep 28, 2022
@swalkinshaw swalkinshaw force-pushed the ssl-certificates-refactor branch from 9aab040 to 9a5b5b3 Compare October 16, 2022 16:00
@swalkinshaw swalkinshaw force-pushed the ssl-certificates-refactor branch from 9a5b5b3 to e201452 Compare October 16, 2022 22:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants