Add collection redirect page and consolidate htaccess configuration files #261

Open
wants to merge 19 commits into base: main

oraNod
Collaborator

@oraNod oraNod commented Nov 11, 2024

This PR makes the following changes:

  • Updates https://docs.ansible.com/collections.html to act as a stub page for redirecting plugin and module pages that existed prior to collections.
  • Adds rules to redirect users from plugin and module pages to the new collections.html page.
  • Consolidates all the dynamic redirects, except for version 2.9, in the ansible subdirectory.
  • Removes all consolidated htaccess configuration files in the ansible subdirectory.

As a result of this change, users who access plugin and module pages will be redirected to collections.html. For example, this link is available from an ansible.com blog post: https://docs.ansible.com/ansible/latest/modules/k8s_module.html

A rule exists to redirect this page to the corresponding collection page, which has been moved and results in a 404 when the redirect is followed:

RedirectMatch "^(/ansible/[^/]+)/modules/k8s_module.html" "$1/collections/community/kubernetes/k8s_module.html"

Instead of having to maintain all the rules in the htaccess configuration files between releases, we can redirect users to a single page. While it is true that this will require users to search for the appropriate documentation, it does avoid 404s and any resulting SEO degradation.

For any reviewers, here's a link to the collections.html page on the RTD preview build: https://ansible--261.org.readthedocs.build/collections.html

To evaluate these changes on the test server, use this branch in my fork because it tests the collections.html page on the test server. Note that we can't actually evaluate these changes on the test server: the rsync command that the Jenkins job uses doesn't include the --delete flag, so the old .htaccess configuration files aren't removed and they conflict with the redirects here. Plan B is to stand up a test server somewhere. I don't want to prune old files off the actual test server just to validate these redirects.

<ul class="fa-ul">
<li>
<span class="fa-li"><i class="fas fa-book"></i></span>
<a href="{{ index.quicklinks.collection_index.link }}" target="_blank">{{ index.quicklinks.collection_index.label }}</a>
Collaborator

builtin modules are by far the most popular. Can we add a quick link here for https://docs.ansible.com/ansible/latest/collections/ansible/builtin/index.html#plugins-in-ansible-builtin

@samccann
Collaborator

samccann commented Nov 11, 2024

I copied the .htaccess to https://htaccess.madewithlove.com/ and typed in a bunch of URLs to see what each would redirect to:

  • Verify all test cases from Documentation Checklist: Ansible 9 release ansible-documentation#428 (comment) point to the new collection stub page.
  • Verify 2.9 module pages do NOT redirect
  • Verify Ansible 9 and latest pages do NOT redirect
  • Verify 2.3-2.6 module pages redirect to the new collection stub page
  • Verify 2.3, 2.4, 2.5 and 2.6 specific redirects still work (not related to vault, guides, or modules - see next comment for details)
  • Verify top 20 pages based on web analytics work as expected.

So Ansible (pre-collections) 2.3 through 2.6 redirect today to /latest/, and we republished those guides to an archive site so those who still need them can find them. We didn't do this with 2.7 and 2.8 because I couldn't republish them (Jenkins updates made it too difficult, so I gave up). Just putting that here for history on why these proposed new redirects don't go beyond 2.6.

@samccann
Collaborator

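Some problems (running same test as prior comment):

* 2.3 - 2.6 guides redirect to a strange location. (Prior to this PR, they would redirect to their related pages on /latest/) https://docs.ansible.com/ansible/2.6/installation_guide/intro_configuration.html is one example I used.

* vault redirect (2.3, 2.4) doesn't seem to work https://docs.ansible.com//ansible/2.4/vault.html

* vault redirects (2.5,2.6) go to archive instead of latest https://docs.ansible.com/ansible/2.5/user_guide/vault.html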

@samccann
Collaborator

My general thoughts - though it does 'lower' the user experience (aka I used to get to the correct module page from an ancient stack overflow page that predated collections), it is acceptable since collections are now 4+ years old. I agree with @oraNod that this is necessary both from a maintenance point of view (see k8s module currently 404s because we can't test/update 1k redirects to keep them all accurate), and from the overall strategy to move to RTD (which cannot handle 1k redirects anyway).

@samccann
Collaborator

oh last thoughts - once we get enough approvers, we should blast out a warning in matrix before we merge (and don't merge on a Friday lol) in case it all blows up...

@oraNod
Collaborator Author

oraNod commented Nov 11, 2024

> oh last thoughts - once we get enough approvers, we should blast out a warning in matrix before we merge (and don't merge on a Friday lol) in case it all blows up...

@samccann I plan to create a forum post on this but would like to get some thoughts from folks on the review list first. I'll also announce this on the bullhorn for at least two editions.

@oraNod oraNod requested a review from gundalow November 11, 2024 20:09
@oraNod
Collaborator Author

oraNod commented Nov 11, 2024

> Some problems (running same test as prior comment):
>
> * 2.3 - 2.6 guides redirect to a strange location. (Prior to this PR, they would redirect to their related pages on /latest/) https://docs.ansible.com/ansible/2.6/installation_guide/intro_configuration.html is one example I used.
>
> * vault redirect (2.3, 2.4) doesn't seem to work https://docs.ansible.com//ansible/2.4/vault.html
>
> * vault redirects (2.5,2.6) go to archive instead of latest https://docs.ansible.com/ansible/2.5/user_guide/vault.html

thanks for the initial test findings! I'm going to create a separate PR for docs.testing.ansible.com so we can validate everything.

can you be more specific about the redirects you were testing in this comment, please?

> * 2.3 - 2.6 guides redirect to a strange location. (Prior to this PR, they would redirect to their related pages on /latest/) https://docs.ansible.com/ansible/2.6/installation_guide/intro_configuration.html is one example I used.

Are you referring to the (plugins|modules) rules here?

RedirectMatch permanent "^/ansible/(2\.(10|[3-6]))/(plugins|modules)/(.+)\.html$" "https://docs.ansible.com/collections.html"

Or did you observe something weird with the specific guide redirects here?

# Vault redirects (2.3, 2.4, 2.5, 2.6)

There aren't any rules configured for installation_guide/intro_configuration.html so if you were looking at that page the weirdness could be from something else.
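
A quick way to see which paths the plugins/modules rule quoted above covers is to run the same pattern through Python's `re` module. This is only a sketch: the sample paths are illustrative, and it assumes Python's regex behaviour matches Apache's PCRE for this particular pattern.

```python
import re

# Pattern copied from the (plugins|modules) RedirectMatch rule quoted above.
PLUGIN_RULE = re.compile(
    r"^/ansible/(2\.(10|[3-6]))/(plugins|modules)/(.+)\.html$"
)


def redirects_to_stub(path: str) -> bool:
    """Return True if the rule would send `path` to collections.html."""
    return PLUGIN_RULE.match(path) is not None


# Illustrative paths only; they are not taken from any test report.
samples = [
    "/ansible/2.6/modules/k8s_module.html",
    "/ansible/2.10/plugins/lookup.html",
    "/ansible/2.9/modules/k8s_module.html",
    "/ansible/2.6/installation_guide/intro_configuration.html",
]
for path in samples:
    print(path, redirects_to_stub(path))
```

Under these assumptions the guide page never matches this rule, which is consistent with the weirdness coming from a different rule.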

@oraNod
Collaborator Author

oraNod commented Nov 11, 2024

also @samccann I think that catch all rule at the end might be causing a problem:

# EOL Archive Redirects for all the rest

I haven't set this up on the test server yet (will do that tomorrow morning b/c vpn...) but I'm getting a server 500 error now. this rule might be causing an issue. specifically the $1 backreference might not be capturing the right path. I'm not sure but that could be what is behind that strange location you were hitting.

that catch all redirect came from the original config file that Toshio set up:

RedirectMatch permanent "^/ansible/2.6/?(.+)?.html" "/ansible/latest/$1.html"

I bet we could just nuke that. it's not even in all the old 2.x config files and likely more hassle than it's worth. I think we really only need to be concerned with the pre-collections plugins and modules stuff.

@oraNod
Collaborator Author

oraNod commented Nov 11, 2024

> My general thoughts - though it does 'lower' the user experience (aka I used to get to the correct module page from an ancient stack overflow page that predated collections), it is acceptable since collections are now 4+ years old. I agree with @oraNod that this is necessary both from a maintenance point of view (see k8s module currently 404s because we can't test/update 1k redirects to keep them all accurate), and from the overall strategy to move to RTD (which cannot handle 1k redirects anyway).

wrt user experience I think it will be possible to embed a straightforward enough search implementation with Javascript - or better ReadTheDocs query with ElasticSearch indexing - so that you can search through https://docs.ansible.com/ansible/latest/collections/* and find matching results.

at the same time I don't think we should go overkill with it and create another maintenance nightmare for ourselves, especially as collections are 4+ years old at this point. tbh avoiding 404s and degrading SEO authority from thousands of broken links is more of a concern than users not being taken directly to the corresponding collection.

now that I think about it we should also point users to the forum to ask for help as well as the collection index and other doc links...

@samccann
Collaborator

samccann commented Nov 11, 2024

> There aren't any rules configured for installation_guide/intro_configuration.html so if you were looking at that page the weirdness could be from something else.

So yes, based on that tool I was using, all the guides will go someplace wonky. So anything that is NOT a plugin or NOT that vault page, seems to end up in a strange place.

just tried https://docs.ansible.com/ansible/2.3/reference_appendices/YAMLSyntax.html and in that tool, it triggers this redirect:
RedirectMatch permanent "^/ansible/(2\.(10|[3-6]))/(.+)\.html$" "/ansible/latest/$1.html"
And ends up at https://docs.ansible.com/ansible/latest/2.3.html

So you are correct, it's caused by the combined redirect that should be pushing older EOL docs to latest.
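
The result above can be reproduced outside the tool. The sketch below applies the combined rule's pattern in Python and expands the target with different capture groups. Group 1 is the version and group 3 is the page path, so `$1` yields `2.3.html`. Pointing the target at group 3 is one possible correction, shown here as an assumption rather than the fix adopted in this PR.

```python
import re

# Pattern copied from the combined EOL rule quoted above.
CATCH_ALL = re.compile(r"^/ansible/(2\.(10|[3-6]))/(.+)\.html$")


def rewrite(path, template):
    """Expand `template` (Python backreference syntax) if the rule matches."""
    match = CATCH_ALL.match(path)
    if match is None:
        return None
    return match.expand(template)


path = "/ansible/2.3/reference_appendices/YAMLSyntax.html"

# Apache's $1 corresponds to \g<1> here: the version, not the page path.
print(rewrite(path, r"/ansible/latest/\g<1>.html"))
# /ansible/latest/2.3.html

# Hypothetical correction: use the third group, which holds the page path.
print(rewrite(path, r"/ansible/latest/\g<3>.html"))
# /ansible/latest/reference_appendices/YAMLSyntax.html
```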

@samccann
Collaborator

> that catch all redirect came from the original config file that Toshio set up:
>
> RedirectMatch permanent "^/ansible/2.6/?(.+)?.html" "/ansible/latest/$1.html"

The catchall redirect came from me, not toshio. It's the redirect that takes all links to 2.6 and redirects to latest. We need to keep this category of redirects as that is what ensures those top google hits that used to point to 2.4-2.6 docs now end up on latest.

@samccann
Collaborator

The vault redirects aren't working and I wonder if they ever did. I'll try to test this more tomorrow, but the original vault location for 2.3, 2.4 for example was https://docs.ansible.com/archive/ansible/2.4/vault.html but that doesn't match what I put in the EOL redirects in the 2.4 directory....

@oraNod
Collaborator Author

oraNod commented Nov 13, 2024

Thanks for the extra details @samccann - I'll have to come back to that. I was thinking that catchall redirect might have been causing a conflict with other redirect rules that was borking the test server.

"^/ansible/2.6/?(.+)?.html" "/ansible/latest/$1.html"

I think the good news here is that we can add these to Read The Docs like this:

Type: Exact Redirect
From URL: /ansible/2.6/*
To URL: /projects/ansible/latest/:splat

See https://docs.readthedocs.io/en/stable/user-defined-redirects.html#redirecting-a-directory

I know we're limited to 100 redirects per RTD project but if all this consolidation works as expected then we'll be well below that limit for the top-level project.
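
To make the mapping concrete, here is a small emulation of the wildcard behaviour described above: a trailing `*` in the From URL captures the rest of the path, and `:splat` in the To URL is replaced with it. This is a sketch of the semantics as described in this comment, not Read the Docs' own implementation, and `apply_exact_redirect` is a hypothetical helper name.

```python
def apply_exact_redirect(path, from_url, to_url):
    """Emulate an exact redirect with an optional trailing `*` wildcard.

    Returns the redirect target, or None when the rule does not apply.
    """
    if from_url.endswith("*"):
        prefix = from_url[:-1]
        if path.startswith(prefix):
            # Everything after the prefix is substituted for `:splat`.
            return to_url.replace(":splat", path[len(prefix):])
        return None
    return to_url if path == from_url else None


print(
    apply_exact_redirect(
        "/ansible/2.6/user_guide/vault.html",
        "/ansible/2.6/*",
        "/projects/ansible/latest/:splat",
    )
)
```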

@oraNod oraNod force-pushed the redirect-collections branch from 7905896 to a4b3ab2 Compare November 13, 2024 17:55
@oraNod oraNod force-pushed the redirect-collections branch from de740da to d1dd82c Compare November 20, 2024 17:46
@oraNod oraNod force-pushed the redirect-collections branch from d1dd82c to f64b482 Compare November 21, 2024 11:46
@oraNod oraNod requested a review from samccann November 21, 2024 11:46
@oraNod
Collaborator Author

oraNod commented Nov 21, 2024

@samccann I've rebased and updated everything. If you want to do any more validation, I've put the same .htaccess configuration on the test instance.

I also pushed a commit to tidy up some things by adding more comments and removing some old lines that were commented out. Something else we can do is get rid of this configuration: https://github.com/ansible/docsite/blob/main/ansible/.htaccess

# This disables multiview options that prevent the server from trying to match similar page names.
# This option tries to get exact URL matches to prevent unexpected redirects.
Options -MultiViews

Docs on the multiviews option are in the content negotiation page and options directive page.

# This points to the 404 page in the package docs instead of the 404 in the landing pages.
ErrorDocument 404 /ansible/latest/404.html

I guess the idea is to make it seem like users remain in the package docs, which is good. Another approach might be to improve the 404 in the landing pages and just use that but it might be frustrating for users to have to navigate back to the package docs. However my thinking would be to provide some useful resources / links in the main 404 page that could help folks with finding what they need more easily. Since we have a custom 404 page we should make the most of it.

Finally, I'd suggest the next steps before merging this are to finalize that new collections.html page and then post to the bullhorn and forum to give community folks a heads up and invite feedback.

Member

@nitzmahone nitzmahone left a comment

My directive syntax is sooo rusty, I don't want to "approve" as in "I've verified this for correctness", but I'm a big +1 for the overall approach.

As I previously shared with @oraNod privately, I think this approach is really the only reasonable way to go. While I hate breaking links, we just don't have the resources to permalink everything we've ever done while both accounting for all past structural/content/ownership changes and avoiding paralysis for future changes.

Thanks!

@oraNod
Collaborator Author

oraNod commented Nov 29, 2024

Testing the changes in this PR

I've been testing the consolidated redirect rules in the .htaccess configuration file in this PR. Sharing my findings here. I'll try to keep it as brief as possible.

Test methodology

I created a couple of scripts to iterate over URLs and return the http status code for pages in the Ansible package docs. You can find the scripts and all the test details here: https://github.com/oraNod/url-status-checker (I plan to add that Python script to this repo but I want to make a few tweaks to it first.)

I used docs.testing.ansible.com as the test environment because it contains Ansible 2.x and 3-11 builds of the package docs. Before running any tests, I removed any existing .htaccess files to avoid conflicting redirect rules. I also temporarily modified the build jobs that were deploying to the test environment so changes weren't lost in the middle of a run.

Complete details about how everything works are available in the README here: https://github.com/oraNod/url-status-checker/blob/main/README.md
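
For readers who want the gist without opening the repo, the general idea can be sketched as follows. This is not the script from url-status-checker: the function names are hypothetical, and it assumes a HEAD request without following redirects is enough to read the status code and the `Location` header.

```python
import http.client
from urllib.parse import urljoin, urlsplit


def head(url, timeout=10.0):
    """Send one HEAD request and return (status, Location header or None)."""
    parts = urlsplit(url)
    conn_cls = (
        http.client.HTTPSConnection
        if parts.scheme == "https"
        else http.client.HTTPConnection
    )
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def check_redirect(url, fetch=head):
    """Record the status of `url` and, if it redirects, of its target page."""
    status, location = fetch(url)
    target_status = None
    if status in (301, 302, 307, 308) and location:
        # Location may be relative, so resolve it against the source URL.
        target_status, _ = fetch(urljoin(url, location))
    return {
        "url": url,
        "status": status,
        "location": location,
        "target_status": target_status,
    }
```

Passing `fetch` as a parameter keeps the network call swappable, so the reporting logic can be exercised against canned responses before pointing it at the test environment.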

Test results

I've attached txt and csv files generated from the url checker script in this tarball:

redirect-reports.tar.gz

  • All plugins and modules pages return a 301 status and redirect to the collections.html page.
  • Ansible 3 - 10 pages return a 200 status and are not redirected.
  • Ansible 2.9 pages return a 200 status.
  • Redirect rules for Ansible 2.9 return a 404 status. This is deceptive though because the Ansible 2.9 redirects are a special case and used for the version switcher. When Ansible 2.9 is removed from the version switcher and reaches EOL, we can add that to the consolidated redirects and validate them.

Target page 404s

Approximately 870 target pages (the page to which you are redirected) return a 404 status.

A large number of these pages get created by the catch all redirect rules when a page from an older version gets redirected to a non-existent latest version. For example, this page in 2.10 docs: https://docs.ansible.com/ansible/2.10/collections/amazon/aws/aws_az_facts_module.html

The catch all rule constructs a redirect to https://docs.ansible.com/ansible/latest/collections/amazon/aws/aws_az_facts_module.html which returns a 404 because the originating page is deprecated.

Another example is pages that were renamed, such as /dev_guide/developing_modules_python3.html which was renamed to /dev_guide/developing_python_3.html. There are also around 30 404 pages for various scenario guides and other pages that were moved or renamed, such as 2.x community roadmaps.

Many of these are 404 pages that already exist outside of this PR. For example:

Some of the 404 pages are caused by the catch all rules in this PR though. For example:

I think it's reasonable to accept the fact that older pages will break and 404s are going to happen. The target page 404s are all for Ansible 2.x versions and there are fewer than 900 of them out of 19,646 URLs tested. That is less than 5% of Ansible 2.x redirects leading to a 404 page.
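
The percentage can be double-checked with the figures quoted above. The sketch assumes the 19,646 tested URLs are the right denominator for both the approximate count (870) and the stated upper bound (900).

```python
total_urls = 19_646   # URLs tested, from the findings above
target_404s = 870     # "approximately 870" target pages returning 404
upper_bound = 900     # "fewer than 900" stated above

approx_share = target_404s / total_urls
bound_share = upper_bound / total_urls

# Both the approximate count and the upper bound stay under 5%.
print(f"approximate share: {approx_share:.2%}")
print(f"upper-bound share: {bound_share:.2%}")
```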

@oraNod oraNod force-pushed the redirect-collections branch from 8a6da4a to a10f424 Compare December 4, 2024 18:40
  link: "https://docs.ansible.com/ansible/latest/collections/all_plugins.html"
builtin_index:
  label: Index of all modules and plugins contained in ansible-core
  link: "https://docs.ansible.com/ansible/latest/collections/ansible/builtin/index.html"
Contributor

It would be nicer if this link goes to https://docs.ansible.com/ansible-core/..., but unfortunately there's no latest equivalent for ansible-core...

.htaccess Outdated
#####################################################################

# Redirect plugin and module pages for devel and latest
RedirectMatch permanent "^/ansible/(devel|latest)/(plugins|modules)/(.+)\.html$" "/collections.html"
Contributor

One big downside of this redirect is that it breaks a lot of existing links on the web (in blog posts, stack overflow answers, mailing list answers, ...) from Ansible 2.9 and before that right now still work.

I have no idea how frequently these URLs are still used though. Finding the module/plugin in question isn't too hard with the new collections.html page, though, so I guess it will be OK...

Collaborator Author

Thanks for taking a look @felixfontein. I can also see this is going to hose the links to these pages: https://docs.ansible.com/ansible/latest/plugins/

I overlooked those plugin pages, but glad I caught them. I'll raise this at the next DaWGs meeting but maybe we should remove this redirect rule and avoid globbing any latest or devel urls to the collections.html page.

Either that or we add a negative lookahead for the plugin pages which do exist, such as:

RedirectMatch permanent "^/ansible/(devel|latest)/modules/(.+)\.html$" "/collections.html"
RedirectMatch permanent "^/ansible/(devel|latest)/plugins/(?!action\.html|become\.html|cache\.html|callback\.html|cliconf\.html|connection\.html|docs_fragment\.html|filter\.html|httpapi\.html|inventory\.html|lookup\.html|module\.html|module_util\.html|netconf\.html|plugins\.html|shell\.html|strategy\.html|terminal\.html|test\.html|vars\.html)(.+)\.html$" "/collections.html"
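
The lookahead idea can be tried out in Python before touching the server config. This sketch uses an abbreviated, illustrative list of plugin pages and assumes the lookahead sits immediately before the capture group. It also anchors the excluded names with `$`, which is slightly stricter than the rule above.

```python
import re

# Abbreviated for illustration; the rule above lists more existing pages.
EXISTING_PAGES = ["action", "become", "cache", "lookup"]

names = "|".join(re.escape(name) for name in EXISTING_PAGES)
PLUGIN_RULE = re.compile(
    rf"^/ansible/(devel|latest)/plugins/(?!(?:{names})\.html$)(.+)\.html$"
)


def would_redirect(path: str) -> bool:
    """Return True if the rule would send `path` to /collections.html."""
    return PLUGIN_RULE.match(path) is not None


# Existing plugin overview pages are skipped; old per-plugin pages match.
print(would_redirect("/ansible/latest/plugins/lookup.html"))
print(would_redirect("/ansible/latest/plugins/lookup/file.html"))
```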

An alternative is to take all the latest source pages and redirect them to collections using the https://pypi.org/project/sphinx-reredirects/ extension.

It doesn't remove the maintenance overhead and maybe adds to the build time. We could probably add some regex to reduce the number of redirects that we'd need and point to specific collections, for example:

redirects = {
     "modules/azure_*": "collections/azure/index.html",
}
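
To get a feel for how many pages a single wildcard entry would cover, the pattern can be expanded against a list of document names. This sketch assumes shell-style wildcard matching (Python's `fnmatch`), which is an assumption about the extension's behaviour and has not been verified here. The document names and the `expand` helper are hypothetical.

```python
from fnmatch import fnmatchcase

redirects = {
    "modules/azure_*": "collections/azure/index.html",
}


def expand(redirects, docnames):
    """Map each docname that matches a wildcard source to its target."""
    expanded = {}
    for pattern, target in redirects.items():
        for doc in docnames:
            if fnmatchcase(doc, pattern):
                expanded[doc] = target
    return expanded


# Hypothetical docnames, for illustration only.
docs = [
    "modules/azure_rm_storageaccount_module",
    "modules/azure_rm_virtualmachine_module",
    "modules/k8s_module",
]
print(expand(redirects, docs))
```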

I guess we can see but we need to rethink the devel and latest rules here.

Collaborator Author

Yeah, that works like a champ. I tried an experimental commit: oraNod/ansible-documentation@ec0790a

And deployed it to the test site on pages. All the latest redirects are there and it doesn't seem to add to the build time.

@oraNod
Collaborator Author

oraNod commented Dec 18, 2024

@samccann (and all) Heads up that I've pushed a final commit: 467753a

This restores the redirects for any "latest" urls so that the changes in this PR are limited in scope to only the 2.x versions.

We've got a couple of options for the "latest" urls and can do them separately. I'll send related PRs and update the forum post to explain everything.

4 participants