Merge branch 'describe-architecture-with-hugo'
Now that the site is converted to be built with Hugo and Pagefind, let's reflect that status quo in the document describing the site's architecture.

Signed-off-by: Johannes Schindelin <[email protected]>
Showing 1 changed file with 51 additions and 115 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,161 +1,97 @@ | ||
# git-scm.com architecture | ||
|
||
This document describes the general setup and architecture that runs the | ||
git-scm.com site. The idea is to document all the moving parts that | ||
_aren't_ checked in to this repository. That may help new people joining | ||
the project to help out, as well provide some continuity in case the | ||
maintainer is hit by a bus. | ||
git-scm.com site. | ||
|
||
## Content | ||
|
||
Though the site is a rails app, it can _mostly_ be thought of as serving | ||
static content. It's just that we suck in that static content and | ||
pre-process it using nightly scheduled jobs. We never write anything to | ||
the database on behalf of user requests. | ||
This site is served via GitHub Pages and is a [Hugo](https://gohugo.io/) site | ||
with the search implemented using [Pagefind](https://pagefind.app/). | ||
|
||
The content is a mix of: | ||
|
||
- actual static content in this repository | ||
- original content from this repository | ||
|
||
- community book content brought in from https://github.com/progit; | ||
see the `lib/tasks/book2.rake` file. | ||
see the `script/update-book2.rb` and `script/book.rb` files. | ||
|
||
- manpages from releases of the git project, imported and formatted | ||
via asciidoctor; see the `lib/tasks/index.rake` task. | ||
The content is pre-rendered and tracked in the `external/book/` directory | ||
tree. | ||
|
||
- manual pages from releases of the git project, imported and formatted via | ||
AsciiDoctor, and translated versions of the manual pages from | ||
https://github.com/jnavila/git-html-l10n/ (which itself contains | ||
pre-rendered pages from https://github.com/jnavila/git-manpages-l10n/); see | ||
the `script/update-docs.rb` file. | ||
|
||
## Heroku | ||
The pre-rendered pages are tracked in the `external/docs/` directory tree. | ||
|
||
The app itself is served by Heroku. The app name is `git-scm` (so you | ||
can visit it directly as https://git-scm.herokuapp.com). The site is | ||
owned by the git-scm.com team. If you want to be involved in managing | ||
uptime/deploys/etc, you'll need a Heroku account and request to be added | ||
to that team. | ||
To deploy to GitHub Pages, it is necessary to turn off the default setting to | ||
"publish from a branch" and instead change the setting to "publish with a | ||
custom GitHub Actions workflow": | ||
https://docs.github.com/en/pages/getting-started-with-github-pages/configuring-a-publishing-source-for-your-github-pages-site#publishing-with-a-custom-github-actions-workflow | ||
With this change, the site can be tested in the fork by pushing to the | ||
`gh-pages` branch (which will trigger the `deploy.yml` workflow) and then | ||
navigating to https://git-scm.<user>.github.io/. | ||
|
||
We use a few Heroku add-ons: | ||
## Non-static parts | ||
|
||
- Bonsai elasticsearch (see below) | ||
While the site consists mostly of static content, there are a couple of | ||
parts that are sort of dynamic. | ||
|
||
- Heroku Postgres as the database | ||
The search is implemented client-side, via [Pagefind](https://pagefind.app/). | ||
|
||
- Heroku Redis for rails caching | ||
A few scheduled GitHub workflows keep the content up to date: | ||
|
||
- Heroku scheduler for cron jobs | ||
- `update-git-version-and-manual-pages` and `update-download-data` (pick | ||
up newly released git versions) | ||
|
||
The nightly scheduled jobs are: | ||
- `update-translated-manual-pages` (fetch and format translated manual | ||
pages from the jnavila/git-html-l10n repository) | ||
|
||
- `rake downloads` (pick up newly released git versions) | ||
|
||
- `rake preindex` (pull in and format manpages for released git | ||
versions) | ||
|
||
- `rake remote_genbook2` (pull in and format progit2 book content, | ||
- `update-book` (fetch and format progit2 book content, | ||
including translations) | ||
|
||
It should be safe to run any of those jobs more frequently. E.g., if you | ||
know there's a new Git release out, then: | ||
|
||
heroku run rake preindex | ||
heroku run rake downloads | ||
|
||
will get it on the site without waiting for the nightly run. | ||
|
||
Merges to the `main` branch on GitHub auto-deploy to Heroku, so unless | ||
you're doing something tricky you generally shouldn't need to manually | ||
deploy. | ||
|
||
Note that some of the formatting of manpages and book content happens | ||
when they are imported by the rake tasks. So after fixing some | ||
formatting and deploying, the rake jobs may need to be re-run with a | ||
special flag to re-import (see the individual tasks for details). | ||
|
||
|
||
## Cloudflare | ||
|
||
We get enough requests that it's easy to overwhelm the single Heroku | ||
dyno. So we have Cloudflare sitting in front of it, aggressively caching | ||
everything. That also should make the site faster to serve to regions | ||
far away from Heroku's servers. | ||
|
||
The Cloudflare setup is mostly pretty simple: | ||
These workflows also have a `workflow_dispatch` trigger, i.e. they can be run | ||
manually (e.g. to update the download links right after Git for Windows | ||
publishes a new release). | ||
|
||
- they serve DNS for the whole domain (that's where they insert the CDN | ||
magic) | ||
|
||
- Cloudflare provides `https://` support to the user. Obviously the | ||
site is totally open and doesn't have any sensitive data, so this is | ||
really more about integrity. The certificate is generated by | ||
Cloudflare (and requires SNI on the browser side). | ||
|
||
- the Cloudflare connection to Heroku is passed over TLS; they provide an | ||
"internal" certificate that we ask Heroku to use, so the connection | ||
is secured between the two (again, mostly for integrity) | ||
|
||
- the most exotic config is that we use "page rules" to mark the whole | ||
site to be cached aggressively, regardless of any caching headers | ||
sent from Heroku. This is a bit of a hack, but there's very little on | ||
the site that can't be cached (which is perhaps a sign that the rails | ||
setup needs to be tweaked to send more reasonable caching headers, | ||
but this has been simple and effective so far). | ||
|
||
There are a few special page rules to lift this caching for cases | ||
where we do server-side logic (e.g., | ||
https://github.com/git/git-scm.com/issues/1129#issuecomment-363067019"), | ||
but the long-term goal is to push that logic onto the client side as | ||
much as possible. | ||
|
||
Both domains (c.f., the section on [DNS](#DNS) below) are owned by a | ||
Cloudflare "Team", and membership of that team is required to | ||
administrate the domains. Similar to the Heroku setup, you can ask to | ||
join this team if you wish to help out. The information about the team | ||
setup is in escrow with the Git PLC at Software Freedom Conservancy. | ||
Cloudflare provides the project with enough credits that it doesn't cost | ||
anything (though we're not using very many features, so it's possible | ||
that a free account would be sufficient, too). | ||
|
||
## Bonsai Elasticsearch | ||
|
||
The search functionality on the site is served by an elasticsearch | ||
cluster. The index can be populated by running `rake search_index` | ||
(manpages) and `rake search_index_book` (book) on Heroku (we only index | ||
the manpages and book). This perhaps should be run nightly, or at least | ||
after pulling in new content, but it currently isn't done automatically. | ||
|
||
The elasticsearch cluster is provided by Bonsai via their Heroku plugin. | ||
Our needs are larger than their free tier provides, but we receive | ||
credits from them that provide the service for free. | ||
Merges to the `gh-pages` branch on GitHub auto-deploy to GitHub Pages via the | ||
`deploy` GitHub workflow. | ||
|
||
Note that some of the formatting of manual pages and book content happens | ||
when they are imported by the GitHub workflows. Therefore, whenever changes to | ||
the scripts/workflows/automation affect formatting, these workflows may need | ||
to be triggered manually with the force-rebuild flag toggled (see the | ||
individual workflows for details). | ||
|
||
## DNS | ||
|
||
The actual DNS service is provided by Cloudflare (see above). The domain | ||
itself is registered with Gandi, and is owned by the project via | ||
Software Freedom Conservancy. Funds for the registration are provided | ||
from the Git project's Conservancy funds, and both the Git PLC and | ||
Conservancy have credentials to modify the setup. | ||
The actual DNS service is provided by Cloudflare. The domain itself is | ||
registered with Gandi, and is owned by the project via Software Freedom | ||
Conservancy. Funds for the registration are provided from the Git project's | ||
Conservancy funds, and both the Git PLC and Conservancy have credentials to | ||
modify the setup. | ||
|
||
Note that we own both git-scm.com and git-scm.org; the latter redirects | ||
to the former. | ||
|
||
|
||
## Manual Intervention | ||
|
||
The site mostly just runs without intervention: | ||
|
||
- code merged to `main` is auto-deployed | ||
- code merged to `gh-pages` is auto-deployed | ||
|
||
- new git versions are detected daily and manpages and download links | ||
- new git versions are detected daily and manual pages and download links | ||
updated | ||
|
||
- book updates (including translations) are picked up daily | ||
|
||
There are a few tasks that still need to be handled by a human: | ||
|
||
- new images added to the book have to be copied manually from | ||
progit/progit2 | ||
|
||
- new languages for book translations need to be added to | ||
`lib/tasks/book2.rake` | ||
`script/book.rb` | ||
|
||
- forced re-imports of content (e.g., a formatting fix to imported | ||
manpages) must be triggered manually | ||
- forced re-imports of content (e.g., when fixing formatting in the | ||
imported manual pages) must be triggered manually with `force-rebuild` | ||
toggled |
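
The manual triggers described above could be issued from a terminal with the GitHub CLI. The following is a minimal sketch rather than the project's documented procedure. It assumes that each workflow lives in a file named after the workflow (e.g. `update-book.yml`) and that the rebuild flag is a workflow input literally called `force-rebuild`; neither detail is confirmed by the document, so check the individual workflow files first. The helper `dispatch_cmd` is hypothetical and only prints the command, so nothing is dispatched until you run the printed line yourself.

```shell
#!/bin/sh
# Dry-run helper: prints the `gh workflow run` command for a manual dispatch.
# ASSUMPTIONS (not confirmed by the architecture document):
#   - the workflow file is named "<workflow>.yml"
#   - the rebuild flag is a workflow input named "force-rebuild"
REPO="${REPO:-git/git-scm.com}"

dispatch_cmd() {
    workflow="$1"          # e.g. update-download-data
    force="${2:-false}"    # pass "true" to request a forced re-import
    cmd="gh workflow run ${workflow}.yml --repo ${REPO}"
    if [ "$force" = "true" ]; then
        # -f passes a key=value input to a workflow_dispatch trigger
        cmd="$cmd -f force-rebuild=true"
    fi
    printf '%s\n' "$cmd"
}

# Pick up a new release without waiting for the scheduled run.
dispatch_cmd update-download-data
# Re-import manual pages after a formatting fix.
dispatch_cmd update-git-version-and-manual-pages true
```

To test in a fork, set `REPO` to the fork's `<user>/<repository>` name before calling the helper.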