Commit b02b66e (1 parent: ffefe88)

Co-authored-by: Pavel Semyonov <[email protected]>
Co-authored-by: Anna Balaeva <[email protected]>
Co-authored-by: TarantoolBot <[email protected]>
Co-authored-by: Kseniia Antonova <[email protected]>

Showing 870 changed files with 155,437 additions and 5,016 deletions.
@@ -0,0 +1,34 @@
name: Push POTs
on:
  push:
    branches:
      - '3.0'
permissions:
  contents: write
jobs:
  generate-pot:
    runs-on: ubuntu-latest
    container: tarantool/doc-builder:fat-4.3
    steps:
      - uses: actions/checkout@v3

      - name: Generate Portable Object Templates
        run: |
          cmake .
          make update-pot

      - name: Commit generated pots
        run: |
          git config --global --add safe.directory /__w/doc/doc
          git config --global user.name 'TarantoolBot'
          git config --global user.email '[email protected]'
          if [[ $(git status) =~ .*"nothing to commit".* ]]; then
            echo "status=nothing-to-commit"
            exit 0
          fi
          git add locale/en
          git commit -m "updated pot"
          git push origin 3.0
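The guard in the last step matches the human-readable `git status` output, which can change with git versions or locale settings. A more robust variant (a sketch, not the workflow's actual code) tests whether the stable, scriptable `git status --porcelain` output is empty:

```shell
#!/usr/bin/env bash
# Sketch: a locale-independent "nothing to commit" guard.
# `git status --porcelain` prints one line per changed path and prints
# nothing at all when the tree is clean, so an emptiness check is more
# robust than matching the human-readable status message.
check_for_changes() {
  local porcelain_output="$1"   # stand-in for "$(git status --porcelain)"
  if [[ -z "$porcelain_output" ]]; then
    echo "status=nothing-to-commit"
  else
    echo "status=changes-found"
  fi
}

check_for_changes ""                    # clean tree
check_for_changes " M locale/en/x.pot"  # one modified file
```

In the workflow itself this would replace the `if [[ $(git status) =~ ... ]]` condition with `if [[ -z "$(git status --porcelain)" ]]; then`.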
@@ -29,6 +29,7 @@ webhooks/.env

locale/*
!locale/ru
!locale/en

# redundant folders created by sphinx
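Pattern order matters in this hunk: ``locale/*`` ignores everything under ``locale/``, and the later negation patterns re-include just the two translation directories. An annotated sketch of the same rules (gitignore comments must occupy their own lines, so the annotations sit above each pattern):

.. code-block:: text

   # Ignore all generated content under locale/ ...
   locale/*
   # ... but keep the Russian translations
   !locale/ru
   # ... and the English templates pushed by the workflow above
   !locale/en

The negations work here because only the *contents* of ``locale/`` are excluded, not the directory itself; git cannot re-include paths whose parent directory is excluded.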
@@ -1,126 +1,161 @@
.. _admin-disaster_recovery:

Disaster recovery
=================

The minimal fault-tolerant Tarantool configuration would be a :ref:`replica set <replication-architecture>`
that includes a master and a replica, or two masters.
The basic recommendation is to configure all Tarantool instances in a replica set to
create :ref:`snapshot files <index-box_persistence>` on a regular basis.

Here are action plans for typical crash scenarios.
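Regular snapshots are controlled by the checkpoint settings. A minimal sketch using the classic ``box.cfg`` API (the values shown are illustrative, not recommendations):

.. code-block:: lua

   box.cfg{
       checkpoint_interval = 3600,  -- take a snapshot every hour
       checkpoint_count = 2,        -- keep the two latest snapshots on disk
   }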
.. _admin-disaster_recovery-master_replica:

Master-replica
--------------

.. _admin-disaster_recovery-master_replica_manual_failover:

Master crash: manual failover
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Configuration:** master-replica (:ref:`manual failover <replication-master_replica_bootstrap>`).

**Problem:** The master has crashed.

**Actions:**

1. Ensure the master is stopped.
   For example, log in to the master machine and use ``tt stop``.

2. Configure a new replica set leader using the :ref:`<replicaset_name>.leader <configuration_reference_replicasets_name_leader>` option.

3. Reload the configuration on all instances using :ref:`config:reload() <config-module>`.

4. Make sure that the new replica set leader is a master using :ref:`box.info.ro <box_introspection-box_info>`.

5. On the new master, :ref:`remove the crashed instance from the '_cluster' space <replication-remove_instances-remove_cluster>`.

6. Set up a replacement for the crashed master on a spare host.

See also: :ref:`Performing manual failover <replication-controlled_failover>`.
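Promoting the surviving replica boils down to one line in the declarative configuration. A hypothetical fragment (the group, replica set, and instance names are invented for illustration):

.. code-block:: yaml

   groups:
     group001:
       replicasets:
         replicaset001:
           leader: instance002   # promote the surviving replica

After editing the configuration, reloading it on each instance makes ``instance002`` writable and the remaining instances its replicas.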
.. _admin-disaster_recovery-master_replica_auto_failover:

Master crash: automated failover
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Configuration:** master-replica (:ref:`automated failover <replication-bootstrap-auto>`).

**Problem:** The master has crashed.

**Actions:**

1. Use ``box.info.election`` to make sure a new master is elected automatically.

2. On the new master, :ref:`remove the crashed instance from the '_cluster' space <replication-remove_instances-remove_cluster>`.

3. Set up a replacement for the crashed master on a spare host.

See also: :ref:`Testing automated failover <replication-automated-failover-testing>`.
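A healthy post-failover state can be recognized by ``state: leader`` on the elected instance. A sketch of what the check might look like (the instance name and all numeric values are illustrative):

.. code-block:: tarantoolsession

   app:instance002> box.info.election
   ---
   - state: leader
     vote: 2
     leader: 2
     term: 3
   ...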
.. _admin-disaster_recovery-master_replica_data_loss:

Data loss
~~~~~~~~~

**Configuration:** master-replica.

**Problem:** Some transactions are missing on the replica after the master has crashed.

**Actions:**

You lose the few transactions in the master
:ref:`write-ahead log file <index-box_persistence>` that may not have been
transferred to the replica before the crash. If you were able to salvage the master's
``.xlog`` file, you may be able to recover them:

1. Find out the instance UUID from the crashed master's :ref:`xlog <internals-wal>`:

   .. code-block:: console

      $ head -5 var/lib/instance001/*.xlog | grep Instance
      Instance: 9bb111c2-3ff5-36a7-00f4-2b9a573ea660

2. On the new master, use the UUID to find the position:

   .. code-block:: tarantoolsession

      app:instance002> box.info.vclock[box.space._cluster.index.uuid:select{'9bb111c2-3ff5-36a7-00f4-2b9a573ea660'}[1][1]]
      ---
      - 999
      ...

3. :ref:`Play the records <tt-play>` from the crashed ``.xlog`` to the new master, starting from the
   new master position:

   .. code-block:: console

      $ tt play 127.0.0.1:3302 var/lib/instance001/00000000000000000000.xlog \
                --from 1000 \
                --replica 1 \
                --username admin --password secret
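The UUID-to-LSN lookup in step 2 can be pictured with plain data structures. A sketch in ordinary Python, not Tarantool: ``cluster`` mimics the ``_cluster`` space (instance id to UUID) and ``vclock`` mimics ``box.info.vclock`` (instance id to last received LSN), with invented illustrative values.

```python
# cluster: instance id -> instance UUID (like the _cluster space).
# vclock: instance id -> last LSN received from that instance.
cluster = {
    1: "9bb111c2-3ff5-36a7-00f4-2b9a573ea660",  # crashed instance001
    2: "aaaa1111-bbbb-cccc-dddd-eeeeffff0000",  # new master, instance002
}
vclock = {1: 999, 2: 1203}

def lsn_for_uuid(uuid: str) -> int:
    """Resolve an instance UUID to its component of the vclock."""
    instance_id = next(i for i, u in cluster.items() if u == uuid)
    return vclock[instance_id]

# The new master saw instance001's history only up to LSN 999, so
# replaying the salvaged xlog with `--from 1000` supplies exactly the
# transactions it is missing.
print(lsn_for_uuid("9bb111c2-3ff5-36a7-00f4-2b9a573ea660"))  # → 999
```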
.. _admin-disaster_recovery-master_master:

Master-master
-------------

**Configuration:** :ref:`master-master <replication-bootstrap-master-master>`.

**Problem:** One master has crashed.

**Actions:**

1. Let the load be handled by the other master alone.

2. Remove the crashed master from the replica set.

3. Set up a replacement for the crashed master on a spare host.
   Learn more from :ref:`Adding and removing instances <replication-master-master-add-remove-instances>`.
.. _admin-disaster_recovery-data_loss:

Master-replica/master-master: data loss
---------------------------------------

**Configuration:** master-replica or master-master.

**Problem:** Data was deleted on one master, and the deletion was propagated to the other node (master or replica).

**Actions:**

1. Put all nodes in read-only mode.
   Depending on the :ref:`replication.failover <configuration_reference_replication_failover>` mode, this can be done as follows:

   - ``manual``: change the replica set leader to ``null``.
   - ``election``: set :ref:`replication.election_mode <configuration_reference_replication_election_mode>` to ``voter`` or ``off`` at the replica set level.
   - ``off``: set ``database.mode`` to ``ro``.

   Reload the configuration on all instances using the ``reload()`` function provided by the :ref:`config <config-module>` module.

2. Turn off deletion of expired checkpoints with :doc:`/reference/reference_lua/box_backup/start`.
   This prevents the Tarantool garbage collector from removing files
   made with older checkpoints until :doc:`/reference/reference_lua/box_backup/stop` is called.

3. Get the latest valid :ref:`.snap file <internals-snapshot>` and
   use the ``tt cat`` command to calculate at which LSN the data loss occurred.

4. Start a new instance and use the :ref:`tt play <tt-play>` command to
   play to it the contents of the ``.snap`` and ``.xlog`` files up to the calculated LSN.

5. Bootstrap a new replica from the recovered master.

.. NOTE::

   The steps above are applicable only to data in the memtx storage engine.
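For the ``off`` failover mode, step 1 is a one-line configuration change per instance scope. A hypothetical fragment of the declarative configuration:

.. code-block:: yaml

   database:
     mode: ro   # put the instance in read-only mode

As with the other failover modes, the change takes effect after the configuration is reloaded on each instance.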