Add many docs
ThetaSinner committed Jan 6, 2025
1 parent 9264567 commit 05a7d49
Showing 9 changed files with 1,446 additions and 3 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,3 +1,5 @@
/target
/output-*
.cargo

crates/dht/DOCS.md
155 changes: 155 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default.

38 changes: 36 additions & 2 deletions crates/api/src/doc/glossary.rs
@@ -4,6 +4,40 @@
//! A time slice is a period of time. It is defined to be an interval from a start time,
//! inclusive, up to some end time, exclusive.
//!
//! Time slices are used to group data by time. The hashes of the data in a time slice are combined
//! so that a single hash describing that data can be used to check its consistency with other
//! peers. Time slices come in two flavours: full and partial. A full time slice covers the largest
//! time interval permitted by configuration and represents historical time. A partial time slice
//! covers a smaller time interval than a full time slice and represents more recent time.
//!
//! ## DHT
//! A Distributed Hash Table (DHT) is a data structure that is keyed by a hash value and distributed
//! across a network of nodes. The DHT is used to store and retrieve data based on the hash key.
//!
//! ## DHT: Sector
//! In the context of the DHT, a sector is a range of hash locations. It is equivalent to a DhtArc
//! but the sectors are equally-sized and cover the entire hash space without overlapping.
//!
//! ## DHT: Disc
//! In the context of the DHT, the disc is the set of data, with a hash at any location, that has
//! a timestamp before the end of the most recent full time slice.
//!
//! ## DHT: Ring
//! In the context of the DHT, a ring is the set of data, with a hash at any location, that has a
//! timestamp between the start and end times of a given partial time slice.
//!
//! ## Combined hash
//! A combined hash is a value that is computed by combining hashes. For example, taking all the
//! data hashes within a sector and a full time slice and combining them to produce a single hash.
//!
//! ## Top hash
//! A top hash is a hash value that is computed by combining some combined hashes. For example,
//! taking all the combined hashes for a ring and combining them to produce a single hash.
//!
//! These are given natural names in the code. For example, "disc top hash" is the top hash
//! formed by combining full time slice combined hashes and then combining those over sectors.
//!
//! Note that a top hash is not necessarily complete. When computing a disc top hash, for example,
//! only the portion of the location space that is covered by both parties involved in computing
//! a DHT diff is considered. This is still referred to as a disc top hash in the code, although
//! it actually covers only part of the disc.
//!
3 changes: 3 additions & 0 deletions crates/dht/Cargo.toml
@@ -21,3 +21,6 @@ kitsune2_test_utils = { workspace = true }

tokio = { workspace = true, features = ["macros", "rt"] }
rand = { workspace = true }

[build-dependencies]
handlebars = "6.3.0"
59 changes: 59 additions & 0 deletions crates/dht/DOCS.md.hbs
@@ -0,0 +1,59 @@
A distributed hash table (DHT) implementation for use in Kitsune2.

The DHT model partitions data by location and time. The location is derived from the hash of the data, and is a 32-bit
number. The time is a 64-bit number, representing a number of microseconds since the Unix epoch.

The model organises data first by location and then, within a set of locations, by time. The partitioning is designed to
permit computing a diff against another DHT model. Enabling efficient syncing of DHT models between agents is the
primary purpose of this crate.
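
As a rough illustration of the two coordinates, the sketch below places a piece of data on the location/time graph. The
names `DataPoint`, `micros_since_epoch` and `location_from_hash` are hypothetical and not part of this crate. Using a
signed 64-bit timestamp, and deriving the location from the first four hash bytes, are assumptions made only for this
example; the real derivation may differ.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical point in the DHT model: a 32-bit location and a 64-bit time.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DataPoint {
    pub location: u32,
    pub timestamp_micros: i64,
}

/// Microseconds since the Unix epoch for the given system time.
pub fn micros_since_epoch(t: SystemTime) -> i64 {
    t.duration_since(UNIX_EPOCH)
        .expect("time is before the Unix epoch")
        .as_micros() as i64
}

/// Assumption for this sketch only: read the first four hash bytes as a
/// little-endian u32. Missing bytes are treated as zero.
pub fn location_from_hash(hash: &[u8]) -> u32 {
    let mut bytes = [0u8; 4];
    for (dst, src) in bytes.iter_mut().zip(hash) {
        *dst = *src;
    }
    u32::from_le_bytes(bytes)
}

/// Build the point for a piece of data from its hash and creation time.
pub fn point_for(hash: &[u8], created_at: SystemTime) -> DataPoint {
    DataPoint {
        location: location_from_hash(hash),
        timestamp_micros: micros_since_epoch(created_at),
    }
}
```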

You can think of the DHT as a graph, where the vertical axis represents location and the horizontal axis represents time.
Every piece of data can be represented as a point on this graph:

<details>
<summary>DHT model (graph representation)</summary>
{{{ dht-graph }}}
</details>

Notice that the partitioning introduces some new terminology:
- **Sector**: The location space is split into equally-sized chunks called sectors. A sector spans the entire time axis.
- **Time slice**: The horizontal axis is split into time slices. These come in two flavours, full and partial. The light
blue slices are full, and the dark blue slices are partial. The full time slices are fixed-size, while the partial
time slices start at half the size of a full time slice and halve in size towards the current time.
- **Disc**: The disc is the light blue area of the graph. It is defined by the set of full time slices and the full set
of sectors.
- **Ring**: A ring is one vertical section in the dark blue area of the graph. It is defined by a partial time slice and
the full set of sectors.
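
The partitioning described above can be sketched as follows. This is a simplified illustration, not the crate's
implementation: `sector_index`, `partial_slices`, `Slice`, `sector_bits` and `min_size` are hypothetical names. It
assumes the number of sectors is a power of two, and that each partial slice size is used at most once and skipped if
it does not fit before the current time.

```rust
/// Half-open time slice `[start, end)`, in microseconds.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Slice {
    pub start: i64,
    pub end: i64,
}

/// Index of the sector containing `location`, assuming `2^sector_bits`
/// equally-sized sectors cover the whole 32-bit location space.
pub fn sector_index(location: u32, sector_bits: u32) -> u32 {
    assert!(sector_bits <= 32);
    ((location as u64) >> (32 - sector_bits)) as u32
}

/// Partial slices between the end of the last full slice and `now`.
/// Sizes start at half a full slice and halve until `min_size` is reached.
pub fn partial_slices(full_end: i64, now: i64, full_size: i64, min_size: i64) -> Vec<Slice> {
    assert!(min_size > 0);
    let mut slices = Vec::new();
    let mut start = full_end;
    let mut size = full_size / 2;
    while size >= min_size {
        // Only emit a slice that ends at or before the current time.
        if start + size <= now {
            slices.push(Slice { start, end: start + size });
            start += size;
        }
        size /= 2;
    }
    slices
}
```

Any time between the end of the last emitted slice and `now` is left uncovered by this sketch.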

The reason for the geometric naming is that the locations are thought of as a circle, with the highest location being
adjacent to the lowest location. This is not a perfect analogy, but it is useful to have terminology that describes
areas of the graph in a consistent way. See the following diagram:

<details>
<summary>DHT model (circle representation)</summary>
{{{ dht-circle }}}
</details>

Here, time starts at the center of the circle and moves outwards, and location sectors are delimited by straight lines
radiating from the center. This makes the definitions of the terms above clearer. The disc is the light blue
area, which is a disc in the geometric sense, bounded by the end of the last full time slice. The rings are the dark
blue area, lying between the end of the last full time slice and the current time. The time slices are the concentric
bands, delimited by white lines, with the light blue ones being full and the dark blue ones being partial.

Each contained area within this shape can be represented by a single "combined hash". A "combined hash" is simply a
combination of all the hashes of data within that area. Unlike a "top hash", a "combined hash" is generally pre-computed
or updated incrementally. The term "top hash" refers to a combination of combined hashes.
For example, computing a "top hash" of the disc would involve taking all the combined hashes for full time slices in
each sector and combining them, then combining all of those across the sectors.
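
The relationship between the two kinds of hash can be sketched as below. The text above does not say how hashes are
combined, so XOR is used purely as an assumption, as is the 32-byte hash width. The names `Hash`, `combine`,
`combined_hash` and `disc_top_hash` are hypothetical.

```rust
/// A 32-byte hash. The width is an assumption for this sketch.
pub type Hash = [u8; 32];

/// Fold `other` into `acc`. XOR is an assumed combining function: it is
/// order-independent, so a combined hash can be updated incrementally.
pub fn combine(acc: &mut Hash, other: &Hash) {
    for (a, b) in acc.iter_mut().zip(other.iter()) {
        *a ^= *b;
    }
}

/// Combine any number of hashes into one. An empty input yields all zeroes.
pub fn combined_hash<'a>(hashes: impl IntoIterator<Item = &'a Hash>) -> Hash {
    let mut acc = [0u8; 32];
    for h in hashes {
        combine(&mut acc, h);
    }
    acc
}

/// Disc top hash: `sectors[s][t]` is the combined hash of full time slice `t`
/// in sector `s`. Combine within each sector, then across the sectors.
pub fn disc_top_hash(sectors: &[Vec<Hash>]) -> Hash {
    let per_sector: Vec<Hash> = sectors
        .iter()
        .map(|slices| combined_hash(slices))
        .collect();
    combined_hash(&per_sector)
}
```

With a combining function like this, adding one new data hash to an existing combined hash is a single `combine` call,
which is what makes incremental updates cheap.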

There are detailed explanations of the pieces of the model in each module. To aid with navigation, here is a brief
overview of the modules:
- The model starts in the [time] module. This module defines time slices, and handles combining hashes
within time slices. It can also combine hashes to yield top hashes.
- The next step in the model is the [hash] module. This module defines the partition over location, and
uses the time module for its inner state.
- The supporting module [arc_set] defines a set of agent arcs. This module provides a simple
representation of the overlap between the storage [DhtArc](kitsune2_api::DhtArc)s of two agents.
- The top-level module is the [dht] module. This module uses the hash module as its inner state, and adds
the logic for computing a diff between two DHT models. This is a multistep process, designed to balance the
amount of data that must be sent against the number of round trips required to sync the models.
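
To give a feel for one step of such a diff, the sketch below compares per-sector hashes from two models and reports
which sectors disagree, considering only the sectors that both sides hold. It is a hypothetical illustration of the
idea and not the protocol implemented in the [dht] module. The name `mismatched_sectors`, the 32-byte `Hash` type and
the map-based representation are all assumptions.

```rust
use std::collections::BTreeMap;

/// A 32-byte hash. The width is an assumption for this sketch.
pub type Hash = [u8; 32];

/// Sector indices whose hashes differ between the two models.
/// Sectors held by only one side are ignored, since only the common
/// portion of the location space can be compared.
pub fn mismatched_sectors(
    ours: &BTreeMap<u32, Hash>,
    theirs: &BTreeMap<u32, Hash>,
) -> Vec<u32> {
    ours.iter()
        .filter_map(|(sector, our_hash)| match theirs.get(sector) {
            Some(their_hash) if their_hash != our_hash => Some(*sector),
            _ => None,
        })
        .collect()
}
```

Only the mismatched sectors would then need a further round of more detailed comparison, which is the kind of
trade-off between data sent and round trips that the diff process has to manage.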