ocaml-multicore · lyrm · Dec 2, 2024 · Nov 4, 2024 · Nov 28, 2024 · Dec 2, 2024
diff --git a/README.md b/README.md
diff --git a/bench/README.md b/bench/README.md
@@ -1,7 +1,67 @@
-Benchmarks for Saturn
+# Benchmarks for Saturn
 
-# General usage
+Benchmarks are written using [multicore-bench](https://github.com/ocaml-multicore/multicore-bench).
 
-Execute `make bench` from root of the repository to run the standard set of
-benchmarks. The output is in JSON, as it is intended to be consumed by
-[current-bench](https://bench.ci.dev/ocaml-multicore/saturn/branch/main/benchmark/default).
+## General Usage
+
+To execute benchmarks, you can run:
+```shell
+make bench
+```
+
+Alternatively, you can use:
+```shell
+dune exec -- ./bench/main.exe
+```
+
+It is recommended to run the benchmarks with a budget of at least `1` second (as done with `make bench`):
+```shell
+dune exec -- ./bench/main.exe -budget 1
+```
+
+You can also print a brief, human-readable version of the results with the `-brief` option. Additionally, you can run only selected benchmarks by passing filters (regular expressions) that match part of the benchmark names. The `--help` option lists the available benchmarks.
+
+For example, running:
+```shell
+dune exec -- ./bench/main.exe --help
+```
+returns:
+
+```
+Usage: main.exe <option>* filter*
+
+The filters are regular expressions for selecting benchmarks to run.
+
+Benchmarks:
+
+  Saturn Queue
+  Saturn Queue_unsafe
+  Saturn Bounded_Queue
+  Saturn Bounded_Queue_unsafe
+  Saturn Single_prod_single_cons_queue
+  Saturn Size
+  Saturn Skiplist
+  Saturn Htbl
+  Saturn Htbl_unsafe
+  Saturn Stack
+  Saturn Work_stealing_deque
+  Saturn Bounded_Stack
+
+Options:
+
+  -budget seconds   Budget for a benchmark
+  -debug            Print progress information to help debugging
+  -diff path.json   Show diff against specified base results
+  -brief            Show brief human-readable results
+  -help             Show this help message
+  --help            Show this help message
+```
+
+For example, if you want to run only the `Htbl` benchmarks to compare the performance of `Htbl` and its unsafe version `Htbl_unsafe`, you can run:
+```shell
+dune exec -- ./bench/main.exe -budget 1 -brief Htbl
+```
+
+## Current-bench
+
+The output is in JSON format, as it is intended to be consumed by [current-bench](https://bench.ci.dev/ocaml-multicore/saturn/branch/main/benchmark/default).
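The README's `Htbl` example relies on a filter matching only part of a benchmark name. The sketch below illustrates that selection, assuming plain substring matching in place of regular expressions. `contains` and `select` are hypothetical helpers written for this illustration, not part of multicore-bench, and the names are copied from the help output quoted in the README.

```ocaml
(* Benchmark names as listed in the help output quoted in the README. *)
let names =
  [
    "Saturn Queue"; "Saturn Queue_unsafe"; "Saturn Bounded_Queue";
    "Saturn Bounded_Queue_unsafe"; "Saturn Single_prod_single_cons_queue";
    "Saturn Size"; "Saturn Skiplist"; "Saturn Htbl"; "Saturn Htbl_unsafe";
    "Saturn Stack"; "Saturn Work_stealing_deque"; "Saturn Bounded_Stack";
  ]

(* Hypothetical helper: true if [sub] occurs anywhere in [s].
   Assumption: substring search stands in for regular-expression matching. *)
let contains ~sub s =
  let n = String.length sub and m = String.length s in
  let rec go i = i + n <= m && (String.sub s i n = sub || go (i + 1)) in
  go 0

(* Keep only the benchmarks whose name contains the filter. *)
let select filter = List.filter (contains ~sub:filter) names

let () = List.iter print_endline (select "Htbl")
```

Under this assumption, the single filter `Htbl` selects both `Saturn Htbl` and `Saturn Htbl_unsafe`, which is what makes the side-by-side comparison possible with one argument.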
diff --git a/bench/main.ml b/bench/main.ml
@@ -1,7 +1,7 @@
 let benchmarks =
   [
-    ("Saturn Queue", Bench_queue.Safe.run_suite);
-    ("Saturn Queue_unsafe", Bench_queue.Unsafe.run_suite);
+    ("Saturn Queue (MS)", Bench_queue.Safe.run_suite);
+    ("Saturn Queue_unsafe (MS)", Bench_queue.Unsafe.run_suite);
     ("Saturn Bounded_Queue", Bench_bounded_queue.Safe.run_suite);
     ("Saturn Bounded_Queue_unsafe", Bench_bounded_queue.Unsafe.run_suite);
     ("Saturn Single_prod_single_cons_queue", Bench_spsc_queue.run_suite);

diff --git a/doc/composability.md b/doc/composability.md
@@ -0,0 +1,93 @@
+# About Composability
+
+Composability refers to the ability to combine functions while preserving their properties. For Saturn data structures, the expected properties include atomic consistency (or linearizability) and progress guarantees, such as lock-freedom. Unfortunately, Saturn's data structures are not composable: a function built from several of their operations does not automatically inherit these properties.
+
+Let’s illustrate this with an example. Suppose we want to implement a `split` function for Saturn's queue. The goal is to have multiple domains simultaneously split a source queue into two destination queues based on a predicate. We expect the `split` function to be linearizable, meaning the order of elements in the source queue should be preserved in the destination queues. For instance, `split [0; 1; 2; 3; 4]`, with a predicate that returns `true` for even numbers and `false` otherwise, should produce `[0; 2; 4]` and `[1; 3]`.
+
+Here’s how we can implement `split` using Saturn’s queue functions:
+
+```ocaml
+let split source pred true_dest false_dest : bool =
+  match Queue.pop_opt source with
+  | None -> false
+  | Some elt ->
+      if pred elt then Queue.push true_dest elt
+      else Queue.push false_dest elt;
+      true
+```
+
+Domains run the `split` function in parallel until the source queue is empty:
+
+```ocaml
+let work source pred true_dest false_dest =
+  while split source pred true_dest false_dest do
+    ()
+  done
+```
+
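+To test this, we can use the following function. It relies on two helpers that are not defined here: `Barrier`, which makes both domains start working at the same time, and `get_content`, which drains a queue into a list.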
+
+```ocaml
+let test input =
+  (* Initialization *)
+  let true_dest = Queue.create () in
+  let false_dest = Queue.create () in
+  let source = Queue.create () in
+  List.iter (Queue.push source) input;
+
+  let barrier = Barrier.create 2 in
+
+  (* Predicate: split by parity *)
+  let pred elt = elt mod 2 = 0 in
+
+  let d1 =
+    Domain.spawn (fun () ->
+        Barrier.await barrier;
+        work source pred true_dest false_dest)
+  in
+  let d2 =
+    Domain.spawn (fun () ->
+        Barrier.await barrier;
+        work source pred true_dest false_dest)
+  in
+  Domain.join d1;
+  Domain.join d2;
+  (get_content true_dest, get_content false_dest)
+```
+
+For an input of `[0; 1; 2; 3; 4]`, the expected output is `([0; 2; 4], [1; 3])`. Most of the time, the function returns the correct result, but occasionally the elements of a destination queue come out in a different order than in the source queue.
+
+To measure how often this issue occurs, we can define a `check` function that runs `test` multiple times and counts the number of incorrect results:
+
+```ocaml
+let check inputs max_round =
+  let expected_even = List.filter (fun elt -> elt mod 2 = 0) inputs in
+  let expected_odd = List.filter (fun elt -> elt mod 2 <> 0) inputs in
+  let rec loop round bugged =
+    if round >= max_round then bugged
+    else
+      let even, odd = test inputs in
+      let bad = even <> expected_even || odd <> expected_odd in
+      loop (round + 1) (if bad then bugged + 1 else bugged)
+  in
+  Format.printf "%d/%d rounds are bugged.@." (loop 0 0) max_round
+```
+
+Running this function:
+
+```ocaml
+# check [0;1;2;3;4;5;6] 1000;;
+35/1000 rounds are bugged.
+```
+
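+As expected, the function does not work correctly. The reason is that our `split` function is not linearizable: each `pop` and `push` is atomic on its own, but the pair is not, so a domain can pop an element and then be overtaken by another domain that pops and pushes a later element before the first `push` happens. While we could make `split` atomic by introducing a mutex, doing so would sacrifice the progress guarantees of the underlying queue functions, i.e. lock-freedom.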
+
+## Extending Data Structures
+
+Note that in the case above, we transfer from and to a queue of the same `int Saturn.Queue.t` type. It may be possible to write a `val transfer : t -> t -> unit` function with the right properties and add it directly to the `Saturn.Queue` module.
+
+If you think of any such function that is useful and missing, let us know by creating an issue!
+
+## Composable Parallelism-Safe Data Structures
+
+If you need composable parallelism-safe data structures, take a look at [kcas_data](https://github.com/ocaml-multicore/kcas#programming-with-transactional-data-structures).
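The failures counted in `doc/composability.md` come from interleavings in which one domain pops an element and another domain completes its own `split` steps before the first one pushes. The sketch below replays one such interleaving sequentially so that the outcome is deterministic. It is an illustration under stated assumptions: Stdlib's `Queue` stands in for Saturn's queue (note that its `push` takes the element first), and the two "workers" are just interleaved statements in a single thread.

```ocaml
(* Sequential replay of one problematic interleaving of two workers running
   [split] on the input [0; 1; 2]. Stdlib's Queue is used as a stand-in for
   Saturn's queue: an assumption made to keep the replay deterministic. *)
let replay () =
  let source = Queue.create () in
  let true_dest = Queue.create () in
  let false_dest = Queue.create () in
  List.iter (fun x -> Queue.push x source) [ 0; 1; 2 ];

  (* Worker A pops 0, then is descheduled before it can push it. *)
  let a = Queue.pop source in

  (* Meanwhile, worker B runs two complete [split] steps. *)
  let b1 = Queue.pop source in
  Queue.push b1 false_dest; (* 1 is odd *)
  let b2 = Queue.pop source in
  Queue.push b2 true_dest; (* 2 is even *)

  (* Worker A resumes and pushes 0, which now lands after 2. *)
  Queue.push a true_dest;

  let contents q = List.of_seq (Queue.to_seq q) in
  (contents true_dest, contents false_dest)

let () =
  let evens, odds = replay () in
  let show l = String.concat "; " (List.map string_of_int l) in
  Printf.printf "true_dest = [%s], false_dest = [%s]\n" (show evens) (show odds)
```

Every individual queue operation in this replay behaves correctly, yet `true_dest` ends up as `[2; 0]` rather than `[0; 2]`, which is the kind of result `check` counts as bugged.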
diff --git a/doc/domain-role.md b/doc/domain-role.md
@@ -0,0 +1,64 @@
+# Data Structures with Domain Roles
+
+Some provided data structures are designed to work with specific domain configurations. These restrictions optimize their implementation, but failing to respect them may compromise safety properties. These limitations are clearly indicated in the documentation and often reflected in the name of the data structure itself. For instance, a single-consumer queue must have only one domain performing `pop` operations at any given time.
+
+## Example: `Single_prod_single_cons_queue`
+
+As the name suggests, a `Single_prod_single_cons_queue` is designed to be used with exactly one domain performing `push` operations (the producer) and one domain performing `pop` operations (the consumer) at the same time. If multiple domains attempt to `push` (or `pop`) simultaneously, it will break the queue’s safety guarantees and likely lead to unexpected behavior.
+
+Here’s an example of what happens when the queue is misused. We start by giving it a deliberately misleading alias that hides the single-producer, single-consumer restriction:
+
+```ocaml
+module Queue = Saturn.Single_prod_single_cons_queue
+```
+
+In this case, each domain will attempt to `push` 10 times in parallel:
+
+```ocaml
+let work id barrier q =
+  Barrier.await barrier;
+  for _i = 0 to 9 do
+    Queue.try_push q id |> ignore
+  done
+```
+
+Our `test` function initializes the queue and creates two domains that simultaneously attempt to `push`:
+
+```ocaml
+let test () =
+  let q = Queue.create ~size_exponent:5 in
+  let barrier = Barrier.create 2 in
+  let d1 = Domain.spawn (fun () -> work 1 barrier q) in
+  let d2 = Domain.spawn (fun () -> work 2 barrier q) in
+  Domain.join d1;
+  Domain.join d2;
+  q
+```
+
+To inspect the contents of the queue after the test, we define a function that extracts all elements into a list:
+
+```ocaml
+let get_content q =
+  let rec loop acc =
+    match Queue.pop_opt q with
+    | None -> acc
+    | Some a -> loop (a :: acc)
+  in
+  List.rev (loop [])
+```
+
+Let’s run the test:
+
+```ocaml
+# test () |> get_content;;
+- : int list = [2; 1; 1; 1; 1; 1; 1; 1; 1; 1; 2]
+```
+
+## Analysis
+
+The resulting queue contains only 11 elements, even though each domain attempted 10 `push` operations, 20 in total. This happens because the implementation assumes that only one domain will perform `push` operations at any time. Without this assumption, the implementation would need to add synchronization mechanisms, which are intentionally omitted for performance reasons. Consequently, concurrent `push` operations can interleave in ways that overwrite each other, leading to lost `push`es.
+
+
+## Conclusion
+
+This example highlights the importance of adhering to the intended usage of data structures. While these restrictions allow for highly optimized implementations, misusing the data structure—such as having multiple producers or consumers in this case—can lead to unpredictable bugs. Always refer to the documentation and use the appropriate data structure for your concurrency needs to ensure both correctness and performance.
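The lost pushes analysed in `doc/domain-role.md` can be illustrated with a deliberately simplified model of an unsynchronized producer. The sketch below is not Saturn's implementation: the `ring` type and the `read_tail` and `write_and_publish` functions are hypothetical, and `push` is split into two steps only so that a bad interleaving of two producers can be replayed by hand in a single thread.

```ocaml
(* A simplified, hypothetical model of a producer that uses no
   synchronization: it reads the tail index, writes the slot, then publishes
   the new tail. This is an assumption for illustration, not Saturn's code. *)
type 'a ring = { slots : 'a option array; mutable tail : int }

(* Step 1 of a push: observe where the next element should go. *)
let read_tail r = r.tail

(* Step 2 of a push: write the element and publish the new tail, based on
   the (possibly stale) tail observed in step 1. *)
let write_and_publish r seen_tail v =
  r.slots.(seen_tail) <- Some v;
  r.tail <- seen_tail + 1

let lost_push_demo () =
  let r = { slots = Array.make 4 None; tail = 0 } in
  (* Producers 1 and 2 both read the same tail before either one writes. *)
  let t1 = read_tail r in
  let t2 = read_tail r in
  write_and_publish r t1 1;
  (* Producer 2 overwrites slot 0 and publishes the same tail again. *)
  write_and_publish r t2 2;
  (r.tail, Array.to_list r.slots)

let () =
  let tail, _slots = lost_push_demo () in
  Printf.printf "pushes attempted: 2, elements stored: %d\n" tail
```

With a single producer the two steps can never interleave this way, which is why the restriction lets the implementation skip the synchronization that a multi-producer queue would need.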