All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- 12 find-any-like adaptors on ParallelIteratorExt that short-circuit once a match is found: all(), any(), eq(), eq_by_key(), eq_by_keys(), find_any(), find_map_any(), ne(), ne_by_key(), ne_by_keys(), try_for_each() and try_for_each_init(). A nightly API is available for try_for_each() and try_for_each_init() for all types implementing the std::ops::Try trait.
- 10 find-first-like adaptors on ParallelIteratorExt that restrict the search to previous items once a match is found: cmp(), cmp_by(), cmp_by_key(), cmp_by_keys(), find_first(), find_map_first(), partial_cmp(), partial_cmp_by(), partial_cmp_by_key() and partial_cmp_by_keys().
- 4 additional adaptors on ParallelIteratorExt: for_each_init(), map_init(), product() and sum().
- To support these, the Accumulator trait and 3 methods on ParallelIterator: iter_pipeline(), short_circuiting_pipeline() and upper_bounded_pipeline().
- The step_by() adaptor on ParallelSourceExt.
- The num_threads() getter on ThreadPool, to query the number of worker threads that have been spawned.
- A new ParallelAdaptor trait, implemented by Cloned, Copied, Filter, FilterMap, Inspect and Map as an intermediate step before ParallelIterator.
- More recommendations for the thread pool configuration in the README.
- Tests with the thread sanitizer (TSAN).
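The find-any-like and find-first-like adaptors differ in how a match affects the other workers. The sketch below uses std threads only. Its helpers par_find_any and par_find_first are hypothetical, and it splits the input statically into contiguous chunks, which is an assumption and not paralight's scheduling. In find-any mode, a shared flag stops every worker. In find-first mode, the lowest matching index is kept in an atomic, and workers only continue on items before it.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::thread;

/// Chunk size that splits `len` items into at most `num_threads` chunks (at least 1 item each).
fn chunk_len(len: usize, num_threads: usize) -> usize {
    let n = num_threads.max(1);
    ((len + n - 1) / n).max(1)
}

/// Hypothetical find-any helper: returns *some* matching item. As soon as one
/// worker finds a match, it raises a shared flag and the others stop early.
fn par_find_any<T: Copy + Send + Sync>(
    input: &[T],
    num_threads: usize,
    pred: impl Fn(&T) -> bool + Sync,
) -> Option<T> {
    let found = AtomicBool::new(false);
    let (found, pred) = (&found, &pred);
    thread::scope(|s| {
        let handles: Vec<_> = input
            .chunks(chunk_len(input.len(), num_threads))
            .map(|chunk| {
                s.spawn(move || {
                    for item in chunk {
                        // Short-circuit: another worker already found a match.
                        if found.load(Ordering::Relaxed) {
                            return None;
                        }
                        if pred(item) {
                            found.store(true, Ordering::Relaxed);
                            return Some(*item);
                        }
                    }
                    None
                })
            })
            .collect();
        // Any worker's match is an acceptable answer; scope() joins the rest.
        handles.into_iter().find_map(|h| h.join().unwrap())
    })
}

/// Hypothetical find-first helper: returns the matching item with the lowest
/// index. Once a match is known at index i, workers skip every item at or
/// after i but keep scanning the items before it.
fn par_find_first<T: Copy + Send + Sync>(
    input: &[T],
    num_threads: usize,
    pred: impl Fn(&T) -> bool + Sync,
) -> Option<T> {
    let best = AtomicUsize::new(usize::MAX);
    let (best_ref, pred) = (&best, &pred);
    let size = chunk_len(input.len(), num_threads);
    thread::scope(|s| {
        for (c, chunk) in input.chunks(size).enumerate() {
            s.spawn(move || {
                for (j, item) in chunk.iter().enumerate() {
                    let index = c * size + j;
                    // Restrict the search to items before the best match so far.
                    if index >= best_ref.load(Ordering::Relaxed) {
                        return;
                    }
                    if pred(item) {
                        best_ref.fetch_min(index, Ordering::Relaxed);
                        return;
                    }
                }
            });
        }
    });
    // thread::scope has joined every worker at this point.
    let index = best.into_inner();
    if index == usize::MAX { None } else { Some(input[index]) }
}
```

For example, on data = 0..1000, par_find_first(&data, 4, |&x| x > 500) returns Some(501) regardless of thread timing. In contrast, par_find_any may return whichever matching item a worker reaches first.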
- The nightly APIs for ParallelIterator on ranges have been migrated to the latest version of the std::iter::Step trait.
- Slightly changed the format of logging statements.
- Internal synchronization primitives are now aligned to a cache line via the CachePadded wrapper provided by the crossbeam-utils crate.
- Micro-optimized the layout of the shared context with worker threads, using a single Arc instead of one Arc per field.
- Code coverage now also includes the nightly APIs and logging statements up to the debug level.
- Reduced the number of iterations in tests to make them complete faster.
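The cache-line alignment and single-Arc layout changes can be sketched with std alone, under two assumptions. First, the CachePadded struct here is a simplified stand-in for crossbeam_utils::CachePadded: the real wrapper picks its alignment per target architecture, whereas this one hardcodes 128 bytes. Second, SharedContext and its fields are hypothetical, not paralight's actual context.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Simplified stand-in for crossbeam_utils::CachePadded: aligning each value
/// to 128 bytes (an assumed upper bound on the cache line size) keeps two
/// padded values off the same cache line, avoiding false sharing between cores.
#[repr(align(128))]
struct CachePadded<T>(T);

/// Hypothetical shared context: all fields live in one allocation behind a
/// single Arc, so each worker needs one reference-count increment instead of
/// one per field.
struct SharedContext {
    items_processed: CachePadded<AtomicUsize>,
    stop: CachePadded<AtomicBool>,
}

/// Spawns workers that each process up to `items_per_thread` items, stopping
/// early if the `stop` flag is raised (it never is in this demo).
fn run_workers(num_threads: usize, items_per_thread: usize) -> usize {
    let ctx = Arc::new(SharedContext {
        items_processed: CachePadded(AtomicUsize::new(0)),
        stop: CachePadded(AtomicBool::new(false)),
    });
    let handles: Vec<_> = (0..num_threads)
        .map(|_| {
            // One clone gives this worker access to every field.
            let ctx = Arc::clone(&ctx);
            thread::spawn(move || {
                for _ in 0..items_per_thread {
                    if ctx.stop.0.load(Ordering::Relaxed) {
                        break;
                    }
                    ctx.items_processed.0.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    ctx.items_processed.0.load(Ordering::Relaxed)
}
```

Because of the alignment, each padded field occupies a full 128-byte slot. With these two fields, SharedContext is 256 bytes: a modest memory cost in exchange for avoiding contention between unrelated atomics.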
- Support for mutable slices (producing mutable references to the items) and ranges (producing integers) as parallel sources.
- The ParallelSource trait, as an intermediate interface between parallel sources and iterators, with 7 adaptors: chain(), enumerate(), rev(), skip(), skip_exact(), take() and take_exact().
- 3 adaptors on tuples (up to 12 elements) of parallel sources, via the ZipableSource trait: zip_eq(), zip_max() and zip_min().
- A nightly feature for experimental APIs available only with a Rust nightly toolchain.
- A benchmark for element-wise addition of slices.
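One way to see why adaptors such as rev(), skip(), take(), enumerate() and the zip family can apply to a source before any iterator runs is to model a source as a length plus an index-to-item function. Under that model, each adaptor only rewrites the length and remaps indices, so any worker can fetch any item independently. The Source type and the free zip functions below are a hypothetical model under that assumption, not paralight's ParallelSource or ZipableSource traits. The zip_max() and zip_eq() behaviors shown (padding with None, panicking on a length mismatch) are also assumptions to check against the crate's documentation.

```rust
/// Hypothetical model of a parallel source: a length plus an index-to-item
/// function that any worker thread can call independently.
struct Source<T> {
    len: usize,
    fetch: Box<dyn Fn(usize) -> T + Send + Sync>,
}

impl<T: 'static> Source<T> {
    fn from_vec(items: Vec<T>) -> Self
    where
        T: Clone + Send + Sync,
    {
        let len = items.len();
        Source { len, fetch: Box::new(move |i: usize| items[i].clone()) }
    }

    /// Reverses the order by remapping index i to len - 1 - i.
    fn rev(self) -> Self {
        let (len, fetch) = (self.len, self.fetch);
        Source { len, fetch: Box::new(move |i: usize| fetch(len - 1 - i)) }
    }

    /// Skips up to n items by shifting every index by n.
    fn skip(self, n: usize) -> Self {
        let (len, fetch) = (self.len, self.fetch);
        let n = n.min(len);
        Source { len: len - n, fetch: Box::new(move |i: usize| fetch(i + n)) }
    }

    /// Keeps at most n items; indices are unchanged.
    fn take(self, n: usize) -> Self {
        Source { len: self.len.min(n), fetch: self.fetch }
    }

    /// Pairs each item with its index.
    fn enumerate(self) -> Source<(usize, T)> {
        let (len, fetch) = (self.len, self.fetch);
        Source { len, fetch: Box::new(move |i: usize| (i, fetch(i))) }
    }

    /// Serial driver for the demo; a parallel driver would instead split
    /// 0..len between worker threads.
    fn to_vec(&self) -> Vec<T> {
        (0..self.len).map(|i| (self.fetch)(i)).collect()
    }
}

/// Zips two sources, truncating to the shorter one.
fn zip_min<A: 'static, B: 'static>(a: Source<A>, b: Source<B>) -> Source<(A, B)> {
    let len = a.len.min(b.len);
    let (fa, fb) = (a.fetch, b.fetch);
    Source { len, fetch: Box::new(move |i: usize| (fa(i), fb(i))) }
}

/// Zips two sources of equal length (assumed to panic otherwise).
fn zip_eq<A: 'static, B: 'static>(a: Source<A>, b: Source<B>) -> Source<(A, B)> {
    assert_eq!(a.len, b.len, "zip_eq: sources have different lengths");
    zip_min(a, b)
}

/// Zips two sources up to the longer one, padding the shorter one with None.
fn zip_max<A: 'static, B: 'static>(
    a: Source<A>,
    b: Source<B>,
) -> Source<(Option<A>, Option<B>)> {
    let (la, lb) = (a.len, b.len);
    let (fa, fb) = (a.fetch, b.fetch);
    Source {
        len: la.max(lb),
        fetch: Box::new(move |i: usize| {
            (
                if i < la { Some(fa(i)) } else { None },
                if i < lb { Some(fb(i)) } else { None },
            )
        }),
    }
}
```

With this model, Source::from_vec(vec![1, 2, 3, 4, 5]).rev().skip(1).take(2) yields [4, 3]. No item is read until it is fetched by index.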
- Thread pools are now static rather than scoped: replaced the ThreadPoolBuilder::scope() function by ThreadPoolBuilder::build() and removed the 'scope lifetime parameter of the corresponding ThreadPool struct.
- The par_iter() and similar methods don't take a thread pool parameter anymore: a thread pool is attached later via the new with_thread_pool() method.
- Removed the IntoParallelIterator trait, in favor of the new traits IntoParallelSource, IntoParallelRefSource and IntoParallelRefMutSource.
- The benchmarks now correctly compute the throughput (they previously underestimated the size by one item).
- A parallel iterator trait implemented for slices, with 14 adaptors: cloned(), copied(), filter(), filter_map(), for_each(), inspect(), map(), max(), max_by(), max_by_key(), min(), min_by(), min_by_key() and reduce().
- A thread pool configuration to control whether worker threads are pinned to CPUs.
- The num_threads configuration in ThreadPoolBuilder now also has an AvailableParallelism variant to use the number of threads reported by std::thread::available_parallelism().
- The ThreadPool struct doesn't have an 'env lifetime anymore, only a 'scope lifetime.
- Tests for non-Sync types now use Cell rather than a synthetic type, allowing them to run on stable Rust.
- Tests now use the available parallelism (rather than a fixed number of threads) and don't pin worker threads to CPUs. Miri tests are configured to use 4 CPUs. Benchmarks still use a set number of threads and CPU pinning.
- Removed the ThreadPool::pipeline() function from the public API.
- Clarified that the input length cannot exceed u32::MAX in the WorkStealing configuration. Trying to use a larger input now panics rather than processing a smaller subset of items.
- Fixed possible integer overflows in WorkStealing mode when some intermediate results exceeded u32::MAX. This could notably have happened if the input length multiplied by max(2, number of threads) exceeded u32::MAX.
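The AvailableParallelism option and the two WorkStealing fixes rest on some simple arithmetic, sketched here with std only. ThreadCount, work_stealing_len and scaled_len are hypothetical names, not paralight's code. The sketch checks the input length against u32::MAX up front, then computes the product len * max(2, num_threads) in u64. For example, a 2^31-item input multiplied by 4 threads overflows u32 but fits in u64.

```rust
use std::convert::TryFrom;
use std::num::NonZeroUsize;
use std::thread;

/// Hypothetical mirror of the num_threads setting: a fixed count, or whatever
/// std::thread::available_parallelism() reports.
enum ThreadCount {
    AvailableParallelism,
    Count(NonZeroUsize),
}

impl ThreadCount {
    fn resolve(&self) -> NonZeroUsize {
        match self {
            // available_parallelism() can fail; fall back to one thread.
            ThreadCount::AvailableParallelism => {
                thread::available_parallelism().unwrap_or(NonZeroUsize::new(1).unwrap())
            }
            ThreadCount::Count(n) => *n,
        }
    }
}

/// Panics if the input length doesn't fit in u32, instead of silently
/// processing a truncated subset of the items.
fn work_stealing_len(len: usize) -> u32 {
    u32::try_from(len).expect("input length exceeds u32::MAX in WorkStealing mode")
}

/// Computes len * max(2, num_threads) in u64 rather than u32; checked_mul
/// reports (as None) any product that would still overflow u64.
fn scaled_len(len: u32, num_threads: NonZeroUsize) -> Option<u64> {
    let factor = u64::try_from(num_threads.get().max(2)).ok()?;
    u64::from(len).checked_mul(factor)
}
```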
- Support for running different pipelines on the same thread pool.
- Support for using a local input slice, and capturing local variables in the pipeline functions.
- Benchmarks of summing a slice of integers, compared with serial iterators and with rayon, using divan and criterion.
- Changed the main API to create a thread pool: there is now a ThreadPoolBuilder from which to spawn a scoped thread pool.
- Changed the main API to run a pipeline on a thread pool: the input slice and pipeline functions are no longer captured by the whole thread pool, but are local parameters of the pipeline() function.
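Passing the input slice and pipeline functions as local parameters of a single call, rather than capturing them in the whole thread pool, can be sketched with std::thread::scope. That API lets worker threads borrow local data for the duration of one call. The pipeline function, its init/process/reduce parameters and the static chunking below are hypothetical and do not reproduce paralight's pipeline() signature. The sketch also assumes that init() returns an identity element for reduce, since it seeds both each worker and the final combination. The usage mirrors the slice-summing benchmark.

```rust
use std::thread;

/// Hypothetical pipeline: each worker folds one contiguous chunk of `input`
/// starting from `init()`, then the per-worker results are combined with
/// `reduce`. The slice and closures are only borrowed for this call.
fn pipeline<T, Acc>(
    input: &[T],
    num_threads: usize,
    init: impl Fn() -> Acc + Sync,
    process: impl Fn(Acc, &T) -> Acc + Sync,
    reduce: impl Fn(Acc, Acc) -> Acc,
) -> Acc
where
    T: Sync,
    Acc: Send,
{
    let n = num_threads.max(1);
    let chunk_len = ((input.len() + n - 1) / n).max(1);
    let (init_ref, process_ref) = (&init, &process);
    let partials: Vec<Acc> = thread::scope(|s| {
        let handles: Vec<_> = input
            .chunks(chunk_len)
            .map(|chunk| {
                s.spawn(move || chunk.iter().fold(init_ref(), |acc, item| process_ref(acc, item)))
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    // Combine the per-worker results on the calling thread.
    partials.into_iter().fold(init(), reduce)
}
```

For example, pipeline(&data, 4, || 0u64, |acc, &x| acc + x, |a, b| a + b) sums a local Vec<u64>. The closures may also capture local variables, such as an offset added to every item.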
- Initial version.