diff --git a/dev/api-dagger/functions/index.html b/dev/api-dagger/functions/index.html
Functions and Macros · Dagger.jl

Dagger Functions

Task Functions/Macros

Dagger.@spawnMacro
@spawn [opts] f(args...) -> Thunk

Convenience macro like Dagger.@par, but eagerly executed from the moment it's called (equivalent to spawn).

See the docs for @par for more information and usage examples.

source
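
For example, a minimal sketch of eager task spawning (assuming a Julia session with Dagger loaded):

using Dagger

a = Dagger.@spawn 1 + 2   # starts executing immediately in the background
b = Dagger.@spawn 3 * 4
c = Dagger.@spawn a + b   # EagerThunks may be passed directly as arguments
fetch(c)                  # waits for completion and returns 15
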
Dagger.spawnFunction
spawn(f, args...; kwargs...) -> EagerThunk

Spawns a task with f as the function, args as the arguments, and kwargs as the keyword arguments, returning an EagerThunk. Uses a scheduler running in the background to execute code.

source
Dagger.delayedFunction
delayed(f, options=Options())(args...; kwargs...) -> Thunk
delayed(f; options...)(args...; kwargs...) -> Thunk

Creates a Thunk object which can be executed later, which will call f with args and kwargs. options controls various properties of the resulting Thunk.

source
Dagger.@parMacro
@par [opts] f(args...; kwargs...) -> Thunk

Convenience macro to call Dagger.delayed on f with arguments args and keyword arguments kwargs. May also be called with a series of assignments like so:

x = @par begin
     a = f(1,2)
     b = g(a,3)
     h(a,b)
end

x will hold the Thunk representing h(a,b); additionally, a and b will be defined in the same local scope and will be equally accessible for later calls.

Options to the Thunk can be set as opts with namedtuple syntax, e.g. single=1. Multiple options may be provided, and will be applied to all generated thunks.

source
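
For instance, a sketch of attaching an option to a lazily-constructed task (single=1 pins execution to worker 1, assuming that worker exists):

x = Dagger.@par single=1 sum(rand(100))
collect(x)  # computes the Thunk and returns its result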

Task Options Functions/Macros

Dagger.with_optionsFunction
with_options(f, options::NamedTuple) -> Any
with_options(f; options...) -> Any

Sets one or more options to the given values, executes f(), resets the options to their previous values, and returns the result of f(). This is the recommended way to set options, as it only affects tasks spawned within its scope. Note that setting an option here will propagate its value across Julia or Dagger tasks spawned by f() or its callees (i.e. the options propagate).

source
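
A sketch of typical usage, assuming a worker with ID 2 exists:

using Distributed  # for myid

Dagger.with_options(;scope=Dagger.scope(worker=2)) do
    # Tasks spawned here (and by callees) inherit the scope option
    fetch(Dagger.@spawn myid())  # returns 2
end
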
Dagger.get_optionsFunction
get_options(key::Symbol, default) -> Any
get_options(key::Symbol) -> Any

Returns the value of the option named key. If the option does not have a value set, then an error will be thrown, unless default is set, in which case it will be returned instead of erroring.

get_options() -> NamedTuple

Returns a NamedTuple of all option key-value pairs.

source
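
A sketch combining with_options and get_options (myoption is an arbitrary illustrative key):

Dagger.with_options(;myoption=42) do
    Dagger.get_options(:myoption)  # returns 42
end
Dagger.get_options(:myoption, 0)   # unset out here, so the default 0 is returned
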
Dagger.@optionMacro
@option name myfunc(A, B, C) = value

A convenience macro for defining default_option. For example:

Dagger.@option single mylocalfunc(Int) = 1

The above call will set the single option to 1 for any Dagger task calling mylocalfunc(Int) with an Int argument.

source
Dagger.default_optionFunction
default_option(::Val{name}, Tf, Targs...) where name = value

Defines the default value for option name to value when Dagger is preparing to execute a function with type Tf and argument types Targs. Users and libraries may override this to set default values for tasks.

An easier way to define these defaults is with @option.

Note that the actual task's argument values are not passed, as it may not always be possible or efficient to gather all Dagger task arguments on one worker.

This function may be executed within the scheduler, so it should generally be made very cheap to execute. If the function throws an error, the scheduler will use whatever the global default value is for that option instead.

source
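
As a sketch, the @option example above expands to roughly the following hand-written method (the exact expansion may differ):

Dagger.default_option(::Val{:single}, ::Type{typeof(mylocalfunc)}, ::Type{Int}) = 1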

Data Management Functions

Dagger.tochunkFunction
tochunk(x, proc::Processor, scope::AbstractScope; device=nothing, kwargs...) -> Chunk

Create a chunk from data x which resides on proc and which has scope scope.

device specifies a MemPool.StorageDevice (which is itself wrapped in a Chunk) which will be used to manage the reference contained in the Chunk generated by this function. If device is nothing (the default), the data will be inspected to determine if it's safe to serialize; if so, the default MemPool storage device will be used; if not, then a MemPool.CPURAMDevice will be used.

All other kwargs are passed directly to MemPool.poolset.

source
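
For example, a sketch of wrapping a local array in a Chunk and passing it to a task:

data = rand(4, 4)
c = Dagger.tochunk(data, Dagger.OSProc(), Dagger.AnyScope())
fetch(Dagger.@spawn sum(c))  # the task receives the unwrapped array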

Dagger.shardFunction
shard(f; kwargs...) -> Chunk{Shard}

Executes f on all workers in workers, wrapping the result in a process-scoped Chunk, and constructs a Chunk{Shard} containing all of these Chunks on the current worker.

Keyword arguments:

  • procs – The list of processors to create pieces on. May be any iterable container of Processors.
  • workers – The list of workers to create pieces on. May be any iterable container of Integers.
  • per_thread::Bool=false – If true, creates a piece per thread, rather than per worker.
source
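
A sketch of typical usage, mirroring the counter example from the Data Management docs:

# One atomic counter per thread on every worker
cs = Dagger.shard(()->Threads.Atomic{Int}(0); per_thread=true)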

Scope Functions

Dagger.scopeFunction
scope(scs...) -> AbstractScope
 scope(;scs...) -> AbstractScope

Constructs an AbstractScope from a set of scope specifiers. Each element in scs is a separate specifier; if scs is empty, an empty UnionScope() is produced; if scs has one element, then exactly one specifier is constructed; if scs has more than one element, a UnionScope of the scopes specified by scs is constructed. A variety of specifiers can be passed to construct a scope:

  • :any - Constructs an AnyScope()
  • :default - Constructs a DefaultScope()
  • (scs...,) - Constructs a UnionScope of scopes, each specified by scs
  • thread=tid or threads=[tids...] - Constructs an ExactScope or UnionScope containing all Dagger.ThreadProcs with thread ID tid/tids across all workers.
  • worker=wid or workers=[wids...] - Constructs a ProcessScope or UnionScope containing all Dagger.ThreadProcs with worker ID wid/wids across all threads.
  • thread=tid/threads=tids and worker=wid/workers=wids - Constructs an ExactScope, ProcessScope, or UnionScope containing all Dagger.ThreadProcs with worker ID wid/wids and threads tid/tids.
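
For example, a few sketches of the built-in specifiers (worker and thread IDs assume a multi-worker session):

Dagger.scope(:any)                # AnyScope()
Dagger.scope(worker=2)            # all threads on worker 2
Dagger.scope(worker=2, thread=3)  # exactly thread 3 on worker 2
Dagger.scope(workers=[2, 3])      # all threads on workers 2 and 3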

Aside from the worker and thread specifiers, it's possible to add custom specifiers for scoping to other kinds of processors (like GPUs) or providing different ways to specify a scope. Specifier selection is determined by a precedence ordering: by default, all specifiers have precedence 0, which can be changed by defining scope_key_precedence(::Val{spec}) = precedence (where spec is the specifier as a Symbol). The specifier with the highest precedence in a set of specifiers is used to determine the scope by calling to_scope(::Val{spec}, sc::NamedTuple) (where sc is the full set of specifiers), which should be overridden for each custom specifier, and which returns an AbstractScope. For example:

# Setup a GPU specifier
 Dagger.scope_key_precedence(::Val{:gpu}) = 1
 Dagger.to_scope(::Val{:gpu}, sc::NamedTuple) = ExactScope(MyGPUDevice(sc.worker, sc.gpu))
 
 # Generate an `ExactScope` for `MyGPUDevice` on worker 2, device 3
Dagger.scope(gpu=3, worker=2)
source
Dagger.constrainFunction
constrain(x::AbstractScope, y::AbstractScope) -> ::AbstractScope

Constructs a scope that is the intersection of scopes x and y.

source
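
A sketch, assuming a worker with ID 2 exists:

# Intersect "anything on worker 2" with "thread 1 on any worker";
# the result covers only thread 1 on worker 2
sc = Dagger.constrain(Dagger.scope(worker=2), Dagger.scope(thread=1))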

Lazy Task Functions

Dagger.domainFunction
domain(x::T)

Returns metadata about x. This metadata will be in the domain field of a Chunk object when an object of type T is created as the result of evaluating a Thunk.

source
Dagger.computeFunction
compute(ctx::Context, d::Thunk; options=nothing) -> Chunk

Compute a Thunk - creates the DAG, assigns ranks to nodes for tie breaking and runs the scheduler with the specified options. Returns a Chunk which references the result.

source
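
For example, a sketch of the lazy API end-to-end:

t = delayed(+)(1, 2)  # build a Thunk
c = compute(t)        # run it; returns a Chunk referencing the result
collect(c)            # fetch the value: 3
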
Dagger.dependentsFunction
dependents(node::Thunk) -> Dict{Union{Thunk,Chunk}, Set{Thunk}}

Find the set of direct dependents for each task.

source
Dagger.noffspringFunction
noffspring(dpents::Dict{Union{Thunk,Chunk}, Set{Thunk}}) -> Dict{Thunk, Int}

Recursively find the number of tasks dependent on each task in the DAG. Takes a Dict as returned by dependents.

source
Dagger.orderFunction
order(node::Thunk, ndeps) -> Dict{Thunk,Int}

Given a root node of the DAG, calculates a total order for tie-breaking.

  • Root node gets score 1,
  • the remaining nodes are explored in DFS fashion, with the chunks of each node explored in order of noffspring, i.e. the total number of tasks depending on the result of that node.


source

Processor Functions

Dagger.execute!Function
execute!(proc::Processor, f, args...; kwargs...) -> Any

Executes the function f with arguments args and keyword arguments kwargs on processor proc. This function can be overloaded by Processor subtypes to allow executing function calls differently than normal Julia.

source
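
A sketch of a custom-processor overload; MyProc is a hypothetical processor type:

struct MyProc <: Dagger.Processor end

function Dagger.execute!(proc::MyProc, f, args...; kwargs...)
    @info "MyProc running $f"
    return f(args...; kwargs...)
end
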
Dagger.iscompatibleFunction
iscompatible(proc::Processor, opts, f, Targs...) -> Bool

Indicates whether proc can execute f over Targs given opts. Processor subtypes should overload this function to return true if and only if it is essentially guaranteed that f(::Targs...) is supported. Additionally, iscompatible_func and iscompatible_arg can be overridden to determine compatibility of f and Targs individually. The default implementation returns false.

source
Dagger.default_enabledFunction
default_enabled(proc::Processor) -> Bool

Returns whether processor proc is enabled by default. The default value is false, meaning the processor is opted out of execution unless specifically requested by the user; true opts it in, causing the processor to always participate in execution when possible.

source
Dagger.get_processorsFunction
get_processors(proc::Processor) -> Set{<:Processor}

Returns the set of processors contained in proc, if any. Processor subtypes should overload this function if they can contain sub-processors. The default method will return a Set containing proc itself.

source
Dagger.get_parentFunction
get_parent(proc::Processor) -> Processor

Returns the parent processor for proc. The ultimate parent processor is an OSProc. Processor subtypes should overload this to return their most direct parent.

source
Dagger.moveFunction
move(from_proc::Processor, to_proc::Processor, x)

Moves and/or converts x such that it's available and suitable for usage on the to_proc processor. This function can be overloaded by Processor subtypes to transport arguments and convert them to an appropriate form before being used for execution. Subtypes of Processor wishing to implement efficient data movement should provide implementations where x::Chunk.

source
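
A sketch of a custom overload; MyGPUProc and MyGPUArray are hypothetical stand-ins for a real accelerator package's types:

struct MyGPUProc <: Dagger.Processor end
struct MyGPUArray{T,N}
    data::Array{T,N}
end

# Convert plain arrays into the accelerator's array type on transfer
Dagger.move(from_proc::Dagger.Processor, to_proc::MyGPUProc, x::Array) =
    MyGPUArray(x)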

Context Functions

Dagger.addprocs!Function
addprocs!(ctx::Context, xs)

Add new workers xs to ctx.

Workers will typically be assigned new tasks in the next scheduling iteration if scheduling is ongoing.

Workers can be either Processors or the underlying process IDs as Integers.

source
Dagger.rmprocs!Function
rmprocs!(ctx::Context, xs)

Remove the specified workers xs from ctx.

Workers will typically finish all their assigned tasks if scheduling is ongoing but will not be assigned new tasks after removal.

Workers can be either Processors or the underlying process IDs as Integers.

source
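
A sketch of adding and removing workers from a running context:

using Distributed

ctx = Dagger.Context()
new_pids = addprocs(2)           # start two extra Julia processes
Dagger.addprocs!(ctx, new_pids)  # make them available to this context
# ... run some work ...
Dagger.rmprocs!(ctx, new_pids)   # drain and remove them again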

Thunk Execution Environment Functions

These functions are used within the function called by a Thunk.

Dynamic Scheduler Control Functions

These functions query and control the scheduler remotely.

Base.fetchFunction

Waits on a thunk to complete, and fetches its result.

source
Base.fetch(c::DArray)

If a DArray tree has a Thunk in it, make the whole thing a big thunk.

source
diff --git a/dev/api-dagger/types/index.html b/dev/api-dagger/types/index.html
Types · Dagger.jl

Dagger Types

Task Types

Dagger.ThunkType
Thunk

Wraps a callable object to be run with Dagger. A Thunk is typically created through a call to delayed or its macro equivalent @par.

Constructors

delayed(f; kwargs...)(args...)
 @par [option=value]... f(args...)

Examples

julia> t = delayed(sin)(π)  # creates a Thunk to be computed later
 Thunk(sin, (π,))
 
 julia> collect(t)  # computes the result and returns it to the current process
1.2246467991473532e-16

Arguments

  • f: The function to be called upon execution of the Thunk.
  • args: The arguments to be passed to the Thunk.
  • kwargs: The properties describing unique behavior of this Thunk. Details for each property are described in the next section.

  • option=value: The same as passing kwargs to delayed.

Public Properties

  • meta::Bool=false: If true, instead of fetching cached arguments from Chunks and passing the raw arguments to f, pass the Chunk itself. Useful for doing manual fetching or manipulation of Chunk references. Non-Chunk arguments are still passed as-is.

  • processor::Processor=OSProc() - The processor associated with f. Useful if f is a callable struct that exists on a given processor and should be transferred appropriately.

  • scope::Dagger.AbstractScope=DefaultScope() - The scope associated with f. Useful if f is a function or callable struct that may only be transferred to, and executed within, the specified scope.

Options

  • options: A Sch.ThunkOptions struct providing the options for the Thunk. If omitted, options can also be specified by passing key-value pairs as kwargs.

source
Dagger.EagerThunkType
EagerThunk

Returned from spawn/@spawn calls. Represents a task that is in the scheduler, potentially ready to execute, executing, or finished executing. May be fetch'd or wait'd on at any time.

source

Task Options Types

Options
 Sch.ThunkOptions
 Sch.SchedulerOptions

Data Management Types

Chunk
Shard

Processor Types

Dagger.ProcessorType
Processor

An abstract type representing a processing device and associated memory, where data can be stored and operated on. Subtypes should be immutable, and instances should compare equal if they represent the same logical processing device/memory. Subtype instances should be serializable between different nodes. Subtype instances may contain a "parent" Processor to make it easy to transfer data to/from other types of Processor at runtime.

source
Dagger.OSProcType
OSProc <: Processor

Julia CPU (OS) process, identified by Distributed pid. The logical parent of all processors on a given node, but otherwise does not participate in computations.

source
Dagger.ThreadProcType
ThreadProc <: Processor

Julia CPU (OS) thread, identified by Julia thread ID.

source

Scope Types

Dagger.AnyScopeType

Widest scope that contains all processors.

source
Dagger.NodeScopeType

Scoped to the same physical node.

source
Dagger.ProcessScopeType

Scoped to the same OS process.

source
Dagger.ProcessorTypeScopeFunction

Scoped to any processor with a given supertype.

source
Dagger.TaintScopeType

Taints a scope for later evaluation.

source
Dagger.UnionScopeType

Union of two or more scopes.

source
Dagger.ExactScopeType

Scoped to a specific processor.

source
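
These scopes can also be constructed directly; a sketch (worker and thread IDs are illustrative):

ps = Dagger.ProcessScope(2)                      # everything on worker 2
es = Dagger.ExactScope(Dagger.ThreadProc(2, 1))  # thread 1 on worker 2
us = Dagger.UnionScope(ps, es)                   # either of the above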

Context Types

Dagger.ContextType
Context(xs::Vector{OSProc}) -> Context
Context(xs::Vector{Int}) -> Context

Create a Context, by default adding each available worker.

It is also possible to create a Context from a vector of OSProc, or equivalently by passing the underlying process IDs directly as a Vector{Int}.

Special fields include:

  • log_sink: A log sink object to use, if any.
  • log_file::Union{String,Nothing}: Path to logfile. If specified, at scheduler termination, logs will be collected, combined with input thunks, and written out in DOT format to this location.

  • profile::Bool: Whether or not to perform profiling with Profile stdlib.
source

Array Types

Dagger.DArrayType
DArray{T,N,F}(domain, subdomains, chunks, concat)
DArray(T, domain, subdomains, chunks, [concat=cat])

An N-dimensional distributed array of element type T, with a concatenation function of type F.

Arguments

  • T: element type
  • domain::ArrayDomain{N}: the whole ArrayDomain of the array
  • subdomains::AbstractArray{ArrayDomain{N}, N}: a DomainBlocks of the same dimensions as the array
  • chunks::AbstractArray{Union{Chunk,Thunk}, N}: an array of chunks of dimension N
  • concat::F: a function of type F. concat(x, y; dims=d) takes two chunks x and y and concatenates them along dimension d. cat is used by default.
source
Dagger.BlocksType
Blocks(xs...)

Indicates the size of an array operation, specified as xs, whose length indicates the number of dimensions in the resulting array.

source
Dagger.ArrayDomainType
ArrayDomain{N}

An N-dimensional domain over an array.

source
Dagger.UnitDomainType
UnitDomain

Default domain – has no information about the value

source

Logging Event Types

Dagger.Events.BytesAllocdType
BytesAllocd

Tracks memory allocated for Chunks.

source
Dagger.Events.ProcessorSaturationType
ProcessorSaturation

Tracks the compute saturation (running tasks) per-processor.

source
Dagger.Events.WorkerSaturationType
WorkerSaturation

Tracks the compute saturation (running tasks).

source
diff --git a/dev/api-daggerwebdash/functions/index.html b/dev/api-daggerwebdash/functions/index.html
Functions and Macros · Dagger.jl
diff --git a/dev/api-daggerwebdash/types/index.html b/dev/api-daggerwebdash/types/index.html
Types · Dagger.jl

DaggerWebDash Types

Logging Event Types

DaggerWebDash.D3RendererType
D3Renderer(port::Int, port_range::UnitRange; seek_store=nothing) -> D3Renderer

Constructs a D3Renderer, which is a TimespanLogging aggregator which renders the logs over HTTP using the d3.js library. port is the port that will be serving the HTTP website. port_range specifies a range of ports that will be used to listen for connections from other Dagger workers. seek_store, if specified, is a Tables.jl-compatible object that logs will be written to and read from. This table can be written to disk and then re-read later for offline log analysis.

source
DaggerWebDash.TableStorageType
TableStorage

LogWindow-compatible aggregator which stores logs in a Tables.jl-compatible sink.

Using a TableStorage is reasonably simple:

ml = TimespanLogging.MultiEventLog()
 
 ... # Add some events
 
# ...
 ml.aggregators[:lw] = lw
 
 # Logs will now be saved into `df` automatically, and packages like
# DaggerWebDash.jl will automatically use it to retrieve subsets of the logs.
source
diff --git a/dev/api-timespanlogging/functions/index.html b/dev/api-timespanlogging/functions/index.html
Functions and Macros · Dagger.jl

TimespanLogging Functions

Basic Functions

TimespanLogging.timespan_startFunction
timespan_start(ctx, category::Symbol, id, tl)

Generates an Event{:start} which denotes the start of an event. The event is categorized by category, and uniquely identified by id; these two must be the same as those later passed to timespan_finish to close the event. tl is the "timeline" of the event, which is just an arbitrary payload attached to the event.

source
TimespanLogging.timespan_finishFunction
timespan_finish(ctx, category::Symbol, id, tl)

Generates an Event{:finish} which denotes the end of an event. The event is categorized by category, and uniquely identified by id; these two must be the same as previously passed to timespan_start. tl is the "timeline" of the event, which is just an arbitrary payload attached to the event.

source
TimespanLogging.get_logs!Function
get_logs!(::LocalEventLog, raw=false; only_local=false) -> Union{Vector{Timespan},Vector{Event}}

Get the logs from each process' local event log, clearing it in the process. Set raw to true to get potentially unmatched Events; the default is to return only matched events as Timespans. If only_local is set to true, only process-local logs will be fetched; the default is to fetch logs from all processes.

source
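
A sketch of attaching a LocalEventLog to a Dagger Context and collecting matched timespans:

using Dagger, TimespanLogging

ctx = Dagger.Context()
ctx.log_sink = TimespanLogging.LocalEventLog()
# ... run some Dagger work against `ctx` ...
logs = TimespanLogging.get_logs!(ctx.log_sink)  # Vector{Timespan}; log is cleared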

Logging Metric Functions

diff --git a/dev/api-timespanlogging/types/index.html b/dev/api-timespanlogging/types/index.html
Types · Dagger.jl

TimespanLogging Types

Log Sink Types

TimespanLogging.MultiEventLogType
MultiEventLog

Processes events immediately, generating multiple log streams. Multiple consumers may register themselves in the MultiEventLog, and when accessed, log events will be provided to all consumers. A consumer is simply a function or callable struct which will be called with an event when it's generated. The return value of the consumer will be pushed into a log stream dedicated to that consumer. Errors thrown by consumers will be caught and rendered, but will not otherwise interrupt consumption by other consumers, or future consumption cycles. An error will result in nothing being appended to that consumer's log.

source
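
A sketch of registering consumers (the consumer names are arbitrary; CoreMetrics and IDMetrics are assumed to be available under TimespanLogging.Events):

using Dagger, TimespanLogging

ml = TimespanLogging.MultiEventLog()
ml[:core] = TimespanLogging.Events.CoreMetrics()
ml[:id] = TimespanLogging.Events.IDMetrics()
ctx = Dagger.Context()
ctx.log_sink = ml  # attach to a Context as usual
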
TimespanLogging.LocalEventLogType
LocalEventLog

Stores events in a process-local array. Accessing the logs is all-or-nothing; if multiple consumers call get_logs!, they will get different sets of logs.

source

Event Types

Built-in Event Types


diff --git a/dev/benchmarking/index.html b/dev/benchmarking/index.html
Benchmarking · Dagger.jl

Benchmarking Dagger

For ease of benchmarking changes to Dagger's scheduler and the DArray, a benchmarking script exists at benchmarks/benchmark.jl. This script currently allows benchmarking a non-negative matrix factorization (NNMF) algorithm, which we've found to be a good evaluator of scheduling performance. The benchmark script can test with and without Dagger, and also has support for using CUDA or AMD GPUs to accelerate the NNMF via DaggerGPU.jl.

The script checks for a number of environment variables, which are used to control the benchmarks that are performed (all of which are optional):

  • BENCHMARK_PROCS: Selects the number of Julia processes and threads to start up. Specified as 8:4, this option would start 8 extra Julia processes, with 4 threads each. Defaults to 2 processes with 1 thread each.
  • BENCHMARK_REMOTES: Specifies a colon-separated list of remote servers to connect to and start Julia processes on, using BENCHMARK_PROCS to indicate the processor/thread configuration of those remotes. Disabled by default (uses the local machine).
  • BENCHMARK_OUTPUT_FORMAT: Selects the output format for benchmark results. Defaults to jls, which uses Julia's Serialization stdlib, and can also be jld to use JLD.jl.
  • BENCHMARK_RENDER: Configures rendering, which is disabled by default. Can be "live" or "offline", which are explained below.
  • BENCHMARK: Specifies the set of benchmarks to run as a comma-separated list, where each entry can be one of cpu, cuda, or amdgpu, and may optionally append +dagger (like cuda+dagger) to indicate whether or not to use Dagger. Defaults to cpu,cpu+dagger, which runs CPU benchmarks with and without Dagger.
  • BENCHMARK_SCALE: Determines how much to scale the benchmark sizing by, typically specified as a UnitRange{Int}. Defaults to 1:5:50, which runs each scale from 1 to 50, in steps of 5.
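
For example, a sketch of configuring and running the script from within a Julia session:

ENV["BENCHMARK_PROCS"] = "8:4"       # 8 extra processes with 4 threads each
ENV["BENCHMARK"] = "cpu,cpu+dagger"  # CPU benchmarks, without and with Dagger
ENV["BENCHMARK_SCALE"] = "1:5:50"
include(joinpath("benchmarks", "benchmark.jl"))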

Rendering with BENCHMARK_RENDER

Dagger contains visualization code for the scheduler (as a Gantt chart) and thunk execution profiling (flamechart), which can be enabled with BENCHMARK_RENDER. Additionally, rendering can be done "live", served via a Mux.jl webserver run locally, or "offline", where the visualization will be embedded into the results output file. By default, rendering is disabled. If BENCHMARK_RENDER is set to live, a Mux webserver is started at localhost:8000 (the address is not yet configurable), and the Gantt chart and profiling flamechart will be rendered once the benchmarks start. If set to offline, data visualization will happen in the background, and will be saved into the results file.

Note that Gantt chart and flamechart output is only generated and relevant during Dagger execution.

TODO: Plotting

diff --git a/dev/checkpointing/index.html b/dev/checkpointing/index.html
Checkpointing · Dagger.jl

Checkpointing

If at some point during a Dagger computation a thunk throws an error, or if the entire computation dies because the head node hit an OOM or other unexpected error, the entire computation is lost and needs to be started from scratch. This can be unacceptable for scheduling very large/expensive/mission-critical graphs, and for interactive development where errors are common and easily fixable.

Robust applications often support "checkpointing", where intermediate results are periodically written out to persistent media, or sharded to the rest of the cluster, to allow resuming an interrupted computation from a point later than the original start. Dagger provides infrastructure to perform user-driven checkpointing of intermediate results once they're generated.

As a concrete example, imagine that you're developing a numerical algorithm, and distributing it with Dagger. The idea is to sum all the values in a very big matrix, and then get the square root of the absolute value of the sum of sums. Here is what that might look like:

X = compute(randn(Blocks(128,128), 1024, 1024))
 Y = [delayed(sum)(chunk) for chunk in X.chunks]
 inner(x...) = sqrt(sum(x))
 Z = delayed(inner)(Y...)
# ...
     open("checkpoint-final.bin", "r") do io
         Dagger.tochunk(deserialize(io))
     end
end))

In this case, the entire computation will be skipped if checkpoint-final.bin exists!
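
For reference, a minimal sketch of the per-thunk checkpoint/restore options from the same pattern (using the Serialization stdlib; checkpoint-Z.bin is an illustrative filename, and the full version of this pattern is the elided portion of the example above):

using Serialization

Z = delayed(inner;
    checkpoint=(thunk, result)->begin
        open("checkpoint-Z.bin", "w") do io
            serialize(io, collect(result))
        end
    end,
    restore=(thunk)->begin
        open("checkpoint-Z.bin", "r") do io
            Dagger.tochunk(deserialize(io))
        end
    end)(Y...)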

diff --git a/dev/darray/index.html b/dev/darray/index.html
Distributed Arrays · Dagger.jl

Distributed Arrays

The DArray, or "distributed array", is an abstraction layer on top of Dagger that allows loading array-like structures into a distributed environment. The DArray partitions a larger array into smaller "blocks" or "chunks", and those blocks may be located on any worker in the cluster. The DArray uses a Parallel Global Address Space (aka "PGAS") model for storing partitions, which means that a DArray instance contains a reference to every partition in the greater array; this provides great flexibility in allowing Dagger to choose the most efficient way to distribute the array's blocks and operate on them in a heterogeneous manner.

Aside: an alternative model, here termed the "MPI" model, is not yet supported, but would allow storing only a single partition of the array on each MPI rank in an MPI cluster. DArray support for this model is planned in the near future.

This should not be confused with the DistributedArrays.jl package.

Creating DArrays

A DArray can be created in two ways: through an API similar to the usual rand, ones, etc. calls, or by distributing an existing array with distribute. It's generally not recommended to manually construct a DArray object unless you're developing the DArray itself.
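
For instance, a sketch of distributing an existing array:

A = rand(8, 8)
DA = distribute(A, Blocks(4, 4))  # partition into 4x4 blocks
collect(DA) == A                  # true; collect gathers the full array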

Allocating new arrays

As an example, one can allocate a random DArray by calling rand with a Blocks object as the first argument - Blocks specifies the size of partitions to be constructed, and must have the same number of dimensions as the array being allocated.

# Add some Julia workers
 julia> using Distributed; addprocs(6)
 6-element Vector{Int64}:
  2
⋮
  0.1046     3.65967   1.62098     5.33185   0.0822769     3.30334     5.90173    4.06603    5.00789   4.40601
  1.9622     0.755491  2.12264     1.67299   2.34482       4.50632     3.84387    3.22232    5.23164   2.97735
  4.37208    5.15253   0.346373    2.98573   5.48589       0.336134    2.25751    2.39057    1.97975   3.24243
 3.83293    1.69017   3.00189     1.80388   3.43671       5.94085     1.27609    3.98737    0.334963  5.84865

A variety of other operations exist on the DArray, and it should otherwise behave similarly to any other AbstractArray type. If you find that it's missing an operation that you need, please file an issue!

diff --git a/dev/data-management/index.html b/dev/data-management/index.html
Data Management · Dagger.jl

Data Management

Dagger is not just a computing platform - it also has awareness of where each piece of data resides, and will move data between workers and perform conversions as necessary to satisfy the needs of your tasks.

Chunks

Dagger often needs to move data between workers to allow a task to execute. To make this efficient when communicating potentially large units of data, Dagger uses a remote reference, called a Chunk, to refer to objects which may exist on another worker. Chunks are backed by a distributed refcounting mechanism provided by MemPool.jl, which ensures that the referenced data is not garbage collected until all Chunks referencing that object are GC'd from all workers.

Conveniently, if you pass in a Chunk object as an input to a Dagger task, then the task's payload function will get executed with the value contained in the Chunk. The scheduler also understands Chunks, and will try to schedule tasks close to where their Chunk inputs reside, to reduce communication overhead.
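
A sketch of this behavior:

c = Dagger.tochunk(ones(4))  # wrap a value in a Chunk
fetch(Dagger.@spawn sum(c))  # the task sees the array itself and returns 4.0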

Chunks also have a cached type, a "processor", and a "scope", which are important for identifying the type of the object, where in memory (CPU RAM, GPU VRAM, etc.) the value resides, and where the value is allowed to be transferred and dereferenced. See Processors and Scopes for more details on how these properties can be used to control scheduling behavior around Chunks.

Mutation

Normally, Dagger tasks should be functional and "pure": never mutating their inputs, always producing identical outputs for a given set of inputs, and never producing side effects which might affect future program behavior. However, for certain codes, this restriction ends up costing the user performance and engineering time to work around.

Thankfully, Dagger provides the Dagger.@mutable macro for just this purpose. @mutable allows data to be marked such that it will never be copied or serialized by the scheduler (unless copied by the user). When used as an argument to a task, the task will be forced to execute on the same worker that @mutable was called on. For example:

Dagger.@mutable worker=2 Threads.Atomic{Int}(0)
 x::Dagger.Chunk # The result is always a `Chunk`
 
 # x is now considered mutable, and may only be accessed on worker 2:
# ...
 wait.([Dagger.@spawn Threads.atomic_add!(cs, 1) for i in 1:1000])
 
 # And let's fetch the total sum of all counters:
@assert sum(map(ctr->fetch(ctr)[], cs)) == 1000

Note that map, when used on a shard, will execute the provided function once per shard "piece", and each result is considered immutable. map is an easy way to make a copy of each piece of the shard, to be later reduced, scanned, etc.

Further details about what arguments can be passed to @shard/shard can be found in Data Management Functions.

diff --git a/dev/datadeps/index.html b/dev/datadeps/index.html new file mode 100644

Datadeps · Dagger.jl

Datadeps (Data Dependencies)

For many programs, the restriction that tasks cannot write to their arguments feels overly restrictive and makes certain kinds of programs (such as in-place linear algebra) hard to express efficiently in Dagger. Thankfully, there is a solution: spawn_datadeps. This function constructs a "datadeps region", within which tasks are allowed to write to their arguments, with parallelism controlled through dependencies specified via argument annotations. Let's look at a simple example to make things concrete:

+A = rand(1000)
+B = rand(1000)
+C = zeros(1000)
+add!(X, Y) = X .+= Y
+Dagger.spawn_datadeps() do
+    Dagger.@spawn add!(InOut(B), In(A))
+    Dagger.@spawn copyto!(Out(C), In(B))
+end

In this example, we have two Dagger tasks being launched, one adding A into B, and the other copying B into C. The add! task specifies that A is only read from (In for "input"), and that B is both read from and written to (InOut for "input and output"). The copyto! task similarly specifies that B is read from (In), and that C is only written to (Out for "output").

Without spawn_datadeps and In, Out, and InOut, the result of these tasks would be undefined; the two tasks could execute in parallel, or the copyto! could occur before the add!, resulting in all kinds of mayhem. However, spawn_datadeps changes things: because we have told Dagger how our tasks access their arguments, Dagger knows to control the parallelism and ordering, and ensure that add! executes and finishes before copyto! begins, ensuring that copyto! "sees" the changes to B before executing.

There is another important aspect of spawn_datadeps that makes the above code work: if all of the Dagger.@spawn macros are removed, along with the dependency specifiers, the program still produces the same results, without using Dagger. In other words, the parallel (Dagger) version of the program produces identical results to the serial (non-Dagger) version. This mirrors how Dagger behaves with purely functional tasks and without spawn_datadeps: removing Dagger.@spawn still results in a correct (sequential, and possibly slower) version of the program. In short, spawn_datadeps ensures that Dagger respects the ordering and dependencies of a program, while still providing parallelism where possible.

But where is the parallelism? The above example doesn't actually have any parallelism to exploit! Let's take a look at another example to see the datadeps model truly shine:

# Tree reduction of multiple arrays into the first array
+function tree_reduce!(op::Base.Callable, As::Vector{<:Array})
+    Dagger.spawn_datadeps() do
+        to_reduce = Vector[]
+        push!(to_reduce, As)
+        while !isempty(to_reduce)
+            # Use a separate name to avoid rebinding `As`, so that
+            # `As[1]` below still refers to the caller's first array
+            arrays = pop!(to_reduce)
+            n = length(arrays)
+            if n == 2
+                Dagger.@spawn Base.mapreducedim!(identity, op, InOut(arrays[1]), In(arrays[2]))
+            elseif n > 2
+                push!(to_reduce, [arrays[1], arrays[div(n,2)+1]])
+                push!(to_reduce, arrays[1:div(n,2)])
+                push!(to_reduce, arrays[div(n,2)+1:end])
+            end
+        end
+    end
+    return As[1]
+end
+
+As = [rand(1000) for _ in 1:1000]
+Bs = copy.(As)
+tree_reduce!(+, As)
+@assert isapprox(As[1], reduce((x,y)->x .+ y, Bs))

In the above implementation of tree_reduce! (which is designed to perform an elementwise reduction across a vector of arrays), we have a tree reduction operation where pairs of arrays are reduced, starting with neighboring pairs, and then reducing pairs of reduction results, etc. until the final result is in As[1]. We can see that the application of Dagger to this algorithm is simple - only the single Base.mapreducedim! call is passed to Dagger - yet due to the data dependencies and the algorithm's structure, there should be plenty of parallelism to be exploited across each of the parallel reductions at each "level" of the reduction tree. Specifically, any two Dagger.@spawn calls which access completely different pairs of arrays can execute in parallel, while any call which has an In on an array will wait for any previous call which has an InOut on that same array.

Additionally, we can notice a powerful feature of this model - if the Dagger.@spawn macro is removed, the code still remains correct, but simply runs sequentially. This means that the structure of the program doesn't have to change in order to use Dagger for parallelization, which can make applying Dagger to existing algorithms quite effortless.

diff --git a/dev/dynamic/index.html b/dev/dynamic/index.html index cf68639df..b077c7cdf 100644 --- a/dev/dynamic/index.html +++ b/dev/dynamic/index.html @@ -1,5 +1,5 @@ -Dynamic Scheduler Control · Dagger.jl

+Dynamic Scheduler Control · Dagger.jl

Dynamic Scheduler Control

Normally, Dagger executes static graphs defined with delayed and @par. However, it is possible for thunks to dynamically modify the graph at runtime, and to generally exert direct control over the scheduler's internal state. The Dagger.sch_handle function provides this functionality within a thunk:

function mythunk(x)
     h = Dagger.sch_handle()
     Dagger.halt!(h)
     return x
@@ -9,4 +9,4 @@
         y + 1
     end
     return fetch(h, id)
-end

+end

Alternatively, Base.wait can be used when one does not wish to retrieve the returned value of the thunk.
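For example, a sketch of the same pattern with Base.wait, assuming wait accepts the handle and ThunkID just as fetch does:

function mythunk_sideeffect(x)
    h = Dagger.sch_handle()
    id = Dagger.add_thunk!(h, x) do y
        println("Processing ", y)  # runs purely for its side effect
    end
    wait(h, id)  # block until the thunk completes, discarding its result
    return x
end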

Users with needs not covered by the built-in functions should use the Dagger.exec! function to pass a user-defined function, closure, or callable struct to the scheduler, along with a payload which will be provided to that function:

Dagger.exec!

Note that all functions called by Dagger.exec! take the scheduler's internal lock, so it's safe to manipulate the internal ComputeState object within the user-provided function.

diff --git a/dev/index.html b/dev/index.html index c981367e1..ee3a298e7 100644 --- a/dev/index.html +++ b/dev/index.html @@ -1,5 +1,5 @@ -Home · Dagger.jl

+Home · Dagger.jl

Dagger: A framework for out-of-core and parallel execution

Dagger.jl is a framework for parallel computing across all kinds of resources, like CPUs and GPUs, and across multiple threads and multiple servers.


Quickstart: Task Spawning

For more details: Task Spawning

Launch a task

If you want to call a function myfunc with arguments arg1, arg2, arg3, and keyword argument color=:red:

function myfunc(arg1, arg2, arg3; color=:blue)
     arg_total = arg1 + arg2 * arg3
     printstyled(arg_total; color)
     return arg_total
 end
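A sketch of the spawn-and-fetch step that follows (the exact call is elided from this hunk):

t = Dagger.@spawn myfunc(1, 2, 3; color=:red)  # returns a task handle immediately
@assert fetch(t) == 1 + 2 * 3                  # fetch blocks until the task completes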
@@ -67,4 +67,4 @@
 DA = distribute(A, Blocks(4, 4))

Allocate a distributed array directly

To allocate a DArray, just pass your Blocks partitioning object into the appropriate allocation function, such as rand, ones, or zeros:

rand(Blocks(20, 20), 100, 100)
 ones(Blocks(20, 100), 100, 2000)
 zeros(Blocks(50, 20), 300, 200)

Convert a DArray back into an Array

To get back an Array from a DArray, just call collect:

DA = rand(Blocks(32, 32), 256, 128)
-collect(DA) # returns a `Matrix{Float64}`
+collect(DA) # returns a `Matrix{Float64}`
diff --git a/dev/logging/index.html b/dev/logging/index.html index 9d2f50614..26dae7123 100644 --- a/dev/logging/index.html +++ b/dev/logging/index.html @@ -1,5 +1,5 @@ -Logging and Graphing · Dagger.jl

+Logging and Graphing · Dagger.jl

Logging and Graphing

Dagger's scheduler keeps track of the important and potentially expensive actions it takes, such as moving data between workers or executing thunks, and tracks how much time and memory these operations consume, among other things. It does this through the TimespanLogging.jl package (which used to be directly integrated into Dagger). Saving this information somewhere accessible is disabled by default, but it's quite easy to turn on by setting a "log sink" in the Context being used, as ctx.log_sink. A variety of log sinks are built into TimespanLogging; the NoOpLog is the default log sink when one isn't explicitly specified, and disables logging entirely (to minimize overhead). There are currently two other log sinks of interest. The first and newer of the two is the MultiEventLog, which generates multiple independent log streams, one per "consumer" (details in the next section). The second and older sink is the LocalEventLog, which is explained later in this document. Most users should use the MultiEventLog, since it's far more flexible, extensible, and generally more performant.

MultiEventLog

The MultiEventLog is intended to be configurable to exclude unnecessary information, and to include any built-in or user-defined metrics. It stores a set of "sub-log" streams internally, appending a single element to each of them when an event is generated. This element can be called a "sub-event" (to distinguish it from the higher-level "event" that Dagger creates), and is created by a "consumer". A consumer is a function or callable struct that, when called with the Event object generated by TimespanLogging, returns a sub-event characterizing whatever information the consumer represents. For example, the Dagger.Events.BytesAllocd consumer calculates the total bytes allocated and live at any given time within Dagger, and returns the current value when called. Let's construct one:

ctx = Context()
 ml = TimespanLogging.MultiEventLog()
 
 # Add the BytesAllocd consumer to the log as `:bytes`
@@ -15,4 +15,4 @@
 Dagger.@spawn 1+a

There are a variety of other consumers built-in to TimespanLogging and Dagger, under the TimespanLogging.Events and Dagger.Events modules, respectively; see Dagger Types and TimespanLogging Types for details.

The MultiEventLog also has a mechanism to call a set of functions, called "aggregators", after all consumers have been executed, and are passed the full set of log streams as a Dict{Symbol,Vector{Any}}. The only one currently shipped with TimespanLogging directly is the LogWindow, and DaggerWebDash.jl has the TableStorage which integrates with it; see DaggerWebDash Types for details.
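For instance, attaching the LogWindow aggregator to the log constructed above (the arguments, a window length in nanoseconds and the sub-log to track, follow the DaggerWebDash snippet later in this document):

lw = TimespanLogging.Events.LogWindow(20*10^9, :core)  # keep ~20s of :core events
ml.aggregators[:logwindow] = lw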

LocalEventLog

The LocalEventLog is generally only useful when you want combined events (event start and finish combined as a single unit), and only care about a few simple built-in generated events. Let's attach one to our context:

ctx = Context()
 log = TimespanLogging.LocalEventLog()
 ctx.log_sink = log

Now anytime ctx is used as the context for a scheduler, the scheduler will log events into log.

Once sufficient data has been accumulated into a LocalEventLog, it can be gathered to a single host via TimespanLogging.get_logs!(log). The result is a Vector of TimespanLogging.Timespan objects, which describe some metadata about an operation that occurred and was logged by the scheduler. These events may be introspected directly, or may also be rendered to a DOT-format string:

logs = TimespanLogging.get_logs!(log)
-str = Dagger.show_plan(logs)

+str = Dagger.show_plan(logs)

Dagger.show_plan can also be called as Dagger.show_plan(io::IO, logs) to write the graph to a file or other IO object. The string generated by this function may be passed to an external tool like Graphviz for rendering. Note that this method doesn't display input arguments to the DAG (non-Thunks); you can call Dagger.show_plan(logs, thunk), where thunk is the output Thunk of the DAG, to render argument nodes.
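For example, a sketch of writing the graph to a file for rendering with Graphviz (the file name is an assumption):

open("dag.dot", "w") do io
    Dagger.show_plan(io, logs)
end
# Render from a shell, e.g.: dot -Tsvg dag.dot -o dag.svg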

Note

TimespanLogging.get_logs! clears out the event logs, so that old events don't mix with new ones from future DAGs.

As a convenience, it's possible to set ctx.log_file to the path to an output file, and then calls to compute(ctx, ...)/collect(ctx, ...) will automatically write the graph in DOT format to that path. There is also a benefit to this approach over manual calls to get_logs! and show_plan: DAGs which aren't Thunks (such as operations on the Dagger.DArray) will be properly rendered with input arguments (which normally aren't rendered because a Thunk is dynamically generated from such operations by Dagger before scheduling).
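A sketch of that convenience (the file path is an assumption):

ctx = Context()
ctx.log_file = "dag.dot"        # DOT output is written here automatically
compute(ctx, delayed(+)(1, 2))  # the computed DAG is rendered to dag.dot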

FilterLog

The FilterLog exists to allow writing events to a user-defined location (such as a database, file, or network socket). It is not currently tested or documented.

diff --git a/dev/processors/index.html b/dev/processors/index.html index ea094c4b5..cda72be52 100644 --- a/dev/processors/index.html +++ b/dev/processors/index.html @@ -1,5 +1,5 @@ -Processors · Dagger.jl

+Processors · Dagger.jl

Processors

Dagger contains a flexible mechanism to represent CPUs, GPUs, and other devices that the scheduler can place user work on. The individual devices that are capable of computing a user operation are called "processors", and are subtypes of Dagger.Processor. Processors are automatically detected by Dagger at scheduler initialization, and placed in a hierarchy reflecting the physical (network-, link-, or memory-based) boundaries between them. The scheduler uses the information in this hierarchy to efficiently schedule and partition user operations.

Dagger's Chunk objects can have a processor associated with them that defines where the contained data "resides". Each processor has a set of functions that define the mechanisms and rules by which the data can be transferred between similar or different kinds of processors, and will be called by Dagger's scheduler automatically when fetching function arguments (or the function itself) for computation on a given processor.

Setting the processor on a function argument is done by wrapping it in a Chunk with Dagger.tochunk:

a = 1
 b = 2
 # Let's say `b` "resides" on the second thread of the first worker:
 b_chunk = Dagger.tochunk(b, Dagger.ThreadProc(1, 2))::Dagger.Chunk
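Passing that Chunk to a task then influences placement as described above; continuing the snippet as a sketch:

# Tasks taking `b_chunk` will tend to be scheduled on worker 1, thread 2,
# where the wrapped value resides:
t = Dagger.@spawn a + b_chunk
@assert fetch(t) == 3  # the task received the unwrapped value of `b`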
@@ -33,4 +33,4 @@
 @show fetch(job) |> unique
 
 # and cleanup after ourselves...
-workers() |> rmprocs
+workers() |> rmprocs
diff --git a/dev/propagation/index.html b/dev/propagation/index.html index e65869b23..f682e5649 100644 --- a/dev/propagation/index.html +++ b/dev/propagation/index.html @@ -1,5 +1,5 @@ -Option Propagation · Dagger.jl

+Option Propagation · Dagger.jl

Option Propagation

Most options passed to Dagger are passed via @spawn/spawn or delayed directly. This works well when an option only needs to be set for a single thunk, but is cumbersome when the same option needs to be set on multiple thunks, or set recursively on thunks spawned within other thunks. Thankfully, Dagger provides the with_options function to make this easier. This function is quite powerful, as it's built on "context variables"; let's first look at some example code to see how it works:

function f(x)
     m = Dagger.@spawn myid()
     return Dagger.@spawn x+m
 end
@@ -15,4 +15,4 @@
     # Or, if `scope` might not have been propagated as an option, we can give
     # it a default value:
     fetch(@async @assert Dagger.get_options(:scope, AnyScope()) == ProcessScope(2))
-end

+end

This is a very powerful concept: with a single call to with_options, we can apply any set of options to any nested set of operations. This is great for isolating large workloads to different workers or processors, defining global checkpoint/restore behavior, and more.
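As a concrete sketch (assuming a worker with ID 2 exists), applying a scope option to everything f spawns:

Dagger.with_options(;scope=Dagger.ProcessScope(2)) do
    # Both tasks spawned inside `f` inherit scope=ProcessScope(2),
    # so `myid()` runs on worker 2 and the result is x + 2:
    @assert fetch(f(10)) == 12
end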

diff --git a/dev/scheduler-internals/index.html b/dev/scheduler-internals/index.html index 96bae3882..91e2a44d7 100644 --- a/dev/scheduler-internals/index.html +++ b/dev/scheduler-internals/index.html @@ -1,2 +1,2 @@ -Scheduler Internals · Dagger.jl

+Scheduler Internals · Dagger.jl

Scheduler Internals

Dagger's scheduler can be found primarily in the Dagger.Sch module. It performs a variety of functions to support tasks and data, and as such is a complex system. This documentation attempts to shed light on how the scheduler works internally (from a somewhat high level), with the hope that it will help users and contributors understand how to improve the scheduler or fix any bugs that may arise from it.

Warn

Dagger's scheduler is evolving at a rapid pace, and is a complex mix of interacting parts. As such, this documentation may become out of date very quickly, and may not reflect the current state of the scheduler. Please feel free to file PRs to correct or improve this document, but also beware that the true functionality is defined in Dagger's source!

Core vs. Worker Schedulers

Dagger's scheduler is really two kinds of entities: the "core" scheduler, and "worker" schedulers:

The core scheduler runs on worker 1, thread 1, and is the entrypoint to tasks which have been submitted. The core scheduler manages all task dependencies, notifies calls to wait and fetch of task completion, and generally performs initial task placement. The core scheduler has cached information about each worker and their processors, and uses that information (together with metrics about previous tasks and other aspects of the Dagger runtime) to generate a near-optimal just-in-time task schedule.

The worker schedulers each run as a set of tasks across all workers and all processors, and handle data movement and task execution. Once the core scheduler has scheduled and launched a task, it arrives at the worker scheduler for handling. The worker scheduler will pass the task to a queue for the assigned processor, where it will wait until the processor has a sufficient amount of "occupancy" for the task. Once the processor is ready for the task, it will first fetch all of the task's arguments from other workers, and then it will execute the task, package the task's result into a Chunk, and pass that back to the core scheduler.

Core: Basics

The core scheduler contains a single internal instance of type ComputeState, which maintains (among many other things) all necessary state to represent the set of waiting, ready, and running tasks, cached task results, and maps of interdependencies between tasks. It uses Julia's task infrastructure to asynchronously send work requests to remote Julia processes, and uses a RemoteChannel as an inbound queue for completed work.

There is an outer loop which drives the scheduler, which continues executing either eternally (excepting any internal scheduler errors or Julia exiting), or until all tasks in the graph have completed executing and the final task in the graph is ready to be returned to the user. This outer loop continuously performs two main operations: the first is to launch the execution of nodes which have become "ready" to execute; the second is to "finish" nodes which have been completed.

Core: Initialization

At the very beginning of a scheduler's lifecycle, a ComputeState object is allocated, workers are asynchronously initialized, and the outer loop is started. Additionally, the scheduler is passed one or more tasks to start scheduling, and so it will also fill out the ComputeState with the computed sets of dependencies between tasks, initially placing all tasks in the "waiting" state. If any of the tasks are found to only have non-task input arguments, then they are considered ready to execute and moved from the "waiting" state to "ready".

Core: Outer Loop

At each outer loop iteration, all tasks in the "ready" state will be scheduled, moved into the "running" state, and asynchronously sent to the workers for execution (called "firing"). Once all tasks are either waiting or running, the scheduler may sleep until actions need to be performed.

When fired tasks have completed executing, an entry will exist in the inbound queue signaling the task's result and other metadata. At this point, the most recently-queued task is removed from the queue, "finished", and placed in the "finished" state. Finishing usually unlocks downstream tasks from the waiting state and allows them to transition to the ready state.

Core: Task Scheduling

Once one or more tasks are ready to be scheduled, the scheduler will begin assigning them to the processors within each available worker. This is a sequential operation consisting of:

  • Selecting candidate processors based on the task's combined scope
  • Calculating the cost to move needed data to each candidate processor
  • Adding a "wait time" cost proportional to the estimated run time for all the tasks currently executing on each candidate processor
  • Selecting the least costly candidate processor as the executor for this task

After these operations have been performed for each task, the tasks will be fired off to their appropriate worker for handling.

Worker: Task Execution

Once a worker receives one or more tasks to be executed, the tasks are immediately enqueued into the appropriate processor's queue, and the processors are notified that work is available to be executed. The processors will asynchronously look at their queues and pick the task with the lowest occupancy first; a task with zero occupancy will always be executed immediately, but most tasks have non-zero occupancy, and so will be executed in order of increasing occupancy (effectively prioritizing asynchronous tasks like I/O).

Before a task begins execution, the processor will collect the task's arguments from other workers and convert them as needed to execute correctly according to the processor's semantics. This operation is called a "move".

Once a task's arguments have been moved, the task's function will be called with the arguments, and assuming the task doesn't throw an error, the result will be wrapped in a Chunk object. This Chunk will then be sent back to the core scheduler along with information about which task generated it. If the task does throw an error, then the error is instead propagated to the core scheduler, along with a flag indicating that the task failed.

Worker: Workload Balancing

In general, Dagger's core scheduler tries to balance workloads as much as possible across all the available processors, but it can fail to do so effectively when either its cached knowledge of each worker's status is outdated, or when its estimates about the task's behavior are inaccurate. To minimize the possibility of workload imbalance, the worker schedulers' processors will attempt to steal tasks from each other when they are under-occupied. Tasks will only be stolen if the task's scope is compatible with the processor attempting the steal, so tasks with wider scopes have better balancing potential.

Core: Finishing

Finishing a task which has completed executing is generally a simple set of operations:

  • The task's result is registered in the ComputeState for any tasks or user code which will need it
  • Any unneeded data is cleared from the scheduler (such as preserved Chunk arguments)
  • Downstream dependencies will be moved from "waiting" to "ready" if this task was the last upstream dependency to them

Core: Shutdown

If the core scheduler needs to shutdown due to an error or Julia exiting, then all workers will be shutdown, and the scheduler will close any open channels. If shutdown was due to an error, then an error will be printed or thrown back to the caller.

diff --git a/dev/scheduler-visualization/index.html b/dev/scheduler-visualization/index.html index 6aff7b01c..ac380cf74 100644 --- a/dev/scheduler-visualization/index.html +++ b/dev/scheduler-visualization/index.html @@ -1,5 +1,5 @@ -Scheduler Visualization · Dagger.jl

+Scheduler Visualization · Dagger.jl

Scheduler Visualization with DaggerWebDash

When working with Dagger, especially when working with its scheduler, it can be helpful to visualize what Dagger is doing internally. To assist with this, a web dashboard is available in the DaggerWebDash.jl package. This web dashboard uses a web server running within each Dagger worker, along with event logging information, to expose details about the scheduler. Information like worker and processor saturation, memory allocations, profiling traces, and much more is available in easy-to-interpret plots.

Using the dashboard is relatively simple and straightforward; if you run Dagger's benchmarking script, it's enabled for you automatically if the BENCHMARK_RENDER environment variable is set to webdash. This is the easiest way to get started with the web dashboard for new users.

For manual usage, the following snippet of code will suffice:

using Dagger, DaggerWebDash, TimespanLogging
 
 ctx = Context() # or `ctx = Dagger.Sch.eager_context()` for eager API usage
 ml = TimespanLogging.MultiEventLog()
@@ -48,4 +48,4 @@
 ml.aggregators[:d3r] = d3r
 
 ctx.log_sink = ml
-# ... use `ctx`

+# ... use `ctx`

Once the server has started, you can browse to http://localhost:8080/ (if running on your local machine) to view the plots in real time. The dashboard also provides options at the top of the page to control the drawing speed, enable and disable reading updates from the server (disabling freezes the display at the current instant), and a selector for which worker to look at. If the connection to the server is lost for any reason, the dashboard will attempt to reconnect at 5 second intervals. The dashboard can usually survive restarts of the server perfectly well, although refreshing the page is usually a good idea. Informational messages are also logged to the browser console for debugging.

diff --git a/dev/scopes/index.html b/dev/scopes/index.html index d94ed8397..8b183250d 100644 --- a/dev/scopes/index.html +++ b/dev/scopes/index.html @@ -1,5 +1,5 @@ -Scopes · Dagger.jl

+Scopes · Dagger.jl

Scopes

Sometimes you will have data that is only meaningful in a certain location, such as within a single Julia process, a given server, or even for a specific Dagger processor. We call this location a "scope" in Dagger, denoting the bounds within which the data is meaningful and valid. For example, C pointers are typically scoped to a process, file paths are scoped to one or more servers dependent on filesystem configuration, etc. By default, Dagger doesn't recognize this; it treats everything passed into a task, or generated from a task, as inherently safe to transfer anywhere else. When this is not the case, Dagger provides optional scopes to instruct the scheduler where data is considered valid.

Scope Basics

Let's take the example of a webcam handle generated by VideoIO.jl. This handle is a C pointer, and thus has process scope. We can open the handle on a given process, and set the scope of the resulting data to be locked to the current process with Dagger.scope to construct a ProcessScope:

using VideoIO, Distributed
 
 function get_handle()
     handle = VideoIO.opencamera()
@@ -54,4 +54,4 @@
 
 d2 = Dagger.@spawn generate(ps2) # Run on process 2
 d3 = Dagger.@spawn generate(ps3) # Run on process 3
-res = Dagger.@spawn d2 * d3 # An error!

+res = Dagger.@spawn d2 * d3 # An error!

Moral of the story: only use scopes when you know you really need them, and if you aren't careful to arrange everything just right, be prepared for Dagger to refuse to schedule your tasks! Scopes should only be used to ensure correctness of your programs, and are not intended to be used to optimize the schedule that Dagger uses for your tasks, since restricting the scope of execution for tasks will necessarily reduce the optimizations that Dagger's scheduler can perform.
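For reference, scopes can be constructed compactly with Dagger.scope (the worker and thread IDs here are assumptions):

only_worker2 = Dagger.scope(worker=2)     # anywhere on worker 2
w1_t2 = Dagger.scope(worker=1, thread=2)  # exactly worker 1, thread 2
Dagger.@spawn scope=only_worker2 myid()   # this task will run on worker 2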

      diff --git a/dev/search_index.js b/dev/search_index.js index feca27064..ff98034a1 100644 --- a/dev/search_index.js +++ b/dev/search_index.js @@ -1,3 +1,3 @@ var documenterSearchIndex = {"docs": -[{"location":"dynamic/#Dynamic-Scheduler-Control","page":"Dynamic Scheduler Control","title":"Dynamic Scheduler Control","text":"","category":"section"},{"location":"dynamic/","page":"Dynamic Scheduler Control","title":"Dynamic Scheduler Control","text":"Normally, Dagger executes static graphs defined with delayed and @par. However, it is possible for thunks to dynamically modify the graph at runtime, and to generally exert direct control over the scheduler's internal state. The Dagger.sch_handle function provides this functionality within a thunk:","category":"page"},{"location":"dynamic/","page":"Dynamic Scheduler Control","title":"Dynamic Scheduler Control","text":"function mythunk(x)\n h = Dagger.sch_handle()\n Dagger.halt!(h)\n return x\nend","category":"page"},{"location":"dynamic/","page":"Dynamic Scheduler Control","title":"Dynamic Scheduler Control","text":"The above example prematurely halts a running scheduler at the next opportunity using Dagger.halt!:","category":"page"},{"location":"dynamic/","page":"Dynamic Scheduler Control","title":"Dynamic Scheduler Control","text":"Dagger.halt!","category":"page"},{"location":"dynamic/","page":"Dynamic Scheduler Control","title":"Dynamic Scheduler Control","text":"There are a variety of other built-in functions available for various uses:","category":"page"},{"location":"dynamic/","page":"Dynamic Scheduler Control","title":"Dynamic Scheduler Control","text":"Dagger.get_dag_ids Dagger.add_thunk!","category":"page"},{"location":"dynamic/","page":"Dynamic Scheduler Control","title":"Dynamic Scheduler Control","text":"When working with thunks acquired from get_dag_ids or add_thunk!, you will have ThunkID objects which refer to a thunk by ID. Scheduler control functions which work with thunks accept or return ThunkIDs. For example, one can create a new thunkt and get its result with Base.fetch:","category":"page"},{"location":"dynamic/","page":"Dynamic Scheduler Control","title":"Dynamic Scheduler Control","text":"function mythunk(x)\n h = Dagger.sch_handle()\n id = Dagger.add_thunk!(h, x) do y\n y + 1\n end\n return fetch(h, id)\nend","category":"page"},{"location":"dynamic/","page":"Dynamic Scheduler Control","title":"Dynamic Scheduler Control","text":"Alternatively, Base.wait can be used when one does not wish to retrieve the returned value of the thunk.","category":"page"},{"location":"dynamic/","page":"Dynamic Scheduler Control","title":"Dynamic Scheduler Control","text":"Users with needs not covered by the built-in functions should use the Dagger.exec! function to pass a user-defined function, closure, or callable struct to the scheduler, along with a payload which will be provided to that function:","category":"page"},{"location":"dynamic/","page":"Dynamic Scheduler Control","title":"Dynamic Scheduler Control","text":"Dagger.exec!","category":"page"},{"location":"dynamic/","page":"Dynamic Scheduler Control","title":"Dynamic Scheduler Control","text":"Note that all functions called by Dagger.exec! 
take the scheduler's internal lock, so it's safe to manipulate the internal ComputeState object within the user-provided function.","category":"page"},{"location":"scheduler-internals/#Scheduler-Internals","page":"Scheduler Internals","title":"Scheduler Internals","text":"","category":"section"},{"location":"scheduler-internals/","page":"Scheduler Internals","title":"Scheduler Internals","text":"Dagger's scheduler can be found primarily in the Dagger.Sch module. It performs a variety of functions to support tasks and data, and as such is a complex system. This documentation attempts to shed light on how the scheduler works internally (from a somewhat high level), with the hope that it will help users and contributors understand how to improve the scheduler or fix any bugs that may arise from it.","category":"page"},{"location":"scheduler-internals/","page":"Scheduler Internals","title":"Scheduler Internals","text":"warn: Warn\nDagger's scheduler is evolving at a rapid pace, and is a complex mix of interacting parts. As such, this documentation may become out of date very quickly, and may not reflect the current state of the scheduler. Please feel free to file PRs to correct or improve this document, but also beware that the true functionality is defined in Dagger's source!","category":"page"},{"location":"scheduler-internals/#Core-vs.-Worker-Schedulers","page":"Scheduler Internals","title":"Core vs. Worker Schedulers","text":"","category":"section"},{"location":"scheduler-internals/","page":"Scheduler Internals","title":"Scheduler Internals","text":"Dagger's scheduler is really two kinds of entities: the \"core\" scheduler, and \"worker\" schedulers:","category":"page"},{"location":"scheduler-internals/","page":"Scheduler Internals","title":"Scheduler Internals","text":"The core scheduler runs on worker 1, thread 1, and is the entrypoint to tasks which have been submitted. The core scheduler manages all task dependencies, notifies calls to wait and fetch of task completion, and generally performs initial task placement. The core scheduler has cached information about each worker and their processors, and uses that information (together with metrics about previous tasks and other aspects of the Dagger runtime) to generate a near-optimal just-in-time task schedule.","category":"page"},{"location":"scheduler-internals/","page":"Scheduler Internals","title":"Scheduler Internals","text":"The worker schedulers each run as a set of tasks across all workers and all processors, and handles data movement and task execution. Once the core scheduler has scheduled and launched a task, it arrives at the worker scheduler for handling. The worker scheduler will pass the task to a queue for the assigned processor, where it will wait until the processor has a sufficient amount of \"occupancy\" for the task. 
Once the processor is ready for the task, it will first fetch all of the task's arguments from other workers, and then it will execute the task, package the task's result into a Chunk, and pass that back to the core scheduler.","category":"page"},{"location":"scheduler-internals/#Core:-Basics","page":"Scheduler Internals","title":"Core: Basics","text":"","category":"section"},{"location":"scheduler-internals/","page":"Scheduler Internals","title":"Scheduler Internals","text":"The core scheduler contains a single internal instance of type ComputeState, which maintains (among many other things) all necessary state to represent the set of waiting, ready, and running tasks, cached task results, and maps of interdependencies between tasks. It uses Julia's task infrastructure to asynchronously send work requests to remote Julia processes, and uses a RemoteChannel as an inbound queue for completed work.","category":"page"},{"location":"scheduler-internals/","page":"Scheduler Internals","title":"Scheduler Internals","text":"There is an outer loop which drives the scheduler, which continues executing either eternally (excepting any internal scheduler errors or Julia exiting), or until all tasks in the graph have completed executing and the final task in the graph is ready to be returned to the user. This outer loop continuously performs two main operations: the first is to launch the execution of nodes which have become \"ready\" to execute; the second is to \"finish\" nodes which have been completed.","category":"page"},{"location":"scheduler-internals/#Core:-Initialization","page":"Scheduler Internals","title":"Core: Initialization","text":"","category":"section"},{"location":"scheduler-internals/","page":"Scheduler Internals","title":"Scheduler Internals","text":"At the very beginning of a scheduler's lifecycle, a ComputeState object is allocated, workers are asynchronously initialized, and the outer loop is started. Additionally, the scheduler is passed one or more tasks to start scheduling, and so it will also fill out the ComputeState with the computed sets of dependencies between tasks, initially placing all tasks are placed in the \"waiting\" state. If any of the tasks are found to only have non-task input arguments, then they are considered ready to execute and moved from the \"waiting\" state to \"ready\".","category":"page"},{"location":"scheduler-internals/#Core:-Outer-Loop","page":"Scheduler Internals","title":"Core: Outer Loop","text":"","category":"section"},{"location":"scheduler-internals/","page":"Scheduler Internals","title":"Scheduler Internals","text":"At each outer loop iteration, all tasks in the \"ready\" state will be scheduled, moved into the \"running\" state, and asynchronously sent to the workers for execution (called \"firing\"). Once all tasks are either waiting or running, the scheduler may sleep until actions need to be performed","category":"page"},{"location":"scheduler-internals/","page":"Scheduler Internals","title":"Scheduler Internals","text":"When fired tasks have completed executing, an entry will exist in the inbound queue signaling the task's result and other metadata. At this point, the most recently-queued task is removed from the queue, \"finished\", and placed in the \"finished\" state. 
Finishing usually unlocks downstream tasks from the waiting state and allows them to transition to the ready state.","category":"page"},{"location":"scheduler-internals/#Core:-Task-Scheduling","page":"Scheduler Internals","title":"Core: Task Scheduling","text":"","category":"section"},{"location":"scheduler-internals/","page":"Scheduler Internals","title":"Scheduler Internals","text":"Once one or more tasks are ready to be scheduled, the scheduler will begin assigning them to the processors within each available worker. This is a sequential operation consisting of:","category":"page"},{"location":"scheduler-internals/","page":"Scheduler Internals","title":"Scheduler Internals","text":"Selecting candidate processors based on the task's combined scope\nCalculating the cost to move needed data to each candidate processor\nAdding a \"wait time\" cost proportional to the estimated run time for all the tasks currently executing on each candidate processor\nSelecting the least costly candidate processor as the executor for this task","category":"page"},{"location":"scheduler-internals/","page":"Scheduler Internals","title":"Scheduler Internals","text":"After these operations have been performed for each task, the tasks will be fired off to their appropriate worker for handling.","category":"page"},{"location":"scheduler-internals/#Worker:-Task-Execution","page":"Scheduler Internals","title":"Worker: Task Execution","text":"","category":"section"},{"location":"scheduler-internals/","page":"Scheduler Internals","title":"Scheduler Internals","text":"Once a worker receives one or more tasks to be executed, the tasks are immediately enqueued into the appropriate processor's queue, and the processors are notified that work is available to be executed. The processors will asynchronously look at their queues and pick the task with the lowest occupancy first; a task with zero occupancy will always be executed immediately, but most tasks have non-zero occupancy, and so will be executed in order of increasing occupancy (effectively prioritizing asynchronous tasks like I/O).","category":"page"},{"location":"scheduler-internals/","page":"Scheduler Internals","title":"Scheduler Internals","text":"Before a task begins executions, the processor will collect the task's arguments from other workers as needed, and convert them as needed to execute correctly according to the processor's semantics. This operation is called a \"move\".","category":"page"},{"location":"scheduler-internals/","page":"Scheduler Internals","title":"Scheduler Internals","text":"Once a task's arguments have been moved, the task's function will be called with the arguments, and assuming the task doesn't throw an error, the result will be wrapped in a Chunk object. This Chunk will then be sent back to the core scheduler along with information about which task generated it. 
If the task does throw an error, then the error is instead propagated to the core scheduler, along with a flag indicating that the task failed.","category":"page"},{"location":"scheduler-internals/#Worker:-Workload-Balancing","page":"Scheduler Internals","title":"Worker: Workload Balancing","text":"","category":"section"},{"location":"scheduler-internals/","page":"Scheduler Internals","title":"Scheduler Internals","text":"In general, Dagger's core scheduler tries to balance workloads as much as possible across all the available processors, but it can fail to do so effectively when either its cached knowledge of each worker's status is outdated, or when its estimates about the task's behavior are inaccurate. To minimize the possibility of workload imbalance, the worker schedulers' processors will attempt to steal tasks from each other when they are under-occupied. Tasks will only be stolen if the task's scope is compatibl with the processor attempting the steal, so tasks with wider scopes have better balancing potential.","category":"page"},{"location":"scheduler-internals/#Core:-Finishing","page":"Scheduler Internals","title":"Core: Finishing","text":"","category":"section"},{"location":"scheduler-internals/","page":"Scheduler Internals","title":"Scheduler Internals","text":"Finishing a task which has completed executing is generally a simple set of operations:","category":"page"},{"location":"scheduler-internals/","page":"Scheduler Internals","title":"Scheduler Internals","text":"The task's result is registered in the ComputeState for any tasks or user code which will need it\nAny unneeded data is cleared from the scheduler (such as preserved Chunk arguments)\nDownstream dependencies will be moved from \"waiting\" to \"ready\" if this task was the last upstream dependency to them","category":"page"},{"location":"scheduler-internals/#Core:-Shutdown","page":"Scheduler Internals","title":"Core: Shutdown","text":"","category":"section"},{"location":"scheduler-internals/","page":"Scheduler Internals","title":"Scheduler Internals","text":"If the core scheduler needs to shutdown due to an error or Julia exiting, then all workers will be shutdown, and the scheduler will close any open channels. If shutdown was due to an error, then an error will be printed or thrown back to the caller.","category":"page"},{"location":"use-cases/parallel-nested-loops/#Use-Case:-Parallel-Nested-Loops","page":"Parallel Nested Loops","title":"Use Case: Parallel Nested Loops","text":"","category":"section"},{"location":"use-cases/parallel-nested-loops/","page":"Parallel Nested Loops","title":"Parallel Nested Loops","text":"One of the many applications of Dagger is that it can be used as a drop-in replacement for nested multi-threaded loops that would otherwise be written with Threads.@threads.","category":"page"},{"location":"use-cases/parallel-nested-loops/","page":"Parallel Nested Loops","title":"Parallel Nested Loops","text":"Consider a simplified scenario where you want to calculate the maximum mean values of random samples of various lengths that have been generated by several distributions provided by the Distributions.jl package. The results should be collected into a DataFrame. 
We have the following function:","category":"page"},{"location":"use-cases/parallel-nested-loops/","page":"Parallel Nested Loops","title":"Parallel Nested Loops","text":"using Dagger, Random, Distributions, StatsBase, DataFrames\n\nfunction f(dist, len, reps, σ)\n v = Vector{Float64}(undef, len) # avoiding allocations\n maximum(mean(rand!(dist, v)) for _ in 1:reps)/σ\nend","category":"page"},{"location":"use-cases/parallel-nested-loops/","page":"Parallel Nested Loops","title":"Parallel Nested Loops","text":"Let us consider the following probability distributions for numerical experiments, all of which have expected values equal to zero, and the following lengths of vectors:","category":"page"},{"location":"use-cases/parallel-nested-loops/","page":"Parallel Nested Loops","title":"Parallel Nested Loops","text":"dists = [Cosine, Epanechnikov, Laplace, Logistic, Normal, NormalCanon, PGeneralizedGaussian, SkewNormal, SkewedExponentialPower, SymTriangularDist]\nlens = [10, 20, 50, 100, 200, 500]","category":"page"},{"location":"use-cases/parallel-nested-loops/","page":"Parallel Nested Loops","title":"Parallel Nested Loops","text":"Using Threads.@threads those experiments could be parallelized as:","category":"page"},{"location":"use-cases/parallel-nested-loops/","page":"Parallel Nested Loops","title":"Parallel Nested Loops","text":"function experiments_threads(dists, lens, K=1000)\n res = DataFrame()\n lck = ReentrantLock()\n Threads.@threads for T in dists\n dist = T()\n σ = std(dist)\n for L in lens\n z = f(dist, L, K, σ)\n Threads.lock(lck) do\n push!(res, (;T, σ, L, z))\n end\n end\n end\n res\nend","category":"page"},{"location":"use-cases/parallel-nested-loops/","page":"Parallel Nested Loops","title":"Parallel Nested Loops","text":"Note that DataFrames.push! is not a thread-safe operation, hence we need to use a locking mechanism to avoid two threads appending to the DataFrame at the same time.","category":"page"},{"location":"use-cases/parallel-nested-loops/","page":"Parallel Nested Loops","title":"Parallel Nested Loops","text":"The same code could be rewritten in Dagger as:","category":"page"},{"location":"use-cases/parallel-nested-loops/","page":"Parallel Nested Loops","title":"Parallel Nested Loops","text":"function experiments_dagger(dists, lens, K=1000)\n res = DataFrame()\n @sync for T in dists\n dist = T()\n σ = Dagger.@spawn std(dist)\n for L in lens\n z = Dagger.@spawn f(dist, L, K, σ)\n push!(res, (;T, σ, L, z))\n end\n end\n res.z = fetch.(res.z)\n res.σ = fetch.(res.σ)\n res\nend","category":"page"},{"location":"use-cases/parallel-nested-loops/","page":"Parallel Nested Loops","title":"Parallel Nested Loops","text":"In this code we have job interdependence. First, we calculate the standard deviation σ, and then we use that value in the function f. Since Dagger.@spawn yields an EagerThunk rather than actual values, we need to use the fetch function to obtain those values. In this example, the value fetching is performed once all computations are completed (note that @sync preceding the loop forces the loop to wait for all jobs to complete). 
Also, note that contrary to the previous example, we do not need to implement locking as we are just pushing the EagerThunk results of Dagger.@spawn serially into the DataFrame (which is fast since Dagger.@spawn doesn't block).","category":"page"},{"location":"use-cases/parallel-nested-loops/","page":"Parallel Nested Loops","title":"Parallel Nested Loops","text":"The above use case has been tested by running julia -t 8 (or with JULIA_NUM_THREADS=8 as an environment variable). The Threads.@threads code takes 1.8 seconds to run, while the Dagger code, which is also one line shorter, runs around 0.9 seconds, resulting in a 2x speedup.","category":"page"},{"location":"scheduler-visualization/#Scheduler-Visualization-with-DaggerWebDash","page":"Scheduler Visualization","title":"Scheduler Visualization with DaggerWebDash","text":"","category":"section"},{"location":"scheduler-visualization/","page":"Scheduler Visualization","title":"Scheduler Visualization","text":"When working with Dagger, especially when working with its scheduler, it can be helpful to visualize what Dagger is doing internally. To assist with this, a web dashboard is available in the DaggerWebDash.jl package. This web dashboard uses a web server running within each Dagger worker, along with event logging information, to expose details about the scheduler. Information like worker and processor saturation, memory allocations, profiling traces, and much more are available in easy-to-interpret plots.","category":"page"},{"location":"scheduler-visualization/","page":"Scheduler Visualization","title":"Scheduler Visualization","text":"Using the dashboard is relatively straightforward; if you run Dagger's benchmarking script, it's enabled for you automatically if the BENCHMARK_RENDER environment variable is set to webdash. This is the easiest way to get started with the web dashboard for new users.","category":"page"},{"location":"scheduler-visualization/","page":"Scheduler Visualization","title":"Scheduler Visualization","text":"For manual usage, the following snippet of code will suffice:","category":"page"},{"location":"scheduler-visualization/","page":"Scheduler Visualization","title":"Scheduler Visualization","text":"using Dagger, DaggerWebDash, TimespanLogging\n\nctx = Context() # or `ctx = Dagger.Sch.eager_context()` for eager API usage\nml = TimespanLogging.MultiEventLog()\n\n## Add some logging events of interest\n\nml[:core] = TimespanLogging.Events.CoreMetrics()\nml[:id] = TimespanLogging.Events.IDMetrics()\nml[:timeline] = TimespanLogging.Events.TimelineMetrics()\n# ...\n\n# (Optional) Enable profile flamegraph generation with ProfileSVG\nml[:profile] = DaggerWebDash.ProfileMetrics()\nctx.profile = true\n\n# Create a LogWindow; necessary for real-time event updates\nlw = TimespanLogging.Events.LogWindow(20*10^9, :core)\nml.aggregators[:logwindow] = lw\n\n# Create the D3Renderer server on port 8080\nd3r = DaggerWebDash.D3Renderer(8080)\n\n## Add some plots! 
Rendered top-down in order\n\n# Show an overview of all generated events as a Gantt chart\npush!(d3r, DaggerWebDash.GanttPlot(:core, :id, :esat, :psat; title=\"Overview\"))\n\n# Show various numerical events as line plots over time\npush!(d3r, DaggerWebDash.LinePlot(:core, :wsat, \"Worker Saturation\", \"Running Tasks\"))\npush!(d3r, DaggerWebDash.LinePlot(:core, :loadavg, \"CPU Load Average\", \"Average Running Threads\"))\npush!(d3r, DaggerWebDash.LinePlot(:core, :bytes, \"Allocated Bytes\", \"Bytes\"))\npush!(d3r, DaggerWebDash.LinePlot(:core, :mem, \"Available Memory\", \"% Free\"))\n\n# Show a graph rendering of compute tasks and data movement between them\n# Note: Profile events are ignored if absent from the log\npush!(d3r, DaggerWebDash.GraphPlot(:core, :id, :timeline, :profile, \"DAG\"))\n\n# TODO: Not yet functional\n#push!(d3r, DaggerWebDash.ProfileViewer(:core, :profile, \"Profile Viewer\"))\n\n# Add the D3Renderer as a consumer of special events generated by LogWindow\npush!(lw.creation_handlers, d3r)\npush!(lw.deletion_handlers, d3r)\n\n# D3Renderer is also an aggregator\nml.aggregators[:d3r] = d3r\n\nctx.log_sink = ml\n# ... use `ctx`","category":"page"},{"location":"scheduler-visualization/","page":"Scheduler Visualization","title":"Scheduler Visualization","text":"Once the server has started, you can browse to http://localhost:8080/ (if running on your local machine) to view the plots in real time. The dashboard also provides options at the top of the page to control the drawing speed, to enable or disable reading updates from the server (disabling freezes the display at the current instant), and to select which worker to look at. If the connection to the server is lost for any reason, the dashboard will attempt to reconnect at 5 second intervals. The dashboard can usually survive restarts of the server, although refreshing the page afterwards is a good idea. Informational messages are also logged to the browser console for debugging.","category":"page"},{"location":"propagation/#Option-Propagation","page":"Option Propagation","title":"Option Propagation","text":"","category":"section"},{"location":"propagation/","page":"Option Propagation","title":"Option Propagation","text":"Most options passed to Dagger are passed via @spawn/spawn or delayed directly. This works well when an option only needs to be set for a single thunk, but is cumbersome when the same option needs to be set on multiple thunks, or set recursively on thunks spawned within other thunks. Thankfully, Dagger provides the with_options function to make this easier. This function is very powerful, by nature of using \"context variables\"; let's first see some example code to help explain it:","category":"page"},{"location":"propagation/","page":"Option Propagation","title":"Option Propagation","text":"function f(x)\n m = Dagger.@spawn myid()\n return Dagger.@spawn x+m\nend\nDagger.with_options(;scope=ProcessScope(2)) do\n @sync begin\n @async @assert fetch(Dagger.@spawn f(1)) == 3\n @async @assert fetch(Dagger.@spawn f(2)) == 4\n end\nend","category":"page"},{"location":"propagation/","page":"Option Propagation","title":"Option Propagation","text":"In the above example, with_options sets the scope for both Dagger.@spawn f(1) and Dagger.@spawn f(2) to ProcessScope(2) (locking Dagger tasks to worker 2). This is of course very useful for ensuring that a set of operations use a certain scope. 
What it also does, however, is propagate this scope through calls to @async, Threads.@spawn, and Dagger.@spawn; this means that the task spawned by f(x) also inherits this scope! This works thanks to the magic of context variables, which are inherited recursively through child tasks, and thanks to Dagger intentionally propagating the scope (and other options passed to with_options) across the cluster, ensuring that no matter how deep the recursive task spawning goes, the options are maintained.","category":"page"},{"location":"propagation/","page":"Option Propagation","title":"Option Propagation","text":"It's also possible to retrieve the options currently set by with_options, using Dagger.get_options:","category":"page"},{"location":"propagation/","page":"Option Propagation","title":"Option Propagation","text":"Dagger.with_options(;scope=ProcessScope(2)) do\n fetch(@async @assert Dagger.get_options().scope == ProcessScope(2))\n # Or:\n fetch(@async @assert Dagger.get_options(:scope) == ProcessScope(2))\n # Or, if `scope` might not have been propagated as an option, we can give\n # it a default value:\n fetch(@async @assert Dagger.get_options(:scope, AnyScope()) == ProcessScope(2))\nend","category":"page"},{"location":"propagation/","page":"Option Propagation","title":"Option Propagation","text":"This is a very powerful concept: with a single call to with_options, we can apply any set of options to any nested set of operations. This is great for isolating large workloads to different workers or processors, defining global checkpoint/restore behavior, and more.","category":"page"},{"location":"checkpointing/#Checkpointing","page":"Checkpointing","title":"Checkpointing","text":"","category":"section"},{"location":"checkpointing/","page":"Checkpointing","title":"Checkpointing","text":"If at some point during a Dagger computation a thunk throws an error, or if the entire computation dies because the head node hit an OOM or other unexpected error, the entire computation is lost and needs to be started from scratch. This can be unacceptable for scheduling very large/expensive/mission-critical graphs, and for interactive development where errors are common and easily fixable.","category":"page"},{"location":"checkpointing/","page":"Checkpointing","title":"Checkpointing","text":"Robust applications often support \"checkpointing\", where intermediate results are periodically written out to persistent media, or sharded to the rest of the cluster, to allow resuming an interrupted computation from a point later than the original start. Dagger provides infrastructure to perform user-driven checkpointing of intermediate results once they're generated.","category":"page"},{"location":"checkpointing/","page":"Checkpointing","title":"Checkpointing","text":"As a concrete example, imagine that you're developing a numerical algorithm, and distributing it with Dagger. The idea is to sum all the values in a very big matrix, and then get the square root of the absolute value of the sum of sums. Here is what that might look like:","category":"page"},{"location":"checkpointing/","page":"Checkpointing","title":"Checkpointing","text":"X = compute(randn(Blocks(128,128), 1024, 1024))\nY = [delayed(sum)(chunk) for chunk in X.chunks]\ninner(x...) = sqrt(sum(x))\nZ = delayed(inner)(Y...)\nz = collect(Z)","category":"page"},{"location":"checkpointing/","page":"Checkpointing","title":"Checkpointing","text":"Let's pretend that the above calculation of each element in Y takes a full day to run. 
If you run this, you might realize that if the final sum call returns a negative number, sqrt will throw a DomainError (because sqrt can't accept negative Real inputs). Of course, you forgot to add a call to abs before the call to sqrt! Now, you know how to fix this, but once you do, you'll have to spend another entire day waiting for it to finish! And maybe you fix this one bug and wait a full day for it to finish, and begin adding more very computationally-heavy code (which inevitably has bugs). Those later computations might fail, and if you're running this as a script (maybe under a cluster scheduler like Slurm), you have to restart everything from the very beginning. This is starting to sound pretty wasteful...","category":"page"},{"location":"checkpointing/","page":"Checkpointing","title":"Checkpointing","text":"Thankfully, Dagger has a simple solution to this: checkpointing. With checkpointing, Dagger can be instructed to save intermediate results (maybe the results of computing Y) to a persistent storage medium of your choice. Probably a file on disk, but maybe a database, or even just stored in RAM in a space-efficient form. You also tell Dagger how to restore this data: how to take the result stored in its persistent form, and turn it back into something identical to the original intermediate data that Dagger computed. Then, when the worst happens and a piece of your algorithm throws an error (as above), Dagger will call the restore function and try to materialize those intermediate results that you painstakingly computed, so that you don't need to re-compute them.","category":"page"},{"location":"checkpointing/","page":"Checkpointing","title":"Checkpointing","text":"Let's see how we'd modify the above example to use checkpointing:","category":"page"},{"location":"checkpointing/","page":"Checkpointing","title":"Checkpointing","text":"using Serialization\n\nX = compute(randn(Blocks(128,128), 1024, 1024))\nY = [delayed(sum; checkpoint=(thunk,result)->begin\n open(\"checkpoint-$idx.bin\", \"w\") do io\n serialize(io, collect(result))\n end\nend, restore=(thunk)->begin\n open(\"checkpoint-$idx.bin\", \"r\") do io\n Dagger.tochunk(deserialize(io))\n end\nend)(chunk) for (idx,chunk) in enumerate(X.chunks)]\ninner(x...) = sqrt(sum(x))\nZ = delayed(inner)(Y...)\nz = collect(Z)","category":"page"},{"location":"checkpointing/","page":"Checkpointing","title":"Checkpointing","text":"Two changes were made: first, we enumerate(X.chunks) so that we can get a unique index to identify each chunk; second, we specify a ThunkOptions to delayed with a checkpoint and restore function that is specialized to write or read the given chunk to or from a file on disk, respectively. Notice the usage of collect in the checkpoint function, and the use of Dagger.tochunk in the restore function; Dagger represents intermediate results as Dagger.Chunk objects, so we need to convert between Chunks and the actual data to keep Dagger happy. Performance-sensitive users might consider modifying these methods to store the checkpoint files on the filesystem of the server that currently owns the Chunk, to minimize data transfer times during checkpoint and restore operations.","category":"page"},{"location":"checkpointing/","page":"Checkpointing","title":"Checkpointing","text":"If we run the above code once, we'll still end up waiting a day for Y to be computed, and we'll still get the DomainError from sqrt. 
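The fix itself is a one-liner, bringing the code in line with the stated intent of taking the absolute value before the square root:\n\ninner(x...) = sqrt(abs(sum(x)))\n\n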
However, when we fix the inner function to include that call to abs that was missing, and we re-run this code starting from the creation of Y, we'll find that we don't actually spend a day waiting; we probably spend a few seconds waiting, and end up with our final result! This is because Dagger called the restore function for each element of Y, and was provided a result by the user-specified function, so it skipped re-computing those sums entirely.","category":"page"},{"location":"checkpointing/","page":"Checkpointing","title":"Checkpointing","text":"You might also notice that when you ran this code the first time, you received errors about \"No such file or directory\", or some similar error; this occurs because Dagger always calls the restore function when it exists. In the first run, the checkpoint files don't yet exist, so there's nothing to restore; Dagger reports the thrown error, but keeps moving along, merrily computing the sums of Y. You're welcome to explicitly check if the file exists, and if not, return nothing; then Dagger won't report an annoying error, and will skip the restoration quietly.","category":"page"},{"location":"checkpointing/","page":"Checkpointing","title":"Checkpointing","text":"Of course, you might have a lot of code that looks like this, and may want to also checkpoint the final result of the z = collect(...) call as well. This is just as easy to do:","category":"page"},{"location":"checkpointing/","page":"Checkpointing","title":"Checkpointing","text":"# compute X, Y, Z above ...\nz = collect(Z; options=Dagger.Sch.SchedulerOptions(;\ncheckpoint=(result)->begin\n open(\"checkpoint-final.bin\", \"w\") do io\n serialize(io, collect(result))\n end\nend, restore=()->begin\n open(\"checkpoint-final.bin\", \"r\") do io\n Dagger.tochunk(deserialize(io))\n end\nend))","category":"page"},{"location":"checkpointing/","page":"Checkpointing","title":"Checkpointing","text":"In this case, the entire computation will be skipped if checkpoint-final.bin exists!","category":"page"},{"location":"api-dagger/types/","page":"Types","title":"Types","text":"CurrentModule = Dagger","category":"page"},{"location":"api-dagger/types/#Dagger-Types","page":"Types","title":"Dagger Types","text":"","category":"section"},{"location":"api-dagger/types/","page":"Types","title":"Types","text":"Pages = [\"types.md\"]","category":"page"},{"location":"api-dagger/types/#Task-Types","page":"Types","title":"Task Types","text":"","category":"section"},{"location":"api-dagger/types/","page":"Types","title":"Types","text":"Thunk\nEagerThunk","category":"page"},{"location":"api-dagger/types/#Dagger.Thunk","page":"Types","title":"Dagger.Thunk","text":"Thunk\n\nWraps a callable object to be run with Dagger. A Thunk is typically created through a call to delayed or its macro equivalent @par.\n\nConstructors\n\ndelayed(f; kwargs...)(args...)\n@par [option=value]... f(args...)\n\nExamples\n\njulia> t = delayed(sin)(π) # creates a Thunk to be computed later\nThunk(sin, (π,))\n\njulia> collect(t) # computes the result and returns it to the current process\n1.2246467991473532e-16\n\nArguments\n\nf: The function to be called upon execution of the Thunk.\nargs: The arguments to be passed to the Thunk.\nkwargs: The properties describing unique behavior of this Thunk. 
Details for each property are described in the next section.\n\noption=value: The same as passing kwargs to delayed.\n\nPublic Properties\n\nmeta::Bool=false: If true, instead of fetching cached arguments from Chunks and passing the raw values to f, pass the Chunk itself. Useful for doing manual fetching or manipulation of Chunk references. Non-Chunk arguments are still passed as-is.\n\nprocessor::Processor=OSProc() - The processor associated with f. Useful if f is a callable struct that exists on a given processor and should be transferred appropriately.\n\nscope::Dagger.AbstractScope=DefaultScope() - The scope associated with f. Useful if f is a function or callable struct that may only be transferred to, and executed within, the specified scope.\n\nOptions\n\noptions: A Sch.ThunkOptions struct providing the options for the Thunk. If omitted, options can also be specified by passing key-value pairs as kwargs.\n\n\n\n\n\n","category":"type"},{"location":"api-dagger/types/#Dagger.EagerThunk","page":"Types","title":"Dagger.EagerThunk","text":"EagerThunk\n\nReturned from spawn/@spawn calls. Represents a task that is in the scheduler, potentially ready to execute, executing, or finished executing. May be fetch'd or wait'd on at any time.\n\n\n\n\n\n","category":"type"},{"location":"api-dagger/types/#Task-Options-Types","page":"Types","title":"Task Options Types","text":"","category":"section"},{"location":"api-dagger/types/","page":"Types","title":"Types","text":"Options\nSch.ThunkOptions\nSch.SchedulerOptions","category":"page"},{"location":"api-dagger/types/#Data-Management-Types","page":"Types","title":"Data Management Types","text":"","category":"section"},{"location":"api-dagger/types/","page":"Types","title":"Types","text":"Chunk\nShard","category":"page"},{"location":"api-dagger/types/#Processor-Types","page":"Types","title":"Processor Types","text":"","category":"section"},{"location":"api-dagger/types/","page":"Types","title":"Types","text":"Processor\nOSProc\nThreadProc","category":"page"},{"location":"api-dagger/types/#Dagger.Processor","page":"Types","title":"Dagger.Processor","text":"Processor\n\nAn abstract type representing a processing device and associated memory, where data can be stored and operated on. Subtypes should be immutable, and instances should compare equal if they represent the same logical processing device/memory. Subtype instances should be serializable between different nodes. Subtype instances may contain a \"parent\" Processor to make it easy to transfer data to/from other types of Processor at runtime.\n\n\n\n\n\n","category":"type"},{"location":"api-dagger/types/#Dagger.OSProc","page":"Types","title":"Dagger.OSProc","text":"OSProc <: Processor\n\nJulia CPU (OS) process, identified by Distributed pid. 
The logical parent of all processors on a given node, but otherwise does not participate in computations.\n\n\n\n\n\n","category":"type"},{"location":"api-dagger/types/#Dagger.ThreadProc","page":"Types","title":"Dagger.ThreadProc","text":"ThreadProc <: Processor\n\nJulia CPU (OS) thread, identified by Julia thread ID.\n\n\n\n\n\n","category":"type"},{"location":"api-dagger/types/#Scope-Types","page":"Types","title":"Scope Types","text":"","category":"section"},{"location":"api-dagger/types/","page":"Types","title":"Types","text":"AnyScope\nNodeScope\nProcessScope\nProcessorTypeScope\nTaintScope\nUnionScope\nExactScope","category":"page"},{"location":"api-dagger/types/#Dagger.AnyScope","page":"Types","title":"Dagger.AnyScope","text":"Widest scope that contains all processors.\n\n\n\n\n\n","category":"type"},{"location":"api-dagger/types/#Dagger.NodeScope","page":"Types","title":"Dagger.NodeScope","text":"Scoped to the same physical node.\n\n\n\n\n\n","category":"type"},{"location":"api-dagger/types/#Dagger.ProcessScope","page":"Types","title":"Dagger.ProcessScope","text":"Scoped to the same OS process.\n\n\n\n\n\n","category":"type"},{"location":"api-dagger/types/#Dagger.ProcessorTypeScope","page":"Types","title":"Dagger.ProcessorTypeScope","text":"Scoped to any processor with a given supertype.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/types/#Dagger.TaintScope","page":"Types","title":"Dagger.TaintScope","text":"Taints a scope for later evaluation.\n\n\n\n\n\n","category":"type"},{"location":"api-dagger/types/#Dagger.UnionScope","page":"Types","title":"Dagger.UnionScope","text":"Union of two or more scopes.\n\n\n\n\n\n","category":"type"},{"location":"api-dagger/types/#Dagger.ExactScope","page":"Types","title":"Dagger.ExactScope","text":"Scoped to a specific processor.\n\n\n\n\n\n","category":"type"},{"location":"api-dagger/types/#Context-Types","page":"Types","title":"Context Types","text":"","category":"section"},{"location":"api-dagger/types/","page":"Types","title":"Types","text":"Context","category":"page"},{"location":"api-dagger/types/#Dagger.Context","page":"Types","title":"Dagger.Context","text":"Context(xs::Vector{OSProc}) -> Context\nContext(xs::Vector{Int}) -> Context\n\nCreate a Context, by default adding each available worker.\n\nIt is also possible to create a Context from a vector of OSProc, or equivalently the underlying process ids can also be passed directly as a Vector{Int}.\n\nSpecial fields include:\n\n'log_sink': A log sink object to use, if any.\nlog_file::Union{String,Nothing}: Path to logfile. 
If specified, at scheduler termination, logs will be collected, combined with input thunks, and written out in DOT format to this location.\n\nprofile::Bool: Whether or not to perform profiling with Profile stdlib.\n\n\n\n\n\n","category":"type"},{"location":"api-dagger/types/#Array-Types","page":"Types","title":"Array Types","text":"","category":"section"},{"location":"api-dagger/types/","page":"Types","title":"Types","text":"DArray\nBlocks\nArrayDomain\nUnitDomain","category":"page"},{"location":"api-dagger/types/#Dagger.DArray","page":"Types","title":"Dagger.DArray","text":"DArray{T,N,F}(domain, subdomains, chunks, concat)\nDArray(T, domain, subdomains, chunks, [concat=cat])\n\nAn N-dimensional distributed array of element type T, with a concatenation function of type F.\n\nArguments\n\nT: element type\ndomain::ArrayDomain{N}: the whole ArrayDomain of the array\nsubdomains::AbstractArray{ArrayDomain{N}, N}: a DomainBlocks of the same dimensions as the array\nchunks::AbstractArray{Union{Chunk,Thunk}, N}: an array of chunks of dimension N\nconcat::F: a function of type F. concat(x, y; dims=d) takes two chunks x and y and concatenates them along dimension d. cat is used by default.\n\n\n\n\n\n","category":"type"},{"location":"api-dagger/types/#Dagger.Blocks","page":"Types","title":"Dagger.Blocks","text":"Blocks(xs...)\n\nIndicates the size of an array operation, specified as xs, whose length indicates the number of dimensions in the resulting array.\n\n\n\n\n\n","category":"type"},{"location":"api-dagger/types/#Dagger.ArrayDomain","page":"Types","title":"Dagger.ArrayDomain","text":"ArrayDomain{N}\n\nAn N-dimensional domain over an array.\n\n\n\n\n\n","category":"type"},{"location":"api-dagger/types/#Dagger.UnitDomain","page":"Types","title":"Dagger.UnitDomain","text":"UnitDomain\n\nDefault domain – has no information about the value\n\n\n\n\n\n","category":"type"},{"location":"api-dagger/types/#Logging-Event-Types","page":"Types","title":"Logging Event Types","text":"","category":"section"},{"location":"api-dagger/types/","page":"Types","title":"Types","text":"Events.BytesAllocd\nEvents.ProcessorSaturation\nEvents.WorkerSaturation","category":"page"},{"location":"api-dagger/types/#Dagger.Events.BytesAllocd","page":"Types","title":"Dagger.Events.BytesAllocd","text":"BytesAllocd\n\nTracks memory allocated for Chunks.\n\n\n\n\n\n","category":"type"},{"location":"api-dagger/types/#Dagger.Events.ProcessorSaturation","page":"Types","title":"Dagger.Events.ProcessorSaturation","text":"ProcessorSaturation\n\nTracks the compute saturation (running tasks) per-processor.\n\n\n\n\n\n","category":"type"},{"location":"api-dagger/types/#Dagger.Events.WorkerSaturation","page":"Types","title":"Dagger.Events.WorkerSaturation","text":"WorkerSaturation\n\nTracks the compute saturation (running tasks).\n\n\n\n\n\n","category":"type"},{"location":"task-spawning/#Task-Spawning","page":"Task Spawning","title":"Task Spawning","text":"","category":"section"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"The main entrypoint to Dagger is @spawn:","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"Dagger.@spawn [option=value]... 
f(args...; kwargs...)","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"or spawn if it's more convenient:","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"Dagger.spawn(f, Dagger.Options(options), args...; kwargs...)","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"When called, it creates an EagerThunk (also known as a \"thunk\" or \"task\") object representing a call to function f with the arguments args and keyword arguments kwargs. If it is called with other thunks as args/kwargs, such as in Dagger.@spawn f(Dagger.@spawn g()), then, in this example, the function f gets passed the results of executing g(), once that result is available. If g() isn't yet finished executing, then f waits on g() to complete before executing.","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"An important observation to make is that, for each argument to @spawn/spawn, if the argument is the result of another @spawn/spawn call (thus it's an EagerThunk), the argument will be computed first, and then its result will be passed into the function receiving the argument. If the argument is not an EagerThunk (instead, some other type of Julia object), it'll be passed as-is to the function f (with some exceptions).","category":"page"},{"location":"task-spawning/#Options","page":"Task Spawning","title":"Options","text":"","category":"section"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"The Options struct in the second argument position is optional; if provided, it is passed to the scheduler to control its behavior. Options contains a NamedTuple of option key-value pairs, which can be any of:","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"Any field in Dagger.Sch.ThunkOptions (see Scheduler and Thunk options)\nmeta::Bool – Pass the input Chunk objects themselves to f and not the value contained in them","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"There are also some extra options that can be passed, although they're considered advanced options to be used only by developers or library authors:","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"get_result::Bool – return the actual result to the scheduler instead of Chunk objects. Used when f explicitly constructs a Chunk or when the return value is small (e.g. in case of reduce)\npersist::Bool – the result of this Thunk should not be released after it becomes unused in the DAG\ncache::Bool – cache the result of this Thunk such that if the thunk is evaluated again, one can just reuse the cached value. If it’s been removed from cache, recompute the value.","category":"page"},{"location":"task-spawning/#Simple-example","page":"Task Spawning","title":"Simple example","text":"","category":"section"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"Let's see a very simple directed acyclic graph (or DAG) constructed with Dagger:","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"using Dagger\n\nadd1(value) = value + 1\nadd2(value) = value + 2\ncombine(a...) 
= sum(a)\n\np = Dagger.@spawn add1(4)\nq = Dagger.@spawn add2(p)\nr = Dagger.@spawn add1(3)\ns = Dagger.@spawn combine(p, q, r)\n\n@assert fetch(s) == 16","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"The thunks p, q, r, and s have the following structure:","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"(Image: graph)","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"The final result (from fetch(s)) is the obvious consequence of the operation:","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"add1(4) + add2(add1(4)) + add1(3)","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"(4 + 1) + ((4 + 1) + 2) + (3 + 1) == 16","category":"page"},{"location":"task-spawning/#Eager-Execution","page":"Task Spawning","title":"Eager Execution","text":"","category":"section"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"Dagger's @spawn macro works similarly to @async and Threads.@spawn: when called, it wraps the function call specified by the user in an EagerThunk object, and immediately places it onto a running scheduler, to be executed once its dependencies are fulfilled.","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"x = rand(400,400)\ny = rand(400,400)\nzt = Dagger.@spawn x * y\nz = fetch(zt)\n@assert isapprox(z, x * y)","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"One can also wait on the result of @spawn and check completion status with isready:","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"x = Dagger.@spawn sleep(10)\n@assert !isready(x)\nwait(x)\n@assert isready(x)","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"Like @async and Threads.@spawn, Dagger.@spawn synchronizes with locally-scoped @sync blocks:","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"function sleep_and_print(delay, str)\n sleep(delay)\n println(str)\nend\n@sync begin\n Dagger.@spawn sleep_and_print(3, \"I print first\")\nend\nwait(Dagger.@spawn sleep_and_print(1, \"I print second\"))","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"One can also safely call @spawn from another worker (not ID 1), and it will be executed correctly:","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"x = fetch(Distributed.@spawnat 2 Dagger.@spawn 1+2) # fetches the result of `@spawnat`\nx::EagerThunk\n@assert fetch(x) == 3 # fetch the result of `@spawn`","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"This is useful for nested execution, where an @spawn'd thunk calls @spawn. 
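For instance (a minimal sketch; nested_sum is a hypothetical helper), a spawned task may itself spawn and fetch another task:\n\nnested_sum(x) = fetch(Dagger.@spawn x+1) + 1 # spawns an inner task from within the outer task\nt = Dagger.@spawn nested_sum(1)\n@assert fetch(t) == 3\n\n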
This is detailed further in Dynamic Scheduler Control.","category":"page"},{"location":"task-spawning/#Errors","page":"Task Spawning","title":"Errors","text":"","category":"section"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"If a thunk errors while running under the eager scheduler, it will be marked as having failed, all dependent (downstream) thunks will be marked as failed, and any future thunks that use a failed thunk as input will fail. Failure can be determined with fetch, which will re-throw the error that the originally-failing thunk threw. wait and isready will not check whether a thunk or its upstream failed; they only check if the thunk has completed, whether or not it errored.","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"This failure behavior is not the default for lazy scheduling (Lazy API), but can be enabled by setting the scheduler/thunk option (Scheduler and Thunk options) allow_error to true. However, this option isn't terribly useful for non-dynamic use cases, since any thunk failure will propagate down to the output thunk regardless of where it occurs.","category":"page"},{"location":"task-spawning/#Lazy-API","page":"Task Spawning","title":"Lazy API","text":"","category":"section"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"Alongside the modern eager API, Dagger also has a legacy lazy API, accessible via @par or delayed. The above computation can be executed with the lazy API by substituting @spawn with @par and fetch with collect:","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"p = Dagger.@par add1(4)\nq = Dagger.@par add2(p)\nr = Dagger.@par add1(3)\ns = Dagger.@par combine(p, q, r)\n\n@assert collect(s) == 16","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"or similarly, in block form:","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"s = Dagger.@par begin\n p = add1(4)\n q = add2(p)\n r = add1(3)\n combine(p, q, r)\nend\n\n@assert collect(s) == 16","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"Alternatively, if you want to compute but not fetch the result of a lazy operation, you can call compute on the thunk. This will return a Chunk object which references the result (see Chunks for more details):","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"x = Dagger.@par 1+2\ncx = compute(x)\ncx::Chunk\n@assert collect(cx) == 3","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"Note that, as a legacy API, the lazy API is generally discouraged for modern usage of Dagger. 
The reasons for this are numerous:","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"Nothing useful is happening while the DAG is being constructed, adding extra latency\nDynamically expanding the DAG can't be done with @par and delayed, making recursive nesting annoying to write\nEach call to compute/collect starts a new scheduler, and destroys it at the end of the computation, wasting valuable time on setup and teardown\nDistinct schedulers don't share runtime metrics or learned parameters, thus causing the scheduler to act less intelligently\nDistinct schedulers can't share work or data directly","category":"page"},{"location":"task-spawning/#Scheduler-and-Thunk-options","page":"Task Spawning","title":"Scheduler and Thunk options","text":"","category":"section"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"While Dagger generally \"just works\", sometimes one needs to exert some more fine-grained control over how the scheduler allocates work. There are two parallel mechanisms to achieve this: Scheduler options (from Dagger.Sch.SchedulerOptions) and Thunk options (from Dagger.Sch.ThunkOptions). These two options structs contain many shared options, with the difference being that Scheduler options operate globally across an entire DAG, and Thunk options operate on a thunk-by-thunk basis.","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"Scheduler options can be constructed and passed to collect() or compute() as the keyword argument options for lazy API usage:","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"t = Dagger.@par 1+2\nopts = Dagger.Sch.SchedulerOptions(;single=1) # Execute on worker 1\n\ncompute(t; options=opts)\n\ncollect(t; options=opts)","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"Thunk options can be passed to @spawn/spawn, @par, and delayed similarly:","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"# Execute on worker 1\n\nDagger.@spawn single=1 1+2\nDagger.spawn(+, Dagger.Options(;single=1), 1, 2)\n\ndelayed(+; single=1)(1, 2)","category":"page"},{"location":"task-queues/#Task-Queues","page":"Task Queues","title":"Task Queues","text":"","category":"section"},{"location":"task-queues/","page":"Task Queues","title":"Task Queues","text":"By default, @spawn/spawn submit tasks immediately and directly into Dagger's scheduler without modifications. However, sometimes you want to be able to tweak this behavior for a region of code; for example, when working with GPUs or other operations which operate in-place, you might want to emulate CUDA's stream semantics by ensuring that tasks execute sequentially (to avoid one kernel reading from an array while another kernel is actively writing to it). Or, you might want to ensure that a set of Dagger tasks are submitted into the scheduler all at once for benchmarking purposes or to emulate the behavior of delayed. This and more is possible through a mechanism called \"task queues\".","category":"page"},{"location":"task-queues/","page":"Task Queues","title":"Task Queues","text":"A task queue in Dagger is an object that can be configured to accept unlaunched tasks from @spawn/spawn and either modify them or delay their launching arbitrarily. 
By default, Dagger tasks are enqueued through the EagerTaskQueue, which submits tasks directly into the scheduler before @spawn/spawn returns. However, Dagger also has an InOrderTaskQueue, which ensures that tasks enqueued through it execute sequentially with respect to each other. This queue can be allocated with Dagger.spawn_sequential:","category":"page"},{"location":"task-queues/","page":"Task Queues","title":"Task Queues","text":"A = rand(16)\nB = zeros(16)\nC = zeros(16)\nfunction vcopy!(B, A)\n B .= A .+ 1.0\n return\nend\nfunction vadd!(C, A, B)\n C .+= A .+ B\n return\nend\nwait(Dagger.spawn_sequential() do\n Dagger.@spawn vcopy!(B, A)\n Dagger.@spawn vadd!(C, A, B)\nend)","category":"page"},{"location":"task-queues/","page":"Task Queues","title":"Task Queues","text":"In the above example, vadd! is guaranteed to wait until vcopy! is completed, even though vadd! isn't taking the result of vcopy! as an argument (which is how tasks are normally ordered).","category":"page"},{"location":"task-queues/","page":"Task Queues","title":"Task Queues","text":"What if we wanted to launch multiple vcopy! calls within a spawn_sequential region and allow them to execute in parallel, but still ensure that the vadd! happens after they all finish? In this case, we want to switch to another kind of task queue: the LazyTaskQueue. This task queue batches up task submissions into groups, so that all tasks enqueued with it are placed in the scheduler all at once. But what would happen if we used this task queue (via spawn_bulk) within a region using spawn_sequential:","category":"page"},{"location":"task-queues/","page":"Task Queues","title":"Task Queues","text":"A = rand(16)\nB1 = zeros(16)\nB2 = zeros(16)\nC = zeros(16)\nwait(Dagger.spawn_sequential() do\n Dagger.spawn_bulk() do\n Dagger.@spawn vcopy!(B1, A)\n Dagger.@spawn vcopy!(B2, A)\n end\n Dagger.@spawn vadd!(C, B1, B2)\nend)","category":"page"},{"location":"task-queues/","page":"Task Queues","title":"Task Queues","text":"Conveniently, Dagger's task queues can be nested to get the expected behavior; the above example will submit the two vcopy! tasks as a group (and they can execute concurrently), while still ensuring that those two tasks finish before the vadd! task executes.","category":"page"},{"location":"task-queues/","page":"Task Queues","title":"Task Queues","text":"warn: Warn\nTask queues do not propagate to nested tasks; if a Dagger task launches another task internally, the child task doesn't inherit the task queue that the parent task was enqueued in.","category":"page"},{"location":"benchmarking/#Benchmarking-Dagger","page":"Benchmarking","title":"Benchmarking Dagger","text":"","category":"section"},{"location":"benchmarking/","page":"Benchmarking","title":"Benchmarking","text":"For ease of benchmarking changes to Dagger's scheduler and the DArray, a benchmarking script exists at benchmarks/benchmark.jl. This script currently allows benchmarking a non-negative matrix factorization (NNMF) algorithm, which we've found to be a good evaluator of scheduling performance. 
The benchmark script can test with and without Dagger, and also has support for using CUDA or AMD GPUs to accelerate the NNMF via DaggerGPU.jl.","category":"page"},{"location":"benchmarking/","page":"Benchmarking","title":"Benchmarking","text":"The script checks for a number of environment variables, which are used to control the benchmarks that are performed (all of which are optional):","category":"page"},{"location":"benchmarking/","page":"Benchmarking","title":"Benchmarking","text":"BENCHMARK_PROCS: Selects the number of Julia processes and threads to start up. Specified as 8:4, this option would start 8 extra Julia processes, with 4 threads each. Defaults to 2 processes and 1 thread each.\nBENCHMARK_REMOTES: Specifies a colon-separated list of remote servers to connect to and start Julia processes on, using BENCHMARK_PROCS to indicate the processor/thread configuration of those remotes. Disabled by default (uses the local machine).\nBENCHMARK_OUTPUT_FORMAT: Selects the output format for benchmark results. Defaults to jls, which uses Julia's Serialization stdlib, and can also be jld to use JLD.jl.\nBENCHMARK_RENDER: Configures rendering, which is disabled by default. Can be \"live\" or \"offline\", which are explained below.\nBENCHMARK: Specifies the set of benchmarks to run as a comma-separated list, where each entry can be one of cpu, cuda, or amdgpu, and may optionally append +dagger (like cuda+dagger) to indicate whether or not to use Dagger. Defaults to cpu,cpu+dagger, which runs CPU benchmarks with and without Dagger.\nBENCHMARK_SCALE: Determines how much to scale the benchmark sizing by, typically specified as an Int range. Defaults to 1:5:50, which runs each scale from 1 to 50, in steps of 5.","category":"page"},{"location":"benchmarking/#Rendering-with-BENCHMARK_RENDER","page":"Benchmarking","title":"Rendering with BENCHMARK_RENDER","text":"","category":"section"},{"location":"benchmarking/","page":"Benchmarking","title":"Benchmarking","text":"Dagger contains visualization code for the scheduler (as a Gantt chart) and thunk execution profiling (flamechart), which can be enabled with BENCHMARK_RENDER. Additionally, rendering can be done \"live\", served via a Mux.jl webserver run locally, or \"offline\", where the visualization will be embedded into the results output file. By default, rendering is disabled. If BENCHMARK_RENDER is set to live, a Mux webserver is started at localhost:8000 (the address is not yet configurable), and the Gantt chart and profiling flamechart will be rendered once the benchmarks start. 
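For instance (a minimal sketch, assuming the script is included from the root of the Dagger.jl repository and reads these variables at startup):\n\nENV[\"BENCHMARK_RENDER\"] = \"live\" # serve the dashboard at localhost:8000\nENV[\"BENCHMARK\"] = \"cpu+dagger\" # only run the Dagger CPU benchmark\ninclude(\"benchmarks/benchmark.jl\")\n\n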
If set to offline, data visualization will happen in the background, and will be embedded in the results file.","category":"page"},{"location":"benchmarking/","page":"Benchmarking","title":"Benchmarking","text":"Note that Gantt chart and flamechart output is only generated and relevant during Dagger execution.","category":"page"},{"location":"benchmarking/#TODO:-Plotting","page":"Benchmarking","title":"TODO: Plotting","text":"","category":"section"},{"location":"api-timespanlogging/types/","page":"Types","title":"Types","text":"CurrentModule = TimespanLogging","category":"page"},{"location":"api-timespanlogging/types/#TimespanLogging-Types","page":"Types","title":"TimespanLogging Types","text":"","category":"section"},{"location":"api-timespanlogging/types/","page":"Types","title":"Types","text":"Pages = [\"types.md\"]","category":"page"},{"location":"api-timespanlogging/types/#Log-Sink-Types","page":"Types","title":"Log Sink Types","text":"","category":"section"},{"location":"api-timespanlogging/types/","page":"Types","title":"Types","text":"MultiEventLog\nLocalEventLog\nNoOpLog","category":"page"},{"location":"api-timespanlogging/types/#TimespanLogging.MultiEventLog","page":"Types","title":"TimespanLogging.MultiEventLog","text":"MultiEventLog\n\nProcesses events immediately, generating multiple log streams. Multiple consumers may register themselves in the MultiEventLog, and when accessed, log events will be provided to all consumers. A consumer is simply a function or callable struct which will be called with an event when it's generated. The return value of the consumer will be pushed into a log stream dedicated to that consumer. Errors thrown by consumers will be caught and rendered, but will not otherwise interrupt consumption by other consumers, or future consumption cycles. An error will result in nothing being appended to that consumer's log.\n\n\n\n\n\n","category":"type"},{"location":"api-timespanlogging/types/#TimespanLogging.LocalEventLog","page":"Types","title":"TimespanLogging.LocalEventLog","text":"LocalEventLog\n\nStores events in a process-local array. 
Accessing the logs is all-or-nothing; if multiple consumers call get_logs!, they will get different sets of logs.\n\n\n\n\n\n","category":"type"},{"location":"api-timespanlogging/types/#TimespanLogging.NoOpLog","page":"Types","title":"TimespanLogging.NoOpLog","text":"NoOpLog\n\nDisables event logging entirely.\n\n\n\n\n\n","category":"type"},{"location":"api-timespanlogging/types/#Event-Types","page":"Types","title":"Event Types","text":"","category":"section"},{"location":"api-timespanlogging/types/","page":"Types","title":"Types","text":"Event","category":"page"},{"location":"api-timespanlogging/types/#TimespanLogging.Event","page":"Types","title":"TimespanLogging.Event","text":"An event generated by timespan_start or timespan_finish.\n\n\n\n\n\n","category":"type"},{"location":"api-timespanlogging/types/#Built-in-Event-Types","page":"Types","title":"Built-in Event Types","text":"","category":"section"},{"location":"api-timespanlogging/types/","page":"Types","title":"Types","text":"Events.CoreMetrics\nEvents.IDMetrics\nEvents.TimelineMetrics\nEvents.FullMetrics\nEvents.CPULoadAverages\nEvents.MemoryFree\nEvents.EventSaturation\nEvents.DebugMetrics\nEvents.LogWindow","category":"page"},{"location":"api-timespanlogging/types/#TimespanLogging.Events.CoreMetrics","page":"Types","title":"TimespanLogging.Events.CoreMetrics","text":"CoreMetrics\n\nTracks the timestamp, category, and kind of the Event object generated by log events.\n\n\n\n\n\n","category":"type"},{"location":"api-timespanlogging/types/#TimespanLogging.Events.IDMetrics","page":"Types","title":"TimespanLogging.Events.IDMetrics","text":"IDMetrics\n\nTracks the ID of Event objects generated by log events.\n\n\n\n\n\n","category":"type"},{"location":"api-timespanlogging/types/#TimespanLogging.Events.TimelineMetrics","page":"Types","title":"TimespanLogging.Events.TimelineMetrics","text":"TimelineMetrics\n\nTracks the timeline of Event objects generated by log events.\n\n\n\n\n\n","category":"type"},{"location":"api-timespanlogging/types/#TimespanLogging.Events.FullMetrics","page":"Types","title":"TimespanLogging.Events.FullMetrics","text":"FullMetrics\n\nTracks the full Event object generated by log events.\n\n\n\n\n\n","category":"type"},{"location":"api-timespanlogging/types/#TimespanLogging.Events.CPULoadAverages","page":"Types","title":"TimespanLogging.Events.CPULoadAverages","text":"CPULoadAverages\n\nMonitors the CPU load averages.\n\n\n\n\n\n","category":"type"},{"location":"api-timespanlogging/types/#TimespanLogging.Events.MemoryFree","page":"Types","title":"TimespanLogging.Events.MemoryFree","text":"MemoryFree\n\nMonitors the percentage of free system memory.\n\n\n\n\n\n","category":"type"},{"location":"api-timespanlogging/types/#TimespanLogging.Events.EventSaturation","page":"Types","title":"TimespanLogging.Events.EventSaturation","text":"EventSaturation\n\nTracks the compute saturation (running tasks) per-processor.\n\n\n\n\n\n","category":"type"},{"location":"api-timespanlogging/types/#TimespanLogging.Events.DebugMetrics","page":"Types","title":"TimespanLogging.Events.DebugMetrics","text":"Debugging metric, used to log event start/finish via @debug.\n\n\n\n\n\n","category":"type"},{"location":"api-timespanlogging/types/#TimespanLogging.Events.LogWindow","page":"Types","title":"TimespanLogging.Events.LogWindow","text":"LogWindow\n\nAggregator that prunes events to within a given time 
window.\n\n\n\n\n\n","category":"type"},{"location":"logging/#Logging-and-Graphing","page":"Logging and Graphing","title":"Logging and Graphing","text":"","category":"section"},{"location":"logging/","page":"Logging and Graphing","title":"Logging and Graphing","text":"Dagger's scheduler keeps track of the important and potentially expensive actions it does, such as moving data between workers or executing thunks, and tracks how much time and memory allocations these operations consume, among other things. It does this through the TimespanLogging.jl package (which used to be directly integrated into Dagger). Saving this information somewhere accessible is disabled by default, but it's quite easy to turn it on by setting a \"log sink\" in the Context being used, as ctx.log_sink. A variety of log sinks are built-in to TimespanLogging; the NoOpLog is the default log sink when one isn't explicitly specified, and disables logging entirely (to minimize overhead). There are currently two other log sinks of interest; the first and newer of the two is the MultiEventLog, which generates multiple independent log streams, one per \"consumer\" (details in the next section). The second and older sink is the LocalEventLog, which is explained later in this document. Most users should use the MultiEventLog, since it's far more flexible and extensible, and is more performant in general.","category":"page"},{"location":"logging/#MultiEventLog","page":"Logging and Graphing","title":"MultiEventLog","text":"","category":"section"},{"location":"logging/","page":"Logging and Graphing","title":"Logging and Graphing","text":"The MultiEventLog is intended to be configurable to exclude unnecessary information, and to include any built-in or user-defined metrics. It stores a set of \"sub-log\" streams internally, appending a single element to each of them when an event is generated. This element can be called a \"sub-event\" (to distinguish it from the higher-level \"event\" that Dagger creates), and is created by a \"consumer\". A consumer is a function or callable struct that, when called with the Event object generated by TimespanLogging, returns a sub-event characterizing whatever information the consumer represents. For example, the Dagger.Events.BytesAllocd consumer calculates the total bytes allocated and live at any given time within Dagger, and returns the current value when called. Let's construct one:","category":"page"},{"location":"logging/","page":"Logging and Graphing","title":"Logging and Graphing","text":"ctx = Context()\nml = TimespanLogging.MultiEventLog()\n\n# Add the BytesAllocd consumer to the log as `:bytes`\nml[:bytes] = Dagger.Events.BytesAllocd()\n\nctx.log_sink = ml","category":"page"},{"location":"logging/","page":"Logging and Graphing","title":"Logging and Graphing","text":"As we can see above, each consumer gets a unique name as a Symbol that identifies it. 
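Because a consumer is just a callable, a plain function works too; for instance (an illustrative sketch, assuming the Event object exposes a category field, as the CoreMetrics consumer implies):\n\nml[:category] = ev -> ev.category # appends each event's category to its own sub-log\n\n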
Now that the log sink is attached with a consumer, we can execute some Dagger tasks, and then collect the sub-events generated by BytesAllocd:","category":"page"},{"location":"logging/","page":"Logging and Graphing","title":"Logging and Graphing","text":"# Using the lazy API, for explanatory purposes\ncollect(ctx, delayed(+)(1, delayed(*)(3, 4))) # Allocates 8 bytes\nlog = TimespanLogging.get_logs!(ctx)[1] # Get the logs for worker 1\n@show log[:bytes]","category":"page"},{"location":"logging/","page":"Logging and Graphing","title":"Logging and Graphing","text":"You'll then see that 8 bytes are allocated and then freed during the process of executing and completing those tasks.","category":"page"},{"location":"logging/","page":"Logging and Graphing","title":"Logging and Graphing","text":"Note that the MultiEventLog can also be used perfectly well when using Dagger's eager API:","category":"page"},{"location":"logging/","page":"Logging and Graphing","title":"Logging and Graphing","text":"ctx = Dagger.Sch.eager_context()\nctx.log_sink = ml\n\na = Dagger.@spawn 3*4\nDagger.@spawn 1+a","category":"page"},{"location":"logging/","page":"Logging and Graphing","title":"Logging and Graphing","text":"There are a variety of other consumers built-in to TimespanLogging and Dagger, under the TimespanLogging.Events and Dagger.Events modules, respectively; see Dagger Types and TimespanLogging Types for details.","category":"page"},{"location":"logging/","page":"Logging and Graphing","title":"Logging and Graphing","text":"The MultiEventLog also has a mechanism to call a set of functions, called \"aggregators\", after all consumers have been executed; these aggregators are passed the full set of log streams as a Dict{Symbol,Vector{Any}}. The only one currently shipped with TimespanLogging directly is the LogWindow, and DaggerWebDash.jl has the TableStorage which integrates with it; see DaggerWebDash Types for details.","category":"page"},{"location":"logging/#LocalEventLog","page":"Logging and Graphing","title":"LocalEventLog","text":"","category":"section"},{"location":"logging/","page":"Logging and Graphing","title":"Logging and Graphing","text":"The LocalEventLog is generally only useful when you want combined events (event start and finish combined as a single unit), and only care about a few simple built-in generated events. Let's attach one to our context:","category":"page"},{"location":"logging/","page":"Logging and Graphing","title":"Logging and Graphing","text":"ctx = Context()\nlog = TimespanLogging.LocalEventLog()\nctx.log_sink = log","category":"page"},{"location":"logging/","page":"Logging and Graphing","title":"Logging and Graphing","text":"Now anytime ctx is used as the context for a scheduler, the scheduler will log events into log.","category":"page"},{"location":"logging/","page":"Logging and Graphing","title":"Logging and Graphing","text":"Once sufficient data has been accumulated into a LocalEventLog, it can be gathered to a single host via TimespanLogging.get_logs!(log). The result is a Vector of TimespanLogging.Timespan objects, which describe some metadata about an operation that occurred and that the scheduler logged. 
These events may be introspected directly, or may also be rendered to a DOT-format string:","category":"page"},{"location":"logging/","page":"Logging and Graphing","title":"Logging and Graphing","text":"logs = TimespanLogging.get_logs!(log)\nstr = Dagger.show_plan(logs)","category":"page"},{"location":"logging/","page":"Logging and Graphing","title":"Logging and Graphing","text":"Dagger.show_plan can also be called as Dagger.show_plan(io::IO, logs) to write the graph to a file or other IO object. The string generated by this function may be passed to an external tool like Graphviz for rendering. Note that this method doesn't display input arguments to the DAG (non-Thunks); you can call Dagger.show_plan(logs, thunk), where thunk is the output Thunk of the DAG, to render argument nodes.","category":"page"},{"location":"logging/","page":"Logging and Graphing","title":"Logging and Graphing","text":"note: Note\nTimespanLogging.get_logs! clears out the event logs, so that old events don't mix with new ones from future DAGs.","category":"page"},{"location":"logging/","page":"Logging and Graphing","title":"Logging and Graphing","text":"As a convenience, it's possible to set ctx.log_file to the path to an output file, and then calls to compute(ctx, ...)/collect(ctx, ...) will automatically write the graph in DOT format to that path. There is also a benefit to this approach over manual calls to get_logs! and show_plan: DAGs which aren't Thunks (such as operations on the Dagger.DArray) will be properly rendered with input arguments (which normally aren't rendered because a Thunk is dynamically generated from such operations by Dagger before scheduling).","category":"page"},{"location":"logging/#FilterLog","page":"Logging and Graphing","title":"FilterLog","text":"","category":"section"},{"location":"logging/","page":"Logging and Graphing","title":"Logging and Graphing","text":"The FilterLog exists to allow writing events to a user-defined location (such as a database, file, or network socket). 
It is not currently tested or documented.","category":"page"},{"location":"api-daggerwebdash/functions/","page":"Functions and Macros","title":"Functions and Macros","text":"CurrentModule = DaggerWebDash","category":"page"},{"location":"api-daggerwebdash/functions/#DaggerWebDash-Functions","page":"Functions and Macros","title":"DaggerWebDash Functions","text":"","category":"section"},{"location":"api-daggerwebdash/functions/","page":"Functions and Macros","title":"Functions and Macros","text":"Pages = [\"functions.md\"]","category":"page"},{"location":"api-timespanlogging/functions/","page":"Functions and Macros","title":"Functions and Macros","text":"CurrentModule = TimespanLogging","category":"page"},{"location":"api-timespanlogging/functions/#TimespanLogging-Functions","page":"Functions and Macros","title":"TimespanLogging Functions","text":"","category":"section"},{"location":"api-timespanlogging/functions/","page":"Functions and Macros","title":"Functions and Macros","text":"Pages = [\"functions.md\"]","category":"page"},{"location":"api-timespanlogging/functions/#Basic-Functions","page":"Functions and Macros","title":"Basic Functions","text":"","category":"section"},{"location":"api-timespanlogging/functions/","page":"Functions and Macros","title":"Functions and Macros","text":"timespan_start\ntimespan_finish\nget_logs!\nmake_timespan","category":"page"},{"location":"api-timespanlogging/functions/#TimespanLogging.timespan_start","page":"Functions and Macros","title":"TimespanLogging.timespan_start","text":"timespan_start(ctx, category::Symbol, id, tl)\n\nGenerates an Event{:start} which denotes the start of an event. The event is categorized by category, and uniquely identified by id; these two must be the same passed to timespan_finish to close the event. tl is the \"timeline\" of the event, which is just an arbitrary payload attached to the event.\n\n\n\n\n\n","category":"function"},{"location":"api-timespanlogging/functions/#TimespanLogging.timespan_finish","page":"Functions and Macros","title":"TimespanLogging.timespan_finish","text":"timespan_finish(ctx, category::Symbol, id, tl)\n\nGenerates an Event{:finish} which denotes the end of an event. The event is categorized by category, and uniquely identified by id; these two must be the same as previously passed to timespan_start. tl is the \"timeline\" of the event, which is just an arbitrary payload attached to the event.\n\n\n\n\n\n","category":"function"},{"location":"api-timespanlogging/functions/#TimespanLogging.get_logs!","page":"Functions and Macros","title":"TimespanLogging.get_logs!","text":"get_logs!(::LocalEventLog, raw=false; only_local=false) -> Union{Vector{Timespan},Vector{Event}}\n\nGet the logs from each process' local event log, clearing it in the process. Set raw to true to get potentially unmatched Events; the default is to return only matched events as Timespans. 
If only_local is set to true, only process-local logs will be fetched; the default is to fetch logs from all processes.\n\n\n\n\n\n","category":"function"},{"location":"api-timespanlogging/functions/#TimespanLogging.make_timespan","page":"Functions and Macros","title":"TimespanLogging.make_timespan","text":"make_timespan(start::Event, finish::Event) -> Timespan\n\nCreates a Timespan given the start and finish Events.\n\n\n\n\n\n","category":"function"},{"location":"api-timespanlogging/functions/#Logging-Metric-Functions","page":"Functions and Macros","title":"Logging Metric Functions","text":"","category":"section"},{"location":"api-timespanlogging/functions/","page":"Functions and Macros","title":"Functions and Macros","text":"init_similar","category":"page"},{"location":"api-timespanlogging/functions/#TimespanLogging.init_similar","page":"Functions and Macros","title":"TimespanLogging.init_similar","text":"Creates a copy of x with the same configuration, but fresh/empty data.\n\n\n\n\n\n","category":"function"},{"location":"scopes/#Scopes","page":"Scopes","title":"Scopes","text":"","category":"section"},{"location":"scopes/","page":"Scopes","title":"Scopes","text":"Sometimes you will have data that is only meaningful in a certain location, such as within a single Julia process, a given server, or even for a specific Dagger processor. We call this location a \"scope\" in Dagger, denoting the bounds within which the data is meaningful and valid. For example, C pointers are typically scoped to a process, file paths are scoped to one or more servers dependent on filesystem configuration, etc. By default, Dagger doesn't recognize this; it treats everything passed into a task, or generated from a task, as inherently safe to transfer anywhere else. When this is not the case, Dagger provides optional scopes to instruct the scheduler where data is considered valid.","category":"page"},{"location":"scopes/#Scope-Basics","page":"Scopes","title":"Scope Basics","text":"","category":"section"},{"location":"scopes/","page":"Scopes","title":"Scopes","text":"Let's take the example of a webcam handle generated by VideoIO.jl. This handle is a C pointer, and thus has process scope. We can open the handle on a given process, and set the scope of the resulting data to be locked to the current process with Dagger.scope to construct a ProcessScope:","category":"page"},{"location":"scopes/","page":"Scopes","title":"Scopes","text":"using Dagger, VideoIO, Distributed\n\nfunction get_handle()\n handle = VideoIO.opencamera()\n proc = Dagger.thunk_processor()\n scope = Dagger.scope(worker=myid()) # constructs a `ProcessScope`\n return Dagger.tochunk(handle, proc, scope)\nend\n\ncam_handle = Dagger.@spawn get_handle()","category":"page"},{"location":"scopes/","page":"Scopes","title":"Scopes","text":"Now, wherever cam_handle is passed, Dagger will ensure that any computations on the handle only happen within its defined scope. For example, we can read from the camera:","category":"page"},{"location":"scopes/","page":"Scopes","title":"Scopes","text":"cam_frame = Dagger.@spawn read(cam_handle)","category":"page"},{"location":"scopes/","page":"Scopes","title":"Scopes","text":"The cam_frame task is executed on any processor within the same process that the cam_handle task executed on. 
Of course, the resulting camera frame is not scoped to anywhere specific (denoted as AnyScope), and thus computations on it may execute anywhere.","category":"page"},{"location":"scopes/","page":"Scopes","title":"Scopes","text":"You may also encounter situations where you want to use a callable struct (such as a closure, or a Flux.jl layer) only within a certain scope; you can specify the scope of the function quite easily:","category":"page"},{"location":"scopes/","page":"Scopes","title":"Scopes","text":"using Flux\nm = Chain(...)\n# If `m` is only safe to transfer to and execute on this process,\n# we can set a `ProcessScope` on it:\nresult = Dagger.@spawn scope=Dagger.scope(worker=myid()) m(rand(8,8))","category":"page"},{"location":"scopes/","page":"Scopes","title":"Scopes","text":"Setting a scope on the function treats it as a regular piece of data (like the arguments to the function), so it participates in the scoping rules described in the following sections all the same.","category":"page"},{"location":"scopes/","page":"Scopes","title":"Scopes","text":"For the full list of scope constructors, see Scope Functions.","category":"page"},{"location":"scopes/","page":"Scopes","title":"Scopes","text":"Now, let's try out some other kinds of scopes, starting with NodeScope. This scope encompasses the server that one or more Julia processes may be running on. Say we want to use memory mapping (mmap) to more efficiently send arrays between two tasks. We can construct the mmap'd array in one task, attach a NodeScope() to it, and using the path of the mmap'd file to communicate its location, lock downstream tasks to the same server:","category":"page"},{"location":"scopes/","page":"Scopes","title":"Scopes","text":"using Mmap\n\nfunction generate()\n path = \"myfile.bin\"\n arr = Mmap.mmap(path, Matrix{Int}, (64,64))\n fill!(arr, 1)\n Mmap.sync!(arr)\n # Note: Dagger.scope() does not yet support node scopes\n Dagger.tochunk(path, Dagger.thunk_processor(), NodeScope())\nend\n\nfunction consume(path)\n arr = Mmap.mmap(path, Matrix{Int}, (64,64))\n sum(arr)\nend\n\na = Dagger.@spawn generate()\n@assert fetch(Dagger.@spawn consume(a)) == 64*64","category":"page"},{"location":"scopes/","page":"Scopes","title":"Scopes","text":"Whatever server a executed on, the consume task will also execute on it!","category":"page"},{"location":"scopes/","page":"Scopes","title":"Scopes","text":"Finally, we come to the \"lowest\" scope on the scope hierarchy, the ExactScope. This scope specifies one exact processor as the bounding scope, and is typically useful in certain limited cases (such as data existing only on a specific GPU). We won't provide an example here, because you rarely ever need this scope, but if you already understand the NodeScope and ProcessScope, the ExactScope should be easy to figure out.","category":"page"},{"location":"scopes/#Union-Scopes","page":"Scopes","title":"Union Scopes","text":"","category":"section"},{"location":"scopes/","page":"Scopes","title":"Scopes","text":"Sometimes one simple scope isn't enough! In that case, you can use the UnionScope to construct the union of two or more scopes. Say, for example, you have some sensitive data on your company's servers that you want to compute summaries of, but you'll be driving the computation from your laptop, and you aren't allowed to send the data itself outside of the company's network. 
You could accomplish this by constructing a UnionScope of ProcessScopes, one for each of the non-laptop Julia processes, and using that to ensure that the data in its original form always stays within the company network:","category":"page"},{"location":"scopes/","page":"Scopes","title":"Scopes","text":"using Distributed\n\naddprocs(4) # some local processes\nprocs = addprocs([(\"server.company.com\", 4)]) # some company processes\n\nsecrets_scope = UnionScope(ProcessScope.(procs))\n\nfunction generate_secrets()\n secrets = open(\"/shared/secret_results.txt\", \"r\") do io\n String(read(io))\n end\n Dagger.tochunk(secrets, Dagger.thunk_processor(), secrets_scope)\nend\n\nsummarize(secrets) = occursin(\"QA Pass\", secrets)\n\n# Generate the data on the first company process\nsensitive_data = Dagger.@spawn single=first(procs) generate_secrets()\n\n# We can safely call this, knowing that it will be executed on a company server\nqa_passed = Dagger.@spawn summarize(sensitive_data)","category":"page"},{"location":"scopes/#Mismatched-Scopes","page":"Scopes","title":"Mismatched Scopes","text":"","category":"section"},{"location":"scopes/","page":"Scopes","title":"Scopes","text":"You might now be thinking, \"What if I want to run a task on multiple pieces of data whose scopes don't match up?\" In such a case, Dagger will throw an error, refusing to schedule that task, since the intersection of the data scopes is an empty set (there is no feasible processor which can satisfy the scoping constraints). For example:","category":"page"},{"location":"scopes/","page":"Scopes","title":"Scopes","text":"ps2 = ProcessScope(2)\nps3 = ProcessScope(3)\n\ngenerate(scope) = Dagger.tochunk(rand(64), Dagger.thunk_processor(), scope)\n\nd2 = Dagger.@spawn generate(ps2) # Run on process 2\nd3 = Dagger.@spawn generate(ps3) # Run on process 3\nres = Dagger.@spawn d2 * d3 # An error!","category":"page"},{"location":"scopes/","page":"Scopes","title":"Scopes","text":"Moral of the story: only use scopes when you know you really need them, and if you aren't careful to arrange everything just right, be prepared for Dagger to refuse to schedule your tasks! Scopes should only be used to ensure correctness of your programs, and are not intended to be used to optimize the schedule that Dagger uses for your tasks, since restricting the scope of execution for tasks will necessarily reduce the optimizations that Dagger's scheduler can perform.","category":"page"},{"location":"api-dagger/functions/","page":"Functions and Macros","title":"Functions and Macros","text":"CurrentModule = Dagger","category":"page"},{"location":"api-dagger/functions/#Dagger-Functions","page":"Functions and Macros","title":"Dagger Functions","text":"","category":"section"},{"location":"api-dagger/functions/","page":"Functions and Macros","title":"Functions and Macros","text":"Pages = [\"functions.md\"]","category":"page"},{"location":"api-dagger/functions/#Task-Functions/Macros","page":"Functions and Macros","title":"Task Functions/Macros","text":"","category":"section"},{"location":"api-dagger/functions/","page":"Functions and Macros","title":"Functions and Macros","text":"@spawn\nspawn\ndelayed\n@par","category":"page"},{"location":"api-dagger/functions/#Dagger.@spawn","page":"Functions and Macros","title":"Dagger.@spawn","text":"@spawn [opts] f(args...) 
-> Thunk\n\nConvenience macro like Dagger.@par, but eagerly executed from the moment it's called (equivalent to spawn).\n\nSee the docs for @par for more information and usage examples.\n\n\n\n\n\n","category":"macro"},{"location":"api-dagger/functions/#Dagger.spawn","page":"Functions and Macros","title":"Dagger.spawn","text":"spawn(f, args...; kwargs...) -> EagerThunk\n\nSpawns a task with f as the function, args as the arguments, and kwargs as the keyword arguments, returning an EagerThunk. Uses a scheduler running in the background to execute code.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.delayed","page":"Functions and Macros","title":"Dagger.delayed","text":"delayed(f, options=Options())(args...; kwargs...) -> Thunk\ndelayed(f; options...)(args...; kwargs...) -> Thunk\n\nCreates a Thunk object which can be executed later, which will call f with args and kwargs. options controls various properties of the resulting Thunk.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.@par","page":"Functions and Macros","title":"Dagger.@par","text":"@par [opts] f(args...; kwargs...) -> Thunk\n\nConvenience macro to call Dagger.delayed on f with arguments args and keyword arguments kwargs. May also be called with a series of assignments like so:\n\nx = @par begin\n a = f(1,2)\n b = g(a,3)\n h(a,b)\nend\n\nx will hold the Thunk representing h(a,b); additionally, a and b will be defined in the same local scope and will be equally accessible for later calls.\n\nOptions to the Thunk can be set as opts with namedtuple syntax, e.g. single=1. Multiple options may be provided, and will be applied to all generated thunks.\n\n\n\n\n\n","category":"macro"},{"location":"api-dagger/functions/#Task-Options-Functions/Macros","page":"Functions and Macros","title":"Task Options Functions/Macros","text":"","category":"section"},{"location":"api-dagger/functions/","page":"Functions and Macros","title":"Functions and Macros","text":"with_options\nget_options\n@option\ndefault_option","category":"page"},{"location":"api-dagger/functions/#Dagger.with_options","page":"Functions and Macros","title":"Dagger.with_options","text":"with_options(f, options::NamedTuple) -> Any\nwith_options(f; options...) -> Any\n\nSets one or more options to the given values, executes f(), resets the options to their previous values, and returns the result of f(). This is the recommended way to set options, as it only affects tasks spawned within its scope. Note that setting an option here will propagate its value across Julia or Dagger tasks spawned by f() or its callees (i.e. the options propagate).\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.get_options","page":"Functions and Macros","title":"Dagger.get_options","text":"get_options(key::Symbol, default) -> Any\nget_options(key::Symbol) -> Any\n\nReturns the value of the option named key. If option does not have a value set, then an error will be thrown, unless default is set, in which case it will be returned instead of erroring.\n\nget_options() -> NamedTuple\n\nReturns a NamedTuple of all option key-value pairs.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.@option","page":"Functions and Macros","title":"Dagger.@option","text":"@option name myfunc(A, B, C) = value\n\nA convenience macro for defining default_option. 
For example:\n\nDagger.@option single mylocalfunc(Int) = 1\n\nThe above call will set the single option to 1 for any Dagger task calling mylocalfunc with an Int argument.\n\n\n\n\n\n","category":"macro"},{"location":"api-dagger/functions/#Dagger.default_option","page":"Functions and Macros","title":"Dagger.default_option","text":"default_option(::Val{name}, Tf, Targs...) where name = value\n\nDefines the default value for option name to value when Dagger is preparing to execute a function with type Tf with the argument types Targs. Users and libraries may override this to set default values for tasks.\n\nAn easier way to define these defaults is with @option.\n\nNote that the actual task's argument values are not passed, as it may not always be possible or efficient to gather all Dagger task arguments on one worker.\n\nThis function may be executed within the scheduler, so it should generally be made very cheap to execute. If the function throws an error, the scheduler will use whatever the global default value is for that option instead.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Data-Management-Functions","page":"Functions and Macros","title":"Data Management Functions","text":"","category":"section"},{"location":"api-dagger/functions/","page":"Functions and Macros","title":"Functions and Macros","text":"tochunk\n@mutable\n@shard\nshard","category":"page"},{"location":"api-dagger/functions/#Dagger.tochunk","page":"Functions and Macros","title":"Dagger.tochunk","text":"tochunk(x, proc::Processor, scope::AbstractScope; device=nothing, kwargs...) -> Chunk\n\nCreate a chunk from data x which resides on proc and which has scope scope.\n\ndevice specifies a MemPool.StorageDevice (which is itself wrapped in a Chunk) which will be used to manage the reference contained in the Chunk generated by this function. If device is nothing (the default), the data will be inspected to determine if it's safe to serialize; if so, the default MemPool storage device will be used; if not, then a MemPool.CPURAMDevice will be used.\n\nAll other kwargs are passed directly to MemPool.poolset.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.@shard","page":"Functions and Macros","title":"Dagger.@shard","text":"Creates a Shard. See Dagger.shard for details.\n\n\n\n\n\n","category":"macro"},{"location":"api-dagger/functions/#Dagger.shard","page":"Functions and Macros","title":"Dagger.shard","text":"shard(f; kwargs...) -> Chunk{Shard}\n\nExecutes f on all workers in workers, wrapping the result in a process-scoped Chunk, and constructs a Chunk{Shard} containing all of these Chunks on the current worker.\n\nKeyword arguments:\n\nprocs – The list of processors to create pieces on. May be any iterable container of Processors.\nworkers – The list of workers to create pieces on. May be any iterable container of Integers.\nper_thread::Bool=false – If true, creates a piece per thread, rather than a piece per worker.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Scope-Functions","page":"Functions and Macros","title":"Scope Functions","text":"","category":"section"},{"location":"api-dagger/functions/","page":"Functions and Macros","title":"Functions and Macros","text":"scope\nconstrain","category":"page"},{"location":"api-dagger/functions/#Dagger.scope","page":"Functions and Macros","title":"Dagger.scope","text":"scope(scs...) -> AbstractScope\nscope(;scs...) 
-> AbstractScope\n\nConstructs an AbstractScope from a set of scope specifiers. Each element in scs is a separate specifier; if scs is empty, an empty UnionScope() is produced; if scs has one element, then exactly one specifier is constructed; if scs has more than one element, a UnionScope of the scopes specified by scs is constructed. A variety of specifiers can be passed to construct a scope:\n\n:any - Constructs an AnyScope()\n:default - Constructs a DefaultScope()\n(scs...,) - Constructs a UnionScope of scopes, each specified by scs\nthread=tid or threads=[tids...] - Constructs an ExactScope or UnionScope containing all Dagger.ThreadProcs with thread ID tid/tids across all workers.\nworker=wid or workers=[wids...] - Constructs a ProcessScope or UnionScope containing all Dagger.ThreadProcs with worker ID wid/wids across all threads.\nthread=tid/threads=tids and worker=wid/workers=wids - Constructs an ExactScope, ProcessScope, or UnionScope containing all Dagger.ThreadProcs with worker ID wid/wids and threads tid/tids.\n\nAside from the worker and thread specifiers, it's possible to add custom specifiers for scoping to other kinds of processors (like GPUs) or providing different ways to specify a scope. Specifier selection is determined by a precedence ordering: by default, all specifiers have precedence 0, which can be changed by defining scope_key_precedence(::Val{spec}) = precedence (where spec is the specifier as a Symbol). The specifier with the highest precedence in a set of specifiers is used to determine the scope by calling to_scope(::Val{spec}, sc::NamedTuple) (where sc is the full set of specifiers), which should be overridden for each custom specifier, and which returns an AbstractScope. For example:\n\n# Set up a GPU specifier\nDagger.scope_key_precedence(::Val{:gpu}) = 1\nDagger.to_scope(::Val{:gpu}, sc::NamedTuple) = ExactScope(MyGPUDevice(sc.worker, sc.gpu))\n\n# Generate an `ExactScope` for `MyGPUDevice` on worker 2, device 3\nDagger.scope(gpu=3, worker=2)\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.constrain","page":"Functions and Macros","title":"Dagger.constrain","text":"constrain(x::AbstractScope, y::AbstractScope) -> AbstractScope\n\nConstructs a scope that is the intersection of scopes x and y.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Lazy-Task-Functions","page":"Functions and Macros","title":"Lazy Task Functions","text":"","category":"section"},{"location":"api-dagger/functions/","page":"Functions and Macros","title":"Functions and Macros","text":"domain\ncompute\ndependents\nnoffspring\norder\ntreereduce","category":"page"},{"location":"api-dagger/functions/#Dagger.domain","page":"Functions and Macros","title":"Dagger.domain","text":"domain(x::T)\n\nReturns metadata about x. This metadata will be in the domain field of a Chunk object when an object of type T is created as the result of evaluating a Thunk.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.compute","page":"Functions and Macros","title":"Dagger.compute","text":"compute(ctx::Context, d::Thunk; options=nothing) -> Chunk\n\nCompute a Thunk - creates the DAG, assigns ranks to nodes for tie-breaking, and runs the scheduler with the specified options. 
Returns a Chunk which references the result.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.dependents","page":"Functions and Macros","title":"Dagger.dependents","text":"dependents(node::Thunk) -> Dict{Union{Thunk,Chunk}, Set{Thunk}}\n\nFind the set of direct dependents for each task.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.noffspring","page":"Functions and Macros","title":"Dagger.noffspring","text":"noffspring(dpents::Dict{Union{Thunk,Chunk}, Set{Thunk}}) -> Dict{Thunk, Int}\n\nRecursively find the number of tasks dependent on each task in the DAG. Takes a Dict as returned by dependents.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.order","page":"Functions and Macros","title":"Dagger.order","text":"order(node::Thunk, ndeps) -> Dict{Thunk,Int}\n\nGiven a root node of the DAG, calculates a total order for tie-breaking.\n\nThe root node gets score 1,\nthe rest of the nodes are explored in DFS fashion, but chunks of each node are explored in order of noffspring, i.e. the total number of tasks depending on the result of said node.\n\nArgs:\n\nnode: root node\nndeps: result of noffspring\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.treereduce","page":"Functions and Macros","title":"Dagger.treereduce","text":"Tree reduce\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Processor-Functions","page":"Functions and Macros","title":"Processor Functions","text":"","category":"section"},{"location":"api-dagger/functions/","page":"Functions and Macros","title":"Functions and Macros","text":"execute!\niscompatible\ndefault_enabled\nget_processors\nget_parent\nmove\nget_tls\nset_tls!","category":"page"},{"location":"api-dagger/functions/#Dagger.execute!","page":"Functions and Macros","title":"Dagger.execute!","text":"execute!(proc::Processor, f, args...; kwargs...) -> Any\n\nExecutes the function f with arguments args and keyword arguments kwargs on processor proc. This function can be overloaded by Processor subtypes to allow executing function calls differently than normal Julia.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.iscompatible","page":"Functions and Macros","title":"Dagger.iscompatible","text":"iscompatible(proc::Processor, opts, f, Targs...) -> Bool\n\nIndicates whether proc can execute f over Targs given opts. Processor subtypes should overload this function to return true if and only if it is essentially guaranteed that f(::Targs...) is supported. Additionally, iscompatible_func and iscompatible_arg can be overridden to determine compatibility of f and Targs individually. The default implementation returns false.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.default_enabled","page":"Functions and Macros","title":"Dagger.default_enabled","text":"default_enabled(proc::Processor) -> Bool\n\nReturns whether processor proc is enabled by default. The default value is false, which opts the processor out of execution unless specifically requested by the user; true opts it in, causing the processor to always participate in execution when possible.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.get_processors","page":"Functions and Macros","title":"Dagger.get_processors","text":"get_processors(proc::Processor) -> Set{<:Processor}\n\nReturns the set of processors contained in proc, if any. 
Processor subtypes should overload this function if they can contain sub-processors. The default method will return a Set containing proc itself.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.get_parent","page":"Functions and Macros","title":"Dagger.get_parent","text":"get_parent(proc::Processor) -> Processor\n\nReturns the parent processor for proc. The ultimate parent processor is an OSProc. Processor subtypes should overload this to return their most direct parent.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.move","page":"Functions and Macros","title":"Dagger.move","text":"move(from_proc::Processor, to_proc::Processor, x)\n\nMoves and/or converts x such that it's available and suitable for usage on the to_proc processor. This function can be overloaded by Processor subtypes to transport arguments and convert them to an appropriate form before being used for execution. Subtypes of Processor wishing to implement efficient data movement should provide implementations where x::Chunk.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.get_tls","page":"Functions and Macros","title":"Dagger.get_tls","text":"get_tls()\n\nGets all Dagger TLS variables as a NamedTuple.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.set_tls!","page":"Functions and Macros","title":"Dagger.set_tls!","text":"set_tls!(tls)\n\nSets all Dagger TLS variables from the NamedTuple tls.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Context-Functions","page":"Functions and Macros","title":"Context Functions","text":"","category":"section"},{"location":"api-dagger/functions/","page":"Functions and Macros","title":"Functions and Macros","text":"addprocs!\nrmprocs!","category":"page"},{"location":"api-dagger/functions/#Dagger.addprocs!","page":"Functions and Macros","title":"Dagger.addprocs!","text":"addprocs!(ctx::Context, xs)\n\nAdd new workers xs to ctx.\n\nWorkers will typically be assigned new tasks in the next scheduling iteration if scheduling is ongoing.\n\nWorkers can be either Processors or the underlying process IDs as Integers.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.rmprocs!","page":"Functions and Macros","title":"Dagger.rmprocs!","text":"rmprocs!(ctx::Context, xs)\n\nRemove the specified workers xs from ctx.\n\nWorkers will typically finish all their assigned tasks if scheduling is ongoing but will not be assigned new tasks after removal.\n\nWorkers can be either Processors or the underlying process IDs as Integers.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Thunk-Execution-Environment-Functions","page":"Functions and Macros","title":"Thunk Execution Environment Functions","text":"","category":"section"},{"location":"api-dagger/functions/","page":"Functions and Macros","title":"Functions and Macros","text":"These functions are used within the function called by a Thunk.","category":"page"},{"location":"api-dagger/functions/","page":"Functions and Macros","title":"Functions and Macros","text":"in_thunk\nthunk_processor","category":"page"},{"location":"api-dagger/functions/#Dagger.in_thunk","page":"Functions and Macros","title":"Dagger.in_thunk","text":"in_thunk()\n\nReturns true if currently in a Thunk process, else false.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.thunk_processor","page":"Functions and 
Macros","title":"Dagger.thunk_processor","text":"thunk_processor()\n\nGet the current processor executing the current thunk.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dynamic-Scheduler-Control-Functions","page":"Functions and Macros","title":"Dynamic Scheduler Control Functions","text":"","category":"section"},{"location":"api-dagger/functions/","page":"Functions and Macros","title":"Functions and Macros","text":"These functions query and control the scheduler remotely.","category":"page"},{"location":"api-dagger/functions/","page":"Functions and Macros","title":"Functions and Macros","text":"Sch.sch_handle\nSch.add_thunk!\nBase.fetch\nBase.wait\nSch.exec!\nSch.halt!\nSch.get_dag_ids","category":"page"},{"location":"api-dagger/functions/#Dagger.Sch.sch_handle","page":"Functions and Macros","title":"Dagger.Sch.sch_handle","text":"Gets the scheduler handle for the currently-executing thunk.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.Sch.add_thunk!","page":"Functions and Macros","title":"Dagger.Sch.add_thunk!","text":"Adds a new Thunk to the DAG.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Base.fetch","page":"Functions and Macros","title":"Base.fetch","text":"Base.fetch(c::DArray)\n\nIf a DArray tree has a Thunk in it, make the whole thing a big thunk.\n\n\n\n\n\nWaits on a thunk to complete, and fetches its result.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Base.wait","page":"Functions and Macros","title":"Base.wait","text":"Waits on a thunk to complete.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.Sch.exec!","page":"Functions and Macros","title":"Dagger.Sch.exec!","text":"Executes an arbitrary function within the scheduler, returning the result.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.Sch.halt!","page":"Functions and Macros","title":"Dagger.Sch.halt!","text":"Commands the scheduler to halt execution immediately.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.Sch.get_dag_ids","page":"Functions and Macros","title":"Dagger.Sch.get_dag_ids","text":"Returns all Thunks IDs as a Dict, mapping a Thunk to its downstream dependents.\n\n\n\n\n\n","category":"function"},{"location":"data-management/#Data-Management","page":"Data Management","title":"Data Management","text":"","category":"section"},{"location":"data-management/","page":"Data Management","title":"Data Management","text":"Dagger is not just a computing platform - it also has awareness of where each piece of data resides, and will move data between workers and perform conversions as necessary to satisfy the needs of your tasks.","category":"page"},{"location":"data-management/#Chunks","page":"Data Management","title":"Chunks","text":"","category":"section"},{"location":"data-management/","page":"Data Management","title":"Data Management","text":"Dagger often needs to move data between workers to allow a task to execute. To make this efficient when communicating potentially large units of data, Dagger uses a remote reference, called a Chunk, to refer to objects which may exist on another worker. 
Chunks are backed by a distributed refcounting mechanism provided by MemPool.jl, which ensures that the referenced data is not garbage collected until all Chunks referencing that object are GC'd from all workers.","category":"page"},{"location":"data-management/","page":"Data Management","title":"Data Management","text":"Conveniently, if you pass in a Chunk object as an input to a Dagger task, then the task's payload function will get executed with the value contained in the Chunk. The scheduler also understands Chunks, and will try to schedule tasks close to where their Chunk inputs reside, to reduce communication overhead.","category":"page"},{"location":"data-management/","page":"Data Management","title":"Data Management","text":"Chunks also have a cached type, a \"processor\", and a \"scope\", which are important for identifying the type of the object, where in memory (CPU RAM, GPU VRAM, etc.) the value resides, and where the value is allowed to be transferred and dereferenced. See Processors and Scopes for more details on how these properties can be used to control scheduling behavior around Chunks.","category":"page"},{"location":"data-management/#Mutation","page":"Data Management","title":"Mutation","text":"","category":"section"},{"location":"data-management/","page":"Data Management","title":"Data Management","text":"Normally, Dagger tasks should be functional and \"pure\": never mutating their inputs, always producing identical outputs for a given set of inputs, and never producing side effects which might affect future program behavior. However, for certain codes, this restriction ends up costing the user performance and engineering time to work around.","category":"page"},{"location":"data-management/","page":"Data Management","title":"Data Management","text":"Thankfully, Dagger provides the Dagger.@mutable macro for just this purpose. @mutable allows data to be marked such that it will never be copied or serialized by the scheduler (unless copied by the user). When used as an argument to a task, the task will be forced to execute on the same worker that @mutable was called on. 
For example:","category":"page"},{"location":"data-management/","page":"Data Management","title":"Data Management","text":"Dagger.@mutable worker=2 Threads.Atomic{Int}(0)\nx::Dagger.Chunk # The result is always a `Chunk`\n\n# x is now considered mutable, and may only be accessed on worker 2:\nwait(Dagger.@spawn Threads.atomic_add!(x, 1)) # Always executed on worker 2\nwait(Dagger.@spawn scope=Dagger.scope(worker=1) Threads.atomic_add!(x, 1)) # SchedulingException","category":"page"},{"location":"data-management/","page":"Data Management","title":"Data Management","text":"@mutable, when called as above, is constructed on worker 2, and the data gains a scope of ProcessScope(myid()), which means that any processor on that worker is allowed to execute tasks that use the object (subject to the usual scheduling rules).","category":"page"},{"location":"data-management/","page":"Data Management","title":"Data Management","text":"@mutable also allows the scope to be manually supplied, if more specific restrictions are desirable:","category":"page"},{"location":"data-management/","page":"Data Management","title":"Data Management","text":"x = @mutable scope=Dagger.scope(worker=1, threads=[3,4]) rand(100)\n# x is now scoped to threads 3 and 4 on worker `myid()`","category":"page"},{"location":"data-management/#Sharding","page":"Data Management","title":"Sharding","text":"","category":"section"},{"location":"data-management/","page":"Data Management","title":"Data Management","text":"@mutable is convenient for creating a single mutable object, but often one wants to have multiple mutable objects, with each object being scoped to their own worker or thread in the cluster, to be used as local counters, partial reduction containers, data caches, etc.","category":"page"},{"location":"data-management/","page":"Data Management","title":"Data Management","text":"The Shard object (constructed with Dagger.@shard/Dagger.shard) is a mechanism by which such a setup can be created with one invocation. By default, each worker will have their own local object which will be used when a task that uses the shard as an argument is scheduled on that worker. Other shard pieces that aren't scoped to the processor being executed on will not be serialized or copied, keeping communication costs constant even with a very large shard.","category":"page"},{"location":"data-management/","page":"Data Management","title":"Data Management","text":"This mechanism makes it easy to construct a distributed set of mutable objects which are treated as \"mirrored shards\" by the scheduler, but require no further user input to access. For example, creating and using a local counter for each worker is trivial:","category":"page"},{"location":"data-management/","page":"Data Management","title":"Data Management","text":"# Create a local atomic counter on each worker that Dagger knows about:\ncs = Dagger.@shard Threads.Atomic{Int}(0)\n\n# Let's add `1` to the local counter, not caring about which worker we're on:\nwait.([Dagger.@spawn Threads.atomic_add!(cs, 1) for i in 1:1000])\n\n# And let's fetch the total sum of all counters:\n@assert sum(map(ctr->fetch(ctr)[], cs)) == 1000","category":"page"},{"location":"data-management/","page":"Data Management","title":"Data Management","text":"Note that map, when used on a shard, will execute the provided function once per shard \"piece\", and each result is considered immutable. 
map is an easy way to make a copy of each piece of the shard, to be later reduced, scanned, etc.","category":"page"},{"location":"data-management/","page":"Data Management","title":"Data Management","text":"Further details about what arguments can be passed to @shard/shard can be found in Data Management Functions.","category":"page"},{"location":"processors/#Processors","page":"Processors","title":"Processors","text":"","category":"section"},{"location":"processors/","page":"Processors","title":"Processors","text":"Dagger contains a flexible mechanism to represent CPUs, GPUs, and other devices that the scheduler can place user work on. The individual devices that are capable of computing a user operation are called \"processors\", and are subtypes of Dagger.Processor. Processors are automatically detected by Dagger at scheduler initialization, and placed in a hierarchy reflecting the physical (network-, link-, or memory-based) boundaries between processors in the hierarchy. The scheduler uses the information in this hierarchy to efficiently schedule and partition user operations.","category":"page"},{"location":"processors/","page":"Processors","title":"Processors","text":"Dagger's Chunk objects can have a processor associated with them that defines where the contained data \"resides\". Each processor has a set of functions that define the mechanisms and rules by which the data can be transferred between similar or different kinds of processors, and will be called by Dagger's scheduler automatically when fetching function arguments (or the function itself) for computation on a given processor.","category":"page"},{"location":"processors/","page":"Processors","title":"Processors","text":"Setting the processor on a function argument is done by wrapping it in a Chunk with Dagger.tochunk:","category":"page"},{"location":"processors/","page":"Processors","title":"Processors","text":"a = 1\nb = 2\n# Let's say `b` \"resides\" on the second thread of the first worker:\nb_chunk = Dagger.tochunk(b, Dagger.ThreadProc(1, 2))::Dagger.Chunk\nc = Dagger.@spawn a + b_chunk\nfetch(c) == 3","category":"page"},{"location":"processors/","page":"Processors","title":"Processors","text":"It's also simple to set the processor of the function being passed; it will be automatically wrapped in a Chunk if necessary:","category":"page"},{"location":"processors/","page":"Processors","title":"Processors","text":"# `+` is treated as existing on the second thread of the first worker:\nDagger.@spawn processor=Dagger.ThreadProc(1, 2) a + b","category":"page"},{"location":"processors/","page":"Processors","title":"Processors","text":"You can also tell Dagger about the processor type for the returned value of a task by making it a Chunk:","category":"page"},{"location":"processors/","page":"Processors","title":"Processors","text":"Dagger.spawn(a) do a\n c = a + 1\n return Dagger.tochunk(c, Dagger.ThreadProc(1, 2))\nend","category":"page"},{"location":"processors/","page":"Processors","title":"Processors","text":"Note that unless you know that your function, arguments, or return value are associated with a specific processor, you don't need to assign one to them. 
Dagger will treat them as being simple values with no processor association, and will serialize them to wherever they're used.","category":"page"},{"location":"processors/#Hardware-capabilities,-topology,-and-data-locality","page":"Processors","title":"Hardware capabilities, topology, and data locality","text":"","category":"section"},{"location":"processors/","page":"Processors","title":"Processors","text":"The processor hierarchy is modeled as a multi-root tree, where each root is an OSProc, which represents a Julia OS process, and the \"children\" of the root or some other branch in the tree represent the processors which reside on the same logical server as the \"parent\" branch. All roots are connected to each other directly, in the common case. The processor hierarchy's topology is automatically detected and elaborated by callbacks in Dagger, which users may manipulate to add detection of extra processors.","category":"page"},{"location":"processors/","page":"Processors","title":"Processors","text":"A move between a given pair of processors is implemented as a Julia function dispatching on the types of each processor, as well as the type of the data being moved. Users are permitted to define custom move functions to improve data movement efficiency, perform automatic value conversions, or even make use of special IPC facilities. Custom processors may also be defined by the user to represent a processor type which is not automatically detected by Dagger, such as novel GPUs, special OS process abstractions, FPGAs, etc.","category":"page"},{"location":"processors/","page":"Processors","title":"Processors","text":"Movement of data between any two processors A and B (from A to B), if not defined by the user, is decomposed into 3 moves: processor A to OSProc parent of A, OSProc parent of A to OSProc parent of B, and OSProc parent of B to processor B. This mechanism uses Julia's Serialization library to serialize and deserialize data, so data must be serializable for this mechanism to work properly.","category":"page"},{"location":"processors/#Processor-Selection","page":"Processors","title":"Processor Selection","text":"","category":"section"},{"location":"processors/","page":"Processors","title":"Processors","text":"By default, Dagger uses the CPU to process work, typically single-threaded per cluster node. However, Dagger allows access to a wider range of hardware and software acceleration techniques, such as multithreading and GPUs. These more advanced (but performant) accelerators are disabled by default, but can easily be enabled by using scopes (see Scopes for details).","category":"page"},{"location":"processors/#Resource-Control","page":"Processors","title":"Resource Control","text":"","category":"section"},{"location":"processors/","page":"Processors","title":"Processors","text":"Dagger assumes that a thunk executing on a processor fully utilizes that processor at 100%. When this is not the case, you can tell Dagger as much with options.procutil:","category":"page"},{"location":"processors/","page":"Processors","title":"Processors","text":"procutil = Dict(\n Dagger.ThreadProc => 4.0, # utilizes 4 CPU threads fully\n DaggerGPU.CuArrayProc => 0.1 # utilizes 10% of a single CUDA GPU\n)","category":"page"},{"location":"processors/","page":"Processors","title":"Processors","text":"Dagger will use this information to execute only as many thunks on a given processor (or set of similar processors) as add up to less than or equal to 1.0 total utilization. 
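For instance, here is a hypothetical sketch of attaching such a table to a group of tasks through the options system (whether procutil is accepted by with_options is an assumption of this sketch, and my_multithreaded_kernel is a placeholder function; see Task Options Functions/Macros for the supported option-passing forms):","category":"page"},{"location":"processors/","page":"Processors","title":"Processors","text":"# Assumption: `procutil` propagates like any other task option\nDagger.with_options(;procutil=procutil) do\n # `my_multithreaded_kernel` is a placeholder for a user function\n wait(Dagger.@spawn my_multithreaded_kernel())\nend","category":"page"},{"location":"processors/","page":"Processors","title":"Processors","text":"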
If a thunk is scheduled onto a processor which the local worker deems \"oversubscribed\", the worker will not execute the thunk until sufficient resources are freed by other thunks completing execution.","category":"page"},{"location":"processors/#GPU-Processors","page":"Processors","title":"GPU Processors","text":"","category":"section"},{"location":"processors/","page":"Processors","title":"Processors","text":"The DaggerGPU.jl package can be imported to enable GPU acceleration for NVIDIA and AMD GPUs, when available. The processors provided by that package are not enabled by default, but may be enabled via custom scopes (Scopes).","category":"page"},{"location":"processors/#Future:-Network-Devices-and-Topology","page":"Processors","title":"Future: Network Devices and Topology","text":"","category":"section"},{"location":"processors/","page":"Processors","title":"Processors","text":"In the future, users will be able to define network devices attached to a given processor, which provide a direct connection to a network device on another processor, and may be used to transfer data between said processors. Data movement rules will most likely be defined by a similar (or even identical) mechanism to the current processor move mechanism. The multi-root tree will be expanded to a graph to allow representing these network devices (as they may potentially span non-root nodes).","category":"page"},{"location":"processors/#Redundancy","page":"Processors","title":"Redundancy","text":"","category":"section"},{"location":"processors/#Fault-Tolerance","page":"Processors","title":"Fault Tolerance","text":"","category":"section"},{"location":"processors/","page":"Processors","title":"Processors","text":"Dagger has a single means for ensuring redundancy, which is currently called \"fault tolerance\". Said redundancy is only targeted at a specific failure mode, namely the unexpected exit or \"killing\" of a worker process in the cluster. This failure mode often presents itself when running on Linux and generating large memory allocations, where the Out Of Memory (OOM) killer process can kill user processes to free their allocated memory for the Linux kernel to use. The fault tolerance system mitigates the damage caused by the OOM killer performing its duties on one or more worker processes by detecting the fault as a process exit exception (generated by Julia), and then moving any \"lost\" work to other worker processes for re-computation.","category":"page"},{"location":"processors/#Future:-Multi-master,-Network-Failure-Correction,-etc.","page":"Processors","title":"Future: Multi-master, Network Failure Correction, etc.","text":"","category":"section"},{"location":"processors/","page":"Processors","title":"Processors","text":"This single redundancy mechanism helps alleviate a common issue among HPC and scientific users; however, it does little to help when, for example, the master node exits, or a network link goes down. Such failure modes require a more complicated detection and recovery process, including multiple master processes, a distributed and replicated database such as etcd, and checkpointing of the scheduler to ensure an efficient recovery. 
Such a system does not yet exist, but contributions for such a change are desired.","category":"page"},{"location":"processors/#Dynamic-worker-pools","page":"Processors","title":"Dynamic worker pools","text":"","category":"section"},{"location":"processors/","page":"Processors","title":"Processors","text":"Dagger's default scheduler supports modifying the worker pool while the scheduler is running. This is done by modifying the Processors of the Context supplied to the scheduler at initialization using addprocs!(ctx, ps) and rmprocs!(ctx, ps), where ps can be Processors or just process IDs.","category":"page"},{"location":"processors/","page":"Processors","title":"Processors","text":"An example of when this is useful is in HPC environments where individual jobs to start up workers are queued so that not all workers are guaranteed to be available at the same time.","category":"page"},{"location":"processors/","page":"Processors","title":"Processors","text":"New workers will typically be assigned new tasks as soon as the scheduler sees them. Removed workers will finish all their assigned tasks but will not be assigned any new tasks. Note that this makes it difficult to determine when a worker is no longer in use by Dagger. Contributions to alleviate this uncertainty are welcome!","category":"page"},{"location":"processors/","page":"Processors","title":"Processors","text":"Example:","category":"page"},{"location":"processors/","page":"Processors","title":"Processors","text":"using Distributed\n\nps1 = addprocs(2, exeflags=\"--project\")\n@everywhere using Distributed, Dagger\n\n# Dummy task to wait for 0.5 seconds and then return the id of the worker\nts = delayed(vcat)((delayed(i -> (sleep(0.5); myid()))(i) for i in 1:20)...)\n\nctx = Context()\n# Scheduler is blocking, so we need a new task to add workers while it runs\njob = @async collect(ctx, ts)\n\n# Let's fire up some new workers\nps2 = addprocs(2, exeflags=\"--project\")\n@everywhere ps2 using Distributed, Dagger\n# New workers are not available until we do this\naddprocs!(ctx, ps2)\n\n# Let's hope the job didn't complete before workers were added :)\n@show fetch(job) |> unique\n\n# and clean up after ourselves...\nworkers() |> rmprocs","category":"page"},{"location":"api-daggerwebdash/types/","page":"Types","title":"Types","text":"CurrentModule = DaggerWebDash","category":"page"},{"location":"api-daggerwebdash/types/#DaggerWebDash-Types","page":"Types","title":"DaggerWebDash Types","text":"","category":"section"},{"location":"api-daggerwebdash/types/","page":"Types","title":"Types","text":"Pages = [\"types.md\"]","category":"page"},{"location":"api-daggerwebdash/types/#Logging-Event-Types","page":"Types","title":"Logging Event Types","text":"","category":"section"},{"location":"api-daggerwebdash/types/","page":"Types","title":"Types","text":"D3Renderer\nTableStorage\nProfileMetrics","category":"page"},{"location":"api-daggerwebdash/types/#DaggerWebDash.D3Renderer","page":"Types","title":"DaggerWebDash.D3Renderer","text":"D3Renderer(port::Int, port_range::UnitRange; seek_store=nothing) -> D3Renderer\n\nConstructs a D3Renderer, which is a TimespanLogging aggregator which renders the logs over HTTP using the d3.js library. port is the port that will be serving the HTTP website. port_range specifies a range of ports that will be used to listen for connections from other Dagger workers. seek_store, if specified, is a Tables.jl-compatible object that logs will be written to and read from. 
This table can be written to disk and then re-read later for offline log analysis.\n\n\n\n\n\n","category":"type"},{"location":"api-daggerwebdash/types/#DaggerWebDash.TableStorage","page":"Types","title":"DaggerWebDash.TableStorage","text":"TableStorage\n\nLogWindow-compatible aggregator which stores logs in a Tables.jl-compatible sink.\n\nUsing a TableStorage is reasonably simple:\n\nml = TimespanLogging.MultiEventLog()\n\n... # Add some events\n\nlw = TimespanLogging.LogWindow(5*10^9, :core)\n\n# Create a DataFrame with one Any[] for each event\ndf = DataFrame([key=>[] for key in keys(ml.consumers)]...)\n\n# Create the TableStorage and register its creation handler\nts = DaggerWebDash.TableStorage(df)\npush!(lw.creation_handlers, ts)\n\nml.aggregators[:lw] = lw\n\n# Logs will now be saved into `df` automatically, and packages like\n# DaggerWebDash.jl will automatically use it to retrieve subsets of the logs.\n\n\n\n\n\n","category":"type"},{"location":"api-daggerwebdash/types/#DaggerWebDash.ProfileMetrics","page":"Types","title":"DaggerWebDash.ProfileMetrics","text":"ProfileMetrics\n\nTracks compute profile traces.\n\n\n\n\n\n","category":"type"},{"location":"#Dagger:-A-framework-for-out-of-core-and-parallel-execution","page":"Home","title":"Dagger: A framework for out-of-core and parallel execution","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Dagger.jl is a framework for parallel computing across all kinds of resources, like CPUs and GPUs, and across multiple threads and multiple servers.","category":"page"},{"location":"","page":"Home","title":"Home","text":"","category":"page"},{"location":"#Quickstart:-Task-Spawning","page":"Home","title":"Quickstart: Task Spawning","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"For more details: Task Spawning","category":"page"},{"location":"#Launch-a-task","page":"Home","title":"Launch a task","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"If you want to call a function myfunc with arguments arg1, arg2, arg3, and keyword argument color=:red:","category":"page"},{"location":"","page":"Home","title":"Home","text":"function myfunc(arg1, arg2, arg3; color=:blue)\n arg_total = arg1 + arg2 * arg3\n printstyled(arg_total; color)\n return arg_total\nend\nt = Dagger.@spawn myfunc(arg1, arg2, arg3; color=:red)","category":"page"},{"location":"","page":"Home","title":"Home","text":"This will run the function asynchronously; you can fetch its result with fetch(t), or just wait on it to complete with wait(t). If the call to myfunc throws an error, fetch(t) will rethrow it.","category":"page"},{"location":"","page":"Home","title":"Home","text":"If running Dagger with multiple workers, make sure to define myfunc with @everywhere from the Distributed stdlib.","category":"page"},{"location":"#Launch-a-task-with-an-anonymous-function","page":"Home","title":"Launch a task with an anonymous function","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"It's more convenient to use Dagger.spawn for anonymous functions. 
Taking the previous example, but using an anonymous function instead of myfunc:","category":"page"},{"location":"","page":"Home","title":"Home","text":"Dagger.spawn((arg1, arg2, arg3; color=:blue) -> begin\n arg_total = arg1 + arg2 * arg3\n printstyled(arg_total; color)\n return arg_total\nend, arg1, arg2, arg3; color=:red)","category":"page"},{"location":"","page":"Home","title":"Home","text":"spawn is functionally identical to @spawn, but can be more or less convenient to use, depending on what you're trying to do.","category":"page"},{"location":"#Launch-many-tasks-and-wait-on-them-all-to-complete","page":"Home","title":"Launch many tasks and wait on them all to complete","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"@spawn participates in @sync blocks, just like @async and Threads.@spawn, and will cause @sync to wait until all the tasks have completed:","category":"page"},{"location":"","page":"Home","title":"Home","text":"@sync for result in simulation_results\n Dagger.@spawn send_result_to_database(result)\nend\nnresults = length(simulation_results)\nwait(Dagger.@spawn update_database_result_count(nresults))","category":"page"},{"location":"","page":"Home","title":"Home","text":"Above, update_database_result_count will only run once all send_result_to_database calls have completed.","category":"page"},{"location":"","page":"Home","title":"Home","text":"Note that other APIs (including spawn) do not participate in @sync blocks.","category":"page"},{"location":"#Run-a-task-on-a-specific-Distributed-worker","page":"Home","title":"Run a task on a specific Distributed worker","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Dagger uses Scopes to control where tasks can execute. There's a handy constructor, Dagger.scope, that makes defining scopes easy:","category":"page"},{"location":"","page":"Home","title":"Home","text":"w2_only = Dagger.scope(worker=2)\nDagger.@spawn scope=w2_only myfunc(arg1, arg2, arg3; color=:red)","category":"page"},{"location":"","page":"Home","title":"Home","text":"Now the launched task will definitely execute on worker 2 (or if it's not possible to run on worker 2, Dagger will throw an error when you try to fetch the result).","category":"page"},{"location":"#Parallelize-nested-loops","page":"Home","title":"Parallelize nested loops","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Nested loops are a very common pattern in Julia, yet it's often difficult to parallelize them efficiently with @threads or @distributed/pmap. 
Thankfully, this kind of problem is quite easy for Dagger to handle; here is an example of parallelizing a two-level nested loop, where the inner loop computations (g) depend on an outer loop computation (f):","category":"page"},{"location":"","page":"Home","title":"Home","text":"@everywhere begin\n using Random\n Random.seed!(0)\n\n # Some \"expensive\" functions that complete at different speeds\n const crn = abs.(randn(20, 7))\n f(i) = sleep(crn[i, 7])\n g(i, j, y) = sleep(crn[i, j])\nend\nfunction nested_dagger()\n @sync for i in 1:20\n y = Dagger.@spawn f(i)\n for j in 1:6\n z = Dagger.@spawn g(i, j, y)\n end\n end\nend","category":"page"},{"location":"","page":"Home","title":"Home","text":"And the equivalent (and less performant) example with Threads.@threads, either parallelizing the inner or outer loop:","category":"page"},{"location":"","page":"Home","title":"Home","text":"function nested_threads_outer()\n Threads.@threads for i in 1:20\n y = f(i)\n for j in 1:6\n z = g(i, j, y)\n end\n end\nend\nfunction nested_threads_inner()\n for i in 1:20\n y = f(i)\n Threads.@threads for j in 1:6\n z = g(i, j, y)\n end\n end\nend","category":"page"},{"location":"","page":"Home","title":"Home","text":"Unlike Threads.@threads (which is really only intended to be used for a single loop, unnested), Dagger.@spawn is capable of parallelizing across both loop levels seamlessly, using the dependencies between f and g to determine the correct order to execute tasks.","category":"page"},{"location":"","page":"Home","title":"Home","text":"","category":"page"},{"location":"#Quickstart:-Data-Management","page":"Home","title":"Quickstart: Data Management","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"For more details: Data Management","category":"page"},{"location":"#Operate-on-mutable-data-in-place","page":"Home","title":"Operate on mutable data in-place","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Dagger usually assumes that you won't be modifying the arguments passed to your functions, but you can tell Dagger you plan to mutate them with @mutable:","category":"page"},{"location":"","page":"Home","title":"Home","text":"A = Dagger.@mutable rand(1000, 1000)\nDagger.@spawn accumulate!(+, A, A)","category":"page"},{"location":"","page":"Home","title":"Home","text":"This will lock A (and any tasks that use it) to the current worker. You can also lock it to a different worker by creating the data within a task:","category":"page"},{"location":"","page":"Home","title":"Home","text":"A = Dagger.spawn() do\n Dagger.@mutable rand(1000, 1000)\nend","category":"page"},{"location":"","page":"Home","title":"Home","text":"or by specifying the worker argument to @mutable:","category":"page"},{"location":"","page":"Home","title":"Home","text":"A = Dagger.@mutable worker=2 rand(1000, 1000)","category":"page"},{"location":"#Operate-on-distributed-data","page":"Home","title":"Operate on distributed data","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Often we want to work with more than one piece of data; the common case of wanting one piece of data per worker is easy to do by using @shard:","category":"page"},{"location":"","page":"Home","title":"Home","text":"X = Dagger.@shard myid()","category":"page"},{"location":"","page":"Home","title":"Home","text":"This will execute myid() independently on every worker in your Julia cluster, and place references to each within a Shard object called X. 
We can then use X in task spawning, but we'll only get the result of myid() that corresponds to the worker that the task is running on:","category":"page"},{"location":"","page":"Home","title":"Home","text":"for w in workers()\n @show fetch(Dagger.@spawn scope=Dagger.scope(worker=w) identity(X))\nend","category":"page"},{"location":"","page":"Home","title":"Home","text":"The above should print the result of myid() for each worker in workers(), as identity(X) receives only the value of X specific to that worker.","category":"page"},{"location":"#Reducing-over-distributed-data","page":"Home","title":"Reducing over distributed data","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Reductions are often parallelized by reducing a set of partitions on each worker, and then reducing those intermediate reductions on a single worker. Dagger supports this easily with @shard:","category":"page"},{"location":"","page":"Home","title":"Home","text":"A = Dagger.@shard rand(1:20, 10000)\ntemp_bins = Dagger.@shard zeros(20)\nhist! = (bins, arr) -> for elem in arr\n bins[elem] += 1\nend\nwait.([Dagger.@spawn scope=Dagger.scope(;worker) hist!(temp_bins, A) for worker in procs()])\nfinal_bins = sum(map(b->fetch(Dagger.@spawn copy(b)), temp_bins); dims=1)[1]","category":"page"},{"location":"","page":"Home","title":"Home","text":"Here, A points to unique random arrays, one on each worker, and temp_bins points to a set of histogram bins on each worker. When we @spawn hist!, Dagger passes in the random array and bins for only the specific worker that the task is run on; i.e. a call to hist! that runs on worker 2 will get a different A and temp_bins from a call to hist! on worker 3. All of the calls to hist! may run in parallel.","category":"page"},{"location":"","page":"Home","title":"Home","text":"By using map on temp_bins, we then make a copy of each worker's bins that we can safely return back to our current worker, and sum them together to get our total histogram.","category":"page"},{"location":"","page":"Home","title":"Home","text":"","category":"page"},{"location":"#Quickstart:-File-IO","page":"Home","title":"Quickstart: File IO","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Dagger has support for loading and saving files that integrates seamlessly with its task system, in the form of Dagger.File and Dagger.tofile.","category":"page"},{"location":"","page":"Home","title":"Home","text":"warn: Warn\nThese functions are not yet fully tested, so please make sure to take backups of any files that you load with them.","category":"page"},{"location":"#Loading-files-from-disk","page":"Home","title":"Loading files from disk","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"In order to load one or more files from disk, Dagger provides the File function, which creates a lazy reference to a file:","category":"page"},{"location":"","page":"Home","title":"Home","text":"f = Dagger.File(\"myfile.jls\")","category":"page"},{"location":"","page":"Home","title":"Home","text":"f is now a lazy reference to \"myfile.jls\", and its contents can be loaded automatically by just passing the object to a task:","category":"page"},{"location":"","page":"Home","title":"Home","text":"wait(Dagger.@spawn println(f))\n# Prints the loaded contents of the file","category":"page"},{"location":"","page":"Home","title":"Home","text":"By default, File assumes that the file uses Julia's Serialization format; this can be easily changed to 
assume Arrow format, for example:","category":"page"},{"location":"","page":"Home","title":"Home","text":"using Arrow\nf = Dagger.File(\"myfile.arrow\"; serialize=Arrow.write, deserialize=Arrow.Table)","category":"page"},{"location":"#Writing-data-to-disk","page":"Home","title":"Writing data to disk","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Saving data to disk is as easy as loading it; tofile provides this capability in a similar manner to File:","category":"page"},{"location":"","page":"Home","title":"Home","text":"A = rand(1000)\nf = Dagger.tofile(A, \"mydata.jls\")","category":"page"},{"location":"","page":"Home","title":"Home","text":"Like File, f can still be used to reference the file's data in tasks. It is likely most useful to use tofile at the end of a task to save results:","category":"page"},{"location":"","page":"Home","title":"Home","text":"function make_data()\n A = rand(1000)\n return Dagger.tofile(A, \"mydata.jls\")\nend\nfetch(Dagger.@spawn make_data())\n# Data was also written to \"mydata.jls\"","category":"page"},{"location":"","page":"Home","title":"Home","text":"tofile takes the same keyword arguments as File, allowing the format of data on disk to be specified as desired.","category":"page"},{"location":"","page":"Home","title":"Home","text":"","category":"page"},{"location":"#Quickstart:-Distributed-Arrays","page":"Home","title":"Quickstart: Distributed Arrays","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Dagger's DArray type represents a distributed array, where a single large array is implemented as a set of smaller array partitions, which may be distributed across a Julia cluster.","category":"page"},{"location":"","page":"Home","title":"Home","text":"For more details: Distributed Arrays","category":"page"},{"location":"#Distribute-an-existing-array","page":"Home","title":"Distribute an existing array","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Distributing any kind of array into a DArray is easy: just use distribute, and specify the partitioning you desire with Blocks. 
For example, to distribute a 16 x 16 matrix in 4 x 4 partitions:","category":"page"},{"location":"","page":"Home","title":"Home","text":"A = rand(16, 16)\nDA = distribute(A, Blocks(4, 4))","category":"page"},{"location":"#Allocate-a-distributed-array-directly","page":"Home","title":"Allocate a distributed array directly","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"To allocate a DArray, just pass your Blocks partitioning object into the appropriate allocation function, such as rand, ones, or zeros:","category":"page"},{"location":"","page":"Home","title":"Home","text":"rand(Blocks(20, 20), 100, 100)\nones(Blocks(20, 100), 100, 2000)\nzeros(Blocks(50, 20), 300, 200)","category":"page"},{"location":"#Convert-a-DArray-back-into-an-Array","page":"Home","title":"Convert a DArray back into an Array","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"To get back an Array from a DArray, just call collect:","category":"page"},{"location":"","page":"Home","title":"Home","text":"DA = rand(Blocks(32, 32), 256, 128)\ncollect(DA) # returns a `Matrix{Float64}`","category":"page"},{"location":"darray/#Distributed-Arrays","page":"Distributed Arrays","title":"Distributed Arrays","text":"","category":"section"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"The DArray, or \"distributed array\", is an abstraction layer on top of Dagger that allows loading array-like structures into a distributed environment. The DArray partitions a larger array into smaller \"blocks\" or \"chunks\", and those blocks may be located on any worker in the cluster. The DArray uses a Parallel Global Address Space (aka \"PGAS\") model for storing partitions, which means that a DArray instance contains a reference to every partition in the greater array; this provides great flexibility in allowing Dagger to choose the most efficient way to distribute the array's blocks and operate on them in a heterogeneous manner.","category":"page"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"Aside: an alternative model, here termed the \"MPI\" model, is not yet supported, but would allow storing only a single partition of the array on each MPI rank in an MPI cluster. DArray support for this model is planned in the near future.","category":"page"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"This should not be confused with the DistributedArrays.jl package.","category":"page"},{"location":"darray/#Creating-DArrays","page":"Distributed Arrays","title":"Creating DArrays","text":"","category":"section"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"A DArray can be created in two ways: through an API similar to the usual rand, ones, etc. calls, or by distributing an existing array with distribute. 
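Both creation routes look roughly like this (a quick sketch; the sections below walk through each in more detail):\n\nusing Dagger\nDA = distribute(rand(100, 100), Blocks(25, 25)) # distribute an existing array\nDB = ones(Blocks(25, 25), 100, 100)             # allocate a new DArray directly\n\n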
It's generally not recommended to manually construct a DArray object unless you're developing the DArray itself.","category":"page"},{"location":"darray/#Allocating-new-arrays","page":"Distributed Arrays","title":"Allocating new arrays","text":"","category":"section"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"As an example, one can allocate a random DArray by calling rand with a Blocks object as the first argument - Blocks specifies the size of partitions to be constructed, and must be the same number of dimensions as the array being allocated.","category":"page"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"# Add some Julia workers\njulia> using Distributed; addprocs(6)\n6-element Vector{Int64}:\n 2\n 3\n 4\n 5\n 6\n 7\n\njulia> @everywhere using Dagger\n\njulia> DX = rand(Blocks(50, 50), 100, 100)\nDagger.DArray{Any, 2, typeof(cat)}(100, 100)","category":"page"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"The rand(Blocks(50, 50), 100, 100) call specifies that a DArray matrix should be allocated which is in total 100 x 100, split into 4 blocks of size 50 x 50, and initialized with random Float64s. Many other functions, like randn, ones, and zeros can be called in this same way.","category":"page"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"Note that the DArray is an asynchronous object (i.e. operations on it may execute in the background), so to force it to be materialized, fetch may need to be called:","category":"page"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"julia> fetch(DX)\nDagger.DArray{Any, 2, typeof(cat)}(100, 100)","category":"page"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"This doesn't change the type or values of the DArray, but it does make sure that any pending operations have completed.","category":"page"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"To convert a DArray back into an Array, collect can be used to gather the data from all the Julia workers that they're on and combine them into a single Array on the worker calling collect:","category":"page"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"julia> collect(DX)\n100×100 Matrix{Float64}:\n 0.610404 0.0475367 0.809016 0.311305 0.0306211 0.689645 … 0.220267 0.678548 0.892062 0.0559988\n 0.680815 0.788349 0.758755 0.0594709 0.640167 0.652266 0.331429 0.798848 0.732432 0.579534\n 0.306898 0.0805607 0.498372 0.887971 0.244104 0.148825 0.340429 0.029274 0.140624 0.292354\n 0.0537622 0.844509 0.509145 0.561629 0.566584 0.498554 0.427503 0.835242 0.699405 0.0705192\n 0.587364 0.59933 0.0624318 0.3795 0.430398 0.0853735 0.379947 0.677105 0.0305861 0.748001\n 0.14129 0.635562 0.218739 0.0629501 0.373841 0.439933 … 0.308294 0.0966736 0.783333 0.00763648\n 0.14539 0.331767 0.912498 0.0649541 0.527064 0.249595 0.826705 0.826868 0.41398 0.80321\n 0.13926 0.353158 0.330615 0.438247 0.284794 0.238837 0.791249 0.415801 0.729545 0.88308\n 0.769242 0.136001 0.950214 0.171962 0.183646 0.78294 0.570442 0.321894 0.293101 0.911913\n 0.786168 0.513057 0.781712 0.0191752 0.512821 0.621239 0.50503 0.0472064 0.0368674 0.75981\n 0.493378 0.129937 0.758052 0.169508 0.0564534 0.846092 … 0.873186 0.396222 0.284 0.0242124\n 0.12689 0.194842 0.263186 0.213071 0.535613 0.246888 0.579931 0.699231 0.441449 
0.882772\n 0.916144 0.21305 0.629293 0.329303 0.299889 0.127453 0.644012 0.311241 0.713782 0.0554386\n ⋮ ⋮ ⋱\n 0.430369 0.597251 0.552528 0.795223 0.46431 0.777119 0.189266 0.499178 0.715808 0.797629\n 0.235668 0.902973 0.786537 0.951402 0.768312 0.633666 0.724196 0.866373 0.0679498 0.255039\n 0.605097 0.301349 0.758283 0.681568 0.677913 0.51507 … 0.654614 0.37841 0.86399 0.583924\n 0.824216 0.62188 0.369671 0.725758 0.735141 0.183666 0.0401394 0.522191 0.849429 0.839651\n 0.578047 0.775035 0.704695 0.203515 0.00267523 0.869083 0.0975535 0.824887 0.00787017 0.920944\n 0.805897 0.0275489 0.175715 0.135956 0.389958 0.856349 0.974141 0.586308 0.59695 0.906727\n 0.212875 0.509612 0.85531 0.266659 0.0695836 0.0551129 0.788085 0.401581 0.948216 0.00242077\n 0.512997 0.134833 0.895968 0.996953 0.422192 0.991526 … 0.838781 0.141053 0.747722 0.84489\n 0.283221 0.995152 0.61636 0.75955 0.072718 0.691665 0.151339 0.295759 0.795476 0.203072\n 0.0946639 0.496832 0.551496 0.848571 0.151074 0.625696 0.673817 0.273958 0.177998 0.563221\n 0.0900806 0.127274 0.394169 0.140403 0.232985 0.460306 0.536441 0.200297 0.970311 0.0292218\n 0.0698985 0.463532 0.934776 0.448393 0.606287 0.552196 0.883694 0.212222 0.888415 0.941097","category":"page"},{"location":"darray/#Distributing-existing-arrays","page":"Distributed Arrays","title":"Distributing existing arrays","text":"","category":"section"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"Now let's look at constructing a DArray from an existing array object; we can do this by calling distribute:","category":"page"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"julia> Z = zeros(100, 500);\n\njulia> Dzeros = distribute(Z, Blocks(10, 50))\nDagger.DArray{Any, 2, typeof(cat)}(100, 500)","category":"page"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"This will distribute the array partitions (in chunks of 10 x 50 matrices) across the workers in the Julia cluster in a relatively even distribution; future operations on a DArray may produce a different distribution from the one chosen by distribute.","category":"page"},{"location":"darray/#Broadcasting","page":"Distributed Arrays","title":"Broadcasting","text":"","category":"section"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"As the DArray is a subtype of AbstractArray and generally satisfies Julia's array interface, a variety of common operations (such as broadcast) work as expected:","category":"page"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"julia> DX = rand(Blocks(50,50), 100, 100)\nDagger.DArray{Float64, 2, Blocks{2}, typeof(cat)}(100, 100)\n\njulia> DY = DX .+ DX\nDagger.DArray{Float64, 2, Blocks{2}, typeof(cat)}(100, 100)\n\njulia> DZ = DY .* 3\nDagger.DArray{Float64, 2, Blocks{2}, typeof(cat)}(100, 100)","category":"page"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"Now, DZ will contain the result of computing (DX .+ DX) .* 3. 
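One quick way to convince yourself of this (a sketch; collect is described below) is to compare against the equivalent operations on regular Arrays:\n\njulia> collect(DZ) ≈ 3 .* (collect(DX) .+ collect(DX))\ntrue\n\n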
Note that DArray objects are immutable, and operations on them are thus functional transformations of their input DArray.","category":"page"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"note: Note\nSupport for mutation of DArrays is planned for a future release","category":"page"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"julia> Dagger.chunks(DZ)\n2×2 Matrix{Any}:\n EagerThunk (finished) EagerThunk (finished)\n EagerThunk (finished) EagerThunk (finished)\n\njulia> Dagger.chunks(fetch(DZ))\n2×2 Matrix{Union{Thunk, Dagger.Chunk}}:\n Chunk{Matrix{Float64}, DRef, ThreadProc, AnyScope}(Matrix{Float64}, ArrayDomain{2}((1:50, 1:50)), DRef(4, 8, 0x0000000000004e20), ThreadProc(4, 1), AnyScope(), true) … Chunk{Matrix{Float64}, DRef, ThreadProc, AnyScope}(Matrix{Float64}, ArrayDomain{2}((1:50, 1:50)), DRef(2, 5, 0x0000000000004e20), ThreadProc(2, 1), AnyScope(), true)\n Chunk{Matrix{Float64}, DRef, ThreadProc, AnyScope}(Matrix{Float64}, ArrayDomain{2}((1:50, 1:50)), DRef(5, 5, 0x0000000000004e20), ThreadProc(5, 1), AnyScope(), true) Chunk{Matrix{Float64}, DRef, ThreadProc, AnyScope}(Matrix{Float64}, ArrayDomain{2}((1:50, 1:50)), DRef(3, 3, 0x0000000000004e20), ThreadProc(3, 1), AnyScope(), true)","category":"page"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"Here we can see the DArray's internal representation of the partitions, which are stored as either EagerThunk objects (representing an ongoing or completed computation) or Chunk objects (which reference data which exist locally or on other Julia workers). Of course, one doesn't typically need to worry about these internal details unless implementing low-level operations on DArrays.","category":"page"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"Finally, it's easy to see the results of this combination of broadcast operations; just use collect to get an Array:","category":"page"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"julia> collect(DZ)\n100×100 Matrix{Float64}:\n 5.72754 1.23614 4.67045 4.89095 3.40126 … 5.07663 1.60482 5.04386 1.44755 2.5682\n 0.189402 3.64462 5.92218 3.94603 2.32192 1.47115 4.6364 0.778867 3.13838 4.87871\n 3.3492 3.96929 3.46377 1.29776 3.59547 4.82616 1.1512 3.02528 3.05538 0.139763\n 5.0981 5.72564 5.1128 0.954708 2.04515 2.50365 5.97576 5.17683 4.79587 1.80113\n 1.0737 5.25768 4.25363 0.943006 4.25783 4.1801 3.14444 3.07428 4.41075 2.90252\n 5.48746 5.17286 3.99259 0.939678 3.76034 … 0.00763076 2.98176 1.83674 1.61791 3.33216\n 1.05088 4.98731 1.24925 3.57909 2.53366 5.96733 2.35186 5.75815 3.32867 1.15317\n 0.0335647 3.52524 0.159895 5.49908 1.33206 3.51113 0.0753356 1.5557 0.884252 1.45085\n 5.27506 2.00472 0.00636555 0.461574 5.16735 2.74457 1.14679 2.39407 0.151713 0.85013\n 4.43607 4.50304 4.73833 1.92498 1.64338 4.34602 4.62612 3.28248 1.32726 5.50207\n 5.22308 2.53069 1.27758 2.62013 3.73961 … 5.91626 2.54943 5.41472 1.67197 4.09026\n 1.09684 2.53189 4.23236 0.14055 0.889771 2.20834 2.31341 5.23121 1.74341 4.00588\n 2.55253 4.1789 3.50287 4.96437 1.26724 3.04302 3.74262 5.46611 1.39375 4.13167\n 3.03291 4.43932 2.85678 1.59531 0.892166 0.414873 0.643423 4.425 5.48145 5.93383\n 0.726568 0.516686 3.00791 3.76354 3.32603 2.19812 2.15836 3.85669 3.67233 2.1261\n 2.22763 1.36281 4.41129 5.29229 1.10093 … 0.45575 4.38389 0.0526105 2.14792 2.26734\n 2.58065 1.99564 4.82657 0.485823 5.24881 2.16097 3.59942 
2.25021 3.96498 0.906153\n 0.546354 0.982523 1.94377 2.43136 2.77469 4.43507 5.98402 0.692576 1.53298 1.20621\n 4.71374 4.99402 1.5876 1.81629 2.56269 1.56588 5.42296 0.160867 4.17705 1.13915\n 2.97733 2.4476 3.82752 1.3491 3.5684 1.23393 1.86595 3.97154 4.6419 4.8964\n ⋮ ⋱ ⋮\n 3.49162 2.46081 1.21659 2.96078 4.58102 5.97679 3.34463 0.202255 2.85433 0.0786219\n 0.894714 2.87079 5.09409 2.2922 3.18928 1.5886 0.163886 5.99251 0.697163 5.75684\n 2.98867 2.2115 5.07771 0.124194 3.88948 3.61176 0.0732554 4.11606 0.424547 0.621287\n 5.95438 3.45065 0.194537 3.57519 1.2266 2.93837 1.02609 5.84021 5.498 3.53337\n 2.234 0.275185 0.648536 0.952341 4.41942 … 4.78238 2.24479 3.31705 5.76518 0.621195\n 5.54212 2.24089 5.81702 1.96178 4.99409 0.30557 3.55499 0.851678 1.80504 5.81679\n 5.79409 4.86848 3.10078 4.22252 4.488 3.03427 2.32752 3.54999 0.967972 4.0385\n 3.06557 5.4993 2.44263 1.82296 0.166883 0.763588 1.59113 4.33305 2.8359 5.56667\n 3.86797 3.73251 3.14999 4.11437 0.454938 0.166886 0.303827 4.7934 3.37593 2.29402\n 0.762158 4.3716 0.897798 4.60541 2.96872 … 1.60095 0.480542 1.41945 1.33071 0.308611\n 1.20503 5.66645 4.03237 3.90194 1.55996 3.58442 4.6735 5.52211 5.46891 2.43612\n 5.51133 1.13591 3.26696 4.24821 4.60696 3.73251 3.25989 4.735 5.61674 4.32185\n 2.46529 0.444928 3.85984 5.49469 1.13501 1.36861 5.34651 0.398515 0.239671 5.36412\n 2.62837 3.99017 4.52569 3.54811 3.35515 4.13514 1.22304 1.01833 3.42534 3.58399\n 4.88289 5.09945 0.267154 3.38482 4.53408 … 3.71752 5.22216 1.39987 1.38622 5.47351\n 0.1046 3.65967 1.62098 5.33185 0.0822769 3.30334 5.90173 4.06603 5.00789 4.40601\n 1.9622 0.755491 2.12264 1.67299 2.34482 4.50632 3.84387 3.22232 5.23164 2.97735\n 4.37208 5.15253 0.346373 2.98573 5.48589 0.336134 2.25751 2.39057 1.97975 3.24243\n 3.83293 1.69017 3.00189 1.80388 3.43671 5.94085 1.27609 3.98737 0.334963 5.84865","category":"page"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"A variety of other operations exist on the DArray, and it should generally behave otherwise similarly to any other AbstractArray type. If you find that it's missing an operation that you need, please file an issue!","category":"page"}] +[{"location":"dynamic/#Dynamic-Scheduler-Control","page":"Dynamic Scheduler Control","title":"Dynamic Scheduler Control","text":"","category":"section"},{"location":"dynamic/","page":"Dynamic Scheduler Control","title":"Dynamic Scheduler Control","text":"Normally, Dagger executes static graphs defined with delayed and @par. However, it is possible for thunks to dynamically modify the graph at runtime, and to generally exert direct control over the scheduler's internal state. 
The Dagger.sch_handle function provides this functionality within a thunk:","category":"page"},{"location":"dynamic/","page":"Dynamic Scheduler Control","title":"Dynamic Scheduler Control","text":"function mythunk(x)\n h = Dagger.sch_handle()\n Dagger.halt!(h)\n return x\nend","category":"page"},{"location":"dynamic/","page":"Dynamic Scheduler Control","title":"Dynamic Scheduler Control","text":"The above example prematurely halts a running scheduler at the next opportunity using Dagger.halt!:","category":"page"},{"location":"dynamic/","page":"Dynamic Scheduler Control","title":"Dynamic Scheduler Control","text":"Dagger.halt!","category":"page"},{"location":"dynamic/","page":"Dynamic Scheduler Control","title":"Dynamic Scheduler Control","text":"There are a variety of other built-in functions available for various uses:","category":"page"},{"location":"dynamic/","page":"Dynamic Scheduler Control","title":"Dynamic Scheduler Control","text":"Dagger.get_dag_ids Dagger.add_thunk!","category":"page"},{"location":"dynamic/","page":"Dynamic Scheduler Control","title":"Dynamic Scheduler Control","text":"When working with thunks acquired from get_dag_ids or add_thunk!, you will have ThunkID objects which refer to a thunk by ID. Scheduler control functions which work with thunks accept or return ThunkIDs. For example, one can create a new thunk and get its result with Base.fetch:","category":"page"},{"location":"dynamic/","page":"Dynamic Scheduler Control","title":"Dynamic Scheduler Control","text":"function mythunk(x)\n h = Dagger.sch_handle()\n id = Dagger.add_thunk!(h, x) do y\n y + 1\n end\n return fetch(h, id)\nend","category":"page"},{"location":"dynamic/","page":"Dynamic Scheduler Control","title":"Dynamic Scheduler Control","text":"Alternatively, Base.wait can be used when one does not wish to retrieve the returned value of the thunk.","category":"page"},{"location":"dynamic/","page":"Dynamic Scheduler Control","title":"Dynamic Scheduler Control","text":"Users with needs not covered by the built-in functions should use the Dagger.exec! function to pass a user-defined function, closure, or callable struct to the scheduler, along with a payload which will be provided to that function:","category":"page"},{"location":"dynamic/","page":"Dynamic Scheduler Control","title":"Dynamic Scheduler Control","text":"Dagger.exec!","category":"page"},{"location":"dynamic/","page":"Dynamic Scheduler Control","title":"Dynamic Scheduler Control","text":"Note that all functions called by Dagger.exec! take the scheduler's internal lock, so it's safe to manipulate the internal ComputeState object within the user-provided function.","category":"page"},{"location":"scheduler-internals/#Scheduler-Internals","page":"Scheduler Internals","title":"Scheduler Internals","text":"","category":"section"},{"location":"scheduler-internals/","page":"Scheduler Internals","title":"Scheduler Internals","text":"Dagger's scheduler can be found primarily in the Dagger.Sch module. It performs a variety of functions to support tasks and data, and as such is a complex system. 
This documentation attempts to shed light on how the scheduler works internally (from a somewhat high level), with the hope that it will help users and contributors understand how to improve the scheduler or fix any bugs that may arise from it.","category":"page"},{"location":"scheduler-internals/","page":"Scheduler Internals","title":"Scheduler Internals","text":"warn: Warn\nDagger's scheduler is evolving at a rapid pace, and is a complex mix of interacting parts. As such, this documentation may become out of date very quickly, and may not reflect the current state of the scheduler. Please feel free to file PRs to correct or improve this document, but also beware that the true functionality is defined in Dagger's source!","category":"page"},{"location":"scheduler-internals/#Core-vs.-Worker-Schedulers","page":"Scheduler Internals","title":"Core vs. Worker Schedulers","text":"","category":"section"},{"location":"scheduler-internals/","page":"Scheduler Internals","title":"Scheduler Internals","text":"Dagger's scheduler is really two kinds of entities: the \"core\" scheduler, and \"worker\" schedulers:","category":"page"},{"location":"scheduler-internals/","page":"Scheduler Internals","title":"Scheduler Internals","text":"The core scheduler runs on worker 1, thread 1, and is the entrypoint to tasks which have been submitted. The core scheduler manages all task dependencies, notifies calls to wait and fetch of task completion, and generally performs initial task placement. The core scheduler has cached information about each worker and their processors, and uses that information (together with metrics about previous tasks and other aspects of the Dagger runtime) to generate a near-optimal just-in-time task schedule.","category":"page"},{"location":"scheduler-internals/","page":"Scheduler Internals","title":"Scheduler Internals","text":"The worker schedulers each run as a set of tasks across all workers and all processors, and handle data movement and task execution. Once the core scheduler has scheduled and launched a task, it arrives at the worker scheduler for handling. The worker scheduler will pass the task to a queue for the assigned processor, where it will wait until the processor has a sufficient amount of \"occupancy\" for the task. Once the processor is ready for the task, it will first fetch all of the task's arguments from other workers, and then it will execute the task, package the task's result into a Chunk, and pass that back to the core scheduler.","category":"page"},{"location":"scheduler-internals/#Core:-Basics","page":"Scheduler Internals","title":"Core: Basics","text":"","category":"section"},{"location":"scheduler-internals/","page":"Scheduler Internals","title":"Scheduler Internals","text":"The core scheduler contains a single internal instance of type ComputeState, which maintains (among many other things) all necessary state to represent the set of waiting, ready, and running tasks, cached task results, and maps of interdependencies between tasks. 
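As a rough mental model (a purely hypothetical sketch; the real Sch.ComputeState holds many more fields than this), that state might be pictured as:\n\n# Hypothetical illustration only; not Dagger's actual definition\nmutable struct MiniComputeState\n    waiting::Dict{Thunk,Set{Thunk}} # task => unfinished upstream tasks\n    ready::Vector{Thunk}            # tasks whose inputs are all available\n    running::Set{Thunk}             # tasks currently executing on workers\n    cache::Dict{Thunk,Any}          # results of completed tasks\nend\n\n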
It uses Julia's task infrastructure to asynchronously send work requests to remote Julia processes, and uses a RemoteChannel as an inbound queue for completed work.","category":"page"},{"location":"scheduler-internals/","page":"Scheduler Internals","title":"Scheduler Internals","text":"There is an outer loop which drives the scheduler, which continues executing either eternally (excepting any internal scheduler errors or Julia exiting), or until all tasks in the graph have completed executing and the final task in the graph is ready to be returned to the user. This outer loop continuously performs two main operations: the first is to launch the execution of nodes which have become \"ready\" to execute; the second is to \"finish\" nodes which have been completed.","category":"page"},{"location":"scheduler-internals/#Core:-Initialization","page":"Scheduler Internals","title":"Core: Initialization","text":"","category":"section"},{"location":"scheduler-internals/","page":"Scheduler Internals","title":"Scheduler Internals","text":"At the very beginning of a scheduler's lifecycle, a ComputeState object is allocated, workers are asynchronously initialized, and the outer loop is started. Additionally, the scheduler is passed one or more tasks to start scheduling, and so it will also fill out the ComputeState with the computed sets of dependencies between tasks, initially placing all tasks in the \"waiting\" state. If any of the tasks are found to only have non-task input arguments, then they are considered ready to execute and moved from the \"waiting\" state to \"ready\".","category":"page"},{"location":"scheduler-internals/#Core:-Outer-Loop","page":"Scheduler Internals","title":"Core: Outer Loop","text":"","category":"section"},{"location":"scheduler-internals/","page":"Scheduler Internals","title":"Scheduler Internals","text":"At each outer loop iteration, all tasks in the \"ready\" state will be scheduled, moved into the \"running\" state, and asynchronously sent to the workers for execution (called \"firing\"). Once all tasks are either waiting or running, the scheduler may sleep until actions need to be performed.","category":"page"},{"location":"scheduler-internals/","page":"Scheduler Internals","title":"Scheduler Internals","text":"When fired tasks have completed executing, an entry will exist in the inbound queue signaling the task's result and other metadata. At this point, the most recently-queued task is removed from the queue, \"finished\", and placed in the \"finished\" state. Finishing usually unlocks downstream tasks from the waiting state and allows them to transition to the ready state.","category":"page"},{"location":"scheduler-internals/#Core:-Task-Scheduling","page":"Scheduler Internals","title":"Core: Task Scheduling","text":"","category":"section"},{"location":"scheduler-internals/","page":"Scheduler Internals","title":"Scheduler Internals","text":"Once one or more tasks are ready to be scheduled, the scheduler will begin assigning them to the processors within each available worker. 
This is a sequential operation consisting of:","category":"page"},{"location":"scheduler-internals/","page":"Scheduler Internals","title":"Scheduler Internals","text":"Selecting candidate processors based on the task's combined scope\nCalculating the cost to move needed data to each candidate processor\nAdding a \"wait time\" cost proportional to the estimated run time for all the tasks currently executing on each candidate processor\nSelecting the least costly candidate processor as the executor for this task","category":"page"},{"location":"scheduler-internals/","page":"Scheduler Internals","title":"Scheduler Internals","text":"After these operations have been performed for each task, the tasks will be fired off to their appropriate worker for handling.","category":"page"},{"location":"scheduler-internals/#Worker:-Task-Execution","page":"Scheduler Internals","title":"Worker: Task Execution","text":"","category":"section"},{"location":"scheduler-internals/","page":"Scheduler Internals","title":"Scheduler Internals","text":"Once a worker receives one or more tasks to be executed, the tasks are immediately enqueued into the appropriate processor's queue, and the processors are notified that work is available to be executed. The processors will asynchronously look at their queues and pick the task with the lowest occupancy first; a task with zero occupancy will always be executed immediately, but most tasks have non-zero occupancy, and so will be executed in order of increasing occupancy (effectively prioritizing asynchronous tasks like I/O).","category":"page"},{"location":"scheduler-internals/","page":"Scheduler Internals","title":"Scheduler Internals","text":"Before a task begins execution, the processor will collect the task's arguments from other workers as needed, and convert them as necessary to execute correctly according to the processor's semantics. This operation is called a \"move\".","category":"page"},{"location":"scheduler-internals/","page":"Scheduler Internals","title":"Scheduler Internals","text":"Once a task's arguments have been moved, the task's function will be called with the arguments, and assuming the task doesn't throw an error, the result will be wrapped in a Chunk object. This Chunk will then be sent back to the core scheduler along with information about which task generated it. If the task does throw an error, then the error is instead propagated to the core scheduler, along with a flag indicating that the task failed.","category":"page"},{"location":"scheduler-internals/#Worker:-Workload-Balancing","page":"Scheduler Internals","title":"Worker: Workload Balancing","text":"","category":"section"},{"location":"scheduler-internals/","page":"Scheduler Internals","title":"Scheduler Internals","text":"In general, Dagger's core scheduler tries to balance workloads as much as possible across all the available processors, but it can fail to do so effectively when either its cached knowledge of each worker's status is outdated, or when its estimates about the task's behavior are inaccurate. To minimize the possibility of workload imbalance, the worker schedulers' processors will attempt to steal tasks from each other when they are under-occupied. 
Tasks will only be stolen if the task's scope is compatible with the processor attempting the steal, so tasks with wider scopes have better balancing potential.","category":"page"},{"location":"scheduler-internals/#Core:-Finishing","page":"Scheduler Internals","title":"Core: Finishing","text":"","category":"section"},{"location":"scheduler-internals/","page":"Scheduler Internals","title":"Scheduler Internals","text":"Finishing a task which has completed executing is generally a simple set of operations:","category":"page"},{"location":"scheduler-internals/","page":"Scheduler Internals","title":"Scheduler Internals","text":"The task's result is registered in the ComputeState for any tasks or user code which will need it\nAny unneeded data is cleared from the scheduler (such as preserved Chunk arguments)\nDownstream dependencies will be moved from \"waiting\" to \"ready\" if this task was the last upstream dependency to them","category":"page"},{"location":"scheduler-internals/#Core:-Shutdown","page":"Scheduler Internals","title":"Core: Shutdown","text":"","category":"section"},{"location":"scheduler-internals/","page":"Scheduler Internals","title":"Scheduler Internals","text":"If the core scheduler needs to shut down due to an error or Julia exiting, then all workers will be shut down, and the scheduler will close any open channels. If shutdown was due to an error, then an error will be printed or thrown back to the caller.","category":"page"},{"location":"use-cases/parallel-nested-loops/#Use-Case:-Parallel-Nested-Loops","page":"Parallel Nested Loops","title":"Use Case: Parallel Nested Loops","text":"","category":"section"},{"location":"use-cases/parallel-nested-loops/","page":"Parallel Nested Loops","title":"Parallel Nested Loops","text":"One of the many applications of Dagger is that it can be used as a drop-in replacement for nested multi-threaded loops that would otherwise be written with Threads.@threads.","category":"page"},{"location":"use-cases/parallel-nested-loops/","page":"Parallel Nested Loops","title":"Parallel Nested Loops","text":"Consider a simplified scenario where you want to calculate the maximum mean values of random samples of various lengths that have been generated by several distributions provided by the Distributions.jl package. The results should be collected into a DataFrame. 
We have the following function:","category":"page"},{"location":"use-cases/parallel-nested-loops/","page":"Parallel Nested Loops","title":"Parallel Nested Loops","text":"using Dagger, Random, Distributions, StatsBase, DataFrames\n\nfunction f(dist, len, reps, σ)\n v = Vector{Float64}(undef, len) # avoiding allocations\n maximum(mean(rand!(dist, v)) for _ in 1:reps)/σ\nend","category":"page"},{"location":"use-cases/parallel-nested-loops/","page":"Parallel Nested Loops","title":"Parallel Nested Loops","text":"Let us consider the following probability distributions for numerical experiments, all of which have expected values equal to zero, and the following lengths of vectors:","category":"page"},{"location":"use-cases/parallel-nested-loops/","page":"Parallel Nested Loops","title":"Parallel Nested Loops","text":"dists = [Cosine, Epanechnikov, Laplace, Logistic, Normal, NormalCanon, PGeneralizedGaussian, SkewNormal, SkewedExponentialPower, SymTriangularDist]\nlens = [10, 20, 50, 100, 200, 500]","category":"page"},{"location":"use-cases/parallel-nested-loops/","page":"Parallel Nested Loops","title":"Parallel Nested Loops","text":"Using Threads.@threads those experiments could be parallelized as:","category":"page"},{"location":"use-cases/parallel-nested-loops/","page":"Parallel Nested Loops","title":"Parallel Nested Loops","text":"function experiments_threads(dists, lens, K=1000)\n res = DataFrame()\n lck = ReentrantLock()\n Threads.@threads for T in dists\n dist = T()\n σ = std(dist)\n for L in lens\n z = f(dist, L, K, σ)\n Threads.lock(lck) do\n push!(res, (;T, σ, L, z))\n end\n end\n end\n res\nend","category":"page"},{"location":"use-cases/parallel-nested-loops/","page":"Parallel Nested Loops","title":"Parallel Nested Loops","text":"Note that DataFrames.push! is not a thread-safe operation and hence we need to utilize a locking mechanism in order to avoid two threads appending the DataFrame at the same time.","category":"page"},{"location":"use-cases/parallel-nested-loops/","page":"Parallel Nested Loops","title":"Parallel Nested Loops","text":"The same code could be rewritten in Dagger as:","category":"page"},{"location":"use-cases/parallel-nested-loops/","page":"Parallel Nested Loops","title":"Parallel Nested Loops","text":"function experiments_dagger(dists, lens, K=1000)\n res = DataFrame()\n @sync for T in dists\n dist = T()\n σ = Dagger.@spawn std(dist)\n for L in lens\n z = Dagger.@spawn f(dist, L, K, σ)\n push!(res, (;T, σ, L, z))\n end\n end\n res.z = fetch.(res.z)\n res.σ = fetch.(res.σ)\n res\nend","category":"page"},{"location":"use-cases/parallel-nested-loops/","page":"Parallel Nested Loops","title":"Parallel Nested Loops","text":"In this code we have job interdependence. Firstly, we are calculating the standard deviation σ and then we are using that value in the function f. Since Dagger.@spawn yields an EagerThunk rather than actual values, we need to use the fetch function to obtain those values. In this example, the value fetching is performed once all computations are completed (note that @sync preceding the loop forces the loop to wait for all jobs to complete). 
Also, note that contrary to the previous example, we do not need to implement locking as we are just pushing the EagerThunk results of Dagger.@spawn serially into the DataFrame (which is fast since Dagger.@spawn doesn't block).","category":"page"},{"location":"use-cases/parallel-nested-loops/","page":"Parallel Nested Loops","title":"Parallel Nested Loops","text":"The above use case scenario has been tested by running julia -t 8 (or with JULIA_NUM_THREADS=8 set as an environment variable). The Threads.@threads code takes 1.8 seconds to run, while the Dagger code, which is also one line shorter, runs around 0.9 seconds, resulting in a 2x speedup.","category":"page"},{"location":"scheduler-visualization/#Scheduler-Visualization-with-DaggerWebDash","page":"Scheduler Visualization","title":"Scheduler Visualization with DaggerWebDash","text":"","category":"section"},{"location":"scheduler-visualization/","page":"Scheduler Visualization","title":"Scheduler Visualization","text":"When working with Dagger, especially when working with its scheduler, it can be helpful to visualize what Dagger is doing internally. To assist with this, a web dashboard is available in the DaggerWebDash.jl package. This web dashboard uses a web server running within each Dagger worker, along with event logging information, to expose details about the scheduler. Information like worker and processor saturation, memory allocations, profiling traces, and much more are available in easy-to-interpret plots.","category":"page"},{"location":"scheduler-visualization/","page":"Scheduler Visualization","title":"Scheduler Visualization","text":"Using the dashboard is relatively simple and straightforward; if you run Dagger's benchmarking script, it's enabled for you automatically if the BENCHMARK_RENDER environment variable is set to webdash. This is the easiest way to get started with the web dashboard for new users.","category":"page"},{"location":"scheduler-visualization/","page":"Scheduler Visualization","title":"Scheduler Visualization","text":"For manual usage, the following snippet of code will suffice:","category":"page"},{"location":"scheduler-visualization/","page":"Scheduler Visualization","title":"Scheduler Visualization","text":"using Dagger, DaggerWebDash, TimespanLogging\n\nctx = Context() # or `ctx = Dagger.Sch.eager_context()` for eager API usage\nml = TimespanLogging.MultiEventLog()\n\n## Add some logging events of interest\n\nml[:core] = TimespanLogging.Events.CoreMetrics()\nml[:id] = TimespanLogging.Events.IDMetrics()\nml[:timeline] = TimespanLogging.Events.TimelineMetrics()\n# ...\n\n# (Optional) Enable profile flamegraph generation with ProfileSVG\nml[:profile] = DaggerWebDash.ProfileMetrics()\nctx.profile = true\n\n# Create a LogWindow; necessary for real-time event updates\nlw = TimespanLogging.Events.LogWindow(20*10^9, :core)\nml.aggregators[:logwindow] = lw\n\n# Create the D3Renderer server on port 8080\nd3r = DaggerWebDash.D3Renderer(8080)\n\n## Add some plots! 
Rendered top-down in order\n\n# Show an overview of all generated events as a Gantt chart\npush!(d3r, DaggerWebDash.GanttPlot(:core, :id, :esat, :psat; title=\"Overview\"))\n\n# Show various numerical events as line plots over time\npush!(d3r, DaggerWebDash.LinePlot(:core, :wsat, \"Worker Saturation\", \"Running Tasks\"))\npush!(d3r, DaggerWebDash.LinePlot(:core, :loadavg, \"CPU Load Average\", \"Average Running Threads\"))\npush!(d3r, DaggerWebDash.LinePlot(:core, :bytes, \"Allocated Bytes\", \"Bytes\"))\npush!(d3r, DaggerWebDash.LinePlot(:core, :mem, \"Available Memory\", \"% Free\"))\n\n# Show a graph rendering of compute tasks and data movement between them\n# Note: Profile events are ignored if absent from the log\npush!(d3r, DaggerWebDash.GraphPlot(:core, :id, :timeline, :profile, \"DAG\"))\n\n# TODO: Not yet functional\n#push!(d3r, DaggerWebDash.ProfileViewer(:core, :profile, \"Profile Viewer\"))\n\n# Add the D3Renderer as a consumer of special events generated by LogWindow\npush!(lw.creation_handlers, d3r)\npush!(lw.deletion_handlers, d3r)\n\n# D3Renderer is also an aggregator\nml.aggregators[:d3r] = d3r\n\nctx.log_sink = ml\n# ... use `ctx`","category":"page"},{"location":"scheduler-visualization/","page":"Scheduler Visualization","title":"Scheduler Visualization","text":"Once the server has started, you can browse to http://localhost:8080/ (if running on your local machine) to view the plots in real time. The dashboard also provides options at the top of the page to control the drawing speed, enable and disable reading updates from the server (disabling freezes the display at the current instant), and a selector for which worker to look at. If the connection to the server is lost for any reason, the dashboard will attempt to reconnect at 5 second intervals. The dashboard can usually survive restarts of the server perfectly well, although refreshing the page is usually a good idea. Informational messages are also logged to the browser console for debugging.","category":"page"},{"location":"propagation/#Option-Propagation","page":"Option Propagation","title":"Option Propagation","text":"","category":"section"},{"location":"propagation/","page":"Option Propagation","title":"Option Propagation","text":"Most options passed to Dagger are passed via @spawn/spawn or delayed directly. This works well when an option only needs to be set for a single thunk, but is cumbersome when the same option needs to be set on multiple thunks, or set recursively on thunks spawned within other thunks. Thankfully, Dagger provides the with_options function to make this easier. This function is very powerful, by nature of using \"context variables\"; let's first see some example code to help explain it:","category":"page"},{"location":"propagation/","page":"Option Propagation","title":"Option Propagation","text":"function f(x)\n m = Dagger.@spawn myid()\n return Dagger.@spawn x+m\nend\nDagger.with_options(;scope=ProcessScope(2)) do\n @sync begin\n @async @assert fetch(Dagger.@spawn f(1)) == 3\n @async @assert fetch(Dagger.@spawn f(2)) == 4\n end\nend","category":"page"},{"location":"propagation/","page":"Option Propagation","title":"Option Propagation","text":"In the above example, with_options sets the scope for both Dagger.@spawn f(1) and Dagger.@spawn f(2) to ProcessScope(2) (locking Dagger tasks to worker 2). This is of course very useful for ensuring that a set of operations use a certain scope. 
What it also does, however, is propagate this scope through calls to @async, Threads.@spawn, and Dagger.@spawn; this means that the task spawned by f(x) also inherits this scope! This works thanks to the magic of context variables, which are inherited recursively through child tasks, and thanks to Dagger intentionally propagating the scope (and other options passed to with_options) across the cluster, ensuring that no matter how deep the recursive task spawning goes, the options are maintained.","category":"page"},{"location":"propagation/","page":"Option Propagation","title":"Option Propagation","text":"It's also possible to retrieve the options currently set by with_options, using Dagger.get_options:","category":"page"},{"location":"propagation/","page":"Option Propagation","title":"Option Propagation","text":"Dagger.with_options(;scope=ProcessScope(2)) do\n fetch(@async @assert Dagger.get_options().scope == ProcessScope(2))\n # Or:\n fetch(@async @assert Dagger.get_options(:scope) == ProcessScope(2))\n # Or, if `scope` might not have been propagated as an option, we can give\n # it a default value:\n fetch(@async @assert Dagger.get_options(:scope, AnyScope()) == ProcessScope(2))\nend","category":"page"},{"location":"propagation/","page":"Option Propagation","title":"Option Propagation","text":"This is a very powerful concept: with a single call to with_options, we can apply any set of options to any nested set of operations. This is great for isolating large workloads to different workers or processors, defining global checkpoint/restore behavior, and more.","category":"page"},{"location":"checkpointing/#Checkpointing","page":"Checkpointing","title":"Checkpointing","text":"","category":"section"},{"location":"checkpointing/","page":"Checkpointing","title":"Checkpointing","text":"If at some point during a Dagger computation a thunk throws an error, or if the entire computation dies because the head node hit an OOM or other unexpected error, the entire computation is lost and needs to be started from scratch. This can be unacceptable for scheduling very large/expensive/mission-critical graphs, and for interactive development where errors are common and easily fixable.","category":"page"},{"location":"checkpointing/","page":"Checkpointing","title":"Checkpointing","text":"Robust applications often support \"checkpointing\", where intermediate results are periodically written out to persistent media, or sharded to the rest of the cluster, to allow resuming an interrupted computation from a point later than the original start. Dagger provides infrastructure to perform user-driven checkpointing of intermediate results once they're generated.","category":"page"},{"location":"checkpointing/","page":"Checkpointing","title":"Checkpointing","text":"As a concrete example, imagine that you're developing a numerical algorithm, and distributing it with Dagger. The idea is to sum all the values in a very big matrix, and then get the square root of the absolute value of the sum of sums. Here is what that might look like:","category":"page"},{"location":"checkpointing/","page":"Checkpointing","title":"Checkpointing","text":"X = compute(randn(Blocks(128,128), 1024, 1024))\nY = [delayed(sum)(chunk) for chunk in X.chunks]\ninner(x...) = sqrt(sum(x))\nZ = delayed(inner)(Y...)\nz = collect(Z)","category":"page"},{"location":"checkpointing/","page":"Checkpointing","title":"Checkpointing","text":"Let's pretend that the above calculation of each element in Y takes a full day to run. 
If you run this, you might realize that if the final sum call returns a negative number, sqrt will throw a DomainError (because sqrt can't accept negative Real inputs). Of course, you forgot to add a call to abs before the call to sqrt! Now, you know how to fix this, but once you do, you'll have to spend another entire day waiting for it to finish! And maybe you fix this one bug and wait a full day for it to finish, and begin adding more very computationally-heavy code (which inevitably has bugs). Those later computations might fail, and if you're running this as a script (maybe under a cluster scheduler like Slurm), you have to restart everything from the very beginning. This is starting to sound pretty wasteful...","category":"page"},{"location":"checkpointing/","page":"Checkpointing","title":"Checkpointing","text":"Thankfully, Dagger has a simple solution to this: checkpointing. With checkpointing, Dagger can be instructed to save intermediate results (maybe the results of computing Y) to a persistent storage medium of your choice. Probably a file on disk, but maybe a database, or even just stored in RAM in a space-efficient form. You also tell Dagger how to restore this data: how to take the result stored in its persistent form, and turn it back into something identical to the original intermediate data that Dagger computed. Then, when the worst happens and a piece of your algorithm throws an error (as above), Dagger will call the restore function and try to materialize those intermediate results that you painstakingly computed, so that you don't need to re-compute them.","category":"page"},{"location":"checkpointing/","page":"Checkpointing","title":"Checkpointing","text":"Let's see how we'd modify the above example to use checkpointing:","category":"page"},{"location":"checkpointing/","page":"Checkpointing","title":"Checkpointing","text":"using Serialization\n\nX = compute(randn(Blocks(128,128), 1024, 1024))\nY = [delayed(sum; checkpoint=(thunk,result)->begin\n open(\"checkpoint-$idx.bin\", \"w\") do io\n serialize(io, collect(result))\n end\nend, restore=(thunk)->begin\n open(\"checkpoint-$idx.bin\", \"r\") do io\n Dagger.tochunk(deserialize(io))\n end\nend)(chunk) for (idx,chunk) in enumerate(X.chunks)]\ninner(x...) = sqrt(sum(x))\nZ = delayed(inner)(Y...)\nz = collect(Z)","category":"page"},{"location":"checkpointing/","page":"Checkpointing","title":"Checkpointing","text":"Two changes were made: first, we enumerate(X.chunks) so that we can get a unique index to identify each chunk; second, we specify a ThunkOptions to delayed with a checkpoint and restore function that is specialized to write or read the given chunk to or from a file on disk, respectively. Notice the usage of collect in the checkpoint function, and the use of Dagger.tochunk in the restore function; Dagger represents intermediate results as Dagger.Chunk objects, so we need to convert between Chunks and the actual data to keep Dagger happy. Performance-sensitive users might consider modifying these methods to store the checkpoint files on the filesystem of the server that currently owns the Chunk, to minimize data transfer times during checkpoint and restore operations.","category":"page"},{"location":"checkpointing/","page":"Checkpointing","title":"Checkpointing","text":"If we run the above code once, we'll still end up waiting a day for Y to be computed, and we'll still get the DomainError from sqrt. 
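The corrected function is the one-line change described above, applying abs to the sum before taking its square root:\n\ninner(x...) = sqrt(abs(sum(x)))\n\n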
However, when we fix the inner function to include that call to abs that was missing, and we re-run this code starting from the creation of Y, we'll find that we don't actually spend a day waiting; we probably spend a few seconds waiting, and end up with our final result! This is because Dagger called the restore function for each element of Y, and was provided a result by the user-specified function, so it skipped re-computing those sums entirely.","category":"page"},{"location":"checkpointing/","page":"Checkpointing","title":"Checkpointing","text":"You might also notice that when you ran this code the first time, you received errors about \"No such file or directory\", or some similar error; this occurs because Dagger always calls the restore function when it exists. In the first run, the checkpoint files don't yet exist, so there's nothing to restore; Dagger reports the thrown error, but keeps moving along, merrily computing the sums of Y. You're welcome to explicitly check if the file exists, and if not, return nothing; then Dagger won't report an annoying error, and will skip the restoration quietly.","category":"page"},{"location":"checkpointing/","page":"Checkpointing","title":"Checkpointing","text":"Of course, you might have a lot of code that looks like this, and may want to also checkpoint the final result of the z = collect(...) call as well. This is just as easy to do:","category":"page"},{"location":"checkpointing/","page":"Checkpointing","title":"Checkpointing","text":"# compute X, Y, Z above ...\nz = collect(Z; options=Dagger.Sch.SchedulerOptions(;\ncheckpoint=(result)->begin\n open(\"checkpoint-final.bin\", \"w\") do io\n serialize(io, collect(result))\n end\nend, restore=()->begin\n open(\"checkpoint-final.bin\", \"r\") do io\n Dagger.tochunk(deserialize(io))\n end\nend))","category":"page"},{"location":"checkpointing/","page":"Checkpointing","title":"Checkpointing","text":"In this case, the entire computation will be skipped if checkpoint-final.bin exists!","category":"page"},{"location":"api-dagger/types/","page":"Types","title":"Types","text":"CurrentModule = Dagger","category":"page"},{"location":"api-dagger/types/#Dagger-Types","page":"Types","title":"Dagger Types","text":"","category":"section"},{"location":"api-dagger/types/","page":"Types","title":"Types","text":"Pages = [\"types.md\"]","category":"page"},{"location":"api-dagger/types/#Task-Types","page":"Types","title":"Task Types","text":"","category":"section"},{"location":"api-dagger/types/","page":"Types","title":"Types","text":"Thunk\nEagerThunk","category":"page"},{"location":"api-dagger/types/#Dagger.Thunk","page":"Types","title":"Dagger.Thunk","text":"Thunk\n\nWraps a callable object to be run with Dagger. A Thunk is typically created through a call to delayed or its macro equivalent @par.\n\nConstructors\n\ndelayed(f; kwargs...)(args...)\n@par [option=value]... f(args...)\n\nExamples\n\njulia> t = delayed(sin)(π) # creates a Thunk to be computed later\nThunk(sin, (π,))\n\njulia> collect(t) # computes the result and returns it to the current process\n1.2246467991473532e-16\n\nArguments\n\nf: The function to be called upon execution of the Thunk.\nargs: The arguments to be passed to the Thunk.\nkwargs: The properties describing unique behavior of this Thunk. 
Details for each property are described in the next section.\n\noption=value: The same as passing kwargs to delayed.\n\nPublic Properties\n\nmeta::Bool=false: If true, instead of fetching cached arguments from Chunks and passing the raw arguments to f, pass the Chunk itself. Useful for doing manual fetching or manipulation of Chunk references. Non-Chunk arguments are still passed as-is.\n\nprocessor::Processor=OSProc(): The processor associated with f. Useful if f is a callable struct that exists on a given processor and should be transferred appropriately.\n\nscope::Dagger.AbstractScope=DefaultScope(): The scope associated with f. Useful if f is a function or callable struct that may only be transferred to, and executed within, the specified scope.\n\nOptions\n\noptions: A Sch.ThunkOptions struct providing the options for the Thunk. If omitted, options can also be specified by passing key-value pairs as kwargs.\n\n\n\n\n\n","category":"type"},{"location":"api-dagger/types/#Dagger.EagerThunk","page":"Types","title":"Dagger.EagerThunk","text":"EagerThunk\n\nReturned from spawn/@spawn calls. Represents a task that is in the scheduler, potentially ready to execute, executing, or finished executing. May be fetch'd or wait'd on at any time.\n\n\n\n\n\n","category":"type"},{"location":"api-dagger/types/#Task-Options-Types","page":"Types","title":"Task Options Types","text":"","category":"section"},{"location":"api-dagger/types/","page":"Types","title":"Types","text":"Options\nSch.ThunkOptions\nSch.SchedulerOptions","category":"page"},{"location":"api-dagger/types/#Data-Management-Types","page":"Types","title":"Data Management Types","text":"","category":"section"},{"location":"api-dagger/types/","page":"Types","title":"Types","text":"Chunk\nShard","category":"page"},{"location":"api-dagger/types/#Processor-Types","page":"Types","title":"Processor Types","text":"","category":"section"},{"location":"api-dagger/types/","page":"Types","title":"Types","text":"Processor\nOSProc\nThreadProc","category":"page"},{"location":"api-dagger/types/#Dagger.Processor","page":"Types","title":"Dagger.Processor","text":"Processor\n\nAn abstract type representing a processing device and associated memory, where data can be stored and operated on. Subtypes should be immutable, and instances should compare equal if they represent the same logical processing device/memory. Subtype instances should be serializable between different nodes. Subtype instances may contain a \"parent\" Processor to make it easy to transfer data to/from other types of Processor at runtime.\n\n\n\n\n\n","category":"type"},{"location":"api-dagger/types/#Dagger.OSProc","page":"Types","title":"Dagger.OSProc","text":"OSProc <: Processor\n\nJulia CPU (OS) process, identified by Distributed pid. 
The logical parent of all processors on a given node, but otherwise does not participate in computations.\n\n\n\n\n\n","category":"type"},{"location":"api-dagger/types/#Dagger.ThreadProc","page":"Types","title":"Dagger.ThreadProc","text":"ThreadProc <: Processor\n\nJulia CPU (OS) thread, identified by Julia thread ID.\n\n\n\n\n\n","category":"type"},{"location":"api-dagger/types/#Scope-Types","page":"Types","title":"Scope Types","text":"","category":"section"},{"location":"api-dagger/types/","page":"Types","title":"Types","text":"AnyScope\nNodeScope\nProcessScope\nProcessorTypeScope\nTaintScope\nUnionScope\nExactScope","category":"page"},{"location":"api-dagger/types/#Dagger.AnyScope","page":"Types","title":"Dagger.AnyScope","text":"Widest scope that contains all processors.\n\n\n\n\n\n","category":"type"},{"location":"api-dagger/types/#Dagger.NodeScope","page":"Types","title":"Dagger.NodeScope","text":"Scoped to the same physical node.\n\n\n\n\n\n","category":"type"},{"location":"api-dagger/types/#Dagger.ProcessScope","page":"Types","title":"Dagger.ProcessScope","text":"Scoped to the same OS process.\n\n\n\n\n\n","category":"type"},{"location":"api-dagger/types/#Dagger.ProcessorTypeScope","page":"Types","title":"Dagger.ProcessorTypeScope","text":"Scoped to any processor with a given supertype.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/types/#Dagger.TaintScope","page":"Types","title":"Dagger.TaintScope","text":"Taints a scope for later evaluation.\n\n\n\n\n\n","category":"type"},{"location":"api-dagger/types/#Dagger.UnionScope","page":"Types","title":"Dagger.UnionScope","text":"Union of two or more scopes.\n\n\n\n\n\n","category":"type"},{"location":"api-dagger/types/#Dagger.ExactScope","page":"Types","title":"Dagger.ExactScope","text":"Scoped to a specific processor.\n\n\n\n\n\n","category":"type"},{"location":"api-dagger/types/#Context-Types","page":"Types","title":"Context Types","text":"","category":"section"},{"location":"api-dagger/types/","page":"Types","title":"Types","text":"Context","category":"page"},{"location":"api-dagger/types/#Dagger.Context","page":"Types","title":"Dagger.Context","text":"Context(xs::Vector{OSProc}) -> Context\nContext(xs::Vector{Int}) -> Context\n\nCreate a Context, by default adding each available worker.\n\nIt is also possible to create a Context from a vector of OSProc, or equivalently the underlying process ids can also be passed directly as a Vector{Int}.\n\nSpecial fields include:\n\nlog_sink: A log sink object to use, if any.\nlog_file::Union{String,Nothing}: Path to logfile. If specified, at scheduler termination, logs will be collected, combined with input thunks, and written out in DOT format to this location.\n\nprofile::Bool: Whether or not to perform profiling with Profile stdlib.\n\n\n\n\n\n","category":"type"},{"location":"api-dagger/types/#Array-Types","page":"Types","title":"Array Types","text":"","category":"section"},{"location":"api-dagger/types/","page":"Types","title":"Types","text":"DArray\nBlocks\nArrayDomain\nUnitDomain","category":"page"},{"location":"api-dagger/types/#Dagger.DArray","page":"Types","title":"Dagger.DArray","text":"DArray{T,N,F}(domain, subdomains, chunks, concat)\nDArray(T, domain, subdomains, chunks, [concat=cat])\n\nAn N-dimensional distributed array of element type T, with a concatenation function of type F.\n\nArguments\n\nT: element type\ndomain::ArrayDomain{N}: the whole ArrayDomain of the array\nsubdomains::AbstractArray{ArrayDomain{N}, N}: a DomainBlocks of the same dimensions as the array\nchunks::AbstractArray{Union{Chunk,Thunk}, N}: an array of chunks of dimension N\nconcat::F: a function of type F. concat(x, y; dims=d) takes two chunks x and y and concatenates them along dimension d. cat is used by default.\n\n\n\n\n\n","category":"type"},{"location":"api-dagger/types/#Dagger.Blocks","page":"Types","title":"Dagger.Blocks","text":"Blocks(xs...)\n\nIndicates the size of an array operation, specified as xs, whose length indicates the number of dimensions in the resulting array.\n\n\n\n\n\n","category":"type"},{"location":"api-dagger/types/#Dagger.ArrayDomain","page":"Types","title":"Dagger.ArrayDomain","text":"ArrayDomain{N}\n\nAn N-dimensional domain over an array.\n\n\n\n\n\n","category":"type"},{"location":"api-dagger/types/#Dagger.UnitDomain","page":"Types","title":"Dagger.UnitDomain","text":"UnitDomain\n\nDefault domain – has no information about the value\n\n\n\n\n\n","category":"type"},{"location":"api-dagger/types/#Logging-Event-Types","page":"Types","title":"Logging Event Types","text":"","category":"section"},{"location":"api-dagger/types/","page":"Types","title":"Types","text":"Events.BytesAllocd\nEvents.ProcessorSaturation\nEvents.WorkerSaturation","category":"page"},{"location":"api-dagger/types/#Dagger.Events.BytesAllocd","page":"Types","title":"Dagger.Events.BytesAllocd","text":"BytesAllocd\n\nTracks memory allocated for Chunks.\n\n\n\n\n\n","category":"type"},{"location":"api-dagger/types/#Dagger.Events.ProcessorSaturation","page":"Types","title":"Dagger.Events.ProcessorSaturation","text":"ProcessorSaturation\n\nTracks the compute saturation (running tasks) per-processor.\n\n\n\n\n\n","category":"type"},{"location":"api-dagger/types/#Dagger.Events.WorkerSaturation","page":"Types","title":"Dagger.Events.WorkerSaturation","text":"WorkerSaturation\n\nTracks the compute saturation (running tasks).\n\n\n\n\n\n","category":"type"},{"location":"task-spawning/#Task-Spawning","page":"Task Spawning","title":"Task Spawning","text":"","category":"section"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"The main entrypoint to Dagger is @spawn:","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"Dagger.@spawn [option=value]... 
f(args...; kwargs...)","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"or spawn if it's more convenient:","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"Dagger.spawn(f, Dagger.Options(options), args...; kwargs...)","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"When called, it creates an EagerThunk (also known as a \"thunk\" or \"task\") object representing a call to function f with the arguments args and keyword arguments kwargs. If it is called with other thunks as args/kwargs, such as in Dagger.@spawn f(Dagger.@spawn g()), then the function f gets passed the result of executing g(), once that result is available. If g() isn't yet finished executing, then f waits for g() to complete before it executes.","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"An important observation to make is that, for each argument to @spawn/spawn, if the argument is the result of another @spawn/spawn call (thus it's an EagerThunk), the argument will be computed first, and then its result will be passed into the function receiving the argument. If the argument is not an EagerThunk (instead, some other type of Julia object), it'll be passed as-is to the function f (with some exceptions).","category":"page"},{"location":"task-spawning/#Options","page":"Task Spawning","title":"Options","text":"","category":"section"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"The Options struct in the second argument position is optional; if provided, it is passed to the scheduler to control its behavior. Options contains a NamedTuple of option key-value pairs, which can be any of:","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"Any field in Dagger.Sch.ThunkOptions (see Scheduler and Thunk options)\nmeta::Bool – Pass the input Chunk objects themselves to f and not the value contained in them","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"There are also some extra options that can be passed, although they're considered advanced, and intended only for developers or library authors:","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"get_result::Bool – return the actual result to the scheduler instead of Chunk objects. Used when f explicitly constructs a Chunk or when the return value is small (e.g. in case of reduce)\npersist::Bool – the result of this Thunk should not be released after it becomes unused in the DAG\ncache::Bool – cache the result of this Thunk such that if the thunk is evaluated again, one can just reuse the cached value. If it’s been removed from cache, recompute the value.","category":"page"},{"location":"task-spawning/#Simple-example","page":"Task Spawning","title":"Simple example","text":"","category":"section"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"Let's see a very simple directed acyclic graph (or DAG) constructed with Dagger:","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"using Dagger\n\nadd1(value) = value + 1\nadd2(value) = value + 2\ncombine(a...) 
= sum(a)\n\np = Dagger.@spawn add1(4)\nq = Dagger.@spawn add2(p)\nr = Dagger.@spawn add1(3)\ns = Dagger.@spawn combine(p, q, r)\n\n@assert fetch(s) == 16","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"The thunks p, q, r, and s have the following structure:","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"(Image: graph)","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"The final result (from fetch(s)) is the obvious consequence of the operation:","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"add1(4) + add2(add1(4)) + add1(3)","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"(4 + 1) + ((4 + 1) + 2) + (3 + 1) == 16","category":"page"},{"location":"task-spawning/#Eager-Execution","page":"Task Spawning","title":"Eager Execution","text":"","category":"section"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"Dagger's @spawn macro works similarly to @async and Threads.@spawn: when called, it wraps the function call specified by the user in an EagerThunk object, and immediately places it onto a running scheduler, to be executed once its dependencies are fulfilled.","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"x = rand(400,400)\ny = rand(400,400)\nzt = Dagger.@spawn x * y\nz = fetch(zt)\n@assert isapprox(z, x * y)","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"One can also wait on the result of @spawn and check completion status with isready:","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"x = Dagger.@spawn sleep(10)\n@assert !isready(x)\nwait(x)\n@assert isready(x)","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"Like @async and Threads.@spawn, Dagger.@spawn synchronizes with locally-scoped @sync blocks:","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"function sleep_and_print(delay, str)\n sleep(delay)\n println(str)\nend\n@sync begin\n Dagger.@spawn sleep_and_print(3, \"I print first\")\nend\nwait(Dagger.@spawn sleep_and_print(1, \"I print second\"))","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"One can also safely call @spawn from another worker (not ID 1), and it will be executed correctly:","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"x = fetch(Distributed.@spawnat 2 Dagger.@spawn 1+2) # fetches the result of `@spawnat`\nx::EagerThunk\n@assert fetch(x) == 3 # fetch the result of `@spawn`","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"This is useful for nested execution, where an @spawn'd thunk calls @spawn. 
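As a minimal sketch of such nesting (a hypothetical function, for illustration only):\n\nfunction nested(x)\n # Spawning and fetching from within a running Dagger task is safe\n t = Dagger.@spawn x + 1\n return fetch(t) * 2\nend\n\n@assert fetch(Dagger.@spawn nested(10)) == 22\n\n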
This is detailed further in Dynamic Scheduler Control.","category":"page"},{"location":"task-spawning/#Errors","page":"Task Spawning","title":"Errors","text":"","category":"section"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"If a thunk errors while running under the eager scheduler, it will be marked as having failed, all dependent (downstream) thunks will be marked as failed, and any future thunks that use a failed thunk as input will fail. Failure can be determined with fetch, which will re-throw the error that the originally-failing thunk threw. wait and isready will not check whether a thunk or its upstream failed; they only check whether the thunk has completed, regardless of whether it errored.","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"This failure behavior is not the default for lazy scheduling (Lazy API), but can be enabled by setting the scheduler/thunk option (Scheduler and Thunk options) allow_error to true. However, this option isn't terribly useful for non-dynamic use cases, since any thunk failure will propagate down to the output thunk regardless of where it occurs.","category":"page"},{"location":"task-spawning/#Lazy-API","page":"Task Spawning","title":"Lazy API","text":"","category":"section"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"Alongside the modern eager API, Dagger also has a legacy lazy API, accessible via @par or delayed. The above computation can be executed with the lazy API by substituting @spawn with @par and fetch with collect:","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"p = Dagger.@par add1(4)\nq = Dagger.@par add2(p)\nr = Dagger.@par add1(3)\ns = Dagger.@par combine(p, q, r)\n\n@assert collect(s) == 16","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"or similarly, in block form:","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"s = Dagger.@par begin\n p = add1(4)\n q = add2(p)\n r = add1(3)\n combine(p, q, r)\nend\n\n@assert collect(s) == 16","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"Alternatively, if you want to compute but not fetch the result of a lazy operation, you can call compute on the thunk. This will return a Chunk object which references the result (see Chunks for more details):","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"x = Dagger.@par 1+2\ncx = compute(x)\ncx::Chunk\n@assert collect(cx) == 3","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"Note that, as a legacy API, use of the lazy API is generally discouraged in modern Dagger code. 
The reasons for this are numerous:","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"Nothing useful is happening while the DAG is being constructed, adding extra latency\nDynamically expanding the DAG can't be done with @par and delayed, making recursive nesting annoying to write\nEach call to compute/collect starts a new scheduler, and destroys it at the end of the computation, wasting valuable time on setup and teardown\nDistinct schedulers don't share runtime metrics or learned parameters, thus causing the scheduler to act less intelligently\nDistinct schedulers can't share work or data directly","category":"page"},{"location":"task-spawning/#Scheduler-and-Thunk-options","page":"Task Spawning","title":"Scheduler and Thunk options","text":"","category":"section"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"While Dagger generally \"just works\", sometimes one needs to exert some more fine-grained control over how the scheduler allocates work. There are two parallel mechanisms to achieve this: Scheduler options (from Dagger.Sch.SchedulerOptions) and Thunk options (from Dagger.Sch.ThunkOptions). These two options structs contain many shared options, with the difference being that Scheduler options operate globally across an entire DAG, and Thunk options operate on a thunk-by-thunk basis.","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"Scheduler options can be constructed and passed to collect() or compute() as the keyword argument options for lazy API usage:","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"t = Dagger.@par 1+2\nopts = Dagger.Sch.SchedulerOptions(;single=1) # Execute on worker 1\n\ncompute(t; options=opts)\n\ncollect(t; options=opts)","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"Thunk options can be passed to @spawn/spawn, @par, and delayed similarly:","category":"page"},{"location":"task-spawning/","page":"Task Spawning","title":"Task Spawning","text":"# Execute on worker 1\n\nDagger.@spawn single=1 1+2\nDagger.spawn(+, Dagger.Options(;single=1), 1, 2)\n\ndelayed(+; single=1)(1, 2)","category":"page"},{"location":"task-queues/#Task-Queues","page":"Task Queues","title":"Task Queues","text":"","category":"section"},{"location":"task-queues/","page":"Task Queues","title":"Task Queues","text":"By default, @spawn/spawn submit tasks immediately and directly into Dagger's scheduler without modifications. However, sometimes you want to be able to tweak this behavior for a region of code; for example, when working with GPUs or other operations which operate in-place, you might want to emulate CUDA's stream semantics by ensuring that tasks execute sequentially (to avoid one kernel reading from an array while another kernel is actively writing to it). Or, you might want to ensure that a set of Dagger tasks are submitted into the scheduler all at once for benchmarking purposes or to emulate the behavior of delayed. This and more is possible through a mechanism called \"task queues\".","category":"page"},{"location":"task-queues/","page":"Task Queues","title":"Task Queues","text":"A task queue in Dagger is an object that can be configured to accept unlaunched tasks from @spawn/spawn and either modify them or delay their launching arbitrarily. 
By default, Dagger tasks are enqueued through the EagerTaskQueue, which submits tasks directly into the scheduler before @spawn/spawn returns. However, Dagger also has an InOrderTaskQueue, which ensures that tasks enqueued through it execute sequentially with respect to each other. This queue can be allocated with Dagger.spawn_sequential:","category":"page"},{"location":"task-queues/","page":"Task Queues","title":"Task Queues","text":"A = rand(16)\nB = zeros(16)\nC = zeros(16)\nfunction vcopy!(B, A)\n B .= A .+ 1.0\n return\nend\nfunction vadd!(C, A, B)\n C .+= A .+ B\n return\nend\nwait(Dagger.spawn_sequential() do\n Dagger.@spawn vcopy!(B, A)\n Dagger.@spawn vadd!(C, A, B)\nend)","category":"page"},{"location":"task-queues/","page":"Task Queues","title":"Task Queues","text":"In the above example, vadd! is guaranteed to wait until vcopy! is completed, even though vadd! isn't taking the result of vcopy! as an argument (which is how tasks are normally ordered).","category":"page"},{"location":"task-queues/","page":"Task Queues","title":"Task Queues","text":"What if we wanted to launch multiple vcopy! calls within a spawn_sequential region and allow them to execute in parallel, but still ensure that the vadd! happens after they all finish? In this case, we want to switch to another kind of task queue: the LazyTaskQueue. This task queue batches up task submissions into groups, so that all tasks enqueued with it are placed in the scheduler all at once. But what would happen if we used this task queue (via spawn_bulk) within a region using spawn_sequential:","category":"page"},{"location":"task-queues/","page":"Task Queues","title":"Task Queues","text":"A = rand(16)\nB1 = zeros(16)\nB2 = zeros(16)\nC = zeros(16)\nwait(Dagger.spawn_sequential() do\n Dagger.spawn_bulk() do\n Dagger.@spawn vcopy!(B1, A)\n Dagger.@spawn vcopy!(B2, A)\n end\n Dagger.@spawn vadd!(C, B1, B2)\nend)","category":"page"},{"location":"task-queues/","page":"Task Queues","title":"Task Queues","text":"Conveniently, Dagger's task queues can be nested to get the expected behavior; the above example will submit the two vcopy! tasks as a group (and they can execute concurrently), while still ensuring that those two tasks finish before the vadd! task executes.","category":"page"},{"location":"task-queues/","page":"Task Queues","title":"Task Queues","text":"warn: Warn\nTask queues do not propagate to nested tasks; if a Dagger task launches another task internally, the child task doesn't inherit the task queue that the parent task was enqueued in.","category":"page"},{"location":"benchmarking/#Benchmarking-Dagger","page":"Benchmarking","title":"Benchmarking Dagger","text":"","category":"section"},{"location":"benchmarking/","page":"Benchmarking","title":"Benchmarking","text":"For ease of benchmarking changes to Dagger's scheduler and the DArray, a benchmarking script exists at benchmarks/benchmark.jl. This script currently allows benchmarking a non-negative matrix factorization (NNMF) algorithm, which we've found to be a good evaluator of scheduling performance. 
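As a minimal sketch of driving it from Julia (hypothetical values; the recognized environment variables are described below):\n\n# Configure the benchmarks via environment variables, then run the script\nENV[\"BENCHMARK_PROCS\"] = \"8:4\" # 8 extra processes, with 4 threads each\nENV[\"BENCHMARK\"] = \"cpu,cpu+dagger\" # CPU benchmarks, with and without Dagger\ninclude(\"benchmarks/benchmark.jl\")\n\n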
The benchmark script can test with and without Dagger, and also has support for using CUDA or AMD GPUs to accelerate the NNMF via DaggerGPU.jl.","category":"page"},{"location":"benchmarking/","page":"Benchmarking","title":"Benchmarking","text":"The script checks for a number of environment variables, which are used to control the benchmarks that are performed (all of which are optional):","category":"page"},{"location":"benchmarking/","page":"Benchmarking","title":"Benchmarking","text":"BENCHMARK_PROCS: Selects the number of Julia processes and threads to start up. Specified as 8:4, this option would start 8 extra Julia processes, with 4 threads each. Defaults to 2 processes with 1 thread each.\nBENCHMARK_REMOTES: Specifies a colon-separated list of remote servers to connect to and start Julia processes on, using BENCHMARK_PROCS to indicate the processor/thread configuration of those remotes. Disabled by default (uses the local machine).\nBENCHMARK_OUTPUT_FORMAT: Selects the output format for benchmark results. Defaults to jls, which uses Julia's Serialization stdlib, and can also be jld to use JLD.jl.\nBENCHMARK_RENDER: Configures rendering, which is disabled by default. Can be \"live\" or \"offline\", which are explained below.\nBENCHMARK: Specifies the set of benchmarks to run as a comma-separated list, where each entry can be one of cpu, cuda, or amdgpu, and may optionally append +dagger (like cuda+dagger) to indicate whether or not to use Dagger. Defaults to cpu,cpu+dagger, which runs CPU benchmarks with and without Dagger.\nBENCHMARK_SCALE: Determines how much to scale the benchmark sizing by, typically specified as a UnitRange{Int}. Defaults to 1:5:50, which runs each scale from 1 to 50, in steps of 5.","category":"page"},{"location":"benchmarking/#Rendering-with-BENCHMARK_RENDER","page":"Benchmarking","title":"Rendering with BENCHMARK_RENDER","text":"","category":"section"},{"location":"benchmarking/","page":"Benchmarking","title":"Benchmarking","text":"Dagger contains visualization code for the scheduler (as a Gantt chart) and thunk execution profiling (flamechart), which can be enabled with BENCHMARK_RENDER. Additionally, rendering can be done \"live\", served via a Mux.jl webserver run locally, or \"offline\", where the visualization will be embedded into the results output file. By default, rendering is disabled. If BENCHMARK_RENDER is set to live, a Mux webserver is started at localhost:8000 (the address is not yet configurable), and the Gantt chart and profiling flamechart will be rendered once the benchmarks start. 
If set to offline, data visualization will happen in the background, and will be embedded in the results file.","category":"page"},{"location":"benchmarking/","page":"Benchmarking","title":"Benchmarking","text":"Note that Gantt chart and flamechart output is only generated and relevant during Dagger execution.","category":"page"},{"location":"benchmarking/#TODO:-Plotting","page":"Benchmarking","title":"TODO: Plotting","text":"","category":"section"},{"location":"api-timespanlogging/types/","page":"Types","title":"Types","text":"CurrentModule = TimespanLogging","category":"page"},{"location":"api-timespanlogging/types/#TimespanLogging-Types","page":"Types","title":"TimespanLogging Types","text":"","category":"section"},{"location":"api-timespanlogging/types/","page":"Types","title":"Types","text":"Pages = [\"types.md\"]","category":"page"},{"location":"api-timespanlogging/types/#Log-Sink-Types","page":"Types","title":"Log Sink Types","text":"","category":"section"},{"location":"api-timespanlogging/types/","page":"Types","title":"Types","text":"MultiEventLog\nLocalEventLog\nNoOpLog","category":"page"},{"location":"api-timespanlogging/types/#TimespanLogging.MultiEventLog","page":"Types","title":"TimespanLogging.MultiEventLog","text":"MultiEventLog\n\nProcesses events immediately, generating multiple log streams. Multiple consumers may register themselves in the MultiEventLog, and when accessed, log events will be provided to all consumers. A consumer is simply a function or callable struct which will be called with an event when it's generated. The return value of the consumer will be pushed into a log stream dedicated to that consumer. Errors thrown by consumers will be caught and rendered, but will not otherwise interrupt consumption by other consumers, or future consumption cycles. An error will result in nothing being appended to that consumer's log.\n\n\n\n\n\n","category":"type"},{"location":"api-timespanlogging/types/#TimespanLogging.LocalEventLog","page":"Types","title":"TimespanLogging.LocalEventLog","text":"LocalEventLog\n\nStores events in a process-local array. 
Accessing the logs is all-or-nothing; if multiple consumers call get_logs!, they will get different sets of logs.\n\n\n\n\n\n","category":"type"},{"location":"api-timespanlogging/types/#TimespanLogging.NoOpLog","page":"Types","title":"TimespanLogging.NoOpLog","text":"NoOpLog\n\nDisables event logging entirely.\n\n\n\n\n\n","category":"type"},{"location":"api-timespanlogging/types/#Event-Types","page":"Types","title":"Event Types","text":"","category":"section"},{"location":"api-timespanlogging/types/","page":"Types","title":"Types","text":"Event","category":"page"},{"location":"api-timespanlogging/types/#TimespanLogging.Event","page":"Types","title":"TimespanLogging.Event","text":"An event generated by timespan_start or timespan_finish.\n\n\n\n\n\n","category":"type"},{"location":"api-timespanlogging/types/#Built-in-Event-Types","page":"Types","title":"Built-in Event Types","text":"","category":"section"},{"location":"api-timespanlogging/types/","page":"Types","title":"Types","text":"Events.CoreMetrics\nEvents.IDMetrics\nEvents.TimelineMetrics\nEvents.FullMetrics\nEvents.CPULoadAverages\nEvents.MemoryFree\nEvents.EventSaturation\nEvents.DebugMetrics\nEvents.LogWindow","category":"page"},{"location":"api-timespanlogging/types/#TimespanLogging.Events.CoreMetrics","page":"Types","title":"TimespanLogging.Events.CoreMetrics","text":"CoreMetrics\n\nTracks the timestamp, category, and kind of the Event object generated by log events.\n\n\n\n\n\n","category":"type"},{"location":"api-timespanlogging/types/#TimespanLogging.Events.IDMetrics","page":"Types","title":"TimespanLogging.Events.IDMetrics","text":"IDMetrics\n\nTracks the ID of Event objects generated by log events.\n\n\n\n\n\n","category":"type"},{"location":"api-timespanlogging/types/#TimespanLogging.Events.TimelineMetrics","page":"Types","title":"TimespanLogging.Events.TimelineMetrics","text":"TimelineMetrics\n\nTracks the timeline of Event objects generated by log events.\n\n\n\n\n\n","category":"type"},{"location":"api-timespanlogging/types/#TimespanLogging.Events.FullMetrics","page":"Types","title":"TimespanLogging.Events.FullMetrics","text":"FullMetrics\n\nTracks the full Event object generated by log events.\n\n\n\n\n\n","category":"type"},{"location":"api-timespanlogging/types/#TimespanLogging.Events.CPULoadAverages","page":"Types","title":"TimespanLogging.Events.CPULoadAverages","text":"CPULoadAverages\n\nMonitors the CPU load averages.\n\n\n\n\n\n","category":"type"},{"location":"api-timespanlogging/types/#TimespanLogging.Events.MemoryFree","page":"Types","title":"TimespanLogging.Events.MemoryFree","text":"MemoryFree\n\nMonitors the percentage of free system memory.\n\n\n\n\n\n","category":"type"},{"location":"api-timespanlogging/types/#TimespanLogging.Events.EventSaturation","page":"Types","title":"TimespanLogging.Events.EventSaturation","text":"EventSaturation\n\nTracks the compute saturation (running tasks) per-processor.\n\n\n\n\n\n","category":"type"},{"location":"api-timespanlogging/types/#TimespanLogging.Events.DebugMetrics","page":"Types","title":"TimespanLogging.Events.DebugMetrics","text":"Debugging metric, used to log event start/finish via @debug.\n\n\n\n\n\n","category":"type"},{"location":"api-timespanlogging/types/#TimespanLogging.Events.LogWindow","page":"Types","title":"TimespanLogging.Events.LogWindow","text":"LogWindow\n\nAggregator that prunes events to within a given time 
window.\n\n\n\n\n\n","category":"type"},{"location":"logging/#Logging-and-Graphing","page":"Logging and Graphing","title":"Logging and Graphing","text":"","category":"section"},{"location":"logging/","page":"Logging and Graphing","title":"Logging and Graphing","text":"Dagger's scheduler keeps track of the important and potentially expensive actions it does, such as moving data between workers or executing thunks, and tracks how much time and memory these operations consume, among other things. It does this through the TimespanLogging.jl package (which used to be directly integrated into Dagger). Saving this information somewhere accessible is disabled by default, but it's quite easy to turn on by setting a \"log sink\" in the Context being used, as ctx.log_sink. A variety of log sinks are built into TimespanLogging; the NoOpLog is the default log sink when one isn't explicitly specified, and disables logging entirely (to minimize overhead). There are currently two other log sinks of interest; the first and newer of the two is the MultiEventLog, which generates multiple independent log streams, one per \"consumer\" (details in the next section). The second and older sink is the LocalEventLog, which is explained later in this document. Most users are recommended to use the MultiEventLog since it's far more flexible and extensible, and is more performant in general.","category":"page"},{"location":"logging/#MultiEventLog","page":"Logging and Graphing","title":"MultiEventLog","text":"","category":"section"},{"location":"logging/","page":"Logging and Graphing","title":"Logging and Graphing","text":"The MultiEventLog is intended to be configurable to exclude unnecessary information, and to include any built-in or user-defined metrics. It stores a set of \"sub-log\" streams internally, appending a single element to each of them when an event is generated. This element can be called a \"sub-event\" (to distinguish it from the higher-level \"event\" that Dagger creates), and is created by a \"consumer\". A consumer is a function or callable struct that, when called with the Event object generated by TimespanLogging, returns a sub-event characterizing whatever information the consumer represents. For example, the Dagger.Events.BytesAllocd consumer calculates the total bytes allocated and live at any given time within Dagger, and returns the current value when called. Let's construct one:","category":"page"},{"location":"logging/","page":"Logging and Graphing","title":"Logging and Graphing","text":"ctx = Context()\nml = TimespanLogging.MultiEventLog()\n\n# Add the BytesAllocd consumer to the log as `:bytes`\nml[:bytes] = Dagger.Events.BytesAllocd()\n\nctx.log_sink = ml","category":"page"},{"location":"logging/","page":"Logging and Graphing","title":"Logging and Graphing","text":"As we can see above, each consumer gets a unique name as a Symbol that identifies it. 
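Additional consumers can be registered in the same way; for example, to also track worker compute saturation with the built-in Dagger.Events.WorkerSaturation consumer (the :wsat name is arbitrary):\n\nml[:wsat] = Dagger.Events.WorkerSaturation()\n\n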
Now that the log sink is attached and has consumers registered, we can execute some Dagger tasks, and then collect the sub-events generated by BytesAllocd:","category":"page"},{"location":"logging/","page":"Logging and Graphing","title":"Logging and Graphing","text":"# Using the lazy API, for explanatory purposes\ncollect(ctx, delayed(+)(1, delayed(*)(3, 4))) # Allocates 8 bytes\nlog = TimespanLogging.get_logs!(ctx)[1] # Get the logs for worker 1\n@show log[:bytes]","category":"page"},{"location":"logging/","page":"Logging and Graphing","title":"Logging and Graphing","text":"You'll then see that 8 bytes are allocated and then freed during the process of executing and completing those tasks.","category":"page"},{"location":"logging/","page":"Logging and Graphing","title":"Logging and Graphing","text":"Note that the MultiEventLog can also be used perfectly well when using Dagger's eager API:","category":"page"},{"location":"logging/","page":"Logging and Graphing","title":"Logging and Graphing","text":"ctx = Dagger.Sch.eager_context()\nctx.log_sink = ml\n\na = Dagger.@spawn 3*4\nDagger.@spawn 1+a","category":"page"},{"location":"logging/","page":"Logging and Graphing","title":"Logging and Graphing","text":"There are a variety of other consumers built into TimespanLogging and Dagger, under the TimespanLogging.Events and Dagger.Events modules, respectively; see Dagger Types and TimespanLogging Types for details.","category":"page"},{"location":"logging/","page":"Logging and Graphing","title":"Logging and Graphing","text":"The MultiEventLog also has a mechanism to call a set of functions, called \"aggregators\", after all consumers have been executed; these aggregators are passed the full set of log streams as a Dict{Symbol,Vector{Any}}. The only one currently shipped with TimespanLogging directly is the LogWindow, and DaggerWebDash.jl has the TableStorage which integrates with it; see DaggerWebDash Types for details.","category":"page"},{"location":"logging/#LocalEventLog","page":"Logging and Graphing","title":"LocalEventLog","text":"","category":"section"},{"location":"logging/","page":"Logging and Graphing","title":"Logging and Graphing","text":"The LocalEventLog is generally only useful when you want combined events (event start and finish combined as a single unit), and only care about a few simple built-in generated events. Let's attach one to our context:","category":"page"},{"location":"logging/","page":"Logging and Graphing","title":"Logging and Graphing","text":"ctx = Context()\nlog = TimespanLogging.LocalEventLog()\nctx.log_sink = log","category":"page"},{"location":"logging/","page":"Logging and Graphing","title":"Logging and Graphing","text":"Now anytime ctx is used as the context for a scheduler, the scheduler will log events into log.","category":"page"},{"location":"logging/","page":"Logging and Graphing","title":"Logging and Graphing","text":"Once sufficient data has been accumulated into a LocalEventLog, it can be gathered to a single host via TimespanLogging.get_logs!(log). The result is a Vector of TimespanLogging.Timespan objects, which describe some metadata about an operation that occurred and was logged by the scheduler. 
These events may be introspected directly, or rendered to a DOT-format string:","category":"page"},{"location":"logging/","page":"Logging and Graphing","title":"Logging and Graphing","text":"logs = TimespanLogging.get_logs!(log)\nstr = Dagger.show_plan(logs)","category":"page"},{"location":"logging/","page":"Logging and Graphing","title":"Logging and Graphing","text":"Dagger.show_plan can also be called as Dagger.show_plan(io::IO, logs) to write the graph to a file or other IO object. The string generated by this function may be passed to an external tool like Graphviz for rendering. Note that this method doesn't display input arguments to the DAG (non-Thunks); you can call Dagger.show_plan(logs, thunk), where thunk is the output Thunk of the DAG, to render argument nodes.","category":"page"},{"location":"logging/","page":"Logging and Graphing","title":"Logging and Graphing","text":"note: Note\nTimespanLogging.get_logs! clears out the event logs, so that old events don't mix with new ones from future DAGs.","category":"page"},{"location":"logging/","page":"Logging and Graphing","title":"Logging and Graphing","text":"As a convenience, it's possible to set ctx.log_file to the path to an output file, and then calls to compute(ctx, ...)/collect(ctx, ...) will automatically write the graph in DOT format to that path. There is also a benefit to this approach over manual calls to get_logs! and show_plan: DAGs which aren't Thunks (such as operations on the Dagger.DArray) will be properly rendered with input arguments (which normally aren't rendered because a Thunk is dynamically generated from such operations by Dagger before scheduling).","category":"page"},{"location":"logging/#FilterLog","page":"Logging and Graphing","title":"FilterLog","text":"","category":"section"},{"location":"logging/","page":"Logging and Graphing","title":"Logging and Graphing","text":"The FilterLog exists to allow writing events to a user-defined location (such as a database, file, or network socket). It is not currently tested or documented.","category":"page"},{"location":"api-daggerwebdash/functions/","page":"Functions and Macros","title":"Functions and Macros","text":"CurrentModule = DaggerWebDash","category":"page"},{"location":"api-daggerwebdash/functions/#DaggerWebDash-Functions","page":"Functions and Macros","title":"DaggerWebDash Functions","text":"","category":"section"},{"location":"api-daggerwebdash/functions/","page":"Functions and Macros","title":"Functions and Macros","text":"Pages = [\"functions.md\"]","category":"page"},{"location":"datadeps/#Datadeps-(Data-Dependencies)","page":"Datadeps","title":"Datadeps (Data Dependencies)","text":"","category":"section"},{"location":"datadeps/","page":"Datadeps","title":"Datadeps","text":"For many programs, the rule that tasks cannot write to their arguments feels overly restrictive and makes certain kinds of programs (such as in-place linear algebra) hard to express efficiently in Dagger. Thankfully, there is a solution: spawn_datadeps. This function constructs a \"datadeps region\", within which tasks are allowed to write to their arguments, with parallelism controlled via dependencies specified by argument annotations. 
Let's look at a simple example to make things concrete:","category":"page"},{"location":"datadeps/","page":"Datadeps","title":"Datadeps","text":"A = rand(1000)\nB = rand(1000)\nC = zeros(1000)\nadd!(X, Y) = X .+= Y\nDagger.spawn_datadeps() do\n Dagger.@spawn add!(InOut(B), In(A))\n Dagger.@spawn copyto!(Out(C), In(B))\nend","category":"page"},{"location":"datadeps/","page":"Datadeps","title":"Datadeps","text":"In this example, we have two Dagger tasks being launched, one adding A into B, and the other copying B into C. The add! task is specifying that A is only being read from (In for \"input\"), and that B is being read from and written to (Out for \"output\", InOut for \"input and output\"). The copyto! task, similarly, is specifying that B is being read from, and C is only being written to.","category":"page"},{"location":"datadeps/","page":"Datadeps","title":"Datadeps","text":"Without spawn_datadeps and In, Out, and InOut, the result of these tasks would be undefined; the two tasks could execute in parallel, or the copyto! could occur before the add!, resulting in all kinds of mayhem. However, spawn_datadeps changes things: because we have told Dagger how our tasks access their arguments, Dagger knows to control the parallelism and ordering, and ensure that add! executes and finishes before copyto! begins, so that copyto! \"sees\" the changes to B before executing.","category":"page"},{"location":"datadeps/","page":"Datadeps","title":"Datadeps","text":"There is another important aspect of spawn_datadeps that makes the above code work: if all of the Dagger.@spawn macros are removed, along with the dependency specifiers, the program would still produce the same results, without using Dagger. In other words, the parallel (Dagger) version of the program produces identical results to the serial (non-Dagger) version of the program. This is similar to using Dagger with purely functional tasks and without spawn_datadeps - removing Dagger.@spawn will still result in a correct (sequential and possibly slower) version of the program. Basically, spawn_datadeps will ensure that Dagger respects the ordering and dependencies of a program, while still providing parallelism, where possible.","category":"page"},{"location":"datadeps/","page":"Datadeps","title":"Datadeps","text":"But where is the parallelism? The above example doesn't actually have any parallelism to exploit! Let's take a look at another example to see the datadeps model truly shine:","category":"page"},{"location":"datadeps/","page":"Datadeps","title":"Datadeps","text":"# Tree reduction of multiple arrays into the first array\nfunction tree_reduce!(op::Base.Callable, As::Vector{<:Array})\n Dagger.spawn_datadeps() do\n to_reduce = Vector[]\n push!(to_reduce, As)\n while !isempty(to_reduce)\n As = pop!(to_reduce)\n n = length(As)\n if n == 2\n Dagger.@spawn Base.mapreducedim!(identity, op, InOut(As[1]), In(As[2]))\n elseif n > 2\n push!(to_reduce, [As[1], As[div(n,2)+1]])\n push!(to_reduce, As[1:div(n,2)])\n push!(to_reduce, As[div(n,2)+1:end])\n end\n end\n end\n return As[1]\nend\n\nAs = [rand(1000) for _ in 1:1000]\nBs = copy.(As)\ntree_reduce!(+, As)\n@assert isapprox(As[1], reduce((x,y)->x .+ y, Bs))","category":"page"},{"location":"datadeps/","page":"Datadeps","title":"Datadeps","text":"In the above implementation of tree_reduce! 
(which is designed to perform an elementwise reduction across a vector of arrays), we have a tree reduction operation where pairs of arrays are reduced, starting with neighboring pairs, and then reducing pairs of reduction results, etc. until the final result is in As[1]. We can see that the application of Dagger to this algorithm is simple - only the single Base.mapreducedim! call is passed to Dagger - yet due to the data dependencies and the algorithm's structure, there should be plenty of parallelism to be exploited across each of the parallel reductions at each \"level\" of the reduction tree. Specifically, any two Dagger.@spawn calls which access completely different pairs of arrays can execute in parallel, while any call which has an In on an array will wait for any previous call which has an InOut on that same array.","category":"page"},{"location":"datadeps/","page":"Datadeps","title":"Datadeps","text":"Additionally, we can notice a powerful feature of this model - if the Dagger.@spawn macro is removed, the code still remains correct, but simply runs sequentially. This means that the structure of the program doesn't have to change in order to use Dagger for parallelization, which can make applying Dagger to existing algorithms quite effortless.","category":"page"},{"location":"api-timespanlogging/functions/","page":"Functions and Macros","title":"Functions and Macros","text":"CurrentModule = TimespanLogging","category":"page"},{"location":"api-timespanlogging/functions/#TimespanLogging-Functions","page":"Functions and Macros","title":"TimespanLogging Functions","text":"","category":"section"},{"location":"api-timespanlogging/functions/","page":"Functions and Macros","title":"Functions and Macros","text":"Pages = [\"functions.md\"]","category":"page"},{"location":"api-timespanlogging/functions/#Basic-Functions","page":"Functions and Macros","title":"Basic Functions","text":"","category":"section"},{"location":"api-timespanlogging/functions/","page":"Functions and Macros","title":"Functions and Macros","text":"timespan_start\ntimespan_finish\nget_logs!\nmake_timespan","category":"page"},{"location":"api-timespanlogging/functions/#TimespanLogging.timespan_start","page":"Functions and Macros","title":"TimespanLogging.timespan_start","text":"timespan_start(ctx, category::Symbol, id, tl)\n\nGenerates an Event{:start} which denotes the start of an event. The event is categorized by category, and uniquely identified by id; these two must be the same passed to timespan_finish to close the event. tl is the \"timeline\" of the event, which is just an arbitrary payload attached to the event.\n\n\n\n\n\n","category":"function"},{"location":"api-timespanlogging/functions/#TimespanLogging.timespan_finish","page":"Functions and Macros","title":"TimespanLogging.timespan_finish","text":"timespan_finish(ctx, category::Symbol, id, tl)\n\nGenerates an Event{:finish} which denotes the end of an event. The event is categorized by category, and uniquely identified by id; these two must be the same as previously passed to timespan_start. tl is the \"timeline\" of the event, which is just an arbitrary payload attached to the event.\n\n\n\n\n\n","category":"function"},{"location":"api-timespanlogging/functions/#TimespanLogging.get_logs!","page":"Functions and Macros","title":"TimespanLogging.get_logs!","text":"get_logs!(::LocalEventLog, raw=false; only_local=false) -> Union{Vector{Timespan},Vector{Event}}\n\nGet the logs from each process' local event log, clearing it in the process. 
Set raw to true to get potentially unmatched Events; the default is to return only matched events as Timespans. If only_local is set to true, only process-local logs will be fetched; the default is to fetch logs from all processes.\n\n\n\n\n\n","category":"function"},{"location":"api-timespanlogging/functions/#TimespanLogging.make_timespan","page":"Functions and Macros","title":"TimespanLogging.make_timespan","text":"make_timespan(start::Event, finish::Event) -> Timespan\n\nCreates a Timespan given the start and finish Events.\n\n\n\n\n\n","category":"function"},{"location":"api-timespanlogging/functions/#Logging-Metric-Functions","page":"Functions and Macros","title":"Logging Metric Functions","text":"","category":"section"},{"location":"api-timespanlogging/functions/","page":"Functions and Macros","title":"Functions and Macros","text":"init_similar","category":"page"},{"location":"api-timespanlogging/functions/#TimespanLogging.init_similar","page":"Functions and Macros","title":"TimespanLogging.init_similar","text":"Creates a copy of x with the same configuration, but fresh/empty data.\n\n\n\n\n\n","category":"function"},{"location":"scopes/#Scopes","page":"Scopes","title":"Scopes","text":"","category":"section"},{"location":"scopes/","page":"Scopes","title":"Scopes","text":"Sometimes you will have data that is only meaningful in a certain location, such as within a single Julia process, a given server, or even for a specific Dagger processor. We call this location a \"scope\" in Dagger, denoting the bounds within which the data is meaningful and valid. For example, C pointers are typically scoped to a process, file paths are scoped to one or more servers dependent on filesystem configuration, etc. By default, Dagger doesn't recognize this; it treats everything passed into a task, or generated from a task, as inherently safe to transfer anywhere else. When this is not the case, Dagger provides optional scopes to instruct the scheduler where data is considered valid.","category":"page"},{"location":"scopes/#Scope-Basics","page":"Scopes","title":"Scope Basics","text":"","category":"section"},{"location":"scopes/","page":"Scopes","title":"Scopes","text":"Let's take the example of a webcam handle generated by VideoIO.jl. This handle is a C pointer, and thus has process scope. We can open the handle on a given process, and set the scope of the resulting data to be locked to the current process, using Dagger.scope to construct a ProcessScope:","category":"page"},{"location":"scopes/","page":"Scopes","title":"Scopes","text":"using VideoIO, Distributed\n\nfunction get_handle()\n handle = VideoIO.opencamera()\n proc = Dagger.thunk_processor()\n scope = Dagger.scope(worker=myid()) # constructs a `ProcessScope`\n return Dagger.tochunk(handle, proc, scope)\nend\n\ncam_handle = Dagger.@spawn get_handle()","category":"page"},{"location":"scopes/","page":"Scopes","title":"Scopes","text":"Now, wherever cam_handle is passed, Dagger will ensure that any computations on the handle only happen within its defined scope. For example, we can read from the camera:","category":"page"},{"location":"scopes/","page":"Scopes","title":"Scopes","text":"cam_frame = Dagger.@spawn read(cam_handle)","category":"page"},{"location":"scopes/","page":"Scopes","title":"Scopes","text":"The cam_frame task is executed within any processor on the same process that the cam_handle task was executed on. 
Of course, the resulting camera frame is not scoped to anywhere specific (denoted as AnyScope), and thus computations on it may execute anywhere.","category":"page"},{"location":"scopes/","page":"Scopes","title":"Scopes","text":"You may also encounter situations where you want to use a callable struct (such as a closure, or a Flux.jl layer) only within a certain scope; you can specify the scope of the function pretty easily:","category":"page"},{"location":"scopes/","page":"Scopes","title":"Scopes","text":"using Flux\nm = Chain(...)\n# If `m` is only safe to transfer to and execute on this process,\n# we can set a `ProcessScope` on it:\nresult = Dagger.@spawn scope=Dagger.scope(worker=myid()) m(rand(8,8))","category":"page"},{"location":"scopes/","page":"Scopes","title":"Scopes","text":"Setting a scope on the function treats it as a regular piece of data (like the arguments to the function), so it participates in the scoping rules described in the following sections all the same.","category":"page"},{"location":"scopes/","page":"Scopes","title":"Scopes","text":"Scope Functions","category":"page"},{"location":"scopes/","page":"Scopes","title":"Scopes","text":"Now, let's try out some other kinds of scopes, starting with NodeScope. This scope encompasses the server that one or more Julia processes may be running on. Say we want to use memory mapping (mmap) to more efficiently send arrays between two tasks. We can construct the mmap'd array in one task, attach a NodeScope() to it, and use the path of the mmap'd file to communicate its location, locking downstream tasks to the same server:","category":"page"},{"location":"scopes/","page":"Scopes","title":"Scopes","text":"using Mmap\n\nfunction generate()\n path = \"myfile.bin\"\n arr = Mmap.mmap(path, Matrix{Int}, (64,64))\n fill!(arr, 1)\n Mmap.sync!(arr)\n # Note: Dagger.scope() does not yet support node scopes\n Dagger.tochunk(path, Dagger.thunk_processor(), NodeScope())\nend\n\nfunction consume(path)\n arr = Mmap.mmap(path, Matrix{Int}, (64,64))\n sum(arr)\nend\n\na = Dagger.@spawn generate()\n@assert fetch(Dagger.@spawn consume(a)) == 64*64","category":"page"},{"location":"scopes/","page":"Scopes","title":"Scopes","text":"Whatever server a executed on, the consume(a) task will also execute on it!","category":"page"},{"location":"scopes/","page":"Scopes","title":"Scopes","text":"Finally, we come to the \"lowest\" scope on the scope hierarchy, the ExactScope. This scope specifies one exact processor as the bounding scope, and is typically useful in certain limited cases (such as data existing only on a specific GPU). We won't provide an example here, because you don't usually need to ever use this scope, but if you already understand the NodeScope and ProcessScope, the ExactScope should be easy to figure out.","category":"page"},{"location":"scopes/#Union-Scopes","page":"Scopes","title":"Union Scopes","text":"","category":"section"},{"location":"scopes/","page":"Scopes","title":"Scopes","text":"Sometimes one simple scope isn't enough! In that case, you can use the UnionScope to construct the union of two or more scopes. Say, for example, you have some sensitive data on your company's servers that you want to compute summaries of, but you'll be driving the computation from your laptop, and you aren't allowed to send the data itself outside of the company's network. 
You could accomplish this by constructing a UnionScope of the ProcessScopes of each non-laptop Julia process, and use that to ensure that the data in its original form always stays within the company network:","category":"page"},{"location":"scopes/","page":"Scopes","title":"Scopes","text":"addprocs(4) # some local processes\nprocs = addprocs([(\"server.company.com\", 4)]) # some company processes\n\nsecrets_scope = UnionScope(ProcessScope.(procs))\n\nfunction generate_secrets()\n secrets = open(\"/shared/secret_results.txt\", \"r\") do io\n String(read(io))\n end\n Dagger.tochunk(secrets, Dagger.thunk_processor(), secrets_scope)\nend\n\nsummarize(secrets) = occursin(\"QA Pass\", secrets)\n\n# Generate the data on the first company process\nsensitive_data = Dagger.@spawn single=first(procs) generate_secrets()\n\n# We can safely call this, knowing that it will be executed on a company server\nqa_passed = Dagger.@spawn summarize(sensitive_data)","category":"page"},{"location":"scopes/#Mismatched-Scopes","page":"Scopes","title":"Mismatched Scopes","text":"","category":"section"},{"location":"scopes/","page":"Scopes","title":"Scopes","text":"You might now be thinking, \"What if I want to run a task on multiple pieces of data whose scopes don't match up?\" In such a case, Dagger will throw an error, refusing to schedule that task, since the intersection of the data scopes is an empty set (there is no feasible processor which can satisfy the scoping constraints). For example:","category":"page"},{"location":"scopes/","page":"Scopes","title":"Scopes","text":"ps2 = ProcessScope(2)\nps3 = ProcessScope(3)\n\ngenerate(scope) = Dagger.tochunk(rand(64), Dagger.thunk_processor(), scope)\n\nd2 = Dagger.@spawn generate(ps2) # Run on process 2\nd3 = Dagger.@spawn generate(ps3) # Run on process 3\nres = Dagger.@spawn d2 * d3 # An error!","category":"page"},{"location":"scopes/","page":"Scopes","title":"Scopes","text":"Moral of the story: only use scopes when you know you really need them, and if you aren't careful to arrange everything just right, be prepared for Dagger to refuse to schedule your tasks! Scopes should only be used to ensure correctness of your programs, and are not intended to be used to optimize the schedule that Dagger uses for your tasks, since restricting the scope of execution for tasks will necessarily reduce the optimizations that Dagger's scheduler can perform.","category":"page"},{"location":"api-dagger/functions/","page":"Functions and Macros","title":"Functions and Macros","text":"CurrentModule = Dagger","category":"page"},{"location":"api-dagger/functions/#Dagger-Functions","page":"Functions and Macros","title":"Dagger Functions","text":"","category":"section"},{"location":"api-dagger/functions/","page":"Functions and Macros","title":"Functions and Macros","text":"Pages = [\"functions.md\"]","category":"page"},{"location":"api-dagger/functions/#Task-Functions/Macros","page":"Functions and Macros","title":"Task Functions/Macros","text":"","category":"section"},{"location":"api-dagger/functions/","page":"Functions and Macros","title":"Functions and Macros","text":"@spawn\nspawn\ndelayed\n@par","category":"page"},{"location":"api-dagger/functions/#Dagger.@spawn","page":"Functions and Macros","title":"Dagger.@spawn","text":"@spawn [opts] f(args...) 
-> Thunk\n\nConvenience macro like Dagger.@par, but eagerly executed from the moment it's called (equivalent to spawn).\n\nSee the docs for @par for more information and usage examples.\n\n\n\n\n\n","category":"macro"},{"location":"api-dagger/functions/#Dagger.spawn","page":"Functions and Macros","title":"Dagger.spawn","text":"spawn(f, args...; kwargs...) -> EagerThunk\n\nSpawns a task with f as the function, args as the arguments, and kwargs as the keyword arguments, returning an EagerThunk. Uses a scheduler running in the background to execute code.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.delayed","page":"Functions and Macros","title":"Dagger.delayed","text":"delayed(f, options=Options())(args...; kwargs...) -> Thunk\ndelayed(f; options...)(args...; kwargs...) -> Thunk\n\nCreates a Thunk object which can be executed later, which will call f with args and kwargs. options controls various properties of the resulting Thunk.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.@par","page":"Functions and Macros","title":"Dagger.@par","text":"@par [opts] f(args...; kwargs...) -> Thunk\n\nConvenience macro to call Dagger.delayed on f with arguments args and keyword arguments kwargs. May also be called with a series of assignments like so:\n\nx = @par begin\n a = f(1,2)\n b = g(a,3)\n h(a,b)\nend\n\nx will hold the Thunk representing h(a,b); additionally, a and b will be defined in the same local scope and will be equally accessible for later calls.\n\nOptions to the Thunk can be set as opts with namedtuple syntax, e.g. single=1. Multiple options may be provided, and will be applied to all generated thunks.\n\n\n\n\n\n","category":"macro"},{"location":"api-dagger/functions/#Task-Options-Functions/Macros","page":"Functions and Macros","title":"Task Options Functions/Macros","text":"","category":"section"},{"location":"api-dagger/functions/","page":"Functions and Macros","title":"Functions and Macros","text":"with_options\nget_options\n@option\ndefault_option","category":"page"},{"location":"api-dagger/functions/#Dagger.with_options","page":"Functions and Macros","title":"Dagger.with_options","text":"with_options(f, options::NamedTuple) -> Any\nwith_options(f; options...) -> Any\n\nSets one or more options to the given values, executes f(), resets the options to their previous values, and returns the result of f(). This is the recommended way to set options, as it only affects tasks spawned within its scope. Note that setting an option here will propagate its value across Julia or Dagger tasks spawned by f() or its callees (i.e. the options propagate).\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.get_options","page":"Functions and Macros","title":"Dagger.get_options","text":"get_options(key::Symbol, default) -> Any\nget_options(key::Symbol) -> Any\n\nReturns the value of the option named key. If option does not have a value set, then an error will be thrown, unless default is set, in which case it will be returned instead of erroring.\n\nget_options() -> NamedTuple\n\nReturns a NamedTuple of all option key-value pairs.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.@option","page":"Functions and Macros","title":"Dagger.@option","text":"@option name myfunc(A, B, C) = value\n\nA convenience macro for defining default_option. 
For example:\n\nDagger.@option single mylocalfunc(Int) = 1\n\nThe above call will set the single option to 1 for any Dagger task calling mylocalfunc with an Int argument.\n\n\n\n\n\n","category":"macro"},{"location":"api-dagger/functions/#Dagger.default_option","page":"Functions and Macros","title":"Dagger.default_option","text":"default_option(::Val{name}, Tf, Targs...) where name = value\n\nDefines the default value for option name to value when Dagger is preparing to execute a function of type Tf with argument types Targs. Users and libraries may override this to set default values for tasks.\n\nAn easier way to define these defaults is with @option.\n\nNote that the actual task's argument values are not passed, as it may not always be possible or efficient to gather all Dagger task arguments on one worker.\n\nThis function may be executed within the scheduler, so it should generally be made very cheap to execute. If the function throws an error, the scheduler will use whatever the global default value is for that option instead.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Data-Management-Functions","page":"Functions and Macros","title":"Data Management Functions","text":"","category":"section"},{"location":"api-dagger/functions/","page":"Functions and Macros","title":"Functions and Macros","text":"tochunk\n@mutable\n@shard\nshard","category":"page"},{"location":"api-dagger/functions/#Dagger.tochunk","page":"Functions and Macros","title":"Dagger.tochunk","text":"tochunk(x, proc::Processor, scope::AbstractScope; device=nothing, kwargs...) -> Chunk\n\nCreate a chunk from data x which resides on proc and which has scope scope.\n\ndevice specifies a MemPool.StorageDevice (which is itself wrapped in a Chunk) which will be used to manage the reference contained in the Chunk generated by this function. If device is nothing (the default), the data will be inspected to determine if it's safe to serialize; if so, the default MemPool storage device will be used; if not, then a MemPool.CPURAMDevice will be used.\n\nAll other kwargs are passed directly to MemPool.poolset.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.@shard","page":"Functions and Macros","title":"Dagger.@shard","text":"Creates a Shard. See Dagger.shard for details.\n\n\n\n\n\n","category":"macro"},{"location":"api-dagger/functions/#Dagger.shard","page":"Functions and Macros","title":"Dagger.shard","text":"shard(f; kwargs...) -> Chunk{Shard}\n\nExecutes f on all workers in workers, wrapping the result in a process-scoped Chunk, and constructs a Chunk{Shard} containing all of these Chunks on the current worker.\n\nKeyword arguments:\n\nprocs – The list of processors to create pieces on. May be any iterable container of Processors.\nworkers – The list of workers to create pieces on. May be any iterable container of Integers.\nper_thread::Bool=false – If true, creates one piece per thread, rather than one piece per worker.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Scope-Functions","page":"Functions and Macros","title":"Scope Functions","text":"","category":"section"},{"location":"api-dagger/functions/","page":"Functions and Macros","title":"Functions and Macros","text":"scope\nconstrain","category":"page"},{"location":"api-dagger/functions/#Dagger.scope","page":"Functions and Macros","title":"Dagger.scope","text":"scope(scs...) -> AbstractScope\nscope(;scs...) 
-> AbstractScope\n\nConstructs an AbstractScope from a set of scope specifiers. Each element in scs is a separate specifier; if scs is empty, an empty UnionScope() is produced; if scs has one element, then exactly one specifier is constructed; if scs has more than one element, a UnionScope of the scopes specified by scs is constructed. A variety of specifiers can be passed to construct a scope:\n\n:any - Constructs an AnyScope()\n:default - Constructs a DefaultScope()\n(scs...,) - Constructs a UnionScope of scopes, each specified by scs\nthread=tid or threads=[tids...] - Constructs an ExactScope or UnionScope containing all Dagger.ThreadProcs with thread ID tid/tids across all workers.\nworker=wid or workers=[wids...] - Constructs a ProcessScope or UnionScope containing all Dagger.ThreadProcs with worker ID wid/wids across all threads.\nthread=tid/threads=tids and worker=wid/workers=wids - Constructs an ExactScope, ProcessScope, or UnionScope containing all Dagger.ThreadProcs with worker ID wid/wids and threads tid/tids.\n\nAside from the worker and thread specifiers, it's possible to add custom specifiers for scoping to other kinds of processors (like GPUs) or providing different ways to specify a scope. Specifier selection is determined by a precedence ordering: by default, all specifiers have precedence 0, which can be changed by defining scope_key_precedence(::Val{spec}) = precedence (where spec is the specifier as a Symbol). The specifier with the highest precedence in a set of specifiers is used to determine the scope by calling to_scope(::Val{spec}, sc::NamedTuple) (where sc is the full set of specifiers), which should be overridden for each custom specifier, and which returns an AbstractScope. For example:\n\n# Set up a GPU specifier\nDagger.scope_key_precedence(::Val{:gpu}) = 1\nDagger.to_scope(::Val{:gpu}, sc::NamedTuple) = ExactScope(MyGPUDevice(sc.worker, sc.gpu))\n\n# Generate an `ExactScope` for `MyGPUDevice` on worker 2, device 3\nDagger.scope(gpu=3, worker=2)\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.constrain","page":"Functions and Macros","title":"Dagger.constrain","text":"constrain(x::AbstractScope, y::AbstractScope) -> AbstractScope\n\nConstructs a scope that is the intersection of scopes x and y.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Lazy-Task-Functions","page":"Functions and Macros","title":"Lazy Task Functions","text":"","category":"section"},{"location":"api-dagger/functions/","page":"Functions and Macros","title":"Functions and Macros","text":"domain\ncompute\ndependents\nnoffspring\norder\ntreereduce","category":"page"},{"location":"api-dagger/functions/#Dagger.domain","page":"Functions and Macros","title":"Dagger.domain","text":"domain(x::T)\n\nReturns metadata about x. This metadata will be in the domain field of a Chunk object when an object of type T is created as the result of evaluating a Thunk.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.compute","page":"Functions and Macros","title":"Dagger.compute","text":"compute(ctx::Context, d::Thunk; options=nothing) -> Chunk\n\nCompute a Thunk - creates the DAG, assigns ranks to nodes for tie-breaking, and runs the scheduler with the specified options. 
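For example (a minimal sketch of the lazy API; the one-argument form compute(d), which is assumed here to use a default global context, behaves the same way):\n\nt = delayed(*)(3, 4)\nc = compute(t) # returns a Chunk\n@assert fetch(c) == 12\n\n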
Returns a Chunk which references the result.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.dependents","page":"Functions and Macros","title":"Dagger.dependents","text":"dependents(node::Thunk) -> Dict{Union{Thunk,Chunk}, Set{Thunk}}\n\nFind the set of direct dependents for each task.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.noffspring","page":"Functions and Macros","title":"Dagger.noffspring","text":"noffspring(dpents::Dict{Union{Thunk,Chunk}, Set{Thunk}}) -> Dict{Thunk, Int}\n\nRecursively find the number of tasks dependent on each task in the DAG. Takes a Dict as returned by dependents.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.order","page":"Functions and Macros","title":"Dagger.order","text":"order(node::Thunk, ndeps) -> Dict{Thunk,Int}\n\nGiven a root node of the DAG, calculates a total order for tie-breaking.\n\nThe root node gets score 1,\nthe rest of the nodes are explored in DFS fashion, but the chunks of each node are explored in order of noffspring, i.e. the total number of tasks depending on the result of said node.\n\nArgs:\n\nnode: root node\nndeps: result of noffspring\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.treereduce","page":"Functions and Macros","title":"Dagger.treereduce","text":"Tree reduce\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Processor-Functions","page":"Functions and Macros","title":"Processor Functions","text":"","category":"section"},{"location":"api-dagger/functions/","page":"Functions and Macros","title":"Functions and Macros","text":"execute!\niscompatible\ndefault_enabled\nget_processors\nget_parent\nmove\nget_tls\nset_tls!","category":"page"},{"location":"api-dagger/functions/#Dagger.execute!","page":"Functions and Macros","title":"Dagger.execute!","text":"execute!(proc::Processor, f, args...; kwargs...) -> Any\n\nExecutes the function f with arguments args and keyword arguments kwargs on processor proc. This function can be overloaded by Processor subtypes to allow executing function calls differently than normal Julia.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.iscompatible","page":"Functions and Macros","title":"Dagger.iscompatible","text":"iscompatible(proc::Processor, opts, f, Targs...) -> Bool\n\nIndicates whether proc can execute f over Targs given opts. Processor subtypes should overload this function to return true if and only if it is essentially guaranteed that f(::Targs...) is supported. Additionally, iscompatible_func and iscompatible_arg can be overridden to determine compatibility of f and Targs individually. The default implementation returns false.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.default_enabled","page":"Functions and Macros","title":"Dagger.default_enabled","text":"default_enabled(proc::Processor) -> Bool\n\nReturns whether processor proc is enabled by default. The default value is false, which opts the processor out of execution when not specifically requested by the user; true opts it in, causing the processor to always participate in execution when possible.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.get_processors","page":"Functions and Macros","title":"Dagger.get_processors","text":"get_processors(proc::Processor) -> Set{<:Processor}\n\nReturns the set of processors contained in proc, if any. 
Processor subtypes should overload this function if they can contain sub-processors. The default method will return a Set containing proc itself.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.get_parent","page":"Functions and Macros","title":"Dagger.get_parent","text":"get_parent(proc::Processor) -> Processor\n\nReturns the parent processor for proc. The ultimate parent processor is an OSProc. Processor subtypes should overload this to return their most direct parent.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.move","page":"Functions and Macros","title":"Dagger.move","text":"move(from_proc::Processor, to_proc::Processor, x)\n\nMoves and/or converts x such that it's available and suitable for usage on the to_proc processor. This function can be overloaded by Processor subtypes to transport arguments and convert them to an appropriate form before they are used for execution. Subtypes of Processor wishing to implement efficient data movement should provide implementations where x::Chunk.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.get_tls","page":"Functions and Macros","title":"Dagger.get_tls","text":"get_tls()\n\nGets all Dagger TLS variables as a NamedTuple.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.set_tls!","page":"Functions and Macros","title":"Dagger.set_tls!","text":"set_tls!(tls)\n\nSets all Dagger TLS variables from the NamedTuple tls.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Context-Functions","page":"Functions and Macros","title":"Context Functions","text":"","category":"section"},{"location":"api-dagger/functions/","page":"Functions and Macros","title":"Functions and Macros","text":"addprocs!\nrmprocs!","category":"page"},{"location":"api-dagger/functions/#Dagger.addprocs!","page":"Functions and Macros","title":"Dagger.addprocs!","text":"addprocs!(ctx::Context, xs)\n\nAdd new workers xs to ctx.\n\nWorkers will typically be assigned new tasks in the next scheduling iteration if scheduling is ongoing.\n\nWorkers can be either Processors or the underlying process IDs as Integers.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.rmprocs!","page":"Functions and Macros","title":"Dagger.rmprocs!","text":"rmprocs!(ctx::Context, xs)\n\nRemove the specified workers xs from ctx.\n\nWorkers will typically finish all their assigned tasks if scheduling is ongoing but will not be assigned new tasks after removal.\n\nWorkers can be either Processors or the underlying process IDs as Integers.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Thunk-Execution-Environment-Functions","page":"Functions and Macros","title":"Thunk Execution Environment Functions","text":"","category":"section"},{"location":"api-dagger/functions/","page":"Functions and Macros","title":"Functions and Macros","text":"These functions are used within the function called by a Thunk.","category":"page"},{"location":"api-dagger/functions/","page":"Functions and Macros","title":"Functions and Macros","text":"in_thunk\nthunk_processor","category":"page"},{"location":"api-dagger/functions/#Dagger.in_thunk","page":"Functions and Macros","title":"Dagger.in_thunk","text":"in_thunk()\n\nReturns true if currently in a Thunk process, else false.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.thunk_processor","page":"Functions and 
Macros","title":"Dagger.thunk_processor","text":"thunk_processor()\n\nGet the current processor executing the current thunk.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dynamic-Scheduler-Control-Functions","page":"Functions and Macros","title":"Dynamic Scheduler Control Functions","text":"","category":"section"},{"location":"api-dagger/functions/","page":"Functions and Macros","title":"Functions and Macros","text":"These functions query and control the scheduler remotely.","category":"page"},{"location":"api-dagger/functions/","page":"Functions and Macros","title":"Functions and Macros","text":"Sch.sch_handle\nSch.add_thunk!\nBase.fetch\nBase.wait\nSch.exec!\nSch.halt!\nSch.get_dag_ids","category":"page"},{"location":"api-dagger/functions/#Dagger.Sch.sch_handle","page":"Functions and Macros","title":"Dagger.Sch.sch_handle","text":"Gets the scheduler handle for the currently-executing thunk.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.Sch.add_thunk!","page":"Functions and Macros","title":"Dagger.Sch.add_thunk!","text":"Adds a new Thunk to the DAG.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Base.fetch","page":"Functions and Macros","title":"Base.fetch","text":"Waits on a thunk to complete, and fetches its result.\n\n\n\n\n\nBase.fetch(c::DArray)\n\nIf a DArray tree has a Thunk in it, make the whole thing a big thunk.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Base.wait","page":"Functions and Macros","title":"Base.wait","text":"Waits on a thunk to complete.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.Sch.exec!","page":"Functions and Macros","title":"Dagger.Sch.exec!","text":"Executes an arbitrary function within the scheduler, returning the result.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.Sch.halt!","page":"Functions and Macros","title":"Dagger.Sch.halt!","text":"Commands the scheduler to halt execution immediately.\n\n\n\n\n\n","category":"function"},{"location":"api-dagger/functions/#Dagger.Sch.get_dag_ids","page":"Functions and Macros","title":"Dagger.Sch.get_dag_ids","text":"Returns all Thunks IDs as a Dict, mapping a Thunk to its downstream dependents.\n\n\n\n\n\n","category":"function"},{"location":"data-management/#Data-Management","page":"Data Management","title":"Data Management","text":"","category":"section"},{"location":"data-management/","page":"Data Management","title":"Data Management","text":"Dagger is not just a computing platform - it also has awareness of where each piece of data resides, and will move data between workers and perform conversions as necessary to satisfy the needs of your tasks.","category":"page"},{"location":"data-management/#Chunks","page":"Data Management","title":"Chunks","text":"","category":"section"},{"location":"data-management/","page":"Data Management","title":"Data Management","text":"Dagger often needs to move data between workers to allow a task to execute. To make this efficient when communicating potentially large units of data, Dagger uses a remote reference, called a Chunk, to refer to objects which may exist on another worker. 
Chunks are backed by a distributed refcounting mechanism provided by MemPool.jl, which ensures that the referenced data is not garbage collected until all Chunks referencing that object are GC'd from all workers.","category":"page"},{"location":"data-management/","page":"Data Management","title":"Data Management","text":"Conveniently, if you pass in a Chunk object as an input to a Dagger task, then the task's payload function will get executed with the value contained in the Chunk. The scheduler also understands Chunks, and will try to schedule tasks close to where their Chunk inputs reside, to reduce communication overhead.","category":"page"},{"location":"data-management/","page":"Data Management","title":"Data Management","text":"Chunks also have a cached type, a \"processor\", and a \"scope\", which are important for identifying the type of the object, where in memory (CPU RAM, GPU VRAM, etc.) the value resides, and where the value is allowed to be transferred and dereferenced. See Processors and Scopes for more details on how these properties can be used to control scheduling behavior around Chunks.","category":"page"},{"location":"data-management/#Mutation","page":"Data Management","title":"Mutation","text":"","category":"section"},{"location":"data-management/","page":"Data Management","title":"Data Management","text":"Normally, Dagger tasks should be functional and \"pure\": never mutating their inputs, always producing identical outputs for a given set of inputs, and never producing side effects which might affect future program behavior. However, for certain codes, this restriction ends up costing the user performance and engineering time to work around.","category":"page"},{"location":"data-management/","page":"Data Management","title":"Data Management","text":"Thankfully, Dagger provides the Dagger.@mutable macro for just this purpose. @mutable allows data to be marked such that it will never be copied or serialized by the scheduler (unless copied by the user). When used as an argument to a task, the task will be forced to execute on the same worker that @mutable was called on. 
For example:","category":"page"},{"location":"data-management/","page":"Data Management","title":"Data Management","text":"Dagger.@mutable worker=2 Threads.Atomic{Int}(0)\nx::Dagger.Chunk # The result is always a `Chunk`\n\n# x is now considered mutable, and may only be accessed on worker 2:\nwait(Dagger.@spawn Threads.atomic_add!(x, 1)) # Always executed on worker 2\nwait(Dagger.@spawn scope=Dagger.scope(worker=1) Threads.atomic_add!(x, 1)) # SchedulingException","category":"page"},{"location":"data-management/","page":"Data Management","title":"Data Management","text":"@mutable, when called as above, is constructed on worker 2, and the data gains a scope of ProcessScope(myid()), which means that any processor on that worker is allowed to execute tasks that use the object (subject to the usual scheduling rules).","category":"page"},{"location":"data-management/","page":"Data Management","title":"Data Management","text":"@mutable also allows the scope to be manually supplied, if more specific restrictions are desirable:","category":"page"},{"location":"data-management/","page":"Data Management","title":"Data Management","text":"x = @mutable scope=Dagger.scope(worker=1, threads=[3,4]) rand(100)\n# x is now scoped to threads 3 and 4 on worker `myid()`","category":"page"},{"location":"data-management/#Sharding","page":"Data Management","title":"Sharding","text":"","category":"section"},{"location":"data-management/","page":"Data Management","title":"Data Management","text":"@mutable is convenient for creating a single mutable object, but often one wants to have multiple mutable objects, with each object being scoped to their own worker or thread in the cluster, to be used as local counters, partial reduction containers, data caches, etc.","category":"page"},{"location":"data-management/","page":"Data Management","title":"Data Management","text":"The Shard object (constructed with Dagger.@shard/Dagger.shard) is a mechanism by which such a setup can be created with one invocation. By default, each worker will have their own local object which will be used when a task that uses the shard as an argument is scheduled on that worker. Other shard pieces that aren't scoped to the processor being executed on will not be serialized or copied, keeping communication costs constant even with a very large shard.","category":"page"},{"location":"data-management/","page":"Data Management","title":"Data Management","text":"This mechanism makes it easy to construct a distributed set of mutable objects which are treated as \"mirrored shards\" by the scheduler, but require no further user input to access. For example, creating and using a local counter for each worker is trivial:","category":"page"},{"location":"data-management/","page":"Data Management","title":"Data Management","text":"# Create a local atomic counter on each worker that Dagger knows about:\ncs = Dagger.@shard Threads.Atomic{Int}(0)\n\n# Let's add `1` to the local counter, not caring about which worker we're on:\nwait.([Dagger.@spawn Threads.atomic_add!(cs, 1) for i in 1:1000])\n\n# And let's fetch the total sum of all counters:\n@assert sum(map(ctr->fetch(ctr)[], cs)) == 1000","category":"page"},{"location":"data-management/","page":"Data Management","title":"Data Management","text":"Note that map, when used on a shard, will execute the provided function once per shard \"piece\", and each result is considered immutable. 
map is an easy way to make a copy of each piece of the shard, to be later reduced, scanned, etc.","category":"page"},{"location":"data-management/","page":"Data Management","title":"Data Management","text":"Further details about what arguments can be passed to @shard/shard can be found in Data Management Functions.","category":"page"},{"location":"processors/#Processors","page":"Processors","title":"Processors","text":"","category":"section"},{"location":"processors/","page":"Processors","title":"Processors","text":"Dagger contains a flexible mechanism to represent CPUs, GPUs, and other devices that the scheduler can place user work on. The individual devices that are capable of computing a user operation are called \"processors\", and are subtypes of Dagger.Processor. Processors are automatically detected by Dagger at scheduler initialization, and placed in a hierarchy reflecting the physical (network-, link-, or memory-based) boundaries between processors in the hierarchy. The scheduler uses the information in this hierarchy to efficiently schedule and partition user operations.","category":"page"},{"location":"processors/","page":"Processors","title":"Processors","text":"Dagger's Chunk objects can have a processor associated with them that defines where the contained data \"resides\". Each processor has a set of functions that define the mechanisms and rules by which the data can be transferred between similar or different kinds of processors, and will be called by Dagger's scheduler automatically when fetching function arguments (or the function itself) for computation on a given processor.","category":"page"},{"location":"processors/","page":"Processors","title":"Processors","text":"Setting the processor on a function argument is done by wrapping it in a Chunk with Dagger.tochunk:","category":"page"},{"location":"processors/","page":"Processors","title":"Processors","text":"a = 1\nb = 2\n# Let's say `b` \"resides\" on the second thread of the first worker:\nb_chunk = Dagger.tochunk(b, Dagger.ThreadProc(1, 2))::Dagger.Chunk\nc = Dagger.@spawn a + b_chunk\nfetch(c) == 3","category":"page"},{"location":"processors/","page":"Processors","title":"Processors","text":"It's also simple to set the processor of the function being passed; it will be automatically wrapped in a Chunk if necessary:","category":"page"},{"location":"processors/","page":"Processors","title":"Processors","text":"# `+` is treated as existing on the second thread of the first worker:\nDagger.@spawn processor=Dagger.ThreadProc(1, 2) a + b","category":"page"},{"location":"processors/","page":"Processors","title":"Processors","text":"You can also tell Dagger about the processor type for the returned value of a task by making it a Chunk:","category":"page"},{"location":"processors/","page":"Processors","title":"Processors","text":"Dagger.spawn(a) do a\n c = a + 1\n return Dagger.tochunk(c, Dagger.ThreadProc(1, 2))\nend","category":"page"},{"location":"processors/","page":"Processors","title":"Processors","text":"Note that unless you know that your function, arguments, or return value are associated with a specific processor, you don't need to assign one to them. 
Dagger will treat them as being simple values with no processor association, and will serialize them to wherever they're used.","category":"page"},{"location":"processors/#Hardware-capabilities,-topology,-and-data-locality","page":"Processors","title":"Hardware capabilities, topology, and data locality","text":"","category":"section"},{"location":"processors/","page":"Processors","title":"Processors","text":"The processor hierarchy is modeled as a multi-root tree, where each root is an OSProc, which represents a Julia OS process, and the \"children\" of the root or some other branch in the tree represent the processors which reside on the same logical server as the \"parent\" branch. In the common case, all roots are directly connected to each other. The processor hierarchy's topology is automatically detected and elaborated by callbacks in Dagger, which users may manipulate to add detection of extra processors.","category":"page"},{"location":"processors/","page":"Processors","title":"Processors","text":"A move between a given pair of processors is implemented as a Julia function dispatching on the types of each processor, as well as the type of the data being moved. Users are permitted to define custom move functions to improve data movement efficiency, perform automatic value conversions, or even make use of special IPC facilities. Custom processors may also be defined by the user to represent a processor type which is not automatically detected by Dagger, such as novel GPUs, special OS process abstractions, FPGAs, etc.","category":"page"},{"location":"processors/","page":"Processors","title":"Processors","text":"Movement of data between any two processors A and B (from A to B), if not defined by the user, is decomposed into 3 moves: processor A to OSProc parent of A, OSProc parent of A to OSProc parent of B, and OSProc parent of B to processor B. This mechanism uses Julia's Serialization library to serialize and deserialize data, so data must be serializable for this mechanism to work properly.","category":"page"},{"location":"processors/#Processor-Selection","page":"Processors","title":"Processor Selection","text":"","category":"section"},{"location":"processors/","page":"Processors","title":"Processors","text":"By default, Dagger uses the CPU to process work, typically single-threaded per cluster node. However, Dagger allows access to a wider range of hardware and software acceleration techniques, such as multithreading and GPUs. These more advanced (but performant) accelerators are disabled by default, but can easily be enabled by using scopes (see Scopes for details).","category":"page"},{"location":"processors/#Resource-Control","page":"Processors","title":"Resource Control","text":"","category":"section"},{"location":"processors/","page":"Processors","title":"Processors","text":"Dagger assumes that a thunk executing on a processor fully utilizes that processor. When this is not the case, you can tell Dagger as much with options.procutil:","category":"page"},{"location":"processors/","page":"Processors","title":"Processors","text":"procutil = Dict(\n Dagger.ThreadProc => 4.0, # utilizes 4 CPU threads fully\n DaggerGPU.CuArrayProc => 0.1 # utilizes 10% of a single CUDA GPU\n)","category":"page"},{"location":"processors/","page":"Processors","title":"Processors","text":"Dagger will use this information to execute only as many thunks on a given processor (or set of similar processors) as add up to a total utilization of at most 1.0. 
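One way this might be attached to a task is through the lazy API's options (a hedged sketch: passing procutil through delayed's options is assumed here, and train_model/data are hypothetical):","category":"page"},{"location":"processors/","page":"Processors","title":"Processors","text":"# Ask the scheduler to account for this task's resource usage:\nt = delayed(train_model; procutil=procutil)(data)","category":"page"},{"location":"processors/","page":"Processors","title":"Processors","text":"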
If a thunk is scheduled onto a processor which the local worker deems as \"oversubscribed\", it will not execute the thunk until sufficient resources become available as other thunks complete execution.","category":"page"},{"location":"processors/#GPU-Processors","page":"Processors","title":"GPU Processors","text":"","category":"section"},{"location":"processors/","page":"Processors","title":"Processors","text":"The DaggerGPU.jl package can be imported to enable GPU acceleration for NVIDIA and AMD GPUs, when available. The processors provided by that package are not enabled by default, but may be enabled via custom scopes (Scopes).","category":"page"},{"location":"processors/#Future:-Network-Devices-and-Topology","page":"Processors","title":"Future: Network Devices and Topology","text":"","category":"section"},{"location":"processors/","page":"Processors","title":"Processors","text":"In the future, users will be able to define network devices attached to a given processor, which provides a direct connection to a network device on another processor, and may be used to transfer data between said processors. Data movement rules will most likely be defined by a similar (or even identical) mechanism to the current processor move mechanism. The multi-root tree will be expanded to a graph to allow representing these network devices (as they may potentially span non-root nodes).","category":"page"},{"location":"processors/#Redundancy","page":"Processors","title":"Redundancy","text":"","category":"section"},{"location":"processors/#Fault-Tolerance","page":"Processors","title":"Fault Tolerance","text":"","category":"section"},{"location":"processors/","page":"Processors","title":"Processors","text":"Dagger has a single means for ensuring redundancy, which is currently called \"fault tolerance\". Said redundancy is only targeted at a specific failure mode, namely the unexpected exit or \"killing\" of a worker process in the cluster. This failure mode often presents itself when running on Linux and generating large memory allocations, where the Out Of Memory (OOM) killer process can kill user processes to free their allocated memory for the Linux kernel to use. The fault tolerance system mitigates the damage caused by the OOM killer performing its duties on one or more worker processes by detecting the fault as a process exit exception (generated by Julia), and then moving any \"lost\" work to other worker processes for re-computation.","category":"page"},{"location":"processors/#Future:-Multi-master,-Network-Failure-Correction,-etc.","page":"Processors","title":"Future: Multi-master, Network Failure Correction, etc.","text":"","category":"section"},{"location":"processors/","page":"Processors","title":"Processors","text":"This single redundancy mechanism helps alleviate a common issue among HPC and scientific users; however, it does little to help when, for example, the master node exits, or a network link goes down. Such failure modes require a more complicated detection and recovery process, including multiple master processes, a distributed and replicated database such as etcd, and checkpointing of the scheduler to ensure an efficient recovery. 
Such a system does not yet exist, but contributions for such a change are desired.","category":"page"},{"location":"processors/#Dynamic-worker-pools","page":"Processors","title":"Dynamic worker pools","text":"","category":"section"},{"location":"processors/","page":"Processors","title":"Processors","text":"Dagger's default scheduler supports modifying the worker pool while the scheduler is running. This is done by modifying the Processors of the Context supplied to the scheduler at initialization using addprocs!(ctx, ps) and rmprocs!(ctx, ps) where ps can be Processors or just process IDs.","category":"page"},{"location":"processors/","page":"Processors","title":"Processors","text":"An example of when this is useful is in HPC environments where individual jobs to start up workers are queued so that not all workers are guaranteed to be available at the same time.","category":"page"},{"location":"processors/","page":"Processors","title":"Processors","text":"New workers will typically be assigned new tasks as soon as the scheduler sees them. Removed workers will finish all their assigned tasks but will not be assigned any new tasks. Note that this makes it difficult to determine when a worker is no longer in use by Dagger. Contributions to alleviate this uncertainty are welcome!","category":"page"},{"location":"processors/","page":"Processors","title":"Processors","text":"Example:","category":"page"},{"location":"processors/","page":"Processors","title":"Processors","text":"using Distributed\n\nps1 = addprocs(2, exeflags=\"--project\")\n@everywhere using Distributed, Dagger\n\n# Dummy task to wait for 0.5 seconds and then return the ID of the worker\nts = delayed(vcat)((delayed(i -> (sleep(0.5); myid()))(i) for i in 1:20)...)\n\nctx = Context()\n# Scheduler is blocking, so we need a new task to add workers while it runs\njob = @async collect(ctx, ts)\n\n# Let's fire up some new workers\nps2 = addprocs(2, exeflags=\"--project\")\n@everywhere ps2 using Distributed, Dagger\n# New workers are not available until we do this\naddprocs!(ctx, ps2)\n\n# Let's hope the job didn't complete before workers were added :)\n@show fetch(job) |> unique\n\n# and clean up after ourselves...\nworkers() |> rmprocs","category":"page"},{"location":"api-daggerwebdash/types/","page":"Types","title":"Types","text":"CurrentModule = DaggerWebDash","category":"page"},{"location":"api-daggerwebdash/types/#DaggerWebDash-Types","page":"Types","title":"DaggerWebDash Types","text":"","category":"section"},{"location":"api-daggerwebdash/types/","page":"Types","title":"Types","text":"Pages = [\"types.md\"]","category":"page"},{"location":"api-daggerwebdash/types/#Logging-Event-Types","page":"Types","title":"Logging Event Types","text":"","category":"section"},{"location":"api-daggerwebdash/types/","page":"Types","title":"Types","text":"D3Renderer\nTableStorage\nProfileMetrics","category":"page"},{"location":"api-daggerwebdash/types/#DaggerWebDash.D3Renderer","page":"Types","title":"DaggerWebDash.D3Renderer","text":"D3Renderer(port::Int, port_range::UnitRange; seek_store=nothing) -> D3Renderer\n\nConstructs a D3Renderer, which is a TimespanLogging aggregator which renders the logs over HTTP using the d3.js library. port is the port that will be serving the HTTP website. port_range specifies a range of ports that will be used to listen for connections from other Dagger workers. seek_store, if specified, is a Tables.jl-compatible object that logs will be written to and read from. 
This table can be written to disk and then re-read later for offline log analysis.\n\n\n\n\n\n","category":"type"},{"location":"api-daggerwebdash/types/#DaggerWebDash.TableStorage","page":"Types","title":"DaggerWebDash.TableStorage","text":"TableStorage\n\nLogWindow-compatible aggregator which stores logs in a Tables.jl-compatible sink.\n\nUsing a TableStorage is reasonably simple:\n\nml = TimespanLogging.MultiEventLog()\n\n... # Add some events\n\nlw = TimespanLogging.LogWindow(5*10^9, :core)\n\n# Create a DataFrame with one Any[] for each event\ndf = DataFrame([key=>[] for key in keys(ml.consumers)]...)\n\n# Create the TableStorage and register its creation handler\nts = DaggerWebDash.TableStorage(df)\npush!(lw.creation_handlers, ts)\n\nml.aggregators[:lw] = lw\n\n# Logs will now be saved into `df` automatically, and packages like\n# DaggerWebDash.jl will automatically use it to retrieve subsets of the logs.\n\n\n\n\n\n","category":"type"},{"location":"api-daggerwebdash/types/#DaggerWebDash.ProfileMetrics","page":"Types","title":"DaggerWebDash.ProfileMetrics","text":"ProfileMetrics\n\nTracks compute profile traces.\n\n\n\n\n\n","category":"type"},{"location":"#Dagger:-A-framework-for-out-of-core-and-parallel-execution","page":"Home","title":"Dagger: A framework for out-of-core and parallel execution","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Dagger.jl is a framework for parallel computing across all kinds of resources, like CPUs and GPUs, and across multiple threads and multiple servers.","category":"page"},{"location":"","page":"Home","title":"Home","text":"","category":"page"},{"location":"#Quickstart:-Task-Spawning","page":"Home","title":"Quickstart: Task Spawning","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"For more details: Task Spawning","category":"page"},{"location":"#Launch-a-task","page":"Home","title":"Launch a task","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"If you want to call a function myfunc with arguments arg1, arg2, arg3, and keyword argument color=:red:","category":"page"},{"location":"","page":"Home","title":"Home","text":"function myfunc(arg1, arg2, arg3; color=:blue)\n arg_total = arg1 + arg2 * arg3\n printstyled(arg_total; color)\n return arg_total\nend\nt = Dagger.@spawn myfunc(arg1, arg2, arg3; color=:red)","category":"page"},{"location":"","page":"Home","title":"Home","text":"This will run the function asynchronously; you can fetch its result with fetch(t), or just wait on it to complete with wait(t). If the call to myfunc throws an error, fetch(t) will rethrow it.","category":"page"},{"location":"","page":"Home","title":"Home","text":"If running Dagger with multiple workers, make sure to define myfunc with @everywhere from the Distributed stdlib.","category":"page"},{"location":"#Launch-a-task-with-an-anonymous-function","page":"Home","title":"Launch a task with an anonymous function","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"It's more convenient to use Dagger.spawn for anonymous functions. 
Taking the previous example, but using an anonymous function instead of myfunc:","category":"page"},{"location":"","page":"Home","title":"Home","text":"Dagger.spawn((arg1, arg2, arg3; color=:blue) -> begin\n arg_total = arg1 + arg2 * arg3\n printstyled(arg_total; color)\n return arg_total\nend, arg1, arg2, arg3; color=:red)","category":"page"},{"location":"","page":"Home","title":"Home","text":"spawn is functionally identical to @spawn, but can be more or less convenient to use, depending on what you're trying to do.","category":"page"},{"location":"#Launch-many-tasks-and-wait-on-them-all-to-complete","page":"Home","title":"Launch many tasks and wait on them all to complete","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"@spawn participates in @sync blocks, just like @async and Threads.@spawn, and will cause @sync to wait until all the tasks have completed:","category":"page"},{"location":"","page":"Home","title":"Home","text":"@sync for result in simulation_results\n Dagger.@spawn send_result_to_database(result)\nend\nnresults = length(simulation_results)\nwait(Dagger.@spawn update_database_result_count(nresults))","category":"page"},{"location":"","page":"Home","title":"Home","text":"Above, update_database_result_count will only run once all send_result_to_database calls have completed.","category":"page"},{"location":"","page":"Home","title":"Home","text":"Note that other APIs (including spawn) do not participate in @sync blocks.","category":"page"},{"location":"#Run-a-task-on-a-specific-Distributed-worker","page":"Home","title":"Run a task on a specific Distributed worker","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Dagger uses Scopes to control where tasks can execute. There's a handy constructor, Dagger.scope, that makes defining scopes easy:","category":"page"},{"location":"","page":"Home","title":"Home","text":"w2_only = Dagger.scope(worker=2)\nDagger.@spawn scope=w2_only myfunc(arg1, arg2, arg3; color=:red)","category":"page"},{"location":"","page":"Home","title":"Home","text":"Now the launched task will definitely execute on worker 2 (or if it's not possible to run on worker 2, Dagger will throw an error when you try to fetch the result).","category":"page"},{"location":"#Parallelize-nested-loops","page":"Home","title":"Parallelize nested loops","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Nested loops are a very common pattern in Julia, yet it's often difficult to parallelize them efficiently with @threads or @distributed/pmap. 
Thankfully, this kind of problem is quite easy for Dagger to handle; here is an example of parallelizing a two-level nested loop, where the inner loop computations (g) depend on an outer loop computation (f):","category":"page"},{"location":"","page":"Home","title":"Home","text":"@everywhere begin\n using Random\n Random.seed!(0)\n\n # Some \"expensive\" functions that complete at different speeds\n const crn = abs.(randn(20, 7))\n f(i) = sleep(crn[i, 7])\n g(i, j, y) = sleep(crn[i, j])\nend\nfunction nested_dagger()\n @sync for i in 1:20\n y = Dagger.@spawn f(i)\n for j in 1:6\n z = Dagger.@spawn g(i, j, y)\n end\n end\nend","category":"page"},{"location":"","page":"Home","title":"Home","text":"And the equivalent (and less performant) example with Threads.@threads, either parallelizing the inner or outer loop:","category":"page"},{"location":"","page":"Home","title":"Home","text":"function nested_threads_outer()\n Threads.@threads for i in 1:20\n y = f(i)\n for j in 1:6\n z = g(i, j, y)\n end\n end\nend\nfunction nested_threads_inner()\n for i in 1:20\n y = f(i)\n Threads.@threads for j in 1:6\n z = g(i, j, y)\n end\n end\nend","category":"page"},{"location":"","page":"Home","title":"Home","text":"Unlike Threads.@threads (which is really only intended to be used for a single loop, unnested), Dagger.@spawn is capable of parallelizing across both loop levels seamlessly, using the dependencies between f and g to determine the correct order to execute tasks.","category":"page"},{"location":"","page":"Home","title":"Home","text":"","category":"page"},{"location":"#Quickstart:-Data-Management","page":"Home","title":"Quickstart: Data Management","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"For more details: Data Management","category":"page"},{"location":"#Operate-on-mutable-data-in-place","page":"Home","title":"Operate on mutable data in-place","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Dagger usually assumes that you won't be modifying the arguments passed to your functions, but you can tell Dagger you plan to mutate them with @mutable:","category":"page"},{"location":"","page":"Home","title":"Home","text":"A = Dagger.@mutable rand(1000, 1000)\nDagger.@spawn accumulate!(+, A, A)","category":"page"},{"location":"","page":"Home","title":"Home","text":"This will lock A (and any tasks that use it) to the current worker. You can also lock it to a different worker by creating the data within a task:","category":"page"},{"location":"","page":"Home","title":"Home","text":"A = Dagger.spawn() do\n Dagger.@mutable rand(1000, 1000)\nend","category":"page"},{"location":"","page":"Home","title":"Home","text":"or by specifying the worker argument to @mutable:","category":"page"},{"location":"","page":"Home","title":"Home","text":"A = Dagger.@mutable worker=2 rand(1000, 1000)","category":"page"},{"location":"#Operate-on-distributed-data","page":"Home","title":"Operate on distributed data","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Often we want to work with more than one piece of data; the common case of wanting one piece of data per worker is easy to do by using @shard:","category":"page"},{"location":"","page":"Home","title":"Home","text":"X = Dagger.@shard myid()","category":"page"},{"location":"","page":"Home","title":"Home","text":"This will execute myid() independently on every worker in your Julia cluster, and place references to each within a Shard object called X. 
We can then use X in task spawning, but we'll only get the result of myid() that corresponds to the worker that the task is running on:","category":"page"},{"location":"","page":"Home","title":"Home","text":"for w in workers()\n @show fetch(Dagger.@spawn scope=Dagger.scope(worker=w) identity(X))\nend","category":"page"},{"location":"","page":"Home","title":"Home","text":"The above should print the result of myid() for each worker in workers(), as identity(X) receives only the value of X specific to that worker.","category":"page"},{"location":"#Reducing-over-distributed-data","page":"Home","title":"Reducing over distributed data","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Reductions are often parallelized by reducing a set of partitions on each worker, and then reducing those intermediate reductions on a single worker. Dagger supports this easily with @shard:","category":"page"},{"location":"","page":"Home","title":"Home","text":"A = Dagger.@shard rand(1:20, 10000)\ntemp_bins = Dagger.@shard zeros(20)\nhist! = (bins, arr) -> for elem in arr\n bins[elem] += 1\nend\nwait.([Dagger.@spawn scope=Dagger.scope(;worker) hist!(temp_bins, A) for worker in procs()])\nfinal_bins = sum(map(b->fetch(Dagger.@spawn copy(b)), temp_bins); dims=1)[1]","category":"page"},{"location":"","page":"Home","title":"Home","text":"Here, A points to unique random arrays, one on each worker, and temp_bins points to a set of histogram bins on each worker. When we @spawn hist!, Dagger passes in the random array and bins for only the specific worker that the task is run on; i.e. a call to hist! that runs on worker 2 will get a different A and temp_bins from a call to hist! on worker 3. All of the calls to hist! may run in parallel.","category":"page"},{"location":"","page":"Home","title":"Home","text":"By using map on temp_bins, we then make a copy of each worker's bins that we can safely return back to our current worker, and sum them together to get our total histogram.","category":"page"},{"location":"","page":"Home","title":"Home","text":"","category":"page"},{"location":"#Quickstart:-File-IO","page":"Home","title":"Quickstart: File IO","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Dagger has support for loading and saving files that integrates seamlessly with its task system, in the form of Dagger.File and Dagger.tofile.","category":"page"},{"location":"","page":"Home","title":"Home","text":"warn: Warn\nThese functions are not yet fully tested, so please make sure to take backups of any files that you load with them.","category":"page"},{"location":"#Loading-files-from-disk","page":"Home","title":"Loading files from disk","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"In order to load one or more files from disk, Dagger provides the File function, which creates a lazy reference to a file:","category":"page"},{"location":"","page":"Home","title":"Home","text":"f = Dagger.File(\"myfile.jls\")","category":"page"},{"location":"","page":"Home","title":"Home","text":"f is now a lazy reference to \"myfile.jls\", and its contents can be loaded automatically by just passing the object to a task:","category":"page"},{"location":"","page":"Home","title":"Home","text":"wait(Dagger.@spawn println(f))\n# Prints the loaded contents of the file","category":"page"},{"location":"","page":"Home","title":"Home","text":"By default, File assumes that the file uses Julia's Serialization format; this can be easily changed to 
assume Arrow format, for example:","category":"page"},{"location":"","page":"Home","title":"Home","text":"using Arrow\nf = Dagger.File(\"myfile.arrow\"; serialize=Arrow.write, deserialize=Arrow.Table)","category":"page"},{"location":"#Writing-data-to-disk","page":"Home","title":"Writing data to disk","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Saving data to disk is as easy as loading it; tofile provides this capability in a similar manner to File:","category":"page"},{"location":"","page":"Home","title":"Home","text":"A = rand(1000)\nf = Dagger.tofile(A, \"mydata.jls\")","category":"page"},{"location":"","page":"Home","title":"Home","text":"Like File, f can still be used to reference the file's data in tasks. It is likely most useful to use tofile at the end of a task to save results:","category":"page"},{"location":"","page":"Home","title":"Home","text":"function make_data()\n A = rand(1000)\n return Dagger.tofile(A, \"mydata.jls\")\nend\nfetch(Dagger.@spawn make_data())\n# Data was also written to \"mydata.jls\"","category":"page"},{"location":"","page":"Home","title":"Home","text":"tofile takes the same keyword arguments as File, allowing the format of data on disk to be specified as desired.","category":"page"},{"location":"","page":"Home","title":"Home","text":"","category":"page"},{"location":"#Quickstart:-Distributed-Arrays","page":"Home","title":"Quickstart: Distributed Arrays","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Dagger's DArray type represents a distributed array, where a single large array is implemented as a set of smaller array partitions, which may be distributed across a Julia cluster.","category":"page"},{"location":"","page":"Home","title":"Home","text":"For more details: Distributed Arrays","category":"page"},{"location":"#Distribute-an-existing-array","page":"Home","title":"Distribute an existing array","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Distributing any kind of array into a DArray is easy: just use distribute, and specify the partitioning you desire with Blocks. 
For example, to distribute a 16 x 16 matrix in 4 x 4 partitions:","category":"page"},{"location":"","page":"Home","title":"Home","text":"A = rand(16, 16)\nDA = distribute(A, Blocks(4, 4))","category":"page"},{"location":"#Allocate-a-distributed-array-directly","page":"Home","title":"Allocate a distributed array directly","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"To allocate a DArray, just pass your Blocks partitioning object into the appropriate allocation function, such as rand, ones, or zeros:","category":"page"},{"location":"","page":"Home","title":"Home","text":"rand(Blocks(20, 20), 100, 100)\nones(Blocks(20, 100), 100, 2000)\nzeros(Blocks(50, 20), 300, 200)","category":"page"},{"location":"#Convert-a-DArray-back-into-an-Array","page":"Home","title":"Convert a DArray back into an Array","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"To get back an Array from a DArray, just call collect:","category":"page"},{"location":"","page":"Home","title":"Home","text":"DA = rand(Blocks(32, 32), 256, 128)\ncollect(DA) # returns a `Matrix{Float64}`","category":"page"},{"location":"darray/#Distributed-Arrays","page":"Distributed Arrays","title":"Distributed Arrays","text":"","category":"section"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"The DArray, or \"distributed array\", is an abstraction layer on top of Dagger that allows loading array-like structures into a distributed environment. The DArray partitions a larger array into smaller \"blocks\" or \"chunks\", and those blocks may be located on any worker in the cluster. The DArray uses a Parallel Global Address Space (aka \"PGAS\") model for storing partitions, which means that a DArray instance contains a reference to every partition in the greater array; this provides great flexibility in allowing Dagger to choose the most efficient way to distribute the array's blocks and operate on them in a heterogeneous manner.","category":"page"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"Aside: an alternative model, here termed the \"MPI\" model, is not yet supported, but would allow storing only a single partition of the array on each MPI rank in an MPI cluster. DArray support for this model is planned in the near future.","category":"page"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"This should not be confused with the DistributedArrays.jl package.","category":"page"},{"location":"darray/#Creating-DArrays","page":"Distributed Arrays","title":"Creating DArrays","text":"","category":"section"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"A DArray can be created in two ways: through an API similar to the usual rand, ones, etc. calls, or by distributing an existing array with distribute. 
It's generally not recommended to manually construct a DArray object unless you're developing the DArray itself.","category":"page"},{"location":"darray/#Allocating-new-arrays","page":"Distributed Arrays","title":"Allocating new arrays","text":"","category":"section"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"As an example, one can allocate a random DArray by calling rand with a Blocks object as the first argument - Blocks specifies the size of partitions to be constructed, and must have the same number of dimensions as the array being allocated.","category":"page"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"# Add some Julia workers\njulia> using Distributed; addprocs(6)\n6-element Vector{Int64}:\n 2\n 3\n 4\n 5\n 6\n 7\n\njulia> @everywhere using Dagger\n\njulia> DX = rand(Blocks(50, 50), 100, 100)\nDagger.DArray{Any, 2, typeof(cat)}(100, 100)","category":"page"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"The rand(Blocks(50, 50), 100, 100) call specifies that a DArray matrix should be allocated which is in total 100 x 100, split into 4 blocks of size 50 x 50, and initialized with random Float64s. Many other functions, like randn, ones, and zeros, can be called in this same way.","category":"page"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"Note that the DArray is an asynchronous object (i.e. operations on it may execute in the background), so to force it to be materialized, fetch may need to be called:","category":"page"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"julia> fetch(DX)\nDagger.DArray{Any, 2, typeof(cat)}(100, 100)","category":"page"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"This doesn't change the type or values of the DArray, but it does make sure that any pending operations have completed.","category":"page"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"To convert a DArray back into an Array, collect can be used to gather the data from all the Julia workers that the partitions are on and combine them into a single Array on the worker calling collect:","category":"page"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"julia> collect(DX)\n100×100 Matrix{Float64}:\n 0.610404 0.0475367 0.809016 0.311305 0.0306211 0.689645 … 0.220267 0.678548 0.892062 0.0559988\n 0.680815 0.788349 0.758755 0.0594709 0.640167 0.652266 0.331429 0.798848 0.732432 0.579534\n 0.306898 0.0805607 0.498372 0.887971 0.244104 0.148825 0.340429 0.029274 0.140624 0.292354\n 0.0537622 0.844509 0.509145 0.561629 0.566584 0.498554 0.427503 0.835242 0.699405 0.0705192\n 0.587364 0.59933 0.0624318 0.3795 0.430398 0.0853735 0.379947 0.677105 0.0305861 0.748001\n 0.14129 0.635562 0.218739 0.0629501 0.373841 0.439933 … 0.308294 0.0966736 0.783333 0.00763648\n 0.14539 0.331767 0.912498 0.0649541 0.527064 0.249595 0.826705 0.826868 0.41398 0.80321\n 0.13926 0.353158 0.330615 0.438247 0.284794 0.238837 0.791249 0.415801 0.729545 0.88308\n 0.769242 0.136001 0.950214 0.171962 0.183646 0.78294 0.570442 0.321894 0.293101 0.911913\n 0.786168 0.513057 0.781712 0.0191752 0.512821 0.621239 0.50503 0.0472064 0.0368674 0.75981\n 0.493378 0.129937 0.758052 0.169508 0.0564534 0.846092 … 0.873186 0.396222 0.284 0.0242124\n 0.12689 0.194842 0.263186 0.213071 0.535613 0.246888 0.579931 0.699231 0.441449 
0.882772\n 0.916144 0.21305 0.629293 0.329303 0.299889 0.127453 0.644012 0.311241 0.713782 0.0554386\n ⋮ ⋮ ⋱\n 0.430369 0.597251 0.552528 0.795223 0.46431 0.777119 0.189266 0.499178 0.715808 0.797629\n 0.235668 0.902973 0.786537 0.951402 0.768312 0.633666 0.724196 0.866373 0.0679498 0.255039\n 0.605097 0.301349 0.758283 0.681568 0.677913 0.51507 … 0.654614 0.37841 0.86399 0.583924\n 0.824216 0.62188 0.369671 0.725758 0.735141 0.183666 0.0401394 0.522191 0.849429 0.839651\n 0.578047 0.775035 0.704695 0.203515 0.00267523 0.869083 0.0975535 0.824887 0.00787017 0.920944\n 0.805897 0.0275489 0.175715 0.135956 0.389958 0.856349 0.974141 0.586308 0.59695 0.906727\n 0.212875 0.509612 0.85531 0.266659 0.0695836 0.0551129 0.788085 0.401581 0.948216 0.00242077\n 0.512997 0.134833 0.895968 0.996953 0.422192 0.991526 … 0.838781 0.141053 0.747722 0.84489\n 0.283221 0.995152 0.61636 0.75955 0.072718 0.691665 0.151339 0.295759 0.795476 0.203072\n 0.0946639 0.496832 0.551496 0.848571 0.151074 0.625696 0.673817 0.273958 0.177998 0.563221\n 0.0900806 0.127274 0.394169 0.140403 0.232985 0.460306 0.536441 0.200297 0.970311 0.0292218\n 0.0698985 0.463532 0.934776 0.448393 0.606287 0.552196 0.883694 0.212222 0.888415 0.941097","category":"page"},{"location":"darray/#Distributing-existing-arrays","page":"Distributed Arrays","title":"Distributing existing arrays","text":"","category":"section"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"Now let's look at constructing a DArray from an existing array object; we can do this by calling distribute:","category":"page"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"julia> Z = zeros(100, 500);\n\njulia> Dzeros = distribute(Z, Blocks(10, 50))\nDagger.DArray{Any, 2, typeof(cat)}(100, 500)","category":"page"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"This will distribute the array partitions (in chunks of 10 x 50 matrices) across the workers in the Julia cluster in a relatively even distribution; future operations on a DArray may produce a different distribution from the one chosen by distribute.","category":"page"},{"location":"darray/#Broadcasting","page":"Distributed Arrays","title":"Broadcasting","text":"","category":"section"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"As the DArray is a subtype of AbstractArray and generally satisfies Julia's array interface, a variety of common operations (such as broadcast) work as expected:","category":"page"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"julia> DX = rand(Blocks(50,50), 100, 100)\nDagger.DArray{Float64, 2, Blocks{2}, typeof(cat)}(100, 100)\n\njulia> DY = DX .+ DX\nDagger.DArray{Float64, 2, Blocks{2}, typeof(cat)}(100, 100)\n\njulia> DZ = DY .* 3\nDagger.DArray{Float64, 2, Blocks{2}, typeof(cat)}(100, 100)","category":"page"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"Now, DZ will contain the result of computing (DX .+ DX) .* 3. 
Note that DArray objects are immutable, and operations on them are thus functional transformations of their input DArray.","category":"page"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"note: Note\nSupport for mutation of DArrays is planned for a future release","category":"page"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"julia> Dagger.chunks(DZ)\n2×2 Matrix{Any}:\n EagerThunk (finished) EagerThunk (finished)\n EagerThunk (finished) EagerThunk (finished)\n\njulia> Dagger.chunks(fetch(DZ))\n2×2 Matrix{Union{Thunk, Dagger.Chunk}}:\n Chunk{Matrix{Float64}, DRef, ThreadProc, AnyScope}(Matrix{Float64}, ArrayDomain{2}((1:50, 1:50)), DRef(4, 8, 0x0000000000004e20), ThreadProc(4, 1), AnyScope(), true) … Chunk{Matrix{Float64}, DRef, ThreadProc, AnyScope}(Matrix{Float64}, ArrayDomain{2}((1:50, 1:50)), DRef(2, 5, 0x0000000000004e20), ThreadProc(2, 1), AnyScope(), true)\n Chunk{Matrix{Float64}, DRef, ThreadProc, AnyScope}(Matrix{Float64}, ArrayDomain{2}((1:50, 1:50)), DRef(5, 5, 0x0000000000004e20), ThreadProc(5, 1), AnyScope(), true) Chunk{Matrix{Float64}, DRef, ThreadProc, AnyScope}(Matrix{Float64}, ArrayDomain{2}((1:50, 1:50)), DRef(3, 3, 0x0000000000004e20), ThreadProc(3, 1), AnyScope(), true)","category":"page"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"Here we can see the DArray's internal representation of the partitions, which are stored as either EagerThunk objects (representing an ongoing or completed computation) or Chunk objects (which reference data which exist locally or on other Julia workers). Of course, one doesn't typically need to worry about these internal details unless implementing low-level operations on DArrays.","category":"page"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"Finally, it's easy to see the results of this combination of broadcast operations; just use collect to get an Array:","category":"page"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"julia> collect(DZ)\n100×100 Matrix{Float64}:\n 5.72754 1.23614 4.67045 4.89095 3.40126 … 5.07663 1.60482 5.04386 1.44755 2.5682\n 0.189402 3.64462 5.92218 3.94603 2.32192 1.47115 4.6364 0.778867 3.13838 4.87871\n 3.3492 3.96929 3.46377 1.29776 3.59547 4.82616 1.1512 3.02528 3.05538 0.139763\n 5.0981 5.72564 5.1128 0.954708 2.04515 2.50365 5.97576 5.17683 4.79587 1.80113\n 1.0737 5.25768 4.25363 0.943006 4.25783 4.1801 3.14444 3.07428 4.41075 2.90252\n 5.48746 5.17286 3.99259 0.939678 3.76034 … 0.00763076 2.98176 1.83674 1.61791 3.33216\n 1.05088 4.98731 1.24925 3.57909 2.53366 5.96733 2.35186 5.75815 3.32867 1.15317\n 0.0335647 3.52524 0.159895 5.49908 1.33206 3.51113 0.0753356 1.5557 0.884252 1.45085\n 5.27506 2.00472 0.00636555 0.461574 5.16735 2.74457 1.14679 2.39407 0.151713 0.85013\n 4.43607 4.50304 4.73833 1.92498 1.64338 4.34602 4.62612 3.28248 1.32726 5.50207\n 5.22308 2.53069 1.27758 2.62013 3.73961 … 5.91626 2.54943 5.41472 1.67197 4.09026\n 1.09684 2.53189 4.23236 0.14055 0.889771 2.20834 2.31341 5.23121 1.74341 4.00588\n 2.55253 4.1789 3.50287 4.96437 1.26724 3.04302 3.74262 5.46611 1.39375 4.13167\n 3.03291 4.43932 2.85678 1.59531 0.892166 0.414873 0.643423 4.425 5.48145 5.93383\n 0.726568 0.516686 3.00791 3.76354 3.32603 2.19812 2.15836 3.85669 3.67233 2.1261\n 2.22763 1.36281 4.41129 5.29229 1.10093 … 0.45575 4.38389 0.0526105 2.14792 2.26734\n 2.58065 1.99564 4.82657 0.485823 5.24881 2.16097 3.59942 
2.25021 3.96498 0.906153\n 0.546354 0.982523 1.94377 2.43136 2.77469 4.43507 5.98402 0.692576 1.53298 1.20621\n 4.71374 4.99402 1.5876 1.81629 2.56269 1.56588 5.42296 0.160867 4.17705 1.13915\n 2.97733 2.4476 3.82752 1.3491 3.5684 1.23393 1.86595 3.97154 4.6419 4.8964\n ⋮ ⋱ ⋮\n 3.49162 2.46081 1.21659 2.96078 4.58102 5.97679 3.34463 0.202255 2.85433 0.0786219\n 0.894714 2.87079 5.09409 2.2922 3.18928 1.5886 0.163886 5.99251 0.697163 5.75684\n 2.98867 2.2115 5.07771 0.124194 3.88948 3.61176 0.0732554 4.11606 0.424547 0.621287\n 5.95438 3.45065 0.194537 3.57519 1.2266 2.93837 1.02609 5.84021 5.498 3.53337\n 2.234 0.275185 0.648536 0.952341 4.41942 … 4.78238 2.24479 3.31705 5.76518 0.621195\n 5.54212 2.24089 5.81702 1.96178 4.99409 0.30557 3.55499 0.851678 1.80504 5.81679\n 5.79409 4.86848 3.10078 4.22252 4.488 3.03427 2.32752 3.54999 0.967972 4.0385\n 3.06557 5.4993 2.44263 1.82296 0.166883 0.763588 1.59113 4.33305 2.8359 5.56667\n 3.86797 3.73251 3.14999 4.11437 0.454938 0.166886 0.303827 4.7934 3.37593 2.29402\n 0.762158 4.3716 0.897798 4.60541 2.96872 … 1.60095 0.480542 1.41945 1.33071 0.308611\n 1.20503 5.66645 4.03237 3.90194 1.55996 3.58442 4.6735 5.52211 5.46891 2.43612\n 5.51133 1.13591 3.26696 4.24821 4.60696 3.73251 3.25989 4.735 5.61674 4.32185\n 2.46529 0.444928 3.85984 5.49469 1.13501 1.36861 5.34651 0.398515 0.239671 5.36412\n 2.62837 3.99017 4.52569 3.54811 3.35515 4.13514 1.22304 1.01833 3.42534 3.58399\n 4.88289 5.09945 0.267154 3.38482 4.53408 … 3.71752 5.22216 1.39987 1.38622 5.47351\n 0.1046 3.65967 1.62098 5.33185 0.0822769 3.30334 5.90173 4.06603 5.00789 4.40601\n 1.9622 0.755491 2.12264 1.67299 2.34482 4.50632 3.84387 3.22232 5.23164 2.97735\n 4.37208 5.15253 0.346373 2.98573 5.48589 0.336134 2.25751 2.39057 1.97975 3.24243\n 3.83293 1.69017 3.00189 1.80388 3.43671 5.94085 1.27609 3.98737 0.334963 5.84865","category":"page"},{"location":"darray/","page":"Distributed Arrays","title":"Distributed Arrays","text":"A variety of other operations exist on the DArray, and it should otherwise generally behave similarly to any other AbstractArray type. If you find that it's missing an operation that you need, please file an issue!","category":"page"}] } diff --git a/dev/task-queues/index.html b/dev/task-queues/index.html index f9b6c9dc4..2d0a27831 100644 --- a/dev/task-queues/index.html +++ b/dev/task-queues/index.html @@ -1,5 +1,5 @@ -Task Queues · Dagger.jl

      Task Queues

      By default, @spawn/spawn submit tasks immediately and directly into Dagger's scheduler without modifications. However, sometimes you want to be able to tweak this behavior for a region of code; for example, when working with GPUs or other operations which operate in-place, you might want to emulate CUDA's stream semantics by ensuring that tasks execute sequentially (to avoid one kernel reading from an array while another kernel is actively writing to it). Or, you might want to ensure that a set of Dagger tasks are submitted into the scheduler all at once for benchmarking purposes or to emulate the behavior of delayed. This and more is possible through a mechanism called "task queues".

      A task queue in Dagger is an object that can be configured to accept unlaunched tasks from @spawn/spawn and either modify them or delay their launching arbitrarily. By default, Dagger tasks are enqueued through the EagerTaskQueue, which submits tasks directly into the scheduler before @spawn/spawn returns. However, Dagger also has an InOrderTaskQueue, which ensures that tasks enqueued through it execute sequentially with respect to each other. This queue can be allocated with Dagger.spawn_sequential:

      A = rand(16)
      +Task Queues · Dagger.jl

      Task Queues

      By default, @spawn/spawn submit tasks immediately and directly into Dagger's scheduler without modifications. However, sometimes you want to be able to tweak this behavior for a region of code; for example, when working with GPUs or other operations which operate in-place, you might want to emulate CUDA's stream semantics by ensuring that tasks execute sequentially (to avoid one kernel reading from an array while another kernel is actively writing to it). Or, you might want to ensure that a set of Dagger tasks are submitted into the scheduler all at once for benchmarking purposes or to emulate the behavior of delayed. This and more is possible through a mechanism called "task queues".

      A task queue in Dagger is an object that can be configured to accept unlaunched tasks from @spawn/spawn and either modify them or delay their launching arbitrarily. By default, Dagger tasks are enqueued through the EagerTaskQueue, which submits tasks directly into the scheduler before @spawn/spawn returns. However, Dagger also has an InOrderTaskQueue, which ensures that tasks enqueued through it execute sequentially with respect to each other. This queue can be allocated with Dagger.spawn_sequential:

      A = rand(16)
       B = zeros(16)
       C = zeros(16)
       function vcopy!(B, A)
      @@ -23,4 +23,4 @@
               Dagger.@spawn vcopy!(B2, A)
           end
           Dagger.@spawn vadd!(C, B1, B2)
      -end)

      Conveniently, Dagger's task queues can be nested to get the expected behavior; the above example will submit the two vcopy! tasks as a group (and they can execute concurrently), while still ensuring that those two tasks finish before the vadd! task executes.

      Warn

      Task queues do not propagate to nested tasks; if a Dagger task launches another task internally, the child task doesn't inherit the task queue that the parent task was enqueued in.

      +end)

      Conveniently, Dagger's task queues can be nested to get the expected behavior; the above example will submit the two vcopy! tasks as a group (and they can execute concurrently), while still ensuring that those two tasks finish before the vadd! task executes.
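
      For contrast, here is a minimal sketch of the same DAG without the nested Dagger.spawn_bulk, reusing the vcopy! and vadd! definitions from above (an illustration under those assumptions, not verified output); with spawn_sequential alone, every task is ordered, so the two vcopy! calls cannot overlap:

      wait(Dagger.spawn_sequential() do
          Dagger.@spawn vcopy!(B1, A)    # runs first
          Dagger.@spawn vcopy!(B2, A)    # starts only after the first copy finishes
          Dagger.@spawn vadd!(C, B1, B2) # starts only after both copies finish
      end)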

      Warn

      Task queues do not propagate to nested tasks; if a Dagger task launches another task internally, the child task doesn't inherit the task queue that the parent task was enqueued in.
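
      As a minimal sketch of this caveat (launch_children is a hypothetical helper, not part of Dagger):

      function launch_children()
          # These inner tasks go through the default eager queue, not the
          # parent's sequential queue, so they may run concurrently.
          t1 = Dagger.@spawn println("child 1")
          t2 = Dagger.@spawn println("child 2")
          fetch(t1); fetch(t2)
      end
      Dagger.spawn_sequential() do
          # Only tasks spawned directly in this block execute in order.
          wait(Dagger.@spawn launch_children())
      end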

      diff --git a/dev/task-spawning/index.html b/dev/task-spawning/index.html index 7cc741d41..631b1a5e0 100644 --- a/dev/task-spawning/index.html +++ b/dev/task-spawning/index.html @@ -1,5 +1,5 @@ -Task Spawning · Dagger.jl

      Task Spawning

      The main entrypoint to Dagger is @spawn:

      Dagger.@spawn [option=value]... f(args...; kwargs...)

      or spawn if it's more convenient:

      Dagger.spawn(f, Dagger.Options(options), args...; kwargs...)

      When called, it creates an EagerThunk (also known as a "thunk" or "task") object representing a call to function f with the arguments args and keyword arguments kwargs. If it is called with other thunks as args/kwargs, such as in Dagger.@spawn f(Dagger.@spawn g()), then the function f is passed the result of executing g() once that result is available; if g() hasn't yet finished executing, f waits for g() to complete before executing.

      An important observation is that, for each argument to @spawn/spawn, if the argument is the result of another @spawn/spawn call (thus it's an EagerThunk), the argument will be computed first, and then its result will be passed into the function receiving the argument. If the argument is not an EagerThunk (instead, some other type of Julia object), it'll be passed as-is to the function f (with some exceptions).

      Options

      The Options struct in the second argument position is optional; if provided, it is passed to the scheduler to control its behavior. Options contains a NamedTuple of option key-value pairs, which can be any of:

      • Any field in Dagger.Sch.ThunkOptions (see Scheduler and Thunk options)
      • meta::Bool – Pass the input Chunk objects themselves to f and not the value contained in them

      There are also some extra options that can be passed, although they're considered advanced options to be used only by developers or library authors:

      • get_result::Bool – return the actual result to the scheduler instead of Chunk objects. Used when f explicitly constructs a Chunk or when the return value is small (e.g. in the case of reduce)
      • persist::Bool – the result of this Thunk should not be released after it becomes unused in the DAG
      • cache::Bool – cache the result of this Thunk such that if the thunk is evaluated again, one can just reuse the cached value. If it’s been removed from the cache, recompute the value.

      Simple example

      Let's see a very simple directed acyclic graph (or DAG) constructed with Dagger:

      using Dagger
      +Task Spawning · Dagger.jl

      Task Spawning

      The main entrypoint to Dagger is @spawn:

      Dagger.@spawn [option=value]... f(args...; kwargs...)

      or spawn if it's more convenient:

      Dagger.spawn(f, Dagger.Options(options), args...; kwargs...)

      When called, it creates an EagerThunk (also known as a "thunk" or "task") object representing a call to function f with the arguments args and keyword arguments kwargs. If it is called with other thunks as args/kwargs, such as in Dagger.@spawn f(Dagger.@spawn g()), then the function f is passed the result of executing g() once that result is available; if g() hasn't yet finished executing, f waits for g() to complete before executing.

      An important observation is that, for each argument to @spawn/spawn, if the argument is the result of another @spawn/spawn call (thus it's an EagerThunk), the argument will be computed first, and then its result will be passed into the function receiving the argument. If the argument is not an EagerThunk (instead, some other type of Julia object), it'll be passed as-is to the function f (with some exceptions).
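
      As a small sketch of this behavior (concrete toy definitions for the f and g mentioned above; the values are assumptions for illustration):

      g() = 10
      f(x) = 2x

      gt = Dagger.@spawn g()   # returns an EagerThunk immediately
      ft = Dagger.@spawn f(gt) # f receives g()'s result (10), not the EagerThunk
      fetch(ft)                # 20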

      Options

      The Options struct in the second argument position is optional; if provided, it is passed to the scheduler to control its behavior. Options contains a NamedTuple of option key-value pairs, which can be any of:

      • Any field in Dagger.Sch.ThunkOptions (see Scheduler and Thunk options)
      • meta::Bool – Pass the input Chunk objects themselves to f and not the value contained in them (see the sketch below)

      There are also some extra options that can be passed, although they're considered advanced options to be used only by developers or library authors:

      • get_result::Bool – return the actual result to the scheduler instead of Chunk objects. Used when f explicitly constructs a Chunk or when the return value is small (e.g. in the case of reduce)
      • persist::Bool – the result of this Thunk should not be released after it becomes unused in the DAG
      • cache::Bool – cache the result of this Thunk such that if the thunk is evaluated again, one can just reuse the cached value. If it’s been removed from the cache, recompute the value.
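
      As a rough sketch of the meta option from the list above (typeof_arg is a hypothetical helper; the exact Chunk type parameters you see will vary):

      typeof_arg(x) = typeof(x)

      data = Dagger.@spawn rand(4)
      fetch(Dagger.@spawn typeof_arg(data))           # Vector{Float64}
      fetch(Dagger.@spawn meta=true typeof_arg(data)) # Dagger.Chunk{Vector{Float64}, …}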

      Simple example

      Let's see a very simple directed acyclic graph (or DAG) constructed with Dagger:

      using Dagger
       
       add1(value) = value + 1
       add2(value) = value + 2
      @@ -51,4 +51,4 @@
       Dagger.@spawn single=1 1+2
       Dagger.spawn(+, Dagger.Options(;single=1), 1, 2)
       
      -delayed(+; single=1)(1, 2)
      +delayed(+; single=1)(1, 2)
      diff --git a/dev/use-cases/parallel-nested-loops/index.html b/dev/use-cases/parallel-nested-loops/index.html index fea3ca27a..ba433aeb7 100644 --- a/dev/use-cases/parallel-nested-loops/index.html +++ b/dev/use-cases/parallel-nested-loops/index.html @@ -1,5 +1,5 @@ -Parallel Nested Loops · Dagger.jl

      Use Case: Parallel Nested Loops

      One of the many applications of Dagger is that it can be used as a drop-in replacement for nested multi-threaded loops that would otherwise be written with Threads.@threads.

      Consider a simplified scenario where you want to calculate the maximum mean values of random samples of various lengths that have been generated by several distributions provided by the Distributions.jl package. The results should be collected into a DataFrame. We have the following function:

      using Dagger, Random, Distributions, StatsBase, DataFrames
      +Parallel Nested Loops · Dagger.jl

      Use Case: Parallel Nested Loops

      One of the many applications of Dagger is that it can be used as a drop-in replacement for nested multi-threaded loops that would otherwise be written with Threads.@threads.

      Consider a simplified scenario where you want to calculate the maximum mean values of random samples of various lengths that have been generated by several distributions provided by the Distributions.jl package. The results should be collected into a DataFrame. We have the following function:

      using Dagger, Random, Distributions, StatsBase, DataFrames
       
       function f(dist, len, reps, σ)
           v = Vector{Float64}(undef, len) # avoiding allocations
      @@ -32,4 +32,4 @@
           res.z = fetch.(res.z)
           res.σ = fetch.(res.σ)
           res
      -end

      In this code we have job interdependence. Firstly, we calculate the standard deviation σ, and then we use that value in the function f. Since Dagger.@spawn yields an EagerThunk rather than actual values, we need to use the fetch function to obtain those values. In this example, the value fetching is performed once all computations are completed (note that @sync preceding the loop forces the loop to wait for all jobs to complete). Also, note that unlike in the previous example, we do not need to implement locking, as we are just pushing the EagerThunk results of Dagger.@spawn serially into the DataFrame (which is fast, since Dagger.@spawn doesn't block).

      The above use case scenario has been tested by running julia -t 8 (or with JULIA_NUM_THREADS=8 as an environment variable). The Threads.@threads code takes 1.8 seconds to run, while the Dagger code, which is also one line shorter, runs in around 0.9 seconds, resulting in a 2x speedup.

      +end

      In this code we have job interdependence. Firstly, we calculate the standard deviation σ, and then we use that value in the function f. Since Dagger.@spawn yields an EagerThunk rather than actual values, we need to use the fetch function to obtain those values. In this example, the value fetching is performed once all computations are completed (note that @sync preceding the loop forces the loop to wait for all jobs to complete). Also, note that unlike in the previous example, we do not need to implement locking, as we are just pushing the EagerThunk results of Dagger.@spawn serially into the DataFrame (which is fast, since Dagger.@spawn doesn't block).
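
      A minimal sketch of the push-then-fetch pattern described here (a toy computation with hypothetical names, not the benchmark code above):

      using Dagger, DataFrames

      res = DataFrame(i=Int[], z=Any[])
      @sync for i in 1:4
          t = Dagger.@spawn i^2 # returns immediately without blocking
          push!(res, (i, t))    # push the EagerThunk serially; no lock needed
      end
      res.z = fetch.(res.z)     # all tasks have completed, thanks to @sync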

      The above use case scenario has been tested by running julia -t 8 (or with JULIA_NUM_THREADS=8 as an environment variable). The Threads.@threads code takes 1.8 seconds to run, while the Dagger code, which is also one line shorter, runs in around 0.9 seconds, resulting in a 2x speedup.