Batch update #822

Merged
merged 26 commits on May 16, 2024

Commits (26)
108fa32
batch tools subpackage
jchen6727 Mar 17, 2024
5446c17
comm update
jchen6727 Mar 18, 2024
f56bac6
new search methods
jchen6727 Mar 19, 2024
b2935c1
fixed import issues
jchen6727 Mar 19, 2024
1bc1eab
updates comm, search
jchen6727 Mar 20, 2024
3d5d0a8
search.py
jchen6727 Mar 21, 2024
cad0574
wrapper tools search
jchen6727 Mar 27, 2024
14502fa
update to batchtools
jchen6727 Mar 30, 2024
d7d3ead
codebase moved to netpyne batchtools
jchen6727 Apr 4, 2024
ccaa1d2
update to comm.py, runners.py
jchen6727 Apr 8, 2024
0b34c0f
update to search (kwargs)
jchen6727 Apr 16, 2024
fbc323e
config update (should delete two keys @ end?)
jchen6727 Apr 18, 2024
9622aa2
replace pubtk with batchtk for compatibility with pip
jchen6727 Apr 25, 2024
c5ffb3d
update the submit (zsh->sh)
jchen6727 Apr 25, 2024
62fbc30
update batchtools, submits
jchen6727 Apr 28, 2024
6273501
fixed socket functionality for submits.py
jchen6727 Apr 30, 2024
0626add
batchtools documentation
jchen6727 May 7, 2024
67108a7
minor updates- search
jchen6727 May 8, 2024
553689b
fixed bug in conn, updated test/examples/* to use dynamic pathing
jchen6727 May 9, 2024
5eb20f8
update CHANGES.md
jchen6727 May 11, 2024
98bad60
Updated documentation `batchtools.rst`
jchen6727 May 14, 2024
2908abe
Merge branch 'development' into batch
jchen6727 May 14, 2024
76b6bd6
update `user_documentation.rst`
jchen6727 May 14, 2024
627e7bf
sort init.py
jchen6727 May 14, 2024
05e0656
update per deployment on HPC
jchen6727 May 14, 2024
c1b657a
Merge branch 'batch' into batch
jchen6727 May 16, 2024
278 changes: 278 additions & 0 deletions doc/source/user_documentation.rst
@@ -2547,3 +2547,281 @@ The code for neural network optimization through evolutionary algorithm used in
.. Adding cell classes
.. --------------------

Running a Batch Job (Beta)
==========================

The NetPyNE batchtools subpackage provides a method of automating job submission and reporting::


batch<-->\ /---> configuration_0 >---\
\ / specs---\
\<--->dispatcher_0 sim_0
\ \ comm ---/
\ \---< results_0 <---/
\
\ /---> configuration_1 >---\
\ / specs---\
\<--->dispatcher_1 sim_1
\ \ comm ---/
\ \---< results_1 <---/
\
\
...



1. Setting up batchtools
------------------------
Beyond the necessary dependencies for NetPyNE and NEURON, several additional ``pip`` installations are required.

The NetPyNE installation should be handled as a development installation of the repository branch ``batch``::

git clone https://github.com/Neurosim-lab/netpyne.git
cd netpyne
git checkout batch
pip install -e .

The batchtk package can be installed either directly via ``pip``::

pip install -U batchtk

or as a development install (recommended)::

git clone https://github.com/jchen6727/batchtk.git
cd batchtk
pip install -e .

Ray is a dependency of batchtools and should be installed with the following command::

pip install -U "ray[default]"

2. Examples
-----------
Examples of NetPyNE batchtools usage can be found in the ``examples`` directory `here <https://github.com/suny-downstate-medical-center/netpyne/tree/batch/netpyne/batchtools/examples>`_.

Examples of the underlying batchtk package can be found in the ``examples`` directory `here <https://github.com/jchen6727/batchtk/tree/release/examples>`_.

3. Retrieving batch configuration values through the ``specs`` object
----------------------------------------------------------------------
Each simulation can retrieve its relevant configuration values through the ``specs`` object and communicate with
the dispatcher through the ``comm`` object.

Importing the relevant objects::

from netpyne.batchtools import specs, comm
cfg = specs.SimConfig() # create a SimConfig object
netParams = specs.NetParams() # create a netParams object

``netpyne.batchtools.specs`` behaves similarly to ``netpyne.sim.specs`` except in the following cases:

* ``netpyne.batchtools.specs`` automatically captures relevant configuration mappings created by the ``dispatcher`` upon initialization

* these mappings can be retrieved via ``specs.get_mappings()``

* the SimConfig object created by ``netpyne.batchtools.specs.SimConfig()`` will update itself with relevant configuration mappings through the ``update()`` method::

from netpyne.batchtools import specs # import the custom batch specs
cfg = specs.SimConfig() # create a SimConfig object
cfg.update() # update the cfg object with any relevant mappings for this particular batch job

The ``update`` method will update the ``SimConfig`` object with the configuration mappings captured in ``specs`` (see: ``specs.get_mappings()``)
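
As an illustrative sketch (the attribute ``foo`` and its mapped value are hypothetical), the captured mappings can be inspected directly::

from netpyne.batchtools import specs # import the custom batch specs
cfg = specs.SimConfig() # create a SimConfig object
cfg.foo = 0 # script-level default for ``foo``
print(specs.get_mappings()) # e.g. {'foo': 1} if this trial assigned foo = 1
cfg.update() # cfg.foo now holds the mapped value (1 in this example)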

This replaces the previous idiom for updating the SimConfig object with mappings from the batched job submission::

try:
    from __main__ import cfg # import the SimConfig object with params from the parent module
except ImportError:
    from cfg import cfg # if no SimConfig in the parent module, import directly from the cfg module



4. Communicating results to the ``dispatcher`` with the ``comm`` object
------------------------------------------------------------------------

Prior batched simulations relied on ``.pkl`` files to communicate data. The ``netpyne.batchtools`` subpackage instead uses a ``comm`` object to send custom data back to the dispatcher.
The ``comm`` object determines the method of communication based on the batch job submission type.

In terms of the simulation, the following functions are available to the user (a minimal usage sketch follows this list):

* **comm.initialize()**: establishes a connection with the batch ``dispatcher`` for sending data

* **comm.send(<data>)**: sends ``<data>`` to the batch ``dispatcher``
* for ``search`` jobs, it is important to match the data sent with the metric specified in the search function

* **comm.close()**: closes and cleans up the connection with the batch ``dispatcher``
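
A minimal sketch of this reporting pattern at the end of a simulation script (the ``results`` value and the ``'loss'`` key are placeholders; for ``search`` jobs the key must match the ``metric`` passed to the search function)::

import json
from netpyne.batchtools import comm

comm.initialize() # establish a connection with the dispatcher
results = {'loss': 0.5} # placeholder value computed from simulation output
comm.send(json.dumps(results)) # report the data the search will optimize over
comm.close() # close and clean up the connection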

5. Specifying a batch job
-------------------------
Batch job handling is implemented with methods from ``netpyne.batchtools.search``.

**search**::

def search(job_type: str, # the submission engine to run a single simulation (e.g. 'sge', 'sh')
comm_type: str, # the method of communication between host dispatcher and the simulation (e.g. 'socket', 'filesystem')
run_config: Dict, # batch configuration, (keyword: string pairs to customize the submit template)
params: Dict, # search space (dictionary of parameter keys: tune search spaces)
algorithm: Optional[str] = "variant_generator", # search algorithm to use, see SEARCH_ALG_IMPORT for available options
label: Optional[str] = 'search', # label for the search
output_path: Optional[str] = '../batch', # directory for storing generated files
checkpoint_path: Optional[str] = '../ray', # directory for storing checkpoint files
max_concurrent: Optional[int] = 1, # number of concurrent trials to run at one time
batch: Optional[bool] = True, # whether concurrent trials should run synchronously or asynchronously
num_samples: Optional[int] = 1, # number of trials to run
metric: Optional[str] = "loss", # metric to optimize (this should match a key: value pair in the returned data)
mode: Optional[str] = "min", # either 'min' or 'max' (whether to minimize or maximize the metric)
algorithm_config: Optional[dict] = None, # additional configuration for the search algorithm
) -> tune.ResultGrid: # results of the search

The basic search implemented with the ``search`` function uses ``ray.tune`` as the search algorithm backend, returning a ``tune.ResultGrid`` which can be used to evaluate the search space and results. It takes the following parameters (a complete invocation sketch follows the parameter list):

* **job_type**: either "``sge``" or "``sh``", specifying how the job should be submitted. "``sge``" will submit batch jobs through the Sun Grid Engine; "``sh``" will submit batch jobs through the shell on a local machine
* **comm_type**: either "``socket``" or "``filesystem``", specifying how the job should communicate with the dispatcher
* **run_config**: a dictionary of keyword: string pairs to customize the submit template; the expected keyword: string pairs depend on the job_type::

=======
sge
=======
queue: the queue to submit the job to (#$ -q {queue})
cores: the number of cores to request for the job (#$ -pe smp {cores})
vmem: the amount of memory to request for the job (#$ -l h_vmem={vmem})
realtime: the amount of time to request for the job (#$ -l h_rt={realtime})
command: the command to run for the job

example:
run_config = {
'queue': 'cpu.q', # request job to be run on the 'cpu.q' queue
'cores': 8, # request 8 cores for the job
'vmem': '8G', # request 8GB of memory for the job
'realtime': '24:00:00', # set timeout of the job to 24 hours
'command': 'mpiexec -n $NSLOTS -hosts $(hostname) nrniv -python -mpi init.py'
} # set the command to be run to 'mpiexec -n $NSLOTS -hosts $(hostname) nrniv -python -mpi init.py'

=======
sh
=======
command: the command to run for the job

example:
run_config = {
'command': 'mpiexec -n 8 nrniv -python -mpi init.py'
} # set the command to be run

* **params**: a dictionary of config values to perform the search over. The keys of the dictionary should match the keys of the config object to be updated. Lists or numpy arrays of more than two values force a grid search over those values; a list of exactly two values instead defines the lower and upper bounds of a uniform distribution sample space.

**usage 1**: updating a constant value specified in the ``SimConfig`` object ::

# take a config object with the following parameter ``foo``
cfg = specs.SimConfig()
cfg.foo = 0
cfg.update()

# specify a search space for ``foo`` such that a simulation will run with:
# cfg.foo = 0
# cfg.foo = 1
# cfg.foo = 2
# ...
# cfg.foo = 9

# using:
params = {
'foo': range(10)
}

**usage 2**: updating a nested object in the ``SimConfig`` object::

# to update a nested object, the package uses the `.` operator to specify reflection into the object.
# take a config object with the following parameter object ``foo``
cfg = specs.SimConfig()
cfg.foo = {'bar': 0, 'baz': 0}
cfg.update()

# specify a search space for ``foo['bar']`` with `foo.bar` such that a simulation will run:
# cfg.foo['bar'] = 0
# cfg.foo['bar'] = 1
# cfg.foo['bar'] = 2
# ...
# cfg.foo['bar'] = 9

# using:
params = {
'foo.bar': range(10)
}

# this reflection works with nested objects as well...
# i.e.
# cfg.foo = {'bar': {'baz': 0}}
# params = {'foo.bar.baz': range(10)}

* **algorithm** : the search algorithm (supported within ``ray.tune``)

**Supported algorithms**::

* "variant_generator": grid and random based search of the parameter space (see: https://docs.ray.io/en/latest/tune/api/suggestion.html)
* "random": grid and random based search of the parameter space (see: https://docs.ray.io/en/latest/tune/api/suggestion.html)
* "axe": optimization algorithm (see: https://docs.ray.io/en/latest/tune/api/suggestion.html)
* "bayesopt": optimization algorithm (see: https://docs.ray.io/en/latest/tune/api/suggestion.html)
* "hyperopt": optimization algorithm (see: https://docs.ray.io/en/latest/tune/api/suggestion.html)
* "bohb": optimization algorithm (see: https://docs.ray.io/en/latest/tune/api/suggestion.html)
* "nevergrad": optimization algorithm (see: https://docs.ray.io/en/latest/tune/api/suggestion.html)
* "optuna": optimization algorithm (see: https://docs.ray.io/en/latest/tune/api/suggestion.html)
* "hebo": optimization algorithm (see: https://docs.ray.io/en/latest/tune/api/suggestion.html)
* "sigopt": optimization algorithm (see: https://docs.ray.io/en/latest/tune/api/suggestion.html)
* "zoopt": optimization algorithm (see: https://docs.ray.io/en/latest/tune/api/suggestion.html)

* **label**: a label for the search, used for output file naming

* **output_path**: the directory for storing generated files, can be a relative or absolute path

* **checkpoint_path**: the directory for storing checkpoint files in case the search needs to be restored, can be a relative or absolute path

* **max_concurrent**: the number of concurrent trials to run at one time; keep in mind the resource usage of each trial to avoid overscheduling

* **batch**: whether concurrent trials should run synchronously or asynchronously

* **num_samples**: the number of trials to run; for any grid search, each value in the grid will be sampled ``num_samples`` times.

* **metric**: the metric to optimize (this should match some key: value pair in the returned data)

* **mode**: either 'min' or 'max' (whether to minimize or maximize the metric)

* **algorithm_config**: additional configuration for the search algorithm (see the `ray.tune search algorithm docs <https://docs.ray.io/en/latest/tune/api/suggestion.html>`_)
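
A complete invocation sketch, assuming a local machine, an ``init.py`` that reports a ``loss`` value via ``comm.send``, and a hypothetical config attribute ``foo``::

from netpyne.batchtools.search import search

results = search(job_type = 'sh', # submit jobs through the local shell
                 comm_type = 'socket', # receive results over a socket connection
                 run_config = {'command': 'mpiexec -n 4 nrniv -python -mpi init.py'},
                 params = {'foo': range(10)}, # grid search over ten values of cfg.foo
                 algorithm = 'variant_generator',
                 label = 'foo_search', # generated files are named after this label
                 output_path = '../batch', # directory for generated files
                 checkpoint_path = '../ray', # directory for checkpoint files
                 max_concurrent = 2, # at most two trials run at one time
                 num_samples = 1, # sample each grid value once
                 metric = 'loss', # must match a key in the data sent by comm.send
                 mode = 'min') # minimize the loss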

6. Performing parameter optimization searches (CA3 example)
------------------------------------------------------------
The ``examples`` directory `here <https://github.com/suny-downstate-medical-center/netpyne/tree/batch/netpyne/batchtools/examples>`_ shows both a ``grid`` based search as well as an ``optuna`` based optimization.

In the ``CA3`` example, we tune the ``PYR->BC`` ``NMDA`` and ``AMPA`` synaptic weights, as well as the ``BC->PYR`` ``GABA`` synaptic weight. In ``optuna_search.py`` the search space is defined by the lower and upper bounds of each parameter::

# from optuna_search.py
params = {'nmda.PYR->BC' : [1e-3, 1.8e-3],
'ampa.PYR->BC' : [0.2e-3, 0.5e-3],
'gaba.BC->PYR' : [0.4e-3, 1.0e-3],
}

while in ``grid_search.py`` the search space is defined by explicit values::

# from grid_search.py
params = {'nmda.PYR->BC' : numpy.linspace(1e-3, 1.8e-3, 3),
'ampa.PYR->BC' : numpy.linspace(0.2e-3, 0.5e-3, 3),
'gaba.BC->PYR' : numpy.linspace(0.4e-3, 1.0e-3, 3),
}

which defines a grid of ``3x3x3`` specific values to search over.

Note that the ``metric`` argument specifies a particular string (``loss``) to report and optimize. This value is generated and sent by the ``init.py`` simulation::

# from init.py
results['PYR_loss'] = (results['PYR'] - 3.33875)**2
results['BC_loss'] = (results['BC'] - 19.725 )**2
results['OLM_loss'] = (results['OLM'] - 3.470 )**2
results['loss'] = (results['PYR_loss'] + results['BC_loss'] + results['OLM_loss']) / 3
out_json = json.dumps({**inputs, **results})

print(out_json)
#TODO put all of this in a single function.
comm.send(out_json)
comm.close()

The ``out_json`` output contains a dictionary which includes the ``loss`` metric (calculated as the MSE between observed and expected values).

In a multi-objective optimization, the relevant ``PYR_loss``, ``BC_loss``, and ``OLM_loss`` components are additionally included (see ``mo_optuna_search.py``).
8 changes: 7 additions & 1 deletion netpyne/batchtools/__init__.py
@@ -1,11 +1,17 @@
from netpyne.batchtools.runners import NetpyneRunner
from batchtk.runtk import dispatchers

from netpyne.batchtools import submits
from batchtk import runtk

specs = NetpyneRunner()

from netpyne.batchtools.comm import Comm

dispatchers = dispatchers
submits = submits
runtk = runtk


comm = Comm()

