Skip to content

Commit

Permalink
Create README.md
Browse files Browse the repository at this point in the history
  • Loading branch information
christopherwharrop committed May 1, 2015
1 parent 0f7e3c4 commit 1bd25c0
Showing 1 changed file with 20 additions and 0 deletions.
20 changes: 20 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
# Rocoto Workflow Management System

##Introduction
Workflow Management is a concept that originated in the 1970's to handle business process management. Workflow management systems were developed to manage complex collections of business processes that need to be carried out in a certain way with complex interdependencies and requirements. Scientific Workflow Management is much newer, and is very much like its business counterpart, except that it is usually data oriented instead of process oriented. That is, scientific workflows are driven by the scientific data that "flows" through them. Scientific workflow tasks are usually triggered by the availability of some kind of input data, and a task's result is usally some kind of data that is fed as input to another task in the workflow. The individual tasks themselves are scientific codes that perform some kind of computation or retreive or store some type of data for a computation. So, whereas a business workflow is comprised of a diverse set of processes that have to be completed in a certain way, sometimes carried out by a machine, sometimes carried out by a human being, a scientific workflow is usually comprised of a set of computations that are driven by the availabiilty of input data.

##Why Workflow Management?
The day when a scientist could conduct his or her numerical modeling and simulation research by writing, running, and monitoriing the progress of a modest Fortran code or two, is quickly becoming a distant memory. It is a fact that researchers now often have to make hundreds or thousands of runs of a numerical model to get a single result. In addition, each end-to-end "run" of the model often entails running many different codes for pre- and post-processing in addition to the model itself. And, in some cases, multiple models and their associated pre- and post-processing tasks are coupled together to build a larger, more complex model. The codes that comprise the end-to-end modeling systems often have complex interdependencies that dictate the order in which they can be run. And, in order to run the end-to-end system efficiently, concurrency must be used when dependencies allow it. The problem of scale and complexity is exacerbated by the fact that these codes are usually run on high performance machines that are notoriously difficult for scientists to use, and which tend to exhibit frequent failures. As machines get larger and larger, the failure rate of hardware and software components increases commensurately. Ad-hoc management of the execution of a complex modeling system is often difficult even for a single end-to-end run on a machine that never fails. Multiply that by the thousands of runs needed to perform a scientific experiment, in a hostile computing environment where hardware and facility outages are not uncommon, and you have a very challenging situation. For simulations that must run reliably in realtime, the situtation is almost hopeless. The traditional ad-hoc techniques for automating the execution of modeling systems (e.g. driver scripts, batch job chains or trees) do not provide sufficient fault tolerance for the scale and complexity of current and future workflows, nor are they reusable; each modeling system requires a custom automation system.

A Workflow Management System addresses the problems of complexity, scale, reliability, and reusability by providing two things:

* A high-level means by which to describe the various codes that need to be run, along with their runtime requirements and interdependencies.
* An automation engine for reliably managing the execution of the workflow

##Prerequisites
Depending on how the components of a modeling system are designed and how existing software for running them is designed, some changes may be necessary to make use of a workflow management system. In order to take full advantage of the features offered by a workflow management system, the model system components must be well designed. In particular the following best practices should be followed:

Each workflow task must correctly check for its successful completion, and must return a non-zero exit status upon failure. An exit status of 0 means success, regardless of what actually happened. No workflow task should contain automation features. Automation is the workflow managmenet system's responsibility. A workflow management system cannot manage tasks or jobs that it is not aware of. Enable reuse of workflow tasks by using principles of modular design to build autonomous model components with well-defined interfaces for input and output that can be run stand-alone. Prefer the construction of small model components that do only one thing. It is easy to combine several small, well-designed, components together to build a larger, more complex workflow task. It is generally much more difficult to divide large, complex, model components into smaller ones to form multiple workflow tasks. Avoid combining serial and parallel processing in the same workflow task unless the serial processing is very short in duration.

## Documenation
Detailed documentation is provided at http://christopherwharrop.github.io/rocoto/

0 comments on commit 1bd25c0

Please sign in to comment.