Commit

(docs) core documentation overhaul
amkrajewski committed Feb 29, 2024
1 parent 5f7cbdc commit d28a510
Showing 3 changed files with 34 additions and 18 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/documentation.yaml
@@ -35,6 +35,8 @@ jobs:
nim doc --project --index:on --outdir:docs --git.url:https://github.com/amkrajewski/nimCSO --git.commit:main src/nimcso
sed -i '0,/src\/nimcso/s//nimCSO/;0,/src\/nimcso/s//nimCSO/' docs/nimcso.html
cp docs/nimcso.html docs/index.html
mkdir -p docs/assets
cp -r paper/assets docs/assets
- name: Setup Pages
uses: actions/configure-pages@v4
46 changes: 29 additions & 17 deletions docs/docs.nim
@@ -1,22 +1,34 @@

## **Navigation:** [nimCSO](nimcso.html) (core library) | [Changelog](docs/changelog.html) | [nimcso/bitArrayAutoconfigured](nimcso/bitArrayAutoconfigured.html)
##
## **nim** **C**omposition **S**pace **O**ptimization is a high-performance, low-level tool for selecting sets of components (dimensions) in compositional spaces, which optimize the data availability
## given a constraint on the number of components to be selected. Ability to do so is crucial for deploying machine learning (ML) algorithms, so that they can be designed in a way balancing their
## accuracy and domain of applicability. However, this becomes a combinatorically hard problem for complex compositions existing in highly dimensional spaces due to the interdependency of components
## being present. For instance, removing datapoints many low-frequency components
##
##
##
## Such spaces are often encountered in materials science, where datasets on Compositionally Complex Materials (CCMs) often span 20-40 chemical elements, while each data point contains
## several of them.
##
##
##
##
## This tool employs a set of methods, ranging from (1) brute-force search through (2) genetic algorithms to (3) a newly designed search method. They use custom data structures and procedures written in Nim language, which are compile-time optimized for the specific problem statement and dataset pair, which allows nimCSO to run faster and use 1-2 orders of magnitude less memory than general-purpose data structures. All configuration is done with a simple human-readable config file, allowing easy modification of the search method and its parameters.
##
##

## **nim** **C**omposition **S**pace **O**ptimization is a high-performance tool implementing several methods for selecting components (data dimensions) in compositional datasets, which
## optimize the data availability and density for applications such as machine learning (ML) given a constraint on the number of components to be selected. Ability to do so is crucial for
## deploying machine learning (ML) algorithms, so that they can be designed in a way balancing their accuracy and domain of applicability. Making said choice is a combinatorically hard
## problem when data is composed of a large number of independent components due to the interdependency of components being present. Thus, efficiency of the search becomes critical for any
## application where interaction between components is of interest in a modeling effort, ranging from market economics, through medicine where drug interactions can have a significant
## impact on the treatment, to materials science, where the composition and processing history are critical to resulting properties.
##
## We are particularly interested in the latter case of materials science, where we utilize `nimCSO` to optimize ML deployment over our datasets on Compositionally Complex Materials (CCMs)
## which are the largest ever collected (from almost 550 publications) spanning up to 60 dimensions and developed within the [ULTERA Project (ultera.org)](https://ultera.org) carried out under the
## [US DOE ARPA-E ULTIMATE](https://arpa-e.energy.gov/?q=arpa-e-programs/ultimate) program which aims to develop
## a new generation of ultra-high temperature materials for aerospace applications, through generative machine learning models [10.20517/jmi.2021.05](https://doi.org/10.20517/jmi.2021.05)
## driving thermodynamic modeling and experimentation [10.2139/ssrn.4689687](https://dx.doi.org/10.2139/ssrn.4689687).
##
## At its core, `nimCSO` leverages the metaprogramming ability of the [Nim language](https://nim-lang.org) to optimize itself at compile time, both in terms of speed and memory handling,
## to the specific problem statement and dataset at hand based on a human-readable configuration file. As demonstrated later in benchmarks, `nimCSO` reaches the physical limits of the hardware
## (L1 cache latency) and can outperform an efficient native Python implementation over 400 times in terms of speed and 50 times in terms of memory usage (*not* counting the interpreter), while
## also outperforming a NumPy implementation 35 and 17 times, respectively, when checking a candidate solution.
##
## .. figure:: assets/nimCSO_mainFigure.png
## :alt: Main nimCSO figure
##
## `nimCSO` is designed to be both (1) a user-ready tool (see figure above), implementing efficient brute force approaches (for handling up to 25 dimensions), a custom search algorithm
## (for up to 40 dimensions), and a genetic algorithm (for any dimensionality), and (2) a scaffold for building even more elaborate methods in the future, including heuristics going beyond
## data availability. All configuration is done with a simple human-readable `YAML` config file and plain text data files, making it easy to modify the search method and its parameters with
## no knowledge of programming and only basic command line skills. A single command is used to recompile (`nim c -f`) and run (`-r`) the problem (`-d:configPath=config.yaml`) with `nimCSO`
## (`src/nimcso`) using one of several methods. Advanced users can also quickly customize the provided methods with brief scripts using `nimCSO` as a data-centric library.
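
As a rough illustration of the single command described in the paragraph above, the flags quoted there can be assembled as follows. This is only a sketch: it assumes Nim is installed, that the command is issued from the repository root, and that a `config.yaml` is present there. The `CONFIG_PATH` and `NIMCSO_CMD` variable names are illustrative and not part of `nimCSO`, and the method-selection options are not named in this excerpt, so they are left out.

```shell
# Assumption: executed from the repository root with Nim installed.
CONFIG_PATH="config.yaml"   # hypothetical location of the YAML config file

# -f forces a full rebuild, -r runs the binary after compiling, and
# -d:configPath=... passes the config path as a compile-time define.
NIMCSO_CMD="nim c -f -r -d:configPath=${CONFIG_PATH} src/nimcso"

# Print the assembled command; to actually run it: eval "$NIMCSO_CMD"
echo "$NIMCSO_CMD"
```

Because the config path is passed as a compile-time define, pointing `CONFIG_PATH` at a different file means recompiling, which is why the documented command includes the forced rebuild flag.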


## # Usage
##
##
4 changes: 3 additions & 1 deletion src/nimcso.nim
@@ -6,13 +6,15 @@
{.passL: "-flto".}

when defined(nimdoc):
# Core documentation living in the root of the project
# Core documentation living in the root of the project.
include ../docs/docs

when defined(nimdoc):
# Documentation on benchmarks, living alongside them.
include ../benchmarks/docs

when defined(nimdoc):
# Documentation on the tests being run, living alongside them.
include ../tests/docs

# Standard library imports. One per line for easy change tracking.
