Merge pull request #171 from uio-bmi/docs_updates
Documentation updates and bugfixes
Showing 258 changed files with 6,929 additions and 6,011 deletions.
@@ -1,10 +1,14 @@
ML method,binary classification,multi-class classification,sequence dataset,receptor dataset,repertoire dataset,model selection CV
AtchleyKmerMILClassifier,✓,✗,✗,✗,✓,✗
BinaryFeatureClassifier,✓,✗,✓,✗,✗,✗
DeepRC,✓,✗,✗,✗,✓,✗
KNN,✓,✓,✓,✓,✓,✓
KerasSequenceCnn,✓,✗,✓,✗,✗,✗
LogisticRegression,✓,✓,✓,✓,✓,✓
PrecomputedKNN,✓,✓,✓,✓,✓,✓
ProbabalisticBinaryClassifier,✓,✗,✗,✗,✓,✗
RandomForestClassifier,✓,✓,✓,✓,✓,✓
ReceptorCNN,✓,✗,✗,✓,✗,✗
SVC,✓,✓,✓,✓,✓,✓
SVM,✓,✓,✓,✓,✓,✓
TCRdistClassifier,✓,✓,✓,✓,✓,✗
Binary file not shown.
Binary file modified: docs/source/_static/images/definitions_instructions_overview.png (BIN +38.4 KB, 100%)
Binary file not shown.
@@ -0,0 +1,62 @@

To avoid recomputing the same result, immuneML uses caching.
Caching can be applied to methods which compute an (intermediate) result.
The result is stored in a file, and when the same method call is made again, the previously
stored result is retrieved from the file and returned.

We recommend applying caching to methods which are computationally expensive and may be called
multiple times in the same way. For example, encoders are a good target for caching as they
may take a long time to compute and can be called multiple times on the same data when combined
with different ML methods. ML methods, in contrast, typically do not require caching, as you would
usually apply them with different parameters or to differently encoded data.

Any method call in immuneML can be cached as follows:

.. code:: python

    result = CacheHandler.memo_by_params(params=cache_params, fn=lambda: my_method_for_caching(my_method_param1, my_method_param2, ...))

The :code:`CacheHandler.memo_by_params` method does the following:

- Using the caching parameters, a unique cache key is created (a random-looking string that is nevertheless the same every time the same parameters are given).
- CacheHandler checks whether a previously computed result is already associated with this key.
- If the result exists, it is returned without (re)computing the method.
- If the result does not exist, the method is computed, its result is stored using the cache key, and the result is returned.
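
The four steps above can be sketched in a few lines of plain Python. This is a minimal illustration and not immuneML's actual implementation: the function name :code:`memo_by_params_sketch`, the temporary cache directory, the SHA-256 key and the pickle storage format are all assumptions made for this example.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path

# Hypothetical cache directory for this sketch; not immuneML's real cache location.
CACHE_DIR = Path(tempfile.mkdtemp())


def memo_by_params_sketch(params: tuple, fn):
    """Return the stored result for params, computing and storing it on a cache miss."""
    # Step 1: derive a deterministic key, so equal params always give the same key.
    key = hashlib.sha256(str(params).encode("utf-8")).hexdigest()
    cache_file = CACHE_DIR / f"{key}.pickle"

    # Steps 2 and 3: if a result is stored under this key, return it without computing.
    if cache_file.is_file():
        with cache_file.open("rb") as file:
            return pickle.load(file)

    # Step 4: otherwise compute, store under the key, and return.
    result = fn()
    with cache_file.open("wb") as file:
        pickle.dump(result, file)
    return result


calls = []


def expensive_square(x):
    calls.append(x)  # record every real computation
    return x * x


first = memo_by_params_sketch((("x", 4),), lambda: expensive_square(4))
second = memo_by_params_sketch((("x", 4),), lambda: expensive_square(4))
```

In this sketch the second call reads the stored file, so :code:`expensive_square` runs only once.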

The :code:`lambda` function simply calls the method to be cached, using any required parameters.
The :code:`cache_params` argument represents the unique, immutable parameters used to compute the cache key.
It should have the following properties:

- It must be a nested tuple containing *only* immutable items such as strings, booleans and integers.
  It cannot contain mutable items like lists, dictionaries, sets and objects (these all need to be converted to nested tuples of immutable items).
- It should include *every* factor that can contribute to a difference in the results of the computed method.
  For example, when caching the encode_data step, the following should be included:

  - dataset descriptors (dataset id, example ids, dataset type),
  - encoding name,
  - labels,
  - :code:`EncoderParams.learn_model` if used,
  - all relevant input parameters to the encoder. These are preferably retrieved automatically (such as by :code:`vars(self)`),
    as this ensures that if new parameters are added to the encoder, they are always added to the caching params.
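
As an illustration of the first requirement, mutable containers can be converted to nested tuples recursively. The helper below, :code:`to_immutable`, is a hypothetical name introduced for this sketch and is not part of immuneML; it assumes that dictionary keys and set members have a stable :code:`str`/:code:`repr`, and it does not handle arbitrary objects.

```python
def to_immutable(item):
    """Recursively convert dicts, lists and sets into nested tuples."""
    if isinstance(item, dict):
        # Sort by key so that insertion order does not change the resulting tuple.
        pairs = sorted(item.items(), key=lambda pair: str(pair[0]))
        return tuple((key, to_immutable(value)) for key, value in pairs)
    if isinstance(item, (list, tuple)):
        return tuple(to_immutable(value) for value in item)
    if isinstance(item, (set, frozenset)):
        # Sets are unordered, so sort members to get a reproducible tuple.
        return tuple(sorted((to_immutable(value) for value in item), key=repr))
    # Strings, booleans, integers etc. are already immutable.
    return item


encoder_settings = {"k": 3, "alphabet": ["A", "C"], "flags": {"b", "a"}}
cache_params = to_immutable(encoder_settings)
```

Here :code:`cache_params` is a nested tuple that can be hashed, unlike the original dictionary.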

For example, :py:obj:`~immuneML.encodings.onehot.OneHotEncoder.OneHotEncoder` computes its
caching parameters as follows:

.. code:: python

    def _prepare_caching_params(self, dataset, params: EncoderParams):
        return (("dataset_identifier", dataset.identifier),
                ("example_identifiers", tuple(dataset.get_example_ids())),
                ("dataset_type", dataset.__class__.__name__),
                ("encoding", OneHotEncoder.__name__),
                ("labels", tuple(params.label_config.get_labels_by_name())),
                ("encoding_params", tuple(vars(self).items())))

The caching parameters must be constructed carefully, as caching bugs are extremely difficult
to discover. It is better to add 'too much' information than too little.
A missing parameter will not lead to an error, but can result in silently reusing
results from previous method calls that used different settings.
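
The effect of a missing parameter can be reproduced with a toy in-memory cache. Everything in this sketch (the :code:`memo` helper, the :code:`encode` function and the dictionary-based cache) is hypothetical and only serves to show the failure mode described above.

```python
cache = {}


def memo(params, fn):
    """Toy cache: compute fn() only if params has not been seen before."""
    if params not in cache:
        cache[params] = fn()
    return cache[params]


def encode(sequence, k):
    """Split a sequence into overlapping k-mers."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# Incomplete key: k influences the result but is missing from the params.
bad_k2 = memo((("sequence", "CASSL"),), lambda: encode("CASSL", 2))
bad_k3 = memo((("sequence", "CASSL"),), lambda: encode("CASSL", 3))  # silently returns the k=2 result

# Complete key: every factor that changes the result is part of the params.
good_k2 = memo((("sequence", "CASSL"), ("k", 2)), lambda: encode("CASSL", 2))
good_k3 = memo((("sequence", "CASSL"), ("k", 3)), lambda: encode("CASSL", 3))
```

No error is raised for :code:`bad_k3`; it simply contains 2-mers instead of 3-mers, which is why such bugs are hard to notice.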
33 changes: 33 additions & 0 deletions
docs/source/developer_docs/class_documentation_standards.rst
@@ -0,0 +1,33 @@
Class documentation should be added as a docstring to all new Encoder, MLMethod, Report or Preprocessing classes.
The class docstrings are used to automatically generate the documentation web pages, using Sphinx `reStructuredText <https://www.sphinx-doc.org/en/master/usage/restructuredtext/index.html>`_, and should adhere to a standard format:

#. A short, general description of the functionality

#. Optional extended description, including any references or specific cases that should be considered. For instance: if a class can only be used for a particular dataset type. Compatibility between Encoders, MLMethods and Reports should also be described.

#. A list of arguments, when applicable. This should follow the format below:

    .. code::

        **Specification arguments:**

        - parameter_name (type): a short description

        - other_parameter_name (type): a short description

#. A YAML snippet, to show an example of how the new component should be called. Make sure to test your YAML snippet in an immuneML run to ensure it is specified correctly. The following formatting should be used to ensure the YAML snippet is rendered correctly:

    .. code::

        **YAML specification:**

        .. indent with spaces
        .. code-block:: yaml

            definitions:
                yaml_keyword: # could be encodings/ml_methods/reports/etc...
                    my_new_class:
                        MyNewClass:
                            parameter_name: 0
                            other_parameter_name: 1
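
A quick way to spot docstrings that do not follow this format is to look for the two section markers. The helper below is a hypothetical sketch, not an immuneML utility, and is only a heuristic: it checks that the markers are present, not that the argument list or YAML snippet is correct, and the argument list is only required when a class has arguments.

```python
# Section markers taken from the documentation standard described above.
REQUIRED_SECTIONS = ("**Specification arguments:**", "**YAML specification:**")


def missing_docstring_sections(cls) -> list:
    """Return the section markers that do not occur in the class docstring."""
    doc = cls.__doc__ or ""
    return [section for section in REQUIRED_SECTIONS if section not in doc]


class MyNewClass:
    """
    A short, general description of the functionality.

    **Specification arguments:**

    - parameter_name (int): a short description

    **YAML specification:**

    .. indent with spaces
    .. code-block:: yaml

        definitions:
            reports:
                my_new_class:
                    MyNewClass:
                        parameter_name: 0
    """


class Undocumented:
    pass


documented_missing = missing_docstring_sections(MyNewClass)
undocumented_missing = missing_docstring_sections(Undocumented)
```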
10 changes: 10 additions & 0 deletions
docs/source/developer_docs/coding_conventions_and_tips.rst
@@ -0,0 +1,10 @@
.. note::

  **Coding conventions and tips**

  #. Class names are written in CamelCase
  #. Class methods are written in snake_case
  #. Abstract base classes :code:`MLMethod`, :code:`DatasetEncoder`, and :code:`Report` define an interface for their inheriting subclasses. These classes contain abstract methods which should be overridden.
  #. Class methods starting with _underscore are generally considered "private" methods, only to be called by the class itself. If a method is expected to be called from another class, the method name should not start with an underscore.
  #. When familiarising yourself with existing code, we recommend focusing on public methods. Private methods are typically specific to a class (internal class-specific calculations), whereas the public methods contain more general functionalities (e.g., returning a main result).
  #. If your class should have any default parameters, they should be defined in a default parameters file under :code:`config/default_params/`.
  #. Some utility classes are available in the :code:`util` package to provide useful functionalities. For example, :py:obj:`~immuneML.util.ParameterValidator.ParameterValidator` can be used to check user input and generate error messages, or :py:obj:`~immuneML.util.PathBuilder.PathBuilder` can be used to add and remove folders.
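
The naming and inheritance conventions above can be illustrated with a small, self-contained example. The classes :code:`ExampleReport` and :code:`SequenceCountReport` are hypothetical stand-ins invented for this sketch; they are not immuneML's :code:`Report` classes and do not reproduce their interface.

```python
from abc import ABC, abstractmethod


class ExampleReport(ABC):  # CamelCase class name; hypothetical abstract base class
    @abstractmethod
    def generate_report(self) -> str:  # snake_case public method, part of the interface
        ...


class SequenceCountReport(ExampleReport):
    def __init__(self, sequences):
        self.sequences = sequences

    def generate_report(self) -> str:
        # Public method: overrides the abstract method and returns the main result.
        return f"{self._count_unique()} unique sequences"

    def _count_unique(self) -> int:
        # "Private" helper: class-specific calculation, only called from within the class.
        return len(set(self.sequences))


report_text = SequenceCountReport(["CASS", "CASS", "CAWS"]).generate_report()
```

Because :code:`generate_report` is abstract, instantiating :code:`ExampleReport` directly raises a :code:`TypeError`, which is how Python enforces that subclasses override it.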
This file was deleted.
@@ -0,0 +1,9 @@
**EncodedData:**

- :code:`examples`: a design matrix where the rows represent Repertoires, Receptors or Sequences ('examples'), and the columns represent the encoding-specific features. This is typically a numpy matrix, but may also be another matrix type (e.g., scipy sparse matrix, pytorch tensor, pandas dataframe).
- :code:`encoding`: a string denoting the encoder base class that was used.
- :code:`labels`: a dictionary of labels, where each label is a key, and the values are the label values across the examples (for example: {disease1: [positive, positive, negative]} if there are 3 repertoires). This parameter should be set only if :code:`EncoderParams.encode_labels` is True, otherwise it should be set to None. This can be created by calling the utility function :code:`EncoderHelper.encode_dataset_labels()`.
- :code:`example_ids`: a list of identifiers for the examples (Repertoires, Receptors or Sequences). This can be retrieved using :code:`Dataset.get_example_ids()`.
- :code:`feature_names`: a list of feature names, i.e., the names given to the encoding-specific features. When included, the list must be as long as the number of features.
- :code:`feature_annotations`: an optional pandas dataframe with additional information about the features. When included, the number of rows in this dataframe must correspond to the number of features. This parameter is not typically used.
- :code:`info`: an optional dictionary that may be used to store any additional information that is relevant (for example paths to additional output files). This parameter is not typically used.
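
The size constraints in this list (one identifier and one label value per example, one name per feature) can be made concrete with a small stand-in class. :code:`EncodedDataSketch` is a hypothetical dataclass written for this example; it uses plain lists instead of a numpy matrix and does not reproduce immuneML's real :code:`EncodedData` class or its checks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EncodedDataSketch:
    """Hypothetical stand-in for EncodedData, with examples as a list of rows."""
    examples: list
    encoding: str
    example_ids: list
    feature_names: Optional[list] = None
    labels: Optional[dict] = None

    def __post_init__(self):
        n_examples = len(self.examples)
        n_features = len(self.examples[0]) if n_examples > 0 else 0

        # One identifier per row of the design matrix.
        if len(self.example_ids) != n_examples:
            raise ValueError("example_ids must have one entry per example")
        # When included, one name per column of the design matrix.
        if self.feature_names is not None and len(self.feature_names) != n_features:
            raise ValueError("feature_names must have one entry per feature")
        # When included, one label value per example for every label.
        if self.labels is not None:
            for label_name, values in self.labels.items():
                if len(values) != n_examples:
                    raise ValueError(f"label '{label_name}' must have one value per example")


encoded = EncodedDataSketch(examples=[[1, 0], [0, 1], [1, 1]],
                            encoding="MyEncoder",  # placeholder encoder name
                            example_ids=["rep1", "rep2", "rep3"],
                            feature_names=["feature_a", "feature_b"],
                            labels={"disease1": ["positive", "positive", "negative"]})
```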
@@ -0,0 +1,89 @@
from pathlib import Path

import numpy as np
import pandas as pd
import plotly.express as px

from immuneML.data_model.dataset.Dataset import Dataset
from immuneML.reports.ReportOutput import ReportOutput
from immuneML.reports.ReportResult import ReportResult
from immuneML.reports.data_reports.DataReport import DataReport
from immuneML.util.ParameterValidator import ParameterValidator
from immuneML.util.PathBuilder import PathBuilder


class RandomDataPlot(DataReport):
    """
    This RandomDataPlot is a placeholder for a real Report.
    It plots some random numbers.

    **Specification arguments:**

    - n_points_to_plot (int): The number of random points to plot.


    **YAML specification:**

    .. indent with spaces
    .. code-block:: yaml

        definitions:
            reports:
                my_report:
                    RandomDataPlot:
                        n_points_to_plot: 10
    """

    @classmethod
    def build_object(cls, **kwargs):
        # Here you may check the values of given user parameters
        # This will ensure immuneML will crash early (upon parsing the specification) if incorrect parameters are specified
        ParameterValidator.assert_type_and_value(kwargs['n_points_to_plot'], int, RandomDataPlot.__name__, 'n_points_to_plot', min_inclusive=1)

        return RandomDataPlot(**kwargs)

    def __init__(self, dataset: Dataset = None, result_path: Path = None, number_of_processes: int = 1, name: str = None,
                 n_points_to_plot: int = None):
        super().__init__(dataset=dataset, result_path=result_path, number_of_processes=number_of_processes, name=name)
        self.n_points_to_plot = n_points_to_plot

    def check_prerequisites(self):
        # Here you may check properties of the dataset (e.g. dataset type), or parameter-dataset compatibility
        # and return False if the prerequisites are incorrect.
        # This will generate a user-friendly error message and ensure immuneML does not crash, but instead skips the report.
        # Note: parameters should be checked in 'build_object'
        return True

    def _generate(self) -> ReportResult:
        PathBuilder.build(self.result_path)
        df = self._get_random_data()

        # utility function for writing a dataframe to a csv file
        # and creating a ReportOutput object containing the reference
        report_output_table = self._write_output_table(df, self.result_path / 'random_data.csv', name="Random data file")

        # Calling _safe_plot will internally call _plot, but ensure immuneML does not crash if errors occur
        report_output_fig = self._safe_plot(df=df)

        # Ensure output is either None or a list with item (not an empty list or list containing None)
        output_tables = None if report_output_table is None else [report_output_table]
        output_figures = None if report_output_fig is None else [report_output_fig]

        return ReportResult(name=self.name,
                            info="Some random numbers",
                            output_tables=output_tables,
                            output_figures=output_figures)

    def _get_random_data(self):
        return pd.DataFrame({"random_data_dim1": np.random.rand(self.n_points_to_plot),
                             "random_data_dim2": np.random.rand(self.n_points_to_plot)})

    def _plot(self, df: pd.DataFrame) -> ReportOutput:
        figure = px.scatter(df, x="random_data_dim1", y="random_data_dim2", template="plotly_white")
        figure.update_layout(template="plotly_white")

        file_path = self.result_path / "random_data.html"
        figure.write_html(str(file_path))
        return ReportOutput(path=file_path, name="Random data plot")