[R] Add developer guide about handling of parameters in R #11117

Merged
merged 1 commit into from
Jan 2, 2025
28 changes: 28 additions & 0 deletions doc/R-package/adding_parameters.rst
@@ -0,0 +1,28 @@
.. _adding_parameters:

Developer guide: parameters from core library
=============================================

The XGBoost core library accepts a long list of input parameters (e.g. ``max_depth`` for decision trees, regularization terms, ``device`` for where computation happens, etc.). New parameters are added regularly as XGBoost is developed further, and each language binding should allow passing everything that the core library accepts.

In the case of R, these parameters are passed as an R ``list`` object to the function ``xgb.train``, but the R interface aims to provide a better, more idiomatic user experience by offering a parameters constructor with full in-package documentation. This requires keeping the list of parameters and their documentation up to date **in the R package** too, in addition to the general online documentation for XGBoost.

In more detail, there is a function ``xgb.params`` which allows the user to construct such a ``list`` object to pass to ``xgb.train`` while getting full IDE autocompletion on it. This function should accept all possible XGBoost parameters as arguments, listing them in the same order as they appear in the online documentation.
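
The constructor pattern described above can be sketched in a few lines of base R. This is only an illustration, not the actual implementation: ``toy.params`` is a hypothetical stand-in for ``xgb.params`` with just three arguments, and it assumes that arguments left as ``NULL`` are dropped from the resulting list.

```r
# Hypothetical stand-in for xgb.params (not the real function): every
# parameter is an explicit argument, so IDEs can autocomplete it.
toy.params <- function(objective = NULL, learning_rate = NULL, max_depth = NULL) {
  params <- list(
    objective = objective,
    learning_rate = learning_rate,
    max_depth = max_depth
  )
  # Assumption: parameters left as NULL are omitted, so that defaults
  # are decided elsewhere rather than in this constructor.
  params[!vapply(params, is.null, logical(1))]
}

params <- toy.params(objective = "reg:squarederror", max_depth = 3)
str(params)
```

Because each parameter is a named argument rather than a free-form list entry, adding a new core-library parameter means extending both the signature and its documentation.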

In order to add a new parameter from the core library to ``xgb.params``:

- Add the parameter at the right location, according to the order in which it appears in the .rst file listing the parameters for the core library. If the parameter appears more than once (e.g. because it applies to more than one type of booster), then add it in a position according to the first occurrence.
- Copy-paste the docs from the .rst file as another ``@param`` entry for ``xgb.params``. Some easy substitutions might be needed, such as changing double-backticks to single-backticks, quoting values that need to be passed as strings, and replacing ``:math:`` roles with their roxygen equivalent ``\eqn{}``, among others.
- If needed, make minimal modifications for the R interface. For example, since each parameter is only listed once, add a note at the beginning about which type of booster it applies to if it is only applicable to one type, or list default values by booster type if they differ.
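
Two of the substitutions mentioned in the steps above are mechanical enough to sketch with regular expressions. The helper ``rst_to_roxygen`` below is hypothetical (it is not part of the package), handles only the double-backtick and ``:math:`` cases, and the sample sentence is made up for illustration; the result still needs a manual review.

```r
# Hypothetical helper (not part of xgboost): apply two of the "easy
# substitutions" when moving parameter docs from .rst to roxygen.
rst_to_roxygen <- function(txt) {
  # :math:`...` role -> \eqn{...}
  txt <- gsub(":math:`([^`]*)`", "\\\\eqn{\\1}", txt)
  # ``literal`` -> `literal`
  txt <- gsub("``([^`]*)``", "`\\1`", txt)
  txt
}

# Made-up sample text in the style of the core library's .rst docs.
rst_doc <- "L2 regularization term :math:`\\lambda`; see also ``alpha``."
roxygen_doc <- rst_to_roxygen(rst_doc)
cat("#' @param lambda", roxygen_doc, "\n")
```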

After adding the parameter to ``xgb.params``, it also needs to be added to the function ``xgboost`` if that function can use it. The function ``xgboost`` is not meant to support everything that the core library offers. For example, parameters related to learning-to-rank are currently not listed there, as they are unusable for it (but they can be used with ``xgb.train``).

In order to add the parameter to ``xgboost``:

- Add it to the function signature. The position here differs, though: a few selected parameters have been moved closer to the top of the signature, and new parameters should not be placed within those "top" positions. Instead, place the new parameter after ``tree_method``, in the position among the remaining parameters that best matches where it was inserted in ``xgb.params``. The parameters that come after ``tree_method`` are still meant to follow the same relative order as in ``xgb.params``.
- If the parameter applies exactly in the same way as in ``xgb.train``, then no additional documentation is needed for ``xgboost``, because it inherits parameters from ``xgb.params`` by default. However, some parameters might need slight modifications - for example, not all objectives are supported by ``xgboost``, so modifications are needed for that parameter.
- If the parameter allows aliases, use only one alias, and prefer the most descriptive name (e.g. ``learning_rate`` instead of ``eta``). Such parameters also need their own ``@param`` doc entry in ``xgboost``, as the one in ``xgb.params`` will mention the unsupported alias.
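
The ordering rule for the signature can be checked mechanically. The sketch below is an assumption-laden illustration: ``same_relative_order`` is a hypothetical helper, and the three toy functions stand in for ``xgb.params`` and ``xgboost`` with made-up, shortened signatures. It compares the arguments that follow ``tree_method`` in the wrapper against their order in the constructor.

```r
# Hypothetical check (not part of xgboost): do the arguments that come
# after `after` in `wrapper` keep the relative order used in `constructor`?
same_relative_order <- function(wrapper, constructor, after = "tree_method") {
  w <- names(formals(wrapper))
  cn <- names(formals(constructor))
  stopifnot(after %in% w)
  tail_args <- w[seq_along(w) > match(after, w)]
  shared <- tail_args[tail_args %in% cn]   # order as in the wrapper
  reference <- cn[cn %in% shared]          # order as in the constructor
  identical(shared, reference)
}

# Toy signatures, for illustration only.
toy_constructor <- function(objective = NULL, booster = NULL, learning_rate = NULL,
                            max_depth = NULL, tree_method = NULL,
                            subsample = NULL, new_param = NULL) NULL
toy_wrapper_ok <- function(x, y, objective = NULL, max_depth = NULL,
                           learning_rate = NULL, tree_method = NULL,
                           booster = NULL, subsample = NULL, new_param = NULL) NULL
toy_wrapper_bad <- function(x, y, objective = NULL, max_depth = NULL,
                            learning_rate = NULL, tree_method = NULL,
                            new_param = NULL, booster = NULL, subsample = NULL) NULL

ok <- same_relative_order(toy_wrapper_ok, toy_constructor)
bad <- same_relative_order(toy_wrapper_bad, toy_constructor)
```

Arguments placed before ``tree_method`` are ignored by the check, which mirrors the exemption for the parameters that were deliberately moved to the top of the signature.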

As new objectives and evaluation metrics are added, be mindful that they need to be added to the docs of both ``xgb.params`` and ``xgboost``. Documentation for objectives in both functions was originally copied from the same .rst file for the core library, but for ``xgboost`` it undergoes additional modifications in order to list what is and isn't supported, and to refer only to the parameter aliases that are accepted by ``xgboost``.

Keep in mind also that objectives which are variants of one another but with a different prediction mode are not meant to be allowed in ``xgboost``, as they would break its intended interface. Such objectives are therefore not described in the docs for ``xgboost`` (although there is a list at the end of what it does not support) and are checked against in the function ``prescreen.objective``.
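
The kind of check described above might look roughly like the following sketch. It is not the actual ``prescreen.objective``: the function name ``toy_prescreen_objective`` is hypothetical, and although the objective names are real core-library objectives, treating exactly these two as rejected variants is an assumption made for illustration.

```r
# Hypothetical sketch (not the real prescreen.objective): reject objectives
# that only change the prediction mode of another objective.
toy_prescreen_objective <- function(objective) {
  # Illustrative assumption: variant -> objective to suggest instead.
  variants <- c(
    "binary:logitraw" = "binary:logistic",
    "multi:softmax" = "multi:softprob"
  )
  if (objective %in% names(variants)) {
    stop(sprintf(
      "Objective '%s' is not supported here; consider '%s' instead.",
      objective, variants[[objective]]
    ))
  }
  invisible(objective)
}

accepted <- toy_prescreen_objective("reg:squarederror")
rejected <- tryCatch(
  toy_prescreen_objective("multi:softmax"),
  error = function(e) conditionMessage(e)
)
```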
1 change: 1 addition & 0 deletions doc/R-package/index.rst
@@ -35,3 +35,4 @@ Other topics
:titlesonly:

Handling of indexable elements <index_base>
Developer guide: parameters from core library <adding_parameters>