-
Notifications
You must be signed in to change notification settings - Fork 102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support NumPy 2.0 (fixes #560) #562
Conversation
@@ -237,7 +237,7 @@ def test_lightgbm_sparse_ranking_model(tmpdir): | |||
|
|||
lgb_model_path = pathlib.Path(tmpdir) / "sparse_ranking_lightgbm.txt" | |||
|
|||
dtrain = lgb.Dataset(X, label=y, group=[X.shape[0]]) | |||
dtrain = lgb.Dataset(X, label=y, group=np.array([X.shape[0]], dtype=np.int32)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is lightgbm
's responsibility to fix.
Passing a list for group
ends up invoking np.array(..., copy=False)
.
def _list_to_1d_numpy(
data: Any,
dtype: "np.typing.DTypeLike",
name: str
) -> np.ndarray:
"""Convert data to numpy 1-D array."""
if _is_numpy_1d_array(data):
return _cast_numpy_array_to_dtype(data, dtype)
elif _is_numpy_column_array(data):
_log_warning('Converting column-vector to 1d array')
array = data.ravel()
return _cast_numpy_array_to_dtype(array, dtype)
elif _is_1d_list(data):
> return np.array(data, dtype=dtype, copy=False)
E ValueError: Unable to avoid copy while creating an array as requested.
E If using `np.array(obj, copy=False)` replace it with `np.asarray(obj)` to allow a copy when needed (no behavior change in NumPy 1.x).
E For more details, see https://numpy.org/devdocs/numpy_2_0_migration_guide.html#adapting-to-changes-in-the-copy-keyword.
Passing a numpy
array instead avoids that code path and allows this test to pass on NumPy 2.x
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't post enough stacktrace there.
To be clear, that is code inside lightgbm
: https://github.com/microsoft/LightGBM/blob/e0ac63568c900405e5e438d95c539f3490919aab/python-package/lightgbm/basic.py#L358
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The int32
is a bit unfortunate, but it, or intc
is necessary here and seems totally safe to keep indefinitely.
src/c_api/model.cc
Outdated
@@ -61,7 +61,7 @@ int TreeliteConcatenateModelObjects( | |||
API_BEGIN(); | |||
std::vector<treelite::Model const*> model_objs(len, nullptr); | |||
std::transform(objs, objs + len, model_objs.begin(), | |||
[](TreeliteModelHandle e) { return static_cast<const treelite::Model*>(e); }); | |||
[](TreeliteModelHandle e) { return static_cast<treelite::Model const*>(e); }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-format
made these style changes to C++ files automatically when I ran
pre-commit run --all-files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks all good and safe to me for adapting to NumPy changes (as a NumPy dev).
@@ -237,7 +237,7 @@ def test_lightgbm_sparse_ranking_model(tmpdir): | |||
|
|||
lgb_model_path = pathlib.Path(tmpdir) / "sparse_ranking_lightgbm.txt" | |||
|
|||
dtrain = lgb.Dataset(X, label=y, group=[X.shape[0]]) | |||
dtrain = lgb.Dataset(X, label=y, group=np.array([X.shape[0]], dtype=np.int32)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The int32
is a bit unfortunate, but it, or intc
is necessary here and seems totally safe to keep indefinitely.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## mainline #562 +/- ##
============================================
- Coverage 84.64% 84.59% -0.05%
============================================
Files 71 71
Lines 6330 6330
Branches 521 521
============================================
- Hits 5358 5355 -3
- Misses 972 975 +3 ☔ View full report in Codecov by Sentry. |
There are a couple CI failures, but it looks like something may be misconfigured. For example, am seeing the following in this CI build:
|
Thanks Hyunsu! 🙏 |
Thanks very much! Once there's a new release of |
Treelite 4.2.1 contains the following improvements: * Compatibility patch for latest RapidJSON (dmlc/treelite#567) * Support for NumPy 2.0 (dmlc/treelite#562). Thanks @jameslamb * Handle certain class of XGBoost models (dmlc/treelite#564) Authors: - Philip Hyunsu Cho (https://github.com/hcho3) Approvers: - Bradley Dice (https://github.com/bdice) - Dante Gama Dessavre (https://github.com/dantegd) - Ray Douglass (https://github.com/raydouglass) - James Lamb (https://github.com/jameslamb) URL: #5908
Fixes #560.
Makes
treelite
compatible with the latestnumpy
release (2.1.0.dev0) and older pre-2.x versions ofnumpy
.As @seberg noted in #560 (comment), the main change here was to change
np.array(..., copy=False)
calls tonp.asarray(...)
calls. As https://numpy.org/doc/stable/reference/generated/numpy.asarray.html notes:Other small changes:
ruff
topre-commit
and itsNPY201
rule to catch use of deprecated or removed NumPy 2.0 code (as suggested in NumPy's migration guide).hypothesis/
directory to.gitignore
hypothesis
installedHow I tested this
I tested
treelite
in apython:3.10
aarch64 container image.how I did that (click me)
Built in a default Python 3.10 image.
docker run \ --rm \ -v $(pwd):/opt/work \ -w /opt/work \ -it python:3.10 \ bash
Built a wheel.
Installed that wheel.
pip install ./dist/*.whl
Installed the latest
numpy
release candidate.Ran the tests.
Then fixed the tests as seen in the diff here, re-built, re-installed, and re-tested until all the tests were passing.
I also quickly scanned through the things being imported and didn't find any that'd be problematic based on the list of changes at https://numpy.org/devdocs/numpy_2_0_migration_guide.html. The
ruff
plugin should catch those anyway.I'm relying on CI here to test these changes against NumPy 1.x.