Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support NumPy 2.0 (fixes #560) #562

Merged
merged 2 commits into from
May 23, 2024
Merged

support NumPy 2.0 (fixes #560) #562

merged 2 commits into from
May 23, 2024

Conversation

jameslamb
Copy link
Contributor

Fixes #560.

Makes treelite compatible with the latest numpy release (2.1.0.dev0) and older pre-2.x versions of numpy.

As @seberg noted in #560 (comment), the main change here was to change np.array(..., copy=False) calls to np.asarray(...) calls. As https://numpy.org/doc/stable/reference/generated/numpy.asarray.html notes:

*No copy is performed if the input is already an ndarray with matching dtype and order. *

Other small changes:

  • adds ruff to pre-commit and its NPY201 rule to catch use of deprecated or removed NumPy 2.0 code (as suggested in NumPy's migration guide)
    • this did not catch any other issues, but good protection. to ensure this support keeps working
  • adds .hypothesis/ directory to .gitignore
    • this was leftover from running the tests locally with hypothesis installed

How I tested this

I tested treelite in a python:3.10 aarch64 container image.

how I did that (click me)

Built in a default Python 3.10 image.

docker run \
    --rm \
    -v $(pwd):/opt/work \
    -w /opt/work \
    -it python:3.10 \
    bash

Built a wheel.

apt-get update
apt-get install -y --no-install-recommends \
    build-essential \
    cmake

pip install \
    build

mkdir -p ./dist
python -m build \
    --wheel \
    --outdir ./dist \
    ./python

Installed that wheel.

pip install ./dist/*.whl

Installed the latest numpy release candidate.

pip install \
    --extra-index-url https://pypi.anaconda.org/scientific-python-nightly-wheels/simple \
    'numpy==2.*,>=0.0.0a'
# 2.1.0.dev0/numpy-2.1.0.dev0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl

Ran the tests.

pip install \
    hypothesis \
    lightgbm \
    pytest \
    pytest-cov \
    scikit-learn \
    xgboost

pytest tests/python

Then fixed the tests as seen in the diff here, re-built, re-installed, and re-tested until all the tests were passing.

I also quickly scanned through the things being imported and didn't find any that'd be problematic based on the list of changes at https://numpy.org/devdocs/numpy_2_0_migration_guide.html. The ruff plugin should catch those anyway.

git grep -E 'np\.'
git grep -E 'numpy\.'

I'm relying on CI here to test these changes against NumPy 1.x.

@@ -237,7 +237,7 @@ def test_lightgbm_sparse_ranking_model(tmpdir):

lgb_model_path = pathlib.Path(tmpdir) / "sparse_ranking_lightgbm.txt"

dtrain = lgb.Dataset(X, label=y, group=[X.shape[0]])
dtrain = lgb.Dataset(X, label=y, group=np.array([X.shape[0]], dtype=np.int32))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is lightgbm's responsibility to fix.

Passing a list for group ends up invoking np.array(..., copy=False).

    def _list_to_1d_numpy(
        data: Any,
        dtype: "np.typing.DTypeLike",
        name: str
    ) -> np.ndarray:
        """Convert data to numpy 1-D array."""
        if _is_numpy_1d_array(data):
            return _cast_numpy_array_to_dtype(data, dtype)
        elif _is_numpy_column_array(data):
            _log_warning('Converting column-vector to 1d array')
            array = data.ravel()
            return _cast_numpy_array_to_dtype(array, dtype)
        elif _is_1d_list(data):
>           return np.array(data, dtype=dtype, copy=False)
E           ValueError: Unable to avoid copy while creating an array as requested.
E           If using `np.array(obj, copy=False)` replace it with `np.asarray(obj)` to allow a copy when needed (no behavior change in NumPy 1.x).
E           For more details, see https://numpy.org/devdocs/numpy_2_0_migration_guide.html#adapting-to-changes-in-the-copy-keyword.

Passing a numpy array instead avoids that code path and allows this test to pass on NumPy 2.x

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't post enough stacktrace there.

To be clear, that is code inside lightgbm: https://github.com/microsoft/LightGBM/blob/e0ac63568c900405e5e438d95c539f3490919aab/python-package/lightgbm/basic.py#L358

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The int32 is a bit unfortunate, but it, or intc is necessary here and seems totally safe to keep indefinitely.

@jameslamb jameslamb marked this pull request as ready for review May 16, 2024 15:36
@jameslamb jameslamb changed the title WIP: support NumPy 2.0 (fixes #560) support NumPy 2.0 (fixes #560) May 16, 2024
@@ -61,7 +61,7 @@ int TreeliteConcatenateModelObjects(
API_BEGIN();
std::vector<treelite::Model const*> model_objs(len, nullptr);
std::transform(objs, objs + len, model_objs.begin(),
[](TreeliteModelHandle e) { return static_cast<const treelite::Model*>(e); });
[](TreeliteModelHandle e) { return static_cast<treelite::Model const*>(e); });
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-format made these style changes to C++ files automatically when I ran

pre-commit run --all-files

Copy link

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks all good and safe to me for adapting to NumPy changes (as a NumPy dev).

@@ -237,7 +237,7 @@ def test_lightgbm_sparse_ranking_model(tmpdir):

lgb_model_path = pathlib.Path(tmpdir) / "sparse_ranking_lightgbm.txt"

dtrain = lgb.Dataset(X, label=y, group=[X.shape[0]])
dtrain = lgb.Dataset(X, label=y, group=np.array([X.shape[0]], dtype=np.int32))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The int32 is a bit unfortunate, but it, or intc is necessary here and seems totally safe to keep indefinitely.

Copy link

codecov bot commented May 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.59%. Comparing base (890f605) to head (161253c).
Report is 11 commits behind head on mainline.

Additional details and impacted files
@@             Coverage Diff              @@
##           mainline     #562      +/-   ##
============================================
- Coverage     84.64%   84.59%   -0.05%     
============================================
  Files            71       71              
  Lines          6330     6330              
  Branches        521      521              
============================================
- Hits           5358     5355       -3     
- Misses          972      975       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jakirkham
Copy link

There are a couple CI failures, but it looks like something may be misconfigured. For example, am seeing the following in this CI build:

Run conda info
/Users/runner/work/_temp/4c3cc175-ed2f-4fc9-bc2e-3f8e90ad7012.sh: line 1: conda: command not found

@hcho3 hcho3 merged commit 26103a5 into dmlc:mainline May 23, 2024
18 of 19 checks passed
@jakirkham
Copy link

Thanks Hyunsu! 🙏

@jameslamb jameslamb deleted the numpy-2.0 branch May 23, 2024 13:59
@jameslamb
Copy link
Contributor Author

Thanks very much! Once there's a new release of treelite with these changes, we can update rapidsai/build-planning#38 and numpy/numpy#26191

rapids-bot bot pushed a commit to rapidsai/cuml that referenced this pull request Jun 15, 2024
Treelite 4.2.1 contains the following improvements:

* Compatibility patch for latest RapidJSON (dmlc/treelite#567)
* Support for NumPy 2.0 (dmlc/treelite#562). Thanks @jameslamb
* Handle certain class of XGBoost models (dmlc/treelite#564)

Authors:
  - Philip Hyunsu Cho (https://github.com/hcho3)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Dante Gama Dessavre (https://github.com/dantegd)
  - Ray Douglass (https://github.com/raydouglass)
  - James Lamb (https://github.com/jameslamb)

URL: #5908
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NumPy 2.0 support
4 participants