Implement support for uint64_t values in ICU backend #246

Open
wants to merge 6 commits into
base: develop
from

Flamefire
Collaborator

ICU doesn't support uint64_t directly but provides access to formatting and parsing of decimal number strings.

Use Boost.Charconv to interface with that, so that values larger than INT64_MAX are formatted correctly and can be parsed at all.

Fixes #235

@Flamefire Flamefire force-pushed the fix-large-number-icu branch 4 times, most recently from 4e98bb5 to 0936dae Compare January 6, 2025 08:43
@Flamefire Flamefire force-pushed the fix-large-number-icu branch from 0936dae to bd756ef Compare January 11, 2025 11:43
As reported in #235, formatting the first number that no longer fits
into int64_t fails to add the thousands separators.
I.e.:
`9223372036854775807` -> `9,223,372,036,854,775,807`
`9223372036854775808` -> `9223372036854775808`

Add a test reproducing that for all backends.
ICU doesn't support uint64_t directly but provides access to formatting
and parsing of decimal number strings.
Use Boost.Charconv to interface with that.

Fixes #235
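
The decimal-string round trip described above can be sketched with the standard `<charconv>` header, used here as a stand-in for Boost.Charconv, which provides the same `to_chars`/`from_chars` interface. The helper names `to_decimal_string` and `from_decimal_string` are hypothetical and are not taken from this PR. The ICU calls that would consume or produce the strings are left out.

```cpp
#include <charconv>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical helper: render a uint64_t as a plain decimal string that
// could be handed to a formatter accepting decimal number strings.
std::string to_decimal_string(std::uint64_t value)
{
    char buffer[20]; // UINT64_MAX has 20 decimal digits
    const auto result = std::to_chars(buffer, buffer + sizeof(buffer), value);
    // Cannot fail here: the buffer holds the longest possible value.
    return std::string(buffer, result.ptr);
}

// Hypothetical helper: parse a plain decimal string, rejecting trailing
// characters and values that do not fit into uint64_t.
std::optional<std::uint64_t> from_decimal_string(std::string_view text)
{
    std::uint64_t value = 0;
    const char* const end = text.data() + text.size();
    const auto result = std::from_chars(text.data(), end, value);
    if(result.ec != std::errc() || result.ptr != end)
        return std::nullopt;
    return value;
}
```

With this, `to_decimal_string(9223372036854775808ull)` yields the digits without any loss, and `from_decimal_string` reports values above UINT64_MAX as errors instead of wrapping around.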
ICU might return 9223372036854775810 as 9.22337203685477581E+18.
Use the internal parser of Boost.Charconv to handle this.
@Flamefire Flamefire force-pushed the fix-large-number-icu branch 3 times, most recently from fdc2fae to da2e86d Compare January 12, 2025 16:28
`boost::charconv::detail::parser` is not made for parsing (large)
integers in exponential notation.
It is mainly tested for parsing floating point numbers in hexadecimal format.

Given we know ICU will output either an integer string or a number in
"E notation" (1.2E2), we can convert that rather easily to a "regular"
integer string by "moving" the dot to the right according to the
exponent. The trailing gap is filled with zeros before passing it to
`from_chars`, which then handles the range checks for us.
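
A minimal sketch of this "move the dot" approach, using `std::from_chars` as a stand-in for the Boost.Charconv function. The function name `parse_integer_or_e_notation`, the cap on the exponent, and the rejection of negative exponents are assumptions made for illustration, not the code of this PR. The input is assumed to be non-negative and to use an uppercase `E`.

```cpp
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical helper: accepts "123" or E notation such as "1.2E2" / "9.2E+18".
std::optional<std::uint64_t> parse_integer_or_e_notation(std::string_view text)
{
    std::string digits;
    const auto e_pos = text.find('E');
    if(e_pos == std::string_view::npos)
        digits = std::string(text); // already a plain integer string
    else {
        const std::string_view mantissa = text.substr(0, e_pos);
        std::string_view exp_text = text.substr(e_pos + 1);
        if(!exp_text.empty() && exp_text.front() == '+')
            exp_text.remove_prefix(1); // from_chars does not accept a leading '+'
        int exponent = 0;
        const char* const exp_end = exp_text.data() + exp_text.size();
        const auto exp_res = std::from_chars(exp_text.data(), exp_end, exponent);
        if(exp_res.ec != std::errc() || exp_res.ptr != exp_end)
            return std::nullopt;
        // Assumption: cap the exponent so the zero padding stays small.
        if(exponent < 0 || exponent > 32)
            return std::nullopt;
        const auto dot_pos = mantissa.find('.');
        const std::string_view int_part = mantissa.substr(0, dot_pos);
        const std::string_view frac_part =
          (dot_pos == std::string_view::npos) ? std::string_view() : mantissa.substr(dot_pos + 1);
        // More fractional digits than the exponent covers: not an integer.
        if(frac_part.size() > static_cast<std::size_t>(exponent))
            return std::nullopt;
        // "Move" the dot: drop it and fill the trailing gap with zeros.
        digits.append(int_part).append(frac_part);
        digits.append(static_cast<std::size_t>(exponent) - frac_part.size(), '0');
    }
    std::uint64_t value = 0;
    const char* const end = digits.data() + digits.size();
    const auto result = std::from_chars(digits.data(), end, value);
    // from_chars performs the range check and stops at any non-digit.
    if(result.ec != std::errc() || result.ptr != end)
        return std::nullopt;
    return value;
}
```

For the example from this PR, "9.22337203685477581E+18" becomes the digit string "9223372036854775810" before it reaches `from_chars`, while "2E19" is expanded to a 20-digit string that `from_chars` rejects as out of range.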

This avoids overflows that can happen when multiplying the
significand by the power of ten given by the exponent which, due to
integer arithmetic, would be cumbersome to guard against.

Any input that would yield a fractional or too large value can be rejected early.
Instead of filling a temporary buffer we can decompose a number like
"x.yyyEz" to "(x * 10^3 + yyy) * 10^(z - 3)".
I.e. we subtract from the exponent the number of fractional digits,
which is the power of ten required to turn the fractional significand
into an integer.
For the simple case of "xEz" we just do "x * 10^z".

This requires additional range checks before multiplying but avoids
extra memory accesses.
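
The decomposition can be sketched as follows, with the range checks done before each multiplication. The function name `scale_significand` is hypothetical, and the exponent is assumed to have been parsed into an `int` already. This illustrates the arithmetic only and is not the implementation in this PR.

```cpp
#include <cstdint>
#include <limits>
#include <optional>
#include <string_view>

// Hypothetical helper: computes (x * 10^k + yyy) * 10^(z - k) for a mantissa
// "x.yyy" with k fractional digits and exponent z, without a temporary buffer.
std::optional<std::uint64_t> scale_significand(std::string_view mantissa, int exponent)
{
    constexpr std::uint64_t max_value = std::numeric_limits<std::uint64_t>::max();
    if(mantissa.empty())
        return std::nullopt;
    std::uint64_t significand = 0;
    int fraction_digits = 0;
    bool seen_dot = false;
    for(const char c : mantissa) {
        if(c == '.' && !seen_dot) {
            seen_dot = true;
            continue;
        }
        if(c < '0' || c > '9')
            return std::nullopt;
        const auto digit = static_cast<std::uint64_t>(c - '0');
        // Range check before "significand * 10 + digit".
        if(significand > (max_value - digit) / 10)
            return std::nullopt;
        significand = significand * 10 + digit;
        if(seen_dot)
            ++fraction_digits;
    }
    // The exponent must cover all fractional digits, else the value is fractional.
    if(exponent < fraction_digits)
        return std::nullopt;
    if(significand == 0)
        return significand; // zero stays zero for any remaining exponent
    // Apply the remaining exponent (z - k), checking the range before each step.
    for(int i = fraction_digits; i < exponent; ++i) {
        if(significand > max_value / 10)
            return std::nullopt;
        significand *= 10;
    }
    return significand;
}
```

For "1.2E2" this accumulates the significand 12 with one fractional digit and multiplies once by 10 to get 120. The loop stops at the first step that would overflow, so a non-zero significand never runs it more than about 20 times.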
@Flamefire Flamefire force-pushed the fix-large-number-icu branch from da2e86d to b7b933b Compare January 12, 2025 16:32
Successfully merging this pull request may close these issues.

std::uint64_t numbers above a certain value are not formatted correctly
1 participant