Implement support for uint64_t values in ICU backend #246

Flamefire · 2024-12-05T19:54:36Z

ICU doesn't support uint64_t directly but provides access to formatting and parsing of decimal number strings.

Use Boost.Charconv to interface with these so that values larger than INT64_MAX can be formatted correctly and parsed at all.

Fixes #235
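Here is a minimal sketch of the round trip. It assumes the ICU side only ever sees a plain decimal string, and passing that string to ICU's decimal-number interface is not shown. To avoid depending on Boost, it uses C++17 `std::to_chars`/`std::from_chars` in place of `boost::charconv::to_chars`/`from_chars`, which share the same interface. The names `to_decimal_string` and `from_decimal_string` are hypothetical.

```cpp
#include <charconv>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Format a uint64_t as a plain decimal string, the kind of input a
// decimal-number-string API accepts. std::to_chars stands in for
// boost::charconv::to_chars (same interface).
std::string to_decimal_string(std::uint64_t value)
{
    char buf[20]; // UINT64_MAX has exactly 20 decimal digits
    const auto res = std::to_chars(buf, buf + sizeof(buf), value);
    // Cannot fail: the buffer is large enough for any uint64_t.
    return std::string(buf, res.ptr);
}

// Parse a plain decimal string back into a uint64_t. Returns std::nullopt
// for malformed input, trailing characters, or values that do not fit;
// from_chars performs the range check.
std::optional<std::uint64_t> from_decimal_string(std::string_view str)
{
    std::uint64_t value{};
    const char* const end = str.data() + str.size();
    const auto res = std::from_chars(str.data(), end, value);
    if(res.ec != std::errc() || res.ptr != end)
        return std::nullopt;
    return value;
}
```

For `18446744073709551616` (UINT64_MAX + 1), `from_chars` reports `std::errc::result_out_of_range`, so the range check comes for free. The later parsing commits rely on this property.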

As reported in #235, formatting the first number that no longer fits into int64_t fails to add the thousands separators, e.g.:

- `9223372036854775807` -> `9,223,372,036,854,775,807`
- `9223372036854775808` -> `9223372036854775808` (no separators)

Add a test reproducing this for all backends.
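To make the expected strings concrete: `9223372036854775808` is exactly INT64_MAX + 1, and with en-US-style grouping both values should get a separator every three digits. The hypothetical helper `group_thousands` reproduces only that simple rule. It is not Boost.Locale or ICU code and ignores locale-specific grouping, such as the Indian 2-digit groups.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>
#include <string_view>

// The value that loses its separators is the first one past INT64_MAX.
static_assert(9223372036854775808ull ==
                static_cast<std::uint64_t>(std::numeric_limits<std::int64_t>::max()) + 1,
              "boundary value is INT64_MAX + 1");

// Hypothetical helper: insert `sep` between groups of three digits, counting
// from the right, e.g. "1234567" -> "1,234,567".
// Assumes `digits` contains only decimal digits (no sign, no fraction).
std::string group_thousands(std::string_view digits, char sep = ',')
{
    std::string out;
    out.reserve(digits.size() + digits.size() / 3);
    for(std::size_t i = 0; i < digits.size(); ++i) {
        // Emit a separator when the number of remaining digits is a multiple
        // of three, but never before the first digit.
        if(i != 0 && (digits.size() - i) % 3 == 0)
            out += sep;
        out += digits[i];
    }
    return out;
}
```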

ICU might return 9223372036854775810 as `9.22337203685477581E+18`. Use the internal parser of Boost.Charconv to handle this.

`boost::charconv::detail::parser` is not made for parsing (large) integers in exponential notation. It is mainly tested for parsing floating-point numbers in hexadecimal format. Since we know ICU will output either an integer string or a number in E notation (e.g. `1.2E2`), we can rather easily convert the latter into a regular integer string by moving the decimal point to the right by the exponent. The remaining positions are filled with zeros before the string is passed to `from_chars`, which then performs the range checks for us. This avoids the overflows that can occur when multiplying the significand by a power of ten, which would be cumbersome to guard against with integer arithmetic. Any input that would yield a fractional or too-large value can be rejected early.
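The buffer-based conversion can be sketched as a hypothetical standalone function. This is not the code from this PR. It uses C++17 `std::from_chars` in place of `boost::charconv::from_chars` and assumes ICU's normalized output with a single leading digit. As a result, inputs such as `10E-1` are rejected even though they denote an integer.

```cpp
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical helper: parse ICU's decimal-number output into a uint64_t.
// Accepts plain integers ("9223372036854775810") and E notation
// ("9.22337203685477581E+18"). Assumes ICU's normalized form: no sign,
// a single leading digit before the dot, optional exponent.
std::optional<std::uint64_t> parse_icu_integer(std::string_view str)
{
    std::string_view mantissa = str;
    long exponent = 0;
    const std::size_t e_pos = str.find_first_of("eE");
    if(e_pos != std::string_view::npos) {
        mantissa = str.substr(0, e_pos);
        std::string_view exp_str = str.substr(e_pos + 1);
        if(!exp_str.empty() && exp_str.front() == '+')
            exp_str.remove_prefix(1); // std::from_chars rejects a leading '+'
        const char* const exp_end = exp_str.data() + exp_str.size();
        const auto res = std::from_chars(exp_str.data(), exp_end, exponent);
        if(res.ec != std::errc() || res.ptr != exp_end)
            return std::nullopt;
    }
    // Split "x.yyy" into the integer digits "x" and the fractional digits "yyy".
    std::string_view int_part = mantissa;
    std::string_view frac_part;
    const std::size_t dot = mantissa.find('.');
    if(dot != std::string_view::npos) {
        int_part = mantissa.substr(0, dot);
        frac_part = mantissa.substr(dot + 1);
    }
    // Trailing zeros of the fraction do not change the value.
    while(!frac_part.empty() && frac_part.back() == '0')
        frac_part.remove_suffix(1);
    // Moving the dot right by `exponent` places must consume every fractional
    // digit, otherwise the value is not an integer: reject early.
    if(exponent < static_cast<long>(frac_part.size()))
        return std::nullopt;
    const auto zeros = static_cast<std::size_t>(exponent - static_cast<long>(frac_part.size()));
    // UINT64_MAX has 20 digits, so a longer digit string is certainly too large.
    if(zeros > 20 || int_part.size() + frac_part.size() + zeros > 20)
        return std::nullopt;
    // Build "x" + "yyy" + padding zeros, i.e. the plain integer string.
    std::string digits;
    digits.append(int_part).append(frac_part).append(zeros, '0');
    // from_chars rejects stray characters and performs the final range check.
    std::uint64_t value{};
    const char* const end = digits.data() + digits.size();
    const auto res = std::from_chars(digits.data(), end, value);
    if(res.ec != std::errc() || res.ptr != end)
        return std::nullopt;
    return value;
}
```

For the example from the previous commit, `9.22337203685477581E+18` becomes `"9" + "22337203685477581" + "0"`, which is `9223372036854775810`.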

Instead of filling a temporary buffer, we can decompose a number like `x.yyyEz` into `(x * 10^3 + yyy) * 10^(z - 3)`. That is, we subtract from the exponent the number of fractional digits (3 for `yyy`) needed to turn the fraction into an integer significand. For the simple case `xEz` this is just `x * 10^z`. This requires additional range checks before multiplying but avoids the extra memory accesses.
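The buffer-free decomposition can be sketched the same way, again as hypothetical standalone code rather than the PR's implementation. It accumulates the significand while reading the digits and then multiplies by the remaining power of ten, with an explicit overflow check before each step. To keep the sketch short, the exponent is passed in already parsed.

```cpp
#include <cstdint>
#include <limits>
#include <optional>
#include <string_view>

// Hypothetical helper: value = value * 10 + digit, refusing to overflow.
bool push_digit(std::uint64_t& value, char c)
{
    if(c < '0' || c > '9')
        return false;
    const auto digit = static_cast<std::uint64_t>(c - '0');
    constexpr std::uint64_t max = std::numeric_limits<std::uint64_t>::max();
    if(value > (max - digit) / 10) // result would exceed UINT64_MAX
        return false;
    value = value * 10 + digit;
    return true;
}

// Hypothetical helper: evaluate mantissa "x.yyy" with exponent z as
// (x * 10^3 + yyy) * 10^(z - 3), where 3 is the number of fractional digits.
std::optional<std::uint64_t> parse_decomposed(std::string_view mantissa, int exponent)
{
    std::uint64_t significand = 0;
    long long frac_digits = 0;
    bool seen_dot = false;
    bool seen_digit = false;
    for(const char c : mantissa) {
        if(c == '.' && !seen_dot) {
            seen_dot = true;
            continue;
        }
        // Reading "x" then "yyy" digit by digit yields x * 10^3 + yyy.
        if(!push_digit(significand, c))
            return std::nullopt;
        seen_digit = true;
        if(seen_dot)
            ++frac_digits;
    }
    if(!seen_digit)
        return std::nullopt;
    if(significand == 0)
        return std::uint64_t{0}; // zero is an integer for any exponent
    long long remaining = exponent - frac_digits;
    // Trailing zeros, as in "1.50", can be divided out without loss.
    while(remaining < 0 && significand % 10 == 0) {
        significand /= 10;
        ++remaining;
    }
    if(remaining < 0)
        return std::nullopt; // the value would have a fractional part
    // Range check before every multiplication by 10.
    constexpr std::uint64_t max = std::numeric_limits<std::uint64_t>::max();
    for(; remaining > 0; --remaining) {
        if(significand > max / 10)
            return std::nullopt;
        significand *= 10;
    }
    return significand;
}
```

The simple case `xEz` falls out naturally: with no fractional digits, `remaining` equals `z`, so `"9"` with exponent 18 yields `9 * 10^18`.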

Flamefire force-pushed the fix-large-number-icu branch 4 times, most recently from 4e98bb5 to 0936dae · January 6, 2025 08:43

Flamefire force-pushed the fix-large-number-icu branch from 0936dae to bd756ef · January 11, 2025 11:43

Flamefire added 4 commits January 12, 2025 17:04

GHA: Show output of all runs of binaries in test folder

b219350

Implement support for uint64_t values in ICU backend

982cf24

Handle ICU versions that keep parsed numbers in scientific format

3b90a5f

Flamefire force-pushed the fix-large-number-icu branch 3 times, most recently from fdc2fae to da2e86d · January 12, 2025 16:28

Flamefire added 2 commits January 12, 2025 17:31

Flamefire force-pushed the fix-large-number-icu branch from da2e86d to b7b933b · January 12, 2025 16:32
