Inconsistent behavior - boost::locale defines numpunct for non ICU backends #64
Comments
Because, as I explained, it should never return a character but a string. For example, in many locales the thousands separator is the NBSP character, and it can't be represented as a single char in UTF-8. Additionally …
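To make the NBSP point concrete: U+00A0 NO-BREAK SPACE takes two bytes (0xC2 0xA0) in UTF-8, so a `char`-returning `thousands_sep()` can hold at most the lead byte, which is not valid UTF-8 on its own. A minimal sketch in standard C++ (no Boost.Locale involved; the helper names are hypothetical):

```cpp
#include <cassert>
#include <string>

// U+00A0 NO-BREAK SPACE, spelled out as its UTF-8 bytes.
const std::string kNbspUtf8 = "\xC2\xA0";

// True if b is the lead byte of a 2-byte UTF-8 sequence (110xxxxx),
// i.e. a byte that is invalid unless a continuation byte follows it.
bool is_two_byte_lead(unsigned char b) {
    return (b & 0xE0) == 0xC0;
}

// What a char-returning thousands_sep() can hold at best: the first byte.
char truncate_to_char(const std::string& utf8_separator) {
    return utf8_separator.empty() ? '\0' : utf8_separator[0];
}
```

Emitting `truncate_to_char(kNbspUtf8)` alone produces a dangling lead byte, which is the "only the 1st byte of a multi-byte sequence" problem mentioned later in the thread.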
Also, the POSIX backend does implement a subclass of std::numpunct and ignores all values whose size is > 1, reverting decimal_point to a dot and thousands_sep to a comma. Why can't it be implemented the same way in ICU? I have quickly written a mockup, and it works.
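The fallback described above (keep a separator only if it is one byte long, otherwise report '.' or ',') can be sketched as a small `std::numpunct<char>` subclass. This is a hypothetical `single_byte_numpunct`, not Boost.Locale's actual class; the separator strings are assumed to come from the backend (e.g. ICU's `DecimalFormatSymbols`) already converted to UTF-8, and passing grouping through unchanged is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <locale>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical facet: a separator is kept only if it is exactly one byte;
// otherwise the C-locale default ('.' or ',') is reported instead.
// Grouping is passed through unchanged (assumption of this sketch).
class single_byte_numpunct : public std::numpunct<char> {
public:
    single_byte_numpunct(const std::string& decimal, const std::string& thousands,
                         std::string grouping, std::size_t refs = 0)
        : std::numpunct<char>(refs),
          decimal_(decimal.size() == 1 ? decimal[0] : '.'),
          thousands_(thousands.size() == 1 ? thousands[0] : ','),
          grouping_(std::move(grouping)) {}

protected:
    char do_decimal_point() const override { return decimal_; }
    char do_thousands_sep() const override { return thousands_; }
    std::string do_grouping() const override { return grouping_; }

private:
    char decimal_;
    char thousands_;
    std::string grouping_;
};

// Formats value with one fractional digit using the fallback facet,
// given the backend's UTF-8 separator strings.
std::string format_with_fallback(double value, const std::string& decimal,
                                 const std::string& thousands, const std::string& grouping) {
    std::ostringstream out;
    // The locale takes ownership of the facet (refs == 0).
    out.imbue(std::locale(std::locale::classic(),
                          new single_byte_numpunct(decimal, thousands, grouping)));
    out.setf(std::ios::fixed, std::ios::floatfield);
    out.precision(1);
    out << value;
    return out.str();
}
```

With single-byte separators (",", ".") the output for 1234567.5 is "1.234.567,5"; with a two-byte NBSP thousands separator the fallback yields "1,234,567,5", where ',' plays both roles.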
Because in the POSIX backend I tried to do the best I could with the tools available (since I didn't format the number on my own). If you want to use some ICU-specific API, just use ICU directly.
@artyom-beilis I think it is still a good idea to implement this:
The above mockup shows that, "given the tools" (ICU here), one can implement this with minimal effort in a "good enough" way. IMO simply returning ", ." is wrong if that doesn't match what is actually used at all. Returning wrong results is often problematic, so I'd expect an exception or similar here instead if the API can't be fulfilled.
It is right, and I'll explain why. Let's look at the code:
For, say, the ru_RU.UTF-8 locale, it will print the following:
Note: numpunct still refers to the standard facet that is used for output. If you want to format a localized number, you use …
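The claim that `std::numpunct` describes what standard stream output actually uses can be checked with plain standard C++: whatever facet a stream is imbued with, querying it via `std::use_facet` returns exactly the characters `operator<<` emits. The facet `de_like_numpunct` below is a hypothetical stand-in for a backend-provided one; the original code from this comment is not reproduced here.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical stand-in for a backend facet: German-style separators.
struct de_like_numpunct : std::numpunct<char> {
protected:
    char do_decimal_point() const override { return ','; }
    char do_thousands_sep() const override { return '.'; }
    std::string do_grouping() const override { return "\3"; }
};

struct probe_result {
    std::string text;    // what the stream actually printed
    char decimal_point;  // what the stream's numpunct facet reports
    char thousands_sep;
};

// Formats value through loc and reports the facet the stream used.
probe_result probe(const std::locale& loc, double value) {
    std::ostringstream out;
    out.imbue(loc);
    out.setf(std::ios::fixed, std::ios::floatfield);
    out.precision(2);
    out << value;
    const auto& np = std::use_facet<std::numpunct<char>>(out.getloc());
    return {out.str(), np.decimal_point(), np.thousands_sep()};
}
```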
The problem is that some external libraries rely on …
Then they are plainly wrong, since std localization is broken to the point that it actually generated invalid UTF-8 by showing only the 1st byte of a multi-byte sequence. I have no reason to fix something that is broken by design. If you want to format a number, it is also wrong to use the facet, because number formatting is much more than just separators: for example, ١٢ is a perfectly valid number... Are you supporting that as well?
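The ١٢ remark is easy to reproduce with the standard library alone: those are ARABIC-INDIC DIGIT ONE and TWO (U+0661, U+0662), two bytes each in UTF-8, and `std::num_get` only accepts the ASCII digits 0 to 9, so no `std::numpunct` setting lets a plain `std::istream` read them. A sketch using the classic locale (`parse_int` is a hypothetical helper):

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// "١٢" (U+0661 U+0662, Arabic-Indic twelve) as explicit UTF-8 bytes.
const std::string kArabicIndicTwelve = "\xD9\xA1\xD9\xA2";

// Tries to read an int through std::num_get in the given locale.
// Returns true on success and stores the value in out.
bool parse_int(const std::string& text, const std::locale& loc, int& out) {
    std::istringstream in(text);
    in.imbue(loc);
    in >> out;
    return !in.fail();
}
```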
Why is std::numpunct implemented in all localization backends except ICU, then? I get that you're trying to use whatever tools the language gives you, but this is an inconsistent design. Why don't we just add a boost::locale::numpunct that returns std::string?
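One way to read this suggestion is a separate facet whose accessors return `std::string`, so multi-byte separators survive intact. The sketch is hypothetical: Boost.Locale has no `string_numpunct`, and its name and members are assumptions. It only shows the standard mechanics of a user-defined facet: derive from `std::locale::facet`, declare a static `std::locale::id`, then install and query it with `std::locale`, `std::has_facet` and `std::use_facet`.

```cpp
#include <cassert>
#include <cstddef>
#include <locale>
#include <string>
#include <utility>

// Hypothetical facet: like std::numpunct<char>, but returns strings so that
// separators such as U+00A0 (two bytes in UTF-8) are not truncated.
class string_numpunct : public std::locale::facet {
public:
    static std::locale::id id;  // required for has_facet/use_facet lookup

    string_numpunct(std::string decimal_point, std::string thousands_sep,
                    std::string grouping, std::size_t refs = 0)
        : std::locale::facet(refs),
          decimal_point_(std::move(decimal_point)),
          thousands_sep_(std::move(thousands_sep)),
          grouping_(std::move(grouping)) {}

    const std::string& decimal_point() const { return decimal_point_; }
    const std::string& thousands_sep() const { return thousands_sep_; }
    const std::string& grouping() const { return grouping_; }

private:
    std::string decimal_point_;
    std::string thousands_sep_;
    std::string grouping_;
};

std::locale::id string_numpunct::id;

// Returns a copy of base with the facet added; the locale owns the facet.
std::locale with_string_numpunct(const std::locale& base, std::string dp,
                                 std::string ts, std::string grouping) {
    return std::locale(base, new string_numpunct(std::move(dp), std::move(ts),
                                                 std::move(grouping)));
}
```

Because it is a distinct facet, existing code that reads `std::numpunct<char>` is unaffected; only code that asks for the new facet sees the full strings.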
This is what I have to do to make decimal_for_cpp compatible with Boost.Locale. Is this "broken by design"?
I have to agree with @artyom-beilis here. Check this code:
So from what I can tell, this is fully correct: the sep and decimal_point are shown as used.
This is a good point. The issue is that I shouldn't modify https://github.com/boostorg/locale/blob/develop/src/util/numeric.hpp#L119. The special case should be all the others, and not …
The result is:
Oops, yes. Check this:
It returns
And that IS wrong, isn't it? With the ICU backend it is correct; with POSIX it is broken.
You need to use boost::locale::as::number for that kind of formatting.
Not if I specifically want to use the std formatting, i.e. pretend to be some component that does not know about Boost.Locale.
Actually, even for the C locale the thousands separator is ",", but there is no grouping. The bug/inconsistency is that, for example, I can get decimal_separator = "," with the de_DE locale on the POSIX/WIN/STD backends while it should be ".".
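The C-locale point can be checked directly: the classic locale's `std::numpunct<char>` reports '.' as the decimal point and ',' as the thousands separator, but an empty grouping string, so the separator is never emitted. A small sketch (the helper names are hypothetical):

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

struct numpunct_info {
    char decimal_point;
    char thousands_sep;
    std::string grouping;
};

// Reads the std::numpunct<char> facet of a locale.
numpunct_info query_numpunct(const std::locale& loc) {
    const auto& np = std::use_facet<std::numpunct<char>>(loc);
    return {np.decimal_point(), np.thousands_sep(), np.grouping()};
}

// What the classic locale prints for an integer: no separators,
// because the grouping string is empty.
std::string classic_output(long long value) {
    std::ostringstream out;
    out.imbue(std::locale::classic());
    out << value;
    return out.str();
}
```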
Just to warn the participants: I have no idea when I'll be able to fix it, since it isn't trivial. So I suggest looking for a workaround rather than waiting for a fix :-(
While working on this, I now understand why. But the inconsistencies bug me, and I'm not sure how to resolve them:
Making the … One solution is to have a fully separate … However, that breaks existing "workarounds" such as the one described in #64 (comment) and makes it impossible to use localized input/output in third-party libraries. Moreover, there is no existing answer to when/how localized boolalpha input/output should be used. So, regarding the docs example (writing a CSV file) and related cases: a library that implicitly expects the classic locale but doesn't actually set it may as well be considered broken. So, the idea:
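Along the lines of the CSV remark above, a library that needs locale-independent output can imbue the classic locale on its own stream instead of inheriting the global one (a default-constructed stream copies `std::locale::global()`). A sketch with a hypothetical `write_csv_row` helper:

```cpp
#include <cassert>
#include <cstddef>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: writes one CSV row with '.' as the decimal point,
// regardless of the caller's or the global locale.
std::string write_csv_row(const std::vector<double>& values) {
    std::ostringstream out;
    // Explicitly opt out of whatever std::locale::global() currently is.
    out.imbue(std::locale::classic());
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (i != 0) out << ',';
        out << values[i];
    }
    return out.str();
}
```

Without the `imbue` call, a global locale with ',' as decimal point would turn the row into an ambiguous "1,5,2,25".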
This is related to #235:
The 2nd case can be ignored here. There could be a problem with parsing a previously written number: the POSIX (C) locale doesn't use grouping, hence no thousands separator. If we try to parse a locale-formatted number with the fallback, it might fail if the locale used grouping. Case 3 is basically what happens in #235: such numbers get formatted using the C locale, although locale-specific formatting might be possible when the separators are single-byte. Note that the approach of using a replacement char doesn't work for the ICU backend, as e.g. an …
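The parsing concern can be illustrated with standard streams: a grouped number such as "1,234,567" is read completely only when the stream's `std::numpunct` declares both the separator and a grouping. With the classic locale (empty grouping) extraction stops at the first ','. `grouping_numpunct` and `read_long` below are hypothetical names.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: ',' as thousands separator, groups of three digits.
struct grouping_numpunct : std::numpunct<char> {
protected:
    char do_thousands_sep() const override { return ','; }
    std::string do_grouping() const override { return "\3"; }
};

// Extracts a long from text through std::num_get in the given locale.
// Extraction stops at the first character the locale does not accept.
long read_long(const std::string& text, const std::locale& loc) {
    std::istringstream in(text);
    in.imbue(loc);
    long value = 0;
    in >> value;
    return value;
}
```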
This code always outputs ", ." regardless of locale.
Right now, the only way to extract std::numpunct information is to format an arbitrary number to a string and extract, say, its second and sixth characters, which is not a very efficient implementation.
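The workaround can at least be made robust for multi-byte separators by extracting substrings instead of single characters. A sketch under stated assumptions: the sample is the locale's rendering of a number like 1234567.5, uses ASCII digits, and has no sign, currency or padding. `extract_separators` is a hypothetical helper, not a Boost.Locale API, and the formatting step itself (e.g. via `boost::locale::as::number`) is left out.

```cpp
#include <cassert>
#include <string>

// Separators recovered from a formatted sample; strings, so multi-byte
// separators such as NBSP (0xC2 0xA0 in UTF-8) are kept whole.
struct separators {
    std::string decimal_point;
    std::string thousands_sep;  // empty if the sample shows no grouping
};

// Hypothetical helper: extracts separators from a sample like "1.234.567,5".
// Assumes ASCII digits, no sign/currency/padding, and a fractional part.
separators extract_separators(const std::string& sample) {
    const char* const digits = "0123456789";
    separators result;

    // Decimal point: the last run of non-digits (just before the fraction).
    const std::string::size_type dec_last = sample.find_last_not_of(digits);
    if (dec_last == std::string::npos) return result;  // no separators at all
    const std::string::size_type before = sample.find_last_of(digits, dec_last);
    if (before == std::string::npos) return result;    // sample starts with a separator
    const std::string::size_type dec_first = before + 1;
    result.decimal_point = sample.substr(dec_first, dec_last - dec_first + 1);

    // Thousands separator: the first run of non-digits, if it precedes the decimal point.
    const std::string::size_type sep_first = sample.find_first_not_of(digits);
    if (sep_first < dec_first) {
        const std::string::size_type sep_end = sample.find_first_of(digits, sep_first);
        result.thousands_sep = sample.substr(sep_first, sep_end - sep_first);
    }
    return result;
}
```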
I would suggest creating a std::numpunct subclass in icu/numeric.cpp that extracts that information from the icu::DecimalFormatSymbols class using getSymbol.