Stream read/write with generated locales defaults to classic format/parse #174

Open
Flamefire opened this issue May 24, 2023 · 4 comments

@Flamefire
Collaborator

I've just seen that the default flag for stream operations with locales generated by Boost.Locale is posix, i.e. "classic".

This leads to the bug/inconsistency I've encountered at #64 (comment)

Assume this:

std::locale::global(boost::locale::generator{}("de_DE"));
external_func(); // Some other library not aware of Boost.Locale

And then:

std::string external_func(){
  std::cout << "Enter 1 + " << 123.25 << std::endl;
  float num;
  std::cin >> num;
  std::ostringstream os; // Uses global-locale
  os << num;
  return os.str();
}

So although a German locale is set globally, and one can reasonably expect all formatting and parsing to use the German decimal separator ",", this is not the case: the classic locale is used, which outputs "123.25", and entering "124,25" will be parsed as "12425" if parsing doesn't fail.
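For comparison, here is a minimal sketch of what plain standard-library streams do when the imbued locale carries a numpunct facet with "," as decimal point. It deliberately avoids Boost.Locale and any installed system locale: `comma_numpunct`, `make_comma_locale`, `format_with` and `parse_with` are hypothetical names made up for this illustration, and the facet only stands in for a German-style locale.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical stand-in for a German-style numpunct: ',' as decimal point,
// no digit grouping. Not part of Boost.Locale.
struct comma_numpunct : std::numpunct<char> {
protected:
    char do_decimal_point() const override { return ','; }
};

// Classic locale with only the numpunct facet replaced.
inline std::locale make_comma_locale() {
    return std::locale(std::locale::classic(), new comma_numpunct);
}

// Format a value on a stream imbued with `loc`.
inline std::string format_with(const std::locale& loc, double value) {
    std::ostringstream os;
    os.imbue(loc);
    os << value;
    return os.str();
}

// Parse a value from `text` on a stream imbued with `loc`.
// Returns false if extraction fails.
inline bool parse_with(const std::locale& loc, const std::string& text, double& out) {
    std::istringstream is(text);
    is.imbue(loc);
    return static_cast<bool>(is >> out);
}
```

With these helpers, `format_with(make_comma_locale(), 123.25)` should give "123,25" and `parse_with(make_comma_locale(), "124,25", v)` should store 124.25, which is the behavior one would intuitively expect after setting a German locale globally. This says nothing about what a Boost.Locale-generated locale does; that is the subject of this issue.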

I traced this to base_num_format/base_num_parse, which access the ios_info.display_flags() of the stream, which defaults to posix.
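To illustrate the mechanism (not the actual Boost.Locale implementation), the following sketch stores a per-stream flag via `std::ios_base::xalloc`/`iword` and lets a custom `num_put` facet consult it. The flag defaults to 0, so a freshly created stream gets classic formatting until it opts in. All names here (`display_flag_index`, `as_posix`, `as_number`, `opt_in_num_put`, `comma_numpunct`, `make_opt_in_locale`) are hypothetical and only loosely mirror `as::posix`/`as::number`.

```cpp
#include <algorithm>
#include <ios>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Per-stream flag slot; iword() starts at 0, which plays the role of "posix".
inline int display_flag_index() {
    static const int index = std::ios_base::xalloc();
    return index;
}

// Hypothetical manipulators, loosely analogous to as::posix / as::number.
inline std::ios_base& as_posix(std::ios_base& ios) {
    ios.iword(display_flag_index()) = 0;
    return ios;
}
inline std::ios_base& as_number(std::ios_base& ios) {
    ios.iword(display_flag_index()) = 1;
    return ios;
}

// Hypothetical German-style numpunct: ',' as decimal point.
struct comma_numpunct : std::numpunct<char> {
protected:
    char do_decimal_point() const override { return ','; }
};

// num_put that only honors the locale's numpunct when the stream opted in.
class opt_in_num_put : public std::num_put<char> {
protected:
    using std::num_put<char>::do_put;

    iter_type do_put(iter_type out, std::ios_base& ios, char_type fill,
                     double value) const override {
        if (ios.iword(display_flag_index()) == 1) {
            // Opted in: default formatting, which uses the stream's numpunct.
            return std::num_put<char>::do_put(out, ios, fill, value);
        }
        // Default: format through a temporary stream in the classic locale,
        // carrying over flags, precision, width and fill.
        std::ostringstream tmp;
        tmp.imbue(std::locale::classic());
        tmp.flags(ios.flags());
        tmp.precision(ios.precision());
        tmp.width(ios.width());
        tmp.fill(fill);
        tmp << value;
        ios.width(0);
        const std::string text = tmp.str();
        return std::copy(text.begin(), text.end(), out);
    }
};

// Classic locale plus the comma numpunct and the opt-in num_put.
inline std::locale make_opt_in_locale() {
    const std::locale with_punct(std::locale::classic(), new comma_numpunct);
    return std::locale(with_punct, new opt_in_num_put);
}
```

A stream imbued with `make_opt_in_locale()` writes 123.25 as "123.25" by default and as "123,25" only after `stream << as_number`. Code that never sets the flag, such as a third-party library, always gets the classic output even though the numpunct facet reports ",".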

So one would need to do outstream << as::number and instream >> as::number first for every stream created. External libraries are not aware of this, which leads to those issues.

Hence I'd change the default from posix to number, so results are more intuitive and external libraries unaware of Boost.Locale work.

Although I'd consider this a bugfix, this is a breaking change, hence I wanted to make sure I didn't miss anything.

But the described behavior really does sound like a bug to me, see also #64 (comment): Imagine an external library unaware of Boost.Locale using the global locale, inspecting the numpunct facet, and then finding that formatting/parsing doesn't behave as expected.

@Flamefire
Collaborator Author

@artyom-beilis Would you oppose that change or have any input?

@salvoilmiosi You initially filed #64 and noted that as::number would be required. So do you have an opinion on that inconsistency of the numpunct facet and the visible behavior? I'd guess the change to as::number-by-default would suit your use case.

@artyom-beilis
Member

This is by design.

Quoting documentation of standard library implementation

Setting the global locale has bad side effects.
Consider following code:
int main()
{
    std::locale::global(std::locale(""));
    // Set system's default locale as global
    std::ofstream csv("test.csv");
    csv << 1.1 << "," << 1.3 << std::endl;
}

What would be the content of test.csv ? It may be "1.1,1.3" or it may be "1,1,1,3" rather than what you had expected.
More than that it affects even printf and libraries like boost::lexical_cast giving incorrect or unexpected formatting.
In fact many third-party libraries are broken in such a situation.

Unlike the standard localization library, Boost.Locale never changes the basic number formatting, even when it uses
std based localization backends, so by default, numbers are always formatted using C-style locale. Localized number
formatting requires specific flags.
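The CSV problem from the quoted documentation can be reproduced deterministically with the standard library alone, along with the usual guard of imbuing the classic locale on streams that feed a machine-readable format. This is a sketch under stated assumptions: `comma_numpunct`, `write_row` and `write_row_classic` are hypothetical names, and the facet replaces the system locale `""` so the result does not depend on the machine.

```cpp
#include <locale>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical numpunct with ',' as decimal point (stand-in for a system locale).
struct comma_numpunct : std::numpunct<char> {
protected:
    char do_decimal_point() const override { return ','; }
};

// Writes one CSV row using whatever locale the stream currently has.
inline void write_row(std::ostream& out, double a, double b) {
    out << a << "," << b << "\n";
}

// Writes one CSV row in the classic locale and restores the previous locale.
inline void write_row_classic(std::ostream& out, double a, double b) {
    const std::locale previous = out.imbue(std::locale::classic());
    write_row(out, a, b);
    out.imbue(previous);
}
```

On a stream imbued with a locale containing `comma_numpunct`, `write_row(out, 1.1, 1.3)` produces the ambiguous row "1,1,1,3", while `write_row_classic` produces "1.1,1.3" regardless of the stream's locale.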

It is actually a big issue: setting the locale can break many libraries, e.g. for SQL, JSON and many others.

So Boost.Locale requires the user to state explicitly that a number is localized, i.e. that it is intended for a human rather than for some kind of text interface.

@Flamefire
Collaborator Author

I understand the motivation. I guess this is why the standard implicitly imbues the classic locale on startup and provides access to the classic locale at any point.

> What would be the content of test.csv ? It may be "1.1,1.3" or it may be "1,1,1,3" rather than what you had expected.

I differ in that expectation in 2 points:

  • For csv << 1.3 I'd expect "1" + std::use_facet<std::numpunct<char>>(std::locale()).decimal_point() + "3"
  • By using std::locale::global I'd expect the global/default locale dependent behavior to be changed.
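The first expectation can be written down as a small consistency check that a locale-aware third-party library might plausibly rely on. This is only a sketch using standard facilities: `predicted_1_3`, `actual_1_3`, `numpunct_matches_output` and `comma_numpunct` are hypothetical names, and the check is shown here against standard-library locales, not against a Boost.Locale-generated one.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical numpunct with ',' as decimal point.
struct comma_numpunct : std::numpunct<char> {
protected:
    char do_decimal_point() const override { return ','; }
};

// What a caller would predict for 1.3 from the numpunct facet alone.
inline std::string predicted_1_3(const std::locale& loc) {
    const char point = std::use_facet<std::numpunct<char>>(loc).decimal_point();
    return std::string("1") + point + "3";
}

// What a stream imbued with `loc` actually writes for 1.3.
inline std::string actual_1_3(const std::locale& loc) {
    std::ostringstream os;
    os.imbue(loc);
    os << 1.3;
    return os.str();
}

// True if the facet and the visible stream output agree.
inline bool numpunct_matches_output(const std::locale& loc) {
    return predicted_1_3(loc) == actual_1_3(loc);
}
```

For `std::locale::classic()` and for a classic locale with `comma_numpunct` installed, the check holds. The inconsistency discussed in this issue would correspond to it failing for a generated locale whose numpunct reports "," while default stream output still uses ".".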

IMO those are reasonable expectations. Also:

> libraries like boost::lexical_cast giving incorrect or unexpected formatting.

Again, the same expectation holds and the same pitfall awaits: you'd usually change the global locale and then (maybe deep down in third-party libraries) use boost::lexical_cast to parse/write user input, but now you have no way of localizing that.

However, the statement

> Boost.Locale never changes the basic number formatting, even when it uses std based localization backends, so by default, numbers are always formatted using C-style locale.

has been in the documentation for ages, and people might rely on it, so we can't easily change that.

So the bug is rather that the numpunct facet isn't the one used for (default) formatting, i.e. it should always be the C-locale numpunct.
This might need a test that ensures the standard input/output is C-style and that numpunct returns C-style values by default.

Might be a good feature to be able to change that default behavior, similar to generator::use_ansi_encoding().

@artyom-beilis
Member

artyom-beilis commented May 27, 2023

Small note: there is a reason why many libraries break when the default locale is changed.

Because it is rarely used: std::locale is broken by design and implementation.

std::locale is not compatible across different compiler vendors (locale names, encodings?), has broken features (like numpunct, which can generate non-UTF-8 sequences) and doesn't even provide the most basic and important thing for decent localization: translation catalogs. The std::messages facet is needed far more than pretty number formatting, yet the standard didn't even define how it should work; it is implementation-defined and generally does not work.

So bringing back broken stuff because it may be somehow used (actually not) instead of having consistent behavior wouldn't do any good to users.

I don't think default number formatting should be anything but "C"/classic locale based.
