Boost.locale makes std::regex not match anything #249

Open
Lord-Kamina opened this issue Dec 19, 2024 · 2 comments

Comments

@Lord-Kamina

I initially posted a comment in #35, but this probably deserves its own issue.
I think it's essentially the same problem, except that I'm on macOS 13:

$ clang++ -v
Apple clang version 15.0.0 (clang-1500.1.0.2.5)
Target: x86_64-apple-darwin22.6.0.

I'm using boost 1.86.0, built against ICU 74.2

I have seen this behavior before and never found a real solution. I have now stumbled upon it again in a project. I spent about two days trying to tune my regex, thinking I must have made a mistake, and kept simplifying it, but the problem never went away.

Eventually, I decided to write a minimal example to test it, so I have the following code:

#include <boost/locale.hpp>
#include <iostream>
#include <locale>
#include <regex>
#include <string>

int main() {
	boost::locale::generator locGen;
	const std::locale loc = locGen("en_US.UTF-8");
// 	std::locale::global(loc);  
	auto pattern = std::regex(R"(^(?:\s)*([_[:alnum:].-]+)\s*=\s*([^;#\n\r]+)*)");
// 	pattern.imbue(loc);
	const std::string text{"  pozo = mani"};
	std::smatch result;
	std::regex_search(text, result, pattern);
	std::cout << "ready: " << result.ready() << ", size: " << result.size() << std::endl;
	for (size_t i=0; i < result.size(); i++) {
		std::cout << "match[" <<i<<"]: " << result[i] <<std::endl;
	}
	return 0;
}

Which outputs

$ clang++ -o regex_test regex_test.cpp -std=c++17 -I/opt/local/include/ -lboost_locale-mt -lboost_system-mt -L/opt/local/lib && ./regex_test
ready: 1, size: 3
match[0]:   pozo = mani
match[1]: pozo
match[2]: mani

If I uncomment the std::locale::global line (with or without the pattern.imbue), this happens instead:

clang++ -o regex_test regex_test.cpp -std=c++17 -I/opt/local/include/ -lboost_locale-mt -lboost_system-mt -L/opt/local/lib && ./regex_test
ready: 1, size: 0

I tried adding facet categories gradually, OR'ing them in one by one, and it always worked until I added std::locale::collate. From that point on, removing all the others and keeping just std::locale::collate still breaks the regex.

#include <boost/locale.hpp>
#include <iostream>
#include <locale>
#include <regex>
#include <string>

int main() {
	boost::locale::generator locGen;
	const std::locale loc = locGen("en_US.UTF-8");
	std::locale testLoc = std::locale(std::locale::classic(), loc, std::locale::collate);
	std::locale::global(testLoc);
	auto pattern = std::regex(R"(^(?:\s)*([_[:alnum:].-]+)\s*=\s*([^;#\n\r]+)*)");
// 	pattern.imbue();
	const std::string text{"  pozo = mani"};
	std::smatch result;
	std::regex_search(text, result, pattern);
	std::cout << "ready: " << result.ready() << ", size: " << result.size() << std::endl;
	for (size_t i=0; i < result.size(); i++) {
		std::cout << "match[" <<i<<"]: " << result[i] <<std::endl;
	}
	return 0;
}

That already doesn't work.

@Lord-Kamina
Author

Of note, this does not seem to happen with gcc and libstdc++. I have not yet tried mixing clang with libstdc++ or gcc with libc++.

@Flamefire
Collaborator

In #35 it is also reported to fail with libc++. Also

C and POSIX work always ok. Every locale is affected: Even en_US.UTF-8.

collation_facet seems to be the culprit, as was reported there, and your example suggests the same.
