-
-
Notifications
You must be signed in to change notification settings - Fork 321
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Issue with SConsCPPConditionalScanner misinterpreting ifdefs suffixed with L or UL #4624
Fix Issue with SConsCPPConditionalScanner misinterpreting ifdefs suffixed with L or UL #4624
Conversation
…test each variant
Equivalent regex using quantifier.
|
@bdbaddog my c/c++ is a little rusty but I believe the literal integer suffixes are case insensitive.
Might have to be something like this (you get the idea): |
References:
EDIT: Probably should check if mixed case is accepted for |
@jcbrill - good point. I'll update and push a bit later today. |
MSVS 2022 seems to accept mixed case. Not sure about mingw-64/gcc. test.c:
compile and run:
output (only unreferenced variable warnings):
EDIT:
|
@bdbaddog The original implementation did not account for a I believe this may be correct: |
@bdbaddog I believe the suffix for the hex constant regex immediately above the changed line needs to be modified as well. |
The hex regex prefix should probably be While VS2022 accepts mixing case for long-long, I'm not sure the c specification specifically allows it:
Although for SCons purposes it probably does not matter. At worst, SCons accepts the specification and the compilation has warnings/errors(?). |
too bad Python re doesn't support POSIX character classes. then you could just use |
Fragment from pycparser:
Notes:
Makes sense. |
Candidate regexes without anchors (i.e., beginning of string
Notes:
Test symbols:
|
This seems it's getting overly complex. I had a proposal too, still sitting here in the browser as I apparently didn't press the "Comment" button 24 hours ago when I wrote it. Never mind. Python supports a case-ignoring regex, no? That would simplify it a bit... Further, octal constants need extra work if they're to be supported - the whole idea is to convert into a Python expression that can be evaluated; Python integers don't use the various long/unsigned suffixes so just stripping them is fine; however the syntax for octal is a leading This is just about not getting different results than the preprocessor when doing dependency scanning, so I don't think we need to support every conceivable situation until proven that something causes a real problem - which we do have a case of here. |
To add to the muck, both C and C++ standards are clear that the order of the unsigned suffix and the long or long-long suffix doesn't matter. C++ in the very latest edition seems to add a new suffix to this party:
while the C standard has its own addition:
Sigh. What's the smallest possible change we can make to move forward? |
Possibly. The standard applies that the long long token is case-insensitive and not necessarily the pair of characters independently (I think). If that is true, then ignoring case might not produce the same behavior as the standard(s) and/or compilers. As noted above, VS appears to ignore case in this instance. The ordering of the suffixes not mattering is a bit more problematic. A temporary variable holding the suffix specification would allow the other three regexes to append the variable contents or build via f-string. This may be a case of what is "just good enough" versus "follow a standard from which the compilers may be taking liberties".
Perhaps:
|
The optional integer suffix regex might be described by the assignment to
|
NOT TESTED +
+ int_suffix_opt = r'(?:[uU](?:l{1,2}|L{1,2}|wb|WB|z|Z)?|(?:l{1,2}|L{1,2}|z|Z|wb|WB)[uU]?)?'
# A separate list of expressions to be evaluated and substituted
# sequentially, not all at once.
CPP_to_Python_Eval_List = [
[r'defined\s+(\w+)', '"\\1" in __dict__'],
[r'defined\s*\((\w+)\)', '"\\1" in __dict__'],
- [r'(0x[0-9A-Fa-f]+)(?:L|UL)?', '\\1'],
+ [fr'(0[xX][0-9A-Fa-f]+){int_suffix_opt}', '\\1']
+ [fr'(0[bB][01]+){int_suffix_opt}', '\\1']
- [r'(\d+)(?:L|UL)?', '\\1'],
+ [fr'(\d+){int_suffix_opt}', '\\1'],
] Notes:
The above does not include the start anchor added in this PR: - [r'(\d+)(?:L|UL)?', '\\1'],
+ [r'^(\d+)(?:L|ULL|UL|U)?', '\\1'], I'm not sure what the definition of the beginning of the string is, but I would image as written, the start anchor may not match a negative integer (e.g., -100) specification. In some cases, zero-length word boundary anchors might work. Unary plus (+) and minus (-) could be a problem (which appears to exist in the current implementation). More work could be necessary to prevent matching a substring of a token rather than a complete token. Again, I have not looked at any of the surrounding code. |
@mwichmann At your convenience, request for comments. Possible solution:
Comments:
The negative lookbehind and lookahead should prevent capturing an integer specification in the middle of an identifier. At least that is the hope. The problem with using the beginning of the string anchor ( For example: I am still not sure if unary minus is a problem when used with comparison operators. The unary minus is not part of the integer token in the c grammar (i.e., I believe the unary minus operator is applied to the integer constant). Unary plus is likely the same. Not sure if this matters. If it does, this makes the problem more difficult. It is too bad the actual c preprocessor isn't run against the file of interest and the preprocessor output parsed for include files. pycparser expects the preprocessor output rather than the source files. I thought this could be done with gcc and msvc (i.e., invoke only the preprocessor without compiling). Not sure about other compilers. |
We're not using a real lexer, so this may or may not be a concern... I think #define PAGE_SIZE (1UL << PAGE_SHIFT) The thing is, all of these may not be real problems - the only thing the scanner needs to do is make a good guess about file-level dependencies. If the macro above isn't used to somehow protect a header file from being included, it shouldn't matter if we convert it when we shouldn't, or vice versa. |
I believe it is line-based or line-based fragments and not word-based. I believe there is a problem with definitions that use the optional extended suffix even if they are constants as the suffix is not stripped in the definition. For example:
I need to test against the current code again. The issue is that the definition uses the symbol Edit: line-based fragments |
The Verbose output for the PR code change and an alternative implementation is shown below. There are likely still errors in the alternative implementation. Degenerate examples where the PR (and master) code produces incorrect results:
Alternative implementation:
|
Closing in favor of #4629 |
SConsCPPConditionalScanner
Was misinterpreting
#ifdef X123L
and
#ifdef X123UL
And removing L and UL which is only appropriate when the ifdef'd symbol is numeric with L or UL suffixing.
See Issue #4623 for some more discussion.
Contributor Checklist:
CHANGES.txt
andRELEASE.txt
(and read theREADME.rst
).