
Fix Issue with SConsCPPConditionalScanner misinterpreting ifdefs suffixed with L or UL #4624

Closed

Conversation

bdbaddog
Contributor

@bdbaddog bdbaddog commented Nov 5, 2024

SConsCPPConditionalScanner

Was misinterpreting
#ifdef X123L
and
#ifdef X123UL
and removing the trailing L and UL, which is only appropriate when the ifdef'd symbol is a numeric literal with an L or UL suffix.
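A minimal sketch of the misbehaviour, assuming the pre-PR decimal pattern `(\d+)(?:L|UL)?` quoted in the diff later in this conversation. The helper name `strip_suffix_old` is hypothetical, and how the scanner applies the substitution is not shown here.

```python
import re

# Decimal pattern as it appears on the removed line of the diff in this thread.
OLD_DECIMAL = r'(\d+)(?:L|UL)?'

def strip_suffix_old(expr: str) -> str:
    """Apply the old, unanchored substitution to a preprocessor expression."""
    return re.sub(OLD_DECIMAL, r'\1', expr)

print(strip_suffix_old('123L'))    # 123   (numeric literal, suffix correctly removed)
print(strip_suffix_old('X123L'))   # X123  (identifier, trailing L wrongly removed)
print(strip_suffix_old('X123UL'))  # X123  (identifier, trailing UL wrongly removed)
```

Because the pattern is free to match in the middle of a word, the digits inside an identifier are treated like a literal.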

See Issue #4623 for some more discussion.

Contributor Checklist:

  • I have created a new test or updated the unit tests to cover the new/changed functionality.
  • I have updated CHANGES.txt and RELEASE.txt (and read the README.rst).
  • I have updated the appropriate documentation

@jcbrill
Contributor

jcbrill commented Nov 5, 2024

Equivalent regex using quantifier.

UL{0,2}: a U character followed by zero to two L characters.

^(\d+)(?:L|UL{0,2})?

@jcbrill
Contributor

jcbrill commented Nov 5, 2024

@bdbaddog my c/c++ is a little rusty but I believe the literal integer suffixes are case insensitive.

l L u U ul UL etc...

Might have to be something like this (you get the idea):
^(\d+)(?:[lL]|[uU][lL]{0,2})?

@jcbrill
Contributor

jcbrill commented Nov 5, 2024

References:

EDIT: Probably should check if mixed case is accepted for ll or LL (e.g., lL or Ll) by the c compilers as the quantifier method as written in the previous post accepts mixed case for the trailing L character(s).

@bdbaddog
Contributor Author

bdbaddog commented Nov 5, 2024

@jcbrill - good point. I'll update and push a bit later today.

@jcbrill
Contributor

jcbrill commented Nov 5, 2024

MSVS 2022 seems to accept mixed case.

Not sure about mingw-64/gcc.

test.c:

#include <stdio.h>

#if 123u
#endif
#if 123U
#endif
#if 123l
#endif
#if 123L
#endif
#if 123ll
#endif
#if 123LL
#endif
#if 123lL
#endif
#if 123Ll
#endif
#if 123ul
#endif
#if 123UL
#endif
#if 123uL
#endif
#if 123Ul
#endif
#if 123ull
#endif
#if 123ULL
#endif
#if 123Ull
#endif
#if 123uLL
#endif
#if 123UlL
#endif
#if 123ULl
#endif
#if 123ulL
#endif
#if 123uLl
#endif

int main(int argc, char** argv) {

    unsigned u_1 = 123u;
    unsigned u_2 = 123U;
    long l_1 = 123l;
    long l_2 = 123L;
    long long ll_1 = 123ll;
    long long ll_2 = 123LL;
    long long ll_3 = 123lL;
    long long ll_4 = 123Ll;
    unsigned long ul_1 = 123ul;
    unsigned long ul_2 = 123UL;
    unsigned long ul_3 = 123uL;
    unsigned long ul_4 = 123Ul;
    unsigned long long ull_1 = 123ull;
    unsigned long long ull_2 = 123ULL;
    unsigned long long ull_3 = 123Ull;
    unsigned long long ull_4 = 123uLL;
    unsigned long long ull_5 = 123UlL;
    unsigned long long ull_6 = 123ULl;
    unsigned long long ull_7 = 123ulL;
    unsigned long long ull_8 = 123uLl;

    printf("Hello, World!\n");

    return 0;
}

compile and run:

@setlocal
@if exist test.obj @del test.obj
@if exist test.exe @del test.exe
@call C:\Software\MSVS-2019-142-Com\VC\Auxiliary\Build\vcvarsall.bat amd64
cl /Wall test.c
@test.exe
@endlocal

output (only unreferenced variable warnings):

**********************************************************************
** Visual Studio 2019 Developer Command Prompt v16.11.41
** Copyright (c) 2021 Microsoft Corporation
**********************************************************************
[vcvarsall.bat] Environment initialized for: 'x64'
test.c
test.c(44): warning C4100: 'argv': unreferenced formal parameter
test.c(44): warning C4100: 'argc': unreferenced formal parameter
test.c(59): warning C4189: 'ull_2': local variable is initialized but not referenced
test.c(61): warning C4189: 'ull_4': local variable is initialized but not referenced
test.c(57): warning C4189: 'ul_4': local variable is initialized but not referenced
test.c(56): warning C4189: 'ul_3': local variable is initialized but not referenced
test.c(47): warning C4189: 'u_2': local variable is initialized but not referenced
test.c(62): warning C4189: 'ull_5': local variable is initialized but not referenced
test.c(60): warning C4189: 'ull_3': local variable is initialized but not referenced
test.c(52): warning C4189: 'll_3': local variable is initialized but not referenced
test.c(51): warning C4189: 'll_2': local variable is initialized but not referenced
test.c(49): warning C4189: 'l_2': local variable is initialized but not referenced
test.c(55): warning C4189: 'ul_2': local variable is initialized but not referenced
test.c(58): warning C4189: 'ull_1': local variable is initialized but not referenced
test.c(53): warning C4189: 'll_4': local variable is initialized but not referenced
test.c(54): warning C4189: 'ul_1': local variable is initialized but not referenced
test.c(64): warning C4189: 'ull_7': local variable is initialized but not referenced
test.c(48): warning C4189: 'l_1': local variable is initialized but not referenced
test.c(63): warning C4189: 'ull_6': local variable is initialized but not referenced
test.c(50): warning C4189: 'll_1': local variable is initialized but not referenced
test.c(46): warning C4189: 'u_1': local variable is initialized but not referenced
test.c(65): warning C4189: 'ull_8': local variable is initialized but not referenced
Microsoft (R) Incremental Linker Version 14.29.30156.0
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:test.exe 
test.obj 
Hello, World!

EDIT:

  • added long long

@jcbrill
Contributor

jcbrill commented Nov 5, 2024

@bdbaddog The original implementation did not account for a long long specifier (e.g., ll or LL), only long (e.g., l or L).

I believe this may be correct: ^(\d+)(?:[lL]{1,2}|[uU][lL]{0,2})?
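A quick way to probe the candidate above is to check which literals it consumes in full. This is only a sketch; the helper name `accepts` is hypothetical.

```python
import re

# Candidate regex copied from the comment above.
CANDIDATE = r'^(\d+)(?:[lL]{1,2}|[uU][lL]{0,2})?'

def accepts(literal: str) -> bool:
    """True when the candidate regex matches the entire literal."""
    return re.fullmatch(CANDIDATE, literal) is not None

for text in ('123', '123l', '123LL', '123lL', '123u', '123UL', '123ull', '123lu'):
    print(text, accepts(text))
```

With this pattern the mixed-case `123lL` is accepted, while `123lu` (long suffix before the unsigned suffix) is not.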

@jcbrill
Contributor

jcbrill commented Nov 5, 2024

@bdbaddog I believe the suffix for the hex constant regex immediately above the changed line needs to be modified as well.

@jcbrill
Contributor

jcbrill commented Nov 5, 2024

The hex regex prefix should probably be 0[xX] as 0x and 0X are allowed.

While VS2022 accepts mixing case for long-long, I'm not sure the c specification specifically allows it:

(6.4.4.1) hexadecimal-prefix: one of
              0x    0X

...

(6.4.4.1) long-long-suffix: one of
              ll    LL

Although for SCons purposes it probably does not matter. At worst, SCons accepts the specification and the compilation has warnings/errors(?).

@mwichmann
Collaborator

too bad Python re doesn't support POSIX character classes. then you could just use [:xdigit:].

@jcbrill
Contributor

jcbrill commented Nov 6, 2024

Fragment from pycparser:

    hex_prefix = '0[xX]'
    hex_digits = '[0-9a-fA-F]+'
    bin_prefix = '0[bB]'
    bin_digits = '[01]+'

    # integer constants (K&R2: A.2.5.1)
    integer_suffix_opt = r'(([uU]ll)|([uU]LL)|(ll[uU]?)|(LL[uU]?)|([uU][lL])|([lL][uU]?)|[uU])?'
    decimal_constant = '(0'+integer_suffix_opt+')|([1-9][0-9]*'+integer_suffix_opt+')'
    octal_constant = '0[0-7]*'+integer_suffix_opt
    hex_constant = hex_prefix+hex_digits+integer_suffix_opt
    bin_constant = bin_prefix+bin_digits+integer_suffix_opt

Notes:

  • accepts ll and LL but not lL and Ll
  • accepts ull, Ull, uLL, ULL, llu, llU, LLu, LLU
  • accepts hex constants 0x and 0X
  • accepts binary constants 0b and 0B

Makes sense.
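The notes above can be checked directly against the suffix pattern from the fragment. A small sketch, with `suffix_ok` as a hypothetical helper name:

```python
import re

# Suffix pattern copied from the pycparser fragment above.
integer_suffix_opt = r'(([uU]ll)|([uU]LL)|(ll[uU]?)|(LL[uU]?)|([uU][lL])|([lL][uU]?)|[uU])?'

def suffix_ok(suffix: str) -> bool:
    """True when the whole string is an accepted (possibly empty) integer suffix."""
    return re.fullmatch(integer_suffix_opt, suffix) is not None

accepted = ['', 'u', 'l', 'll', 'LL', 'ull', 'Ull', 'uLL', 'ULL', 'llu', 'llU', 'LLu', 'LLU']
rejected = ['lL', 'Ll', 'uLl', 'UlL']

print([s for s in accepted if suffix_ok(s)] == accepted)   # True
print([s for s in rejected if suffix_ok(s)])               # []
```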

@jcbrill
Contributor

jcbrill commented Nov 6, 2024

Candidate regexes without anchors (i.e., beginning of string ^ and/or end of string $):

r'(0[xX][0-9a-fA-F]+)(?:([uU]ll)|([uU]LL)|(ll[uU]?)|(LL[uU]?)|([uU][lL])|([lL][uU]?)|[uU])?'  # Hexadecimal
r'(0[bB][01]+)(?:([uU]ll)|([uU]LL)|(ll[uU]?)|(LL[uU]?)|([uU][lL])|([lL][uU]?)|[uU])?'         # Binary
r'(0|[1-9][0-9]*)(?:([uU]ll)|([uU]LL)|(ll[uU]?)|(LL[uU]?)|([uU][lL])|([lL][uU]?)|[uU])?'      # Decimal
r'(0[0-7]*)(?:([uU]ll)|([uU]LL)|(ll[uU]?)|(LL[uU]?)|([uU][lL])|([lL][uU]?)|[uU])?'            # Octal

Notes:

  • The optional integer suffix is the same for all four regexes. The optional integer suffix should perhaps be a constant that is added to each of the four regexes for maintenance and readability.
  • As written, the decimal and octal regexes will both match 0 with an optional suffix
  • As noted above, VS2022 appears to accept optional integer suffixes that the c specification does not.
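A sketch of the first two notes: the suffix is held in one constant and appended to each base pattern, and a small classifier shows that `0` with a suffix is matched by both the decimal and the octal pattern. The names `INT_SUFFIX_OPT`, `PATTERNS` and `classify` are hypothetical.

```python
import re

# Shared optional suffix, as used in the four candidate regexes above.
INT_SUFFIX_OPT = r'(?:([uU]ll)|([uU]LL)|(ll[uU]?)|(LL[uU]?)|([uU][lL])|([lL][uU]?)|[uU])?'

PATTERNS = {
    'hexadecimal': r'(0[xX][0-9a-fA-F]+)' + INT_SUFFIX_OPT,
    'binary': r'(0[bB][01]+)' + INT_SUFFIX_OPT,
    'decimal': r'(0|[1-9][0-9]*)' + INT_SUFFIX_OPT,
    'octal': r'(0[0-7]*)' + INT_SUFFIX_OPT,
}

def classify(literal: str) -> list:
    """Return the names of every pattern that matches the whole literal."""
    return [name for name, pat in PATTERNS.items() if re.fullmatch(pat, literal)]

print(classify('0x0123456789ABCDEFull'))  # ['hexadecimal']
print(classify('0b01LLu'))                # ['binary']
print(classify('01234567UL'))             # ['octal']
print(classify('0UL'))                    # ['decimal', 'octal']
```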

Test symbols:

# Hexadecimal

0x0123456789ABCDEF
0x0123456789ABCDEFull
0x0123456789ABCDEFUll
0x0123456789ABCDEFuLL
0x0123456789ABCDEFULL
0x0123456789ABCDEFllu
0x0123456789ABCDEFllU
0x0123456789ABCDEFLLu
0x0123456789ABCDEFLLU
0x0123456789ABCDEFul
0x0123456789ABCDEFuL
0x0123456789ABCDEFUl
0x0123456789ABCDEFUL
0x0123456789ABCDEFlu
0x0123456789ABCDEFlU
0x0123456789ABCDEFl
0x0123456789ABCDEFLu
0x0123456789ABCDEFLU
0x0123456789ABCDEFL
0x0123456789ABCDEFu
0x0123456789ABCDEFU
0X0123456789ABCDEF
0X0123456789ABCDEFull
0X0123456789ABCDEFUll
0X0123456789ABCDEFuLL
0X0123456789ABCDEFULL
0X0123456789ABCDEFllu
0X0123456789ABCDEFllU
0X0123456789ABCDEFLLu
0X0123456789ABCDEFLLU
0X0123456789ABCDEFul
0X0123456789ABCDEFuL
0X0123456789ABCDEFUl
0X0123456789ABCDEFUL
0X0123456789ABCDEFlu
0X0123456789ABCDEFlU
0X0123456789ABCDEFl
0X0123456789ABCDEFLu
0X0123456789ABCDEFLU
0X0123456789ABCDEFL
0X0123456789ABCDEFu
0X0123456789ABCDEFU
0x0123456789abcdef
0x0123456789abcdefull
0x0123456789abcdefUll
0x0123456789abcdefuLL
0x0123456789abcdefULL
0x0123456789abcdefllu
0x0123456789abcdefllU
0x0123456789abcdefLLu
0x0123456789abcdefLLU
0x0123456789abcdeful
0x0123456789abcdefuL
0x0123456789abcdefUl
0x0123456789abcdefUL
0x0123456789abcdeflu
0x0123456789abcdeflU
0x0123456789abcdefl
0x0123456789abcdefLu
0x0123456789abcdefLU
0x0123456789abcdefL
0x0123456789abcdefu
0x0123456789abcdefU
0X0123456789abcdef
0X0123456789abcdefull
0X0123456789abcdefUll
0X0123456789abcdefuLL
0X0123456789abcdefULL
0X0123456789abcdefllu
0X0123456789abcdefllU
0X0123456789abcdefLLu
0X0123456789abcdefLLU
0X0123456789abcdeful
0X0123456789abcdefuL
0X0123456789abcdefUl
0X0123456789abcdefUL
0X0123456789abcdeflu
0X0123456789abcdeflU
0X0123456789abcdefl
0X0123456789abcdefLu
0X0123456789abcdefLU
0X0123456789abcdefL
0X0123456789abcdefu
0X0123456789abcdefU

# Binary

0b01
0b01ull
0b01Ull
0b01uLL
0b01ULL
0b01llu
0b01llU
0b01LLu
0b01LLU
0b01ul
0b01uL
0b01Ul
0b01UL
0b01lu
0b01lU
0b01l
0b01Lu
0b01LU
0b01L
0b01u
0b01U
0B01
0B01ull
0B01Ull
0B01uLL
0B01ULL
0B01llu
0B01llU
0B01LLu
0B01LLU
0B01ul
0B01uL
0B01Ul
0B01UL
0B01lu
0B01lU
0B01l
0B01Lu
0B01LU
0B01L
0B01u
0B01U

# Decimal

0
0ull
0Ull
0uLL
0ULL
0llu
0llU
0LLu
0LLU
0ul
0uL
0Ul
0UL
0lu
0lU
0l
0Lu
0LU
0L
0u
0U
1234567890
1234567890ull
1234567890Ull
1234567890uLL
1234567890ULL
1234567890llu
1234567890llU
1234567890LLu
1234567890LLU
1234567890ul
1234567890uL
1234567890Ul
1234567890UL
1234567890lu
1234567890lU
1234567890l
1234567890Lu
1234567890LU
1234567890L
1234567890u
1234567890U

# Octal

01234567
01234567ull
01234567Ull
01234567uLL
01234567ULL
01234567llu
01234567llU
01234567LLu
01234567LLU
01234567ul
01234567uL
01234567Ul
01234567UL
01234567lu
01234567lU
01234567l
01234567Lu
01234567LU
01234567L
01234567u
01234567U

@mwichmann
Collaborator

This seems to be getting overly complex. I had a proposal too, still sitting here in the browser as I apparently didn't press the "Comment" button 24 hours ago when I wrote it. Never mind. Python supports a case-ignoring regex, no? That would simplify it a bit...

Further, octal constants need extra work if they're to be supported - the whole idea is to convert into a Python expression that can be evaluated; Python integers don't use the various long/unsigned suffixes so just stripping them is fine; however the syntax for octal is a leading 0o so it would need prefix fiddling instead.
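A small sketch of the octal point, assuming Python 3 and a literal whose suffix has already been stripped. The helpers `c_octal_to_python` and `try_eval` are hypothetical and not part of SCons.

```python
import re

def c_octal_to_python(literal: str) -> str:
    """Rewrite a suffix-free C octal literal such as 017 as 0o17."""
    return re.sub(r'^0([0-7]+)$', r'0o\1', literal)

def try_eval(expr: str):
    """Evaluate expr as Python, returning None if it is not valid syntax."""
    try:
        return eval(expr, {'__builtins__': {}}, {})
    except SyntaxError:
        return None

print(try_eval('017'))                      # None: Python 3 rejects C-style octal
print(try_eval(c_octal_to_python('017')))   # 15
```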

This is just about not getting different results than the preprocessor when doing dependency scanning, so I don't think we need to support every conceivable situation until proven that something causes a real problem - which we do have a case of here.

@mwichmann
Collaborator

mwichmann commented Nov 6, 2024

To add to the muck, both C and C++ standards are clear that the order of the unsigned suffix and the long or long-long suffix doesn't matter. C++ in the very latest edition seems to add a new suffix to this party: z or Z, as follows:

integer-suffix :
    unsigned-suffix long-suffix(opt)
    unsigned-suffix long-long-suffix(opt)
    unsigned-suffix size-suffix(opt)
    long-suffix unsigned-suffix(opt)
    long-long-suffix unsigned-suffix(opt)
    size-suffix unsigned-suffix(opt)
unsigned-suffix : one of
    u U
long-suffix : one of
    l L
long-long-suffix : one of
    ll LL
size-suffix : one of
    z Z

while the C standard has its own addition:

integer-suffix:
    unsigned-suffix long-suffix(opt)
    unsigned-suffix long-long-suffix
    unsigned-suffix bit-precise-int-suffix
    long-suffix unsigned-suffix(opt)
    long-long-suffix unsigned-suffix(opt)
    bit-precise-int-suffix unsigned-suffix(opt)
bit-precise-int-suffix: one of
    wb WB
unsigned-suffix: one of
    u U
long-suffix: one of
    l L
long-long-suffix: one of
    ll LL

Sigh. What's the smallest possible change we can make to move forward?

@jcbrill
Contributor

jcbrill commented Nov 6, 2024

Python supports a case-ignoring regex, no? That would simplify it a bit...

Possibly. The standard implies that the long long token is case-insensitive and not necessarily the pair of characters independently (I think). If that is true, then ignoring case might not produce the same behavior as the standard(s) and/or compilers. As noted above, VS appears to ignore case in this instance.
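To illustrate the difference, a sketch comparing a case-sensitive long-long suffix with one compiled using `re.IGNORECASE`. The names `STRICT_LL` and `RELAXED_LL` are hypothetical.

```python
import re

# long-long-suffix exactly as listed in the grammar: ll or LL.
STRICT_LL = re.compile(r'll|LL')
# The same token matched with case ignored per character.
RELAXED_LL = re.compile(r'll', re.IGNORECASE)

for suffix in ('ll', 'LL', 'lL', 'Ll'):
    print(suffix, bool(STRICT_LL.fullmatch(suffix)), bool(RELAXED_LL.fullmatch(suffix)))
```

Both forms accept `ll` and `LL`; only the case-ignoring form also accepts `lL` and `Ll`.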

The ordering of the suffixes not mattering is a bit more problematic.

A temporary variable holding the suffix specification would allow the other three regexes to append the variable contents or build via f-string.

This may be a case of what is "just good enough" versus "follow a standard from which the compilers may be taking liberties".

Sigh. What's the smallest possible change we can make to move forward?

Perhaps:

  • specify the suffix regex in a temporary
  • add the suffix regex to the hexadecimal specification which was not updated in this PR (but should)
  • add the suffix regex to the decimal specification
  • [optionally] add the binary regex specification
  • ignore octal for now as the current decimal specification accepts all octal numbers anyway

@jcbrill
Contributor

jcbrill commented Nov 6, 2024

The optional integer suffix regex might be described by the assignment to int_suffix_opt:

# unsigned-suffix long-suffix(opt)
# unsigned-suffix long-long-suffix(opt)
# unsigned-suffix size-suffix(opt)
# unsigned-suffix bit-precise-int-suffix

[uU](?:l{1,2}|L{1,2}|wb|WB|z|Z)?

# long-suffix unsigned-suffix(opt)
# long-long-suffix unsigned-suffix(opt)
# size-suffix unsigned-suffix(opt)
# bit-precise-int-suffix unsigned-suffix(opt)

(?:l{1,2}|L{1,2}|z|Z|wb|WB)[uU]?

Combined:

int_suffix_opt = r'(?:[uU](?:l{1,2}|L{1,2}|wb|WB|z|Z)?|(?:l{1,2}|L{1,2}|z|Z|wb|WB)[uU]?)?'
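A sketch that exercises the combined pattern against a few suffixes from both grammars quoted earlier. `suffix_ok` is a hypothetical helper.

```python
import re

# Combined optional suffix copied from the assignment above.
int_suffix_opt = r'(?:[uU](?:l{1,2}|L{1,2}|wb|WB|z|Z)?|(?:l{1,2}|L{1,2}|z|Z|wb|WB)[uU]?)?'

def suffix_ok(suffix: str) -> bool:
    """True when the whole string is an accepted (possibly empty) suffix."""
    return re.fullmatch(int_suffix_opt, suffix) is not None

accepted = ['', 'u', 'UL', 'ull', 'LLu', 'lU', 'uz', 'ZU', 'uwb', 'WBu']
rejected = ['lL', 'uLl', 'uu', 'wB']

print(all(suffix_ok(s) for s in accepted))       # True
print(any(suffix_ok(s) for s in rejected))       # False
```

Either ordering of the unsigned suffix is accepted, and mixed-case `lL` or `wB` is rejected because each multi-character suffix is spelled out in a single case.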

@jcbrill
Contributor

jcbrill commented Nov 6, 2024

NOT TESTED

+
+   int_suffix_opt = r'(?:[uU](?:l{1,2}|L{1,2}|wb|WB|z|Z)?|(?:l{1,2}|L{1,2}|z|Z|wb|WB)[uU]?)?'

    # A separate list of expressions to be evaluated and substituted
    # sequentially, not all at once.
    CPP_to_Python_Eval_List = [
        [r'defined\s+(\w+)',                 '"\\1" in __dict__'],
        [r'defined\s*\((\w+)\)',             '"\\1" in __dict__'],
-       [r'(0x[0-9A-Fa-f]+)(?:L|UL)?',  '\\1'],
+       [fr'(0[xX][0-9A-Fa-f]+){int_suffix_opt}', '\\1'],
+       [fr'(0[bB][01]+){int_suffix_opt}', '\\1'],
-       [r'(\d+)(?:L|UL)?',  '\\1'],
+       [fr'(\d+){int_suffix_opt}',  '\\1'],
    ]

Notes:

  • Fixes hexadecimal spec (0[xX]) and optional integer suffix.
  • Adds binary specification (0[bB]) and optional integer suffix
  • Fixes decimal specification optional integer suffix
  • The decimal specification accepts octal values as before

The above does not include the start anchor added in this PR:

-       [r'(\d+)(?:L|UL)?',  '\\1'],
+       [r'^(\d+)(?:L|ULL|UL|U)?',  '\\1'],

I'm not sure what the definition of the beginning of the string is, but I would imagine that, as written, the start anchor may not match a negative integer (e.g., -100) specification.

In some cases, zero-length word boundary anchors might work. Unary plus (+) and minus (-) could be a problem (which appears to exist in the current implementation).

More work could be necessary to prevent matching a substring of a token rather than a complete token. Again, I have not looked at any of the surrounding code.
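A sketch of the anchoring concern, assuming the substitution is applied with `re.sub` to a whole expression string (the surrounding scanner code is not examined here). `strip_anchored` is a hypothetical helper.

```python
import re

# Anchored decimal pattern from the added line of this PR's diff.
ANCHORED = r'^(\d+)(?:L|ULL|UL|U)?'

def strip_anchored(expr: str) -> str:
    """Apply the anchored substitution to an expression string."""
    return re.sub(ANCHORED, r'\1', expr)

print(strip_anchored('X123L'))            # X123L  (identifier left alone)
print(strip_anchored('1234L'))            # 1234   (suffix stripped at start of string)
print(strip_anchored('-100L'))            # -100L  (leading minus defeats the anchor)
print(strip_anchored('1234L || 5678UL'))  # 1234 || 5678UL  (only the first term is handled)
```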

@jcbrill
Contributor

jcbrill commented Nov 6, 2024

@mwichmann At your convenience, request for comments.

Possible solution:

int_suffix_opt = r'(?:[uU](?:l{1,2}|L{1,2}|wb|WB|z|Z)?|(?:l{1,2}|L{1,2}|z|Z|wb|WB)[uU]?)?'  # <= added optional suffix
neg_lookbehind = r'(?<![a-zA-Z0-9_])'                                                       # <= added 
neg_lookahead = r'(?![a-zA-Z0-9_])'                                                         # <= added

# A separate list of expressions to be evaluated and substituted
# sequentially, not all at once.
CPP_to_Python_Eval_List = [
    [r'defined\s+(\w+)',                 '"\\1" in __dict__'],
    [r'defined\s*\((\w+)\)',             '"\\1" in __dict__'],
    [fr'{neg_lookbehind}(0[xX][0-9A-Fa-f]+){int_suffix_opt}{neg_lookahead}', '\\1'],  # <= modified hexadecimal
    [fr'{neg_lookbehind}(0[bB][01]+){int_suffix_opt}{neg_lookahead}', '\\1'],         # <= added binary
    [fr'{neg_lookbehind}(\d+){int_suffix_opt}{neg_lookahead}', '\\1'],                # <= modified decimal
]

Comments:

  • int_suffix_opt is the optional integer suffix specification
  • neg_lookbehind requires that the hex/bin/dec expression is not preceded by a valid identifier character [a-zA-Z0-9_]
  • neg_lookahead requires that the hex/bin/dec expression is not followed by a valid identifier character [a-zA-Z0-9_]

The negative lookbehind and lookahead should prevent capturing an integer specification in the middle of an identifier. At least that is the hope.

The problem with using the beginning of the string anchor (^) and end of string anchor ($) is that the string being substituted is not necessarily a single token (in a lexer context).

For example: #if X123U || X123ULL || 1234L. The first two terms should be ignored (identifiers) and the third term should have the L stripped. The third term is not the beginning of the string.
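A sketch of the example expression run through the proposed decimal pattern with the lookbehind and lookahead in place. Only the decimal entry is shown, and `strip_decimal_suffixes` is a hypothetical helper.

```python
import re

int_suffix_opt = r'(?:[uU](?:l{1,2}|L{1,2}|wb|WB|z|Z)?|(?:l{1,2}|L{1,2}|z|Z|wb|WB)[uU]?)?'
neg_lookbehind = r'(?<![a-zA-Z0-9_])'
neg_lookahead = r'(?![a-zA-Z0-9_])'

DECIMAL = re.compile(fr'{neg_lookbehind}(\d+){int_suffix_opt}{neg_lookahead}')

def strip_decimal_suffixes(expr: str) -> str:
    """Remove integer suffixes from decimal literals, leaving identifiers untouched."""
    return DECIMAL.sub(r'\1', expr)

print(strip_decimal_suffixes('X123U || X123ULL || 1234L'))  # X123U || X123ULL || 1234
print(strip_decimal_suffixes('-100L'))                      # -100
print(strip_decimal_suffixes('(1UL << PAGE_SHIFT)'))        # (1 << PAGE_SHIFT)
```

In this sketch a leading unary minus is simply left in front of the stripped literal, since `-` is not an identifier character.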

I am still not sure if unary minus is a problem when used with comparison operators. The unary minus is not part of the integer token in the c grammar (i.e., I believe the unary minus operator is applied to the integer constant). Unary plus is likely the same. Not sure if this matters. If it does, this makes the problem more difficult.

It is too bad the actual c preprocessor isn't run against the file of interest and the preprocessor output parsed for include files. pycparser expects the preprocessor output rather than the source files.

I thought this could be done with gcc and msvc (i.e., invoke only the preprocessor without compiling). Not sure about other compilers.

@mwichmann
Collaborator

The problem with using the beginning of the string anchor (^) and end of string anchor ($) is that the string being substituted is not necessarily a single token (in a lexer context).

For example: #if X123U || X123ULL || 1234L. The first two terms should be ignored (identifiers) and the third term should have the L stripped. The third term is not the beginning of the string.

We're not using a real lexer, so this may or may not be a concern... I think SCons/cpp.py splits into words, but I'm not motivated to check at the moment. There's a related possible problem, though, with parenthesized expressions - not function-like macros, but ones like this:

#define PAGE_SIZE		(1UL << PAGE_SHIFT)

The thing is, all of these may not be real problems - the only thing the scanner needs to do is make a good guess about file-level dependencies. If the macro above isn't used to somehow protect a header file from being included, it shouldn't matter if we convert it when we shouldn't, or vice versa.

@jcbrill
Contributor

jcbrill commented Nov 7, 2024

I think SCons/cpp.py splits into words, but I'm not motivated to check at the moment.

I believe it is line-based or line-based fragments and not word-based.

I believe there is a problem with definitions that use the optional extended suffix even if they are constants as the suffix is not stripped in the definition.

For example:

#define X0U 0U
#if X0U
#include <file904-yes>
#else
#include <file904-no>
#endif

I need to test against the current code again. The issue is that the definition uses the symbol 0U. When the integer conversion for 0U fails it becomes the string literal '0U'. The string literal rather than the value 0 causes the #if to be true.
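A sketch of the failure mode described above, assuming the define handler only attempts an `int()` conversion and otherwise keeps the raw text. `define_value_int_only` is a hypothetical stand-in, not the SCons code.

```python
def define_value_int_only(expansion: str):
    """Mimic an int-only conversion: fall back to the raw string on failure."""
    try:
        return int(expansion)
    except (TypeError, ValueError):
        return expansion

plain = define_value_int_only('0')      # converts cleanly
suffixed = define_value_int_only('0U')  # int('0U') raises ValueError

print(repr(plain), bool(plain))        # 0 False
print(repr(suffixed), bool(suffixed))  # '0U' True
```

A non-empty string is truthy in Python, so a value stored as `'0U'` makes the `#if` branch look true.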

Edit: line-based fragments

@jcbrill
Contributor

jcbrill commented Nov 7, 2024

The do_define method needs to evaluate (i.e., eval_expression) its value rather than simply attempting an int conversion.

Verbose output for the PR code change and an alternative implementation is shown below. There are likely still errors in the alternative implementation.

Degenerate examples where the PR (and master) code produces incorrect results:

  • Example 1

    #define X2345ULL 1
    #if !(X2345ULL > 4567ull)
    #include <file901-yes>    # <=== Expected
    #else
    #include <file901-no>
    #endif
    

    Current PR [<file901-no>]:

    do_define ('X2345ULL', None, '1')
    do_define cpp_namespace['X2345ULL']: 1
    eval_expression CPP_to_Python ('!(X2345ULL > 4567ull)',)
    CPP_to_Python: '!(X2345ULL > 4567ull)' -> ' not (X2345ULL > 4567ull)' -> ' not (X2345ULL > 4567ull)'  # <=== 4567ull INTEGER LITERAL
    eval_expression cpp_namespace:
      'X2345ULL': 1
    eval_expression ' not (X2345ULL > 4567ull)' -> <class 'SyntaxError'> -> 0
    

    Alternative [<file901-yes>]:

    do_define ('X2345ULL', None, '1')
    eval_expression CPP_to_Python ('1',)
    CPP_to_Python: '1' -> '1' -> '1'
    eval_expression '1' -> 1
    do_define cpp_namespace['X2345ULL']: 1
    eval_expression CPP_to_Python ('!(X2345ULL > 4567ull)',)
    CPP_to_Python: '!(X2345ULL > 4567ull)' -> ' not (X2345ULL > 4567ull)' -> ' not (X2345ULL > 4567)'
    eval_expression cpp_namespace:
      'X2345ULL': 1
    eval_expression ' not (X2345ULL > 4567)' -> True
    
  • Example 2:

    #if !0ull
    #include <file902-yes>  # <=== Expected
    #else
    #include <file902-no>
    #endif
    

    Current PR [<file902-no>]:

    eval_expression CPP_to_Python ('!0ull',)
    CPP_to_Python: '!0ull' -> ' not 0ull' -> ' not 0ull'
    eval_expression ' not 0ull' -> <class 'SyntaxError'> -> 0
    

    Alternative [<file902-yes>]:

    eval_expression CPP_to_Python ('!0ull',)
    CPP_to_Python: '!0ull' -> ' not 0ull' -> ' not 0o0'
    eval_expression ' not 0o0' -> True
    
  • Example 3:

    #define X0U 0U
    #if X0U
    #include <file904-yes>
    #else
    #include <file904-no>  # <=== Expected
    #endif
    

    Current PR [<file904-yes>]:

    do_define ('X0U', None, '0U')
    do_define cpp_namespace['X0U']: '0U'
    eval_expression CPP_to_Python ('X0U',)
    CPP_to_Python: 'X0U' -> 'X0U' -> 'X0U'
    eval_expression cpp_namespace:
      'X0U': '0U'
    eval_expression 'X0U' -> '0U'
    

    Alternative [<file904-no>]:

    do_define ('X0U', None, '0U')
    eval_expression CPP_to_Python ('0U',)
    CPP_to_Python: '0U' -> '0U' -> '0o0'
    eval_expression '0o0' -> 0
    do_define cpp_namespace['X0U']: 0
    eval_expression CPP_to_Python ('X0U',)
    CPP_to_Python: 'X0U' -> 'X0U' -> 'X0U'
    eval_expression cpp_namespace:
      'X0U': 0
    eval_expression 'X0U' -> 0
    
  • Example 4:

    #define XF1 (0x0U & 0x1U)
    #if XF1
    #include <file905-yes>
    #else
    #include <file905-no>  # <=== Expected
    #endif
    

    Current PR [<file905-yes>]:

    do_define ('XF1', None, '(0x0U & 0x1U)')
    do_define cpp_namespace['XF1']: '(0x0U & 0x1U)'
    eval_expression CPP_to_Python ('XF1',)
    CPP_to_Python: 'XF1' -> 'XF1' -> 'XF1'
    eval_expression cpp_namespace:
      'XF1': '(0x0U & 0x1U)'
    eval_expression 'XF1' -> '(0x0U & 0x1U)'
    

    Alternative [<file905-no>]:

    do_define ('XF1', None, '(0x0U & 0x1U)')
    eval_expression CPP_to_Python ('(0x0U & 0x1U)',)
    CPP_to_Python: '(0x0U & 0x1U)' -> '(0x0U & 0x1U)' -> '(0x0 & 0x1)'
    eval_expression '(0x0 & 0x1)' -> 0
    do_define cpp_namespace['XF1']: 0
    eval_expression CPP_to_Python ('XF1',)
    CPP_to_Python: 'XF1' -> 'XF1' -> 'XF1'
    eval_expression cpp_namespace:
      'XF1': 0
    eval_expression 'XF1' -> 0
    

Alternative implementation:

  • Regexes

    int_suffix_opt = r'(?:[uU](?:l{1,2}|L{1,2}|[zZ]|wb|WB)?|(?:l{1,2}|L{1,2}|[zZ]|wb|WB)[uU]?)?'
    
    hex_integer = fr'(0[xX][0-9A-Fa-f]+){int_suffix_opt}'
    bin_integer = fr'(0[bB][01]+){int_suffix_opt}'
    oct_integer = fr'(0[0-7]*){int_suffix_opt}'
    dec_integer = fr'([1-9][0-9]*){int_suffix_opt}'
    
    ident_chars = r'[a-zA-Z0-9_]'
    neg_lookbehind = fr'(?<!{ident_chars})'
    neg_lookahead = fr'(?!{ident_chars})'
    
    # A separate list of expressions to be evaluated and substituted
    # sequentially, not all at once.
    CPP_to_Python_Eval_List = [
        [r'defined\s+(\w+)',                 '"\\1" in __dict__'],
        [r'defined\s*\((\w+)\)',             '"\\1" in __dict__'],
        [fr'{neg_lookbehind}{hex_integer}{neg_lookahead}', '\\1'],
        [fr'{neg_lookbehind}{bin_integer}{neg_lookahead}', '\\1'],
        [fr'{neg_lookbehind}{oct_integer}{neg_lookahead}', '0o\\1'],
        [fr'{neg_lookbehind}{dec_integer}{neg_lookahead}', '\\1'],
    ]
    
  • do_define (eval_expression_t not shown)

    def do_define(self, t) -> None:
        """
        Default handling of a #define line.
        """
        _, name, args, expansion = t
        rval, success = self.eval_expression_t(t[2:])
        if success and isinstance(rval, int):
            expansion = rval
        else:
            # handle "defined" chain "! (defined (A) || defined (B)" ...
            if "defined " in expansion:
                self.cpp_namespace[name] = self.eval_expression(t[2:])
                return
    
        if args:
            evaluator = FunctionEvaluator(name, args[1:-1], expansion)
            self.cpp_namespace[name] = evaluator
        else:
            self.cpp_namespace[name] = expansion
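A standalone sketch of the idea behind the alternative: convert the integer literals first, then evaluate the define's expansion instead of calling `int()` on it. It reuses the regexes listed above, but `convert_literals` and `evaluate_define` are hypothetical helpers, and `eval_expression_t` itself is not reproduced.

```python
import re

int_suffix_opt = r'(?:[uU](?:l{1,2}|L{1,2}|[zZ]|wb|WB)?|(?:l{1,2}|L{1,2}|[zZ]|wb|WB)[uU]?)?'
ident_chars = r'[a-zA-Z0-9_]'
before = fr'(?<!{ident_chars})'
after = fr'(?!{ident_chars})'

# Applied sequentially, in the same order as the list above.
SUBSTITUTIONS = [
    (re.compile(fr'{before}(0[xX][0-9A-Fa-f]+){int_suffix_opt}{after}'), r'\1'),
    (re.compile(fr'{before}(0[bB][01]+){int_suffix_opt}{after}'), r'\1'),
    (re.compile(fr'{before}(0[0-7]*){int_suffix_opt}{after}'), r'0o\1'),
    (re.compile(fr'{before}([1-9][0-9]*){int_suffix_opt}{after}'), r'\1'),
]

def convert_literals(expr: str) -> str:
    """Rewrite C integer literals in expr as Python integer literals."""
    for pattern, repl in SUBSTITUTIONS:
        expr = pattern.sub(repl, expr)
    return expr

def evaluate_define(expansion: str):
    """Return an int when the expansion evaluates to one, else the original text."""
    try:
        value = eval(convert_literals(expansion), {'__builtins__': {}}, {})
    except Exception:
        return expansion
    return value if isinstance(value, int) else expansion

print(convert_literals('(0x0U & 0x1U)'))  # (0x0 & 0x1)
print(evaluate_define('(0x0U & 0x1U)'))   # 0
print(convert_literals('0U'))             # 0o0
print(evaluate_define('0U'))              # 0
```

These results line up with the "Alternative" traces for Examples 3 and 4 above.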
    

@bdbaddog
Contributor Author

Closing in favor of #4629

@bdbaddog bdbaddog closed this Nov 10, 2024