Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update OpenBSD glob implementation for Windows #16948

Merged
merged 7 commits into from
Jan 10, 2025

Conversation

NattyNarwhal
Copy link
Member

We're considering making this used as a glob implementation on POSIX as well, but first, we should rebase it from the latest version of OpenBSD.

This also adds a new internal header (charclass.h) for glob.

See conversation in GH-15564.

@NattyNarwhal
Copy link
Member Author

MSVC error there is misleading, it's missing reallocarray (and defaults to int(void) I assume). Would need to bring that over, adapt it to use realloc (safely avoiding overflow with args), or convert it ZendMM which has eqivalents anyways.

@cmb69
Copy link
Member

cmb69 commented Nov 28, 2024

MSVC error there is misleading, it's missing reallocarray (and defaults to int(void) I assume).

See #16986.

or convert it ZendMM which has eqivalents anyways.

Might be best, unless that would be a PITA when pulling in further updates.

@NattyNarwhal
Copy link
Member Author

For now I imported reallocarray. I don't think glob changes much, so future changes from OpenBSD should be easy to reconcile even if we do convert to ZendMM in the future. Now, to figure out what's broken...

@cmb69
Copy link
Member

cmb69 commented Nov 30, 2024

The usage of SEP is an issue (Windows is fine with forward slashes, too). The following patch solves some problems:

 win32/glob.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/win32/glob.c b/win32/glob.c
index 0923d49efa..36c1c58f1c 100644
--- a/win32/glob.c
+++ b/win32/glob.c
@@ -702,7 +702,7 @@ glob2(Char *pathbuf, Char *pathbuf_last, Char *pathend, Char *pathend_last,
 		/* Find end of next segment, copy tentatively to pathend. */
 		q = pathend;
 		p = pattern;
-		while (*p != EOS && *p != SEP) {
+		while (*p != EOS && !IS_SLASH(*p)) {
 			if (ismeta(*p))
 				anymeta = 1;
 			if (q+1 > pathend_last)
@@ -713,7 +713,7 @@ glob2(Char *pathbuf, Char *pathbuf_last, Char *pathend, Char *pathend_last,
 		if (!anymeta) {		/* No expansion, do next segment. */
 			pathend = q;
 			pattern = p;
-			while (*pattern == SEP) {
+			while (IS_SLASH(*pattern)) {
 				if (pathend+1 > pathend_last)
 					return (1);
 				*pathend++ = *pattern++;

Likely the SEP in line 689 needs to be replace with IS_SLASH() as well.

@NattyNarwhal
Copy link
Member Author

Thanks for trying to catch these. I was going to set up a Windows build environment over the weekend to avoid CI driven development, but didn't have any time and my Windows PC (basically used for games nowadays) seems a bit cooked.

@cmb69
Copy link
Member

cmb69 commented Dec 2, 2024

I'm happy to help with this (less platform specific behavior is always nice).

I think this also needs e42aae1, and should be good then.

@NattyNarwhal
Copy link
Member Author

I'm also considering porting fnmatch from latest OpenBSD. We currently import an ancient version from FreeBSD, but that involves pulling in even more headers (for i.e. collate, we did so for OpenBSD's charclass.h).

NattyNarwhal and others added 2 commits December 3, 2024 11:58
From OpenBSD, dependency of newer versions of glob.
We're considering making this used as a glob implementation on POSIX as
well, but first, we should rebase it from the latest version of OpenBSD.

This also adds a new internal header (charclass.h) for glob.

See conversation in phpGH-15564.

Co-authored-by: Christoph M. Becker <[email protected]>
@NattyNarwhal NattyNarwhal marked this pull request as ready for review December 3, 2024 16:02
@NattyNarwhal NattyNarwhal requested a review from bukka as a code owner December 3, 2024 16:02
@cmb69
Copy link
Member

cmb69 commented Dec 3, 2024

Just noticed that glob() does not support long and UTF-8 paths on Windows. Sigh.

@NattyNarwhal
Copy link
Member Author

Would it just be a matter of making it use the PHP VCWD functions? glob seems abstracted enough internally we can probably change i.e. g_lstat, or provide callbacks in the glob_t at callers/

@NattyNarwhal
Copy link
Member Author

Looking at VCWD, I'm not sure if it'd solve it; the stat function already binds to ioutil, lstat doesn't seem to, and while opendir does go through the path resolution code in VCWD, I'm not sure if that would handle the \\?\ type paths (plus there's the MAXPATHLEN stuff). But if you want to try...

diff --git a/win32/glob.c b/win32/glob.c
index f1e75de7344..6435d8f17d2 100644
--- a/win32/glob.c
+++ b/win32/glob.c
@@ -1041,7 +1041,7 @@ g_opendir(Char *str, glob_t *pglob)
 	if (pglob->gl_flags & GLOB_ALTDIRFUNC)
 		return((*pglob->gl_opendir)(buf));
 
-	return(opendir(buf));
+	return(VCWD_OPENDIR(buf));
 }
 
 static int
@@ -1053,7 +1053,7 @@ g_lstat(Char *fn, zend_stat_t *sb, glob_t *pglob)
 		return(-1);
 	if (pglob->gl_flags & GLOB_ALTDIRFUNC)
 		return((*pglob->gl_lstat)(buf, sb));
-	return(php_sys_lstat(buf, sb));
+	return(VCWD_LSTAT(buf, sb));
 }
 
 static int
@@ -1065,7 +1065,7 @@ g_stat(Char *fn, zend_stat_t *sb, glob_t *pglob)
 		return(-1);
 	if (pglob->gl_flags & GLOB_ALTDIRFUNC)
 		return((*pglob->gl_stat)(buf, sb));
-	return(php_sys_stat(buf, sb));
+	return(VCWD_STAT(buf, sb));
 }
 
 static Char *

@cmb69
Copy link
Member

cmb69 commented Dec 6, 2024

I wouldn't worry about UTF-8 and long path support right now (i.e. for the update). There are already a couple of things that are not supported, e.g.

  • relative paths from the root of the current drive; e.g. \dir is the same as C:\dir if the current drive is C:; IIRC this is a general issue (only supported by NTS, but not ZTS, or maybe the other way round)
  • UNC paths (at least \\?\C: won't work)
  • tilde substitution; for ZTS builds it preprends the absolute path of the CWD, so completely broken; for NTS builds it looks like the respective substitution code is disabled (could probably use HOMEDRIVE and HOMEPATH for plain ~, but expanding ~user would not necessarily work)

Anyway, I tried globbing for ü.txt, and that failed with the old version, but worked with the new, probably because of typedef char Char vs. typedef short Char. That might be a good thing or not; I need to check this more closely.

I shall also check other clients of glob(), such as GlobIterator and the glob:// wrapper. So far only checked zif_glob().

@NattyNarwhal NattyNarwhal requested a review from Girgias December 6, 2024 14:33
@cmb69
Copy link
Member

cmb69 commented Dec 6, 2024

GlobIterator seems to rely on the glob:// wrapper, so doesn't need to be check explicitly. However, the glob:// wrapper has at least bug #17067 (I haven't checked further, yet).

Copy link
Member

@Girgias Girgias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for taking care of this.

I'm not a massive fan of adding a new file for reallocarray() but that's not really a problem.
Left some minor comments

win32/glob.h Outdated Show resolved Hide resolved
win32/glob.c Outdated Show resolved Hide resolved
win32/glob.c Outdated
Comment on lines 448 to 554
(c = qpatnext[1]) != RBRACKET) {
(c = qpatnext[1]) != RBRACKET) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indent changed from tab to spaces

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto, this how OpenBSD indented it (same for the ones you mentioned below). I can change this to match PHP style.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to keep this file as close to the original as possible, to make backports from upstream easier.

Copy link
Member Author

@NattyNarwhal NattyNarwhal Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind doing the indentation changes if that's how we do it; the file doesn't change too much and in the future we can probably look at backporting individual patches. I just did a blanket import since it looks like we had almost 20 years of changes to deal with.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to keep this file as close to the original as possible, to make backports from upstream easier.

Well, I suppose if we can backport the indent issues I'm fine with that too

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't worry that much about diverging from upstream as we might want to do some changes into it eventually. It would be however useful to track the version that it is based on so we can from time to time review changes in the upstream and backport it to our implementation.

win32/glob.c Outdated
Comment on lines 596 to 597
((pglob->gl_flags & GLOB_NOMAGIC) &&
!(pglob->gl_flags & GLOB_MAGCHAR)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto indent changes

win32/glob.c Outdated
Comment on lines 655 to 656
pathbuf, pathbuf+PATH_MAX-1,
pattern, pattern_last, pglob, limitp));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

win32/glob.c Outdated
Comment on lines 691 to 694
pathend[-1] != SEP) && (S_ISDIR(sb.st_mode) ||
(S_ISLNK(sb.st_mode) &&
(g_stat(pathbuf, &sb, pglob) == 0) &&
S_ISDIR(sb.st_mode)))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indent changes

win32/glob.c Outdated
Comment on lines 726 to 727
pathend_last, pattern, p, pattern_last,
pglob, limitp));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

win32/glob.c Outdated
Comment on lines 614 to 761
pglob->gl_flags & GLOB_ERR)
pglob->gl_flags & GLOB_ERR)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

win32/glob.c Outdated
Comment on lines 649 to 805
restpattern, restpattern_last, pglob, limitp);
restpattern, restpattern_last, pglob, limitp);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

Copy link
Member

@bukka bukka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice first step. Except that reallocarray placement, it looks good to me. It would be good to wait for @cmb69 to do his testing though.

@@ -0,0 +1,59 @@
/*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should not be in main but in win32. I don't think we will want to use it in this format if glob is made common. In such case we should probably look to introducing something in zend_alloc that would respect memory_limit at some point. I know that the libc glob does not do that but it would be good thing to do eventually.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imported reallocarray there since it's consistent with other imported utility functions like strlcpy, and we'll need it when we use this glob on not-Windows. ZendMM conversion would render this moot though.

OpenBSD uses tabs for indentation, but four spaces to align i.e.
continuations of lines. In PHP, we apparently just use tabs, so do
that for code bodies and function declarations.
Matches the style of glob before, OpenBSD must have changed this to
match their convention, but this is closer to PHP convention.
@NattyNarwhal
Copy link
Member Author

I've cleaned up the indentation stuff and did try to fix some things to match PHP house style. Let me know if I missed something.

If we want to do ZendMM conversion, should that be done in another PR or in this PR? Latter would mean we do even more stuff in this PR, but we can avoid having to import reallocarray just to get rid of it in a ZendMM conversion.

@bukka
Copy link
Member

bukka commented Dec 15, 2024

As it's just temporary, it should be fine to chuck static php_reallocarray to glob.c and then in the future PR do the Zend MM conversion and get rid of it. You can just add a comment that it's a temp thing.

We only need it on Windows, and we want to convert this file to ZendMM
later anyways, which has its own equivalent.

This also reverts commit a26e92e.
@bukka
Copy link
Member

bukka commented Dec 22, 2024

It looks fine to me now. It's all win32 so let's wait for @cmb69 to give it a green light.

@NattyNarwhal
Copy link
Member Author

Bumping again, but to recap on what might still need to be done if this is to be merged:

  • Christoph mentioned long path issues, but that this wasn't blocking this PR.
  • Is indentation fine here?
  • Do we want to do fnmatch bumping in this or a new PR? This would probably use OpenBSD instead of FreeBSD as the basis to avoid importing even more headers, but should be compatible.
  • I think consensus is doing ZendMM conversion in a new PR.

@Girgias
Copy link
Member

Girgias commented Jan 9, 2025

I am happy to progress as is if @cmb69 is OK with it :)

Copy link
Member

@cmb69 cmb69 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good as is, and there is still plenty of time to PHP 8.5.0alpha1 for additional checking.

Comment on lines +69 to +71
# ifndef PATH_MAX
# define PATH_MAX MAXPATHLEN
# endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should keep an eye on this; AFAIK PATH_MAX is not defined on Windows, so it will be defined here (what we want).

@NattyNarwhal NattyNarwhal merged commit 97a26ae into php:master Jan 10, 2025
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants