waitforx changes #3297

Merged 1 commit on Nov 27, 2024

matt335672
Member

Two changes to waitforx:-

  1. Changed alarm message from "Timed out waiting for RandR outputs" to "Timed out waiting for X display". The former was confusing, and the times where this message was triggered were nothing to do with RandR
  2. Don't try to open a display of the form ":n" directly (n >= 0). If the X server hasn't yet opened its local socket, the XOpenDisplay() call can go to the network and possibly block for a long time. Instead, use the (undocumented) "unix:n" display specification, which never goes to the network, and doesn't block.

We've had three reports now of waitforx seemingly blocking:-

I've not been able to get to the bottom of any of them satisfactorily, but I got a clue when looking at the last of these.

What I think is happening is this:-

  1. xrdp-sesman starts Xorg
  2. Xorg takes a while to initialise and open the UNIX socket for the display
  3. waitforx runs before the socket has opened and tries to open a display of the form ":n". Let's say this is ":10".
  4. Because the local socket is unavailable, the code in XOpenDisplay() goes to the network to try to connect to TCP port localhost:6010 for the X server.
  5. Network code blocks, either resolving localhost or because the user has not unreasonably firewalled off X11 ports.
  6. waitforx times out.

The fix is to use a display specification of (e.g.) unix:10 instead of :10. This never goes to the network, and so the scenario above is avoided.

However, the DISPLAY specification 'unix:n' is not documented.

It's clearly supported in the xcb code in _xcb_open(), which libX11 is now layered on:-

https://gitlab.freedesktop.org/xorg/lib/libxcb/-/blob/master/src/xcb_util.c?ref_type=heads#L247

There's evidence of it being supported back in 2007:-
https://gitlab.freedesktop.org/xorg/lib/libxcb/-/commit/09045eaac34973662aaa820a94ca8ed66d9dcb4e

I've tried the notation for both CentOS 7 and Ubuntu 16.04, which are the oldest distros I have available. These both support it.

Are people happy with this notation being used here? Is there a better way to get the same result?

@matt335672
Member Author

I'm going to merge this into devel, so we can give it more of a test. Works fine on Linux & FreeBSD

@matt335672
Member Author

Just spotted an issue - we need to support (e.g.) :10.0 as well as :10. I'll fix this before a merge
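
As a rough illustration of handling both forms, here is a minimal sketch. It is not the code in this PR: `looks_like_local_display()` and `make_unix_display()` are hypothetical names, and keeping the ".m" screen suffix after the `unix` prefix is an assumption.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not the xrdp implementation).
 * Returns 1 if name has the form ":n" or ":n.m", where n and m are
 * non-empty runs of decimal digits, and 0 otherwise.
 */
static int
looks_like_local_display(const char *name)
{
    const char *p = name;

    if (p == NULL || *p++ != ':' || !isdigit((unsigned char)*p))
    {
        return 0;
    }
    while (isdigit((unsigned char)*p))
    {
        ++p;
    }
    if (*p == '.')
    {
        ++p;
        if (!isdigit((unsigned char)*p))
        {
            return 0;
        }
        while (isdigit((unsigned char)*p))
        {
            ++p;
        }
    }
    return *p == '\0';
}

/*
 * Hypothetical helper: writes "unix" followed by the original name
 * (e.g. ":10.0" becomes "unix:10.0") into buf.
 * Returns 0 on success, -1 if name is not local or buf is too small.
 */
static int
make_unix_display(const char *name, char *buf, size_t buflen)
{
    int n;

    if (!looks_like_local_display(name))
    {
        return -1;
    }
    n = snprintf(buf, buflen, "unix%s", name);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

Names such as "localhost:10" are rejected by the check, so a caller would pass those to XOpenDisplay() unchanged.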

1) Changed alarm message from "Timed out waiting for RandR outputs" to
   "Timed out waiting for X display". The former was confusing, and the
   times where this message was triggered were nothing to do with
   RandR
2) Don't try to open a display of the form ":n" or ":n.m" directly
   (n,m >= 0). If the X server hasn't yet opened its local socket,
   the XOpenDisplay() call can go to the network and possibly block
   for a long time. Instead, use the (undocumented) "unix:n" display
   specification, which never goes to the network, and doesn't block.
@matt335672 force-pushed the waitforx_improvements branch from d037a76 to 80fab03 on November 27, 2024 10:29
@matt335672 merged commit 600549c into neutrinolabs:devel on Nov 27, 2024
14 checks passed
@matt335672 deleted the waitforx_improvements branch on November 27, 2024 10:54
@derekschrock
Contributor

On newer distros this will break, because the unix: prefix is broken in libxcb 1.16.

https://gitlab.freedesktop.org/xorg/lib/libxcb/-/issues/73

Should we use if (0 && is_local_display(display_name)), or have is_local_display always return 0, until libxcb is fixed?

@matt335672
Member Author

I can confirm this is broken on Fedora 41, which uses libxcb 1.17.0

This is the commit referenced in your issue report:-

https://gitlab.freedesktop.org/xorg/lib/libxcb/-/commit/095255531b90f0b442e6ca41fb3752a058562d07

Reading that, I don't think this is going to be fixed, at least not in the way we would like.

The reason I made the change the way I did was that I wanted to avoid adding a dependency on "/tmp/.X11-unix/X%u" as a string for mapping a display to a socket. This is probably not worth worrying about for two reasons:-

  1. A code search shows lots of other applications would break if this string changed. This includes ssh:-
    $ strings `which ssh` | grep /tmp
    /tmp/.X11-unix/X%u
    /tmp/ssh-XXXXXXX
    
  2. We've already got that string embedded in sesman/sesman.c

I'll work on a PR to fix this for all the libxcb versions I can find.
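
A minimal sketch of what checking the socket directly could look like follows. It is not the eventual fix: `x11_socket_path()` and `x11_socket_is_listening()` are hypothetical names, and probing with a blocking connect() is an assumption (connect() on a UNIX socket never touches the network, but it could still wait if the server's listen backlog is full).

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper: formats the conventional X11 socket path for a
 * display number. Returns 0 on success, -1 if buf is too small.
 */
static int
x11_socket_path(unsigned int display, char *buf, size_t buflen)
{
    int n = snprintf(buf, buflen, "/tmp/.X11-unix/X%u", display);

    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}

/*
 * Hypothetical helper: returns 1 if something accepts a connection on
 * the display's UNIX socket, 0 otherwise (socket missing, nobody
 * listening, or any other error).
 */
static int
x11_socket_is_listening(unsigned int display)
{
    struct sockaddr_un sa;
    int fd;
    int rv;

    memset(&sa, 0, sizeof(sa));
    sa.sun_family = AF_UNIX;
    if (x11_socket_path(display, sa.sun_path, sizeof(sa.sun_path)) != 0)
    {
        return 0;
    }

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
    {
        return 0;
    }

    rv = connect(fd, (struct sockaddr *)&sa, sizeof(sa));
    close(fd);
    return rv == 0;
}
```

A caller could poll `x11_socket_is_listening(10)` until it returns 1 or the timeout expires, and only then call XOpenDisplay(":10").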

@matt335672
Member Author

Fix in https://github.com/matt335672/xrdp/tree/fix_x11_socket_check. Out of time today to test. I'll open a PR soon.

@derekschrock
Contributor

Works for me.

@matt335672
Member Author

Thanks Derek. I've refactored the code for PR #3336 so it's a bit cleaner, but essentially it's doing the same thing as the branch you tested.
