Unable to install XCP-ng on some hardware with i915 compatible graphics card #436

Open
rushikeshjadhav opened this issue Sep 15, 2020 · 46 comments

Comments

@rushikeshjadhav

i915-compatible graphics have become common (for example in Intel NUCs), and affected users are unable to install with either the default kernel or kernel-alt. The current ISO build process strips i915 from both kernels, which could be causing the black screen after Xen relinquishes VGA. This needs more testing from users whose only VGA device is i915-compatible.

Users can boot an already installed system as it has i915 module present.

It is not certain that having i915 module in the ISO will solve this issue.

@stormi stormi changed the title Unable to install XCP-ng with i915 compatible graphics card Unable to install XCP-ng on some hardware with i915 compatible graphics card Nov 6, 2020
@stormi
Member

stormi commented Nov 6, 2020

It now has been tested that having the i915 module solves the issue. However we still don't know why on those specific computers the installer's kernel is not able to display text without the GPU driver. Adding that driver would be a workaround (and we can help users do it manually during installation), but not the complete solution.

In addition to that, the i915 module is blacklisted in dom0. We don't know the reason why yet.

@vicsanca

vicsanca commented Feb 3, 2022

Hello, I'm having this problem with Comet Lake and Rocket Lake processors. How can I solve it manually?
Thanks

@stormi
Member

stormi commented Feb 3, 2022

There are test installation images at https://xcp-ng.org/forum/topic/5492/xcp-ng-8-2-1-maintenance-update-ready-for-testing that are supposed to bring Rocket Lake support, but I don't think it's related to i915.

@vicsanca

vicsanca commented Feb 4, 2022

Same problem with 8.2.1: black screen after relinquishing VGA.
I'm only able to install with a PCIe GPU. Once installed, I removed the GPU and tested lots of CentOS 7 workarounds in the GRUB command line and in i915.conf, without success.
Same result with Comet Lake (i5-10400) and Rocket Lake (i5-11400).

@stormi
Member

stormi commented Feb 4, 2022

So even by un-blacklisting i915 in /etc/modprobe.d/i915.conf on the installed system you still don't have any display after reboot? This would differ from what other users experienced in a similar situation.

We still don't know why such hardware is not able to display a simple console without using the i915 module though.

@vicsanca

vicsanca commented Feb 4, 2022

Yes, un-blacklisting has no effect. Maybe the problem is with the Intel B560 chipset? Which CentOS release is it based on? I'm going to try CentOS to see if it has the same problem.

Thanks

@stormi
Member

stormi commented Feb 4, 2022

The CentOS release (7) wouldn't tell you much as we have a custom 4.19 kernel and there's also the Xen layer that might play a role.

@orther

orther commented Feb 4, 2022

I can confirm @vicsanca's report of the black screen when Xen relinquishes VGA.

One thing of note is that the 8.2.1 release did render the GRUB install menu on my Intel 11th Gen NUC (NUC11PAHi7) which could open a path for hacking an i915 fix on those NUCs (with less hassle).

@vicsanca

vicsanca commented Feb 4, 2022

OK. I have done more tests with comet lake.

Updated 8.2 to 8.2.1 with yum: same problem, but now lspci shows a name for the VGA device (00:02.0 VGA compatible controller: Intel Corporation CometLake-S GT2 [UHD Graphics 630] (rev 03)), whereas 8.2 only showed the hardware identifier 9bc8.

Un-Blacklisting i915 in /etc/modprobe.d/i915.conf has no effect, lsmod shows that i915 module is NOT loaded.

Tested "options i915 force_probe=9bc8" in i915.conf...same result. It's not loading i915.
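
The two checks described above (is the module still blacklisted, and is it actually loaded) can be sketched as small shell functions. This is only a minimal sketch, not an XCP-ng tool: the names `is_blacklisted` and `is_loaded` are hypothetical, and the matching only covers plain `blacklist <module>` lines in `*.conf` files.

```shell
# Hypothetical helpers; the directory and file arguments default to the usual locations.

# Succeeds if an active (uncommented) "blacklist <module>" line exists
# in any *.conf file of the given modprobe.d directory.
is_blacklisted() {
    module="$1"
    conf_dir="${2:-/etc/modprobe.d}"
    grep -Eqs "^[[:space:]]*blacklist[[:space:]]+${module}[[:space:]]*\$" "$conf_dir"/*.conf
}

# Succeeds if the module appears in a /proc/modules-style listing.
is_loaded() {
    module="$1"
    modules_file="${2:-/proc/modules}"
    grep -qs "^${module} " "$modules_file"
}
```

For example, `is_blacklisted i915 && echo "still blacklisted"` followed by `is_loaded i915 || echo "not loaded"` would report both states on the running system.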

@orther

orther commented Feb 5, 2022

I've read that Intel Iris Xe Graphics requires kernel version 5.4 or higher, and I see that reported for my specific device, an 11th Gen Intel NUC i7, here: http://linux-hardware.org/?id=pci:8086-9a49-8086-3004

@vicsanca

vicsanca commented Feb 8, 2022

Could this be a problem related to UEFI/CSM?

https://scottiestech.info/2021/04/13/why-cant-i-enable-csm-on-my-new-motherboard/

@cheezgr8r

Adding that driver would be a workaround (and we can help users do it manually during installation)

@stormi , how can this be done manually during installation?

@stormi
Member

stormi commented Apr 4, 2022

Adding that driver would be a workaround (and we can help users do it manually during installation)

@stormi , how can this be done manually during installation?

You attach a device with the driver on it, switch to a shell with ALT+RIGHT and insmod the driver.
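
Those steps could look roughly like the sketch below once you are in the installer shell. Everything specific here is an assumption: the function name `load_driver_from_media` is hypothetical, `/dev/sdb1` is only an example device, and the `.ko` file has to be built for the installer's exact kernel.

```shell
# Hypothetical helper: mount the media read-only and insert one module file.
# insmod does not resolve dependencies, so any modules the driver depends on
# would have to be inserted first, in order.
load_driver_from_media() {
    device="$1"        # e.g. /dev/sdb1 (assumed; check dmesg for the real name)
    module_file="$2"   # path of the .ko file relative to the media root
    mount_point="${3:-/mnt/driver}"
    mkdir -p "$mount_point" || return 1
    mount -o ro "$device" "$mount_point" || return 1
    insmod "$mount_point/$module_file"
}
```

A call would then be something like `load_driver_from_media /dev/sdb1 i915.ko`.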

@orther

orther commented Apr 25, 2022

There’s a new NUC Test 2 ISO that works on my NUC 11 w/ i915. You can find out more about it on this XCP-NG community forum thread

@danieltwagner

danieltwagner commented Jul 29, 2022

I came here because I've been trying to install XCP-ng on a Jasper Lake N5105 system which also uses i915 graphics and further has 4x i225 network cards and between the two it feels like I'm living on the bleeding edge. The "NUC Test 2" ISO linked above got me through to installation where the vanilla 8.2.1 installer failed. Is there an ETA for inclusion of the i915 driver in the official ISOs?

@olivierlambert
Member

Our objective is to produce nightly ISOs with the latest updates (and maybe even test drivers), especially for people with non-server hardware.

@stormi
Member

stormi commented Jul 29, 2022

Note: this is not about including the i915 drivers, as they are not required to get a display. The fixes are related to console display in the Linux kernel.

@danieltwagner

@stormi You're right, I should have used more precise language; my question was if there was a timeline to incorporate these improvements to the installer or iso build process such that the installation can complete as normal on Intel NUC and other devices using i915 graphics.

@olivierlambert That's great! I take it this isn't the case yet, or have I missed them?

@olivierlambert
Member

Not yet, as I said it's an objective, not something already done ;)

@dezren39

dezren39 commented Aug 3, 2022

I am also running into this issue with a Jasper Lake N6005 CPU very similar to @danieltwagner's. Once the 1165g7 version shows up, I will let you know if it's also affected. Likely will be, based on this and the forum threads. Unfortunately there is no legacy mode in the new intel chipsets. (EDIT: Confirmed, also broken with regular and fixed by custom iso from forums.)

They are becoming popular partially because more affordable models from AliExpress (and occasionally Amazon) are showing up now and have been reviewed on YouTube by a few channels, generating at least a little interest. (Between ServeTheHome && Level1Techs combined, I'm sure there's more than one following a similar path.)

@davidpesce

Any updates on this? Has it been integrated into nightly build?

@rjt

rjt commented Sep 15, 2022 via email

@staticfrost

Hey, are the changes from the NUC Test 2 ISO going to be merged any time soon?

@stormi
Member

stormi commented Sep 22, 2022

So, regarding the graphics themselves, the next ISOs we build will have the fix:

  • upcoming 8.3 Beta (probably end of 2022)
  • refreshed 8.2 ISOs

Regarding refreshed 8.2 ISOs, the display issue is not the only issue that needs fixing before we release any: we also need backported network drivers. This is where we're not advancing right now. The initial plan, devised back in May IIRC, was that @andrew64k would contribute pull requests to the XCP-ng project so that we can include these drivers. I also offered help to accompany the process. But I think both sides have been busy and nothing was done.

Another solution is that I do the packaging work myself. For that I just need some input from Andrew, as we already discussed in the dedicated forum thread: the upstream code that was extracted from the Linux kernel, the patches that were necessary to make it work on our older 4.19 kernel, and explanations of the rationale behind the changes, because we need traceability.

The last resort would be doing it all by myself without @andrew64k, but this doesn't look like a good solution in my eyes, and is likely to happen later than the other solutions.

@andrew64k

andrew64k commented Sep 22, 2022 via email

@andrew64k

andrew64k commented Sep 24, 2022 via email

@danielbayley80

So, regarding the graphics themselves, the next ISOs we build will have the fix:

  • upcoming 8.3 Beta (probably end of 2022)
  • refreshed 8.2 ISOs

I have been trying this with the Alpha.
It installs and boots cleanly (unlike the previous version which had the relinquish issue). I am still having issues in one of my VMs. I think it is when the Guest OS (Windows 10) probes graphics.

I raised this for 8.2.1 on my N5105. It works fine bare metal.
More recently I tried a 12th Gen i7-1260P with the same issue.
I will go back and test my N5105 but I strongly suspect I will see the same issue.

#565

@exetico

exetico commented Mar 14, 2023

8.2.1 gave me a black screen after the "relinquishing vga console" message. 8.3a2 booted the installation with no problem. I'm on an ODROID-H3+.

@olivierlambert
Member

Yes, it's somehow expected :) Thanks for your feedback confirming it's the case 👍

@mrnaz

mrnaz commented Oct 11, 2023

I just purchased a bunch of N5105 based units intending to install xcp-ng on them for light duty stuff. I SHOULD have bought one and tested first, but I didn't. Now I can't install XCP-ng on any of them, and this is the issue that I'm bumping into.

Is there a solution for this yet? Or are the various hacks in this thread still the only workaround?

@andrew64k

Try installing 8.3 beta or the new 8.2 ISO. There are official updates now to support some newer hardware.

Check the XCP forums for additional information.

@mrnaz

mrnaz commented Oct 11, 2023

Is the new 8.2 ISO just from the regular download link? If so, I downloaded 8.2.1 this morning and compared the hashes with one I had from about 6 months ago and they are the same. Both give me the same "relinquishing" bug.

Is 8.3 beta safe(ish) to use in a production environment? Or am I asking for trouble?

@mrnaz

mrnaz commented Oct 11, 2023

I confirm that 8.3b installs fine.

@olivierlambert
Member

We'll soon © have a new, updated 8.2 ISO, so you can also install 8.2 directly on it. However, 8.3 already works pretty well, which is OK-ish for this kind of hardware anyway :)

@mrnaz

mrnaz commented Oct 11, 2023

It's good to know, I would much rather use 8.2 because while this is low power hardware, the use case is production. These boxes are going to be used to run pfsense as the core router, virtualized so that we can also have a few other lightweight functions happening on the same hardware. It will be backed up, so if it goes down it's not the end of the world, but nonetheless I'd sleep better with LTS rather than a beta release.

@lethedata

lethedata commented Jan 27, 2024

It now has been tested that having the i915 module solves the issue. However we still don't know why on those specific computers the installer's kernel is not able to display text without the GPU driver. Adding that driver would be a workaround (and we can help users do it manually during installation), but not the complete solution.

In addition to that, the i915 module is blacklisted in dom0. We don't know the reason why yet.

@stormi I think the reason this module was/is blacklisted is that Xen and dom0 use the kernel-based VGA settings, which allows a default expected resolution to be set at all times. With i915 enabled, the module is able to take over and adjust the resolution; however, it will sometimes adjust to something unsupported, leading to an "Input Signal Out Of Range" error on the attached monitor.

I've run into this issue, and normally setting the resolution manually in the kernel parameters (required during install) or permanently via Citrix Article CTX226191 fixes it; however, with 8.3 Beta it didn't work. It took me a bit to pinpoint that the i915 module was loading, which required modprobe.blacklist=i915 to be added to the module2 kernel parameters as well. I was lucky to have another monitor on hand, as remoting in wasn't an option and the problem was presenting itself during the install process.
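
As an illustration of where that parameter goes, the sketch below appends a dom0 kernel parameter to the `module2` line that loads the kernel in a GRUB config and prints the result. The function name `add_dom0_param` is hypothetical, and the config layout (a `module2` line naming a `vmlinuz` image) is an assumption; during installation the same edit would be made by hand in the GRUB menu editor.

```shell
# Hypothetical helper: print the given GRUB config with <param> appended to
# every "module2 ... vmlinuz ..." line that does not already contain it.
# The input file itself is left untouched.
add_dom0_param() {
    param="$1"
    grub_cfg="$2"
    awk -v p="$param" '
        /^[ \t]*module2[ \t]/ && /vmlinuz/ && index($0, p) == 0 { $0 = $0 " " p }
        { print }
    ' "$grub_cfg"
}
```

Writing to standard output keeps the sketch non-destructive: the result can be reviewed before anything on the boot partition is changed.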

From my understanding of things, servers aren't really expected to need GPU drivers in the first place, so all module-based issues can be avoided by simply disabling the module. It's much easier to troubleshoot and recognize when someone's hardware requires the module than to figure out the opposite due to some obscure module/monitor issue. That's not even mentioning the frustration of trying to get information off a system when the customer can't get any output in the first place, especially as the issue only presents itself when trying to access things locally.

I'm planning a PR to document the fix but for now opened xcp-ng/xcp-ng-org#255 .

@stormi
Member

stormi commented Feb 26, 2024

Hi! I must admit I'm not sure what we should do with these issues, but I'm following your efforts in documenting a way to work around them! Thanks!

If at some point you believe you have found something we should change in XCP-ng's default configuration, feel free to suggest it too. I'm not sure why i915 is not blacklisted anymore in XenServer 8 (which is where XCP-ng 8.3 inherited this change from).

@stormi
Member

stormi commented Feb 26, 2024

Oh, also, I think we should open a separate issue, because yours doesn't seem to have the same cause as the initial issue reported here. Do you agree?

@stormi
Member

stormi commented Feb 26, 2024

Oh, by the way, i915 is blacklisted in XCP-ng 8.3:

# cat /etc/modprobe.d/i915.conf 
blacklist i915
options i915 enable_gvt=1

@lethedata

lethedata commented Feb 26, 2024

Oh, also, I think we should open a separate issue, because yours doesn't seem to have the same cause as the initial issue reported here. Do you agree?

I don't think an issue is needed. It seems to be more of an edge case, so the documented workaround should be good enough. Anything in there could be set as the default if needed, but the trade-off is that you'll be forcing a particular resolution rather than letting the system handle it automatically. For consistency, forcing the resolution on UEFI will match the BIOS config.

Oh, by the way, i915 is blacklisted in XCP-ng 8.3:

I think my system was ignoring that blacklist. I've seen it happen in the past where blacklisting in modprobe.d just didn't work for whatever reason, requiring the blacklist to be set in the kernel parameters directly. The system is on beta2 now (fresh install) and isn't loading the module with the modprobe.d file only, as it should.

@stormi
Member

stormi commented Feb 26, 2024

Could it be blacklisted in the running system, but not in the initrd?

@lethedata

Doesn't look like it. I decompressed both beta1 and beta2 and the blacklist file is there.

"This alone will not prevent a module being loaded if it is a required or an optional dependency of another module. Some kernel modules will attempt to load optional modules on demand...." - Red Hat KB41278
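
One way to check whether another module is pulling the driver in is to look at which loaded modules reference it. The following is a rough sketch, assuming the usual `/proc/modules` layout in which the fourth field lists the modules using the one named in the first field. The function name `module_users` is hypothetical, and on-demand loads that leave no reference behind would not show up here.

```shell
# Hypothetical helper: print the modules that currently reference <module>,
# reading a /proc/modules-style file (fourth field: "user1,user2," or "-").
module_users() {
    module="$1"
    modules_file="${2:-/proc/modules}"
    awk -v m="$module" '
        $1 == m && $4 != "-" {
            gsub(/,$/, "", $4)   # drop the trailing comma
            gsub(/,/, " ", $4)   # separate the remaining names with spaces
            print $4
        }
    ' "$modules_file"
}
```

Running `module_users i915` on a system where the module is loaded would list whatever modules hold a reference to it; empty output means none do.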

@stormi
Member

stormi commented Feb 26, 2024

Right, so maybe another module requires it in your case 🤔

@rjt

rjt commented Feb 26, 2024 via email

@lethedata

lethedata commented Feb 26, 2024

@rjt No, not fully anyway. Those steps are written for RHEL and do not match how dom0 is built in XCP-ng; the different GRUB config is the big one.

Due to what dom0 is and how integrated it is with Xen, I'd say it's a better idea to blacklist things via kernel parameters using xen-cmdline. This would let one easily remove any changes at boot time without having to fight with a broken system and a broken initrd. It also allows upgrades without breaking things unexpectedly, and it "wins" over files.
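
A minimal sketch of that approach, assuming the `xen-cmdline` helper lives at `/opt/xensource/libexec/xen-cmdline` and accepts `--set-dom0` and `--delete-dom0` as it does on XCP-ng hosts. The wrapper function names and the `XEN_CMDLINE` override variable are hypothetical, and a reboot is needed before a changed boot parameter takes effect.

```shell
# Path to the helper; overridable so the wrappers can be tried without a real host.
XEN_CMDLINE="${XEN_CMDLINE:-/opt/xensource/libexec/xen-cmdline}"

# Add modprobe.blacklist=<module> to the dom0 kernel command line.
blacklist_in_dom0() {
    "$XEN_CMDLINE" --set-dom0 "modprobe.blacklist=$1"
}

# Remove the modprobe.blacklist parameter from the dom0 kernel command line.
unblacklist_in_dom0() {
    "$XEN_CMDLINE" --delete-dom0 modprobe.blacklist
}
```

For instance, `blacklist_in_dom0 i915` followed by a reboot would apply the blacklist, and `unblacklist_in_dom0` would undo it without touching any file under /etc/modprobe.d.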

Basically, any deviations from the standard dom0 should be done in a non-permanent easily removable way.

@rjt

rjt commented Feb 26, 2024 via email
