Bug #499
coreboot will not boot edk2 on Lenovo T440p with CONFIG_RESOURCE_ALLOCATION_TOP_DOWN enabled, cannot disable this setting during build
Description
coreboot revision in git: feb27dcbf3fc685b070c950a16e8adec958bc1ce
coreboot revision (git describe --tags): 4.20-520-gfeb27dcbf3
Tested payloads: edk2 from MrChromebox revision uefipayload_202304 and uefipayload_202306
coreboot will not boot my Lenovo ThinkPad T440p with CONFIG_RESOURCE_ALLOCATION_TOP_DOWN enabled, when using the edk2 payload (MrChromebox version, either uefipayload_202304 or uefipayload_202306). The display sometimes turns on, indicating that some of the hardware initialization was successful, but the payload will not start.
I tried to disable CONFIG_RESOURCE_ALLOCATION_TOP_DOWN in .config, but the build process insists on leaving this config enabled. This seems to be caused by the RESOURCE_ALLOCATION_TOP_DOWN setting changed to "def_bool y" in src/device/Kconfig in commit 5226301765ded70e0ef640e5252bbaca8cd14451 (allocator_v4: Treat above 4G resources more natively). The make target for building coreboot seems to automatically rerun olddefconfig, causing this setting to always remain enabled no matter what was previously saved in the .config file.
Modifying src/device/Kconfig to change RESOURCE_ALLOCATION_TOP_DOWN to "def_bool n" appears to fix the problem on my machine.
I have attached my .config file (with CONFIG_RESOURCE_ALLOCATION_TOP_DOWN enabled to reproduce the problem).
Updated by Nico Huber about 1 year ago
- Assignee set to Nico Huber
Could you get a coreboot log of the failing boot? If nothing else is available, CONFIG_CONSOLE_SPI_FLASH might work.
Updated by Nico Huber about 1 year ago
Modifying src/device/Kconfig to change RESOURCE_ALLOCATION_TOP_DOWN to "def_bool n" appears to fix the problem on my machine.
The canonical way would be to add a "default n" to the mainboard or northbridge Kconfig. However, fixing the bug would be really preferred.
Updated by Oberon 4071 about 1 year ago
- File console.log console.log added
Nico Huber wrote in #note-2:
Could you get a coreboot log of the failing boot? If nothing else is available, CONFIG_CONSOLE_SPI_FLASH might work.
Sure - I have attached the console.log recovered from the flash CBFS after enabling CONFIG_CONSOLE_SPI_FLASH and attempting boot.
Updated by Oberon 4071 about 1 year ago
Nico Huber wrote in #note-3:
Modifying src/device/Kconfig to change RESOURCE_ALLOCATION_TOP_DOWN to "def_bool n" appears to fix the problem on my machine.
The canonical way would be to add a "default n" to the mainboard or northbridge Kconfig. However, fixing the bug would be really preferred.
Thank you for this tip. I can make this change locally as a temporary workaround to keep coreboot working on my T440p, while keeping the build process in place for my other boards that have working CONFIG_RESOURCE_ALLOCATION_TOP_DOWN.
Updated by Nico Huber about 1 year ago
Alas, the flash console log ends before ramstage. It's often not fully working, but was worth a shot.
Maybe a cbmem log with RESOURCE_ALLOCATION_TOP_DOWN disabled would already provide some insight. But it's really a bit like shooting in the dark.
Updated by Oberon 4071 about 1 year ago
Nico Huber wrote in #note-6:
Alas, the flash console log ends before ramstage. It's often not fully working, but was worth a shot.
Maybe a cbmem log with RESOURCE_ALLOCATION_TOP_DOWN disabled would already provide some insight. But it's really a bit like shooting in the dark.
I am assuming that the cbmem log is the same as the log generated by CONFIG_CONSOLE_SPI_FLASH - if this is not the case, I will be more than happy to provide the appropriate log where possible. Unfortunately, enabling CONFIG_CONSOLE_SPI_FLASH also causes the build without CONFIG_RESOURCE_ALLOCATION_TOP_DOWN to fail to boot. I will look into determining whether it is possible to output the relevant log to an available port on the hardware, rather than storing the log on SPI flash.
Updated by Nico Huber about 1 year ago
With CONSOLE_SPI_FLASH it seems to hang too early for any useful output. If you'd return to a working config, a cbmem log would be more comprehensive. It won't show the same as one with RESOURCE_ALLOCATION_TOP_DOWN, but could already provide some more details of your system.
Updated by Paul Menzel about 1 year ago
Thank you for your report.
As you explicitly mention EDK2 (TianoCore), did you test another payload like SeaBIOS or GRUB?
Just to avoid confusion, you can get the logs from the CBMEM console using cbmem -c from util/cbmem/, or /sys/firmware/log with Linux built with CONFIG_GOOGLE_MEMCONSOLE_COREBOOT selected.
Updated by Oberon 4071 about 1 year ago
Paul Menzel wrote in #note-9:
Thank you for your report.
As you explicitly mention EDK2 (TianoCore), did you test another payload like SeaBIOS or GRUB?
Just to avoid confusion, you can get the logs from the CBMEM console using cbmem -c from util/cbmem/, or /sys/firmware/log with Linux built with CONFIG_GOOGLE_MEMCONSOLE_COREBOOT selected.
I have attached the cbmem -c output after booting the EDK2 coreboot build with RESOURCE_ALLOCATION_TOP_DOWN disabled. (The output of git describe is different because I added a local commit to disable this option by default for lenovo/haswell as part of this test.)
Other than EDK2, I also routinely build coreboot with GRUB2 where SeaBIOS is the secondary payload. This image also did not successfully boot into the payload when RESOURCE_ALLOCATION_TOP_DOWN is enabled. I'll test again with a coreboot build containing only SeaBIOS.
Updated by Nico Huber about 1 year ago
- Related links updated (diff)
Alas, nothing obvious in the log. If there is any resource unknown to coreboot, even with a log of the failing boot, we might end up simply constraining the space for allocations. So we can try that right away: Here's a patch that might work. If that doesn't help, I guess we have to disable top-down allocation for Haswell.
Updated by Oberon 4071 about 1 year ago
- File cbmem_seabios_with_resource_allocation_top_down.txt cbmem_seabios_with_resource_allocation_top_down.txt added
Oberon 4071 wrote in #note-10:
Paul Menzel wrote in #note-9:
Thank you for your report.
As you explicitly mention EDK2 (TianoCore), did you test another payload like SeaBIOS or GRUB?
Just to avoid confusion, you can get the logs from the CBMEM console using cbmem -c from util/cbmem/, or /sys/firmware/log with Linux built with CONFIG_GOOGLE_MEMCONSOLE_COREBOOT selected.
I have attached the cbmem -c output after booting the EDK2 coreboot build with RESOURCE_ALLOCATION_TOP_DOWN disabled. (The output of git describe is different because I added a local commit to disable this option by default for lenovo/haswell as part of this test.)
Other than EDK2, I also routinely build coreboot with GRUB2 where SeaBIOS is the secondary payload. This image also did not successfully boot into the payload when RESOURCE_ALLOCATION_TOP_DOWN is enabled. I'll test again with a coreboot build containing only SeaBIOS.
I was able to boot a coreboot build with RESOURCE_ALLOCATION_TOP_DOWN, containing only SeaBIOS (omitting nvramcui and coreinfo as well). I have attached a cbmem log from this boot, in case it might be of any help.
Updated by Oberon 4071 about 1 year ago
Nico Huber wrote in #note-11:
Alas, nothing obvious in the log. If there is any resource unknown to coreboot, even with a log of the failing boot, we might end up simply constraining the space for allocations. So we can try that right away: Here's a patch that might work. If that doesn't help, I guess we have to disable top-down allocation for Haswell.
Unfortunately, the above patch (to change DOMAIN_RESOURCE_32BIT_LIMIT) did not work for me when using EDK2.
Updated by Nico Huber about 1 year ago
- Related links updated (diff)
Oberon 4071 wrote in #note-13:
Nico Huber wrote in #note-11:
Alas, nothing obvious in the log. If there is any resource unknown to coreboot, even with a log of the failing boot, we might end up simply constraining the space for allocations. So we can try that right away: Here's a patch that might work. If that doesn't help, I guess we have to disable top-down allocation for Haswell.
Unfortunately, the above patch (to change DOMAIN_RESOURCE_32BIT_LIMIT) did not work for me when using EDK2.
Sorry, I forgot: You have to run make olddefconfig or any other config target after applying that patch. You can check if it had an effect in your .config file.
I've spotted something now, that actually makes sense to patch, but I don't know if it has an effect on edk2.
I'm actually quite surprised that it works with SeaBIOS. A cbmem log with that patch and the changed limit would also be much appreciated. dmesg output can also be helpful in case Linux detects any error that is caused by the top-down allocation (I couldn't spot anything abnormal in the cbmem log).
Updated by Oberon 4071 about 1 year ago
Nico Huber wrote in #note-14:
Oberon 4071 wrote in #note-13:
Nico Huber wrote in #note-11:
Alas, nothing obvious in the log. If there is any resource unknown to coreboot, even with a log of the failing boot, we might end up simply constraining the space for allocations. So we can try that right away: Here's a patch that might work. If that doesn't help, I guess we have to disable top-down allocation for Haswell.
Unfortunately, the above patch (to change DOMAIN_RESOURCE_32BIT_LIMIT) did not work for me when using EDK2.
Sorry, I forgot: You have to run make olddefconfig or any other config target after applying that patch. You can check if it had an effect in your .config file.
For this test, I've been starting with a clean build and set of configuration defaults for every rebuild. (Even the crossgcc-i386 and iasl targets get rebuilt each time, to prevent the toolchain from becoming out of sync.) I have confirmed that the configuration changes are applied to the .config file.
I've spotted something now, that actually makes sense to patch, but I don't know if it has an effect on edk2.
I applied patches 76198 and 76199 from Gerrit, but the coreboot build still did not successfully start EDK2.
I'm actually quite surprised that it works with SeaBIOS. A cbmem log with that patch and the changed limit would also be much appreciated.
dmesg output can also be helpful in case Linux detects any error that is caused by the top-down allocation (I couldn't spot anything abnormal in the cbmem log).
I will try booting a SeaBIOS build with the two patches applied, and I'll post the cbmem and dmesg output in case it might provide any useful information.
Updated by Oberon 4071 about 1 year ago
- File dmesg_change_76199.txt dmesg_change_76199.txt added
- File cbmem_change_76199.txt cbmem_change_76199.txt added
Nico Huber wrote in #note-14:
I'm actually quite surprised that it works with SeaBIOS. A cbmem log with that patch and the changed limit would also be much appreciated.
dmesg output can also be helpful in case Linux detects any error that is caused by the top-down allocation (I couldn't spot anything abnormal in the cbmem log).
I was able to boot a coreboot build containing only SeaBIOS with patches 76198 and 76199 applied. I have attached the cbmem and dmesg outputs from this session.
Updated by Sean Rhodes about 1 year ago
Seems to affect more than Haswell, as I can reproduce on ADL and RDL w/ UefiPayloadPkg or UPL.
It's asserting on L1900 of MdeModulePkg/Universal/Acpi/AcpiTableDxe/AcpiTableProtocol.c
Updated by Nico Huber about 1 year ago
Sean Rhodes wrote in #note-17:
It's asserting on L1900 of MdeModulePkg/Universal/Acpi/AcpiTableDxe/AcpiTableProtocol.c
This?
    if (((EFI_ACPI_3_0_FIXED_ACPI_DESCRIPTION_TABLE *)ChildTable)->Dsdt != 0) {
      TableToInstall = (VOID *)(UINTN)((EFI_ACPI_3_0_FIXED_ACPI_DESCRIPTION_TABLE *)ChildTable)->Dsdt;
      Status = AddTableToList (AcpiTableInstance, TableToInstall, TRUE, Version, TRUE, &TableKey);
      if (EFI_ERROR (Status)) {
        DEBUG ((DEBUG_ERROR, "InstallAcpiTableFromHob: Fail to add ACPI table DSDT\n"));
        ASSERT_EFI_ERROR (Status);
        break;
      }
    }
Any chance to get the status code?
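For readers who only have a raw status value from a log: inside edk2 the %r format specifier in a DEBUG print renders an EFI_STATUS by name. Outside of edk2, a 64-bit status can be decoded by hand. The following is a minimal standalone sketch, not edk2 code. The helper names (efi_is_error, efi_status_code, efi_status_name) are hypothetical, and only a handful of the error codes defined in the UEFI specification are listed.

```c
#include <stdbool.h>
#include <stdint.h>

/* In a 64-bit EFI_STATUS the highest bit marks an error (UEFI spec, status codes). */
#define EFI_ERROR_BIT 0x8000000000000000ULL

/* Hypothetical helper: true if the status carries the error bit. */
bool efi_is_error(uint64_t status)
{
    return (status & EFI_ERROR_BIT) != 0;
}

/* Hypothetical helper: the numeric code with the error bit stripped. */
uint64_t efi_status_code(uint64_t status)
{
    return status & ~EFI_ERROR_BIT;
}

/* Hypothetical helper: name for a small subset of status values. */
const char *efi_status_name(uint64_t status)
{
    if (status == 0)
        return "Success";
    if (!efi_is_error(status))
        return "Warning";

    switch (efi_status_code(status)) {
    case 2:  return "Invalid Parameter";
    case 9:  return "Out of Resources";
    case 14: return "Not Found";
    case 15: return "Access Denied";
    case 20: return "Already Started";
    default: return "Unknown error";
    }
}
```

With these helpers, a logged value of 0x800000000000000F decodes to error code 15, which the UEFI specification names Access Denied.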
Updated by Sean Rhodes about 1 year ago
- File cbmem.txt cbmem.txt added
- File cbmem_top_down_n.txt cbmem_top_down_n.txt added
- File dmesg_top_down_n.txt dmesg_top_down_n.txt added
Updated by Sean Rhodes about 1 year ago
Seems like the root cause is CB:76127. Apparently, edk2 thinks the DSDT has already been added, so it says Access Denied. Reverting CB:76143 and CB:76127 makes it boot with TOP_DOWN=y.
TOP_DOWN does break some things in edk2 that don't prevent it from booting. The first is BlSupportDxe:
Failed to add memory space :0xFEC00000 0x1000
Updated by Michał Żygowski about 1 year ago
Throwing my 2 cents here after finding out TOP_DOWN is the culprit of my problems today:
Seems to affect more than Haswell, as I can reproduce on ADL and RDL w/ UefiPayloadPkg or UPL.
Yes, it most likely affects almost everything which uses EDKII.
Sean Rhodes wrote in #note-20:
Seems like the root cause is CB:76127. Apparently, edk2 thinks the DSDT has already been added, so it says Access Denied. Reverting CB:76143 and CB:76127 makes it boot with TOP_DOWN=y.
TOP_DOWN does break some things in edk2 that don't prevent it from booting. The first is BlSupportDxe:
Failed to add memory space :0xFEC00000 0x1000
That's something different AFAIK. EDKII cannot reassign the IOAPIC (FEC00000) memory (which is already marked as MMIO in the memory map/GCD, or as reserved in older EDKII revisions) to MMIO memory type. Not related to TOP_DOWN. It doesn't fail for HPET memory because there is a hole, apparently:
11. 00000000FEC00000 - 00000000FECFFFFF [02]
buildhob: base = 0xFEC00000, size = 0x100000, type = 0x1
12. 00000000FED40000 - 00000000FED6FFFF [02]
buildhob: base = 0xFED40000, size = 0x30000, type = 0x1
The call
    ReserveResourceInGcd (TRUE, EfiGcdMemoryTypeMemoryMappedIo, 0xFEC00000, SIZE_4KB, 0, ImageHandle); // IOAPIC
even tries to assign a different range. Anyway, such attempts will always fail if the given range is already in GCD.
What I noticed today is that TOP_DOWN breaks the IGD display in EDKII. For some reason, PciIo Protocol is not installed on the BDF 2.0 (IGD) handle, and thus the drivers cannot connect the controllers/drivers to it properly. What I found quite weird was the I/O space assigned to IGD: ffc0 - ffff (makes sense for TOP_DOWN). Maybe it is some problem for EDKII?
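To illustrate why ffc0 - ffff "makes sense for TOP_DOWN": a top-down allocator places a resource at the highest aligned address that still fits in the window, while a bottom-up allocator uses the lowest. The sketch below is not coreboot's allocator_v4. The function names are hypothetical, and the 16-bit I/O window of 0x0000 to 0xffff and the 0x40-byte size and alignment are assumptions chosen to match the range quoted above.

```c
#include <stdint.h>

/* Returned when the request does not fit into the window. */
#define ALLOC_FAIL UINT64_MAX

/* Round to a power-of-two alignment. */
static uint64_t align_down(uint64_t v, uint64_t align) { return v & ~(align - 1); }
static uint64_t align_up(uint64_t v, uint64_t align) { return (v + align - 1) & ~(align - 1); }

/* Lowest aligned base for `size` bytes inside the inclusive window [base, limit]. */
uint64_t alloc_bottom_up(uint64_t base, uint64_t limit, uint64_t size, uint64_t align)
{
    uint64_t start = align_up(base, align);

    if (size == 0 || start < base || start > limit || limit - start < size - 1)
        return ALLOC_FAIL;
    return start;
}

/* Highest aligned base for `size` bytes inside the inclusive window [base, limit]. */
uint64_t alloc_top_down(uint64_t base, uint64_t limit, uint64_t size, uint64_t align)
{
    uint64_t start;

    if (size == 0 || limit < base || limit - base < size - 1)
        return ALLOC_FAIL;
    start = align_down(limit - size + 1, align);
    return start >= base ? start : ALLOC_FAIL;
}
```

Under these assumptions, alloc_top_down(0x0, 0xffff, 0x40, 0x40) yields 0xffc0, so the resource ends exactly at 0xffff, the very top of the I/O space. A bottom-up allocation in the same window would start at the lowest free aligned address instead.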
Updated by Sean Rhodes about 1 year ago
Not related to TOP_DOWN.
It is - even if it's indirect, I can break/fix the splash with TOP_DOWN=n or y
What I noticed today is that TOP_DOWN breaks the IGD display in EDKII. For some reason, PciIo Protocol is not installed on the BDF 2.0 (IGD) handle, and thus the drivers cannot connect the controllers/drivers to it properly. What I found quite weird was the I/O space assigned to IGD: ffc0 - ffff (makes sense for TOP_DOWN). Maybe it is some problem for EDKII?
The two I've tested don't have that problem.
Updated by Michał Żygowski about 1 year ago
Sean Rhodes wrote in #note-22:
Not related to TOP_DOWN.
It is - even if it's indirect, I can break/fix the splash with TOP_DOWN=n or y
I meant the "Failed to add memory space :0xFEC00000 0x1000" error. It is still present after TOP_DOWN is reverted, according to your logs. But indeed, reverting will fix the splash (and graphics output in general with IGD in EDKII according to my tests).
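The overlap behind the "Failed to add memory space" message can be checked with simple interval arithmetic, using the two memory map entries quoted earlier in the thread (0xFEC00000 with size 0x100000, and 0xFED40000 with size 0x30000). This is a standalone sketch, not EDKII code. The function name is hypothetical, and the HPET base of 0xFED00000 with a 1 KiB size is an assumption, since the thread only says that HPET falls into a hole.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if [a_base, a_base + a_size) and
 * [b_base, b_base + b_size) share at least one address.
 * Assumes neither range wraps around the 64-bit address space.
 */
bool ranges_overlap(uint64_t a_base, uint64_t a_size,
                    uint64_t b_base, uint64_t b_size)
{
    return a_base < b_base + b_size && b_base < a_base + a_size;
}
```

The 4 KiB IOAPIC request at 0xFEC00000 overlaps the first map entry, which is consistent with the add failing regardless of the allocation direction. The assumed HPET range at 0xFED00000 starts exactly where the first entry ends and finishes well before 0xFED40000, so it overlaps neither entry.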
Updated by Nico Huber about 1 year ago
We should probably focus on one problem and one machine (that we can get edk2 logs from) at a time. Sean, thanks for your logs so far. Looking at the latest logs, I see what Michał pointed out: The 0xFEC00000 error is present even without top-down allocation. And in the very last log we are one "InstallProtocolInterface: 4CF5B200-68B8-4CA5-9EEC-B23E3F50029A" (gEfiPciIoProtocolGuid) short. And everything around "Found PCI Display device" is missing.
To make sure we are not hunting multiple issues at once, it would be best to return to known-good and first-broken master states, e.g.
supposedly good: commit d7a354dab0fb (mb/google/brya/acpi: Set polling timing for DL23 and LD23 to 2ms)
introduced top-down: commit 5226301765de (allocator_v4: Treat above 4G resources more natively)
otherwise known good: commit 0754e00ace63 (allocator_v4: Fix top-level allocations w/o IORESOURCE_ABOVE_4G)
The last one would only be interesting later, when we figured out what is wrong with the second.
I'm not sure how to trace the gap to the installation of gEfiPciIoProtocolGuid. Assuming it goes through StartPciDevicesOnBridge(), maybe we should add additional output there: for each device in the loop: the path, ->Allocated and ->Registered.
Updated by Yu-Ping Wu about 1 year ago
- Related to Bug #508: Dojo fails to boot from NVMe with CONFIG_RESOURCE_ALLOCATION_TOP_DOWN enabled added
Updated by Christian Walter 5 months ago
Oberon 4071 wrote:
coreboot revision in git: feb27dcbf3fc685b070c950a16e8adec958bc1ce
coreboot revision (git describe --tags): 4.20-520-gfeb27dcbf3
Tested payloads: edk2 from MrChromebox revision uefipayload_202304 and uefipayload_202306
coreboot will not boot my Lenovo ThinkPad T440p with CONFIG_RESOURCE_ALLOCATION_TOP_DOWN enabled, when using the edk2 payload (MrChromebox version, either uefipayload_202304 or uefipayload_202306). The display sometimes turns on, indicating that some of the hardware initialization was successful, but the payload will not start.
I tried to disable CONFIG_RESOURCE_ALLOCATION_TOP_DOWN in .config, but the build process insists on leaving this config enabled. This seems to be caused by the RESOURCE_ALLOCATION_TOP_DOWN setting changed to "def_bool y" in src/device/Kconfig in commit 5226301765ded70e0ef640e5252bbaca8cd14451 (allocator_v4: Treat above 4G resources more natively). The make target for building coreboot seems to automatically rerun olddefconfig, causing this setting to always remain enabled no matter what was previously saved in the .config file.
Modifying src/device/Kconfig to change RESOURCE_ALLOCATION_TOP_DOWN to "def_bool n" appears to fix the problem on my machine.
I have attached my .config file (with CONFIG_RESOURCE_ALLOCATION_TOP_DOWN enabled to reproduce the problem).
We fixed the issue in our edk2 here: https://github.com/9elements/edk2/commit/d965778103bfe2badd815c6fe35d2786581980b3