Bug #259
openT440p: Tianocore unable to boot Windows 10 (MACHINE_CHECK_EXCEPTION)
Description
Hi, Team!
I've successfully corebooted my T440p (without dGPU). Debian 10 works fine, but I'm unable to boot into Windows 10: I get a BSOD with Stop Code: MACHINE_CHECK_EXCEPTION
The same exception appears even when booting from Windows USB installation media.
I tried different config options, as well as coreboot master and the v4.11/v4.12 tags, with the same result.
Here is some info about my setup:
$ git rev-parse HEAD
342a8c3b2bc0845638e852af01f3054256a8446c
$ sudo hwinfo --cpu --short
cpu:
Intel(R) Core(TM) i7-4800MQ CPU @ 2.70GHz, 800 MHz
$ cat defconfig
CONFIG_LOCALVERSION="GLETA1WW (2.55)"
CONFIG_USE_OPTION_TABLE=y
CONFIG_TIMESTAMPS_ON_CONSOLE=y
CONFIG_FW_CONFIG=y
CONFIG_FW_CONFIG_SOURCE_CBFS=y
CONFIG_VENDOR_LENOVO=y
CONFIG_ONBOARD_VGA_IS_PRIMARY=y
CONFIG_CBFS_SIZE=0x200000
CONFIG_MAINBOARD_SMBIOS_PRODUCT_NAME="20AWS0VK00"
CONFIG_HAVE_IFD_BIN=y
CONFIG_BOARD_LENOVO_THINKPAD_T440P=y
CONFIG_CONSOLE_POST=y
CONFIG_PCIEXP_L1_SUB_STATE=y
CONFIG_POWER_STATE_PREVIOUS_AFTER_FAILURE=y
CONFIG_HAVE_MRC=y
CONFIG_MRC_FILE="3rdparty/blobs/mainboard/$(MAINBOARDDIR)/mrc.bin"
CONFIG_PCIEXP_CLK_PM=y
CONFIG_VALIDATE_INTEL_DESCRIPTOR=y
CONFIG_H8_SUPPORT_BT_ON_WIFI=y
CONFIG_HAVE_ME_BIN=y
CONFIG_CHECK_ME=y
CONFIG_USE_ME_CLEANER=y
CONFIG_HAVE_GBE_BIN=y
CONFIG_ELOG=y
CONFIG_USBDEBUG=y
CONFIG_USBDEBUG_DONGLE_FTDI_FT232H=y
CONFIG_DRIVERS_GENERIC_CBFS_SERIAL=y
CONFIG_DRIVERS_PS2_KEYBOARD=y
CONFIG_DEBUG_TPM=y
CONFIG_TPM_RDRESP_NEED_DELAY=y
CONFIG_SECURITY_CLEAR_DRAM_ON_REGULAR_BOOT=y
CONFIG_PAYLOAD_TIANOCORE=y
Updated by Paul Menzel over 4 years ago
Is there any more information regarding the machine check exception? Could you please take a picture of the error screen, and upload it?
Do you see ACPI errors when starting a GNU/Linux distribution? Does mcelog output anything? Does FWTS [1] show anything critical?
PS: For the record. The master commit you used:
$ git describe 342a8c3b2bc0845638e852af01f3054256a8446c
4.12-604-g342a8c3b2b
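For readers unfamiliar with the format: `git describe` prints the nearest tag, the number of commits on top of it, and `g` followed by the abbreviated commit hash. A minimal Python sketch that splits such a string is below. The names `Describe` and `parse_describe` are made up for illustration, and the sketch assumes the plain `<tag>-<n>-g<hash>` form with no `-dirty` suffix.

```python
import re
from typing import NamedTuple


class Describe(NamedTuple):
    tag: str         # nearest reachable tag, e.g. "4.12"
    commits: int     # number of commits on top of that tag
    short_hash: str  # abbreviated commit hash, without the "g" prefix


def parse_describe(text: str) -> Describe:
    """Split `git describe` output of the form <tag>-<n>-g<hash>."""
    match = re.fullmatch(r"(.+)-(\d+)-g([0-9a-f]+)", text.strip())
    if match is None:
        raise ValueError(f"not in <tag>-<n>-g<hash> form: {text!r}")
    tag, commits, short_hash = match.groups()
    return Describe(tag, int(commits), short_hash)


info = parse_describe("4.12-604-g342a8c3b2b")
print(info)
```

Read this way, the reported build is 604 commits past the 4.12 tag, and the short hash is a prefix of the full commit ID given in the report.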
Updated by Crazy Fox over 4 years ago
- File results.log results.log added
- File ntbtlog.txt ntbtlog.txt added
- File IMG_20200610_131344.jpg IMG_20200610_131344.jpg added
- File IMG_20200610_132042.jpg IMG_20200610_132042.jpg added
- File IMG_20200610_132407.jpg IMG_20200610_132407.jpg added
Paul Menzel wrote:
Is there any more information regarding the machine check exception? Could you please take a picture of the error screen, and upload it?
Do you see ACPI errors when starting a GNU/Linux distribution? Does mcelog output anything? Does FWTS [1] show anything critical?
PS: For the record. The master commit you used:
$ git describe 342a8c3b2bc0845638e852af01f3054256a8446c
4.12-604-g342a8c3b2b
Thanks for the reply!
There has been a little progress since yesterday: as suggested in the reddit thread (https://www.reddit.com/r/coreboot/comments/gzmvgp/thinkpad_t440p_coreboot_v412tianocore_machine/ftjinjo/), after reverting the patch https://review.coreboot.org/c/coreboot/+/38723/6/src/mainboard/lenovo/t440p/romstage.c I can boot and log in to Windows in Safe Mode.
But a normal boot still ends in a BSOD with PAGE_FAULT_IN_NONPAGED_AREA, KMODE_EXCEPTION_NOT_HANDLED or BAD_POOL_CALLER just after the Welcome screen appears.
When I switch back to the stock BIOS (flashing only the BIOS region on the 4 MB chip; the 8 MB chip with the stripped Intel ME stays untouched), Windows boots as expected.
Incidentally, FWTS reports 14 critical failures:
Critical failures: 14
mtrr: Memory range 0x82200000 to 0x82200fff (0000:03:00.0) has incorrect attribute Unknown.
... (identical errors truncated)
mtrr: Memory range 0x82845000 to 0x8284500f (0000:00:16.0) has incorrect attribute Unknown.
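The two surviving lines already show which PCI devices own the flagged ranges and how large those ranges are. A rough Python sketch for extracting that is below. It assumes the FWTS lines keep exactly the wording pasted above (other FWTS versions may phrase them differently), and the helper name `parse_mtrr_failures` is made up for illustration.

```python
import re

# Pattern for the FWTS mtrr lines as pasted in this report (an assumption,
# not a documented FWTS output format).
LINE_RE = re.compile(
    r"mtrr: Memory range (0x[0-9a-f]+) to (0x[0-9a-f]+) "
    r"\(([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\) "
    r"has incorrect attribute (\S+?)\.?$"
)


def parse_mtrr_failures(log_text):
    """Return (pci_address, start, size_in_bytes, attribute) per matching line."""
    results = []
    for line in log_text.splitlines():
        match = LINE_RE.search(line.strip())
        if match:
            start = int(match.group(1), 16)
            end = int(match.group(2), 16)
            # The reported range is inclusive, so the size is end - start + 1.
            results.append((match.group(3), start, end - start + 1, match.group(4)))
    return results


sample = """\
mtrr: Memory range 0x82200000 to 0x82200fff (0000:03:00.0) has incorrect attribute Unknown.
mtrr: Memory range 0x82845000 to 0x8284500f (0000:00:16.0) has incorrect attribute Unknown.
"""
failures = parse_mtrr_failures(sample)
for bdf, start, size, attr in failures:
    print(f"{bdf}: {size} bytes at {start:#x}, attribute {attr}")
```

For the two lines above this gives a 4096-byte range for 0000:03:00.0 and a 16-byte range for 0000:00:16.0.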
$ sudo ras-mc-ctl --errors
No Memory errors.
PCIe AER events:
1 2020-06-09 20:11:07 +0300 Fatal error: Poisoned TLP
... (identical errors truncated)
7 2020-06-10 12:03:46 +0300 Fatal error: Poisoned TLP
No Extlog errors.
No MCE errors.
$ sudo systemctl status ras-mc-ctl
● ras-mc-ctl.service - Initialize EDAC v3.0.0 Drivers For Machine Hardware
Loaded: loaded (/lib/systemd/system/ras-mc-ctl.service; enabled; vendor preset: enabled)
Active: active (exited) since Wed 2020-06-10 13:25:40 EEST; 8min ago
Process: 896 ExecStart=/usr/sbin/ras-mc-ctl --register-labels (code=exited, status=0/SUCCESS)
Main PID: 896 (code=exited, status=0/SUCCESS)
Jun 10 13:25:39 ThinkPad-T440p systemd[1]: Starting Initialize EDAC v3.0.0 Drivers For Machine Hardware...
Jun 10 13:25:40 ThinkPad-T440p ras-mc-ctl[896]: ras-mc-ctl: Error: No dimm labels for LENOVO model 20AWS0VK00
Jun 10 13:25:40 ThinkPad-T440p systemd[1]: Started Initialize EDAC v3.0.0 Drivers For Machine Hardware.
$ sudo dmesg | grep acpi
[ 0.216226] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[ 0.224460] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI]
[ 0.224486] acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug SHPCHotplug PME AER PCIeCapability LTR]
[ 0.224492] acpi PNP0A08:00: [Firmware Info]: MMCONFIG for domain 0000 [bus 00-3f] only partially covers this bridge
[ 0.246620] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[ 2.303952] acpi device:0b: registered as cooling_device9
[ 21.922355] thinkpad_acpi: ThinkPad ACPI Extras v0.26
[ 21.922360] thinkpad_acpi: http://ibm-acpi.sf.net/
[ 21.922362] thinkpad_acpi: ThinkPad BIOS GLETA1WW (2.55), EC GLHT30WW-3.23
[ 21.932112] thinkpad_acpi: radio switch found; radios are enabled
[ 21.934136] thinkpad_acpi: Tablet mode switch found (type: MHKG), currently in laptop mode
[ 21.934198] thinkpad_acpi: This ThinkPad has standard ACPI backlight brightness control, supported by the ACPI video driver
[ 21.934199] thinkpad_acpi: Disabling thinkpad-acpi brightness events by default...
[ 21.956276] thinkpad_acpi: rfkill switch tpacpi_bluetooth_sw: radio is unblocked
[ 21.957888] thinkpad_acpi: rfkill switch tpacpi_wwan_sw: radio is unblocked
[ 21.967317] thinkpad_acpi: Standard ACPI backlight interface available, not loading native one
[ 21.967658] thinkpad_acpi: Console audio control enabled, mode: monitor (read only)
[ 21.971527] thinkpad_acpi: battery 1 registered (start 0, stop 100)
[ 21.971722] input: ThinkPad Extra Buttons as /devices/platform/thinkpad_acpi/input/input7
I have also attached the BSOD screenshots, the full FWTS log and the Windows boot log.
Updated by Paul Menzel over 4 years ago
I guess the pictures are from different boots?
- 2 * PAGE_FAULT_IN_NONPAGED_AREA
- BAD_POOL_CALLER
I guess with your original report you saw MACHINE_CHECK_EXCEPTION?
Getting different errors, I’d say there is a problem with memory init. But as GNU/Linux works, I am not sure. Hopefully others will have a clue.
Updated by Crazy Fox over 4 years ago
Paul Menzel wrote:
I guess the pictures are from different boots?
- 2 * PAGE_FAULT_IN_NONPAGED_AREA
- BAD_POOL_CALLER
I guess with your original report you saw MACHINE_CHECK_EXCEPTION?
Getting different errors, I’d say there is a problem with memory init. But as GNU/Linux works, I am not sure. Hopefully others will have a clue.
Yes, initially I got MACHINE_CHECK_EXCEPTION during normal boot, Safe Mode boot and USB installation media boot, and not a single entry was added to ntbtlog.txt.
With the patch reverted, a normal boot ends in a BSOD with PAGE_FAULT_IN_NONPAGED_AREA, KMODE_EXCEPTION_NOT_HANDLED or BAD_POOL_CALLER, in random order, just after the Welcome screen appears.
Updated by Patrick Rudolph over 4 years ago
Please attach WinDbg and investigate what's causing this issue.
https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/getting-started-with-windbg--kernel-mode-
Updated by Crazy Fox over 4 years ago
It seems this can be closed, as the latest Windows 10 2009 build works fine.
Updated by Angel Pons over 4 years ago
- Assignee set to Angel Pons
I see the same BSOD on the Asrock B85M Pro4, and decided to take a look. Out of desperation, I tried disabling things in the devicetree, and found out why things break. See https://review.coreboot.org/43763 for a dirty fix for the B85M Pro4.
Turns out that, when the last root port function is not visible for any reason, root_port_commit_config() is not called and PCH PCIe root port initialization is not completed. A symptom of this problem is that briefly pressing the power button on the payload should power the computer off, but it might lock up instead. It's a side effect of missing initialization, it seems.
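To make the failure mode concrete, here is a toy Python model of the pattern described above. It is not the coreboot code: the function name `init_root_ports`, the port count of 8 and the loop structure are all assumptions for illustration. The only point is that a commit step triggered from the last function's init never runs when that function is hidden.

```python
def init_root_ports(visible, num_ports=8):
    """Toy model: per-port init runs only for visible functions, and the
    shared commit step is triggered from the last port's init.

    `visible` is the set of root port function numbers the enumerator sees.
    Returns True if the commit step ran.
    """
    committed = False
    for fn in range(num_ports):
        if fn not in visible:
            continue  # hidden or disabled function: its init hook never runs
        # ... per-port setup would happen here ...
        if fn == num_ports - 1:
            committed = True  # stand-in for root_port_commit_config()
    return committed


all_ports = set(range(8))
print(init_root_ports(all_ports))        # every port visible
print(init_root_ports(all_ports - {7}))  # last port hidden
```

With every port visible the commit step runs; with only the last port hidden it is skipped, even though all other ports were set up.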
There's a log of the T440p in board_status that shows that the last PCIe root port is disabled, which could be why things don't work: https://review.coreboot.org/cgit/board-status.git/tree/lenovo/t440p/4.11-1594-g6daa8c3ba5/2020-03-13T02_50_21Z/coreboot_console.txt
In short: no, this is far from solved. The PCIe handling code needs a revamp.
Updated by Angel Pons over 4 years ago
Update: I've come up with a cleaner way to handle the Asrock B85M Pro4 problem: https://review.coreboot.org/44155
Updated by Jamal Wright almost 4 years ago
Sorry to necro this.
I run a hacked Windows 8.1 and I hate 10... but I have found a partial cause for these bugs.
The BAD_POOL_CALLER crash comes from the Lenovo hotkey drivers in Windows; 10 probably installed them automatically.
The machine check exception comes from the card reader. It affected me on warm boot only. The workaround is to stop those Lenovo drivers/services and disable the card reader. Windows then works great (maybe even better, apart from battery drain). Interestingly, the Lenovo PM drivers work fine, but for some reason the Synaptics palm rejection isn't as good as I remember it being on the patched Lenovo BIOS; that might be OT though.
I have tried https://review.coreboot.org/43763, and obviously the new commit was already built in, but neither changed anything.