Bug #259

open

T440p: Tianocore unable to boot Windows 10 (MACHINE_CHECK_EXCEPTION)

Added by Crazy Fox about 2 years ago. Updated over 1 year ago.

Status: New
Priority: Normal
Assignee:
Category: board support
Target version: -
Start date: 06/09/2020
Due date:
% Done: 0%
Estimated time:
Affected versions:
Needs backport to:
Affected hardware:
Affected OS:

Description

Hi, Team!

I've successfully corebooted my T440p (without dGPU). Debian 10 works fine, but I'm unable to boot into Windows 10: I get a BSOD with Stop Code: MACHINE_CHECK_EXCEPTION.
The same exception appears even when trying to boot from Windows USB installation media.

I tried different config options, and tried coreboot master and the v4.11/v4.12 tags, all with the same result.

Here is some info about my setup:

$ git rev-parse HEAD
342a8c3b2bc0845638e852af01f3054256a8446c
$ sudo hwinfo --cpu --short
cpu:
Intel(R) Core(TM) i7-4800MQ CPU @ 2.70GHz, 800 MHz
$ cat defconfig 
CONFIG_LOCALVERSION="GLETA1WW (2.55)"
CONFIG_USE_OPTION_TABLE=y
CONFIG_TIMESTAMPS_ON_CONSOLE=y
CONFIG_FW_CONFIG=y
CONFIG_FW_CONFIG_SOURCE_CBFS=y
CONFIG_VENDOR_LENOVO=y
CONFIG_ONBOARD_VGA_IS_PRIMARY=y
CONFIG_CBFS_SIZE=0x200000
CONFIG_MAINBOARD_SMBIOS_PRODUCT_NAME="20AWS0VK00"
CONFIG_HAVE_IFD_BIN=y
CONFIG_BOARD_LENOVO_THINKPAD_T440P=y
CONFIG_CONSOLE_POST=y
CONFIG_PCIEXP_L1_SUB_STATE=y
CONFIG_POWER_STATE_PREVIOUS_AFTER_FAILURE=y
CONFIG_HAVE_MRC=y
CONFIG_MRC_FILE="3rdparty/blobs/mainboard/$(MAINBOARDDIR)/mrc.bin"
CONFIG_PCIEXP_CLK_PM=y
CONFIG_VALIDATE_INTEL_DESCRIPTOR=y
CONFIG_H8_SUPPORT_BT_ON_WIFI=y
CONFIG_HAVE_ME_BIN=y
CONFIG_CHECK_ME=y
CONFIG_USE_ME_CLEANER=y
CONFIG_HAVE_GBE_BIN=y
CONFIG_ELOG=y
CONFIG_USBDEBUG=y
CONFIG_USBDEBUG_DONGLE_FTDI_FT232H=y
CONFIG_DRIVERS_GENERIC_CBFS_SERIAL=y
CONFIG_DRIVERS_PS2_KEYBOARD=y
CONFIG_DEBUG_TPM=y
CONFIG_TPM_RDRESP_NEED_DELAY=y
CONFIG_SECURITY_CLEAR_DRAM_ON_REGULAR_BOOT=y
CONFIG_PAYLOAD_TIANOCORE=y
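
For anyone comparing this defconfig against their own, here is a minimal sketch of reading the `KEY=value` lines into a dictionary. The helper name `parse_defconfig` is made up for illustration and is not part of coreboot's tooling. The sample only repeats a few lines from the listing above.

```python
def parse_defconfig(text):
    """Parse KEY=value lines into a dict, dropping surrounding double quotes."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=value line
        options[key] = value.strip('"')
    return options


# A few lines copied from the defconfig in this report.
sample = """CONFIG_LOCALVERSION="GLETA1WW (2.55)"
CONFIG_CBFS_SIZE=0x200000
CONFIG_BOARD_LENOVO_THINKPAD_T440P=y
CONFIG_PAYLOAD_TIANOCORE=y
"""

opts = parse_defconfig(sample)
cbfs_kib = int(opts["CONFIG_CBFS_SIZE"], 16) // 1024
print(cbfs_kib, "KiB CBFS, payload Tianocore:", opts.get("CONFIG_PAYLOAD_TIANOCORE") == "y")
```

With the values above, the CBFS size of 0x200000 works out to 2048 KiB.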

Files

results.log (343 KB), FWTS, Crazy Fox, 06/10/2020 10:27 AM
ntbtlog.txt (20.2 KB), windows boot log, Crazy Fox, 06/10/2020 10:28 AM
IMG_20200610_131344.jpg (114 KB), Crazy Fox, 06/10/2020 10:56 AM
IMG_20200610_132042.jpg (120 KB), Crazy Fox, 06/10/2020 10:56 AM
IMG_20200610_132407.jpg (123 KB), Crazy Fox, 06/10/2020 10:56 AM
Actions #1

Updated by Paul Menzel about 2 years ago

Is there any more information regarding the machine check exception? Could you please take a picture of the error screen, and upload it?

Do you see ACPI errors when starting a GNU/Linux distribution? Does mcelog output anything? Does FWTS [1] show anything critical?

PS: For the record. The master commit you used:

$ git describe 342a8c3b2bc0845638e852af01f3054256a8446c
4.12-604-g342a8c3b2b

[1]: https://wiki.ubuntu.com/FirmwareTestSuite
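
As a reading aid for the `git describe` output above: the string has the form `<tag>-<commits since tag>-g<abbreviated hash>`, so this build is 604 commits past the 4.12 tag. Below is a minimal sketch that splits such a string. The function name `parse_git_describe` is hypothetical, and the regular expression only covers this plain form, not variants such as a `-dirty` suffix.

```python
import re

# <tag>-<commits since tag>-g<abbreviated commit hash>
DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<count>\d+)-g(?P<abbrev>[0-9a-f]+)$")


def parse_git_describe(text):
    """Return (tag, commits_since_tag, abbreviated_hash) for a describe string."""
    match = DESCRIBE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not in <tag>-<n>-g<hash> form: {text!r}")
    return match.group("tag"), int(match.group("count")), match.group("abbrev")


tag, count, abbrev = parse_git_describe("4.12-604-g342a8c3b2b")
full_hash = "342a8c3b2bc0845638e852af01f3054256a8446c"
# The abbreviated hash should be a prefix of the full commit hash.
print(tag, count, full_hash.startswith(abbrev))
```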
Actions #2

Updated by Crazy Fox about 2 years ago

Paul Menzel wrote:

Is there any more information regarding the machine check exception? Could you please take a picture of the error screen, and upload it?

Do you see ACPI errors when starting a GNU/Linux distribution? Does mcelog output anything? Does FWTS [1] show anything critical?

PS: For the record. The master commit you used:

$ git describe 342a8c3b2bc0845638e852af01f3054256a8446c
4.12-604-g342a8c3b2b

[1]: https://wiki.ubuntu.com/FirmwareTestSuite

Thanks for the reply!

Since yesterday there has been a little progress. As suggested in the reddit thread (https://www.reddit.com/r/coreboot/comments/gzmvgp/thinkpad_t440p_coreboot_v412tianocore_machine/ftjinjo/), after reverting the patch https://review.coreboot.org/c/coreboot/+/38723/6/src/mainboard/lenovo/t440p/romstage.c I can boot and log in to Windows in Safe Mode.

But during a normal boot it still BSODs with PAGE_FAULT_IN_NONPAGED_AREA, KMODE_EXCEPTION_NOT_HANDLED or BAD_POOL_CALLER just after the Welcome Screen appears.
When I switch back to the stock BIOS (only the BIOS region on the 4 MB flash; the 8 MB flash with the stripped Intel ME stays untouched), Windows boots as expected.

FWTS reports 14 critical failures:

Critical failures: 14
 mtrr: Memory range 0x82200000 to 0x82200fff (0000:03:00.0) has incorrect attribute Unknown.
 ... just truncated same errors
 mtrr: Memory range 0x82845000 to 0x8284500f (0000:00:16.0) has incorrect attribute Unknown.
$ sudo ras-mc-ctl --errors
No Memory errors.

PCIe AER events:
1 2020-06-09 20:11:07 +0300 Fatal error: Poisoned TLP
... just truncated same errors
7 2020-06-10 12:03:46 +0300 Fatal error: Poisoned TLP

No Extlog errors.

No MCE errors.
$ sudo systemctl status ras-mc-ctl
● ras-mc-ctl.service - Initialize EDAC v3.0.0 Drivers For Machine Hardware
   Loaded: loaded (/lib/systemd/system/ras-mc-ctl.service; enabled; vendor preset: enabled)
   Active: active (exited) since Wed 2020-06-10 13:25:40 EEST; 8min ago
  Process: 896 ExecStart=/usr/sbin/ras-mc-ctl --register-labels (code=exited, status=0/SUCCESS)
 Main PID: 896 (code=exited, status=0/SUCCESS)

Jun 10 13:25:39 ThinkPad-T440p systemd[1]: Starting Initialize EDAC v3.0.0 Drivers For Machine Hardware...
Jun 10 13:25:40 ThinkPad-T440p ras-mc-ctl[896]: ras-mc-ctl: Error: No dimm labels for LENOVO model 20AWS0VK00
Jun 10 13:25:40 ThinkPad-T440p systemd[1]: Started Initialize EDAC v3.0.0 Drivers For Machine Hardware.
$ sudo dmesg | grep acpi
[    0.216226] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[    0.224460] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI]
[    0.224486] acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug SHPCHotplug PME AER PCIeCapability LTR]
[    0.224492] acpi PNP0A08:00: [Firmware Info]: MMCONFIG for domain 0000 [bus 00-3f] only partially covers this bridge
[    0.246620] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[    2.303952] acpi device:0b: registered as cooling_device9
[   21.922355] thinkpad_acpi: ThinkPad ACPI Extras v0.26
[   21.922360] thinkpad_acpi: http://ibm-acpi.sf.net/
[   21.922362] thinkpad_acpi: ThinkPad BIOS GLETA1WW (2.55), EC GLHT30WW-3.23
[   21.932112] thinkpad_acpi: radio switch found; radios are enabled
[   21.934136] thinkpad_acpi: Tablet mode switch found (type: MHKG), currently in laptop mode
[   21.934198] thinkpad_acpi: This ThinkPad has standard ACPI backlight brightness control, supported by the ACPI video driver
[   21.934199] thinkpad_acpi: Disabling thinkpad-acpi brightness events by default...
[   21.956276] thinkpad_acpi: rfkill switch tpacpi_bluetooth_sw: radio is unblocked
[   21.957888] thinkpad_acpi: rfkill switch tpacpi_wwan_sw: radio is unblocked
[   21.967317] thinkpad_acpi: Standard ACPI backlight interface available, not loading native one
[   21.967658] thinkpad_acpi: Console audio control enabled, mode: monitor (read only)
[   21.971527] thinkpad_acpi: battery 1 registered (start 0, stop 100)
[   21.971722] input: ThinkPad Extra Buttons as /devices/platform/thinkpad_acpi/input/input7

I've also attached the BSOD screenshots, the full FWTS log and the Windows boot log.
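
To make the FWTS `mtrr` failures quoted in this comment easier to compare, here is a rough sketch that pulls the PCI device address and the size of each flagged range out of a line. The regular expression is an assumption based only on the two lines shown here, not on the FWTS log format in general, and the name `parse_mtrr_failure` is hypothetical.

```python
import re

# Matches lines such as:
#   mtrr: Memory range 0x82200000 to 0x82200fff (0000:03:00.0) has incorrect attribute Unknown.
MTRR_RE = re.compile(
    r"mtrr: Memory range (0x[0-9a-fA-F]+) to (0x[0-9a-fA-F]+) "
    r"\(([0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])\) "
    r"has incorrect attribute (\w+)\."
)


def parse_mtrr_failure(line):
    """Return device, start address, size in bytes and attribute, or None."""
    match = MTRR_RE.search(line)
    if match is None:
        return None
    start = int(match.group(1), 16)
    end = int(match.group(2), 16)
    return {
        "device": match.group(3),
        "start": start,
        "size": end - start + 1,  # the range is inclusive on both ends
        "attribute": match.group(4),
    }


lines = [
    " mtrr: Memory range 0x82200000 to 0x82200fff (0000:03:00.0) has incorrect attribute Unknown.",
    " mtrr: Memory range 0x82845000 to 0x8284500f (0000:00:16.0) has incorrect attribute Unknown.",
]
for entry in map(parse_mtrr_failure, lines):
    print(entry["device"], hex(entry["start"]), entry["size"], entry["attribute"])
```

For the two lines shown, the flagged ranges are 4096 bytes and 16 bytes long.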

Actions #3

Updated by Paul Menzel about 2 years ago

I guess the pictures are from different boots?

  1. 2 * PAGE_FAULT_IN_NONPAGED_AREA
  2. BAD_POOL_CALLER

I guess with your original report you saw MACHINE_CHECK_EXCEPTION?

Getting different errors, I’d say there is a problem with memory init. But as GNU/Linux works, I am not sure. Hopefully others will have a clue.

Actions #4

Updated by Crazy Fox about 2 years ago

Paul Menzel wrote:

I guess the pictures are from different boots?

  1. 2 * PAGE_FAULT_IN_NONPAGED_AREA
  2. BAD_POOL_CALLER

I guess with your original report you saw MACHINE_CHECK_EXCEPTION?

Getting different errors, I’d say there is a problem with memory init. But as GNU/Linux works, I am not sure. Hopefully others will have a clue.

Yes, initially I got MACHINE_CHECK_EXCEPTION during normal boot, Safe Mode boot and USB installation media boot, and not a single entry was added to ntbtlog.txt.
With the patch reverted, a normal boot ends in a BSOD with PAGE_FAULT_IN_NONPAGED_AREA, KMODE_EXCEPTION_NOT_HANDLED or BAD_POOL_CALLER, in random order, just after the Welcome Screen appears.

Actions #6

Updated by Crazy Fox almost 2 years ago

It seems this can be closed, as the latest Win 10 2009 build works fine.

Actions #7

Updated by Angel Pons almost 2 years ago

  • Assignee set to Angel Pons

I see the same BSOD on the Asrock B85M Pro4, and decided to take a look. Out of desperation, I tried disabling things in the devicetree, and found out why things break. See https://review.coreboot.org/43763 for a dirty fix for the B85M Pro4.

Turns out that, when the last root port function is not visible for any reason, root_port_commit_config() is not called and PCH PCIe root port initialization is not completed. A symptom of this problem is that briefly pressing the power button while in the payload, which should power the computer off, might lock it up instead. It seems to be a side effect of the missing initialization.
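
The failure mode described above can be illustrated with a toy model. This is not coreboot's code: only the name root_port_commit_config() comes from this thread, while the function names, the default of eight functions and the "commit after the last visible function" variant are assumptions made for illustration. The variant is also not necessarily what the linked changes do.

```python
def commit_reached(visible_functions, num_functions=8):
    """Pattern described above: commit only when the last function is visited."""
    committed = False
    for fn in range(num_functions):
        if fn not in visible_functions:
            continue  # a hidden function never runs its per-port hook
        if fn == num_functions - 1:
            committed = True  # stands in for root_port_commit_config()
    return committed


def commit_reached_last_visible(visible_functions, num_functions=8):
    """Illustrative variant: commit after the last function that is visible."""
    visited = [fn for fn in range(num_functions) if fn in visible_functions]
    committed = False
    for fn in visited:
        if fn == visited[-1]:
            committed = True  # commit tied to the last visible function instead
    return committed


all_ports = set(range(8))
last_hidden = {0, 1, 2}  # the last root port function is not visible
print(commit_reached(all_ports), commit_reached(last_hidden))
print(commit_reached_last_visible(last_hidden))
```

In this model the commit step is silently skipped whenever the highest-numbered function is hidden, which matches the symptom of initialization never completing.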

There's a log of the T440p in board_status that shows that the last PCIe root port is disabled, which could be why things don't work: https://review.coreboot.org/cgit/board-status.git/tree/lenovo/t440p/4.11-1594-g6daa8c3ba5/2020-03-13T02_50_21Z/coreboot_console.txt

In short: no, this is far from solved. The PCIe handling code needs a revamp.

Actions #8

Updated by Angel Pons almost 2 years ago

Update: I've come up with a cleaner way to handle the Asrock B85M Pro4 problem: https://review.coreboot.org/44155

Actions #9

Updated by Jamal Wright over 1 year ago

Sorry to necro this.

I run a hacked Windows 8.1 and I hate 10, but I have found a partial cause for these bugs.

The BAD_POOL_CALLER crash comes from the Lenovo hotkey drivers in Windows. Windows 10 probably installed them automatically.

The machine check exception comes from the card reader. It affected me on warm boot only. The workaround is to stop those Lenovo drivers/services and disable the card reader. Windows then works great (maybe even better, apart from battery drain). Interestingly, the Lenovo PM drivers work fine, but for some reason the Synaptics palm rejection isn't as good as I remember it being on the patched Lenovo BIOS; that might be OT though.

I have tried https://review.coreboot.org/43763, and obviously the new commit was already built in, but neither changed anything.
