Bug #297

T440P card reader is broken.

Added by Jamal Wright 8 months ago. Updated 4 months ago.

Status: New
Start date: 03/03/2021
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: -

Description

I've been trying to figure out why the card reader is detected but doesn't work, and crashes Windows. I wanted to track this bug outside the mailing list, where it would probably get lost. I've even compared against the X250, and everything looks normal. Am I missing a GPIO? Is something wrong with PCIe detection? Perhaps a conflict with the WiFi card, which is on the same PCIe root? I'm building from master as of 3/3/21.

I get this in the kernel log:

kernel: pci 0000:00:1c.1: [8086:8c12] type 01 class 0x060400
kernel: pci 0000:00:1c.1: PME# supported from D0 D3hot D3cold
kernel: pci 0000:00:1c.1: PCI bridge to [bus 03]
kernel: pci 0000:00:1c.1: bridge window [mem 0x82300000-0x823fffff]
kernel: pci 0000:00:1c.1: PCI bridge to [bus 03]
kernel: pci 0000:00:1c.1: bridge window [mem 0x82300000-0x823fffff]
kernel: pcieport 0000:00:1c.1: PME: Signaling with IRQ 27
kernel: pcieport 0000:00:1c.1: AER: enabled with IRQ 27

This is my PCIe tree:
-[0000:00]-+-00.0 Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor DRAM Controller
           +-02.0 Intel Corporation 4th Gen Core Processor Integrated Graphics Controller
           +-03.0 Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller
           +-04.0 Intel Corporation Device 0c03
           +-14.0 Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI
           +-16.0 Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1
           +-19.0 Intel Corporation Ethernet Connection I217-LM
           +-1a.0 Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #2
           +-1b.0 Intel Corporation 8 Series/C220 Series Chipset High Definition Audio Controller
           +-1c.0-[02]----00.0 Realtek Semiconductor Co., Ltd. RTS5227 PCI Express Card Reader
           +-1c.1-[03]----00.0 Intel Corporation Wireless 7265
           +-1d.0 Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #1
           +-1f.0 Intel Corporation QM87 Express LPC Controller
           +-1f.2 Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode]
           \-1f.3 Intel Corporation 8 Series/C220 Series Chipset Family SMBus Controller
lspci -vv: https://pastebin.com/0hbb0vZg

I've attached the CBMEM log as well. I'm a bit out of ideas, just sitting with fingers crossed, hoping a Haswell or Lynx Point commit fixes it :)

onbtpci.txt - CBMEM log (26.2 KB) Jamal Wright, 03/03/2021 08:34 PM

pcidevs.txt - lspci in case pastebin goes down (26.2 KB) Jamal Wright, 03/03/2021 08:35 PM

x250pci.txt - x250 vendor firmware (21.9 KB) Jamal Wright, 03/04/2021 12:38 PM

pcidevs.txt - Coreboot lspci (26.2 KB) Jamal Wright, 03/05/2021 02:57 PM

dsdt.pre - DSDT (568 KB) Jamal Wright, 03/05/2021 02:57 PM

pcitreeven.txt - Vendor Tree (1.09 KB) Jamal Wright, 03/05/2021 02:57 PM

T440P-vendor.txt - Vendor lspci (22.9 KB) Jamal Wright, 03/05/2021 02:57 PM

thinkpad_t440p.7z - autoport result (4.72 KB) Jamal Wright, 03/06/2021 12:11 PM

logs.7z - autoport logs (118 KB) Jamal Wright, 03/06/2021 12:11 PM

arch-dmesg.txt (66.1 KB) Jamal Wright, 03/06/2021 02:03 PM

cinamon-dmesg.txt (57.2 KB) Jamal Wright, 03/06/2021 02:03 PM

dmesg-buntu.txt (62.2 KB) Jamal Wright, 03/06/2021 02:03 PM

History

#1 Updated by Paul Menzel 8 months ago

Sorry, to clarify: you are only saying that using the card reader crashes Microsoft Windows, right? Does it work in GNU/Linux?

#2 Updated by Jamal Wright 8 months ago

GNU/Linux?

It doesn't work in Linux, but thankfully it doesn't crash it either. I get nothing from the Realtek driver module; on the X250, a kernel message shows up when an SD card is inserted. I get PCIe errors in both Windows and, as shown in that log, Linux. I've also tried removing the wireless card and all other removable PCIe devices before booting, with no change.

#3 Updated by Jamal Wright 8 months ago

I think part of the log is cut off:

Feb 28 09:55:36 t440p kernel: pcieport 0000:00:1c.1: AER: Corrected error received: 0000:00:1c.1
Feb 28 09:55:36 t440p kernel: pcieport 0000:00:1c.1: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Feb 28 09:55:36 t440p kernel: pcieport 0000:00:1c.1: device [8086:8c12] error status/mask=00000001/00002000
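The "error status/mask" pair in these lines can be decoded bit by bit. The sketch below assumes the event is correctable (severity=Corrected), so both values are Correctable Error Status/Mask register contents. The bit names follow the PCI Express Base Specification, and the function names are illustrative only. Under that assumption, 00000001 is a Receiver Error, which the kernel classifies as a Physical Layer error. 00002000 only says that Advisory Non-Fatal Error is masked.

```python
# Correctable Error Status/Mask register bits (PCIe AER capability).
CORRECTABLE_ERROR_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}


def decode_correctable(value: int) -> list:
    """Return the names of the correctable-error bits set in value."""
    return [name for bit, name in sorted(CORRECTABLE_ERROR_BITS.items())
            if value & (1 << bit)]


def decode_aer_line(line: str) -> tuple:
    """Split 'error status/mask=SSSSSSSS/MMMMMMMM' and decode both halves."""
    field = line.split("error status/mask=", 1)[1].split()[0]
    status_hex, mask_hex = field.split("/")
    return (decode_correctable(int(status_hex, 16)),
            decode_correctable(int(mask_hex, 16)))


line = ("pcieport 0000:00:1c.1: device [8086:8c12] "
        "error status/mask=00000001/00002000")
status, mask = decode_aer_line(line)
print("status:", status)  # status: ['Receiver Error']
print("mask:", mask)      # mask: ['Advisory Non-Fatal Error']
```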

#4 Updated by Paul Menzel 8 months ago

(Please use Markdown formatting, so your comments are well legible.)

It might have something to do with power-saving features. See the ASPM entries in the lspci -vv output with coreboot below.

02:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS5227 PCI Express Card Reader (rev 01)
    Subsystem: Realtek Semiconductor Co., Ltd. RTS5227 PCI Express Card Reader
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 29
    Region 0: Memory at 82200000 (32-bit, non-prefetchable) [size=4K]
    Capabilities: [40] Power Management version 3
        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Address: 00000000fee002b8  Data: 0000
    Capabilities: [70] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
        DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
            RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
            MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
        LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
            ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
        LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk-
            ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
            TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Not Supported, TimeoutDis+ NROPrPrP- LTR+
             10BitTagComp- 10BitTagReq- OBFF Via message/WAKE#, ExtFmt- EETLPPrefix-
             EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
             FRS- TPHComp- ExtTPHComp-
             AtomicOpsCap: 32bit- 64bit- 128bitCAS-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ OBFF Disabled,
             AtomicOpsCtl: ReqEn-
        LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
             Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1-
             EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
             Retimer- 2Retimers- CrosslinkRes: unsupported
    Capabilities: [100 v2] Advanced Error Reporting
        UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
        CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
        AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
            MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
        HeaderLog: 00000000 00000000 00000000 00000000
    Capabilities: [140 v1] Device Serial Number 00-00-00-01-00-4c-e0-00
    Capabilities: [150 v1] Latency Tolerance Reporting
        Max snoop latency: 0ns
        Max no snoop latency: 0ns
    Capabilities: [158 v1] L1 PM Substates
        L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
              PortCommonModeRestoreTime=60us PortTPowerOnTime=60us
        L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
               T_CommonMode=0us LTR1.2_Threshold=0ns
        L1SubCtl2: T_PwrOn=10us
    Kernel driver in use: rtsx_pci
    Kernel modules: rtsx_pci

Please also attach (not paste) the lspci -vv output when running the vendor firmware.

#5 Updated by Paul Menzel 8 months ago

siro in #coreboot@irc.freenode.net made an important observation: the Linux errors you pasted are for the root port of the WiFi device (00:1c.1), not for the card reader.
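To double-check which device sits behind which root port, the lspci -tv tree from the description can be parsed mechanically. The helper below, devices_behind_bridges, is hypothetical and not an existing tool. It only handles the one-device-per-bridge lines that appear in this tree. Applied to those lines, it maps 00:1c.1 to the Wireless 7265 on bus 03 and 00:1c.0 to the RTS5227 on bus 02.

```python
import re

# A bridge line from `lspci -tv`, e.g. "+-1c.1-[03]----00.0 Intel Corporation Wireless 7265":
# bridge dev.fn, secondary bus in brackets, then the downstream dev.fn and its name.
BRIDGE_RE = re.compile(
    r"([0-9a-f]{2}\.[0-9a-f])-\[([0-9a-f]{2})\]-+([0-9a-f]{2}\.[0-9a-f]) (.+)$")


def devices_behind_bridges(tree: str, root_bus: str = "00") -> dict:
    """Map a bridge address such as '00:1c.1' to (downstream address, device name)."""
    mapping = {}
    for line in tree.splitlines():
        match = BRIDGE_RE.search(line)
        if match:
            bridge, bus, devfn, name = match.groups()
            mapping[f"{root_bus}:{bridge}"] = (f"{bus}:{devfn}", name.strip())
    return mapping


# Lines taken from the tree in the description (abridged).
tree = r"""-[0000:00]-+-00.0 Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor DRAM Controller
           +-1c.0-[02]----00.0 Realtek Semiconductor Co., Ltd. RTS5227 PCI Express Card Reader
           +-1c.1-[03]----00.0 Intel Corporation Wireless 7265
           \-1f.3 Intel Corporation 8 Series/C220 Series Chipset Family SMBus Controller"""

for bridge, (device, name) in devices_behind_bridges(tree).items():
    print(bridge, "->", device, name)
```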

It might also be a good idea to contact the maintainers of the Linux driver rtsx_pci (MULTIMEDIA CARD (MMC), SECURE DIGITAL (SD) AND SDIO SUBSYSTEM).

#6 Updated by Jamal Wright 8 months ago

Flashing back is a bit of a pain, as I have to flash coreboot externally with a hardware programmer; the vendor BIOS blocks flashing. I have the dump from the similar Broadwell X250, which has the same card reader and an even newer WiFi card. I think I mentioned that the card is what produces those errors; they weren't present on the vendor firmware. The ASPM settings look similar on the X250. Is there a way to compile coreboot without ASPM support, or to get more detailed PCIe logs? The rtsx driver's only debug output is a register dump; I looked at it and there didn't seem to be much to it. I think the driver had to be recompiled to get that output, too. (https://github.com/torvalds/linux/commit/e455b69ddf9b69326d0cab28d374faf3325489c9)

I was waiting for commits concerning Lynx Point and autoport to land in coreboot before trying to flash back. I'm going to run autoport and then compare the result with what is currently in coreboot. I'm especially interested in the DSDT/ACPI entries: they are written out in thinkpad.asl and appear to cover multiple ThinkPads, but some functions are missing compared with what I see on the X250. The hotkey driver bluescreens Windows, and I suspect this is the cause.

Speaking of Windows: when I first booted it, all the PCIe drivers were rearranged and redetected. That is strange, since these are the same devices with, theoretically, the same device IDs. This isn't noticeable on Linux because it detects everything on every boot.

I had tried IRC before, but it never let me send messages, so I assumed non-official coreboot folks weren't welcome.

#7 Updated by Jamal Wright 8 months ago

I'd also like to add that I'm not the only one with this issue: https://github.com/archfan/coreboot/issues/13#issue-638222377

Same AER error: https://github.com/archfan/coreboot/issues/2

#8 Updated by Jamal Wright 8 months ago

I have bitten the bullet, even though they are expensive now. I have the patched vendor firmware 2.54 and can do an autoport. The lspci -vv output is different, especially the memory regions; coreboot uses a completely different region.

Now, how to fix it... I tested the reader, and it works under the vendor firmware.
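To pin down exactly which configuration-space bytes differ between the vendor firmware and coreboot, you could compare the hex dumps from lspci -xxx (root only) byte by byte instead of eyeballing lspci -vv. Below is a minimal sketch. It assumes each file holds the dump of the same device. The script and file names are placeholders.

```python
import re
import sys
from typing import Dict, List, Optional, Tuple

# One line of an `lspci -xxx` (or -xxxx) dump: "<hex offset>: <hex bytes>".
DUMP_LINE = re.compile(r"^([0-9a-f]{2,3}):((?: [0-9a-f]{2})+)\s*$")


def parse_config_dump(text: str) -> Dict[int, int]:
    """Return {config-space offset: byte} for every hex-dump line in text."""
    space = {}
    for line in text.splitlines():
        match = DUMP_LINE.match(line.strip())
        if match:
            base = int(match.group(1), 16)
            for i, byte in enumerate(match.group(2).split()):
                space[base + i] = int(byte, 16)
    return space


def diff_config(vendor: str, coreboot: str) -> List[Tuple[int, Optional[int], Optional[int]]]:
    """List (offset, vendor byte, coreboot byte) wherever the two dumps differ."""
    a, b = parse_config_dump(vendor), parse_config_dump(coreboot)
    return [(off, a.get(off), b.get(off))
            for off in sorted(a.keys() | b.keys())
            if a.get(off) != b.get(off)]


def fmt(byte: Optional[int]) -> str:
    """Two-digit hex, or '--' when the byte is missing from one dump."""
    return "--" if byte is None else f"{byte:02x}"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (placeholder names): python3 cfgdiff.py vendor-xxx.txt coreboot-xxx.txt
    with open(sys.argv[1]) as f_vendor, open(sys.argv[2]) as f_coreboot:
        for off, v, c in diff_config(f_vendor.read(), f_coreboot.read()):
            print(f"0x{off:03x}: vendor={fmt(v)} coreboot={fmt(c)}")
```

The offsets it prints can then be matched against the capability offsets shown in lspci -vv (for example [50] MSI or [70] Express).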

#9 Updated by Nico Huber 8 months ago

Please provide a more detailed description of the actual problem. Saying it's "not working"/"broken" doesn't give any hint as to which direction to look in. Is there any error message? Is the reader detected but a card is not? How do you know that Windows crashes because of the reader? What exact action is causing the crash?

Please provide every log of the OS you can, especially a full dmesg. As the error happens in the OS, we need to know the state of the OS. It could be some odd side effect (e.g. "Linux enables the IOMMU only with coreboot"); you never know.

The errors reported for 00:1c.1 seem not to be real errors at all (they are reported as corrected) and unrelated. They are only visible because coreboot enables more error reporting (AER); the same errors probably happen with the vendor firmware too.

There are a few odd things in lspci with coreboot, but they don't seem fatal. LTR is configured to 0 ns (I guess that effectively disables it, but it's optional anyway). The PCIe slots are not configured as slots (shrug). For 02:00.0, L1 is enabled but L0s isn't. I don't understand the latter; it seems worth investigating.
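The ASPM state that lspci prints comes from the ASPM Control field (bits 1:0) of the PCIe Link Control register. That register sits at offset 0x10 inside the PCI Express capability. Per the lspci output above, that capability is at 0x70 for the card reader, which puts Link Control at config offset 0x80. Below is a small decoder with hypothetical function names. The example value 0x0102 is reconstructed from the decoded coreboot fields (ASPM L1, RCB 64 bytes, CommClk-, ClockPM+); it was not read from hardware.

```python
# ASPM Control field, bits 1:0 of the PCIe Link Control register.
ASPM_CONTROL = {0b00: "Disabled", 0b01: "L0s", 0b10: "L1", 0b11: "L0s L1"}


def decode_link_control(value: int) -> dict:
    """Decode the Link Control bits that lspci summarises on its LnkCtl line."""
    return {
        "aspm": ASPM_CONTROL[value & 0b11],
        "rcb_128_bytes": bool(value & (1 << 3)),   # RCB: 0 = 64 bytes, 1 = 128 bytes
        "link_disabled": bool(value & (1 << 4)),   # "Disabled" in lspci
        "common_clock": bool(value & (1 << 6)),    # "CommClk" in lspci
        "clock_pm": bool(value & (1 << 8)),        # "ClockPM" in lspci
    }


def link_control_from_config(config: bytes, pcie_cap: int) -> int:
    """Read the little-endian 16-bit Link Control register at pcie_cap + 0x10."""
    offset = pcie_cap + 0x10
    return int.from_bytes(config[offset:offset + 2], "little")


print(decode_link_control(0x0102))
# {'aspm': 'L1', 'rcb_128_bytes': False, 'link_disabled': False, 'common_clock': False, 'clock_pm': True}
```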

#10 Updated by Jamal Wright 8 months ago

OK, I'll try to be more detailed:

"Not working" - Windows:
The card reader device is detected and the driver installs. On cold boot, the card reader shows in Device Manager, but a drive letter never appears and can't be added.
Inserting a card does nothing and produces no logs or messages. Upon reboot, whether a card is inserted or not, the computer bluescreens and dumps core with "BAD_POOL_CALLER".

"Not working" - Linux:
The card reader shows up in lspci. Inserting a card does nothing; there is no mention of it in dmesg. No crashes, no activity. The only PCI-related messages are, as you say, for the wireless card. On the vendor firmware, the only mention in dmesg is the card being detected when inserted, so there's not much to go on there.

Results of the autoport:
The config0-3 registers are different, and two GPIOs differ in level (low vs. high). Changing them made no obvious difference.
Checking lspci in the logs, the reader's MSI address goes from 00000000fee002f8 (vendor) to 00000000fee002b8 (coreboot), and the lspci hex dump is slightly different. Also, as you said, the vendor firmware has ASPM L0s/L1 enabled while coreboot has only L1. The wireless card's MSI address moved too: from 00000000fee00378 to 00000000fee00398.
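One way to judge whether the MSI address change matters is to decode it. On x86 the low address dword is 0xFEExxxxx, and bit 4 selects the format. In the compatibility format, bits 19:12 hold the destination APIC ID. In the VT-d remappable format, bits 19:5 plus bit 2 hold an interrupt-remapping table index. The sketch below assumes those Intel SDM / VT-d layouts.

If that reading is right, all four addresses above are in the remappable format and differ only in the table index: 0x17 vs 0x15 for the reader, 0x1b vs 0x1c for the WiFi card. The OS normally assigns that index, so the change would not by itself point at a firmware difference.

```python
def decode_msi_address(addr: int) -> dict:
    """Decode the low 32 bits of an x86 MSI address (compatibility or VT-d remappable)."""
    if ((addr >> 20) & 0xFFF) != 0xFEE:
        raise ValueError(f"{addr:#x} is not in the 0xFEExxxxx MSI window")
    if addr & (1 << 4):
        # Remappable format: handle[14:0] in bits 19:5, handle[15] in bit 2, SHV in bit 3.
        index = ((addr >> 5) & 0x7FFF) | (((addr >> 2) & 1) << 15)
        return {"format": "remappable", "irte_index": index,
                "subhandle_valid": bool(addr & (1 << 3))}
    # Compatibility format: destination APIC ID in bits 19:12, RH in bit 3, DM in bit 2.
    return {"format": "compatibility", "dest_id": (addr >> 12) & 0xFF,
            "redirection_hint": bool(addr & (1 << 3)),
            "logical_dest_mode": bool(addr & (1 << 2))}


for addr in (0xFEE002F8, 0xFEE002B8, 0xFEE00378, 0xFEE00398):
    print(f"{addr:#010x}", decode_msi_address(addr))
```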

I have the crash dump from Windows, but it's 36 MB compressed and too big to upload. Linux doesn't really report anything worth showing. I can upload the entire dmesg, but I've already scoured it; you'll just get a lot of normal output and no clues.

#11 Updated by Paul Menzel 8 months ago

As uploading the Linux messages (dmesg) is easy, please do so anyway. ;-)

#12 Updated by Jamal Wright 8 months ago

OK, I tried three different distros; the dmesg logs are attached (arch-dmesg.txt, cinamon-dmesg.txt, dmesg-buntu.txt).

#13 Updated by Nico Huber 8 months ago

I've written a patch[1] for the LTR configuration in coreboot. It should apply to current master, or to earlier revisions together with its two parent commits. I doubt that this fixes anything, but it should reduce the difference from the vendor configuration.
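For reference when comparing LTR settings: the Max Snoop and Max No-Snoop Latency registers in the LTR extended capability encode a 10-bit value (bits 9:0) and a 3-bit scale (bits 12:10). The latency is value * 32**scale ns, with scales 0 to 5. The coreboot lspci output shows both registers as 0ns. Below is a hedged sketch of the encoding. The helper names are illustrative, and the encoder rounds up, which is an arbitrary choice here.

```python
def decode_ltr_latency(reg: int) -> int:
    """Latency in ns for an LTR max-latency register: value (bits 9:0) * 32**scale (bits 12:10)."""
    value = reg & 0x3FF
    scale = (reg >> 10) & 0x7
    if scale > 5:
        raise ValueError(f"reserved LTR scale {scale}")
    return value * 32 ** scale


def encode_ltr_latency(ns: int) -> int:
    """Encode ns with the smallest scale whose 10-bit value fits, rounding up."""
    for scale in range(6):
        value = -(-ns // 32 ** scale)  # ceiling division
        if value <= 0x3FF:
            return (scale << 10) | value
    raise ValueError(f"{ns} ns is too large for the LTR encoding")


print(decode_ltr_latency(0))               # 0, as in "Max snoop latency: 0ns"
print(hex(encode_ltr_latency(1_000_000)))  # 0xbd1, i.e. 977 * 1024 ns
```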

[1] https://review.coreboot.org/c/coreboot/+/51328

#14 Updated by Nico Huber 8 months ago

Regarding further debugging: you can check whether the driver is receiving any interrupts with grep rtsx_pci /proc/interrupts.
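A single grep is only a snapshot; what matters is whether the counters move when a card is inserted. The minimal sketch below sums the per-CPU counts for rows whose handler list contains a given name, then compares two snapshots. irq_counts and watch are hypothetical helpers. The sketch assumes the usual /proc/interrupts layout: a header row of CPU columns, then one row per IRQ with the handler names last.

```python
import time
from pathlib import Path


def irq_counts(text: str, name: str) -> dict:
    """Sum the per-CPU counters of every /proc/interrupts row whose handler list includes name."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        fields = line.split()
        handlers = [f.rstrip(",") for f in fields[ncpu + 1:]]  # after IRQ and CPU columns
        if name not in handlers:
            continue
        irq = fields[0].rstrip(":")
        counts[irq] = sum(int(f) for f in fields[1:ncpu + 1])
    return counts


def watch(name: str = "rtsx_pci", seconds: float = 10.0) -> dict:
    """Return how much each matching IRQ's count changed over `seconds`."""
    proc = Path("/proc/interrupts")
    before = irq_counts(proc.read_text(), name)
    time.sleep(seconds)  # insert or remove an SD card during this window
    after = irq_counts(proc.read_text(), name)
    return {irq: after[irq] - before.get(irq, 0) for irq in after}

# Usage on the affected machine: print(watch())
```

A delta of zero while a card is inserted and removed would suggest that the reader's interrupts never arrive, as opposed to the driver ignoring them.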

Disabling ASPM on newer platforms is a bit hackish but, as you suggested, worth a try. If you remove "select PCIEXP_ASPM" and "select PCIEXP_COMMON_CLOCK" from src/southbridge/intel/lynxpoint/Kconfig, the two options can be toggled off in menuconfig. I'm not sure that is enough to turn ASPM off, so better check lspci.
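To check the result, you can extract the LnkCtl lines of lspci -vv per device instead of reading them by hand. The sketch assumes lspci -vv was run as root, since capabilities otherwise show as access denied. It also assumes the default bus:device.function headers without a PCI domain (no -D). aspm_states is a hypothetical helper.

```python
import re

DEVICE_RE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]) ")  # unindented device header
LNKCTL_RE = re.compile(r"^\s+LnkCtl:\s+ASPM ([^;]+);")            # "LnkCtl: ASPM L1 Enabled;"


def aspm_states(lspci_vv: str) -> dict:
    """Map each device address to the ASPM state lspci reports on its LnkCtl line."""
    states = {}
    current = None
    for line in lspci_vv.splitlines():
        header = DEVICE_RE.match(line)
        if header:
            current = header.group(1)
            continue
        match = LNKCTL_RE.match(line)  # does not match "LnkCtl2:"
        if match and current:
            states[current] = match.group(1).strip()
    return states

# Example: aspm_states(open("lspci-vv.txt").read()), file name being a placeholder.
```

Running it on the vendor and coreboot outputs side by side makes the L0s/L1 difference per port easy to spot.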

#15 Updated by Jamal Wright 7 months ago

I don't think that will fix it. Someone on another code review mentioned that the BIOS likely does the initialization in a module. I've checked the vendor firmware, and there is indeed a module for the card reader. I threw it into IDA but have no idea which PCI registers it writes to.

They said it might look like the ricoh/rce822 driver, and it somewhat does. I could post the decompiled output or the module here, but I don't know if that is allowed.

#16 Updated by Iru Cai 7 months ago

Jamal Wright wrote:

I've checked vendor firmware and there is indeed a module for the card reader.

Will the card reader stop working on vendor firmware if you remove this module with UEFITool?

#17 Updated by Iru Cai 5 months ago

I don't have a T440p at hand. Have you compared the gpio.c generated by autoport with the current coreboot code?
This Realtek card reader also seems to be used on other laptops that support coreboot, and it works there, so maybe it's not a driver problem.
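The sketch below is one rough way to do that comparison. It assumes the Lynx Point gpio.c style of designated initializers (".gpioN = VALUE,") inside named pch_gpio_set structs. The parser is a hypothetical regex-based helper, not a C parser, so macros or unusual formatting will be missed.

```python
import re
from typing import Dict, List, Optional, Tuple

STRUCT_RE = re.compile(r"struct\s+\w+\s+(\w+)\s*=\s*\{")  # "... struct pch_gpio_set1 name = {"
FIELD_RE = re.compile(r"\.(gpio\d+)\s*=\s*(\w+)")          # ".gpio12 = GPIO_MODE_GPIO,"


def parse_gpio_c(source: str) -> Dict[Tuple[str, str], str]:
    """Map (struct variable, gpio field) to its initializer value."""
    table = {}
    current = None
    for line in source.splitlines():
        struct = STRUCT_RE.search(line)
        if struct:
            current = struct.group(1)
            continue
        field = FIELD_RE.search(line)
        if field and current:
            table[(current, field.group(1))] = field.group(2)
    return table


def diff_gpio(autoport: str, current: str) -> List[Tuple[str, str, Optional[str], Optional[str]]]:
    """List (struct, gpio, autoport value, current value) for every mismatch."""
    a, b = parse_gpio_c(autoport), parse_gpio_c(current)

    def order(key: Tuple[str, str]) -> Tuple[str, int]:
        return key[0], int(key[1][len("gpio"):])  # numeric GPIO order within each struct

    return [(s, g, a.get((s, g)), b.get((s, g)))
            for s, g in sorted(a.keys() | b.keys(), key=order)
            if a.get((s, g)) != b.get((s, g))]

# Example: diff_gpio(open("autoport/gpio.c").read(), open("t440p/gpio.c").read()),
# both paths being placeholders.
```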

#18 Updated by Bob Dobbs 4 months ago

I can confirm that disabling PCIEXP_ASPM and PCIEXP_COMMON_CLOCK does not fix the issue.
