Bug #121

T520: Hangs in OS

Added by Julz Buckton almost 2 years ago. Updated 3 months ago.

Status:NewStart date:06/09/2017
Priority:NormalDue date:
Assignee:-% Done:

0%

Category:-
Target version:-

Description

I have been running coreboot since 2017.04.15 and have experienced hangs ever since then. It was suggested by folk on the IRC that I run memtest to check for incorrect raminit causing errors, however I have run memtest for 12 hours straight with no errors.

Due to the ambiguous nature of the hangs (immediate freeze with no warning signs, audio gets stuck repeating the last 50ms or so of noise, not sure what this effect is called) I don't have much useful information other than the .config and dmesg. However one thing I can say with high confidence is that the hangs occur significantly more frequently in Linux (*buntu distros) than Windows 10. Within an hour of launching Linux a hang is likely, whereas Windows typically runs for many hours before a hang occurs. I considered this an insignificant anecdotal anomaly at first but over the course of the nearly 2 months I have been running coreboot it seems to be a solid trend. The hangs occur anywhere, typically during mere desktop usage or basic web browsing.

Additionally there is another form of hang I experience where the screen goes black except for some sort of graphical corruption down the left side (http://i.imgur.com/4zWrlpX.jpg), whether this is related to the more common total freeze hangs I don't know but I figured I should include it nonetheless. These hangs only occur about 1:20 compared to the regular hangs.

config (20.7 KB) Julz Buckton, 06/09/2017 06:21 AM

dmesg.txt Magnifier (57.3 KB) Julz Buckton, 06/09/2017 06:21 AM

cbmem-raminit.txt Magnifier (62 KB) Julz Buckton, 06/29/2017 11:58 PM

History

#1 Updated by Julz Buckton almost 2 years ago

https://mail.coreboot.org/pipermail/coreboot/2016-September/082009.html

According to this entry on the mailing list someone else was getting the same issue on their T520. I have tried limiting the max mem speed to 666 in devicetree.cb as suggested in the link, however it did not fix the issue as expected since my RAM is only 1333 anyway. The second suggestion (limiting CPU p-state), I wouldn't know how to do.

#2 Updated by Nico Huber almost 2 years ago

Does your T520 have a dedicated GPU or the integrated Intel GPU only?

#3 Updated by Julz Buckton almost 2 years ago

Integrated only.

#4 Updated by Iru Cai almost 2 years ago

What is the longest uptime before the system hangs in Linux?
How long the system can run before it hangs when you run some heavy loads (e.g. boinc) or do a lot of network transfer?

Also, I suggest you try revision 39937cc2fd28bcc754c0595f1327467499af40ea in which Lenovo T520 is still using mrc.bin blob. I'm now running it the first time and the system has run for >5 hours. However, I don't know if it's still stable in the future boots.

#5 Updated by Vasya Boytsov almost 2 years ago

I have the same issue on t420 with 3632qm. And I accidentally found out that my laptop works more than 2 days without any hangs while I was using the x220 kernel config which had maxcpus set to 4. When I changed this value to 8 in the kernel config those hangs came back. I don't remember whether the maxcpus=7 worked the same way or not.

#6 Updated by Julz Buckton almost 2 years ago

Iru Cai wrote:

What is the longest uptime before the system hangs in Linux?
How long the system can run before it hangs when you run some heavy loads (e.g. boinc) or do a lot of network transfer?

Also, I suggest you try revision 39937cc2fd28bcc754c0595f1327467499af40ea in which Lenovo T520 is still using mrc.bin blob. I'm now running it the first time and the system has run for >5 hours. However, I don't know if it's still stable in the future boots.

I am lucky to get 1 hour uptime in linux. Heavy loads on windows seem to prevent the hangs, I have run Linpack and some GPU benchmarks multiple times for 6+ hours at a time with no hang, and have never seen a hang during such programs. This doesn't seem to be the case on linux, where I frequently get hangs during the crossgcc build stage of the coreboot build, which I assume is running the CPU high. Network activity does not seem to prevent the hangs, furthermore the most common hang scenario for me now is when the laptop was left for some hours with only a torrent client running, where it is unlikely to not hang after 2 hours.

Vasya Boytsov wrote:

I have the same issue on t420 with 3632qm. And I accidentally found out that my laptop works more than 2 days without any hangs while I was using the x220 kernel config which had maxcpus set to 4. When I changed this value to 8 in the kernel config those hangs came back. I don't remember whether the maxcpus=7 worked the same way or not.

I already using a 4 CPUs chip though (i5-3320M). Perhaps I could try setting maxcpus=2 in config.

#7 Updated by Iru Cai almost 2 years ago

Julz Buckton wrote:

Iru Cai wrote:

What is the longest uptime before the system hangs in Linux?
How long the system can run before it hangs when you run some heavy loads (e.g. boinc) or do a lot of network transfer?

Also, I suggest you try revision 39937cc2fd28bcc754c0595f1327467499af40ea in which Lenovo T520 is still using mrc.bin blob. I'm now running it the first time and the system has run for >5 hours. However, I don't know if it's still stable in the future boots.

I am lucky to get 1 hour uptime in linux. Heavy loads on windows seem to prevent the hangs, I have run Linpack and some GPU benchmarks multiple times for 6+ hours at a time with no hang, and have never seen a hang during such programs. This doesn't seem to be the case on linux, where I frequently get hangs during the crossgcc build stage of the coreboot build, which I assume is running the CPU high. Network activity does not seem to prevent the hangs, furthermore the most common hang scenario for me now is when the laptop was left for some hours with only a torrent client running, where it is unlikely to not hang after 2 hours.

Have you tried mrc.bin yet, e.g revision 39937cc?
I've tried this revision and the first revision that uses native ram init, and it seems that native ram init is the problem. I just don't know if mrc.bin supports ivy bridge yet.

Vasya Boytsov wrote:

I have the same issue on t420 with 3632qm. And I accidentally found out that my laptop works more than 2 days without any hangs while I was using the x220 kernel config which had maxcpus set to 4. When I changed this value to 8 in the kernel config those hangs came back. I don't remember whether the maxcpus=7 worked the same way or not.

I already using a 4 CPUs chip though (i5-3320M). Perhaps I could try setting maxcpus=2 in config.

#8 Updated by Iru Cai almost 2 years ago

Vasya Boytsov wrote:

I have the same issue on t420 with 3632qm. And I accidentally found out that my laptop works more than 2 days without any hangs while I was using the x220 kernel config which had maxcpus set to 4. When I changed this value to 8 in the kernel config those hangs came back. I don't remember whether the maxcpus=7 worked the same way or not.

Linux kernel config?
I remember I haven't have any issue on an iGPU only T420. My last working revision is 8bbd596de631adc8b677e69603e978b848eb1708.

#9 Updated by Vasya Boytsov almost 2 years ago

Iru Cai wrote:

Vasya Boytsov wrote:

I have the same issue on t420 with 3632qm. And I accidentally found out that my laptop works more than 2 days without any hangs while I was using the x220 kernel config which had maxcpus set to 4. When I changed this value to 8 in the kernel config those hangs came back. I don't remember whether the maxcpus=7 worked the same way or not.

Linux kernel config?
I remember I haven't have any issue on an iGPU only T420. My last working revision is 8bbd596de631adc8b677e69603e978b848eb1708.

Yes, I've changed this setting in the Linux kernel config, compiled the kernel and it works flawlessly now. The last time I was testing was between 4.5 and 4.6 don't remember the exact revision. So, the problem should be connected with native ram init, I'll try earlier revisions later. How can one be of help with debugging of this issue?

#10 Updated by Julz Buckton almost 2 years ago

Iru Cai wrote:

Julz Buckton wrote:

Iru Cai wrote:

What is the longest uptime before the system hangs in Linux?
How long the system can run before it hangs when you run some heavy loads (e.g. boinc) or do a lot of network transfer?

Also, I suggest you try revision 39937cc2fd28bcc754c0595f1327467499af40ea in which Lenovo T520 is still using mrc.bin blob. I'm now running it the first time and the system has run for >5 hours. However, I don't know if it's still stable in the future boots.

I am lucky to get 1 hour uptime in linux. Heavy loads on windows seem to prevent the hangs, I have run Linpack and some GPU benchmarks multiple times for 6+ hours at a time with no hang, and have never seen a hang during such programs. This doesn't seem to be the case on linux, where I frequently get hangs during the crossgcc build stage of the coreboot build, which I assume is running the CPU high. Network activity does not seem to prevent the hangs, furthermore the most common hang scenario for me now is when the laptop was left for some hours with only a torrent client running, where it is unlikely to not hang after 2 hours.

Have you tried mrc.bin yet, e.g revision 39937cc?
I've tried this revision and the first revision that uses native ram init, and it seems that native ram init is the problem. I just don't know if mrc.bin supports ivy bridge yet.

You mean this version? https://review.coreboot.org/cgit/coreboot.git/commit/?id=39937cc2fd28bcc754c0595f1327467499af40ea

I will give it a try. Could native ram init really be the cause of the issue, even if I got no errors in memtest?

#11 Updated by Julz Buckton almost 2 years ago

Tried coreboot revision 39937cc2fd28bcc754c0595f1327467499af40ea (with systemagent-r6.bin, tried systemagent-ivybridge.bin first and got brick) and got a hang within 30 seconds of booting into linux. Guess that rules out RAM init being the cause of hangs?

#12 Updated by Julz Buckton almost 2 years ago

Here is cbmem output with verbose RAM init logging enabled, in case it is helpful.

#13 Updated by Julz Buckton almost 2 years ago

I managed to get my hands on another SNB chip (i3-2310M) and with the same config (with just PCI ID for vga blob changed from 8086:0166 to 8086:0126), I get no hangs.

So looks like T520 mainboard + Ivy Bridge chip is cause for hangs.

#14 Updated by Iru Cai almost 2 years ago

Julz Buckton wrote:

I managed to get my hands on another SNB chip (i3-2310M) and with the same config (with just PCI ID for vga blob changed from 8086:0166 to 8086:0126), I get no hangs.

So looks like T520 mainboard + Ivy Bridge chip is cause for hangs.

Maybe related to turbo boost? Although the machine often hangs at idle time.
Because the system hang also happens when I use a Sandy Bridge Dual/Quad core processor.

#15 Updated by Patrick Rudolph over 1 year ago

Vendor does dynamically limit pstate depending on attached power supply.
ATM coreboot doesn't care about attached PSU...

Example:
The battery charges at 45 Watt.
The CPU has a TPD of 45 W.
7W idle power.
Other components, including USB 10W ?

It would require a 135 Watt PSU or limiting the CPU TDP / battery charge current to a smaller value.

What power-rating does your PSU have ?

#16 Updated by Seff Qin 10 months ago

Test v4.8.1 with t420, this issue has not been fixed.

I got different informations by executing 'dmidecode -t 17':
Vendor BIOS: Total Width and Data Width are both 64 bits.
Coreboot: Total Width is 16 bits and Data Width is 8 bits.

It seems that the RAMs are not running at full speed.

#17 Updated by Evgeny Zinoviev 8 months ago

Having hangs on T520 + i5-2450M. Happened twice after ~1 min after booting debian (devuan). The interesting part is that it unfreezes after 4-5 minutes. I'm using two 4G Hynix RAM sticks, 8G in total. I'll see if maxcpus=2 helps.

#18 Updated by Evgeny Zinoviev 8 months ago

Update: maxcpus=2 didn't help

#19 Updated by Nico Huber 8 months ago

Evgeny Zinoviev wrote:

Update: maxcpus=2 didn't help

Please note that the original report was for an Ivy Bridge CPU in a T520 (probably caused by missing compatible ME firmware or whatnot). You seem to have a very different problem.

#20 Updated by Evgeny Zinoviev 3 months ago

Now I have X220 with this bug. Yeah I know that the original report is for IVB CPU in T520, but i've seen both symptoms and they are the same: (1) just a hang and (2) a black screen with fluttering red line at the left, like on the photo from the last paragraph of this ticket.

Doesn't happen with lenovo bios. For now I suspect it's something RAM related (just have no other ideas). I'm using 2x8Gb Patriot PSD38G16002S sticks. I'll try to use different sticks and see if it helps. What else can I do to debug this? At least I have a hardware on which we can reproduce this, that's something for a start.

Also available in: Atom PDF