Can't find logs from system lockups during plotting; looked everywhere I know

I’m running Ubuntu Server 20.04, and my system keeps locking up about 8 hours into plotting. I usually use plotman to run 12 jobs in parallel, each with 4GB of RAM and 2 threads, on a 30-minute stagger.

I can’t see what’s happening in real time over SSH because the connection drops during the lockup. This time, though, I had a monitor connected and saw some very useful messages fly by, showing the affected core/thread and blaming a bug in chia.

I can’t find those logs anywhere, though. I checked the journal, dmesg, and everything in /var/log. There is nothing in /var/crash. They looked like dmesg-style messages with a timestamp like [ 10.286001].
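If the journal kept the previous boot, its kernel messages can be pulled out with `journalctl` and filtered in the same dmesg-style format. The sketch below is a minimal illustration, assuming systemd's `journalctl` is on the PATH; the helper names (`kernel_log_cmd`, `read_kernel_log`, `parse_dmesg_lines`, `matching`) are hypothetical, and the `-k`, `--boot=-1`, `-o short-monotonic` and `--no-pager` options select kernel messages from the previous boot with `[seconds]` timestamps.

```python
import re
import subprocess
from typing import List, Tuple

# Matches a dmesg-style line: "[   10.286001] rest of line"
# group 1 = seconds since boot, group 2 = everything after the bracket.
DMESG_TS = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")


def kernel_log_cmd(boot: int = -1) -> List[str]:
    """Build a journalctl call for kernel messages from one boot (hypothetical helper).

    -k: kernel messages only; --boot=-1: the boot before the current one;
    -o short-monotonic: dmesg-like [seconds] timestamps.
    """
    return ["journalctl", "-k", f"--boot={boot}",
            "-o", "short-monotonic", "--no-pager"]


def read_kernel_log(boot: int = -1) -> str:
    """Run journalctl and return its stdout; stderr is ignored."""
    result = subprocess.run(kernel_log_cmd(boot), capture_output=True, text=True)
    return result.stdout


def parse_dmesg_lines(text: str) -> List[Tuple[float, str]]:
    """Split dmesg-style text into (seconds_since_boot, message) pairs."""
    entries = []
    for line in text.splitlines():
        m = DMESG_TS.match(line)
        if m:
            entries.append((float(m.group(1)), m.group(2)))
    return entries


def matching(entries: List[Tuple[float, str]], keyword: str) -> List[Tuple[float, str]]:
    """Keep entries whose message contains keyword, ignoring case."""
    kw = keyword.lower()
    return [(ts, msg) for ts, msg in entries if kw in msg.lower()]
```

Usage would look like `matching(parse_dmesg_lines(read_kernel_log()), "chia")`. This only helps if the previous boot was actually stored (`journalctl --list-boots` shows which boots are available); messages printed to the console just before a hard lockup may never reach the disk at all.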

Chia: Reproduced on versions 1.1.2 and 1.1.3
OS : Ubuntu Server 20.04
CPU : AMD Ryzen 9 5950X
RAM : 2x 32GB Corsair Vengeance LPX DDR4-3600 CL18-22-22-42
MBD : Asus Prime B550-Plus
SSD : 2x 2TB Corsair MP600
HDD : 12TB WD My Book over USB
PSU : Corsair HX750
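As a quick sanity check that the plotting setup isn't oversubscribing this hardware on paper, here is a rough budget for the moment all staggered jobs overlap. It is a sketch under stated assumptions: "4GB" is treated as 4 GiB per job, the 5950X is counted as 32 hardware threads (16 cores), and the real per-process memory use of a plotting job may differ from its configured setting. `PlotConfig` and `budget` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PlotConfig:
    jobs: int               # parallel plotman jobs
    ram_gib_per_job: float  # per-job memory setting ("4GB" in the post, assumed GiB)
    threads_per_job: int
    stagger_min: int        # minutes between job starts


def budget(cfg: PlotConfig, total_ram_gib: float, hw_threads: int) -> dict:
    """Rough totals once every staggered job is running at the same time."""
    ram = cfg.jobs * cfg.ram_gib_per_job
    threads = cfg.jobs * cfg.threads_per_job
    return {
        "ram_gib": ram,
        "ram_headroom_gib": total_ram_gib - ram,
        "threads": threads,
        "thread_headroom": hw_threads - threads,
        # the last job starts (jobs - 1) staggers after the first
        "ramp_up_hours": (cfg.jobs - 1) * cfg.stagger_min / 60,
    }


# Values from the post: 12 jobs, 4GB / 2 threads each, 30-min stagger,
# 2x32GB RAM, Ryzen 9 5950X (16 cores / 32 threads).
print(budget(PlotConfig(12, 4, 2, 30), total_ram_gib=64, hw_threads=32))
```

With these numbers it reports 48 of 64 GiB and 24 of 32 threads in use, with all 12 jobs running after 5.5 hours, so the lockup around the 8-hour mark happens when the machine is at full load.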

Are you doing any overclocking or tweaking of any BIOS setting related to your RAM or CPU? If so, turn those off and go stock/auto settings.

No, I haven’t changed my UEFI settings at all. All CPU and RAM settings are on ASUS’s automatic profile; I also loaded optimized defaults to make sure I hadn’t accidentally changed anything.

Is there any way I could find out if this is caused by an unstable CPU / RAM?

Yes, by running memtest … and prime95/mprime overnight… that’s how you test for system stability.

Well, I won’t be running anything overnight: memtest gets stuck at 9% immediately after starting.

Is that 100% caused by bad RAM, or could there be something wrong with my memtest setup? I kind of expected more feedback from a program designed to test for bad memory. I installed memtest86+ from apt and reinstalled Ubuntu in legacy BIOS mode instead of UEFI so I could run it from GRUB. Could that cause this kind of problem?

Definitely bad RAM. If you are using 4 RAM sticks, you can try popping 2 of them out; that would isolate which “half” is bad. If you are only using 2, they are always paired, so you need to replace both.

Thanks for your help, that’s good to know. I’ve only got 2 DIMMs, so no luck for me there.