Ubuntu recently started popping up a “System program problem detected” window, which is usually the harbinger of a crash to come. The message does not say which program is affected or what the problem is. The crash log and an excerpt from the kernel log are at the bottom of the post.
I checked the plotter logs too, but none of them reports an error before the last line written prior to the crash. Could this be related to XMP/DOCP? DOCP is set to disabled in my UEFI, though. I have noticed that with automatic settings the RAM runs at slightly above 2000MHz rather than the rated 3600MHz. Could that be causing problems?
I was plotting 12 k32 plots in parallel, each with 2 threads and a 3390MiB buffer, started 30 minutes apart. I think only about 6 plots had been started before the crash, though. The average time for a single plot is slightly above 4 hours on my system.
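A rough sanity check on these numbers can be sketched in Python. This is only back-of-the-envelope arithmetic under stated assumptions: the 3390MiB figure is taken to be a per-plot buffer setting, 250 minutes stands in for "slightly above 4 hours", and the function names are made up for illustration. Real memory use per plotter may well be higher than its buffer setting.

```python
import math

MIB_PER_GIB = 1024


def peak_concurrent_plots(total_plots, plot_minutes, stagger_minutes):
    """Upper bound on plots running at once when one starts every stagger_minutes."""
    return min(total_plots, math.ceil(plot_minutes / stagger_minutes))


def plots_started(elapsed_minutes, total_plots, stagger_minutes):
    """Number of plots launched by elapsed_minutes, with the first one at t=0."""
    if elapsed_minutes < 0:
        return 0
    return min(total_plots, int(elapsed_minutes // stagger_minutes) + 1)


def buffer_total_gib(plots, buffer_mib):
    """Sum of the per-plot buffer settings, in GiB (not measured memory use)."""
    return plots * buffer_mib / MIB_PER_GIB


if __name__ == "__main__":
    # Assumed inputs: 12 plots, ~250 min per plot, 30 min stagger, 3390 MiB each.
    peak = peak_concurrent_plots(12, 250, 30)
    print("peak concurrent plots:", peak)
    print("buffers at peak (GiB):", round(buffer_total_gib(peak, 3390), 1))
    print("buffers if all 12 overlapped (GiB):", round(buffer_total_gib(12, 3390), 1))
```

Under these assumptions, even the worst case of all 12 buffers at once comes to roughly 39.7 GiB, which is below the 2x 32GB installed on paper. With a 30 minute stagger, six plots would have been launched somewhere between 150 and 180 minutes after the first one started.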
OS : Ubuntu Budgie 20.04, packages up to date
CPU : AMD Ryzen 9 5950X
RAM : 2x 32GB Corsair Vengeance LPX DDR4-3600 CL18-22-22-42
MBD : Asus Prime B550-Plus
SSD : 2x 2TB Corsair MP600
HDD : 12TB WD My Book over USB
PSU : Corsair HX750
GPU : Nvidia GeForce GT240
Log in /var/crash:
raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: non-NULL mapping
Modules linked in: ses enclosure scsi_transport_sas xfs nls_iso8859_1 snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep edac_mce_amd nouveau snd_pcm snd_seq_midi kvm snd_seq_midi_event mxm_wmi snd_rawmidi ttm snd_seq drm_kms_helper snd_seq_device crct10dif_pclmul cec ghash_clmulni_intel snd_timer rc_core aesni_intel joydev i2c_algo_bit eeepc_wmi snd input_leds asus_wmi crypto_simd fb_sys_fops syscopyarea sparse_keymap cryptd sysfillrect sysimgblt video glue_helper wmi_bmof soundcore efi_pstore k10temp ccp mac_hid sch_fq_codel parport_pc ppdev lp parport drm ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 multipath linear raid0 hid_corsair hid_generic usbhid hid uas usb_storage crc32_pclmul r8169 nvme ahci i2c_piix4 realtek xhci_pci nvme_core libahci xhci_pci_renesas wmi gpio_amdpt gpio_generic
CPU: 25 PID: 283 Comm: kswapd0 Not tainted 5.8.0-50-generic #56~20.04.1-Ubuntu
Hardware name: ASUS System Product Name/PRIME B550-PLUS, BIOS 1401 12/03/2020
Call Trace:
Package: linux-image-5.8.0-50-generic 5.8.0-50.56~20.04.1
SourcePackage: linux
Tags: kernel-oops
Uname: Linux 5.8.0-50-generic x86_64
Excerpt from kernel log:
Apr 30 00:00:44 fred kernel: [ 6278.255580] Disabling lock debugging due to kernel taint
Apr 30 00:52:19 fred kernel: [ 9373.240482] ------------[ cut here ]------------
Apr 30 00:52:19 fred kernel: [ 9373.240484] nouveau 0000:04:00.0: timeout
Apr 30 00:52:19 fred kernel: [ 9373.240530] WARNING: CPU: 12 PID: 10356 at drivers/gpu/drm/nouveau/nvkm/engine/gr/g84.c:168 g84_gr_tlb_flush+0x30b/0x320 [nouveau]
Apr 30 00:52:19 fred kernel: [ 9373.240531] Modules linked in: ses enclosure scsi_transport_sas xfs nls_iso8859_1 snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep edac_mce_amd nouveau snd_pcm snd_seq_midi kvm snd_seq_midi_event mxm_wmi snd_rawmidi ttm snd_seq drm_kms_helper snd_seq_device crct10dif_pclmul cec ghash_clmulni_intel snd_timer rc_core aesni_intel joydev i2c_algo_bit eeepc_wmi snd input_leds asus_wmi crypto_simd fb_sys_fops syscopyarea sparse_keymap cryptd sysfillrect sysimgblt video glue_helper wmi_bmof soundcore efi_pstore k10temp ccp mac_hid sch_fq_codel parport_pc ppdev lp parport drm ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 multipath linear raid0 hid_corsair hid_generic usbhid hid uas usb_storage crc32_pclmul r8169 nvme ahci i2c_piix4 realtek xhci_pci nvme_core libahci xhci_pci_renesas wmi gpio_amdpt gpio_generic
Apr 30 00:52:19 fred kernel: [ 9373.240558] CPU: 12 PID: 10356 Comm: kworker/12:0 Tainted: G B 5.8.0-50-generic #56~20.04.1-Ubuntu
Apr 30 00:52:19 fred kernel: [ 9373.240559] Hardware name: ASUS System Product Name/PRIME B550-PLUS, BIOS 1401 12/03/2020
Apr 30 00:52:19 fred kernel: [ 9373.240582] Workqueue: events nouveau_cli_work [nouveau]
Apr 30 00:52:19 fred kernel: [ 9373.240600] RIP: 0010:g84_gr_tlb_flush+0x30b/0x320 [nouveau]
Apr 30 00:52:19 fred kernel: [ 9373.240601] Code: 8b 40 10 48 8b 78 10 4c 8b 6f 50 4d 85 ed 75 03 4c 8b 2f e8 87 e3 76 e7 4c 89 ea 48 c7 c7 7c 79 cc c0 48 89 c6 e8 4b bb b3 e7 <0f> 0b e9 49 ff ff ff e8 79 34 b9 e7 66 0f 1f 84 00 00 00 00 00 0f
Apr 30 00:52:19 fred kernel: [ 9373.240602] RSP: 0018:ffffac2ac678f888 EFLAGS: 00010082
Apr 30 00:52:19 fred kernel: [ 9373.240602] RAX: 0000000000000000 RBX: ffff9d515e7db400 RCX: 0000000000000027
Apr 30 00:52:19 fred kernel: [ 9373.240603] RDX: 0000000000000027 RSI: 0000000000000082 RDI: ffff9d516eb18cd8
Apr 30 00:52:19 fred kernel: [ 9373.240603] RBP: ffffac2ac678f978 R08: ffff9d516eb18cd0 R09: 0000000000000004
Apr 30 00:52:19 fred kernel: [ 9373.240603] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
Apr 30 00:52:19 fred kernel: [ 9373.240604] R13: ffff9d5166b63060 R14: ffff9d51624a39c0 R15: 0000000000000001
Apr 30 00:52:19 fred kernel: [ 9373.240604] FS: 0000000000000000(0000) GS:ffff9d516eb00000(0000) knlGS:0000000000000000
Apr 30 00:52:19 fred kernel: [ 9373.240604] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 30 00:52:19 fred kernel: [ 9373.240605] CR2: 00007fc0998c1000 CR3: 0000000fdf898000 CR4: 0000000000740ee0
Apr 30 00:52:19 fred kernel: [ 9373.240605] PKRU: 55555554
Apr 30 00:52:19 fred kernel: [ 9373.240606] Call Trace:
Apr 30 00:52:19 fred kernel: [ 9373.240628] ? nv04_timer_read+0x47/0x60 [nouveau]
Apr 30 00:52:19 fred kernel: [ 9373.240644] ? nvkm_timer_wait_test+0x22/0x80 [nouveau]
Apr 30 00:52:19 fred kernel: [ 9373.240657] ? g84_bar_flush+0x8b/0xe0 [nouveau]
...
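To pick the interesting lines out of a long kernel log like the one above, something along these lines may help. It is a minimal sketch, assuming the syslog-style line layout seen in this excerpt (date, host, `kernel:`, bracketed uptime in seconds); the keyword list and the function names are arbitrary choices for illustration, not an exhaustive set of failure markers.

```python
import re

# Matches lines such as:
#   Apr 30 00:52:19 fred kernel: [ 9373.240484] nouveau 0000:04:00.0: timeout
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s+"
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s?(?P<msg>.*)$"
)

# Illustrative keywords only; extend as needed.
KEYWORDS = ("WARNING", "BUG", "Oops", "taint", "timeout")


def parse_line(line):
    """Return a dict for a kernel log line, or None if the layout does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return {
        "stamp": m["stamp"],
        "host": m["host"],
        "uptime_s": float(m["uptime"]),
        "msg": m["msg"],
    }


def notable_events(lines):
    """Keep only parsed lines whose message contains one of KEYWORDS."""
    events = []
    for line in lines:
        rec = parse_line(line)
        if rec is not None and any(k in rec["msg"] for k in KEYWORDS):
            events.append(rec)
    return events


if __name__ == "__main__":
    sample = [
        "Apr 30 00:00:44 fred kernel: [ 6278.255580] Disabling lock debugging due to kernel taint",
        "Apr 30 00:52:19 fred kernel: [ 9373.240484] nouveau 0000:04:00.0: timeout",
        "Apr 30 00:52:19 fred kernel: [ 9373.240602] RSP: 0018:ffffac2ac678f888 EFLAGS: 00010082",
    ]
    for ev in notable_events(sample):
        print(f"{ev['uptime_s']:>12.3f}s  {ev['msg']}")
```

Run on the three sample lines copied from the excerpt, this keeps the taint message and the nouveau timeout and drops the register dump line. The uptime values also show that the taint message was logged about 3095 seconds (roughly 52 minutes) before the nouveau timeout.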