Hi,
tl;dr: I’m getting “Invalid plot header magic” on 8 of my 14 plots. Server hardware, ECC RAM, datacentre NVMe, Ubuntu 20.04. No ECC errors and all disks pass SMART. I’m using three NVMe drives and five datacentre HDDs, and the issue shows up on plots from all three SSDs and on all five HDDs.
If it’s a hardware error it’s an obscure one, or multiple failures.
More detail:
Dual Xeon, DDR4 ECC Type: 8x32GB Multi-bit ECC (edac-util -rfull output: mc0:noinfo:all:UE:0 / mc0:noinfo:all:CE:0 / mc1:noinfo:all:UE:0 / mc1:noinfo:all:CE:0)
3 x nvme drives (plotting)
- drive 1 - 3 plots
- drive 2 - 3 plots
- drive 3 - 1 plot
1 x nvme drive (-2) - drive that all temp files get written to before going to HDD (brand new drive)
5 x 14TB sata hdd (farming), new drives
- drive 1 - 4 plots (1 bad, from sequence 2) [nvme drive 1 & 2 plots come here]
- drive 2 - 2 plots (2 bad, one from each sequence) [nvme drive 3 plots come here]
- drive 3 - 4 plots (2 bad, from sequence 1) [nvme drive 1 plots come here]
- drive 4 - 2 plots (1 bad, from sequence 1) [nvme drive 2 plots come here]
- drive 5 - 2 plots (2 bad, one from each sequence) [nvme drive 2 plots come here]
14 plots complete (7 in sequence 1, 7 in sequence 2), 8 bad (5 in sequence 1, 3 in sequence 2, going by the per-drive list above) wtf!!!
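A quick tally of the per-drive list, as a sanity check on the totals. This is only a sketch: the tuples are transcribed by hand from the HDD list above, and the variable names are made up for illustration.

```python
# Per-HDD tallies transcribed from the farming-drive list above:
# (hdd, total_plots, bad_in_sequence_1, bad_in_sequence_2)
hdd_tallies = [
    ("drive 1", 4, 0, 1),  # 1 bad, from sequence 2
    ("drive 2", 2, 1, 1),  # 2 bad, one from each sequence
    ("drive 3", 4, 2, 0),  # 2 bad, from sequence 1
    ("drive 4", 2, 1, 0),  # 1 bad, from sequence 1
    ("drive 5", 2, 1, 1),  # 2 bad, one from each sequence
]

total_plots = sum(t[1] for t in hdd_tallies)
bad_seq1 = sum(t[2] for t in hdd_tallies)
bad_seq2 = sum(t[3] for t in hdd_tallies)
total_bad = bad_seq1 + bad_seq2
good = total_plots - total_bad

print(f"plots: {total_plots}, bad: {total_bad} "
      f"(sequence 1: {bad_seq1}, sequence 2: {bad_seq2})")
print(f"good: {good} ({good / total_plots:.0%})")
```

Summing the list this way gives 14 plots with 8 bad, 5 of them from sequence 1 and 3 from sequence 2, so neither sequence is clean.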
Error:
2021-05-18T08:09:34.615 chia.plotting.plot_tools : ERROR Failed to open file /mnt/field02/plot-k32-2021-05-17-06-57-xxx.plot. Invalid plot header magic Traceback (most recent call last):
File "/home/rthorntn/chia-blockchain/chia/plotting/plot_tools.py", line 189, in process_file
prover = DiskProver(str(filename))
ValueError: Invalid plot header magic
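For anyone who wants to look at the header by hand: as far as I can tell, this error means the first bytes of the file are not what the prover expects. The sketch below assumes a plot file starts with the 19-byte ASCII string "Proof of Space Plot" (my reading of the chiapos source, so treat it as an assumption). The function names `has_plot_magic`, `first_bytes` and `scan` are made up for this example and are not part of chia.

```python
from pathlib import Path

# Assumption: chiapos plot files begin with this 19-byte ASCII string.
PLOT_MAGIC = b"Proof of Space Plot"


def has_plot_magic(path):
    """Return True if the file starts with the expected magic bytes."""
    with open(path, "rb") as f:
        return f.read(len(PLOT_MAGIC)) == PLOT_MAGIC


def first_bytes(path, n=64):
    """Return the first n bytes, to eyeball what a bad header contains."""
    with open(path, "rb") as f:
        return f.read(n)


def scan(directory):
    """Map each *.plot file name in a directory to its magic-check result."""
    plots = sorted(Path(directory).glob("*.plot"))
    return {p.name: has_plot_magic(p) for p in plots}
```

Calling `scan("/mnt/field02")` would list which plots on that drive fail the check, and `first_bytes(...)` on a failing file shows whether the header is all zeros, random garbage, or something recognisable. It only inspects the header, so a plot that passes here can still be damaged further in.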
So I googled it, and what I read points to disk issues and possibly RAM.
- I have ECC, right, so it shouldn’t be RAM?
- I didn’t get 8 plots from any one NVMe, so I guess all the NVMe drives could be bad?
- The brand new Intel 750 (the -2 drive) could be bad, but because it’s a single point of failure, wouldn’t it corrupt all plots?
- I have failed plots on all the HDDs; surely all 5 drives can’t be bad?
- Cosmic rays, SATA bus corruption, SATA controller issue, who knows?
- A bug? Is there any other way to verify the plots?
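One way to test the bus/controller theory separately from plotting: copy a large file from an NVMe drive to an HDD by hand and compare checksums of the two copies. A minimal sketch, using only the standard library; `sha256_of` and `copies_match` are names invented for this example.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in 1 MiB chunks so large plots don't need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(source, destination):
    """Return True if both files have the same SHA-256 digest."""
    return sha256_of(source) == sha256_of(destination)
```

One caveat: on a box with this much RAM, a file that was just written may be read back from the page cache rather than from the disk, so a match right after the copy proves less than a match after the cache has been dropped or the drive remounted. A mismatch, on the other hand, would point at the copy path rather than at the plotter.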
I’m pretty pissed that I only have 6 good plots out of 14, a success rate under 50%. Please help, lol, preferably in a way that gets all 14 of my plots to pass the check… Should I just stop plotting until I figure it out? Who knows…
I just lowered the chia plots RAM (-b) from 8000 to 4000.
I just changed -2 to be on the same drive as -t.
Command:
screen -d -m -S chia01 bash -c 'cd /home/xxx/chia-blockchain && . ./activate && sleep 0h && chia plots create -k 32 -b 4000 -e -r 4 -u 128 -n 32 -t /mnt/1600gb_1/temp1 -2 /mnt/1600gb_1 -d /mnt/field01 |tee /home/rthorntn/chialogs/chia01_1_.log'
Here goes. I will check in 10 hours to see if it made any difference.