K35 repeated plot creation failure

seymour.krelborn · June 16, 2022, 4:56am

Using chia.exe via the command line, I have been trying to create K35 plots.

My first two worked.
My next 3 failed.

I just tried, again, on a different computer, and it, too, failed.
During the tail end of phase 3, it shows this output:

Compressing tables 6 and 7

<snip>

        Bucket 83 uniform sort. Ram: 26.370GiB, u_sort min: 13.000GiB, qs min: 4.347GiB.
Only wrote 889 of 9093 bytes at offset 765038627734 to "e:\\chia\\plotting\\1\\plot-k35-2022-06-12-13-25-e86c666c1dc137a79add5a3979c58e3cd4db74b8182ce26e61eb084c64bdc00b.plot.2.tmp" with length 765038628623. Error 1. Retrying in five minutes.
        Bucket 84 uniform sort. Ram: 26.370GiB, u_sort min: 13.000GiB, qs min: 4.348GiB.

The job completes.
But:
$ chia plots check -g name-of-plot.plot reports:

2022-06-16T00:41:39.521  chia.plotting.check_plots        : INFO     event: done, loaded 0 plots, 0 remaining
2022-06-16T00:41:39.537  chia.plotting.manager            : INFO     Saved 6 bytes of cached data
2022-06-16T00:41:40.553  chia.plotting.check_plots        : INFO
2022-06-16T00:41:40.553  chia.plotting.check_plots        : INFO
2022-06-16T00:41:40.553  chia.plotting.check_plots        : INFO     Summary
2022-06-16T00:41:40.553  chia.plotting.check_plots        : INFO     Found 0 valid plots, total size 0.00000 TiB
2022-06-16T00:41:40.553  chia.plotting.check_plots        : WARNING  1 invalid plots found:
2022-06-16T00:41:40.553  chia.plotting.check_plots        : WARNING      1 unopenable plots:

The above happens each time I try, all during the tail end of phase 3, and on two different computers.

Any ideas?

Jacek · June 16, 2022, 5:13am

How is your disk space when it fails?

seymour.krelborn · June 16, 2022, 5:14am

It never dropped below 1 TB (closer to 1.5 TB).

I was running a madmax K34 plot at the same time, using the same temp partition.

But other times I ran the K35 plot solo, and still got the error and plot failure.

amarena · June 16, 2022, 5:17am

Only wrote 889 of 9093 bytes at offset 765038627734

A long time ago I also had this error; in my case the cause was broken RAM.
But if you have already tried two different systems, it must be some other problem.

Jacek · June 16, 2022, 5:22am

IIRC, a long time ago, before MM existed and chia plotters had to be staggered, a full tmp drive (when the staggering went out of whack) gave a similar error. I think it did not abort; it just waited patiently for the user to remove some junk. Maybe this is the same thing? (I am not really certain whether it was chia or MM doing the waiting. That ‘Retrying in five minutes’ suggests that everything else is fine and only the free disk space is missing.)

The output says “Only wrote X of Z bytes at offset Y”, which implies that a file of size Y is already sitting there and a copy process is going on, but after X bytes it may have run out of space.
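A quick way to check that reading against the actual numbers is to pull them out of the log line. This is a minimal sketch: the regex is written for the one line quoted in the first post (the file name is shortened to a placeholder), not for every message chiapos can print. It shows that offset + bytes written equals the reported file length, i.e. the file ends exactly where the partial write stopped, with 8204 of the 9093 bytes missing. That is consistent with the drive refusing further writes at that point, but it does not by itself prove a lack of free space.

```python
import re

# The "Only wrote" line from the first post; the file name is a placeholder.
LINE = ('Only wrote 889 of 9093 bytes at offset 765038627734 to "plot.2.tmp" '
        'with length 765038628623. Error 1. Retrying in five minutes.')

PATTERN = re.compile(
    r"Only wrote (?P<written>\d+) of (?P<requested>\d+) bytes "
    r"at offset (?P<offset>\d+) .* with length (?P<length>\d+)\."
)

def parse_short_write(line: str) -> dict:
    """Extract the byte counts from an 'Only wrote ...' log line."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a short-write line")
    info = {k: int(v) for k, v in m.groupdict().items()}
    # Bytes of the requested write that never reached the file.
    info["missing"] = info["requested"] - info["written"]
    # True if the file ends exactly where the partial write stopped.
    info["ends_at_write"] = info["offset"] + info["written"] == info["length"]
    return info

info = parse_short_write(LINE)
print(info)
print(f"file length: {info['length'] / 2**30:.1f} GiB")
```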

The second output is kind of irrelevant, as we know that the file is not sound.

Also, take a look at those Chia charts on Grafana (lower right). If you switch to k36, maybe Chia will put your name in the tooltip there.

seymour.krelborn · June 16, 2022, 5:26am

Do you mean physically broken?
Or do you mean insufficient memory?
Something else?

I give the job 8x the default RAM via the “-b 27112” option (the default is 3389).
I also give the job 14 threads (the default is 2) on this 16-core computer (the other 14 go to the madmax plotting job, leaving 4 threads for the OS).

Both of the computers have 64 GB of RAM.
They rarely go over 32 GB, according to Task Manager.

Perhaps I should double the -b option, again, to 54224?
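For a sense of scale, the numbers in this post can be worked through directly. This sketch assumes that `-b` is given in MiB and treats the reported 64 GB as 64 GiB for a rough comparison; it does not account for the RAM the parallel madmax job needs. The 8x value works out to about 26.5 GiB (close to the "Ram: 26.370GiB" the log prints), while doubling it to 54224 would reserve about 53 GiB, roughly 83% of the machine's memory, for the chia job alone.

```python
from typing import Tuple

# Figures from this thread: chia's default -b (assumed to be MiB) and 64 GB of RAM.
DEFAULT_B_MIB = 3389
RAM_GIB = 64  # treated as GiB for a rough comparison (assumption)

def buffer_plan(multiplier: int) -> Tuple[int, float, float]:
    """Return (-b value in MiB, the same value in GiB, fraction of installed RAM)."""
    b_mib = DEFAULT_B_MIB * multiplier
    b_gib = b_mib / 1024
    return b_mib, b_gib, b_gib / RAM_GIB

for mult in (8, 16):
    b_mib, b_gib, frac = buffer_plan(mult)
    print(f"{mult:>2}x default: -b {b_mib} = {b_gib:.1f} GiB ({frac:.0%} of RAM)")
```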

amarena · June 16, 2022, 5:27am

The second cause of aborted plots on my system was inode errors on the plotting SSD. If you are using Ubuntu, try this (though I can't remember whether I also got these error messages):

sudo fsck -a -v

If you get errors about inodes, then that was the problem.

amarena · June 16, 2022, 5:27am

The RAM was physically broken. I replaced it, and after that the problem was solved.

seymour.krelborn · June 16, 2022, 5:30am

There is no way that I was out of disk space, for three reasons:

I was periodically checking (especially near the section where the job failed 3 times before), and it never got close to 1 TB remaining.
I was running a madmax K34 plot (actually, several K34s completed during the K35 processing, and all of the K34 plots were good).
Just in case my computers favored madmax in a temp space conflict, where the K34 would be successful, please keep in mind that I got the same errors 3 times when I ran the K35 solo.

seymour.krelborn · June 16, 2022, 5:34am

Windows 10 Home.

So that would probably be:
chkdsk (with /f if errors need to be fixed)

Jacek · June 16, 2022, 5:36am

During plot creation, disk utilization constantly fluctuates, and there are short spikes here and there. When we look at what has to be done, it is obvious to us that a move would be good enough, but the code may just be too generic and duplicate a file for no reason, only to delete the original once the copy is done. Just saying.
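To illustrate why copy-then-delete can cause such a spike while a move does not, here is a small sketch using throwaway files. It only compares the logical sizes of the files involved; it does not show what chia itself does internally, and the file names are hypothetical. During a copy both files exist at once (2x the space), whereas a rename on the same volume never duplicates the data.

```python
import os
import shutil
import tempfile

def copy_vs_rename(size: int = 1 << 20) -> dict:
    """Compare the bytes held during copy-then-delete with those after a same-volume rename."""
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "plot.tmp")  # hypothetical file name
        with open(src, "wb") as f:
            f.write(b"\0" * size)

        # Copy-then-delete: both files exist until the original is removed.
        copied = os.path.join(d, "copied.plot")
        shutil.copyfile(src, copied)
        peak_copy = os.path.getsize(src) + os.path.getsize(copied)
        os.remove(src)

        # Rename on the same volume: only directory metadata changes.
        renamed = os.path.join(d, "renamed.plot")
        os.replace(copied, renamed)
        after_rename = sum(os.path.getsize(os.path.join(d, n)) for n in os.listdir(d))
        return {"peak_copy": peak_copy, "after_rename": after_rename}

print(copy_vs_rename())
```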

Not sure how soon you caught that error, but maybe you could run Resource Monitor with a 30-second interval to check the free space.
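If Resource Monitor is awkward to leave running, a small script can log the same thing. This is a sketch using Python's standard `shutil.disk_usage`; the function name is made up for this example. It prints the free space on the volume holding a given path at a fixed interval and keeps track of the minimum. Note that a 30-second sample can miss the short spikes mentioned above, so a shorter interval may be worth trying near the point where the job failed before.

```python
import shutil
import time
from datetime import datetime
from typing import Optional

def log_free_space(path: str, interval_s: float = 30.0,
                   samples: Optional[int] = None) -> float:
    """Print the free space on the volume holding `path` every `interval_s` seconds.

    Runs until Ctrl+C when `samples` is None; returns the lowest free space seen, in GiB.
    """
    lowest = float("inf")
    taken = 0
    try:
        while samples is None or taken < samples:
            free_gib = shutil.disk_usage(path).free / 2**30
            lowest = min(lowest, free_gib)
            print(f"{datetime.now():%Y-%m-%d %H:%M:%S}  "
                  f"free={free_gib:,.1f} GiB  lowest={lowest:,.1f} GiB")
            taken += 1
            if samples is None or taken < samples:
                time.sleep(interval_s)
    except KeyboardInterrupt:
        pass  # stop quietly and still report the minimum
    return lowest
```

For the temp drive from the log, that would be something like `log_free_space("e:\\")`, stopped with Ctrl+C once the plot is past phase 3.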

amarena · June 16, 2022, 5:37am

Yes, I think that is the right command for Win10. It was a file system error, not a bad-sector error, so you only need to check the file system.

seymour.krelborn · June 16, 2022, 5:39am

I do not have the temp space for K36 (which would be over 4 TB).

K35 requires just north of 2 TB of temp space, which I accomplish via two 2 TB NVMe drives in a RAID 0.
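The K36 figure follows from a rough rule of thumb: temp space about doubles with each +1 in k. This sketch assumes exactly that doubling (real growth is likely a bit more) and uses the "just north of 2 TB" K35 figure from this post as a lower bound, so it treats needing the full 4 TB of the RAID 0 as not fitting.

```python
# ASSUMPTION: temp space roughly doubles with each +1 in k (real growth is likely a bit more).
# Baseline from this thread: a K35 needs "just north of 2 TB", so 2.0 is a lower bound.
K35_TEMP_TB = 2.0
RAID0_TB = 2 * 2.0  # two 2 TB NVMe drives in RAID 0 (nominal, before formatting)

def temp_space_tb(k: int) -> float:
    """Lower-bound estimate of temp space for a k-sized plot, scaled from K35."""
    return K35_TEMP_TB * 2 ** (k - 35)

def fits(k: int, capacity_tb: float = RAID0_TB) -> bool:
    # The estimate is a lower bound, so needing as much as we have counts as not fitting.
    return temp_space_tb(k) < capacity_tb

for k in (34, 35, 36):
    print(f"k={k}: more than {temp_space_tb(k):.1f} TB temp, fits: {fits(k)}")
```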

I could direct the K36 job to a mechanical drive, and check on it in a month.
However, I believe that chia.exe has a maximum “-k” value of “35” (based on what the GUI offers).

$ chia plots create --help
…however, shows no -k value limit. Perhaps K36 is doable?
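If someone wants to try it from a script, the command can be assembled from the flags already in use here. This is a sketch only: `-k`, `-b`, `-r`, `-t` and `-d` are standard `chia plots create` options, but whether chia actually accepts `-k 36` is exactly the open question, and the final directory below is a hypothetical placeholder.

```python
from typing import List

def chia_plot_cmd(k: int, tmp_dir: str, final_dir: str,
                  buffer_mib: int = 27112, threads: int = 14) -> List[str]:
    """Assemble a `chia plots create` argument list (defaults are the values from this thread)."""
    return [
        "chia", "plots", "create",
        "-k", str(k),
        "-b", str(buffer_mib),
        "-r", str(threads),
        "-t", tmp_dir,
        "-d", final_dir,
    ]

# f:\plots is a hypothetical destination.
cmd = chia_plot_cmd(36, tmp_dir=r"e:\chia\plotting\1", final_dir=r"f:\plots")
print(" ".join(cmd))
# To actually start the job (not done here): subprocess.run(cmd, check=True)
```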

amarena · June 16, 2022, 5:41am

Yes, the GUI max k is 35, but I think on the CLI you can use greater k values.

Jacek · June 16, 2022, 5:44am

That chart has k36 column with the respective percentage. So, there has to be at least one such plot out there.

If you think about it, one should be able to specify k-full-drive and have the plotter fill all the space on that drive, even if that would mean somehow concatenating k32 plots and having the last one partial. The very first attacks on MM were about dropped hashes (a speed optimization), so I would think a partial k32 should be possible, and as such k-full-drive should be an option.
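A crude version of the k-full-drive idea, without partial plots, is just a packing problem. This sketch greedily fits the largest plots first into the free space; it is not necessarily optimal, and the plot sizes are approximate figures commonly quoted in Chia's documentation, treated here as assumptions. The leftover it reports is the part that Jacek's partial final plot would have to cover.

```python
from typing import Dict, Tuple

# Approximate final plot sizes in GiB (commonly quoted figures; assumed values).
PLOT_GIB = {35: 884.1, 34: 429.8, 33: 208.8, 32: 101.4}

def fill_drive(free_gib: float) -> Tuple[Dict[int, int], float]:
    """Greedy sketch of a hypothetical 'k-full-drive' mode: largest plots first."""
    counts: Dict[int, int] = {}
    for k in sorted(PLOT_GIB, reverse=True):
        n = int(free_gib // PLOT_GIB[k])
        if n:
            counts[k] = n
            free_gib -= n * PLOT_GIB[k]
    return counts, free_gib

counts, left = fill_drive(1000.0)
print(counts, f"{left:.1f} GiB left over")
```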

seymour.krelborn · June 16, 2022, 5:49am

I am going to try, again, giving the job double the memory (double the -b value that I have been using).

What puzzles me is that my first two attempts worked.

Jacek · June 16, 2022, 5:50am

Maybe you can run memtest86 first, just to exclude the bad-RAM possibility? Although, as @amarena stated, two boxes have the same problem, so bad RAM is not very likely. It rather suggests that the problem is with your setup, not with the hardware.

seymour.krelborn · June 16, 2022, 5:51am

That 0.0617% for K35 would be 0.0616%, if not for my two successful K35 plots.

seymour.krelborn · June 16, 2022, 5:55am

Is that run from bootable media?

It is worth testing. But it is a pain to reboot these PCs.
Due to all of the USB drives, it takes 5 or 10 minutes to bring me to the login screen.

I am always nervous, waiting for all of the drives to initialize, fretting the headache if it should happen to stall.

It always works. But waiting is no fun.

amarena · June 16, 2022, 5:56am

No, you can run it normally under Win10, if it is the same program that I used; I'm not 100% sure about this.