K33/K34 Plot Times ⏲

1TB for t1 and 500GB for t2 will do it … but it looks like a 1TB or bigger t2 is generally faster.

Thanks, I’ll need more RAM/storage before I can do K34s then.

I would have expected more than 80/day. I can get 75/day with 2x 2680v3 and 128GB RAM, single plotting. Some rough math from a test I did shows that with 256GB RAM it would be capable of about 84/day.

I also get around 70 minutes for k33 using my 2x 1TB NVMe drives, and 160 minutes for k34.

On a single CPU I was doing a K32 in 22 minutes (65 a day), but sometimes it would take longer. When I do a single K32 on the dual-CPU setup it takes longer, but I haven’t noted the time; I think it was 26 minutes.

Running two Madmax instances in parallel, each one takes 36 minutes, so 1 plot every 18 minutes. For some weird reason my system will sometimes take longer; it did this when I had the components in a T5810, and I’ve not yet got to the bottom of it, so on some occasions it may actually be faster. I’m running Linux Mint, with a RAID 0 of 4x Intel S3710 200GB SSDs, which gives around 2000MB/s throughput.
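
For reference, a striped temp array like that can be set up with mdadm along these lines (the device names are just examples, and the filesystem and mount point are whatever suits your setup):

sudo mdadm --create /dev/md0 --level=0 --raid-devices=4 /dev/sda /dev/sdb /dev/sdc /dev/sdd
sudo mkfs.ext4 /dev/md0
sudo mkdir -p /media/chiamining/Raid0
sudo mount /dev/md0 /media/chiamining/Raid0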

I probably need to spend some time finding the best settings, but I’m nearly fully plotted - I need another 74 K33 plots to maximise the disk space I currently have.

I currently have 8 x 32GB and 4 x 8GB, so not all slots are populated; ideally all slots need to be populated for maximum RAM speed.

75 a day with 2 x 2680v3 is impressive. What settings are you using for that, and what RAM configuration do you have? Also, what OS?

I’m using a T7910 with the aforementioned 2680v3s, 8x 16GB RAM sticks populated across both CPUs, and 2x Samsung SM961 1TB drives, which get around 2200MB/s during plotting, formatted in XFS. Running on Arch Linux with the 5.16 kernel, Madmax compiled with GCC 11, CPU mitigations disabled, the scaling governor set to performance, and LD_PRELOAD set to jemalloc. I run Madmax with -r 24 -K 2 -u 256, -t pointing at the NVMe RAID array and -2 at a tmpfs, and get a consistent k32 plot time of 19 minutes.
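
Spelled out, the invocation is roughly this (the mount points, farmer key and contract address below are placeholders rather than my real ones):

LD_PRELOAD=/usr/lib/libjemalloc.so.2 ./build/chia_plot \
  -r 24 -K 2 -u 256 \
  -t /mnt/nvme_raid/ \
  -2 /mnt/ram/ \
  -d /mnt/dst/ \
  -f <farmer_public_key> -c <pool_contract_address>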


Well, you totally lost me on the above; I’ve no idea what pretty much any of that means, unfortunately. I’m using the Madmax plotters that came with a Chia install.

I use -r 36 -K 2 -u 256 -v 128 -t /media/chiamining/Raid0/ -2 /mnt/ramdisk/

Would you mind sharing your Madmax builds?

I have this thread about my inconsistent plot times, but I just haven’t had the time to investigate further.

5.16 is the latest stable kernel release (5.17 is currently in development). GCC is the GNU C compiler, and version 11 is the latest, which offers some pretty decent performance benefits over version 10, which many distros are still using. If your distro is still on GCC 10, I would not bother building GCC 11 yourself; it’s quite the undertaking.

CPU mitigations disabled means I set the kernel parameter mitigations=off in my bootloader, so the kernel does not load the Spectre/Meltdown CPU mitigations and gains a little bit of performance, especially on these older Haswell chips.

The CPU scaling governor is probably the easiest way to get more performance on these older Xeons, because the default is usually set to either schedutil (which is OK) or powersave, depending on the distro. Setting the governor to performance lets it ramp up to higher clock frequencies faster, and stay there longer.

The LD_PRELOAD environment variable set to libjemalloc swaps in a custom memory allocator, which slightly improves performance on some machines; I gained maybe 30s at most from it.
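
If you want to try those on Mint (Ubuntu-based), it boils down to roughly this - a sketch only, since the exact GRUB variable, package names, and jemalloc library path vary by distro and bootloader:

# add mitigations=off to the kernel command line, e.g. GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then:
sudo update-grub && sudo reboot
# set the scaling governor to performance (cpupower comes from the linux-tools / linux-cpupower packages)
sudo cpupower frequency-set -g performance
# preload jemalloc when launching the plotter (install libjemalloc2 first; the path is distro-specific)
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 ./chia_plot ...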

You can build your own Madmax by following the guide in the readme on his GitHub page: GitHub - madMAx43v3r/chia-plotter (rough build steps below). As for your settings, are you using -r 36 with 1 CPU or 2 CPUs? If you’ve only got one 2699v3, you should try -r 18. I noticed in my testing that these Haswell chips seem to perform best in phase 2 with -r set to the logical thread count, but best in phase 3 with -r set to the physical core count, and the sweet spot (without using -K 2) is partway in between. With -K 2 it generally works best at the physical core count. I also don’t use -v 128; a few tests I did showed promise for lower bucket counts, but I don’t have enough RAM to test that with k32, and 128 buckets was generally not good for performance (worse than 256 with k32 plotting, and worse than 64 with k30 plotting).
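
The build steps are roughly what’s in the README - package names here are for Ubuntu/Debian-based distros, so adjust for yours:

sudo apt install -y libsodium-dev cmake g++ git build-essential
git clone https://github.com/madMAx43v3r/chia-plotter.git
cd chia-plotter
git submodule update --init
./make_devel.sh
./build/chia_plot --help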


I have 2x E5-2699v3; I was using -r 18 when I had just one. Just trying a K32 without the -v option.

Thanks for the more in-depth explanation, I’ll have to do a little research and see if I can work out how to adjust some of what you mention.

$ gcc --version
gcc (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.

I’ve set mitigations=off and the scaling governor to performance, and done away with the -v 128; the K32 plotted completely in RAM (ramdisk setup sketched below the timings).

Phase 1 took 561.77 sec
Phase 2 took 282.181 sec
Phase 3 took 331.23 sec
Phase 4 took 55.8239 sec

Total plot creation time was 1231.08 sec (20.518 min)
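
The in-RAM part is really just a big tmpfs with both -t and -2 pointed at it, something like this (the size and destination path are examples; you need a tmpfs large enough for both temp dirs, and mine is around 250G):

sudo mkdir -p /mnt/ramdisk
sudo mount -t tmpfs -o size=250G tmpfs /mnt/ramdisk
./chia_plot -r 36 -K 2 -u 256 -t /mnt/ramdisk/ -2 /mnt/ramdisk/ -d /path/to/destination/ ...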

Can’t wait for the new version of Bladebit that requires less RAM; I just hope it does K33 and K34 as well.


There’s an extra bit that has to be processed for k33 and k34, so it significantly increases memory usage and CPU load per thread. I discovered that the limiting factor on my system for k34 is single-core performance. One task in the plotting process was pegging a single core at 100%, and all the other threads were waiting on that process to finish and pass the data along. This is why k33 and k34 don’t scale linearly in time with k32: each k step roughly doubles the number of table entries, so a k33 should take roughly 2x a k32 and a k34 roughly 4x, but instead most people are seeing 3-4x for k33 and 7-10x for k34. Your 2699v3s might actually do worse for multithreaded k34 than my 2680s due to lower single-core speed.


Interesting, what phase/stage is it that requires only one thread? The 2699v3 has a base speed of 2.3GHz and a single-core turbo of 3.6GHz, whilst the 2680v3 is 2.5GHz/3.3GHz, so the 2699 should be quicker in single-core if it’s boosting to full speed.

I should be running another batch of K33s today, so it will be interesting to see what your optimisations do there.

Is that with the K32 plotted completely in RAM, or with just the -2 temp directory on a RAMdisk, while the K33s and K34s are plotted on (RAID 0) SSD/NVMe?

On my single-CPU system I see near-linear scaling from K32 to K33, and an additional 10% time needed for K34 vs 2x K33.
To be specific: K32 59m, K33 1h59m, K34 4h45m, all with NVMe temp directories.

Plotting a K32 with a 128GB RAMdisk is indeed 36m, so a K33 is ~3.3x as long in that comparison.

I have a 250GB RAM disk, and using the mods suggested above my K33 time is down to 67 minutes, so about three times my K32 time.

The single-core bottleneck is in phase 2, with the “count” process that is literally just counting the entries in the tables; the other plotting threads can’t proceed without it. I was seeing total CPU usage of around 20-30% while count was running, then it would jump up to 50% when count finished and the other threads could work on the new data :laughing:. This seems to be one of the big reasons that k34 scales so poorly: the other phases seem to scale similarly to k33, but phase 2 takes significantly longer on k34 than it should compared to k33.


Well, I did 10 k33 plots since last night, running 2 in parallel on 2 sets of 2 SSDs. Each k33 took 144 minutes on average, or 72 minutes/plot overall. The fastest was 131 min, the slowest 151 min.

Considering that in a similar scenario my k32s take about 21 minutes on average, k33s take about 3.4x as long to create as k32s.

So not so economical in time and energy, but they may mean fewer HDD seeks for the same rewards.

I’m thinking I may do at least one 16TB drive, or maybe two, with k33s; k34s seem a bit over the top at this point…

Not necessarily fewer HDD seeks; it just means the plot passes the filter less often, but it’s still the size of 2x k32, so it still has to dig through all that space looking for proofs if it’s on a promising lead, and there’s more chance it will find multiple proofs for each filter pass (with pooling). If you want to reduce hard drive seeks, go solo.
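
Rough back-of-the-envelope, assuming the standard plot filter constant of 512:

2x k32 plots → 2 filter checks per signage point, each passing ~1/512 of the time
1x k33 plot → 1 filter check per signage point, passing ~1/512 of the time, but with ~2x the proof space behind each pass
Expected proofs per signage point come out roughly the same either way; you just get them in fewer, bigger lookups.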

I’ve just been using K32s and K33s to use up as much space as possible; so far, on all the drives I’ve filled, I have an average of just under 4GB left.

When you compare k32 to k33 plots, isn’t the number of hashes stored in a k33 almost exactly 2x what a k32 has (the same set of tables, each with roughly 2x the entries)?

I mean, the plot size for k33 is slightly bigger than 2x k32, so is the density of hashes/TB slightly lower for k33?

Common sense would say yes, and I don’t know why the devs would make it so much larger (~6.1GB more than 2x k32s); what exactly that extra space is full of, I have no idea. Seems a bit wasteful of disk space if you ask me.
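
For reference, going by the commonly quoted final plot sizes (treat these as approximate):

k32 ≈ 101.4 GiB
k33 ≈ 208.8 GiB, so 208.8 - (2 x 101.4) ≈ 6 GiB of overhead vs two k32s
k34 ≈ 429.8 GiB, so 429.8 - (4 x 101.4) ≈ 24 GiB of overhead vs four k32s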


In that case, mixing those different kXY plots to use more HDD space is just a waste of time, isn’t it? The end result is more disk space used for the same number of hashes.

Using k32 as a hash unit, disk utilization should be looked at in terms of how many k32 hash units a drive holds, not really how much disk space was consumed.

The point of higher k-values is not really to store more hashes, but rather to thwart short-range replotting attacks. If those plots were just straightforward hash storage, we would not need to deal with higher k-values. So my take is that this is where the extra overhead comes from.