Hey guys, I could really use some help identifying the bottleneck here. I’m pretty sure I’m pushing my NVMEs too hard.
Threadripper 3970x
Asus Prime TRX40-Pro S
128 GB 3200
4 Samsung 970 EVO Plus 2T (temp drives)
4 WD NAS 18T drives (dest drives)
I have a max of 3 P1 jobs across all 4 NVME at once (going to separate dest drives). Max jobs per NVME in all phases is 8 for a total of 31 jobs at a time (I’ve limited global jobs to 31).
Right now my times are around 12 hours for completion of a job. The first 4 jobs that start have a 2h 50 min P1 time and as more and more jobs get added to the drives that slows to 4h 33 min. The 4 jobs start simultaneously across the 4 NVMEs with a 30 min stagger time. I’ve tried this with a time as low as 12 with pretty much the same results.
I’m assuming it’s the NVMEs themselves that are being overtaxed and not the PCIe lanes since the threadripper and chipset have so many.
If I replaced these 4 970s with 4 980 Pros would this resolve the bottleneck and let me plot 31 jobs at full speed, due to those drives being PCIe 4.0 vs 3.0?
Individual plot completion times are not a useful way to measure what the system is capable of. How many plots are you achieving per day? (count the file modified timestamps for a 24 hour period).
8 parallel plots on a 2TB 970 EVO Plus seems too high for these consumer TLC drives. Try 5-6. Check the iowait to determine the optimum number before it starts to struggle (see link below).
Lack of PCIe bandwidth is not the problem in your case but the 980 PROs may perform better depending on their sustained write performance (check the reviews).
You didn’t mention the OS. Linux performs better if you aren’t already using it. Proper software configuration plays a big part too.
We are testing now to use 4x SSDs as temp -1 and 4x SSDs as temp -2, so paired 1 to 1 and then dumps to HDD as we feel the P1/P2 being handled by 1 SSD and P3/P4/P5 being handled by a second SSD will improve the bottleneck caused within the SSDs.
We would then run 8 jobs staggered by 60 minute delay per pair of SSDs and max total 32 jobs with latest Swar build and to have P1 per drive limited to 3 and total jobs per pair of SSDs set to 8. Will also use an offset = 15 minutes as well, so at the start, 1 job kicks off every 15 minutes sequentially until it ramps up to full, trying to balance everything evenly. We base this on hoping to achieve 8 hour plot times. Based on results, we would need adjust numbers.
In theory it should help a bit and we have been testing on another machine with just 2 SSDs to see how it works out and seeing some positive results.
You could try this method since you have 4 SSDs and just change the max jobs to 16 and global offset to 30 minutes and leave the settings per pair of SSDs as I described above.
Feel free to test, but at this point, I make no promises of results.
You only have four nvmes, I would look into adding the 980s into the mix not replacing the 970s with them. Now you just gotta source the adapter cards (like Asus Hyper m.2 Gen 4) to add more drives into the threadripper system since you have plenty of pci-e lanes.
Yes, generally for consumer drives, more 1TB drives = more controllers = more plotting power compared to a lower quantity of 2TB drives.
It’s a common misconception - the extra space does not translate to more concurrent plots because the drive gets overloaded well before the space is fully utilized.
Something else definitely seems to be going on here. I added a 5th NVME drive and dropped jobs down to 6 on each and times went through the roof. Sitting at about 20 hour completion time now per job with only 25 running parallel.
Preliminary testing I did before adding the 5th drive (marking GB complete at 12min mark):
3 jobs across 1 drive staggered - 12 min 66gb
3 jobs across 1 drive same time - 12 min 66gb
4 jobs across 1 drive same time - 12 min 64gb
4 jobs across drives 1-4 same time - 12 min 58 to 60gb
4 jobs across drives 1-4 staggered - 12 min 59gb
3 jobs across drives 1-3 same time - 12 min 59-61gb
3 jobs across drives 1,2,4 same time - 12 min 62-66gb
2jobs each drives 1,2,4 same time - 12 min 58-59gb
2jobs each drives 1,2 same time - 12 min 60-63gb
It seems that adding a new drive into the mix drops speeds by a few GB and each job I add per drive continues to drop speeds.
While that may be, I can’t see why adding a 5th NVME while keeping the same total amount of jobs by reducing running jobs on the other 4 would slow the system down even more.
3955wx here on the GUI with (currently) 2x2tb temp and been testing different scenarios for best daily output. One or two plots at a time and it’s flying. But then I find striving for more plots at similar speeds like musical chairs. Add more chairs, it all slows down. Take some away and plotting speeds back up. I’ve tried x3 temp and x2 temp with different loads. You think you are going to do more plots to make better output, but everything slows to compensate. It seems that ~24 plots or slightly more is the max for my TR config. I’m thinking it’s the IO, but with x2 in RAID 0 Sam 970 Evo plus and another PCI-e 4.0 2TB that maybe should not happen. IO response is consistent for both ranging very low (sub ms) to higher 10-100ms occasionally but stays similar with loads I’ve tried. Memory is 4000GiB/plot 3T. CPU is not generally maxed at all, I’ve tons of memory, way more than is required.
Still 1/hr ave daily output is reasonable and over time accomplishes what is required, but IMWTK why the limit?
I have a TR pro 16/32 and a TR 24/48, both with 7 nvmes of different makes and models. It does not seem to matter what I do across what drive, the average of the drives is roughly the same IE 2 tb Firecudas vs 1 tb 970 pro with the SAME jobs will have the same total time. Its so strange, it is like the TR balances it or something…
I think it’s the amount of NVMEs. It looks like (at least for me) 6 jobs per NVME is the max so if he’s doing 6x8 that’s 48 jobs at a time which would be about 96 per day.
I also have someone in a reddit thread with 3970x and 5x 980 pros telling me he’s doing 100 plots per day with 5 jobs running at once with a 90 minute stagger using 6T/6500mib but I haven’t had the time to test that and won’t until next week.