Plotting process measurements

I’ve decided to test several cases of using a second temporary directory. In all cases the first temp dir is an NVMe drive and the destination is an HDD; these two are the same for all tests.

  • not using it (default) - the second temp is the same NVMe as the first
  • a slow SATA SSD
  • a fast NVMe SSD (a different one)
  • another HDD
  • the same HDD as the destination

I want to make a script that takes measurements every second, logs them to files, and then combines everything into a single spreadsheet together with the plotting phase and operation.

So, what to measure:

  • size of all used directories over time (i.e. by phase of plotting) - this will let us work out effective time staggering for plotting
  • memory usage (logged the same way) - I need help with scripting here
  • total data written (TBW) on the first temporary - this will answer how much TBW we can save on fast NVMe drives by offloading to cheaper (SSD or HDD) devices, and whether there is any profit in it (a rough sampler sketch follows this list)
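As a starting point, here is a minimal sketch of the kind of sampler I have in mind (paths and the device name are placeholders, not a real setup, and it assumes smartctl is installed):

#!/bin/bash
# Minimal per-second sampler sketch; paths and device are placeholders.
TEMP1=/temporary/temp1/       # first temporary directory
NVME=/dev/nvme0n1             # NVMe device behind the first temporary
while true; do
    NOW=$(date '+%Y-%m-%d %H:%M:%S')
    # Space used in the temp dir, in MB
    SIZE=$(du -sm "$TEMP1" | cut -f1)
    # Resident memory of all chia processes, in MB
    MEM=$(ps -C chia -o rss= | awk '{s+=$1} END {printf "%d\n", s/1024}')
    # Raw SMART "Data Units Written" counter (1 unit = 512,000 bytes)
    DUW=$(smartctl -a "$NVME" | awk '/Data Units Written/{gsub(",","",$4); print $4}')
    echo -e "$NOW\t$SIZE\t$MEM\t$DUW"
    sleep 1
done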

Any ideas and proposals are very welcome!

I’m planning to start the tests soon; they will take a while. I will post the process and results here.

6 Likes

Testing hardware:

  • Lenovo ThinkPad T15G1, Intel® Core™ i5-10210U CPU @ 1.60GHz × 8, 15.4 GiB RAM, running Fedora 34 (Workstation Edition)
  • Internal NVMe: Samsung SSD 970 EVO Plus 1TB
  • External SATA SSD: Samsung 860 EVO 500GB
  • External NVMe SSD: ADATA SX8200PNP
  • External HDD (destination): Seagate Barracuda ST2000LM015-2E8174
  • External HDD (second temp): Seagate Mobile HDD ST1000LM035-1RK172
1 Like

First test is done.

chia plots create -k 32 -t /temporary/temp1/ -d /run/media/lv/Pocket/dest/
First and second temporaries are the internal NVMe; the destination is the external HDD.
Total time = 22613.492 seconds.

(on the graphs, long vertical lines separate phases, short ones separate sub-phases)

Disk usage:

The first temporary peaks at 223188 MB in phase 1, then at 256849 MB in phase 2, and decreases from 239871 MB at the start of phase 3.
The peak at the end of the third phase is (I think) the composing of the temporary plot file.

Memory allocated:

The default buffer size is 3389 MB. Memory allocation in all phases except the first does not exceed 2000 MB (1175 MB in phase 2, 1890 MB in phase 3 and 1165 MB in phase 4).

TBW

Total data written (TBW) on the NVMe device for one plot is 1 424 385 MB.

2 Likes

Second test is done.

chia plots create -k 32 -t /temporary/temp1/ -2 /temporary2/temp2/ -d /run/media/lv/Pocket/dest/
First temporary is the internal NVMe, the second temporary is the SATA SSD in an external enclosure, the destination is the external HDD.
Total time = 23530.325 seconds.

Disk usage

Looks like the previous test for phases 1 and 2, but in the 3rd phase space decreases faster.


Difference in first-temporary usage:

Memory allocated

No visible changes, same as the previous test.

TBW

Total data written on the NVMe device for one plot is 1 342 493 MB (81 892 MB lower).

Conclusions:

  1. Using a secondary temp folder on another drive decreases writes to the faster device by about 80 GB per plot, so using a cheaper and slower SSD will save some money.
  2. Using a secondary temp folder on another drive frees up space starting from phase 3, so starting the next plot seeding from that moment (or a little later) would be great for queues, especially for Samsung 970 PRO and 980 PRO series drives (or similar).
3 Likes

Third test is done.

The NVMe in the external enclosure is used as the second temporary. Results look like the previous test, but the 3rd and 4th phases are a little faster (about 15-20 minutes overall).
Total time = 22716.313 seconds.

3 Likes

So there is no need for other tests.
Using an HDD for the secondary temp will slow down phases 3 and 4, and also phase 5 (copying the plot file to the destination).
Using the destination drive (HDD) as the secondary will give the same result. I found that the workflow for handling the tmp plot file is: copy the tmp file to the destination, then rename it and delete it from the second temporary.
A note on this: when using the destination as the second temporary, it could be just a rename of the tmp file, but I don’t think it really is. Either way it slows down the process, and I won’t test it.

2 Likes

Ways to effectively utilize temporary storage with a 2nd temporary

(Charts of temporary space usage)

Starting the next plot seeding after the compression of tables 5 and 6 (phase 3) of the previous plot begins:
about 250 GB at peak

Starting the next plot seeding after the compression of tables 2 and 3 (phase 3) of the previous plot begins:
about 320 GB at peak

Starting the next plot seeding after the first phase of the previous one:
about 450 GB at peak
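
For example, a naive two-plot stagger built on these observations might look like the sketch below (a rough sketch, not my actual tooling; the delay is a placeholder to read off the charts, roughly 8200 seconds in my environment, see below):

# Rough stagger sketch: start a second seeding a fixed delay after the first
STAGGER=8200   # seconds; pick this from the charts for your hardware
chia plots create -k 32 -t /temporary/temp1/ -2 /temporary2/temp2/ -d /run/media/lv/Pocket/dest/ &
sleep "$STAGGER"
chia plots create -k 32 -t /temporary/temp1/ -2 /temporary2/temp2/ -d /run/media/lv/Pocket/dest/ &
wait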

4 Likes

Interesting, so using a secondary temp isn’t speeding up plot times, even if it’s an SSD drive…!? Also, what exactly are the three charts in your last post saying? BTW, the way I understand it, when specifying the same dir as tmp2 and dst, when the plot finishes it just renames the plot, meaning you will copy the plot only once, to your archiving drive!

Exactly! It does not speed up ONE plot seeding. But using the 2nd temp speeds up freeing space on the 1st temp drive, so you can run the next seeding without waiting for the first one to finish. This flow utilizes the same 250 GB of temporary space (that’s what the 1st chart is about), allowing you to speed up the whole queue.
Look at the first chart of the second test. The regular way is to run a seeding, wait for it to end, and then run the next one. That takes 29160 seconds including file copying (in my testing environment) for one plot. BUT! You can start the next seeding about 8200 seconds earlier, and it will use the same temp size (250 GB), about 6 GB of allocated memory (at peak, for part of the time) and 3 threads (2 for the first phase and 1 for the others).
The second benefit is lower write saturation on the faster device during the 3rd and 4th phases.
And the third: you save the resources of the faster (more expensive) device by using a cheaper one. E.g. if it is rated for 600 TB TBW, without a 2nd temp you will make 441 plots, and with a 2nd temp you will make 468 plots (within the manufacturer’s guarantee, of course). That’s about 6% savings.
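Those plot counts follow directly from the measured per-plot writes, if the 600 TBW rating is taken in binary units (MiB):

echo $((600*1024*1024/1424385))   # without 2nd temp: 441 plots
echo $((600*1024*1024/1342493))   # with 2nd temp:    468 plots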

See above; it’s the utilized temporary space (the -t folder).

No. It says
Copied final file from "/temp/plot-k32-xxx.plot.2.tmp" to "/dest/plot-k32-xxx.plot.2.tmp"
Removed temp file "/temp/plot-k32-xxx.plot.2.tmp"
Renamed final file from "/dest/plot-k32-xxx.plot.2.tmp" to "/dest/plot-k32-xxx.plot"

It means the temporary file was copied, then renamed. But I didn’t test the case of the -2 folder and -d being on the same drive.

Great info. @gladanimal, I’m seeing the same results as you, no speed-up in plot times. I’m seeing 5:08 to 5:33 for my plot times with a 120 min stagger; I was having a bottleneck while copying plots to my external USB drive.

1 Like

So all in all you have more space on tmp1 when you are also using tmp2…
Currently I am only using tmp1, and my dst dir is also on the tmp1 drive, so after finishing, the plot remains on the tmp1 drive and I copy it to where I farm from.
Now I have 4 TB as tmp1 and am wondering if a tmp2 of 1 TB will be enough when I am plotting 8 in parallel, or even 10. Since tmp2 takes only ~100 GB per plot, as long as I am not finishing more than 6 plots per hour, I think the space on tmp2 should be sufficient (multiplying the 10 min copy time from tmp2 to the farming machine by 6).
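A quick worst-case check of that estimate (assuming ~100 GB per plot on tmp2 and, in the worst case, all 6 plots finished within an hour still waiting to be copied off):

echo $((6 * 100))   # 600 GB worst case on tmp2, well under 1 TB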

1 Like

Looks like that. And I use an attachable HDD as the destination for now.
Currently my bottleneck is the size of the temporary devices (1 TB + 512 GB) and I’m searching for a way to utilize 100% of them )). I’ve got an i7-11700 and an ASUS Z590 board, so this will be a really narrow bottleneck soon (I haven’t dealt with it yet) until I get more expensive NVMe drives.

Excellent work here on the graphs showing just what is happening! Kudos, gladanimal! One comment from my testing: if you make temp -2 and dest the same NVMe (and separate from temp1), the final plot temp is created on temp2 and then just renamed in place, saving all the copy time. You then need to copy it to its final destination yourself, while the next plot is already on its way and plotting. Works great to reduce times when plotting x8, x10, x12, x16 plots in a staggered fashion.
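In other words, something like this (paths are hypothetical; the point is that -2 and -d name the same directory on the second NVMe):

chia plots create -k 32 -t /mnt/nvme1/tmp1/ -2 /mnt/nvme2/plots/ -d /mnt/nvme2/plots/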

2 Likes

@Fuzeguy Is there an advantage in having tmp1 and tmp2 for you, instead of making a RAID 0 with your two NVMes (or just having two tmp1 folders)?

Thanks for the comment! Will try it later. What OS are you on?

1 Like

Having a temp2 saves space on temp1 for the coming plots, there’s that. Temp2 gets 101 GB/plot as the files are developed there; plus, in the earlier stages of plotting there are 0-byte temp files (when multi-plotting) placed there that actually take up tens of GB each (as shown in the drive’s properties), and the size just isn’t shown in ‘dir’. When plotting multiple plots and staging on a 2 TB NVMe, space can get limited… this frees some up to allow more simultaneous plots.

Then of course there is the disk activity that is now being shared: temp1 doesn’t get the writes that are offloaded to temp2. So, as a group, the SSDs should last longer by sharing the work.

As to your question, I have 1x 2TB, 2x 1TB (RAID 0), and 1x 1TB for temp2/destination. So two plotting engines feeding the single temp2/destination NVMe. And Win10, GUI.

1 Like

Hey @admin, this post should be pinned.

1 Like

Hi Gladanimal, thanks for sharing this awesome test.

Could you clarify what you mean by “Starting the next plot seeding after the compression of tables 5 and 6 (phase 3) of the previous plot begins”?

Does this mean you start the next plot after the compression of tables 5 and 6 finishes, or right after the beginning of compressing tables 5 and 6?

Many thanks

After the beginning of compressing tables 5 and 6. :+1:
A more detailed test is coming. It will include I/O measurements.

1 Like

I upgraded my script. Results will come soon.
The script is launched by cron every minute and captures statistics for a single plot seeding. The output is tab-separated, formatted for importing into a spreadsheet. Paths and device names are specific to my environment!
Here is the script; not perfect, but working:

#!/bin/bash

# Capture the date and time at which this measurement run begins
DATE=`/bin/date +%Y-%m-%d`
TIME=`/bin/date +%H:%M:%S`

# Settings section
TEMP1=/temp1/
TEMP1_DRIVE=nvme1n1

TEMP2=/temp2/
TEMP2_DRIVE=nvme0n1

DEST=/plots16t/
DEST_DRIVE=md127

OUTPUT=/home/lv/testing/log2.txt

# Capture IO averages for 60 seconds
IO=`iostat -m -y 60 1`

# Capture disk space usage and per-device I/O stats
TMP1_S=`/bin/du -s "$TEMP1" | awk -F$'\t' '{print $1/1024}' OFMT="%3.0f"`
TMP1_IO=`echo -e "$IO" | awk -v var="$TEMP1_DRIVE" '$0~var{print $2,"\011",$3,"\011",$4,"\011",$6,"\011",$7}'`

TMP2_S=`/bin/du -s "$TEMP2" | awk -F$'\t' '{print $1/1024}' OFMT="%3.0f"`
TMP2_IO=`echo -e "$IO" | awk -v var="$TEMP2_DRIVE" '$0~var{print $2,"\011",$3,"\011",$4,"\011",$6,"\011",$7}'`

DST_S=`/bin/du -s "$DEST" | awk -F$'\t' '{print $1/1024}' OFMT="%3.0f"`
DST_IO=`echo -e "$IO" | awk -v var="$DEST_DRIVE" '$0~var{print $2,"\011",$3,"\011",$4,"\011",$6,"\011",$7}'`

# Capture total data written for the NVMe drives. A SMART 'Data Units Written' unit is 512,000 bytes, so *512/1024 approximates MB.
TMP1_TWB=`/usr/sbin/smartctl -a /dev/$TEMP1_DRIVE | awk '/Data Units Written/{gsub(",","",$4); print $4*512/1024}' OFMT="%3.0f"`
TMP2_TWB=`/usr/sbin/smartctl -a /dev/$TEMP2_DRIVE | awk '/Data Units Written/{gsub(",","",$4); print $4*512/1024}' OFMT="%3.0f"`

# Capture CPU stats and memory usage
MEM=`/usr/bin/smem -c "name uss" --processfilter="^/home/lv/chia-blockchain/venv/" | grep chia | awk '{print $2/1024}' OFMT="%3.0f"`
CPU=`echo -e "$IO" | awk '$1 ~ /^[[:digit:]]/ {print $1}'`
WA=`echo -e "$IO" | awk '$1 ~ /^[[:digit:]]/ {print $4}'`

# Make heading row for new file
if [ ! -f "$OUTPUT" ]; then
    COMMONLBL="Phase\tTime\tCPU,%\tWA\tMem,MB\tTemp1,MB\tTemp2,MB\tDst,MB\tTBW1,MB\tTBW2,MB"
    TMP1IOLBL="Tmp1 TPS\tTmp1 rs,MB/s\tTmp1 ws,MB/s\tTmp1 r,MB\tTmp1 w,MB"
    TMP2IOLBL="Tmp2 TPS\tTmp2 rs,MB/s\tTmp2 ws,MB/s\tTmp2 r,MB\tTmp2 w,MB"
    DSTIOLBL="Dest TPS\tDest rs,MB/s\tDest ws,MB/s\tDest r,MB\tDest w,MB"
    echo -e "$COMMONLBL\t$TMP1IOLBL\t$TMP2IOLBL\t$DSTIOLBL" >> "$OUTPUT"
    chown lv:lv "$OUTPUT"
fi

# Append one tab-separated data row (the leading tab leaves the Phase column empty; phases are filled in later in the spreadsheet)
COMMON="\t$DATE $TIME\t$CPU\t$WA\t$MEM\t$TMP1_S\t$TMP2_S\t$DST_S\t$TMP1_TWB\t$TMP2_TWB"
echo -e "$COMMON\t$TMP1_IO\t$TMP2_IO\t$DST_IO" >> "$OUTPUT"
4 Likes