Plotting on temp SSD with crash consistency disabled (Linux)?

I’m curious whether ext4 with crash consistency disabled on a temp plotting SSD will be faster than the currently advised XFS setup. I am not personally interested in burning SSDs to plot faster, so I thought I would see if anyone in the community would be interested in testing this out (and hopefully letting me know the results). The steps are shown below, with an explanation of what they’re doing at the bottom.

Note that this is a request for people using the SSD only for plotting purposes. Also, these settings for ext4 (or any other file system) are generally unsafe to use on drives with data you want to keep long-term unless you really know what you’re doing. However, I feel that it is a good fit for chia plotting because plots cannot be continued if interrupted and completed plots do not stay on the temp drive for very long.

Assumptions:

  • reasonable familiarity with Linux command line
  • plotting machine rarely crashes

Risks:

  • maybe you plot a little slower while testing this (you can always switch back when your plot(s) complete)
  • you (maybe) lose some in-progress plotting data if your machine crashes while writing to the temp drive

Benefits:

  • maybe you plot a little faster
  • maybe you write a little less data to your SSD

WARNING: The below ext4 setup is NOT SAFE to run on drives that hold data you want to keep long-term. For why, see “What it is actually doing.”

Steps to set things up:

  1. finish any in-progress plots
  2. move any files you want to keep off the SSD
  3. re-format the SSD with ext4 without a journal by running the following in a shell:
    sudo mke2fs -t ext4 -O ^has_journal,large_file,extent <device>
  4. mount the new file system where the OS expects it
  5. continue plotting like normal
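As a concrete sketch of steps 3 and 4 (the device path and mount point here are placeholders for illustration; substitute whatever your setup actually uses):

```shell
# ASSUMED device and mount point -- replace with your own.
sudo umount /dev/sdb1 || true                     # unmount first if currently mounted
sudo mke2fs -t ext4 -O ^has_journal,large_file,extent /dev/sdb1
sudo mount /dev/sdb1 /mnt/plot-temp               # mount where your plotter expects the temp dir
```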

If the machine does crash while making a plot, the easiest way to ensure the file system isn’t in a bad state is simply to discard it and re-create it by repeating steps 2-5 above.

If you find this isn’t helpful, you can repeat steps 2-5, replacing the command in step 3 with whatever command/workflow you used to format the drive you’re plotting on the first time around.

What it is actually doing:
File systems normally keep some extra information around so they can recover if the computer crashes while writing data to disk. Ext4 keeps this information in its internal “journal.” The mke2fs command above asks the OS to make an ext4 file system without an internal journal, which is not safe for file systems that store data you actually want to keep (i.e. your normal file systems). Disabling the journal in ext4 (slightly) reduces the number of writes the file system must do in some cases and also reduces the number of times the file system has to wait for in-flight writes to complete before writing more data (the journal has some ordering requirements for writes). This setting risks leaking resources (like space) if the machine crashes while writing data or possibly losing/corrupting files or directories if the machine crashes while making/moving/modifying files (there’s other things that could go wrong, but these seem most likely given the workload). Please note that you could probably do something similar to disabling the journal with XFS, I just don’t know that file system as well as I do ext4.

Since this file system is used solely for plotting (completed plots do not stay on it for long, plots cannot be resumed if the machine crashes while making them, and users already run the risk of losing completed plots if the SSD dies before they are moved), I feel this is a small additional risk to take. My assessment of the risk may differ from yours, so there is no need to try this if you feel the risks are too great.


IMO the most important thing you can do for SSD health (to keep the Write Amplification Factor low) is to ensure TRIM is enabled, ideally asynchronously for the best performance. ext4 doesn’t support async TRIM, so I would advise sticking with XFS or Btrfs, which do.

If journaling can be disabled for either of those other filesystems, that might be worth trying, but both also have many other mount options worth exploring for performance tuning.
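As a hedged sketch of the discard options being discussed (device path and mount point are placeholders; Btrfs’s discard=async needs roughly kernel 5.6 or newer):

```shell
# XFS with online discard enabled:
sudo mount -o discard /dev/sdb1 /mnt/plot-temp

# Btrfs with asynchronous discard (kernel 5.6+):
sudo mount -o discard=async /dev/sdb1 /mnt/plot-temp
```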

Hey, I have a couple of 512GB SSDs I plan on deploying in RAID 0 via an external USB bay that lets you enable RAID on up to two 4TB drives. Between my day commitments, family errands, and a few other chores, I am trying to get the hang of the different business models that can sprout out of this new opportunity. I will definitely test it out on a solo drive at first if it shaves a couple of hours - presently logging production time on 6Gbps SSDs to see what we are working with… My question to you is: if, let’s say, we can see a 15-20% performance improvement, will this work out on drives configured in RAID over USB?

Oh that is quite interesting, thanks for adding to my current knowledge of file systems!

TRIM/discard is definitely something to keep in mind given the file sizes being used and the size of the SSDs, and returning as many blocks to the SSD as possible when files are truncated/deleted should probably take priority over removing a couple of write barriers per ext4 writeback.

The man page for ext4 shows a discard mount option, but it is not enabled by default because it has not been thoroughly tested yet. A blog post from 2015 showed poor performance on ext4 with the discard option, but given its age I’m not sure whether things have improved since then.
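If the ext4 discard mount option does perform poorly, periodic TRIM via fstrim is the usual alternative (the mount point is an example; the fstrim.timer unit ships with util-linux on most systemd distros):

```shell
# One-off manual TRIM of the mounted temp filesystem:
sudo fstrim -v /mnt/plot-temp

# Or enable the distro's periodic TRIM timer:
sudo systemctl enable --now fstrim.timer
```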

In theory this should work for any setup where you can choose the file system being used. I am not sure how finicky USB external drives are these days, so I can’t confidently say whether disabling crash consistency would be a good or bad idea. To be fair, though, I think the last time I messed around with USB storage was in the early 2010s, when external HDDs were common and many drives didn’t get enough power because they used a single USB 2.0 port.

However, given Harris’ post about TRIM above, it may not be worth using. I haven’t dug into XFS enough to know whether its logging/journaling can be disabled. From my knowledge of Btrfs (admittedly several years old at this point), its copy-on-write nature is likely to cause more writes to the SSD, so XFS is probably the best bet.

I also rather doubt you would see a difference of hours in performance for a single plot. You will save ~2 barrier operations (where the file system waits for earlier writes to complete) per ext4 writeback, but unless the drive is heavily overloaded, that probably won’t be very noticeable.


Sorry, I tend to give meandering answers because I try to give context to things I’m talking about.

To be more clear, I would say try the external USB bay with just xfs (or btrfs if you prefer) and make sure that works as expected. If you don’t have connectivity issues or anything (skimming around, others have had some problems with various USB things), then you could consider seeing if the file system you’re using can be run without journaling. Otherwise, just leave it as is.

If the USB drive is finicky, it’s probably not worth removing journaling from the file system, because then you might spend a fair amount of time reformatting or recovering the file system if things go wrong.


I found an AC-powered external bay that lets you run laptop drives in RAID. I thought that if I could put two of my 512GB drives in RAID with external power, it should be able to hold three temp files in parallel on a side unit that I can return to after 24 hours. If it works out, I get an 8 hr return rate on three automated plots… something like that; I will experiment when the bay arrives… Thanks guys for sharing all the knowledge, you are awesome!