Sync is very slow (multiple months)

I've seen several other topics which are similar in nature, but so far I haven't seen any solution. I feel like this isn't just an issue of poor hardware performance. I am using port 8444 and am running Chia 1.3.4. In the past 4 to 5 months of mostly non-stop syncing, my local system's block height has gone from ~2,500,000 to 4,610,000. At its current speed, it will likely finish syncing in about 1.5 days. To start out, my system specs are as follows:
System:
Dell PowerEdge T300
OS: Mint 20.2
CPU:
1x Xeon X3363 @ up to 2.83 GHz; 1 physical CPU, 4 physical cores, 4 logical threads.
CPU cache:
L1 data: 4x 32 KB
L1 instruction: 4x 32 KB
L2 unified: 2x 6MB
RAM:
2x 4GB modules of DDR2, 667 MT/s (8 GB total)
GPU:
AMD Radeon HD 8570, 1 GB onboard RAM
Storage:
4x 500GB HDD in RAID 5(?) (the total capacity is 1.5 TB, it can tolerate the loss of any single drive, and it can be rebuilt from any 3 remaining disks)
It is using a Dell PERC 6/i Adapter hardware RAID controller with 256 MB of onboard cache, connected on a PCI-E bus. I have benchmarked its read rate at 147 MB/s with 12 ms latency.
Internet: 2x Gigabit Ethernet controllers (independent of each other) built into the mobo. I just checked their speed with Ookla and it is listed as getting about 200 Mb/s down and about 400 Mb/s up (yes, up is higher than down; it's not a mistake) with 4 ms latency. (Interestingly enough, I am using the second Ethernet port on this thing to supply an internet connection to another computer nearby, and that computer can consistently get speed tests with 900-1000 Mb/s up and 900-1000 Mb/s down, and gets comparable speeds in real-world workloads. It is somewhat newer, but not nearly as robust or high end for its time, which makes sense because it's just a desktop and not a server.)

I have been running this server for most of the semester, so around 4-5 months almost non-stop. Pretty much any time I have the server powered on, the chia client is also running; current uptime is 29 days. It seems to process about 3 days of transactions every day. The weird thing is that most of the time seems to be spent not doing anything obvious. There are brief periods when all CPU cores are fully utilized, but most of the time they are not. Likewise, network upload and download utilization are often very low, usually under 1 MB/s. RAM varies, but right now 75% of the RAM is in use, with the remainder being used as cache. The swap partition (only 2GB) is about 80% full, but normally it's around 20% when it's just Chia running. I would think that would indicate the problem is with the HDD array, and while it does seem to be more or less constantly active according to the indicator on the physical machine, it doesn't look like it's working on either one MASSIVE file or MANY tiny ones: the read/write rate at any given time is mostly 200-400 kB/s and hardly ever peaks above 4 MB/s, and there is plenty of time where it doesn't seem to be reading or writing at all, yet I can HEAR the heads moving, so it's doing SOMETHING. I just cannot figure out what the bottleneck is here.

You can break the syncing process down into 3 parts:

  1. network (getting raw data)
  2. CPU processing (block verification / db handling)
  3. HD performance (for db handling)

The network part is just getting raw data, so not much bandwidth is needed. 200 Mbps down is far more than enough (you should see around 1-2 Mbps during the sync on your box). Whether you have port 8444 forwarded to your node or not really doesn't make a difference here.

HD (db) performance can be critical, so run hdparm tests (https://linuxconfig.org/hard-drive-speed-test-using-linux-command-line-and-hdparm) to check it. It is really recommended to get at least an SSD, if not an NVMe drive, to keep up with fast syncing.
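
If you want to script that instead of running it by hand, here is a minimal sketch wrapping hdparm from Python (needs root; /dev/sda is an assumption, point it at whatever device actually backs your db):

    import subprocess

    # Rough read benchmark via hdparm (needs root).
    # DEVICE is an assumption - use the device that actually holds the Chia db.
    DEVICE = "/dev/sda"

    for flag, label in (("-T", "cached reads"), ("-t", "buffered disk reads")):
        # -T measures cache/RAM throughput, -t measures real reads from the platters
        out = subprocess.run(["hdparm", flag, DEVICE], capture_output=True, text=True)
        print(label + ":", out.stdout.strip())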

CPU processing is where things usually break down. The code is (or maybe was) single threaded, so basically you can see it as one block being verified by one single core. If that core gets maxed out, that is where the bottleneck is. Also, the code is (maybe was, again) broken down into a network serving side (one core) and block verification threads (one per core). So, in your case, the best you can expect is the network core being choked for a bit, then dispatching acquired blocks to 3 block-crunching cores. From what I saw before, the network side was not caching blocks while the block-crunching threads were busy.

Usually, the other things (RAM, swap, …) are not the problem, as the whole process is mostly CPU bound and HD/SSD bound. Your CPU is rather old, so most likely that is where the slowdown happens. Although, I am not sure how fast or slow your RAID array is (maybe not so much the RAID array itself, but rather the bus speed connecting it). My take is that if you have a good enough PCIe slot (i.e., PCIe 3), I would rather get a PCIe 3 NVMe card, as that most likely will be faster and more energy efficient.

As far as RAM, the more you have, the better db performance you get (caching). With only 8 GB of DDR2 RAM you are kind of pushing the minimum RAM limit, as well as RAM bandwidth. Although, RAM is usually in the secondary problem category.

Lastly, which chia version are you running?

EDIT:
Looking at the Dell T300, that box should take up to 24 GB of RAM. Not sure what the DDR2 price is, but if you want to keep that box, I would try to get the whole 24 GB, as that would most likely help with the db side (caching).

Also, most likely that box can take Xeon 5400-series CPUs. If those are faster and don't cost too much, maybe upgrading the CPU for $10 (guessing) would help.

Although, you could buy something like an HP G4 800 SFF with an i7-8700 CPU on eBay for about $100, and that would be a night and day difference. Sure, you would need to spend more on extra RAM and an NVMe.

I am running Chia 1.3.4. I believe the RAID controller is in a PCIe 1.0 slot, but it could be 2.0. I'm pretty sure it is at least using a full 16 lanes, though. This is an old PowerEdge server I got for like $10 from a company a family member worked with lol. It has slots for 4 more RAM sticks, and I know I have at least 2 that will work with it, but I haven't felt the need to figure out which configuration worked before (I think I ended up getting 12 GB to work at one point). I'll have to try the hdparm tests and see how that does.

Another thing I just remembered that I wanted to ask about: in the BIOS settings for this thing, there was a setting about “optimize for sequential read” that you could either have checked or not checked, and when I got it, it was unchecked. There wasn't a lot of useful info in the BIOS itself about it other than “check this if your system's workload is optimized for sequential read operations” (duh), but I don't know what kind of workload that is. In fact, I'm not 100% sure if it's talking about RAM or HDD reading, although I have a feeling it's RAM. I have it turned on right now, but to be honest I don't think I noticed much difference; it's hard to say.

Also, since most of the time the CPU was mostly idle, I also decided to run xmrig on the side (CPU mining) and just set its priority super low so it will yield to chia, and it hasn't seemed to slow it down at all.

Oh, and I actually already have an upgrade for the CPU sitting next to me on the desk haha. It's an X5460 at 3.16 GHz. It says 12 MB of cache, so it either has the same amount or double, depending on whether that is per core or total.

So, you got yourself a $10 box with an extra set of hardware headaches. It kind of borders on dead-horse management.

For farming, the HD (db) access is not sequential, so having an HD (whether RAID or not) is not that optimal (plotting is mostly sequential, so RAID is fine on that side).

For a single-socket box, you usually don't get much out of BIOS settings, so I would not waste time looking there (for plotting on multi-socket boxes, NUMA BIOS settings may make a difference).

Isn't v1.3.4 from before the fork? That means you will not be able to stay on the main chain / win anything. Just update to the latest version, as there were improvements made to the sync process, plus a db layout change (V1 to V2). The V2 db is a bit smaller, so the same amount of RAM will do better caching with it.

By the way, maybe get something like bpytop, as it provides nice command-line info.

I guess I forgot that the chia client doesn't (or at least didn't use to!) inform you that there is a new version available. I'm guessing the best way to update is to totally shut the client down, maybe make a backup of the db files, then install the update?

db files can be (easily) recreated, so you don't need to pay much attention to that part (still, it's worth saving your bc db).

Also, your bc db is most likely v1, so you would need to either run a db conversion (which I would not do right now) or download the latest v2 db from chia (which I would do). With db v2, the wallet db is regenerated in a few minutes, so I would just blow away that folder (wallet db). You will also need to make config.yaml changes to both db names (change V1 to V2). (Most likely: 1. stop chia, 2. run the latest installer, 3. stop chia, 4. modify config.yaml to V2, 5. run chia.)
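
If you prefer to script that config.yaml step, here is a minimal sketch. It assumes the stock layout with full_node.database_path and wallet.database_path keys and the v1 filenames, so verify against your own file first; also note a programmatic rewrite drops any comments in the YAML, so editing by hand may be preferable:

    import yaml  # PyYAML

    # Assumed default location - adjust if your CHIA_ROOT is elsewhere.
    CONFIG = "/home/YOURUSER/.chia/mainnet/config/config.yaml"

    with open(CONFIG) as f:
        cfg = yaml.safe_load(f)

    # Flip the db filenames from v1 to v2. The key names are assumptions based
    # on the stock config, so double-check them before running.
    for section in ("full_node", "wallet"):
        cfg[section]["database_path"] = cfg[section]["database_path"].replace("_v1_", "_v2_")

    with open(CONFIG, "w") as f:
        yaml.safe_dump(cfg, f)  # note: this rewrite loses comments in the file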

What you do need to save are your config.yaml and keyring files. So, make a backup of those.

Still, I would first add those 2 extra RAM sticks and swap that CPU. At this point you are still deep in the woods, so take your time doing that.

So, a quick update: I am now running chia 2.1.1 (the most recent version at this time). I was in fact using the v1 db, so I went ahead and downloaded the most recent checkpoint db from the chia torrent and am now syncing from there. I did notice, however, that the pooling setup I had does not appear to have transferred over. Do you happen to know where that is stored, or is it easier to just create a new plot NFT and start over?

Also, after looking into the log file, I’m still getting messages like this:

2023-12-10T11:14:02.846 full_node chia.full_node.coin_store: WARNING Height 4308611: It took 18.92s to apply 244 additions and 11 removals to the coin store. Make sure blockchain database is on a fast drive

I'm wondering if, for whatever reason, RAID disks are just not very good for storing the db file or something, and if maybe I would be better off getting a 256-512 GB SSD, plugging it into one of the SATA ports (this server has 6 on the mobo), and just putting a link from the chia db location to someplace on that drive, so it actually stores the db on the SSD. What I can't figure out is why this would be an issue. The RAID disk is managed by a hardware RAID controller (albeit an older one) that even has some cache on it, and the drives are 7200 rpm. It benchmarks pretty well (~150 MB/s read; I haven't bothered booting from another disk to benchmark its write speed), so I don't see why it would be a problem. Even now with the db v2 (which I'm assuming is more efficient or in some way better), the CPU is still idle most of the time (I didn't do the CPU upgrade or add the extra RAM because I'm studying for finals and also don't have the thermal compound for the CPU or the RAM on hand), and while having a faster CPU would shorten the time it spends waiting on CPU tasks, most of the time overall it does not appear to be CPU bound at all, so I am starting to think the issue must be with the disk array itself or something.

Another thing: you said to modify the config.yaml file; what exactly am I modifying?

Most likely, once you get fully synced (both bc and wallet; the wallet db may need to be deleted/recreated after the bc fully syncs), your pooling will come back. If not, that info is in your old config.yaml, so it can be copied over (do not do that before the full sync).

When you are syncing, your node is processing more blocks per second than in a normal run, so I would not worry about warnings about your node being slow. However, that's a warning you may want to plan to act on. I would seriously consider getting a PCIe-to-NVMe card and an NVMe (Samsung 970 Evo Plus (runs hot, needs some airflow, but fast and reliable), WDC Black (fast and reliable)). A SATA SSD is kind of borderline, especially during dust storms. In your case, the box is rather weak (slow), so it could be sensitive to a slower NVMe / SSD. I think the price difference between an SSD and an NVMe may not be that great, so the penalty is that PCIe-to-NVMe card (maybe $15-20). There is no point in getting a PCIe 4 NVMe, as most likely you are running on a PCIe 2 slot right now.
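
If you do go the separate-drive route, relocating the db can be as simple as moving the sqlite file onto the new mount and leaving a symlink behind. A minimal sketch, assuming the default v2 filename and an /mnt/ssd mount point (both paths are assumptions; stop chia first):

    import shutil
    from pathlib import Path

    # Stop Chia before doing this. Both paths are assumptions - adjust them.
    old = Path.home() / ".chia/mainnet/db/blockchain_v2_mainnet.sqlite"
    new = Path("/mnt/ssd/chia/blockchain_v2_mainnet.sqlite")

    new.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(old), str(new))  # copies across filesystems, then removes the source
    old.symlink_to(new)              # chia keeps using the old path transparently

You may also be able to point database_path in config.yaml at the new location instead, but the symlink keeps the config untouched.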

The syncing process is really heavy on the db side and makes plenty of small / random in-place updates. That puts a strain on any HD (array or not) through constant head movement, which really degrades HD read / write performance. Again, HD RAID works for plotting, as both reads and writes are sequential and in big chunks, and there is not much disk fragmentation. However, for farming (syncing), the db is being updated constantly, making it a highly fragmented file and thus hard on any mechanical drive.

As I mentioned, the code is rather single-threaded and serialized, so just looking at the overall CPU may not give the whole story. If the main node process maxes out its core (25% of total CPU in your case), nothing else will be processing at that time, so the other 3 cores will go idle. Once that thread finishes the job, it will most likely dispatch 3 threads for block crunching. So, not knowing the details, your CPU chart may be showing exactly that (the 100% burst is when the blocks were processed). At the same time, if that main thread is stuck on the HD side, it will not be maxing out one core. So, maybe what is on that chart reflects that (maybe you can change the sampling rate to something a bit longer). Just run top (or bpytop) to see which processes are taking the CPU time. (By the way, the main node process has the same name as the block-processing ones, so the count of those processes shows where the processing is.)
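
If you want to watch that without a full monitor, here is a rough psutil sketch (pip install psutil) that just lists the start_full_node processes and how much CPU each is taking; one busy process is the main/network side, several busy ones mean block crunching is going on:

    import time
    import psutil

    # Rough monitor: count start_full_node processes and show per-process CPU.
    while True:
        procs = [p for p in psutil.process_iter(["cmdline"])
                 if "start_full_node" in " ".join(p.info["cmdline"] or [])]
        for p in procs:
            try:
                print(f"pid={p.pid:6d} cpu={p.cpu_percent(interval=0.2):5.1f}%")
            except psutil.NoSuchProcess:
                pass  # process exited between listing and sampling
        print(f"-- {len(procs)} start_full_node processes --")
        time.sleep(5)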

As far as those changes in config.yaml, check the names of the dbs you have right now (bc and wallet should have V2). It looks like in the latest chia version the default is V2, so there is no need to modify config.yaml. Here is an old post about those db names - Chia Blockchain 1.3 Beta released - #16 by Jacek

I just re-checked the config.yaml and it looks like everything is good. (I was being a dingbat and for some reason was checking the keyring.yaml file before, which of course has nothing to do with the db or wallet, hence the confusion.)

I am definitely not CPU bound for the vast majority of the time, even for single-threaded operations. I have my system monitor set so that a task fully utilizing one core displays as 100% utilization (not divided by CPU count), and the vast majority of the time all 4 cores are sitting at single-digit utilization.

As for getting a separate drive just to store the db on, what size do you think is reasonable? The part of me that is cheap is thinking something in the 256 GB range, but of course I don't want to run out of room any time soon. As of block height 4,316,214 (where I'm at now after starting the syncing process from the most recent checkpoint this morning), it looks like the db is about 118 GB, so unless transaction volume suddenly goes up substantially, I feel like 256 GB is probably safe for now (and eventually I can always get another, larger drive and use this one for plotting or whatever).
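
Quick back-of-envelope on that, assuming the average block size and the block rate stay roughly where they are now:

    # Back-of-envelope; assumes current average block size and block rate hold.
    db_gb, height = 118, 4_316_214
    gb_per_block = db_gb / height             # roughly 27 KB per block on average
    headroom_gb = 256 - db_gb                 # space left on a 256 GB drive
    blocks_of_headroom = headroom_gb / gb_per_block
    years = blocks_of_headroom / 4608 / 365   # chia targets ~4,608 blocks per day
    print(f"{blocks_of_headroom:,.0f} blocks, about {years:.1f} years of headroom")

So roughly 5 million more blocks, or about 3 years, before a 256 GB drive fills up (assuming transaction volume doesn't jump).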

As for getting an NVMe over a SATA SSD, I actually didn't really understand the difference until I looked it up just now. I'll have to look into that.

The two most probable causes of slow syncing are either CPU/RAM or the HD (or whatever media the db is on). So, your box looks like smooth sailing, but the progress is really slow. It is hard for anyone but you to put a finger on the issue. Although, one of the things that also impacts performance is temps, especially RAM and SSD/NVMe. So, in your case, maybe you can check your CPU and RAM temps (if there are RAM temp sensors on the board). When RAM runs hot, it will start throttling, and the CPU will wait on it. When the CPU gets hot, the frequencies go down, but utilization will still look like 100% (looks like that's not your case, but …).

It looks like you are running a GUI on your box, so maybe you can install psensor to see what temps you have (it is sometimes a pain to use but can help; it helped me a lot).
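
If psensor doesn't show much, you can also dump whatever the kernel exposes with a couple of lines of psutil (Linux only; on an old PowerEdge there may be very few hwmon entries, and you may need to run sensors-detect from lm-sensors first):

    import psutil

    # Print every temperature sensor the kernel exposes via hwmon.
    temps = psutil.sensors_temperatures()
    if not temps:
        print("no sensors found - try running sensors-detect from lm-sensors")
    for chip, entries in temps.items():
        for e in entries:
            print(f"{chip:15s} {(e.label or 'temp'):12s} {e.current:5.1f} C "
                  f"(high={e.high}, critical={e.critical})")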

As far as db size, the normal growth is predictable; however, if a dust storm happens, the db grows really fast. So, it looks like 256 GB should be enough. However, the price difference between 256 and 512 GB is possibly around $10-20, and you get a lot by going the 512 GB route. Your TBW rating will double (as the flash doubled), and r/w speeds will also go up (although I'm not sure whether that matters on a PCIe 2.0 slot). Another thing you get with the bigger one is that you will be able to keep your own db snapshots (only the bc sqlite file is needed), although you could also put those snapshots on a rust drive.

Yeah, I have psensor installed, and unfortunately it has only found temp sensors for the CPU and GPU (which seems weird to me considering this thing was designed as a server), but as far as I can tell there doesn't seem to be any thermal throttling. I'm thinking the issue must be something to do with the RAID controller or something. I've never had another machine with a RAID disk before, so I don't know what is normal, but about once a second it seems to do a quick access of each physical disk, one at a time. I'm guessing this is some kind of heartbeat check or something to make sure they are all still working OK, but I'm wondering if maybe it is hurting throughput for large numbers of small reads/writes. According to the system monitor, the “start_full_node” process seems to be perpetually reading and writing, but most of the time only at speeds up to 1.2 MiB/s for reads and between 500 and 900 KiB/s for writes. Unfortunately I haven't seen any program that gives detailed info about RAID disks and/or can talk to them while the OS is running. I feel like something like that might answer a lot of questions.

The other weird thing is that syncing seems to get slower as I get closer to being synced. Maybe it's just because early blocks in the chain were smaller, but I initially had it syncing without the snapshot and it burned through about 300k blocks in a day, whereas today, starting from about 4.3 million blocks, it has only synced an additional 18k.

I hadn't thought about the power consumption/TB before; that's a good point.

Sorry, I don't have much experience with RAID, so I cannot help you there.

Although, as those speeds are so low, maybe as a test you could just connect one rust drive to a SATA connector on the board and see how that works. That would help eliminate the PCIe 2 interface your RAID card is on, the RAID card itself, and eventual problems with the RAID setup (anything in syslog?).

As far as my understanding of sync goes, there are 2 phases (maybe 3). The first one is when you are “far away” from the current block, and that should run at a more or less constant pace. (Although, if there was a dust storm on some blocks, those blocks need more processing, so this could be the slowdown you see.) The second phase starts maybe at 30 blocks behind (just picked that number out of a hat) or so, and at that point it may slow down. Not sure if that interpretation is valid, though.

That start_full_node is a Winnebago process. The main part runs 24/7 and more or less works like a scheduler between the network, the db, and the block-crunching processes. Once it fetches a batch of blocks from the network, it dispatches them to the (3 in your case) start_full_node processes that do block crunching. Once those processes are done, they go idle waiting for new blocks (which may take some time if the primary start_full_node cannot keep up). Usually, it takes some time (very long seconds) for those blocks to be processed, so you should easily see those start_full_node processes grabbing as much CPU as they can every so often. If you don't see those 3 extra processes working, most likely the primary start_full_node is waiting for something (we are guessing HDs).

I hadn't thought of just putting the db on another hard disk that isn't in the RAID array; that's something I'll have to try at some point. (Unfortunately, right now I'm in the middle of studying for finals, so I don't have a lot of free time to tinker with stuff like that as much as I would love to.)

Another thing just occurred to me as well, though. Out of the 1.5 TB of total storage on the RAID disk, only about 300 GB is free. My understanding is that ext4 doesn't have the same problems with disk fragmentation as NTFS does (not sure why that is exactly, but I never seem to hear about having to do regular disk defrags on Linux the way you are supposed to on Windows with an HDD), but I wonder if having relatively little free space might be slowing things down a bit? If not because of fragmented files, then maybe just because the heads have to travel further to reach free space, since statistically there is less space they can write to than there is already-utilized space. I already moved one of the existing plots I have on the RAID disk to an external drive when I downloaded the db checkpoint, but maybe it might be worth moving more of them?

Also, I think the dust storms might have a part to play in this as well, as the sync speed seems to be highly variable. If I understand correctly, a dust storm is just a period of unusually heavy transaction volume on the network, right?

The problem with chia syncing (from scratch or keeping up) is that the bc db actually gets fragmented on every db update. Moreover, when records are deleted, that disk space is still kept by the db as used and is only eventually filled up again. This is the primary reason for that main start_full_node choking on the HD media. The partition fragmentation is a secondary issue, if any. Think of that db as one big flat file (from the OS point of view) that is internally formatted by the db, so that file could be fragmented due to disk fragmentation, and on top of that it is further fragmented by db operations. Also, dbs really benefit from RAM caching, and your setup is rather short on RAM, which further exacerbates the issue.
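
If you are curious how badly the db file itself is fragmented on the ext4 side, filefrag (from e2fsprogs) will tell you the extent count; a quick sketch, with the path being an assumption:

    import subprocess
    from pathlib import Path

    # Ask ext4 how many extents the bc db occupies; thousands of extents on a
    # nearly full partition would support the fragmentation theory.
    db = Path.home() / ".chia/mainnet/db/blockchain_v2_mainnet.sqlite"  # adjust
    print(subprocess.run(["filefrag", str(db)], capture_output=True, text=True).stdout)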

When you run an sqlite vacuum on the bc db, you can usually recover 20-40% of the disk space (if the db has been in use for some [long?] time). As those db updates are basically tiny writes, the heads need to move around a bit to first get the source data and then write the updates or create new rows. Getting an SSD removes those head movements altogether while having about the same sequential write speed (say 2x faster sequentially, but maybe more like 10x faster on small random r/w). An NVMe, on the other hand, gives you about 10x faster writes (not sure about PCIe 2.0 speeds, though).
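
The vacuum itself is a one-liner against the bc sqlite file; a minimal sketch, assuming the default v2 filename (stop chia first, leave roughly db-sized free space on the drive, and expect it to run for quite a while):

    import sqlite3
    from pathlib import Path

    # Compact the blockchain db. Stop Chia first; VACUUM rewrites the whole file,
    # so it needs about as much free space as the db itself and takes a long time.
    db = Path.home() / ".chia/mainnet/db/blockchain_v2_mainnet.sqlite"  # adjust

    conn = sqlite3.connect(str(db), isolation_level=None)  # autocommit; VACUUM can't run inside a transaction
    conn.execute("VACUUM")
    conn.close()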

Also, I think that keeping plots on RAID arrays doesn't make much sense right now (with fast plotting). If the RAID goes down, all drives need to be replotted, whereas with standalone disks only the one failed drive would need to be replaced. Also, RAID 5 is meant to add resiliency, as it is immune to the loss of a single drive. There is nothing about those plots or the db that requires such resiliency, as everything can be fairly easily regenerated.

Yes, you are right about dust storms; they generate plenty of extra “junk” transactions that need to be processed by those start_full_node crunching processes (i.e., slower sync progress), and of course the db needs to ingest all that junk.
