Slow syncing on full node

Hi,

Trying to understand why my re-syncing is so slow. I rebooted my server for updates and the re-sync is staying about 15 mins behind the current block height. It is still syncing, just not as fast as new blocks are coming in.

-Windows Server 2019
-R720 with 2 processors 18 cores 256GB Ram
-Internet connection is 1Gig

Any ideas of things to check or change?

Thanks!

…and your chia dir & db is on … nvme? sata or sas HD?
…also are you plotting on the same server?

-No plotting on this machine.
-dir & db located on SAS drive

What is your CPU load (and per logical core loads)?
How is your SAS drive load (the one holding your dbs)?

Get NVMe, also PCIe adapter for it and move your dbs there.
Get something like PrimoCache, and put it in front of those two db folders.

What do your connections look like (Full Node / Connections tab)? Are most of those peers at full height, and do you see Up/Down numbers there (a screenshot would be helpful)?

CPU and SAS load are very low.

I just shutdown the Chia GUI for a 5 time and reopened and it started syncing quite fast and synced right up. I believe it may be an issue with Full Node/ Connections like you said. Is there a good list of solid Connections and a way for me to set these to always check first?

I have already ordered the NVMe and PCIe adapter to hold the DBs. Does this make that much difference? Hope so.

Thanks for all the input!

It will only make a difference if you need the speed to sync. Looks like you don’t, so it may be a random issue resolved by the restart / some new peers. I keep my db on an old 500GB SATA SSD and don’t see any difference between that and a fast NVMe I used to use. Sync speed is the combo of drive speed / CPU capability / peers available. As long as you have some good peers, the weak peers won’t matter much at all.

No, there is no such list, plus manually handling those peers is rather a bad idea (a full-time job that leads nowhere).

Yesterday, I set up a new node (i9-10900) and let it sync from scratch. The average blockchain db download speed is around 1Mbps / 125KBps. So, you really don’t need any “solid” peers, as basically just one peer can handle that easily.

If you see problems with your peers (some (a lot) have 0 height, and/or Up/Down rates are not changing or staying at 0/0), that usually means that it is your node that cannot handle all those connections, not that you got a bunch of ‘slow/under-performing’ peers. Most of the nodes out there don’t have problems, so thinking that for some unknown reason those under-performing nodes are ganging up against your node is kind of myopic thinking.

The fact that you see syncing slowing down with time may (doesn’t have to) imply that you may have problems, and as the number of peers grows, your node starts slowing down. If this is the case (again, a screenshot of your connections would help), cutting down on the number of peers may help.

Right now the network is rather calm (yes, mem pool is slightly elevated, but not much). If your node has problems with those peers right now, that may mean that the root cause is not really with those peers, but that something else is slow on your node. So, it may be a good time to just watch it, check logs / CPU / db drive utilization, and maybe something will show up.

What version of Chia are you running?

I used to have periodic syncing issues. My last one just kept slipping further and further behind.

After a few days of the same, and after trying all kinds of connection combinations, I upgraded to Chia version 1.2.11 and I got synced up in a few minutes.

Running version 1.2.11

I am going to move the database to an NVMe like @Jacek suggested and monitor the HDD/CPU/RAM performance once I have done this. I am thinking it’s an HDD issue now, as my other machines have installs on SSD drives and they seem flawless.

Thanks!

Probably it’s because of the ‘Dust Storm’. Had to reboot one rig on Saturday. The sync was 15 behind when I reloaded all the drives etc. On Sunday it was 22 hours behind and going backwards. Tried various things, nothing worked. Then I deleted the db directory under [.chia\mainnet] on the non-syncing rig and copied over the up-to-date ‘db’ directory from my 2nd rig. 20 mins later it was all synced and farming. If you don’t have an up-to-date db, you can download one from most Pools etc.

There was more like a dust belch last Thursday. On Sat, there were two short dust puffs. Although, for the past about six days mem pool is a bit elevated (say 3-5x normal).

If your box is sensitive to such small transaction increases, you may want to start looking what may be the cause of that, and potentially fix it. Otherwise, if something stronger happens, your node will be one of those zombies.

Switch to 1 W (week) timescale - GraphChia | Netspace

Have had no problems at all since May last year, except this weekend when I had to reboot one dual Xeon server rig for a hardware reinstall, and then it would not sync. But after using the database from rig 2 (HP z440), everything is OK now.

It may be that as the blockchain db grows, accessing that db gets slower and slower. So, what worked before, may be just not good enough today.

Also, my feeling is that it has less to do with how much raw CPU power you have, and more to do with what media (HD, SSD, NVMe) your blockchain db sits on and how strong your single-core performance is (looks like just one start_full_node process chokes, and after that the rest, the working ones, are just starving for data).

The problem is coming from your hard drive. Trust me.

Funny you say that. My 2 HP z420’s with 64GB of RAM are just fine also, but this R720 server with 256GB of RAM is having the issues. So strange.

I am thinking this also. Will update once I get the NVMe installed.

The problem with syncing is basically because of shit code in the main start_full_node process. This process is not really doing much; however, it is basically single-threaded, so cannot expand beyond the core that it is sitting on. This process basically orchestrates work around peers, blockchain db, and processing blocks. To process those blocks, you have plenty of extra cores, so that is not the issue. However, a single core performance is not that great, and this is where that process chokes.

However, peers and db are closely related. The more peers you have, the more stress that puts on the blockchain db. At the same time, that db needs to do inserts of all new blocks (this comes from those extra cores after they are done). Those inserts are rather expensive.

Blockchain db grows every day, it means that db access gets slower and slower. Also, recently (for the past week), we have a bit elevated mem pool volume, and that doesn’t help.

With that said, I assume that you didn’t touch your config.yaml as far as the number of connections and are running with the default value - 80 peers. Before you add that NVMe, I would ask you to drop the number of those peers down to 40 (in the config.yaml), if not enough down to 20, and see what it does. As mentioned already, you don’t need to worry about “quality” of those peers, as basically just one good will be enough to sync.

Dropping those connections doesn’t mean not to add that NVMe, rather to the contrary. The NVMe will help a bit, and then you may need/want to play with the number of those peers to fine tune it.

This problem is just getting worse for everyone. Hopefully, v1.3 will address some of those issues, as otherwise, more and more people will be screwed.

In your config.yaml, change that second line (I assume, yours is right now 80):

full_node:
  target_peer_count: 40
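
If you would rather script that change than edit the file by hand, here is a minimal sketch in Python. It works on the text of config.yaml and only touches the target_peer_count line inside the top-level full_node: section, in case another section has a key with the same name. The function name is made up for illustration, and the sketch assumes the value is a plain integer on its own line, as in the snippet above. Stop Chia and back up your config.yaml before trying anything like this.

```python
import re


def set_target_peer_count(config_text: str, new_count: int) -> str:
    """Return config_text with full_node -> target_peer_count set to new_count.

    Hypothetical helper: plain text editing, not a real YAML parser.
    """
    lines = config_text.splitlines(keepends=True)
    in_full_node = False
    for i, line in enumerate(lines):
        # A non-blank, non-indented, non-comment line starts a new top-level section.
        if line.strip() and not line[0].isspace() and not line.startswith("#"):
            in_full_node = line.rstrip().startswith("full_node:")
            continue
        if in_full_node:
            # re.S lets the last group keep the trailing newline (and any comment).
            m = re.match(r"^(\s+target_peer_count:\s*)\d+(.*)$", line, flags=re.S)
            if m:
                lines[i] = f"{m.group(1)}{new_count}{m.group(2)}"
                return "".join(lines)
    raise ValueError("target_peer_count not found under full_node")
```

To use it, read your config.yaml into a string, pass it through set_target_peer_count(text, 40), write the result back, and restart the node so the new value is picked up.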

By the way, to see how much help you get from the NVMe, you could run sqlite vacuum first on your HD (HD to HD), and then on your NVMe (NVMe to NVMe) and compare timings.
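
One way to do that timing comparison, sketched with Python's built-in sqlite3 module. This is my assumption of how you might run it, not an official Chia tool: it uses VACUUM INTO (needs SQLite 3.27 or newer), which writes a compacted copy to a new file and leaves the original db alone. The function name and the paths you pass in are yours to pick. Stop the node first so nothing else is writing to the db, and make sure the target drive has room for a full copy.

```python
import sqlite3
import time
from pathlib import Path


def time_vacuum_into(db_path, out_path):
    """Time VACUUM INTO from db_path to out_path and return elapsed seconds.

    Hypothetical helper for comparing drives; the source db is not modified.
    """
    out = Path(out_path)
    if out.exists():
        # VACUUM INTO refuses to overwrite a non-empty file, so fail early and clearly.
        raise FileExistsError(f"{out} already exists; pick a fresh output path")
    # isolation_level=None means autocommit; VACUUM cannot run inside a transaction.
    conn = sqlite3.connect(str(db_path), isolation_level=None)
    try:
        start = time.perf_counter()
        conn.execute("VACUUM INTO ?", (str(out),))
        return time.perf_counter() - start
    finally:
        conn.close()
```

Run it once with both paths on the HD (HD to HD), then copy the db to the NVMe and run it again with both paths there (NVMe to NVMe). Delete the output copies afterwards. A single run is only a rough indicator, since OS file caching can skew the second measurement.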