Chia Lost Sync Overnight and is No Longer Syncing

There was a dust storm that ended about 7 hours ago and lasted around 8 hours. That didn’t help your syncing, for sure. Still, the fact that it stopped at the same place is bizarre. But this screenshot looks like you still have a block being processed, or rather, has it stalled at that block?

It seems to have stalled at that block. As in, it has not processed any new blocks for a while now, whereas most blocks seem to process in less than 1 second.

Could you also post a screenshot of your Connections section (below the Blocks section)?

Connections:

For whatever reason, your node cannot talk to any of those peers (there is nothing in the Height column). One of the reasons could be that you simply have too many connections for your CPU.

Could you change the following entries in your config.yaml (the first up to 10 from 8, the second down to 10 from 80):

full_node:
  ...
  target_outbound_peer_count: 10
  target_peer_count: 10

Then restart Chia. Just make sure that everything shuts down first. If in doubt, reboot.

Once you have it running, wait for 5-10 mins, and redo the Connections screenshot.
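
If you would rather script that edit than do it by hand, here is a minimal Python sketch. It assumes the two keys sit on their own lines under a top-level full_node: section, as in the snippet above; the function name set_peer_counts is made up for illustration and is not part of Chia. It only swaps the numbers and leaves the leading spaces alone.

```python
import re


def set_peer_counts(config_text, outbound=10, total=10):
    """Return config_text with the two full_node peer settings replaced.

    Only the numbers change; the leading spaces on each line are kept.
    Hypothetical helper for illustration, not part of Chia itself.
    """
    wanted = {
        "target_outbound_peer_count": outbound,
        "target_peer_count": total,
    }
    in_full_node = False
    out = []
    for line in config_text.splitlines():
        # A top-level key starts in column 0; comment lines are ignored.
        if not line.startswith("#") and re.match(r"^\S[^:]*:", line):
            in_full_node = line.startswith("full_node:")
        match = re.match(r"^(\s+)(\w+):\s*\d+\s*$", line)
        if in_full_node and match and match.group(2) in wanted:
            indent, key = match.group(1), match.group(2)
            line = f"{indent}{key}: {wanted[key]}"
        out.append(line)
    return "\n".join(out) + "\n"


# Possible usage (the path is an assumption, adjust it to your install):
#   from pathlib import Path
#   path = Path.home() / ".chia" / "mainnet" / "config" / "config.yaml"
#   path.write_text(set_peer_counts(path.read_text()))
```

Keep a backup copy of config.yaml before letting any script write to it, and shut Chia down before making the change.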

I just restarted it with 10 peers on both settings. Will report back in a few minutes.

It seems to be making progress now!

Yes, your peers have heights now, so they are properly connected. Also, you are drawing data from them. I assume that in the previous screenshot those Up/Down values were just stalled counters.

Hopefully, you are set for now.

Jacek,

Brings back bad memories…LOL…I’m glad to see he is making progress as this is what got me back up and running. Awesome teamwork :slight_smile:

Actually, you were the very first one to reply to @thepaulo and you suggested that modification. However, for whatever reason that was ignored.

And yeah, analyzing those debug logs is kind of futile, as nothing in what was quoted above pointed to this problem. For whatever reason (lack of experience), the Chia code, instead of aborting on critical errors, merrily continues, basically flooding the debug output with secondary errors that mask the root cause.

Jacek, thank you! I tried the reduced peer counts in the config.yaml file.

> target_outbound_peer_count: 10
>   target_peer_count: 10

Running a Ryzen 2200G, 4 cores, 4 threads. Upon restart, status went from 'Not Synced' to proceeding to sync to a new peak number, whatever that is. So it chugged for 15 minutes and gave me the 'Not Synced' message. I cussed it, and then it changed to 'Synced'.


Just to make sure: both of those lines have to have the same number of spaces in front. That is how YAML works, and unfortunately it is easy to slip in one extra space there and be screwed.

Are you syncing right now, or is it still struggling?
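
If you want to double-check the spacing without eyeballing it, here is a small Python sketch. It assumes each key appears once in the file (if a key shows up in several sections, the last occurrence wins), and the function names are made up for illustration.

```python
def peer_key_indents(config_text):
    """Map each peer-count key to the number of leading spaces on its line."""
    keys = ("target_outbound_peer_count", "target_peer_count")
    indents = {}
    for line in config_text.splitlines():
        stripped = line.lstrip(" ")
        for key in keys:
            if stripped.startswith(key + ":"):
                indents[key] = len(line) - len(stripped)
    return indents


def same_indent(config_text):
    """True only if both keys were found and share the same indentation."""
    indents = peer_key_indents(config_text)
    return len(indents) == 2 and len(set(indents.values())) == 1
```

Feed it the text of your config.yaml; if same_indent comes back False, peer_key_indents shows which line is off.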

Thank you for the followup. Luckily I edited with no change in spacing.

Still in sync.

Jacek and I went through hell and back for a couple of days when the worst dust storm took place. I got blasted from left field, blindsided, LOL. However, after much troubleshooting with Jacek leading the way, we nailed it. Keeping the peer count down has made my rig more stable, and I was lucky enough to win a few blocks. Good luck to you, and as always, TY Jacek for helping the community. :slight_smile:

Hi Larry,
Curious regarding the peer counts you settled on. Could you share that?

As long as I’m updating, switched the node over to a Ryzen 3200G. Motherboard on the 2200G system was having recoverable PCIe errors at the USB controllers.
Again, it is one thread per core with the 3200G, and I found that allowing three threads for the miscellaneous Chia workload and only one thread for plot generation reduced warnings. The three threads are pretty much doing nothing on average, but I do see three and four threads of activity being recorded in the “TOP” utility in the Ubuntu terminal.

That is the problem. There is no magic number, but the Chia software insists on using some garbage like 80 or so (as most likely that person had no clue what to do that day). That number should be adjusted by the software based on the transaction load / peer connections status (dynamic) and node capability (establishing a max during installation).

So, you may want to consider three scenarios: 1. all is good, 2. you are trying to catch up (syncing from scratch, or because for some reason you are way behind), 3. there is a storm, and your node cannot keep up. If you don’t want to have headaches, you may want to set it to the level that works well during those dust storms. If you intend to sync from scratch, I would put it at about 5 during that time and move it up once syncing is done (during syncing you are just leeching to get up to speed, so there is no point in having plenty of peers).

To make the p2p protocol work, you need just 3 peers or so. However, if one of your peers goes bad, you may be kind of SOL, so that is not a good number (consider it the theoretical lower limit). If your node can run cleanly with 80 peers, I would put it right there. I would monitor my node at that point to see whether the CPU is choking or you are falling behind. If so, I would drop it by 10 or 20, depending on how badly your node performs, and repeat as needed. There is usually plenty of time to run those scenarios during those storms.

I would stop when you go down to about 10 or so. If your node cannot handle 10, that may imply that you have some other issues (really slow media under your blockchain db; having dbs and your log on the same media - if your logs are out of whack; …).
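
To make that stepping-down routine concrete, here is a rough Python sketch. It only captures the rule of thumb above (start at 80, drop by 10 or 20 while the node is choking, stop around 10); the function names and the boolean inputs are made up for illustration, and deciding whether the node is actually falling behind is still up to you watching it.

```python
def next_peer_count(current, falling_behind, badly=False, floor=10):
    """Suggest the next target_peer_count to try.

    current        -- the value currently set in config.yaml
    falling_behind -- True if the CPU is choking or the node lags the peak
    badly          -- True to drop by 20 instead of 10
    floor          -- where to stop lowering (about 10, per the advice above)
    """
    if not falling_behind:
        return current  # the node copes, so leave the setting alone
    step = 20 if badly else 10
    return max(floor, current - step)


def step_down_schedule(start=80, badly=False, floor=10):
    """List the values you would walk through if the node kept choking."""
    values = [start]
    while values[-1] > floor:
        values.append(next_peer_count(values[-1], True, badly, floor))
    return values


print(step_down_schedule())            # [80, 70, 60, 50, 40, 30, 20, 10]
print(step_down_schedule(badly=True))  # [80, 60, 40, 20, 10]
```

Each step still means editing config.yaml and restarting, so in practice you would only walk a few of these values during a storm.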

By the way, that Ryzen 3200G has plenty of power to be a node. Also, my understanding is that chia is mostly single threaded code (seeing what was happening during those dust storms), so that is a bummer. Is your db on HD, SSD or NVMe?

It is not so much single threaded; it is more that the blockchain is validated serially. I ran Chia on a server, for instance, and it started using about 20 CPUs.

Re: 3200G (Acer) farm and plot generation, storage setup. The Acer’s been pretty good. It’s a refurb. Didn’t have luck with the 2200G HP refurb. Digressing. OS=Ubuntu on one NVMe SSD.
32 GB ram. Occasionally used half of it.
A second 2TB NVMe for plot generation, = WD SN750 with heatsink.
For plot storage, 6 WD Elements HDs, hanging off an external USB 3.1 controller. Honeywell fan blowing on these external drives to keep them cool. (They’re hard drives in an insulating plastic box, Pete’s sake). Read/Write has been ok, up around 100 Mb/s copying large files. I have a no_sleep crontab job running every 2 minutes to keep them high strung and awake.
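
For anyone wanting to try something similar, here is a minimal Python sketch of one way a keep-awake job could work. It is an assumption about the approach, not the actual no_sleep script: it writes a tiny marker file to each drive so the disk has to do some I/O. The mount paths and the .no_sleep file name are placeholders.

```python
import os
import time


def keep_awake(mount_points, marker=".no_sleep"):
    """Write a small timestamp file on each mounted drive.

    Returns the list of files written. Directories that do not exist
    (for example, a drive that is not attached) are skipped.
    """
    touched = []
    for mount in mount_points:
        if not os.path.isdir(mount):
            continue
        path = os.path.join(mount, marker)
        with open(path, "w") as handle:
            handle.write(str(time.time()))
            handle.flush()
            os.fsync(handle.fileno())  # push the write down to the device
        touched.append(path)
    return touched


# Placeholder paths, replace with your own mount points:
#   keep_awake(["/media/user/plots1", "/media/user/plots2"])
```

Saved as a script that calls keep_awake with your mount points, it could be run every 2 minutes with a crontab entry along the lines of `*/2 * * * * /usr/bin/python3 /path/to/no_sleep.py` (the script path is a placeholder).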

For this config, the PC’s UEFI BIOS struggles with boot if the external drives are attached during boot (single thread finding and spinning up 6 drives, one at a time?). Ubuntu stumbles if more than two drives at a time are USB attached for logical mount. So this process is executed on every reboot: unmount and detach all the USB drives, power on and boot the PC, attach and power on the drives 2 at a time (auto-mount), then start the Chia GUI.

Since all the peers are fighting for bandwidth and the database is a major pain to update, we decided to take it down to what we considered to be the minimum, ten. I instantly saw the quality of peers come up and the throughput pick up. I have not touched it since, and it has been as stable as can be. I hope the new release really gets a good DVT so for once we can all upgrade and not have major issues. When you do your upgrade, just do it old school. Bring Chia down, reboot, make sure there are no Chia processes running, and then upgrade and pray :slight_smile: LOL :slight_smile: Good luck to you.
