GUI Sync stuck at 0 or freezes

wobblewoo · March 6, 2022, 12:39pm

I’m on 1.2.11 on Ubuntu 20.04.4 and I spotted my farming had stopped.

When I try to start the GUI again, it either stays at zero or freezes.

I’ve tried moving the blockchain_v1_mainnet.sqlite file out of the db folder, and that causes it to work and start syncing from 0 again. But if I then move it back, I get the same problem.

I’m fearing a db corruption and a total resync. Can anyone help?
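One way to test the corruption suspicion before committing to a resync is SQLite's built-in `PRAGMA integrity_check`. The following is a minimal sketch using Python's standard `sqlite3` module; the helper name `check_sqlite_integrity` is hypothetical, and it assumes Chia is fully stopped while the check runs. On a large blockchain db the check may take a long time.

```python
import sqlite3
from pathlib import Path


def check_sqlite_integrity(db_path):
    """Return the messages reported by PRAGMA integrity_check.

    A healthy database yields exactly ["ok"].
    """
    # Open read-only through a file: URI so the check cannot modify the db.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
    except sqlite3.DatabaseError as exc:
        # Severe damage can make the file unreadable as a database at all.
        return [f"error: {exc}"]
    finally:
        conn.close()
    return [row[0] for row in rows]
```

Calling it with the path to blockchain_v1_mainnet.sqlite and getting anything other than `["ok"]` back would support the corruption theory.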

scottmasterton · March 6, 2022, 1:00pm

I had the same issue about 10 days ago using 1.2.11. My PC crashed, and when I brought it back up, it was showing 0. I tried various things to sort it, but nothing worked, and my conclusion was that the database was corrupted. I had to resync from 0, which took about 3-4 days. Back up and farming now.

seymour.krelborn · March 6, 2022, 1:13pm

Whenever I shut down the GUI (usually for a re-boot), I always make a copy of .chia\~

I never ran into blockchain corruption. But since I have a copy of the blockchain, while in good working order, then, if I should get a blue-screen-of-death (or anything else that does not gracefully shut down the GUI, resulting in db corruption), I will not have to sync from 0. I will copy back the blockchain file, and it should sync from that point (never tested it, because I never had corruption – but it should work).

And of course my copy goes on a different physical disk.
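As an alternative to copying the files by hand, here is a minimal sketch that uses SQLite's online backup API through Python's `sqlite3.Connection.backup`. This is not the method described above, just a swapped-in technique; the function name `backup_sqlite` and both paths are hypothetical, and it has not been tested against a live Chia node.

```python
import sqlite3


def backup_sqlite(src_path, dest_path):
    """Copy a SQLite database to dest_path with SQLite's backup API."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        # Copies every page of the source database into the destination.
        src.backup(dest)
    finally:
        dest.close()
        src.close()
```

Pointing `dest_path` at a different physical disk keeps the same benefit as the manual copy.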

Jacek · March 6, 2022, 4:27pm

It looks like your blockchain db is corrupted. You can download blockchain db either from
https://elysiumpool.com/bootstrap/
or
https://www.chia-database.com/
This will get you started in a couple of hours.

ASPAN · March 6, 2022, 6:43pm

Been having the same issues on Windows. It also occurs with databases on HDD, SSD & NVMe. Replacing the databases with a previous copy resolves the issue, so I have been taking copies from other machines.

I put it down to database corruption and believe it is due to the database struggling with its size. Hoping a new version that resolves this issue comes soon.

Jacek · March 6, 2022, 8:53pm

db corruptions usually happen when there is a catastrophic shutdown during a db write, so only partial data gets written (and the db doesn’t have transaction history - e.g., sqlite). This can happen either when there is a power failure during a db write, or when a process that writes to the db is forcibly shut down. Chia is not properly handling shutdowns, as sometimes the start_full_node process (that handles blockchain db) is not stopped when the UI goes away. A reboot at such time can cause db corruption.

KryptoMine · March 6, 2022, 10:33pm

Nice information! I have had that: “A reboot at such time can cause db corruption.” already. What can be done to fix this?

Jacek · March 6, 2022, 11:22pm

As long as Chia is not going to fix it, not much. It is a serious bug, but affects us farmers, not really the network or Chia directly.

When you shut down chia, you should monitor the start_full_node processes and hope that they will exit. Also, you can check your blockchain db folder: when those processes exit, there should be only two sqlite files there (the main db and related peers db), as *wal, *shm are temp files. I check for those start_full_node processes, usually give them a minute or so, and just kill the main one if it still lingers. Actually, the process that hangs is the main start_full_node (the controller); the other ones are usually just lingering there, waiting to crunch blocks.
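The folder check described above can be sketched in a few lines of Python. This assumes the temp files follow SQLite's standard `-wal` / `-shm` suffix naming; the function name `leftover_temp_files` is hypothetical, and the db folder path is whatever your install uses.

```python
from pathlib import Path


def leftover_temp_files(db_dir):
    """Return sorted names of *-wal / *-shm files still present in db_dir."""
    folder = Path(db_dir)
    return sorted(
        p.name
        for p in folder.iterdir()
        if p.is_file() and p.name.endswith(("-wal", "-shm"))
    )
```

An empty list suggests the node processes have exited and released the db; a non-empty list is a hint to wait a bit longer before rebooting.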

UPDATE:
Actually, thinking more about what you have asked for.

It is a bug, of course. However, db writes are only happening when blocks are being processed (peer servicing is just db reads, I think). Potentially, you could unplug the Ethernet cable, that would basically stop block processing, and in a few seconds (10 ?), that process would be done with db writes, so safe to kill.

I would say, that would be a major pain to do it every time chia is about to shut down (most of us access that box remotely). Another thing is to put more pressure on Chia to get it finally fixed.

This problem is / may also be related to slow syncs (when the main start_full_node process chokes its core), as it is around synchronization between different tasks. Therefore, fixing this problem properly has the potential to speed up syncing (e.g., fewer issues during weak dust storms).

KryptoMine · March 7, 2022, 7:32am

I have a script “reboot safely” / “shutdown safely” for that very reason.

Should be possible to implement network detach before reboot.
Right now it stops all services and attempts to unmount all drives. If they don’t unmount, it won’t reboot.
It has already happened to me that disks could not be unmounted due to write operations, and then the disk was corrupted or needed to be rechecked after reboot.
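The general shape of such a script (run each step, refuse to reboot if any step fails) could look like the sketch below. This is not my actual script, only an illustration; the function name `run_steps_or_abort` is hypothetical, and the commands you pass in (stopping services, unmounting drives, and so on) are up to you.

```python
import subprocess


def run_steps_or_abort(steps, run=subprocess.run):
    """Run each command in order; return True only if all exit with code 0.

    Stops at the first failing step so the caller can skip the reboot.
    """
    for cmd in steps:
        result = run(cmd)
        if result.returncode != 0:
            print(f"step failed, not rebooting: {cmd}")
            return False
    return True
```

Each entry in `steps` is an argument list as accepted by `subprocess.run`; the reboot itself would only be issued by the caller when the function returns True.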

Jacek · March 7, 2022, 8:17am

That is a nice script, at least it prevents a reboot in such case. Although, if that process will not stop, it is a guessing game to stop it.

Also, I am not sure what happens if, instead of rebooting, someone just restarts chia. It may be that the second main start_full_node process will start processing potentially the same blocks as the orphaned one, and that may also lead to corruption. I have never tried doing that, though, so I am not sure about it.

ASPAN · March 7, 2022, 8:55am

Useful answer, however it is important to know that I have had the occurrences with no shutdowns occurring, as well as clean shutdowns. This of course is confirmed by your points on Chia not properly handling shutdowns and hopefully that is soon addressed in an update.

wobblewoo · March 7, 2022, 11:10am

That’s exactly what I did. Binned the whole db folder contents and then resynced. It wasn’t that bad or time-consuming a process.

I might look at running a daily backup of the .chia folder.

Many thanks for your answer