Update on my plight

Lsherring · November 15, 2021, 6:12pm

Jacek is a Chia Stud! After four hours of troubleshooting, my rig is back to normal. There was never a network issue; we proved that. The resolution was to reduce the number of peer connections from 80 to 20. We saw an instant improvement in memory usage, and I got over a gig of download. We eventually dropped the twenty peers to 10, as I was getting more than a gig of data. I highly suggest lowering the peer count while syncing. What had taken two weeks was done in 36 hours. We also moved the DB to the NVMe after scratching the DB. I reclaimed all my XCH from the blockchain as well as the pool. So I can tell everyone firsthand: you can recover and not lose anything. I’m also on 1.2.9 even though Chia came out and said it was crap. If a version is crap, why is it still downloadable? That is programming 101. The peer count made a massive difference, as did putting the DB on the NVMe.
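For anyone wanting to try the same peer-count change, here is a minimal sketch of editing the setting in a config file's text. It assumes the value lives under a key named `target_peer_count` in the node's `config.yaml`; the key name, the sample config, and the helper `set_target_peer_count` are illustrative assumptions, so check your own config before relying on it. Only the first match is replaced, so confirm it sits in the full node section.

```python
import re


def set_target_peer_count(config_text: str, new_count: int) -> str:
    """Return config_text with the first `target_peer_count:` value replaced.

    Assumption: the peer limit is stored as `target_peer_count: <int>` on its
    own line. Keys such as `target_outbound_peer_count` are not matched.
    """
    pattern = re.compile(r"^(\s*target_peer_count:\s*)\d+", re.MULTILINE)
    updated, hits = pattern.subn(
        lambda m: f"{m.group(1)}{new_count}", config_text, count=1
    )
    if hits == 0:
        raise ValueError("target_peer_count not found in config text")
    return updated


# Hypothetical config fragment, not copied from a real installation.
sample = "full_node:\n  port: 8444\n  target_peer_count: 80\n"
print(set_target_peer_count(sample, 20))
```

The node would need a restart after the file is changed for a new limit to take effect.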

Jacek knows his stuff and I would not just disregard his feedback. I have been in HW Engineering for 40 years on the data center side and I can tell the real deal. Jacek should be working for Chia as he knows his stuff.

We have a few other folks in this community who are awesome and know their stuff as well, and I want to thank them all for helping me through the worst experience I have ever had with Chia. Dust storm during an upgrade be damned! LOL

BadgerStork · November 15, 2021, 6:40pm

I gave you the thumbs up but seriously FLEXFARMER!
(lol I am that bloke that always said that)

Bones · November 15, 2021, 7:00pm

Glad you’re sorted.
Where is 1.2.9 still available pls?

I need to roll back, .10 has just dropped for the 2nd time for me today…

On restart all my plots have vanished again…

Jacek · November 15, 2021, 8:51pm

Scroll down on that page to “Release Notes” and click on version you want. You will see ChiaSetup-X.Y.Z.exe there.

Although, if you want to start from scratch, I would really go for v1.2.6/7.

Jacek · November 15, 2021, 9:00pm

Glad to see that you are up to speed right now. A few people faced problems around ~800 height, and I was afraid that you had been hit by that as well and had just given up.

Although, you are too kind to me. What I did is what others would do as well. I am still in debt to those people, as I am just scratching the surface.

We do need events like Dust Storm, as those expose weaknesses of the network.

Bones · November 15, 2021, 9:00pm

I was stable on 1.2.9 for a good while, but that was on my plotter box.

Checked my log earlier when it dropped again, said error, can’t connect to harvester, banning for.
.
Fancy banning your own machine lol.

Jacek · November 15, 2021, 9:09pm

I would really like to have Flex take over Chia code (have Chia outsource it to Flex, so they still control the overall architecture). That would kind of separate network development from the banking focus (vertical development).

I would not mind starting / signing a petition to make it happen.

The problem with Flex farmer right now is that, so far, they have only 400 PB of space (say 20,000 nodes). If anything really shakes the network, their farms may still be intact, but that would be the only part working, which is potentially not enough to have a working network.

Jacek · November 15, 2021, 9:43pm

I would like to take a crack at that number.

Assuming that an average download speed is about 100Mbps, a ~30 GB blockchain db should take less than an hour to download (~40 mins). I understand that the supplying peer may be limited to just a few Mbps of upload speed. However, this is a peer network that should be able to parallelize that download (getting that db in chunks, and assembling those locally).
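The ~40 minute figure can be checked with a few lines of arithmetic. This is just a sketch using decimal units (1 GB = 8,000 megabits) and ignoring protocol overhead; the function name `download_minutes` is made up for illustration.

```python
def download_minutes(size_gb: float, speed_mbps: float) -> float:
    """Minutes needed to move size_gb gigabytes at speed_mbps megabits/s."""
    megabits = size_gb * 8 * 1000  # GB -> megabits, decimal units
    seconds = megabits / speed_mbps
    return seconds / 60


# ~30 GB db over a 100 Mbps link, the numbers used in the post above.
print(round(download_minutes(30, 100)))  # 40
```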

Yeah, we can all say that we don’t want our nodes to be choked by those downloads (which is what Dust Storm actually did), but on the other hand, when those “other” nodes are struggling, quite often the solution is to kill the current db and restart the process again and again, so the problem really multiplies the number of bytes the network needs to exchange.

md-chia · November 16, 2021, 12:52am

Now that is an interesting update, thanks for this @Lsherring & @Jacek

Lsherring · November 16, 2021, 3:33am

I’m confused by the comment about flexfarmer…I don’t use flexfarmer and never used the term. Help a small brain out…

Lsherring · November 16, 2021, 3:34am

If you do a google search on chia 1.2.9, it will point you to github. All the versions are there.

Jacek · November 16, 2021, 5:15am

It only works on Flex pool. Here is the info:

gtl3 · November 16, 2021, 6:27am

" … farmers to our hosted nodes over a blockchain bridge gateway which runs on powerful infrastructure hosted by us." Not to keen on the terms I highlighted. At least they are up front about it.

Bones · November 16, 2021, 9:27am

Not sure if I’m brave, stupid or both, but I’m gonna try out .11 instead first.
Wish me luck, I’ll report back with any issues.

Initial thoughts: installed fine, plots loaded faster than with .10.
Let’s just hope she’s stable.

Downside, my NFTs aren’t visible; they were there on .10.
Guess I’ll have to resync wallet.

Jacek · November 16, 2021, 5:43pm

Actually, one more observation about that 36 hour download. That boils down to less than 2Mbps download speed. It implies that the whole sync is done using just one peer at a time. This is completely ridiculous. If blockchain updates ever exceed that speed, most nodes will not be able to stay synced. With that being said, my take on Dust Storm right now is that this was potentially the second, or maybe even the first, reason why so many nodes went out of sync. This implementation is just not production ready.

Still, assuming that all is well, and a box is trying to sync (from scratch or so), again that is just 2Mbps / 0.25 MBps download speed. How come a blockchain db, on whatever medium, cannot handle that speed? How bad is that code? (I understand that this is not the only activity on that db; still, we are talking about syncing.) I guess this is the end of the “low performing nodes being mostly affected” explanation for Dust Storm, at least for me.

Actually, a corollary to that is that if someone needs to sync from scratch, maybe the best option is to set the number of peers to just 1 (on top of your local connections), and partner with someone who has a decent upload speed. Assuming that your partner has about 20Mbps upload speed, that would basically shorten that time from ~40 hours down to 4 hours.
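Both numbers in this post follow from the ~30 GB database size mentioned earlier in the thread. A small sketch of the arithmetic, in decimal units and ignoring overhead (the function names are made up for illustration):

```python
def implied_mbps(size_gb: float, hours: float) -> float:
    """Average megabits/s implied by moving size_gb gigabytes in `hours`."""
    return size_gb * 8 * 1000 / (hours * 3600)


def sync_hours(size_gb: float, speed_mbps: float) -> float:
    """Hours needed to move size_gb gigabytes at speed_mbps megabits/s."""
    return size_gb * 8 * 1000 / speed_mbps / 3600


rate = implied_mbps(30, 36)
print(f"36 h sync of 30 GB: {rate:.2f} Mbps ({rate / 8:.2f} MBps)")
print(f"Same 30 GB from one 20 Mbps peer: {sync_hours(30, 20):.1f} h")
```

This gives roughly 1.85 Mbps for the 36 hour sync and a bit over 3 hours at 20 Mbps, in line with the "less than 2Mbps" and "4 hours" estimates.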

Bones · November 16, 2021, 8:03pm

Well, so far so good. Needed a restart after adding another drive, but that’s normal on that box; otherwise all looking good.
Wallet synced to just over 600k.
But .10 lasted 24 hrs before its first drop so not drawing any hasty conclusions.

nontechguy · November 17, 2021, 12:22am

On my plotting rig, an i9-10850K, I found that keeping my peer number under 10 and killing any connected nodes that aren’t synced or syncing (they show no peak number and nothing up or down) keeps me from having any sync problems. My farming rig, an AMD Athlon 3000G, is still having problems with the peer count set to 8. When it was at 80 it would not sync at all. At 20 I was still having issues and had to manually drop bad nodes all the time to get it to sync. At 8 it seems to be okay some of the time. If my connection doesn’t stabilize I am going to try dropping it down to 4. It appears I am getting most of my data from my first two peer connections anyway.

I feel bad for the people who are trying to keep synced on Raspberry Pis. I also think having Chia tell everyone to kill un-synced nodes is making the problem even worse for the people who are not already synced, because the peer connections are constantly starting, then getting dropped, and then starting again. 1.2.7 was the most stable release for me and everything after that has been problems, but I am trying to make 1.2.11 work.

Jacek · November 17, 2021, 1:12am

You should not really have two full nodes if your plotter and farmer are on the same local network. Both of those nodes will compete over UPnP for port 8444, and part of the problem could be clashes around that.
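One rough way to see which machines actually have something listening on 8444 is a plain TCP connect test. This is only a sketch: it shows whether a port accepts connections from where you run it, and it says nothing about which box currently holds the UPnP mapping on the router. The helper name `is_port_open` is made up, and the address in the demo is just localhost.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Check the local machine; swap in the LAN address of the plotter or farmer.
print(is_port_open("127.0.0.1", 8444, timeout=0.5))
```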

Saying that, if you really do want to have two full nodes, and want to keep your current setup for a while, I would make your farming node peer with your plotter node only. This will remove all extra peer connections from your farmer, and will give it a 1Gbps connection to your plotter, as such it should never have syncing problems (IMO). At the same time, your plotter will do all the blockchain heavy lifting with those 10-20 connections. I have never done that, but I think that it should work. I think that some other folks may have tried it already.

Also, I would not really bother with manually killing any peers on your plotter, as that can soon become your primary job. Let the protocol sort it out, and just hope for the best.

Your plotter is way more powerful than my full node (a low power i5 NUC), so with the number of peers dropped down to 10, the question is: what is your blockchain db sitting on? Is that an HD, SSD, or NVMe? Try to stop your chia, and once it is down, try to copy that blockchain db to another drive. If you get a low transfer rate (I would say that everything below 100MB/s is bad on your rig), this is the time to move that db to an NVMe (although not the one used by your plotter).
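If you would rather measure than eyeball a file copy, a sequential read of the db file gives a similar number. This is a rough sketch, not a proper disk benchmark: the function name is made up, the path in the comment is a placeholder, and a file that was read recently may be served from the OS cache and look much faster than the drive really is.

```python
import time


def read_throughput_mb_s(path: str, chunk_bytes: int = 8 * 1024 * 1024) -> float:
    """Read the whole file sequentially and return the rate in MB/s."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_bytes)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on very small files.
    return total / 1_000_000 / max(elapsed, 1e-9)


# Example with a placeholder path (stop the node first):
# print(read_throughput_mb_s("/path/to/blockchain.sqlite"))
```

By the rule of thumb in the post, a result well under 100 MB/s on a rig like that would point at the drive.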

I am really not sure, whether those are really nodes that have problems, or rather that your plotter is having problems (this would be my guess), and just cannot chat with those peers for some reason, thus those garbage numbers. This looks like what Larry had (a lot of “bad” peers). In his case all got sorted out, but he also started by killing his old blockchain db, downgrading to v1.2.9, and syncing from scratch. I am leaning to say that potentially that indicates that you may have a corrupt db. Again, I didn’t have that problem, so cannot say that I do know what to do in this case.

The reason I am saying that the problem with those “bad” nodes rather points to your rig is that in my case, I am fully synced, and I do not have a single node like that. Virtually all nodes are synced, or about to be, and just a few are really low, meaning they are starting from scratch, as I see my node sending plenty of data to those. We both have the same P2P protocol, so it is rather unlikely that, for whatever reason, we would see such a big difference in the status of those connected peers.

Although, saying that, I am still on v1.2.6, and Larry did his sync on v1.2.9. That v1.2.11 was a rush job to address Dust Storm; as such, maybe it just crippled the P2P protocol, hence those “bad” nodes.

Maybe folks that are on v1.2.10 and below can chime in, if they see those bad nodes. Maybe this is just v1.2.11 problem.

Chris22 · November 17, 2021, 2:30am

We’ve written a farmer in Go that fixes a lot of the bugs, inefficiencies, and mistakes in the base chia farmer. We then created an entirely new PoS in Go that did the same (flexfarmer 2.0). And now we are working to build the rest of the components needed for a light node with a KVM database. We don’t believe SQLite is functional long-term. We are wrapping what we can and creating what we can’t. So expect a much improved node in anywhere from 1-4 months, which will still take us months to debug and complete. We are funding it completely in-house, using our own funds to hire talented devs who are enthusiastic about Chia.

We have experience running and modifying a very advanced node (Geth) and are using that experience to create something that is a Lamborghini with the reliability of a Toyota, compared to the current node and farming experience that was rushed. Chia created a product that works. But obviously there’s a reason no one still drives the Model T.

Chia Network does have creating a better node in their long-term plans. However, I know they are very busy with their two current projects, so I suspect it isn’t happening for at least a year. We are fulfilling a need for Chia completely on our own initiative, despite being the 8th largest pool and less than 10% of the size of either of the top 2 pools. I know we get a ton of flak for not open sourcing things, but I would point out that at least we are doing things that will benefit not just farmers but Chia in general, and I think we’re light years ahead of any other pool when it comes to giving back. We operate out of a first world nation, and our employees are generally well known in the community, including the devs who have seen the flexfarmer code. Honestly, if we were going to steal, we’d have done so on the ETH pool that has 23+ mil CAD in the account. We were the first pool that shared MEV income instead of keeping 100%, and all the other pools were forced to stop keeping it all and follow us. We could have made a lot of money if we had kept it, like most pools are now doing through EDEN. (One lucky member on Chia forum also got to meet me in Victoria as I purchased his rig.) In 4 months, if the network crashes, our node will be there to continue carrying the torch no matter how badly the network is spammed. And if your node crashes, you’ll know there’s a node that will sync reliably in less than 1/10th the time.

Even if you don’t want to currently run flexfarmer, I think you’re crazy to be on any other pool. Our fee is among the lowest, and if your node crashes you can run flexfarmer as a backup while you work on syncing it again. Having a backup for a failed node is a huge safety net for any farmer. If you are on another pool when your node crashes, you won’t be able to switch to us and run flexfarmer, so you’re hooped while you spend a couple of days syncing it again.

There are several methods that can be used to crash the Chia network. The 0 mojo transaction spam was honestly just an obvious one that we knew about from the start. Until a production ready node is released the network is not ready for prime time.

Lsherring · November 17, 2021, 6:45pm

Wow!!! Great interaction from the community! This is how it should work. Lots of good ideas from some smart folks to help others and continue to make the community better. I love to see this form of interaction. No bashing, just good, constructive info, all being used in the spirit of making a better system for all of us.

Now, that being said, I have a new one for the team. The day after I finally synced, I won a block according to space. However, according to space they never see this and it comes from the chain. It also has directions to see your block and get it into your wallet if you have not received it. Well, needless to say, I have not seen it and the directions are not working. Any thoughts on this new issue??