Chia-Network down?

Chris22 · October 31, 2021, 9:55pm

Nope, but indeed farmers and, to a lesser extent, even flexfarmer were having issues. Because we use a difficulty of 1, issues become apparent immediately and end immediately, so you'll see large dips when events happen. Elsewhere things are usually averaged out, so the dip is smaller but lasts longer.

BadgerStork · October 31, 2021, 10:13pm

Thank you for taking the time to reply. Seriously, I appreciate that you are working late on a Sunday.

cyberduck · October 31, 2021, 11:57pm

We also have huge problems here; we have been struggling with lots of stale partials for about 10 hours. It started this afternoon at about 13:00 CET, stabilized about four or five hours later, and now it's starting again. I've been debugging all day but can't find the problem. First I thought it was the time change, then I thought it might be the new hard drive we installed the day before yesterday. I have now taken this disk out of the farm and will see what happens.

We have two sites and farm together in Space Pool. At first only one site was having problems, now it’s both. I don’t really believe there is a problem on our side anymore.

Is there any new information yet?

CD

antbot · November 1, 2021, 12:01am

Hi, I haven't been able to sync for 1h… (v1.2.10)
The debug log is showing:

2021-11-01T00:51:20.697 full_node chia.full_node.full_node_store: INFO Don’t have challenge hash…
2021-11-01T00:51:20.699 full_node chia.full_node.full_node: INFO End of slot not added CC challenge…

I believe it has something to do with the dust storm. Any tips?

EDIT: reloaded and it synced. Went back to farming

jjs · November 1, 2021, 12:54am

Same here. Restarted and we are all good.

antbot · November 1, 2021, 1:08am

BTW, I also reduced the default target_peer_count from 80 to 50, as recommended.

EDIT: down again…

Bones · November 1, 2021, 1:38am

Peers seem to reduce themselves, I’m down from 80 to 44…

Seems the network is struggling, not a disaster yet, but certainly needs a fix, hope they’re quick.

antbot · November 1, 2021, 1:39am

now I’m not able to connect… this looks serious

Bones · November 1, 2021, 1:41am

If people start reducing their peer count, that will have a knock-on effect that will compound the issues…
Not saying it won't work for some, but it comes at the expense of others.

Jacek · November 1, 2021, 1:41am

I think this may be helpful to try:

Q: If I’m feeling strain on my node, is there anything I can do to alleviate it?
A: You can lower your default peer count in config.yaml from 80 to something smaller, like 40 or 50 for example, or maybe lower based on your needs. Additionally you can monitor your peer connections and if you see peers that are woefully behind in blocks, and if they show no signs of catching up and are not benefitting from you and only dragging you down, you have the option to terminate their connection from the CLI. (Please only do this for nodes sandbagging you however. If you see peers slowly catching up thanks to you, be a good neighbor and help them!) Also, if you are plotting on the same machine that is your node, you could try splitting the workload between machines or temporarily pausing plotting while your node catches up. Lastly, while we encourage and support the spirit of Chia Forks, halting them on your machine and freeing up resources for Chia specifically will obviously help, especially if you are one of those power users farming 10+ forks on one machine!

Here is the source:

I also see the same thing (up to 80-90% stale partials during those storms). I noticed that I have a high ratio of uploading to downloading peers. It looks to me like the amount of data being pushed to those peers is overwhelming my ISP upload speed, potentially causing those stale partials. From my side, it looks like a DDoS attack.

I just dropped the value of target_peer_count (in config.yaml) from 80 to 20. Too early to say whether it is helping, but I am hoping so. I think the other suggestion (to manually drop those leeches) is just nonsense, or rather a potential game of whack-a-mole.
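To see why the FAQ's manual pruning feels like whack-a-mole, here is a rough sketch of its "far behind and not catching up" rule. The `Peer` record, the height samples, and the `max_lag`/`min_progress` thresholds are all hypothetical; this is not Chia's API, and the actual disconnect would still be done by hand from the CLI, every time a new starving peer shows up.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Peer:
    # Hypothetical record: the peer's node ID and the peak heights we
    # observed for it over time, oldest sample first.
    node_id: str
    height_samples: list[int] = field(default_factory=list)


def pruning_candidates(peers: list[Peer], our_height: int,
                       max_lag: int = 1000, min_progress: int = 1) -> list[str]:
    """Return IDs of peers that are far behind us AND not catching up."""
    candidates = []
    for peer in peers:
        if not peer.height_samples:
            continue  # nothing observed yet, give it time
        latest = peer.height_samples[-1]
        lag = our_height - latest
        progress = latest - peer.height_samples[0]
        # Far behind and not gaining ground: the "sandbagging" case.
        if lag > max_lag and progress < min_progress:
            candidates.append(peer.node_id)
    return candidates


if __name__ == "__main__":
    peers = [
        Peer("aaaa1111", [900_000, 900_000]),      # stuck
        Peer("bbbb2222", [890_000, 895_000]),      # behind but catching up
        Peer("cccc3333", [1_000_000, 1_000_010]),  # roughly in sync
    ]
    print(pruning_candidates(peers, our_height=1_000_050))  # ['aaaa1111']
```

The peer that is behind but gaining ground (`bbbb2222`) is kept, in line with the FAQ's be-a-good-neighbor caveat; the heights are made-up example values.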

One thing I would like to know is what back-off mechanism, if any, Chia uses when the upload connection is saturated. Another thing that looks to be missing is prioritizing farming (partial processing) over those peer upload requests. I think this is what is killing us here, and it is not addressed in that write-up.
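What is being asked for here amounts to a priority scheduler in front of the uplink. A minimal sketch, assuming three hypothetical traffic classes; this is not how Chia actually schedules its networking, only an illustration of "farming before peer uploads".

```python
import heapq
import itertools

# Hypothetical traffic classes; a lower number is sent first.
PRIORITY = {"partial": 0, "block": 1, "sync_upload": 2}


class UploadScheduler:
    """Strict-priority queue: partials always go out before sync uploads."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order within a class

    def submit(self, kind, payload):
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._seq), kind, payload))

    def next_job(self):
        if not self._heap:
            return None
        _, _, kind, payload = heapq.heappop(self._heap)
        return kind, payload


if __name__ == "__main__":
    s = UploadScheduler()
    s.submit("sync_upload", "blocks 1-32")
    s.submit("sync_upload", "blocks 33-64")
    s.submit("partial", "proof #1")
    print(s.next_job())  # ('partial', 'proof #1') jumps the queue
```

Strict priority like this can starve sync uploads indefinitely, so a real node would likely also need rate limiting or aging of low-priority work.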

Bones · November 1, 2021, 1:43am

Oh my, I can't believe that's an official recommendation!

Jacek · November 1, 2021, 1:49am

That is potentially partially true. If you think about the peer network, uploads and downloads should be balanced. If you see a disparity there, then either you are a leech, or leeches are connecting to you.

Assuming that the nodes starving for data are really using what is sent to them, they should quickly get up to speed. I don't think this is happening. I think it is more like there is one process that requests data, puts it into a bucket, and goes back to ask for more data. However, the process that should be consuming those buckets is completely overwhelmed, so most of those buckets end up being destroyed, and the network process starts again from scratch.

If that is the case, those starving peers are basically doomed. Therefore, my take is that if they are restricted (lower connection count), their data-fetching process is throttled, which lets the bucket-processing process catch up to it. So there is potentially no change for those starving nodes.
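The bucket argument can be checked with a toy producer/consumer model: a fetcher adds buckets to a bounded buffer, anything over capacity is discarded, and a processor drains a fixed number per tick. All rates and the capacity are made-up illustrative numbers, not measurements of a Chia node.

```python
def simulate(fetch_rate, process_rate, capacity, ticks):
    """Toy model of a starving node; returns (processed, dropped) bucket counts."""
    buffer = 0
    processed = dropped = 0
    for _ in range(ticks):
        # Fetcher pulls new buckets from peers; overflow beyond capacity is discarded.
        buffer += fetch_rate
        if buffer > capacity:
            dropped += buffer - capacity
            buffer = capacity
        # Processor validates as many buckets as it can this tick.
        done = min(buffer, process_rate)
        buffer -= done
        processed += done
    return processed, dropped


if __name__ == "__main__":
    print(simulate(fetch_rate=10, process_rate=2, capacity=5, ticks=100))  # (200, 797)
    print(simulate(fetch_rate=3, process_rate=2, capacity=5, ticks=100))   # (200, 97)
```

With the processor as the bottleneck, cutting the fetch rate from 10 to 3 leaves the processed total unchanged at 200 while the discarded buckets fall from 797 to 97. If each discarded bucket has to be sent again by some peer, that waste is somebody's upload bandwidth, which is the sense in which throttling may cost the starving node nothing.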

At least, that is how I would interpret what is happening, and why reducing that count will not harm those starving farmers.

Jacek · November 1, 2021, 1:50am

I guess when your code is bad, you grasp at straws. I don't agree with a lot of what is in that document. It is mostly just double talk.

Bones · November 1, 2021, 1:54am

Obviously an issue when I’m a good peer and I’ve lost nearly 50% of my connections.

I don't get the recommended fix of changing your IP either…
I watch the logs and often see a peer banned, then that same peer banned again, and again, so I don't think they're blocked for long anyway.

Edit.

On the bright side, finally hit 14 days to win…
But I’m confident that won’t last.

MontyBurns · November 1, 2021, 1:58am

I just doubled my peer count

Jacek · November 1, 2021, 2:04am

It may not matter how many peers you have, but rather how much strain those starving peers put on your upload bandwidth. I dropped that value from 80 to 20 and don't see any improvement. Most of the peers that connect to me receive 10x more data than they send me, so my upload is saturated again, meaning more stales.
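The point that peer count matters less than what each peer demands is simple arithmetic. A sketch, assuming hypothetical per-peer demands: synced peers use 20 kbps each, the same four starving peers each try to pull 2,000 kbps, and the uplink is about 5 Mbps (the figure Jacek gives for his own connection further down).

```python
def upload_utilization(peer_demands_kbps, uplink_kbps):
    """Fraction of the uplink requested by connected peers; above 1.0 means saturated."""
    return sum(peer_demands_kbps) / uplink_kbps


UPLINK_KBPS = 5_000   # ~5 Mbps uplink
SYNCED_KBPS = 20      # hypothetical demand of a peer that is already in sync
SYNCING_KBPS = 2_000  # hypothetical demand of a peer that is far behind
STARVING_PEERS = 4    # assume the same starving peers stay connected either way

if __name__ == "__main__":
    for total_peers in (80, 20):
        demands = ([SYNCING_KBPS] * STARVING_PEERS
                   + [SYNCED_KBPS] * (total_peers - STARVING_PEERS))
        print(total_peers, round(upload_utilization(demands, UPLINK_KBPS), 3))
```

Under these assumptions, going from 80 to 20 peers only moves the requested load from about 1.90x to about 1.66x the uplink, so the link stays saturated as long as the starving peers remain connected.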

That would mean the guy knew that limiting those connections would not work, and added that whack-a-mole option (which looks like a full-time job to me).

I don't think that was an official suggestion, as it doesn't really change much. Yes, in the peer network your node should learn to connect to “stable” nodes first, so it keeps a list of such nodes. When you are on such a list and you reboot, you will get back all the nodes that were hanging off of you. If you change your IP, you are just an infant node, so other peers will start testing you. However, in this case we have huge problems with those starving nodes, and they look for any connection, not just stable ones. So changing your IP is basically pointless.

Bones · November 1, 2021, 2:06am

I've got 350 Mb download, so I don't stress about bandwidth.
My upload is a shite 40, but nowhere near its limit.

Gave up on my moving parts. My poor old board had so many wrong connectors for my new case that I moved it back and slung an SSD in it. I'll look for a newer board at some point, but after a BIOS update my old PC will boot from the SSD, so I'm happy.
It got a much-needed 6-month dust-out.

Jacek · November 1, 2021, 2:11am

You are self-pooling, so you either don't have an equivalent of those stale partials, or they are not reported to you.

However, you may have enough upload speed to handle the traffic. I think mine is about 5 Mbps or so, which is easy to overwhelm.

Bones · November 1, 2021, 2:13am

I've installed farmr; I'm at 100% efficiency since I quashed my BSODs.

Still no win for 40-odd days, but much of that was spent with heavy issues and rebooting/reinstalling, and I had been extra lucky up to that point.

MontyBurns · November 1, 2021, 2:27am