Periodic bad response times

So my farmer is having issues.
I had some badbit / failbit errors, sorted those out.

My machine can run fine for 4 hours or more with no issues but then will suffer between 1 min up to >1 hour with poor response times.

No specific drive, all seem affected.

I do sometimes see 2 different errors, and will update this post once I’ve seen them again as I just cleared my logs.

IIRC:

  1. One is not being able to fetch a block from 1 specific peer.
  2. The other is a connection issue saying SSL error, unable to connect, peer may be running old code.

As it’s intermittent, and I’m farming on a dedicated PC that is not using more than 8% of CPU at most, sometimes 1%, I doubt it’s a PC issue. It does only have 8 GB of RAM, but that’s sitting at 56% used.

All drives connected over USB 3 with powered hub.

Any thoughts appreciated.

Edit. Thinking it may be my ISP, I’m getting some horrible drops in speed.

following…
Though in my case system restart worked.

Mine’s on and off, can be fine for days, but the last few have been poor.
My speed dropped to 8 Mbps earlier though, and my provider has been in the press for outages this week.

Just hoping it’s nothing worse, didn’t really want to stop plotting yet.

So, little update, hoping I fixed the issue, fingers crossed.
All my plots always showed fine, so naturally I assumed the usb devices were connected properly.
I looked under Devices and Printers and saw yellow exclamation marks on 3 devices (Orico 5-bay hubs).

Turned those off and re-added them 1 at a time; on re-adding the third, the other 2 dropped.

I’ve added a 2nd usb hub to my pc, connected those hubs to it, no exclamation marks.

So all is now looking well, just hoping it’s resolved.

Maybe 500 TB in hubs was just too much for my old PC over one USB port.
Maybe because, by using 5-bay docks, I was basically plugging hubs into hubs.

If that doesn’t work consider just switching to our farmer. I hate to say it as a magic solution but it really is.

I have been debating it, I just like solo, but if my issue’s not resolved I think I’ll have to do something.
Many of my plots are OG, so solo is simpler, especially for taxes if I end up owing any.

I have to ask, why do you consider these errors? They appear to be others’ problems, not yours.

Just that I’ve been getting really long response times when it shows looking up qualities.

I was just trying to provide as much info as possible as I couldn’t suss why.

Indeed, they shouldn’t have caused it, but they were the only errors I could see.

Haha “if”.

We do keep track of payouts and when they are made, so you can calculate the price at the time and the tax burden (and you can increase payouts so they are big and made infrequently, to reduce the math, and take them when the Chia price is low).

I don’t think waiting to withdraw helps.
If you’re gonna pay taxes, best to do it right, so it needs calculating as earned, not as withdrawn.

According to our terms it’s not yours until you withdraw, so it’s only earned when you receive it.

Running fine now for 12 hrs, no errors, 2 warnings (not related, that drive is critical and dying).
My isp has got my speed back as well, but I’m betting those yellow exclamation marks were the main cause.

So, all good for 18 hrs then looking up qualities starts playing up again…

It seems Windows 10 has 2 lots of settings that send things to sleep.

I checked in Computer Management and some of the hubs or controllers were still enabled to sleep.

Really hoping this is it this time.

Instructions in the link if anyone needs them.

So, my response times are still poor periodically.
Kinda out of ideas now.
Gonna set up chia on another PC and see how that goes.
At this point I can’t really warrant buying any more drives.
My PC should be able to handle the load, given it’s doing nothing else and I’m not transferring plots.

Kind of rambling, as I also don’t know what to chase.

If we assume that there are problems with those USB drives going to sleep from time to time, then maybe we should consider what the potential sources of that are:

  1. A USB drive can go to sleep on its own, i.e., not under OS control
    1.1. This would imply that maybe your USB hubs are acting up? Whether this is due to OS doing something that they don’t know how to react to, or just feel like doing that.
    1.1.a. Maybe you can first get a new USB hub or two, and split your setup into “most part of the existing setup” and move few HDs to that new hub(s)? After that monitor your logs, whether those HDs are also acting up.
    1.1.b. Did you check your Event Viewer for errors?
    1.1.c. That smartctl program can set up drives directly to not go to sleep, but in my case it cannot do it because the USB controller does not let that command pass through. One way would be to remove those HDs from their cases, apply the setting while they are connected over SATA, and put them back in their cases.
    1.1.d. I would first run “smartctl --scan” to get all the drives you have enumerated. Then, once you see those errors, run “smartctl -n idle /dev/sdX” on every drive and look at the responses. You should only see “Device is in ACTIVE or IDLE mode”, i.e., the drive is not in STANDBY, … If you see STANDBY, we know that we may still have power-down problems.
  2. The OS will put an HD to sleep if it feels that there were not enough requests to keep it up and running
    2.1. As we discussed, Chia should hit some drives every 10 secs, hopefully all drives within that 20 min default timeout. If that is not happening, then maybe there are two reasons:
    2.1.1. Maybe really small drives are kind of borderline hit, thus from time to time they are not asked to do anything for that 20 minutes
    2.2. For whatever reason, Chia is not getting enough requests, so those events are not happening every 10 seconds (e.g., network/ISP problems)
    2.3. Then again, when you followed that guide and disabled power-down, this case (due to the OS) should not happen. Also, I don’t have those settings done on my drives/hubs, and I don’t have those issues. I would assume that this is the case with the majority of us (not having it done), so that kind of also excludes the OS being the culprit.
    2.4. Running that smartctl program as outlined above would make sure that there are no power-downs, though.
  3. Maybe the USB port on your mb is acting up? Maybe you could buy a PCIe-to-USB card, and try to see whether that would help?
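
The smartctl check from 1.1.d can be scripted so it is quick to run the moment slow lookups show up. This is only a rough sketch: it assumes smartctl is on the PATH, reuses the exact command from the list above, and simply searches the output for keywords. Only the “ACTIVE or IDLE” wording comes from this thread; the other sample wording and all function names are made up for illustration, so treat an "unknown" result (e.g., a USB bridge that blocks the command) as inconclusive.

```python
import subprocess


def parse_scan_output(scan_text):
    """Return the device names from `smartctl --scan` output (first token per line)."""
    devices = []
    for line in scan_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            devices.append(line.split()[0])
    return devices


def classify_power_state(smartctl_output):
    """Keyword-based guess at the drive state; the exact wording is an assumption."""
    text = smartctl_output.upper()
    if "STANDBY" in text or "SLEEP" in text:
        return "low-power"
    if "ACTIVE" in text or "IDLE" in text:
        return "awake"
    return "unknown"


def check_all_drives(power_args=("-n", "idle")):
    """Run the check from 1.1.d on every enumerated drive (needs smartctl installed)."""
    scan = subprocess.run(["smartctl", "--scan"], capture_output=True, text=True)
    results = {}
    for dev in parse_scan_output(scan.stdout):
        out = subprocess.run(["smartctl", *power_args, dev],
                             capture_output=True, text=True)
        results[dev] = classify_power_state(out.stdout + out.stderr)
    return results
```

Calling check_all_drives() from an elevated prompt returns a dict of device name to state; any "low-power" entry during a bad spell would point back at power-down.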

I mostly reiterated what was said before, just to have it in one place. Maybe someone will add more cases there.

Not sure what else we could point to (the PSU, or a power source that is not that clean, which would leave those drives/hubs running borderline). Maybe you can change the wall socket that you use to power those HDs. Maybe you could push a big file between your computers when you see errors, to exclude any Ethernet-related issues.

When you use a different computer, basically the only thing that changes is the motherboard / USB port. The rest is more or less the same. Although, one more difference could be that you have something loaded on your computer (maybe this one has something old that wants to put its fingers in USB settings).

Thanks for your input.
I’ve got spare hubs and other PCs, so my first thing will be trying those.
I just don’t get how I can go 18 hrs without an issue and then it pops up.
I’ll try splitting the drives over a couple of PCs as well.

I pulled my bad drive today hoping that would help, but if anything it looks worse, getting some responses in the 60 to 70 second range.

Trying not to stress.
But it’s a PITA, as I think it’s OK and then it fecks up again.

Edit.
My smallest drive is 12 TB, so no small disks other than the O/S drive.
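
For what it’s worth, a back-of-envelope check on the “small drive not getting hit for 20 minutes” idea. This is a sketch under assumptions that are not from this thread: a plot filter of 1 in 512, a new challenge roughly every 9.375 seconds, a drive only being read when one of its plots passes the filter, and about 110 k32 plots on a 12 TB drive.

```python
def prob_drive_idle(plots_on_drive, window_seconds=20 * 60,
                    filter_denominator=512, seconds_per_challenge=9.375):
    """Chance that no plot on the drive passes the filter during the whole window.

    All default values are assumptions about Chia's parameters, not measured data.
    """
    challenges = int(window_seconds // seconds_per_challenge)
    # Probability that none of the drive's plots passes for one challenge.
    p_no_hit_once = (1 - 1 / filter_denominator) ** plots_on_drive
    # Challenges are treated as independent, so multiply across the window.
    return p_no_hit_once ** challenges


# Assumed ~110 plots on a 12 TB drive versus a drive holding a single plot.
print(prob_drive_idle(110))
print(prob_drive_idle(1))
```

Under those assumptions a full 12 TB drive essentially never sits untouched for 20 minutes, while a drive with only a plot or two often would, so idle timeouts alone look like an unlikely explanation here.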

If some component is acting up (e.g., partially bad hardware), that is usually the behavior - everything is well, and then crap happens. That can also be a result of power fluctuation (say your neighbor is using his microwave, or some heavy duty pump).

Actually, that may also point to some software that is degrading with time (e.g., memory leaks). Maybe you could change your Chia version?

I would check out that smartctl, as it will identify any power-downs, and those should not be there.

Lastly, when you see those errors, maybe you can map them to your USB hubs. Maybe that will point to a common hub that is causing that (this is what I suggested with that new hub at the root, to get a new clean branch). I am kind of leaning toward this being the problem.
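
One way to do that mapping is to count the slow-lookup warnings per hub. A minimal sketch, assuming the harvester warning in debug.log reads “Looking up qualities on <plot path> took: <seconds>. …” (wording from memory, so check it against your own log) and that you fill in the drive-letter-to-hub table by hand; the hub names and the function name here are hypothetical.

```python
import ntpath
import re
from collections import Counter

# Assumed shape of the warning line; adjust the pattern if your log differs.
SLOW_LOOKUP = re.compile(
    r"Looking up qualities on (?P<path>.+?) took: (?P<secs>\d+(?:\.\d+)?)"
)


def slow_lookups_by_hub(log_lines, drive_to_hub, threshold=5.0):
    """Count lookups slower than `threshold` seconds, grouped by hub."""
    counts = Counter()
    for line in log_lines:
        match = SLOW_LOOKUP.search(line)
        if not match or float(match.group("secs")) < threshold:
            continue
        # ntpath handles Windows paths like E:\plots\x.plot on any platform.
        drive = ntpath.splitdrive(match.group("path"))[0].upper()
        counts[drive_to_hub.get(drive, "unmapped " + drive)] += 1
    return counts


# Hypothetical mapping: which hub each drive letter hangs off.
example_map = {"E:": "hub-1", "F:": "hub-1", "G:": "hub-2"}
```

Feeding it the lines of debug.log together with your own mapping gives a per-hub tally; if one hub collects most of the slow lookups, that is the branch to swap out first.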

Well, it’s late and I’ve got the ISP man coming tomorrow, as every time I think that’s fixed it plays up, which isn’t helping analysis.

For tonight I’ve completely removed 1 hub, I know it had 1 bad port.
All devices seem fine on the replacement, which wasn’t the case with the first one I used, and they are identical hubs.

As always hoping it’s fixed so I can stop messing with it.

If you’ve tried everything and they’re pool plots, you could just use our farmer and there is a 90% chance the problem is solved. We wrote everything new and solved almost all the issues with the original farmer.

I’m convinced it’s a hardware error on my end.
If all else fails I’ll consider it, we’ll see.