Storahci.sys UASPstor.sys - problem with storage drivers?

Voodoo · June 15, 2022, 11:15am

So I’ve been getting these errors in the System event viewer (Win10) for these two drivers.
They coincide with having an entry in my Chia logs “warning lookup quality on plot xx=too long”
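A rough way to see whether those warnings cluster on particular drives is to tally them per drive letter. This is only a sketch: the regex assumes the harvester warning reads "Looking up qualities on <plot path> took: <N> seconds", which is not quoted verbatim in this thread, and `slow_lookups_by_drive` is a made-up helper name. Adjust the pattern to whatever your own log actually prints.

```python
import re
from collections import Counter
from pathlib import PureWindowsPath

# ASSUMPTION: the warning line contains
# "Looking up qualities on <plot path> took: <N> seconds".
# Change this pattern if your Chia log words it differently.
SLOW_LOOKUP = re.compile(
    r"Looking up qualities on (?P<plot>.+?) took: (?P<secs>\d+(?:\.\d+)?) seconds"
)


def slow_lookups_by_drive(lines):
    """Count slow-lookup warnings per Windows drive of the plot path."""
    counts = Counter()
    for line in lines:
        match = SLOW_LOOKUP.search(line)
        if match:
            # PureWindowsPath parses "E:\plots\x.plot" on any OS.
            drive = PureWindowsPath(match.group("plot")).drive or "(unknown)"
            counts[drive] += 1
    return counts
```

Feeding it the open log file (any iterable of lines works) and printing `counts.most_common()` shows which drive, if any, collects most of the warnings. Warnings spread evenly across drives would point more toward a controller, driver, or shared hub than toward one disk.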

The error in the system viewer is:
Reset to device, \device\raidport1\, was issued (each time for a different raidport).

Little background:
I’m using a server mainboard from ASRock: EP2C602-4L/D16

It has three storage controllers:

Intel (c602 chipset) 4 port SAS controller labeled “SCU” ports on the mainboard. (sata2).
Intel (c602 chipset) 6 port Sata controller 2x sata3 + 4x sata2
Marvell SE9230: 4 x Sata3

Now, after a very long time, I have finally managed to get them all working. It turned out that the Intel RST driver for the SAS controller was not included in the Chipset Driver package for some reason (and/or was not working under Win10).

Installing the Intel driver set for the C602 chipset causes random freezes, so I installed just the RST driver, and this successfully installed the SAS controller without completely breaking the OS.

But now I have “lookup time” errors popping up in Chia that I used to not have at all. These seem to be caused by a driver conflict or something.

Any thoughts?

drhicom · June 15, 2022, 12:07pm

Windows 10 Pro 64 Bit Hard Freezes Randomly - Microsoft Community

Just in case anyone else stumbles across this, or this is useful to anyone: I was able to fix my crashes after 3 years of trying.

None of the above worked, but what did work was a free program called Snappy Driver:

https://www.snappy-driver-installer.org/download/

This scanned my PC and found 124 drivers that were either out of date or needing installing. After this downloaded and installed them all I no longer had any crashes. 100% fixed. Why does Windows 10 need this and Windows 7 doesn’t, no idea, but it works!

ASRock BIOS EP2C602-4L/D16 1.70 (touslesdrivers.com)

Worth a shot…

WolfGT · June 15, 2022, 3:11pm

Two possibilities (off the top of my head). If you don’t remember recently changing anything (adding software, changing drivers, adding additional hardware):

Windows updates installed something that is now conflicting with the driver/device. If this is the case, I would be interested to know if the driver installer @drhicom linked helps. Sounds like a nice product if it does its job.
Hardware trouble. There is a possibility (strong possibility) that what you are seeing is a precursor to controller failure. There are different ways to run tests. Am I reading your explanation correctly by thinking you do not have the RST software installed, just the driver? The software is not really that robust but it does show current status and if there are issues may give you a clue on what they are.

There is the “Intel Memory and Storage Tool GUI” that has diagnostics in it. (if you don’t want the GUI, there is a CLI version on this page also.)

miguell · June 15, 2022, 3:24pm

I also had crashes and other issues from missing/bad drivers, now I just use Linux and avoid all those and other issues.

I believe this is the correct link to snappy: https://sdi-tool.org

Voodoo · June 15, 2022, 3:27pm

Cool, thanks, I will give that a try. Normally I am very cautious about running any such tools on my farmer, but since this comes recommended I will give it a try. Also good to see it’s an open source project.

Ah yes, I can also give that a try. There might be a problem, since this computer has seen its fair share of use.
I actually do have some version of the RST software installed, and it shows no errors on the disks. But this version does not give a lot of info anyway.

Voodoo · June 15, 2022, 3:29pm

Not sure, I just saw someone on reddit mention that you want the “origin” version, because the others are full of malware/bloatware.
But the website @drhicom linked seems to have been mostly moved here:

I was on Linux for a while but it cost me so much extra time and effort to do stuff that I can do blindly on Windows that I gave up. Just don’t have the time to keep messing with it.

chiameh · June 15, 2022, 3:48pm

Usually a controller reset is indicative of a failing drive or power issue.

Try stress testing each drive one at a time and see if the resets always occur for the same drive. You might not see disk/partition issues logged yet because the controller is resetting due to some kind of hang.
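For the one-drive-at-a-time idea, a minimal read-timing sketch in Python could look like this. It is not a full stress test: it only reads one large file (a plot, say) from start to finish and reports how long that took. `timed_read` is a hypothetical helper name, and the OS file cache can make a repeat read of the same file look much faster than the disk really is.

```python
import time


def timed_read(path, chunk_size=1024 * 1024):
    """Read a file start to finish; return (bytes_read, seconds_elapsed)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 skips Python's own buffer; the OS cache still applies.
    with open(path, "rb", buffering=0) as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:  # empty read means end of file
                break
            total += len(chunk)
    return total, time.perf_counter() - start
```

Running it against one file per drive while watching the event viewer should show whether the resets follow a particular drive. A drive whose read time is far out of line with its siblings is a good first suspect.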

miguell · June 15, 2022, 3:50pm

My link is well recommended, even on other projects, but who knows where it is safe to download this anyway.

Voodoo · June 15, 2022, 4:04pm

Yeah, driver downloads have a well-deserved bad rep; it pays to be cautious with them.

Found a few articles saying they struggle to find any real difference between SDI and SDIO and conclude that both are fine and malware free.

Voodoo · June 15, 2022, 4:16pm

Another thing I’m thinking of is that the chipset may be overheating. Could that be an issue?

HW monitor seems to only give the temps for the Nuvoton I/O chip and not the C602 chipset. It’s a very compact case with quite a lot of stuff in there.

chiameh · June 15, 2022, 5:20pm

Excess heat can certainly cause all kinds of weird issues, so I wouldn’t discount that possibility.

If things got too hot in there, maybe it affected communication with the drive(s), tricking the driver into a reset condition.

Historically, when I’ve seen host bus resets with Adaptec raid cards, they do it when a disk read/write request hangs for too long (because of a failing drive), and so the controller resets itself or a port to try to get things back in order. Of course, with a chronically failing drive, the resets will continue endlessly until the drive is replaced.

Also, these are “cheap” fakeraid controllers, so they may not be as resilient as an enterprise hardware raid controller.

Voodoo · June 15, 2022, 6:33pm

The thing is that the errors seem to occur across a bunch of drives, so that’s why I lean toward either a controller or a driver issue.
Although a drive failing would be better than the controller failing: the whole reason I bought this thing is that it came with 14 SATA ports.

In any case, thanks for the info. I’m gonna start trying some of these things and see if I can figure it out.
Worst case I just add another HBA or SATA expander, but well, that costs money again.

chiameh · June 15, 2022, 8:34pm

Totally, and the behavior and messaging is going to be controller and driver dependent, so things may not always be what they seem.

I have seen a single drive failure cause an Adaptec RAID card to request a full host bus reset. So in this case, one drive failure causes the whole card to reset. IIRC though, the OS could still see and talk to the other drives attached to the controller.

I guess what I’m saying is, the messages may or may not be helpful in figuring out the problem. The driver might reset the whole controller over a single drive failure, or maybe the entire controller is failing and is trying to reset each disk to get back online.

Good luck finding the issue! Please do post back if you find a smoking gun.

chiameh · June 15, 2022, 8:38pm

Also, I’ve bought tons of these from Unix Surplus over the years for stuff. They’re good. Used to be ~$85 USD a couple years ago

You can plug 16 SATA drives right in (they will need power from somewhere though). Maybe this is easier/better than messing around with 3 different onboard controllers.

Voodoo · June 16, 2022, 8:30pm

Well I’m not declaring victory yet, but it seems promising.

I used SDIO to update a bunch of drivers → caused random system freeze, luckily I made a restore point.

2nd time I was a bit more selective and updated the USB drivers first, then the Intel AHCI controller and finally the Marvell controller.

Now when I start Chia I don’t see any lookup time warnings. Before, when starting Chia, it would spit out a whole bunch of those right away.

Will report back after it’s been running a while.

If this turns out to be a hopeless course I will go that way, but right now I’m still in “I want to win this fight” mode. There is already a Dell PERC HBA in the system, so I could just add a SAS expander to that if need be.

Edit: I also noticed that the chipset gets ridiculously hot, so I’m gonna think of a way to MacGyver a fan onto it as well, just to be sure.

Voodoo · June 17, 2022, 5:47pm

Well this did the trick, thanks for the tip!

It has been running for 24 hours now with no errors in either the event viewer or the Chia log. In fact, my Chia log is cleaner than it has ever been.

Voodoo · July 6, 2022, 7:40am

Turns out you were right after all.

At first updating the drivers seemed to work, but then the problems started to return.

Later I switched to a completely different farmer and it was still showing problems.

After digging through the event viewer I managed to identify the offending disk. This was not as obvious as you’d want, but after a few looks it stood out. I removed it and now the problems are gone completely. It was one of three 14TB Expansion desktops.

Weird thing is, though, that scandisk didn’t find any problems with it. Crystal disk shows good health as well.
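For anyone doing the same digging, here is a small Python sketch that tallies reset events per port from an event log exported to a text or CSV file. It assumes the message text matches the one quoted at the top of this thread ("Reset to device, \device\raidportN, was issued"); `resets_by_port` is a made-up helper name. Mapping a port back to a physical disk is still a separate manual step.

```python
import re
from collections import Counter

# Matches the message quoted in this thread, in any letter case, e.g.
# "Reset to device, \Device\RaidPort1, was issued."
RESET = re.compile(r"Reset to device, \\device\\(raidport\d+)", re.IGNORECASE)


def resets_by_port(lines):
    """Count 'Reset to device' messages per raidport name (lowercased)."""
    counts = Counter()
    for line in lines:
        match = RESET.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts
```

Passing the exported file's lines to `resets_by_port` and sorting the result by count makes a port that resets far more often than the others stand out without scrolling through every entry.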

I am now replotting it after a format and will see if it wants to play ball again.

drhicom · July 6, 2022, 11:34am

I did have issues with a disk months ago, and after I formatted it and replotted, all was fine. That’s why, from that point on, whenever I got a new USB disk I would always format it first.

Voodoo · November 9, 2022, 2:11pm

I thought I’d leave the answer here for anyone who comes by.

It was the 7-port USB 2.0 hub I was using, in combination with some kind of scheduled Windows task, likely something storage related (I wasn’t able to identify it). I took out the hub and all the problems are gone now.
I now use the hub only for keyboard and mouse and have had no trouble for months.

Bones · November 10, 2022, 5:51am

So my issue solved your problem? Great.