So I’ve been getting these errors in the system Event Viewer (Win10) for these two drivers.
They coincide with having an entry in my Chia logs “warning lookup quality on plot xx=too long”
The error in the system viewer is: Reset to device, \device\raidport1\, was issued (each time for a different raidport).
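If you want to check whether the resets cluster on one port, you can tally them per raidport. A rough sketch in Python, assuming you have copied or exported the event messages from Event Viewer into a list of strings. The function name `count_resets_by_port` and the sample messages are made up for illustration; only the message wording is taken from the error quoted above.

```python
import re
from collections import Counter

# Matches the port number in messages shaped like the one quoted above.
# Case-insensitive, because the casing may differ between exports.
RAIDPORT_RE = re.compile(r"reset to device,\s*\\device\\raidport(\d+)", re.IGNORECASE)


def count_resets_by_port(messages):
    """Return a Counter mapping raidport number -> number of reset events."""
    counts = Counter()
    for message in messages:
        match = RAIDPORT_RE.search(message)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample data, not taken from a real log.
    sample = [
        r"Reset to device, \Device\RaidPort1, was issued.",
        r"Reset to device, \Device\RaidPort3, was issued.",
        r"Reset to device, \Device\RaidPort1, was issued.",
        "Some unrelated event text.",
    ]
    for port, n in count_resets_by_port(sample).most_common():
        print(f"raidport{port}: {n} reset(s)")
```

If one port clearly dominates, that points at whatever hangs off that port; an even spread points more toward the controller, driver, or something shared.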
I’m using a server mainboard from asrock: ep2c602-4l/d16
It has three storage controllers:
Intel (C602 chipset) 4-port SAS controller, labeled “SCU” ports on the mainboard (SATA2).
Intel (C602 chipset) 6-port SATA controller: 2x SATA3 + 4x SATA2
Marvell SE9230: 4x SATA3
Now after a very long time I have finally managed to get them all working. It turned out that the Intel RST driver for the SAS controller was not included in the chipset driver package for some reason (and/or does not work under Win10).
Installing the Intel driver set for the C602 chipset causes random freezes, so I installed just the RST driver, and this successfully installed the SAS controller without completely breaking the OS.
But now I have “lookup time” errors popping up in Chia that I used to not have at all. These seem to be caused by a driver conflict or something.
This scanned my PC and found 124 drivers that were either out of date or needed installing. After it downloaded and installed them all, I no longer had any crashes. 100% fixed. Why Windows 10 needs this and Windows 7 doesn’t, no idea, but it works!
Two possibilities (off the top of my head). If you don’t remember recently changing anything (adding software, changing drivers, adding additional hardware):
Windows updates installed something that is now conflicting with the driver/device. If this is the case, I would be interested to know if the driver installer @drhicom linked helps. Sounds like a nice product if it does its job.
Hardware trouble. There is a possibility (strong possibility) that what you are seeing is a precursor to controller failure. There are different ways to run tests. Am I reading your explanation correctly by thinking you do not have the RST software installed, just the driver? The software is not really that robust but it does show current status and if there are issues may give you a clue on what they are.
There is the “Intel Memory and Storage Tool GUI” that has diagnostics in it. (if you don’t want the GUI, there is a CLI version on this page also.)
Cool, thanks, I will give that a try. Normally I am very cautious about running any such tools on my farmer, but since this comes recommended I will give it a try. Also good to see it’s an open source project.
Ah yes, I can also give that a try. There might be a problem, since this computer has seen its fair share of use.
I actually do have some version of the RST software installed, and it shows no errors on the disks. But this version does not give a lot of info anyway.
Not sure, I just saw a mention somewhere on Reddit that you want the “origin” version, because the others are full of malware/bloatware.
But the website @drhicom linked seems to have been mostly moved here:
I was on Linux for a while but it cost me so much extra time and effort to do stuff that I can do blindly on Windows that I gave up. Just don’t have the time to keep messing with it.
Usually a controller reset is indicative of a failing drive or power issue.
Try stress testing each drive one at a time and see if the resets always occur for the same drive. You might not see disk/partition issues logged yet because the controller is resetting due to some kind of hang.
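One way to do the one-drive-at-a-time test is to read a big file (for example a plot) from a single drive and time each chunk. This is only a sketch, not a real stress tool: `timed_read`, the chunk size, and the one-second threshold are my own illustrative choices, and the OS file cache can make repeated runs look faster than the disk really is.

```python
import sys
import time


def timed_read(path, chunk_size=4 * 1024 * 1024, slow_threshold_s=1.0):
    """Read `path` sequentially and time every chunk.

    Returns (total_bytes, total_seconds, slow_chunks), where slow_chunks is a
    list of (byte_offset, seconds) for chunks slower than slow_threshold_s.
    """
    total_bytes = 0
    slow_chunks = []
    start = time.perf_counter()
    # buffering=0 skips Python's own buffer; the OS cache may still be used.
    with open(path, "rb", buffering=0) as f:
        while True:
            t0 = time.perf_counter()
            chunk = f.read(chunk_size)
            elapsed = time.perf_counter() - t0
            if not chunk:
                break
            if elapsed > slow_threshold_s:
                slow_chunks.append((total_bytes, elapsed))
            total_bytes += len(chunk)
    return total_bytes, time.perf_counter() - start, slow_chunks


if __name__ == "__main__":
    # Pass one or more file paths, ideally all on the same drive per run.
    for path in sys.argv[1:]:
        size, seconds, slow = timed_read(path)
        print(f"{path}: {size} bytes in {seconds:.1f} s, {len(slow)} slow chunk(s)")
        for offset, took in slow:
            print(f"  offset {offset}: {took:.2f} s")
```

Run it against one drive, then check Event Viewer for new resets before moving on to the next drive.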
Excess heat can certainly cause all kinds of weird issues, so I wouldn’t discount that possibility.
If things got too hot in there, maybe it affected communication with the drive(s), tricking the driver into a reset condition.
Historically, when I’ve seen host bus resets with Adaptec raid cards, they do it when a disk read/write request hangs for too long (because of a failing drive), and so the controller resets itself or a port to try to get things back in order. Of course, with a chronically failing drive, the resets will continue endlessly until the drive is replaced.
Also, these are “cheap” fakeraid controllers, so they may not be as resilient as an enterprise hardware raid controller.
The thing is that the errors seem to occur across a bunch of drives, so that’s why I lean toward either controller or driver issue.
Although a drive failing would be better than the controller failing, the whole reason I bought this thing is that it came with 14 SATA ports.
In any case, thanks for the info. I’m gonna start trying some of these things and see if I can figure it out.
Worst case I just add another HBA or SATA expander, but well, that costs money again.
Totally, and the behavior and messaging is going to be controller and driver dependent, so things may not always be what they seem.
I have seen a single drive failure cause an Adaptec RAID card to request a full host bus reset. So in this case, one drive failure causes the whole card to reset. IIRC though, the OS could still see and talk to the other drives attached to the controller.
I guess what I’m saying is, the messages may or may not be helpful in figuring out the problem. The driver might reset the whole controller over a single drive failure, or maybe the entire controller is failing and is trying to reset each disk to get back online.
Good luck finding the issue! Please do post back if you find a smoking gun.
Well I’m not declaring victory yet, but it seems promising.
I used SDIO to update a bunch of drivers → caused random system freeze, luckily I made a restore point.
2nd time I was a bit more selective and updated the USB drivers first, then the Intel AHCI controller and finally the Marvell controller.
Now when I start Chia I don’t see any lookup time warnings. Before, when starting Chia, it would spit out a whole bunch of those right away.
Will report back after it’s been running a while.
If this turns out to be a hopeless course I will go that way, but right now I’m still in “I want to win this fight” mode. There is already a Dell PERC HBA in the system, so I could just add a SAS expander to that if need be.
Edit: I also noticed that the chipset gets ridiculously hot, so I’m gonna think of a way to MacGyver a fan onto it as well, just to be sure.
At first updating the drivers seemed to work, but then the problems started to return.
Later I switched to a completely different farmer and it was still showing problems.
After digging through Event Viewer I managed to identify the offending disk. This was not as obvious as you would want, but after a few looks it stood out. I removed it and now the problems are gone completely. It was one of three 14TB Expansion desktop drives.
The weird thing is that scandisk didn’t find any problems with it. Crystal disk shows good health as well.
I am now replotting it after a format and will see if it wants to play ball again.
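For anyone retracing the Event Viewer digging: one way to narrow things down is to check which Chia warnings fall close in time to a reset event. A minimal sketch, assuming you have already turned both sets of timestamps into `datetime` objects (how to parse them depends on your log formats, which I am not assuming here). `warnings_near_resets` and the 30-second window are hypothetical choices.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def warnings_near_resets(warning_times, reset_times, window=timedelta(seconds=30)):
    """Return the warnings that fall within `window` of any reset event."""
    resets = sorted(reset_times)
    matched = []
    for w in warning_times:
        # Index of the first reset at or after this warning.
        i = bisect_left(resets, w)
        # Only the nearest reset on each side needs checking.
        neighbours = resets[max(i - 1, 0):i + 1]
        if any(abs(w - r) <= window for r in neighbours):
            matched.append(w)
    return matched


if __name__ == "__main__":
    # Made-up timestamps, purely to show the call.
    base = datetime(2021, 1, 1, 12, 0, 0)
    resets = [base, base + timedelta(minutes=10)]
    warnings = [base + timedelta(seconds=5), base + timedelta(minutes=5)]
    for w in warnings_near_resets(warnings, resets):
        print("warning close to a reset:", w.isoformat())
```

Warnings that line up with resets are worth chasing through the event details; warnings that never line up probably have a different cause.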
I thought I’d leave the answer here for anyone who comes by.
It was the 7-port USB 2.0 hub I was using, in combination with some kind of scheduled Windows task, likely something storage related (I wasn’t able to identify it). I took out the hub and all problems are gone now.
I now use the hub only for keyboard and mouse and have had no trouble for months.