Unstable system with SATA risers

Overheating is a problem. I bought some aluminium tools to help cool the chip. I have 5 copy jobs running from the plotter to 5 disks, 75-80 each, maxing out at 500 MB/s.

Ehh, I don't know.

I knew I was somewhat close to my office room’s electrical capacity, but it does not seem to be related to brownouts. I have not had a longer outage yet, and the system itself keeps running fine. The problem happens while the system is running (no system failure): when I run a copy job, it breaks after some time, and I can’t use the drive at that point. I can unmount the drive, but upon remounting I find that its block is corrupted. The next restart will fail if it’s trying to mount the corrupted drive.

It still happened on a Sunday, and after working hours, when no big machines on campus could be causing problems. I also do not really believe that it’s related to my PSU. I have a 1600 W PSU with two 12 V rails and plenty of power on those rails.
Last time I did a large copy job, I specifically cut all mining activities so there should be plenty of headroom with electricity.
I had the issues even with only 5 disks connected.

I think I had some success converting the full drives to read-only. But time will tell if they still corrupt.

So the remaining suspects for me are:

  • SATA expander is overheating
  • PCIe connection to the expander is loose
  • Expander is shit (but I have 4 of them, and others here are running them fine)
  • SATA cables are shit

@kuangzha may I ask you what you did to support the chip cooling?
I have made a relatively precise model of the expander card:

It appears that it will be hard to stick something on the main cooler itself. The fan sticks out a little.
I might get a different cooler but haven’t found one yet. Additionally, it might be helpful to glue some small aluminium heatsinks onto the smaller chips.
Lastly, I might be able to design and manufacture a different, larger cooler. But at this point, it’s probably cheaper to switch to different expanders altogether.

I have 1 of these cards, and was convinced it was causing a problem. In reality I had a bad power supply cable. Plots would get stuck, Windows Event viewer kept telling me that this or that drive had a problem, I’d run all sorts of disk utilities that found no issue, the problems would move from drive to drive. It was the power cable. Replaced the cable a few months ago and the card has been working flawlessly since.


Okay, the course of action is determined:

  1. Build a rig of my own design

  2. Put an additional PSU in the rig

It might take some time, but I’ll report back on how it went.

In the meantime I might also experiment with 5 V step-down converters.

Interesting finding:
I have some hard drives from my previous Windows setup which I still have to migrate. I had many issues with the file transfer process; it would often fail.

I have added a pause of two minutes between each plot copy.
The file copying starts at ~200 MB/s per drive (×2 because of two drives = ~400 MB/s).
At first I did not notice anything significant. But now, after 6 hours, I see that each plot is transferred more slowly than the one before. I might be wrong, but to me that indicates overheating and throttling of some component.

For testing, I just canceled after the last block, waited 2 minutes and restarted the job to see how it continues. I may increase the waiting time from 2 minutes to a higher value to see what happens.
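For anyone who wants to repeat this test, here is a minimal Python sketch of the procedure described above: copy the plots one at a time, pause between them, and log the throughput of each copy. It is not the script used in this thread; the function name `copy_with_timing` and the 120 s default pause are just illustrative choices. It forces the data to disk before stopping the clock so the page cache does not flatter the numbers.

```python
import os
import shutil
import time
from pathlib import Path


def copy_with_timing(sources, dest_dir, pause_s=120.0):
    """Copy files one by one and return (name, bytes, seconds, MB/s) per file.

    Hypothetical helper for illustration: a pause of `pause_s` seconds is
    inserted between copies, as in the test described in the post.
    """
    sources = [Path(s) for s in sources]
    dest_dir = Path(dest_dir)
    results = []
    for i, src in enumerate(sources):
        dst = dest_dir / src.name
        size = src.stat().st_size
        start = time.monotonic()
        shutil.copyfile(src, dst)
        # Flush the copied data to the device so the timing reflects the
        # disk, not just the operating system's write cache.
        with open(dst, "rb+") as f:
            os.fsync(f.fileno())
        elapsed = time.monotonic() - start
        rate = (size / 1e6) / elapsed if elapsed > 0 else float("inf")
        results.append((src.name, size, elapsed, rate))
        if i < len(sources) - 1:
            time.sleep(pause_s)  # let the hardware rest before the next plot
    return results


def print_report(results):
    """Print one line per copied file so a downward trend is easy to spot."""
    for name, size, elapsed, rate in results:
        print(f"{name}: {size / 1e9:.1f} GB in {elapsed:.0f} s = {rate:.0f} MB/s")
```

If the rate recovers after a longer pause or a restart of the job, that points toward something heating up or throttling; if it keeps falling regardless of pauses, the cause is more likely tied to how full the destination disk is.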


Are you running Windows or Linux?

This is normal. The platters of a hard drive spin at a constant rate, but it will be faster to write to an empty disk than to an almost-full one. When you first start filling a hard drive, you are writing to the outer sectors of the spinning platter, which move under the head faster than the sectors closer to the center of the platter. As the drive fills up, you get closer and closer to the center of the platter, and the transfer gets slower and slower. It isn’t uncommon to see large differences in transfer speed between the first copy and the last!
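To put rough numbers on that, here is a small Python sketch of the geometry. It assumes a constant spin rate, roughly constant bit density along a track, a uniform areal density, and a drive that fills from the outer edge inward. The two radii are illustrative guesses for a 3.5" platter, not values from any datasheet, and real drives use zoned recording, so treat the result as a ballpark only.

```python
import math

# Assumed, illustrative radii of the recording band on a 3.5" platter.
# These are NOT taken from a datasheet.
OUTER_RADIUS_MM = 46.0
INNER_RADIUS_MM = 20.0


def radius_at_fill(fill_fraction, r_outer=OUTER_RADIUS_MM, r_inner=INNER_RADIUS_MM):
    """Track radius being written once `fill_fraction` of the disk is used.

    Assumes the drive fills from the outside in with uniform areal density,
    so the used capacity is proportional to the ring area already covered.
    """
    if not 0.0 <= fill_fraction <= 1.0:
        raise ValueError("fill_fraction must be between 0 and 1")
    band_area = r_outer**2 - r_inner**2  # pi cancels out in the ratio
    return math.sqrt(r_outer**2 - fill_fraction * band_area)


def relative_throughput(fill_fraction, r_outer=OUTER_RADIUS_MM, r_inner=INNER_RADIUS_MM):
    """Sequential throughput relative to an empty disk.

    At constant RPM the linear speed under the head, and with it the
    data rate, scales with the track radius.
    """
    return radius_at_fill(fill_fraction, r_outer, r_inner) / r_outer


if __name__ == "__main__":
    for pct in (0, 25, 50, 75, 100):
        ratio = relative_throughput(pct / 100)
        print(f"{pct:3d}% full: about {ratio:.2f}x the empty-disk speed")
```

Under these assumptions the slowdown is gradual over the whole disk and does not reset when a copy job is restarted, which is one way to tell it apart from a thermal or throttling problem.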

That’s why we all need 20 TB SSDs


Ubuntu Server 20.04

@enderTown in theory you are right, but I do not think it would be noticeable to that extent (several percent per plot). I can’t fully confirm it yet, as this copy is going to another disk (3 plots left, so almost full), but I can tell more either by late evening or tomorrow, when transferring the rest to the drive shown in the post above.
[Image: copy times2]

I didn’t think it would be that noticeable either but I was surprised - especially on large drives, the transfer speed can drop as much as by half from the first plot to the last plot! Of course, normal people don’t notice this, it’s only us crazy farmers that are writing massive contiguous files lol

Hmm, still going down in a similar way. But I would not say it is the position on the platter. When I wait and fully restart the copy job, the speeds are back to normal again.

I do not think it’s cache related or something. Maybe something with robocopy. So back to the beginning:
PSU and / or bad connection