I just stumbled across something pretty cool and I don’t see it mentioned anywhere else on the forum after a search.
I’ve got several machines plotting, and they all transfer their plots over the gigabit network to the storage server. A transfer takes about 15 minutes at 1 Gbit/s if only one machine is copying, but if all of them start copying at once, it can take over an hour for each one to finish.
My first thought was to upgrade to 10 Gbit, but that involves a lot of hardware upgrades and the gear is still quite expensive compared to 1 Gbit gear. So I did some research on “NIC teaming,” where you can use multiple network adapters to share the load. I purchased a cheap USB 1 Gbit adapter from Amazon and installed it today.
I used the PowerShell command New-NetSwitchTeam to team the new USB NIC with the existing NIC on the storage server. It seemed to succeed! But I noticed it did not help copy times at all. This is where I learned that my 1 Gbit network switch is just a dumb “unmanaged” switch and doesn’t support “link aggregation,” which apparently needs to be supported by both the client and the switch. So I’m back to upgrading my switch.
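For reference, this is roughly what I ran (the team and adapter names here are from my machine; list yours with Get-NetAdapter first):

```powershell
# Team the onboard NIC with the new USB NIC (adapter names are from my
# machine; check yours with Get-NetAdapter).
New-NetSwitchTeam -Name "StorageTeam" -TeamMembers "Ethernet", "Ethernet 2"

# Confirm the team was created.
Get-NetSwitchTeam
```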
But not so fast! Dejected, I removed the NIC team with Remove-NetSwitchTeam, but I didn’t actually unplug the new USB adapter. To my surprise, I watched both adapters start sharing the incoming file transfers! I saw peak speeds of about 1.7 Gbit/s. What??!?
It turns out Windows 10 uses SMB3, which has a built-in “multichannel” feature. Apparently, SMB3 clients and servers tell each other which NICs they have available and spread file transfers across them to increase speed!
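If you want to see multichannel in action, PowerShell will show it directly (a quick sketch; run this on the client while a copy is in flight):

```powershell
# Each row is one active SMB channel; with two NICs in play you should
# see more than one channel to the same server.
Get-SmbMultichannelConnection

# Multichannel is on by default since Windows 8, but you can confirm:
Get-SmbClientConfiguration | Select-Object EnableMultiChannel

# The local interfaces the SMB client is willing to advertise.
Get-SmbClientNetworkInterface
```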
I might need to buy a few more of those cheap USB adapters and see how far we can push it!
Is this a win for the Windows side or is SMB3 multichannel supported on Linux too? I’m a Linux newb…
Hmm. It was my understanding that this sort of teaming of multiple 1 gigabit ethernet ports only helps when you are sending data to multiple computers at the same time.
But if you are observing copy speeds over 111 MB/sec (the 1 gigabit limit), then that’s proof that it is working… perhaps you can share a screenshot of the copy?
Of course! Here’s one I just snapped as two of my plotters were each copying a plot to the storage server. Notice the “Ethernet” and “Ethernet 2” speeds. The X: drive (plot storage drive) is at 100% activity, so I’m likely limited by that drive at this point. But definitely over 111 MB/sec!
Btw, huge fan since your early blog days, and a shameless plug of my SO profile in case somebody wants to give me a vote to get me the 4 more points I need to reach 4k! User Josh Painter - Stack Overflow
The other interesting thing is that the load is shared even when a single copy is taking place. In that case, both NICs show ~480–500 Mbps. On the plotter (client), I see the full bandwidth of its NIC being used (~950 Mbps+). I think this means that if I buy another cheap USB NIC for the plotter, both sides will share ~2 Gbit/s? Again, I’ll be limited by the server drive at that point, but then it might make sense to create an SSD “landing” drive on the server to ingest faster!
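Before buying another adapter, it should be possible to sanity-check the server side the same way (a sketch, assuming the storage server is the Windows 10 Pro box):

```powershell
# On the storage server: multichannel must be enabled (it is by default)...
Get-SmbServerConfiguration | Select-Object EnableMultiChannel

# ...and both NICs should show up as advertised server interfaces.
Get-SmbServerNetworkInterface
```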
Assuming both ends of a link are configured with a bonding protocol (e.g. LACP), they’ll negotiate a higher aggregate speed, but packets are distributed across the links not round-robin (which we don’t want here, since that would cause out-of-order delivery within streams) but by a hash that selects the link.
It’s the choice of the sender as to what algorithm to use… e.g. on this Cisco switch it’s configured with:
```
port-channel load-balance src-dst-mac
```
but on this Arista over here there’s a whole slew of options such as (picking two):
```
port-channel load-balance trident fields mac src-mac dst-mac
port-channel load-balance trident fields ip ip-tcp-udp-header
```
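To illustrate the consequence of hashing (a toy sketch, nothing like real switch silicon): because the link is chosen by hashing the flow’s addresses, a given flow is pinned to one link and can never exceed that single link’s bandwidth.

```powershell
# Toy flow-hashing, src-dst-mac style: the same address pair always maps
# to the same link index, so one stream never spans two links.
function Select-Link {
    param([string]$SrcMac, [string]$DstMac, [int]$LinkCount)
    $hash = 17
    foreach ($b in [Text.Encoding]::ASCII.GetBytes($SrcMac + $DstMac)) {
        $hash = ([long]$hash * 31 + $b) % 2147483647
    }
    return $hash % $LinkCount
}

# Returns the same index on every call: this flow is stuck on one link.
Select-Link "aa:bb:cc:00:00:01" "ff:ee:dd:00:00:02" 2
```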
On top of that, for this to have worked in the first place, the senders on both links would need to balance across them:

- Windows would need to balance packets across links when sending to the switch
- the switch would need to balance packets across links when sending to the receiver
For this to work, all the stars would have to align perfectly, unlike with SMB Multichannel.
This is a great way of getting around the 1 Gbps limitation; by doing the load balancing above the Data Link Layer and handling it in the Session Layer (no, I’m not getting into an argument about which layer that’s happening at) we shift the complexity away from the physical network.
So to summarize… if you plug two 1 gigabit ethernet connections on a Windows 10 machine into the same switch… it will “just work” and you’ll get 111 MB/sec × 2 copy speeds?
(Let’s assume for the sake of argument, and to simplify things, that the copy target already has a single 2.5 Gbps or 10 Gbps connection.)
Maybe not a full 2× because of the law of leaky abstractions, but surely close. At least that’s how it worked for me (even after I tried to outsmart it with PowerShell)!
Also, I assume this only works for SMB file copies. We aren’t getting free “real” TCP/IP link aggregation, like out to the Internet (or even internally). This only works for traffic using the SMB3 protocol over the internal LAN (maybe VPN? @Supermathie, please correct me if I’m wrong here). But luckily that’s exactly what we need!
And sorry, one more detail in case it isn’t clear: BOTH sides must support the SMB3 protocol. In my case, all my machines are Windows 10 Pro/Home, with one of the Pro machines acting as the file server. I’m guessing it would also “just work” with a recent version of Windows Server. I’d love to hear from a Linux guru whether there is a way to use SMB3 multichannel on Linux - I know it supports some version of SMB. I’m kicking around the idea of upgrading one of my plotters to Linux to learn more, and this would seal the deal!
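From the little research I’ve done (so treat this as a sketch to verify, not a guru answer): Samba has a multichannel option, and the Linux kernel’s SMB3 client can request extra channels at mount time.

```
# smb.conf on the Samba server (Samba 4.4+; it was marked experimental
# for a long time, so check your distro's Samba release notes).
[global]
    server multi channel support = yes
```

And on a Linux client (kernel 5.5+), the cifs mount options would be something like this (share path and username here are made up):

```
mount -t cifs //server/plots /mnt/plots -o username=me,multichannel,max_channels=4
```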