.db over smb or the like

jonesjr · April 8, 2022, 4:37am

Is it OK for performance to move my .db to an external location and share it over SMB? Any suggestions on a network-shared db?

DanRelfe · April 8, 2022, 7:26am

Never tried it, but I suspect performance might be poor and network packet rate high. You may find your node falling behind and unable to keep up with sync.

Give it a go, see what happens and let us know!

chiameh · April 8, 2022, 2:29pm

If you are sharing the database file from a Windows share and write to the database using a node running Windows, then technically, the SQLite locking mechanisms used in the Windows version are said to work over SMB (LockFile, LockFileEx, FlushFileBuffers).

If you do this, I would advise against mixing platforms: don’t have a Linux node write to the database if it is shared from Windows, or vice versa. You need to make sure whatever network filesystem protocol is being used supports all the necessary locking functionality and that the implementations are not buggy.
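One way to sanity-check the locking point before trusting a share is a small probe like the one below. It is only a sketch under some assumptions: it uses Python's standard sqlite3 module, it should be pointed at a scratch file on the share (never the live blockchain DB), and the function name and the lock_probe table are made up for illustration. One process takes an exclusive lock and a second process checks that it is refused. Two processes are used on purpose, since two connections inside one process may not exercise the filesystem's locks at all.

```python
import sqlite3
import subprocess
import sys

# Script run in a second process: try to take the same exclusive lock.
CHILD = """
import sqlite3, sys
conn = sqlite3.connect(sys.argv[1], timeout=0, isolation_level=None)
try:
    conn.execute("BEGIN EXCLUSIVE")
except sqlite3.OperationalError as exc:
    # 0 = refused because the file is locked, 2 = some other failure
    sys.exit(0 if "locked" in str(exc) else 2)
sys.exit(1)  # lock was granted although another process holds it
"""


def exclusive_lock_is_enforced(db_path):
    """Return True if a second process is refused an exclusive lock."""
    holder = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    try:
        holder.execute("CREATE TABLE IF NOT EXISTS lock_probe (id INTEGER)")
        holder.execute("BEGIN EXCLUSIVE")
        result = subprocess.run([sys.executable, "-c", CHILD, db_path])
        return result.returncode == 0
    finally:
        if holder.in_transaction:
            holder.execute("ROLLBACK")
        holder.close()
```

Both processes here run on the same client, so a True result only says that this one machine's view of the share honours the lock. It does not prove that two different machines (or a Windows and a Linux client) would agree with each other.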

Crank up your log level for a bit afterwards and look for warnings about slow validation times, and specifically for the slow-syncing warnings that mention keeping the blockchain DB on a fast drive.

Jacek · April 8, 2022, 4:40pm

It is quite common in bigger setups to have code reside on one box and the db on another. The resource requirements are different (the db side is mostly about caching, whereas the code is all about CPU horsepower). In those cases, though, “db connectors” are used (e.g., to alleviate the problems mentioned by @chiameh). (By “caching” I mean that the db can work with extra dedicated memory, not so much OS-level caching. I don’t know how good SQLite is at utilizing dedicated memory for caching.)

Also, when you look at the processing distribution / time spent, the code makes a simple one-line query, whereas the db side potentially goes through the whole db to extract the relevant data and returns just a few bytes. This implies that the network overhead may be much smaller than the data extraction part.

So, it really boils down to how performant the box that will hold that db is, as it can go both ways.

Finally, as you have stated, if not tested, we will not know, and whatever results are, those may not reflect what others may see with their setups.

chiameh · April 8, 2022, 4:47pm

The thing about sharing the SQLite DB over the network is, the node is reading and writing to the file over the network using a network protocol, rather than speaking directly to the local filesystem through native disk accesses.

Having a network device read/write a database file directly over a network share is a very different thing than having the network device write to the database over the network using the database system’s own network protocol (e.g. MySQL over TCP/IP). In the latter case, a database server is still manipulating files directly and not over a network filesystem protocol like SMB, CIFS, or NFS.

Whether or not the added overhead of the SMB protocol, in conjunction with Chia and SQLite, turns out to be a problem will be interesting to see.
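Until someone posts real numbers, a rough way to compare a local path against the share is to time small committed writes on each. The sketch below uses Python's standard sqlite3 module on a scratch database; the function name, the bench table, the 1 KiB payload and the paths in the usage comment are all assumptions for illustration, and it says nothing about how Chia itself batches its writes.

```python
import os
import sqlite3
import statistics
import time


def median_commit_latency(db_path, rounds=50):
    """Median seconds per small INSERT + COMMIT on the given path."""
    conn = sqlite3.connect(db_path)
    # Ask SQLite to sync on every commit so the storage path is exercised.
    conn.execute("PRAGMA synchronous = FULL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bench (id INTEGER PRIMARY KEY, payload BLOB)"
    )
    conn.commit()
    samples = []
    for _ in range(rounds):
        start = time.perf_counter()
        conn.execute("INSERT INTO bench (payload) VALUES (?)", (os.urandom(1024),))
        conn.commit()
        samples.append(time.perf_counter() - start)
    conn.close()
    return statistics.median(samples)


# Hypothetical usage, comparing a local scratch file with one on the share:
# print(median_commit_latency("/tmp/bench.db"))
# print(median_commit_latency("/mnt/smb_share/bench.db"))
```

A large gap between the two medians would point at per-commit round trips over the share rather than raw bandwidth.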

Jacek · April 8, 2022, 5:23pm

I stand corrected. (20 chars)

chiameh · April 8, 2022, 5:29pm

I wasn’t actually trying to correct anything you said, just pointing out some of the differences between DB writing over a network share vs talking to a remote database server that interacts directly with the filesystem.

I think your points are all valid. I worry less about network overhead here and more about the overhead the sharing protocol introduces (and whether certain required facilities like flock are available) over said protocol.

The good news is, you only have one DB writer, so there aren’t numerous network clients contending for exclusive write access over the remote DB file.

Jacek · April 8, 2022, 5:59pm

No worries, all is good. We just sometimes get stuck in our silos and miss other obvious things. As you noted, I am used to working with MySQL and db connectors, so that is what is on my mind. I just completely blanked on the fact that in this case it is not the SQLite engine but the raw data that is being moved over the network (even though I touched on that).

With that said, I am on @DanRelfe’s and your side. Also, my take is that if the node runs on a box (not an RPi) that already has SSD/NVMe and a decent amount of RAM, accessing db data over the network may not really help much, if at all. The RPi may be a bit different, as it has some I/O bandwidth limitations (I think) and not that much RAM in some cases (e.g., 4GB, so the swap file is heavily used).

So, let’s wait for test results.

jonesjr · April 8, 2022, 6:36pm

Thanks for all the knowledge, I love to learn, and this place is great
Anyway;

My use case is kinda particular to my setup but…
A hyper-converged 3-node Proxmox cluster.
Each physical server has a harvester (Ubuntu LXC), with plots connected to the harvesters.
I only harvest with harvesters. (My full node doesn’t have any plots directly connected.)
And 1 Chia full node that moves around between the 3 physical servers automatically.
Problem is that the Windows Chia full node VM is now 200 GB, and it is a pain to migrate with only a 1 gigabit connection to the other servers.

So the idea is to move the db to a random ct with a net share.

Then share it in Proxmox with the floating Chia full node, so that it can migrate in half the time.

From the Windows full node, I may be able to pad the db with a RAM cache so the performance won’t suffer as hard; it would slowly dump the cache to the db over SMB.
But I thought that Chia sort of did this already? Why else does my VM use 32 GB of RAM for Chia when left on for days lol.
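On the RAM question: SQLite itself keeps a page cache per connection, sized with PRAGMA cache_size, where a negative value is read as a size in KiB. The sketch below only shows that knob on a throwaway in-memory database. Whether Chia sets it, and to what, is not something this thread establishes, and the 256 MiB figure and the function name are arbitrary examples. Note that committed writes still go to the database file, so this is not the slow write-behind buffer described above.

```python
import sqlite3


def open_with_cache(db_path, cache_kib):
    """Open a connection and request a page cache of roughly cache_kib KiB."""
    conn = sqlite3.connect(db_path)
    # A negative cache_size is interpreted by SQLite as KiB rather than pages.
    conn.execute(f"PRAGMA cache_size = -{int(cache_kib)}")
    return conn


conn = open_with_cache(":memory:", 256 * 1024)  # about 256 MiB, arbitrary
configured = conn.execute("PRAGMA cache_size").fetchone()[0]
print(configured)
conn.close()
```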

Jacek · April 8, 2022, 6:52pm

I guess, the first thing to do would be to upgrade to the latest v1.3.x version, and upgrade your db. This way, bc db will be around 50GB and wallet 15MB. (Looks like you could make a snapshot of your node VM and work on that backup copy to get through the upgrade process.)

I assume that your servers are in the same room, so put them on the same switch. Better, get a faster Ethernet (2.5Gbps or 10Gbps).
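For a sense of scale, the ideal-case numbers behind this advice are easy to work out. The sketch below assumes decimal gigabytes, the full line rate and zero protocol overhead, so real migrations will be slower; the function name is just for illustration. The sizes (200 GB now, about 50 GB after the db upgrade) and link speeds are the ones mentioned in this thread.

```python
def transfer_seconds(size_gb: float, link_gbps: float) -> float:
    """Ideal transfer time for size_gb gigabytes over a link_gbps link."""
    return size_gb * 8 / link_gbps  # 8 bits per byte


for size_gb in (200, 50):
    for link_gbps in (1, 2.5, 10):
        minutes = transfer_seconds(size_gb, link_gbps) / 60
        print(f"{size_gb} GB over {link_gbps} Gbps: {minutes:.1f} min")
```

At 1 Gbps that is roughly 27 minutes for 200 GB versus about 7 minutes for 50 GB, so shrinking the db helps about as much as a faster link would.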

Lastly, SMB is kind of a heavy protocol, so maybe NFS over UDP would perform better (if you still need to separate your db).

So, maybe your question is not really whether you can gain something (speed, …) by putting that db on a different box (as most likely all boxes have the same specs), but rather whether you can prevent too much performance degradation when it is separated. Still, you are weighing the convenience of moving VMs fast (from time to time) against node performance (24/7).

chiameh · April 8, 2022, 9:48pm

This all makes sense, but I wonder if it might be faster/easier/better to run one full node per physical server all the time. You don’t have to live migrate a VM over the 1Gbps link, just fail over your harvesters to another node’s IP address.

This is something you can uniquely do in this situation because the Chia DB isn’t a single source of truth that needs to be replicated in order to migrate. You can just run multiple copies of it all at once and use a different method to fail over your harvesters to another running node.

Technically, you need all that storage space available on each server at any given moment if you expect live migration to work correctly, so my thinking is why not just have them all running and being in sync 100% of the time?

It is possible to have multiple full nodes running in different physical locations using the same keys, with harvesters at both physical locations harvesting separate plots. So with that being possible, there is no issue failing over your harvesters to another full node with the same keys.
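The "different method to fail over" could be as simple as a script that picks the first node that still answers. The sketch below is only an illustration: the host names in the usage comment are hypothetical, port 8447 is my assumption for the default farmer port that harvesters connect to, and a successful TCP connect only shows that something is listening, not that the node is synced. Actually repointing the harvester (config change and restart) is left out.

```python
import socket

FARMER_PORT = 8447  # assumption: default farmer port, check your own config


def first_reachable(hosts, port=FARMER_PORT, timeout=2.0):
    """Return the first host accepting a TCP connection on port, else None."""
    for host in hosts:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return host
        except OSError:
            continue  # refused, timed out or unresolvable: try the next one
    return None


# Hypothetical usage with made-up host names, in order of preference:
# target = first_reachable(["node-a.lan", "node-b.lan", "node-c.lan"])
# print(target or "no full node reachable")
```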

What do you think of that idea?

DanRelfe · April 8, 2022, 10:54pm

This is exactly what I do.

And this.

It all works beautifully.

jonesjr · April 10, 2022, 9:06am

Brilliant, problem solved. Thank you for the wisdom and time.

I’d still enjoy testing the SMB db in a nice VE lol.

When I find the time I’ll post those results back here for sure.