I am not really sure which part you think I don’t understand, or explained differently. To me, you have just restated what I said, based on output from a slightly different tool (which apparently pairs parent and child processes differently).
You keep calling out my use of “background processes” as if it had a pejorative connotation, but I never said that being one is good or bad (of course, both types are needed, and neither is superior nor inferior). I simply stated that, for whatever reason, those start_full_node (SFN) processes slipped outside of the main Chia container, and therefore they look like orphaned processes. Again, this is the first time I have seen this behavior, and I don’t really have a good explanation for it.
However, when I shut down Chia just now, the main container closed within a few seconds, but those three SFN processes did not shut down. I waited a couple of minutes, then gave up and killed them. Looking at the blockchain folder afterward, the db had not been closed properly.
To me this further supports the notion that those processes were orphaned, or rather that the daemon process that owns them didn’t bother to wait for them to shut down, or somehow lost track of them. Knowing that the main SFN task owns the additional ones, I killed it first. However, that did not kill the other two SFN processes; I had to kill all of them individually. This further implies that those processes really don’t communicate well with each other (and thus become “orphaned”: doing whatever they want rather than what they are scheduled to do). Then again, they are started as independent processes, and it was a forceful termination, so maybe that doesn’t support the notion all that much.
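For what it’s worth, that behavior is exactly what you’d expect from independently spawned processes: force-killing the parent does not take down its children. A minimal sketch of that (Linux/macOS only; the `sleep` here is just a stand-in for a hypothetical SFN helper, not Chia code):

```python
import os
import signal
import subprocess
import time

# A parent shell spawns a background "helper" (sleep stands in for an
# SFN helper process), prints its PID, then waits on it.
shell = subprocess.Popen(
    ["sh", "-c", "sleep 60 & echo $!; wait"],
    stdout=subprocess.PIPE, text=True,
)
helper_pid = int(shell.stdout.readline())

shell.kill()            # force-kill the parent, like killing the main SFN
shell.wait()
time.sleep(0.2)

os.kill(helper_pid, 0)  # signal 0 = existence check; raises if helper died
print("helper survived the parent kill")
os.kill(helper_pid, signal.SIGTERM)  # it has to be killed individually
```

So having to kill each SFN process one by one doesn’t by itself prove broken inter-process communication; it is the default behavior for plain child processes unless someone kills the whole process group.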
I restarted Chia, and all processes were included in the Chia container as normal. After some waiting (all processes still under the Chia container), I shut Chia down again. This time all processes exited cleanly, and the db was closed properly. Apparently, the inter-process communication worked as intended this time, further supporting the notion of orphaned processes.
I have to say that I am using the term “orphaned” liberally. I only mean that the daemon process appears to have lost track of those SFN processes, nothing more. The fact that they keep working is expected, as SFN by itself deals only with peers and blockchain syncing, so it really doesn’t care whether the other processes are there or not. However, the main SFN process keeps ports open for the other processes (farmer, wallet) to communicate with if they are around. If those other processes are gone, the SFNs will just merrily proceed with syncing, not paying attention to system events like a shutdown request (i.e., potentially corrupting dbs).
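Just to illustrate what “paying attention to a shutdown request” means in practice: a well-behaved long-running process installs a signal handler and closes its db before exiting. A minimal sketch of that pattern (hypothetical, not Chia’s actual code; POSIX signals, with an in-memory SQLite db standing in for the blockchain db):

```python
import os
import signal
import sqlite3

shutting_down = False

def handle_term(signum, frame):
    # Just flag the request; the work loop does the actual cleanup.
    global shutting_down
    shutting_down = True

signal.signal(signal.SIGTERM, handle_term)
conn = sqlite3.connect(":memory:")  # stand-in for the blockchain db

os.kill(os.getpid(), signal.SIGTERM)  # simulate the shutdown request

if shutting_down:
    conn.close()  # db closed cleanly instead of being left mid-write
    print("db closed cleanly")
```

If a process never gets (or never handles) that signal, the only way out is a hard kill, and the db is left in whatever state it happened to be in, which matches what I saw in the blockchain folder.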
It’s a bit of a long shot, but maybe this is the problem causing those syncing issues. Maybe the fact that those SFNs are “orphaned” means the coordination for handing incoming transactions to those sub-SFN processes is busted; hence the timeouts, and everything goes out of whack. Although it would be a really lucky shot if that were the case.
Also, as you noticed, all those nodes get some small tasks to do. However, what I saw during the last dust storm was that only the main SFN process was choking one core; all the other SFN processes were doing basically nothing. Like you, I would expect that the only reason those additional SFN processes exist is to partition the work, but for whatever reason that is not happening. Then again, I am still on v1.2.10, so maybe it is a bit different in v1.2.11.
Actually, reflecting on what @bones said: he saw about 13% CPU usage. He has 8 logical cores, so that 13% works out to roughly 100% of a single core. Even if we see multiple cores active, it could be that whatever process (SFN?) is spawning multiple threads but serializing the work (waiting for each thread to finish before starting the next), ending up with a combined load of about one full core. Maybe the next time that happens he can monitor the individual cores rather than the combined usage (and possibly switch to the Processes view to nail down which process(es) are hogging the CPU).
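The arithmetic behind that 13% reading is simple, and it is consistent with the serialization guess (e.g., CPython threads doing CPU-bound work serialize on the GIL, so any number of threads can still sum to roughly one core):

```python
# Convert an overall CPU reading into "cores' worth" of work.
logical_cores = 8       # @bones's machine
overall_usage = 0.13    # 13% reported across all cores

cores_busy = overall_usage * logical_cores
print(f"{cores_busy:.2f} cores' worth of work")  # 1.04, i.e. ~one busy core
```

The same reading on a 16-core box would mean about two cores’ worth, which is why the per-core view is more telling than the combined number.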
At least that is what I see and how I would read it, but of course I am happy to hear how you interpret it. Also, all of this is just a reflection on what you wrote, so I haven’t slept on it yet (not that I will).