My understanding is that a single harvester cannot get multiple partials per challenge (I was scanning my logs for it and never found more than one, though absence of proof is not proof of absence). I'm not sure about multiple harvesters; maybe it happens on those setups, so it could be a concern there. That said, a 10k diff on a 3060 Ti means only a handful of partials per day and not that many eligible plots in total, so fighting multiple partials per challenge would be a truly extreme case, especially on a single-harvester setup.
My concern is more about challenge-processing “spillage.” When you watch the GPU, utilization looks like a square wave; the more saturated the GPU is, the less idle time the card has. The problem comes from the randomness of eligible plots found for each challenge. Some challenges will bring more eligible plots (with no proof) than can be processed in one time slot (~9.4 sec, the signage point interval), so the processing spills over into the next slot, pushing the results close to 19 secs. The more loaded the GPU is, the more often two time slots overlap, and three-slot overlaps will also happen (and so on, since the effect compounds). Once you start seeing three slots overlapping, you are in a ~27 sec processing regime, which means that a partial processed during this overlap has a high chance of being over 27 secs (stale). Those 3+ slot overlaps are, to my understanding, where partials go stale.
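To illustrate the compounding, here's a toy queue model (my own sketch, not harvester code): each challenge brings a random amount of GPU work, and whatever doesn't finish within its slot spills into the next one. The slot length, the exponential work distribution, and the load values are all assumptions for illustration only.

```python
import random

SLOT = 9.375  # assumed signage point interval in seconds (approximation)

def simulate(mean_load, n_challenges=10_000, seed=42):
    """Toy model: each challenge adds a random amount of GPU work;
    a backlog that outlives its slot spills into the following slots."""
    rng = random.Random(seed)
    backlog = 0.0
    spill2 = spill3 = 0  # counts of 2-slot and 3-slot (stale-risk) overlaps
    for _ in range(n_challenges):
        work = rng.expovariate(1.0 / mean_load)      # random work per challenge
        backlog = max(backlog - SLOT, 0.0) + work    # drain one slot, add new work
        if backlog > SLOT:                           # spills into the next slot
            spill2 += 1
        if backlog > 2 * SLOT:                       # 3 slots deep: results past ~27 s
            spill3 += 1
    return spill2 / n_challenges, spill3 / n_challenges

# Moderately loaded vs. near-saturated GPU: overlaps compound quickly
print(simulate(4.0))   # low load: spillage is rare
print(simulate(9.0))   # ~96% utilization: 2- and 3-slot overlaps become common
```

The point of the sketch is only the qualitative shape: the closer the mean per-challenge work gets to the slot length, the faster the 2-slot and especially 3-slot overlap fractions climb.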
This goes back to what I wrote before. My guess is that #2 (processing eligible plots that don’t produce partials) is dominating the GPU, but higher diffs most likely lower the per-challenge processing load (fewer lookups proceed past the quality check), allowing more plots with less overlap.
If one doesn’t care about GPU power usage and the card is not fully loaded, the diff level is really not that critical, as there are few (if any) three-challenge overlaps. Also, a three-challenge overlap does not by itself imply a 27 sec partial processing time: the partial would have to come from the first challenge in the overlap.
By the way, I looked at processing times vs. the number of eligible plots, and the eventual plots with proofs found. Of course, in general, the more eligible plots found, the more GPU cycles are needed, but the correlation is not very clean (sometimes a challenge with more eligible plots takes less time than one with fewer). Also, for challenges that produced a partial, processing time was not linear in the number of eligible plots, meaning those partials really don’t add that much to challenge processing and implying that optimization is also happening at the eligible-plots-without-proof level.
At least, that is my read on how to look at those diffs with respect to what is happening on the GPU.
So, my take is: look at what is going on on the GPU, lower the GPU power, check the GPU again. If there are not that many overlaps, bump up the diff, then go back to lowering the GPU power, and so on. For that reason, I prefer specifying the number of partials per hour rather than the diff level (which means different things for different farm sizes and cards' processing power).
The diff is basically a value derived from proofs per challenge (for the mainnet), and not really easy to digest for a human. It can be seen as a different, indirect representation of netspace size. Somehow, "diff" was the term pools started with, and we just got stuck with it.
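To make the partials-per-hour framing concrete, here's a hypothetical converter from a target partial rate to a diff. It relies on the commonly cited rule of thumb that difficulty 1 yields roughly 10 partials per k32 plot per day; that constant is an approximation I'm assuming, not an exact protocol value, and the function name is my own.

```python
# Rule-of-thumb constant (assumption): ~10 partials per k32 plot
# per day at pool difficulty 1.
PARTIALS_PER_PLOT_PER_DAY_AT_DIFF1 = 10.0

def suggested_diff(num_plots, target_partials_per_hour):
    """Back out a pool difficulty from a desired partial rate.

    Higher diff -> fewer partials; rate scales roughly as
    plots * constant / diff, so we invert that relationship.
    """
    target_per_day = target_partials_per_hour * 24
    raw = num_plots * PARTIALS_PER_PLOT_PER_DAY_AT_DIFF1 / target_per_day
    return max(1, round(raw))  # diff can't go below 1

# e.g. 1000 k32 plots, aiming for ~20 partials per hour:
print(suggested_diff(1000, 20))  # -> 21
```

The nice property of this framing is that the target rate stays meaningful as the farm grows or the card changes; only the derived diff moves.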