So, we are now at 1 min K32 plots ... what next?

Definitely, the way this is going we are going to see K33s at 208.8 GiB as the minimum plot size, although not for a while. Just because a supercomputer cluster can do a 1-minute plot doesn't mean it would be worth tasking that machine with this activity to win XCH beyond this demonstration/experiment; it wouldn't pay for the machine time unless XCH rose a lot in value.

Because a plot is a large file (not the much smaller files required to complete blocks in other cryptocurrencies), handling it is always going to slow things down. If you plot it in state-of-the-art RAM, then what? You can't leave it there; you have to move it to some farmer machine somewhere. So 1-minute plots aren't 1 minute from start to farm; more like 1 minute to plot and at least a few more minutes to move the file to a farm drive somewhere.
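As a rough back-of-envelope on that transfer step (the plot size and link speeds below are typical figures, assumed purely for illustration):

```python
# Back-of-envelope: time to move a finished k32 plot to a farm drive.
# The plot size and link speeds are assumptions for illustration only.
PLOT_SIZE_GIB = 101.4                      # nominal k32 plot size
links_mb_per_s = {
    "SATA HDD (~150 MB/s)": 150,
    "SATA SSD (~500 MB/s)": 500,
    "10 GbE   (~1100 MB/s)": 1100,
}

plot_bytes = PLOT_SIZE_GIB * 1024**3
for name, rate in links_mb_per_s.items():
    minutes = plot_bytes / (rate * 1e6) / 60
    print(f"{name}: {minutes:.1f} minutes")
```

Even over a fast link, the move adds minutes on top of the plot time itself.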

Yes, you might receive a challenge and plot a plot that wins in 30 seconds, but winning 2 XCH using a supercomputer is like winning £10 on the lottery and flying to collect your winnings in a private jet.

Chia might be made GPU-incompatible, or it might not, but the speed at which you can write plots to disk (though you can now plot in RAM) and then store them will limit you.


I thought it was made clear from the above: you can hire pretty damn fast, high-processor-count, high-memory servers in multiples for fractions of a dollar per hour. If you are able to create a plot on demand that can meet a decent number of challenges, it is "game over" until they move up to K33, K34, K35 etc.

GPU-wise, there are a number of ex-data-centre GPUs available now with 24 GB for a few hundred dollars. They are not so good at mining (too slow compared with modern silicon) or gaming (no outputs) but still have plenty of horsepower/memory, and you can get data on and off them over 16x PCIe 3.0. A GPU is a SIMD device: it processes data in "waves" of threads, each carrying out the same operation but on multiple pieces of data in parallel. It is not completely arbitrary processing, but it is not too far off. Imagine a few thousand chunks of data all being processed at the same time, not just a few dozen CPU threads. That is how a GPU can accelerate a task thousands of times, as long as the algorithm can be implemented that way and as long as you can meet the requirements of getting data onto and off the GPU
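To make the "waves of threads" idea concrete, here is a minimal CPU-side sketch that uses NumPy vectorization as a stand-in for GPU SIMD; the constants and array sizes are arbitrary, not anything from Chia's actual tables:

```python
import numpy as np

# A CPU thread pool handles a few dozen chunks at a time; a GPU applies the
# same operation to thousands of data elements per "wave". This sketch only
# illustrates the data-parallel idea (arbitrary mixing step, made-up sizes).
chunks = np.random.randint(0, 2**31, size=(4096, 256), dtype=np.int64)

# One operation applied to every element of every chunk "at once":
mixed = (chunks * 0x9E3779B1) & 0xFFFFFFFF

print(mixed.shape)  # (4096, 256) results produced by a single data-parallel pass
```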

The question is whether the data is reasonably partitionable and whether it is IN THEORY possible to generate a plot on demand at all - if so, performance will come. If you can stuff, say, 8 GPUs in a machine to create and hold a plot long enough to meet a challenge, that is all you need - you don't need to store it

I understand that plots are a series of pre-calculated solutions and that, to make the whole plot relevant, they involve a chain-like calculation where you need the result of everything in the plot to make any one entry valid - and there is plenty of filtering, sorting and compressing to create the final plot

So imagine chaining the whole set of solutions (like a blockchain) with the result of the last solution fed back to the start and then re-encoding the whole set again with the solution of the last segment - like a ring.

The question would then be whether it is possible to segment that plot in such a way that a segment can be MOSTLY calculated in isolation and then the results passed into the next segment (hopefully a small amount of data)
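As a toy sketch of that segmentation idea (this is just the partitioning concept, not Chia's actual plot format; the hash chain, segment count and sizes are all made up):

```python
import hashlib

def build_segment(segment_id, carry, entries=4):
    # Toy model: a segment is computed mostly in isolation, taking only a small
    # "carry" value in and handing a small digest on to the next segment.
    results = []
    h = carry
    for i in range(entries):
        h = hashlib.sha256(h + segment_id.to_bytes(4, "big") + i.to_bytes(4, "big")).digest()
        results.append(h)
    return results, h            # bulky per-segment output, tiny carry forward

carry = b"\x00" * 32             # seed for the ring
plot = []
for seg in range(8):             # each segment could live on its own GPU/worker
    segment, carry = build_segment(seg, carry)
    plot.extend(segment)

print(len(plot), "entries; only 32 bytes crossed each segment boundary")
```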

It would be interesting to look at the internals of the mad max plotter to analyze the possibility of segmenting the process and what data is passed between each segment. The usual technique deployed in mining algorithms (be it Ethereum or CPU mining algorithms) is to design them to require a large working set of one resource - be it main memory, L3 cache, etc.

So far it has never been a question of IF, but WHEN it becomes viable. And that is why the time available to verify plots is always gonna get shorter and the plots are always gonna get larger


There’s an AMA tonight, 5pm Pacific that’ll touch on this:

Chia Livestream: Plotting speed and security in Chia
When: Jun 15, 2021 05:00 PM Pacific Time (12:00am UTC)
Where: Zoom or YouTube

… and that is why the time available to verify plots is always gonna get shorter and the plots are always gonna get larger, because the algo designers know what is coming …

is k=32 broken!? 15 minute plots with madmax and Intel Optane SSD

Excellent coverage of where plotting is at, where it may be going, and the effects on K32s.

Please wait for the AMA from Chia in a few hours; you are mixing up two concepts.

  1. creating a plot that always wins a challenge and block reward… this is only possible by breaking the math behind Chia and won't be possible tech-wise until quantum computing.

  2. creating a plot that always passes the filter… even with the supercomputer this was not achieved, and it is still multiples away. And then you have just created a plot that acts like 512 normal plots (which pass the filter 1/512 of the time).
    Pretty sure that storing 512 plots is much more economical than running a supercomputer non-stop to recalculate this 512x plot every 10 seconds (a toy calculation of that trade-off follows below).
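To put "every 10 seconds" in perspective, a toy calculation using Chia's nominal ~9.4 s signage-point interval (treat the interval as an assumption here):

```python
# Toy calculation of what "recompute the plot for every challenge" implies.
SIGNAGE_INTERVAL_S = 9.375          # ~64 signage points per ~10-minute sub-slot (assumption)
SECONDS_PER_DAY = 86_400

replots_per_day = SECONDS_PER_DAY / SIGNAGE_INTERVAL_S
print(f"~{replots_per_day:.0f} full k32 computations per day, every day")
# A stored plot is computed once and afterwards is only read when it passes the filter.
```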

I write, in my day job, financial algo backtest simulators. If I can manage to drop some, or all, of my calculation into L3 cache, my algo will SUDDENLY jump in performance by 5-10 times. Every time you cross an architecture barrier you get that sudden surge. In my case, that means pre-calculating as much data as possible in order to fit, and looking up to the next high-L3-cache processor, be it Ryzen 9 (64 MB), Threadripper (128-256 MB) or Milan (32 MB/CCX, up to 256 MB total).

Performance can bog down until a barrier can be breached and then leaps forward …

That is why I stated my proviso:

The question is whether the data is reasonably partitionable and whether it is IN THEORY possible to generate a plot on demand at all !

pretty cool :slight_smile:

Let’s just chat more after the AMA, so we can have the guys who designed the math explain what a fast plotter means for security / bypassing the plot filter.

I also understand the principle of one-way hashing - you take a big number and generate a hash of it (another number calculated from it using some non-reversible algorithm, usually leading to a smaller result)

The only current way to crack a good hashing function is to guess original numbers (or generate them at random), hash them and see if you hit your target

You are limited to how fast you can generate that hash.

Chia seems to take that a step further by pre-calculating shitloads of these calculations and fitting as many as possible in one file - a plot. The larger the plot, the more solutions have to be calculated to make that plot. The trade-off is then the cost of accessing that calculation vs the cost of calculating all the solutions in a plot, and then doing it a few hundred times to increase the odds/profit margin - you don't have to "crack" the algo if you can brute-force the economics
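A toy sketch of that trade-off, with SHA-256 standing in for Chia's actual proof-of-space construction (which it is not); the prefix lengths and counts are made up:

```python
import hashlib, os

def brute_force(target_prefix, attempts):
    # Guess, hash, compare: limited by how fast you can generate fresh hashes.
    for _ in range(attempts):
        guess = os.urandom(8)
        if hashlib.sha256(guess).digest().startswith(target_prefix):
            return guess
    return None

# Pre-calculation (the "plot" idea): pay the hashing cost once up front,
# then answering a challenge is a cheap lookup instead of fresh work.
table = {}
for x in range(200_000):
    guess = x.to_bytes(8, "big")
    table[hashlib.sha256(guess).digest()[:2]] = guess   # index by 2-byte prefix

challenge = os.urandom(2)                                # a fresh target prefix
print("brute force:", brute_force(challenge, 50_000))    # often None with this budget
print("plot lookup:", table.get(challenge))              # usually an instant hit
```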

If you can optimise your use of a “computing architecture barrier” you can sometimes gain several orders of magnitude. If you can parallelize to use multiple cheap cloud resources effectively by minimizing data exchanged you can gain another few orders of magnitude. You don’t have to hit every plot - just enough to profit

Until a few smart cookies have "picked the low-hanging fruit", you really don't know what effect 4 orders of magnitude can have on an "uncrackable" algorithm

That doesn't mean Chia the algo would be dead - just that there is another trade-off entering the market: pre-calculated space vs real-time "random" plot generation

To give you an idea of "low-hanging fruit", I would give my own project as an example - MT4 backtests

Typically a "backtest" using 10 years of "tick data" would take many minutes (sometimes hours if you are a poor coder) to run one pass

If I said I could run one pass of most algos in 1-2 seconds PER CPU core, you might not believe me. If I said I could go 5-10 times faster in some cases (hybrid bar/tick processing), you might say WOW! If I said I could partition the test space so that I can effectively run in the cloud on 100/1000 instances whilst minimizing the exchange of data … suddenly real-time 3D "surface plots" become a possibility

By attacking a problem on multiple fronts, all it takes is a couple of breaks and that 4 orders of magnitude becomes more than a distant possibility

Do not underestimate ingenuity …


Thanks for the explanation; now I understand better what multi-threaded plotting could mean for Chia economics…

I just took a look at my old 1080 Ti: it has 3584 CUDA cores and plausibly has the computing power to finish a k32 plot within 30 s, which would always pass the 1/512 filter. Doing so would probably require using system RAM as the buffer over PCIe 3.0 x16, with some further optimization of buffer usage. That means its computing power would equal about 50 TB of storage space; the latter is more expensive at the moment but consumes much less power. But what about more powerful future GPUs? Or even ASICs?
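The ~50 TB equivalence can be sanity-checked with the filter arithmetic, using the nominal k32 plot size as an assumption:

```python
# Sanity check of the "equals ~50 TB of storage" estimate above.
FILTER = 512                 # a normal plot passes the plot filter 1 in 512 times
K32_PLOT_GIB = 101.4         # nominal k32 plot size

equivalent_tib = FILTER * K32_PLOT_GIB / 1024
print(f"a plot recomputed for every challenge ~= {equivalent_tib:.1f} TiB of stored plots")
# ~50.7 TiB, which lines up with the ~50 TB figure in the post.
```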

Chia will alter the plot filter before K32s are obsolete.
A super-fast plot (not achieved yet) could pass the filter every time and would be equal to 512 normal plots (which pass the filter 1/512 of the time)

If Chia makes the filter 1/256 or 1/128, then the super-fast plot is only worth 256 or 128 plots…
It's not going to be economical, energy-wise, to produce such a 256x or 128x plot every 10 seconds versus just storing 256 or 128 plots
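A quick sketch of how the advantage shrinks as the filter constant drops (same nominal k32 plot size assumed as above):

```python
# How the value of an on-demand plot scales with the plot filter constant.
K32_PLOT_GIB = 101.4         # nominal k32 plot size (assumption for the arithmetic)
for filter_constant in (512, 256, 128):
    equivalent_tib = filter_constant * K32_PLOT_GIB / 1024
    print(f"filter 1/{filter_constant}: one on-demand plot ~= {filter_constant} stored plots "
          f"(~{equivalent_tib:.1f} TiB)")
```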

The normal reason an ASIC is not economical for such operations is the cost of fast memory. An ASIC pretty quickly becomes redundant with a tiny change in algorithm, so they tend to exist only for very mature chains. An FPGA is really the middle ground, as it is reprogrammable yet faster than a GPU, and you are able to get faster RAM (but not hundreds of GB) as time goes on - basically an expensive programmable ASIC

If you are attempting the ultimate task of real-time plotting, then once you have matched the plot you can throw it away, which saves outgoing bandwidth as well as storage. That leaves the question of the smallest "working set" of memory (i.e. what you have to hold ALL at the SAME TIME) needed to create the plot

If you can tack together segments of a plot relatively economically, then the working set is smaller. If you have to go through the plot in multiple/many passes, you need either to hold it all in one or more chunks of hardware and/or temp storage - a segment might map to the storage in a single GPU, perhaps

Working sets can be considered at multiple levels:

  • The total working set to hold/calculate/build a single plot (several GPUs, perhaps)
  • A segment working set - say the capacity of a single GPU, where you can do a useful amount of work without having to exchange data
  • A cache-level working set - within a GPU you have tiered memory caches: smaller and faster chunks where you can perform faster but smaller operations, if you can cut a job up into successively smaller pieces whilst minimizing the exchanges required between caches and segments

In the end it is economics: as the tech changes, the boundaries change. What has happened historically is that the innermost computation capabilities have usually outpaced the ability to get work in and out of the computation units, so squeezing more data into the innermost layer often offers the greatest rewards

I don’t know whether the current plot creation mechanisms are memory bound or CPU bound

In my own simulation project, doubling the amount of main memory helped a lot, though clever caching mechanisms managed to keep most work away from disk/network. The biggest improvement came from specific encoding, segmentation and processing of data: reducing the size of each unit to be processed allowed 95%+ of the work to occur in L3 cache, which is many, many times faster than main memory.

The result - a rapid 300% speed improvement
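A minimal sketch of that blocking approach, assuming a 32 MB L3 cache and made-up data sizes (the per-block operations are placeholders, not the actual backtest code):

```python
import numpy as np

# Blocking idea: cut the working set into chunks small enough to stay resident
# in L3 cache, do all the per-chunk work while the data is hot, then move on.
# The 32 MB cache size is an assumption, not a measurement of any particular CPU.
L3_BYTES = 32 * 1024**2
ticks = np.random.rand(20_000_000)                 # ~160 MB: far too big for L3

block_len = L3_BYTES // ticks.itemsize // 4        # leave headroom for temporaries
acc = 0.0
for start in range(0, ticks.size, block_len):
    block = ticks[start:start + block_len]         # this chunk fits in cache
    smoothed = 0.5 * (block[1:] + block[:-1])      # repeated passes over the same
    acc += float(np.sum(smoothed * smoothed))      # hot data stay cache-resident
print(acc)
```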

There will be a balance and we may not be there yet, but when it tips, it usually tips quickly
