OK, so the size of the bitwise data is a little bulkier, though not more than double.
Each core is CISC in nature: it has a big set of little functions that process up to 6 bits of data at a time.
Most of these instructions reconfigure, in scratch, the outcome of a smaller set of base functions across 2 pipe stages: the instruction is set, then it is performed, all in one cycle.
The bit codec it works with is a 4-bit codec that also treats 00 as a break point.
I call this break point coding.
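The break-point idea could be pictured with a small sketch, here in Python purely for illustration. It assumes a stream of 4-bit symbols in which the all-zero symbol acts as the break point separating variable-length fields; the function names and framing rules are my assumptions, not part of the design described above.

```python
# Hypothetical "break-point coding" sketch: a stream of 4-bit symbols
# where the all-zero symbol (the "00" break point) delimits fields.
# The framing rules here are assumptions for illustration only.

BREAK = 0b0000

def encode(fields):
    """Pack each field's non-zero 4-bit symbols, ending each with a break."""
    out = []
    for field in fields:
        out.extend(field)      # field symbols, each in 1..15
        out.append(BREAK)      # break point closes the field
    return out

def decode(symbols):
    """Split the symbol stream back into fields at each break point."""
    fields, current = [], []
    for s in symbols:
        if s == BREAK:         # break point: close the current field
            fields.append(current)
            current = []
        else:
            current.append(s)
    return fields

stream = encode([[0x3, 0xA], [0x7]])
assert decode(stream) == [[0x3, 0xA], [0x7]]
```

A round trip like this is the simplest way to check that the break symbol never collides with payload symbols, which is why the payload range is restricted to 1..15 in this sketch.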
Take a multiplication, say: the core quickly hops, in bits, to preliminary information relative to, say, 24 bits of operand; the 16-bit instruction then aligns the 6-bit instructions on the extracted bits, new preliminary work is done, and the final instructions execute. That takes 2 cycles, or 1 cycle with more pipe stages, and joining up stages for bigger numbers could be done in parallel in a similar way.
Addition would be similar.
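One way to picture the 6-bit decomposition described above is schoolbook multi-limb multiplication: a 24-bit operand splits into four 6-bit limbs, each small partial product is an independent job that parallel units or pipe stages could pick up, and a final combine joins the stages. This Python sketch only illustrates the arithmetic; the actual instruction alignment and pipelining are not spelled out above, so the limb count and function names are assumptions.

```python
LIMB_BITS = 6
MASK = (1 << LIMB_BITS) - 1            # 0b111111

def to_limbs(x, n):
    """Extract n 6-bit limbs from x, least significant limb first."""
    return [(x >> (LIMB_BITS * i)) & MASK for i in range(n)]

def mul_limbs(a, b, n=4):
    """Schoolbook multiply of two n*6-bit numbers via 6-bit partial products.
    Each a_i * b_j is a small independent job; a real pipeline could run
    them in parallel before a final shift-and-add combine."""
    al, bl = to_limbs(a, n), to_limbs(b, n)
    acc = 0
    for i, ai in enumerate(al):
        for j, bj in enumerate(bl):
            acc += (ai * bj) << (LIMB_BITS * (i + j))
    return acc

# 24-bit operands = 4 limbs of 6 bits each
assert mul_limbs(0xABCDEF, 0x123456) == 0xABCDEF * 0x123456
```

Each 6-bit-by-6-bit product fits in 12 bits, so the partial products stay small enough for the kind of little reconfigurable functions described earlier.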
However, there would be a good few extra speed-up options in some areas of maths involving known (and not-yet-known) transcendental numbers, whose digit data can be stored to some adequate digit precision.
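As a toy illustration of stored transcendental digit data, here is a Python sketch that computes e once to a requested digit precision via its Taylor series and caches the result; a real design would presumably hold such tables in ROM or on-chip memory. The function name, the choice of e, and the caching scheme are my assumptions, not part of the design above.

```python
from decimal import Decimal, getcontext
from functools import lru_cache

@lru_cache(maxsize=None)
def e_digits(precision=30):
    """Compute e to `precision` significant digits (e = sum of 1/n!) and
    cache the result, standing in for a stored digit table."""
    getcontext().prec = precision + 5            # a few guard digits
    term, total, n = Decimal(1), Decimal(1), 1
    threshold = Decimal(10) ** -(precision + 4)
    while term > threshold:                      # stop once terms are negligible
        term /= n
        total += term
        n += 1
    getcontext().prec = precision
    return +total                                # round to `precision` digits

assert str(e_digits(21)).startswith("2.718281828459045")
```

The `lru_cache` decorator means the series is summed only once per precision; later lookups are free, which is the point of keeping digit data around.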
This CPU could perform 180 TFLOPS double precision in real terms at 150 W. That's 1.2 TFLOPS per watt, or 1.2 EFLOPS per megawatt. It'll probably be 2026-2028 before a 12-17 MWe, 1-EFLOP double-precision supercomputer arrives, which will deliver more like 14 EFLOPS of low and mixed precision at about a 1.6 nm node; however, it will likely also include tens of thousands of optical accelerators and maybe some other AI accelerator chips, taking the range closer to 100 EFLOPS.
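The efficiency figures can be sanity-checked with a couple of lines of arithmetic, using the claimed numbers as inputs (these are the document's own figures, not measurements):

```python
flops = 180e12                        # claimed 180 TFLOPS double precision
watts = 150                           # claimed 150 W

tflops_per_watt = flops / watts / 1e12            # -> 1.2 TFLOPS/W
eflops_per_megawatt = (flops / watts) * 1e6 / 1e18  # ~1.2 EFLOPS/MW

print(tflops_per_watt)                # prints 1.2
```

So the claim rounds to roughly 1 TFLOPS per watt, but strictly it works out to 1.2.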