At the moment the AI lines up text and image data in one shared data space (the CLIP approach), clips together the base of the image from that alignment, and then uses random noise, denoised step by step, to integrate the finer detail.
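A minimal toy sketch of that current pattern, in Python. Everything here is a hypothetical stand-in (encode_text and denoise_step are invented names, not real model calls): encode the prompt into a vector, start the image latent as pure noise, and nudge it toward the text embedding over many steps.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_text(prompt: str, dim: int = 64) -> np.ndarray:
    """Stand-in for a CLIP-style text encoder: hash the prompt into a fixed vector."""
    g = np.random.default_rng(abs(hash(prompt)))
    v = g.standard_normal(dim)
    return v / np.linalg.norm(v)

def denoise_step(latent: np.ndarray, target: np.ndarray, strength: float) -> np.ndarray:
    """Toy denoiser: pull the noisy latent a little toward the text target."""
    return latent + strength * (target - latent)

text_vec = encode_text("a red fox in the snow")
latent = rng.standard_normal(text_vec.shape)   # start from pure random noise
for step in range(50):                         # iterative refinement toward the prompt
    latent = denoise_step(latent, text_vec, strength=0.1)

print("final alignment with text:", float(latent @ text_vec))
```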
Rather than working in one space, you work in lots of little spaces, each of which determines a holistic "spray" over the image. So in the first stage a rough and noisy space is sprayed on, conditioned on the text and on detail traits that modify the sprays.
In the next stage the bot resprays the space, formalising more of the detail traits in it. In stage 3 the space is regularised with a spray-can pass, and in stage 4 the space is formalised; after that come the finer detail spray stages. The final image can then be upscaled and improved in the cloud, and you can modify the mid process to get the layout to your liking (sketched below).
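Here is one way that staged flow could look as code. This is purely a sketch of the idea, not an implementation of any real system; spray_stage and the mid-process edit hook are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def spray_stage(space: np.ndarray, traits: np.ndarray, detail: float) -> np.ndarray:
    """One spray pass: blend in trait guidance, then add a shrinking dose of noise."""
    guided = space + detail * (traits - space)
    return guided + (1.0 - detail) * 0.1 * rng.standard_normal(space.shape)

traits = rng.standard_normal(16)       # detail traits conditioned on the text
space = rng.standard_normal(16)        # stage 1: rough, noisy space

for detail in (0.2, 0.4, 0.6, 0.8):    # stages 2 onward: progressively finer sprays
    space = spray_stage(space, traits, detail)
    if detail == 0.4:                  # mid-process hook: user adjusts the layout
        space[:4] = 0.0                # e.g. blank out one region of the space
```

The point of the hook is the artistic control mentioned above: because each space is small and the stages are separate, an edit between sprays stays cheap.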
This brain and library would be quicker, more memory efficient, and very concise even with lots of picture requirements, because the AI works from a compiled library of data traits plus detail regulation. But it would have issues: you'd have to take time compiling any new picture into the library before you could use stuff from that picture, and the output would be less exotic, covering less of a dynamic style.
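The compile-before-use tradeoff might look something like this; encode_picture and the library dict are assumptions for illustration, not a real API.

```python
import numpy as np

def encode_picture(pixels: np.ndarray) -> np.ndarray:
    """Stand-in encoder: reduce an image to a small trait vector."""
    flat = pixels.astype(float).ravel()
    return np.array([flat.mean(), flat.std(), flat.min(), flat.max()])

library: dict[str, np.ndarray] = {}

def compile_into_library(name: str, pixels: np.ndarray) -> None:
    """The slow step mentioned above: run the encoder and store the traits."""
    library[name] = encode_picture(pixels)

def traits_for(names: list[str]) -> np.ndarray:
    """Fast lookup at generation time: average the precompiled traits."""
    return np.mean([library[n] for n in names], axis=0)

compile_into_library("fox_photo", np.random.default_rng(2).integers(0, 256, (8, 8)))
print(traits_for(["fox_photo"]))
```

The one-off compile step is the slow part; once the traits are in the library, generation only does cheap lookups, which is where the speed and memory savings would come from.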
I suspect you could run a slight clip-and-reclip pass on the output image later to make the output convey style and dynamism better, but in the early frames that's the trade-off: you're sacrificing some picture dynamism for better artistic control.
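As a rough sketch of what that reclip touch-up could mean (again, hypothetical names, working on embeddings rather than pixels): re-encode the finished output, then blend a small fraction of a style embedding back in.

```python
import numpy as np

def reclip(output_vec: np.ndarray, style_vec: np.ndarray, amount: float = 0.15) -> np.ndarray:
    """Blend a fraction of the style embedding back into the output embedding."""
    nudged = (1.0 - amount) * output_vec + amount * style_vec
    return nudged / np.linalg.norm(nudged)

rng = np.random.default_rng(3)
output_vec = rng.standard_normal(64)
output_vec /= np.linalg.norm(output_vec)
style_vec = rng.standard_normal(64)
style_vec /= np.linalg.norm(style_vec)

print("style alignment before:", float(output_vec @ style_vec))
print("style alignment after: ", float(reclip(output_vec, style_vec) @ style_vec))
```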