The technique does not rely on deep learning.
Question: how do you take an image with a limited colour bit depth and pixel resolution and turn it from a more discrete representation into a more continuous one? You could use vectorization on a supercomputer, but how would you achieve it on a basic computer?
Well, you could convert the image into four 2D point-concentration images. The points don't have to sit at exact pixel positions; in the first three images they just represent the concentration of red, green and blue. In the fourth point-concentration image, each point has a number of tones equal to the precision you're doing the graphing in, and this frame represents detail. When upscaling a 3D-rendered image, you could tweak the graphing with extra data and alter the emphasis of the detail to remove the rigid, angular polygon effect in places where the render is trying to show a more curved, less polygonal shape.
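To make the idea a bit more concrete, here is a rough Python sketch of one way the four point-concentration images could be built. It is only an illustration under my own assumptions, not a worked-out method: the function names (concentration_points, detail_points, render_channel) and the parameters (max_points, precision, scale, seed) are made up for this example, the image is assumed to be a nested list of 8-bit (r, g, b) tuples, and "detail" is assumed to mean local contrast, which the description above does not actually pin down.

```python
import random

def concentration_points(image, channel, max_points=8, seed=0):
    """Scatter points at sub-pixel positions; more points where the channel is brighter."""
    rng = random.Random(seed)
    points = []
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            # Assumption: point count is proportional to the 8-bit channel value.
            count = round(pixel[channel] / 255 * max_points)
            for _ in range(count):
                points.append((x + rng.random(), y + rng.random()))
    return points

def detail_points(image, precision=16):
    """One point per pixel centre with a tone in 0..precision-1.

    Assumption: 'detail' is approximated by local contrast (a simple gradient).
    """
    height, width = len(image), len(image[0])

    def luma(x, y):
        # Clamp coordinates so edge pixels reuse their nearest neighbour.
        x = min(max(x, 0), width - 1)
        y = min(max(y, 0), height - 1)
        return sum(image[y][x]) / 3

    points = []
    for y in range(height):
        for x in range(width):
            gx = luma(x + 1, y) - luma(x - 1, y)
            gy = luma(x, y + 1) - luma(x, y - 1)
            contrast = min((abs(gx) + abs(gy)) / 510, 1.0)
            tone = min(int(contrast * precision), precision - 1)
            points.append((x + 0.5, y + 0.5, tone))
    return points

def render_channel(points, width, height, scale=1, max_points=8):
    """Bin points into a grid `scale` times larger and map density back to 0..255."""
    out_w, out_h = width * scale, height * scale
    counts = [[0] * out_w for _ in range(out_h)]
    for px, py in points:
        cx = min(int(px * scale), out_w - 1)
        cy = min(int(py * scale), out_h - 1)
        counts[cy][cx] += 1
    # Expected number of points in one output cell at full intensity.
    full = max_points / (scale * scale)
    return [[min(255, round(c / full * 255)) for c in row] for row in counts]

if __name__ == "__main__":
    image = [[(255, 0, 0), (0, 0, 0)],
             [(0, 0, 0), (255, 255, 255)]]
    red, green, blue = (concentration_points(image, c) for c in range(3))
    detail = detail_points(image)
    upscaled_red = render_channel(red, width=2, height=2, scale=2)
    print(len(red), len(green), len(blue), len(detail))
    print(upscaled_red)
```

Plain binning like this gives a noisy result when scale is large compared with max_points, so a real version would probably spread each point with a smoothing kernel, and the detail layer is where you would decide how much to sharpen or soften edges. None of this needs more than a basic computer, since it is just loops over pixels and points.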