Once upon a time humans were the computers.
Well, 40 compact villages spread across 5 main locations could house 2 million people every 2 years.
And 2 million people could fine-tune an AI problem that calls for a highly efficient 5 to 400 GB dataset.
How are 2 million people all going to be that technically advanced, and where would you even find them? They don't need to be. They just need to use their natural intelligence in tests and questions, and collectively they could produce a very efficient dataset for later inference in many AI problem areas. Sometimes you just need to spread the problem across more people, not fewer.
First, though, the AI has to be capable enough to work effectively with 2 million humans, on and off, over those 2 years.
Where does the new data come from?
We naturally produce new data all the time; we are the kings and queens of new, intelligent data, and AI is moving in through the efficiency cracks.
The human race is equivalent to about 4000 of these human data-compiling efforts every 2 years, but let's say we're not that efficient at working with machines.
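As a rough check on the headcounts above, here is a minimal Python sketch of the arithmetic. The world population of about 8 billion is an assumption added for illustration; the other figures (2 million people, 40 villages, 5 locations) come from the text.

```python
# Headcount arithmetic for one "camp" effort as described in the text.
# Assumption (not stated in the text): a world population of roughly 8 billion.
PEOPLE_PER_CAMP = 2_000_000       # 2 million people per 2-year effort
VILLAGES_PER_CAMP = 40            # 40 compact villages...
LOCATIONS_PER_CAMP = 5            # ...spread across 5 main locations
WORLD_POPULATION = 8_000_000_000  # assumed round figure


def people_per_village() -> int:
    """How many people each village would need to house."""
    return PEOPLE_PER_CAMP // VILLAGES_PER_CAMP


def villages_per_location() -> int:
    """How many villages sit at each main location."""
    return VILLAGES_PER_CAMP // LOCATIONS_PER_CAMP


def camp_equivalents(population: int = WORLD_POPULATION) -> int:
    """How many 2-million-person efforts a given population amounts to."""
    return population // PEOPLE_PER_CAMP


if __name__ == "__main__":
    print(people_per_village())     # 50000
    print(villages_per_location())  # 8
    print(camp_equivalents())       # 4000
```

So each village would hold about 50,000 people, and 8 billion people divided into 2-million-person efforts gives the figure of 4000.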
Let's imagine we had 2000 of these camps running over just 2 years of this human effort (and, since it's all hypothetical, I'm not wanted by the UN).
10 camps (0.4 TB) and everything would upscale to 4K and beyond, with a very efficient video and 3D codec focused on the most common, best-known and best-represented possibilities across the full range of video.
20 camps and the datasets would be refined well enough that we'd be ready for autonomous robots and vehicles everywhere.
25 camps (50 million people, most of England; 0.8 TB+) and it could learn to think for itself if trained in a more general way.
40 camps (6 TB+) and you would have the ultimate AI-human interface.
400 camps (1.7 PB) and there would hardly be a fault in planning for nature and risk management.
1000 camps (120 TB) and we'd have big, well-optimised datasets that allow humanoid robots to work well in every language.
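To relate each scenario in the list to the number of people involved and to the 5 to 400 GB range mentioned earlier, a small sketch can divide each total dataset size by the number of camps. It assumes decimal units (1 PB = 1000 TB, 1 TB = 1000 GB) and leaves out the 20-camp case, which has no size given.

```python
# Scenarios from the list: number of camps -> total dataset size in TB.
# Assumptions: decimal units (1 PB = 1000 TB, 1 TB = 1000 GB); the 20-camp
# scenario is omitted because the text gives no dataset size for it.
PEOPLE_PER_CAMP = 2_000_000

SCENARIOS_TB = {
    10: 0.4,      # 4K+ upscaling, efficient video and 3D codec
    25: 0.8,      # learning to think for itself (general training)
    40: 6.0,      # ultimate AI-human interface
    400: 1700.0,  # nature planning and risk management (1.7 PB)
    1000: 120.0,  # humanoid robots working in every language
}


def people_involved(camps: int) -> int:
    """Total number of people taking part across all camps."""
    return camps * PEOPLE_PER_CAMP


def gb_per_camp(camps: int, total_tb: float) -> float:
    """Average share of the total dataset produced by each camp, in GB."""
    return total_tb * 1000 / camps


if __name__ == "__main__":
    for camps, total_tb in sorted(SCENARIOS_TB.items()):
        print(f"{camps:>5} camps  {people_involved(camps):>13,} people  "
              f"{gb_per_camp(camps, total_tb):>7.1f} GB per camp")
```

On these figures most scenarios land between 32 and 150 GB per camp, while the 400-camp case works out to about 4.25 TB per camp.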
I hope this puts things into perspective.
What about by the time this next generation of SD cards is maxed out, so 9 to 18 years from now? Will the datasets generated by pushing machine efficiency and working with fewer people come anywhere near the perfection and human touch that all those people would bring?
Over 18 years, that's 0.38 billion people helping the AI learn; will it be that focused an effort? Over 22 years, and accounting for other gains, that's 0.15 billion people per year, a sort of side-dish activity. Over 28 years it's 0.065 billion, which is not enough activity in the sector.
That should give you some perspective on how those datasets relate to computing power and prior human development as time moves forward.
So by 2050 we'll have those kinds of developed datasets, but by 2030 they'll probably not be quite as good, you could say.