As far as I can find out, there was only one use of GPUs for CNNs prior to AlexNet, and it certainly didn’t have the impact AlexNet had. Besides, running this stuff on GPUs rather than CPUs is a relevant technological breakthrough: imagine how slow ChatGPT would be running on a CPU. And it’s not at all as obvious as it seems; most weather forecasts still run on CPU clusters despite being obvious targets for GPUs.
Okay, so some of the advances that ChatGPT uses (consumer GPUs for training) are even older? 😁