"In June 2018, OpenAI introduced its first GPT (Generative Pre-Training) large language model. Trained on massive amounts of unlabelled text corpora and leveraging breakthrough Transformer generative deep learning architecture, GPT-1 made short work of complex language understanding tasks.
In February 2019, the deep learning community welcomed the new and improved GPT-2, whose 1.5 billion parameters made it roughly 12 times larger than the original. Then, in the spring of 2020, OpenAI rolled out GPT-3, a behemoth packing 175 billion parameters.
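To put those parameter counts in perspective, here is a minimal sketch (not from the article) that computes the growth factor between successive releases. The GPT-2 and GPT-3 figures come from the text above; the roughly 117 million parameter count for GPT-1 is the commonly cited figure from OpenAI's original GPT paper and is an assumption here.

```python
# Scale-up factors between successive GPT releases.
# GPT-2 (1.5B) and GPT-3 (175B) parameter counts are taken from the article;
# the ~117M count for GPT-1 is the commonly cited figure and is assumed here.
param_counts = {
    "GPT-1": 117_000_000,       # ~117M parameters (assumed)
    "GPT-2": 1_500_000_000,     # 1.5 billion parameters
    "GPT-3": 175_000_000_000,   # 175 billion parameters
}

names = list(param_counts)
for prev, curr in zip(names, names[1:]):
    factor = param_counts[curr] / param_counts[prev]
    print(f"{curr} has ~{factor:.1f}x the parameters of {prev}")

# Prints roughly:
#   GPT-2 has ~12.8x the parameters of GPT-1  (the article rounds this to 12)
#   GPT-3 has ~116.7x the parameters of GPT-2
```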
As the size of deep learning models continues to increase, so does their appetite for compute. And that has Neil Thompson, a research scientist with MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL), concerned. "The growth in computing power needed for deep learning models is quickly becoming unsustainable," Thompson recently told Synced.

Thompson is first author on the paper The Computational Limits of Deep Learning, which examines years of data and analyzes 1,058 research papers covering domains such as image classification, object detection, question answering, named-entity recognition and machine translation. The paper proposes that deep learning is not computationally expensive by accident but by design, and that these increasing computational costs have been central to its performance improvements.
Read the full interview here.