The main work of an LLM is multiplication (roughly 1 per parameter per word produced)
The newest high-end LLMs have ~1 trillion parameters, so ~1 trillion multiplications per word
The newest high-end GPUs perform ~100 billion multiplications per Joule (FLOPS/Watt = floating-point operations per second per Watt = multiplications per Joule)
~1 trillion multiplications/word ÷ ~100 billion multiplications/Joule ≈ 10 J/word
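
The arithmetic behind the 10 J/word figure can be checked with a short Python sketch. It uses only the round numbers stated above; the function and constant names are made up for illustration, and the result is an order-of-magnitude estimate, not a measurement.

```python
# Back-of-envelope energy estimate, using the round figures from the notes.
# All names here are illustrative; the inputs are rough orders of magnitude.

PARAMETERS = 1e12                # ~1 trillion parameters
MULTS_PER_PARAM_PER_WORD = 1.0   # roughly 1 multiplication per parameter per word
MULTS_PER_JOULE = 100e9          # ~100 billion multiplications per Joule


def joules_per_word(parameters, mults_per_param_per_word, mults_per_joule):
    """Energy per word = multiplications per word / multiplications per Joule."""
    mults_per_word = parameters * mults_per_param_per_word
    return mults_per_word / mults_per_joule


energy = joules_per_word(PARAMETERS, MULTS_PER_PARAM_PER_WORD, MULTS_PER_JOULE)
print(f"{energy:.0f} J/word")
```

Changing any one input by a factor of 10 changes the result by the same factor, so the estimate is only as good as the three round numbers that go into it.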