https://a16z.com/llmflation-llm-inference-cost
Since GPT-3 became publicly accessible in November 2021, the cost of LLM inference for a model of equivalent performance has fallen by a factor of about 1,000, a pace of roughly 10x per year that some predict will continue. The decline is attributed to several factors: better GPU cost/performance, model quantization, software optimizations, and the emergence of smaller, highly efficient models.
Prices have dropped sharply at a fixed level of benchmark performance: the cheapest model that reaches an MMLU score of 42 now costs only $0.06 per million tokens, compared with $60 for GPT-3. The future rate of decline is uncertain, but cheaper inference opens up new commercial possibilities and makes applications like voice assistants and text-processing tools far more accessible. Is this trend beneficial for the AI sector? Will it encourage innovation and new use cases? Or will it be a race to the bottom?
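For readers who want to check the arithmetic, here is a minimal Python sketch relating the two quoted prices to the "10x per year" figure. The three-year span (November 2021 to roughly late 2024) is an assumption, and the function names `annual_decline_factor` and `projected_price` are made up for illustration. The projection simply extrapolates the past trend, which may well not hold.

```python
def annual_decline_factor(start_price: float, end_price: float, years: float) -> float:
    """Geometric-mean factor by which the price fell each year."""
    return (start_price / end_price) ** (1.0 / years)


def projected_price(price: float, yearly_factor: float, years_ahead: float) -> float:
    """Naive extrapolation: divide the price by the yearly factor once per year."""
    return price / (yearly_factor ** years_ahead)


# Prices in USD per million tokens, as quoted in the text for an MMLU of 42.
gpt3_price = 60.00
cheapest_price = 0.06
years_elapsed = 3.0  # assumption: Nov 2021 to roughly Nov 2024

total_factor = gpt3_price / cheapest_price
yearly_factor = annual_decline_factor(gpt3_price, cheapest_price, years_elapsed)

print(f"Total decline: about {total_factor:.0f}x")
print(f"Implied yearly decline: about {yearly_factor:.1f}x")
print(f"Price after one more year at that pace: ${projected_price(cheapest_price, yearly_factor, 1):.4f}")
```

Under these assumptions the total drop is about 1,000x and the implied yearly factor is about 10x, matching the headline numbers.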