Plenty of LLMs in the sea

A comprehensive guide to understanding and navigating the diverse landscape of Large Language Models, using an aquatic metaphor.



This blog post is based on the PromptBros selection of models. You can test all of them by creating a Free account.

Navigating the Sea of LLMs: From Whales to Remoras

About five months ago, during Build 2024, Microsoft CTO Kevin Scott used whales and other underwater creatures to describe the growth of compute for model training. I'm "stealing" the metaphor to try to shed some light on the recent and current LLM landscape.

For most people not involved with AI on a daily basis, all of this can sound a bit daunting and complicated. So many models, context windows, sizes and flavors... and yet everyone pays for a ChatGPT subscription!

I will try to break it all down using these 4 metrics:

  1. Price - how much inference costs per token. It can also be read as a proxy for energy consumption or model efficiency.
  2. Number of parameters - the number of learned weights in the model's transformer layers. More parameters generally mean stronger reasoning, though the relationship is not linear.
  3. Size of context window - how long a conversation can get before the model loses track of it; in other words, the model's "short-term memory".
  4. Reasoning and Bias - largely a function of the number of parameters, but also of how much bias and how many limitations the developer imposed during alignment.
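To make the context-window metric concrete, here is a minimal sketch of the sliding-window truncation that many chat clients apply when a conversation outgrows the model's window. The token estimate and all the numbers are made up for illustration; real clients use a proper tokenizer:

```python
def truncate_history(messages, max_tokens, count_tokens):
    """Keep only the most recent messages that fit in the context window."""
    kept, total = [], 0
    for msg in reversed(messages):          # walk from newest to oldest
        cost = count_tokens(msg)
        if total + cost > max_tokens:
            break                           # older messages fall out of "memory"
        kept.append(msg)
        total += cost
    return list(reversed(kept))             # restore chronological order

# Crude token estimate: roughly one token per four characters.
def estimate(msg):
    return max(1, len(msg) // 4)

history = ["hello there", "long message " * 50, "short reply"]
trimmed = truncate_history(history, max_tokens=40, count_tokens=estimate)
```

With a 40-token budget, the oversized middle message pushes everything older than itself out of the window, which is exactly the "losing track" effect described above.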

Based on these criteria we have the following selection of models:

The Whales

Models with 100B+ parameters and 128k+ context windows, usually very expensive, with high reasoning capabilities but also very biased. Even though some of these models are less than a year old, they are starting to be replaced by more efficient and performant models.

  • OpenAI GPT-4 and GPT-4 Turbo: Still among the largest and highest-reasoning models in benchmarks.
  • Anthropic Claude 3 Opus: The largest Anthropic model, around 300B parameters with a 200k context window. All Anthropic models are biased by design.
  • Google Gemini 1.0 Pro: Very biased, but also the biggest model available, with a whopping 600B parameters.
  • Perplexity Sonar Huge Online and Large Chat: They come in Online and Chat flavors, i.e., with or without online search support, and are based on fine-tuned versions of Llama 3.1.
  • Mistral Large (soon Large 2): The smallest on the list at 128B, but also the least biased. The next version is expected to be much more advanced.

The Sharks

The "Frontier" contestants: smaller (70-100B) but more advanced in reasoning and more efficient (less expensive), with big context windows (128k). This new generation of models takes advantage of new data-optimization techniques, such as synthetic data, to improve overall performance.

  • Anthropic Claude 3.5 Sonnet: My current daily usage LLM.
  • OpenAI GPT 4o: Probably the fastest and one of the most intelligent models in the industry.
  • Google Gemini 1.5 Flash: The faster, lighter version of Gemini Pro.

Synthetic data is artificially created by LLMs or algorithms from real-world data sets. It is widely used to train machine learning models, reduce biases in data sets, and sidestep the ethical and privacy concerns surrounding real data.
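As a trivial illustration of the algorithmic flavor, the sketch below resamples real numeric records and adds noise, so the overall statistics survive while no original record is copied verbatim. The field names and jitter value are arbitrary choices for this example:

```python
import random

def synthesize(records, n, jitter=0.1, seed=0):
    """Create n synthetic records by resampling real ones with small noise."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n):
        base = rng.choice(records)          # pick a real record as a template
        synthetic.append({field: value * (1 + rng.uniform(-jitter, jitter))
                          for field, value in base.items()})
    return synthetic

real = [{"age": 34, "income": 52000}, {"age": 58, "income": 71000}]
fake = synthesize(real, n=100)
```

Production approaches (LLM-generated text, GANs, differential privacy) are far more sophisticated, but the goal is the same: data that behaves like the real thing without being the real thing.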


The Dolphins

Open-source models, or the ones you could run on your own computer. Usually 70B models with varying context sizes and little to no restrictions. They are the most versatile models and serve as the basis for an increasing number of specialized models.

  • Meta Llama 3.1 70B: The new open source flagship from Meta.
  • Mistral Mixtral 8x22B: Mixture of experts, leverages up to 141B parameters but only uses about 39B during inference.
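The mixture-of-experts trick behind Mixtral can be sketched as top-k gating: a small router scores every expert for each token, but only the top few actually compute, which is why the active parameter count stays far below the total. This is a toy illustration with made-up "experts", not Mixtral's real routing code:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(token, experts, router, k=2):
    """Route a token to the top-k experts and mix their outputs."""
    scores = router(token)                        # one score per expert
    top = sorted(range(len(experts)),
                 key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in top])   # renormalize over the winners
    return sum(w * experts[i](token) for w, i in zip(weights, top))

# Toy setup: 8 "experts" that just scale the input; the router prefers
# experts whose index is closest to the token value mod 8.
experts = [lambda x, s=i: s * x for i in range(8)]
router = lambda x: [-abs((x % 8) - i) for i in range(8)]
out = moe_forward(3.0, experts, router)
```

Only two of the eight experts run per token here, mirroring how Mixtral activates roughly 39B of its 141B parameters at inference time.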

The School

The small ones, under 15B parameters, for those concerned about the energy consumption driven by the rise of AI. They are typically developed using LLM distillation techniques.

  • OpenAI GPT 4o Mini: Brand new, but already one of the best (if not the best) all-around small models.
  • Anthropic Claude 3.5 Haiku: Small but maintains the 200k window of the bigger brothers.
  • Mistral Nemo: A new open-source 12B model built in partnership with Nvidia.
  • Meta Llama 3.1 8B: The best you could reasonably run on your PC.

LLM distillation is a technique that uses a large language model (LLM) to train a smaller model to perform a specific task. The smaller model, known as the "student model", is trained to mimic the behavior of the larger model, known as the "teacher model". The result is a smaller, more efficient model that can be used in resource-constrained environments.
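A minimal sketch of the idea, assuming the classic logit-level (soft-label) formulation: the student is trained to minimize the KL divergence between its softened output distribution and the teacher's. Real training loops add the usual cross-entropy term and run over batches; this only shows the distillation loss itself:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions."""
    p = softmax(teacher_logits, temperature)   # teacher's soft targets
    q = softmax(student_logits, temperature)   # student's prediction
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher = [2.0, 1.0, 0.1]
perfect = distillation_loss(teacher, teacher)            # student matches teacher
divergent = distillation_loss([0.0, 0.0, 5.0], teacher)  # student disagrees
```

The temperature softens both distributions so the student also learns from the teacher's "dark knowledge", i.e., the relative probabilities it assigns to wrong answers.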


The Octopus

With the introduction of OpenAI's o1 models, which excel at tasks that require deep reasoning, particularly in STEM fields, we are entering a new era of LLMs with PhD-level reasoning capabilities. They use a chain-of-thought approach that mimics human problem-solving; this makes them very effective at complex problems, but also exceedingly slow and very expensive.

  • OpenAI o1: The first of the new generation of models with high reasoning capabilities, but still slow and very expensive.

Chain-of-thought reasoning entails breaking down a problem into smaller, logical steps for the LLM to solve in sequence. First, the model identifies the key parts of the problem. Then it processes each part in order, considering how one step leads to the next. Each step builds on the previous one, allowing the model to move methodically toward a logical conclusion.
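That process can be mimicked outside an LLM with a toy "scratchpad" that records every intermediate step; a chain-of-thought prompt essentially asks the model to produce this kind of trace in plain text. A deliberately trivial illustration:

```python
def solve_with_steps(a, b, c):
    """Compute (a + b) * c while recording each reasoning step."""
    steps = []
    total = a + b
    steps.append(f"Step 1: add {a} and {b} -> {total}")
    answer = total * c
    steps.append(f"Step 2: multiply {total} by {c} -> {answer}")
    return answer, steps

result, trace = solve_with_steps(3, 4, 5)
```

The point is not the arithmetic but the trace: forcing the intermediate steps into the output is what lets each step condition on the previous one, and it is also why these models spend so many tokens (and so much money) per answer.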


The Remoras

These are specialized models built on top of the base models, usually by further training them on domain-specific data sets such as code and math.

  • Mistral Codestral (and soon Mathstral): Specialist coding (and mathematics) models. My new systems-architecture and coding assistant.

Conclusion

As we dive deeper into the vast ocean of LLMs, it's clear that the field is rapidly evolving, offering an exciting array of possibilities for users across different needs and sectors. Whether you're looking for raw power, efficiency, specialized capabilities, or open-source flexibility, there's likely a model that fits your requirements.

The diversity we see in the LLM ecosystem is not just a testament to technological progress, but also a promise of more tailored, efficient, and accessible AI solutions in the future. As these models continue to improve and specialize, we can look forward to even more innovative applications that could revolutionize how we work, learn, and interact with technology.

Remember, the best model for you depends on your specific needs, resources, and ethical considerations. Don't hesitate to explore and experiment with different options to find the perfect fit for your projects or organization. The future of AI is bright, and you're now better equipped to navigate it!

Feel free to contact me if you have any questions or need help with your selection. Happy exploring!