LLMs open source vs. closed

Benchmarking LLM performance

Market share for generative AI products will be influenced heavily by user perception


The battle between the various large language models (LLMs) is heating up. I previously wrote about the big tech titans battling for AI supremacy, mainly Google vs. Microsoft (via OpenAI), and about the past, present and future of tech wars.

The market share of chatbots and LLMs (which will later be sold to enterprises) will be heavily influenced by user perception. That perception is being cemented today, and it is a function of three main characteristics:

  1. Distribution – who can access more users faster
  2. Quality – which product provides better results
  3. User experience – everything around usability, including downtime, hallucinations and workflow integrations

Microsoft vs. Google

So far, it seemed like OpenAI was clearly in the lead, and as a result, Microsoft was able to quickly push ahead with AI features in Office 365, Teams and Bing Search. ChatGPT became the fastest-growing consumer product, and new functionality rolled out in quick succession.

However, that narrative seems to have taken a bit of a dent following Google I/O 2023 last week. Google is throwing everything at AI and has made it the company’s top priority. Google announced it is releasing PaLM 2, an LLM to rival GPT-4. It’s multilingual (100+ languages), can do math and reasoning, and is able to code.

Google also opened up Bard, Google’s chatbot assistant and ChatGPT competitor, to everyone (apart from a limited number of countries where Google cannot yet comply with local regulation). One of the key differences between Bard and ChatGPT is that Bard is connected to the Internet and can provide real-time results. ChatGPT is an incredibly powerful tool, but it’s trained on data up to September 2021 and is only now starting to connect to the Internet via plugins (a feature being rolled out gradually to paid subscribers).

ChatGPT using GPT-4
Google’s Bard
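
For builders, the practical difference shows up in how fresh information reaches the model. When a model’s knowledge has a training cutoff, a common workaround is to retrieve current content yourself and pass it in as context. Here is a minimal, illustrative sketch of that pattern in Python – call_llm is a hypothetical placeholder for whichever chat-completion API you use, not a real library call.

```python
import requests


def fetch_page_text(url: str) -> str:
    """Fetch raw page content to use as fresh context (simplified: no HTML cleaning)."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text[:4000]  # keep the prompt within a typical context window


def call_llm(prompt: str) -> str:
    """Hypothetical placeholder for a real chat-completion API call (OpenAI, PaLM, etc.)."""
    raise NotImplementedError("wire this up to your provider of choice")


def answer_with_fresh_context(question: str, source_url: str) -> str:
    # Stuff retrieved, up-to-date content into the prompt so the model
    # isn't limited to what it saw during training.
    context = fetch_page_text(source_url)
    prompt = (
        "Using only the context below, answer the question.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)
```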

Google didn’t stop there. The company announced it is also releasing new AI tools in Gmail, Google Workspace (Docs, Sheets, etc.) and Google Photos (to enhance images with AI and to flag AI-generated images in image search), and, perhaps most importantly, it is revamping the search experience, enriching the results from the classic 10 blue links into an AI-powered experience called AI snapshots.

AI Snapshots by Google Search

Open Source vs. Closed LLMs

Google and OpenAI (backed by Microsoft) are the two leading players in the LLM space. Google’s LaMDA is a conversational language model that can generate text, translate languages, write different kinds of creative content, and answer questions in an informative way. Microsoft’s Turing NLG is a generative language model with a similar set of capabilities.

But the battle between LLMs isn’t just between the tech giants. Open-source LLMs, such as Meta’s LLaMA, MosaicML’s MPT, Vicuna-13B (an open-source chatbot which reportedly achieves 90% of ChatGPT’s quality) or RedPajama (which released a 1.2 trillion token dataset that follows the LLaMA recipe), are gaining popularity due to their flexibility and affordability. But as AI researcher Andrej Karpathy has pointed out, the open-source LLM ecosystem still faces several challenges, including the high cost of pre-training base models.

I asked Bard to compare open-source vs. closed LLMs, and while it may look like open-source LLMs are the clear winner, it’s not entirely straightforward.

Open vs. Closed LLMs

Closed-source LLMs have several advantages over open-source LLMs. They are typically trained on larger datasets, which allows them to achieve better performance. They are also more tightly integrated with their respective platforms, which makes them easier to use. However, closed-source LLMs are not as transparent as open-source LLMs. This makes it difficult to understand how they work and to identify potential biases.

Open-source LLMs are typically trained on smaller datasets, which limits their performance. They are also less tightly integrated with their respective platforms, which makes them more difficult to use. However, open-source LLMs are more transparent than closed-source LLMs. This makes it easier to understand how they work and to identify potential biases.
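
One concrete upside of the open-source route is that you can pull the weights down and run, inspect or fine-tune the model yourself. Below is a minimal sketch using the Hugging Face transformers library; the model id is a placeholder of my own – any open causal-LM checkpoint, such as a LLaMA-family or Vicuna-style model, would slot in.

```python
from transformers import pipeline

# Placeholder model id: substitute any open-source causal LM checkpoint you have
# access to, e.g. a LLaMA-family or Vicuna-style model from the Hugging Face Hub.
MODEL_ID = "your-org/your-open-llm"

# Running locally means the weights, tokenizer and generation settings are all
# inspectable – the transparency advantage that closed APIs don't offer.
generator = pipeline("text-generation", model=MODEL_ID)

result = generator(
    "Summarise the trade-offs between open-source and closed LLMs:",
    max_new_tokens=120,
    do_sample=True,
    temperature=0.7,
)
print(result[0]["generated_text"])
```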

The Pepsi Test for Benchmarking LLMs – Chatbot Arena

Chatbot Arena is an underrated tool in the space of LLM benchmarking – a ‘Pepsi taste challenge’ for comparing LLMs.

How does it work?
You enter a prompt and it is sent to two randomly selected, anonymous LLMs (you don’t know which). You get both outputs back side by side and rate which one is better in a blind test.

Chatbot Arena LLM comparison
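
Under the hood, these blind votes get aggregated into a ranking; at the time of writing, the Arena leaderboard reports Elo-style ratings. Below is a simplified, illustrative Elo update in Python – the K-factor, starting rating and sample votes are my own assumptions for the sketch, not Chatbot Arena’s actual parameters or data.

```python
from collections import defaultdict

K = 32  # assumed K-factor; Chatbot Arena's exact methodology may differ


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update(ratings, model_a, model_b, winner):
    """Update both models' ratings after one blind pairwise vote."""
    ea = expected_score(ratings[model_a], ratings[model_b])
    score_a = 1.0 if winner == model_a else 0.5 if winner == "tie" else 0.0
    ratings[model_a] += K * (score_a - ea)
    ratings[model_b] += K * ((1.0 - score_a) - (1.0 - ea))


# Toy example with made-up votes (model names are illustrative only)
ratings = defaultdict(lambda: 1000.0)
votes = [
    ("gpt-4", "vicuna-13b", "gpt-4"),
    ("claude-v1", "gpt-4", "tie"),
    ("vicuna-13b", "claude-v1", "claude-v1"),
]
for a, b, w in votes:
    update(ratings, a, b, w)

print(sorted(ratings.items(), key=lambda kv: -kv[1]))  # a tiny "leaderboard"
```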

The result is a leaderboard of LLMs which reveals some surprising outcomes. For example, Anthropic’s new Claude model is nearly indistinguishable from GPT-4 in these blind comparisons. Take a look at the latest leaderboard table (it still doesn’t include Google’s PaLM 2).

LLM leaderboard in Chatbot Arena

High competition and no moat are giving investors the jitters

According to the State of AI Q1’23 report by CB Insights, AI startups raised $5.4B during Q1 2023, a 66% drop from the previous year’s figure and a 43% decline quarter-over-quarter. This was largely unexpected given the supposedly favourable market conditions for AI startups.

My belief is that voices are growing louder about a potential bubble in AI, in a market where most players, including Google, admit to having no moat.

AI investments down in Q1 2023

Another potential reason for the decline in AI investments is rising prices, especially in a market that has seen most startup valuations (particularly at later stages) decline.

Prices for generative AI startups continue to climb

As the number of generative AI startups grows exponentially, partly due to the increase in models and APIs made available for the application layer of generative AI, established categories like copywriting for marketing or document summarisation are getting more crowded. Many of the companies competing in these crowded markets will have to pivot or shut down if they cannot become market leaders.

With that said, I believe there’s still a huge opportunity for generative AI startups operating in niches. Rather than rely on a single API, these startups would be wise to combine a number of models and adjust their products to the workflow preferred by their users – connecting to existing tools and adding automation to the way things are done today.
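
As a rough illustration of that multi-model approach, the sketch below routes each request to a preferred model per task type and falls back to another if a provider fails. The provider functions are hypothetical placeholders, not real SDK calls.

```python
from typing import Callable, Dict, List


# Hypothetical provider wrappers – in practice these would wrap real SDKs
# (a closed API like GPT-4, a self-hosted open-source model, etc.).
def call_gpt4(prompt: str) -> str:
    raise NotImplementedError("wire this up to your GPT-4 access")


def call_claude(prompt: str) -> str:
    raise NotImplementedError("wire this up to your Claude access")


def call_local_open_model(prompt: str) -> str:
    return f"[stub reply from a self-hosted open model] {prompt[:40]}..."


# Route by task type: reserve the strongest (and priciest) model for the tasks
# that need it, and use cheaper or self-hosted models elsewhere.
ROUTES: Dict[str, List[Callable[[str], str]]] = {
    "code": [call_gpt4, call_local_open_model],
    "summarise": [call_local_open_model, call_claude],
    "creative": [call_claude, call_gpt4],
}


def run_task(task_type: str, prompt: str) -> str:
    for model_call in ROUTES.get(task_type, [call_local_open_model]):
        try:
            return model_call(prompt)
        except Exception:
            continue  # provider unavailable or not wired up – try the next one
    raise RuntimeError(f"No model available for task type: {task_type}")


# Example: GPT-4 isn't wired up here, so the request falls back to the local model.
print(run_task("code", "Write a function that deduplicates a list"))
```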

LLMs are here to stay

The battle between the various LLMs is likely to continue for several years. It is too early to say which approach will ultimately be more successful, but there’s a lot riding on it. User perceptions are being cemented now as LLMs continue to expand and power more use cases.

Regardless of which model turns out to be superior, it is clear that LLMs are a powerful new technology with the potential to revolutionise the way we interact with computers. At Remagine Ventures, we are following this space, as well as the application layer of generative AI, very closely and looking to continue investing in cutting-edge teams.
