February 20, 2024

1000 Conversations about AI

Casey Flint

Last year I met more than a thousand people working in or on AI. I’ve been writing about these conversations on my blog, Artificially Intelligent, and today I’m sharing a summary of the learnings I’ve gathered and the beliefs I’ve formed as a result.

I’ve spoken to engineers, researchers, and business leaders at many of the top AI companies.

Before we dive in, I’d like to acknowledge how divergent these experts were in their predictions for how this rapidly changing field will evolve. Where the lack of consensus is particularly acute, I try to surface differing perspectives and provide my synthesis.

Take these perspectives as idea-starters rather than anything concrete; everything here is subject to change.

We’d love your thoughts on the perspectives shared below, and we’d like to hear from you if you’re building in this space: casey@squarepeg.vc

On AI today

It’s more than hype. What we saw with LLMs over the past 12 months represents a genuine inflection point in capability and not another hype cycle. 

  • For more on what makes this wave different, see my prior writing on the Square Peg blog here

We’re seeing net-new use cases. This inflection point in capability has real applications for improving the value propositions of businesses.

AI is relevant to every industry and every layer of the technology stack. 

  • This may have a multiplicative effect on technological progress. An improvement in AI capabilities at one part of the stack could improve every other part of technology. 
  • Depending on their generalisability, improvements in AI could yield far broader benefits to technological progress than past technologies have.

Enterprise adoption is weak relative to where it could be for both more traditional AI and generative AI. As adoption rises, so will startup revenue and returns for investors. 

  • Many enterprises are building with AI with real enthusiasm, but much of what has been built so far remains at the proof-of-concept stage. Enterprises adopting generative AI in particular face a variety of challenges that should be addressed over the next few years, such as deploying stochastic systems in production environments and concerns around data sovereignty.

Although the Australian government could show a deeper interest in growing Australia’s AI capabilities, Australia stands to benefit more from this wave of AI than from past ones, because there’s less need for specialised talent to utilise the latest and greatest AI technologies.

  • In the past you needed more than good software engineers to build great AI - you also needed data scientists, data engineers, and so on. Today it’s easier than ever for software engineers to work with and integrate AI into products, thanks not just to LLMs but to platforms like PyTorch and Hugging Face.
  • Note: there is some contention around the idea that businesses that simply plug into AI models using APIs are “AI businesses”. I don’t find meaningful value in drawing a hard line of distinction.

The rift between top AI talent who believe in open source and those who are indifferent may come to an even greater head than what was seen in Sam Altman’s “firing”, as concerns grow about the centralisation of the best research, talent and resources.

  • I’m told there’s a lot more disgruntlement about the centralisation and privatisation of the best research than most people hear about. This could lead to great talent splitting off and starting new foundation model players.

On differentiation and moats

Today’s environment for building a startup is more competitive than ever.

  • With every new technology wave come lower barriers to entry, thanks to the ability to build on products (like cloud compute) that didn’t previously exist.
  • Decreasing barriers for building software means faster and cheaper time to market for new entrants. 
  • This not only enables new entrants but enables incumbents to innovate faster than ever. 

Distribution matters more than in the past. 

  • Considering the lower barriers to building software, distribution and GTM become more important. 
  • Consider this quote by Alex Rampell of A16Z: “The battle between the startup and incumbent comes down to whether the startup can reach distribution before the incumbent gets innovation”. 
  • If it’s easier for incumbents that already have strong distribution to reach innovation, startups have less time than before to build out their distribution. 

Founders who focus on product and GTM over model performance will be more successful.

  • Given the pace and relative openness of AI research, for most businesses the models will improve regardless of what founders do, so they should focus on building a product customers love and want.
  • The hope is that businesses will continue to be able to substitute in more performant models as they’re released, anyway. 
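One common way to keep models substitutable is to isolate model calls behind a thin interface, so application code never depends on a specific vendor SDK. A minimal sketch in Python - the model backends here are hypothetical stand-ins, not real APIs:

```python
from typing import Callable

# A model backend is anything that maps a prompt to a completion.
CompletionFn = Callable[[str], str]

def make_summariser(complete: CompletionFn) -> Callable[[str], str]:
    """Application code depends only on the `complete` interface,
    not on any specific model or vendor."""
    def summarise(text: str) -> str:
        return complete(f"Summarise: {text}")
    return summarise

# Hypothetical stand-ins for two model generations.
def model_v1(prompt: str) -> str:
    return f"v1 answer to: {prompt}"

def model_v2(prompt: str) -> str:
    return f"v2 answer to: {prompt}"

summarise = make_summariser(model_v1)
print(summarise("quarterly report"))

# Swapping in a more performant model requires no application changes:
summarise = make_summariser(model_v2)
print(summarise("quarterly report"))
```

The point is the seam, not the specifics: when a better model ships, only the backend function changes.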

Cost is an avenue of short-term differentiation against incumbents.

  • Incumbents usually benefit from economies of scale, meaning their variable costs are often lower than other players’. However, they also have to carry those variable costs across a much larger customer base. This means their absolute costs for working with any solution that has high variable costs (like LLMs) are dramatically higher than those of a startup with a smaller customer base.
  • This allows for some arbitrage: a startup can offer LLM-based tech to every customer while scaling, whereas an incumbent - especially in an economic environment that punishes higher capital outlays - may find that difficult to stomach and can only offer LLM-powered features to a smaller portion of customers.
  • That said, we’re not enormously worried about unit economics of companies leveraging foundation models in the long term as we expect these costs to decline dramatically (if not for the most powerful models, certainly for the kinds of models companies will want to leverage).  
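To make the arbitrage concrete, here is a toy calculation - the per-customer cost and customer counts are assumed figures for illustration, not data from the conversations:

```python
# Assumed figures, purely illustrative.
llm_cost_per_customer = 5.0        # $ per customer per month
startup_customers = 2_000
incumbent_customers = 5_000_000

# The same variable cost per customer implies wildly different
# absolute outlays once scaled across each customer base.
startup_outlay = llm_cost_per_customer * startup_customers
incumbent_outlay = llm_cost_per_customer * incumbent_customers

print(f"Startup:   ${startup_outlay:,.0f}/month")    # $10,000/month
print(f"Incumbent: ${incumbent_outlay:,.0f}/month")  # $25,000,000/month
```

A $10k/month experiment is easy for a startup to approve; a $25m/month line item is a board-level decision for the incumbent.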

Data is increasingly an overhyped moat, especially in products reliant on LLMs.

  • Data is important and valuable. Proprietary data can be especially valuable. In my experience, however, people overestimate the value of data from the perspective of building a competitive advantage or moat. 
  • Frequently I hear from founders of LLM-based applications that they will develop a data moat by collecting and leveraging customer data. The problem is that the data they’re going to collect almost always looks like the data the base LLM was trained on, so it’s unlikely to yield any advantage over competitors: it won’t translate into performance increases.
  • Furthermore, even if more or better data leads to a performance increase, that doesn’t guarantee that it leads to a noticeably better product experience. 

OpenAI’s ChatGPT marketplace could pose a real risk to startups more broadly.

  • Although OpenAI doesn’t currently require products built on top of ChatGPT to share their data for training, its growing market power and control may enable it to ask for that data in the future (without much pushback), because those products will rely on OpenAI as both their distribution channel and their “tech stack”.
  • In contrast to some of the commentary online, the marketplace is not a distraction from OpenAI’s core vision: it helps them understand human workflows, which is necessary for training models towards AGI.
  • Of course, the marketplace does serve as a great opportunity to build and quickly distribute an MVP - but that’s a short-term advantage in the face of a longer-term risk.

On product design

LLMs will reset the baseline expectations for user experience.  

  • Companies whose sole value proposition depends on what an LLM can deliver will end up delivering a “baseline” experience and quickly be out-competed.
  • Founders should be wary of whether they are pitching something 10x better, or if they are pitching the new baseline (which used to be 10x better, pre-LLMs).
  • Ask yourself: what percentage of what’s exciting about my product for my customers comes from publicly-available AI? 

Chat is probably the wrong interface for many use cases.

  • I have been told that 70% of YouTube views come from recommendations rather than search. This is relevant to how Google is thinking about chat-based interfaces: they too have recognised that most people don’t want to think of what they should ask, but simply want to be served the right content at the right time.
  • We are looking for founders with insights that others have not yet come to. This is not because we want people to be contrarian for the sake of it, but because quality, unique insights can put you ahead of the competition. 

There could be a faster cycle of bundling and unbundling.

  • Generally, with each new tech wave, there's a period of unbundling (where products that were previously offered in a single solution become multiple pieces of software) followed by a rebundling (where they come back together into one horizontal solution). This is a phenomenon that could occur even faster in the case of generative AI because the base technology is becoming more horizontal at the same time. This has a multiplicative effect: where the driving force behind rebundling is usually the maturity of builders working with the underlying tech, the tech itself is also enabling faster generalisation. 

Foundation models may be even more generalisable than we think, and that poses risks for certain startups.

  • Many startups are betting that building their product with domain-specific data or domain-specific models will yield a differentiated product. Whilst I believe there’s a meaningful opportunity in vertical-specific AI products, I worry about the impact of models improving in generalisability.
  • An LLM is not a product but an engine for features. There is a risk that products that rely heavily on LLMs to provide their product experience will become redundant as models become more robust.
  • See this paper, for example, which claims that GPT-4 can achieve superior performance over Med-PaLM against certain benchmarks simply with better prompting.

In the short term businesses will continue to use centralised, closed AI models to prototype before moving to open source. 

  • For many applications, it doesn’t make sense to use the most costly, largest models for long. Cost optimisation is critical in this capital-raising environment. 
  • More sophisticated technical teams are building architectures that resemble Mixture-of-Experts structures (as Bard and GPT-4 allegedly use), in which one highly generalisable model acts as an orchestrator between other models that are more task- or domain-specific.
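As a rough illustration of that orchestrator pattern, the sketch below routes queries to stub “specialist” functions. In a real system each stub would be a call to a task- or domain-specific model, and the router itself would typically be a cheap classifier or a generalist model; every name here is hypothetical:

```python
def legal_model(query: str) -> str:
    # Stand-in for a model specialised on legal text.
    return f"[legal model] {query}"

def code_model(query: str) -> str:
    # Stand-in for a code-specialised model.
    return f"[code model] {query}"

def generalist_model(query: str) -> str:
    # Stand-in for a large, general-purpose fallback model.
    return f"[generalist model] {query}"

def route(query: str) -> str:
    """Orchestrator: a cheap heuristic (standing in for a classifier
    or generalist model) picks the specialist, with a fallback."""
    q = query.lower()
    if "contract" in q or "liability" in q:
        return legal_model(query)
    if "python" in q or "bug" in q:
        return code_model(query)
    return generalist_model(query)

print(route("Review this contract clause"))
print(route("Why does my Python script crash?"))
```

The cost logic follows from the same arbitrage discussed earlier: the expensive generalist only handles queries no cheaper specialist can.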

Companies offering end-to-end AI software are struggling the most with cost-effective scaling, due to the low technical readiness of enterprises.

  • As with existing SaaS, founders should not over-optimise for a small number of early customers. Large enterprises typically have heavily bespoke needs that can distract founders from building products that scale to other customers.
  • Investors are highly wary of companies that describe themselves as product-led tech companies but realistically act like product-enabled consulting firms. 

On model size and foundational model development

The best researchers expect that there is more to come from scaling models.

  • The general belief among those at the largest firms (Meta, DeepMind, OpenAI, Amazon) is that more scale (more parameters) means more powerful models.
  • There’s still a lot for researchers to test here: more GPUs, more optimised chips, faster interconnect bandwidth, better algorithms etc.
  • In addition to expanding today’s models, progress towards more multi-modal models (including video) could massively increase model sizes, because of the computational intensity of working with video.

While all of this experimentation with larger models happens within the walls of the largest tech companies, in industry there’s a huge emphasis on working with smaller models to bring down costs.

  • Recently at NeurIPS, Bjorn Ommer made a passionate case for why we should focus on building smaller, more efficient models. He accompanied his commentary with this statistic: In the last 5 years, we’ve seen a 15x increase in model size. That growth outstrips the growth in compute power by a factor of 9x. It’s unsustainable to keep focusing on scaling.

Whether the future is one large model (or one large mixture of experts) versus smaller or open-source models is a less important question for founders than it may seem.

  • Founders will largely have to accept what’s available to them in the market. What matters more is finding PMF; once you reach it, you’ll have the resources to rebuild your stack as appropriate. 
  • My perspective is that both will occur, at least in the medium term: businesses will develop architectures that combine the models most appropriate for their use cases and cost sensitivity, while the largest businesses (which wish to capture some of AGI’s value) will continue building massive models until AGI is achieved or internal conviction in doing so declines.

There are real gains to be made from cross-discipline adoption of AI; currently, we’re limited by creativity and research pace.

The issue of LLMs’ factual recall is overstated as a risk.

  • See commentary from former Bard researcher Jamie Hall here

Models may replace software in the future, with agents as an interim version of this future state.

  • Rigid rule-based algorithms may be replaced by models that internalise an understanding of how products should perform and use the resources available to them to execute on that. 
  • This may emerge as “agents” in the early days as the role of agents is to understand an individual’s task(s) and help achieve them.  
  • Some researchers see this as difficult to imagine, given today’s foundation models vary in output in a way that’s not predictable. This isn’t necessarily referring to “hallucination” in which answers have no factual grounding, but simply that these models can provide different answers when asked the same question. 
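That output variability comes from sampling: at each step the model draws the next token from a probability distribution, so the same prompt can legitimately produce different completions even with no hallucination involved. A toy illustration (the token scores here are made up):

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Sample one token from a softmax over scores. Higher temperature
    flattens the distribution, increasing output variability."""
    scaled = {tok: val / temperature for tok, val in logits.items()}
    top = max(scaled.values())
    weights = {tok: math.exp(val - top) for tok, val in scaled.items()}
    total = sum(weights.values())
    r = rng.random() * total
    cumulative = 0.0
    for tok, w in weights.items():
        cumulative += w
        if r < cumulative:
            return tok
    return tok  # guard against floating-point edge cases

# Made-up next-token scores for one and the same prompt.
logits = {"Paris": 3.0, "Lyon": 1.5, "Marseille": 1.0}
rng = random.Random()
answers = {sample_token(logits, 1.5, rng) for _ in range(50)}
# With temperature > 0, repeated runs of the same prompt can
# legitimately return different answers.
```

Setting temperature near zero makes the most likely token dominate, which is why production systems that need reproducibility often sample greedily - at the cost of the variety users expect from a chat product.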

On hardware

Moore’s Law coming to an end may be the catalyst for deeper experimentation with other types of chips because it’s now financially sensible to do so.

Some are optimistic about NVIDIA’s grip on the GPU market lessening as people build products that allow for the circumvention of NVIDIA’s CUDA software.

But, it will likely continue to be difficult to challenge NVIDIA on their GPU lead, given the time it takes to design and productionise new chips.

  • Some are taking on the challenge of building LLM-specific chips, but this is challenging as the technology (and hence chip design needs) is changing so fast. 
  • NVIDIA is accounting for LLM architecture in their next chip, but they’ve also shared that they’re not focussing specifically on building for LLMs as they need their chip architecture to adapt to a broad range of tasks.

Apple may emerge as a dark horse in the race, given their hardware expertise, especially as AI moves to the edge.

  • They are also well positioned given they recently started producing their own chips (moving away from outsourcing to Intel).

So, there you have it - a rundown of the learnings I’ve spent the most time thinking about over the last 12 months.

I’d love to hear which takeaways resonated the most or which you disagreed with. I’d also love to hear what you’ve learnt that’s surprised you or if you’re working on something related to AI or data. Get in touch: casey@squarepeg.vc
