The CEO and co-founder of the AI audio company ElevenLabs, Mati Staniszewski, recently told an audience at TechCrunch Disrupt that the firm’s core models for voice and audio generation are on track to become commoditized in the near future, despite currently representing a key competitive edge for the company. According to Staniszewski, as generative voice-and-audio technologies mature and proliferate, differentiation will increasingly shift from model quality alone to broader product ecosystems and multi-modal integrations (e.g., combining voice, video and large language models). He indicated that while ElevenLabs will continue to invest in model R&D in the short term, the long-term strategy pivots toward application and platform depth as standalone audio models become more interchangeable among providers.
Sources: Yahoo News, AI Base
Key Takeaways
– Even market leaders like ElevenLabs acknowledge that their core audio-model technology is likely to lose its premium status and become a commodity.
– Competitive advantage in the AI audio sector is expected to shift from “model quality” toward “platform value” and multi-modal product integration (voice + video + LLMs).
– For firms and investors, the window for earning outsized returns from proprietary voice models alone is narrowing; strategic focus should move to downstream uses, ecosystems and business models rather than the model race alone.
In-Depth
In this rapidly evolving age of generative artificial intelligence, the statement by Mati Staniszewski, the co-founder and chief executive of ElevenLabs, is both candid and a signal to the market. He essentially admitted what many in the industry quietly believe: while building high-quality AI audio models remains a major technical barrier today, that barrier is eroding, and the true battleground of tomorrow lies not in the models themselves but in how they are used.
For context, ElevenLabs has built its reputation on delivering breakthrough voice-synthesis capabilities—voices that sound natural, emotionally expressive, multilingual, and ready for commercial deployment in everything from audiobooks to videogames. But at the TechCrunch Disrupt 2025 event, Staniszewski told the audience that those very models will gradually become interchangeable commodities: “Over the long term, they will commoditize—probably within the next few years,” he said. Even if certain voices or languages maintain small differentiators, the gap will shrink.
What does that mean? In practical terms, it suggests that high-fidelity audio generation, once a premium capability, is heading toward becoming a baseline expectation. Whichever vendor you hire, you will get “good enough” audio. The margins for being “just a little bit better” will diminish, and that changes how value is captured. Rather than winning on model quality alone, firms must win through how a model is embedded in an application: the live integration, the ecosystem, the user experience, the product framework around it. Staniszewski explicitly drew a parallel to how Apple rose by bundling hardware and software rather than winning on component quality alone. Similarly, today’s voice-model firms must ask: can we serve the use case, own the workflow, and provide the product, not just the piece?
From an investment and strategic perspective, this is a major cue for how to position oneself. If models become commoditized, barriers to entry fall and competition intensifies. Firms that rely solely on “we invented the model” risk margin erosion. The bigger differentiator becomes what you build around the model: the enterprise contracts, the integrations, the services, the brand, the go-to-market muscle, the data network that supports proprietary usage, and the platform lock-in. For example, a company that offers voice generation integrated into a broader content-creation suite or embedded in a vertical (education, media, telecommunications) may capture much more value than one selling the model alone.
For ElevenLabs specifically, Staniszewski’s comments suggest the company is already moving toward a phase where its vision extends beyond the model. According to the reporting, the short term still favors model development (if your audio sounds bad, you’re dead in the water), but the long-term horizon is different. Multi-modal integration, strategic partnerships, and product layering become the competitive engines.
For those in tech strategy, business development, or investment, here are a few pragmatic implications. First: if you are entering the voice and AI audio space, expect accelerating competition and shrinking margins. Model IP alone will not guarantee differentiation. Second: value capture likely occurs around the application, not raw generative horsepower. Firms should consider vertical strategies (specific business uses), platform lock-in, data ecosystems, customization, and integration rather than chasing marginal model improvements. Third: the timeline matters. Staniszewski’s projection that commoditization will arrive within the next few years is a wake-up call. Firms that do not evolve their business model may find themselves competing in a commoditized marketplace sooner than they anticipate.
In short, we’re witnessing a shift in generative audio where technical novelty is becoming an operational expectation. The voice model is still the ticket to entry today, but tomorrow the real play will be what you do with it and how you tie it into a broader system. For the conservative strategist, this means realizing that sustainable advantage will not come simply from doing what everyone else can do, but from building the ecosystem around it. The clock on commoditization is ticking: those who retool now may thrive; those who ignore it risk being squeezed by a market where “good enough” becomes the norm.

