AI adoption in mobile apps is no longer a question of whether — it’s a question of how well the foundation underneath it holds. At AIM Conference 2026, organized by FunnelFox and Adapty in New York, four practitioners discussed what that foundation requires: unified data, coherent metrics, and a clear-eyed view of where AI creates value and where it amplifies existing problems.
The session brought together:
- Alec Velikanov, Co-Founder & CTO of NewForm;
- Olga Berezovsky, Analyst in Residence at Data Analysis Journal and former Head of Analytics at MyFitnessPal;
- Yves Benchimol, CEO & Co-Founder of WeWard;
- and Lucas Lovell, VP of Product at Paddle.
Below are the key insights from their conversation covering AI adoption maturity, data infrastructure, and the organizational changes that come with AI-enabled development.
AI adoption: state of play
Olga Berezovsky works with 25 products simultaneously — from small apps to large platforms — which gives her a broad view of where the market stands. Her breakdown is direct:
- Roughly 50% of companies talk about AI but do very little with it.
- Another 40% don’t build their own systems but take advantage of AI built into the tools they already use — FunnelFox, Amplitude, Hex.
- The top 10% go further — they implement MCP (Model Context Protocol), a standard that lets AI models connect to and work across all their tools and data sources as one unified system.
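For a sense of what that looks like at the code level, here is a minimal sketch of an MCP server using the official MCP Python SDK; the analytics tool and its stub logic are invented for illustration.

```python
# Minimal MCP server sketch (pip install "mcp[cli]"). It exposes one
# hypothetical analytics tool an AI model can call; in a real setup,
# the tool body would query the company's data warehouse.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("analytics")

@mcp.tool()
def weekly_active_users(product: str) -> int:
    """Return WAU for a product, using the in-house definition."""
    return 42  # stub: a real version would run the canonical SQL

if __name__ == "__main__":
    mcp.run()
```

Wrap a handful of tools and data sources this way, and the model can work across them as the "one unified system" described above.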
What that looks like in practice varies significantly depending on whether you’re building a consumer product or infrastructure.
Building for consumers
At WeWard — a mobile app with 20 million users across 30+ countries — Yves Benchimol thinks about AI adoption in three layers: what’s already in production, what’s being scaled, and what’s still experimental.
- In production: customer support has been largely automated by AI agents for over six months. Agents handle analytics queries that previously required dedicated data team time. Translation across 30+ markets is fully automated.
- At scale: AI-generated creatives, tested by country, by local news cycle, by topic.
- The experimental layer is where it gets interesting. WeWard is testing an agent that takes bugs reported through support, creates the ticket, and pushes the fix to production, removing engineers from level-one bug resolution entirely. And non-engineers are already shipping features.
We already have two features that have been pushed to prod by non-engineers. The code was produced and shipped via vibe coding, which is very innovative for a six-year-old app.
Building for businesses
Paddle — 350 people, engineering team of 100+ — has followed a different path. Lucas Lovell describes three distinct phases:
- individuals using LLMs personally to do their jobs better;
- rolling out enterprise versions of those tools across the org, connected to internal knowledge sources;
- enabling every Paddle employee to build their own agents.
We have very widespread LLM adoption across the business, and a lot more agentic use cases are starting to grow.
But the more consequential shift for Paddle is what’s happening on the customer side. Customers increasingly interact with Paddle through other platforms, using Claude to build implementations, which means Paddle needs MCP in place to support that. And because Paddle sits on an enormous payments dataset, the next step is using AI to surface growth opportunities for customers directly:
As a payments platform with an enormous dataset, it’s then about how do we use AI to provide more value to our customers.
The through-line across both cases: AI adoption isn’t a single initiative. It’s a stack, and the companies moving fastest are the ones building it systematically, layer by layer.
If your data is messy, AI will scale the mess
AI requires unification across all of a company’s data sources, and that’s where the data warehouse comes in: BigQuery, Snowflake, MySQL. Not Mixpanel, not Amplitude.
Those tools can play a role, but only if you’re feeding them marketing, payment, and messaging data, which gets expensive fast. A data warehouse keeps costs manageable and gives you a single place from which to personalize, automate, and build reporting.
A data warehouse also solves a subtler problem: who owns the definitions. What counts as an active user? What counts as a signup? In a fragmented data ecosystem, every tool has its own answer. But these definitions need to live in-house, and from there, everything else follows.
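A sketch of what "definitions living in-house" can mean in practice: a canonical view in the warehouse that every downstream tool and agent reads. The dataset, table, and event names here are invented.

```python
# Pin the in-house "active user" definition as a single BigQuery view,
# so every tool downstream inherits the same answer.
from google.cloud import bigquery

DEFINITION = """
CREATE OR REPLACE VIEW analytics.active_users AS
SELECT user_id, DATE(event_ts) AS activity_date
FROM analytics.events
WHERE event_name IN ('session_start', 'purchase')  -- the owned definition
GROUP BY user_id, activity_date
"""

client = bigquery.Client()
client.query(DEFINITION).result()  # the definition now lives in one place
```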
So, which data tool to pick?
The answer is scale-dependent:
- For smaller mobile apps, BigQuery is the default recommendation — free to set up, pay only for what you use, scales cleanly with the business.
- For enterprise, or anything touching finance or healthcare, the choice is often made for you: regulations dictate where data lives and how it’s processed, so you end up on Postgres, Azure, SQL Server — or in some cases, Oracle.
| | BigQuery | Postgres / Amazon | Azure / SQL Server / Oracle |
|---|---|---|---|
| Best for | Small to mid-size mobile apps | Mid to enterprise | Enterprise, finance, healthcare |
| Cost model | Free to start, pay per query | Predictable, scales with usage | Enterprise licensing |
| Setup | Easy, no upfront cost | Moderate | Complex, often mandatory by compliance |
| Choice | Yours | Yours | Often dictated by regulations |
Hallucinations are a data problem
The question of hallucinations in consumer-facing AI — an agent that offers a free subscription to a user who was about to cancel, for example — is real. But hallucinations are mostly a data problem, not a model problem.
If your data is messy, AI will scale the mess — it’s going to be terrible for your revenue, for the engagement, for everything.
At WeWard, AI is used for personalized communication, and the measure of success is real-world impact: the number of steps users take. That’s the mission.
But before that becomes possible, the data foundation has to be solid. Plugging AI on top of uncertain data just amplifies uncertainty.
AI should never be a black box
The second problem is organizational. As AI gets embedded into product decisions — recommendations, messaging, content — teams need to understand why the AI made a particular choice.
Today AI can justify why it took this choice, show you the chart, show you the reason, the parameters it utilized.
That visibility changes what’s required from both sides of the team. Engineers need to become more product-minded to understand the downstream impact of what AI does. Non-engineers need to understand the reasoning behind AI behavior. Without that mutual understanding, the risk isn’t just bad decisions — it’s bad decisions made at scale.
For WeWard, this also shapes how they use Amplitude. Rather than treating it as an external data source, they send their own events to Amplitude, which means they fully control what data it receives and what it shows. That makes the Amplitude agents genuinely useful for analytics and decision-making, rather than a layer of abstraction on top of data they don’t fully own.
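As a rough illustration of that pattern, sending a self-owned event through Amplitude's HTTP V2 API might look like the following; the event name, properties, and key are placeholders.

```python
# Push an owned event to Amplitude so the tool only ever sees data
# you chose to send. Endpoint is Amplitude's HTTP V2 API.
import requests

payload = {
    "api_key": "YOUR_API_KEY",  # placeholder
    "events": [{
        "user_id": "user-123",
        "event_type": "daily_steps_synced",  # invented event name
        "event_properties": {"steps": 8421, "country": "FR"},
    }],
}
resp = requests.post("https://api2.amplitude.com/2/httpapi", json=payload)
resp.raise_for_status()
```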
How AI changes the cost structure
Deploying AI features globally runs into a problem fast: the cost of running an LLM is the same whether your user is in the US or in Turkey, but the revenue per user is not.
At WeWard, the math doesn’t work for most of the user base. With millions of daily active users monetized through a combination of paywall and ads, revenue per free user is a few dollars a month. At that level, scaling an LLM to every session isn’t viable.
We don’t use an LLM to personalize every user session — the cost doesn’t work at our scale. If you have millions of daily active users, the AI bill grows too fast.
On-device models (Apple’s embedded models in particular) could eventually make per-user AI economically feasible, but for now, AI at the consumer layer works when the revenue per user justifies it. Yves’ example: Whoop, where the device price point and daily engagement make per-user AI costs manageable.
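A back-of-envelope version of that math, with every number invented rather than taken from WeWard:

```python
# Illustrative unit economics: per-session LLM personalization vs
# ad-funded ARPU. All inputs are assumptions.
sessions_per_month = 30
tokens_per_session = 4_000    # prompt + completion, assumed
usd_per_1k_tokens = 0.01      # assumed frontier-model blended price
arpu_usd = 2.00               # "a few dollars a month" per free user

llm_cost = sessions_per_month * tokens_per_session / 1_000 * usd_per_1k_tokens
print(f"LLM cost ${llm_cost:.2f}/user/mo vs ARPU ${arpu_usd:.2f}/user/mo")
# $1.20 against $2.00 of revenue, before any other cost: per-session
# personalization does not pencil out at this price point.
```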
When your cost base becomes variable
But the regional pricing problem is just one part of something bigger. As more products are built on top of AI models, the underlying cost to run those products becomes variable, and that variability creates a margin problem that gets worse as you scale.
You might have a user on a $20 plan who costs you a lot less than a user on a $10 plan, because the user on the $10 plan is a power user despite being on the standard plan. And the guy on the premium plan just uses the product once a week.
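To make the quote concrete, here is toy margin math under assumed per-session costs:

```python
# Same product, two users: a light user on the $20 plan vs a power
# user on the $10 plan. The per-session cost is invented.
def gross_margin(price_usd, sessions, cost_per_session=0.05):
    cost = sessions * cost_per_session
    return price_usd - cost, (price_usd - cost) / price_usd

print(gross_margin(20, 4))    # premium plan, uses it once a week
print(gross_margin(10, 120))  # standard plan, power user
# (19.8, 0.99) vs (4.0, 0.4): the cheaper plan carries the worse margin.
```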
That margin variability affects how predictable your business looks to investors, and how you build toward profitability. The consequence is a slow but inevitable pressure on flat-based pricing in consumer:
I believe that this idea of stable flat-based pricing in consumer is going to slowly die out as we start to see models that are much more consumption-driven. It’s not because flat-based pricing doesn’t work — it’s because as a business you need to somehow drive margin stability and predictability.
That said, this won’t apply equally across all categories. Variable pricing makes most sense where the value metric is tied directly to usage, which is already what you see with AI products and LLM-based tools. For apps where usage and value aren’t tightly linked, the pressure will be lower.
The open question is predictability for consumers. People don’t want to pay for everything on a variable model — they want to control their budgets. That tension is likely to produce pricing models that haven’t existed before: structures that absorb variable underlying costs while still giving consumers a sense of what they’ll pay.
I think we’re going to see a proliferation of pricing models that haven’t existed before to strike that balance between variable costs and consumer predictability.
When users become your cost problem
A writing app lost close to a million dollars in a month after users discovered they could extract tokens infinitely through a gap in the product.
The response to problems like these starts in the same place as any cost management problem: baselines and benchmarks before you scale.
You start with what you currently have — your approximate understanding of your usage, of your cost, of your compute. And from there you do tests. Right now, with AI, it’s actually much faster to run different variations — and you can end up with a band, the range between the highest and the lowest potential spend — and from there you can pick the right model path where you want to go.
That band feeds into every layer of the business — marketing strategy, product strategy, communication strategy — and ultimately into finance.
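A toy version of that band, with all scenario inputs assumed:

```python
# Simulate monthly AI spend under a few usage/model variations and keep
# the min/max as the planning band. Every number here is invented.
DAU = 1_000_000
scenarios = {
    "small model, light usage": {"tokens": 500,   "usd_per_1k": 0.0005},
    "small model, heavy usage": {"tokens": 3_000, "usd_per_1k": 0.0005},
    "frontier model, heavy":    {"tokens": 3_000, "usd_per_1k": 0.01},
}
monthly = {
    name: DAU * 30 * s["tokens"] / 1_000 * s["usd_per_1k"]
    for name, s in scenarios.items()
}
low, high = min(monthly.values()), max(monthly.values())
print(f"Spend band: ${low:,.0f} to ${high:,.0f} per month")
# Marketing, product, and finance can all plan against this range.
```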
How AI is reshaping monetization and where it breaks down
Teams are already applying AI across the entire monetization journey:
- Trial-to-paid conversion: ingesting behavioral signals from the app to predict whether a user is likely to convert before the trial ends, then surfacing the right offer or discount at the right moment.
- Paywalls: adjusting what users see dynamically based on their behavior — which quiz path they took, what they engaged with, how they move through the product.
- Churn prediction: extracting engagement data to flag users likely to cancel before they do.
This isn’t entirely new. Showing different paywalls based on quiz navigation has been around for a while. What AI changes is the sophistication of the underlying models — the signals you can ingest, the predictions you can make, the scale at which you can personalize.
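As a minimal sketch of the trial-to-paid prediction above, with features, data, and threshold all invented:

```python
# Predict conversion from behavioral signals during the trial, then use
# the probability to decide what to surface. Toy data only.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Rows: [sessions, paywall_views, quiz_completed, days_active]
X = np.array([[12, 3, 1, 6], [2, 1, 0, 1], [8, 5, 1, 4], [1, 0, 0, 1]])
y = np.array([1, 0, 1, 0])  # converted to paid?

model = LogisticRegression().fit(X, y)
p = model.predict_proba(np.array([[6, 2, 1, 3]]))[0, 1]
offer = "discount offer" if p < 0.5 else "standard paywall"
print(f"P(convert)={p:.0%} -> show {offer} before the trial ends")
```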
The silo problem
The bigger issue is what happens when these models don’t talk to each other.
You might get a 5 or 6% bump in checkout conversion. But over here, there’s an AI model running on churn prediction. And what people are failing to see is the relationship between those two things.
A model optimizing checkout conversion finds a way to capture more users, but they turn out to be low-intent. Churn goes up, and the gains net out.
You found a way to capture low-intent users, but all they do is fall at the bottom. It’s just wasted effort because it all nets out at the end of the day.
The solution is coherence across models, but that’s still work in progress across the industry.
One more constraint worth noting:
The App Store limits how many pricing plans and paywall configurations you can run. Web-to-app removes that ceiling — and as dynamic pricing becomes more central, that’s a big reason why web-to-app is going to keep growing.
AI wins, failures, and one case that’s both
Where AI delivers
The clearest wins are in speed. Tasks that used to take weeks now take hours or days.
For data scientists, model building has been transformed. Selecting the right model for forecasting or user clustering — testing 20 variations, evaluating accuracy, precision, recall — used to take two to three weeks. Now it takes a day.
Time to value is a very different concept right now.
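The "20 variations" loop is straightforward to automate; a compressed sketch with synthetic data:

```python
# Score several candidate models with cross-validation and compare --
# the kind of sweep that used to take weeks by hand. Synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1_000, n_features=20, random_state=0)
candidates = {
    "logistic regression": LogisticRegression(max_iter=1_000),
    "random forest": RandomForestClassifier(random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="precision")
    print(f"{name}: precision {scores.mean():.3f} ± {scores.std():.3f}")
```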
Hypothesis generation has shifted similarly: analyzing data, finding signals, writing up proposals for the product team, and going back and forth on prioritization now takes an hour with AI.
At WeWard, the most tangible result is autonomous bug fixing and non-engineers shipping features to prod.
At Paddle, the biggest win has been onboarding. Compliance checks in a heavily regulated industry used to take four to five days, but AI agents now fetch and process that information automatically.
We’ve reduced the time it takes to onboard onto Paddle from days down to minutes.
Where AI breaks down
Visualization is still a weak spot. AI handles code well, but struggles with drag-and-drop BI tools and chart formatting. The output is technically correct but not presentation-ready.
When I present a chart, I need to have the text, X and Y axis a particular way — and I have this output which I can never use, and I have to redo it myself.
Benchmarking is another area AI hasn’t cracked yet. Two consumer apps can sit in the same category with the same revenue, yet one converts 23% at checkout and the other 4.5%. The difference is buyer intent, which a payments platform can’t directly observe. AI can process the numbers, but it can’t fill in what it can’t see.
How you net all of that out and provide benchmarking that’s actually valuable for companies is really difficult. We’re still working through that one.
Double-edged case: when shipping becomes too easy
The same capability that’s the biggest win is also the biggest risk. When everyone on the team can push features to production, the product ships faster, but consistency becomes harder to maintain.
Everybody has great ideas, but the consistency of your product can become more and more complex, because you will ship way more features.
Yves’ view on the fix: the ratio of product managers to engineers needs to shift. Today, it might be one PM for five or six engineers. As AI makes every engineer more productive, that ratio needs to flip closer to one engineer for four product people. More shipping capacity requires more product judgment to go with it.
There’s also a subtler version of this problem in new products. The constraint of limited engineering capacity used to force teams to choose exactly what to focus on. That forcing function is now gone:
Sometimes the fact that in the past you were not able to develop so many features at the same time was part of a good forcing function for people to choose exactly what they want to focus on.
How to open up AI coding to your whole team: WeWard playbook
Getting a legacy company — built before AI coding existed — to actually adopt it requires more than tooling. At WeWard, it started with a clear mandate: everyone has to do this, and they don’t have a choice.
AI has to be in the hands of the best people. If you put AI in the hands of someone who is not good, it can be very dangerous.
The rollout went in steps:
- The best people in the company got a challenge: push something to production next week.
- Then a company-wide pitch: every employee, engineering or not, has to start using AI, and they’ll have dedicated time for it in their daily job.
- Then a two-day hackathon — ship something for the business, with an unlimited token budget.
Yves turned it into a game, even a competition. The person who spent $1,000 in tokens became a legend — everyone wanted to know how they did it, and then wanted to hit the same limit themselves.
On the technical side, the whole company runs on Anthropic’s enterprise plan. Everyone gets access to the full codebase, but CLAUDE.md files define what they can and can’t do. They work with a sample of non-sensitive data. And if someone doesn’t know where to start, they ask Claude Code to walk them through everything, including how to open the terminal.
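A hypothetical sketch of what such a CLAUDE.md might contain; these rules are illustrative, drawn from the points above, not WeWard’s actual file:

```markdown
# CLAUDE.md (illustrative guardrails for company-wide Claude Code use)

## Data
- Work only against the non-sensitive sample dataset in data/sample/.
- Never read, write, or copy anything marked production or sensitive.

## Changes
- All code changes go through a pull request; never push to main.
- Run the test suite before proposing a change.

## New users
- If someone is stuck, walk them through setup step by step,
  including how to open the terminal.
```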
