Founder note · May 21, 2026 · 7 min read

AI Lowered the Barrier, but Raised the Bar

When everyone can build, the difference moves to judgment: can the product survive the messy combinations real customers bring?

The Market Got Crowded

AI did not make startups easy. It made trying a startup easy.

The person who knows the workflow no longer has to wait for a software company to care.

The claims analyst can prototype the intake tool. The healthcare operator can build the scheduling product. The compliance analyst can turn the exception path into software instead of a requirements doc. That is good. It also makes the market harsher.

We have already seen this inside our own product: after our private beta, Flock surfaced 13 customer-facing issues we had not prioritized because we were focused on harder infrastructure work. One of them was exactly the kind of thing that breaks trust before it looks like a bug: a user could create a shared workspace and then hit a dead end about what to do next.

You are not only competing with incumbents anymore. You are competing with everyone else who had the same idea and no longer needs to quit their job, raise a seed round, hire a full-stack team, or come from Silicon Valley to take a credible swing at it.

The supply shock is already visible. GitHub's 2025 Octoverse says more than 36 million developers joined GitHub in one year, and developers created more than 230 new repositories every minute.

Stripe shows the commercial version of the same compression: new businesses are forming faster, and 20% of Stripe Atlas startups charged a first customer within 30 days.

More people can build. More products can launch. More products can reach users before the team behind them has developed product judgment, support reflexes, quality muscle, or the scar tissue that tells you where trust breaks.

The software can look ready before the company is.

Creation Is Not Judgment

Strauss Zelnick runs Take-Two, the company behind Grand Theft Auto. In a recent conversation, Zelnick pointed out that the technology to copy the surface area of Grand Theft Auto has existed for years. AI may make pieces of that process faster, but the existence of the tools was never the same thing as the ability to make the hit. He made the same point about mobile games: thousands are made every year, zero to five become hits, and almost all of them are made by incumbents. There is a difference between making assets and making hits.

Creation tools spread. Taste, distribution, execution, cultural antenna, and accumulated judgment are not as transferable.

AppMagic's 2026 mobile market report gives that intuition some scale. Across Google Play and the App Store, more than 1.4 million app-and-game releases launched in 2025. Only about 10% drew meaningful user attention.

Hank Green recently made the creator-economy version of this point. Video-game streaming looks easy from the outside: play a game, talk while you do it, upload the result. The barrier to entry is very low, so millions of people try. That does not make success easier. It makes the selection environment more brutal, because the winners are selected from a much larger field.

That is the title of this post in miniature: the barrier got lower, and the bar got higher.

Domain expertise gets you to the right problem. It does not automatically give you the systems thinking to validate every variant of onboarding, permissioning, lifecycle state, billing, roles, expectations, and support edges with every release. That is the new gap.

What Flock Is For

The Flock origin story starts with a testing problem, not an AI demo.

We needed realistic onboarding data without asking real companies to trust an early product. So we built the synthetic company I later wrote about in "Executive-Grade Chaos": people, roles, accounts, routines, artifacts, communications, messy edges. Enough realism to make the product react as if it were touching an actual organization.

The surprise was that the synthetic company became more valuable than the original test case. It showed us where the product was vague, where the flow assumed too much, where the user would lose confidence, and where a technically correct screen still failed the actual job. The important part was that realistic pressure appeared before real users had to pay for it.

That is Flock: realistic synthetic users before real customers. Not happy-path scripts. Not a single golden-path smoke test that passes every time.

A skeptical buyer. A busy operator. A novice user. An admin with the wrong assumptions. A returning customer with history. A workspace with edge-case state. A flow where copy, permissions, timing, account state, and intent collide.
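To make "test the combinations" concrete, here is a minimal sketch of how quickly a handful of personas and states multiply. The dimension names and values are hypothetical, chosen for illustration only; this is not how Flock defines or schedules its scenarios. A single happy-path script covers one row of this matrix.

```python
from itertools import product

# Hypothetical test dimensions; a real product would define its own.
PERSONAS = ["skeptical_buyer", "busy_operator", "novice", "misinformed_admin", "returning_customer"]
ACCOUNT_STATES = ["trial", "active_pro", "lapsed", "past_due"]
WORKSPACE_STATES = ["empty", "personal", "shared_new", "shared_with_history"]
ENTRY_POINTS = ["signup", "checkout_return", "invite_link"]


def scenario_matrix():
    """Yield every persona / account / workspace / entry-point combination as a dict."""
    for persona, account, workspace, entry in product(
        PERSONAS, ACCOUNT_STATES, WORKSPACE_STATES, ENTRY_POINTS
    ):
        yield {"persona": persona, "account": account, "workspace": workspace, "entry": entry}


scenarios = list(scenario_matrix())
print(len(scenarios))  # 5 * 4 * 4 * 3 = 240
```

Four small, plausible dimensions already yield 240 distinct situations, and each new plan, role, or entry point multiplies the total again.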

The output is not "AI says your product is bad." That would be both useless and annoying. We produce evidence: screenshots, browser context, severity, what happened, why it mattered, and a repair prompt optimized for a coding agent.
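A minimal sketch of what one piece of that evidence could look like as a record, assuming a hypothetical `Finding` schema built from the fields listed above. This is an illustration, not Flock's actual data model or prompt format. The severity and route in the example are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Severity(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class Finding:
    """One piece of evidence from a synthetic-user session (hypothetical schema)."""
    title: str
    severity: Severity
    what_happened: str
    why_it_matters: str
    url: str  # browser context: where the user was when it happened
    screenshots: list[str] = field(default_factory=list)  # paths or URLs to captures

    def repair_prompt(self) -> str:
        """Render the finding as a self-contained instruction for a coding agent."""
        return (
            f"Fix: {self.title} (severity: {self.severity.name.lower()})\n"
            f"Where: {self.url}\n"
            f"Observed: {self.what_happened}\n"
            f"Why it matters: {self.why_it_matters}\n"
            "Make the smallest change that resolves this and leave the rest of the flow unchanged."
        )


checkout = Finding(
    title="Checkout back-link exits onboarding",
    severity=Severity.HIGH,  # assumed for illustration
    what_happened="The back link on checkout sends the user to the public pricing page.",
    why_it_matters="It drops the user out of onboarding at a conversion moment.",
    url="/onboarding/checkout",  # hypothetical route
)
print(checkout.repair_prompt())
```

Keeping the observation, the reason it matters, and the location together in one record is what lets the repair prompt stand on its own for an agent that never saw the session.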

Flock exists to help creators pass the test that comes after "it works on my machine" and before "a real customer trusted us with their time."

What the Flock

The most useful example is ourselves. After our private beta launch, we were focused on the right things: the review pipeline, reporting, billing, production readiness, model evaluation, and the thousand other pieces required to harden our young product. That is normal. It is also fast and messy, and it is exactly how product surfaces drift.

Embarrassingly, but also fortunately, Flock identified 13 customer-facing issues. These were the kind of small paper cuts that individually would never interrupt a roadmap. But collectively, they grabbed my attention. The pattern was not one spectacular bug. It was a field of small contradictions, easy to miss in isolation and brutal in aggregate.

A user could create a shared workspace and then be left without a clear answer to "what happens now?" A checkout back-link could drop someone out of onboarding and onto the public pricing page, exactly at a critical conversion moment. A plan gate could disagree with billing state and tell an active Pro user that shared workspaces required Pro. Personal workspace copy read like a team setup. Pricing language still listed features we had cut from the launch. Those are just a few.

This is the ordinary residue of moving fast and prioritizing other hard problems. But real users do not care. They experience it as a loss of confidence.

An app crashing is a bug. A product claiming two plausible things that cannot both be true destroys credibility.
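The plan-gate issue is a clean example of that second kind of failure, and it is one that can be checked mechanically. Here is a minimal sketch, assuming a hypothetical account record where the feature gate reads one copy of the plan and billing holds another. All names are illustrative, not our actual code. The check evaluates both claims and flags any account where they disagree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    """Hypothetical account record holding two views of the same plan."""
    account_id: str
    billing_plan: str  # what the billing provider says the customer pays for
    gate_plan: str     # what the feature gate reads (e.g. a cached copy)


def gate_allows_shared(account: Account) -> bool:
    # What the UI tells the user about shared workspaces.
    return account.gate_plan == "pro"


def billing_entitles_shared(account: Account) -> bool:
    # What the customer has actually paid for.
    return account.billing_plan == "pro"


def contradictions(accounts: list[Account]) -> list[str]:
    """Return IDs of accounts where the product makes two claims that cannot both be true."""
    return [a.account_id for a in accounts if gate_allows_shared(a) != billing_entitles_shared(a)]


accounts = [
    Account("a1", billing_plan="pro", gate_plan="pro"),
    Account("a2", billing_plan="pro", gate_plan="free"),  # active Pro user told "requires Pro"
    Account("a3", billing_plan="free", gate_plan="free"),
]
print(contradictions(accounts))  # ['a2']
```

Neither function crashes and each answer looks plausible on its own; the failure only shows up when the two are compared for the same account.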

The Bar Is Still Going Up

Apple said it evaluated more than 9.1 million App Store submissions in 2025 and rejected more than 2 million. Google said it prevented more than 1.75 million policy-violating apps from reaching Google Play. And last year was a decade ago in LLM time. As creation gets cheaper, review machinery has to scale.

In a market where vastly more software reaches users faster, learning every basic trust failure from real customers is too slow and too costly. By the time the customer has shown you the obvious bug, you have already lost.

The winners will not be the teams that generate the most screens. They will be the teams that combine domain insight with taste, operational discipline, and mechanical product judgment.

Build faster. But test the combinations. Catch the trust failures before customers do.