
Measuring AI impact like it’s 1995

Rebecca Murphey, Field CTO · Aug 5, 2025

Thirty years ago or so, I spent the summer making my university newspaper’s website, previewing it in lynx, where there wasn’t much difference between the markup and what I saw on the screen. Nobody asked me to; it just seemed fun.

Yahoo was about one year old. I had no idea the web was transformational — I was just trying to get the thing to work, learning HTML one tag at a time. Nobody knew what “good” looked like yet. We were all just experimenting and learning.

Today, organizations are grappling with an eerily similar challenge when it comes to AI. The difference is that there’s a ton of pressure to take advantage of AI at breakneck speed, with no time for the learning and experimentation required to find true value.

At Swarmia, we hear about teams trying metrics like “actual time vs estimated time” or “per-developer hours saved” to measure AI impact and prove that they’re doing it right. Everyone feels the need to demonstrate value (and fair enough), but there’s no consensus on what that looks like, because we’re still figuring out the fundamentals.

We’re repeating a pattern from the late 1990s, when most companies struggled to measure the impact of transformational technology. The successful ones realized that learning was more valuable than measurement.

The technology is evolving so rapidly that what we measure today will be obsolete in months. We need to treat AI like the early web: a transformational shift that requires a learning-first approach, not optimization of predetermined metrics. There’s a reason you don’t see visitor counters on websites anymore.

The early web teaches us about transformational technology

The experimental mindset wasn’t universal in the 1990s, but those who embraced it laid the groundwork for the digital economy. Amazon started as an online bookstore in 1994, eBay began as an auction site in 1995, and Yahoo launched as “Jerry and David’s Guide to the World Wide Web” in 1994. These companies succeeded not because they had perfect measurement frameworks, but because they prioritized understanding what their customers needed over optimizing what they thought they understood.

The economic reality of the time enabled experimentation. The web itself was free, and the tools, for a time, were basic text editors. You didn’t need to justify the ROI of learning HTML any more than you needed to justify learning to use a keyboard. More importantly, you could experiment endlessly with code and design without spending money — the experimentation itself cost nothing except time. Early web companies focused on business outcomes, not ROI for a specific technology.

AI is following a remarkably similar pattern, except that current pricing for coding agents stands at $10-12 per hour, and even that is heavily subsidized by VC money betting on transformational potential. When that subsidy ends — and this may sound alarmist! — only organizations that have learned to generate real value will be able to keep using these tools.

Unlike learning HTML in 1995, learning to work with AI agents isn’t free. Every experiment costs money, creating a limited window to figure out what works before costs become prohibitive.

We’re measuring the wrong things

The measurement approaches I’m seeing will seem laughably outdated in 12-18 months, primarily because they focus on optimizing existing processes rather than supporting new capabilities:

Lines of AI-generated code? When AI agents can write thousands of lines in minutes, this becomes meaningless. We’ll look back at this like measuring an author’s productivity by counting keystrokes.

Completion acceptance rates? This made sense when AI was primarily autocomplete, but it’s already obsolete as we move to chat-based and agent-based development. It’s like measuring a developer’s effectiveness by how often you accept their variable name suggestions.

Individual developer productivity? This completely misses how AI enables new forms of collaboration. The most valuable AI impact might be happening outside your IDE entirely.

Consider what happens when product managers can rapidly prototype ideas without developer involvement, or when UX researchers can generate test scenarios without writing tickets. In real terms, this means hours of engineering time that never get interrupted, requirements that arrive better formed, and fewer iterations because the validation happened upstream.

These activities traditionally required either significant developer time or went unexplored due to resource constraints. When non-technical team members can explore “what if” scenarios independently, they arrive at better-informed requirements and reduce the number of iterations needed once development begins. But it’s invisible to most current measurement frameworks.

The more worrying problem with current metrics is that most of them assume the bottleneck is in the execution phase of development. But for most companies, the bottleneck has always been in understanding what to build and why. And it seems to me that AI’s biggest impact might be in accelerating the discovery and validation phases that happen before code is ever written.

A learning-first approach to AI evaluation

Instead of trying to prove immediate ROI, focus on building your organizational capacity to learn and adapt. This means evaluating AI tools based on how well they help your teams discover new possibilities and validate assumptions faster.

Set clear boundaries for learning experiments

Since AI tools aren’t free, create explicit experimentation budgets. Time-box experiments to 30-60 days and set clear spending limits. This prevents endless tinkering while allowing actual learning to happen. Treat this like R&D investment — you’re buying knowledge about what’s possible, not immediate returns.
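To make this concrete, here’s a minimal sketch (in Python, purely illustrative) of what a time-boxed experiment with an explicit spending limit might look like as a simple record a team keeps; the field names, durations, and dollar amounts are assumptions for the example, not recommendations.

```python
# A minimal, illustrative sketch of a time-boxed AI experiment with an
# explicit spending limit. Names and numbers are assumptions, not advice.
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class AIExperiment:
    name: str
    start: date
    duration_days: int = 45            # somewhere inside the 30-60 day window
    budget_usd: float = 2_000.0        # explicit spending limit
    spend_usd: float = 0.0
    learnings: list[str] = field(default_factory=list)

    @property
    def ends(self) -> date:
        return self.start + timedelta(days=self.duration_days)

    def record_spend(self, amount_usd: float, learning: str) -> None:
        # Every dollar spent should buy a written-down learning.
        self.spend_usd += amount_usd
        self.learnings.append(learning)

    def should_stop(self, today: date) -> bool:
        # Stop when the time box closes or the budget is spent,
        # whether or not the experiment "succeeded".
        return today >= self.ends or self.spend_usd >= self.budget_usd


experiment = AIExperiment(name="agentic-prototyping", start=date(2025, 8, 1))
experiment.record_spend(85.0, "Agent drafted three onboarding-flow prototypes in an afternoon")
print(experiment.should_stop(date(2025, 8, 20)))  # False: still inside the time box
```

The point isn’t the code; it’s that the time box, the spending limit, and the learnings live in one place, so ending the experiment is a planned event rather than a judgment call.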

Focus on discovery questions that matter

Rather than measuring productivity improvements, ask learning-oriented questions: “What types of problems can non-technical team members now solve independently?” “How does AI change the way we validate assumptions about user needs?” “What new workflows become possible when iteration cycles are faster and cheaper?” “How do we break down complex problems effectively for AI collaboration?”

Measure learning velocity, not just output

Track how quickly teams can test hypotheses about what customers want, how fast they can prototype and get feedback, and how well they’re learning to collaborate with AI tools. These capabilities will determine success when AI becomes truly transformational. For product teams, this might mean measuring the number of assumptions validated per sprint or the time from idea to testable prototype.

One particularly effective approach is simply counting the number of hypotheses tested. This elegantly simple metric cuts through measurement complexity while encouraging the behaviors that lead to faster learning. Teams that form explicit hypotheses think more systematically about what they don’t know and what evidence would change their minds.

When you count hypotheses, teams naturally start framing their work in testable terms — instead of “let’s try this feature,” they’re more likely to say “we hypothesize that this feature will increase user engagement by X% because Y.” The metric works across contexts, whether teams are testing user behavior, technical approaches, market assumptions, or process improvements. It’s easy to track, promotes transparency, and encourages teams to break down big assumptions into smaller, testable pieces that lead to faster learning cycles.
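Here’s a minimal sketch of what counting tested hypotheses could look like in practice: a plain log of explicit, falsifiable statements and a tally per sprint. The structure and example entries are illustrative assumptions, not a prescribed format.

```python
# An illustrative hypothesis log: the metric is simply how many explicit
# hypotheses were tested (validated or invalidated) in each sprint.
from collections import Counter
from dataclasses import dataclass


@dataclass
class Hypothesis:
    sprint: str
    statement: str               # "We believe X will cause Y because Z"
    outcome: str | None = None   # "validated", "invalidated", or None if untested


hypotheses = [
    Hypothesis("2025-S16", "AI-drafted test scenarios reduce ticket back-and-forth", "validated"),
    Hypothesis("2025-S16", "Agent-written migrations are safe to ship unreviewed", "invalidated"),
    Hypothesis("2025-S17", "PM prototypes cut iterations after development starts"),
]

tested = [h for h in hypotheses if h.outcome is not None]
per_sprint = Counter(h.sprint for h in tested)

print(f"Hypotheses tested: {len(tested)} of {len(hypotheses)}")
for sprint, count in sorted(per_sprint.items()):
    print(f"  {sprint}: {count}")
```

Invalidated hypotheses count just as much as validated ones; the metric rewards forming and testing them, not being right.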

Document everything, especially what doesn’t work

Failed experiments are as valuable as successful ones. Create shared knowledge bases of what you’ve tried and why it didn’t work. This prevents other teams from repeating mistakes and builds institutional knowledge about what actually works in your specific context. Pay particular attention to the boundary conditions — when does AI help versus when does it hinder learning?

The window for learning is closing

Current economic conditions won’t last forever. VC-subsidized AI pricing is creating a temporary window where experimentation is economically feasible for most organizations. Meanwhile, tools like Claude Code already charge by usage, and they’re addictive to use. When the VC subsidy ends, only companies with deep pockets or proven value from AI will be able to continue extensive experimentation.

This means we’re in a race — not to prove immediate ROI, but to learn what works before the learning becomes more expensive. Your company is likely to be in a better place by using this unique moment to understand how AI can change your approach to product development, not just make your existing processes slightly more efficient.

We’re in the discovery phase, not the optimization phase

Back in 1995, I was just trying to get basic HTML to work. I had no idea that what I was fumbling through would evolve into React components and TypeScript, or that I’d eventually work with teams where designers and product managers could directly contribute to the development process through better tooling and workflows.

We’re in a similar discovery phase with AI. The current approaches — chat interfaces, completion APIs, simple agent workflows — are probably as primitive as hand-coding HTML was back then. The breakthrough patterns for AI-assisted development, the equivalent of those web frameworks that transformed how we build software, probably haven’t been invented yet.

If you’re looking to succeed with AI coding tools (and who isn’t?), then instead of optimizing for traditional metrics, try asking yourself: “What can we learn that wasn’t learnable before?” and “How can we validate ideas that were previously too expensive to test?”

What this means for your organization

  • Start with learning goals, not productivity goals. Instead of measuring lines of code or completion rates, track how AI tools help your team learn faster about customer needs, technical constraints, and product possibilities. Success looks like teams that can iterate through more ideas, validate (or invalidate) assumptions more quickly, and arrive at better solutions through experimentation.
  • Expand your definition of AI impact. Look beyond engineering productivity to see how AI enables new forms of collaboration. When PMs can prototype independently, when UX researchers can explore multiple approaches simultaneously, when business stakeholders can interact with data directly — these changes create value for the engineering organization that traditional metrics won’t capture.
  • Build learning systems, not measurement systems. Focus on creating environments where teams can safely experiment with AI tools and share what they learn. Don’t try to prove that AI works; evaluate how it works best in your specific context and what new capabilities it enables.

In 1995 we didn’t need to be measuring perfectly: we needed to be learning faster than the competition and focusing relentlessly on business outcomes. We don’t know what good looks like yet with AI, just as we didn’t know what good web development looked like in the early days.

Thirty years later, I still remember the satisfaction of getting that first HTML page to render in lynx. Not because I measured how fast I typed the tags, but because I learned something that would shape my entire career. That’s the opportunity we have with AI right now, but only if we stop counting the keystrokes long enough to see it.

Your boss still wants AI metrics? We get it.
Track the numbers they need while gathering the insights that matter: developer feedback that helps you learn which AI experiments are worth scaling before it’s too expensive to find out.
Rebecca Murphey
Rebecca Murphey helps Swarmia customers navigate people, process, and technology challenges in pursuit of building an effective engineering organization.

