DevReady Podcast: Kevin Surace on The Future of Generative AI and QA Testing

Episode Overview

In this episode of the DevReady Podcast, host Anthony Sapountzis is joined by Kevin Surace, CEO and CTO of Appvance.ai and one of the original pioneers of voice AI and virtual assistants. Kevin’s work dates back to the early days of AI-driven speech interfaces, and his career spans innovations in semiconductors, aerospace, building materials, cybersecurity, and generative AI. Together, Anthony and Kevin unpack how generative AI is reshaping the software development lifecycle, especially enterprise QA testing, and why AI literacy has become a defining advantage for developers and teams.

Kevin Surace’s AI Legacy and Why It Matters Now

From early voice AI to today’s GenAI boom

Kevin opens by situating himself in the long arc of AI history. Long before the current generative AI wave, he helped build some of the earliest voice-based AI assistants, laying groundwork for what has now become mainstream. He reflects on the unintended consequences of invention, especially how AI tools can replace certain roles like offshore customer support. However, he frames this as a familiar theme in technology, where progress can create disruption at scale even when the original intent was simply to make life easier.

Innovation across industries and a problem solving mindset

Kevin’s career has never been confined to one sector. He points to a pattern of staying curious, moving into new fields, and solving hard problems without being limited by previous experience. This includes innovations outside AI that became global standards, such as soundproof drywall used in high-end hotels. For Kevin, this curiosity is the real driver behind his current focus: fixing the slowest, most expensive parts of modern software delivery.

The Enterprise Testing Crisis

Why only 10 percent of user flows get tested

A major thread in this episode is the gap between how much testing should happen and what organisations can realistically afford. Kevin explains that most enterprises test only around 10 percent of real user flows. Even large corporations cannot justify testing the other 90 percent because it is too labour-intensive and too slow when done manually or semi-manually. The consequence is predictable: users discover defects after release, often in common scenarios rather than obscure edge cases.

The real cost of end-to-end QA

Kevin highlights the scale of enterprise complexity. Large organisations do not run a handful of applications. They run thousands, with deep integration between APIs, databases, and backend services. Every update to an operating system, security protocol, or connected service risks breaking something elsewhere. This creates a permanent need for regression testing, which becomes a huge budget line item and a persistent bottleneck in the software development lifecycle. Anthony agrees, noting that this kind of end-to-end testing is often the least loved part of development, but also one of the most unavoidable.

How Appvance.ai Uses AI Script Generation and Digital Twins

Thousands of tests in hours, not months

Kevin’s core argument is that AI will soon dominate enterprise QA because it removes the human ceiling on coverage. Appvance.ai uses AI script generation to automatically create and execute large sets of tests against defined business requirements. Where a human team might produce one test every hour or two, the platform can generate thousands in a matter of hours. Kevin notes that this approach regularly uncovers more than 90 percent of bugs that would otherwise remain hidden. The implication is that teams no longer have to accept tiny test subsets as a necessary compromise.

Digital twins as the future of autonomous testing

Kevin also explains how Appvance.ai goes beyond old-style recorder-based automation. Recorders are limited by human speed and by the speed of the application itself. To escape those limits, Appvance.ai builds a digital twin of the application, an instant simulation environment where scripts can be generated and iterated at machine speed. Once the tests are created in the twin, they are validated against the real application. This mirrors how advanced robotics systems train in simulations first, then transfer into the physical world, allowing learning at a scale that real time environments cannot match.
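The simulation-first idea can be sketched in a few lines. This is purely illustrative and not Appvance.ai’s actual implementation: here a hypothetical digital twin is modelled as a state graph of screens and actions, user flows are generated by exploring the twin at machine speed, and each flow is then replayed against a stand-in "real application" to catch divergences.

```python
# Illustrative sketch of simulation-first test generation. The twin maps
# each screen to the actions available on it and the screen each action
# leads to; all names here are hypothetical.
TWIN = {
    "login":     {"submit": "dashboard"},
    "dashboard": {"open_report": "report", "open_settings": "settings"},
    "report":    {"export": "done"},
    "settings":  {"save": "done"},
}

def generate_flows(twin, start="login", end="done"):
    """Enumerate every action path from start to end by walking the twin."""
    flows, stack = [], [(start, [])]
    while stack:
        screen, path = stack.pop()
        if screen == end:
            flows.append(path)
            continue
        for action, nxt in twin.get(screen, {}).items():
            stack.append((nxt, path + [action]))
    return flows

def replay_on_real_app(flow, real_app):
    """Validate a flow generated in the twin against the real application."""
    screen = "login"
    for action in flow:
        screen = real_app(screen, action)
        if screen is None:
            return False  # defect: the real app diverged from the twin
    return screen == "done"

# Stand-in "real app" with a seeded bug: saving settings is broken.
def real_app(screen, action):
    if screen == "settings" and action == "save":
        return None  # bug
    return TWIN.get(screen, {}).get(action)

flows = generate_flows(TWIN)
results = {tuple(f): replay_on_real_app(f, real_app) for f in flows}
```

Because flows are enumerated in the twin rather than recorded against a live UI, coverage is bounded by the model, not by human or application speed, which is the point Kevin is making.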

AI Adoption, Pricing, and Trust

Deloitte’s fake citation scandal and the trust gap

Anthony raises a critical business concern: if AI makes work faster, why are clients still being charged the same price? He references a recent Deloitte Australia case where a government report was delivered with fake citations and invented examples. Kevin agrees that this kind of failure is not a reason to reject AI, but a reminder that clients pay for expertise plus proper checking. If a consulting firm uses AI carelessly and skips verification, then the value proposition collapses.

Why the right AI model matters

Kevin adds that the Deloitte case is also about selecting the right model for the task. Some tools are far more reliable for citation-driven research, while others hallucinate if not configured correctly. He points out that models with strong retrieval grounding can produce real sources that are easy to verify. The deeper point is that AI does not eliminate accountability. It raises the bar on tool choice, workflow design, and professional oversight.

Reinforcement Learning and the Next Wave

Reward based training in robotics and GenAI

The conversation broadens into how reinforcement learning is influencing both robotics and generative AI. Kevin explains that modern humanoid robotics has moved from rigid rule-based systems to reward-based learning. Instead of programming every micro step of a task, engineers define rewards and let machines teach themselves through trial and error. Anthony notes the risk of poorly designed reward functions, which can lead to reward hacking or unintended behaviours. Both agree that clear context and well-framed goals are essential if AI is to produce useful results.
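The trial-and-error dynamic Kevin describes can be illustrated with a toy example. This is a minimal sketch, not robotics code: the designer specifies only a reward (with hypothetical gripper actions and made-up values), and a simple epsilon-greedy loop discovers the best action from noisy feedback rather than from hand-coded rules.

```python
import random

random.seed(0)

# The designer specifies only what "good" means; the agent is never told
# which action is best. Action names and values are invented for this toy.
def reward(action):
    true_values = {"grip_soft": 0.2, "grip_firm": 0.9, "grip_crush": 0.5}
    return true_values[action] + random.gauss(0, 0.05)  # noisy feedback

actions = ["grip_soft", "grip_firm", "grip_crush"]
estimates = {a: 0.0 for a in actions}
counts = {a: 0 for a in actions}

for trial in range(500):
    # Epsilon-greedy: mostly exploit the current best estimate,
    # occasionally explore another action.
    if random.random() < 0.1:
        a = random.choice(actions)
    else:
        a = max(estimates, key=estimates.get)
    r = reward(a)
    counts[a] += 1
    estimates[a] += (r - estimates[a]) / counts[a]  # incremental mean

best = max(estimates, key=estimates.get)
```

The reward-design risk Anthony raises is visible even here: if the reward accidentally scored `grip_crush` highest, the loop would cheerfully converge on crushing things, which is exactly why well-framed goals matter.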

AlphaGo as a turning point

They reflect on landmark breakthroughs like AlphaGo, and how each wave of AI has repeatedly solved problems once assumed to be uniquely human. The pattern is consistent: the moment machines cross a competence threshold, the public reaction shifts from dismissal to shock, and sometimes to resistance. Kevin sees the current generative AI wave as another version of that turning point, only happening across every industry simultaneously.

Careers, Productivity, and the Future of Development

AI coding tools as accelerated open source reuse

Kevin is blunt about what AI coding tools represent. Developers have always used external resources like Stack Overflow and GitHub examples. AI simply compresses that workflow from hours into seconds. The key is not to become lazy, but to become more strategic. He warns against rebuilding components that already exist, a mistake he sees often among teams who do not know the ecosystem well enough. Your value is in the outcome and the logic, not in reinventing solved infrastructure.

Why GenAI literacy is now a hiring filter

Kevin cites productivity gains of roughly 55 percent for developers who use AI effectively, and predicts that this will only increase. As a result, entry-level developer roles are shrinking. Organisations no longer need large groups of junior staff to write boilerplate, because AI performs at that level instantly. However, Kevin says this also creates opportunity for graduates who become genuinely skilled in generative AI tools. Anthony agrees that education is lagging and that universities must adapt quickly, otherwise they will send people into the workforce trained for jobs that have already changed.

Topics Covered

  • Kevin Surace’s early work in voice AI and virtual assistants
  • Enterprise software complexity and the scale of legacy applications
  • The end-to-end testing gap and why most user flows go untested
  • AI script generation and autonomous QA testing with Appvance.ai
  • Workplace resistance to AI and modern “sabotage” dynamics
  • AI adoption, pricing expectations, and trust after the Deloitte citation scandal
  • Digital twins and simulation-driven testing, linked to robotics
  • Reinforcement learning, reward functions, and lessons from AlphaGo
  • AI coding tools, developer productivity gains, and the skills shift in hiring

Important Time Stamps

  • From Virtual Assistants to AI Bug Hunting: Kevin Surace’s Journey (0:07 – 5:24)
  • Why 90 Percent of User Flows Go Untested in Enterprise Apps (5:25 – 10:40)
  • Adapt or Be Left Behind: The Future of AI in Work and Creativity (10:41 – 16:48)
  • If You Use AI, Should You Charge Less or Deliver More? (16:49 – 25:38)
  • From AlphaGo to Humanoids: The Reinforcement Learning Revolution (25:39 – 28:59)
  • Are We Hitting the AI Ceiling or Just Getting Started? (29:00 – 39:44)
  • AI Makes Developers 55 Percent More Productive, Here’s How (39:55 – 49:05)

Key Takeaways

  • Enterprise QA testing is under-resourced, with most organisations validating only a small portion of real user flows.
  • AI script generation and digital twins can deliver true end-to-end coverage at machine speed.
  • The Deloitte citation scandal shows that model choice and human verification still matter.
  • AI tools amplify productivity, but only when guided with clear context and outcomes.
  • AI will not take your job, but a skilled AI user probably will.

Useful Links

Kevin Surace | LinkedIn

Kevin Surace | Website

Appvance.ai | LinkedIn

Appvance.ai | Website

FAQs

What is autonomous QA testing?

Autonomous QA testing uses AI to generate, run, and evaluate software tests without relying on manual scripting or recorder-based automation. In the episode, Kevin explains that this approach can cover far more user flows than human teams can realistically test, reducing bugs found post-release.

Why do enterprises test only a small percentage of user flows?

Kevin says most organisations test roughly 10 percent of real user journeys because end-to-end coverage is too expensive and time-consuming with human labour. The remaining flows go untested, which is why users often discover faults in common scenarios after deployment.

How does Appvance.ai use AI script generation?

Appvance.ai converts business requirements and manual test cases into thousands of executable scripts in hours. Kevin notes that this AI script generation can expose the majority of defects quickly, helping teams avoid months of manual regression work.

What is a digital twin in software testing?

A digital twin is a fast, simulated model of an application that allows AI to explore and generate tests at machine speed. Kevin explains that tests are built in the digital twin first, then validated on the real application, enabling far broader and quicker end-to-end testing.

What happened in the Deloitte Australia AI report incident?

Anthony references a Deloitte report for an Australian government department that included fabricated citations and errors after generative AI was used, leading Deloitte to issue a partial refund. Kevin uses this as a lesson in choosing the right model and ensuring human verification.

Why is model choice and verification so important when using AI?

Kevin argues that AI failures usually come from using the wrong tool for the job or skipping basic checking. Some models are better at retrieval and citations, while others are more prone to hallucinations, so oversight remains essential.

How are AI coding tools changing developer productivity and hiring?

Kevin and Anthony say AI coding tools speed up what developers already do, like reusing open source patterns, and can lift productivity by around 50 percent or more. Kevin adds that entry-level roles are shrinking unless graduates are skilled in GenAI tools, because AI now handles much of the junior workload.

©2025 Aerion Technologies. All rights reserved | Terms of Service | Privacy Policy