AI Tools for QA: Are They Really Worth the Hype?

AI & Data Engineering

Helton Imroth & Pedro Sorto

October 13, 2025 • 5 min read

Introduction

Since the early days of the Gen AI boom, everybody has been talking about all the new tools that promise to help build great software. Software testing—aka Quality Assurance (QA)—has always seemed like an area ripe for AI automation.

For that reason, we decided to test some tools and see if they are actually worth it. With so many tools on the market, it can be hard to choose the right one for your project, especially when automation itself looks like a significant investment of time and effort.

What Can AI Tools Do for QA?

AI tools are changing how Quality Assurance (QA) is done in software testing. These tools can analyze large amounts of historical data, code changes, and user behavior to predict where defects are most likely to occur. This enables a proactive, risk-based testing strategy: QA teams can focus their testing effort on the most critical or high-risk areas of the application, which improves overall software quality and speeds up the testing process.
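To make the idea of risk-based prioritization concrete, here is a minimal sketch in Python. It is not how any particular AI tool works internally: the module names, the numbers, and the hand-picked weights are all made up for illustration, and a real tool would learn these signals from project history rather than hard-code them.

```python
from dataclasses import dataclass


@dataclass
class ModuleStats:
    name: str
    past_defects: int     # defects previously traced to this module (made-up data)
    recent_changes: int   # commits touching the module in the current release
    usage_share: float    # fraction of user sessions that hit it (0.0 to 1.0)


def risk_score(m: ModuleStats) -> float:
    # Hypothetical weighting: defect history and churn raise the score,
    # and heavily used modules are weighted up.
    return (2.0 * m.past_defects + 1.0 * m.recent_changes) * (0.5 + m.usage_share)


def prioritize(modules: list[ModuleStats]) -> list[str]:
    # Highest-risk modules first, so they are tested earliest.
    return [m.name for m in sorted(modules, key=risk_score, reverse=True)]


modules = [
    ModuleStats("login", past_defects=1, recent_changes=2, usage_share=0.9),
    ModuleStats("tap_editor", past_defects=6, recent_changes=8, usage_share=0.4),
    ModuleStats("settings", past_defects=0, recent_changes=1, usage_share=0.1),
]
ranking = prioritize(modules)
print(ranking)
```

With these sample numbers, the frequently changed and defect-prone module ends up at the top of the list even though it is not the most heavily used one.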

AI-driven platforms also make test case generation far more dynamic, automatically producing comprehensive and varied scenarios, including tricky edge cases. Perhaps their most impressive feature is “self-healing” automation, where the system adapts to UI or code changes on its own. From our experiments, a few tools stood out for different needs: Playwright MCP, agentic coding tools such as Cursor, and TestCraft.
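One simple way to picture self-healing is a locator with ordered fallbacks. The sketch below is an assumption-laden toy: the page is modeled as a plain set of selectors that currently match, and the selector strings are hypothetical. Commercial tools use richer signals, but the fallback idea is similar.

```python
def find_with_healing(present_selectors, candidates):
    """Return (selector, healed) for the first candidate found on the page.

    present_selectors: set of selectors that currently match (stand-in for a DOM query).
    candidates: selectors ordered from preferred to fallback.
    """
    for index, selector in enumerate(candidates):
        if selector in present_selectors:
            # The lookup counts as "healed" when the primary selector did not match.
            return selector, index > 0
    raise LookupError(f"No candidate matched: {candidates}")


# The save button's id changed in a hypothetical UI update, but its test id survived.
page_after_update = {"[data-testid='save-tap']", "button.primary"}
candidates = ["#save-btn", "[data-testid='save-tap']", "text=Save"]

selector, healed = find_with_healing(page_after_update, candidates)
print(selector, healed)
```

A real implementation would also log each heal so that the team can update the primary locator instead of relying on the fallback indefinitely.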

Playwright MCP

We experimented with Playwright MCP, a Model Context Protocol (MCP) server that exposes Playwright's browser automation to AI agents. It allows multiple clients, whether AI agents or test scripts, to share a single browser instance, which reduces resource overhead, and it supports AI-driven browser automation. Playwright MCP lets the agent see the page structure (as structured accessibility snapshots rather than screenshots) and interact with the site using basic instructions or structured automation scenarios.
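For readers who want to try it, the sketch below builds the kind of client configuration that registers the Playwright MCP server. The `mcpServers` shape follows the form documented by the Playwright MCP project at the time of writing. Where the resulting JSON has to be saved depends on your MCP client, so treat the file location as something to check in your client's documentation.

```python
import json


def build_mcp_config() -> dict:
    # Registers the Playwright MCP server so an MCP client can launch it with npx.
    return {
        "mcpServers": {
            "playwright": {
                "command": "npx",
                "args": ["@playwright/mcp@latest"],
            }
        }
    }


config_json = json.dumps(build_mcp_config(), indent=2)
print(config_json)
```

Running this requires only Python; launching the server itself requires Node.js, because the client starts it through `npx`.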

This is a great tool for executing test scenarios and for creating automation scripts that can rerun the same scenario as many times as needed. It is especially useful for regression suites and for testing new functionality in early development phases.

Agentic Coding Tools

Agentic coding tools like Claude Code or Cursor can automate much of the test planning and creation process without the need for dedicated QA software. You can ask them to surface potential edge cases or attack vectors, giving you a strong foundation for your test strategy.

If you’ve been following best practices and feeding your agent rich project context, you’ll be well positioned to automatically generate test scenarios. The agent can explore edge cases and simulate variations that would otherwise require manual setup. As with Playwright MCP, this setup works especially well for regression testing, where you want to ensure new updates don’t break existing functionality.
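The kind of variation matrix an agent might produce can be sketched in a few lines of Python. Everything here is hypothetical: the field names, the sample values, and the validation rules (name between 1 and 255 characters, price a non-negative number) are assumptions chosen for illustration, not rules taken from a real application.

```python
from itertools import product

# Hypothetical field values; the real options depend on the application under test.
TAP_TYPES = ["Beer", "Other"]
NAMES = ["House Lager", "", "X" * 256]   # typical, empty, overly long
PRICES = ["6.50", "0", "-1", "abc"]      # typical, zero, negative, non-numeric


def expected_valid(name: str, price: str) -> bool:
    # Assumed validation rules: name is 1 to 255 characters, price is a number >= 0.
    if not 1 <= len(name) <= 255:
        return False
    try:
        return float(price) >= 0
    except ValueError:
        return False


def build_scenarios() -> list[dict]:
    # One scenario per combination of field values, tagged with the expected outcome.
    return [
        {"type": t, "name": n, "price": p, "should_save": expected_valid(n, p)}
        for t, n, p in product(TAP_TYPES, NAMES, PRICES)
    ]


scenarios = build_scenarios()
print(len(scenarios), "scenarios,", sum(s["should_save"] for s in scenarios), "expected to save")
```

Most of the generated combinations are negative cases, which is typical: the value of this approach lies in covering the invalid inputs that are tedious to set up by hand.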

TestCraft

The TestCraft platform provides a free, open-source browser extension (Chrome, Edge, Firefox, Safari) that lets you select web elements and instantly generate test ideas, produce code for Playwright, Cypress, or Selenium, and run accessibility checks. It also offers an API agent that ingests OpenAPI specs to scaffold automation frameworks.
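To illustrate what scaffolding tests from an OpenAPI spec means in general, here is a small Python sketch. It is not TestCraft's implementation; the spec fragment, paths, and operation names are hypothetical, and the output is only a list of test stub names rather than a full framework.

```python
HTTP_METHODS = {"get", "post", "put", "patch", "delete"}

# Tiny hypothetical OpenAPI fragment; a real spec would be loaded from JSON or YAML.
spec = {
    "openapi": "3.0.0",
    "paths": {
        "/taps": {
            "get": {"operationId": "listTaps", "responses": {"200": {}}},
            "post": {"operationId": "createTap", "responses": {"201": {}, "400": {}}},
        },
        "/taps/{id}": {
            "delete": {"operationId": "deleteTap", "responses": {"204": {}, "404": {}}},
        },
    },
}


def scaffold_tests(spec: dict) -> list[str]:
    """Emit one test stub name per operation and documented response code."""
    stubs = []
    for path, operations in spec.get("paths", {}).items():
        for method, operation in operations.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            name = operation.get("operationId", f"{method}_{path}")
            for status in sorted(operation.get("responses", {})):
                stubs.append(f"test_{name}_returns_{status}")
    return stubs


stubs = scaffold_tests(spec)
print("\n".join(stubs))
```

Each documented response code becomes its own stub, so error paths such as 400 and 404 get a test placeholder alongside the success case.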

This is especially good for new projects and features, because the variety of scenarios it provides is very promising. However, it does not execute them: the user has to copy the generated code snippets into their own automation framework.

Playwright MCP Experiment

For this experiment, we used Pourwall, a Beta Acid product that helps bars and restaurants manage and display digital beer menus. We set out to automate a regression test for tap creation using Playwright MCP, measuring the time it took with AI compared to a traditional setup.

We chose Playwright MCP because Playwright is one of the most widely adopted browser automation frameworks and integrates smoothly with web frameworks, which makes it a good fit for building automation from simple, structured instructions.

The test scenario:

Log in to the Pourwall app
Add a new tap and select “Other”
Fill out all the fields
Add the servings
Save the tap
Verify that the tap was created successfully
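For reference, a hand-written version of the steps above might look like the following Python sketch using Playwright's sync API. It is not the code the agent produced in our experiment. Every label, button name, URL, and credential is a placeholder, since Pourwall's real selectors are not shown in this article.

```python
def create_other_tap(page, base_url, email, password, tap_name):
    """Drive the tap-creation flow through a Playwright `Page` (sync API).

    All labels, button names, and URLs here are placeholders, not Pourwall's real ones.
    """
    # 1. Log in
    page.goto(f"{base_url}/login")
    page.get_by_label("Email").fill(email)
    page.get_by_label("Password").fill(password)
    page.get_by_role("button", name="Log in").click()

    # 2. Add a new tap and select "Other"
    page.get_by_role("button", name="Add tap").click()
    page.get_by_role("button", name="Other").click()

    # 3. Fill out the fields (only one shown here)
    page.get_by_label("Name").fill(tap_name)

    # 4. Add the servings
    page.get_by_role("button", name="Add serving").click()

    # 5. Save the tap
    page.get_by_role("button", name="Save").click()

    # 6. Verify the tap shows up; wait_for raises if it never becomes visible
    page.get_by_text(tap_name).wait_for(state="visible")


def main():
    # Imported here so the function above can be exercised without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        create_other_tap(page, "https://example.test", "qa@example.test", "secret", "QA Test Tap")
        browser.close()
```

Calling `main()` requires the `playwright` package, an installed browser, and a reachable application, so the placeholder URL would need to be replaced first. Finding the right locators for each of these steps is exactly the part that usually takes the most time when done by hand.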

Manually executing this test takes around five minutes, but preparing a traditional automation setup typically requires several hours to identify element locators, write scripts, and validate results.

That’s usually the point where teams ask, “Is this really worth automating?”

The Results

Using Playwright MCP, we generated and executed the entire scenario in under 10 minutes from start to finish. Of that, the test run and code creation themselves took just three minutes, a remarkable improvement over the several hours that traditional automation typically requires. The AI agent successfully tested the feature and produced a strong base of Cypress code to build on.
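A quick break-even calculation puts these numbers in perspective. The five-minute manual run and the 10-minute upper bound for the AI-assisted setup come from this experiment. The three-hour figure for a traditional setup is an assumption standing in for "several hours", and the sketch ignores the attended time of an automated run as well as later maintenance.

```python
import math


def break_even_runs(setup_minutes: float, manual_minutes: float, automated_minutes: float = 0.0) -> int:
    """Smallest number of runs at which total automated time no longer exceeds total manual time."""
    saved_per_run = manual_minutes - automated_minutes
    if saved_per_run <= 0:
        raise ValueError("Automation never pays off if a run saves no time")
    return math.ceil(setup_minutes / saved_per_run)


MANUAL_RUN_MIN = 5            # from the experiment
AI_SETUP_MIN = 10             # upper bound from the experiment
TRADITIONAL_SETUP_MIN = 180   # assumption: "several hours" taken as three hours

traditional = break_even_runs(TRADITIONAL_SETUP_MIN, MANUAL_RUN_MIN)
ai_assisted = break_even_runs(AI_SETUP_MIN, MANUAL_RUN_MIN)
print(traditional, ai_assisted)
```

Under these assumptions, a traditional setup pays for itself after 36 runs of the scenario, while the AI-assisted setup pays for itself after 2.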

However, we did encounter a small hiccup: in one run, the agent tried to create a new serving instead of using an existing one as instructed. After refining the prompt and rerunning the test, it executed correctly.

This showed that while AI can drastically accelerate test creation, prompt clarity is key to achieving consistent and reliable results.
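One way to keep prompts explicit is to assemble them from a goal, numbered steps, and a list of constraints. The Python sketch below is a hypothetical illustration rather than the prompt used in the experiment: the wording of the constraints is an assumption, shaped by the serving issue described above.

```python
def build_prompt(goal: str, steps: list[str], constraints: list[str]) -> str:
    # Numbered steps and explicit constraints leave less room for interpretation.
    lines = [f"Goal: {goal}", "", "Steps:"]
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    lines += ["", "Constraints:"]
    lines += [f"- {c}" for c in constraints]
    return "\n".join(lines)


prompt = build_prompt(
    goal="Regression test for tap creation",
    steps=[
        "Log in to the Pourwall app",
        "Add a new tap and select 'Other'",
        "Fill out all the fields",
        "Add the servings",
        "Save the tap",
        "Verify that the tap was created successfully",
    ],
    constraints=[
        # Hypothetical wording, added after the kind of failure seen in one run.
        "When adding servings, select an existing serving; do not create a new one.",
        "Stop and report if any step fails instead of improvising.",
    ],
)
print(prompt)
```

Keeping the constraints in a separate list makes it easy to add one each time the agent misreads an instruction, so the prompt improves with every rerun.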