Generative AI can be dazzling…until it isn’t. In the September–October 2025 issue of Harvard Business Review, Stefan Thomke, Philipp Eisenhauer, and Puneet Sahni share how Amazon built its “Catalog AI” system to address Gen-AI’s notorious tendency to hallucinate, omit critical details, or flood decision-makers with too many options. Rather than rely solely on costly human reviewers or plug-in test tools that can only cover a slice of output, Catalog AI automatically filters out unreliable content, generates product-page ideas, tests them, and refines itself with feedback from experiments and quality checks. Though it’s still a work in progress, the authors suggest that other organizations can already learn from Amazon’s early steps.
Sources: Harvard Business Review
Key Takeaways
– Scalable Quality Control Wins – Human reviews alone can’t handle Gen-AI’s volume; scalable AI-based systems are a smarter alternative.
– Feedback Loop Innovation – Catalog AI learns from its own results, integrating testing outcomes and quality checks for continuous improvement.
– Cross-Industry Relevance – Though Amazon’s use case centers on product pages, the concept offers lessons for any sector deploying Gen-AI at scale.
In-Depth
Generative AI has a lot of promise: it churns out creative options fast. But it isn’t perfect. Too often it makes things up, omits necessary details, or produces more choices than anyone can evaluate effectively. That’s a real hurdle for any organization looking to use it at scale. Amazon’s “Catalog AI” offers a pragmatic response worth noting.
According to Harvard Business Review contributors Stefan Thomke, Philipp Eisenhauer, and Puneet Sahni, Catalog AI combines several functions in one system. It detects and screens out content that looks unreliable, suggests product-page ideas, evaluates which ones work, and adapts based on quality-control feedback. The system blends automation with experimentation: the AI suggests and tests, and measured results inform improvements.
This isn’t just pie-in-the-sky theory. Amazon faces a massive catalog-creation task, and traditional human reviews or separate testing tools can’t keep pace. Catalog AI addresses that head-on by offering scalable, self-improving quality control, freeing human experts for higher-value tasks rather than repetitive oversight. This feedback loop of generation, testing, and refinement is what makes the system promising, even though it’s still early in deployment.
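To make the loop concrete, here is a minimal Python sketch of the generate, filter, test, and refine cycle described above. It is an illustration only, not Amazon’s implementation: every name in it (`Candidate`, `passes_quality_check`, `run_loop`, `toy_generate`, `toy_evaluate`) is hypothetical, the “model” is a stand-in function, and the “experiment” is a simple keyword score rather than a real customer test.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    score: float = 0.0


def passes_quality_check(candidate, required_terms):
    """Hypothetical reliability filter: reject text missing required facts."""
    text = candidate.text.lower()
    return all(term.lower() in text for term in required_terms)


def run_loop(generate, evaluate, required_terms, rounds=2):
    """Generate candidates, filter them, score the survivors, feed results back."""
    best = None
    feedback = []  # (text, score) pairs from the previous round
    for _ in range(rounds):
        candidates = [Candidate(t) for t in generate(feedback)]
        reliable = [c for c in candidates
                    if passes_quality_check(c, required_terms)]
        for c in reliable:
            c.score = evaluate(c.text)  # stand-in for an experiment result
        if reliable:
            round_best = max(reliable, key=lambda c: c.score)
            if best is None or round_best.score > best.score:
                best = round_best
        feedback = [(c.text, c.score) for c in reliable]
    return best


def toy_generate(feedback):
    """Stand-in for a generative model (assumption, not a real API)."""
    if not feedback:
        return ["Water bottle", "Steel water bottle, 750 ml"]
    best_text, _ = max(feedback, key=lambda pair: pair[1])
    return [best_text + ", leak-proof lid", best_text + "!!!"]


def toy_evaluate(text):
    """Stand-in for a measured outcome: share of key details present."""
    keywords = ("steel", "750 ml", "leak-proof")
    return sum(k in text.lower() for k in keywords) / len(keywords)


best = run_loop(toy_generate, toy_evaluate, required_terms=["steel", "750 ml"])
print(best.text, best.score)
```

In this toy run, the bare “Water bottle” listing is screened out for missing required details, and the second round builds on the first round’s best survivor. A real system would replace the keyword score with measured experiment outcomes.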
What stands out is the emphasis on practical, measured adoption that avoids both blind enthusiasm and undue skepticism. Companies thinking of deploying Gen-AI should take note: quality control isn’t a luxury, it’s a necessity. And scalable, AI-driven systems like Amazon’s may be the most effective way forward, if you’re ready to iterate, measure, and let the technology hold itself accountable.

