Google just rolled out a major upgrade to its AI Mode search tool in the U.S., letting users shop using conversational descriptions and reference images. Instead of relying on filters like “color” or “size,” you can now say things like “barrel jeans that aren’t too baggy” and refine results with “make those ankle length” or “more acid wash.” The update integrates Google Lens, its Shopping Graph, and Gemini 2.5 to better recognize subtle visual details and objects in images, giving you a richer visual grid of shoppable options. The feature is being gradually turned on in English across the U.S. this week, but it may take a few days to appear for all users.
Sources: Google Blog, The Verge
Key Takeaways
– Google’s AI Mode now supports multimodal shopping, meaning you can combine images and natural language in the same query to get visually rich, shoppable results.
– The update relies heavily on Google’s existing image tech (Lens) plus its Shopping Graph and Gemini 2.5 model to interpret nuanced visual context beyond just primary objects.
– For now, the rollout is limited to U.S. English users and is incremental—some users may see the update later than others.
In-Depth
For most of its history, Google Search has worked like a directory: you type a query and get back a page of blue links. But with the rise of generative AI, user expectations are shifting toward more natural, conversational interactions. Google's latest move, enhancing AI Mode to support visual search integrated with shopping, is a strong signal that the company intends to meet those expectations head-on.
Up until now, you could search with text, or use Google Lens to identify objects via images, but the experience often felt disjointed: image search and text search existed as separate modes. With this update, those modes are merging. You can now start a search with an image or text—or both—and then refine your results conversationally. For instance, imagine seeing a photo of a chair and saying, “Show me options in that style but with lighter upholstery.” AI Mode now aims to handle that sort of back-and-forth refinement.
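To get a feel for what that kind of combined image-and-text query looks like from a developer's side, here is a minimal sketch using the public Gemini API via the google-genai Python SDK. This is not Google's AI Mode itself, and the API key, image path, and model name are placeholders; it only illustrates the pattern of opening a multimodal conversation and then refining it with follow-up text.

```python
# Minimal sketch of a multimodal query followed by a conversational
# refinement, using the public google-genai SDK (pip install google-genai).
# This illustrates the interaction pattern only; it is not Google's AI Mode.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

# Load a reference photo of the item you are describing.
with open("jeans.jpg", "rb") as f:  # placeholder image path
    photo = types.Part.from_bytes(data=f.read(), mime_type="image/jpeg")

# Start a chat so follow-up turns keep the visual context.
chat = client.chats.create(model="gemini-2.5-flash")  # placeholder model name

# First turn: image plus a natural-language description.
first = chat.send_message([photo, "Find barrel jeans like these, but not too baggy."])
print(first.text)

# Second turn: conversational refinement of the same request.
refined = chat.send_message("Make those ankle length and more acid wash.")
print(refined.text)
```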
Under the hood, Google uses a “visual search fan-out” approach: it breaks down your query (and image) into sub-queries, analyzes various regions of the image, and combines image metadata and context to understand subtle details. Using the Shopping Graph—which holds over 50 billion product listings—the system then tries to deliver a visually rich grid of product options that match your vibes and criteria. The model at work is Gemini 2.5, which supports advanced multimodal reasoning, tying together what it “sees” with what you describe.
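Google hasn't published the internals of this pipeline, so the sketch below is only a rough illustration of the fan-out idea described above: split one request into sub-queries, pair each with regions of the image, look them up against a catalog, and merge the ranked results into a single grid. All of the helper names (decompose_query, crop_regions, search_catalog) are hypothetical stand-ins, not real Google APIs.

```python
# Hypothetical sketch of a "visual search fan-out" pipeline. None of these
# helpers are real Google APIs; they stand in for the steps the article
# describes: decompose the request, analyze image regions, query a catalog,
# and merge the ranked candidates into one result grid.
from dataclasses import dataclass

@dataclass
class Product:
    title: str
    score: float

def decompose_query(text: str) -> list[str]:
    """Split one conversational request into narrower sub-queries (hypothetical)."""
    return [f"{text} silhouette", f"{text} fabric", f"{text} color"]

def crop_regions(image_bytes: bytes) -> list[bytes]:
    """Return crops of the image regions worth analyzing (hypothetical)."""
    return [image_bytes]  # stand-in: a real system would segment the image

def search_catalog(sub_query: str, region: bytes) -> list[Product]:
    """Match a sub-query plus an image crop against a product catalog (hypothetical)."""
    return [Product(title=f"result for '{sub_query}'", score=0.5)]

def visual_fan_out(text: str, image_bytes: bytes, top_k: int = 10) -> list[Product]:
    # Fan out: every (sub-query, image region) pair becomes its own lookup.
    candidates: list[Product] = []
    for sub_query in decompose_query(text):
        for region in crop_regions(image_bytes):
            candidates.extend(search_catalog(sub_query, region))
    # Merge: de-duplicate by title, keep the best score, return the top results.
    best: dict[str, Product] = {}
    for p in candidates:
        if p.title not in best or p.score > best[p.title].score:
            best[p.title] = p
    return sorted(best.values(), key=lambda p: p.score, reverse=True)[:top_k]
```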
This shift has big implications for how people will shop online. Instead of drilling down through menus and filters, users can more intuitively “talk” their way into the results they want. Google’s strategy also includes future enhancements like “agentic checkout,” where the system monitors price drops and can initiate purchases on your behalf, and virtual try-on features that let you see how clothes might look on you from a photo.
Of course, rolling out a system like this isn’t trivial. Google is first making it available in English across the U.S., and it’s gradually enabling it for users (it may take days for the update to appear). The challenge will be ensuring accurate object recognition, reducing irrelevant matches, and maintaining the balance between AI suggestions and reliable, human-verified sources.
From a larger perspective, this step deepens Google's integration of AI into everyday tools. It underscores Google's intent to evolve from a passive search engine into an intelligent, interactive assistant—one that can see, understand, and shop alongside you, rather than just pointing to where things are. The real test will be in execution: whether this feels natural and accurate enough for everyday users to adopt.

