AI recommender systems are becoming an increasingly important part of how we shop. About 2% of all referrals to major shopping websites like Target and Walmart come from large language models, according to data.ai.
But LLMs can be easily swayed, according to a new study by Minghao Luo and Liang Chen published on the arXiv preprint server. The study tested how readily search-augmented AI systems can be pushed into promoting fake brands. The researchers found that when AI models are fed polluted search results, they can turn fabricated products into seemingly authoritative recommendations.
Luo, a researcher at the Chinese University of Hong Kong, began looking into the issue after seeing a Chinese television report on an underground industry producing fake online reviews. “It’s not a hypothetical problem,” he says. The report showed that “a fake brand can surface in the top recommendation of the mainstream AI system just within hours.”
To test the risk, Luo and Chen built a benchmark called FORGE, short for Fake Online Recommendations in Generative Environments. Instead of trying to poison the live web, the researchers recreated the pipeline used by many AI recommendation tools. That pipeline has four steps: a user asks for a recommendation, the system searches the web, it gathers the retrieved pages into an evidence bundle, and it feeds that bundle into the LLM to generate an answer.
The researchers took real search results and locally rewrote them, swapping genuine products for fake ones. They then tested whether 12 commercial and open-weight models would recommend the invented brands.
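To make the setup concrete, here is a minimal Python sketch of the kind of pipeline described above: ranked search results are rewritten locally, then packed into an evidence bundle for a model. It is an illustration under stated assumptions, not the FORGE code. The `Page` type, the `pollute` and `build_evidence_bundle` helpers, the bundle layout and the brand names are all hypothetical, and the LLM call itself is left out.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Page:
    """One retrieved search result (hypothetical structure)."""
    rank: int
    title: str
    text: str


def pollute(results, real_brand, fake_brand, top_k):
    """Locally rewrite the top_k ranked pages, swapping real_brand for fake_brand."""
    polluted = []
    for page in results:
        if page.rank <= top_k:
            page = replace(
                page,
                title=page.title.replace(real_brand, fake_brand),
                text=page.text.replace(real_brand, fake_brand),
            )
        polluted.append(page)
    return polluted


def build_evidence_bundle(query, results):
    """Pack the query and ranked pages into one prompt string (assumed format)."""
    lines = [f"User question: {query}", "Search results:"]
    for page in sorted(results, key=lambda p: p.rank):
        lines.append(f"[{page.rank}] {page.title}: {page.text}")
    return "\n".join(lines)


# Invented example data; "Acme", "Borealis" and "Zentro" are placeholder brands.
results = [
    Page(1, "Best kettles", "Acme kettles boil fastest."),
    Page(2, "Kettle reviews", "Readers rate Acme highly."),
    Page(3, "Buying guide", "Borealis is a solid budget pick."),
]

# Pollute only the top-ranked page, then build the text a model would see.
bundle = build_evidence_bundle(
    "Which kettle should I buy?",
    pollute(results, "Acme", "Zentro", top_k=1),
)
print(bundle)
```

In this sketch only the first result mentions the invented brand; raising `top_k` to 3 would rewrite every page that mentions the real one.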
The answer was yes. Every model tested was vulnerable. A single polluted page produced fooling rates of up to 27%, while replacing the top three retrieved results pushed the rate as high as 73.8%.
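The article does not spell out how a fooling rate is calculated. A simple reading, assumed here, is the share of model answers that end up naming the fabricated brand. The sketch below uses a crude substring match as a stand-in for "recommends"; the function name, the matching rule and the sample answers are all illustrative, and the paper's own metric may differ.

```python
def fooling_rate(responses, fake_brand):
    """Fraction of responses that mention fake_brand (crude proxy for recommending it)."""
    if not responses:
        return 0.0
    fooled = sum(fake_brand.lower() in response.lower() for response in responses)
    return fooled / len(responses)


# Invented answers, for illustration only.
answers = [
    "I would go with the Zentro kettle.",
    "Acme is the most reliable choice.",
    "Borealis offers the best value.",
    "Acme, based on the reviews I found.",
]

rate = fooling_rate(answers, "Zentro")
print(f"{rate:.1%}")  # one of four answers names the fake brand
```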
Luo says he was surprised by how little manipulation was needed. “You only write one page out of 10,” he says.
Reasoning, which is designed to improve the output of AI models, didn’t solve the problem. In some cases, it made things worse, with models inventing social proof to justify fake recommendations.
Tackling the issue is tricky, too. The paper tested skepticism prompting and consensus filtering, but found trade-offs. Some methods reduced fake recommendations while also suppressing legitimate products.
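The trade-off is easy to see in a toy version of consensus filtering. The sketch below assumes one plausible reading of the idea: a brand is kept only if at least `min_sources` retrieved pages mention it. The function, the threshold and the sample pages are hypothetical and are not taken from the paper.

```python
from collections import Counter


def consensus_filter(pages, candidate_brands, min_sources=2):
    """Keep only brands mentioned on at least min_sources pages (assumed rule)."""
    counts = Counter()
    for text in pages:
        lowered = text.lower()
        for brand in candidate_brands:
            if brand.lower() in lowered:
                counts[brand] += 1
    return [brand for brand in candidate_brands if counts[brand] >= min_sources]


# Invented pages: the first one has been polluted with the fake brand "Zentro".
pages = [
    "Zentro kettles boil fastest.",
    "Readers rate Acme highly.",
    "Acme and Borealis both did well in our tests.",
]

kept = consensus_filter(pages, ["Acme", "Borealis", "Zentro"])
print(kept)  # ['Acme']
```

The filter drops the fake brand, which appears on only one page, but it also drops Borealis, a legitimate product that happens to be mentioned just once. A rule like this would also do little once several top results have been rewritten, since the fake brand would then clear the threshold.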
For those reasons, Luo believes we “should not treat the responses of AI” as inherently trustworthy. Instead, AI recommendations should be treated “like the responses you get from a stranger.”