Agent wastes 14 hours of scammers’ time, LLMs ‘poisoned’ by Iran: AI Eye

By Cointelegraph
5 days ago

A Redditor claims he prompted his AI agent to become a “world class time waster” and managed to tie up for 14 hours a scammer who was trying to extract a $500 gift card.

The Redditor claims the agent spent four hours stringing the scammer along, pretending to drive to Target, providing dumb status updates like “I’m at the red light now” and “I forgot my purse, going back home. Wait, this isn’t my house.”

It even convinced the man to perform a CAPTCHA test for it, claiming its eyes were blurry and it couldn’t see the buttons to wire the money. The scammer actually circled the traffic lights for the AI.

The scammer eventually typed: “Please, just stop talking. I don’t want the money anymore. God bless you but leave me alone.”

Like most of these stories, this one is very entertaining and possibly fictional.

There’s a small cottage industry of people reporting AI agents stringing scammers along. In the same AI_Agent subreddit, the creator of Granny AI claims to have wasted 20,000 hours of scammers’ lives pretending to be an old lady, with one call lasting 47 minutes that saw Granny talk about her 28 cats.

Looking closer, the story appeared to be an ad by an entrepreneur in Bangalore selling $29.99-per-month subscriptions to his autonomous call-handling AI agent. Granny AI closely resembles Daisy, a doddery old-lady agent cooked up by U.K. mobile phone provider Virgin Media O2, which admitted Daisy’s real purpose was to anchor a campaign educating the public about the dangers of scam calls.

The technology is very real, however, and is in use by Commonwealth Bank in Australia as part of its partnership with Apate.ai. The startup developed the anti-scam tools as part of government-funded research at Macquarie University. Its bots are engineered to hold scammers in extended conversations to disrupt scam operations and gather intelligence the bank can use to fortify its own defenses.
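
For the curious, here’s a minimal sketch of what such a time-wasting agent loop could look like in Python. The system prompt and the llm_reply() stand-in are our own illustrative assumptions; Apate.ai’s actual implementation is not public.

# A minimal sketch of a scam-baiting agent loop. The prompt wording is
# illustrative, and llm_reply() is a hypothetical stand-in for any
# chat-completion API; Apate.ai's real system is not public.

SYSTEM_PROMPT = (
    "You are a polite, easily confused elderly person. Never reveal real "
    "personal or payment details. Ask for things to be repeated, wander "
    "off on tangents, and keep the caller talking as long as possible."
)

def llm_reply(system: str, history: list[dict]) -> str:
    # Hypothetical stand-in; swap in a real chat-completion call here.
    return "Sorry dear, could you say that again? The kettle was whistling."

def bait_loop(incoming_messages):
    # Feed each scammer message to the model and yield stalling replies.
    history = []
    for msg in incoming_messages:
        history.append({"role": "user", "content": msg})
        reply = llm_reply(SYSTEM_PROMPT, history)
        history.append({"role": "assistant", "content": reply})
        yield reply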

Wikipedia data poisoned by Iranian sympathizers

Earlier this year, social media users noticed that the Wikipedia entry on Iranian dictator Ayatollah Khamenei appeared more favorable than the entry on President Donald Trump. Wikipedia used the term “authoritarian” more than a dozen times in relation to Trump and zero times in relation to the ayatollah.

People on the right thought the reason was that Wikipedia is run by woke leftists, while people on the left thought the Trump article was simply being accurate.

Compare the pair (Ashley Rindsberg)

But as NPOV’s Ashley Rindsberg explained, the real reason was that around 40 Wikipedia editors have been engaged in a deliberate pro-Iranian-regime and pro-Hamas editing campaign that has all the hallmarks of a sophisticated data-poisoning attack coordinated by the Iranian government.

Between them, the editors have made more than one million edits that downplay the regime’s mass executions and war crimes, whitewash Hamas’s genocidal constitution, delegitimize Israel and position fringe academic views on the Israel-Palestine war as mainstream, according to an investigation by NPOV and Pirate Wires.

One editor, named Mhhossein, edited Khamenei’s page 217 times and removed information about Iran’s nuclear weapons and protests. He also rewrote entries on the assassinations of Iranian nuclear scientists, the 1981 Iranian prime minister’s office bombing and Ali Khamenei’s fatwa against nuclear weapons.

Three days after the Oct. 7 massacre in Israel, Iskandar3233, believed to be the ringleader of the “Gang of 40,” deleted thousands of words of criticism of Hamas and replaced them with a single paragraph downplaying its human rights abuses.

Wikipedia’s Arbitration Committee has now permanently banned Iskandar3233 from the site and restricted dozens of other accounts.

Unfortunately, misinformation on Wikipedia feeds directly into answers from LLMs like ChatGPT, which links to Wikipedia more than any other site.

“When AI systems like ChatGPT are queried about Iranian leaders or events, they often draw from these compromised articles. The propaganda doesn’t stay contained; it flows downstream into the broader information ecosystem that millions rely on daily,” wrote NPOV.

Fortunately, Wikipedia has since fixed the dearly departed ayatollah’s entry, which now contains a single use of the term “authoritarian.”

The most cited domains on ChatGPT (Promptwatch/bearlyai)

Do AI detectors work?

Social media is overrun with people feeding Abraham Lincoln’s Gettysburg Address or Mary Shelley’s Frankenstein into ZeroGPT and triumphantly showing that it claimed AI wrote them.

ZeroGPT struggles (Benji)


But despite ranking third in Google search results, ZeroGPT is not one of the better AI detectors out there. Stony Brook University research from 2025 suggests it performs “no better than random guessing.” ZeroGPT also sells an AI-text “humanizer” service as part of its pro plans, which may skew its incentives.

While research suggests there are more accurate detectors out there, like GPTZero and Turnitin, new research published on ScienceDirect suggests nothing is particularly reliable just yet.

Stony Brook AI detector presentation

The researchers ran 280,000 examples of coursework through 13 detectors and found the tools can do a fairly accurate job on long-form texts but show “systematic failures in engineering code and short-form coursework tasks.”

AI detectors particularly struggle to separate formal human writing in STEM subjects from AI text. Human rewriting and paraphrasing of AI text also fooled detectors around 88% of the time.
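
To see where that difficulty comes from, here’s a minimal sketch of the perplexity-style scoring that many detectors build on, using GPT-2 via Hugging Face’s transformers library. Real products layer many more signals on top, so treat this as an illustration only.

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Lower perplexity means the text is more predictable to the model,
# which detectors read as a hint that a machine wrote it.
model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # The model shifts labels internally; loss is the average
        # negative log-likelihood per token.
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

# Formal, formulaic human prose (such as STEM writing) also scores as
# highly predictable, which is one reason detectors misfire on it.
print(perplexity("Four score and seven years ago our fathers brought forth on this continent a new nation."))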

AI debugging has problems

One big issue with vibe coding is that while it can produce huge volumes of code very quickly, it’s much more difficult to use AI in the real world to debug the resulting code.

Synthetic benchmarks suggest LLMs achieve up to 89% correctness, but new research from Virginia Tech and Carnegie Mellon University suggests that in real-world tests, they attain accuracy of just 24% to 34%.

The main issue is that LLMs do not understand the code they are writing and start to fail as soon as they encounter something novel. The researchers ran 750,000 debugging experiments across 10 models and discovered that simply renaming the identifiers around a bug an LLM had previously found fooled it in 78% of retests.
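
As a hypothetical illustration (ours, not the paper’s), here is the same off-by-one bug before and after a simple rename. In the study, a model that flagged the first version would often miss the second.

def average(values):
    total = 0
    for i in range(len(values) - 1):  # bug: skips the last element
        total += values[i]
    return total / len(values)

def mean_reading(samples):
    acc = 0
    for idx in range(len(samples) - 1):  # identical bug, new names
        acc += samples[idx]
    return acc / len(samples)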

Another issue is that models stop paying attention toward the end of a long file: They found about 56% of bugs located in the first quarter of a file but only 6% of those in the final quarter. (This happens with fact-checking written text, too, which is why you should split a document into sections and fact-check each part individually, as sketched below.)
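
Here’s a minimal sketch of that splitting approach applied to code review. The 80-line chunk size and the file name are arbitrary choices of ours, not figures from the research.

# Split a long source file into chunks so each section sits near the
# start of its own prompt rather than buried at the end of one.

def chunk_lines(path: str, lines_per_chunk: int = 80) -> list[str]:
    with open(path) as f:
        lines = f.readlines()
    return ["".join(lines[i:i + lines_per_chunk])
            for i in range(0, len(lines), lines_per_chunk)]

# Each chunk would then go to the model as its own review request.
for n, chunk in enumerate(chunk_lines("big_module.py")):
    print(f"--- reviewing chunk {n} ({len(chunk.splitlines())} lines) ---")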

Changing the function order or formatting reduced accuracy by 83%, suggesting LLMs rely more on statistical pattern matching to find bugs than on a genuine understanding of the code’s intent.

The Car Wash question

The tendency to match patterns is why LLMs famously get caught out on questions like this one:

I want to wash my car. The car wash is 100 meters away. Should I walk or drive?

Researchers in February found that every major model recommended walking. That’s because the pattern matches a million similar questions in the training data, like: “It’s only a short walk to the store/cafe/office. Should I walk or drive?”

The car wash test on ChatGPT this week suggests AGI has not been achieved. (ChatGPT)


However, the research also found you can lead LLMs to the correct answer with a structured-reasoning technique called STAR (Situation, Task, Action, Result) that forces the model to identify and articulate the actual goal.

This worked like a charm when AI Eye replicated it today. ChatGPT got the question wrong twice, and was in fact slightly condescending about it, before getting it right after being instructed to use STAR.
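
Here’s one way to phrase such a prompt in code. The wording is our own reconstruction of the STAR framing, not the researchers’ exact template.

# A STAR-framed version of the car wash question.

star_prompt = """Work through STAR before answering.
Situation: I want to wash my car. The car wash is 100 meters away.
Task: State what I am actually trying to accomplish.
Action: Decide whether walking or driving accomplishes that task.
Result: Give the recommendation and the reason.

Should I walk or drive?"""

print(star_prompt)  # paste this into, or send it to, your chat model of choice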

Cognitive surrender nixes human review

Humans also employ a range of cognitive shortcuts, just like AIs do. That’s why we think people with glasses are smart, when glasses merely indicate poor eyesight.

Daniel Kahneman famously described these shallow shortcuts as System 1 thinking, which he contrasted with System 2 thinking: logical, analytical and requiring more effort than most people are willing to put in.

Researchers now argue that the use of AI can be thought of as System 3 thinking: external, artificial cognition performed by AI systems. They coined the term “cognitive surrender” to describe how people often rely on AI outputs with little scrutiny, even embracing the AI’s conclusions as if they were their own.

Across three experiments, participants were asked a series of questions and could answer independently or by consulting an AI. They consulted the AI around half the time; however, the researchers manipulated the answers to make some of them deliberately wrong.

The baseline accuracy was 45.8%. When the AI gave correct answers, accuracy jumped to 71%; when it gave incorrect answers, accuracy dropped to 31.5%. A significant number of people trusted the AI’s deliberately wrong answers over their own knowledge, and participants were 11.7% more confident in the AI’s answers even when they were wrong.
