8 Comments
Dennis Nehrenheim M.Sc.

I haven’t read the whole series yet, but I think you put AI's capabilities in too positive a light here.

Ponti Min

> And the notion that AI is a useless parlor trick whose main impact is instigating an economic bubble is cope, not criticism.

On the subject of cope, I am grimly amused that the criterion for whether AI had human-level intelligence was always the Turing Test. Until AIs clearly passed it, of course, at which point people tacitly agreed that it was never the criterion.

Dennis Nehrenheim M.Sc.

Do you have a research paper or reference that explicitly states that these models have passed a Turing test?

Ponti Min

No, but it's quite obvious that LLMs can converse like humans.

Dennis Nehrenheim M.Sc.

They can do that, at least superficially. But passing the Turing Test would require sustained, complete indistinguishability from a human, not just a few convincing moments. And we are not quite there yet.

Ponti Min

In what way are we not there yet?

Dennis Nehrenheim M.Sc.

There have been studies, such as https://arxiv.org/pdf/2503.23674, that claim that well-prompted LLMs with certain personas can fool people in short, easy 5-minute conversations.

But I don't think that says all that much. If people get a better feel for how LLMs sound, I think they will get better at exposing them. Even though it is mathematically impossible to distinguish computer-generated text from human-generated text, there are certain "tells" that show you are talking to a machine, which will eventually fail to account for important details and nuances in context.

The biggest problem with the study above is the time window. I would have people chat for at least one hour on various topics to validate the claim.

Five-minute conversations do matter right now, while LLMs are still new to the public, when it comes to fraud, social engineering, and automation. But that's a narrower claim than "LLMs pass the Turing test" as a statement about machine intelligence.

Ponti Min

Specifically "When prompted to adopt a humanlike persona, GPT-4.5 was judged to be the human 73% of the time: significantly more often than interrogators selected the real human participant".

I agree with you that a one-hour convo would make a better test. I also think that if the AI companies were developing AIs specifically to do this, they would get better very quickly.