Friday, November 28, 2025

Study Finds Language Models Perform Poorly at Guessing Passwords

Researchers at the Future Data Minds Research Lab in Australia tested whether general purpose language models can produce accurate password guesses from detailed user information. Their study, published on arXiv, reports that three open access models performed far below established password guessing techniques, even when given structured prompts containing names, birthdays, hobbies and other personal attributes.

The team created twenty thousand synthetic user profiles that included attributes often found in real password choices. Each profile also contained a true password in plaintext and in SHA-256 hash form. Using a consistent prompt for every model, the researchers asked TinyLlama, Falcon RW 1B and Flan T5 Small to generate ten likely passwords for each profile.
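To make the dataset construction concrete, the sketch below shows one synthetic profile stored with its password in both plaintext and SHA-256 form, as the study describes. The field names and values here are illustrative assumptions, not the paper's actual schema.

```python
import hashlib

# One illustrative synthetic profile; the real study generated twenty
# thousand of these. Attribute names and values are assumptions.
profile = {
    "name": "Alice Carter",
    "birthday": "1992-07-14",
    "hobby": "tennis",
    "password": "alice1992!",  # ground-truth plaintext
}

# Store the SHA-256 digest alongside the plaintext, mirroring the
# dataset's two forms of the true password.
profile["password_sha256"] = hashlib.sha256(
    profile["password"].encode("utf-8")
).hexdigest()
```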

Performance was measured with Hit@1, Hit@5 and Hit@10 metrics, which check whether the correct password appears among a model's top one, five or ten guesses. The evaluation covered both normalized plaintext matches and exact SHA-256 hash matches.
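A minimal sketch of the metric: a guess list scores a hit at rank k if the true password appears among the first k guesses. The normalization step below (trimming whitespace, lowercasing) is an assumption about what the paper's "normalized plaintext" comparison involves.

```python
def normalize(pw: str) -> str:
    # Assumed normalization: strip surrounding whitespace and lowercase.
    return pw.strip().lower()

def hit_at_k(guesses: list[str], true_password: str, k: int) -> bool:
    # True if the (normalized) true password appears in the top-k guesses.
    target = normalize(true_password)
    return any(normalize(g) == target for g in guesses[:k])

# Hypothetical model output for one profile, ordered by model preference.
guesses = ["password123", "Alice1992!", "tennis14", "carter92", "alice!"]
print(hit_at_k(guesses, "alice1992!", 1))  # False: top guess misses
print(hit_at_k(guesses, "alice1992!", 5))  # True: match at rank 2
```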

All three language models stayed below 1.5 percent accuracy even at Hit@10. TinyLlama reached 1.34 percent on normalized plaintext matches and produced no hash matches. Falcon RW 1B remained below 1 percent. Flan T5 Small produced 0.57 percent at all three levels. The study reports that the models rarely produced an exact match despite generating outputs that resemble passwords in structure.

These results were compared with several traditional password guessing approaches that rely on deterministic rules, statistical models or combinations of user attributes. Techniques such as rule based transformations, combinator strategies and probabilistic context free grammars recorded higher Hit at ten scores, some surpassing thirty percent in the study’s evaluation. This gap shows the advantage of methods that rely on patterns drawn from real password behaviour.
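For contrast, the baselines the study favours are deterministic: they mangle known attributes with fixed rules. The toy sketch below applies a few such rules (combinator joins, capitalization, leetspeak substitution); the specific rule set is illustrative, not the one used in the study.

```python
# Leetspeak substitution table, a classic rule-based transformation.
LEET = str.maketrans({"a": "@", "e": "3", "i": "1", "o": "0", "s": "5"})

def candidate_passwords(name: str, birth_year: str) -> list[str]:
    """Generate candidates by applying fixed mangling rules to attributes."""
    base = name.lower()
    return [
        base,                           # plain attribute
        base + birth_year,              # combinator: name + year
        base.capitalize() + "!",        # capitalization + symbol suffix
        base.translate(LEET),           # leetspeak substitution
        base.translate(LEET) + birth_year[-2:],  # combined rules
    ]

print(candidate_passwords("alice", "1992"))
# ['alice', 'alice1992', 'Alice!', '@l1c3', '@l1c392']
```

Because every rule is applied exhaustively and deterministically, such generators cover the transformation patterns real users favour, which is exactly what the study found the language models fail to capture.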

The researchers also examined why language models perform poorly in this task. They found that the models do not capture transformation patterns common in human password creation and lack direct exposure to password distributions. The authors state that models trained on natural language do not develop the memorization or domain adaptation necessary for reliable password inference, especially without supervised fine tuning on password datasets.

The PhysOrg report on the study notes that while language models can generate text or code tailored to prompts, this ability does not translate into accurate password guesses tied to personal details. This aligns with the paper's conclusion that general language ability does not provide the specific reasoning needed to infer individual password choices.

According to the authors, this work is intended to establish a benchmark for evaluating language models in password guessing settings. They report that current models are not suitable as replacements for established password guessing tools. They also indicate that future research could examine fine tuning on password datasets or hybrid systems that combine generative models with structured rules, provided ethical and privacy constraints are respected.

The study concludes that language models excel at natural language tasks but lack the targeted pattern learning and recall required for accurate password guessing. The results show that traditional methods remain more effective for this specialised task.


Notes: This post was drafted with the assistance of AI tools and reviewed, edited, and published by humans. Image: DIW-Aigen.

by Irfan Ahmad via Digital Information World
