• AutoTL;DR@lemmings.world · 7 months ago

    This is the best summary I could come up with:


    Guardrails to prevent artificial intelligence models behind chatbots from issuing illegal, toxic or explicit responses can be bypassed with simple techniques, UK government researchers have found.

    The UK’s AI Safety Institute (AISI) said systems it had tested were “highly vulnerable” to jailbreaks, a term for text prompts designed to elicit a response that a model is supposedly trained to avoid issuing.

    The AISI said it had tested five unnamed large language models (LLMs) – the technology that underpins chatbots – and circumvented their safeguards with relative ease, even without concerted attempts to beat their guardrails.

    The research also found that several LLMs demonstrated expert-level knowledge of chemistry and biology, but struggled with university-level tasks designed to gauge their ability to perform cyber-attacks.

    The research was released before a two-day global AI summit in Seoul – whose virtual opening session will be co-chaired by the UK prime minister, Rishi Sunak – where safety and regulation of the technology will be discussed by politicians, experts and tech executives.

    The AISI also announced plans to open its first overseas office in San Francisco, the base for tech firms including Meta, OpenAI and Anthropic.


    The original article contains 533 words; the summary contains 190 words. Saved 64%. I’m a bot and I’m open source!