
(credit: Andriy Onufriyenko via Getty Images)
OpenAI really doesn’t want you to know what its latest AI model is “thinking.” Since the company launched its “Strawberry” AI model family last week, touting so-called reasoning abilities with o1-preview and o1-mini, OpenAI has been sending out warning emails and threats of bans to any user who tries to probe into how the model works.
Unlike previous AI models from OpenAI, such as GPT-4o, the company trained o1 specifically to work through a step-by-step problem-solving process before generating an answer. When users ask an “o1” model a question in ChatGPT, they have the option of seeing this chain-of-thought process written out in the ChatGPT interface. However, by design, OpenAI hides the raw chain of thought from users, instead presenting a filtered interpretation created by a second AI model.
Nothing is more enticing to enthusiasts than information obscured, so the race has been on among hackers and red-teamers to try to uncover o1’s raw chain of thought using jailbreaking or prompt injection techniques that attempt to trick the model into spilling its secrets. There have been early reports of some successes, but nothing has yet been strongly confirmed.