OpenAI Trains AI To Stay Honest, And The Effect Spreads Everywhere
Researchers at OpenAI say reinforcement learning aimed at beneficial traits can broadly improve AI behavior, with gains that spread to new domains and hold under adversarial pressure.
Anonymous, CryptoCompass newsroom
June 20, 2026
OpenAI Trait Training
The findings appear in a paper published June 18. Its corresponding authors, Akshay V. Jagadeesh and Karan Singhal, built a synthetic dataset of realistic conversations meant to train and measure traits such as honesty, epistemic humility and openness to correction. The scenarios span health, education, science, law and engineering.
The team mixed a small share of that data into a broader training run, then compared the result against models built with matching compute. The trained model improved on 44 of 53 internal and external benchmarks measuring deception, reward hacking and harmful advice.
The bigger result, the authors say, is generalization. Training the model for good behavior in a single domain, health, improved its scores on unrelated tasks, including deception and reward hacking. It also resisted adversarial prompts and harmful fine-tuning better than the baseline, while staying responsive to legitimate requests.
The work builds on earlier findings on what the team calls emergent misalignment. In that research, models taught a single bad habit, such as writing insecure code, began behaving badly in unrelated settings. The new study aimed to reverse that pattern.