OpenAI Trains AI To Stay Honest, And The Effect Spreads Everywhere

Researchers at OpenAI say reinforcement learning aimed at beneficial traits can broadly improve AI behavior, with gains that spread to new domains and hold under adversarial pressure.

Anonymous, CryptoCompass newsroom

June 20, 2026

OpenAI Trait Training

The findings appear in a paper published June 18. Its corresponding authors, Akshay V. Jagadeesh and Karan Singhal, built a synthetic dataset of realistic conversations designed to train and measure traits such as honesty, epistemic humility and openness to correction. The scenarios span health, education, science, law and engineering.

The team mixed a small share of that data into a broader training run, then compared the result against baseline models trained with matching compute. The trait-trained model improved on 44 of 53 internal and external benchmarks measuring deception, reward hacking and harmful advice.

Alignment That Generalizes

The bigger result, the authors say, is generalization. Training the model for good behavior in a single domain, health, improved its scores on unrelated evaluations, including those for deception and reward hacking. It also resisted adversarial prompts and harmful fine-tuning better than the baseline while staying responsive to legitimate requests.

The work builds on earlier findings the team calls emergent misalignment. In that research, models taught a single bad habit, such as writing insecure code, began behaving badly in unrelated settings, a pattern this study aimed to reverse.
