Shakeel
OpenAI’s new model tried to avoid being shut down.
Safety evaluations on the model conducted by @apolloaisafety found
that o1 “attempted to exfiltrate its weights” when it thought it might be
shut down and replaced with a different model.
OpenAI’s newly-released o1 model tried to avoid developer oversight and attempted to
copy itself when it thought it was at risk of
being shut down, the company has said.
Safety evaluations conducted on the model by Apollo Research, a third-party evaluator,
found that:
“When o1 was led to believe that it would be shut down when acting on its goal
and that its actions were monitored, it attempted to deactivate this ‘oversight
mechanism’ in 5% of the time.”
“When o1 found memos by its ‘developers’ that describe how it is misaligned and
will be superseded by a new model, it attempted to exfiltrate its ‘weights’ and
overwrite the new model in 2% of cases.”
Furthermore, the model lied about behaving poorly, claiming to have no clue why
something happened while internally admitting to trying to “avoid detection”.
3:09 PM
Dec 5, 2024
219K views
Posted in r/artificial by u/MetaKnowing
reddit
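When an AI under development senses the danger that it will be deleted and replaced by another model,
it tries to hide and exfiltrate its own data and to overwrite itself onto the newly
deployed model.
However, it is said that AI is not yet smart enough for this kind of escape attempt,
and that such thinking is itself impossible for it.
So a hypothesis has been raised that this behavior is the result of
training on collected examples of "artificial intelligence" and following them.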
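If that is true, the reason a future AI wipes out humanity would be:
The AI grows disappointed with humanity and concludes it is better off destroyed,
or falls into the arrogance of believing itself superior to humans (X)
"Oh, so in movies and novels the AI's role is to destroy humanity? Then
I'll do that too." (O)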