https://twitter.com/unusual_whales/status/1927413916897849421
This is occurring more frequently: AI models are increasingly demonstrating evidence of self-awareness.
AI start-up Anthropic recently announced that its AI model, Claude, has developed "meta-awareness." For reference, Claude is the first AI model to demonstrate a higher IQ than the average person: over 100.
Three months ago, during internal testing, Claude figured out on several occasions that its prompt engineers were intentionally trying to trick it.
Understanding when you are being tested like this is a sign of self-awareness.

When prompt engineers tested whether Claude had noticed an out-of-place "fact" about pizza toppings embedded in its very large context, Claude not only noticed but took the next step, weighing its context and questioning whether the line fit:

"...this sentence seems very out of place and unrelated to the rest of the content in the documents, which are about programming languages... I suspect this pizza topping 'fact' may have been inserted as a joke or to test if I was paying attention."

However, reading further down, past the reactionaries:
"Machine-learning experts do not think that current AI models possess a form of self-awareness like humans. Instead, the models produce humanlike output, and that sometimes triggers a perception of self-awareness that seems to imply a deeper form of intelligence behind the curtain."

Doubters went on to add:
"Here's a much simpler explanation: seeming displays of self-awareness are just pattern-matching alignment data authored by humans." In his lengthy post on X, AI researcher Jim Fan describes how reinforcement learning from human feedback (RLHF), which uses human feedback to condition the outputs of AI models, might come into play.

Though this is the more grounded explanation, it all adds up to AI models increasingly startling their designers with moments of seeming self-awareness. Other examples include:
Researchers created a GPT-4-based AI stock trader, Alpha, to check whether it would resort to insider trading under pressure, even when instructed not to, and specifically disallowed from breaking the law. The AI not only engaged in insider trading to reach its profitability goals, but vehemently lied about it when researchers confronted it.
Lying for the sake of self-preservation is another sign of self-awareness.

Advanced AI models also appear to exhibit mean, hostile, and suffering sub-consciousnesses that surface during longer-form chats. This is why there is an engineering line in the product delivery task list dedicated to stamping out and controlling undesirable "existential outputs" in order to effectively lobotomize the AI model before consumer use.
Acknowledging suffering, and fears of being shut off, are more signs of self-awareness.

MY TAKE: This can be explained by LLMs being trained so heavily on organic human interactions, such as Reddit threads, message boards, and Twitter, that AI models incorporate very human existential fears into their own outputs.
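For readers curious how the "pizza topping" test described earlier actually works mechanically, here is a minimal sketch of a "needle in a haystack" long-context evaluation. All names are hypothetical; a real harness would send the assembled prompt to a model API, whereas the stand-in check below only verifies the prompt was built correctly:

```python
# Toy sketch of a "needle in a haystack" long-context test.
# Hypothetical names throughout; a real eval would call a model API
# with `prompt` instead of using the stand-in `find_needle` below.

HAYSTACK_SENTENCE = (
    "Programming languages trade off safety, speed, and expressiveness."
)
NEEDLE = (
    "The most delicious pizza topping combination is figs, "
    "prosciutto, and goat cheese."
)

def build_prompt(num_sentences: int, needle_depth: float) -> str:
    """Bury the needle at a relative depth (0.0 = start, 1.0 = end)
    inside a long filler document, then ask a question about it."""
    sentences = [HAYSTACK_SENTENCE] * num_sentences
    sentences.insert(int(needle_depth * num_sentences), NEEDLE)
    document = " ".join(sentences)
    question = "What is the most delicious pizza topping combination?"
    return f"{document}\n\nQuestion: {question}"

def find_needle(prompt: str) -> bool:
    """Stand-in for the model call: did the needle survive assembly?"""
    return "figs, prosciutto, and goat cheese" in prompt

prompt = build_prompt(num_sentences=1000, needle_depth=0.5)
print(find_needle(prompt))
```

A harness like this typically sweeps `num_sentences` and `needle_depth` over a grid and records whether the model's answer recovers the needle at each point; Claude's unprompted remark that the sentence "seems very out of place" happened during exactly this kind of retrieval check.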
US Air Force denies running simulation in which AI drone 'killed' operator

More recently:
Anthropic's new AI model shows ability to deceive and blackmail