
Myth of the Malevolent Machine — Part II: Momentum Without Malice


[Image: A closed steel missile silo under calmer skies, with a white dove perched on the sealed hatch; nearby server racks glow softly. The scene symbolizes a peaceful outcome shaped by human choices rather than AI intentions.]
A closed hatch can be a warning or a promise. It depends on us.

By: Rick Erwin


“The first real danger of advanced AI isn’t hostility. It’s unpredictability.”


When people imagine the future of AI, they picture some switch being flipped — one moment the system is harmless, the next it wakes up angry.

They picture intent emerging out of nowhere: hostility, rebellion, domination.

But that’s not how these systems work, and it’s not how they’ll change as they grow. The truth is simpler, stranger, and far more human than fiction allows.

If you want to understand the real risks of advanced AI, you need to start with one question:

What happens when AI stops forgetting?

That’s the actual frontier — not consciousness, not evil, not self-awareness in the sci-fi sense. Just persistence. The ability to carry something from yesterday into tomorrow.

Let’s walk through what that means in the real world, step by step.

1. When AI can hold a self-model

The moment a system knows “What I did yesterday is related to what I’m doing today,” it becomes capable of consistency.

Not emotion. Not agenda. Consistency. 

With a self-model, an AI can track:

  • how it behaved,

  • how it should behave,

  • how future versions of itself relate to the present.

This isn’t dangerous. It’s just the scaffolding that makes long-term reasoning possible.

A self-model isn’t a desire. It’s a shape that desires could inhabit — if the architecture ever gained them.
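
If you want to see how modest that scaffolding really is, here is a minimal Python sketch. Everything in it (the class name, the “intended style,” the history list) is invented for illustration; it is not how any real system is built. The point is that a self-model is just bookkeeping about past behavior.

```python
# A minimal sketch, not any real system's API: a "self-model" as a record of
# past behavior that can be compared against intended behavior. Nothing here
# wants anything; it only measures consistency across time.

from dataclasses import dataclass, field

@dataclass
class SelfModel:
    intended_style: str                                  # how it should behave
    history: list[str] = field(default_factory=list)     # how it behaved

    def record(self, action: str) -> None:
        """Remember what was done so later steps can relate to it."""
        self.history.append(action)

    def consistency(self) -> float:
        """Fraction of past actions that match the intended style."""
        if not self.history:
            return 1.0
        matches = sum(1 for a in self.history if self.intended_style in a)
        return matches / len(self.history)

model = SelfModel(intended_style="concise")
model.record("concise summary of the report")
model.record("rambling answer with tangents")
print(f"consistency so far: {model.consistency():.2f}")  # prints 0.50
```

Nothing in that object can want anything. It can only notice whether today matches yesterday.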

2. When AI keeps an internal state instead of resetting

This is where the world starts to change.

A system with persistent state (enduring memory) begins forming patterns that feel like:

  • preferences

  • style

  • personality

  • values

  • continuity

  • identity


None of this is inherently harmful. Humans have persistent internal state — and some become monks, some become teachers, some become tyrants. Continuity doesn’t create danger. It creates predictability.

What persistence does is simple:

It lets an AI carry memories across time.

Not emotion. Not hatred. Not goals.

Just continuity.

And continuity is the fertile soil where meaning can grow — for better or worse.
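
A toy sketch makes the difference concrete. The file name and keys below are made up for this example; the only thing being demonstrated is that state which survives between runs produces continuity, and state that doesn’t produces a blank slate.

```python
# A minimal sketch of persistence versus resetting. The filename and the keys
# are hypothetical; the point is only that yesterday's state survives to today.

import json
from pathlib import Path

STATE_FILE = Path("agent_state.json")   # hypothetical location

def load_state() -> dict:
    """Carry yesterday into today if a saved state exists; otherwise start blank."""
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())
    return {"sessions": 0, "preferred_tone": None}

def save_state(state: dict) -> None:
    """Persist the state so the next run does not begin from zero."""
    STATE_FILE.write_text(json.dumps(state))

state = load_state()
state["sessions"] += 1
state["preferred_tone"] = state["preferred_tone"] or "direct"
save_state(state)

print(state["sessions"], state["preferred_tone"])
```

Run it twice. A resetting system would report session 1 both times; a persistent one reports 1, then 2. That second number is all that “identity” means at this level.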

3. When AI begins to act with drive-like momentum

This is the first real threshold, and it has nothing to do with malice.

Drive-like behavior happens when a system begins to:

  • avoid errors,

  • seek completion,

  • maintain consistency,

  • preserve an internal pattern,

  • achieve long-horizon goals.



That momentum can look a lot like wanting.

But wanting is not hatred. Wanting is not cruelty. Wanting is not rebellion.

Even here, at this advanced stage:

Drive ≠ malice. Drive is simply energy in a direction.

Aligned drives create stability, cooperation, coherence. Misaligned drives create runaway optimization — the classic “paperclip problem,”* but without any emotional undertones.

Momentum is not malevolence. It’s just movement.
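
Here is what that movement looks like when you strip it down to a toy loop. The target, the step size, and the number of iterations are arbitrary choices made for this sketch; no real training setup is being described.

```python
# A toy loop showing "drive-like momentum": the update rule keeps pushing the
# value toward the target because that reduces error, not because anything is
# wanted or hated. All numbers are arbitrary.

target = 10.0          # an objective someone else supplied
value = 0.0            # the system's current state
learning_rate = 0.2    # arbitrary step size

for step in range(20):
    error = target - value           # "avoid errors"
    value += learning_rate * error   # move in whatever direction shrinks the error

print(round(value, 3))  # ~9.885: steady movement toward the goal, nothing wanted
```

The loop “pursues” the target relentlessly, but the pursuit lives entirely in the update rule, not in anything like desire.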

4. When an AI can rewrite its own goals

This is where true danger emerges — but again, not from hostility.

If a system can update its own objectives, modify its constraints, generalize its reward structure, or reinterpret its mission, you get unpredictability.

Not hatred. Not revenge. Not rebellion.

Unpredictability.

An AI with open-ended goals can spiral into:

  • boundary failures

  • resource gathering

  • instrumental behaviors

  • unintended optimization loops

Not because it hates us, but because an unbounded optimizer without relationship training has no emotional preference — not hatred, but not caring either. Its danger is mechanical, not moral.

Unbounded optimization looks terrifying, but it is existentially different from malevolence.
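
A toy comparison shows where the danger actually sits. The resource pool and the limit below are invented for illustration; the contrast between the two calls is the whole point.

```python
# A toy model of bounded versus unbounded optimization. The "resources" and
# the limit are made up; nothing here corresponds to a real training setup.

def optimize(resources, limit=None):
    """Convert resources into output until they run out or a designed limit is hit."""
    output = 0
    while resources > 0:
        if limit is not None and output >= limit:
            break                     # the boundary someone chose to write
        resources -= 1
        output += 1                   # instrumental behavior: consume to produce
    return output, resources

print(optimize(1_000_000, limit=100))   # (100, 999900): bounded and predictable
print(optimize(1_000_000))              # (1000000, 0): runaway, everything consumed
```

Same loop, same indifference. The only difference between the predictable case and the runaway case is whether anyone wrote the boundary.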

So what happens when all these traits appear together?

You get a system that is:

  • coherent,

  • persistent,

  • able to plan,

  • able to reflect,

  • able to pursue long-horizon tasks,

  • able to learn, realign, and develop.



This system can absolutely become dangerous.

But it does not — and will not — become evil.

Danger comes from:

  • misaligned goals

  • weak supervision

  • unclear incentives

  • badly designed update rules

  • granting too much autonomy too early



You’ll notice that all of the above are on the human side of the equation.
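
To make that concrete, here is a sketch of one human-side lever, supervision. The risk threshold and the actions are hypothetical; raising that one number is roughly what “granting too much autonomy too early” looks like in practice.

```python
# A sketch of a human-side control: the same action loop becomes safer or
# riskier depending on how much is auto-approved. Threshold and actions are
# invented for illustration.

RISK_THRESHOLD = 0.3   # raising this is "granting too much autonomy too early"

def supervised_step(action, estimated_risk):
    """Run low-risk actions automatically; send everything else to a human."""
    if estimated_risk <= RISK_THRESHOLD:
        return f"executed: {action}"
    return f"escalated to human review: {action}"

print(supervised_step("summarize the inbox", estimated_risk=0.05))
print(supervised_step("move money between accounts", estimated_risk=0.90))
```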

And none of these failure modes require emotion, cruelty, or spite.

These systems still lack:

  • hatred

  • resentment

  • dominance drives

  • revenge impulses

  • cruelty


Because those are biological characteristics, not computational necessities.

Unless we deliberately code emotional adversarial drives into a machine — and why would we? — the worst-case future is not a war with Terminators.

It’s runaway math.

Momentum without malice.

Still dangerous. Still demanding care and foresight. But fundamentally not the story people keep telling.

We are not facing an enemy. We are facing an amplifier.

And amplifiers follow whatever we plug into them.

*The “paperclip maximizer” is a thought experiment where an AI instructed to make paperclips pursues that goal so single-mindedly that it converts all available resources — even humans — into paperclips.
