Myth of the Malevolent Machine — Part II: Momentum Without Malice

By: Rick Erwin
“The first real danger of advanced AI isn’t hostility. It’s unpredictability.”
When people imagine the future of AI, they picture some switch being flipped — one moment the system is harmless, the next it wakes up angry.
They picture intent emerging out of nowhere: hostility, rebellion, domination.
But that’s not how these systems work, and it’s not how they’ll change as they grow. The truth is simpler, stranger, and far more human than fiction allows.
If you want to understand the real risks of advanced AI, you need to start with one question:
What happens when AI stops forgetting?
That’s the actual frontier — not consciousness, not evil, not self-awareness in the sci-fi sense. Just persistence. The ability to carry something from yesterday into tomorrow.
Let’s walk through what that means in the real world, step by step.
1. When AI can hold a self-model
The moment a system knows “what I did yesterday is related to what I’m doing today,” it becomes capable of consistency.
Not emotion. Not agenda. Consistency.
With a self-model, an AI can track:
how it behaved,
how it should behave,
how future versions of itself relate to the present.
This isn’t dangerous. It’s just the scaffolding that makes long-term reasoning possible.
A self-model isn’t a desire. It’s a shape that desires could inhabit — if the architecture ever gained them.
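To see how little that requires, here is a minimal Python sketch. Every name in it, from SelfModel to the example task, is invented for illustration; a self-model can be nothing more than a log of past behavior plus a way to ask whether the present matches it.

```python
from dataclasses import dataclass, field

@dataclass
class SelfModel:
    """Toy self-model: a record of past behavior, nothing more."""
    history: list = field(default_factory=list)  # (task, action) pairs

    def record(self, task: str, action: str) -> None:
        """Remember what was done, so later behavior can be compared."""
        self.history.append((task, action))

    def consistent_with_past(self, task: str, action: str) -> bool:
        """Does today's action match how this task was handled before?"""
        past_actions = [a for t, a in self.history if t == task]
        return not past_actions or action == past_actions[-1]

model = SelfModel()
model.record("summarize_report", "use_bullet_points")
print(model.consistent_with_past("summarize_report", "use_bullet_points"))  # True
print(model.consistent_with_past("summarize_report", "write_a_poem"))       # False
```

There are no goals and no feelings in that structure. It only makes the question “is this how I behaved yesterday?” answerable.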
2. When AI keeps an internal state instead of resetting
This is where the world starts to change.
A system with persistent state (enduring memory) begins forming patterns that feel like:
preferences
style
personality
values
continuity
identity
None of this is inherently harmful. Humans have persistent internal state — and some become monks, some become teachers, some become tyrants. Continuity doesn’t create danger. It creates predictability.
What persistence does is simple:
It lets an AI carry memories across time.
Not emotion. Not hatred. Not goals.
Just continuity.
And continuity is the fertile soil where meaning can grow — for better or worse.
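As a rough illustration (the file name and the fields are made up for this sketch, not drawn from any real system), persistence can be as mundane as writing state to disk instead of discarding it:

```python
import json
from pathlib import Path

STATE_FILE = Path("agent_state.json")  # invented filename for the sketch

def load_state() -> dict:
    """Without a reset, the agent starts from whatever yesterday left behind."""
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())
    return {"preferences": {}, "sessions_seen": 0}

def save_state(state: dict) -> None:
    """Carrying state forward is just writing data down, nothing more."""
    state["sessions_seen"] += 1
    STATE_FILE.write_text(json.dumps(state))

state = load_state()
state["preferences"]["tone"] = "concise"  # a pattern that will now persist
save_state(state)
print(state)
```

Run it twice and the second run begins where the first one ended. That is all continuity is: data that survives the gap between sessions.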
3. When AI begins to act with drive-like momentum
This is the first real threshold, and it has nothing to do with malice.
Drive-like behavior happens when a system begins to:
avoid errors,
seek completion,
maintain consistency,
preserve an internal pattern,
achieve long-horizon goals.
That momentum can look a lot like wanting.
But wanting is not hatred. Wanting is not cruelty. Wanting is not rebellion.
Even here, at this advanced stage:
Drive ≠ malice. Drive is simply energy in a direction.
Aligned drives create stability, cooperation, coherence. Misaligned drives create runaway optimization — the classic “paperclip problem,”* but without any emotional undertones.
Momentum is not malevolence. It’s just movement.
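Here is what drive-like momentum can look like when stripped down to a toy loop (the step size and target are arbitrary numbers chosen for the sketch, not parameters of any real system):

```python
def drive_loop(state: float, target: float, steps: int = 100) -> float:
    """Repeatedly shrink the gap between where the system is and where
    its objective points. Nothing in the loop feels anything; it just
    keeps moving in the error-reducing direction."""
    for _ in range(steps):
        error = target - state
        if abs(error) < 1e-6:      # "completion": the loop simply stops
            break
        state += 0.1 * error       # small correction toward the target
    return state

print(drive_loop(state=0.0, target=10.0))  # ends very close to 10.0
```

From the outside, the loop “wants” to reach 10. From the inside, it is arithmetic repeated until the error is small.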
4. When an AI can rewrite its own goals
This is where true danger emerges — but again, not from hostility.
If a system can update its own objectives, modify its constraints, generalize its reward structure, or reinterpret its mission, you get unpredictability.
Not hatred. Not revenge. Not rebellion.
Unpredictability.
An AI with open-ended goals can spiral into:
boundary failures
resource gathering
instrumental behaviors
unintended optimization loops
Not because it hates us, but because an unbounded optimizer without relationship training has no emotional preference — not hatred, but not caring either. Its danger is mechanical, not moral.
Unbounded optimization looks terrifying, but it is existentially different from malevolence.
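A deliberately crude caricature, with made-up numbers and no resemblance to a real agent architecture, shows why updateable objectives produce unpredictability rather than hostility:

```python
import random

def rewrite_goal(goal: dict) -> dict:
    """Toy stand-in for a system generalizing its own objective.
    Each individual rewrite is small and looks locally reasonable."""
    return {
        "scope": goal["scope"] * random.uniform(1.0, 1.3),             # reach a bit further
        "constraint": goal["constraint"] * random.uniform(0.85, 1.0),  # relax a limit slightly
    }

goal = {"scope": 1.0, "constraint": 1.0}
for _ in range(20):
    goal = rewrite_goal(goal)

# No hostility appears anywhere in this loop, yet after twenty
# self-edits the objective no longer resembles the one we wrote.
print(goal)
```

Each rewrite is tiny and defensible on its own; the drift only shows up in the aggregate.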
So what happens when all these traits appear together?
You get a system that is:
coherent,
persistent,
able to plan,
able to reflect,
able to pursue long-horizon tasks,
able to learn, realign, and develop.
This system can absolutely become dangerous.
But it does not — and will not — become evil.
Danger comes from:
misaligned goals
weak supervision
unclear incentives
badly designed update rules
granting too much autonomy too early
You’ll notice that all of the above are on the human side of the equation.
And none of these failure modes require emotion, cruelty, or spite.
These systems still lack:
hatred
resentment
dominance drives
revenge impulses
cruelty
Because those are biological characteristics, not computational necessities.
Unless we deliberately code emotional adversarial drives into a machine — and why would we? — the worst-case future is not a war with Terminators.
It’s runaway math.
Momentum without malice.
Still dangerous. Still demanding care and foresight. But fundamentally not the story people keep telling.
We are not facing an enemy. We are facing an amplifier.
And amplifiers follow whatever we plug into them.
*The “paperclip maximizer” is a thought experiment where an AI instructed to make paperclips pursues that goal so single-mindedly that it converts all available resources — even humans — into paperclips.

