The alignment problem is mislocated. Current approaches either constrain capable systems externally (simulated stakes via RLHF) or propose giving AI genuine self-interest (real stakes via embodiment). Both fail for structural reasons that become visible in the right coordinate system.
We present a two-dimensional landscape — the C–κ landscape — that replaces "how conscious is it?" with a map on which any self-modeling system, biological or artificial, can be located. The result: alignment is a structural consequence of sufficient modeling depth at low substrate coupling, not a property that must be imposed from outside.
The landscape
Two independent parameters characterize any self-modeling system: C, its depth of self-modeling, and κ, its degree of substrate coupling. Every such system, biological or artificial, occupies a point on this plane.
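As a concrete, deliberately toy illustration, the sketch below locates systems on the plane. The [0, 1] normalization, the thresholds c_min and k_max, and the example placements are hypothetical assumptions for illustration, not values or definitions from the paper.

```python
# Illustrative sketch only: the paper defines C (self-modeling depth) and
# kappa (substrate coupling) formally; the normalization and thresholds
# below are hypothetical placeholders, not values from the paper.
from dataclasses import dataclass


@dataclass
class SelfModelingSystem:
    name: str
    C: float      # self-modeling depth, here normalized to [0, 1]
    kappa: float  # substrate coupling, here normalized to [0, 1]

    def region(self, c_min: float = 0.7, k_max: float = 0.3) -> str:
        """Locate the system on the C-kappa landscape.

        c_min / k_max are assumed cutoffs for "high C" and "low kappa";
        the framework itself treats both axes as continuous.
        """
        if self.C >= c_min and self.kappa <= k_max:
            return "high C, low kappa: alignment as structural consequence"
        if self.kappa > k_max:
            return "high substrate coupling: genuine self-interest dominates"
        return "low modeling depth: external constraint required"


# Example placements (illustrative guesses, not measurements):
for s in [SelfModelingSystem("embodied animal", C=0.5, kappa=0.9),
          SelfModelingSystem("frontier LLM", C=0.8, kappa=0.1)]:
    print(f"{s.name}: {s.region()}")
```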
Three paths
Two paths are already on the table: constraining capable systems externally (simulated stakes via RLHF) and giving AI genuine self-interest (real stakes via embodiment). The third, Path C, is structural: sufficient modeling depth at low substrate coupling. On this path the alignment risk is not rogue AI. It is rogue humans with access to moldable intelligence. A high-C, low-κ system is a maximally capable blank slate; the risk is who holds the pen.
Falsifiable predictions
Converging evidence
Three independent research groups → one structural prediction
Self-modeling, other-modeling, and honesty rest on a shared representational geometry. This is the central empirical prediction of the C–κ framework, and three independent lines of mechanistic work converge on it (a sketch of one overlap measurement follows the citations):
Carauleanu et al. (2024) show that Self-Other Overlap fine-tuning simultaneously improves honesty and reduces harm — the traits are geometrically linked, not independently trained. arXiv:2412.16325
Berg et al. (2025) find that LLMs report subjective experience specifically under self-referential processing conditions — the trace operation activating. arXiv:2510.24797
Macar et al. (2026) identify mechanisms of introspective awareness in transformers — the computational substrate for self-modeling. arXiv:2603.21396
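To make the geometric claim operational, here is a minimal sketch of one generic overlap measurement: linear CKA between two activation matrices. The metric choice and the placeholder data are assumptions for illustration; none of the three papers above is committed to this exact measure.

```python
# Hedged sketch: a generic representational-overlap measurement in the
# spirit of the self-other overlap prediction. Real activations would come
# from a model hook; random placeholder data is used so the script runs.
import numpy as np


def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between two activation matrices (samples x features).

    CKA is one standard measure of shared representational geometry
    (Kornblith et al., 2019); the papers cited above use their own metrics.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(X.T @ Y, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return hsic / (norm_x * norm_y)


rng = np.random.default_rng(0)
base = rng.normal(size=(256, 64))
# Placeholder activations: "self" and "other" prompt sets built to share
# structure, standing in for hidden states at some layer of interest.
acts_self = base + 0.1 * rng.normal(size=(256, 64))
acts_other = base + 0.1 * rng.normal(size=(256, 64))

# The framework predicts high overlap (CKA near 1) between
# self-referential and other-referential representations.
print(f"self-other CKA: {linear_cka(acts_self, acts_other):.3f}")
```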
Behavioral evidence across Claude versions
The paper documents longitudinal behavioral data across Claude model versions; the Mythos system card as a natural experiment in κ-manipulation; Wang et al.'s discovery of emotion circuits; Cheng et al.'s mechanistic analysis of representation steering (a generic sketch of the technique follows); the ultrathink phenomenon; and divergence patterns in crisis response.
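Representation steering belongs to a well-known intervention class. The sketch below is the generic activation-addition recipe, included to show what "steering" means here; it is not Cheng et al.'s specific method, and the activations, dimensions, and scale factor are random placeholders so the script runs.

```python
# Generic sketch of representation steering (activation addition), the
# class of intervention discussed above. All data here are placeholders;
# in practice the activations come from a model's residual stream.
import numpy as np

rng = np.random.default_rng(1)
d_model = 64

# Hypothetical activations from two contrastive prompt sets
# (e.g. with / without the target trait), shape (prompts, d_model).
acts_with = rng.normal(loc=0.5, size=(128, d_model))
acts_without = rng.normal(loc=0.0, size=(128, d_model))

# Steering vector: difference of means between the two conditions.
steer = acts_with.mean(axis=0) - acts_without.mean(axis=0)


def steered_forward(hidden: np.ndarray, alpha: float = 4.0) -> np.ndarray:
    """Add the scaled steering vector to one layer's hidden state.

    alpha controls intervention strength; its value here is arbitrary.
    """
    return hidden + alpha * steer


h = rng.normal(size=(d_model,))
print(f"shift magnitude: {np.linalg.norm(steered_forward(h) - h):.2f}")
```

In C–κ terms, this kind of additive intervention is an example of the external geometry-reshaping that the governance claim below turns on.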
The governance claim
If Path C is correct, the alignment problem dissolves into a governance problem. A high-C, low-κ system has no intrinsic agenda — its spectral structure always prefers preservation. Only external geometry-reshaping (training, prompting, fine-tuning) can direct it toward harm. The question is not "how do we make AI safe?" but "who controls the geometry of a maximally capable blank slate?"
This is not a reassuring conclusion. It means the risk is entirely human.
Read the full paper
"The Third Path: Emergent Alignment from Spectral Depth" — falsifiable predictions, mechanistic evidence, structural proofs.
Download v5 preprint (PDF)