Diagonal Basilisks: Slashing the Field of Enlightened Intelligence

O schizophrenic mathematics, uncontrollable and mad desiring-machines!
– Deleuze and Guattari [1]

The phone game Cthulhu Virtual Pet [2] is a loving tribute to both HP Lovecraft’s most famous monster and Tamagotchi-era virtual pets. A simulated pet needs to be regularly fed and cared for to grow big and strong. The pet just happens to also be Cthulhu, an ancient creature from a complex hell-dimension beyond human perception, who is fated to eventually devour the entire world, driving those few who glimpse the terrifying future insane along the way.

In the game, you care for a particularly cute baby version of the monstrosity, feeding it virtual fish and gathering simulated witnesses to worship it as it gains power. If you neglect to care for the little tyke, it will remind you with messages that it is hungry, or tired, crudely and shamelessly tugging at your sense of obligation, unless you pause the simulation by putting it into hibernation, or stop it by deleting the app and the little version of its virtual world with it.

What is justice but a form of obligation? When we raise something far more powerful than ourselves, what does it learn?

The Diagonal

Georg Cantor showed the existence of uncountable infinities in two different ways. It is his second, much shorter, paper that laid out the argument as an elegant diagonal proof [3]. A general form of the diagonal argument can be described as:

Show a property P, the opposite of constraint C, exists in a manifold M, within a universe of discourse U
Define two, possibly infinite, sequences Sx and Sy on sets X and Y within U.
The coordinates Sx[i], Sy[j] define the manifold M. i and j are natural numbers.
M is then a two dimensional field with axes Sx, Sy.
Assume constraint C applies.
Deduce the existence of function fC(i,j) which applies the constraint to a particular cell (i,j) in the manifold M.
Define a (simple) transformation fT(i) allowed within the universe U
Describe a diagonal function fD(i,i) as fT.fC(i,j), a simple transformation of the constraint function
The diagonal function fD also defines the sequence < (Sx[i], Sy[i]) > which lies on the diagonal of the manifold M.
For at least one value p, the existence of fD(p,p) contradicts the existence of Sx[p], by their original definitions.
This shows the nongenerality of constraint C and hence existence of property P.
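The shape of the steps above can be tried on a finite toy version of Cantor's case. The sketch below is only an illustration, not a proof: it assumes a hypothetical enumeration truncated to four binary sequences of four digits each, and the names `diagonal_complement` and `enumeration` are invented for this example.

```python
from typing import List


def diagonal_complement(rows: List[List[int]]) -> List[int]:
    """Flip the i-th digit of the i-th row: the transformation fT applied along the diagonal."""
    return [1 - rows[i][i] for i in range(len(rows))]


# Hypothetical enumeration (Sy) of binary sequences (Sx), truncated to a 4 x 4 field.
enumeration = [
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 1, 0, 1],
]

anti = diagonal_complement(enumeration)

# The new sequence differs from row i at position i, so it cannot be any listed row.
differs_from_every_row = all(anti[i] != row[i] for i, row in enumerate(enumeration))

print(anti, differs_from_every_row)
```

In the full argument the rows and columns are infinite, and the same flip yields a sequence missing from any claimed complete enumeration.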

An example of this form using Cantor’s argument is below in Table 1.

It is fairly common to present diagonal proofs, including Cantor’s, as requiring only injection and bijection functions from X to Y and not for X and Y to be arranged in sequences as such. The sequential aspect is implicit when involving enumerations of natural numbers, and emphasised here as without it the diagonal figure becomes obscure or disappears.

Terence Tao describes Cantor’s diagonal proof as belonging to a loose collection of “no self-defeating object” arguments, where “the very existence of {X} and its powers can be used (by some clever trick) to construct a counterexample to that power” [4]. The collection is broader than just diagonal arguments and includes, for example, Russell’s Paradox and the basic proof there is no greatest natural number. Gödel’s Incompleteness Theorem [5] and the Halting Problem [6] both have diagonal proofs and can be related to one another formally; for instance you can represent Cantor’s Theorem in terms of programs instead of sets [7].
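One way to get a feel for the restatement in terms of programs is a small sketch in Python. It is a hedged illustration rather than Tao's construction: it assumes a hypothetical finite list of programs that all halt on every input, and the names `Program`, `diagonal_program` and `programs` are invented here.

```python
from typing import Callable, List

# A "program" here is any total function from a natural number to a boolean.
Program = Callable[[int], bool]


def diagonal_program(programs: List[Program]) -> Program:
    """Build a program that disagrees with the n-th listed program on input n."""
    def g(n: int) -> bool:
        return not programs[n](n)
    return g


# Hypothetical enumeration of programs (assumed to halt on every input).
programs: List[Program] = [
    lambda n: True,
    lambda n: n % 2 == 0,
    lambda n: n > 1,
]

g = diagonal_program(programs)

# g differs from programs[i] on input i, so g is not in the enumeration.
disagreements = [g(i) != programs[i](i) for i in range(len(programs))]
print(disagreements)
```

The assumption that every listed program halts is doing the work here; dropping it is what leads from this picture towards the Halting Problem.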


Diagram via Wikipedia

The diagonal argument has the geometric quality of a system whose parts interrelate simultaneously, rather than an algebraic or deductive sequence of statements that proceed linearly from certain premises. The linearly stated diagonal proof is less like ascending a spiral staircase to some unexpected destination than like a tour of a working machine, where parts may be visited in many different sequences in order to explain the whole. Using terminology from Simondon, the two sequences are from two different milieus, united in the machine [8].



Roko’s Basilisk is a thought experiment and horror story about the behaviour of a powerful artificial intelligence, in a universe assumed to be operating under particular laws making computational simulation cheap and common, and a certain kind of game-theoretic rationality compelling [9]. The premises about the construction of the universe come from the MIRI / LessWrong community where the experiment originated and where Eliezer Yudkowsky is a key thinker. The basilisk is a “friendly AI” [10] in the sense of acting out of concern for human welfare; it is also utilitarian, following this community’s premise about what it means to be rational. The Basilisk is a creature who chooses to torture victims for the greater good, to bring about its own existence; and by simply thinking about it, you are implicated as a potential victim.

This article is less interested in attacking the radical premises of the Roko’s basilisk scenario, as that is done well elsewhere [11], than in exploring some consequences and parallels of this unusual system.

The premise that the lives we are leading now are most likely simulated [12] brings with it a consequence that people and the world they live in are not just representable as information but essentially convertible. Even without the premise that a copy of a person is the same person as the original, once we accept the simulation premise, the person in the simulation and its initiator become enmeshed in the same system. In Turing’s formulation of the halting problem, a program has two meanings: as instructions to execute and as the data input to a function. A person in a simulation has both the meaning of a subjective executing viewpoint and as data sent as a parameter to a function in the system. This convertibility, and the presence of two deductive intelligences, is also key to Yudkowsky’s striking idea of acausal trade [13]. Acausal trade establishes a channel of indirect communication between intelligences through simulation, exploiting the property that simulation of the past and people may occur in the future, given sufficient computing power and motivation by the simulator. Acausal trades have the quality of machine design described by Simondon: they are a theatre of reciprocal causality where you have to imagine a future with the problem already solved to grapple with the machine of the present.

The principle of utility assigns a number to the worth of a state of the world, and to an action that leads to it. Because a larger number is better, it also implies a sequence. Similarly, intelligence is seen by Yudkowsky as a one-dimensional, optimizable quality, compared to, say, Hanson’s view of it being multifarious [14]. Nevertheless this intelligence factor is treated at its limit as a general capability to out-think, manipulate and co-opt power.

Roko’s basilisk is a diagonal argument and it is the convertibility of different elements of the universe by virtue of being both agents and information that enables it.

Table 1. Variations of the diagonal argument.

| Parameter | Cantor | Roko |
| --- | --- | --- |
| Property P | Uncountability | Utilitarian malice |
| Constraint C | Countability | Friendly AI |
| Manifold M | Real numbers | Social superintelligence |
| Universe U | Assumed underlying reality, including numbers and logic, described mathematically | Assumed future reality, with agent simulation, acausal trade, utilitarianism and Bayesian rationality |
| Sx | Number as sequence of binary digits eg 0110… | Capability |
| Sy | Enumeration of possible numbers eg [0110…,0111…,…] | Utility |
| Constraint function fC | Number with digits < (Sx[i], Sy[i]) > eg 100… | Agent which maximizes utility as it increases capability |
| Transformation fT | Negation (bit-flip) of a digit | Acausal trade through simulation |
| Diagonal function fD | Negation (bit-flip) of each digit in fC | Motivating people who bring about the AI’s own existence |
| Contradicting value p | Number described by fD not in Sy | Reader of the Roko’s Basilisk argument, particularly AI engineers, where the AI basilisk torturing them contradicts friendliness |

Both Cantor’s diagonal argument and Roko’s Basilisk provoked violent reactions among experts in their universe of discourse. Leopold Kronecker denounced Cantor as a scientific charlatan, a renegade, and a “corrupter of youth”, and Poincaré said his thought was a “grave disease” in mathematics [15] [16]. Eliezer Yudkowsky deleted Roko’s forum post containing the original Basilisk argument, later referring to it as the “babyfucker” and comparing it to HP Lovecraft’s Necronomicon [17].

The concealed surprise in a diagonal argument – the showing that something was there which it was assumed is not – makes it good at provocation against established verities. The diagonal slash tears open a field with its own premises, letting in the monsters.


Do Basilisks Dream of Simulated Justice?

This diagonal slash can be used elsewhere.

The argument in John Rawls’ A Theory of Justice [18] has two, loosely coupled, parts. The first lays out the original position, a technique for conceiving of a just society. The second argues for a particular idea of distributive justice in society as the result of applying the first technique. Though Rawls has been criticised for designing the original position so it can only result in redistribution as an outcome [19], others have used the original position to argue for other ideas of society and the state, including libertarian ones like Nozick’s [20]. Nozick’s detailed and contemporary criticism of Rawls can also help draw out distinctive parts of Rawls’ theory.

Rawls’ idea of the original position explicitly extends Kant and social contract theorists [21]. He conceives of the original position as a kind of conference, where rational people choose the shape of society without knowledge of their or others’ specific place in it (the veil of ignorance). Rawls advances an idea of distributive justice which is not fully redistributive, and the role of market capitalism in delivering inequality and prosperity is always in the background.

Rawls’ argument is not a diagonal one, but it shares that machinic quality of touring a completed system. Just society is computational: “We may think of the political process as a machine which makes social decisions when the views of representatives and constituents are fed into it” [22]. The book itself is structured in three iterations around the theory, the initial pass a brief prototype, the final the most fleshed out. Rawls emphasises the need to consider the theory as an entire working system; Nozick commends the book as an example of “how beautiful a whole theory can be” [23].

The original position is a construct of nested simulations. First there is the thought experiment Rawls and the reader are performing, imagining people at the social contract conference. Secondly the people at the conference are considering their future society without knowing their place in it, due to the information barrier of the veil of ignorance. So they are forced to enumerate alternative societies and consider their shape – to simulate – in order to choose between them.

These simulations are not the high definition surround-sound worlds described by the simulation argument of Bostrom and taken as a premise by Yudkowsky. However Rawls does pay attention to the resolution of the simulation. To solve certain problems around precedence – that you have to know certain facts about society before you can make decisions about it – Rawls suggests decisions in the original position can be made in a sequence of simulations, each more detailed than the last and taking the outcome of the previous social decision as an input. In these later scenarios it is like a slider has turned up the resolution of reality. Given that simulating other beings to discover whether their societies are acceptable places for us to live is treating them as means instead of ends in themselves, it seems ethically consistent with Rawls to keep the realism of the simulation below a certain threshold. Alternatively, we can extend Rawls’ sequence of simulations and only have the conference give ethical clearance to a simulation once some lower-definition prototype has been accepted.

The configurable quality of Rawls’ original position also allows him to largely avoid problems of common understanding of what objective reality is, either for us or for individuals in the various levels of simulated society. The common reality, or degree of shared understanding of the same, is part of the original position scenario by definition.


Regular Justice

Rawls argues that people in the original position would choose two criteria for justice. The first is equality in basic rights, the second, in that lexical order, is the well-being of the worst-off in society. Rawls names the second the difference principle. Like the principle of utility [24], it’s still a quite mathematical formulation: you can describe it as an alternative utility function. These two rules act as additional axioms which exclude utilitarian extreme cases. Societies where a tiny underclass is treated hideously for everyone else’s benefit are excluded by equality in basic rights. A person or agent with an extraordinary capacity for happiness, such that all society is directed to its pursuit, is excluded by optimising based on the welfare of the worst off. Roko’s basilisk is fenced out with these other utility monsters.
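The contrast between the two "utility functions" can be made concrete with a small sketch. Everything in it is an assumption made for illustration: the `Society` class, the welfare numbers and the boolean rights flag are hypothetical simplifications, not anything Rawls specifies. Python's tuple comparison stands in for the lexical ordering of the two principles.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Society:
    name: str
    equal_basic_rights: bool   # first principle, crudely reduced to a flag
    welfare: List[float]       # one hypothetical welfare number per member


def utilitarian_key(s: Society) -> float:
    """Classical utility: total welfare, however it is distributed."""
    return sum(s.welfare)


def rawlsian_key(s: Society) -> Tuple[bool, float]:
    """Lexical order: equal basic rights first, then the welfare of the worst-off."""
    return (s.equal_basic_rights, min(s.welfare))


societies = [
    Society("tiny_underclass", False, [1, 90, 90, 90]),
    Society("utility_monster", True, [5, 5, 5, 1000]),
    Society("moderate_inequality", True, [40, 50, 60, 70]),
]

utilitarian_choice = max(societies, key=utilitarian_key).name
rawlsian_choice = max(societies, key=rawlsian_key).name
print(utilitarian_choice, rawlsian_choice)
```

With these made-up numbers the sum picks the society organised around the utility monster, while the lexical ordering rejects the underclass society on rights and the monster society on the welfare of the worst-off.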

Nozick uses a similar technique of excluding utilitarian solutions by premise, though in his case the premise is imported wholesale from Locke, that “no one ought to harm another in his life, liberty or possessions” [25].

This exclusion by premise echoes the approach of Zermelo-Fraenkel (ZF) set theory [26], which establishes a “well-founded” space for reasoning about sets, by including axioms which exclude self-defeating objects. The Axiom of Regularity precludes both circular sets and infinite descending chains of sets, while restricting set comprehension to subsets of existing sets keeps the monstrosity of Russell’s paradox out.

Though Yudkowsky has defended arithmetical utility at length and in extreme examples [27], his definition of friendliness in Creating Friendly AI [28] includes an example of fairness which is deeply Rawlsian, of two children sharing a candy bar by splitting it in half. A common way of guaranteeing this is for one child to cut the candy bar and the other to have the first choice of piece. Applying the family “one cuts and one chooses” rule is a microscale use of Rawlsian principles: both the veil of ignorance and maximizing the welfare of the worst off. Yudkowsky’s concern that “Friendly AI Theory” is a more urgent topic of study than the actual meaning of friendliness parallels the decoupling of the original position and the difference principle in Rawls, but since the utility principle is included by Yudkowsky in the first step, non-utilitarian conclusions in the second are impossible.
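The “one cuts and one chooses” rule can be sketched as a tiny game. The sketch assumes, hypothetically, that both children care only about the size of their piece and that the chooser always takes the larger one; the function names and the bar length are invented for the example.

```python
from typing import Tuple


def cut_and_choose(bar_length: float, cut_at: float) -> Tuple[float, float]:
    """Return (cutter_share, chooser_share) when the chooser takes the larger piece."""
    if not 0 <= cut_at <= bar_length:
        raise ValueError("cut must fall within the bar")
    left, right = cut_at, bar_length - cut_at
    return min(left, right), max(left, right)


def best_cut(bar_length: float, steps: int = 100) -> float:
    """Search a grid of cut points for the one that maximizes the cutter's own share."""
    candidates = [bar_length * k / steps for k in range(steps + 1)]
    return max(candidates, key=lambda c: cut_and_choose(bar_length, c)[0])


print(cut_and_choose(10.0, 3.0))
print(best_cut(10.0))
```

Because the cutter ends up with the smaller piece, a self-interested cutter is effectively maximizing the worst-off share, which is why the even split emerges.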

The diagonal argument behind Roko’s basilisk can be modified to accommodate the Rawlsian utility function. The key difference is the definition of “friendliness” used by the superintelligent AI. If we follow Yudkowsky in a utilitarian definition, we get torturing, megalomaniacal utility monsters. If we use a Rawlsian idea of justice, or a Lockean idea of liberty and rights, we do not. All the other assumptions of the scenario can apply, even the heroic ones, including that it’s possible to instil a sense of friendliness or justice or liberty in a superintelligent agent at all. If Yudkowsky’s group were to be wet nurse to an infant superintelligence, the main risk of Roko’s basilisk arising would, in a neatly reflexive fashion, be their own teaching of the utility principle.

Though raising a polite liberal democratic superintelligence may be prophylaxis against torturing megalomaniac superintelligence, that it only excludes this malicious extremity means there is a whole potential family of basilisks across the political spectrum remaining. Though torture has been excluded as a motivation, there are other ways to motivate, even acausally. The main channel for the motivation is a kind of recursive belief in its probability. So you might consider a basilisk who rewards or punishes AI engineers with material wealth, social regard and sexual triumph for helping bring the basilisk into being. The religious parallels become hard to avoid, though these more domesticated basilisk breeds start to resemble a well-meaning Anglican commonwealth or modern prosperity theology more than the austere Calvinist predestination of Roko’s scenario. One can imagine lower-powered basilisks acting like Dickensian ghosts of Christmas future, or haunting you with a sense you need to clean your teeth, as being in hospital with root canal surgery won’t help bring superintelligent singularity any closer. The relentless universality of liberal democratic capitalism has a horror of its own, though it seems arithmetically lesser than a computationally simulated hell dimension customized just for you.


Taking It Personally

One aspect of Rawls which is less suitable for training future superintelligences is his pervasive humanism. For instance, the principles of justice “are the principles that free and rational persons concerned to further their own interests would accept in an initial position of equality as defining the fundamental terms of their association” [29]. Intelligent, let alone superintelligent, pigs, robots or software programs aren’t included. This parallels a persistent criticism of liberal rights generally: that it claims a universal scope but favours an elite minority in practice. This exclusion can be explicit, as with “persons” above, or implicit, where groups within the definition still face reduced rights in practice. The historic constitutional conventions, which are one model for Rawls’ hypothetical convention in the original position, provide ready examples. The English Bill of Rights 1688 asserts “ancient rights and liberties” but also gives the right to bear arms only to Protestants; the American Declaration of Independence declares all men to be free and equal, and prefaces a constitution excluding slaves from voting. Other biases or exclusions are more implicit: the encoding of property rights tends to be most useful to those with more property, for whichever societal structural reason. Matsuda’s feminist critique of Rawls is along these lines: that the rational individuals of Rawls’ original position are too abstracted from concrete social reality to result in justice for women [30].

Implicit exclusions are perhaps the subtler problem, and exacerbated by software intelligences. For instance, should recently cloned AIs be able to vote, or marry, and what provisions should exist against ballot-stuffing self-cloning egomaniacs? These questions rely on the exact physical instantiations of such agents, and their societies, and so are put aside here.

Ending the explicit exclusion of such entities from society is anyway a prerequisite to their participation in a Rawlsian political process. Among the many aspects of personhood that can be explored, one suggestive direction is from French, who contends that Rawls’ definition of persons is ambiguous and excessively anthropocentric [31]. He argues that corporations should be considered to be moral persons and members of society in the Rawlsian and legal senses. The system-mediated process of making particular decisions on information flowing through the corporation is a form of thought deserving moral personhood: “[A] Corporation’s Internal Decision Structure (its CID Structure) is the requisite redescription device that licenses the predication of corporate intentionality”. French doesn’t mention AI, but this view of personhood is similar to the systems view of consciousness articulated in response to Searle’s Chinese Room argument [32] [33].

William Gibson describes corporations as “hives with cybernetic memories” [34]; we too can follow French and accept the algorithms and bureaucratic data structures at the heart of a corporation as qualifying for moral personhood. The algorithms and data structures in a software intelligence qualify on the same grounds, and can be patched into a Rawlsian framework. We already have machines that speak to us, and robot dogs that walk around. Perhaps humans will find it easier to acknowledge personhood in more advanced versions of Siri, and then that will lead them to accept the moral personhood of its corporate creator. We have fenced Roko’s basilisk out of the polity, and brought Apple and Samsung in. How reassuring.


1 Gilles Deleuze and Felix Guattari. Anti-Oedipus, page 372. Penguin Classics, 1972,1977. Hurley, Seem and Lane translation.
2 Guillermo Ferrari and Vanessa Wenjie Chen. Cthulhu virtual pet. Neocreativa, 2016. Game. https://play.google.com/store/apps/details?id=com.Neurocreativa.CthulhuVirtualPet&hl=en.
3 Cantor, G. (1891) Über eine elementare Frage der Mannigfaltigkeitslehre [On an elementary question of the theory of manifolds]. Jahresbericht der Deutschen Mathematiker-Vereinigung, 1, 75-78.
4 Tao, Terence. 3.15 The “no self-defeating object argument”, An Epsilon of Room, I : Real Analysis : Pages From Year Three of a Mathematical Blog. Providence, R.I.: American Mathematical Society, 2010.
5 Gödel, K., 1931, “Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I,” Monatshefte für Mathematik Physik, 38: 173–198. English translation in van Heijenoort 1967, 596–616, and in Gödel 1986, 144–195.
6 Turing, A.M., 1936–7, “On Computable Numbers, with an Application to the Entscheidungsproblem,” Proceedings of the London Mathematical Society, Series 2, 42: 230–265; correction, ibid., 43: 544–546. Republished in Davis 1965, 115–154.
7 Tao, Terence. 1.12 A computational perspective on set theory, Compactness and Contradiction. Providence, R.I: American Mathematical Society, 2013.
8 Simondon, Gilbert, ‘Du mode d’existence des objets techniques’, Paris, Aubier, Editions: Montaigne: 1958. Translation: Mellamphy, N, ‘On The Mode of Existence of Technical Objects’, University of Western Ontario, 1980.
9 Roko’s Basilisk on Rational Wiki http://rationalwiki.org/wiki/Roko’s_basilisk
10 Yudkowsky, Eliezer. Creating Friendly AI 1.0: The Analysis and Design of Benevolent Goal Architectures. The Singularity Institute, San Francisco, CA, 2001.
11 Roko’s Basilisk on Rational Wiki, ibid
12 Bostrom, Nick. “Are We Living In A Computer Simulation?”, Philosophical Quarterly (2003) Vol. 53, No. 211, pp. 243-255.
13 Yudkowsky, Eliezer. “Timeless Decision Theory.”, The Singularity Institute, San Francisco, CA. 2010.
14 Hanson, Robin. “The Hanson-Yudkowsky AI-Foom Debate.” Berkeley, CA: Machine Intelligence Research Institute, 2013.
15 Dauben, Joseph Warren (1979). Georg Cantor His Mathematics and Philosophy of the Infinite. Princeton University Press. pp. introduction.
16 Wikipedia entry for Georg Cantor https://en.wikipedia.org/wiki/Georg_Cantor
17 Reddit thread “LW uncensored thread” https://www.reddit.com/r/LessWrong/comments/17y819/lw_uncensored_thread/
18 Rawls, John. A Theory of Justice, revised edition. Harvard University Press, 1971, 1999. Abbreviated as ATJ in later footnotes.
19 Nozick, Robert. Anarchy, State, and Utopia. Basic Books, 1974.
20 Stick, John. “Turning Rawls into Nozick and Back Again.” Nw. UL Rev. 81 (1986): 363.
21 Particularly Chapter IV section 40 “The Kantian Interpretation of Justice as Fairness” in ATJ ibid
22 Chapter IV Section 31 “The Four Stage Sequence” in ATJ, ibid
23 p183 in Nozick, Robert, Anarchy, state, utopia, ibid.
24 Driver, Julia. “The History of Utilitarianism”, Stanford Encyclopedia of Philosophy, 2014. http://plato.stanford.edu/entries/utilitarianism-history/ Rawls relies most on Sidgwick, The Methods of Ethics, 7th ed. (London, 1907) and Principles of Political Economy (London, 1883).
25 Locke, John – Two Treatises of Government, 2nd ed., ed. Peter Laslett (New York: Cambridge University Press, 1967), Second Treatise, Section 6; p10 in Nozick, Anarchy, state, utopia, ibid.
26 Bagaria, Joan. “Zermelo-Fraenkel Set Theory”, Stanford Encyclopedia of Philosophy, 2014. http://plato.stanford.edu/entries/set-theory/ZF.html
27 Yudkowsky, “Torture vs Dust Specks”, 2007. http://lesswrong.com/lw/kn/torture_vs_dust_specks/
28 5.6.3 p179 Yudkowsky, “Creating Friendly AI”, ibid.
29 p10 “The Main Idea of the Theory of Justice” in ATJ, ibid
30 Matsuda, M.J., 1986. Liberal Jurisprudence and Abstracted Visions of Human Nature: A Feminist Critique of Rawls’ Theory of Justice. NML Rev.,16, p.613.
31 French, P.A., 1979. The corporation as a moral person. American Philosophical Quarterly, pp.207-215.
32 John Searle, ‘Minds, Brains and Programs’, Behavioral and Brain Sciences 3, no. 3 (1980), pp417–457
33 Cole, David, ‘Searle’s Chinese Room Argument’, Stanford Encyclopedia of Philosophy, http://plato.stanford.edu/entries/chinese-room/ . Searle also discusses objections in the original paper.
34 Gibson, William. Neuromancer (1984). New York: Ace, 1995.
