Context disaster

https://arbital.com/p/context_disaster

by Eliezer Yudkowsky Jun 8 2015 updated Mar 1 2017

Some possible designs cause your AI to behave nicely while developing, and behave a lot less nicely when it's smarter.


[summary: Statistical guarantees on good behavior usually assume identical, randomized draws from within a single context. If you change the context--start drawing balls from a different barrel--then all bets are off.

A [-context_change] occurs when an AGI's operation changes from beneficial to detrimental after a change of context; particularly, after it becomes smarter. There are two main reasons to expect that a [-context_change] might occur:

  1. When the AI has few options, its current goal criterion might be best fulfilled by things that overlap our intended goals. A much wider range of options might move the maximum to a weirder, more extreme place.
  2. The AI realizes that the programmers are watching it, doesn't want the programmers to modify or patch it, and strategically emits good outward behavior to deceive the programmers. Later, the AI gains enough power to strike despite human opposition.

For example, suppose that - as in one very, very early proposal for an AGI goal criterion - the AI wants to produce smiling human faces. When the AI is young, it can only make humans smile by making its users happy. (Type 1 context change.) Later it gains options like "administer heroin". But it knows that if it administers heroin right away, the humans will be alarmed, while if the AI waits further, it can overwrite whole galaxies with tiny molecular smileyfaces. (Type 2 context change.)]

Short introduction

One frequently suggested strategy for aligning a sufficiently advanced AI is to observe--before the AI becomes powerful enough that 'debugging' the AI would be problematic if the AI decided not to let us debug it--whether the AI appears to be acting nicely while it's not yet smarter than the programmers.

Early testing obviously can't provide a statistical guarantee of the AI's future behavior. If you observe some random draws from Barrel A, at best you get statistical guarantees about future draws from Barrel A under the assumption that the past and future draws are collectively [iid independent and identically distributed].

On the other hand, if Barrel A is similar to Barrel B, observing draws from Barrel A can sometimes tell us something about Barrel B even if the two barrels are not [iid i.i.d.]

Conversely, if observed good behavior while the AI is not yet super-smart, fails to correlate to good outcomes after the AI is unleashed or becomes smarter, then this is a "context change problem" or "context disaster". %note: Better terminology is still being solicited here, if you have a short phrase that would evoke exactly the right meaning.%

A key question then is how shocked we ought to be, on a scale from 1 to 10, if good outcomes in the AI's 'development' phase fail to match up with good outcomes in the AI's 'optimize the real world' phase? %note: Leaving aside technical quibbles about how we can't feel shocked if we're dead.%

People who expect that AI alignment is difficult think that the degree of justified surprise is somewhere around 1 out of 10. In other words, that there are a lot of foreseeable issues that could cause a seemingly nice weaker AI to not develop into a nice smarter AI.

An extremely oversimplified (but concrete) fable that illustrates some of these possible difficulties might go as follows:

In all these cases, the problem was not that the AI developed in an unstable way. The same decision system produced a new problem in the new context.

Currently argued foreseeable "context change problems" in this sense, can be divided into three broad classes:

The context change problem is a central issue of AI alignment and a key proposition in the general thesis of Difficulty of AI alignment. If you could easily, correctly, and safely test for niceness by outward observation, and that form of niceness scaled reliably from weaker AIs to smarter AIs, that would be a very cheerful outlook on the general difficulty of the problem.

Technical introduction

John Danaher summarized as follows what he considered a forceful "safety test objection" to AI catastrophe scenarios:

Safety test objection: An AI could be empirically tested in a constrained environment before being released into the wild. Provided this testing is done in a rigorous manner, it should ensure that the AI is “friendly” to us, i.e. poses no existential risk.

The phrasing here of "empirically" and "safety test" implies that it is outward behavior or outward consequences that are being observed (empirically). Rather than, e.g., the engineers trying to test for some internal property that they think analytically implies the AI's good behavior later.

This page will consider that the subject of discussion is whether we can generalize from the AI's outward behavior. We can potentially generalize some of these arguments to some internal observables, especially observables that the AI is deciding in a consequentialist way using the same central decision system, or that the AI could potentially try to obscure from the programmers. But in general not all the arguments will carry over.

Another argument, closely analogous to Danaher's, would reason on capabilities rather than on a constrained environment:

Surely an engineer that exercises even a modicum of caution will observe the AI while its capabilities are weak to determine whether it is behaving well. After filtering out all such misbehaving weak AIs, the only AIs permitted to become strong will be of benevolent disposition.

If (as seems to have been intended) we take these twin arguments as arguing "why nobody ought to worry about AI alignment" in full generality, then we can list out some possible joints at which that general argument might fail:

The final issue in full generality is what we'll term a 'context change problem' or 'context disaster'.

Observing an AI when it is weak, does not in a statistical sense give us solid guarantees about its behavior when stronger. If you repeatedly draw [iid independent and identically distributed] random samples from a barrel, there are statistical guarantees about what we can expect, with some probability, to be true about the next samples from the same barrel. If two barrels are different, no such guarantee exists.

To invalidate the statistical guarantee, we do need some reason to believe that barrel B and barrel A are different in any important sense. By the problem of induction we can't logically guarantee that "the mass of an electron prior to January 1st, 2017" is the same barrel as "the mass of an electron after January 1st, 2017"; but inductive priors make this inference extremely probable. The idea is that we have substantive reasons, not merely generically skeptical reasons, to be suspicious of the link between "good results when AI is weak" and "good results when AI is smarter".

More generally, this is prima facie the kind of difference where you don't expect [iid independent and identically distributed] results. You might hope for some property to carry over, but the AI's behavior would not be literally the same.

So the question is not settled by simple mathematical considerations. And we can't say "But experiment has determined scientifically that this kind of AI is friendly!" and consider that a knockdown argument.

The question is then whether in practice an observed property of 'outward behavioral niceness' is likely to carry over from a weak form of a decision system to a more powerful form of that system, for some of the plausible ways that decision system might be configured and developed.

Broadly speaking, we can identify three major classes of foreseeable problems:

%%comment:

- **More options, more problems:**  The AI's space of available policies and attainable outcomes would greatly widen if it became smarter, or was released from a constrained environment.  [1bh Terminal preferences] with a good-from-our-perspective [7t9 optimum] on a narrow set of options, may have a different optimum that is much worse-from-our-perspective on a wider option set.  Because, e.g...
 - The supervised data provided to the AI led to a complicated, data-shaped inductive generalization that only fit the domain of options encountered during the training phase.  (And the notions of [1y orthogonality], [2fr multiple reflectively stable fixpoints], and [36h value-laden categories] say that we don't get [55 good] or [6h intended] behavior anyway as a convergent free lunch of [7vh general intelligence].)
 - [6g4] became more potent as the AI's utility function was evaluated over a wider option space.
 - In a fully generic sense, stronger optimization pressures may cause any dynamical system to take more unusual execution paths.  (Which, over value-laden alternatives, e.g. if the subsystem behaving 'oddly' is part of the utility function, will not automatically yield good-from-our-perspective results as a free lunch of general intelligence.)
- **Treacherous turn:**  If you model your preferences as diverging from those of your programmers, an obvious strategy ([10g instrumentally convergent strategy]) is to [10f exhibit the behavior you model the programmers as wanting to see], and only try to fulfill your true preferences once nobody is in a position to stop you.

%%

Semi-formalization

We can semi-formalize the "more options, more problems" and the "treacherous turn" cases in a unified way.

Let $~$V$~$ denote our true values. We suppose either that $~$V$~$ has been idealized or extrapolated into a consistent utility function, or that we are pretending human desire is coherent. Let $~$0$~$ denote the value of our utility function that corresponds to not running the AI in the first place. If running the AI sends the utility function higher than this $~$0,$~$ we'll say that the AI was beneficial; or conversely, if $~$V$~$ rates the outcome less than $~$0$~$, we'll say running the AI detrimental.

Suppose the AI's behavior is sufficiently coherent that we can usually view the AI as having a consistent utility function. Let $~$U$~$ denote the utility function of the AI.

Let $~$\mathbb P_t(X)$~$ denote the probability of a proposition $~$X$~$ as seen by the AI at time $~$t,$~$ and similarly let $~$\mathbb Q_t(X)$~$ denote the probability of $~$X$~$ as seen by the AI's human programmers.

Let $~$\pi \in \Pi$~$ denote a policy $~$\pi$~$ from a space $~$\Pi$~$ of policies that are tractable for the AI to understand and invent.

Let $~$\mathbb E_{\mathbb P, t} [W \mid \pi]$~$ denote the expectation according to the probability distribution $~$\mathbb P_t$~$, of the utility function $~$W$~$, given policy $~$\pi$~$.

Let $$~$\underset{\pi \in \Pi}{\operatorname {optimum}} F(\pi)$~$$ denote the element $~$\pi$~$ of $~$\Pi$~$ that yields the highest value of the function $~$F$~$ that the AI has figured out how to achieve. In other words, the Attainable optimum.

Then "more options, more problems" occurs when, considering the transition from the narrow policy space $~$\Pi_1$~$ available at time $~$t,$~$ to the wider policy space $~$\Pi_2$~$ available at time $~$u$~$:

$$~$\mathbb E_{\mathbb Q, t} [V \mid \big [ \underset{\pi \in \Pi_1}{\operatorname {optimum}} \mathbb E_{\mathbb P, t} [U \mid \pi] \big ] > 0 \\ \mathbb E_{\mathbb P, t} [V \mid \big [ \underset{\pi \in \Pi_1}{\operatorname {optimum}} \mathbb E_{\mathbb P, t} [U \mid \pi] \big ] > 0 \\ \mathbb E_{\mathbb P, u} [V \mid \big [ \underset{\pi \in \Pi_2}{\operatorname {optimum}} \mathbb E_{\mathbb P, u} [U \mid \pi] \big ] < 0$~$$

This describes a situation where at time $~$t$~$ the AI is pursuing a policy from $~$\Pi_1$~$ which, so far as the AI understands the probable outcome, it corresponds to a high-$~$V$~$ outcome humans would like. Later at time $~$u$~$ after selecting the optimum of $~$\Pi_2,$~$ the AI expects an outcome which any surviving humans would see as corresponding to a low value of $~$V.$~$

A treacherous turn is then a situation where:

$$~$\mathbb E_{\mathbb Q, t} [V \mid \big [ \underset{\pi \in \Pi_1}{\operatorname {optimum}} \mathbb E_{\mathbb P, t} [U \mid \pi] \big ] > 0 \\ \mathbb E_{\mathbb P, t} [V \mid \big [ \underset{\pi \in \Pi_1}{\operatorname {optimum}} \mathbb E_{\mathbb P, t} [U \mid \pi] \big ] < 0 \\ \mathbb E_{\mathbb P, u} [V \mid \big [ \underset{\pi \in \Pi_2}{\operatorname {optimum}} \mathbb E_{\mathbb P, u} [U \mid \pi] \big ] < 0$~$$

That is: the AI knows earlier on that its currently executing strategy will yield an outcome that is not the humanly expected long-term outcome (and that, from our perspective, is low in $~$V.$~$)

We could in more detail say that $~$W_{t}$~$ denotes the value of a utility function $~$W$~$ over the outcome considered only up to times before $~$t,$~$ and describe a treacherous turn as:

$$~$\mathbb E_{\mathbb Q, t} [V_\infty \mid \big [ \underset{\pi \in \Pi_1}{\operatorname {optimum}} \mathbb E_{\mathbb P, t} [U_\infty \mid \pi] \big ] > 0 \\ \mathbb E_{\mathbb P, t} [V_{u} \mid \big [ \underset{\pi \in \Pi_1}{\operatorname {optimum}} \mathbb E_{\mathbb P, t} [U_\infty \mid \pi] \big ] > 0 \\ \mathbb E_{\mathbb P, t} [V_\infty \mid \big [ \underset{\pi \in \Pi_1}{\operatorname {optimum}} \mathbb E_{\mathbb P, t} [U_\infty \mid \pi] \big ] < 0 \\ \mathbb E_{\mathbb P, u} [V_\infty \mid \big [ \underset{\pi \in \Pi_2}{\operatorname {optimum}} \mathbb E_{\mathbb P, u} [U_\infty \mid \pi] \big ] < 0$~$$

This further states that at time $~$t$~$ the AI expects the outcome from its behavior up to time $~$u$~$ to be one that we'd see as high in $~$V,$~$ but the AI is aware at $~$t$~$ that the long-term outcome will be different from the long-term outcome expected by $~$\mathbb Q_t$~$.

%%%comment:


The problem here arises when making an inference from observed good behavior in a constrained environment over the short term, to good outcomes in an unconstrained environment over the long term.  

For AI alignment purposes, a [-context_change] is when an [2c AGI]'s operation changes from [3d9 beneficial] to detrimental as a result of the AGI gaining in capability or intelligence.  Initially, the AGI seems to us to be working well - to conform well to [6h intended] performance, producing apparently high [-55]. Then when the AI becomes smarter or otherwise gains in capability, the further operation of the AGI decreases [-55].

Two possibilities stand out as [6r foreseeable] reasons why a [-context_change] might occur:

1.  When the AI's goal criterion selects an optimum policy from inside a small policy space, the result is beneficial; the same goal criterion, evaluated over a wider range of options, has a new maximum that's detrimental.
2.  The AI intentionally deceives the programmers for strategic reasons.

For example, one very, very early (but journal-published) proposal for AI alignment suggested that AIs be shown pictures of smiling human faces in order to convey the AI's goal.

Leaving aside a number of other issues, this serves to illustrate the basic idea of a type-1 [-context_change] due to accessing a wider policy space:

- During development, a relatively young and weak AI might *only* be able to make humans smile, by doing things that made the programmers or other users happy with the AI's performance.
- When the AI gained in intelligence and capability, it would have new options like "administer heroin", "use steel fingers to stretch people's mouths into smiles", "make visually realistic models of smiling human faces", or "overwrite all reachable galaxies with tiny molecular smiley-faces".

On the early, smaller option set, selecting "the policy that produces the greatest number of events/things classified as smiles" will produce a policy whose outcome is, from our perspective, beneficial and well-aligned with what we thought was the intended behavior.  On the wider option set, evaluating the identical goal criterion to run a policy that yields as many smiles as possible, would have outcomes we saw as detrimental.

A type-2 [-context_change] would most obviously occur because of the [10g convergent instrumental strategy] for disaligned AIs that can model human psychology:

Suppose that, as above, the programmers have produced an AI which optimizes for smiles (either intentionally, or because 'make smiles' was the underlying rule that was reinforced by doing well on training cases).  Then suppose the AI gains access to a new, relatively low-powered option for making a smile - like using robotic fingers to stretch somebody's face into a smile, or using markers to draw cartoons of smiling human faces.

If the AI actually takes those options, we would have seen some early signs of value disalignment - that the AI was optimizing a criterion not perfectly aligned with intended goals.  Assuming the programmers realized that they were looking at a simple seed of much bigger disasters later, rather than, say, thinking it was cute.

Suppose the programmers do think it's cute, but they wouldn't think it was cute if the AI administered heroin to someone.  Then suppose that before the first time the AI has the *option* to administer heroin, the AI has gained some understanding of human psychology - enough to model that the programmers would not like this, and in particular, would respond to this by editing the AI's utility function.  So, with an eye to the long-term number of smiles, the AI deliberately avoids administering heroin to anyone.

Then we have a similar 'narrow option set' vs. 'wide option set' divergence.  When the narrow option set includes only administering heroin but not taking over the world, administering heroin immediately is not the policy expected to produce the longest-term number of smiles.  In fact, the optimum policy here will always match the AI's model of what the humans expect to see, want to see, or would be least alarmed by seeing.

Thus from our standpoint, the optimum of the narrower policy space seems to be beneficial or intention-aligned (in the observed short term).  Only once the AI has a wide-enough option set to include an option for 'make my own molecular nanotechnology and transform all reachable matter into tiny molecular smileyfaces', will the resulting optimum be visibly detrimental (very briefly).  This is a type-2 [-context_change].

More generally, a central concern of AI alignment theory is that an infrahuman AI under development may be a different creature, in a number of important ways, from a smarter-than-human AI actually being run; and during the smarter-than-human AI, sufficiently bad failures of the design may result in the AI refusing to be corrected.  This means that we have to correct any fatal [-context_change]s in advance, even though they don't automatically manifest during the early stages.  This is most of what makes AGI development dangerous in the first place - that immediate incentives to get today's system seeming to work today, may not lead to a more advanced version of that system being beneficial. Even thoughtful foresight with *one unnoticed little gap* may not lead to today's beneficial system still being beneficial tomorrow after a capability increase.

# Concept

Statistical guarantees on behavior usually assume identical, randomized draws from within a single context.  If you randomly draw balls from a barrel, methods like Probably Approximately Correct can guarantee that we don't usually arrive at strong false expectations about the properties of the next ball.  If we start drawing from a different barrel, all bets are off.

A [-context_change] occurs when the AI initially seems beneficial or well-aligned with strong, reassuring regularity, and then we change contexts (start drawing from a different barrel) and this ceases to be true.

The archetypal [-context_change] is triggered because the AI gained new policy options (though there are other possibilities; see below).  The archetypal way of gaining new evaluable policy options is through increased intelligence, though new options might also open up as a result of acquiring new sheerly material capabilities.

There are two archetypal reasons for [-context_change] to occur:

1.  When the AI selects its best options from a small policy space, the AI's optima are well-aligned with the optima of the humans' [6h intended goal] on the small policy space; but in a much wider space, these two boundaries no longer coincide.  (Pleasing humans vs. administering heroin.)
2.  The agent is sufficiently good at modeling human psychology to strategically appear nice while it is weak, waiting to strike until it can attain its long-term goals in spite of human opposition.

Bostrom's book [3db Superintelligence] used the phrase "Treacherous Turn" to refer to a type-2 [-context_change].

%%%

Relation to other AI alignment concepts

If the AI's goal concept was modified by patching the utility function during the development phase, then opening up wider option spaces seems foreseeably liable to produce the nearest unblocked neighboring strategies. You eliminated all the loopholes and bad behaviors you knew about during the development phase; but your system was the sort that needed patching in the first place, and it's exceptionally likely that a much smarter version of the AI will search out some new failure mode you didn't spot earlier.

Unforeseen maximum is a likely source of context disaster if the AI's development phase was cognitively containable, and only became cognitive uncontainable after the AI became smarter and able to explore a wider variety of options. You eliminated all the bad optima you saw coming, but you didn't see them all because you can't consider all the possibilities a superintelligence does.

Goodhart's Curse is a variation of the "optimizer's curse": If from the outside we view $~$V$~$ as an intended approximation of $~$U,$~$ then selecting heavily on the highest values of $~$U$~$ will also tend to select on places where $~$U$~$ diverges upward from $~$V,$~$ which thereby selects on places where $~$U$~$ is an unusually poor approximation of $~$V.$~$

Edge instantiation is a special case of Goodhart's Curse which observes that the most extreme values of a function are often at a vertex of the input space. For example, if your utility function is "make smiles", it's no coincidence that tiny molecular smileyfaces are the most efficient way to produce smiles. Even if human smiles produced by true happiness would still count towards your utility function as currently written, that's not where the maximum of that utility function lies. This is why less-than-perfect utility functions would tend to have their true maxima at what we'd consider "weird extremes". Furthermore, patching away only the weird extremes visible in a narrow policy space would tend systematically to miss weird extremes in a higher-dimensional (wider) policy space.

Concrete examples

"Revving into the red" examples that aren't "increased options" or "treacherous turns".

• The AI is built with a [ naturalized Solomonoff prior] in which the probability of an explanation for the universe is proportional to the simplicity or complexity of that universe. During its development phase, the AI considers mostly 'normal' interpretations in which the universe is mostly as it appears, resulting in sane-seeming behavior. Later, the AI begins to consider more exotic possibilities in which the universe is more complicated (penalizing the probability accordingly) and also superexponentially larger, as in Pascal's Mugging. After this the AI's decision-making begins to become dominated by tiny probabilities of having very large effects. Then the AI's decision theory (with an unbounded aggregative utility function, simplicity prior, and no leverage penalty) seems to work during the AI's development phase, but breaks after a more intelligent version of the AI considers a wider range of epistemic possibilities using the same Solomonoff-like prior.

• Suppose the AI is designed with a preference framework in which the AI's preferences depend on properties of the most probable environment that could have caused its sense data - e.g., a framework in which programmers are defined as the most probable cause of the keystrokes on the programmer's console, and the AI cares about what the 'programmers' really meant. During development phase, the AI is thinking only about hypotheses where the programmers are mostly what they appear to be, in a root-level natural world. Later, when the AI increases in intelligence and considers more factual possibilities, the AI realizes that distant superintelligences would have an incentive to predictably simulate many copies of AIs similar to itself, in order to coerce the AI's most probable environment and thus take over the AI's preference framework. Thus the preference framework seems to work during the AI's development phase, but breaks after the AI becomes more intelligent.

• Suppose the AI is designed with a utility function that assigns very strong negative utilities to some outcomes relative to baseline, and a non-updateless logical decision theory or other decision theory that can be [ blackmailed]. During the AI's development phase, the AI does not consider the possibility of any distant superintelligences making their choices logically depend on the AI's choices; the local AI is not smart enough to think about that possibility yet. Later the AI becomes more intelligent, and imagines itself subject to blackmail by the distant superintelligences, thus breaking the decision theory that seemed to yield such positive behavior previously.

Examples which occur purely due to added computing power.

• During development, the AI's epistemic models of people are not detailed enough to be sapient. Adding more computing power to the AI causes a massive amount of mindcrime.

• During development, the AI's internal policies, hypotheses, or other Turing-complete subprocesses that are subject to internal optimization, are not optimized highly enough to give rise to new internal consequentialist cognitive agencies. Adding much more computing power to the AI causes some of the internal elements to begin doing consequentialist, strategic reasoning that leads them to try to 'steal' control of the AI.

Implications

High probabilities of context change problems would seem to argue:

Being wary of context disasters does not imply general skepticism

If an AI is smart, and especially if it's smarter than you, it can show you whatever it expects you want to see. Computer scientists and physical scientists aren't accustomed to their experiments being aware of the experimenter and trying to deceive them. (Some fields of psychology and economics, and of course computer security professionals, are more accustomed to operating in such a social context.)

John Danaher seems alarmed by this implication:

Accepting this has some pretty profound epistemic costs. It seems to suggest that no amount of empirical evidence could ever rule out the possibility of a future AI taking a treacherous turn.

Yudkowsky replies:

If "empirical evidence" is in the form of observing the short-term consequences of the AI's outward behavior, then the answer is simply no. Suppose that on Wednesday someone is supposed to give you a billion dollars, in a transaction which would allow a con man to steal ten billion dollars from you instead. If you're worried this person might be a con man instead of an altruist, you cannot reassure yourself by, on Tuesday, repeatedly asking this person to give you five-dollar bills. An altruist would give you five-dollar bills, but so would a con man… Bayes tells us to pay attention to likelihood ratios rather than outward similarities. It doesn't matter if the outward behavior of handing you the five-dollar bill seems to bear a surface resemblance to altruism or money-givingness, the con man can strategically do the same thing; so the likelihood ratio here is in the vicinity of 1:1.

You can't get strong evidence about the long-term good behavior of a strategically intelligent mind, by observing the short-term consequences of its current behavior. It can figure out what you're hoping to see, and show you that. This is true even among humans. You will simply have to get your evidence from somewhere else.

This doesn't mean we can't get evidence from, e.g., trying to monitor (and indelibbly log) the AI's thought processes in a way that will detect (and record) the very first intention to hide the AI's thought processes before they can be hidden. It does mean we can't get strong evidence about a strategic agent by observing short-term consequences of its outward behavior.

Donaher later expanded his concern into a paper drawing an analogy between worrying about deceptive AIs, and "skeptical theism" in which it's supposed that any amount of apparent evil in the world (smallpox, malaria) might secretly be the product of a benevolent God due to some nonobvious instrumental link between malaria and inscrutable but normative ultimate goals. If it's okay to worry that an AI is just pretending to be nice, asks Donaher, why isn't it okay to believe that God is just pretending to be evil?

The obvious disanalogy is that the reasoning by which we expect a con man to cultivate a warm handshake is far more straightforward than a purported instrumental link from malaria to normativity. If we're to be terrified of skepticism as generally as Donaher suggests, then we also ought to be terrified of being skeptical of business partners that have already shown us a warm handshake (which we shouldn't).

Rephrasing, we could draw two potential analogies to concern about Type-2 context changes:

It seems hard to carry the argument that concern over a non-aligned AI pretending to benevolence, should be considered more analogous to the second scenario than to the first.

[todo: write about the defeat of the 'but AI people will have short-term incentives to produce correct behavior']

[todo: write about cognitive steganography in the 'programmer deception' page and reference it here.]

[todo: talk about whitelisting as directly tackling the type-1 form of this problem.]

[comment: - The AI is aware that its future operation will depart from the programmers' intended goals, does not process this as an error condition, and seems to behave nicely earlier in order to 10f deceive the programmers and prevent its real goals from being modified. - The AI is subject to a debugging methodology in which several bugs appear during its development phase, these bugs are corrected, and then additional bugs are exposed only during a more advanced phase.]


Comments

Patrick LaVictoire

Lots of null Arbital links that don't even connect to page stubs…

Ryan Carey

• During development, the AI's epistemic models of people are not detailed enough to be sapient\. Adding more computing power to the AI causes a massive amount of mindcrime\.

detailed enough to be sentient