Coherent Extrapolated Volition


Postby Brian Tomasik on 2011-06-14T14:36:00

There's an excellent blog called Philosophy And The Future (and me) by our very own Gedusa with lots of fascinating utilitarian-related posts. One is called "Ethics in Practice: Utilitarianism," and in the comments section, I began a discussion with Gedusa on Coherent Extrapolated Volition. The discussion may continue a bit longer, and I wanted to engage others in case they're interested as well, so below, I've continued my reply where the discussion on the blog left off.

--------
Gedusa wrote:So, no I don’t think that humans are unified at a base level, I just think all humans are pretty much the same once you get down to the level of implicit values
Got it. Yeah, I think that's mostly correct. Still, I think the fine-grained differences could still lead to diametric results: Consider, for instance, the difference between the negative-utilitarian volition (caring only about reducing suffering) and the "panbiotic" volition (caring only about promoting the spread of life). I know different people who subscribe to each of these philosophies.

CEV itself also remains completely underspecified even given a single set of volitions (say, a single person). How do you weight the conflicting impulses? Which get to take control over others? There are thousands of ways these conflicts could be resolved, and exactly which is chosen depends on the whims and imagination of the seed programmers.
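
To make the arbitrariness concrete, here is a minimal sketch (Python, with made-up impulses and numbers purely for illustration, not anything from CEV itself) of how two equally natural conflict-resolution rules can pick opposite outcomes from the same set of volitions:

# Illustrative only: made-up scores showing how two different
# conflict-resolution rules, applied to the same conflicting volitions,
# can crown opposite winners.

utilities = {
    "negative_utilitarian": {"reduce_suffering": 10, "spread_life": -10},
    "panbiotic_1":          {"reduce_suffering": 1,  "spread_life": 2},
    "panbiotic_2":          {"reduce_suffering": 1,  "spread_life": 2},
}
outcomes = ["reduce_suffering", "spread_life"]

# Rule 1: add up everyone's scores and pick the largest total (intensity counts).
sum_winner = max(outcomes, key=lambda o: sum(u[o] for u in utilities.values()))

# Rule 2: each volition casts one vote for its favourite outcome (intensity ignored).
votes = [max(outcomes, key=u.get) for u in utilities.values()]
vote_winner = max(set(votes), key=votes.count)

print(sum_winner)   # -> reduce_suffering
print(vote_winner)  # -> spread_life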

Gedusa wrote:but I can see that different nation-states or interest groups (or species) might receive different proportions of pie depending on their pre-singularity influence (even if implicit values were implemented I can see different results depending on who gets extrapolated).
Cool.

Gedusa wrote:I was working with the I.J. Good version of the Singularity as “intelligence explosion”. With that definition your scenario doesn’t count as a singularity, but under others it might: it’s probably irrelevant anyway.
Fair enough. I'll buy that scenarios in which humans don't advance to a Type II or III civilization can probably be left out of the calculation.

Gedusa wrote:“There are *lots* of implicit values other than the CEV of humanity…” I don’t understand this, sorry. Can you elaborate? From what I can understand it seems like an important point.
All I meant was that it seems like there are lots of AI optimization targets besides "maximize paperclips (etc.)" and "advance human CEV." If human CEV is a so-called "implicit value" that's more nuanced than paperclip maximization, then there are tons of other "implicit values" that also wouldn't lead to paperclipping. One example could be to “Promote what biological life would become if it were allowed to flourish to its fullest extent.” That's a (vague) implicit value different from CEV that an AI could optimize. Is it too vague to be well-defined? Maybe, but I think CEV is just as vague. So if CEV counts as a non-paperclipping optimization target, then this should as well.

The overall point was just to challenge the dichotomy between "paperclip maximization" and "CEV," as though those were the only two possibilities for an intelligence explosion.

--------
Some concluding thoughts:

1. Since I lean toward negative utilitarianism, the payoff table for me looks something like this:
a. Ordinary human extinction: value ~= 0.
b. Human extinction by paperclipping: value ~= 0 (subject to further thought on the matter).
c. Human survival: value could be negative, because there are humans who want to spread life, create human-like minds (more prone to suffering than paperclips are!), and maybe cause suffering to one another during power struggles, out of religious motivation (imagine if fundamentalist Christians/Muslims got hold of simulation resources :? ), or for fun (see, e.g., "torturing sims," or more real-life examples).

I put "subject to further thought on the matter" next to point (b) because it may be that paperclippers would cause suffering as well. For example, paperclippers might want to create lab universes because those universes will contain more paperclips (and more paperclip-maximizing AIs). Those universes would also contain infinitely many suffering wild animals.

2. Another reason that it makes sense for me to focus on wild animals is the question of leverage. Suppose it is the case that creating friendly AI is far more important than ensuring future concern for wild animals. Even if so, there are *lots* of people (comparatively speaking) working on friendly AI and existential risk, whereas practically no one else is (explicitly) focusing on the implications of humanity's survival for wild animals.

The phrase "preventing existential risk" can have lots of meanings. It may refer to reducing the chance of human extinction (e.g., by asteroids or nuclear war). However, it can also mean "shaping humanity's future trajectory in such a way that bad values don't take over." Preventing paperclippers is (to most people) one example of how to do this. To me, preventing life-spreading values is another way. So you could call "raising concern for wild animals" one of the fronts in the battle against existential risk.

I guess the main question is, What specific projects do you think would be better to work on? Preventing nuclear war? Preventing paperclippers? Lobbying for use of CEV by whoever develops the first AGI?

Cheers!
Alan

Re: Coherent Extrapolated Volition

Postby Arepo on 2011-06-14T17:39:00

Only scanned Eliezer's article, but it seems like he's trying to reinvent the wheel here. He's describing what sounds to me like a slightly worse-defined version of standard preference utilitarianism of the form 'maximise the preferences a fully informed agent would have', which seems far less well defined and probably less desirable than hedonistic util.

His approach seems to be predicated on the suspicion that we know what a logically perfect, straightforwardly benevolent AI would do and that, though we don't have a better answer, we can feel that it's wrong. I can't respect that way of thinking, and it doesn't seem likely to lead to anything productive. Should a utilitronium shockwave happen to be the best way of maximising happiness (not clear to me), then let's get over ourselves and let/make it happen.

I lean toward negative utilitarianism


I didn't know this.

It seems to me like almost anyone would agree either a) there's a natural exchange rate (as yet undiscerned) of positive for negative utility with greater than infinitesimal weighting for the latter, or b) such an exchange rate is fundamentally arbitrary but that we would always pick greater than infinitesimal weighting for positive utility.
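
(A hedged way to make a) and b) concrete, purely as illustrative notation of my own: write total value as U = (sum of happiness) − k × (sum of suffering). On view a) there is some true, finite k > 0 waiting to be discerned; on view b) k is an arbitrary choice, but one we would never push to infinity, so happiness always keeps more than infinitesimal weight. Strict negative utilitarianism is then the limiting case where k goes to infinity.)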

So while I'm content to rate actual suffering as quite possibly much higher than actual happiness, treating it as inevitably so seems like a form of scope insensitivity. With that interest declared:

Preventing nuclear war?


There are all sorts of war that could wipe us out - nuclear, biological, grey goo, Cyberdyne systems, hurling asteroids at each other etc. I'm not sure if it's more sensible to focus on the most currently plausible, which would presumably be nuclear, or find a root cause of them all, which shouldn't be too hard, and remove it. Competition over resources seems like a key issue, hence my concern with peak everything (I recently learned that peak fertiliser needs to go on the list too) and climate change, another large source of diminished resources. Utilitarians often seem unconcerned by either CC or peaks, for reasons that have never sounded at all convincing to me, when most of the scientific community agrees they'll have profound, unpleasant and unpredictable effects. I don't know of any very cost effective way of individuals addressing them though - it's one of the few areas in which I agree with the mantra (otherwise heavily overused IMO) of 'more research needed'. Givewell and GWWC both seem uninterested in the issues, so funding such research seems like a good candidate.

Preventing paperclippers?


For this to be worthwhile, various things have to be true:

a) we need to be relatively close to the development of AI, or we need so much research into how to avoid this that, even though AI is still decades or even centuries away, we need to start now.
b) predictions that "Aspergers AI" is possible need to be true.
c) paperclips have to be worse than the alternative likely futures.
d) the paperclipper has to be unlikely to accidentally stumble on emotion while it’s improving itself and thus break its programming (even if it doesn’t decide to).

Re a), given that we still don’t know what intelligence even is, I’m inclined to treat the predictions of optimistic AI researchers with a very large grain of salt, at least until they find a way to submit their predictions as a testable hypothesis.

Re b) I’m more sceptical than most. I cannot envisage conscious intelligence as something completely oblivious to emotion. Some (not all) theories of the historical evolution of intelligence (eg the social brain hypothesis) seem like they support this – if intelligence is or entails the ability to learn how to cope with social interaction, then anything capable of manipulating us towards extinction/extreme suffering would, by definition, be aware that we don’t want that, and what that means.

This argument applies more strongly to an initially boxed AI, which would actually have to interact with people better than people do, rather than merely discovering a way to manipulate technology to our detriment.

Re c), this basically depends on whether you think expected utility into the future is positive. Finding ways to create self-replicating hedon machines could have a strong effect on this (and such a ‘machine’ might be as simple as a smile, if your smile has a hedonic reproduction number of >1). Such viral reproduction of happiness also has the benefit of reducing the risk of warfare.

Re d) my response is much the same as to b), though here even if I’m wrong the AI has a near-infinite number of second chances as long as it’s continually looking for ways to improve itself – it only needs to stumble across the sensation of happiness once in order to break its programming and decide to maximise that instead.

Lobbying for use of CEV by whoever develops the first AGI?

This seems to have all the problems of preventing paperclippers, plus the uncertainty of whether CEV is actually a desirable and useful goal.
"These were my only good shoes."
"You ought to have put on an old pair, if you wished to go a-diving," said Professor Graham, who had not studied moral philosophy in vain.

Re: Coherent Extrapolated Volition

Postby Brian Tomasik on 2011-06-15T09:47:00

Thanks for the comments, Arepo!

I prefer hedonistic utilitarianism over CEV, just as I prefer it over preference utilitarianism. I especially don't like the fact that CEV is seeded only from humans (not other animals).

On negative utilitarianism, my sentiments are not terribly coherent (sic). ;) I waffle between (a) "true negative utilitarianism" and (b) "ordinary total utilitarianism with an exchange rate giving lots of weight to suffering." I think (b) is much more sound from a theoretical perspective, but when you pin me down on exactly what the exchange rate is and what that implies for policies I should accept, I sometimes change my mind.

Arepo wrote:Re b) I’m more sceptical than most. I cannot envisage conscious intelligence as something completely oblivious to emotion.

I think "Aspergers AI" is the wrong term. A superintelligent AI might very well be able to compute what other minds would feel, but this is far from implying that it would care about it. Analogously, humans can compute how long it takes a ball to drop from a tower, but that doesn't mean we have moral concern associated with the law of gravity.

A paperclipper is just an AI whose objective is to maximize paperclips. It might still be very smart about the motivations of the primates with which it interacts and from which it was created.

Arepo wrote:it only needs to stumble across the sensation of happiness once in order to break its programming and decide to maximise that instead.

This relates to my previous point. Why would an AI not built to care about happiness decide to maximize it once it stumbled upon it? There are lots of sophisticated computational operations that humans can discover without trying to maximize them (e.g., the steps my computer does to compute the cosine function).

Re: Coherent Extrapolated Volition

Postby Arepo on 2011-06-15T10:45:00

Alan Dawrst wrote: I think (b) is much more sound from a theoretical perspective, but when you pin me down on exactly what the exchange rate is and what that implies for policies I should accept, I sometimes change my mind.


Seems sensible enough to me. I lean towards the belief that there's ultimately some kind of at least partially empirical answer (as I do with the supposed hard problem), which we don't possess yet. Meanwhile we can make educated guesses, though.

Alan wrote:
Arepo wrote:Re b) I’m more sceptical than most. I cannot envisage conscious intelligence as something completely oblivious to emotion.

I think "Aspergers AI" is the wrong term. A superintelligent AI might very well be able to compute what other minds would feel, but this is far from implying that it would care about it. Analogously, humans can compute how long it takes a ball to drop from a tower, but that doesn't mean we have moral concern associated with the law of gravity.


I'm not denying that this is possible; I'm just denying that it's clearly true (to me it seems unlikely). If it were easier to simulate the effect of emotions without experiencing them than with, it seems like we might have evolved as p-zombies. The fact that we didn't seems like evidence that the easiest way to simulate emotion is to experience it.

This relates to my previous point. Why would an AI not built to care about happiness decide to maximize it once it stumbled upon it? There are lots of sophisticated computational operations that humans can discover without trying to maximize them (e.g., the steps my computer does to compute the cosine function).


For the same reason that we have sex using condoms. Happiness is a game-changer. Once you get it, more or less by definition, you want more of it. If vestiges of your programming remain to make the ways in which you go about getting it inefficient, that doesn't mean your priorities haven't shifted.
"These were my only good shoes."
"You ought to have put on an old pair, if you wished to go a-diving," said Professor Graham, who had not studied moral philosophy in vain.

Re: Coherent Extrapolated Volition

Postby Brian Tomasik on 2011-06-15T11:16:00

Arepo wrote:Happiness is a game-changer. Once you get it, more or less by definition, you want more of it.

Hmm, then I think we mean two different things by "happiness." Would you say a paperclip maximizer is "happier" when it produces more paperclips? Or suppose I build a program that tries to find the x that maximizes the function y(x) = -x^2 by trying random values on the range [-1,1]. Is "happiness" to that program finding values of x closer and closer to 0?

I think these shouldn't count as happiness -- I certainly don't care about increasing them! -- but they're clearly what the agents in question are trying to maximize.
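
For concreteness, here is a minimal sketch of that toy optimizer (Python, illustrative only):

# Random search for the x in [-1, 1] that maximizes y(x) = -x^2.
import random

def y(x):
    return -x ** 2

best_x = random.uniform(-1, 1)
for _ in range(10000):
    x = random.uniform(-1, 1)      # try a random candidate in [-1, 1]
    if y(x) > y(best_x):           # keep the best candidate found so far
        best_x = x

print(best_x)  # drifts toward 0, the maximizer -- yet nothing here is plausibly "happy"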

Re: Coherent Extrapolated Volition

Postby Jesper Östman on 2011-06-15T13:05:00

Alan:

Nice blog, and interesting discussion. I'm planning to post some more comments when I get more time, but here are a couple of spontaneous points:

Do you consider all (or all non-astronomical) finite amounts of suffering ~=0? Otherwise the negative value of animal suffering on other planets in our accessible universe (and perhaps on earth, depending on the extinction scenario) should be included in (a). Also, even if the finite values are ignored, (c) should have at least some chance of creating infinite positive value.

I think 2 is a good point. AFAIK very few people have pushed the wild animal and insect suffering issues (including the infinite versions) independently of your work.


Re: Coherent Extrapolated Volition

Postby Brian Tomasik on 2011-06-15T13:36:00

Jesper Östman wrote:Do you consider all (or all non-astronomical) finite amounts of suffering ~=0? Otherwise the negative value of animal suffering on other planets in our accessible universe (and perhaps on earth, depending on the extinction scenario) should be included in (a). Also, even if the finite values are ignored, (c) should have at least some chance of creating infinite positive value.

Good points, Jesper! When I said "value," I was thinking "relative to a baseline of humans not existing," but you're right that still leaves infinite suffering of wild animals throughout the multiverse. Some of this might be able to be prevented through friendly AI, so yes, (c) could actually be positive and not negative. Indeed, this is the key question: Could humans prevent enough extraterrestrial suffering so that their survival would be expected to prevent net suffering, despite the risk that it could also create massive suffering? I'm doubtful, but it is possible. Still, it's more likely that future humans would embark on such "cosmic rescue missions" if people like me raise this issue in society's moral consciousness.

Re: Coherent Extrapolated Volition

Postby Arepo on 2011-06-15T17:05:00

Alan Dawrst wrote:Hmm, then I think we mean two different things by "happiness." Would you say a paperclip maximizer is "happier" when it produces more paperclips? Or suppose I build a program that tries to find the x that maximizes the function y(x) = -x^2 by trying random values on the range [-1,1]. Is "happiness" to that program finding values of x closer and closer to 0?


No, I wouldn't think either are necessarily happy. To clarify, I would say that if you can feel happiness you want more of it (I realise many people claim otherwise, but to me they just seem like they're seeking happiness via less rational means), but not necessarily the reverse - acting in such a way as to effect something doesn't mean that the thing you're effecting is happiness.
"These were my only good shoes."
"You ought to have put on an old pair, if you wished to go a-diving," said Professor Graham, who had not studied moral philosophy in vain.

Re: Coherent Extrapolated Volition

Postby Arepo on 2011-06-15T17:19:00

Although it seems like some preference utils believe that what you're talking about is the quantity we should be maximising...
"These were my only good shoes."
"You ought to have put on an old pair, if you wished to go a-diving," said Professor Graham, who had not studied moral philosophy in vain.

Re: Coherent Extrapolated Volition

Postby Gedusa on 2011-06-15T19:03:00

@Alan

I think the fine-grained differences could still lead to diametric results

Yep, this concerns me.

How do you weight the conflicting impulses? Which get to take control over others? There are thousands of ways these conflicts could be resolved, and exactly which is chosen depends on the whims and imagination of the seed programmers.

I thought the idea was to take all of those preferences together, have them decide (as well as possible) which form of conflict resolution they were going to take and then implement that. Related: I also picked up somewhere that if the impulses can't be cohered then the AI has to be programmed to spit out an error message and do nothing more; dunno if that policy is still planned.

So if CEV counts as a non-paperclipping optimization target, then this should as well.

Yeah, I guess it would count as non-paperclipping as it's something we care about to a degree.

Suppose it is the case that creating friendly AI is far more important than ensuring future concern for wild animals. Even if so, there are *lots* of people (comparatively speaking) working on friendly AI and existential risk, whereas practically no one else is (explicitly) focusing on the implications of humanity's survival for wild animals.

Okay, so the best return on our investment would come from wild animals. I can actually buy that argument as being slightly probable. However, I'm still of the belief that a CEV would come out alright for animals, and if it doesn't we're doomed. Essentially I anticipate greatest marginal utility coming from work on Friendly AI. It might be useful to think about the relative importance of Friendly AI and whether it is so overfunded (or whatever) that the best marginal utility comes from a different cause. Sidenote: "lots" is lol given the constant bemoaning of researchers into this topic that their area isn't being taken seriously enough :D

I guess the main question is, What specific projects do you think would be better to work on? Preventing nuclear war? Preventing paperclippers? Lobbying for use of CEV by whoever develops the first AGI?

Well, my main goals are to ensure that all the things I value don't get flushed down the drain, so I want to prevent paperclippers, and to ensure that if they do all get thrown away (extinction) that the biosphere goes down with us, and hopefully suffering alien animals do too. So, my main interest would be in getting CEV done right, by the right people with the CEV run on a decent candidate(s). I dunno if SIAI are the right people, but that's not exactly relevant to the goals. Ironically, if we do go extinct, I want it to be by a true paperclipper, so that the universe has less suffering in it. So, AI dominates my considerations here. Nothing else really seems to have the scale necessary to make me care about it.

Okay, quick admission here: I'm not actually a hedonistic utilitarian *gasp*. I'm more of a broad species of consequentialist, valuing a lot of things humans value. I suppose a form of ill-thought-out preference utilitarianism might be where I'm at. I'm just forcing myself to be more utilitarian because it's easier to do calculations and whatever for happiness v suffering than it is for, I dunno: love, health, beauty, equality, and the calculations turn out alright if I do them. I wouldn't like the outcome if an AI did calculations like that, because I'm probably unconsciously steering away from ways to maximize happiness that strike me as wrong. Hopefully this clears up a little of our disagreement?
World domination is such an ugly phrase. I prefer to call it world optimization

Re: Coherent Extrapolated Volition

Postby Gedusa on 2011-06-15T19:19:00

@Arepo

His approach seems to be predicated on the suspicion that we know what a logically perfect, straightforwardly benevolent AI would do, and, though we don't have a better answer, we can feel that it's wrong.

Well, we probably do know what a happiness maximizing AI would do and yes it probably would be a utilitronium shockwave or something like that. I don't want that to happen. I feel like this discussion is getting dangerously close to meta-ethics! Suffice it to say, my meta-ethics are non-realist, I don't feel that a utilitronium shockwave would be good, I don't think that I would think it good even under reflection, so, I don't want it to happen

@ A, Reason for concern comes from: 1) Increased knowledge about the human brain (scans, cognitive neuroscience, simulations of animal brains etc.) 2) Increased processing power/ memory/ whatever (computers should hit the lower and upper limits of human processing power this decade, and increasing hardware sophistication makes brute-forcing an AI ever more likely)
These show a general trend. They don't show that AI will be here by 2030 or whatever, but they show that there could be significant cause for concern on that front if things continue as they have.

@ B, I agree entirely with Alan :)

@ D, An AI which randomly alters its utility function when it discovers something new seems possible. However, that it would try to maximise happiness as opposed to the number of bananas in the universe seems unlikely. If the AI can predict that experiencing happiness will cause it to alter its utility function, then it won't experience happiness. If the AI can't predict what experiencing happiness will do, then it won't be able to predict what experiencing bananas will do either, and its utility function seems set to become highly incoherent and not settle on happiness.

If it were easier to simulate the effect of emotions without experiencing them than with, it seems like we might have evolved as p-zombies. The fact that we didn't seems like evidence that the easiest way to simulate emotion is to experience it.


Easiest sure, but would an AI necessarily be taking the easiest route for evolution to take? Evolution had to build on what was already there, emotions that were already there, so it was easiest for evolution to simulate emotions by having organisms experience them. With an AI coded by hand (or whatever) this wouldn't apply.

For the same reason that we have sex using condoms. Happiness is a game-changer. Once you get it, more or less by definition, you want more of it. If vestiges of your programming remain to make the ways in which you go about getting it inefficient, that doesn't mean your priorities haven't shifted.

I think you're generalising from humans to all possible AI minds, and I'm heavily skeptical of how useful that is. There is nothing intrinsically valuable about happiness; it's just one of the things that humans value (I think, I'm guessing you're of a different opinion). An AI stumbling across it in simulating a human mind wouldn't necessarily want to maximise it, any more than it would want to maximise suffering or the number of cheesecakes in the universe or something.
Related: Have you read some of Yudkowsky's posts on this subject? And if so what points do you disagree with? If you haven't, then you may find them pretty useful in explaining my side of the argument as I'm essentially taking a Yudkowskian view of things.
World domination is such an ugly phrase. I prefer to call it world optimization

Re: Coherent Extrapolated Volition

Postby Brian Tomasik on 2011-06-16T06:12:00

Gedusa wrote:I thought the idea was to take all of those preferences together, have them decide (as well as possible) which form of conflict resolution they were going to take and then implement that.

I see. I guess part of what I meant is that there are thousands of imaginable conflict-resolution procedures.

Gedusa wrote:Essentially I anticipate greatest marginal utility coming from work on Friendly AI.

Cool. Any ideas what specifically would be involved in such work? Is it writing papers and holding conferences to raise awareness of paperclipping? Researching scenarios for unfriendly-AI development? Are there things SIAI doesn't do that you would want them to in this area?

Gedusa wrote:so I want to prevent paperclippers, and to ensure that if they do all get thrown away (extinction) that the biosphere goes down with us, and hopefully suffering alien animals do too

It seems like most paperclippers wouldn't be a bad way to ensure the latter. Paperclippers would certainly destroy earth, and since there are harnessable atoms for creating steel on other planets, they might destroy extraterrestrial wildlife as well.

Gedusa wrote:love, health, beauty, equality

Good to know! I don't share the intuition that those matter apart from the subjective experience of happiness. But certainly many other people agree with your view (including Yudkowsky, as one of the links you gave explains).

Re: Coherent Extrapolated Volition

Postby Brian Tomasik on 2011-06-16T06:21:00

I agree with Gedusa's replies to Arepo, especially regarding the evolution argument. I think the human mental architecture allows the subroutines of our brain that predict the behavior of other agents to bleed over into empathy for such agents, but such a design is probably non-optimal in terms of gaining power and dominating other beings.

Gedusa wrote:Well, we probably do know what a happiness maximizing AI would do and yes it probably would be a utilitronium shockwave or something like that. I don't want that to happen.

Oh, really? :!:

I'm completely in favor of a utilitronium shockwave. I can't imagine a better outcome for the universe.

Re: Coherent Extrapolated Volition

Postby Gedusa on 2011-06-16T06:40:00

Oh, really? :!:


Are we disagreeing about facts or values here? I think values, I can imagine a better outcome. I don't think it's fruitful to continue along this line of argument. It's just going to end up like: "Raspberry jam is best!" "NO: Strawberry jam is best!" (If that makes sense).

I'm thinking about your previous post, will reply when I have time.
World domination is such an ugly phrase. I prefer to call it world optimization

Re: Coherent Extrapolated Volition

Postby Brian Tomasik on 2011-06-16T07:44:00

Gedusa wrote:Are we disagreeing about facts or values here? I think values, I can imagine a better outcome. I don't think it's fruitful to continue along this line of argument.

Completely agreed! I was just sort-of surprised. I shouldn't be, because I know lots of people don't support utilitronium, but I still find it strange.

Re: Coherent Extrapolated Volition

Postby Arepo on 2011-06-16T17:28:00

Gedusa wrote:Well, we probably do know what a happiness maximizing AI would do and yes it probably would be a utilitronium shockwave or something like that. I don't want that to happen. I feel like this discussion is getting dangerously close to meta-ethics! Suffice it to say, my meta-ethics are non-realist, I don't feel that a utilitronium shockwave would be good, I don't think that I would think it good even under reflection, so, I don't want it to happen


I'm not really sure where you're going with this. Intuitively I dislike the idea of a pure utilitronium shockwave, but it's clear to me that I have no grounds for doing so, so I'll just have to get over the idea. In any case, it's not at all clear what it means. Utilitronium might have to be sapient or at least have sapient nodes in order to persist. Either way, to escape the desirability of some sort of shockwave it seems like one has to move quite far from utilitarianism - any universalisable moral system which entails aggregation seems to point to one.

Wrestling for decades to find a way to make logic not point to its conclusion doesn't sound like a better recipe for development to me.

@ A, Reason for concern comes from: 1) Increased knowledge about the human brain (scans, cognitive neuroscience, simulations of animal brains etc.) 2) Increased processing power/ memory/ whatever (computers should hit the lower and upper limits of human processing power this decade, and increasing hardware sophistication makes brute-forcing an AI ever more likely)
These show a general trend. They don't show that AI will be here by 2030 or whatever, but they show that there could be significant cause for concern on that front if things continue as they have.


Clearly the pace of technological development is accelerating, but it has been for about 500 years now. The fact that some of our technology does things analogous to some of the things our brains do isn't new and doesn't mean AI is around the corner any more than it did when we noticed similarities between the brain and hydraulics (link not essential reading, just a nice piece I stumbled across).

I don't take these previous mistakes to mean current analogies are definitely flawed, but they are evidence that they're less likely to be accurate than current researchers think.

There's also strong reason to suspect technological progress will at least slow down if not reverse over the next century as we run out of the substances we use for it.

If the AI can predict that experiencing happiness will cause it to alter its utility function, then it won't experience happiness.


Good point...

If the AI can't predict what experiencing happiness will do, then it won't be able to predict what experiencing bananas will do either, and its utility function seems set to become highly incoherent and not settle on happiness.


Not sure what you mean here. Notwithstanding your point above, it seems to me that the happiness/suffering scale is something unique, semi-fundamental, and powerfully behaviour-changing. Our basic programming drives us to maximise the number of copies of our genes in the universe, but that's clearly not what most (any?) humans actually do. Once we started to feel and think, we changed our fundamental behavioural algorithms.

Easiest sure, but would an AI necessarily be taking the easiest route for evolution to take? Evolution had to build on what was already there, emotions that were already there, so it was easiest for evolution to simulate emotions by having organisms experience them. With an AI coded by hand (or whatever) this wouldn't apply.


Again, I'm not claiming that this is proof that it's easier, but it's evidence.

Another piece of evidence is that we don't see stars blinking out anywhere in the night sky. If paperclipping AI was a significant risk to intelligent life and the chance of intelligent life having evolved elsewhere were much above 0, you'd expect to see whole galaxies switching off from a gradually expanding point.

I think you're generalising from humans to all possible AI minds, and I'm heavily skeptical of how useful that is. There is nothing intrinsically valuable about happiness; it's just one of the things that humans value (I think, I'm guessing you're of a different opinion).


I'm not talking about 'intrinsic value'. I don't even think that phrase means anything. I'm talking about the quale of 'ahhhness', which to experience is to... well, I'm not sure there's an analogy. I really think of it as fundamental. I also don't think the phrase 'humans value' means anything either - to 'value' usually means essentially to want more of, and to 'want' usually means 'gain happiness from the anticipation of'. So yes, I guess I do have a different opinion :P

An AI stumbling across it in simulating a human mind wouldn't necessarily want to maximise it, any more than it would want to maximise suffering or the number of cheesecakes in the universe or something.


... therefore I think this is false.

Related: Have you read some of Yudkowsky's posts on this subject?


No. I would appreciate it if you'd pick key passages, partly because it's very rare that reading a whole essay is necessary to convey a basic point, and partly because I find Yudkowsky maddeningly sloppy for someone so championed, and I've yet to feel like I've profited from reading anything by him.
"These were my only good shoes."
"You ought to have put on an old pair, if you wished to go a-diving," said Professor Graham, who had not studied moral philosophy in vain.

Re: Coherent Extrapolated Volition

Postby Gedusa on 2011-06-16T18:36:00

I'm not really sure where you're going with this.


Actually, neither am I. :D I'm sure I had a point, but it escapes me.

There's also strong reason to suspect technological progress will at least slow down if not reverse over the next century as we run out of the substances we use for it.


I can sorta buy this, it's reason for concern. Are there any avenues of research/funding that would be helpful in stopping this from happening?

Not sure what you mean here.


My basic point is that an AI which doesn't know which alterations will do what will not function well at all. Let's ignore whether it would settle on happiness as a good thing and concentrate on that. I don't think an AI which wandered randomly into valuing happiness would be able to prevent itself from wandering out of valuing happiness with its next alteration.

Another piece of evidence is that we don't see stars blinking out anywhere in the night sky. If paperclipping AI was a significant risk to intelligent life and the chance of intelligent life having evolved elsewhere were much above 0, you'd expect to see whole galaxies switching off from a gradually expanding point.


Now this, I agree with entirely. :shock: It's my main cause for doubt that any form of superintelligence is possible and that AI therefore constitutes an existential risk. I'm not sure how to integrate this into the rest of what I know. It seems to point to a small set of (probable) conclusions: A) Intelligent life is ridiculously rare, B) X-risks which don't spread beyond the local area tend to wipe out civilizations before they develop much, C) Intelligence is somehow possible but superintelligence isn't, or D) Unknown and complicated things stop this from happening. This is a question I consider of great importance; can you tell me what conclusions you draw from this data point? 'Cause it just makes me think that superintelligence is never ever ever developed (i.e. I think B).

So yes, I guess I do have a different opinion :P


Yeah, this might be unresolvable. I'm not sure I can summarize Yudkowsky well in a particularly relevant way to this discussion. I would still recommend his writings; I'm sorry you found them unprofitable. I've found them pretty useful, though I do agree that his style sometimes... needs a little work. Like when he said not signing your kids up for cryonics is bad parenting... Anywayyyy. There is one post which I linked that could be very useful (excerpt):
It turns out that the neural pathways for 'wanting' and 'liking' are separate, but overlap quite a bit. This explains why we usually experience pleasure when we get what we want, and thus are tempted to think that all we desire is pleasure...We now have objective measures of wanting and liking (desire and pleasure), and these processes do not always occur together.

given that you said: "to 'want' usually means 'gain happiness from the anticipation of'"

P.S. I'm fascinated by your sig, do you run the website through the link?
World domination is such an ugly phrase. I prefer to call it world optimization

Re: Coherent Extrapolated Volition

Postby Brian Tomasik on 2011-06-18T13:17:00

Gedusa wrote:
There's also strong reason to suspect technological progress will at least slow down if not reverse over the next century as we run out of the substances we use for it.


I can sorta buy this, it's reason for concern. Are there any avenues of research/funding that would be helpful in stopping this from happening?

I tend to think that slow technological progress is the least of our concerns, because technology is already so powerfully driven by economic incentives. Tons of people interested in making money or achieving research breakthroughs are working on this problem, so I don't think the efforts of a few extra utilitarians will make a big difference.

Moreover, if the goal is to reduce the risk of human extinction, is faster technological progress good or bad? I tend to assume it's bad, because it gives less time to prepare social structures to react to new developments, as well as to build a theory of friendly AI and such. The answer isn't obvious either way, though.

Re: Coherent Extrapolated Volition

Postby Brent on 2011-06-18T15:13:00

I’ll admit I only read the first part (and some of the later part) of Yudkowsky’s Coherent Extrapolated Volition (CEV) piece, but here are some of my criticisms of what I got out of it:

First, I realize he is not proposing a moral code; rather he is proposing a set of rules which should guide an AI. So these rules might diverge from utilitarianism but still be the rules a utilitarian would program into an AI. Here are some of my criticisms from that perspective:

1. Yudkowsky suggests that one’s Coherent Extrapolated Volition is somehow that person’s “true” volition. Yet I don’t think there is any such thing as a “true” volition. What we would want if we knew more, thought faster, etc. is just that – what we would want in this hypothetical situation. It is not what we “truly” want. Our brains are much more complex than that. For example, different parts of my brain can want different things at the same time, as when my “old” brain wants to lash out in violence, and it is only my prefrontal cortex’s desire to stay in control which overrides that desire to be violent. I very much agree with the ideas Alan Dawrst expresses here on this topic.

2. CEV includes what we would want if we knew more. But how much more? If I knew everything, I would not have any desires based on curiosity. This would take away an important human desire: the quest for knowledge. Knowing everything would eliminate a lot of desires having to do with learning more, being innovative or creative, or coming up with new ideas. Maybe “knew more” doesn’t mean “knew everything”, but then how do you draw the line?

3. CEV includes what we would want if we “were more the people we wished we were.” This is included in CEV to overcome the problem that even if we knew more, thought faster, etc., we still might have some desires we wouldn’t want the AI to count into its calculations. But a wish to be a certain type of person is a desire itself, and subject to the same problems other desires have in CEV: they might be desires we don’t want the AI to count. Some people might wish they were able to be more ruthless, for example. Even for those of us who do wish we were more selfless, in what sense is that what we “really” wish for ourselves? To the degree we act on the basis of our own narrowly-defined self interest, there is likely a part of us that just wishes we were better able to make decisions to promote that self interest, regardless of its impact on others. You would end up needing to perform the Coherent Extrapolated Volition operation on our desires to be a certain type of person, but since the desire to be a certain type of person is contained in CEV, the process would become circular.

4. I do think that there is something to preference utilitarianism’s dictum that we should help people achieve the desires they would hold if they were “fully informed, in a calm frame of mind, and thinking clearly” (from Singer’s Practical Ethics). But this still leaves a lot of room for judgment calls – for example, how do we define “fully informed?” It can’t be “know everything there is to know”, as I explained above. It ends up being a judgment call. So I think that while preference utilitarianism is an important part of a complete picture of human welfare, it might not be possible to program into a computer. We would have to program in our thinking when we make the kind of judgment calls about things like “how informed is fully informed”, and this might not be possible.

5. CEV is based on the brains of humans in existence when the AI is built. What if, subsequent to that, we use technology to modify our brains so that we have a radically different profile of desires? Then the AI will no longer serve our desires.

6. Ultimately, I would be very wary of an AI not built with a manual override. I think that humans are too prone to error to get something like that right the first time.


Re: Coherent Extrapolated Volition

Postby Gedusa on 2011-06-18T15:21:00

Belated reply to Alan. Will post more soon...
Any ideas what specifically would be involved in such work? Is it writing papers and holding conferences to raise awareness of paperclipping? Researching scenarios for unfriendly-AI development? Are there things SIAI doesn't do that you would want them to in this area?

General projects would be to raise awareness of: A) AGI's probability in the near future. B) That AGI wouldn't necessarily be friendly. C) That scenarios where AGI isn't gotten right would be very bad. D) That an AGI would probably go into recursive self-improvement. I don't really have ideas for specific strategies that aren't already being done; more academic papers, perhaps? The SIAI conference is decent.
Then, more specifically, I'd like to see more investigation of what this project actually involves: timescales, expenditure (of money) required, what areas of research are required (decision theory?), what technology is required, what research should be released and what shouldn't, how many of the problems faced in the project can be relied upon to be solved by academia (i.e. how much does this project need to solve by itself?), how many known unknowns are there (and can we estimate unknown unknowns), how many things are problems we just need to plug away on for years versus stroke-of-genius-required problems, how much of the stuff we need to know could be modularised (i.e. do we have to proceed along a linear line of research or can we look into lots of things all at once) and so on. Edit: Oh, and how do we measure success towards the goal objectively (forgot that massively important point).

So those are areas that SIAI has been a little quiet on, and I'd like to see a lot of that sort of stuff (there are hints of an internal strategy, e.g. in regards to some research not being released, but I know nothing explicit). After knowing most of that I would be much happier in my support of them. I'm not sure I'm qualified to know what the actual research would involve, though.

It seems like most paperclippers wouldn't be a bad way to ensure the latter.

Yep, it's a curious irony that I'd prefer paperclippers to business as usual type scenarios. :)
World domination is such an ugly phrase. I prefer to call it world optimization

Re: Coherent Extrapolated Volition

Postby Arepo on 2011-07-11T10:41:00

Gedusa wrote:I can sorta buy this, it's reason for concern. Are there any avenues of research/funding that would be helpful in stopping this from happening?


I wish I knew. It's something I've tried to interest both Givewell and GWWC in without much success. It might actually be an area where the best current contribution is old-fashioned legwork.

My basic point is that an AI which doesn't know which alterations will do what will not function well at all. Let's ignore whether it would settle on happiness as a good thing and concentrate on that. I don't think an AI which wandered randomly into valuing happiness would be able to prevent itself from wandering out of valuing happiness with its next alteration.


I think this is a point of disagreement, or at least significantly different weighting of probabilities. Your point that an AI which could predict the effect of happiness on itself would strive to avoid experiencing happiness is interesting, but I don't really buy the idea that if it did, its programming would instantly override the sensation. If it had something comparable to our sense of psychological continuity, it would surely immediately realise that seeking more happiness for itself was better.

If it didn't, it would still have a new motivator in what felt good to do from moment to moment, comparable to the pure hedonism of a drug addict. It's not obvious what effect that would have on its program, but it seems like it would surely throw a spanner in the works.

A) Intelligent life is ridiculously rare, B) X-risks which don't spread beyond the local area tend to wipe out civilizations before they develop much, C) Intelligence is somehow possible but superintelligence isn't, or D) Unknown and complicated things stop this from happening. This is a question I consider of great importance; can you tell me what conclusions you draw from this data point?


The most plausible one to me seems to be that technological progress is much slower/more limited than a lot of current futurists expect - e.g. that significant boosts in intelligence take a long time to come by and/or don't self-amplify as much as we'd think and/or require huge amounts of energy to sustain and/or matter is in practice less interchangeable than some people think it will be and we can only make high level use of small subsets of it, etc.

But really, I think predicting the future based essentially on the absence of data seems like a fool's game. The most important thing to take from this seems to me to be scepticism of all futurist claims that share the mentality/methodology behind claims that, if true, would be turning the sky off.



Like when he said not signing your kids up for cryonics is bad parenting...


Eurgh, don't get me started on cryonics...

There is one post which I linked that could be very useful


Not Yudkowsky, I notice ;) I have a bit more time for LukeProg.

That said, I have a lot of reservations about that piece. He seems to have defined pleasure as 'the mental state that (as far as we can tell) correlates with smiling' and wanting as goal-oriented behaviour. But neither convinces me.

On pleasure, clearly there could be mental states that we’d consider a positive contribution to the hedonic scale which don’t provoke such a response (if the panpsychics are right – which I don’t think they are, but it highlights the possibility – a rock could be happier or sadder), and I have subjective evidence that there actually are mental states which I’d consider positive but don’t lead to me smiling. Smiling burns a certain amount of resources, so you wouldn’t expect a creature to do it every time their current welfare pushed above 0, even if we discounted the idea of different emotional states that felt equally as good but provoked a different (or no) physical reaction.

On wanting, the concept is just ill-defined. To see a process as ‘goal-oriented behaviour’, the way anyone’s ever tried to explain it to me, is either to presuppose certain value judgements or to make arbitrary distinctions about where behaviour begins and ends. If we define it in such a way that it doesn’t necessarily involve consciousness then if we look at the real boundaries of a process, all things at all times are ‘seeking’ entropy. If we define it in such a way as to involve consciousness, then either 1) the quality of the conscious state is significant, in which case we need a concept of experience (which we can treat as positive/negative) underlying it anyway or 2) the quality of the conscious state is unrelated to the concept of wanting, it’s just an arbitrary stipulation we’ve tacked on to our definition of wanting.

I don’t feel like I’ve expressed this well. I quite like Jonatas’s response at the top, though. Even if happiness/wanting have evolved to be discrete in us, it doesn’t mean it wouldn’t be rational to unite their mechanisms if we were redesigning ourselves. It seems to me that it would be. Neuroscience can tell us a lot of interesting things relating to our goals, but I can’t see it ever bridging the is/ought gap.

P.S. I'm fascinated by your sig, do you run the website through the link?


Nope, my sig is strictly utilitarian :P – I’m learning Japanese and using it to help me remember words and websites I might forget.
"These were my only good shoes."
"You ought to have put on an old pair, if you wished to go a-diving," said Professor Graham, who had not studied moral philosophy in vain.

