Why we can't rely on CEV

Brian Tomasik · by **Brian Tomasik** on 2013-01-24T05:23:00

Introduction

I'm often asked questions like the following: If wild-animal suffering, lab universes, sentient simulations, etc. are so bad, why can't we assume that Coherent Extrapolated Volition (CEV) will figure that out and do the right thing for us?

Disclaimer

Most of my knowledge of CEV is based on Yudkowsky's 2004 paper, which he admits is obsolete. I have not yet read most of the more recent literature on the subject.

Reason 1: CEV will (almost certainly) never happen

CEV is like a dream for a certain type of moral philosopher: Finally, the most ideal solution for discovering what we really want upon reflection!

The fact is, the real world is not decided by moral philosophers. It's decided by power politics, economics, and Darwinian selection. Moral philosophers can certainly have an impact through these channels, but they're unlikely to convince the world to rally behind CEV. Can you imagine the US military -- during its AGI development process -- deciding to adopt CEV? No way. It would adopt something that ensures the continued military and political dominance of the US, driven by mainstream American values. Same goes for China or any other country. If AGI is developed by a corporation, the values will reflect those of the corporation or the small group of developers and supervisors who hold the most power over the project. Unless that group is extremely enlightened, CEV is not what we'll get.

Anyway, this is assuming that the developers of AGI can even keep it under control. Most likely AGI will turn into a paperclipper or else evolve into some other kind of Darwinian force over which we lose control.

Objection 1: "Okay. Future military or corporate developers of AGI probably won't do CEV. But why do you think they'd care about wild-animal suffering, etc. either?"

Well, they might not, but if we make the wild-animal movement successful, then in ~50-100 years when AGI does come along, the notion of not spreading wild-animal suffering might be sufficiently mainstream that even military or corporate executives would care about it, at least to some degree.

If post-humanity does achieve astronomical power, it will only be through AGI, so there's high value in influencing the future developers of an AGI. For this reason I believe we should focus our meme-spreading on those targets. However, this doesn't mean they should be our only focus, for two reasons: (1) Future AGI developers will themselves be influenced by their friends, popular media, contemporary philosophical and cultural norms, etc., so if we can change those things, we will diffusely impact future AGI developers too. (2) We need to build our movement, and the lowest-hanging fruit for new supporters is the group of people most interested in the cause (e.g., antispeciesists, environmental-ethics students, transhumanists). We should reach out to them to expand our base of support before going after the big targets.

Objection 2: "Fine. But just as we can advance values like preventing the spread of wild-animal suffering, couldn't we also increase the likelihood of CEV by promoting that idea?"

Sure, we could. The problem is, CEV is not an optimal thing to promote, IMHO. It's sufficiently general that lots of people would want it, so for ourselves, the higher leverage comes from advancing our particular, more idiosyncratic values. Promoting CEV is kind of like promoting democracy or free speech: It's fine to do, but if you have a particular cause that you think is more important than other people realize, it's probably going to be better to promote that specific cause than to jump on the bandwagon and do the same thing everyone else is doing, since the bandwagon's cause may not be what you yourself prefer.

Indeed, for myself, it's possible CEV could be a net bad thing, if it would reduce the likelihood of paperclipping -- a future which might (or might not) contain far less suffering than a future directed by humanity's extrapolated values.

Reason 2: CEV would lead to values we don't like

Some believe that morality is absolute, in which case a CEV's job would be to uncover what that is. This view is mistaken, for the following reasons: (1) Existence of a separate realm of reality where ethical truths reside violates Occam's razor, and (2) even if they did exist, why would we care what they were?

Yudkowsky and the LessWrong community agree that ethics is not absolute, so they have different motivations behind CEV. As far as I can gather, the following are two of them:

Motivation 1: Some believe CEV is genuinely the right thing to do

As Eliezer said in his 2004 paper (p. 29), "Implementing CEV is just my attempt not to be a jerk." Some may believe that CEV is the ideal meta-ethical way to resolve ethical disputes.

I have to differ. First, the set of minds included in CEV is totally arbitrary, and hence, so will be the output. Why include only humans? Why not animals? Why not dead humans? Why not humans that weren't born but might have been? Why not paperclip maximizers? Baby eaters? Pebble sorters? Suffering maximizers? Wherever you draw the line, there you're already inserting your values into the process.

And then once you've picked the set of minds to extrapolate, you still have astronomically many ways to do the extrapolation, each of which could give wildly different outputs. Humans have a thousand random shards of intuition about values that resulted from all kinds of little, arbitrary perturbations during evolution and environmental exposure. If the CEV algorithm happens to make some more salient than others, this will potentially change the outcome, perhaps drastically (butterfly effects).

Now, I would be in favor of a reasonable extrapolation of my own values. But humanity's values are not my values. There are people who want to spread life throughout the universe regardless of suffering, people who want to preserve nature free from human interference, people who want to create lab universes because it would be cool, people who oppose utilitronium and support retaining suffering in the world, people who want to send members of other religions to eternal torture, people who believe sinful children should burn forever in red-hot ovens, and on and on. I do not want these values to be part of the mix.

Maybe (hopefully) some of these beliefs would go away once people learned more about what these wishes really implied, but some would not. Take abortion, for example: Some non-religious people genuinely oppose it, and not for trivial, misinformed reasons. They have thought long and hard about abortion and still find it to be wrong. Others have thought long and hard and still find it to be not wrong. At some point, we have to admit that human intuitions are genuinely in conflict in an irreconcilable way. Some human intuitions are irreconcilably opposed to mine, and I don't want them in the extrapolation process.

Motivation 2: Some argue that even if CEV isn't ideal, it's the best game-theoretic approach because it amounts to cooperating on the prisoner's dilemma

I think the idea is that if you try to promote your specific values above everyone else's, then you're timelessly causing other groups of people to make the same decision and push for their own values instead. But if you decide to cooperate with everyone, you timelessly influence others to do the same.

This seems worth considering, but I'm doubtful that the argument is compelling enough to take too seriously. I can almost guarantee that if I decided to start cooperating by working toward CEV, everyone else working to shape values of the future wouldn't suddenly jump on board and do the same.

Objection 1: "Suppose CEV did happen. Then spreading concern for wild animals and the like might have little value, because the CEV process would realize that you had tried to rig the system ahead of time by making more people care about the cause, and it would attempt to neutralize your efforts."

Well, first of all, CEV is (almost certainly) never going to happen, so I'm not too worried. Second of all, it's not clear to me that such a scheme would actually be put in place. If you're trying to undo pre-CEV influences that led to the distribution of opinions to that point, you're going to have a heck of a lot of undoing to do. Are you going to undo the abundance of Catholics because their religion discouraged birth control and so led to large numbers of supporters? Are you going to undo the over-representation of healthy humans because natural selection unfairly removed all those sickly ones? Are you going to undo the under-representation of dinosaurs because an arbitrary asteroid killed them off before CEV came around?

The fact is that who has power at the time of AGI will probably matter a lot. If we can improve the values of those who will have power in the future, this will in expectation lead to better outcomes -- regardless of whether the CEV fairy tale comes true.

Arepo · by **Arepo** on 2013-01-24T12:57:00

Nice critique. What exactly do you have in mind by CEV 'happening'?

Brian Tomasik · by **Brian Tomasik** on 2013-01-25T15:20:00

Arepo wrote: Nice critique.

Thanks! Arepo can be hard to convince sometimes, so I'm glad he liked at least parts of it.

Arepo wrote: What exactly do you have in mind by CEV 'happening'?

An AGI singleton is built which uses CEV to decide what it thinks should be done.

CEV as a concept is of course more broadly applicable, but it's already pretty widely known as a philosophical idea.

In any event, it's not my ideal solution. I want my values to win; I don't want my values merged with those of people who want to burn babies for eternity.

peterhurford · by **peterhurford** on 2013-01-25T16:08:00

You should publish this on LessWrong.

Pablo Stafforini · by **Pablo Stafforini** on 2013-01-26T03:11:00

I agree with Peter: this should be cross-posted to LessWrong.

Brian Tomasik · by **Brian Tomasik** on 2013-01-26T12:19:00

Thanks, guys.

If I post on LW myself, there will be an unending stream of comments, and I don't think I'm able to reply to them all. If one of you would like to post on my behalf, please go ahead and do so. I would be honored.

Pablo Stafforini · by **Pablo Stafforini** on 2013-01-26T16:13:00

Done.

peterhurford · by **peterhurford** on 2013-01-26T17:00:00

Brian Tomasik wrote: If I post on LW myself, there will be an unending stream of comments, and I don't think I'm able to reply to them all. If one of you would like to post on my behalf, please go ahead and do so. I would be honored.

I wouldn't worry too much about that, especially if it's going to prevent you from posting. I think you should post it since it's your piece and since you're the most qualified to field some of the comments. I'm sure those of us here can add to the discussion there too, whether or not we agree with you.

Brian Tomasik · by **Brian Tomasik** on 2013-01-27T01:32:00

Aww, thanks again.

I do have a LW account with sufficient karma, so the only reason I didn't post was not wanting to get sucked into replies. Turns out there weren't as many as I was expecting.

I was wondering if it was irrational to avoid posting just to avoid taking time on the discussion. Probably, although I do feel like it might seem insensitive to post and not reply to comments.

In any event, thank you both for your replies! You've done as well as I could have.

Arepo · by **Arepo** on 2013-02-01T13:25:00

Brian Tomasik wrote: Thanks! Arepo can be hard to convince sometimes, so I'm glad he liked at least parts of it.

I can't decide whether that's a good reputation to have

I think I'm not very good at making it clear when I have actually been persuaded on something. You and Trader Joe persuaded me of the value of prophil between you with little to no other influence, for eg (and now I seem to be a stronger advocate of it than most people in 80K).

Brian Tomasik wrote: Turns out there weren't as many as I was expecting.

How many were you expecting?

Brian Tomasik · by **Brian Tomasik** on 2013-02-02T14:03:00

Congrats on hitting 1000 posts, Arepo.

Arepo wrote: I can't decide whether that's a good reputation to have

Well...

Arepo wrote: You and Trader Joe persuaded me of the value of prophil between you with little to no other influence, for eg (and now I seem to be a stronger advocate of it than most people in 80K).

Cool. I was remarking just recently on how powerful it can be to talk about it. I have a friend who's now considering it, even though before talking with me, he/she had never thought money was that important for social change.

Arepo wrote: How many were you expecting?

Well, they came slowly. When I made that comment, there were only ~30 or so.
