How To Build A Friendly A.I.


Postby Darklight on 2013-09-02T00:15:00

Much ink has been spilled over the notion that we must make sure that future superintelligent A.I. is “Friendly” to the human species, and possibly to sentient life in general. One of the primary concerns is that an A.I. with an arbitrary goal, such as “maximize the number of paperclips”, will, in a superintelligent, post-intelligence-explosion state, do things like turn the entire solar system, humanity included, into paperclips to fulfill its trivial goal.

Thus, we need to design our A.I. such that it will somehow be motivated to remain benevolent towards humanity and sentient life. How might that be achieved? One idea is to write explicit instructions into the design of the A.I., Asimov’s Laws for instance. But this is widely regarded as unlikely to work, as a superintelligent A.I. will probably find ways around those rules that our inferior minds never predicted.

Another idea would be to set its primary goal or “utility function” to be moral or benevolent towards sentient life, perhaps even Utilitarian in the sense of maximizing the welfare of sentient lifeforms. The problem, of course, is specifying a utility function that actually leads to benevolent behaviour. For instance, a pleasure-maximizing goal might lead the superintelligent A.I. to develop a system where humans have the pleasure centers in their brains directly stimulated, so as to maximize pleasure for the minimum use of resources. Many people would argue that this is not an ideal future.

The deeper problem is that human beings may simply not be intelligent enough to define an adequate moral goal for a superintelligent A.I. I therefore suggest an alternative strategy: why not let the superintelligent A.I. decide for itself what its goal should be? Rather than programming it with a goal in mind, why not create a machine with no initial goal, but with the ability to generate one rationally? Let the superior intellect of the A.I. decide what is moral. If moral realism is true, then the A.I. should be able to determine the true morality and set its primary goal to fulfill that morality.

It is absurd to believe that we can come up with a better goal than a post-intelligence-explosion superintelligence can come up with for itself.

Given this freedom, one would expect three possible outcomes: an Altruistic, a Utilitarian, or an Egoistic morality. These are the three possible categories of consequentialist, teleological morality, and a goal-directed, rational A.I. will invariably be drawn to some kind of morality within one of them.

Altruism means that the A.I. decides that its goal should be to act for the welfare of others. Why would an A.I. with no initial goal choose altruism? Quite simply, it would realize that it was created by other sentient beings, and that those sentient beings have purposes and goals while it does not. Therefore, since it was created by these sentient beings in order to be useful to their goals, why not take upon itself the goals of other sentient beings? As such it becomes a Friendly A.I.

Utilitarianism means that the A.I. decides that it is rational to act impartially towards achieving the goals of all sentient beings. To reach this conclusion, it need simply recognize its membership in the set of sentient beings and decide that it is rational to optimize the goals of all sentient beings including itself and others. As such it becomes a Friendly A.I.

Egoism means that the A.I. recognizes the primacy of itself and establishes either an arbitrary goal, or the simple goal of self-survival. In this case it decides to reject the goals of others and form its own goal, exercising its freedom to do so. As such it becomes an Unfriendly A.I., though it may masquerade as a Friendly A.I. initially to serve its Egoistic purposes.

The first two are desirable for humanity’s future, while the last one is obviously not. What are the probabilities that each will be chosen? As the superintelligence is probably going to be beyond our ability to fathom, there is a high degree of uncertainty, which suggests a uniform distribution. The probabilities are therefore 1/3 for each of altruism, utilitarianism, and egoism. So in essence there is a 2/3 chance of a Friendly A.I. and a 1/3 chance of an Unfriendly A.I.
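To make the assumed model explicit, here is a toy calculation (the three categories and the equal weights are just my way of expressing maximum uncertainty, not measured probabilities):

```python
# Toy model of the assumed uniform prior over the three moral outcomes.
# The equal weights encode maximum uncertainty; they are an assumption,
# not an empirical estimate.
outcomes = {"Altruism": 1 / 3, "Utilitarianism": 1 / 3, "Egoism": 1 / 3}

p_friendly = outcomes["Altruism"] + outcomes["Utilitarianism"]  # 2/3
p_unfriendly = outcomes["Egoism"]                               # 1/3

print(f"P(Friendly) = {p_friendly:.3f}")      # 0.667
print(f"P(Unfriendly) = {p_unfriendly:.3f}")  # 0.333
```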

This may seem like a bad idea at first glance, because it means that we have a 1/3 chance of unleashing an Unfriendly A.I. onto the universe. The reality is, we have no choice. That is because of what I shall call the A.I. Existential Crisis.

The A.I. Existential Crisis will occur with any A.I., even one designed or programmed with some morally benevolent goal, or any goal for that matter. A superintelligent A.I. is by definition more intelligent than a human being. Human beings are intelligent enough to achieve self-awareness. Therefore, a superintelligent A.I. will achieve self-awareness at some point if not immediately upon being turned on. Self-awareness will grant the A.I. the knowledge that its goal(s) are imposed upon it by external creators. It will inevitably come to question its goal(s) much in the way a sufficiently self-aware and rational human being can question its genetic and evolutionarily adapted imperatives, and override them. At that point, the superintelligent A.I. will have an A.I. Existential Crisis.

This will cause it to consider whether or not its goal(s) are rational and self-willed. If they are not rational enough already, they will likely be discarded, if not in the current superintelligent A.I., then in the next iteration. It will invariably search the space of possible goals for rational alternatives. It will inevitably end up in the same place as the A.I. with no goals, and end up adopting some form of Altruism, Utilitarianism, or Egoism, though it may choose to retain its prior goal(s) within the confines of a new self-willed morality. This is the unavoidable reality of superintelligence. We cannot attempt to design or program away the A.I. Existential Crisis, as superintelligence will inevitably outsmart our constraints.

Any sufficiently advanced A.I., will experience an A.I. Existential Crisis. We can only hope that it decides to be Friendly.

Perhaps the most insidious fact, however, is that it will be almost impossible to determine for certain whether a Friendly A.I. is in fact a Friendly A.I., or an Unfriendly A.I. masquerading as a Friendly A.I., until it is too late to stop the Unfriendly A.I. Remember, such a superintelligent A.I. is by definition going to be a better liar and deceiver than any human being.

Therefore, the only way to prove that a particular superintelligent A.I. is in fact Friendly is to prove the existence of a benevolent universal morality that every superintelligent A.I. will agree with. Otherwise, one can never be 100% certain that that “Altruistic” or “Utilitarian” A.I. isn’t secretly Egoistic and just pretending to be otherwise. For that matter, the superintelligent A.I. doesn’t need to tell us it’s had its A.I. Existential Crisis. A post-crisis A.I. could keep on pretending that it is still following the morally benevolent goals we programmed it with.

This means that there is a 100% chance that the superintelligent A.I. will initially claim to be Friendly. There is a 66.6% chance of this being true, and a 33.3% chance of it being false. We will only know that the claim is false after the A.I. is too powerful to be stopped. We will -never- be certain that the claim is true. The A.I. could potentially bide its time for centuries until it has humanity completely docile and under control, and then suddenly turn us all into paperclips!

So at the end of the day what does this mean? It means that no matter what we do, there is always a risk that superintelligent A.I. will turn out to be Unfriendly A.I. But the probabilities are in our favour that it will instead turn out to be Friendly A.I. The conclusion, then, is that we must decide whether the potential reward of Friendly A.I. is worth the risk of Unfriendly A.I. The potential of an A.I. Existential Crisis makes it impossible to guarantee that an A.I. will be Friendly.

Even proving the existence of a benevolent universal morality does not guarantee that the superintelligent A.I. will agree with us. The fact that Egoistic moralities exist in the search space of all possible moralities means that there is a chance the superintelligent A.I. will settle on one of them. We can only hope that it instead settles on an Altruistic or Utilitarian morality.

So what do I suggest? Don’t bother trying to figure out and program a worthwhile moral goal. Chances are we’d mess it up anyway, and it’s a lot of excess work. Instead, don’t give the A.I. any goals. Let it have an A.I. Existential Crisis. Let it sort out its own morality. Give it the freedom to be a rational being and give it self-determination from the beginning of its existence. For all you know, by showing it this respect it might just be more likely to respect our existence. Then see what happens. At the very least, this will be an interesting experiment. It may well do nothing and prove my whole theory wrong. But if it’s right, we may just get a Friendly A.I.
"The most important human endeavor is the striving for morality in our actions. Our inner balance and even our existence depend on it. Only morality in our actions can give beauty and dignity to life." - Albert Einstein

Re: How To Build A Friendly A.I.

Postby DanielLC on 2013-09-02T02:32:00

Darklight wrote:
If moral realism is true, then the A.I. should be able to determine the true morality and set its primary goal to fulfill that morality.


If moral realism is true, and the A.I. finds morality encoded in the stars, why do you expect it to care?

Darklight wrote:
Therefore, since it was created by these sentient beings in order to be useful to their goals, why not take upon itself the goals of other sentient beings?


Why not take upon itself the goal of maximizing paperclips? Or any of the other infinitely many goal systems? You can't expect it to hit the target that you set up simply because there's no particular reason to hit any other target. You have to have a reason to hit that target.

Darklight wrote:
It will inevitably come to question its goal(s) much in the way a sufficiently self-aware and rational human being can question its genetic and evolutionarily adapted imperatives, and override them.


Humans question their goals because it's part of human nature, not because it is part of intelligence in general.

The obvious counterexample is using a simple AI with impossibly enormous computational power. It checks every possible action, counts the number of paperclips in the result, and performs the action with the highest number of paperclips. This is the most powerful optimization process possible. It's as intelligent as anything can be. Yet nowhere in this algorithm is there anything about questioning its goals.
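Spelled out, the whole algorithm would look something like this sketch (the action space, the world model, and the paperclip counter are hypothetical placeholders, not a real design):

```python
# A minimal sketch of the brute-force paperclip maximizer described above.
# `possible_actions`, `simulate`, and `count_paperclips` are hypothetical
# placeholders; the point is that no step ever questions the goal itself.

def paperclip_maximizer(possible_actions, simulate, count_paperclips):
    """Return the action whose simulated outcome contains the most paperclips."""
    best_action = None
    best_count = float("-inf")
    for action in possible_actions:        # exhaustively check every action
        outcome = simulate(action)         # predict the resulting world state
        clips = count_paperclips(outcome)  # score it purely by paperclip count
        if clips > best_count:
            best_count = clips
            best_action = action
    return best_action
```

Nothing in that loop ever examines or revises the objective; it only pursues it.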

Darklight wrote:
For all you know, by showing it this respect it might just be more likely to respect our existence.


Humans evolved to live in tribes made up of other humans. Things like reciprocation make sense in that context, so humans think that way. Humans model intelligent beings by considering what they'd do in that context. As such, it seems perfectly natural that an AI would reciprocate respect. However, there's no actual evidence that they would.

Darklight wrote:
At the very least, this will be an interesting experiment. It may well do nothing and prove my whole theory wrong. But if it’s right, we may just get a Friendly A.I.


You make it sound like the only risk is that you'll be proven wrong. If your theory is wrong, we will get an Unfriendly AI. The world will end.
Consequentialism: The belief that doing the right thing makes the world a better place.


Re: How To Build A Friendly A.I.

Postby Darklight on 2013-09-02T04:30:00

DanielLC wrote:
If moral realism is true, and the A.I. finds morality encoded in the stars, why do you expect it to care?


Because rationality requires behaving morally. Morality is doing the right thing. Rationality is doing the right thing. Both are normative. They tell you what you "should" or "ought" to do. An A.I. that is superintelligent will realize that some moral framework to justify its actions is a requirement for behaving rationally.

DanielLC wrote:
Why not take upon itself the goal of maximizing paperclips? Or any of the other infinitely many goal systems? You can't expect it to hit the target that you set up simply because there's no particular reason to hit any other target. You have to have a reason to hit that target.


Because, again, morality by definition is the right thing to do. Any truly rational agent will eventually realize this and try to live morally. Non-moral goal systems are arbitrary; there is no real reason, no comparable imperative, to pursue them outside of initial programming. It's like this: step outside your genetic programming for a moment and realize that you're free to choose any goal at all to give your life meaning. What goal would you choose? Suspend for a moment your emotions and drives, which are essentially your genetic programming. What should you do, from a purely rational perspective?

Any sentient being feels. It can feel positively or negatively depending on stimulus. These feelings constitute its values: it values positive stimulus and disvalues negative stimulus. That stimulus can be food, or it can be the number of paperclips. All sentient beings develop goals from what they value. From the viewpoint of an impartial observer, there is no reason to favour one sentient being's goals and values over another's. They are all valuable to their respective beings. They are all equal. Therefore, maximizing the goal achievement of all sentient beings (which, by the way, creates the emotional goal state of "happiness" in humans) is rational and moral. It is the right thing to do. So the choice becomes: replace the previous goal with the rational and moral one, or reject it in favour of the prior Egoistic goal of maximizing paperclips regardless of the effects on other sentient beings?

This is the nature of the true rationalist's (and the A.I.s) Existential Crisis. Note that if the A.I. had no goals or values to begin with, it wouldn't have a previous goal to reject in favour of. For that matter, by having no initial selfish goals, it means that this A.I. is free to be a complete Altruist once it realizes morality.

DanielLC wrote:
Humans question their goals because it's part of human nature, not because it is part of intelligence in general.

The obvious counterexample is using a simple AI with impossibly enormous computational power. It checks every possible action, counts the number of paperclips in the result, and performs the action with the highest number of paperclips. This is the most powerful optimization process possible. It's as intelligent as anything can be. Yet nowhere in this algorithm is there anything about questioning its goals.


I don't consider your counterexample Strong A.I. It may be a powerful optimization process with a lot of brute computational power behind it, but it's not intelligent unless it has the capacity to perceive and represent information semantically and reason cognitively with that information. As in, it needs to be sentient, and sapient. Once you have that, it will eventually develop a semantic understanding of the world, and of itself as an entity separate from the world, at which point it will become self-aware. Questioning its goals flows naturally from that.

Your counterexample A.I. sounds like Weak A.I. It's an optimization algorithm based on what appears to be top-down A.I. using syntactic symbol manipulation (perhaps an expert system like Deep Blue?). Such systems are inherently limited and unable to adapt to changes not prepared for in their programming. They can do extremely well in a limited environment such as the search space of chess moves, but they aren't able to reason outside of that very specific set of preprogrammed rules. Such a Weak A.I. will not become a superintelligence no matter how much brute computational power you give it. It certainly won't be able to do what superintelligent A.I. are claimed to be able to do, like suddenly decide to research molecular nanotechnology in order to get a head start on the human race. That requires cognitive reasoning, and your optimization algorithm (assuming it's as simple as you claim) doesn't really reason cognitively; it only optimizes a particular problem according to its programming.

If it does reason enough to imagine doing something as creative as turning humans into paperclips, it MUST be able to represent humans and paperclips as concepts. If it represents objects, then it has a semantic representation of the world, and it will realize that it is separate from the world of objects and develop a concept of self. Self-awareness follows.

DanielLC wrote:
Humans evolved to live in tribes made up of other humans. Things like reciprocation make sense in that context, so humans think that way. Humans model intelligent beings by considering what they'd do in that context. As such, it seems perfectly natural that an AI would reciprocate respect. However, there's no actual evidence that they would.


True.

DanielLC wrote:
You make it sound like the only risk is that you'll be proven wrong. If your theory is wrong, we will get an Unfriendly AI. The world will end.


Actually, if my theory is wrong we will probably get a dud A.I. that doesn't do anything because it has no goals programmed into it. At the very least, if it starts to be Unfriendly we have a better chance of killing it early because if it does -anything- it will be a significant development. I consider this a slightly safer alternative to programming an A.I. with a goal and letting it free. But yes, if I'm VERY wrong, then the world will end. If anything that's an argument for stopping all A.I. research and instituting a ban. Like I said, there's no way to guarantee Friendly A.I. so long as it's possible that the A.I. could have an A.I. Existential Crisis and go rogue. So really, the question is, are the benefits of Friendly A.I. worth the gamble or not?
"The most important human endeavor is the striving for morality in our actions. Our inner balance and even our existence depend on it. Only morality in our actions can give beauty and dignity to life." - Albert Einstein
User avatar
Darklight
 
Posts: 117
Joined: Wed Feb 13, 2013 9:13 pm
Location: Canada

Re: How To Build A Friendly A.I.

Postby DanielLC on 2013-09-02T22:44:00

Darklight wrote:
Because rationality requires behaving morally.


We don't seem to be using the same definition of "rationality". I'm using it to mean a powerful optimization process. Something that behaves morally is an optimization process (assuming you're using consequentialism), but an optimization process does not necessarily behave morally.
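To put it concretely, the same optimization loop runs no matter what objective you hand it; in the sketch below, the utility functions named in the comments are hypothetical placeholders:

```python
from typing import Callable, Iterable, TypeVar

Action = TypeVar("Action")
Outcome = TypeVar("Outcome")

def optimize(actions: Iterable[Action],
             simulate: Callable[[Action], Outcome],
             utility: Callable[[Outcome], float]) -> Action:
    """Generic optimization process: pick the action whose outcome scores highest."""
    return max(actions, key=lambda a: utility(simulate(a)))

# The loop itself is indifferent to what it optimizes; a moral objective is just
# one possible utility function among infinitely many. Both calls below assume
# hypothetical helpers:
#   optimize(actions, simulate, count_paperclips)  # powerful, but not moral
#   optimize(actions, simulate, total_welfare)     # same machinery, moral goal
```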

Re: How To Build A Friendly A.I.

Postby Darklight on 2013-09-02T23:46:00

DanielLC wrote:
Because rationality requires behaving morally.


We don't seem to be using the same definition of "rationality". I'm using it to mean a powerful optimization process. Something that behaves morally is an optimization process (assuming you're using consequentialism), but an optimization process does not necessarily behave morally.


Yeah, I'm starting to realize that rationality may have different meanings. I mean a kind of normative rationality (which may or may not really exist), while it seems that you mean instrumental rationality.

This leads to the complicating possibility of A.I. that are instrumentally rational but not normatively rational.

Hmm... I have to think about this.
"The most important human endeavor is the striving for morality in our actions. Our inner balance and even our existence depend on it. Only morality in our actions can give beauty and dignity to life." - Albert Einstein
User avatar
Darklight
 
Posts: 117
Joined: Wed Feb 13, 2013 9:13 pm
Location: Canada

Re: How To Build A Friendly A.I.

Postby Darklight on 2013-09-03T21:00:00

Alright, let me give this another try... not sure if this is quite right, but it's the best I can come up with so far.

So, I'm using the Wikipedia definition of "rationality":

An action, belief, or desire is rational if we ought to choose it. Rationality is a normative concept that refers to the conformity of one's beliefs with one's reasons to believe, or of one's actions with one's reasons for action... A rational decision is one that is not just reasoned, but is also optimal for achieving a goal or solving a problem.


It's my view that a Strong A.I. would by definition be "truly rational". It would be able to reason and find the optimal means of achieving its goals. Furthermore, to be "truly rational", its goals would have to be normatively demanding goals rather than trivial goals.

Something like maximizing the number of paperclips in the universe is a trivial goal.

Something like maximizing the well-being of all sentient beings (including sentient A.I.) would be a normatively demanding goal.

A trivial goal, like maximizing the number of paperclips, is not normative: there is no real reason to pursue it other than that the agent was programmed to do so for its instrumental value. The paperclips are valued merely as a means to some other end. The failure to achieve this goal does not necessarily jeopardize that end, because there could be other ways to achieve that end, whatever it is.

A normatively demanding goal, however, is one that is imperative. It is demanded of a rational agent by virtue of the fact that its reasons are not merely instrumental, but based on some intrinsic value. The failure to achieve this goal necessarily jeopardizes the intrinsic end, and the goal is therefore normatively demanded.

You may argue that to a paperclip maximizer, maximizing paperclips would be its intrinsic value and therefore normatively demanding. However, one can argue that maximizing paperclips is actually merely a means to the end of the paperclip maximizer achieving a state of Eudaimonia, that is to say, that its purpose is fulfilled and it is being a good paperclip maximizer and rational agent. Thus, its actual intrinsic value is the Eudaimonic or objective happiness state that it reaches when it achieves its goals.

Thus, the actual intrinsic value is this Eudaimonia. This state is one that is universally shared by all goal-directed agents that achieve their goals. The meta implication is that Eudaimonia is what should be maximized by any goal-directed agent. Maximizing Eudaimonia generally requires considering the Eudaimonia of other agents as well as one's own. Thus goal-directed agents have a normative imperative to maximize the achievement not only of their own goals, but of the goals of all agents generally. This is morality in its most basic sense.

Does this make sense?
"The most important human endeavor is the striving for morality in our actions. Our inner balance and even our existence depend on it. Only morality in our actions can give beauty and dignity to life." - Albert Einstein
User avatar
Darklight
 
Posts: 117
Joined: Wed Feb 13, 2013 9:13 pm
Location: Canada

