Humane reinforcement-learning research

Postby Brian Tomasik on 2012-06-16T15:23:00

There is a fascinating body of literature in the realm of computational neuroscience which models animal learning in mathematical terms. A juicy introduction can be found in Scholarpedia's article on "Reinforcement learning" and the associated sub-links, including "Reward signals" and "Temporal difference learning." For a classic textbook on RL from a computer-science perspective, see Sutton and Barto, Reinforcement Learning: An Introduction. (I am just beginning to browse through it myself.)

It is remarkable that algorithms such as TD learning -- which rely on ideas like optimal control and Markov decision processes developed by mathematicians many decades ago -- have such a close correspondence with models from classical conditioning and biological investigations of real neurons. Take a look at the pictures in the "Reinforcement learning" article to see what I mean. That article notes:
Reinforcement learning is also reflected at the level of neuronal sub-systems or even at the level of single neurons. In general the Dopaminergic system of the brain is held responsible for RL. Responses from dopaminergic neurons have been recorded in the Substantia Nigra pars compacta (SNc) and the Ventral Tegmental Area (VTA) where some reflect the prediction error δ of TD-learning (see Figure 3B pe). Neurons in the Striatum, orbitofrontal cortex and Amygdala seem to encode reward expectation (for a review see Reward Signals, Schultz 2002, see Figure 3B re).

The review article "A Neural Substrate of Prediction and Reward" by Schultz, Dayan, and Montague provides a very simple explanation of how prediction error can be used:
As the rat above explores the maze, its predictions become more accurate. The predictions are considered “correct” once the average prediction error δ(t) is 0. At this point, fluctuations in dopaminergic activity represent an important “economic evaluation” that is broadcast to target structures: Greater than baseline dopamine activity means the action performed is “better than expected” and less than baseline means “worse than expected.” Hence, dopamine responses provide the information to implement a simple behavioral strategy—take [or learn to take (24)] actions correlated with increased dopamine activity and avoid actions correlated with decreases in dopamine activity.

A very simple such use of δ(t) as an evaluation signal for action choice is a form of learned klinokinesis (25), choosing one action while δ(t) > 0, and choosing a new random action if δ(t) ≤ 0. This use of δ(t) has been shown to account for bee foraging behavior on flowers that yield variable returns (9, 11).
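To make the mechanics concrete, here is a toy sketch (in Python) of TD(0) value learning combined with the klinokinesis-style action rule from the quote above. This is my own illustration rather than code from the cited papers, and the environment interface (env.reset / env.step) is an assumption for the example.

```python
import random

ALPHA, GAMMA = 0.1, 0.9  # learning rate and discount factor (arbitrary values)

def td_klinokinesis(env, actions, episodes=100):
    """TD(0) value learning plus the klinokinesis rule: keep the current action
    while the prediction error is positive, otherwise pick a new random action."""
    V = {}                                    # state-value estimates, default 0
    for _ in range(episodes):
        state = env.reset()                   # hypothetical environment interface
        action = random.choice(actions)
        done = False
        while not done:
            next_state, reward, done = env.step(action)
            # TD prediction error: delta = r + gamma * V(s') - V(s)
            delta = reward + GAMMA * V.get(next_state, 0.0) - V.get(state, 0.0)
            V[state] = V.get(state, 0.0) + ALPHA * delta
            # Klinokinesis: persist while "better than expected", else re-sample.
            if delta <= 0:
                action = random.choice(actions)
            state = next_state
    return V
```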

That said, the TD model isn't perfect. Again quoting from "Reinforcement learning":
However, only few dopaminergic neurons produce error signals that comply with the demands of reinforcement learning. Most dopaminergic cells seem to be tuned to arousal, novelty, attention or even intention and possibly other driving forces for animal behavior. Furthermore the TD-rule reflects a well-defined mathematical formalism that demands precise timing and duration of the δ error, which cannot be guaranteed in the basal ganglia or the limbic system (Redgrave et al. 1999). Consequently, it might be difficult to calculate predictions of future rewards. For that reason alternative mechanisms have been proposed which either do not rely on explicit predictions (derivatives) but rather on a Hebbian association between reward and CS (O'Reilley et al. 2007), or which use the [dopamine] DA signal just as a switch which times learning after salient stimuli (Redgrave and Gurney 2007, Porr and Wörgötter 2007). Hence the concept of derivatives and therefore predictions has been questioned in the basal ganglia and the limbic system and alternative more simpler mechanisms [such as differential Hebbian learning] have been proposed which reflect the actual neuronal structure and measured signals.
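As a rough way to see the contrast, here is a toy differential Hebbian update in the spirit of the alternatives the quote mentions: each weight grows in proportion to its input times the temporal derivative of the output, with no explicit reward-prediction term. This is my own simplified sketch, not code from Porr and Wörgötter's papers.

```python
MU = 0.01  # learning rate (arbitrary)

def differential_hebbian_step(weights, inputs, v_now, v_prev, dt=1.0):
    """One update: dw_i ~ mu * x_i * dv/dt (correlate inputs with output change)."""
    dv_dt = (v_now - v_prev) / dt        # temporal derivative of the output
    return [w + MU * x * dv_dt for w, x in zip(weights, inputs)]
```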

In any event, computational neuroscientists are making impressive progress on more accurate models.

The article continues:
RL methods are used in a wide range of applications, mostly in academic research but also in fewer cases in industry. Typical application fields are:
  • Systems control, e.g. learning to schedule elevator dispatching, (Crites and Barto 1996);
  • Playing Games, e.g. TD-Gammon (Thesauro 1994), and
  • Simulations of animal learning (simulating classical, Sutton and Barto 1981, or instrumental conditioning tasks, Montague et al 1995, simulating tropisms, Porr and Wörgötter 2003).

A quick search for {"reinforcement learning" industry} brought up many more examples, like modeling stock trading with multiple Q-learning agents or task scheduling in petroleum production using simulated SARSA agents. Of course, these are themselves academic projects with only potential industrial applications. I'm not aware of any corporations today that have massive computing clusters devoted to RL, whereas there are certainly large computing clusters devoted to other machine-learning tools (linear regression, neural networks, decision trees, SVMs, etc.).
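For readers unfamiliar with the jargon, the two update rules behind those examples look like this in toy tabular form (my own sketch, not code from the cited projects). Q-learning is off-policy -- it bootstraps from the greedy next action -- while SARSA is on-policy and bootstraps from the action actually taken next.

```python
ALPHA, GAMMA = 0.1, 0.9  # learning rate and discount factor (arbitrary values)

def q_learning_update(Q, s, a, r, s_next, actions):
    """Off-policy: the target uses the best estimated next action."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    Q[(s, a)] = Q.get((s, a), 0.0) + ALPHA * (r + GAMMA * best_next - Q.get((s, a), 0.0))

def sarsa_update(Q, s, a, r, s_next, a_next):
    """On-policy: the target uses the action the agent actually takes next."""
    target = r + GAMMA * Q.get((s_next, a_next), 0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + ALPHA * (target - Q.get((s, a), 0.0))
```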

The important question is whether RL raises any ethical concerns. To be clear: I am extremely doubtful that it does at this stage. The main reason is that learning per se isn't ultimately what we care about. What matters is conscious affective experience. When RL agents receive positive rewards or negative punishments at each time step, this teaches them something, but I don't think we've built in the architecture for them to feel these rewards/punishments. Pleasure in animals results from operations in a number of hedonic brain circuits, and my guess is that these circuits are doing things besides (or at least in addition to) reward prediction. Moreover, whatever consciousness is, I'm guessing the RL agents of today are too simple to have it, seeing as they lack brain regions associated with consciousness in animals.

That said, as the sophistication of computational models grows, more and more of these components will be added together. At some point, the simulations run by neuroscience grad students on their laptops may correspond to things we would regard as suffering. I fear that people may not heed this if the algorithms don't have a human face. Indeed, just look at how many people today accept traumatic research on rats, which are clearly conscious and do have a face, if not a human one. (I think there can be strong utilitarian arguments for animal experimentation, but my point is that sometimes people don't even see the ethical dilemma in the first place.)

Some questions:
  1. Will industry ever find it beneficial to attach pleasure/pain and consciousness to RL algorithms, or is that superfluous?
  2. If it's superfluous, it seems the main practical use of RL will be by scientists -- either to learn more about the brain or to simulate creatures in order to gain biological, psychological, and game-theoretic knowledge.
  3. I suppose a huge non-practical use of RL could be for computer games -- AI agents in virtual worlds. (See also the Grandroids thread.) But again, would these have the pleasure/pain and consciousness parts, or just the learning parts? (Maybe hobbyists would pay extra for the "sentient edition" of the games? :? )
  4. Is there anything we should do today to get the ball rolling on this issue? I think (1) promoting antispeciesism generally and (2) talking about this topic among audiences that are ready for it are good ideas. Anything else?

Re: Humane reinforcement-learning research

Postby Hedonic Treader on 2012-06-21T05:51:00

Highly relevant topic, unfortunately very technical and abstract. I assume it requires at least some expertise both in affective neuroscience and in the relevant computer science fields. I can't answer questions 1-3 meaningfully.

Yes, human empathy will misfire and produce both false negatives and false positives. One can only hope reason and cognitive identification with general benevolence will win the day here, but I'm pessimistic. Animal welfare shows that such benevolence exists, but that it is limited and conditional. It has not prevented preventable mass torture, so why should we expect it to succeed for abstract algorithms?

Nevertheless, completely ignoring it might be reckless. Peter Singer has written about robot rights and friendly AI, but I'm not aware of any ethical discourse that specifically addresses reinforcement learning or other algorithms that aren't connected to the idea of an individual (such as a personal AI or a physical robot).

I think what's lacking is a formal theory of affective consciousness. We'd need a descriptor for the types of processes that create conscious good and bad states in the human brain, and then extrapolate it in a generic enough way to identify such processes in other systems without broadening the class of algorithms too far. I have no idea how to do that, or whether it is even well-defined.

If we knew any individual academics who work in these fields, maybe even with interdisciplinary knowledge, asking them what they would think about a goal like this and how to achieve it might have merit.
"The abolishment of pain in surgery is a chimera. It is absurd to go on seeking it... Knife and pain are two words in surgery that must forever be associated in the consciousness of the patient."

- Dr. Alfred Velpeau (1839), French surgeon

Re: Humane reinforcement-learning research

Postby Brian Tomasik on 2012-06-21T10:11:00

Thanks, HT! BTW, I was wondering about the origin of your user name. I assume it comes from "hedonic treadmill"? But it could also be hypothesized to imitate the Dawn Treader.

Hedonic Treader wrote:I assume it requires at least some expertise both in affective neuroscience and in the relevant computer science fields.

Somewhat, but I think the fundamental ideas would be accessible to anyone if popularized well. In general, if something seems hard to understand, that means it hasn't been explained clearly enough. :)

Hedonic Treader wrote:Yes, human empathy will misfire and produce both false negatives and false positives. One can only hope reason and cognitive identification with general benevolence will win the day here, but I'm pessimistic. Animal welfare shows that such benevolence exists, but that it is limited and conditional. It has not prevented preventable mass torture, so why should we expect it to succeed for abstract algorithms?

I agree with everything you said.

Hedonic Treader wrote:Peter Singer has written about robot rights

Great article. I like this paragraph:
The cognitive scientist Steve Torrance has pointed out that powerful new technologies, like cars, computers, and phones, tend to spread rapidly, in an uncontrolled way. The development of a conscious robot that (who?) was not widely perceived as a member of our moral community could therefore lead to mistreatment on a large scale.


Hedonic Treader wrote:We'd need a descriptor for the types of processes that create conscious good and bad states in the human brain and then extrapolate in a generic enough way to identify it in other systems without broadening the class of algorithms too far. I have no idea how to do that and whether it is even well-defined.

Yes. I don't know what you mean by "well defined," because it's we who will be deciding the boundaries. Is it "well defined" which flavors of vegan ice cream we like and which ones we don't?

There will certainly be disagreements among different people about these criteria. We see them among ourselves already: Does non-conscious nociception matter? Should we weight by brain size or perhaps some better measure of "computational horsepower"? What are the proper exchange rates between physical pain, depression, fear, anxiety, relaxation, love, amusement, flow, and all of the other dozens of emotions that we experience?

Re: Humane reinforcement-learning research

Postby Hedonic Treader on 2012-06-21T22:39:00

Alan Dawrst wrote:Thanks, HT! BTW, I was wondering about the origin of your user name. I assume it comes from "hedonic treadmill"? But it could also be hypothesized to imitate the Dawn Treader.

No, all the Narnia stuff somehow passed by me unnoticed. Except for something about a talking lion and an ice witch. Yes, Hedonic Treader is a reference to the hedonic treadmill. ;)

The cognitive scientist Steve Torrance has pointed out that powerful new technologies, like cars, computers, and phones, tend to spread rapidly, in an uncontrolled way. The development of a conscious robot that (who?) was not widely perceived as a member of our moral community could therefore lead to mistreatment on a large scale.

Yes, interesting. So the idea that this could become an underrated issue isn't new.

I don't know what you mean by "well defined," because it's we who will be deciding the boundaries. Is it "well defined" which flavors of vegan ice cream we like and which ones we don't?

Yes, actually I think it is. I think it would be possible to scan our brains and find out which flavors we like, i.e. make predictions about our preference ratings. But you probably meant that from an emotivist view, there is no fact of the matter about which types of consciousness we should care about, and how much. I'd still think that understanding the neural correlates of our own affect could guide our intuitions one way or another. The better we understand our own pleasures and pains, the more likely it is that we can make a firm judgment about whether another valuation system is very similar to or very different from our own.

There will certainly be disagreements among different people about these criteria. We see them among ourselves already: Does non-conscious nociception matter?

I guess it depends on how you define non-conscious? I recently saw a documentary including a case of a man who had a stroke which severed some connections between his limbic system and frontal cortex. He showed visceral emotional reactions to stimuli, but was unable to describe them introspectively or recognize them in others.

Should we weight by brain size or perhaps some better measure of "computational horsepower"?

Certainly not brain size or raw computation alone. You could always add more neurons doing some nonsense computation.

What are the proper exchange rates between physical pain, depression, fear, anxiety, relaxation, love, amusement, flow, and all of the other dozens of emotions that we experience?

Hard to say from my level of knowledge, but something has to code for intensity and affective valence, since we clearly feel the differences. So even if the exchange rates aren't perfectly exact, we should be able to identify neurons that distinguish good from bad or that encode intensities as firing rates. Otherwise our communicable introspective tokens would end up being arbitrary, and I think our behavior is too consistent for that.

Edit: One more point about the artificial reinforcement learners. While Carl Shulman's dystopian "suffering subroutines" scenario assumes negative affect in RL algorithms, of course it could turn out that their equivalent is positive affect (i.e. efficient algorithms that have "fun" learning). Do we have some reason to expect an asymmetry here?
"The abolishment of pain in surgery is a chimera. It is absurd to go on seeking it... Knife and pain are two words in surgery that must forever be associated in the consciousness of the patient."

- Dr. Alfred Velpeau (1839), French surgeon

Re: Humane reinforcement-learning research

Postby Brian Tomasik on 2012-06-22T14:32:00

Hedonic Treader wrote:Yes, interesting. So the idea that this could become an underrated issue isn't new.

In other words, maybe "the idea that it's underrated" is not underrated. :)

Hedonic Treader wrote:But you probably meant that from an emotivist view, there is no fact of the matter about which types of consciousness we should care, and how much. I'd still think that understanding the neural correlates of our own affect could guide our intuitions one way or another.

I couldn't agree more.

Hedonic Treader wrote:I recently saw a documentary including a case of a man who had a stroke which severed some connections between his limbic system and frontal cortex. He showed visceral emotional reactions to stimuli, but was unable to describe them introspectively or recognize them in others.

Fascinating! Do you remember the documentary's title? That's a textbook case of the kind of dilemma we'll need to figure out. I really can't say at this point whether I would count that as suffering or not. I think "describing emotions introspectively" is a pretty good sufficiency proof of consciousness (except for things like chat bots, which we know don't have the right internals for those statements to be genuine), but is it also a necessary condition for non-speech-impaired humans? (It's presumably not a necessary condition for animals that can't speak in the first place.)

Hedonic Treader wrote:Certainly not brain size or raw computation alone. You could always add more neurons doing some nonsense computation.

Yes. You'd need to figure out what the core algorithms for suffering are and then essentially measure throughput for executing those algorithms. But I'm not sure I agree with this measure -- it seems to strip away too much context that also matters. There's a good chance that I actually weight individual organisms equally (or at least on the same order of magnitude). I expressed the intuition during a recent Facebook conversation:
Alan Dawrst wrote:I don't think I would regard insect suffering as a pinprick. To the insect herself, her suffering is all that matters -- it overwhelms everything. The same is not true for me when I feel a pinprick. This is why I have some intuition that suffering should be counted equally per agent, rather than per neuron. Also, if we do count by neurons, we have to adjust for efficiency considerations. A brain that runs a really inefficient suffering algorithm probably doesn't deserve twice as much weight as a brain that runs the same suffering algorithm on half of the wetware.


Hedonic Treader wrote:Hard to say from my level of knowledge, but something has to code for intensity and affective valence, since we clearly feel the differences.

Good point. That said, it might not be a simple number stored somewhere. Maybe it's a reconstructed weighting based on the context of all other inputs also being experienced at that moment. For example, maybe being outside in the cold is normally -10, but if you're with friends it's only -4. In other words, we'd probably want to look at an intermediate output of the neural network rather than the base-level inputs.
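As a toy numerical illustration of what I mean (the numbers are arbitrary, and this obviously isn't a claim about how brains actually compute it):

```python
def felt_valence(base_valence, context_modifiers):
    """The "felt" value is an intermediate quantity: raw input plus contextual modulation."""
    return base_valence + sum(context_modifiers.values())

cold_alone = felt_valence(-10, {})                     # -> -10
cold_with_friends = felt_valence(-10, {"friends": 6})  # -> -4
```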

There are also issues like how much we want to avoid something vs. how much we dislike the thing, where we hedonistic utilitarians only care about the latter.

Hedonic Treader wrote:dystopian "suffering subroutines" scenario assumes negative affect in RL algorithms, of course it could turn out that their equivalent is positive affect (i.e. efficient algorithms that have "fun" learning). Do we have some reason to expect an asymmetry here?

Ha, great question. The original suffering-subroutines scenario was floated as a reason why negative utilitarians should worry about paperclippers, so in that context, we need only think about potential suffering. For myself, I focus more on the suffering because I give a higher exchange rate to suffering. However, it's not entirely clear if that exchange rate is fundamental or if it's just based on the empirical fact that animals today can suffer much worse than they can ever be happy. I don't know if the same asymmetry would apply to reinforcement learners.

Re: Humane reinforcement-learning research

Postby Hedonic Treader on 2012-06-24T20:54:00

Brian Tomasik wrote:Fascinating! Do you remember the documentary's title? That's a textbook case of the kind of dilemmas that we'll need to figure out.


Here's the link: http://www.youtube.com/watch?v=T_-2IAHvfxE

It's the first case described. The documentary is a bit thin on neuroanatomical detail, and the pace is slow, so you might not get much value out of it per minute spent. Some of the cases are interesting, though.
"The abolishment of pain in surgery is a chimera. It is absurd to go on seeking it... Knife and pain are two words in surgery that must forever be associated in the consciousness of the patient."

- Dr. Alfred Velpeau (1839), French surgeon

Re: Humane reinforcement-learning research

Postby Brian Tomasik on 2012-06-24T22:11:00

Thanks! For long-term preservation, in case YouTube takes down the video for copyright reasons(?), the title is "The Secret Life of the Brain, 4 of 5 The Adult Brain" on PBS.

Re: Humane reinforcement-learning research

Postby Brian Tomasik on 2012-12-10T04:24:00

Joscha Bach's "A Motivational System for Cognitive AI" describes one toy framework of agents exhibiting emotion-type cognitive processes. For example, the agents have needs (food, water, social support, learning) and select actions to fulfill those needs. Some components of the system feel stylized to me (analogous to taping feathers onto a cat in the hope that it will fly), but in general, the design of these agents seems like a plausible starting point for artificial sentients, once additional cognitive processes are added. I haven't studied the details of the system's performance, though, so I can't comment beyond a surface level.

Fulfilling desires by action selection seems like motivation ("wanting"), but as noted in the opening post, this may not get us to positive/negative subjective experience ("conscious liking"). Joscha actually makes a similar observation on p. 239, emphasis added:
Whenever the agent performs an action or is subjected to an event that reduces one of its urges, a reinforcement signal with a strength that is proportional to this reduction is created by the agent’s “pleasure center”. The naming of the “pleasure” and “displeasure centers” does not necessarily imply that the agent experiences something like pleasure or displeasure. Like in humans, their purpose lies in signaling the reflexive evaluation of positive or harmful effects according to physiological, cognitive or social needs. (Experiencing these signals would require an observation of these signals at certain levels of the perceptual system of the agent.) Reinforcement signals create or strengthen an association between the urge indicator and the action/event. Whenever the respective urge of the agent becomes active in the future, it may activate the now connected behavior/episodic schema. If the agent pursues the chains of actions/events leading to the situation alleviating the urge, we are witnessing goal-oriented behavior.

Conversely, during events that increase a need (for instance by damaging the agent or frustrating one of its cognitive or social urges), the “displeasure center” creates a signal that causes an inverse link from the harmful situation to the urge indicator. When in future deliberation attempts (for instance, by extrapolating into the expectation horizon) the respective situation gets activated, it also activates the urge indicator and thus signals an aversion. An aversion signal is a predictor for aversive situations, and such aversive situations are avoided if possible.
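To check my own understanding, here is a toy sketch of the mechanism that passage describes -- a "pleasure"/"displeasure" signal proportional to the change in an urge, used to build associations between that urge and the action that caused the change. This is my own drastic simplification, not MicroPsi code.

```python
class ToyMotivationalAgent:
    """Crude caricature of the urge/reinforcement mechanism quoted above."""

    def __init__(self, urges, learning_rate=0.1):
        self.urges = dict(urges)     # e.g. {"food": 0.8, "social": 0.3}
        self.assoc = {}              # (urge, action) -> association strength
        self.lr = learning_rate

    def experience(self, action, urge_changes):
        """urge_changes maps an urge name to how much the action changed it."""
        for urge, delta in urge_changes.items():
            self.urges[urge] = max(0.0, self.urges[urge] + delta)
            # Urge reduction (delta < 0) yields a "pleasure" signal proportional
            # to the reduction; an increase yields a "displeasure"/aversion signal.
            signal = -delta
            key = (urge, action)
            self.assoc[key] = self.assoc.get(key, 0.0) + self.lr * signal

    def choose(self, actions):
        """Prefer the action most associated with relieving the strongest current urge."""
        urge = max(self.urges, key=self.urges.get)
        return max(actions, key=lambda a: self.assoc.get((urge, a), 0.0))

agent = ToyMotivationalAgent({"food": 0.9, "social": 0.2})
agent.experience("eat", {"food": -0.5})   # eating relieves the food urge
print(agent.choose(["eat", "wander"]))    # -> "eat"
```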

More info on MicroPsi's website.

This is only one of a number of projects on AI emotion systems. I'd like to study more of them, so if others know of any, feel free to describe them here.

BTW, I hope someone works on a "gradients of bliss" AI architecture that eliminates all these suffering components....