There is a fascinating body of literature in the realm of computational neuroscience which models animal learning in mathematical terms. A juicy introduction can be found in Scholarpedia's article on "Reinforcement learning" and the associated sub-links, including "Reward signals" and "Temporal difference learning." For a classic textbook on RL from a computer-science perspective, see Sutton and Barto, Reinforcement Learning: An Introduction. (I am just beginning to browse through it myself.)
It is remarkable that algorithms such as TD learning -- which rely on ideas like optimal control and Markov decision processes developed by mathematicians many decades ago -- have such a close correspondence with models from classical conditioning and biological investigations of real neurons. Take a look at the pictures in the "Reinforcement learning" article to see what I mean. That article notes:
Reinforcement learning is also reflected at the level of neuronal sub-systems or even at the level of single neurons. In general the Dopaminergic system of the brain is held responsible for RL. Responses from dopaminergic neurons have been recorded in the Substantia Nigra pars compacta (SNc) and the Ventral Tegmental Area (VTA) where some reflect the prediction error δ of TD-learning (see Figure 3B pe). Neurons in the Striatum, orbitofrontal cortex and Amygdala seem to encode reward expectation (for a review see Reward Signals, Schultz 2002, see Figure 3B re).
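To make the δ in that passage concrete, here is a tiny TD(0) sketch -- the chain of states, the placement of the reward, and all of the constants are my own toy choices, not anything from the article:

```python
import numpy as np

# Toy illustration of the TD(0) prediction error delta mentioned above.
# A short chain of states with a reward only at the end; all values invented.
n_states = 5            # states 0..4; state 4 is terminal and rewarded
gamma = 0.9             # discount factor
alpha = 0.1             # learning rate
V = np.zeros(n_states)  # learned value estimates ("reward predictions")

def td_update(s, r, s_next):
    """One TD(0) step: delta = r + gamma * V(s') - V(s), the quantity
    that dopaminergic firing is said to resemble in the quoted passage."""
    delta = r + gamma * V[s_next] - V[s]
    V[s] += alpha * delta
    return delta

# Walk down the chain repeatedly; only the transition into the last state pays off.
for episode in range(200):
    for s in range(n_states - 1):
        r = 1.0 if s + 1 == n_states - 1 else 0.0
        td_update(s, r, s + 1)

print(V)  # earlier states come to "predict" the discounted terminal reward
```

After enough passes, the values of the earlier states climb toward the discounted reward and δ shrinks toward zero, which is the "predictions become more accurate" story in the quote below.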
The review article "A Neural Substrate of Prediction and Reward" by Schultz, Dayan, and Montague provides a very simple explanation of how prediction error can be used:
As the rat above explores the maze, its predictions become more accurate. The predictions are considered “correct” once the average prediction error δ(t) is 0. At this point, fluctuations in dopaminergic activity represent an important “economic evaluation” that is broadcast to target structures: Greater than baseline dopamine activity means the action performed is “better than expected” and less than baseline means “worse than expected.” Hence, dopamine responses provide the information to implement a simple behavioral strategy—take [or learn to take (24)] actions correlated with increased dopamine activity and avoid actions correlated with decreases in dopamine activity.
A very simple such use of δ(t) as an evaluation signal for action choice is a form of learned klinokinesis (25), choosing one action while δ(t) > 0, and choosing a new random action if δ(t) ≤ 0. This use of δ(t) has been shown to account for bee foraging behavior on flowers that yield variable returns (9, 11).
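Here is roughly what I take that klinokinesis rule to look like in code. This is only a sketch, with a made-up action set and a made-up sequence of prediction errors:

```python
import random

# Hedged sketch of the learned-klinokinesis rule quoted above: persist with the
# current action while the prediction error delta stays positive, and pick a new
# action at random once it drops to zero or below. Names here are hypothetical.
actions = ["forward", "left", "right"]
current_action = random.choice(actions)

def choose_action(delta):
    """Keep the current action while delta > 0; re-randomize when delta <= 0."""
    global current_action
    if delta <= 0:
        current_action = random.choice(actions)
    return current_action

# Example: feed the rule an invented sequence of prediction errors.
for delta in [0.5, 0.2, -0.1, 0.3, -0.4]:
    print(delta, choose_action(delta))
```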
That said, the TD model isn't perfect. Again quoting from "Reinforcement learning":
However, only few dopaminergic neurons produce error signals that comply with the demands of reinforcement learning. Most dopaminergic cells seem to be tuned to arousal, novelty, attention or even intention and possibly other driving forces for animal behavior. Furthermore the TD-rule reflects a well-defined mathematical formalism that demands precise timing and duration of the δ error, which cannot be guaranteed in the basal ganglia or the limbic system (Redgrave et al. 1999). Consequently, it might be difficult to calculate predictions of future rewards. For that reason alternative mechanisms have been proposed which either do not rely on explicit predictions (derivatives) but rather on a Hebbian association between reward and CS (O'Reilley et al. 2007), or which use the [dopamine] DA signal just as a switch which times learning after salient stimuli (Redgrave and Gurney 2007, Porr and Wörgötter 2007). Hence the concept of derivatives and therefore predictions has been questioned in the basal ganglia and the limbic system and alternative more simpler mechanisms [such as differential Hebbian learning] have been proposed which reflect the actual neuronal structure and measured signals.
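For flavor, here is a very rough sketch of the differential Hebbian idea as I understand it. The low-pass stimulus trace, the signals, and the constants are my own simplifications, not the actual ISO-learning model of Porr and Wörgötter:

```python
import numpy as np

# Rough sketch of a differential Hebbian rule: the weight on the predictive
# (CS-like) input changes in proportion to a decaying trace of that input times
# the *change* in the neuron's output, with no explicit TD prediction error.
# All signals and constants are invented for illustration.
T = 100
cs = np.zeros(T); cs[20:30] = 1.0          # early, initially neutral stimulus
reward = np.zeros(T); reward[40:45] = 1.0  # later reward-like input
w_cs, w_reward, mu = 0.0, 1.0, 0.01        # reward pathway fixed; CS weight learned

for trial in range(50):                    # repeated CS-reward pairings
    cs_trace, v_prev = 0.0, 0.0
    for t in range(T):
        cs_trace = 0.95 * cs_trace + cs[t]          # slowly decaying CS trace
        v = w_cs * cs_trace + w_reward * reward[t]  # neuron output
        dv = v - v_prev                             # discrete-time derivative
        w_cs += mu * cs_trace * dv                  # differential Hebbian update
        v_prev = v

print(w_cs)  # ends up positive: the CS trace is larger at reward onset than at offset
```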
In any event, computational neuroscientists are making impressive progress on more accurate models.
The article continues:
RL methods are used in a wide range of applications, mostly in academic research but also in fewer cases in industry. Typical application fields are:
- Systems control, e.g. learning to schedule elevator dispatching, (Crites and Barto 1996);
- Playing Games, e.g. TD-Gammon (Thesauro 1994), and
- Simulations of animal learning (simulating classical, Sutton and Barto 1981, or instrumental conditioning tasks, Montague et al 1995, simulating tropisms, Porr and Wörgötter 2003).
A quick search of {"reinforcement learning" industry} brought up many more examples, like modeling stock trading with multiple Q-learning agents or task scheduling in petroleum production using simulated SARSA agents. Of course, these are themselves academic projects with only potential industrial applications. I'm not aware of any corporations today that have massive computing clusters devoted to RL, while there are certainly large computing clusters devoted to certain other machine-learning tools (linear regression, neural networks, decision trees, SVMs, etc.).
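For readers who haven't seen them, the Q-learning and SARSA updates that such projects build on are each a single line. Here is a toy sketch in which the corridor environment and all of the parameters are invented for illustration:

```python
import numpy as np

# Tabular sketch contrasting the Q-learning and SARSA updates. The environment
# is a short corridor: action 1 ("right") moves toward a rewarded end state.
np.random.seed(1)
n_states, n_actions = 6, 2
gamma, alpha, eps = 0.95, 0.1, 0.1
Q = np.zeros((n_states, n_actions))

def step(s, a):
    """Move left or right along the corridor; reward 1 on reaching the end."""
    s2 = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == n_states - 1 else 0.0)

def eps_greedy(s):
    """Epsilon-greedy action choice with random tie-breaking."""
    if np.random.rand() < eps:
        return np.random.randint(n_actions)
    best = np.flatnonzero(Q[s] == Q[s].max())
    return int(np.random.choice(best))

for episode in range(500):
    s, a = 0, eps_greedy(0)
    while s != n_states - 1:
        s2, r = step(s, a)
        a2 = eps_greedy(s2)
        # Q-learning bootstraps from the best next action (off-policy):
        #   Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        # SARSA bootstraps from the action actually taken next (on-policy):
        Q[s, a] += alpha * (r + gamma * Q[s2, a2] - Q[s, a])
        s, a = s2, a2

print(Q)  # "right" ends up valued more highly than "left" in every non-terminal state
```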
The important question is whether RL raises any ethical concerns. To be clear: I am extremely doubtful that it does at this stage. The main reason is that learning per se isn't ultimately what we care about. What matters is conscious affective experience. When RL agents receive positive rewards or negative punishments at each time step, this teaches them something, but I don't think we've built in the architecture for them to feel these rewards/punishments. Pleasure in animals results from operations in a number of hedonic brain circuits, and my guess is that these circuits are doing things besides (or at least in addition to) reward prediction. Moreover, whatever consciousness is, I'm guessing the RL agents of today are too simple to have it, seeing as they lack brain regions associated with consciousness in animals.
That said, as the sophistication of computational models grows, more and more of these components will be added together. At some point, the simulations run by neuroscience grad students on their laptops may correspond to things we would regard as suffering. I fear that people may not heed this if the algorithms don't have a human face. Indeed, just look at how many people today accept traumatic research on rats, which are clearly conscious and do have a face, if not a human one. (I think there can be strong utilitarian arguments for animal experimentation, but my point is that sometimes people don't even see the ethical dilemma in the first place.)
Some questions:
- Will industry ever find it beneficial to attach pleasure/pain and consciousness to RL algorithms, or is that superfluous?
- If the latter, it seems the main practical use of RL will be by scientists -- either to learn more about the brain or to simulate creatures to gain biological, psychological, and game-theoretic knowledge.
- I suppose a huge non-practical use of RL could be for computer games -- AI agents in virtual worlds. (See also the Grandroids thread.) But again, would these have the pleasure/pain and consciousness parts, or just the learning parts? (Maybe hobbyists would pay extra for the "sentient edition" of the games?)
- Is there anything we should do today to get the ball rolling on this issue? I think (1) promoting antispeciesism generally and (2) talking about this topic among audiences that are ready for it are good ideas. Anything else?