Cite as in K.Pribram, ed, Origins: Brain and Self-Organization, Hillsdale NJ: Erlbaum, 1994, p.680-706.

 

Additional comments since publication appended to the end. (See the book for the actual figures.)

 

 

 

              The Brain as a Neurocontroller:

New Hypotheses and New Experimental Possibilities

 

 

                                                                                          Paul J. Werbos

                                                                    Room 675, National Science Foundation*

                                                                             Arlington, Virginia, USA 22230

                                                                                        pwerbos@nsf.gov

 

This paper will describe how a new body of mathematics -- initially motivated by neuroscience but developed in recent years through engineering applications -- can begin to yield a predictive, empirical understanding of the phenomenon of intelligence in the brain. The paper is mainly written for neuroscientists, or for engineers working with neuroscientists; it tries to describe crucial new experiments which need to be performed in order to test and refine this new understanding.

                The biggest single obstacle to the full use of mathematics in real neuroscience is the sheer difficulty of the relevant mathematics. The brain is far more complex than today's computers; therefore to understand it, one must use even more sophisticated mathematics than the average research engineer is familiar with. Because of this difficulty, a few "middle men" have presented oversimplified descriptions of biology to the engineers, and oversimplified descriptions of the engineering to the biologists. These oversimplifications have often led to considerable misunderstanding and justified mistrust.

                Because of these communications problems, this paper will be written in an extremely informal style. It will consist mainly of the transcript of a one-hour talk, edited for readability, with a few critical updates inserted. The first section will explain the fundamental approach, and move directly to the "bottom line" -- to some specific areas where new experiments are badly needed. The next two sections will discuss the underlying theory and mathematics in more detail. The second section will discuss the issue of supervised learning, which can shed light on local circuits within the brain. The final section will discuss the major concepts of neurocontrol, which can shed light on the global organization which unifies these local circuits into a truly intelligent system.

                                                                                                       

 

INTRODUCTION AND OVERVIEW

 

Goals of This Talk

 

I really am grateful to speak for once to an audience that is said to have a lot of physiologists in it. I wish I had more chances to do this, because I think that some of the things that we've learned on the engineering side lead to some very interesting experimental possibilities on the physiological side; if we had more chances to talk to each other, we could learn a lot more about experiments which nobody is doing which could lead to some very exciting results in the future. That is what I would really like to talk about today. 

 

 _______________________________________________________________________________

*The views herein are those of the author, not those of NSF; however, as government work, it is in the public domain conditional upon proper citation. This is an updated version of a paper in Computational Neuroscience Symposium 1992, edited by M.Penna, S.Chittajalu and P.G. Madhavan, available from Madhavan at the Electrical Engineering Department at IUPUI in Indianapolis, Indiana.

 

 

 

                Now, because it is late in the day, I figured it might be useful for me to summarize everything I am going to say in one list, so that you can see that it is finite, anyway. I'm basically going to try to make four major points today:

 

(1) First, I'm going to argue that we can understand intelligence or the brain in the same kind of mathematical way  that we understand physics, as a real science. I'm not saying we're there yet, but I think it can be done.  

 

(2) Second, I'm going to argue that neurocontrol gives us new mathematics, which is the mathematics we need in  order to understand the brain mathematically. 

 

(3) I'm going to argue that neurocontrol has made enormous progress in the last few years, in terms of new  engineering applications, new mathematical designs and ideas, and new links to the brain.  Jim Bower has described this process as a kind of convergent evolution.  If you look at the simple‑minded neural nets you see in a lot of the neural net conferences, they don't have much connection to biology. But when you look at people who have to solve really difficult, hard engineering control problems, they're driven to some of the same  complexities we observe in the brain.  So I would argue concretely there are signs of convergent evolution. 

           

(4) Finally, and most important, what we have now learned about what the brain might be doing suggests new opportunities for experiment. It suggests some surprising predictions. If the predictions are right, then you can use experiments to surprise a lot of people and have fun changing the culture, and if they're wrong, you can surprise a lot of mathematicians and come up with some new computational principles that people think are impossible. So either way, it's really important.

           

A caveat here is that as an NSF program director I'm not telling you that I've got a lot of money for this. In fact, I'm not allowed to spend money on things other than neuroengineering; my present budget is too small to allow anything more. I think that this is a very unfortunate situation, because if we're going to try to understand the human mind and human learning -- subjects of truly enormous importance -- then we have got to bring these things together; but right at the moment, there's essentially zero dollars available for the specific kind of two-way cooperation I'll be talking about today. I really wish somebody could fix that. (As this book goes to press, the Biology Directorate at NSF is preparing a Collaborative Research Initiative which could help fill this vacuum; however, the exact role of Engineering in that initiative is not yet clear.)

                If this were an audience of policy people, or people who talk to their congressman, I would spend ninety percent of my time up here on items number one and two on my list. I could spend a good hour on this -- on the theory and the philosophy and all of that. If this were an engineering audience, I would talk about the applications and the designs; I have done that for about eight hours at a stretch. Here, though, I am going to try to jump ahead to the brain stuff, which is a little risky. You have to bear in mind that the kind of mathematics that's relevant to the brain is not the easy stuff. The kind of math you can totally understand in twenty minutes -- that isn't what relates to the brain. The brain is a little more complicated, so I'm going to have to jump over some stuff and give some citations.

 

Can we Understand the Brain Mathematically?:

Prospects for a Newtonian Revolution

 

Before I get going, though, I really do want to say a little bit about the generalities here. 

                I suspect that a lot of the people in neuroscience started out by wanting to understand the human mind. They really wanted to understand something fundamental and important. But then they ran into a problem. Do you remember the old saying, "When you're up to your knees in alligators, it's hard to remember that your goal was to drain the swamp"? All of us have that problem, from time to time. I suspect that a lot of neuroscientists discover, as time goes on, that the brain is so complex that they lose hope of figuring it out in their own lifetimes. Some people have made a formal philosophy of that; they say, "look, the information content in any one brain is more complex than what I have spare neurons to understand, so by definition I cannot understand another brain, let alone everybody's brain."

 

                But let us think about that idea a little more carefully.

                If you try to know all of the synapse strengths, the connections, the state of all the networks in somebody's brain, and the reverberatory dynamics -- then of course, that is too complicated to ever understand in your life. There is no way that all of those details can be fully known scientifically. There will always be lots and lots of islands of understanding, and those islands are useful.  We've seen good examples of studying connections even here today. But they don't tell you how intelligence works as a whole system. They're just little islands.  And that is very discouraging. 

                But think back, how did physicists solve this problem, how did physics become a science? Basically there was this guy Isaac Newton, and what did he do? Instead of trying to describe every physical object in the universe, physicists gave up on that, and they said "let us try to understand instead the simple underlying dynamics which change all of that complicated stuff over time." Maybe all of these complicated things are governed by something simple enough that you can understand it. In physics, "simple enough" meant a page of equations and a thousand pages of explanation -- not trivial, but understandable.

                My argument is that the same kind of approach could work on the brain if you think of learning as the dynamics.  There is every reason to believe that underneath the complexity in the cerebral cortex and so on, there is a generalized, modular plasticity. Lashley has shown this, and I've heard of recent experiments where they've trained linguistic cortex to develop edge detectors just by wiring it up differently. It's very clear that there is a uniform, generalized modularity there in the interesting parts of the brain, which ought to be understandable if we focus on the learning, the plasticity[1]. Knowing the laws of learning would not immediately tell us a lot of specialized things about how we process specific sensory inputs in specific ways, but physicists have found that if you understand the underlying dynamic laws that control everything else, that's incredibly important later on when you try to do engineering. 

                So let us try to see if we can create ‑‑ I think we can create, in principle -- a Newtonian revolution, by focusing on the basic laws of learning in the high‑level, modular organs like the  cerebral cortex, the limbic lobes, the cerebellum, the olive, and so on. We won't ever understand the motor pools that way; they're like ad hoc preprocessors and postprocessors. But the really important stuff we can understand, in principle. 

                But of course you can't do that unless you have the right math. Newton had derivatives. Well, we too have a new flavor of derivatives; that's what backpropagation is really all about, at its base [2], but backpropagation is only one small part of this large area that I'm calling neurocontrol. Years ago, there was great work by David Robinson comparing conventional control theory with what goes on in the vestibular system. His early work was a really great example of cross-disciplinary collaboration, really applying mathematical concepts in detail, but it did not really focus on the issue of learning (unlike his newer work). It really scares me that the delay between developing the necessary mathematics and control theory in the engineering context and applying it to the eye tracking system may have been something like 20 to 40 years. Our new math -- neurocontrol -- is only about two years old, in the engineering world. I hope we don't have to wait 40 years before we start to apply it to the brain! If we follow the normal course of government funding and human inertia, we might well wait 40 years; however, the mathematics is available today, if we have the will to work a little harder, do what's unnatural, and go ahead with it.
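To make the remark about derivatives concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the formulation given in [2]: the two-weight tanh chain, the squared-error measure, and all of the function names (forward, loss, backprop_gradient, numeric_gradient) are hypothetical choices made only for this example. The sketch computes the derivatives of an error measure with respect to both weights in a single backward sweep through the chain rule, and then checks them against brute-force finite differences.

```python
import math

def forward(w1, w2, x):
    """Run a two-step chain (assumed for illustration) and keep the intermediate value."""
    h = math.tanh(w1 * x)   # hidden activation
    y = math.tanh(w2 * h)   # output
    return h, y

def loss(w1, w2, x, target):
    """Squared error between the output and a target (an assumed error measure)."""
    _, y = forward(w1, w2, x)
    return 0.5 * (y - target) ** 2

def backprop_gradient(w1, w2, x, target):
    """Derivatives of the loss with respect to w1 and w2 from one backward sweep."""
    h, y = forward(w1, w2, x)
    d_y = y - target                 # derivative of the loss with respect to y
    d_net2 = d_y * (1.0 - y * y)     # pass back through the output tanh
    d_w2 = d_net2 * h                # derivative with respect to w2
    d_h = d_net2 * w2                # derivative flowing back to the hidden value
    d_net1 = d_h * (1.0 - h * h)     # pass back through the hidden tanh
    d_w1 = d_net1 * x                # derivative with respect to w1
    return d_w1, d_w2

def numeric_gradient(w1, w2, x, target, eps=1e-6):
    """The same derivatives by central finite differences, one weight at a time."""
    g1 = (loss(w1 + eps, w2, x, target) - loss(w1 - eps, w2, x, target)) / (2 * eps)
    g2 = (loss(w1, w2 + eps, x, target) - loss(w1, w2 - eps, x, target)) / (2 * eps)
    return g1, g2

if __name__ == "__main__":
    print(backprop_gradient(0.5, -1.2, 0.8, 0.3))
    print(numeric_gradient(0.5, -1.2, 0.8, 0.3))
```

The two printed pairs should agree closely. The finite-difference check needs two extra evaluations of the system per weight, while the backward sweep yields all of the derivatives at about the cost of one more pass; that difference is what matters as the number of weights grows.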

                I'll talk more about these issues later on, when I discuss recent progress in neurocontrol.

 

Neuroengineering and Neuroscience:

What is the Basis for Collaboration?

 

Let me move ahead now to the first slide (Figure 1 on the next page).  This is again a generality slide.  Like everybody else here I'm arguing that we need interdisciplinary cooperation, but I'd like to say a little bit about where the problem is, because we need to do more than just say  interdisciplinary cooperation is needed; we need to have a concrete image in our heads of what it's about, or else we'll never be able to implement it. 

                A lot of people are excited because folks in the neuroscience side of the world studying the brain are now using neural network models. They are building up the field of computational neuroscience, which still belongs on the left-hand side of Figure 1. In computational neuroscience, we describe the brain by use of differential equations or other mathematical models, instead of just verbal anecdotes and whatnot. That's exciting. On the right-hand side of Figure 1, in neuroengineering, we are using neural network systems to solve real‑world engineering problems; that's also very exciting.

 

 

 

                                                                 Figure 1. An NSF Definition of Neuroengineering

               

                But the problem is this: what is the connection between the left and right sides of Figure 1? Even in today's symposium, which is very interdisciplinary, it is pretty easy to classify most of the talks into who is doing computational neuroscience and who is doing engineering applications. It's like a gulf. And what's the problem? What's happening is that we're both using neural network models, but one group is using as its standard of validation: "Does the model fit the low‑level circuit and the empirical data down at the low‑level circuit?" Maybe more than that. But in engineering, the test is: "Does it work?" So we have two different communities, based on two different standards of validation. But in reality, the brain itself meets both tests. The real circuits not only fit their own biological data, they also work in solving very complex control challenges. Instead of having two communities, using two different standards of validation to inspire and to evaluate their work, we need to think of using both standards of validation together. And that's how we can get feedback back and forth here. I won't elaborate on this today; this is just a matter of general principles. As I said before, unfortunately, my tiny bit of money is entirely on the neuroengineering side, and that's something that needs to be changed.

 

Neural Nets and Neurocontrol:

Where Is the Right Mathematics?

 

A lot of people are worried that the artificial neural network (ANN) community, the engineering community, is itself caught in a kind of local minimum. It is true that 90% of the papers you see in a neural network conference these days talk about pattern recognition, and what are they actually doing?  Usually, they are doing pattern classification, using associative memory or other simple systems. Usually they are "training" ANNs to match databases which contain definite targets for what the output of the ANN should be, for every single example in the database.  There are lots of uses for this kind of task.  But that's not intelligence.  That's not consciousness, that's not what the mind does.  We humans are not just simple classification machines! This really ought to be obvious to anyone.

                This situation is kind of scary; you have to ask what is the relevance of that stuff? Now, I'm not going to talk today about consciousness or the mind/body problem; if I'm brave at SMC on Monday I'll talk about that[3,4,5], but here I'm going to focus on physiology.

                If we agree that neuroengineering has been caught in a kind of local minimum or intellectual rut, then what is the way to get out of that local minimum? If you forgive a pun, I will argue that we can get out of that local minimum by climbing out, by climbing up a ladder -- and here's the ladder (Figure 2, on the next page), the ladder of designs of neurocontrol. 

                Again, let me warn you that this is just a quick overview; I'll be giving you citations to more detailed information later on. There are many, many designs in this emerging area of neurocontrol, which I define as the use of well‑specified neural nets -- either natural or artificial, just mathematically‑defined neural nets -- to generate control outputs, which could be to motors, muscles, glands, stock transactions or whatever, but real actions in the real world.

                                                                 Figure 2. The Ladder of Designs in Neurocontrol

                In the neurocontrol field, we do have very simple designs, and these are the most popular. They're easy to do; they're a great start for people who want to get their students going, and start to build up software. This is the right place to start, but these designs do have very limited power, and they certainly are not like the brain. 

                In the middle level of the ladder, we have what I call the state‑of‑the‑art group, and I'd say there are about four groups that are really in this category. It's curious that industry is here more than academia; I don't know why. Are university people scared to do new things? I don't know, but these state‑of‑the‑art groups have mostly taken a couple of years to build up; maybe that's the problem, that you've got to keep your students around long enough, and build up modular software packages. After a couple of years of struggling, these groups have gotten real‑world applications just this year, of things that were only on paper two or three years ago. And they have proven -- with really exciting, important applications -- that these more advanced, more brain‑like methods are far more powerful in solving real-world problems. There are just incredibly important engineering problems that have been solved that are in the mill; again, I won't talk about this a lot today, but I may make some reference to it.

                After these methods on paper were used in real applications, it was a challenge to us theorists to move ahead of the applications people and come up with new methods to overcome the limits of the older ones, so that now on paper this year there are new methods which did not exist two or three years ago.  And now, on paper, it looks as if these designs and ideas really have the potential to achieve true brain‑like intelligence. So my bottom line is that at least on paper we now  have the math we need to understand real intelligence.  I'm not saying that these ideas are working yet on real systems, and that's what I try to pay people to do, to climb up this ladder with real engineering systems.

                By the way, I'm saying that the bottom level of the ladder is a good place to start, but when I fund people, the higher they can go up the ladder, the higher the probability of funding.  They may have to climb one step at a time, but they had better be moving upwards in a visible way. I'm trying to develop the engineering math that will be necessary to understand the brain.  I'm using engineering as a discipline to get the math we need for what's really interesting, which is the mind and the brain. 

 

Four New Empirical Possibilities: A Summary

 

Now before getting into the intricacies of neurocontrol, I would first like to give you my real bottom line. I would like to summarize four empirical areas where I think new experimental work could be really crucial. I will try to explain the reasoning behind all this in more detail later, but for now I will just give a summary:

 

(1) First, I'm going to argue that some form of backpropagation -- not the simple three‑layer kind that most people have seen, but a more complicated, advanced form of backpropagation -- almost certainly must exist in the brain in order to explain some of the capabilities that we have observed there. That in turn suggests that we have to look for some novel mechanisms, to carry information backwards both within and between cells. Between cells, it is now well-known that nitric oxide (NO) acts as a backwards transmitter. In addition, a group of researchers including Timothy Bliss -- one of the important pioneers in Long-Term Potentiation (LTP) [6] -- has discovered a new presynaptic receptor intimately related to LTP[7]. (The group speculates that this receptor may be involved in adapting the nearby synapse, but there is no reason to believe that this is its only function.) Back in 1974, after I had developed the backpropagation algorithm, I speculated that the cytoskeleton might take care of the backwards flows within cells [2]; this still appears to be a viable possibility [5,8,9,10], but there is new evidence that the usual kinds of field effects in membranes could also be involved[11]. David Gardner has shown that such backwards mechanisms are crucial to learning even at the level of Aplysia[12,13]. Nevertheless, all of this is only just a beginning.

                There is a lot of engineering work needed in this area, both in theory and in instrumentation. It's really frightening to me, when I look at how critical the cytoskeleton is in the nervous system (it is like half the nervous system!), to see that the amount of work that's been done understanding how the cytoskeleton relates computationally (or might relate) is negligible. We don't yet know that it's relevant, but we don't yet know that it's irrelevant either. It is amazing to me that we can just sit back and ignore it and give it maybe ten thousand a year, when we're spending a billion dollars on the other half, when we don't know what it does. It's really frightening; we really need to be studying the cytoskeleton in any case, and backpropagation is just one of the things to look for when we do it.

                In looking for backpropagation, you don't necessarily have to look at the cytoskeletal level. There are other kinds of experiments you can do, where region A has a forward fiber to region B (e.g., A might be a part of the limbic zone and you move on to something like the motor cortex), and sometimes you can find that the plasticity in A seems to depend on what happens in B. It would be interesting to see if you could cut the fiber from A to B and then see if you lose the plasticity in A.  There's no way that could happen in a classical neuron model that's all feedforward and membrane‑driven, but if it does happen then that means that you can unhinge the neuron model. I have tried to persuade Karl Pribram to look into experiments like that, and his (informal, not for scientific publication) response was "I've already done it, I've already proven this." 

                Pribram's response was really very interesting to me. If you ask a lot of the middlemen between the neural network field and biology, they'll tell you that this is impossible; however, when I ask Pribram he says it's already been proven, that there is a backpropagation there.  I don't know whether to take his informal statements at face value yet; I think we need a lot closer collaboration to evaluate those experiments to see what they mean mathematically, but it's clear there is a lot to be done here. 

 

(2) True reverse engineering of hippocampal and other slices. In the talk by Sclabassi earlier today, we heard some very exciting things about the hippocampus. It was particularly fascinating to hear that the kind of learning you get from LTP clearly doesn't represent the real nonlinearity of the system.  I would speculate that appropriate slices through the brain can generate model systems that you can play with like artificial neural nets, where you can control the inputs and outputs. Why is it that when we do experiments in neural systems we try to always do them under natural conditions?  If we think that biological neural circuits are general purpose learning machines, then let's play with them! 

                Let's see if we can use a slice of neural tissue to learn to recognize an arbitrary pattern that hasn't been seen in nature. Let's find out what are the capabilities. Let's find out what the plasticity is in these more micro, more mathematical ways. And I would speculate, for example, that a slice through hippocampus and cerebral cortex that maintains those local recurrent links will have a better learning capability, in a sense I hope to have time to define, than any of the Hebbian or backpropagation, feedforward nets that are in use today.

                In other words, there are two classes of nets people are using a lot -- the classic Hebbian, the Grossbergian nets, and then maybe the multi‑layer perceptron (MLP) nets; I'm willing to bet that there are critical learning problems which I hope to talk about, which that kind of slice can solve better than any of the nets people now believe in, on the biological or the engineering side. Once you prove this, empirically, then I have some ideas for what is going on there, but the experiments are what's crucial for now. I think if you do the experiment, you will shake up a lot of people, and then they'll start thinking about those more powerful designs that we're just now starting to look at in engineering. 

                There is a whole lot to be done in this area. Once you have taken the first step -- demonstrating and describing plasticity on the slice -- you can then start looking for the learning mechanisms that underlie that plasticity.  So in a way this might be a good place to begin before getting into some of the harder issues I discussed earlier.

                Similar kinds of experiments could also be done in culture, if the right kinds of cells can be grown together in culture. Many biologists worry that cell cultures (and even slices) are very artificial. It can be dangerous to draw too many conclusions from what we see in culture, because the presence of other cell types and inputs in the brain could lead to very different kinds of behavior. Nevertheless, when groups of cells in culture do succeed in demonstrating certain kinds of engineering capabilities -- such as the ability to learn to approximate mathematical functions more complex than those which Hebbian or MLP nets can learn -- then we probably can conclude that these cells possess these capabilities (or more) in nature, in the brain. There may be great value in figuring out what kinds of cells need to be present, as part of a culture, to generate what kinds of learning capabilities.

 

(3) A third area has to do with the inferior olive, which governs learning in the cerebellar system.  I am told by Pellionisz that Llinas and his group have observed plasticity in the inferior olive, which is crucially related to the cerebellum and lower‑level motor control.  I haven't looked at the experiments myself, but based on a very careful examination of the cerebellum, working jointly with Pellionisz  (not with tensor theory, working with Pellionisz on some new ideas), it is my conclusion  that something unusual is going on[14]. 

                There are two possibilities  ‑‑ or rather, I'm predicting one of two possibilities. Both of them are very surprising. First of all, before doing the experiment proper, the first stage is to replicate the phenomenon of plasticity in the olive. Then you have to cut one of two fibers, and show that cutting those fibers eliminates the plasticity; this would narrow down the plasticity to one of two possible mechanisms. (The two fibers are: (1) the climbing fibers; (2) the collateral fibers from the deep cerebellar nuclei and vestibular nucleus to the olive.)

                I hope that somebody can do this experiment soon. This may well be the most finite and do‑able thing on this whole list here.  So I really hope somebody looks at this. I have described that in more detail at the end of a recent paper[14]. This is an important experiment nobody has done‑‑I don't think it should be that hard.  And it is really critical to our next step in understanding what the cerebellum is doing.  

                After this talk was given, I found out that the first step -- of simply replicating plasticity in the olive -- is itself a serious challenge. The original experiments by Llinas et al, reported in Science in 1975, are still highly controversial. Furthermore, there are certain learning tasks -- like those described by Richard Thompson -- which do not seem to elicit plasticity in the olive. (Just as most physical tasks only require the use of a few muscles, so too do many learning tasks exercise only a part of our learning abilities.) Hockberger and Alford at Northwestern University, working in communication with James Houk, in a small grant supported by NSF, have begun to explore learning mechanisms in olive cells, at a molecular level, in culture. It is predicted that olive cells can display plasticity in culture, but that it will be crucial to include enough other cell types (e.g. deep cerebellar nuclei cells) and to provide appropriate learning challenges through appropriate stimulation of the system; this may or may not be possible within the time-frame of this small start-up project. It is truly amazing that such a basic, important issue has been left unresolved for so many years. (Similar gaps exist in the present understanding of the nucleus basalis, another small but crucial piece of brain circuitry.)

                A more technical issue, crucial to working out the fine points of this system, is the ability of the cerebellum to learn time sequences and delays [14]. This ability clearly depends on certain short-term memory capabilities of Purkinje cells, but it is very tricky to design a circuit which reproduces such capabilities. (See chapter 13 of [15].) Tam, at the University of North Texas, has begun some learning experiments in cultures which might shed light on this issue[16]; again, however, the efforts so far are only a crude beginning.

 

(4) Fourth, there is room for more true reverse engineering of the cerebellar motor system. Suzuki, Kawato et al[17] have done a magnificent job in getting this area started, but a lot more needs to be done. Suzuki et al have basically shown that the lower motor system is doing optimal control, not adaptive control in the classic sense, and not translation between different kinds of coordinate systems, but optimal control.  I think that someone could play with that circuit a lot more than anyone has done so far. Suzuki et al, and Houk, think they know where the reward or utility functions are coming in from; if they are right, we could perturb these inputs and prove what the power is of this system in optimization, in adapting to new regimes. Again, we could play with the lower motor system, by perturbing its inputs to see what capabilities it has as a general‑purpose optimizer. 

 

                In brief, I have described four general areas where new kinds of experiments could be extremely useful. I don't know if I'm describing the tasks in exactly the right way. This is just an attempt to get the process started. I'm just a dumb engineer, as they often say.  But I think that something  needs to be done to get us moving into these new kinds of areas, and there is some theory behind the ideas above.

 

 

SUPERVISED LEARNING: RECENT ANN RESULTS

AND IMPLICATIONS FOR NEUROCONTROL AND BIOLOGY

 

Supervised Learning As A Neural Net Paradigm

 

A lot of people in neuroengineering get upset when I talk about control and control applications, because a lot of people in the artificial neural network field really have this old idea (illustrated in Figure 3) that supervised learning is the same as neural network theory.  They think that neural network theory is the same as learning a map from an input vector X to a target vector Y, in hopes that in the future you'll be able to predict the right target vector.  And you go through training sets and you learn over and over again what this mapping is. 
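
To make the paradigm concrete, here is a minimal sketch in Python. It assumes a linear model adapted by the Widrow-Hoff (LMS) rule and a scalar target instead of a full target vector Y. The names train_lms and predict, the learning rate, and the synthetic training set are illustrative assumptions, not anything taken from Figure 3.

```python
import random


def train_lms(samples, n_inputs, rate=0.05, epochs=200, seed=0):
    """Fit y ~ w.x + b by repeated passes over (X, Y) pairs (Widrow-Hoff rule)."""
    rng = random.Random(seed)
    w = [0.0] * n_inputs
    b = 0.0
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)  # go through the training set over and over
        for x, y in data:
            pred = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = y - pred  # target minus current prediction
            for i in range(n_inputs):
                w[i] += rate * err * x[i]
            b += rate * err
    return w, b


def predict(w, b, x):
    """Predict the target for a new input vector x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Hypothetical training set: targets generated by y = 2*x0 - x1 + 0.5.
training_set = [((x0, x1), 2 * x0 - x1 + 0.5)
                for x0 in (-1.0, 0.0, 1.0) for x1 in (-1.0, 0.0, 1.0)]
w, b = train_lms(training_set, n_inputs=2)
```

After training, predict(w, b, x) can be called on an input that was never in the training set, which is the whole point of the paradigm: learn the map now, predict the target later.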

 

 

                                                                   Figure 3. What Does Supervised Learning Do?

 

                In fact, if you look at the Granger‑Lynch model of the hippocampus (arguably the best existing model of the hippocampus as an associative memory) ‑‑ that's another form of supervised learning; it's just plain old pattern classification you're studying.  Supervised learning tends to be an all-pervasive paradigm, even for biologically motivated research. Many people tend to think that supervised learning is fundamental theory, and that anything else is just dirty applications.

 

Supervised Learning versus Neurocontrol

 

Supervised learning is certainly useful, and it may well exist in subsystems of the brain, but it turns out that for really powerful control systems, you have to do stuff that is a lot harder. 

                What you have to do is stuff like this (see Figure 4 on the next page). When I'm giving a tutorial on how to do real neurocontrol in engineering, it turns out that I have to spend an hour or two on each one of the three main boxes in Figure 4.  You do have supervised learning systems in these designs, but they're like little modules.  And then you have a big system, a neurocontrol system, that takes these lower level modules and integrates them and links them.  I often compare this situation to how we build computers: there's a lot of science to building the chip, but there's also a lot of science to putting chips together to make a computer. Supervised learning is a general purpose concept, but neurocontrol is also general-purpose and fundamental; they simply address different general-purpose tasks.

                It turns out that the work that's been done in neurocontrol at these multiple levels of organization has parallels to the brain, at multiple levels of organization.  So the stuff we've learned down at the supervised learning level tends to be relevant to issues like what is the circuitry like within the cortex, within recurrent nets, or within the cerebellum, while the higher-level stuff is important when we try to figure out the organization that connects those systems.  So that means I should talk about both of these levels and explain them before I talk about the brain. So I should spend eight hours before it all becomes crystal clear.  Forgive me, it won't be quite as crystal clear as I like, because I don't have the eight hours.  

 

 

 

 

                                                               Figure 4. Four Task Areas Critical to Neurocontrol

 

Three Supervised Learning Modules Used Today in Neurocontrol

 

     First of all let me talk at the low level, at the supervised learning level.  Little nets that learn pattern recognition. What has been useful in engineering? 

     Basically, there are three kinds of networks that people really use in real‑world control applications:

 

(1) The most common is the multilayer perceptron (MLP). (See Figure 5 on the next page.)  Please don't call it a "backpropagation network"!  The MLP is only one special case of what you can adapt with backpropagation. Furthermore, the MLP is a lot older than what I did in 1974. Bernie Widrow or Rosenblatt are the guys that should take credit for the MLP design itself. 

 

 

                                                Figure 5. An Example of a Three-Layer Multilayer Perceptron (MLP)

 

                The MLPs are basically the McCulloch‑Pitts networks, the feedforward things. There have been some wonderful theorems about what they can do.  

 

(2) and (3) Almost as common are the CMAC and the RBF designs. (Figure 6) These networks are examples of "local" learning systems. We have already heard about these designs from Nick DeClaris; for example, CMAC was first proposed by Albus, for a PhD thesis under DeClaris. There are many other local learning systems discussed in neural net meetings, but DeClaris' students happened to hit on what was useful more than most students. There are also many modified versions of the CMAC and the RBF, which give improved performance in control applications[15].

 

 

 

 

 

                                                                  Figure 6. Structure of CMAC or RBF Network

 

                Basically, these local learning rules perform forecasting by association.  The MLP gives you a global model and is good for learning global functions, causal relations, etc.  The local systems are more like forecasting by precedent. When you've got a new situation, you predict the result will be like what it was before when you had a similar situation.  It's an associative memory, and this is what the Granger‑Lynch stuff is, just another example of the same general principle. 

                So these are the things that most people use. These are feedforward nets, easy to implement and I'll show in a couple of charts how some people have used them. There are many other forms of associative memory, based on Hebbian learning, which have appeared in the biologically oriented literature; however, those kinds of nets are not used very often, for one reason or another. Most likely, people feel that the Hebbian nets now available are very similar in capability to the CMAC and RBF, because they are based on forecasting by precedent, while being harder to implement in real time. Why work harder to achieve the same capability? Another factor, however, is that people do tend to implement what is easiest first, not what is most powerful.

                Local learning systems can be adapted in a variety of ways -- through least squares, through backpropagation (i.e. derivative-based learning), etc. No matter how they are adapted, they tend to be faster to adapt than global MLPs, because they do not have to "undo" what was learned in a previous region of space when they explore a new region of space. They are usually set up so that different weights are active in different regions of space. They tend to require many more weights, but learn faster. There may be ways to combine the best advantages of global and local networks together in one system[18], but no one has implemented the right kind of hybrid as yet.
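
The point about not having to "undo" earlier learning can be illustrated with a small sketch. This is a one-dimensional RBF-style net under my own assumptions: fixed Gaussian centers on a grid, output weights adapted by a simple LMS rule, and two made-up target functions in two distant regions. The class name RBFNet, the widths, and the rates are hypothetical choices, not a design from the cited work.

```python
import math


class RBFNet:
    """1-D radial basis function net: fixed Gaussian centers, trainable output weights."""

    def __init__(self, centers, width):
        self.centers = list(centers)
        self.width = width
        self.weights = [0.0] * len(self.centers)

    def features(self, x):
        # Each hidden unit responds only near its own center.
        return [math.exp(-((x - c) / self.width) ** 2) for c in self.centers]

    def predict(self, x):
        return sum(w * h for w, h in zip(self.weights, self.features(x)))

    def update(self, x, target, rate=0.2):
        # LMS step: only weights whose units are active near x change noticeably.
        h = self.features(x)
        err = target - sum(w * hi for w, hi in zip(self.weights, h))
        for i, hi in enumerate(h):
            self.weights[i] += rate * err * hi


net = RBFNet(centers=[i * 0.5 for i in range(21)], width=0.5)  # centers 0.0 .. 10.0

# Phase 1: learn a target only in region A (x in [1, 2]).
for _ in range(500):
    for k in range(11):
        x = 1.0 + 0.1 * k
        net.update(x, math.sin(x))
before = net.predict(1.5)

# Phase 2: learn a different target in a distant region B (x in [8, 9]).
for _ in range(500):
    for k in range(11):
        x = 8.0 + 0.1 * k
        net.update(x, math.cos(x))
after = net.predict(1.5)
```

In this sketch the prediction in region A is essentially unchanged by the later training in region B, because different weights are active in the two regions. The price is visible too: 21 hidden units for a single scalar input.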

 

What Kinds of Functions Can Such Modules Represent?

 

So what are the capabilities of these different kinds of networks? Well, there are a lot of theorems.  There's a guy named Andy Barron who has proven some beautiful theorems showing that those three‑layer neural nets like MLPs are much better than Taylor series, polynomial expansions, or local learning rules even, to approximate any smooth function.  It's beautiful. 

                In control applications, however, to really control an arm efficiently, sometimes you don't have a smooth function.  Then you've got a problem: in a lot of control applications, three layers won't work.  So, a guy named Sontag at Rutgers has proven that a feedforward net really can approximate the control functions you need if you've got two hidden layers. 

                But there's another problem. You can approximate any function, but you need to approximate it parsimoniously.  MLPs are as good as any other feedforward net, but in general there are some functions which you cannot approximate parsimoniously with any feedforward net. This means that you need enormous numbers of hidden layers to approximate them, and enormous numbers of hidden neurons. 

                Marvin Minsky, years ago, gave an example of this in his famous book Perceptrons. He described the "connectedness" problem, which sounds at first like a very typical character‑recognition, pixel‑type problem, but it turns out to be a little different. Imagine that you've got a grid of 50x50 input pixels and that they're all either white or black, and that you're trying to recognize a desired pattern.  What you're trying to do is to output a one if the blacks are all connected, and a zero if they're not.

                Now it turns out that Minsky showed that the number of hidden units required for this task is just enormous. As the number of pixels grows, it becomes astronomical; no feedforward net of any kind is going to do a good job.  But if you allow recurrence, recurrent feedback connections, what I call simultaneous recurrence (a special kind of feedback connection), then you can represent it  parsimoniously. 
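
As a rough illustration of why recurrence helps here, the sketch below decides connectedness by iterating one small local rule until nothing changes. It is not a trained network and not Minsky's construction; the function name is_connected, the choice of 4-neighbor connectivity, and the convention for an empty image are my own assumptions. The rule at each pixel looks only at its immediate neighbors, and the long-range answer emerges from letting the whole grid settle.

```python
def is_connected(grid):
    """Return 1 if all black (1) pixels form one 4-connected group, else 0.

    Every black cell holds an 'active' flag. One sweep applies the same local
    rule everywhere: a black cell becomes active if any 4-neighbor is active.
    Sweeps repeat until the flags stop changing (the equilibrium).
    """
    rows, cols = len(grid), len(grid[0])
    blacks = [(r, c) for r in range(rows) for c in range(cols) if grid[r][c]]
    if not blacks:
        return 1  # convention assumed here: an empty image counts as connected

    active = [[False] * cols for _ in range(rows)]
    r0, c0 = blacks[0]
    active[r0][c0] = True  # seed the activity at one black pixel

    changed = True
    while changed:
        changed = False
        new = [row[:] for row in active]  # synchronous update of all cells
        for r, c in blacks:
            if active[r][c]:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols and active[rr][cc]:
                    new[r][c] = True
                    changed = True
                    break
        active = new

    return 1 if all(active[r][c] for r, c in blacks) else 0


snake = [
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
]
broken = [
    [1, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
]
```

The number of sweeps grows with the length of the longest winding path, which is the settling time that a simultaneous-recurrent design has to pay for; the rule itself stays tiny no matter how large the grid is.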

                I would argue that the kind of language-processing problem that Jim Anderson described earlier (from Fodor) is a problem in this family, where a feedforward net can't represent it parsimoniously, but a simultaneous recurrent net can represent it parsimoniously.  And I've seen systems of that general sort, which do seem to work on that kind of problem, but I haven't studied Fodor's example in great detail. 

                It turns out that, if you want to deal with control problems like navigating a robot through a cluttered room where the clutter keeps moving and it's in a new position, you need to worry about finding a connected path.  So there are good arguments[15,18] that higher‑level intelligence has to use these kinds of networks. 

                So that seems easy, but is it?  Well, first of all, what happens if you have parts of the brain that have to make decisions quickly? Simultaneous-recurrent nets take time to settle down‑‑what then?  Then you've got to have a feedforward net, and what happens then? 

 

Fast Feedforward Nets: The Cerebellum

 

     Well, Figure 7 (on the next page) shows an example of a feedforward net with two hidden layers that is good for fast, general motor control.  This comes from Nauta[1]; it is a diagram of the cerebellum. In the cerebellum, you start out with inputs along the mossy fibers (which some people would call an input "layer"). These inputs go to the granule cells, which operate as a first hidden layer, and you've got zillions of them. Some people say there are more granule cells than there are any other kind of cell in the brain, maybe ten times that number. That reinforces the point‑‑you need a lot of hidden nodes if you try to do complex tasks with a (relatively local) feedforward net.  

     The next hidden layer in the cerebellum is the Purkinje cell layer. The output layer is basically the deep cerebellar layer and the vestibular nucleus (the FTN cells of the vestibular nucleus, to be precise); those two systems are basically together ‑‑ they're not right next to each other but they form one output layer, for functional purposes.  In summary, the cerebellum is not based on simultaneous-recurrence, which is slow but powerful. Strictly speaking, however, it is not just a static MLP, either. Above all, the Purkinje layer has some working memory capabilities[14], similar to well-known ANN designs. Such capabilities are tricky to adapt[15, chapter 13]; this, in turn, suggests that Purkinje cells might possibly be adapted by a combination of the well known olive-to-cerebellum mechanism, plus a local mechanism supporting the working memory effects. Also, because the Purkinje cells are large cells, and because continuity in output is very desirable at this level, it is conceivable that dendritic field effects, as described by Pribram, could occur within these cells[4,18].

 

 

 

 

                                                                       Figure 7. The cerebellum, from Nauta [1]

 

Slower, More Powerful Nets: The Cerebrum

 

But again, this kind of feedforward arrangement is not very good for the really higher‑level kinds of functions like finding a connected path or connectedness. There are some kinds of functions you just can't expect the cerebellum to solve, because they require a more sophisticated processing.  That leads to a prediction that somehow or other the higher levels, like the limbic lobes and the cerebral cortex, must form a kind of two‑level system, where the low level is settling down...but first let me describe Figure 8, which shows what a simultaneous recurrent net (SRN) is. 

 

 

                                                                    Figure 8. A Simultaneous Recurrent Network

 

                Mathematically, an engineer would say that you plug in the inputs into any old feedforward structure (the net labelled f in Figure 8). But then you take the outputs of f, and feed them back in as inputs to f, and see how that changes the outputs. You keep feeding the outputs back in as inputs, again and again, until the values of the outputs settle down to some kind of equilibrium, y. Now, to use a system like that... you have to plug in the inputs, wait through many cycles of the inner system f until the outputs settle down, and then that becomes just one cycle time of your bigger system.  Now, when I first saw the engineering of this[15], I said this can't have anything to do with the brain, because you need really fast cycles inside of a longer cycle; how could that be biologically plausible?  I knew that we needed all this to get intelligence in engineering, and I couldn't think of anything else; therefore, in the 1990 Decade of the Brain Symposium (sponsored by the National Federation for Brain Research and the INNS), I presented this from an Engineering point of view, without any notion of how the connection to the brain could be made.
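
The inner settling loop can be sketched in a few lines. This is only an illustration under stated assumptions: the core f is taken to be a single tanh layer, the weight matrices W_IN and W_BACK are made-up numbers chosen small enough that the iteration settles, and no learning is shown. The function names f and settle and the tolerance are hypothetical.

```python
import math


def f(x, y, w_in, w_back):
    """One pass through the feedforward core: external inputs x plus fed-back outputs y."""
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(w_in[k], x))
            + sum(w * yj for w, yj in zip(w_back[k], y))
        )
        for k in range(len(w_back))
    ]


def settle(x, w_in, w_back, tol=1e-9, max_cycles=1000):
    """Feed the outputs of f back in as inputs until they stop changing.

    Returns the equilibrium output y and the number of inner cycles used.
    All of these inner cycles together make up one cycle of the larger system.
    """
    y = [0.0] * len(w_back)
    for cycle in range(1, max_cycles + 1):
        y_next = f(x, y, w_in, w_back)
        if max(abs(a - b) for a, b in zip(y_next, y)) < tol:
            return y_next, cycle
        y = y_next
    raise RuntimeError("inner loop did not settle")


# Hypothetical weights; the feedback weights are small, so the loop settles.
W_IN = [[0.5, -0.4], [0.3, 0.8]]
W_BACK = [[0.2, -0.3], [0.1, 0.25]]
y_eq, inner_cycles = settle([1.0, -0.5], W_IN, W_BACK)
```

With these particular weights the loop needs a few dozen inner cycles to produce one output, which is the ratio of fast inner cycles to slow outer cycles that the design demands. With larger feedback weights the same loop may fail to settle at all.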

                Later that same day, Walter Freeman presented his model of the hippocampus, which involved exactly the same kind of loops within loops as in my model! He showed how very close inner recurrent loops operate at a very high cycle time, embedded within a larger, slower theta rhythm. For the inner loop, he said that the basic calculation cycle time is like 400 hertz, versus about 4 hertz -- quite enough to implement Figure 8. (Some people quote 80 hertz -- based on Fourier analysis rather than cycle times -- but that still would be enough for some functionality in this design. When I checked with Pribram, he assured me that fast, 1ms. synapses allow such high-frequency computation.) VonderMalsburg has convinced me that such dual-loop effects are even more certain to occur in the neocortex, but the neocortex contains additional capabilities and complexity which may make it harder to work with at present. I would speculate that SRN capabilities are crucial both to binocular vision and to the image segmentation capabilities of neocortex.

                In brief, the biological data appears to fit the model beautifully.  Now, if you look at Granger and Lynch's model of the hippocampus, it doesn't do that.  In their model it's like a feedforward associative memory, and you only have an outer recurrence that's used to generate the associative memory.  So what is that inner loop doing?  Maybe the hippocampus is more powerful than associative memory. Maybe we need something more powerful than an associative memory to form emotions and make plans in our life, and maybe somebody can do an experiment proving it.  I hope so.  

                Parenthetically, it should be noted that SRNs -- unlike feedforward networks -- can have problems in settling down to a stable equilibrium. In engineering, one can use a "tension" term [15,18] to reduce the probability of instability, but the possibility cannot be totally eliminated. The tension parameter is a global parameter, with an interesting analogy to the global level of adrenalin in the bloodstream. Karl Pribram has pointed out that there is a strong analogy between this "tension" term and the "unpleasure" principle of Freud[19], which plays a key role in understanding the possibility of instability in human brains. In the human limbic system, Pribram's empirical discussion [20] suggests that the hippocampus is an SRN, acting mainly as a "hidden layer" of the limbic network, a network in which the amygdala is the ultimate or penultimate output layer. (This suggests a kind of crude analogy between the hippocampus and the cerebellar cortex.)

 

 

 

Other Forms of Hebbian Learning

 

That covers most of what I really want to say about supervised learning. Again, please excuse my glossing over the many, many details; each one of these topics can be discussed in much more detail, and is so discussed in the papers cited.

                For the sake of completeness, however, I should say a little about two forms of Hebbian learning which I did not mention above.

                Most people who work with Hebbian learning would argue that there are really two different kinds of Hebbian learning system which could be used on supervised learning problems. There are local associative memory systems, which I discussed above. But there are also global systems, which are generally linear, and require that inputs be decorrelated before they enter the supervised learning system. A lot of decorrelating networks have been designed for use with such nets. However, after discussing this matter with Pribram, I am convinced that this latter class of network is not relevant to systems like the human brain. Pribram and others have shown again and again that biological representation systems have a great deal of redundancy (e.g., like wavelets but with a 1.5 amplification factor instead of 2, etc., as in the Simmons talk today). One would expect such redundancy, in any system which also has to have a high degree of fault tolerance. This is inconsistent with the mathematical requirement of orthogonality. In addition, the limitation to the linear case is not encouraging, either.

     In 1992, I developed an alternative learning design which appears Hebbian in character, but has radically different properties[4,18]. It provides a mathematical representation of certain ideas by Pribram about dendritic field processing, which the talk today by Simmons provides strong empirical support for. It is closely linked to Chris Atkeson's experiments with locally-weighted regression, which has performed very well in robotics experiments at MIT. In retrospect, as I reconsider the issue of information flows around dendrites, I suspect that the design still needs to be revised, to account explicitly for the three-dimensional nature of local information flows, at least for biological modeling. In any case, the alternative design is still feedforward in terms of what it accomplishes; therefore, it might possibly be worth considering as a model of the innermost loop of the neocortex, but it does not obviate the need for simultaneous recurrence, and for the unusual kinds of nonHebbian feedback (as in Figure 8) required to adapt key parts of the neocortex and hippocampus -- if those systems are as powerful as I suspect.

 

Summary

 

In summary, I predict that the human brain contains some very complex circuitry, as required to solve some very complex adaptation problems. At present, most people would find it hard to believe that something that complicated is there, even though it does fit these new results of Freeman and so on. I think we need new experiments, based on living slices, to help get home to people that it's this kind of complexity that's in that system, and that the old models are simply not good enough.  So that's the end of supervised learning. 

 

 

THE HIGHER LEVEL OF ORGANIZATION: NEUROCONTROL

 

Now let me talk about neurocontrol.  This is a subject I've talked about for eight hours at a stretch, so I will have to cut out a lot of important material here today. First, I want to talk about why this is crucial to understanding intelligence. I'll skip over my slides on engineering application areas. I will talk a little bit about the kind of designs that engineers are using today, but only a little.  Mainly I will focus on the design concepts which relate directly to understanding the brain.

 

Why is Neurocontrol the Right Mathematics For Understanding the Brain?

 

This is a chart (Figure 9) that people look at and say, "I already know this."  But if people could understand the implications of what they already know, this world would be a different place. There are some implications in what we already know that people haven't thought through.  Now what I am going to talk about here is the reason why the human brain is a neurocontroller; let me give you the argument in a few stages. 

 

 

                                                                       Figure 9. The Brain As a Neurocontroller

 

                Step one: we know that the brain is an information processing system.  I would call it a computer, except that people will think of sequential machines.  But it's really a computer; it's an information processing system; its sole biological function is to be a computer.  And what does it compute?  It computes actions that control glands and muscles.  So the point is that the function of the brain as a whole system is to perform control. 

                Some people think of control as something that's only in the cerebellum, just to control finger movements. That's not true.  Nauta, in his classic text on neuroanatomy[1], stresses how you cannot separate what is the control system from the rest of the brain.  Now that doesn't mean that the whole brain does tracking or pursuit movements; no, it doesn't do that; it does a higher order kind of control, of course.  And you might use the term "sensorimotor control," if you will.  But the point is that the brain as a whole system has the function of calculating these things.  Everything in the brain is there to help it compute these outputs.  So the function of the brain is to do that; if you want to understand the brain as a whole system, you have to understand the mathematics of what it takes to build a controller that has these kinds of control capabilities, which again go far beyond mere trajectory tracking. (Those who think of control as trajectory planning may still have troubles with this; I urge them to reconsider the definition of control, and recognize that the overall mathematical literature on control has always been far wider than this in scope.)

                Furthermore, you can't even understand a subsystem until you know how it fits into the whole system. Therefore, you can't even understand subsystems of the brain until you put them into this greater context, which is neurocontrol: the brain is a neurocontroller.  

 

Capabilities of the Brain As a Controller

 

So, next slide (Figure 10).  This slide shows what I regard as the most exciting and crucial capabilities of the brain as an intelligent controller.  I should have added an extra line here about learning in real time; it's just so obvious, but it's something we've got to keep in mind.

 

 

                                                           Figure 10. Capabilities of the Brain As a Neurocontroller

 

 

                The brain can control millions of actuators in parallel -- well, maybe only 900,000 ‑‑ it's the same principle, huge numbers. What about conventional controllers? Most control engineers regard one actuator as a normal problem and ten as a large problem.  Thus the brain has an incredible capability, very exciting to engineers.  It can handle nonlinearity and noise routinely, without being destabilized.  And above all, most critical, it includes what you might call a long‑term planning horizon.  The AI people would say the long-term planning capability is the real intelligence.  And the brain also has a high‑speed coordination capability through the cerebellum, basically.  

 

Brain Capabilities Versus ANN Capabilities

 

Now, how do these capabilities compare with anything we can conceive of in mathematics?  Is there any hope of understanding them?  Well, we presume there's a hope of understanding them, but is there a way that we can conceive of to understand them? 

                The next slide (Figure 11) provides a list of what's been done in Artificial Neural Networks in control. These are the basic kinds of capabilities that exist today.  I've read hundreds of papers on this topic, but they all boil down to this.  I've seen a lot of people try to wriggle out of my basic taxonomies, but these are basically the capabilities you've got.  You've got people using neural nets in subsystems in control; that's not really neurocontrol.  You've got people who have learned to clone experts, learned to copy a human movement.  You've got people who

 

                        N