[Bmi] Deep Learning, Convolution, and Error Back-Propagation

Tsvi Achler achler at gmail.com
Mon Mar 30 19:33:27 EDT 2015


Hi John,
I'm sad to hear about the list and the moderators.
Frankly, this and the fact that I do not have a position (tenured or
otherwise) are why I am not chiming in.  In fact, below is a draft I
had written and never sent.
However, if you need any help incognito, I am happy to help.
---
I completely agree with your new paragraphs and statements about
backprop and similar biologically implausible approaches, but no one
really wants to take alternatives seriously.

I can show mathematically that the same recognition achieved with
distributed weights (as determined by backprop, Hinton's mechanisms, and
so on) can be achieved without them, using localist weights and Hebbian
learning.
The key is instead to use dynamics and top-down connections DURING
RECOGNITION to minimize reconstruction error.
However, no one seems to even want to consider this.  Maybe 20 years
need to pass?
Fifteen have already passed.

Of course, this can also be connected to learning that is driven by
outcomes, as you suggest (and with which I also agree).  It also gives
real-time feedback about which inputs contribute to the reconstruction
error, which can guide segmentation and attention.
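
To make this concrete, here is a toy sketch (a minimal illustration of
the principle, not the full model in the tutorial linked below): the
rows of W are localist, Hebbian-learnable prototypes, and recognition
runs plain gradient steps on the reconstruction error.

    import numpy as np

    def recognize(x, W, steps=200, lr=0.1):
        # Infer activations y so the top-down reconstruction W.T @ y
        # matches the input x; W[i] is the prototype of class i.
        y = np.full(W.shape[0], 1.0 / W.shape[0])
        for _ in range(steps):
            e = x - W.T @ y            # reconstruction error (the
            y += lr * (W @ e)          #   real-time feedback signal)
            y = np.clip(y, 0.0, None)  # keep activations non-negative
        return y

    # Two overlapping localist prototypes and a superimposed input:
    W = np.array([[1.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0]])
    x = np.array([1.0, 2.0, 1.0])
    print(recognize(x, W))             # ~[1, 1]: both classes active

The residual e above is exactly the real-time feedback about the inputs
involved in the reconstruction error.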

See my new tutorial:
https://drive.google.com/open?id=0B6jhtnREPl-dMzVaTHdTaU9iWmc&authuser=0
It is still not in its final form (you have to download it to view it).

-Tsvi

On Sun, Mar 22, 2015 at 9:41 AM, Juyang Weng <weng at cse.msu.edu> wrote:
>
> [Instructions to subscribe or unsubscribe are attached at the end]
>
>
>
> Dear colleagues,
>
> This is a discussion about well-known techniques, not specifically about anyone's work.   We have had many papers about neural networks, but we have not had a sufficiently honest discussion of well-known techniques.  At least I hesitated very much to discuss such a subject, because Profs. X, Y, and Z used such techniques.  This lack of honesty has caused a great waste of resources, including time (of our professors, researchers, postdocs, and graduate students) and money (of governments, private foundations, and companies).   Still, I am afraid that the following paragraphs will make some well-known researchers angry.  For that reason, the following discussion identifies me (J. Weng) as one who should be blamed for using some of the well-known techniques.  I have also made mistakes.  Please accept my apology.
>
> Please reply with your comments.
>
> ---- some new paragraphs in the Brain Principles Manifesto ----
>
> Industrial and academic interests have been keen on a combination of two things: easily understandable tests (e.g., G. Hinton et al. NIPS 2012, congratulations!) and the involvement of major companies (e.g., Google, thanks!).  We have read statements like “our results can be improved simply by waiting for faster GPUs and bigger datasets to become available” (G. Hinton et al. NIPS 2012).  However, the newly known brain principles tell us that such tests (e.g., ImageNet), as currently conducted, will give only vanishing gains that do not lead to a human-like zero error rate, regardless of how long Moore’s Law continues and how many more static images are added to the training set.  Why?  All such tests use static images in which objects mix with the background.  Such tests therefore prevent participating groups from seriously considering autonomous object segmentation (free of handcrafted object models).  Through synapse maintenance (Y. Wang et al. ICBM 2012), neurons in a human brain automatically cut off inputs from background pixels when those pixels match badly compared with attended object pixels.  Our babies spend much more time in the dynamic physical world than looking at static photos.
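>
> A schematic sketch of the synapse-maintenance idea (illustrative code under simplifying assumptions, namely a crude per-synapse match score and a fixed cutoff; it is not the exact rule of Y. Wang et al. ICBM 2012):
>
>     import numpy as np
>
>     def maintain_synapses(w, recent_inputs, cutoff=0.5):
>         # Per-synapse match: how well each input line has recently
>         # agreed with the weight stored on its synapse.
>         match = 1.0 - np.mean(np.abs(recent_inputs - w), axis=0)
>         keep = match >= cutoff      # background pixels match badly...
>         return w * keep             # ...and are cut off
>
>     # Object pixels (first three) track the weights; background
>     # pixels (last two) vary freely and get pruned.
>     rng = np.random.default_rng(0)
>     w = np.array([0.8, 0.8, 0.8, 0.2, 0.2])
>     noise = rng.normal(scale=[0.05, 0.05, 0.05, 1.0, 1.0], size=(50, 5))
>     print(maintain_synapses(w, w + noise))   # last two entries -> 0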
>
> Our industry should learn more powerful brain mechanisms that go beyond conventional, well-known, well-tested techniques.  The following are some examples:
>
> (1) Deep Learning Networks (e.g., J. Weng et al. IJCNN 1992, Y. LeCun et al. Proceedings of the IEEE 1998, G. Hinton et al. NIPS 2012) are not only biologically implausible but also functionally weak.  The brain uses a rich network of processing areas (e.g., Felleman & Van Essen, Cerebral Cortex 1991) in which connections are almost always two-way (J. Weng, Natural and Artificial Intelligence, 2012), not a cascade of modules as in Deep Learning Networks.  Such a Deep Learning Network is not able to conduct top-down attention in a cluttered scene (e.g., attention to location or type in J. Weng, Natural and Artificial Intelligence, 2012, or attention to more complex object shapes as reported in L. B. Smith et al. Developmental Science 2005).
>
> (2) Convolution (e.g., J. Weng et al. IJCNN 1992, Y. LeCun et al. Proceedings of the IEEE 1998, G. Hinton et al. NIPS 2012) is not only biologically implausible but also computationally weak.  Why?  All feature neurons in the brain carry not only sensory information but also motor information (e.g., Felleman & Van Essen, Cerebral Cortex 1991), so that later-processing neurons become less concrete and more abstract, which is impossible to accomplish using shift-invariant convolution.  Namely, convolution is always location-concrete (even with max-pooling) and never location-abstract.
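>
> A toy illustration of “location-concrete” (illustrative code, not from any of the cited papers): shifting the input pattern merely shifts the convolutional response; nothing in the output represents the pattern abstractly, independent of its position.
>
>     import numpy as np
>
>     def conv1d(x, k):  # 'valid' correlation, as in a convolutional layer
>         n = len(x) - len(k) + 1
>         return np.array([x[i:i + len(k)] @ k for i in range(n)])
>
>     x = np.array([0., 1., 2., 1., 0., 0., 0., 0.])
>     k = np.array([1., 2., 1.])          # one feature detector
>     print(conv1d(x, k))                 # [4. 6. 4. 1. 0. 0.]: peak at 1
>     print(conv1d(np.roll(x, 3), k))     # [0. 0. 1. 4. 6. 4.]: peak at 4
>
> Max-pooling over small windows merges neighboring positions, but a large enough shift still changes the pooled output; the representation remains tied to location.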
>
> (3) Error back-propagation in neural networks (e.g., G. Hinton et al. NIPS 2012) is not only biologically implausible (e.g., a baby does not receive error signals for its motors) but also damaging to long-term memory, because it lacks match-based competition for error causality (such as the competition in SOM, LISSOM, and LCA, the optimal SOM).   Even though the gradient vector identifies a neuron that can reduce the current error, the current error is not that neuron's business at all, and it must keep its own long-term memory unchanged.  That is why error back-propagation is well known to be bad for incremental learning and requires research assistants to try many guesses of initial weights (i.e., using the test set as the training set!).  Let us not be blinded by artificially low error rates.
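>
> A minimal contrast of the two update rules (illustrative code and numbers, not from any cited paper): a back-propagation step moves every neuron's weights to reduce the current error, while a match-based competitive step (SOM/LCA flavor) updates only the best-matching neuron and leaves everyone else's long-term memory untouched.
>
>     import numpy as np
>
>     rng = np.random.default_rng(0)
>     W = rng.normal(size=(5, 8))      # long-term memory of 5 neurons
>     x = rng.normal(size=8)           # one new input
>     err = rng.normal(size=5)         # current output error
>
>     # Back-propagation: every row of W changes for this one sample.
>     W_bp = W - 0.1 * np.outer(err, x)
>     print((np.abs(W_bp - W) > 0).all(axis=1))   # all True
>
>     # Match-based competition: only the best-matching neuron learns.
>     win = np.argmax(W @ x)
>     W_som = W.copy()
>     W_som[win] += 0.1 * (x - W_som[win])        # Hebbian-style pull
>     print((np.abs(W_som - W) > 0).any(axis=1))  # True only at `win`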
>
> Do our industry and public need another 20 years?
>
> ---- end of the new paragraphs -----
> Full text:
>
> The Brain Principles Manifesto
> (Draft Version 4.5)
>
> March 21, 2015
>
> Historically, public acceptance of science was slow.  For example, Charles Darwin waited about 20 years (from the 1830s to 1858) to publish his theory of evolution for fear of public reaction.  About 20 years later (by the 1870s) the scientific community and much of the general public had accepted evolution as a fact.   Of course, the debate on evolution still goes on today.
>
> Is the public acceptance of science faster in modern days?  Not necessarily, even though we now have better and faster means of communication.   The primary reason is still the same, but much more severe: the remaining open scientific problems are more complex, and the required knowledge goes beyond that of a typical single person.
>
> For instance, network-like brain computation, i.e., connectionist computation (e.g., J. McClelland and D. Rumelhart, Parallel Distributed Processing, 1986), has long been doubted and ignored by industry.   Deep convolutional networks appeared by at least 1980 (K. Fukushima).  The max-pooling technique for deep convolutional networks was published by 1992 (J. Weng et al.).  However, Apple, Baidu, Google, Microsoft, Samsung, and other major related companies did not show considerable interest until after 2012.  That is a delay of about 20 years.  The two techniques above are not very difficult to understand.  However, these two suddenly hot techniques have already been rendered obsolete by the discovery of more fundamental and effective principles of the brain, six of which are explained intuitively below.
>
> Industrial and academic interests have been keen on a combination of two things: easily understandable tests (e.g., G. Hinton et al. NIPS 2012, congratulations!) and the involvement of major companies (e.g., Google, thanks!).  We have read statements like “our results can be improved simply by waiting for faster GPUs and bigger datasets to become available” (G. Hinton et al. NIPS 2012).  However, the newly known brain principles tell us that such tests (e.g., ImageNet), as currently conducted, will give only vanishing gains that do not lead to a human-like zero error rate, regardless of how long Moore’s Law continues and how many more static images are added to the training set.  Why?  All such tests use static images in which objects mix with the background.  Such tests therefore prevent participating groups from seriously considering autonomous object segmentation (free of handcrafted object models).  Through synapse maintenance (Y. Wang et al. ICBM 2012), neurons in a human brain automatically cut off inputs from background pixels when those pixels match badly compared with attended object pixels.  Our babies spend much more time in the dynamic physical world than looking at static photos.
>
> Our industry should learn more powerful brain mechanisms that go beyond conventional, well-known, well-tested techniques.  The following are some examples:
>
> (1) Deep Learning Networks (e.g., J. Weng et al. IJCNN 1992, Y. LeCun et al. Proceedings of the IEEE 1998, G. Hinton et al. NIPS 2012) are not only biologically implausible but also functionally weak.  The brain uses a rich network of processing areas (e.g., Felleman & Van Essen, Cerebral Cortex 1991) in which connections are almost always two-way (J. Weng, Natural and Artificial Intelligence, 2012), not a cascade of modules as in Deep Learning Networks.  Such a Deep Learning Network is not able to conduct top-down attention in a cluttered scene (e.g., attention to location or type in J. Weng, Natural and Artificial Intelligence, 2012, or attention to more complex object shapes as reported in L. B. Smith et al. Developmental Science 2005).
>
> (2) Convolution (e.g., J. Weng et al. IJCNN 1992, Y. LeCun et al. Proceedings of the IEEE 1998, G. Hinton et al. NIPS 2012) is not only biologically implausible but also computationally weak.  Why?  All feature neurons in the brain carry not only sensory information but also motor information (e.g., Felleman & Van Essen, Cerebral Cortex 1991), so that later-processing neurons become less concrete and more abstract, which is impossible to accomplish using shift-invariant convolution.  Namely, convolution is always location-concrete (even with max-pooling) and never location-abstract.
>
> (3) Error back-propagation in neural networks (e.g., G. Hinton et al. NIPS 2012) is not only biologically implausible (e.g., a baby does not receive error signals for its motors) but also damaging to long-term memory, because it lacks match-based competition for error causality (such as the competition in SOM, LISSOM, and LCA, the optimal SOM).   Even though the gradient vector identifies a neuron that can reduce the current error, the current error is not that neuron's business at all, and it must keep its own long-term memory unchanged.  That is why error back-propagation is well known to be bad for incremental learning and requires research assistants to try many guesses of initial weights (i.e., using the test set as the training set!).  Let us not be blinded by artificially low error rates.
>
> Do our industry and public need another 20 years?
>
> On the other hand, neuroscience and neuropsychology have made many advances by providing experimental data (e.g., Felleman & Van Essen, Cerebral Cortex 1991).   However, it is well recognized that these disciplines are data-rich and theory-poor.  The phenomena of brain circuits and brain behavior are extremely rich.  Many researchers in these areas use only local tools (e.g., attractors that can only be attracted into local extrema) and consequently have been overwhelmed by the richness of brain phenomena.   A fundamental reason is that they miss the guidance of the global automata theory of computer science, although previous automata do not emerge.  For example, X.-J. Wang et al. Nature 2013 stated correctly that neurons of mixed selectivity were rarely analyzed but have been widely observed.  However, mixed selectivity has already been well explained, as a special case, by the new Emergent Turing Machine in Developmental Networks in a theoretically complete way.  The traditional Universal Turing Machine is a theoretical model for modern-day computers (how computers work), but it does not emerge.   The mixed selectivity of neurons in this new kind of Turing Machine is caused by emergent and beautiful brain circuits, yet each neuron still uses a simple inner-product similarity in its high-dimensional and dynamic input space.
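>
> A tiny illustration of the last point (illustrative code, not from the cited papers): if a neuron's input vector concatenates bottom-up (sensory) and top-down (motor/context) parts, a plain inner-product match automatically yields “mixed selectivity”, i.e., the response depends on both parts.
>
>     import numpy as np
>
>     sensory = np.array([0.9, 0.1])       # bottom-up input
>     context_a = np.array([1.0, 0.0])     # top-down context A
>     context_b = np.array([0.0, 1.0])     # top-down context B
>     w = np.array([0.9, 0.1, 1.0, 0.0])   # one neuron's weights
>
>     for ctx in (context_a, context_b):
>         x = np.concatenate([sensory, ctx])
>         print(w @ x)    # 1.82 vs. 0.82: same stimulus, different
>                         # response under a different context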
>
> In October 2011, a highly respected multidisciplinary professor kindly wrote: “I tell these students that they can work on brains and do good science, or work on robots and do good engineering.  But if they try to do both at once, the result will be neither good science nor good engineering.”  How long will it take for industry and the public to accept that this pessimistic view of the brain was no longer true even then?
>
> The brain principles that have already been discovered could bring fundamental changes in the way humans live, the way countries and societies are organized, our industry, our economy, and the way humans treat one another.
>
> The known brain principles tell us that the brain of anybody, regardless of education and experience, is fundamentally shortsighted, in both space and time.  Prof. Jonathan Haidt documented such shortsightedness well in his book “The Righteous Mind: Why Good People Are Divided by Politics and Religion”, although not in terms of brain computation.
>
> In terms of brain computation, the circuits in your brain self-wire beautifully and precisely according to your real-time experience (the genome only regulates), and the various invariance properties they require for abstraction also largely depend on experience.  Serotonin (triggered by, e.g., threats), dopamine (triggered by, e.g., praise), and other neurotransmitters quickly bias these circuits so that neurons for more long-term thoughts lose the competition to fire.  Furthermore, such bias has a long-term effect.  Therefore, you make long-term mistakes but still feel you are right.   Everybody is like that.  Depending on experience, shortsightedness varies in subject matter.
>
> Traditionally, many domain experts think that computers and brains use very different principles.  The naturally emergent Turing Machine in Developmental Networks, which has been mathematically proved (see J. Weng, Brain as an Emergent Finite Automaton: A Theory and Three Theorems, IJIS, 2015), should change our intuition.
> The new result proposes the following six brain principles:
> 1.    The developmental program (genome-like, task-nonspecific) regulates the development (i.e., lifetime learning) of a task-nonspecific “brain-like” network, the Developmental Network.  The Developmental Network is general-purpose: in principle, it can learn any task the body is capable of, not only pattern recognition.
> 2.    The brain’s images are naturally sensed images of cluttered scenes in which many objects mix.  In typical machine training (e.g., Krizhevsky et al. NIPS 2012), each training image has a bounding box drawn around each object to learn, which is not the case for a human baby.  Neurons in the Developmental Network automatically learn object segmentation through synapse maintenance.
> 3.    The brain’s motor area, which drives the muscles, has multiple subareas, and each subarea represents either declarative knowledge (e.g., abstract concepts such as location, type, and scale) or non-declarative knowledge (e.g., driving a car or riding a bicycle), not just discrete class labels in a global classification.
> 4.    Each brain in the physical world is at least a Super Turing Machine in a Developmental Network.  Every area in the network emerges (it does not statically exist; see M. Sur et al. Nature 2000 and P. Voss, Frontiers in Psychology 2013) using a unified area function whose feature development is nonlinear but free of local minima, contrary to engineering intuition: not convolution; not error back-propagation.
> 5.    The brain’s Developmental Network learns incrementally, taking one pair of sensory and motor patterns at a time to update the “brain” and discarding the pair immediately afterward (see the sketch after this list).  Namely, a real brain has only one pair of stereoscopic retinas, which cannot store more than one pair of images.  Batch learning (i.e., learning everything before testing) is not scalable: without making a mistake in an early test, a student cannot learn how to correct that mistake later.
> 6.    The brain’s Developmental Network is always optimal: each network update, in real time, computes the maximum-likelihood estimate of the “brain”, conditioned on the limited computational resources and the limited learning experience in its “life” so far.   One should not use the test set as a training set, i.e., try many networks on the test set and report only the best one.
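>
> A minimal sketch of principle 5 (a simplification for illustration: the incremental mean, which is the maximum-likelihood estimate of a Gaussian mean given the samples seen so far; the actual LCA update adds competition and an amnesic term):
>
>     import numpy as np
>
>     def incremental_update(w, x, n):
>         # After this one-sample update, w equals the exact mean of
>         # all n samples seen so far, yet no past sample is stored.
>         return w + (x - w) / n
>
>     rng = np.random.default_rng(1)
>     w, n = np.zeros(4), 0
>     for _ in range(1000):
>         x = rng.normal(loc=[1., 2., 3., 4.], scale=0.5)
>         n += 1
>         w = incremental_update(w, x, n)   # x is discarded after this
>     print(np.round(w, 2))                 # close to [1, 2, 3, 4]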
>
> The logical completeness of a brain is (partially, not fully) understood through a Universal Turing Machine in a Developmental Network.  This emergent automaton brain model proposes that each brain is an automaton, but one very different from all traditional symbolic automata, because it programs itself; it is emergent.  No traditional Turing Machine can program itself, but a brain Turing Machine does.
>
> The automaton brain model has predicted that brain circuits dynamically and precisely record the statistics of experience, roughly consistent with neural anatomy (e.g., Felleman & Van Essen, Cerebral Cortex, 1991).  In particular, the model predicted that “shifting attention between ‘humans’ and ‘vehicles’ dramatically changes brain representation of all categories” (J. Gallant et al. Nature Neuroscience, 2013) and that humans “can regulate the activity of their neurons in the medial temporal lobe” (C. Koch et al. Nature, 2010).  The “place cell” work of the 2014 Nobel Prize in Physiology or Medicine implies that neurons encode exclusively bottom-up information (place).  The automaton brain model challenges such a view: neurons represent a combination of bottom-up information (e.g., place) and top-down context (e.g., goal), as reported by Koch et al. and Gallant et al.
>
> Unfortunately, the automaton brain model implies that neuroscientists and neural network researchers cannot understand the brains they study without rigorous training in automata theory.   For example, traditional models of nervous systems and neural networks focus on pattern recognition and lack the capabilities of a grounded symbol system (e.g., “rulefully combining and recombining,” Stevan Harnad, Physica D, 1990).  Automata theory deals with such capabilities.  Does this new knowledge stun our students and researchers, or does it guide them so that their time is better spent?
>
> Brain automata would enable us to see answers to a wide variety of important questions, some of which are raised below.  The automaton brain model predicts that there is no absolute right or wrong in any brain; rather, its environmental experiences wire and rewire the brain.   We do not provide yes/no answers here; we only raise questions.
>
> How can our industry and public understand that the door to understanding brains has opened for them?  How can they see the economic prospects that this opportunity leads them to?
>
> How should our educational system reform to prepare our many bright minds for the new brain age?   Has our government responded promptly and properly to this modern call from nature?
>
> How should our young generation act on the new opportunity that is unfolding before their eyes?  Is a narrowly defined academic degree, as currently offered, sufficient for their careers?
>
> How can everybody take advantage of the new knowledge about their own brain so as to be more successful in their career, including statesmen, officials, educators, attorneys, entrepreneurs, doctors, technicians, artists, drivers, and other mental and manual workers?
>
> Regardless of where we are and what we do, we are all governed by the same set of brain principles.  Everybody’s brain automatically programs itself.
>
> ---- end of the manifesto ----
>
> -John
>
> --
> --
> Juyang (John) Weng, Professor
> Department of Computer Science and Engineering
> MSU Cognitive Science Program and MSU Neuroscience Program
> 428 S Shaw Ln Rm 3115
> Michigan State University
> East Lansing, MI 48824 USA
> Tel: 517-353-4388
> Fax: 517-432-1061
> Email: weng at cse.msu.edu
> URL: http://www.cse.msu.edu/~weng/
> ----------------------------------------------
>
>
>
>
> _______________________________________________
> BMI mailing list
> BMI at lists.cse.msu.edu
> http://lists.cse.msu.edu/cgi-bin/mailman/listinfo/bmi
>
> To unsubscribe send an e-mail to:
>
> bmi-leave at lists.cse.msu.edu
>
> Also, to subscribe or unsubscribe go to
>
> http://lists.cse.msu.edu/cgi-bin/mailman/listinfo/bmi
>
> and enter your e-mail address in the provided box and confirm your action by responding to the e-mail sent by the listserv.

