CHAPTER I: Memory and Learning
Article 2: History of a fabulous encounter between artificial intelligence and neuroscience
[Welcome to the “Brain & IA” series: a series that compares neuroscience with artificial intelligence! The series covers different topics, each embodied in its own chapter. Each chapter is composed of several articles, and each article can be read independently.]
Here you are in the second article ("History of a fabulous encounter between artificial intelligence and neuroscience") of the first chapter, "Memory and Learning".
This article draws a parallel between human learning (see Article 1 for more details) and machine learning based on artificial intelligence.
Introduction
“While there are many domains where AI is superior, humans still have an advantage when tasks depend on the flexible use of episodic memory” (a type of memory linked to the events of our lives), said Martin Chadwick, a researcher at DeepMind.
Artificial intelligence (AI) has seen many advances since the mid-twentieth century. Many of its successes are well known, such as the 1997 victory of Deep Blue, an IBM computer, over the chess champion Garry Kasparov (see image 1). But the field of AI has often taken its inspiration from neuroscience. The complexity of the brain and its incredible ability to memorize and learn have been a wellspring for new algorithms, so the history of learning systems is shaped by biomimicry. Neuroscience has contributed, and continues to contribute, to the improvement of AI in two different ways: first, by helping in the creation of new algorithms; second, by serving as a validation of artificial intelligence techniques.
Image 1: Historical timeline of key events in artificial intelligence (1).
The emergence of the first artificial neurons
The term artificial intelligence appeared in 1955, coined by John McCarthy in the proposal for the Dartmouth workshop he organized with Marvin Minsky, another famous figure in AI. But interest in AI had manifested itself a few years earlier. In 1943, Warren McCulloch and Walter Pitts published an article dealing with “artificial neurons” inspired by biological neurons (2). Six years later, the work of the Canadian psychologist and neuropsychologist Donald Hebb played a major role in improving these artificial neurons. In 1949, he published the book “The Organization of Behavior: A Neuropsychological Theory” (3), in which he advanced the idea that human learning lies in the strength of the connections between neurons. He also defined the following learning rule: when two neurons are excited together, the efficiency of the connection between them increases (either by creating a new link between these neurons or by strengthening the existing one). Conversely, the out-of-sync activation of these two neurons causes a decrease in the efficiency of this connection.
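To make the rule concrete, here is a minimal sketch in Python. Hebb's formulation in the book is qualitative, so the learning rate and the symmetric decrease below are assumptions made purely for illustration.

```python
def hebbian_step(w, pre, post, lr=0.1):
    # Strengthen the connection when the pre- and post-synaptic
    # neurons fire together; weaken it when only one of them fires.
    if pre == 1 and post == 1:
        return w + lr
    if pre != post:
        return w - lr
    return w  # neither neuron fired: the connection is unchanged

w = 0.0
for pre, post in [(1, 1), (1, 1), (1, 0), (0, 1), (1, 1)]:
    w = hebbian_step(w, pre, post)
    print(f"pre={pre} post={post} -> w={w:.1f}")
```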
Image 2: Examples of artificial neurons from the article by Warren McCulloch and Walter Pitts (2).
The Perceptron, or the first machine with learning skills
The work of McCulloch, Pitts, and Hebb inspired the creation of one of the first learning systems in 1957: the Perceptron (4). The Perceptron was an analog computer created by Frank Rosenblatt and made up of McCulloch and Pitts neurons. This computer was able to handle simple learning tasks (for example, distinguishing the letter A from the letter C). The Perceptron was a binary linear classifier: it could decide whether an object belongs to one class rather than another. In other words, after training, it could answer “Yes” or “No” for a given input (for example, saying that it is not an “A” when you present it a “C”). To do this, the system computed weighted sums, that is, the total sum of the incoming data, each weighted by its importance in the decision. Take image 3-C as an example. The input data are pixels. In the diagram, all pixels have the same weight in the decision (0.25). If pixels 1 and 3 light up and the sum of those pixels multiplied by their weights exceeds a given threshold, the system lights up ("Bright"). If the Perceptron gives an incorrect answer (for example, it answers “Bright” when the correct answer was “Dark”), we correct it by changing the weights of the pixels in question so that it produces the right answer. In this case, we can reduce the synaptic weights of pixels 1 and 3 so that the result falls below the established threshold.
Image 3: top - A: Rosenblatt and the Perceptron (5). Middle - B: Organization of the Perceptron (4). Bottom - C: diagram of the organization of the Perceptron (6).
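Here is a minimal sketch in Python of this decision-and-correction loop; the threshold (0.4) and the size of the weight correction are illustrative assumptions, not values from Rosenblatt's actual machine.

```python
import numpy as np

def perceptron_output(pixels, weights, threshold=0.4):
    # Weighted sum of the inputs, compared to a fixed threshold
    return "Bright" if pixels @ weights > threshold else "Dark"

weights = np.array([0.25, 0.25, 0.25, 0.25])  # equal weights, as in image 3-C
pixels = np.array([1.0, 0.0, 1.0, 0.0])       # pixels 1 and 3 light up

print(perceptron_output(pixels, weights))  # "Bright" (0.5 > 0.4)

# Suppose the correct answer was "Dark": reduce the weights of the
# active pixels so the weighted sum falls below the threshold.
weights -= 0.1 * pixels
print(perceptron_output(pixels, weights))  # "Dark" (0.3 < 0.4)
```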
It is an extremely simplified model of the computations performed by biological neurons. In the brain, a neuron passes information to another neuron if and only if it activates beyond a certain threshold. To do so, it must first have received a set of excitations whose sum exceeds the threshold in question. Each neuron thus has an integrative capacity (analogous to the weighted sum computed by the Perceptron's neurons) that produces either a positive response or a neutral one (i.e. no excitation). But the Perceptron has only one layer of neurons. This limits it to solving very simple problems (it is, for example, unable to recognize handwriting).
Image 4: Representation of a neuron and its myelinated axon with the input data at the level of the dendrites (“inputs”) and its output data at the level of the axonal terminals (“output”). (7)
A more connected Perceptron: neural networks
As the brain has approximately 100 billion neurons, one quickly sees that the Perceptron is to the nervous system what the discovery of fire is to innovation. This big analog computer can nowadays be summed up in 5 lines of code! This is why artificial neural networks appeared around 1985. They take up the architecture of neural connections while increasing the number of synapses and the number of layers beyond the original Perceptron. Hidden layers can exist between the input layer and the output layer. This brings a higher level of complexity, closer to the functioning of the brain. An artificial neural network is therefore a Perceptron with several layers of neurons.
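A minimal sketch of this layered architecture in Python follows; the layer sizes, the random weights, and the sigmoid activation are illustrative choices, not a prescribed design.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# 4 input neurons, one hidden layer of 3 neurons, 1 output neuron.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # input -> hidden connections
W2 = rng.normal(size=(3, 1))   # hidden -> output connections

x = np.array([1.0, 0.0, 1.0, 0.0])  # input data (e.g. four pixels)
hidden = sigmoid(x @ W1)            # activity of the hidden layer
output = sigmoid(hidden @ W2)       # activity of the output layer
print(output)
```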
These neural networks gave rise to the connectionist movement, which is now very famous. “Deep learning” also refers to this connectionist movement. These marketing words highlight the increased complexity of artificial neural networks compared to the Perceptron (“deep” meaning approximately three layers of neurons at minimum). For simplicity, imagine a machine with several knobs capable of lighting an LED. These knobs can be set to different intensities, and each modification of these intensities changes the response. For example, if you want the machine to turn on the LED, you will have to adjust the knob settings several times until you find the perfect combination that produces the exact response (i.e. the LED lighting up). Artificial neural networks work the same way: you have to adapt the weight of each input to train the machine (see my magnificent diagram, image 5). This is possible thanks to “gradient backpropagation”, an algorithm that computes the error gradient for each neuron, propagating it from one layer back to the previous one. Currently, artificial neural networks are particularly used for facial recognition purposes.
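As a small illustration, here is a tiny network trained by gradient backpropagation on the XOR problem (a task a single-layer Perceptron cannot solve). The task, the hyperparameters, and the random seed are all assumptions chosen for the demo, not a reference implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

W1 = rng.normal(size=(2, 8))   # input -> hidden weights (the "knobs")
W2 = rng.normal(size=(8, 1))   # hidden -> output weights
sigmoid = lambda z: 1 / (1 + np.exp(-z))
lr = 1.0

for step in range(10000):
    # Forward pass: compute the network's current answer
    h = sigmoid(X @ W1)
    out = sigmoid(h @ W2)
    # Backward pass: propagate the error gradient layer by layer
    d_out = (out - y) * out * (1 - out)    # gradient at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)     # gradient at the hidden layer
    W2 -= lr * h.T @ d_out                 # adjust the "knobs"
    W1 -= lr * X.T @ d_h

print(np.round(out, 2))  # should be close to [[0], [1], [1], [0]]
```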
You can have fun training a neural network by drawing here: https://quickdraw.withgoogle.com.
Image 5: Representation of artificial neural networks (right) and the multiple button machine metaphor (left).
How to recognize a dog image thanks to convolutional networks
The story continues with the creation of convolutional networks (or CNNs, for "Convolutional Neural Networks"). These artificial neural networks were created by Yann Le Cun during his years at the fabulous Bell Laboratories, a historic laboratory where many AI successes were recorded. Inspired by the architecture of the brain's visual cortex, CNNs are a type of multi-layered artificial neural network. They have the particularity of filtering images by extracting different characteristics. Like neurons in the visual system, CNN neurons have receptive fields: each one captures only a part of the image and filters it in order to produce a "smaller" image that is easier to process. This step is called convolution. Several filters, or kernels, exist, and each of them is specialized in recognizing a pattern. For example, the first filter can recognize outlines, the second brightness, and so on. Neurons perform this convolution step for each filter, producing one new image per filter. The images generated are then processed by another mathematical operation, called the "pooling" step, which keeps only the pixels with the highest values (to find out more, see the explanatory video (8) and the convolutional networks demo (9)). The convolution and pooling steps thus alternate until the end of image processing. Unlike traditional artificial neural networks, this technique avoids connecting every neuron to every pixel at once. Therefore, recognizing more complex images, such as a dog photo rather than the letter "A", becomes possible!
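The two steps can be sketched in a few lines of Python; the random "image", the edge kernel, and the pooling size below are illustrative assumptions.

```python
import numpy as np

def convolve2d(image, kernel):
    # Slide the kernel over the image; each output neuron "sees" only
    # a small patch (its receptive field) and produces one value.
    kh, kw = kernel.shape
    h, w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(image, size=2):
    # Keep only the highest value in each size x size block
    h, w = image.shape[0] // size, image.shape[1] // size
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = image[i*size:(i+1)*size, j*size:(j+1)*size].max()
    return out

image = np.random.default_rng(0).random((8, 8))     # tiny grayscale "image"
edge_kernel = np.array([[1.0, -1.0], [1.0, -1.0]])  # crude vertical-edge filter
feature_map = convolve2d(image, edge_kernel)  # convolution step: 7x7 map
pooled = max_pool(feature_map)                # pooling step: 3x3 map
print(feature_map.shape, pooled.shape)
```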
CNNs appeared in the late 1980s but were forgotten some 10 years later, because the computers of the time were not powerful enough to apply the method. However, CNNs became famous again in 2012, after the overwhelming victory of a team using this method over another learning system (the "Support Vector Machine", or "SVM") in the ImageNet competition. In 2016, CNNs showed their effectiveness once more when the AlphaGo machine won the game of Go against a professional player (Lee Sedol was beaten 4-1 by AlphaGo, a machine belonging to DeepMind). Victims of their own success, CNNs are now used for image recognition, speech, and natural language processing. They have enabled many applications such as machine translation, self-driving cars, and medical image analysis systems.
When learning systems try to mechanize thinking
Many learning systems other than neural networks and CNNs emerged during this second half of the twentieth century. This is the case of the "symbolic AI" current: a movement which reached its zenith from 1970 to 1980, during a period of disinterest in neural networks. Unlike the connectionist current, which starts from perception to build up a more complex learning system, the symbolic current tries to mechanize thought processes. This "top-down" approach is embodied in "expert systems": systems that translate thought processes into rules. For example, if we want to mechanize medical diagnosis, we begin with a rule-creation phase with doctors to define decision "protocols" (perform a certain blood test depending on the patient, or decide on a particular diagnosis depending on the results). Like the human brain, all of these business rules were processed and interpreted by a central system called the inference engine. Despite the success of these methods in the 1970s and 1980s, expert systems experienced a decline. Indeed, reducing everything to a set of rules and tests remains complex and unreliable. These methods are little used today, and the general population keeps an annoyed memory of the Windows paperclip.
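A minimal sketch of such a system in Python is shown below; the medical rules are invented for illustration only, and real expert systems used far richer rule languages and inference strategies.

```python
# Hand-written "if condition then conclusion" rules, applied by a
# naive inference engine. All rules here are hypothetical examples.
rules = [
    (lambda facts: "fever" in facts and "cough" in facts,
     "order chest X-ray"),
    (lambda facts: "fatigue" in facts and "pallor" in facts,
     "order blood test"),
    (lambda facts: "blood test: low iron" in facts,
     "diagnosis: anemia"),
]

def inference_engine(facts):
    # Forward chaining: fire every rule whose condition holds,
    # add its conclusion to the facts, and repeat until stable.
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for condition, conclusion in rules:
            if condition(facts) and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

print(sorted(inference_engine({"fatigue", "pallor", "blood test: low iron"})))
```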
Why a baby learns better than the most powerful machine in the world
With all the advances in AI, why is it still impossible to build a learning system as efficient as the human brain? Today, all the efforts dedicated to AI can only reproduce a process that takes less than one second in our brain. Indeed, artificial networks mimic the basic visual recognition performed by our central nervous system, from perception by the eyes to signal propagation in the visual cortex.
The answer to this question is the following: the machine needs data, and it is unable to learn in an unsupervised way by simply observing the world. Indeed, the data must be labeled for the machine to learn, whereas a baby learns by observing the world without needing the names of all objects. Moreover, humans have two learning systems: a bottom-up system and a top-down system. The first one is the one machines commonly use: we learn from the data that surrounds us (sensory data, "bottom") and process these data in our central nervous system ("up"). Unlike the bottom-up system, the top-down system is an inference system. It is a Bayesian probabilistic system that allows us to make hypotheses about the world, hypotheses established according to the experience of each individual. Thanks to these assumptions, our brain becomes predictive: a machine capable of anticipating each situation. If a situation is surprising and was not foreseen by our assumptions, we rectify our system of assumptions by integrating the error. For example, a baby will hypothesize that birds fly after seeing several of them flying. However, if he sees an ostrich, his learning system will flag the surprise and modify the hypothesis created for birds (i.e., include in his hypothesis that some birds do not fly).
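The bird example can be written as a tiny Bayesian update in Python; the Beta-Binomial model and the flat prior are assumptions chosen to keep the sketch simple, not a model of how infants actually compute.

```python
# Belief about the hypothesis "birds fly", as a Beta(alpha, beta)
# distribution whose mean is the estimated probability.
def update(alpha, beta, flies):
    # One observation: a flying bird raises alpha, a non-flying bird raises beta
    return (alpha + 1, beta) if flies else (alpha, beta + 1)

alpha, beta = 1, 1                        # flat prior: no idea whether birds fly
for flies in [True, True, True, True]:    # four flying birds observed
    alpha, beta = update(alpha, beta, flies)
print("P(bird flies) =", alpha / (alpha + beta))  # ~0.83

alpha, beta = update(alpha, beta, flies=False)    # an ostrich! surprise...
print("P(bird flies) =", alpha / (alpha + beta))  # drops to ~0.71
```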
Image 6: Representation of the two human learning systems: the bottom-up system and the top-down system.
In addition, interesting discussions are emerging regarding the contribution of innateness to learning. During a public debate between Gary Marcus, a psychologist at New York University, and Yann Le Cun, the scientific head of the AI research laboratory at Facebook, the following question was raised: how much innate structure should we put into artificial intelligence systems for intelligence to emerge? To answer this question, we can look at babies' brains. Is a baby's brain disorganized? Is a baby born with a blank neural system that must learn from scratch, like a machine? It turns out not. Babies' brains already have well-established networks in the form of brain areas specific to different cognitive functions (auditory area, visual area, tactile area, etc.). For example, if a baby listens to his mother tongue, the auditory information is transcribed in the language network, the same one as in adults. The brain therefore comes with a pre-wired architecture and remains flexible throughout life.
So, from the Bayesian inference system to the part innateness plays in intelligence, we are still a long way from intelligent machines capable of learning to learn.
BIBLIOGRAPHY:
https://qbi.uq.edu.au/brain/intelligent-machines/history-artificial-intelligence
McCulloch, W., and Pitts, W. (1943). A logical calculus of ideas immanent in nervous activity. Bull. Math. Biophys. 5, 115–133.
Hebb, D.O. (1949). The Organization of Behavior (John Wiley & Sons).
Rosenblatt, F. (1958). The perceptron: a probabilistic model for information storage and organization in the brain. Psychol. Rev. 65, 386–408.
https://news.cornell.edu/stories/2019/09/professors-perceptron-paved-way-ai-60-years-too-soon
https://towardsdatascience.com/what-the-hell-is-perceptron-626217814f53
Savage, Neil. ‘How AI and Neuroscience Drive Each Other Forwards’. Nature 571, no. 7766 (24 July 2019): S15–17. https://doi.org/10.1038/d41586-019-02212-4.
La plus belle histoire de l'intelligence, S. Dehaene, Y. Le Cun, J. Girardon, Robert Laffont, 2018.
Hassabis, Demis, Dharshan Kumaran, Christopher Summerfield, and Matthew Botvinick. ‘Neuroscience-Inspired Artificial Intelligence’. Neuron 95, no. 2 (19 July 2017): 245–58. https://doi.org/10.1016/j.neuron.2017.06.011.
‘Intelligence artificielle : du Perceptron au premier Macintosh, la préhistoire d’une révolution’. Le Monde.fr, 17 July 2018. https://www.lemonde.fr/series-d-ete-2018-long-format/article/2018/07/17/du-perceptron-au-premier-macintosh-la-prehistoire-d-une-revolution_5332451_5325928.html.
https://towardsdatascience.com/the-fascinating-relationship-between-ai-and-neuroscience-89189218bb05