Evolution and Neural Networks
With the recent explosion in the popularity of AI, much of the spotlight has fallen on the ability of Neural Networks to solve a wide variety of tasks. From cancer detection to ChatGPT, these black-box systems can sometimes feel like magic. But under the hood, the power of Neural Networks really boils down to how they’re trained.
In traditional Deep Learning, Neural Networks are trained using something called backpropagation, which calculates the gradient of a loss function with respect to the network’s weights. In other words, it determines how much each weight contributed to the error (called the loss).
Backpropagation is used together with an optimization algorithm that aims to minimize the error. Nowadays, these optimization algorithms are almost always a variant of Stochastic Gradient Descent (SGD). Perhaps the most common variant is called Adaptive Moment Estimation, commonly referred to as Adam.
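To make the gradient-based baseline concrete, here is a minimal, hypothetical sketch of a single-weight model trained with plain gradient descent (not Adam): the gradient of the squared error tells us how each weight contributed to the loss, and the optimizer nudges the weight in the opposite direction.

```python
import numpy as np

# Hypothetical illustration: fit y = 2x with a single weight w.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

w = 0.0     # initial weight
lr = 0.1    # learning rate

for _ in range(100):
    pred = w * x
    grad = 2 * np.mean((pred - y) * x)  # d(MSE)/dw via the chain rule
    w -= lr * grad                      # gradient descent update

print(round(w, 3))  # prints 2.0
```

The key point for what follows: every update here requires an explicit gradient, which is exactly what Evolutionary Algorithms do without.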
Because these algorithms have been wildly successful in a wide variety of domains, they have become the de-facto way to train a Neural Network. And as a consequence, alternative methods have seemingly been forgotten.
In this post, we will be exploring one such method — one that takes remarkable influence from the biological processes that created you, me, and every other living thing on this planet.
Evolutionary Algorithms
Evolutionary Algorithms (of which Genetic Algorithms are a subset) use the principles of natural selection, reproduction, and genetic mutation to solve a given task. Because these ideas are widely familiar, the algorithms tend to be extremely intuitive and easy to implement.
Like Stochastic Gradient Descent, Evolutionary Algorithms can be used to optimize the parameters (weights and biases) of a neural network. The main difference is that we don’t need to calculate any gradients. The model still completes its forward pass; but there is no backpropagation involved.
Instead, Evolutionary Algorithms depend on a population of neural networks (called organisms). The organisms that make up our population live in an environment (commonly referred to as an ecosystem). In this way, an organism is akin to the concept of an agent in Reinforcement Learning.
The environment produces data, which is used by organisms to make predictions. These predictions are compared to target values to calculate the rewards, which are stored for each member of the population. Then, the environment performs selection by ranking each organism according to the rewards it received. Organisms that received higher rewards are said to have a higher fitness, and are thus given better odds when it comes to reproductive success.
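The evaluate-rank-select loop above can be sketched in a few lines. This is a hypothetical, simplified version in which each "organism" is just a flat weight vector and selection is plain truncation (keep the top half); the post's actual selection scheme may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(weights, x, y):
    pred = x @ weights                  # stand-in for a forward pass
    return -np.mean((pred - y) ** 2)    # higher is better

# The environment produces data; the population makes predictions on it.
x = rng.normal(size=(128, 3))
y = x @ np.array([1.0, -2.0, 0.5])      # hidden target weights

population = [rng.normal(size=3) for _ in range(20)]

# Score every organism, rank best-first, keep the top half as parents.
scores = np.array([fitness(w, x, y) for w in population])
ranked = np.argsort(scores)[::-1]
parents = [population[i] for i in ranked[: len(population) // 2]]
```

In a full implementation, the parents would then reproduce to form the next generation, as described below.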
This reproductive protocol can be best summarized by Herbert Spencer’s phrase, “the survival of the fittest” — which he coined while describing Charles Darwin’s ideas regarding natural selection.
The organisms that are chosen to reproduce create children, which form the next generation of estimators. Mating is done by blending the weights and biases of each parent. The simplest approach is a uniform crossover: for each weight, each parent has a 50% chance of passing on its value to the child.
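The 50/50 blending can be implemented with a random boolean mask. A minimal sketch (the function name and parameters are illustrative, not from the post):

```python
import numpy as np

rng = np.random.default_rng(42)

def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each weight comes from parent A or B with p=0.5."""
    mask = rng.random(parent_a.shape) < 0.5
    return np.where(mask, parent_a, parent_b)

# With an all-zeros and an all-ones parent, the child is a random mix.
a = np.zeros(8)
b = np.ones(8)
child = crossover(a, b, rng)
```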
Lastly, after each child is created, it undergoes random mutation. This is accomplished by adding Gaussian noise to the child’s weights, where the amount of noise can be fixed or variable.
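The mutation step is even simpler. In this hypothetical sketch, `sigma` controls the noise scale; it can be held fixed, decayed over generations, or even evolved alongside the weights.

```python
import numpy as np

rng = np.random.default_rng(7)

def mutate(weights, sigma, rng):
    """Add Gaussian noise with standard deviation sigma to every weight."""
    return weights + rng.normal(scale=sigma, size=weights.shape)

child = np.zeros(100)
mutated = mutate(child, sigma=0.1, rng=rng)
```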
After being randomly mutated, each child is ready to take on its environment, which occurs when the environment once again generates data. This whole process is repeated for some number of generations (which is the same concept as epochs in traditional Deep Learning).
We should pause here, and take a moment to reflect on how the weight updates are performed. In traditional Deep Learning, each weight is updated by calculating its contribution to the loss. In Evolutionary Algorithms, however, the weights are adjusted by randomly replacing 50% of the values with the weights of another network, and then adding in some noise.
Since the weights of both networks are randomly initialized, this means we are replacing randomness with other randomness, and then finally adding in more randomness as the cherry on top.
So why does this work? As you probably guessed, the answer is: randomness.
By chance, the randomness of one network will outperform the randomness of another. And since our environment only allows the best-performing organisms to reproduce, this randomness will eventually converge into a solution.
The key to all of this is the selection mechanism. It is responsible for choosing which “genes” (weights) survive, and which do not. In nature, selective pressure can come from many things, such as predators or the climate. In this “Artificial Selection”, however, the selection mechanism is a function of fitness, which is calculated by comparing an organism’s predictions to the targets.
The fitness function is akin to the loss function in traditional Deep Learning. The main difference is that Evolutionary Algorithms tend to optimize through maximization, while traditional Deep Learning tends to optimize through minimization. For this reason, the fitness function can often be a negated version of a traditional Deep Learning loss function.
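Turning a loss into a fitness really is as simple as flipping the sign. A minimal example using mean squared error:

```python
import numpy as np

def mse_loss(pred, target):
    return np.mean((pred - target) ** 2)

def fitness(pred, target):
    # Negate the loss so that maximizing fitness minimizes the error.
    return -mse_loss(pred, target)

pred = np.array([0.9, 2.1])
target = np.array([1.0, 2.0])
print(fitness(pred, target))  # prints -0.01
```

A better prediction yields a higher (less negative) fitness, so the selection mechanism needs no other changes.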
With that, let’s take a look at how Evolutionary Algorithms perform on some common Machine Learning tasks.
Regression
Evolutionary Algorithms can be used for regression tasks by choosing the appropriate selection mechanism. To show this, I trained organisms to approximate a sine curve, with values ranging between -1 and 1. Each organism consisted of a simple neural network with the following architecture:
- Two dense layers with 32 neurons, each followed by ReLU activation
- A final dense layer with 1 neuron, with no activation
Each organism ingested 128 data points from the environment to produce an equal number of predictions. The fitness function used to evaluate these predictions was negative mean squared error. Finally, the population size was set to 200.
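For reference, here is a hypothetical numpy sketch of one organism's forward pass matching the architecture above (1 → 32 → 32 → 1 with ReLU on the hidden layers). The layer sizes come from the text; the initialization scale is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_organism(rng):
    sizes = [(1, 32), (32, 32), (32, 1)]
    # Each layer is a (weights, biases) pair.
    return [(rng.normal(scale=0.1, size=s), np.zeros(s[1])) for s in sizes]

def forward(organism, x):
    h = x
    for i, (w, b) in enumerate(organism):
        h = h @ w + b
        if i < len(organism) - 1:   # no activation on the final layer
            h = np.maximum(h, 0.0)  # ReLU
    return h

organism = init_organism(rng)
x = np.linspace(-np.pi, np.pi, 128).reshape(-1, 1)
pred = forward(organism, x)         # one prediction per data point
```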
The results of the best organism after 500 generations can be seen below:
Not too shabby.
While not perfect, the best organism was able to approximate the sine curve reasonably well. Of course, estimating the sine function is a rather trivial task. So, in that vein, let’s take a look at a slightly harder problem.
Classification
To demonstrate how Evolutionary Algorithms can be used for classification, I chose to use the MNIST dataset, which is quite famous in the Machine Learning community. The dataset consists of 60,000 28x28 images of handwritten digits, labeled with an integer in the range 0–9. The goal is to predict which digit is present in the image.
Below are three different samples from the MNIST dataset. The target data for these three samples are 5, 2, and 1, respectively.
I used 70% of the data for training and 30% for testing (evaluation).
Each organism consisted of:
- A dense layer with 256 neurons, followed by ReLU activation
- A final dense layer with 10 neurons, followed by softmax activation to produce a probability distribution
Moreover, each member of the population was given 256 randomly selected samples to predict in order to calculate fitness. The fitness function itself was a negated version of categorical cross-entropy, and the population size was 200.
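The classification fitness can be sketched the same way as before: softmax outputs scored with negated categorical cross-entropy. This is a hypothetical illustration with integer labels, not the post's exact code.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fitness(logits, labels):
    probs = softmax(logits)
    # Cross-entropy: -log of the probability assigned to the true class.
    ce = -np.log(probs[np.arange(len(labels)), labels] + 1e-12)
    return -ce.mean()  # negate so that better predictions score higher

logits = np.array([[5.0, 0.0, 0.0],
                   [0.0, 5.0, 0.0]])
labels = np.array([0, 1])
# Confident, correct predictions give a fitness close to 0 (the maximum).
```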
Evaluation was completed by taking the best performing organism in each generation and calculating its accuracy on the test dataset. The simulation lasted for 900 generations. However, the best performing organism lived in the 820th generation, achieving an accuracy of 78% on the test dataset. The results for each generation can be seen below:
This is a pretty low score for MNIST. However, it should be noted that I didn’t spend too much time experimenting, so this result can definitely be improved. For example, we can use a Convolutional Neural Network (CNN) to process the 2D images, rather than using a dense network to process the flattened images.
Nonetheless, the fact that we were able to achieve 78% with minimal effort gives some credence to the effectiveness of Evolutionary Algorithms.
Conclusion
Because Evolutionary Algorithms only approximate the gradient rather than compute it exactly, they are usually regarded as near-optimal methods. In exchange, they are able to explore a wide range of possibilities, which increases the chance of escaping poor local optima.
To me, Evolutionary Algorithms are extremely interesting. Perhaps I’m a little biased, though, considering they mimic the same processes that allowed me to exist. Regardless, the fact that a population of randomness can converge towards a solution — through the very familiar processes of selection, reproduction, and mutation — is pretty cool.
If you’re interested, the code for these experiments can be found here: https://github.com/trevormcguire/blog/blob/main/neuroevolution/neuroevolution.ipynb
Thanks for reading!