Neural networks are a key technology in machine learning and serve as the foundation for the widely popular field of deep learning. Learning about neural networks not only equips you with a powerful machine learning technique but also helps you better understand deep learning technologies.
This article provides a straightforward, step-by-step introduction to neural networks, ideal for those with limited knowledge of the subject. While some basic understanding of machine learning will be helpful, it is not a strict requirement for reading this article.
Neural networks are a machine learning technique designed to simulate the neural network of the human brain, with the goal of achieving artificial intelligence. The human brain’s neural network is extraordinarily complex, containing roughly 100 billion neurons in an adult brain.
The idea behind machine learning neural networks is to borrow this structure, in a greatly simplified form, and still achieve surprisingly effective results. Through this article, you will learn how neural networks achieve this, explore their history, and discover good ways to study them.
1. Introduction
Let’s look at a classic example of a neural network. This is a three-layer neural network where the red layer represents the input layer, the green layer is the output layer, and the purple layer is the hidden layer. The input layer has three units, the hidden layer has four units, and the output layer has two units. We will use these color codes throughout the article to represent the network’s structure.
In designing a neural network, it’s important to note that the number of nodes in the input and output layers is typically fixed, while the number of nodes in the hidden layers can be freely chosen. In addition, the diagram’s arrows represent the flow of data during the prediction process, which differs slightly from the data flow during training.
The key to the network’s function is not the circles (representing neurons) but the lines connecting them, which represent the connections between neurons. Each connection has a weight that must be learned during training.
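To make this concrete, here is a small sketch in NumPy of how many weights such a network has to learn; the 3-4-2 layer sizes match the diagram above, while the random weight values are purely illustrative:

```python
import numpy as np

# A 3-4-2 network: 3 input units, 4 hidden units, 2 output units.
n_input, n_hidden, n_output = 3, 4, 2

# Every line (connection) in the diagram corresponds to one weight.
# Weights from input to hidden: a 4x3 matrix (4 * 3 = 12 connections).
W1 = np.random.randn(n_hidden, n_input)
# Weights from hidden to output: a 2x4 matrix (2 * 4 = 8 connections).
W2 = np.random.randn(n_output, n_hidden)

print(W1.size + W2.size)  # 20 weights to be learned during training
```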
Another common way of representing a neural network is with the input layer at the bottom and the output layer at the top. This is often used in frameworks like Caffe. In this article, we will follow the left-to-right structure used in Andrew Ng’s work.
2. Neurons
2.1. The Basics
The study of neurons dates back over a century. By 1904, biologists had already identified the basic structure of a neuron. A typical neuron has multiple dendrites, which receive incoming signals, and one axon, which transmits signals to other neurons via axon terminals. The connection between the axon terminal and the dendrite of another neuron is called a synapse.
The structure of a biological neuron can be illustrated as follows:
2.2. The Neuron Model
In 1943, psychologist McCulloch and mathematician Pitts proposed an abstract model of a neuron, known as the McCulloch-Pitts (MP) model. This model served as the foundation for modern neural networks. Here’s a basic representation of the neuron model:
In this model, the neuron has three main components: inputs, an output, and a computation function. Inputs correspond to the dendrites, the output corresponds to the axon, and the computation occurs in the cell body. The arrows between neurons represent connections, each with a weight. In the MP model, the output z is obtained by applying a function g to the linear weighted sum of the inputs: each input is multiplied by its connection weight, the products are summed, and g is applied to the result. In the MP model, g is the sign (sgn) function, which outputs 1 when its argument is greater than 0 and 0 otherwise.
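As a minimal sketch of the MP neuron described above (the input values and weights below are made-up numbers, purely for illustration):

```python
import numpy as np

def sgn(x):
    # Sign function used in the MP model: 1 if the input is positive, else 0.
    return 1 if x > 0 else 0

def mp_neuron(inputs, weights):
    # z = g(a1*w1 + a2*w2 + a3*w3): weighted sum followed by the sign function.
    return sgn(np.dot(inputs, weights))

a = np.array([1.0, 0.5, -0.5])   # example inputs (dendrites)
w = np.array([0.4, 0.6, 0.2])    # example connection weights
print(mp_neuron(a, w))           # output of the neuron (axon), here 1
```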
2.3. The Connection
The connections between neurons are the most important part of the neural network. Each connection has a weight, and a neural network’s learning algorithm adjusts these weights to improve the network’s predictions.
If we represent the input as a and the weight as w, the signal passing through the connection is a * w. Thus, the signal arriving at the other end of the connection has been scaled by the weight w.
3. Single-Layer Neural Networks (Perceptrons)
3.1. Introduction to Perceptrons
In 1958, computer scientist Frank Rosenblatt introduced the perceptron, a two-layer neural network that could be trained. The perceptron was the first artificial neural network capable of learning, and Rosenblatt demonstrated its ability to recognize simple images. This breakthrough led to a surge of interest in neural networks, particularly from the U.S. military, which reportedly regarded neural network research as even more important than the atomic bomb project. This period marked the first major rise of neural network research.
3.2. The Perceptron Model
The perceptron adds input nodes to the original MP model, creating a network with an input layer and an output layer. The input layer only transmits data, while the output layer computes values based on the input. Here’s how the perceptron model looks:
Each output unit computes a weighted sum of the inputs and applies the sign function, and the weights are learned during training. If we want to predict a vector rather than a single value (e.g., [2, 3]), we simply add more output units, one per component.
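A small sketch of such a two-output perceptron, again with illustrative inputs and weights; each row of the weight matrix W holds the weights feeding one output unit:

```python
import numpy as np

def sgn(x):
    # Element-wise sign function: 1 where the input is positive, else 0.
    return (x > 0).astype(int)

a = np.array([1.0, -2.0, 0.5])       # three input units
W = np.array([[0.2, -0.1, 0.4],      # weights feeding output unit 1
              [0.7,  0.3, -0.5]])    # weights feeding output unit 2

z = sgn(W @ a)                       # both output units computed at once
print(z)                             # e.g. [1 0]
```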
3.3. Perceptron’s Effectiveness
The perceptron is essentially a linear classifier, and its decision boundaries can be visualized as a line (in 2D), a plane (in 3D), or a hyperplane (in higher dimensions). The perceptron works well for tasks where the data is linearly separable.
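To see this in action, here is a minimal sketch of the classical perceptron learning rule trained on the AND function, which is linearly separable; the learning rate, number of epochs, and zero initialization are arbitrary choices for illustration:

```python
import numpy as np

# AND is linearly separable, so the perceptron rule converges on it.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w = np.zeros(2)
b = 0.0
lr = 0.1

for epoch in range(20):
    for xi, target in zip(X, y):
        pred = 1 if np.dot(w, xi) + b > 0 else 0
        # Classical perceptron update: nudge the boundary toward misclassified points.
        w += lr * (target - pred) * xi
        b += lr * (target - pred)

print([1 if np.dot(w, xi) + b > 0 else 0 for xi in X])  # [0, 0, 0, 1]
```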
Despite early enthusiasm, Minsky’s 1969 critique exposed the perceptron’s limitations:
- It could not solve linearly non-separable problems such as XOR.
- The resulting disillusionment, compounded by earlier overhyping, triggered the first “AI winter.”
4. Two-Layer Neural Networks (Multilayer Perceptrons)
4.1. Introduction
Two-layer neural networks, or multilayer perceptrons (MLPs), became popular when researchers realized that adding a hidden layer could enable networks to solve more complex problems, like XOR (exclusive OR), which the perceptron could not. However, the computation involved in training these networks was still difficult.
In 1986, David Rumelhart and Geoffrey Hinton proposed the backpropagation (BP) algorithm, which solved the computational challenges of training multilayer networks. This development sparked a surge of interest in neural networks and laid the foundation for modern deep learning.
4.2. The MLP Model
A multilayer perceptron consists of an input layer, one or more hidden layers, and an output layer. The input layer only passes the data forward; each subsequent layer computes a weighted sum of the values it receives from the previous layer and applies an activation function to the result.
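A minimal forward-pass sketch of such a network; the layer sizes, the sigmoid activation, and the random weights are all illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    # Smooth activation function often used in place of the sign function.
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))       # input (3 units) -> hidden (4 units)
W2 = rng.normal(size=(2, 4))       # hidden (4 units) -> output (2 units)

a1 = np.array([0.5, -1.0, 2.0])    # input layer just passes the data on
a2 = sigmoid(W1 @ a1)              # hidden layer: weighted sum + activation
z  = sigmoid(W2 @ a2)              # output layer: weighted sum + activation
print(z)
```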
4.3. The Role of Backpropagation
The backpropagation algorithm enables the efficient computation of gradients, allowing the weights of the network to be adjusted during training. By iterating over the training data, the network learns to minimize the error between its predicted output and the actual target values.
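Below is a small self-contained sketch of backpropagation with gradient descent, training a one-hidden-layer network on XOR; the hidden-layer size, learning rate, and iteration count are arbitrary choices, and a run can occasionally settle in a poor local optimum, one of the challenges noted next:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# XOR is not linearly separable, so a hidden layer is required.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(42)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input (2) -> hidden (4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden (4) -> output (1)
lr = 0.5

for step in range(10000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)       # hidden activations
    out = sigmoid(h @ W2 + b2)     # network predictions

    # Backward pass: push the output error back through each layer
    err = out - y                            # derivative of squared error
    d_out = err * out * (1 - out)            # through the output sigmoid
    d_h = (d_out @ W2.T) * h * (1 - h)       # through the hidden sigmoid

    # Gradient descent update of every weight and bias
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(out.round(2).ravel())   # approaches [0, 1, 1, 0] as training proceeds
```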
These networks could approximate any continuous function but faced challenges:
- Local optima in training
- Manual hyperparameter tuning
5. Deep Neural Networks
Geoffrey Hinton’s 2006 work on deep belief networks reignited interest in deep architectures, helped along by several developments:
- Pre-training: Layer-wise unsupervised training that places the weights in a good starting region before fine-tuning.
- GPU acceleration: Enabled practical training of deep architectures.
- ReLU activation: Addressed vanishing gradients and sped convergence.
Modern deep networks excel in tasks like image recognition by learning hierarchical features: edges → shapes → patterns → objects.
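As a tiny illustration of these ingredients, the sketch below stacks several layers with ReLU activations; the layer sizes, random weights, and He-style weight scaling are arbitrary choices, purely to show the shape of a deeper forward pass:

```python
import numpy as np

def relu(x):
    # ReLU keeps positive values and zeroes out the rest; its gradient does not
    # vanish for positive inputs, which helps when many layers are stacked.
    return np.maximum(0, x)

rng = np.random.default_rng(1)
layer_sizes = [64, 128, 128, 64, 10]   # a deeper stack than the 3-4-2 example

x = rng.normal(size=layer_sizes[0])
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    W = rng.normal(size=(n_out, n_in)) * np.sqrt(2.0 / n_in)  # He-style scaling
    x = relu(W @ x)                     # each layer builds on the previous one

print(x.shape)   # (10,) -- a vector of 10 output values
```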
6. Historical Context
Neural networks have endured three waves:
- 1950s–60s: Perceptron hype and collapse
- 1980s–90s: BP algorithm and second AI winter
- 2006–present: Deep learning resurgence
Critical enablers include:
- Increased computational power
- Big data availability
- Algorithmic innovations (e.g., dropout, batch normalization)
7. Future Directions
- Quantum computing: Potential to simulate brain-scale networks.
- Generalization: Improving robustness beyond training data.
- Ethical AI: Democratizing access via projects like OpenAI.
8. Conclusion
From single neurons to deep architectures, neural networks have evolved into versatile function approximators. Their success stems from:
- Hierarchical feature learning
- Scalable matrix operations
- Hardware advancements
While challenges remain, neural networks continue reshaping AI applications across industries.