Neural Networks: How They Work and Why They Matter

Reading Time: 6 minutes

In their 1999 book Neural Networks, Hervé Abdi and his co-authors defined them as follows:

Neural networks are adaptive statistical methods based on the analogy with the structure of the brain.

Neural networks, or artificial neural networks (ANNs), are used as statistical tools in different fields of study, including biology, psychology, statistics, and econometrics. The word cognitive also often goes hand in hand with neural networks because of their similarity to the architecture of the human brain.

The Human Brain and Neurons

The human brain is a very efficient and intricate organ that allows us to understand, learn, and adapt to new things daily. If we can mimic the way it works and apply that to machine learning systems, we can make many tasks that would otherwise require hours of work easier and get them done in less time.

Our brain works with the help of nerve cells called neurons, and there are billions of them, with varying shapes and types. A neuron consists of dendrites, a cell body, and a long axon. The dendrites receive incoming signals, and the axon sends signals onward. Synapses are the junctions where the axon of one neuron connects to the dendrites of another, and they are where signals pass from neuron to neuron. The principle at work is electrical potential: a threshold is fixed, and if the membrane potential reaches that threshold, a spike occurs, sending a signal down the axon to the neighboring neurons. All of this happens in a matter of milliseconds.

Each neuron is an individual unit that does something simple: it decides whether or not to send a signal. This is the foundation of neural networks in computers. By mimicking the way the brain and its neurons work, with billions of simple processing elements operating in parallel, computers can be made to learn in a similar way.
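Before moving to full networks, this all-or-nothing behavior of a single artificial neuron can be sketched in a few lines of Python; the weights, threshold, and inputs below are invented for illustration:

```python
# A minimal sketch of a single threshold ("fire or don't fire") neuron.
# The weights, threshold, and inputs are illustrative values, not from the article.

def threshold_neuron(inputs, weights, threshold):
    """Return 1 (a 'spike') if the weighted input reaches the threshold, else 0."""
    potential = sum(w * x for w, x in zip(weights, inputs))
    return 1 if potential >= threshold else 0

# Example: two incoming signals, one excitatory and one inhibitory.
print(threshold_neuron(inputs=[1.0, 0.5], weights=[0.8, -0.3], threshold=0.5))  # 1
print(threshold_neuron(inputs=[0.2, 0.9], weights=[0.8, -0.3], threshold=0.5))  # 0
```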

The Layers of Neural Networks

Now, keeping this basic structure and functionality of neurons in mind: a neural network consists of nodes arranged in layers. The input layer receives the inputs (like the dendrites), and the output layer gives the output or prediction (like the axon). In between sit one or more hidden layers that process the data. These layers or tiers are highly interconnected; that is, each node in a layer is connected to every node in the neighboring layers. If we label a tier \(T\), then the tier before it is \(T-1\) and the tier after it is \(T+1\). The output tier may have one or more nodes.

A network has different kinds of parameters, mainly weights and biases. Hyperparameters, such as the learning rate (the speed of learning) and the duration of training, are also fixed before training starts. Weights represent the strength of the connections between the nodes of the network, while biases are additional parameters that set a node’s output in the situation where all of its inputs are zero or “inactive”.
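To make these terms concrete, here is a minimal sketch in Python using NumPy; the tier sizes, the random initialization, and the hyperparameter values are all illustrative choices, not prescriptions:

```python
import numpy as np

# A minimal sketch of the parameters of a fully connected network.
# The tier sizes (3 inputs, 4 hidden nodes, 2 outputs), the random
# initialization, and the hyperparameter values are all illustrative.
layer_sizes = [3, 4, 2]
rng = np.random.default_rng(0)

# One weight matrix per pair of neighboring tiers: entry (j, i) is the
# strength of the connection from node i in tier T-1 to node j in tier T.
weights = [rng.standard_normal((n_out, n_in))
           for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

# One bias per node in every tier after the input tier.
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

# Hyperparameters, fixed before training starts.
learning_rate = 0.1   # the "speed of learning"
epochs = 1000         # the "duration of training"
```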

How Neural Networks Work

Let’s go back to the thresholds discussed for biological neurons. Imagine each node holding a number between two values, with the larger value acting as the threshold. When a node’s value gets close to that threshold, the node is “activated”. The activations of the nodes in the first layer directly affect the activations of the nodes in the second layer, and so on until we reach the last layer, which gives us the output: the prediction made by the system. This parallels the working of biological neurons, where the spiking of some groups of neurons leads to the spiking of certain other groups.

To put it simply, the pattern of activation in the first layer causes a very specific pattern of activation in the next layer. This continues through all the hidden layers until the output layer is reached, which gives the network’s choice or predicted answer.
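As a rough sketch of this layer-by-layer flow, the following Python function reuses the illustrative `weights` and `biases` from the sketch above and assumes the sigmoidal activation introduced later in this article:

```python
import numpy as np

def sigmoid(z):
    # Squashes any number into (0, 1); discussed further below.
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Propagate an activation pattern tier by tier to the output tier."""
    a = x
    for W, B in zip(weights, biases):
        a = sigmoid(W @ a + B)  # this tier's pattern fixes the next tier's
    return a

# Reusing the illustrative weights/biases from the previous sketch:
x = np.array([0.5, -1.2, 3.0])      # an invented input pattern
print(forward(x, weights, biases))  # the output tier's activations
```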

This layered structure is what makes the approach work so well: each layer picks out sub-details (features) of the input for the next layer to build on, which is what allows the network to produce accurate answers and outputs.

Take all the activations from the first layer and compute their weighted sum:

$$w_{1}a_{1} + w_{2}a_{2} + \dots + w_{n}a_{n}$$

These weights (\(w\)) can be positive or negative, and they are adjusted as the network is trained, so the weighted sum can come out as any number. The bias is an additional parameter, a number added to the weighted sum:

$$w_{1}a_{1} + w_{2}a_{2} + \dots + w_{n}a_{n} + B$$

where \(B\) is the bias. The computer’s job is to find the right weights and biases to solve the problem at hand, and it does so through the backpropagation algorithm. The network makes a prediction based on its current weights and biases, and the prediction is evaluated with a loss function that measures the error. The network then minimizes this error by working backwards through the layers, readjusting most strongly the weights and biases that contributed most to the error. In this way, backpropagation acts as a feedback mechanism.
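As a tiny worked example of the weighted sum plus bias for a single node, here is a sketch in Python; every number (the activations, weights, and bias) is invented for illustration:

```python
# A tiny worked example of the weighted sum plus bias for one node
# (every number here is invented for illustration).
a = [0.9, 0.1, 0.4]    # activations a1, a2, a3 from the previous tier
w = [0.5, -1.0, 0.8]   # weights w1, w2, w3 on the incoming connections
B = -0.2               # the bias

z = sum(wi * ai for wi, ai in zip(w, a)) + B
print(z)  # 0.45 - 0.10 + 0.32 - 0.20 = 0.47
```

Written compactly for a whole tier, with the weights collected into \(W\) and the activations into \(A\), and with the activation function applied, the computation becomes: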

$$z = WA + B$$

$$f(x) = \sigma(z) = \sigma(WA + B)$$

Here \(\sigma\) is the sigmoidal activation function, \(W\) collects all the weights, and \(A\) collects the activations of the nodes in the previous tier, so the equation describes one layer of the neural network. Note that the activation function cannot be linear: if it were, any stack of layers would collapse into a single linear transformation, and the network could not capture non-linear relationships.
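This last point can be checked directly. The following sketch, with invented matrices, shows that two weight layers without an activation in between are equivalent to a single weight layer:

```python
import numpy as np

# Why a linear activation is not enough: without sigma, two stacked tiers
# collapse into a single linear map (matrices and input invented).
W1 = np.array([[1.0, 2.0],
               [0.0, 1.0]])
W2 = np.array([[0.5, -1.0],
               [1.0, 0.0]])
x = np.array([3.0, -2.0])

print(np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x))  # True: just one linear layer

# With the sigmoid in between, no single matrix can reproduce the result.
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
print(sigmoid(W2 @ sigmoid(W1 @ x)))
```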

Finally, neural networks are adaptive: they work to reduce the error and increase the accuracy of their predictions as much as possible. Inputs that contribute to higher accuracy end up weighted more heavily.
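To make this adaptive loop concrete, here is a toy sketch of gradient-descent training for a single sigmoid neuron with a squared-error loss; the data, learning rate, and initialization are all invented, and a real network would apply the same kind of update across many layers via backpropagation:

```python
import numpy as np

# Toy illustration of the adaptive loop: one sigmoid neuron trained by
# gradient descent on a squared-error loss (all values invented).
rng = np.random.default_rng(1)
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
y = np.array([1.0, 1.0, 1.0, 0.0])   # a simple OR-like target

w, B = rng.standard_normal(2), 0.0
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for epoch in range(2000):
    pred = sigmoid(X @ w + B)
    err = pred - y                    # how far each prediction is off
    grad = err * pred * (1 - pred)    # chain rule through the sigmoid
    w -= 0.5 * (X.T @ grad)           # readjust the weights...
    B -= 0.5 * grad.sum()             # ...and the bias to reduce the error

print(np.round(sigmoid(X @ w + B), 2))  # predictions approach [1, 1, 1, 0]
```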

Advantages

Neural networks can learn from mistakes and make independent decisions. They can process multiple inputs in parallel, which matters especially when the networks are large. Moreover, if some individual nodes fail, the rest of the network can often still function reasonably well.

They are used in image recognition, speech recognition, medical diagnosis, stock market predictions, chatbots, drug discovery, personal assistants, and many other fields.

Citations

  • Abdi, H., Valentin, D., & Edelman, B. (1999). Neural Networks. Sage Publications.
  • Marsland, S. (2014). Machine Learning: An Algorithmic Perspective (2nd ed.). CRC Press.


