How are neural networks implemented?

What can ChatGPT tell us about its inner workings? I instructed it to take the role of a professor in the Faculty of Engineering at University College London, and said that I would take the role of a third-year undergraduate student. A full transcript of the conversation is here.

To begin, I asked the professor for a high-level explanation of network theory, telling him to express the ideas in British English without using jargon, to take it step by step, and to begin by getting my agreement on what the steps in the explanation would be. After orientating this outline more towards AI networks, we experimented with the professor’s ability to reflect on what he had produced – i.e. to re-evaluate his previous outline. This resulted in a notably better structure for the explanation I sought – see below.

Professor ChatGPT then led me through the subject one step at a time, with many diversions as I asked him to clarify things. We got as far as Step 2.2 before one of these diversions led to a good breakpoint: agreement on an understandable explanation of how an AI’s neuron – a node in its neural network – is implemented in software and hardware, and how very large numbers of these nodes can execute simultaneously.

Nodes in a neural network

The professor confirmed that my summary was on the right lines: “Yes, that’s a correct way to conceptualise it. In a neural network, a node (or neuron) is a computational unit that takes in one or more input values, applies a weight to each input, sums them together, adds a bias, and then passes the result through an activation function. The values of the weights and biases, and the specific form of the activation function, are typically stored in memory, and the code that performs these operations represents the behaviour of the node. So, you could say that the node effectively comprises the code that executes these operations along with the stored data for the weights, biases, and activation function.”
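To make that description concrete, here is a minimal sketch in Python of a single node. It assumes NumPy and a sigmoid activation; the input values, weights, and bias are invented purely for illustration.

```python
import numpy as np

def sigmoid(z):
    """The activation function: squashes the weighted sum into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def node(inputs, weights, bias):
    """One neuron: weight each input, sum them, add the bias,
    then pass the result through the activation function."""
    z = np.dot(weights, inputs) + bias
    return sigmoid(z)

# Illustrative stored data for one node: three weights and a bias.
w = np.array([0.4, 0.1, -0.6])
b = 0.2

x = np.array([0.5, -1.2, 3.0])  # three input values
print(node(x, w, b))            # the node's single output value
```

The stored data (`w` and `b`) plus these few lines of code are, in the professor’s terms, the node.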

Processing many nodes

With regard to processing, the professor explained that the ability to process many nodes simultaneously in a neural network is made possible by parallel processing hardware such as Graphics Processing Units (GPUs) or Tensor Processing Units (TPUs). He set out five factors:

Parallel Architecture: GPUs and TPUs are designed with a large number of smaller, specialised processing cores that can execute operations concurrently. This is in contrast to traditional Central Processing Units (CPUs), which have fewer but more general-purpose cores.

Vectorisation and Batch Processing: Neural networks often make use of operations that can be expressed as vector or matrix computations. These can be broken down into smaller, independent calculations that can be executed simultaneously. For example, when computing the weighted sum of inputs for an entire layer of nodes, each node’s computation can be done in parallel since they don’t depend on each other.
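As a rough illustration of that point, the weighted sums for a whole layer can be written as one matrix-vector product; each row of the weight matrix corresponds to one node, and the rows are independent, which is what makes the parallelism possible. The shapes and values below are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))  # weight matrix: 4 nodes, 3 inputs each
b = rng.standard_normal(4)       # one bias per node
x = rng.standard_normal(3)       # a single input vector

# One matrix-vector product computes all four weighted sums at once;
# because the rows of W are independent of one another, the hardware
# can evaluate them in parallel.
z = W @ x + b
a = np.maximum(z, 0)  # ReLU activation, applied element-wise
print(a)
```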

Batch Training: When training a neural network, it’s common to feed in a batch of data samples at once rather than a single sample at a time. This means that the forward and backward passes of the training can be done on multiple samples simultaneously, exploiting the parallel nature of the hardware.
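Continuing the sketch above, a batch is handled by stacking the samples as rows of a matrix, so a single matrix-matrix product performs the forward computation for every sample simultaneously (again, the sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((4, 3))   # the same layer: 4 nodes, 3 inputs each
b = rng.standard_normal(4)
X = rng.standard_normal((32, 3))  # a batch of 32 samples, one per row

# One matrix product computes the weighted sums for all 32 samples
# and all 4 nodes at once, giving a 32 x 4 result.
Z = X @ W.T + b
A = np.maximum(Z, 0)
print(A.shape)  # (32, 4)
```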

Hardware Optimisation: Modern GPUs and TPUs are equipped with specific instructions and architectural optimisations designed for the kinds of numerical computations found in neural networks. This includes handling floating-point arithmetic efficiently and providing fast access to the large amounts of memory needed to store weights, biases, and intermediate computations.
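One small, concrete facet of this is numeric precision: weights are commonly stored in half-precision floating-point formats on GPUs and TPUs, halving the memory they occupy and the bandwidth needed to fetch them. The layer size below is invented, purely to show the arithmetic.

```python
import numpy as np

# Weights for a hypothetical layer of 4096 nodes with 4096 inputs each.
W32 = np.zeros((4096, 4096), dtype=np.float32)
W16 = W32.astype(np.float16)  # half precision, widely used on GPUs/TPUs

print(W32.nbytes / 2**20, "MiB")  # 64.0 MiB
print(W16.nbytes / 2**20, "MiB")  # 32.0 MiB
```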

Software Libraries and Frameworks: Software tools like TensorFlow, PyTorch, and CUDA have been developed to take advantage of the parallel processing capabilities of GPUs and TPUs. They handle the complexities of distributing the computations across the hardware, allowing developers to write code that scales across many cores without having to manage the parallelism manually.
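As a hedged sketch of what this looks like in practice, the PyTorch snippet below defines the same kind of layer and moves it onto a GPU if one is available; the framework then distributes the underlying matrix arithmetic across the device’s cores without the programmer managing any of it. The layer sizes are invented.

```python
import torch

# Use the parallel hardware if it is present; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

layer = torch.nn.Linear(3, 4).to(device)  # weights and biases stored on the device
x = torch.randn(32, 3, device=device)     # a batch of 32 three-value samples

# The forward pass runs on the chosen device; on a GPU the matrix
# multiplication inside the layer is spread across many cores automatically.
y = torch.relu(layer(x))
print(y.shape)  # torch.Size([32, 4])
```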

We ended the session with the professor summarising: “In essence, the simultaneous processing of many nodes is a combination of hardware capabilities specifically tailored for parallel execution and software techniques that exploit this parallelism to perform many calculations at once. This is one of the key factors that enable deep learning models to be trained and run efficiently, handling complex tasks like image recognition, natural language processing, and more.”

Structure of Professor ChatGPT’s explanation

1. Understanding General Network Concepts
1.1. Definition of a Network (Nodes and Connections)
1.2. Basic Network Structures (Grid, Random, Scale-free)
1.3. Applications of Networks in Everyday Life (Social Networks, Transport)
2. Introduction to Neural Networks in AI
2.1. The Idea Behind Neural Networks (Mimicking the Human Brain)
2.2. Components of Neural Networks (Neurons, Layers)
2.3. How Neural Networks Process Information (Data Flow)
3. Types of Neural Networks in AI
3.1. Feedforward Neural Networks (Basic Form)
3.2. Convolutional Neural Networks (Specialised for Images)
3.3. Recurrent Neural Networks (For Time-Series or Sequential Data)
3.4. Transformer Networks (Used in Large Language Models), including: how transformer networks differ from other types; their role in state-of-the-art large language models; and their importance in understanding human language
4. Training and Learning in Neural Networks
4.1. Learning from Data (Training Process)
4.2. Adjusting the Network (Backpropagation and Error Correction)
4.3. Validating the Model (Testing Accuracy)
5. Ethical and Practical Considerations in AI Networks
5.1. Real-world Applications (Healthcare, Finance)
5.2. Ethical Considerations (Bias, Transparency)
5.3. Future Trends and Challenges (Scalability, Security)