AI, Machine Learning, Neural Networks and Deep Learning

Rossano Cameroni
7 min read · Mar 1, 2024


What do these terms mean and what do they represent? A brief overview of something we often hear about but may not know in detail.

We start from a graphical representation that describes the subsets of artificial intelligence.

Is there a precise definition of Artificial Intelligence?

We start by stating that there is no single and official definition of Artificial Intelligence (AI), just as there isn’t for human intelligence. Indeed, for the latter, there isn’t even unanimity on how many types of intelligence there are: there’s crystallized intelligence, fluid intelligence, logical-mathematical intelligence, the multiple intelligences theorized by Howard Gardner, the emotional intelligence expressed by Daniel Goleman, or the analytical, creative, and practical intelligence of Robert Sternberg. This list is not even complete, thus highlighting the complexity of the topic.

For simplicity, let’s borrow the definition elaborated by the OECD, which, in a broad and general sense, describes AI in the following way: “An AI system is a machine-based system that, for explicit or implicit objectives, infers, from the input it receives, how to generate outputs such as predictions, content, recommendations, or decisions that [can] influence physical or virtual environments. Different AI systems vary in their levels of autonomy and adaptiveness after deployment”. Note that this is an evolving definition, considering that it is updated according to new developments.

Having defined the umbrella under which algorithms and learning models fall, let’s move on to the first subset or subfield, namely Machine Learning (ML). ML involves algorithms capable of learning from the surrounding environment and improving their performance based on available data. It works in a way similar to how humans (for example, a child) learn, simulating their learning capabilities. Through different learning modes, such as supervised (based on a set of labeled data), unsupervised (unlabeled data), or reinforcement learning (a process involving successive actions and rewards), the system constantly evolves to improve its performance. This was a great revolution compared to deterministic models based on “if”-“then” programming, which were difficult to apply to complex tasks. Machine Learning has therefore allowed us to move from programming to training.
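The shift from programming to training can be sketched in a few lines of plain Python. Instead of hand-coding “if”-“then” rules, a tiny classifier learns from labeled examples; the fruit data, feature values, and function names below are invented purely for illustration.

```python
# A minimal sketch of supervised learning: the "rules" are not written
# by hand but derived from labeled training data.

def train_centroids(examples):
    """Learn one centroid (mean point) per label from labeled data."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc]
            for label, acc in sums.items()}

def predict(centroids, features):
    """Assign the label whose centroid is closest (squared distance)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(features, c))
    return min(centroids, key=lambda label: dist2(centroids[label]))

# Labeled training set: (length_cm, weight_g) -> fruit
data = [([7.0, 150.0], "apple"), ([7.5, 160.0], "apple"),
        ([20.0, 120.0], "banana"), ([18.0, 110.0], "banana")]
model = train_centroids(data)
print(predict(model, [19.0, 115.0]))  # a long, light fruit -> "banana"
```

With more labeled data the same code gets better without anyone rewriting its rules — that is the essence of training.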

The next subset is Neural Networks: a set of techniques that simulate the learning processes of the brain through layered artificial neural networks, in which each layer calculates values for the next.

Thanks to this “simulation” of the human brain, it is possible to move on to Deep Learning, a machine learning technique in which artificial neural networks are exposed to large amounts of data. These networks are capable of autonomously learning to perform certain tasks without the need for manual preprocessing of the data. As in the cerebral cortex, the artificial neurons of these networks are distributed over several layers: the initial layer handles the input, followed by a series of intermediate “hidden” layers, and a final layer for the output. The term “Deep Learning” is used only when there are at least two intermediate layers.
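The layered structure described above — input layer, hidden layers, output layer, with each layer computing values for the next — can be sketched as a forward pass in plain Python. All weights and biases here are made up for illustration; in a real network they would be learned during training.

```python
# A minimal forward pass through a "deep" network: two hidden layers
# between the input and the output, matching the definition above.

def layer(inputs, weights, biases):
    """One layer: weighted sums of the inputs, then a ReLU activation."""
    return [max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

x = [1.0, 2.0]                                          # input layer
h1 = layer(x, [[0.5, -0.2], [0.1, 0.3]], [0.0, 0.1])    # hidden layer 1
h2 = layer(h1, [[0.4, 0.6], [-0.3, 0.2]], [0.05, 0.0])  # hidden layer 2
out = layer(h2, [[1.0, 1.0]], [0.0])                    # output layer
print(out)
```

Training consists of adjusting those weights so the output matches the desired result; the layered flow of values stays exactly as shown.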

The issue does not stop here, because everyone, to a greater or lesser extent, has come across the term Large Language Model (LLM). Wikipedia defines the LLM as “a type of language model notable for being able to achieve general-purpose language understanding and generation. LLMs acquire this ability by using large amounts of data to learn billions of parameters in training and consuming significant computational resources in operation. The adjective large in the name refers to the large number of parameters of the probabilistic model (in the order of billions)…”. LLMs are mostly artificial neural networks, particularly Transformers (neural models that can capture long-range relationships in input data without relying on recurrent structures), and are (pre-)trained using a combination of unsupervised and supervised learning (hence the term semi-supervised learning).
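The core task of a language model — predicting the next token from what came before — can be illustrated at toy scale. Real LLMs use Transformer networks with billions of learned parameters; the sketch below, with an invented one-sentence corpus, just counts word pairs (bigrams) and picks the most frequent continuation.

```python
from collections import Counter, defaultdict

# A toy bigram "language model": for each word, count which words
# follow it in the corpus, then predict the most frequent follower.

corpus = "the cat sat on the mat and the cat slept on the rug".split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1          # count how often nxt follows prev

def next_token(word):
    """Most likely next word given the previous one."""
    return bigrams[word].most_common(1)[0][0]

print(next_token("the"))  # "cat" follows "the" most often here
```

An LLM does conceptually the same thing, but its “counts” are replaced by billions of parameters that let it condition on long stretches of context rather than a single preceding word.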

In the last 3–4 years, the number of parameters on which large multimodal language models are based has grown exponentially, as shown by this graph:

Extracted from “Artificial Intelligence Index Report 2023” — page 60 — figure 1.2.15

The information provided in this article helps to better understand how Artificial Intelligence and its training models work. For those who wish to expand their knowledge, here are some simplified definitions (generated with the support of ChatGPT) that explain much of the terminology in the scheme proposed at the beginning.

Happy reading.

Simplified Definitions

  • Natural Language Processing (NLP): is like teaching computers to understand and communicate with human language, such as written text or speech. It’s like making a computer understand what we’re saying when we talk to it or ask it to do something.
  • Visual Perception: is like teaching computers to “see” and understand images and the visual world around them, allowing them to recognize objects, people, and other elements in images, just as we do when we look at photos.
  • Automatic Programming: is like teaching computers to write their own code and solve problems on their own. Imagine giving instructions to a computer about what to do and then it writes the code by itself to execute those instructions.
  • Intelligent Robot: is like creating robots that can think and act intelligently. These robots can make decisions based on what they see, hear, and learn, just like we humans.
  • Knowledge Representation: is like organizing information so that computers can understand and use it to make decisions. For example, we might represent the concept of “cat” as an animal with four legs, pointed ears, and a meow.
  • Automatic Reasoning: is the process through which computers use algorithms and logical rules to derive conclusions and make decisions based on available data and information. Essentially, it is the ability of a computer system to reason and draw conclusions autonomously, without the need for direct human intervention.
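The last definition, automatic reasoning, can be sketched in a few lines of code: a minimal forward-chaining loop (all facts and rules below are invented for illustration) applies “if premises, then conclusion” rules to the known facts until nothing new can be derived.

```python
# A minimal sketch of automatic reasoning by forward chaining:
# start from known facts and apply rules until a fixed point.

facts = {"has_fur", "meows"}
rules = [({"has_fur"}, "is_mammal"),
         ({"is_mammal", "meows"}, "is_cat")]

changed = True
while changed:
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)      # derive a new conclusion
            changed = True

print(sorted(facts))  # now includes the derived facts
```

The loop stops only when a full pass adds nothing new, which is how the system “draws conclusions autonomously” from its rules.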

Machine Learning Techniques

  • Linear/Logistic Regression: is a method used to predict the value of a dependent variable based on one or more independent variables. Linear regression is used when the dependent variable is continuous, while logistic regression is used when the dependent variable is binary (has only two possible values).
  • K-Means: is a clustering algorithm used to divide a dataset into homogeneous groups, called clusters. The algorithm groups data so that elements within the same cluster are more similar to each other than to elements in other clusters.
  • Support Vector Machine (SVM): is a classification algorithm used to find the best hyperplane that separates data into different classes. The goal is to find the hyperplane that maximizes the margin, that is, the distance to the nearest data points of each class, called support vectors.
  • Principal Component Analysis (PCA): is a technique used to reduce the dimensionality of data, while preserving as much variance as possible. PCA identifies the directions of maximum variance in the data and projects the data onto these directions, thus reducing the number of variables.
  • Random Forest: is an ensemble learning algorithm that combines multiple decision trees to make predictions. Each tree is trained on a random subset of the data, and predictions are made by aggregating the predictions of all the trees.
  • K-Nearest Neighbor (KNN): is a classification and regression algorithm based on the principle that similar items tend to belong to the same class or have similar values. KNN calculates the distance between data points and predicts the label or value of a data point based on the nearest data points to it.
  • Decision Trees: are supervised learning algorithms used for classification and regression. A decision tree recursively splits the dataset into smaller subsets based on splitting criteria, until a homogeneous set of data is obtained or a stopping condition is reached. The leaf nodes of the tree represent predictions.
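As a concrete taste of one of these techniques, here is a minimal K-Nearest Neighbor classifier in plain Python; the 2-D points and labels are invented for the example, and real libraries offer far more efficient implementations.

```python
from collections import Counter

# K-Nearest Neighbor: classify a point by majority vote among the
# k training points closest to it (squared Euclidean distance).

def knn_predict(train, point, k=3):
    def dist2(p):
        return sum((a - b) ** 2 for a, b in zip(p, point))
    nearest = sorted(train, key=lambda item: dist2(item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
         ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B")]
print(knn_predict(train, (1.1, 1.0)))  # its neighbours are all "A"
```

Note how the “model” is simply the stored training data — KNN does no training step at all, which is why it is often called a lazy learner.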

Other Definitions

  • Boltzmann Neural Networks: Boltzmann neural networks are a type of stochastic neural network, which means they include elements of randomness in their operation. They are composed of processing units called nodes, which can be active or inactive, and connections that determine the activation of the nodes. These networks are used to model complex systems and solve optimization problems.
  • Self-Organizing Maps (SOM): Self-organizing maps are an unsupervised learning technique used to visualize and organize high-dimensional data in two-dimensional or three-dimensional spaces. SOM uses a grid of neurons that self-organize based on the similarities between input data. These maps are used for data visualization, dimensionality reduction, and pattern discovery in data.

Deep Learning Techniques

  • Convolutional Neural Networks (CNN): Convolutional neural networks are a type of neural network primarily used for image and video recognition. CNNs use convolution layers to automatically extract features from images (feature extraction), reducing the need for manual feature engineering. They consist of convolutional, pooling, and fully connected layers.
  • Recurrent Neural Networks (RNN): Recurrent neural networks are a type of neural network used for modeling data sequences, such as natural language or time series. RNNs are capable of maintaining an internal state, allowing them to capture temporal relationships in the data. However, they can suffer from vanishing gradient problems, especially on long sequences.
  • Generative Adversarial Networks (GAN): Generative adversarial networks are a type of neural network architecture used for generating new data that looks like it comes from a certain distribution. GANs consist of two neural networks: the generator and the discriminator. The generator tries to produce data that is indistinguishable from real data, while the discriminator tries to distinguish between real and generated data.
  • Deep Belief Networks (DBN): are machine learning models that combine elements of feed-forward neural networks with probabilistic neural networks. They consist of multiple layers of units, including visible layers and hidden layers, and use unsupervised learning algorithms to learn data representations. DBNs are often used for unstructured data modeling and feature extraction. Additionally, they can capture complex relationships in the data and generate new examples similar to the training data. Deep Belief Networks have been successfully used in applications such as image recognition, speech recognition, and natural language modeling.
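The convolution step that gives CNNs their name can be sketched in plain Python: a small kernel slides over a 2-D “image” and computes weighted sums, producing a feature map. The tiny image and the vertical-edge kernel below are invented for illustration; real CNNs learn their kernel values during training.

```python
# A minimal "valid" 2-D convolution: slide the kernel over the image
# and record the weighted sum at each position (the feature map).

def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    rows = len(image) - kh + 1
    cols = len(image[0]) - kw + 1
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(cols)]
            for r in range(rows)]

image = [[0, 0, 1, 1],          # 4x4 image: dark left, bright right
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
kernel = [[-1, 1],              # 2x2 vertical-edge detector
          [-1, 1]]
print(conv2d(image, kernel))    # responds only at the dark->bright edge
```

The feature map is zero everywhere except over the column where the brightness changes — exactly the kind of automatic feature extraction the CNN definition above describes.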
