Lately, we see a trend in cybersecurity solutions advertising themselves as Artificial Intelligence systems, and they claim to detect and prevent your organization from cyber threats. Many people do not understand what stands behind modern Machine Learning methods and problems they can solve. And more importantly, that these methods do not provide a full range of tools to achieve the wet dream of every Machine Learning specialist – aka general Artificial Intelligence or, in other words – a perfect copy of the human brain.

But how Machine Learning works? Essentially, almost every algorithm in Machine Learning uses the following paradigms. First, we have a set of data, which we call training data. We divide this data into input and output data. The input data is the data our Machine Learning model will use to generate an output. We compare this generated output with the output of the training data and decide whether this result is good. During my learnings in Machine Learning, I was amazed how many training materials could not explain how we create these models. Many authors just started giving the readers mathematical formulas and even highly complex explanations comparing the models to the human brain. In this article, I am trying to provide a simple high-level description of how they work.

On the diagram you can see a standard Deep Learning Machine Learning model. The Conv/Relu and SoftMax parts are actually polynomials sending data from one algebraic space to another

So how do these Machine Learning algorithms do their job? A powerful mathematics branch is learning the properties of algebraic structures (in my university, it was called Abstract Algebra). The whole idea behind this branch is to define algebraic vector spaces, study their properties, and define operations over the created algebraic space. But let’s go back to Machine Learning, essentially our training data represents an algebraic space, and our input and output data is a set of vectors in that space. Luckily, some algebraic spaces can perform the standard polynomial operations, and we even can generate a polynomial rendering the input and output data. And voila, a Machine Learning model is, in many cases, a generated polynomial, which by given input data produces an output data similar to what we expect.

The modern Deep Learning approach is using a heavily modified version of this idea. In addition, it tries to train its models using mathematical analysis over the generated polynomial and, more essentially, its first derivative. And this method is not new. The only reason it has raised its usage lately is that NVidia managed to expose its GPU API to the host system via CUDA and make matrices calculation way faster than on the standard CPUs. And yeah, the definition of a matrix is a set of vectors. Surprisingly, the list of operations supported by a modern GPU is the same set used in Abstract Algebra.

In the next part, we shall discuss how these methods are used in Cybersecurity.