Deep neural networks are currently the main tool used in artificial intelligence to handle complex problems, because of their power and versatility. These networks depend on having enough data from which to learn to identify patterns, through a mathematical process that adjusts a set of parameters. For images, for example, they reduce the input to simpler representations so that objects, textures, unusual elements and so on can be detected. This process is called training. In that way, when new images are presented to a trained network, it compares them with the characteristics learned during training to identify which ones are most similar or what differences are present.
The performance of deep neural networks thus depends on their capacity to merge data in a suitable way, so as to identify the most relevant characteristics. In convolutional neural networks this fusion is usually done with two standard processes, convolution and pooling, which always use the same information merging mechanisms. To improve the results of the network, traditionally the number of parameters is increased, which makes those networks slow and costly to use.
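To make the contrast concrete, the two standard merging mechanisms can be sketched in a few lines. The following Python fragment is an illustrative sketch with function names of our own choosing, not code from the project; it shows that convolution always merges a window as a sum of products, and pooling always merges it as a maximum (or a mean):

    import numpy as np

    def conv2d_valid(image, kernel):
        # Convolution: the merge is always a sum of products.
        kh, kw = kernel.shape
        h, w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
        out = np.empty((h, w))
        for i in range(h):
            for j in range(w):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return out

    def max_pool2d(image, size=2):
        # Pooling: the merge is always the maximum of the window.
        h, w = image.shape[0] // size, image.shape[1] // size
        out = np.empty((h, w))
        for i in range(h):
            for j in range(w):
                out[i, j] = image[i * size:(i + 1) * size,
                                  j * size:(j + 1) * size].max()
        return out

Whatever the data look like, these two operators merge them in exactly the same fixed way; the only thing training adjusts is the kernel values.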
Because of that, in this project we set out to improve the information merging processes inside the network, so that its performance could be improved at a lower cost. To do that, we studied new information merging mechanisms to build a set of functions, beyond the typical ones, that can be adapted to specific problems depending on the needs of users, while trying not to increase the cost. In other words, we tried to replace brute-force guesswork with a more refined method that is aware of the data being merged. Specifically, the neural networks developed are intended to respond to new challenges in the industrial sector, in particular improving prediction for unknown data, or anomalies. An anomaly is an event that is not part of the system's past, that is, an event that cannot be found in the system's history of data. In an industrial setting, early detection of problems that, for example, make devices unavailable has a high impact on production, and can also lead to significant savings in maintenance costs. Anomalous events are identified with the goal of predicting them early enough, and with enough confidence, to plan interventions that incur lower costs.
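To make that definition operational, a minimal baseline (a sketch of the general idea only, not one of the project's neural detectors) flags a point of a univariate series as anomalous when its own recent past fails to explain it:

    import numpy as np

    def detect_anomalies(series, window=24, k=3.0):
        # A point is anomalous when it deviates from what its own
        # history predicts by more than k standard deviations
        # (the window size and k are illustrative choices).
        series = np.asarray(series, dtype=float)
        flags = np.zeros(len(series), dtype=bool)
        for t in range(window, len(series)):
            past = series[t - window:t]            # the system's history
            error = abs(series[t] - past.mean())   # surprise w.r.t. the past
            flags[t] = error > k * past.std()
        return flags

The neural networks studied in the project play the role of a far better predictor of the past than the rolling mean used here, but the detection principle is the same.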
To those ends, the main goals of the project were:
– Improve the applicability and interpretability of deep neural networks with the theoretical development of new information merging mechanisms to be applied in the convolution and pooling phases
– Analyse the potential of deep neural networks for detecting anomalies in univariate time series, and adapt them for AI inference on edge devices
The results were positive. Specifically, it has been shown that pooling processes can indeed be improved if functions are used that are aware of the relationships between the data, via suitable metrics. That has opened the door to developing neural network models that adapt specifically to the kind of anomalies under study, as well as to other image processing problems.
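The project's exact functions are not reproduced here, but the idea admits a minimal sketch (again with names of our own choosing): a pooling operator in which each value of the window is weighted by how strongly the other values support it under a chosen metric, so the merge is aware of the relationships between the data:

    import numpy as np

    def relation_aware_pool(window, scale=1.0):
        # Weight each value by its closeness to the others (a Gaussian
        # kernel on pairwise distances), then take the weighted average:
        # values backed by their neighbours dominate, isolated outliers
        # are damped. `scale` controls how local "closeness" is.
        v = window.ravel()
        dists = np.abs(v[:, None] - v[None, :])
        weights = np.exp(-(dists / scale) ** 2).sum(axis=1)
        return np.sum(weights * v) / weights.sum()

    def pool2d(image, size=2, pool=relation_aware_pool):
        # The same sliding-window scheme as standard pooling, but the
        # merge function is now a parameter adaptable to the problem.
        h, w = image.shape[0] // size, image.shape[1] // size
        out = np.empty((h, w))
        for i in range(h):
            for j in range(w):
                out[i, j] = pool(image[i * size:(i + 1) * size,
                                       j * size:(j + 1) * size])
        return out

Choosing the metric (here, absolute difference) and the kernel is precisely where knowledge about the kind of anomaly, or about the image problem at hand, can be injected.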
As for convolution, it strongly depends on how the hardware (the processors) of current computers is designed. Although it is theoretically possible to improve the results by considering more general merge functions, in practice those changes cannot be implemented without a change in the physical characteristics of the machines. That kind of modification is beyond the control of the companies involved, so it is not feasible.
Nevertheless, it is important to highlight that the new information merging mechanisms have been shown to extend beyond the convolutional neural networks initially considered, and can improve any kind of architecture. That is very significant, because neural networks evolve at high speed and new, more powerful architectures appear very quickly. The results of our study make it possible for companies to apply them to these new models as well.
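In practice, this means such a pooling can be dropped into an existing model without touching the rest of the architecture. The following PyTorch sketch is hedged: RelationAwarePool2d and swap_pooling are hypothetical names, and square, non-overlapping windows with an integer kernel size are assumed.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RelationAwarePool2d(nn.Module):
        # Drop-in stand-in for nn.MaxPool2d using a data-aware merge.
        def __init__(self, kernel_size):
            super().__init__()
            self.k = int(kernel_size)  # assumes an integer kernel size

        def forward(self, x):                             # x: (N, C, H, W)
            n, c = x.shape[:2]
            p = F.unfold(x, self.k, stride=self.k).view(n, c, self.k * self.k, -1)
            d = (p.unsqueeze(2) - p.unsqueeze(3)).abs()   # pairwise distances per window
            w = torch.exp(-d ** 2).sum(dim=3)             # support each value receives
            out = (w * p).sum(dim=2) / w.sum(dim=2)       # relation-aware merge
            return out.view(n, c, x.shape[2] // self.k, -1)

    def swap_pooling(model):
        # Recursively replace every max-pooling layer, whatever the architecture.
        for name, child in model.named_children():
            if isinstance(child, nn.MaxPool2d):
                setattr(model, name, RelationAwarePool2d(child.kernel_size))
            else:
                swap_pooling(child)
        return model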
On the other hand, with the goal of defining a detection product valid for signals of different complexity (periodic, nearly periodic, aperiodic), the performance of several architectures was studied, as well as how to embed them in different hardware. That analysis showed that the simplest networks are capable of detecting the most complex univariate time series characteristics, for example the price of electricity on the Spanish daily market. That means hardware with a lower processing capacity can be considered and, consequently, lower-cost detection devices can be built to perform the inference.
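To give a sense of the scale this enables, the following sketch (the layer sizes are illustrative and not the architectures benchmarked in the project) is a one-dimensional convolutional forecaster with only about a hundred parameters; anomalies would then be flagged where its forecast error is large, as in the baseline above:

    import torch
    import torch.nn as nn

    class TinyForecaster(nn.Module):
        # Reads the last `window` values of a univariate series and
        # predicts the next one; 113 parameters with these sizes.
        def __init__(self, hidden=16):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv1d(1, hidden, kernel_size=5, padding=2),
                nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),   # tolerates varying window lengths
                nn.Flatten(),
                nn.Linear(hidden, 1),
            )

        def forward(self, x):              # x: (batch, 1, window)
            return self.net(x)             # (batch, 1), the next value

    model = TinyForecaster()
    print(sum(p.numel() for p in model.parameters()))  # 113

A network this small can be quantised and exported to embedded runtimes, which is what makes low-cost detection devices for inference plausible.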