SuBiTO: Synopsis-based Training Optimization for Continuous Real-Time Neural Learning over Big Streaming Data

AAAI'25 Paper

Introduction

In machine learning applications over Big Streaming Data, Neural Networks (NNs) are continuously and rapidly trained over voluminous data arriving at high speeds. As soon as a new version of the NN becomes available, it is deployed for prediction purposes (e.g., classification). The real-time character of such applications depends heavily on the volume and velocity of the data streams, as well as on the NN complexity. Training on large volumes of ingested streams, or using complex NNs, potentially increases accuracy but may compromise the real-time character of those applications. In this work, we present SuBiTO, a framework that automatically and continuously learns the training time vs. accuracy trade-offs as new data stream in and fine-tunes: (i) the number, size and type of NN layers; (ii) the size of the ingested data via stream-synopsis-specific parameters; and (iii) the number of training epochs. Finally, SuBiTO suggests optimal sets of such parameters and detects concept drifts, enabling the human operator to adapt these parameters on-the-fly, at runtime.
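To make the idea of shrinking the ingested data with a stream synopsis more concrete, here is a minimal sketch in Python. It assumes a uniform reservoir sample as the synopsis, which is only one possible choice and is not necessarily the synopsis SuBiTO uses. The names `reservoir_synopsis`, `window_size` and `compression_ratio` are hypothetical and introduced purely for illustration.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_synopsis(
    stream: Iterable[T],
    window_size: int,
    compression_ratio: float,
    seed: int = 0,
) -> List[T]:
    """Return a uniform random sample of the stream (classic reservoir sampling).

    Assumption: the synopsis keeps about compression_ratio * window_size items.
    This is an illustrative stand-in, not SuBiTO's actual synopsis.
    """
    if not 0.0 < compression_ratio <= 1.0:
        raise ValueError("compression_ratio must be in (0, 1]")
    capacity = max(1, round(window_size * compression_ratio))
    rng = random.Random(seed)
    sample: List[T] = []
    for i, item in enumerate(stream):
        if i < capacity:
            # Fill the reservoir with the first `capacity` items.
            sample.append(item)
        else:
            # Replace a stored item with probability capacity / (i + 1).
            j = rng.randint(0, i)
            if j < capacity:
                sample[j] = item
    return sample


# Hypothetical usage: keep roughly 10% of a 1000-item window for training.
synopsis = reservoir_synopsis(range(1000), window_size=1000, compression_ratio=0.1)
```

A smaller compression ratio means fewer items reach the training step, which shortens training at a possible cost in accuracy. That is the kind of trade-off the framework is described as learning.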

SuBiTO Architecture

The core idea of the SuBiTO approach is illustrated in the following figure and in the included paper:

SuBiTO Dashboard

The SuBiTO dashboard is developed in Streamlit (Streamlit 2024). A human operator (e.g., a content moderator or community manager) can use the SuBiTO dashboard to set the valid ranges for the system's parameters, such as the synopsis compression ratio for load shedding, the possible range of training epochs, and the valid types of NN layers. When a concept drift is detected and the SuBiTO Optimizer has run, the human operator can see the Pareto-optimal solutions (bottom left), along with the architectures and the expected training loss and accuracy of the top-3 options. The human operator can deploy any of the top-3 alternatives by clicking on it. The selected NN is deployed on-the-fly, at runtime, in the Training Pipeline. The statistics of the Prediction Pipeline are presented in a live bar chart. The human operator can also run the SuBiTO Optimizer manually at any time, bypassing automatic concept drift detection.
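The notion of Pareto-optimal solutions can be illustrated with a short Python sketch. It assumes each candidate configuration is summarized by two numbers, an estimated training time (lower is better) and an expected accuracy (higher is better). The `Candidate` class, the sample values, and the rule of ranking the top-3 by accuracy are all hypothetical and are not taken from the SuBiTO Optimizer.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    name: str            # hypothetical label for an NN/synopsis/epochs setting
    train_time_s: float  # estimated training time in seconds (lower is better)
    accuracy: float      # expected accuracy in [0, 1] (higher is better)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is at least as good as `b` on both objectives and strictly better on one."""
    no_worse = a.train_time_s <= b.train_time_s and a.accuracy >= b.accuracy
    strictly_better = a.train_time_s < b.train_time_s or a.accuracy > b.accuracy
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


def top_k(front: List[Candidate], k: int = 3) -> List[Candidate]:
    """Assumed ranking rule: order the Pareto front by accuracy, highest first."""
    return sorted(front, key=lambda c: c.accuracy, reverse=True)[:k]


# Made-up example values, for illustration only.
candidates = [
    Candidate("A", 10.0, 0.80),
    Candidate("B", 20.0, 0.85),
    Candidate("C", 25.0, 0.84),  # dominated by B (slower and less accurate)
    Candidate("D", 40.0, 0.90),
    Candidate("E", 60.0, 0.89),  # dominated by D
    Candidate("F", 5.0, 0.70),
]
front = pareto_front(candidates)
best_three = top_k(front)
```

In this example the front contains A, B, D and F, since each of them is either faster or more accurate than every other candidate. A dashboard could then offer the three highest-ranked entries for deployment.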

Source Code

The source code can be found on GitHub.

News

SuBiTO is accepted for publication at AAAI 2025
SuBiTO is included in the EU Innovation Radar: https://innovation-radar.ec.europa.eu/innovation/58262
SuBiTO source code is released (Image and Video moderation scenarios)

Contact

Errikos Streviniotis: estreviniotis [.at] tuc.gr
George Klioumis: gklioumis [.at] tuc.gr
Nikos Giatrakos: ngiatrakos [.at] tuc.gr

Acknowledgements

EVENFLOW