NebulaStream

NebulaStream is the first data management system designed for the Internet of Things (IoT). It is built from the ground up to incorporate all computing resources, including those outside the cloud, and to process data wherever possible. This is a fundamental change from today’s streaming systems, which are designed for the cloud and therefore require all data to be brought into the cloud before processing. Although NebulaStream is designed for the IoT, it is also a highly efficient streaming engine that can be used in cloud-only setups.

Design Goals

NebulaStream aims to provide a unique set of features that distinguish it from state-of-the-art systems:

  • Continuous, stateful processing in a highly dynamic and volatile environment.
  • Out-of-the-box heterogeneity (CPU/GPU/FPGA/DSP, ARM/X86, I2C/SPI/UART, OPC).
  • Scalable processing of thousands of queries over millions of devices.
  • Support for heterogeneous workloads (Analytics, CEP, ML, and UDF support) and different client languages (e.g., Python, Java).
  • Adaptive Sensor Management.
  • Secure and privacy-preserving processing.

We refer to our systems paper and application paper for more details.

Unique Technologies

To achieve these unique features, NebulaStream is built around a set of cutting-edge technologies:

  • On-demand Data Gathering: NebulaStream gathers only the data required to answer user queries.
  • In-network Processing: NebulaStream uses all available resources in the network, within or outside the cloud, to process data while it flows from its origin at the sources to the exit point in the cloud.
  • Hardware-tailored Code Generation: NebulaStream uses hardware-tailored code generation to produce code specialized for the underlying hardware, which leads to highly efficient resource utilization.
  • Intermediate Representation: NebulaStream maps all user queries, written in different languages (e.g., Java, Python, C++), to a common intermediate representation and compiles efficient code from it for multiple backends (e.g., CUDA, C++, OpenAPI).
  • Decentralized Fault Detection and Recovery: NebulaStream combines centralized decision-making (e.g., operator placement) with decentralized decision-making, where nodes cooperate to solve problems as they arise (e.g., a node failure).
  • Dynamic Optimization and lightweight Partial-(re-)deployment Algorithms: NebulaStream addresses the volatility outside the cloud with a unique set of dynamic algorithms, and addresses the massive scale with lightweight algorithms that allow IoT deployments on thousands of nodes.
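
To make the idea of on-demand data gathering more concrete, the following is a minimal sketch in Python. It is not NebulaStream's API: the `Query` class, its fields, and the `sampling_plan` function are hypothetical names introduced only for illustration. The sketch assumes that each registered query names one source and the longest sampling interval it can tolerate, and it derives a plan that samples only the sources some query actually reads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """Hypothetical query descriptor (an assumption, not a NebulaStream type)."""
    name: str
    source: str
    interval_ms: int  # longest sampling interval the query tolerates


def sampling_plan(queries):
    """Return {source: interval_ms}, covering only sources that some query reads.

    A source shared by several queries is sampled at the shortest requested
    interval, so every query's freshness requirement is met.
    """
    plan = {}
    for q in queries:
        plan[q.source] = min(q.interval_ms, plan.get(q.source, q.interval_ms))
    return plan


queries = [
    Query("avg_temp", source="temperature", interval_ms=1000),
    Query("overheat_alert", source="temperature", interval_ms=200),
    Query("door_events", source="door", interval_ms=500),
]
plan = sampling_plan(queries)
print(plan)  # {'temperature': 200, 'door': 500}
```

In this sketch, a sensor that no query references (for example, a humidity sensor) never appears in the plan and is therefore never sampled, and removing the `overheat_alert` query would relax the temperature interval back to 1000 ms.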