NebulaStream supports end-to-end benchmarking out of the box, without writing any additional code. To enable the e2e benchmarking framework, build NebulaStream with the CMake flag `-DNES_BUILD_BENCHMARKS=1`. Benchmarks are configured entirely in yaml files, so no additional code has to be written. During a benchmark run, NebulaStream automatically tracks throughput and latency and writes the values to a user-definable file as comma-separated values.
A typical workflow consists of the following steps:

- Build NebulaStream with `-DNES_BUILD_BENCHMARKS=1`
- Create a yaml configuration file
- Run the `e2e-benchmark-runner` with the created yaml file as a command line argument
- Analyze the measurements and plot figures from the csv file
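On the command line, the workflow above might look as follows. This is only a sketch: the exact cmake invocation, build target, and the runner's argument syntax are assumptions and depend on your checkout and NebulaStream version.

```shell
# Configure and build NebulaStream with the e2e benchmarking framework enabled
cmake -DNES_BUILD_BENCHMARKS=1 ..   # run from a build directory (assumption)
make

# Run the benchmark runner with the yaml configuration file
./e2e-benchmark-runner FilterOneSource.yaml   # argument syntax is an assumption
```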
As NebulaStream is still under development, this is just a snapshot of the current configuration options. New options may be added or existing ones removed. At the beginning of each benchmark, all created runs are written to the logger.
The configuration file is divided into two parts:
- Configurations that change per run
- Configurations that stay constant over the whole benchmark
```yaml
# ~~~ Configurations for single run ~~~
numberOfWorkerThreads: 1, 2
bufferSizeInBytes: 512, 1024, 2048
numberOfBuffersToProduce: 10000000

# Benchmark parameters for the entire run
logLevel: LOG_INFO
experimentMeasureIntervalInSeconds: 1
startupSleepIntervalInSeconds: 1
numberOfMeasurementsToCollect: 3
numberOfSources: 1
inputType: MemoryMode
dataProviderMode: ZeroCopy
outputFile: FilterOneSource.csv
benchmarkName: FilterOneSource
query: 'Query::from("input1").filter(Attribute("value") < 100).sink(NullOutputSinkDescriptor::create());'
```
As a design choice, we opted not to calculate the cross product of all options that change per run. Instead, a list is created for each option, and we iterate through all lists simultaneously. If one option has fewer values than another, its last value is duplicated. In the example above, three runs are created because bufferSizeInBytes has the maximum of three values; numberOfWorkerThreads and numberOfBuffersToProduce are padded to a length of three. Thus, we run the options displayed below.
```yaml
numberOfWorkerThreads: 1, 2, 2
bufferSizeInBytes: 512, 1024, 2048
numberOfBuffersToProduce: 10000000, 10000000, 10000000
```
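The padding rule can be sketched in a few lines of Python. This is not NebulaStream code, just an illustration of how per-run option lists are padded to the length of the longest list and then zipped into individual runs:

```python
def expand_runs(options):
    """Pad each per-run option list to the length of the longest list
    by repeating its last value, then zip the lists into per-run configs."""
    n = max(len(values) for values in options.values())
    padded = {key: values + [values[-1]] * (n - len(values))
              for key, values in options.items()}
    return [{key: padded[key][i] for key in options} for i in range(n)]

runs = expand_runs({
    "numberOfWorkerThreads": [1, 2],
    "bufferSizeInBytes": [512, 1024, 2048],
    "numberOfBuffersToProduce": [10000000],
})
for run in runs:
    print(run)
# three runs: numberOfWorkerThreads is padded to 1, 2, 2 and
# numberOfBuffersToProduce to 10000000, 10000000, 10000000
```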
The following table lists all options currently available for each run.
| Config options for single run | Description |
| --- | --- |
| numberOfWorkerThreads | The number of worker threads. |
| bufferSizeInBytes | The size (in bytes) of each buffer. |
| numberOfBuffersToProduce | The total number of buffers to generate. Should be larger than the number of buffers the system processes during each run. |
The following table lists all options currently available that apply to all runs.
| Config options over all runs | Description |
| --- | --- |
| logLevel | Log level during all runs. One of the NebulaStream log levels. |
| experimentMeasureIntervalInSeconds | Time between measurement points; the minimum is one second. |
| startupSleepIntervalInSeconds | Time between registering the query and starting to measure. |
| numberOfMeasurementsToCollect | Number of samples to collect for each run. |
| numberOfSources | Number of sources for all runs. |
| dataGenerators | Specifies the data generators. Types are: Default, Uniform, or Zipfian. |
| inputType | For now, only MemoryMode is supported. |
| dataProviderMode | Either MemCpy or ZeroCopy. |
| outputFile | Measurements will be written to the specified file as comma-separated values. |
| benchmarkName | Name of this benchmark. Currently, it is used as the name of the logger file. |
| query | Query to be measured. |
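Since the measurements land in the csv file named by outputFile, a short Python sketch of the analysis step might look as follows. The column names (`timestamp`, `throughput`, `latency`) and the sample values are hypothetical; the actual columns written by NebulaStream may differ:

```python
import csv
import io
import statistics

# Hypothetical excerpt of a benchmark csv; real column names may differ.
sample = """timestamp,throughput,latency
1,100000,2.5
2,120000,2.1
3,110000,2.3
"""

rows = list(csv.DictReader(io.StringIO(sample)))
mean_throughput = statistics.mean(float(r["throughput"]) for r in rows)
mean_latency = statistics.mean(float(r["latency"]) for r in rows)
print(mean_throughput)  # -> 110000.0
```

In practice you would open the real outputFile instead of the inline sample and feed the per-interval values into your plotting tool of choice.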
NebulaStream supports the following data generators for e2e benchmarks:

- Default: creates uniformly distributed data in the range [0, 999]
- Uniform: creates uniformly distributed data in a given range. Requires the following attributes
- Zipfian: creates Zipfian-distributed data in a given range. Requires the following attributes