E2E Benchmarking
General
NebulaStream supports end-to-end benchmarking without writing any additional code. To enable the e2e benchmarking framework, build NebulaStream with the CMake flag -DNES_BUILD_BENCHMARKS=1.
A benchmark is configured entirely in yaml files. During a benchmark run, NebulaStream automatically tracks throughput and latency and writes the values as comma-separated values to a user-defined file.
A typical workflow consists of the following steps:
- Build NebulaStream with -DNES_BUILD_BENCHMARKS=1
- Create a yaml configuration file
- Run e2e-benchmark-main with the created yaml file and a logger as command line arguments, e.g. ./e2e-benchmark-main --configPath=/path/to/file --logPath=logger.log
- Analyze measurements and plot figures from the csv file
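
For the last step, a minimal sketch of a first look at the measurements could be the following. It assumes only that the output file is a csv file with a header row. The column names in the sample data (benchmark, throughput, latency) are hypothetical placeholders, not the actual header written by NebulaStream.

```python
import csv
import io
from statistics import mean


def summarize_csv(handle):
    """Return {column: mean} for every column whose values are all numeric."""
    rows = list(csv.DictReader(handle))
    summary = {}
    if not rows:
        return summary
    for column in rows[0]:
        try:
            values = [float(row[column]) for row in rows]
        except (TypeError, ValueError):
            continue  # skip non-numeric columns, e.g. a benchmark name
        summary[column] = mean(values)
    return summary


# Hypothetical sample data; the real column names come from the generated file.
sample = io.StringIO(
    "benchmark,throughput,latency\n"
    "demo,100,2.0\n"
    "demo,300,4.0\n"
)
print(summarize_csv(sample))
```

For a real run, the same function can be applied to the file named by outputFile, for example with open("FilterOneSource.csv", newline="") as handle.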
Configuration
As NebulaStream is still under development, this is just a snapshot of the current configuration options. New options might get added or existing options removed. At the beginning of each benchmark, all created runs are written to the logger.
The configuration file is divided into two parts:
- Configurations that change per run
- Configurations that stay constant over the whole benchmark
# ~~~ Configurations for single run ~~~
numberOfWorkerThreads: 1, 2
bufferSizeInBytes: 512, 1024, 2048
numberOfBuffersToProduce: 10000000
numberOfPreAllocatedBuffer: 1000
# Benchmark parameter for the entire run
logLevel: LOG_INFO
experimentMeasureIntervalInSeconds: 1
startupSleepIntervalInSeconds: 1
numberOfMeasurementsToCollect: 3
logicalSources:
  - name: input1
    type: Default
  - name: input2
    type: Zipfian
    numberOfPhysicalSources: 3, 2, 1
    alpha: 0.99
    minValue: 0
    maxValue: 1000
dataProvider:
  name: External
  ingestionRateCount: 1000
  ingestionRateDistribution:
    type: Sinus
    numberOfPeriods: 64
    ingestionRateInBuffers: 50000
inputType: MemoryMode
dataProviderMode: ZeroCopy
outputFile: FilterOneSource.csv
benchmarkName: FilterOneSource
query: 'Query::from("input1").filter(Attribute("value") < 100).sink(NullOutputSinkDescriptor::create());'
As a design choice, we do not compute the cross-product of all options that change per run. Instead, a list is created for each option and all lists are iterated simultaneously. If one option has fewer values than another, its last value is repeated. In the example, three runs are created because bufferSizeInBytes has the most values (three). numberOfWorkerThreads, numberOfBuffersToProduce and numberOfPreAllocatedBuffer are padded to a length of three. Thus, the runs use the options displayed below.
numberOfWorkerThreads: 1, 2, 2
bufferSizeInBytes: 512, 1024, 2048
numberOfBuffersToProduce: 10000000, 10000000, 10000000
numberOfPreAllocatedBuffer: 1000, 1000, 1000
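
The padding rule can be sketched in a few lines of Python. This is only an illustration of the rule described above, using the per-run values from the example configuration. The function name expand_runs is hypothetical and this is not the code NebulaStream uses internally.

```python
def expand_runs(options):
    """Pad every option list to the longest one by repeating its last value,
    then build one dictionary of settings per run."""
    run_count = max(len(values) for values in options.values())
    padded = {
        name: values + [values[-1]] * (run_count - len(values))
        for name, values in options.items()
    }
    return [
        {name: padded[name][i] for name in padded}
        for i in range(run_count)
    ]


# Per-run options taken from the example configuration.
per_run_options = {
    "numberOfWorkerThreads": [1, 2],
    "bufferSizeInBytes": [512, 1024, 2048],
    "numberOfBuffersToProduce": [10000000],
    "numberOfPreAllocatedBuffer": [1000],
}

for run in expand_runs(per_run_options):
    print(run)
```

A full cross-product of the same lists would yield 2 * 3 * 1 * 1 = 6 runs, whereas the simultaneous iteration yields 3.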
Configuration Options
The following options can currently be set for each run.
Config options for single run | Description |
---|---|
numberOfWorkerThreads | The number of worker threads. |
bufferSizeInBytes | The size (in Bytes) of each buffer. |
numberOfBuffersToProduce | Total number of buffers to generate. Should be larger than the number of buffers processed by the system in each run. |
numberOfPreAllocatedBuffer | Number of buffers to pre-allocate. If it is smaller than numberOfBuffersToProduce, previously allocated buffers are processed again, starting with the first pre-allocated buffer. |
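
The reuse of pre-allocated buffers described for numberOfPreAllocatedBuffer can be pictured as a wrap-around over the buffer index. The sketch below is an assumption about the observable behavior only (a simple modulo). The function name is hypothetical and the actual implementation may differ.

```python
def buffer_index_sequence(number_of_buffers_to_produce, number_of_pre_allocated_buffer):
    """Index of the pre-allocated buffer used for each produced buffer,
    wrapping around to the first buffer once all have been used."""
    return [
        i % number_of_pre_allocated_buffer
        for i in range(number_of_buffers_to_produce)
    ]


# Seven buffers produced from only three pre-allocated ones.
print(buffer_index_sequence(7, 3))  # [0, 1, 2, 0, 1, 2, 0]
```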
The following options currently apply to all runs.
Config options over all runs | Description |
---|---|
logLevel | Log level during all runs. One of the NebulaStream log levels. |
experimentMeasureIntervalInSeconds | Time between measurement points; minimum is one second. |
startupSleepIntervalInSeconds | Time between registering the query and starting measuring. |
numberOfMeasurementsToCollect | Number of samples collected for each run. |
logicalSources: type | Specifies the data generator. Types are: Default, Uniform, Zipfian or YSB |
numberOfPhysicalSources | Number of physical sources for all runs |
dataProvider: name | Either Internal or External |
ingestionRateCount | Number of (potentially different) ingestion rates |
ingestionRateDistribution: type | Specifies the ingestion rate distribution. Types are: Uniform, Sinus, Cosinus, Custom |
ingestionRateInBuffers | The maximum ingestion rate per 10 milliseconds in buffers. Only applicable for Uniform, Sinus and Cosinus distribution. |
numberOfPeriods | The number of periods for the Sinus or Cosinus distribution within the interval from 0 to ingestionRateCount. |
values | A list of comma-separated ingestion rates. Only applicable for Custom distribution. This is a child element of ingestionRateDistribution |
inputType | For now, only MemoryMode is supported |
dataProviderMode | Either MemCpy or ZeroCopy |
outputFile | Measurements will be written to the specified file as comma-separated values |
benchmarkName | Name of this benchmark. Currently, it is used as the name for the logger file |
query | Query to be measured |
Data Generators
NebulaStream supports the following data generators for e2e benchmarks:
- Default: Creates uniformly distributed data in the range of [0, 999]
- Uniform: Creates uniformly distributed data in a given range. Requires the following attributes:
  - minValue
  - maxValue
- Zipfian: Creates Zipfian distributed data in a given range. Requires the following attributes:
  - alpha
  - minValue
  - maxValue
- YSB: Creates data corresponding to the Yahoo Streaming Benchmark
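
To give an intuition for how alpha, minValue and maxValue shape Zipfian data, here is a minimal sketch. It assumes that the range is inclusive, that the smallest value has rank 1, and that the probability of rank k is proportional to 1 / k^alpha. These are assumptions made for illustration and the NebulaStream generator may map ranks to values differently. The function name zipfian_values is hypothetical.

```python
import random


def zipfian_values(count, alpha, min_value, max_value, seed=0):
    """Draw `count` integers from [min_value, max_value] where the value at
    rank k (1-based, counted from min_value) has weight 1 / k**alpha."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    candidates = list(range(min_value, max_value + 1))
    weights = [1.0 / (rank ** alpha) for rank in range(1, len(candidates) + 1)]
    return rng.choices(candidates, weights=weights, k=count)


# Attribute values taken from the example configuration for input2.
values = zipfian_values(count=10000, alpha=0.99, min_value=0, max_value=1000)
print("smallest:", min(values), "largest:", max(values))
print("occurrences of 0:", values.count(0), "of 500:", values.count(500))
```

With a larger alpha the weight concentrates more strongly on the first ranks, so small values dominate the generated data even more.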
Data Providers
NebulaStream supports the following data providers for e2e benchmarks:
- Internal: Ingests the pre-allocated buffers as fast as possible into the system
- External: Ingests the pre-allocated buffers according to the given parameters into the system. The different types are:
  - Uniform: Ingests ingestionRateInBuffers steadily each second
  - Sinus, Cosinus: A sine/cosine distribution with numberOfPeriods periods over the ingestionRateCount rates, i.e. a period length of ingestionRateCount divided by numberOfPeriods. Ingests a maximum of ingestionRateInBuffers per second
  - Custom: Allows for a custom number of buffers to be ingested each second
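
The External distributions can be sketched as a list of ingestionRateCount rates. The exact formula is not given here, so the sketch assumes the sine or cosine wave is shifted and scaled into the range from 0 to ingestionRateInBuffers and rounded to whole buffers. The function name ingestion_rates is hypothetical.

```python
import math


def ingestion_rates(kind, ingestion_rate_count, ingestion_rate_in_buffers,
                    number_of_periods=1):
    """Return one ingestion rate (in buffers) per slot for the given type."""
    if kind == "Uniform":
        return [ingestion_rate_in_buffers] * ingestion_rate_count
    if kind not in ("Sinus", "Cosinus"):
        raise ValueError("unsupported distribution type: " + kind)
    wave = math.sin if kind == "Sinus" else math.cos
    rates = []
    for i in range(ingestion_rate_count):
        # number_of_periods full waves fit into ingestion_rate_count slots.
        phase = 2 * math.pi * number_of_periods * i / ingestion_rate_count
        # Assumption: map the wave from [-1, 1] to [0, ingestion_rate_in_buffers].
        rates.append(round(0.5 * (1 + wave(phase)) * ingestion_rate_in_buffers))
    return rates


# Values taken from the example configuration.
rates = ingestion_rates("Sinus", 1000, 50000, 64)
print(len(rates), min(rates), max(rates))
```

For the Custom type no formula is needed, since the rates are listed explicitly under values.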