How to benchmark NebulaStream?

With NebulaStream, we currently provide nesBench, a helper for single-node benchmarks.

Compile NES for Benchmarking

There are a few vital things to consider when building NebulaStream to achieve optimal performance:

  • Set the CMake build type to Release.
  • Enable the CMake cache variable NES_BUILD_NATIVE to allow processor-specific tuning such as AVX.
  • Consider disabling logging altogether, i.e. set NES_LOG_LEVEL to LOG_NONE.

On a NUMA system, also consider binding the process to a specific node, for example with numactl.

Run Your First Benchmark

To run a benchmark, create a benchmark configuration such as the following and save it as bench.yml:

logLevel: LOG_NONE
logicalSources:
  - name: ysb
    type: YSB
numberOfMeasurementsToCollect: 5
query: >
  Query::from("ysb")
    .filter(Attribute("event_type") < 1)
    .window(SlidingWindow::of(IngestionTime(), Seconds(5), Seconds(1)))
    .byKey(Attribute("campaign_id"))
    .apply(Sum(Attribute("user_id")))
    .sink(FileSinkDescriptor::create("out", "CSV_FORMAT", "OVERWRITE"));  

# A bit of tuning
numberOfWorkerThreads: 8
numberOfBuffersInGlobalBufferManager: 10240
numberOfBuffersInSourceLocalBufferPool: 1024
bufferSizeInBytes: 131072

Then run nesBench from the Docker image. We mount the current directory to /bench, the working directory of nesBench, because nesBench writes its output there.

$ docker run --rm -v "$(pwd)":/bench nebulastream/nes-executable-image nesBench bench.yml

After the run, we obtain a CSV file with the measurements, which can be used to generate graphs or further statistics. For now, we just select a few columns and look at the first few rows:

$ awk -F, < bench.csv '{ print $13, $17, $20 }' | column -t | head -n 5
tuplesPerSecond  numberOfWorkerOfThreads  bufferSizeInBytes
344656           8                        131072
344560           8                        131072
344080           8                        131072
411264           8                        131072
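Selecting columns by position with awk breaks as soon as the column order changes. The following is a minimal sketch that selects columns by header name instead and averages the throughput. It assumes the CSV has a header row with the names shown above (including the numberOfWorkerOfThreads spelling); the helper names are hypothetical, and the inline sample only reuses the four rows printed above, whereas a real bench.csv has more columns.

```python
import csv
import io


def read_columns(csv_text, columns):
    """Return one dict per data row, keeping only the named columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{name: row[name] for name in columns} for row in reader]


def mean_throughput(rows):
    """Average the tuplesPerSecond column over all rows."""
    values = [int(row["tuplesPerSecond"]) for row in rows]
    return sum(values) / len(values)


# Sample built from the rows shown above; replace with open("bench.csv").read().
sample = """tuplesPerSecond,numberOfWorkerOfThreads,bufferSizeInBytes
344656,8,131072
344560,8,131072
344080,8,131072
411264,8,131072
"""

rows = read_columns(sample, ["tuplesPerSecond", "bufferSizeInBytes"])
print(mean_throughput(rows))  # 361140.0
```

Looking columns up by name keeps the script working if nesBench adds or reorders output fields.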

Pre-defined Benchmarks

There are a handful of benchmarks defined in the NebulaStream repository, most notably some queries of the Yahoo Streaming Benchmark and the Nexmark Benchmark.

How to Define Your Own Benchmark

For most configuration options, you can supply a comma-separated list of values. nesBench executes as many runs as there are items in the longest list; shorter lists are padded with their last value.

For example, given the following configuration snippet:

# specify source & query

numberOfWorkerThreads: 1, 1, 2
bufferSizeInBytes: 131072, 262144, 131072, 262144

nesBench will execute four runs:

numberOfWorkerThreads  bufferSizeInBytes
1                      131072
1                      262144
2                      131072
2                      262144

Note that the fourth value for numberOfWorkerThreads (2) does not appear in the configuration; it is added by the padding.
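The padding rule can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not the nesBench implementation; the function name expand_runs is hypothetical.

```python
def expand_runs(options):
    """Expand comma-separated option lists into one settings dict per run.

    Shorter lists are padded with their last value so that every option
    has as many entries as the longest list.
    """
    lists = {
        name: [item.strip() for item in str(raw).split(",")]
        for name, raw in options.items()
    }
    run_count = max(len(values) for values in lists.values())
    padded = {
        name: values + [values[-1]] * (run_count - len(values))
        for name, values in lists.items()
    }
    return [{name: padded[name][i] for name in padded} for i in range(run_count)]


runs = expand_runs({
    "numberOfWorkerThreads": "1, 1, 2",
    "bufferSizeInBytes": "131072, 262144, 131072, 262144",
})
for run in runs:
    print(run["numberOfWorkerThreads"], run["bufferSizeInBytes"])
```

Note that the lists are combined position by position, not as a cross product: three thread values and four buffer sizes give four runs, not twelve.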

Overview of Configuration Options

These are general knobs:

  • logLevel
  • numberOfWorkerThreads
  • bufferSizeInBytes
  • numberOfBuffersInGlobalBufferManager
  • numberOfBuffersInSourceLocalBufferPool: Sources have their own local pool of buffers, separate from the global buffer pool.

The following options control different features of NES:

  • nautilusBackend: The backend of the Nautilus compilation framework
  • queryCompilerDumpMode: Whether and how to dump the intermediate representations of the Nautilus compilation process
  • windowingStrategy: The kind of windowing optimization to apply (Slicing or Bucketing)
  • pipeliningStrategy: Whether to enable operator fusion
  • streamJoinStrategy: The kind of join to use
  • useCompilationCache: Whether compilation results should be cached between executions

Interesting CSV Output Fields

The following three fields are monotonically increasing statistics counters:

  • processedTuples
  • processedTasks
  • processedBuffers

nesBench also calculates per-second aggregates, i.e. tuplesPerSecond, tasksPerSecond, and bufferPerSecond.
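To make the relation between a monotonically increasing counter and a per-second rate concrete, here is a minimal sketch. It assumes a rate is the counter difference between two consecutive samples divided by the elapsed time; this is an assumption for illustration, and the calculation nesBench actually performs may differ. The function name and the sample values are made up.

```python
def per_second_rates(samples):
    """Turn (timestamp_seconds, counter_value) samples into per-interval rates."""
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            raise ValueError("timestamps must be strictly increasing")
        rates.append((c1 - c0) / elapsed)
    return rates


# Made-up samples of a processedTuples-style counter, taken one second apart.
samples = [(0.0, 0), (1.0, 1000), (2.0, 3000)]
print(per_second_rates(samples))  # [1000.0, 2000.0]
```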