How to add a scalar function

Scalar Functions

This document describes the main concepts and features of scalar functions and shows, by example, how to add a simple scalar function to NebulaStream. Scalar functions are part of an expression tree: each one receives a set of input arguments and returns exactly one result. Common examples are ‘MIN’, ‘MAX’, ‘LOG10’, and ‘SIN’.

All functions in NebulaStream consist of a logical part, i.e., the LogicalFunction, and a physical representation, i.e., the ExecutableFunction.
The LogicalFunction provides a serializable logical representation of the function, which is used during query optimization, e.g., for type inference and constant folding. In contrast, ExecutableFunctions define the actual implementation of the function and are invoked during query execution.

In the following, we discuss how to implement and register both.

Logical Functions

To define a new logical function, we can extend one of several interfaces: LogicalFunction, BinaryLogicalFunction, or UnaryLogicalFunction.

The following code defines a Log2Function and registers it under a specific name in the LogicalFunctionRegistry. Whenever a function with this name is used in a query, it is looked up in the registry. Currently, all LogicalFunctions have to implement inferStamp, inferUnary, or inferBinary, depending on the function type. This method returns the expected data type for the given input arguments.

// Implements the logical part of the log2 function.
class Log2Function : public UnaryLogicalFunction {
  public:
    [[nodiscard]] DataTypePtr inferUnary(const DataTypePtr& input) const override {
        // Log2 only supports numeric inputs, so reject everything else.
        if (!input->isNumeric()) {
            NES_RUNTIME_EXCEPTION("Log2Function can only be evaluated on numeric values.");
        }
        // The result of log2 is always a double, regardless of the numeric input type.
        return DataTypeFactory::createDouble();
    }
};
// Register Log2Function in the registry under the name "log2".
[[maybe_unused]] static LogicalFunctionRegistry::Add<Log2Function> log2Function("log2");
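
The registration line above relies on a static object whose constructor adds the function to the registry before any query is processed. The following self-contained sketch illustrates that self-registration pattern in isolation. ToyFunction, ToyRegistry, ToyLog2, and toyLog2Registration are hypothetical names invented for this example; they are not the NebulaStream classes, and the real registry may be implemented differently.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a logical function (not the NebulaStream class).
struct ToyFunction {
    virtual ~ToyFunction() = default;
    virtual std::string name() const = 0;
};

// Hypothetical registry that maps a function name to a factory.
class ToyRegistry {
  public:
    using Factory = std::function<std::unique_ptr<ToyFunction>()>;

    // Constructing an Add<T> object registers a factory for T under the given name.
    template <typename T>
    struct Add {
        explicit Add(const std::string& name) {
            assert(!name.empty() && "a function must be registered under a non-empty name");
            ToyRegistry::factories()[name] = []() -> std::unique_ptr<ToyFunction> {
                return std::make_unique<T>();
            };
        }
    };

    // Returns true if a function was registered under this name.
    static bool contains(const std::string& name) { return factories().count(name) > 0; }

    // Looks up a function by name and creates a fresh instance of it.
    static std::unique_ptr<ToyFunction> create(const std::string& name) {
        auto it = factories().find(name);
        if (it == factories().end()) {
            throw std::invalid_argument("unknown function: " + name);
        }
        return it->second();
    }

  private:
    // A function-local static sidesteps static-initialization-order problems:
    // the map is created on first use, even if that happens during start-up.
    static std::map<std::string, Factory>& factories() {
        static std::map<std::string, Factory> instance;
        return instance;
    }
};

struct ToyLog2 : ToyFunction {
    std::string name() const override { return "log2"; }
};

// The constructor of this static object performs the registration at start-up.
[[maybe_unused]] static ToyRegistry::Add<ToyLog2> toyLog2Registration("log2");
```

With this in place, ToyRegistry::create("log2") returns a new ToyLog2 instance, while an unregistered name raises std::invalid_argument. Keeping the map behind a function-local static is one common way to make registration safe across translation units.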

Executable Functions

To define a new executable function, we first have to implement an executable expression that represents this function. Finally, we register the expression with the ExecutableFunctionRegistry.


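#include <cmath>

// Proxy function that wraps std::log2.
double calculateLog2(double x) { return std::log2(x); }

class Log2Expression : public Expression {
  public:
    explicit Log2Expression(const ExpressionPtr& subExpression) : subExpression(subExpression) {}

    Value<> execute(Record& record) const override {
        // Evaluate the input expression first, then dispatch on its runtime type.
        Value<> subValue = subExpression->execute(record);
        if (subValue->isType<Int64>()) {
            // Call the predefined proxy function that wraps std::log2.
            return FunctionCall<>("calculateLog2_int64", calculateLog2, subValue.as<Int64>());
        } else {
            // Add branches for other numeric types here. Until then, reject unsupported
            // inputs so that no code path leaves the function without a result.
            NES_RUNTIME_EXCEPTION("Log2Expression can only be evaluated on numeric values.");
        }
    }

  private:
    const ExpressionPtr subExpression;
};

// Register the executable expression as a unary function under the name "log2".
static ExecutableFunctionRegistry::Add<UnaryFunctionProvider<Log2Expression>> log2Function("log2");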
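
The execute method above handles only Int64 and leaves the other input types open. As a rough illustration of how such a type dispatch could be completed, the following self-contained sketch uses std::variant in place of the engine's Value<> type. ToyValue and executeLog2 are hypothetical names for this example only, and the set of supported types is an assumption; the real implementation would go through Value<> and FunctionCall<> as shown above.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <stdexcept>
#include <variant>

// Hypothetical runtime value; stands in for the engine's Value<> type.
using ToyValue = std::variant<std::int64_t, double, bool>;

// Proxy function that wraps std::log2, as in the example above.
double calculateLog2(double x) { return std::log2(x); }

// Dispatches on the runtime type of the input and calls the proxy function.
double executeLog2(const ToyValue& value) {
    if (const auto* asInt = std::get_if<std::int64_t>(&value)) {
        // Integers are widened to double before calling the proxy.
        return calculateLog2(static_cast<double>(*asInt));
    }
    if (const auto* asDouble = std::get_if<double>(&value)) {
        return calculateLog2(*asDouble);
    }
    // Every non-numeric type ends up here instead of silently returning nothing.
    throw std::invalid_argument("log2 can only be evaluated on numeric values");
}
```

Each supported type gets its own branch, and the final throw makes unsupported inputs fail loudly, which mirrors the numeric check in the logical function.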

Expose the Function in the Query API

To make the function usable in queries, we have to expose it in the query API. To this end, we have to adjust nes-core/include/API/Expressions/ArithmeticalExpressions.hpp.
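
What such an addition could look like is sketched below, under the assumption that the query API exposes each function as a free helper that wraps its argument expression in a node carrying the registered function name. The types ToyExpressionNode and ToyExpressionNodePtr and the helpers toyAttribute and LOG2 are hypothetical and self-contained; the actual declarations in ArithmeticalExpressions.hpp may differ and should be checked before copying this pattern.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct ToyExpressionNode;
using ToyExpressionNodePtr = std::shared_ptr<ToyExpressionNode>;

// Hypothetical expression tree node (not the NebulaStream class).
struct ToyExpressionNode {
    std::string functionName;  // empty for leaf nodes such as field accesses
    std::string fieldName;     // empty for function nodes
    std::vector<ToyExpressionNodePtr> arguments;
};

// Hypothetical helper that creates a leaf node reading a field of the record.
ToyExpressionNodePtr toyAttribute(std::string fieldName) {
    auto node = std::make_shared<ToyExpressionNode>();
    node->fieldName = std::move(fieldName);
    return node;
}

// Hypothetical query API entry point: builds a function node that refers to
// the function by the same name used during registration ("log2").
ToyExpressionNodePtr LOG2(const ToyExpressionNodePtr& argument) {
    assert(argument != nullptr && "LOG2 needs an argument expression");
    auto node = std::make_shared<ToyExpressionNode>();
    node->functionName = "log2";
    node->arguments.push_back(argument);
    return node;
}
```

A query could then write LOG2(toyAttribute("value")), and the resulting node would be resolved by name against the registries during optimization and execution.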