How to add a new expression?

This is an implementation guide for adding a new arithmetical expression to NebulaStream, i.e., a function for use in expressions. In the remainder, we describe the main concepts, features, and essential development steps, with examples of how to add your custom expression. In general, we support scalar functions that receive a set of input arguments and return exactly one result. Common examples are MIN, MAX, LOG10, and SIN. Furthermore, all functions in NebulaStream consist of a logical part, i.e., the Logical Expression, and a physical representation, i.e., the Executable Nautilus Expression.

  • a. Executable Nautilus Expression
  • b. Logical Expression
  • c. Query API & Clients

You can implement your expression both ways: bottom-up (a.-c.) and top-down (c.-a.).

a. Executable Expression (Worker)

An executable expression contains the actual implementation of a function and is invoked during query execution. It provides the method execute, which is called for each received tuple. The code in execute is (query-)compiled by Nautilus, NebulaStream's query compilation backend.
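To make the per-tuple evaluation concrete, here is a minimal, self-contained sketch. It does not use Nautilus: SketchRecord, SketchExecutable, SketchReadField, SketchLog2, and evaluateAll are hypothetical stand-ins introduced only for illustration, and plain doubles replace Nautilus values. It only shows the idea that an expression evaluates its sub-expression on a record and then applies its own logic.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record: maps a field name to a value.
// The real implementation works on NES::Nautilus::Record instead.
using SketchRecord = std::map<std::string, double>;

// Hypothetical base class; execute is invoked once per record.
struct SketchExecutable {
    virtual ~SketchExecutable() = default;
    virtual double execute(const SketchRecord& record) const = 0;
};

// Leaf expression that reads one field from the record.
struct SketchReadField : SketchExecutable {
    std::string fieldName;
    explicit SketchReadField(std::string name) : fieldName(std::move(name)) {}
    double execute(const SketchRecord& record) const override { return record.at(fieldName); }
};

// Unary expression that evaluates its sub-expression first and then applies log2.
struct SketchLog2 : SketchExecutable {
    std::unique_ptr<SketchExecutable> subExpression;
    explicit SketchLog2(std::unique_ptr<SketchExecutable> sub) : subExpression(std::move(sub)) {}
    double execute(const SketchRecord& record) const override {
        return std::log2(subExpression->execute(record));
    }
};

// Calls execute once for each record, as the engine would do per tuple.
std::vector<double> evaluateAll(const SketchExecutable& expression, const std::vector<SketchRecord>& records) {
    std::vector<double> results;
    results.reserve(records.size());
    for (const auto& record : records) {
        results.push_back(expression.execute(record));
    }
    return results;
}
```

In this sketch, a SketchLog2 that wraps SketchReadField("x") computes log2 of field x for every record passed to evaluateAll.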

Tasks:

  1. Create a new class that inherits from the base class Expression.
  2. Implement an executable expression that represents your function. In particular, put the logic of your function into the execute method, which is called for each incoming tuple (record).
  3. Register the expression in the ExecutableFunctionRegistry class.
  4. Add all new source files to the respective CMakeLists.txt (in the corresponding folders; watch out for subfolders).

Location: nes-execution

Tasks:

  • Execution -> Expressions -> Functions

Tests:

  • tests -> UnitTests -> Execution -> Expressions -> Functions

The following code shows the implementation of the executable Log2 function, including its registration in the ExecutableFunctionRegistry class.

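// Proxy function that wraps std::log2.
double calculateLog2(double x) { return std::log2(x); }

Value<> Log2Expression::execute(NES::Nautilus::Record& record) const {
    // Evaluate the input expression on the current record first.
    Value subValue = subExpression->execute(record);
    if (subValue->isType<Int64>()) {
        // Call the predefined proxy function that wraps std::log2.
        return FunctionCall<>("calculateLog2_int64", calculateLog2, subValue.as<Int64>());
    }
    // Add branches for other types here.
    // Assumption: fail loudly if no branch matched, so every path returns or throws.
    NES_THROW_RUNTIME_ERROR("Log2Expression is not implemented for this input type.");
}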

// Register executable expression as unary function
static ExecutableFunctionRegistry::Add<UnaryFunctionProvider<Log2Expression>> log2Function("log2");
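The registration line above relies on a static object whose constructor adds the function to a registry under a name. The following self-contained sketch illustrates that pattern in isolation. It is not NebulaStream's implementation: SketchExpression, SketchFunctionRegistry, and SketchLog2Expression are hypothetical names, and the registry here simply maps a name to a factory.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical, simplified stand-in for an executable expression.
struct SketchExpression {
    virtual ~SketchExpression() = default;
    virtual double execute(double input) const = 0;
};

// Hypothetical registry that maps a function name to a factory.
class SketchFunctionRegistry {
  private:
    using Factory = std::function<std::unique_ptr<SketchExpression>()>;

    // A function-local static avoids static initialization order problems.
    static std::map<std::string, Factory>& factories() {
        static std::map<std::string, Factory> instance;
        return instance;
    }

  public:
    // Constructing an Add<T> object registers T under the given name.
    template<typename T>
    struct Add {
        explicit Add(const std::string& name) {
            factories()[name] = [] { return std::make_unique<T>(); };
        }
    };

    // Creates the expression registered under name, or throws if the name is unknown.
    static std::unique_ptr<SketchExpression> create(const std::string& name) {
        auto it = factories().find(name);
        if (it == factories().end()) {
            throw std::runtime_error("Unknown function: " + name);
        }
        return it->second();
    }
};

struct SketchLog2Expression : SketchExpression {
    double execute(double input) const override { return std::log2(input); }
};

// Runs during static initialization, i.e., before main() starts.
[[maybe_unused]] static SketchFunctionRegistry::Add<SketchLog2Expression> sketchLog2("log2");
```

With this in place, SketchFunctionRegistry::create("log2") returns a new SketchLog2Expression without the caller knowing the concrete class, which is the decoupling a name-based registry is meant to provide.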

b. Logical Expression (Coordinator)

A logical expression is part of the logical query plan and registers a logical representation of the function. This representation is used during query optimization, e.g., type inference and constant folding, and is serializable. A FunctionExpression is an expression node that represents a function with a specific name. Internally, it stores a LogicalFunction, which is used for inference.

  1. To define a new logical expression, create a new function class that inherits from one of the interfaces provided in the LogicalFunctionRegistry class, i.e., LogicalFunction, BinaryLogicalFunction, or UnaryLogicalFunction.
  2. Add the new source file to the respective CMakeLists.txt (in the corresponding folders; watch out for subfolders).

Location: nes-expressions

Tasks:

  • Expressions -> Functions

The following code defines the Log2 function and registers it under a specific name in the LogicalFunctionRegistry. Whenever a logical function with this name is used in a query, it is looked up in the registry. Furthermore, inferUnary returns the expected data type depending on the input argument.

💡 Currently, all LogicalFunctions have to implement inferStamp, inferUnary, or inferBinary depending on the function type.

class Log2Function : public UnaryLogicalFunction {
  public:
    [[nodiscard]] DataTypePtr inferUnary(const DataTypePtr& input) const override {
        if (!input->isNumeric()) {
            NES_THROW_RUNTIME_ERROR("LogExpressions can only be evaluated on numeric values.");
        }
        // Output values can become highly negative for inputs close to +0. Set Double as output stamp.
        return DataTypeFactory::createDouble();
    }
};

[[maybe_unused]] const static LogicalFunctionRegistry::Add<Log2Function> logFunction("log2");
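The type inference step can also be looked at in isolation. The sketch below is self-contained and uses hypothetical stand-ins (SketchType, isNumeric, SketchUnaryLogicalFunction, SketchLog2Function) instead of NebulaStream's DataTypePtr and DataTypeFactory. It mirrors only the rule stated above: numeric inputs yield Double, anything else is rejected.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical, simplified stand-in for NebulaStream's data types.
enum class SketchType { Undefined, Boolean, Int64, Double, Text };

// In this sketch, only Int64 and Double count as numeric.
bool isNumeric(SketchType type) {
    return type == SketchType::Int64 || type == SketchType::Double;
}

// Hypothetical interface mirroring the role of UnaryLogicalFunction.
struct SketchUnaryLogicalFunction {
    virtual ~SketchUnaryLogicalFunction() = default;
    // Maps the data type of the single input to the data type of the result.
    virtual SketchType inferUnary(SketchType input) const = 0;
};

struct SketchLog2Function : SketchUnaryLogicalFunction {
    SketchType inferUnary(SketchType input) const override {
        if (!isNumeric(input)) {
            throw std::invalid_argument("log2 can only be evaluated on numeric values.");
        }
        // log2 of an integer is generally not an integer, so the result is Double.
        return SketchType::Double;
    }
};
```

Rejecting non-numeric inputs during inference means that an invalid query fails at optimization time rather than while tuples are being processed.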

c. Query API & Clients

You must add your expression to the Query API to enable users to use it in a query.

  1. Extend the ArithmeticalExpressions class with your new expression.

Location: nes-client

Tasks:

  • API -> Expressions -> ArithmeticalExpressions

Tests:

  • nes-coordinator -> tests -> UnitTests -> Query

💡 If you want your expression to be available in the NebulaStream clients, you also have to add it there.

The following code creates the FunctionExpression for the LOG2 function with the function name log2 and the input argument exp, passed as a vector.

ExpressionNodePtr LOG2(const ExpressionNodePtr& exp) {
    return FunctionExpression::create(DataTypeFactory::createUndefined(), "log2", {exp});
}
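The API function creates the node with an undefined data type and only a function name. The sketch below illustrates one plausible reading of that design: the type is left open at construction and filled in later by an inference pass that dispatches on the name. All names (ApiSketchType, ApiSketchNode, sketchField, SKETCH_LOG2, sketchInferStamp) are hypothetical, and the inference pass is a deliberately simplified assumption, not NebulaStream's optimizer.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a data type (stamp).
enum class ApiSketchType { Undefined, Int64, Double };

struct ApiSketchNode;
using ApiSketchNodePtr = std::shared_ptr<ApiSketchNode>;

// Hypothetical expression node: a stamp, an optional function name, and child nodes.
struct ApiSketchNode {
    ApiSketchType stamp;
    std::string functionName;  // empty for plain field accesses
    std::vector<ApiSketchNodePtr> children;
};

// Creates a leaf node whose type is already known.
ApiSketchNodePtr sketchField(ApiSketchType type) {
    return std::make_shared<ApiSketchNode>(ApiSketchNode{type, "", {}});
}

// API-level helper: only records the name and the arguments; the stamp stays undefined.
ApiSketchNodePtr SKETCH_LOG2(const ApiSketchNodePtr& exp) {
    return std::make_shared<ApiSketchNode>(ApiSketchNode{ApiSketchType::Undefined, "log2", {exp}});
}

// Simplified inference pass: resolves children first, then the node itself by name.
void sketchInferStamp(ApiSketchNode& node) {
    for (auto& child : node.children) {
        sketchInferStamp(*child);
    }
    if (node.functionName == "log2") {
        node.stamp = ApiSketchType::Double;
    }
}
```

Because the helper takes and returns a node pointer, calls compose naturally, e.g., SKETCH_LOG2(SKETCH_LOG2(sketchField(ApiSketchType::Int64))), and a single call to sketchInferStamp on the root resolves the stamps of all nested function nodes.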