How to add a new expression?
This is an implementation guide on adding a new arithmetical expression in NebulaStream, i.e., functions for expressions. In the remainder, we describe the main concepts, features, and essential development steps, with examples of how to add your custom expression. In general, we support scalar functions that receive a set of input arguments and return exactly one result. Common examples are MIN, MAX, LOG10, and SIN. Furthermore, all functions in NebulaStream consist of a logical part, i.e., the Logical Expression, and a physical representation, i.e., the Executable Nautilus Expression.
- a. Executable Nautilus Expression
- b. Logical Expression
- c. Query API & Clients
You can implement your expression either way: bottom-up (a. to c.) or top-down (c. to a.).
a. Executable Expression (Worker)
An executable expression contains the actual implementation of a function and is invoked during query execution. It contains the function `execute`, which is called for each received tuple. Code in `execute` gets (query-)compiled by NebulaStream's query compilation backend, Nautilus.
Tasks:
- Create a new class that inherits from the base class `Expression`.
- Implement an executable expression that represents your function. In particular, you have to add the logic of your function to the `execute` method, which is called for each occurring tuple (record).
- Finally, register the expression with the `ExecutableFunctionRegistry` class.
- Add all new source files to the respective `CMakeLists.txt` files (in the corresponding folders, watch out for subfolders).
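The first two tasks can be sketched with simplified stand-in types. This is only an illustration of the class shape: `Record`, `Expression`, and `ReadFieldExpression` below are hypothetical placeholders, not the actual Nautilus types, and the real `execute` operates on Nautilus `Value<>` objects rather than plain doubles.

```cpp
#include <cassert>
#include <cmath>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a Nautilus record: field name -> numeric value.
using Record = std::unordered_map<std::string, double>;

// Hypothetical stand-in for the Expression base class.
class Expression {
  public:
    virtual ~Expression() = default;
    // Called once for each tuple (record).
    virtual double execute(const Record& record) const = 0;
};
using ExpressionPtr = std::shared_ptr<Expression>;

// Hypothetical leaf expression that reads a single field from the record.
class ReadFieldExpression : public Expression {
  public:
    explicit ReadFieldExpression(std::string fieldName) : field(std::move(fieldName)) {}
    double execute(const Record& record) const override { return record.at(field); }

  private:
    std::string field;
};

// Unary expression: evaluates its child first, then applies log2 to the result.
class Log2Expression : public Expression {
  public:
    explicit Log2Expression(ExpressionPtr child) : subExpression(std::move(child)) {
        assert(subExpression != nullptr);
    }
    double execute(const Record& record) const override {
        return std::log2(subExpression->execute(record));
    }

  private:
    ExpressionPtr subExpression;
};

// Convenience helper: evaluates log2(field) on one record.
double evaluateLog2OnField(const Record& record, const std::string& fieldName) {
    Log2Expression expression(std::make_shared<ReadFieldExpression>(fieldName));
    return expression.execute(record);
}
```

Calling `evaluateLog2OnField` on a record whose field `x` holds 8.0 evaluates the child expression first and then applies `std::log2` to its result.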
Location: nes-execution
Tasks:
- Execution -> Expressions -> Functions
Tests:
- tests -> UnitTests -> Execution -> Expressions -> Functions
The following code shows the implementation of the executable Log2 function, including its registration with the `ExecutableFunctionRegistry` class.
// Proxy function that wraps std::log2.
double calculateLog2(double x) { return std::log2(x); }

Value<> Log2Expression::execute(NES::Nautilus::Record& record) const {
    Value subValue = subExpression->execute(record);
    if (subValue->isType<Int64>()) {
        // Call the predefined proxy function that wraps std::log2.
        return FunctionCall<>("calculateLog2_int64", calculateLog2, subValue.as<Int64>());
    } else if (...) {
        // Add other types
    }
}

// Register the executable expression as a unary function.
static ExecutableFunctionRegistry::Add<UnaryFunctionProvider<Log2Expression>> log2Function("log2");
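The registration line above relies on a static object whose constructor adds an entry to a registry during static initialization. The following is a minimal sketch of that self-registration pattern under stated assumptions: `SimpleFunctionRegistry`, `Function`, and `Log2` are hypothetical stand-ins and do not reproduce the actual `ExecutableFunctionRegistry` implementation.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <map>
#include <memory>
#include <string>

// Hypothetical function interface, used only for this sketch.
struct Function {
    virtual ~Function() = default;
    virtual double apply(double x) const = 0;
};

// Hypothetical registry that maps function names to factories.
class SimpleFunctionRegistry {
  public:
    using Factory = std::function<std::unique_ptr<Function>()>;

    // Function-local static: constructed on first use, which avoids
    // static-initialization-order problems across translation units.
    static std::map<std::string, Factory>& entries() {
        static std::map<std::string, Factory> instance;
        return instance;
    }

    // Constructing an Add<T> object registers a factory for T under `name`.
    template <typename T>
    struct Add {
        explicit Add(const std::string& name) {
            assert(!name.empty());
            entries()[name] = [] { return std::make_unique<T>(); };
        }
    };

    // Returns a new instance of the named function, or nullptr if it is unknown.
    static std::unique_ptr<Function> create(const std::string& name) {
        auto it = entries().find(name);
        if (it == entries().end()) {
            return nullptr;
        }
        return it->second();
    }
};

struct Log2 : Function {
    double apply(double x) const override { return std::log2(x); }
};

// The static object registers Log2 under the name "log2".
static SimpleFunctionRegistry::Add<Log2> log2Registration("log2");
```

With this pattern, adding a function needs no change to a central list: defining the static `Add` object in the new source file is enough, provided that file is part of the build.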
b. Logical Expression (Coordinator)
A logical expression is part of the logical query plan and registers a logical representation of the function. This representation is used during query optimization, e.g., type-inference and constant-folding, and is serializable.
A `FunctionExpression` is an expression node that represents a function with a specific name.
Internally it stores a `LogicalFunction`, which is used for inference.
- To define a new logical expression, you have to create a new function class that inherits from one of the interfaces provided in the `LogicalFunctionRegistry` class, i.e., `LogicalFunction`, `BinaryLogicalFunction`, or `UnaryLogicalFunction`.
- Add the new source file to the respective `CMakeLists.txt` files (in the corresponding folders, watch out for subfolders).
Location: nes-expressions
Tasks:
- Expressions -> Functions
The following code defines the Log2 function and registers it under a specific name in the `LogicalFunctionRegistry`.
Whenever a logical function with this name is used in a query, the function is looked up in the `LogicalFunctionRegistry`.
Furthermore, the `inferUnary` method returns the expected data type depending on the input argument.
💡 Currently, all LogicalFunctions have to implement `inferStamp`, `inferUnary`, or `inferBinary`, depending on the function type.
class Log2Function : public UnaryLogicalFunction {
  public:
    [[nodiscard]] DataTypePtr inferUnary(const DataTypePtr& input) const override {
        if (!input->isNumeric()) {
            NES_THROW_RUNTIME_ERROR("LogExpressions can only be evaluated on numeric values.");
        }
        // Output values can become highly negative for inputs close to +0. Set Double as output stamp.
        return DataTypeFactory::createDouble();
    }
};

[[maybe_unused]] const static LogicalFunctionRegistry::Add<Log2Function> logFunction("log2");
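The type inference step can be illustrated in isolation. This is a minimal sketch that assumes a tiny hypothetical type system: `DataType`, `isNumeric`, `UnaryLogicalFunctionSketch`, and `inferLog2` are placeholders, not the NebulaStream `DataTypePtr` and `UnaryLogicalFunction` interfaces.

```cpp
#include <stdexcept>

// Hypothetical, heavily reduced set of data types.
enum class DataType { Int64, Double, Text, Boolean };

// In this sketch, only Int64 and Double count as numeric.
bool isNumeric(DataType type) {
    return type == DataType::Int64 || type == DataType::Double;
}

// Hypothetical stand-in for a unary logical function interface.
struct UnaryLogicalFunctionSketch {
    virtual ~UnaryLogicalFunctionSketch() = default;
    // Maps the input type to the output type, or throws for unsupported input.
    virtual DataType inferUnary(DataType input) const = 0;
};

struct Log2FunctionSketch : UnaryLogicalFunctionSketch {
    DataType inferUnary(DataType input) const override {
        if (!isNumeric(input)) {
            throw std::invalid_argument("log2 can only be evaluated on numeric values.");
        }
        // log2 of an integer is generally not an integer, so the result is a Double.
        return DataType::Double;
    }
};

// Convenience helper: infers the output type of log2 for a given input type.
DataType inferLog2(DataType input) {
    return Log2FunctionSketch{}.inferUnary(input);
}
```

In this sketch, an `Int64` input yields a `Double` output type, and a `Text` input is rejected before any data is processed.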
c. Query API & Clients
You must add your expression to the Query API to enable users to use it in a query.
- Extend the `ArithmeticalExpressions` class with your new expression.
Location: nes-client
Tasks:
- API -> Expressions -> ArithmeticalExpressions
Tests:
- nes-coordinator -> tests -> UnitTests -> Query
💡 If you want your expression to be available in the NebulaStream clients, you also have to add it there.
The following code creates the `FunctionExpression` for the LOG2 function with the function name `log2` and the input argument `exp` as a vector.
ExpressionNodePtr LOG2(const ExpressionNodePtr& exp) {
    return FunctionExpression::create(DataTypeFactory::createUndefined(), "log2", {exp});
}
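To show how such an API function composes into an expression tree, here is a minimal sketch with hypothetical node types. `ExpressionNodeSketch`, `attribute`, `createFunctionNode`, and `toString` are illustrative assumptions and not part of the NebulaStream API; the data type argument of `FunctionExpression::create` is omitted for brevity.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct ExpressionNodeSketch;
using ExpressionNodeSketchPtr = std::shared_ptr<ExpressionNodeSketch>;

// Hypothetical expression node: a name plus zero or more argument nodes.
struct ExpressionNodeSketch {
    std::string name;
    std::vector<ExpressionNodeSketchPtr> children;
};

// Hypothetical leaf node that refers to a field of the input stream.
ExpressionNodeSketchPtr attribute(const std::string& fieldName) {
    return std::make_shared<ExpressionNodeSketch>(ExpressionNodeSketch{fieldName, {}});
}

// Hypothetical counterpart of FunctionExpression::create: stores name and arguments.
ExpressionNodeSketchPtr createFunctionNode(std::string functionName,
                                           std::vector<ExpressionNodeSketchPtr> arguments) {
    return std::make_shared<ExpressionNodeSketch>(
        ExpressionNodeSketch{std::move(functionName), std::move(arguments)});
}

// API function: wraps the single input expression in a one-element argument vector.
ExpressionNodeSketchPtr LOG2(const ExpressionNodeSketchPtr& exp) {
    return createFunctionNode("log2", {exp});
}

// Renders the tree as text, e.g. a function node with one child as "name(child)".
std::string toString(const ExpressionNodeSketchPtr& node) {
    if (node->children.empty()) {
        return node->name;
    }
    std::string out = node->name + "(";
    for (std::size_t i = 0; i < node->children.size(); ++i) {
        if (i > 0) {
            out += ", ";
        }
        out += toString(node->children[i]);
    }
    return out + ")";
}
```

Because `LOG2` takes and returns a node pointer, calls can be nested, so `LOG2(LOG2(attribute("x")))` builds a two-level tree whose function name `log2` is what the registries later use for lookup.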