public interface Function extends EachOperation
A function takes in a set of input fields and emits zero or more tuples as output. The fields of the output tuple are appended to the original input tuple in the stream. If a function emits no tuples, the original input tuple is filtered out. Otherwise, the input tuple is duplicated for each output tuple.
For example, if you have the following function:
public class MyFunction extends BaseFunction {
    public void execute(TridentTuple tuple, TridentCollector collector) {
        for (int i = 0; i < tuple.getInteger(0); i++) {
            collector.emit(new Values(i));
        }
    }
}
Now suppose you have a stream in the variable mystream with the fields ["a", "b", "c"] and the following tuples:
[1, 2, 3]
[4, 1, 6]
[3, 0, 8]
If you had the following code in your topology definition:
mystream.each(new Fields("b"), new MyFunction(), new Fields("d"));
The resulting tuples would have the fields ["a", "b", "c", "d"] and look like this:
[1, 2, 3, 0]
[1, 2, 3, 1]
[4, 1, 6, 0]
In this case, the parameter new Fields("b") tells Trident that you would like to select the field “b” as input to the function, and that will be the only field in the tuple passed to the execute() method. The value of “b” in the first tuple (2) causes the for loop to execute twice, so two tuples are emitted. Similarly, the second tuple causes one tuple to be emitted. For the third tuple, the value of 0 causes the for loop to be skipped, so nothing is emitted and the incoming tuple is filtered out of the stream.
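The append-and-filter behavior described above can be modeled without Trident at all. The following is a minimal plain-Java sketch, not Trident's implementation: the class EachSemanticsSketch and its methods are hypothetical names introduced for illustration, tuples are modeled as lists of integers, and the index 1 stands in for selecting the field "b".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of the each() semantics; it does not use the Storm API.
class EachSemanticsSketch {

    // Stand-in for MyFunction.execute: emits one value for each i in [0, n).
    static List<Integer> myFunction(int n) {
        List<Integer> emitted = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            emitted.add(i);
        }
        return emitted;
    }

    // Applies the function to the field at inputIndex and appends each emitted
    // value to a copy of the input tuple. A tuple with no emissions is dropped.
    static List<List<Integer>> each(List<List<Integer>> stream, int inputIndex) {
        List<List<Integer>> out = new ArrayList<>();
        for (List<Integer> tuple : stream) {
            for (Integer d : myFunction(tuple.get(inputIndex))) {
                List<Integer> extended = new ArrayList<>(tuple);
                extended.add(d);
                out.add(extended);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> mystream = Arrays.asList(
            Arrays.asList(1, 2, 3),
            Arrays.asList(4, 1, 6),
            Arrays.asList(3, 0, 8));
        // Index 1 corresponds to field "b".
        System.out.println(each(mystream, 1));
        // prints [[1, 2, 3, 0], [1, 2, 3, 1], [4, 1, 6, 0]]
    }
}
```

The inner loop makes the duplication explicit: the input tuple is copied once per emitted value, and a tuple for which the function emits nothing never reaches the output list.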
If your Function implementation has configuration requirements, you will typically want to extend BaseFunction and override the Operation.prepare(Map, TridentOperationContext) method to perform your custom initialization.
Because Trident Functions perform logic on individual tuples, as opposed to batches, it is advisable to avoid expensive operations such as database calls in a Function where possible. For data store interactions, it is better to use a State or QueryFunction implementation, since Trident states operate on batch partitions and can perform bulk updates to a database.
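To illustrate why per-tuple data store access is discouraged, here is a small plain-Java sketch that counts round trips for the two access patterns. It does not use the Storm API: CountingStore is a hypothetical in-memory stand-in for a remote store, and its multiGet method is an assumed bulk operation of the kind a batch-oriented State or QueryFunction implementation could rely on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BatchLookupSketch {

    // Hypothetical in-memory stand-in for a remote data store that counts round trips.
    static class CountingStore {
        private final Map<String, Integer> data = new HashMap<>();
        int roundTrips = 0;

        void put(String key, int value) {
            data.put(key, value);
        }

        // One round trip per key.
        Integer get(String key) {
            roundTrips++;
            return data.get(key);
        }

        // One round trip for the whole list of keys.
        List<Integer> multiGet(List<String> keys) {
            roundTrips++;
            List<Integer> values = new ArrayList<>();
            for (String key : keys) {
                values.add(data.get(key));
            }
            return values;
        }
    }

    // Per-tuple style: one lookup for every key, as a Function would do.
    static int perTupleRoundTrips(CountingStore store, List<String> keys) {
        int before = store.roundTrips;
        for (String key : keys) {
            store.get(key);
        }
        return store.roundTrips - before;
    }

    // Batch style: a single bulk lookup for all keys in the batch.
    static int batchedRoundTrips(CountingStore store, List<String> keys) {
        int before = store.roundTrips;
        store.multiGet(keys);
        return store.roundTrips - before;
    }

    public static void main(String[] args) {
        CountingStore store = new CountingStore();
        store.put("a", 1);
        store.put("b", 2);
        store.put("c", 3);
        List<String> batch = Arrays.asList("a", "b", "c");
        System.out.println("per-tuple round trips: " + perTupleRoundTrips(store, batch)); // 3
        System.out.println("batched round trips: " + batchedRoundTrips(store, batch));    // 1
    }
}
```

With a real remote store each round trip carries network latency, so the gap between the two patterns grows with the number of tuples in a batch.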
Modifier and Type | Method and Description |
---|---|
void | execute(TridentTuple tuple, TridentCollector collector) Performs the function logic on an individual tuple and emits 0 or more tuples. |
void execute(TridentTuple tuple, TridentCollector collector)
Performs the function logic on an individual tuple and emits 0 or more tuples.
Parameters:
tuple - The incoming tuple
collector - A collector instance that can be used to emit tuples
Copyright © 2019 The Apache Software Foundation. All Rights Reserved.