Class RollingCountBolt

  • All Implemented Interfaces:
    Serializable, IBolt, IComponent, IRichBolt

    public class RollingCountBolt
    extends BaseRichBolt
    This bolt performs rolling counts of incoming objects, i.e. sliding window based counting.

    The bolt is configured by two parameters: the length of the sliding window in seconds (which determines the time span over which objects are counted) and the emit frequency in seconds (which determines how often the bolt outputs the latest window counts). For instance, if the window length is set to 300 seconds (five minutes) and the emit frequency to 60 seconds (one minute), then the bolt will output the latest five-minute sliding window every minute.
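
    The interplay of the two parameters can be illustrated with a minimal, Storm-independent sketch of slot-based sliding window counting. Everything in it is hypothetical: the class name SlidingWindowSketch, its methods, and in particular the assumption that the window is split into one slot per emit interval (window length divided by emit frequency). It is not the bolt's actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative slot-based sliding window counter; not the Storm implementation. */
class SlidingWindowSketch {
    private final int numSlots;
    private final Map<String, long[]> slotsByObject = new HashMap<>();
    private int headSlot = 0;

    SlidingWindowSketch(int windowLengthInSeconds, int emitFrequencyInSeconds) {
        // Assumption: one slot per emit interval, e.g. 300 s / 60 s = 5 slots.
        this.numSlots = windowLengthInSeconds / emitFrequencyInSeconds;
    }

    /** Counts one occurrence of obj in the current (head) slot. */
    void increment(String obj) {
        slotsByObject.computeIfAbsent(obj, k -> new long[numSlots])[headSlot]++;
    }

    /** Returns per-object totals over the whole window, then slides the window by one slot. */
    Map<String, Long> getCountsThenAdvanceWindow() {
        Map<String, Long> totals = new HashMap<>();
        for (Map.Entry<String, long[]> entry : slotsByObject.entrySet()) {
            long sum = 0;
            for (long count : entry.getValue()) {
                sum += count;
            }
            totals.put(entry.getKey(), sum);
        }
        // The oldest slot becomes the new head and is wiped, which drops its counts.
        headSlot = (headSlot + 1) % numSlots;
        for (long[] slots : slotsByObject.values()) {
            slots[headSlot] = 0;
        }
        return totals;
    }

    public static void main(String[] args) {
        SlidingWindowSketch window = new SlidingWindowSketch(300, 60);
        window.increment("storm");
        window.increment("storm");
        window.increment("storm");
        // Each loop iteration stands in for one emit interval (one minute).
        for (int minute = 1; minute <= 6; minute++) {
            System.out.println("minute " + minute + ": " + window.getCountsThenAdvanceWindow());
        }
    }
}
```

    With a 300-second window and a 60-second emit frequency, the three occurrences counted in the first interval are reported in five consecutive emits and have dropped out of the window by the sixth.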

    The bolt emits a rolling count tuple per object, consisting of the object itself, its latest rolling count, and the actual duration of the sliding window. The latter is included in case the expected sliding window length (as configured by the user) is different from the actual length, e.g. due to high system load. Note that the actual window length is tracked and calculated for the window, and not individually for each object within a window.

    Note: During the startup phase you will usually observe that the bolt warns you about the actual sliding window length being smaller than the expected length. This behavior is expected and is caused by the way the sliding window counts are initially "loaded up". You can safely ignore this warning during startup (e.g. you will see this warning for roughly the first five minutes after startup if the window length is set to five minutes).
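
    Why the actual window length starts out shorter than the configured one can be seen in a small, Storm-independent sketch. The class WindowLengthTrackerSketch and its method are hypothetical, and the approach (remembering the timestamps of the most recent window advances and measuring back to the oldest one) is an assumption made for illustration rather than a description of the bolt's internals. Timestamps are passed in explicitly so the example is deterministic.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Illustrative tracker for the actual duration covered by a sliding window. */
class WindowLengthTrackerSketch {
    private final int numSlots;
    private final Deque<Long> advanceTimesMs = new ArrayDeque<>();

    WindowLengthTrackerSketch(int numSlots, long startTimeMs) {
        this.numSlots = numSlots;
        // Until the window has advanced, it can only reach back to startup.
        advanceTimesMs.addLast(startTimeMs);
    }

    /** Records a window advance and returns the seconds covered by the window just emitted. */
    int advanceAndGetActualWindowLengthInSeconds(long nowMs) {
        long oldestMs = advanceTimesMs.peekFirst();
        int actualSeconds = (int) ((nowMs - oldestMs) / 1000);
        advanceTimesMs.addLast(nowMs);
        // Keep only as many timestamps as the window has slots.
        if (advanceTimesMs.size() > numSlots) {
            advanceTimesMs.removeFirst();
        }
        return actualSeconds;
    }

    public static void main(String[] args) {
        int expectedSeconds = 300; // configured window length
        int emitSeconds = 60;      // configured emit frequency
        WindowLengthTrackerSketch tracker =
            new WindowLengthTrackerSketch(expectedSeconds / emitSeconds, 0L);
        for (int emit = 1; emit <= 6; emit++) {
            long nowMs = emit * emitSeconds * 1000L;
            int actualSeconds = tracker.advanceAndGetActualWindowLengthInSeconds(nowMs);
            String note = actualSeconds < expectedSeconds ? " (shorter than expected)" : "";
            System.out.println("emit " + emit + ": actual window = " + actualSeconds + " s" + note);
        }
    }
}
```

    In this sketch the first four emits cover only 60, 120, 180, and 240 seconds, which corresponds to the startup warning; from the fifth emit onward the window covers the full 300 seconds. A delayed emit (for example under high load) would show up the same way, as an actual length that differs from the configured one.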

    See Also:
    Serialized Form
    • Constructor Detail

      • RollingCountBolt

        public RollingCountBolt()
      • RollingCountBolt

        public RollingCountBolt(int windowLengthInSeconds,
                                int emitFrequencyInSeconds)
    • Method Detail

      • prepare

        public void prepare(Map<String, Object> topoConf,
                            TopologyContext context,
                            OutputCollector collector)
        Description copied from interface: IBolt
        Called when a task for this component is initialized within a worker on the cluster. It provides the bolt with the environment in which the bolt executes, as described by the parameters below.

        Parameters:
        topoConf - The Storm configuration for this bolt. This is the configuration provided to the topology merged in with cluster configuration on this machine.
        context - This object can be used to get information about this task's place within the topology, including the task id and component id of this task, input and output information, etc.
        collector - The collector is used to emit tuples from this bolt. Tuples can be emitted at any time, including the prepare and cleanup methods. The collector is thread-safe and should be saved as an instance variable of this bolt object.
      • execute

        public void execute(Tuple tuple)
        Description copied from interface: IBolt
        Process a single tuple of input. The Tuple object carries metadata about which component, stream, and task it came from. The values of the Tuple can be accessed using Tuple#getValue. The IBolt does not have to process the Tuple immediately. It is perfectly fine to hang onto a tuple and process it later (for instance, to do an aggregation or join).

        Tuples should be emitted using the OutputCollector provided through the prepare method. It is required that all input tuples are acked or failed at some point using the OutputCollector. Otherwise, Storm will be unable to determine when tuples coming off the spouts have been completed.

        For the common case of acking an input tuple at the end of the execute method, see IBasicBolt which automates this.

        Parameters:
        tuple - The input tuple to be processed.
      • declareOutputFields

        public void declareOutputFields(OutputFieldsDeclarer declarer)
        Description copied from interface: IComponent
        Declare the output schema for all the streams of this topology.
        Parameters:
        declarer - Used to declare output stream ids, output fields, and whether or not each output stream is a direct stream.