Package org.apache.storm.hdfs.spout
Class HdfsSpout
- java.lang.Object
  - org.apache.storm.topology.base.BaseComponent
    - org.apache.storm.topology.base.BaseRichSpout
      - org.apache.storm.hdfs.spout.HdfsSpout
- All Implemented Interfaces:
Serializable, ISpout, IComponent, IRichSpout
public class HdfsSpout extends BaseRichSpout
- See Also:
- Serialized Form
-
-
Constructor Summary
Constructor | Description
HdfsSpout() |
-
Method Summary
Modifier and Type | Method | Description
void | ack(Object msgId) | Storm has determined that the tuple emitted by this spout with the msgId identifier has been fully processed.
void | close() | Called when an ISpout is going to be shut down.
void | declareOutputFields(OutputFieldsDeclarer declarer) | Declare the output schema for all the streams of this topology.
protected void | emitData(List<Object> tuple, org.apache.storm.hdfs.spout.HdfsSpout.MessageId id) |
void | fail(Object msgId) | The tuple emitted by this spout with the msgId identifier has failed to be fully processed.
SpoutOutputCollector | getCollector() |
org.apache.hadoop.fs.Path | getLockDirPath() |
void | nextTuple() | When this method is called, Storm is requesting that the Spout emit tuples to the output collector.
void | open(Map<String,Object> conf, TopologyContext context, SpoutOutputCollector collector) | Called when a task for this component is initialized within a worker on the cluster.
HdfsSpout | setArchiveDir(String archiveDir) |
HdfsSpout | setBadFilesDir(String badFilesDir) |
HdfsSpout | setClocksInSync(boolean clocksInSync) |
HdfsSpout | setCommitFrequencyCount(int commitFrequencyCount) |
HdfsSpout | setCommitFrequencySec(int commitFrequencySec) |
HdfsSpout | setHdfsUri(String hdfsUri) |
HdfsSpout | setIgnoreSuffix(String ignoreSuffix) |
HdfsSpout | setLockDir(String lockDir) |
HdfsSpout | setLockTimeoutSec(int lockTimeoutSec) |
HdfsSpout | setMaxOutstanding(int maxOutstanding) |
HdfsSpout | setReaderType(String readerType) |
HdfsSpout | setSourceDir(String sourceDir) |
HdfsSpout | withConfigKey(String configKey) | Set the key name under which HDFS options are placed.
HdfsSpout | withOutputFields(String... fields) | Output field names.
HdfsSpout | withOutputStream(String streamName) | Set output stream name.
-
Methods inherited from class org.apache.storm.topology.base.BaseRichSpout
activate, deactivate
-
Methods inherited from class org.apache.storm.topology.base.BaseComponent
getComponentConfiguration
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.storm.topology.IComponent
getComponentConfiguration
-
-
-
-
Method Detail
-
setCommitFrequencyCount
public HdfsSpout setCommitFrequencyCount(int commitFrequencyCount)
-
setCommitFrequencySec
public HdfsSpout setCommitFrequencySec(int commitFrequencySec)
-
setMaxOutstanding
public HdfsSpout setMaxOutstanding(int maxOutstanding)
-
setLockTimeoutSec
public HdfsSpout setLockTimeoutSec(int lockTimeoutSec)
-
setClocksInSync
public HdfsSpout setClocksInSync(boolean clocksInSync)
-
withOutputFields
public HdfsSpout withOutputFields(String... fields)
Output field names. The number of fields depends on the reader type.
-
withConfigKey
public HdfsSpout withConfigKey(String configKey)
Set the key name under which HDFS options are placed (similar to the HDFS bolt). The default key name is 'hdfs.config'.
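As a rough illustration of what "placing HDFS options under a key" can look like, the sketch below builds a plain topology configuration map with a nested options map. It uses only the Java standard library; the class name `HdfsConfigKeyExample`, the helper `buildTopologyConf`, and the option `dfs.replication` are illustrative assumptions, not part of this API. It also assumes the spout looks up a nested map under the configured key.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: shows the shape of a topology config that carries
// HDFS options under a named key. Not part of storm-hdfs.
class HdfsConfigKeyExample {

    // Builds a topology config map with HDFS options nested under configKey.
    static Map<String, Object> buildTopologyConf(String configKey) {
        Map<String, Object> hdfsOptions = new HashMap<>();
        // Illustrative option only; substitute whatever HDFS settings you need.
        hdfsOptions.put("dfs.replication", "2");

        Map<String, Object> topologyConf = new HashMap<>();
        topologyConf.put(configKey, hdfsOptions);
        return topologyConf;
    }

    public static void main(String[] args) {
        // 'hdfs.config' is the default key name documented above.
        Map<String, Object> conf = buildTopologyConf("hdfs.config");
        System.out.println(conf.keySet());
    }
}
```

If a different key is used in the topology configuration, the same name would be passed to `withConfigKey` so that the spout reads its options from the matching entry.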
-
getLockDirPath
public org.apache.hadoop.fs.Path getLockDirPath()
-
getCollector
public SpoutOutputCollector getCollector()
-
nextTuple
public void nextTuple()
Description copied from interface: ISpout
When this method is called, Storm is requesting that the Spout emit tuples to the output collector. This method should be non-blocking, so if the Spout has no tuples to emit, this method should return. nextTuple, ack, and fail are all called in a tight loop in a single thread in the spout task. When there are no tuples to emit, it is courteous to have nextTuple sleep for a short amount of time (like a single millisecond) so as not to waste too much CPU.
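The non-blocking contract described above can be sketched without any Storm classes. The example below is a minimal stand-in, not HdfsSpout's actual implementation: the class `IdleBackoffSource` and its methods are hypothetical, and the in-memory queue stands in for whatever source a real spout reads from.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-in for a spout; it does not use any Storm classes.
class IdleBackoffSource {
    private final Queue<String> pending = new ArrayDeque<>();
    private final List<String> emitted = new ArrayList<>();
    private int idleCalls = 0;

    void offer(String record) {
        pending.add(record);
    }

    // Mirrors the nextTuple contract: emit at most one record, never wait for input.
    void nextTuple() {
        String record = pending.poll();
        if (record == null) {
            idleCalls++;
            try {
                Thread.sleep(1); // brief pause so an idle loop does not spin the CPU
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            }
            return;
        }
        emitted.add(record); // a real spout would emit to its output collector here
    }

    List<String> emitted() {
        return emitted;
    }

    int idleCalls() {
        return idleCalls;
    }

    public static void main(String[] args) {
        IdleBackoffSource source = new IdleBackoffSource();
        source.offer("a");
        source.offer("b");
        for (int i = 0; i < 4; i++) {
            source.nextTuple();
        }
        System.out.println(source.emitted() + " idle=" + source.idleCalls());
    }
}
```

The one-millisecond sleep follows the suggestion in the description; because nextTuple, ack, and fail share a single thread, a longer pause would also delay ack and fail handling.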
-
emitData
protected void emitData(List<Object> tuple, org.apache.storm.hdfs.spout.HdfsSpout.MessageId id)
-
open
public void open(Map<String,Object> conf, TopologyContext context, SpoutOutputCollector collector)
Description copied from interface: ISpout
Called when a task for this component is initialized within a worker on the cluster. It provides the spout with the environment in which the spout executes. This includes the following:
- Parameters:
conf - The Storm configuration for this spout. This is the configuration provided to the topology, merged with the cluster configuration on this machine.
context - This object can be used to get information about this task's place within the topology, including the task id and component id of this task, input and output information, etc.
collector - The collector is used to emit tuples from this spout. Tuples can be emitted at any time, including from the open and close methods. The collector is thread-safe and should be saved as an instance variable of this spout object.
-
close
public void close()
Description copied from interface: ISpout
Called when an ISpout is going to be shut down. There is no guarantee that close will be called, because the supervisor kill -9's worker processes on the cluster. The one context where close is guaranteed to be called is when a topology is killed while running Storm in local mode.
- Specified by:
close in interface ISpout
- Overrides:
close in class BaseRichSpout
-
ack
public void ack(Object msgId)
Description copied from interface: ISpout
Storm has determined that the tuple emitted by this spout with the msgId identifier has been fully processed. Typically, an implementation of this method will take that message off the queue and prevent it from being replayed.
- Specified by:
ack in interface ISpout
- Overrides:
ack in class BaseRichSpout
-
fail
public void fail(Object msgId)
Description copied from interface: ISpout
The tuple emitted by this spout with the msgId identifier has failed to be fully processed. Typically, an implementation of this method will put that message back on the queue to be replayed at a later time.
- Specified by:
fail in interface ISpout
- Overrides:
fail in class BaseRichSpout
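The typical ack and fail behavior described for ISpout (drop a message once it is fully processed, requeue it on failure) can be sketched as follows. This is a generic illustration using only the Java standard library; `PendingTracker` and its methods are hypothetical and do not describe how HdfsSpout tracks its own messages.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical tracker for emitted-but-unacknowledged records.
class PendingTracker {
    private final Queue<String> queue = new ArrayDeque<>();
    private final Map<Long, String> inFlight = new HashMap<>();
    private long nextId = 0;

    void enqueue(String record) {
        queue.add(record);
    }

    // Takes the next record off the queue and remembers it under a new message id.
    // Returns the id, or -1 when there is nothing to emit.
    long emitNext() {
        String record = queue.poll();
        if (record == null) {
            return -1;
        }
        long msgId = nextId++;
        inFlight.put(msgId, record);
        return msgId;
    }

    // Fully processed: forget the record so it is never replayed.
    void ack(long msgId) {
        inFlight.remove(msgId);
    }

    // Failed: put the record back on the queue so it is emitted again later.
    void fail(long msgId) {
        String record = inFlight.remove(msgId);
        if (record != null) {
            queue.add(record);
        }
    }

    int inFlightCount() {
        return inFlight.size();
    }

    int queuedCount() {
        return queue.size();
    }

    public static void main(String[] args) {
        PendingTracker tracker = new PendingTracker();
        tracker.enqueue("a");
        tracker.enqueue("b");
        long first = tracker.emitNext();
        long second = tracker.emitNext();
        tracker.ack(first);   // "a" is done
        tracker.fail(second); // "b" goes back on the queue
        System.out.println("queued=" + tracker.queuedCount()
                + " inFlight=" + tracker.inFlightCount());
    }
}
```

A replayed record receives a fresh message id in this sketch; an implementation that needs to count retries would keep the original id or attach a retry counter.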
-
declareOutputFields
public void declareOutputFields(OutputFieldsDeclarer declarer)
Description copied from interface: IComponent
Declare the output schema for all the streams of this topology.
- Parameters:
declarer - this is used to declare output stream ids, output fields, and whether or not each output stream is a direct stream
-
-