Flume (4): Monitoring Examples



I. Monitoring Port Data

First, start a Flume job that listens on port 44444 of the local machine (the server side);

Then use the netcat tool to send messages to local port 44444 (the client side);

Finally, Flume displays the data it receives on the console in real time.

1. Install netcat

sudo yum install -y nc

Function description: the netstat command is a very useful tool for monitoring TCP/IP networks; it can display the routing table, the actual network connections, and status information for every network interface.

Basic syntax: netstat [options]

Options:

-t or --tcp: display the status of TCP connections;
-u or --udp: display the status of UDP connections;
-n or --numeric: show numeric IP addresses instead of resolving names through DNS;
-l or --listening: show only listening server sockets;
-p or --programs: show the PID and name of the program using each socket;

2. Check whether the port is already in use

sudo netstat -tunlp | grep 44444

3. Create the Flume Agent configuration file flume-netcat-logger.conf

#Create a job folder under the flume directory and enter it.
mkdir job
cd job/

#Create the Flume Agent configuration file flume-netcat-logger.conf under the job folder
touch flume-netcat-logger.conf

Add the following content to flume-netcat-logger.conf.

 

# Name the components on this agent
# a1 is the agent's name; r1 is its source, k1 its sink, c1 its channel (buffer)
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
# netcat-type source listening on localhost:44444
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
# logger-type sink: writes events to the console
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
# memory channel holding at most 1000 events;
# a transaction commits after collecting up to 100 events
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

For other parameters and their detailed explanations, see the official manual: http://flume.apache.org/FlumeUserGuide.html

4. Start Flume listening on the port

#First form:
bin/flume-ng agent --conf conf/ --name a1 --conf-file job/flume-netcat-logger.conf -Dflume.root.logger=INFO,console

#Second form:
bin/flume-ng agent -c conf/ -n a1 -f job/flume-netcat-logger.conf -Dflume.root.logger=INFO,console

The command blocks after starting; the agent runs in the foreground.
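
To keep the agent running after the shell closes, a standard shell approach (not part of the original steps) is to background it with nohup:

nohup bin/flume-ng agent --conf conf/ --name a1 --conf-file job/flume-netcat-logger.conf > flume-a1.log 2>&1 &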

Parameter description:

--conf conf/: the Flume configuration files are stored in the conf/ directory

--name a1: names the agent a1

--conf-file job/flume-netcat-logger.conf: the configuration file Flume reads for this run is flume-netcat-logger.conf in the job folder.

-Dflume.root.logger=INFO,console: -D dynamically overrides the flume.root.logger property at runtime and sets the console log level to INFO. Log levels include debug, info, warn, and error.

5. Use the netcat tool to send content to port 44444 on the local machine
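
For example, from another shell on the same machine (the message text is arbitrary):

nc localhost 44444
hello flume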

6. Check the received data on the Flume listening console

 

II. Reading a Local File into HDFS in Real Time

1. Give Flume the Hadoop jars it needs

Copy the following jars into the /opt/module/flume/lib folder (skip this step if they are already present):

commons-configuration-1.6.jar
hadoop-auth-2.7.2.jar
hadoop-common-2.7.2.jar
hadoop-hdfs-2.7.2.jar
commons-io-2.4.jar
htrace-core-3.1.0-incubating.jar
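
A minimal copy sketch, assuming a Hadoop 2.7.2 installation under /opt/module/hadoop-2.7.2 (adjust the paths to your layout):

#collect the needed client jars out of the Hadoop distribution and copy them into Flume's lib
find /opt/module/hadoop-2.7.2/share/hadoop -name "hadoop-common-2.7.2.jar" -o -name "hadoop-hdfs-2.7.2.jar" -o -name "hadoop-auth-2.7.2.jar" -o -name "commons-configuration-1.6.jar" -o -name "commons-io-2.4.jar" -o -name "htrace-core-3.1.0-incubating.jar" | xargs -I {} cp {} /opt/module/flume/lib/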

2. Create the flume-file-hdfs.conf file

#Create the file under the job directory
touch flume-file-hdfs.conf

To read a file on a Linux system, Flume must execute a Linux command according to normal Linux command rules. Since the Hive log lives on the local Linux filesystem, the source type to choose is exec (short for execute): it reads the file by running a Linux command.
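
The command used below is tail -F. Unlike lowercase tail -f, the capital -F re-opens the file by name, so the source keeps working when the log is rotated or recreated (an illustrative aside, not part of the original steps):

tail -F /opt/module/hive/logs/hive.log    #re-opens the file if it is rotated or recreated
tail -f /opt/module/hive/logs/hive.log    #follows only the originally opened file descriptor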

Add the following content to flume-file-hdfs.conf

# Name the components on this agent
a2.sources = r2
a2.sinks = k2
a2.channels = c2

# Describe/configure the source
# source type: exec (reads data by executing a Linux command)
a2.sources.r2.type = exec
# the Linux command to execute
a2.sources.r2.command = tail -F /opt/module/hive/logs/hive.log
# absolute path of the shell used to run the command
a2.sources.r2.shell = /bin/bash -c

# Describe the sink
# sink type: hdfs
a2.sinks.k2.type = hdfs
# target path on HDFS; escape sequences such as %Y%m%d/%H are described in the HDFS sink section of the Flume User Guide
a2.sinks.k2.hdfs.path = hdfs://hadoop100:9000/flume/%Y%m%d/%H
# prefix for uploaded files
a2.sinks.k2.hdfs.filePrefix = logs-
# whether to roll folders based on time
a2.sinks.k2.hdfs.round = true
# how many time units before creating a new folder
a2.sinks.k2.hdfs.roundValue = 1
# redefine the time unit
a2.sinks.k2.hdfs.roundUnit = hour
# whether to use the local timestamp
a2.sinks.k2.hdfs.useLocalTimeStamp = true
# number of events to accumulate before flushing to HDFS
a2.sinks.k2.hdfs.batchSize = 1000
# file type; compression is supported
a2.sinks.k2.hdfs.fileType = DataStream
# how often to roll to a new file (seconds)
a2.sinks.k2.hdfs.rollInterval = 60
# roll size for each file, roughly 128 MB
a2.sinks.k2.hdfs.rollSize = 134217700
# rolling is independent of the number of events
a2.sinks.k2.hdfs.rollCount = 0

# Use a channel which buffers events in memory
a2.channels.c2.type = memory
a2.channels.c2.capacity = 1000
a2.channels.c2.transactionCapacity = 100

# Bind the source and sink to the channel
a2.sources.r2.channels = c2
a2.sinks.k2.channel = c2

Note: for all time-related escape sequences, the Event header must contain a key named "timestamp" (unless hdfs.useLocalTimeStamp is set to true, in which case Flume uses TimestampInterceptor to add the timestamp automatically).

3. Start Flume monitoring

bin/flume-ng agent --conf conf/ --name a2 --conf-file job/flume-file-hdfs.conf

4. Start HDFS and Hive, and operate Hive so it produces log entries

#start HDFS
sbin/start-dfs.sh

#start Hive so it produces log output
bin/hive

5. Check the files on HDFS
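
For example, assuming the hadoop client is on the PATH (the date and hour segments of the path depend on when the agent was running):

hadoop fs -ls -R /flume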

III. Reading a Directory's Files into HDFS in Real Time

1. Create the configuration file flume-dir-hdfs.conf

#Create the file under the job directory
touch flume-dir-hdfs.conf

Add the following content

a3.sources = r3
a3.sinks = k3
a3.channels = c3

# Describe/configure the source
#source type: spooldir
a3.sources.r3.type = spooldir
#the directory to monitor
a3.sources.r3.spoolDir = /opt/module/flume/upload
#suffix appended to a file once it has been fully uploaded
a3.sources.r3.fileSuffix = .COMPLETED
#whether to add a header storing the absolute path of the file
a3.sources.r3.fileHeader = true
#ignore (do not upload) any file ending in .tmp
a3.sources.r3.ignorePattern = ([^ ]*\.tmp)

# Describe the sink
a3.sinks.k3.type = hdfs
a3.sinks.k3.hdfs.path = hdfs://hadoop100:9000/flume/upload/%Y%m%d/%H
#prefix for uploaded files
a3.sinks.k3.hdfs.filePrefix = upload-
#whether to roll folders based on time
a3.sinks.k3.hdfs.round = true
#how many time units before creating a new folder
a3.sinks.k3.hdfs.roundValue = 1
#redefine the time unit
a3.sinks.k3.hdfs.roundUnit = hour
#whether to use the local timestamp
a3.sinks.k3.hdfs.useLocalTimeStamp = true
#number of events to accumulate before flushing to HDFS
a3.sinks.k3.hdfs.batchSize = 100
#file type; compression is supported
a3.sinks.k3.hdfs.fileType = DataStream
#how often to roll to a new file (seconds)
a3.sinks.k3.hdfs.rollInterval = 60
#roll size for each file, roughly 128 MB
a3.sinks.k3.hdfs.rollSize = 134217700
#rolling is independent of the number of events
a3.sinks.k3.hdfs.rollCount = 0

# Use a channel which buffers events in memory
a3.channels.c3.type = memory
a3.channels.c3.capacity = 1000
a3.channels.c3.transactionCapacity = 100

# Bind the source and sink to the channel
a3.sources.r3.channels = c3
a3.sinks.k3.channel = c3

2. Start monitoring

bin/flume-ng agent --conf conf/ --name a3 --conf-file job/flume-dir-hdfs.conf

Note: when using the Spooling Directory Source, do not create files inside the monitored directory and keep modifying them; fully uploaded files are renamed with the .COMPLETED suffix; the monitored folder is scanned for file changes every 500 milliseconds.

3. Add files to the upload folder
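
For example (hypothetical test files):

touch /opt/module/flume/upload/test1.txt
echo 'hello flume' > /opt/module/flume/upload/test2.log
touch /opt/module/flume/upload/skip-me.tmp    #ignored: matches ignorePattern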

4. Check HDFS

5. Check the upload folder
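
The uploaded files should now carry the .COMPLETED suffix, while the .tmp file is left untouched:

ls /opt/module/flume/upload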

 

IV. Single Data Source, Multiple Outputs (Selector)

Flume-1 monitors file changes and passes the changed content to Flume-2, which is responsible for storing it in HDFS.

At the same time, Flume-1 passes the changed content to Flume-3, which is responsible for writing it to the local filesystem.


 

1. Preparation

#Create a group1 folder under the /opt/module/flume/job directory
mkdir group1

#Create a flume3 folder under the /opt/module/datas/ directory
mkdir flume3

2. Create flume-file-flume.conf

Configure one source that reads the log file, plus two channels and two sinks that feed flume-flume-hdfs and flume-flume-dir respectively.

Enter the group1 folder, create flume-file-flume.conf, and add the following content

# Name the components on this agent
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2
# Replicate the data flow to all channels
a1.sources.r1.selector.type = replicating

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/module/hive-1.2.1/logs/hive.log
a1.sources.r1.shell = /bin/bash -c

# Describe the sink
# an avro sink acts as a data sender
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop100 
a1.sinks.k1.port = 4141

a1.sinks.k2.type = avro
a1.sinks.k2.hostname = hadoop100
a1.sinks.k2.port = 4142

# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1 c2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2

3. Create flume-flume-hdfs.conf

Configure a source that receives the upstream Flume's output and a sink that writes to HDFS. Create flume-flume-hdfs.conf in the group1 directory and add the following content

# Name the components on this agent
a2.sources = r1
a2.sinks = k1
a2.channels = c1
# Describe/configure the source
# an avro source acts as a data-receiving service
a2.sources.r1.type = avro
a2.sources.r1.bind = hadoop100
a2.sources.r1.port = 4141

# Describe the sink
a2.sinks.k1.type = hdfs
a2.sinks.k1.hdfs.path = hdfs://hadoop100:9000/flume2/%Y%m%d/%H
#prefix for uploaded files
a2.sinks.k1.hdfs.filePrefix = flume2-
#whether to roll folders based on time
a2.sinks.k1.hdfs.round = true
#how many time units before creating a new folder
a2.sinks.k1.hdfs.roundValue = 1
#redefine the time unit
a2.sinks.k1.hdfs.roundUnit = hour
#whether to use the local timestamp
a2.sinks.k1.hdfs.useLocalTimeStamp = true
#number of events to accumulate before flushing to HDFS
a2.sinks.k1.hdfs.batchSize = 100
#file type; compression is supported
a2.sinks.k1.hdfs.fileType = DataStream
#how often to roll to a new file (seconds)
a2.sinks.k1.hdfs.rollInterval = 600
#roll size for each file, roughly 128 MB
a2.sinks.k1.hdfs.rollSize = 134217700
#rolling is independent of the number of events
a2.sinks.k1.hdfs.rollCount = 0

# Describe the channel
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1

4. Create flume-flume-dir.conf

Configure a source that receives the upstream Flume's output and a sink that writes to a local directory. In the group1 directory, create flume-flume-dir.conf and add the following content

# Name the components on this agent
a3.sources = r1
a3.sinks = k1
a3.channels = c2

# Describe/configure the source
a3.sources.r1.type = avro
a3.sources.r1.bind = hadoop100
a3.sources.r1.port = 4142

# Describe the sink
a3.sinks.k1.type = file_roll
a3.sinks.k1.sink.directory = /opt/module/datas/flume3

# Describe the channel
a3.channels.c2.type = memory
a3.channels.c2.capacity = 1000
a3.channels.c2.transactionCapacity = 100

# Bind the source and sink to the channel
a3.sources.r1.channels = c2
a3.sinks.k1.channel = c2

Note: the local output directory must already exist; if it does not, the sink will not create it.

5. Run the configuration files

Start the corresponding configuration files in this order: flume-flume-dir, flume-flume-hdfs, flume-file-flume.

bin/flume-ng agent --conf conf/ --name a3 --conf-file jobs/group1/flume-flume-dir.conf

bin/flume-ng agent --conf conf/ --name a2 --conf-file jobs/group1/flume-flume-hdfs.conf

bin/flume-ng agent --conf conf/ --name a1 --conf-file jobs/group1/flume-file-flume.conf

6. Start Hadoop and Hive

#start HDFS
start-dfs.sh

#enter the hive directory and start Hive
bin/hive

7. Check the data on HDFS and in the /opt/module/datas/flume3 directory

Why are there 6 files?

By default, file_roll rolls to a new file every 30 seconds. As long as monitoring has not stopped, run ll again after 30 seconds and you will see one more file.
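If that is too frequent, the interval can be changed on the file_roll sink; per the Flume User Guide, sink.rollInterval defaults to 30 seconds and 0 disables rolling entirely:

#roll a new file every 10 minutes instead of every 30 seconds
a3.sinks.k1.sink.rollInterval = 600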

 

V. Single Data Source, Multiple Outputs (Sink Group)

Flume-1 receives data on a netcat source and distributes it through a sink group; Flume-2 and Flume-3 each receive a share of the events and print them to their consoles.

1. 准备工作

#Create a group2 folder under the /opt/module/flume/jobs directory
mkdir group2

2. Create flume-netcat-flume.conf

Configure one netcat source, one channel, and two sinks grouped into a sink group, feeding flume-flume-console1 and flume-flume-console2 respectively.

Enter the group2 folder, create flume-netcat-flume.conf, and add the following content

# Name the components on this agent
a1.sources = r1
a1.channels = c1
a1.sinkgroups = g1
a1.sinks = k1 k2

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

#The component type name, needs to be default, failover or load_balance
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.backoff = true
#Must be either round_robin, random or FQCN of custom class that inherits from AbstractSinkSelector
a1.sinkgroups.g1.processor.selector = round_robin
a1.sinkgroups.g1.processor.selector.maxTimeOut = 10000

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop100
a1.sinks.k1.port = 4141

a1.sinks.k2.type = avro
a1.sinks.k2.hostname = hadoop100
a1.sinks.k2.port = 4142

# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c1

3. Create flume-flume-console1.conf

Configure a source that receives the upstream Flume's output; the sink writes to the local console.

In the group2 directory, create flume-flume-console1.conf and add the following content

# Name the components on this agent
a2.sources = r1
a2.sinks = k1
a2.channels = c1

# Describe/configure the source
a2.sources.r1.type = avro
a2.sources.r1.bind = hadoop100
a2.sources.r1.port = 4141

# Describe the sink
a2.sinks.k1.type = logger

# Describe the channel
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1

4. Create flume-flume-console2.conf

Configure a source that receives the upstream Flume's output; the sink writes to the local console.

In the group2 directory, create flume-flume-console2.conf and add the following content

# Name the components on this agent
a3.sources = r1
a3.sinks = k1
a3.channels = c2

# Describe/configure the source
a3.sources.r1.type = avro
a3.sources.r1.bind = hadoop100
a3.sources.r1.port = 4142

# Describe the sink
a3.sinks.k1.type = logger

# Describe the channel
a3.channels.c2.type = memory
a3.channels.c2.capacity = 1000
a3.channels.c2.transactionCapacity = 100

# Bind the source and sink to the channel
a3.sources.r1.channels = c2
a3.sinks.k1.channel = c2

5. Run the configuration files

Start the corresponding configuration files in this order: flume-flume-console2, flume-flume-console1, flume-netcat-flume.

bin/flume-ng agent --conf conf/ --name a3 --conf-file jobs/group2/flume-flume-console2.conf -Dflume.root.logger=INFO,console

bin/flume-ng agent --conf conf/ --name a2 --conf-file jobs/group2/flume-flume-console1.conf -Dflume.root.logger=INFO,console

bin/flume-ng agent --conf conf/ --name a1 --conf-file jobs/group2/flume-netcat-flume.conf

6. Use the netcat tool to send content to port 44444 on the local machine

nc localhost 44444

7. Watch the log output printed on the Flume-2 and Flume-3 consoles

 

VI. Aggregating Multiple Data Sources (Common Pattern)

Flume-1 on hadoop101 monitors the file /opt/module/group.log,

Flume-2 on hadoop100 monitors the data stream on a port,

Flume-1 and Flume-2 send their data to Flume-3 on hadoop102, which prints the final data to the console

 

1. Preparation

If Flume is not installed on hadoop101 and hadoop102, distribute it with the distribution script

xsync flume-1.7.0/
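
xsync here is assumed to be a local cluster-distribution helper script; plain rsync to each host works just as well:

rsync -av /opt/module/flume-1.7.0/ hadoop101:/opt/module/flume-1.7.0/
rsync -av /opt/module/flume-1.7.0/ hadoop102:/opt/module/flume-1.7.0/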

Create a group3 folder under the /opt/module/flume/jobs directory on hadoop100, hadoop101, and hadoop102.

2. Create flume1-logger-flume.conf

Configure a source to monitor the group.log file and a sink that sends the data to the next Flume tier.

On hadoop101, create the configuration file flume1-logger-flume.conf and add the following content

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/module/group.log
a1.sources.r1.shell = /bin/bash -c

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop102
a1.sinks.k1.port = 4141

# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

3. Create flume2-netcat-flume.conf

Configure a source that monitors the data stream on port 44444 and a sink that sends the data to the next Flume tier:

On hadoop100, create the configuration file flume2-netcat-flume.conf and add the following content

# Name the components on this agent
a2.sources = r1
a2.sinks = k1
a2.channels = c1

# Describe/configure the source
a2.sources.r1.type = netcat
a2.sources.r1.bind = hadoop100
a2.sources.r1.port = 44444

# Describe the sink
a2.sinks.k1.type = avro
a2.sinks.k1.hostname = hadoop102
a2.sinks.k1.port = 4141

# Use a channel which buffers events in memory
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1

4. Create flume3-flume-logger.conf

Configure a source that receives the data streams sent by flume1 and flume2; the merged result is sunk to the console.

On hadoop102, create the configuration file flume3-flume-logger.conf and add the following content

# Name the components on this agent
a3.sources = r1
a3.sinks = k1
a3.channels = c1

# Describe/configure the source
a3.sources.r1.type = avro
a3.sources.r1.bind = hadoop102
a3.sources.r1.port = 4141

# Describe the sink
a3.sinks.k1.type = logger

# Describe the channel
a3.channels.c1.type = memory
a3.channels.c1.capacity = 1000
a3.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a3.sources.r1.channels = c1
a3.sinks.k1.channel = c1

5. Run the configuration files

Start the corresponding configuration files in this order: flume3-flume-logger.conf, flume2-netcat-flume.conf, flume1-logger-flume.conf.

#hadoop102
bin/flume-ng agent --conf conf/ --name a3 --conf-file jobs/group3/flume3-flume-logger.conf -Dflume.root.logger=INFO,console

#hadoop100
bin/flume-ng agent --conf conf/ --name a2 --conf-file jobs/group3/flume2-netcat-flume.conf

#hadoop101
bin/flume-ng agent --conf conf/ --name a1 --conf-file jobs/group3/flume1-logger-flume.conf

6. On hadoop101, append content to group.log in the /opt/module directory
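
For example:

echo 'hello flume' >> /opt/module/group.log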

 

7. On hadoop100, send data to port 44444
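
For example (note that flume2 binds its netcat source to hadoop100, not localhost):

nc hadoop100 44444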

 

8. Observe the data on hadoop102

 
