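1. Nagios monitoring principle
1. Nagios runs the check_nrpe plugin installed on the monitoring server and tells check_nrpe which service to check.
2. check_nrpe connects over SSL to the NRPE daemon on the remote machine.
3. The NRPE daemon runs the local plugins (check_disk, etc.) to check local services and status.
4. Finally, NRPE returns the check result to check_nrpe on the monitoring server, and check_nrpe passes it on to the Nagios status queue.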
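The result handed back in step 4 is one line of plugin output plus the plugin's exit code, which Nagios maps to a state. A minimal shell sketch of that mapping, following the standard plugin convention (0 OK, 1 WARNING, 2 CRITICAL; anything else is treated here as UNKNOWN). The function name nagios_state is purely illustrative and is not part of Nagios:

```shell
#!/bin/sh
# Illustrative helper (not part of Nagios): translate a plugin exit code
# into the state name that Nagios would display.
nagios_state() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# Example: a plugin that exited with status 2
nagios_state 2
```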
Software packages required for the installation:
nagios-3.4.3.gz
nagios-plugins-1.4.13.tar.gz
nrpe-2.8.1.tar.gz
gd-devel-2.1.0-7.5.1.x86_64.rpm
Configure a local yum repository
cd /etc/yum.repos.d/
mkdir repo.bak
cp CentOS-* repo.bak/
rm -rf CentOS-*
vi aa.repo
[aa]
name=aa
baseurl=file:///mnt
gpgcheck=1
:wq (save and exit)
Mount the installation DVD
mount /dev/dvd /mnt
rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-*
rpm -ivh gd-devel-2.1.0-7.5.1.x86_64.rpm --force --nodeps
yum install make gcc glibc glibc-common php gd gd-devel libpng libmng libjpeg zlib expect openssl openssl-devel -y
service httpd start
Check that PHP was installed successfully
cd /var/www/html
Create a PHP page
vi index.php
<?php phpinfo(); ?>
:wq (save and exit)
Then open http://192.168.1.242 in a browser.
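The repository file created in vi above can also be written non-interactively. The following is a minimal sketch, assuming the same repository id (aa) and DVD mount point (/mnt); the function name write_local_repo is hypothetical, and the demonstration writes to a scratch file rather than to /etc/yum.repos.d/aa.repo:

```shell
#!/bin/sh
# Hypothetical helper: write the local repository definition to the file
# given as $1 (on a real system this would be /etc/yum.repos.d/aa.repo).
write_local_repo() {
    cat > "$1" <<'EOF'
[aa]
name=aa
baseurl=file:///mnt
gpgcheck=1
EOF
}

# Demonstration against a scratch file, so nothing under /etc is touched.
repo_file=$(mktemp)
write_local_repo "$repo_file"
cat "$repo_file"
rm -f "$repo_file"
```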
useradd nagios
groupadd nagcmd
usermod -a -G nagcmd apache
usermod -a -G nagcmd nagios
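tar zxvf nagios-3.4.3.gz
cd nagios
# ./configure --prefix=/usr/local/nagios --with-command-group=nagcmd
Compile the Nagios source code
# make all
Install the binaries and set the permissions on the runtime directories; this also creates /usr/local/nagios/share (the site directory of the Nagios web interface)
# make install
Install the startup script /etc/rc.d/init.d/nagios
# make install-init
Install the sample Nagios configuration files under /usr/local/nagios/etc
# make install-config
Set the permissions on the Nagios working directories
# make install-commandmode
Install the Nagios web configuration file into Apache's conf.d directory
# make install-webconf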
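Create a user for logging in to the Nagios web interface. Remember the password you set; you will need it shortly.
User name: nagiosadmin
Password: herendh
htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
Enter herendh when prompted for the password.
Reload the Apache service so that the settings take effect.
service httpd reload
Open the address in a browser. At this point Nagios shows nothing yet; the plugins below must be installed first.
Set Nagios and httpd to start at boot
chkconfig --add nagios
chkconfig nagios on
chkconfig --add httpd
chkconfig httpd on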
tar zxvf nagios-plugins-1.4.13.tar.gz
cd nagios-plugins-1.4.13
./configure --with-nagios-user=nagios --with-nagios-group=nagcmd
make all
make install
service nagios reload
service httpd reload
tar zxvf nrpe-2.8.1.tar.gz
cd nrpe-2.8.1
./configure
make all
make install-plugin
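Meaning of the Nagios object files (/usr/local/nagios/etc/objects):
commands.cfg     defines the commands Nagios can call
contacts.cfg     defines the contacts
localhost.cfg    defines the objects for monitoring the local machine
printer.cfg      defines printer monitoring
switch.cfg       defines switch monitoring
templates.cfg    defines the templates
timeperiods.cfg  defines the time periods
windows.cfg      defines the monitored Windows hosts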
cd /usr/local/nagios/etc/objects/
Define the host and services
vi heren.cfg
define host{
    use                   heren-server
    host_name             192.168.1.241
    alias                 192.168.1.241
    address               192.168.1.241
}
define service{
    use                   heren-service
    host_name             192.168.1.241
    service_description   master8051
    check_command         check_url_port!'192.168.1.241'!'http://192.168.1.241:8051/heren/security/login-new.html'!'class'!8051
}
define service{
    use                   heren-service
    host_name             192.168.1.241
    service_description   message9999
    check_command         check_url_port!'192.168.1.241'!'http://192.168.1.241:9999/heren-message/personal/index.html'!'class'!9999
}
define service{
    use                   heren-service
    host_name             192.168.1.241
    service_description   schedule8061
    check_command         check_url_port!'192.168.1.241'!'http://192.168.1.241:8061/schedule/schedule/schedule.html'!'class'!8061
}
define service{
    use                   heren-service
    host_name             192.168.1.241
    service_description   report8081
    check_command         check_url_port!'192.168.1.241'!'http://192.168.1.241:8081/heren-report/api/jasper-prints/fill'!'class'!8081
}
define service{
    use                   heren-service
    host_name             192.168.1.241
    service_description   Disk
    check_command         check_nrpe_disk!check_sda
}
define service{
    use                   heren-service
    host_name             192.168.1.241
    service_description   Load
    check_command         check_nrpe_load!check_load
}
define service{
    use                   heren-service
    host_name             192.168.1.241
    service_description   mem
    check_command         check_nrpe_mem!check_mem
}
Define the templates
vi templates.cfg
define host{
    name                          heren-server
    use                           generic-host
    check_period                  24x7
    check_interval                5
    retry_interval                1
    max_check_attempts            4
    check_command                 check-host-alive
    notification_period           24x7
    process_perf_data             1
    notification_interval         2
    notification_options          d,u,r
    contact_groups                mail_admins
    register                      0
}
define service{
    name                          heren-services
    active_checks_enabled         1
    passive_checks_enabled        1
    parallelize_check             1
    obsess_over_service           1
    check_freshness               0
    notifications_enabled         1
    event_handler_enabled         1
    flap_detection_enabled        1
    failure_prediction_enabled    1
    process_perf_data             1
    retain_status_information     1
    retain_nonstatus_information  1
    is_volatile                   0
    check_period                  24x7
    max_check_attempts            4
    normal_check_interval         5
    retry_check_interval          1
    contact_groups                mail_admins
    notification_options          w,u,c,r
    notification_interval         5
    notification_period           24x7
    register                      0
}
define service{
    name                          heren-service
    use                           heren-services
    max_check_attempts            4
    normal_check_interval         5
    retry_check_interval          1
    register                      0
}
Define the commands
vi commands.cfg
##############################################################################
define command{
    command_name    check_nrpe
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
define command{
    command_name    check_nrpe_mem
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -t 20 -c $ARG1$
}
define command{
    command_name    check_url_port
    command_line    $USER1$/check_http -I $ARG1$ -u $ARG2$ -s $ARG3$ -p $ARG4$ -t 60
}
define command{
    command_name    check_nrpe_disk
    command_line    /usr/local/nagios/libexec/check_nrpe -H $HOSTADDRESS$ -t 20 -c $ARG1$
}
define command{
    command_name    check_nrpe_disk_port
    command_line    /usr/local/nagios/libexec/check_nrpe -H $HOSTADDRESS$ -t 20 -c $ARG1$ -p $ARG2$
}
define command{
    command_name    check_nrpe_load
    command_line    /usr/local/nagios/libexec/check_nrpe -H $HOSTADDRESS$ -t 20 -c $ARG1$
}
define command{
    command_name    check_nrpe_ping
    command_line    /usr/local/nagios/libexec/check_nrpe -H $HOSTADDRESS$ -t 200 -c $ARG1$
}
###############################################################################
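To make the link between a service's check_command and the command definition concrete: Nagios splits the check_command value on "!" and substitutes the pieces for $ARG1$, $ARG2$ and so on. A minimal shell sketch of that substitution for check_url_port follows. The function name expand_check_url_port is hypothetical, the plugin path assumes $USER1$ is /usr/local/nagios/libexec, and the single quotes used in the service definitions are left out for brevity:

```shell
#!/bin/sh
# Hypothetical illustration of Nagios argument macros: split a
# check_command value on "!" and substitute the pieces into the
# check_url_port command line.
expand_check_url_port() {
    set -f                 # no filename globbing while splitting
    old_ifs=$IFS
    IFS='!'
    set -- $1              # $1 = command name, $2..$5 = $ARG1$..$ARG4$
    IFS=$old_ifs
    set +f
    printf '%s\n' "/usr/local/nagios/libexec/check_http -I $2 -u $3 -s $4 -p $5 -t 60"
}

expand_check_url_port 'check_url_port!192.168.1.241!http://192.168.1.241:8051/heren/security/login-new.html!class!8051'
```

Running the expanded command by hand on the monitoring server is a convenient way to debug a service that stays in an unexpected state.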
cd /usr/local/nagios/etc/objects/
Define the hosts and services
vi yingyong.cfg
define host{
    use                   heren-server
    host_name             192.168.1.244
    alias                 192.168.1.244
    address               192.168.1.244
}
define host{
    use                   heren-server
    host_name             192.168.1.245
    alias                 192.168.1.245
    address               192.168.1.245
}
define host{
    use                   heren-server
    host_name             192.168.1.248
    alias                 192.168.1.248
    address               192.168.1.248
}
define host{
    use                   heren-server
    host_name             192.168.1.249
    alias                 192.168.1.249
    address               192.168.1.249
}
define service{
    use                   heren-service
    host_name             192.168.1.248
    service_description   vip1_url
    check_command         check_url_port!'192.168.1.248'!'http://192.168.1.248:80'!'nginx'!80
}
define service{
    use                   heren-service
    host_name             192.168.1.249
    service_description   vip2_url
    check_command         check_url_port!'192.168.1.249'!'http://192.168.1.249:80'!'nginx'!80
}
define service{
    use                   heren-service
    host_name             192.168.1.244,192.168.1.245
    service_description   Disk
    check_command         check_nrpe_disk!check_sda
}
define service{
    use                   heren-service
    host_name             192.168.1.244,192.168.1.245
    service_description   Load
    check_command         check_nrpe_load!check_load
}
define service{
    use                   heren-service
    host_name             192.168.1.244,192.168.1.245
    service_description   mem
    check_command         check_nrpe_mem!check_mem
}
define service{
    use                   heren-service
    host_name             192.168.1.244,192.168.1.245
    service_description   keepalived
    check_command         check_nrpe_keepalived!check_keepalived
}
Define the command
define command{
    command_name    check_nrpe_keepalived
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -t 20 -c $ARG1$
}
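Because the host definitions differ only in the IP address, they can be generated rather than typed by hand. A minimal sketch, assuming (as in this file) that the IP doubles as host_name, alias and address; the function name emit_host is hypothetical:

```shell
#!/bin/sh
# Hypothetical generator: print one host definition per IP address,
# using the IP as host_name, alias and address.
emit_host() {
    printf 'define host{\n'
    printf '    use        heren-server\n'
    printf '    host_name  %s\n' "$1"
    printf '    alias      %s\n' "$1"
    printf '    address    %s\n' "$1"
    printf '}\n'
}

for ip in 192.168.1.244 192.168.1.245 192.168.1.248 192.168.1.249; do
    emit_host "$ip"
done
```

The output can be redirected into a .cfg file and checked with nagios -v before it is used.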
cd /usr/local/nagios/etc/objects/
Define the host and services
vi oracle.cfg
define host{
    use                   heren-server
    host_name             192.168.1.238
    alias                 oracle 11g
    address               192.168.1.238
}
define service{
    use                   heren-service
    host_name             192.168.1.238
    service_description   Disk
    check_command         check_nrpe_disk!check_sda
}
define service{
    use                   heren-service
    host_name             192.168.1.238
    service_description   Load
    check_command         check_nrpe_load!check_load
}
define service{
    use                   heren-service
    host_name             192.168.1.238
    service_description   Mem
    check_command         check_nrpe_mem!check_mem
}
define service{
    use                   heren-service
    host_name             192.168.1.238
    service_description   Oracle TNS
    check_command         check_nrpe_oracle!check_oracle_tns
}
define service{
    use                   heren-service
    host_name             192.168.1.238
    service_description   Oracle DB
    check_command         check_nrpe_oracle!check_oracle_db
}
define service{
    use                   heren-service
    host_name             192.168.1.238
    service_description   Oracle Login
    check_command         check_nrpe_oracle!check_oracle_login
}
define service{
    use                   heren-service
    host_name             192.168.1.238
    service_description   Oracle Cache
    check_command         check_nrpe_oracle!check_oracle_cache
}
define service{
    use                   heren-service
    host_name             192.168.1.238
    service_description   Oracle Tablespace
    check_command         check_nrpe_oracle!check_oracle_tablespace
}
Define the command
define command{
    command_name    check_nrpe_oracle
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -t 20 -c $ARG1$
}
vi windows.cfg
define host{
    use                   windows-server
    host_name             192.168.1.235
    alias                 My Windows Server
    address               192.168.1.235
}
define hostgroup{
    hostgroup_name        windows-servers
    alias                 Windows Servers
}
define service{
    use                   generic-service
    host_name             192.168.1.235
    service_description   NSClient++ Version
    check_command         check_nt!CLIENTVERSION
}
define service{
    use                   generic-service
    host_name             192.168.1.235
    service_description   Uptime
    check_command         check_nt!UPTIME
}
define service{
    use                   generic-service
    host_name             192.168.1.235
    service_description   CPU Load
    check_command         check_nt!CPULOAD!-l 5,80,90
}
define service{
    use                   generic-service
    host_name             192.168.1.235
    service_description   Memory Usage
    check_command         check_nt!MEMUSE!-w 80 -c 90
}
define service{
    use                   generic-service
    host_name             192.168.1.235
    service_description   C:\ Drive Space
    check_command         check_nt!USEDDISKSPACE!-l c -w 80 -c 90
}
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
If no errors are reported, start the Nagios service
service nagios start
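The "verify first, then start" rule can be wrapped in a small function so that a broken configuration never reaches the running service. This is a sketch only: the function name verify_then and the NAGIOS_BIN / NAGIOS_CFG variables are hypothetical, and it relies on nagios -v exiting with a non-zero status when the check fails.

```shell
#!/bin/sh
# Paths as used in this guide; both can be overridden from the environment.
NAGIOS_BIN=${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}

# Hypothetical wrapper: run the given command only when the
# configuration check (nagios -v) exits successfully.
verify_then() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null 2>&1; then
        "$@"
    else
        echo "Nagios configuration check failed; not running: $*" >&2
        return 1
    fi
}

# Usage on the monitoring server:
#   verify_then service nagios start
```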
Open http://192.168.1.242/nagios in a web browser.
tar -zxvf sendEmail-v1.56.tar.gz
cp sendEmail-v1.56/sendEmail /usr/local/bin/
/usr/local/bin/sendEmail -f liudian456@163.com -t 123@qq.com -s smtp.163.com -u "nimei" -m "test1" -xu liudian456 -xp 789
If the command reports that the message was sent successfully, sendEmail can send mail normally.
3. Define the commands that send notifications through sendEmail
vi commands.cfg
define command{
    command_name    notify-host-by-email
    command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /usr/local/bin/sendEmail -f liudian456@163.com -t $CONTACTEMAIL$ -s smtp.163.com -u "** $NOTIFICATIONTYPE$ alert - $HOSTALIAS$ is $HOSTSTATE$ **" -xu liudian456 -xp 789
}
define command{
    command_name    notify-service-by-email
    command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n" | /usr/local/bin/sendEmail -f liudian456@163.com -t $CONTACTEMAIL$ -s smtp.163.com -u "** $NOTIFICATIONTYPE$ alert - $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" -xu liudian456 -xp 789
}
4. Define the mail contacts
vim /usr/local/nagios/etc/objects/contacts.cfg
define contactgroup{
    contactgroup_name    mail_admins
    alias                Nagios Administrators
    members              mail_lilin,nagiosadmin
}
define contact{
    contact_name         mail_lilin
    use                  mail-contact
    alias                mail_lilin
    email                123@qq.com
}
5. Define the mail contact template
define contact{
    name                             mail-contact
    service_notification_period      24x7
    host_notification_period         24x7
    service_notification_options     w,u,c,r,f,s
    host_notification_options        d,u,r,f,s
    service_notification_commands    notify-service-by-email
    host_notification_commands       notify-host-by-email
    register                         0
}
6. Enable the Nagios email notifications
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
If no errors are reported, reload the Nagios service
service nagios reload
From now on, alerts are sent automatically to 123@qq.com, the mailbox defined for this contact. Several mailboxes can be listed, separated by ",".
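The notification commands rely on printf "%b" to turn the \n sequences in the command line into real line breaks before the text is piped to sendEmail. A minimal sketch of that step alone, with plain arguments standing in for the Nagios macros; the function name format_host_alert is hypothetical and the body is shortened:

```shell
#!/bin/sh
# Hypothetical helper mirroring the notify-host-by-email body: the
# arguments stand in for the Nagios macros $NOTIFICATIONTYPE$,
# $HOSTNAME$, $HOSTSTATE$ and $HOSTADDRESS$.
format_host_alert() {
    printf '%b' "***** Nagios *****\n\nNotification Type: $1\nHost: $2\nState: $3\nAddress: $4\n"
}

# Print the body; in the real command this is piped to sendEmail.
format_host_alert PROBLEM 192.168.1.241 DOWN 192.168.1.241
```

Printing the body this way, without the sendEmail part, makes it easy to check the layout of a notification before wiring it into Nagios.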
yum install make gcc* xinetd openssl openssl-devel -y
/usr/sbin/useradd nagios
/usr/sbin/groupadd nagcmd
/usr/sbin/usermod -a -G nagcmd nagios
tar zxvf nagios-plugins-1.4.13.tar.gz
cd nagios-plugins-1.4.13
./configure --prefix=/usr/local/nagios
make
make install
tar zxvf nrpe-2.8.1.tar.gz
cd nrpe-2.8.1
./configure --prefix=/usr/local/nagios
make all
make install
make install-plugin
make install-daemon
make install-daemon-config
make install-xinetd
# pwd
/usr/local/nagios/etc
# vi nagios.cfg
cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
cfg_file=/usr/local/nagios/etc/objects/heren.cfg
cfg_file=/usr/local/nagios/etc/objects/oracle.cfg
cfg_file=/usr/local/nagios/etc/objects/yingyong.cfg
cfg_file=/usr/local/nagios/etc/objects/windows.cfg
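A cfg_file entry that points to a file that does not exist makes the configuration check fail, so it can help to list such entries before reloading. A minimal sketch, assuming each entry has the form cfg_file=PATH at the start of a line; the function name missing_cfg_files is hypothetical:

```shell
#!/bin/sh
# Hypothetical check: list every cfg_file= entry in the main
# configuration file ($1) whose target file does not exist.
missing_cfg_files() {
    sed -n 's/^cfg_file=//p' "$1" | while IFS= read -r path; do
        [ -f "$path" ] || printf '%s\n' "$path"
    done
}

# Usage on the monitoring server:
#   missing_cfg_files /usr/local/nagios/etc/nagios.cfg
```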
vi /etc/xinetd.d/nrpe
only_from = 127.0.0.1 192.168.1.242
cd /usr/local/nagios/etc/
vi nrpe.cfg
allowed_hosts=127.0.0.1,192.168.1.242
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
#command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 -p /dev/hda1
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
command[check_sda]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 -p /dev/sde
command[check_mem]=/usr/local/nagios/libexec/check_mem -w 20 -c 10
command[check_keepalived]=/usr/local/nagios/libexec/check_procs -w 2: -c :4 -C keepalived
Upload the check_mem script to /usr/local/nagios/libexec
cd /usr/local/nagios/libexec
Set its permissions and owner
chmod +x check_mem
chown nagios:nagios check_mem
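The check_mem script itself is not shown in this guide. To illustrate what such a plugin has to do (print one status line and return the matching exit code), here is a hypothetical stand-in. It is not the script used above: it assumes the two thresholds mean "percent of memory free", and it reads a /proc/meminfo-style file so that it can be tried against a test file.

```shell
#!/bin/sh
# Hypothetical stand-in for a check_mem plugin (NOT the script used in
# this guide). Assumes the thresholds mean "percent of memory free"
# and reads a /proc/meminfo-style file.
# usage: check_mem_sketch WARN_PCT CRIT_PCT [MEMINFO_FILE]
check_mem_sketch() {
    warn=$1
    crit=$2
    file=${3:-/proc/meminfo}
    total=$(awk '/^MemTotal:/ {print $2}' "$file" 2>/dev/null)
    free=$(awk '/^MemFree:/ {print $2}' "$file" 2>/dev/null)
    if [ -z "$total" ] || [ -z "$free" ] || [ "$total" -le 0 ]; then
        echo "MEM UNKNOWN - cannot read $file"
        return 3
    fi
    pct=$((free * 100 / total))
    if [ "$pct" -le "$crit" ]; then
        echo "MEM CRITICAL - ${pct}% free | free_pct=${pct}%"
        return 2
    elif [ "$pct" -le "$warn" ]; then
        echo "MEM WARNING - ${pct}% free | free_pct=${pct}%"
        return 1
    fi
    echo "MEM OK - ${pct}% free | free_pct=${pct}%"
    return 0
}

# Usage: check_mem_sketch 20 10
```

The text after the "|" is performance data, which is what a graphing add-on such as pnp4nagios later plots.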
cd /usr/local/nagios/etc/
vi nrpe.cfg
allowed_hosts=127.0.0.1,192.168.1.242
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
#command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 -p /dev/hda1
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
command[check_sda]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 -p /dev/sde
command[check_mem]=/usr/local/nagios/libexec/check_mem -w 20 -c 10
command[check_oracle_tns]=/usr/local/nagios/libexec/check_oracle --tns orcl
command[check_oracle_db]=/usr/local/nagios/libexec/check_oracle --db orcl
command[check_oracle_login]=/usr/local/nagios/libexec/check_oracle --login orcl
command[check_oracle_cache]=/usr/local/nagios/libexec/check_oracle --cache orcl herendh herendh 80 90
command[check_oracle_tablespace]=/usr/local/nagios/libexec/check_oracle --tablespace orcl herendh herendh herendh 90 80
Edit the script
cd /usr/local/nagios/libexec
vi check_oracle
Add the following at the top:
export ORACLE_HOME=/u01/app/oracle/product/11.2.0/db_1
export PATH=$PATH:/u01/app/oracle/product/11.2.0/db_1/bin
Install NSCP-0.4.1.73-x64 on the Windows host. Enter the address of the Nagios server, 192.168.1.242; no password needs to be filled in.
vi /etc/services
nrpe            5666/tcp                # NRPE
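service xinetd restart
netstat -anptl | grep 5666
tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN 21403/xinetd
Command to start nrpe as a standalone daemon (use this only when NRPE is not run under xinetd, because the two cannot both listen on port 5666)
/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
Start it automatically at boot
echo "/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d" >> /etc/rc.d/rc.local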
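The netstat check above can be turned into a yes/no test for scripts. A minimal sketch that reads netstat-style lines on standard input, assuming the column layout shown above (local address in column 4, state in column 6); the function name has_listener is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read netstat-style lines on stdin and succeed
# if any socket in LISTEN state is bound to the port given as $1.
has_listener() {
    awk -v port="$1" '
        $6 == "LISTEN" {
            n = split($4, parts, ":")      # local address is host:port
            if (parts[n] == port) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Sample line taken from the netstat output above.
echo "tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN 21403/xinetd" | has_listener 5666 && echo "NRPE port is listening"
```

On a client the function would be fed live data, for example: netstat -antl | has_listener 5666.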
/usr/local/nagios/libexec/check_nrpe -H 127.0.0.1
NRPE v2.12
/usr/local/nagios/libexec/check_nrpe -H 192.168.1.241    (the client IP)
NRPE v2.12
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
service nagios reload
Monitoring of the client is now deployed. Further monitored hosts can be added in the same way.
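pnp4nagios is a tool based on PHP and Perl. Through the process_perfdata.pl script it calls rrdtool to analyze the Nagios performance data and draw the corresponding performance graphs. PHP, Perl and rrdtool must therefore be installed before pnp4nagios.
rrdtool stands for Round Robin Database Tool. Functionally, rrdtool covers both data storage and data display. The well-known network traffic graphing software cacti and the cluster monitoring system Ganglia both use rrdtool. For storage, rrdtool uses a "round robin" model, that is, a circular database. Note: rrdtool database files have the suffix ".rrd".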
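The "round robin" idea can be illustrated in a few lines of shell: the store has a fixed number of slots, and each new sample pushes out the oldest one, so the file never grows. This is a conceptual sketch only; real .rrd files are binary and also consolidate samples, and the function name rr_append is hypothetical.

```shell
#!/bin/sh
# Hypothetical illustration of round-robin storage: FILE never grows
# beyond SLOTS lines because each new value pushes out the oldest one.
# usage: rr_append FILE SLOTS VALUE
rr_append() {
    printf '%s\n' "$3" >> "$1"
    tail -n "$2" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Store five samples in a three-slot file; only the last three remain.
rr_file=$(mktemp)
for sample in 10 20 30 40 50; do
    rr_append "$rr_file" 3 "$sample"
done
cat "$rr_file"
rm -f "$rr_file"
```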
Install the base libraries
yum -y install gcc cairo-devel libxml2-devel pango-devel pango libpng-devel freetype freetype-devel libart_lgpl-devel php-gd
tar -xf rrdtool-1.5.4.tar.gz
cd rrdtool-1.5.4
./configure
make
make install
yum -y install perl-Time-HiRes
tar -xf pnp4nagios-0.6.25.tar.gz
cd pnp4nagios-0.6.25
./configure --with-rrdtool=/opt/rrdtool-1.5.4/bin/rrdtool --with-perl_lib_path=/opt/rrdtool-1.5.4/lib/perl/5.10.1/x86_64-linux-thread-multi/
make all
make install
make install-webconf
make install-config
make install-init
cd /usr/local/pnp4nagios/etc/
mv misccommands.cfg-sample misccommands.cfg
mv nagios.cfg-sample nagios.cfg
mv rra.cfg-sample rra.cfg
mv pages/web_traffic.cfg-sample pages/web_traffic.cfg
mv check_commands/check_all_local_disks.cfg-sample check_commands/check_all_local_disks.cfg
mv check_commands/check_nrpe.cfg-sample check_commands/check_nrpe.cfg
mv check_commands/check_nwstat.cfg-sample check_commands/check_nwstat.cfg
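The renames above all follow one pattern (drop the -sample suffix), so they can be expressed as a loop over an explicit list of names. A minimal sketch; the function name activate_samples is hypothetical, and it stops at the first file that cannot be renamed:

```shell
#!/bin/sh
# Hypothetical helper: inside directory $1, rename each listed
# NAME-sample file to NAME.
activate_samples() {
    dir=$1
    shift
    for name in "$@"; do
        mv "$dir/$name-sample" "$dir/$name" || return 1
    done
}

# Usage with the files renamed in this guide:
#   activate_samples /usr/local/pnp4nagios/etc \
#       misccommands.cfg nagios.cfg rra.cfg pages/web_traffic.cfg \
#       check_commands/check_all_local_disks.cfg \
#       check_commands/check_nrpe.cfg check_commands/check_nwstat.cfg
```

Listing the names explicitly, rather than renaming every *-sample file found, keeps the result identical to the manual steps.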
/etc/init.d/npcd start
ps aux | grep npcd
chkconfig npcd on
chkconfig --list npcd
Open http://192.168.1.242/pnp4nagios/ and enter the Nagios user name and password.
The following message appears at the bottom of the page:
Your environment passed all requirements. Remove or rename the /usr/local/pnp4nagios/share/install.php file now.
Run:
mv /usr/local/pnp4nagios/share/install.php /usr/local/pnp4nagios/share/install.php.bak
Edit the Nagios main configuration file
vi /usr/local/nagios/etc/nagios.cfg
process_performance_data=1
service_perfdata_file=/usr/local/pnp4nagios/var/service-perfdata
service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$
service_perfdata_file_mode=a
service_perfdata_file_processing_interval=15
service_perfdata_file_processing_command=process-service-perfdata-file
host_perfdata_file=/usr/local/pnp4nagios/var/host-perfdata
host_perfdata_file_template=DATATYPE::HOSTPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tHOSTPERFDATA::$HOSTPERFDATA$\tHOSTCHECKCOMMAND::$HOSTCHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$
host_perfdata_file_mode=a
host_perfdata_file_processing_interval=15
host_perfdata_file_processing_command=process-host-perfdata-file
Define the commands
vi /usr/local/nagios/etc/objects/commands.cfg
define command{
    command_name    process-service-perfdata-file
    command_line    /bin/mv /usr/local/pnp4nagios/var/service-perfdata /usr/local/pnp4nagios/var/spool/service-perfdata.$TIMET$
}
define command{
    command_name    process-host-perfdata-file
    command_line    /bin/mv /usr/local/pnp4nagios/var/host-perfdata /usr/local/pnp4nagios/var/spool/host-perfdata.$TIMET$
}
Define the templates
vi /usr/local/nagios/etc/objects/templates.cfg
define host {
    name          host-pnp
    action_url    /pnp4nagios/index.php/graph?host=$HOSTNAME$&srv=_HOST_' class='tips' rel='/pnp4nagios/index.php/popup?host=$HOSTNAME$&srv=_HOST_
    register      0
}
define service {
    name          srv-pnp
    action_url    /pnp4nagios/index.php/graph?host=$HOSTNAME$&srv=$SERVICEDESC$' class='tips' rel='/pnp4nagios/index.php/popup?host=$HOSTNAME$&srv=$SERVICEDESC$
    register      0
}
In the heren.cfg, yingyong.cfg and oracle.cfg configuration files, add host-pnp to the hosts and srv-pnp to the services, as follows:
vi /usr/local/nagios/etc/objects/
define host{
    use                   heren-server,host-pnp
    host_name             192.168.1.240
    alias                 192.168.1.240
    address               192.168.1.240
}
define service{
    use                   heren-service,srv-pnp
    host_name             192.168.1.240
    service_description   heren_url
    check_command         check_url_port!'192.168.1.240'!'http://192.168.1.240:80'!'nginx'!80
}
Error 1:
1. openssl and openssl-devel are missing.
2. The server IP must be added to the client configuration file:
vi nrpe.cfg
allowed_hosts=127.0.0.1,192.168.1.242
Error 2:
1. The server IP must be added to the client's xinetd configuration file:
vi /etc/xinetd.d/nrpe
Error 3:
The monitoring page reports:
DISK CRITICAL - /root/.gvfs is not accessible
On the client, run the following in /root:
umount .gvfs
rm -rf .gvfs
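When graphs stay empty, it helps to look at what Nagios actually writes into the perfdata files configured in nagios.cfg. Each record is one line of KEY::VALUE pairs. The sketch below pulls a single field out of such a line; it assumes the pairs are separated by tab characters, as the \t in the file templates suggests, and the function name perfdata_field is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read perfdata records on stdin and print the
# value of the KEY::VALUE pair whose key is given as $1.
perfdata_field() {
    awk -F '\t' -v key="$1" '{
        for (i = 1; i <= NF; i++) {
            p = index($i, "::")
            if (p && substr($i, 1, p - 1) == key)
                print substr($i, p + 2)
        }
    }'
}

# Example record with three fields, separated by tabs.
printf 'DATATYPE::SERVICEPERFDATA\tHOSTNAME::192.168.1.240\tSERVICEDESC::heren_url\n' | perfdata_field SERVICEDESC
```

Against a live system the input would be a spool file, for example one of the files under /usr/local/pnp4nagios/var/spool/.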