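I set up the Nagios monitoring tool on one Linux server machine to monitor another Linux host machine (so far I am monitoring only a single machine). Following the official documentation, I installed the Nagios server on the server side and the NRPE daemon on the client side. Going by the documentation, Nagios is working correctly: it periodically checks every service I use for monitoring, and a few additional plugins are installed as well.
However, I would like to know how to import the monitored host's output from a specific file into a proper format. I have not installed the web interface via Apache, so is there another way to solve this?
Here is the log file I get from Nagios monitoring: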
[1349064000] LOG ROTATION: DAILY
[1349064000] LOG VERSION: 2.0
[1349064000] CURRENT HOST STATE: localhost;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 0.03 ms
[1349064000] CURRENT HOST STATE: remotehost;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 0.17 ms
[1349064000] CURRENT SERVICE STATE: localhost;Current Load;OK;HARD;1;OK - load average: 0.00, 0.00, 0.00
[1349064000] CURRENT SERVICE STATE: localhost;Current Users;OK;HARD;1;USERS OK - 7 users currently logged in
[1349064000] CURRENT SERVICE STATE: localhost;HTTP;OK;HARD;1;HTTP OK HTTP/1.1 200 OK - 1889 bytes in 0.001 seconds
[1349064000] CURRENT SERVICE STATE: localhost;PING;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 0.04 ms
[1349064000] CURRENT SERVICE STATE: localhost;Root Partition;CRITICAL;HARD;100;DISK CRITICAL - free space: / 108 MB (1% inode=61%):
[1349064000] CURRENT SERVICE STATE: localhost;SSH;OK;HARD;1;SSH OK - OpenSSH_5.1 (protocol 2.0)
[1349064000] CURRENT SERVICE STATE: localhost;Swap Usage;OK;HARD;1;SWAP OK - 97% free (841 MB out of 870 MB)
[1349064000] CURRENT SERVICE STATE: localhost;Total Processes;OK;HARD;1;PROCS OK: 79 processes with STATE = RSZDT
[1349064000] CURRENT SERVICE STATE: remotehost;CPU Load;OK;HARD;1;OK - load average: 0.08, 0.02, 0.01
[1349064000] CURRENT SERVICE STATE: remotehost;Current Users;WARNING;HARD;3;USERS WARNING - 3 users currently logged in
[1349064000] CURRENT SERVICE STATE: remotehost;File Size;WARNING;HARD;3;WARN: /home/new/ctags.1p has size 13864 Byte. Warn at 13000. :
[1349064000] CURRENT SERVICE STATE: remotehost;Swap Usage;OK;HARD;1;SWAP OK - 100% free (869 MB out of 870 MB)
[1349064000] CURRENT SERVICE STATE: remotehost;Total Processes;OK;HARD;1;PROCS OK: 106 processes
[1349064000] CURRENT SERVICE STATE: remotehost;Zombie Processes;OK;HARD;1;PROCS OK: 0 processes with STATE = Z
[1349064028] SERVICE NOTIFICATION: nagiosadmin;remotehost;Current Users;WARNING;notify-service-by-email;USERS WARNING - 3 users currently logged in
[1349064988] Auto-save of retention data completed successfully.
[1349065258] SERVICE NOTIFICATION: nagiosadmin;remotehost;File Size;WARNING;notify-service-by-email;WARN: /home/new/ctags.1p has size 13864 Byte. Warn at 13000. :
[1349065938] SERVICE NOTIFICATION: nagiosadmin;localhost;Root Partition;CRITICAL;notify-service-by-email;DISK CRITICAL - free space: / 109 MB (1% inode=61%):
[1349067628] SERVICE NOTIFICATION: nagiosadmin;remotehost;Current Users;WARNING;notify-service-by-email;USERS WARNING - 3 users currently logged in
[1349068588] Auto-save of retention data completed successfully.
[1349068858] SERVICE NOTIFICATION: nagiosadmin;remotehost;File Size;WARNING;notify-service-by-email;WARN: /home/new/ctags.1p has size 13864 Byte. Warn at 13000. :
[1349069538] SERVICE NOTIFICATION: nagiosadmin;localhost;Root Partition;CRITICAL;notify-service-by-email;DISK CRITICAL - free space: / 109 MB (1% inode=61%)
Please tell me if I did anything wrong in this process. If more Nagios information is needed for this question, let me know and I will share it.
Thanks in advance.
Answer 1
First of all, I am really sorry that I could not answer your question sooner; I have been a bit busy lately.
Here are two answers to your question.
First answer (plain and not innovative):
#!/bin/sh
#
# Log file pattern detector plugin for Nagios
#
# Usage: ./check_log -F <log_file> -O <old_log_file> -q <pattern>
#
# Description:
#
# This plugin will scan a log file (specified by the <log_file> option)
# for a specific pattern (specified by the <pattern> option). Successive
# calls to the plugin script will only report *new* pattern matches in the
# log file, since a copy of the log file from the previous run is saved
# to <old_log_file>.
#
# Output:
#
# On the first run of the plugin, it will return an OK state with a message
# of "Log check data initialized". On successive runs, it will return an OK
# state if *no* pattern matches have been found in the *difference* between the
# log file and the older copy of the log file. If the plugin detects any
# pattern matches in the log diff, it will return a CRITICAL state and print
# out a message in the following format: "(x) last_match", where "x" is the
# total number of pattern matches found in the file and "last_match" is the
# last entry in the log file which matches the pattern.
#
# Notes:
#
# If you use this plugin make sure to keep the following in mind:
#
# 1. The "max_attempts" value for the service should be 1, as this
# will prevent Nagios from retrying the service check (the
# next time the check is run it will not produce the same results).
#
# 2. The "notify_recovery" value for the service should be 0, so that
# Nagios does not notify you of "recoveries" for the check. Since
# pattern matches in the log file will only be reported once and not
# the next time, there will always be "recoveries" for the service, even
# though recoveries really don't apply to this type of check.
#
# 3. You *must* supply a different <old_log_file> for each service that
# you define to use this plugin script - even if the different services
# check the same <log_file> for pattern matches. This is necessary
# because of the way the script operates.
#
# Examples:
#
# Check for login failures in the syslog...
#
# check_log -F /var/log/messages -O ./check_log.badlogins.old -q "LOGIN FAILURE"
#
# Check for port scan alerts generated by Psionic's PortSentry software...
#
# check_log -F /var/log/messages -O ./check_log.portscan.old -q "attackalert"
#
# Paths to commands used in this script. These
# may have to be modified to match your system setup.
# TV: removed PATH restriction. Need to think more about what this means overall
#PATH=""
ECHO="/bin/echo"
GREP="/bin/egrep"
DIFF="/bin/diff"
TAIL="/bin/tail"
CAT="/bin/cat"
RM="/bin/rm"
CHMOD="/bin/chmod"
TOUCH="/bin/touch"
PROGNAME=`/bin/basename $0`
PROGPATH=`echo $0 | sed -e 's,[\\/][^\\/][^\\/]*$,,'`
REVISION="@NP_VERSION@"
. $PROGPATH/utils.sh
print_usage() {
    echo "Usage: $PROGNAME -F logfile -O oldlog -q query"
    echo "Usage: $PROGNAME --help"
    echo "Usage: $PROGNAME --version"
}

print_help() {
    print_revision $PROGNAME $REVISION
    echo ""
    print_usage
    echo ""
    echo "Log file pattern detector plugin for Nagios"
    echo ""
    support
}
# Make sure the correct number of command line
# arguments have been supplied
if [ $# -lt 1 ]; then
    print_usage
    exit $STATE_UNKNOWN
fi
# Grab the command line arguments
#logfile=$1
#oldlog=$2
#query=$3
exitstatus=$STATE_WARNING #default
while test -n "$1"; do
    case "$1" in
        --help|-h)
            print_help
            exit $STATE_OK
            ;;
        --version|-V)
            print_revision $PROGNAME $REVISION
            exit $STATE_OK
            ;;
        --filename|-F)
            logfile=$2
            shift
            ;;
        --oldlog|-O)
            oldlog=$2
            shift
            ;;
        --query|-q)
            query=$2
            shift
            ;;
        --exitstatus|-x)
            exitstatus=$2
            shift
            ;;
        *)
            echo "Unknown argument: $1"
            print_usage
            exit $STATE_UNKNOWN
            ;;
    esac
    shift
done
# If the source log file doesn't exist, exit
if [ ! -e "$logfile" ]; then
    $ECHO "Log check error: Log file $logfile does not exist!"
    exit $STATE_UNKNOWN
elif [ ! -r "$logfile" ]; then
    $ECHO "Log check error: Log file $logfile is not readable!"
    exit $STATE_UNKNOWN
fi
# If the old log file doesn't exist, this must be the first time
# we're running this test, so copy the original log file over to
# the old diff file and exit
if [ ! -e "$oldlog" ]; then
    $CAT "$logfile" > "$oldlog"
    $ECHO "Log check data initialized..."
    exit $STATE_OK
fi
# The old log file exists, so compare it to the original log now
# The temporary file that the script should use while
# processing the log file.
if [ -x /bin/mktemp ]; then
    tempdiff=`/bin/mktemp /tmp/check_log.XXXXXXXXXX`
else
    tempdiff=`/bin/date '+%H%M%S'`
    tempdiff="/tmp/check_log.${tempdiff}"
    $TOUCH $tempdiff
    $CHMOD 600 $tempdiff
fi
$DIFF $logfile $oldlog | $GREP -v "^>" > $tempdiff
# Count the number of matching log entries we have
count=`$GREP -c "$query" $tempdiff`
# Get the last matching entry in the diff file
lastentry=`$GREP "$query" $tempdiff | $TAIL -1`
$RM -f $tempdiff
$CAT $logfile > $oldlog
if [ "$count" = "0" ]; then # no matches, exit with no error
    $ECHO "Log check ok - 0 pattern matches found"
    exitstatus=$STATE_OK
else # Print total match count and the last entry we found
    $ECHO "($count) $lastentry"
    exitstatus=$STATE_CRITICAL
fi
exit $exitstatus
Please note, though, that I have not run this myself, so if you see any errors you will need to fix them yourself.
You need to add the following to commands.cfg:
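Since the script above is untested here, its core step can be tried in isolation first. The following is a minimal sketch, not the plugin itself: new_matches is a hypothetical helper that reproduces the diff-then-grep idea (lines present in the current log but missing from the saved copy, filtered by the pattern). It uses sed to keep only the "< " lines, where the plugin uses grep -v "^>".

```shell
#!/bin/sh
# Sketch of the diff-and-grep step used by check_log.
# new_matches is a hypothetical helper name, not part of the plugin.
# $1 = current log, $2 = saved copy from the previous run, $3 = extended regex.
new_matches() {
    # diff prefixes lines found only in the first file with "< ";
    # keep those lines, strip the prefix, then filter by the pattern.
    diff "$1" "$2" | sed -n 's/^< //p' | grep -E -- "$3"
}

# Small demonstration on throwaway files.
workdir=$(mktemp -d)
printf 'line one\n' > "$workdir/old"
printf 'line one\nFailed password for root\n' > "$workdir/current"
new_matches "$workdir/current" "$workdir/old" "Failed password"
# prints: Failed password for root
rm -rf "$workdir"
```

When the two files are identical the helper prints nothing, which corresponds to the plugin's "0 pattern matches found" case.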
define command{
    command_name check_log
    command_line $USER1$/check_log -F $ARG1$ -O $ARG2$ -q $ARG3$
}
Then define the service in localhost.cfg:
define service{
    use local-service ; Inherit default values from a template
    host_name localhost
    service_description check_log
    check_command check_log!/var/log/secure!/usr/local/nagios/libexec/secure.my!"Failed password"
}
Second answer (somewhat more innovative):
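As far as I know, the Nagios log file is stored at the path set by the log_file directive in nagios.cfg. For a default source install that is usually:
/usr/local/nagios/var/nagios.log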
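The log file carries a timestamp on every entry, so what we need here is to record the system time at which the server started. From my experience with WAS, I can say that starting it spawns a java.exe process. I do not know what the equivalent process is called in Nagios; let's say it is LNT.exe. So we need to find the creation time of LNT.exe.
Logs are generated once the server starts. We can then read the log file from that time onward and show only the current logs.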
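For reference, the bracketed number at the start of each Nagios log line is a Unix epoch timestamp in seconds. A minimal sketch for rendering it human-readable follows. It assumes GNU date (for the -d "@epoch" form) and prints UTC; readable_log is a hypothetical name.

```shell
#!/bin/sh
# Sketch only: assumes GNU date and log lines of the form "[epoch] message".
# readable_log is a hypothetical helper; it reads a Nagios log on stdin.
readable_log() {
    while IFS= read -r line; do
        case "$line" in
            \[[0-9]*\]\ *)
                ts=${line#\[}        # drop the leading "["
                ts=${ts%%\]*}        # keep everything before the first "]"
                rest=${line#*\] }    # message after the first "] "
                case "$ts" in
                    *[!0-9]*)        # not a plain number: leave the line alone
                        printf '%s\n' "$line" ;;
                    *)
                        printf '%s %s\n' \
                            "$(date -u -d "@$ts" '+%Y-%m-%d %H:%M:%S')" "$rest" ;;
                esac
                ;;
            *)
                printf '%s\n' "$line" ;;
        esac
    done
}
```

It reads the log on standard input, for example readable_log < /usr/local/nagios/var/nagios.log (the actual path depends on log_file in nagios.cfg). The first line of the log in the question, [1349064000] LOG ROTATION: DAILY, would come out as 2012-10-01 04:00:00 LOG ROTATION: DAILY.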
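First, get the ID of the process (ps -ef | grep LNT.exe) and store it in a variable, e.g. processID. Next, run ls -ld /proc/${processID} and store the time it shows in a variable, startedTime. Now read the file line by line and compare the time on each line, timeRead, with startedTime. While startedTime > timeRead, the line predates the start; the first line for which that no longer holds is the point from which to start reading the file.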
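The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not a tested recipe: it swaps ls -ld /proc/${processID} for ps -o lstart= (procps) piped through GNU date -d, because that yields an epoch value that compares directly with the [epoch] prefix of each log line. The function names, the process name nagios, and the log path are all assumptions.

```shell
#!/bin/sh
# Sketch only. Assumptions: procps "ps -o lstart=", GNU "date -d", and log
# lines of the form "[epoch] message". Function names are hypothetical.

# Print the start time of process $1 as seconds since the epoch.
proc_start_epoch() {
    date -d "$(ps -o lstart= -p "$1")" +%s
}

# Read a Nagios log on stdin; print only lines stamped at or after epoch $1.
lines_since() {
    awk -v start="$1" '
        /^\[[0-9]+\] / {
            ts = substr($1, 2, length($1) - 2)   # strip the [ and ]
            if (ts + 0 >= start + 0) print
        }'
}

# Example (process name and log path are assumptions for a source install):
#   processID=$(pgrep -o -x nagios)
#   startedTime=$(proc_start_epoch "$processID")
#   lines_since "$startedTime" < /usr/local/nagios/var/nagios.log
```

Filtering with awk avoids a slow shell loop on large logs; lines without a leading [epoch] stamp are simply skipped.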