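I have been using a shell script written by Mr. Rob Verschoor, a great Sybase expert (here). It is invoked once an hour through a cron job and sends an email whenever a line in the error log matches one of a set of predefined keywords. For easy reference, I have posted below the part of the code that is probably causing the problem.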
LAST_MARKER=$(${AWK} '/'$MARKER'/ { a=NR } END { print a }' $LOGFILE_COPY)
LAST_MARKER=`echo "$LAST_MARKER+0"|bc`
if [ ! "$LAST_MARKER" = "" ]
then
sed "1,${LAST_MARKER}d" $LOGFILE_COPY > $TMP.x
cp $TMP.x $LOGFILE_COPY
fi
This has worked well, without any problem, for the past two years. The only change I made was to add one line after line 1, which in my case is:
LAST_MARKER=`echo "$LAST_MARKER+0"|bc`
This converts the returned line number into a plain number, because it was coming back in scientific notation.
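For illustration, the extra bc step may be avoidable by letting awk format the number itself. This is only a minimal sketch, assuming the scientific notation comes from awk's default number-to-string conversion on this platform; the function name last_marker_line and the sample file are hypothetical and not part of the original script.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original script): print the line
# number of the last line matching a marker, always as a plain integer.
last_marker_line() {
    # $1 = marker regex, $2 = file to scan
    # printf "%d" forces integer output; it prints 0 when nothing matched.
    awk -v m="$1" '$0 ~ m { a = NR } END { printf "%d\n", a }' "$2"
}

# Illustrative usage with a throwaway sample file.
sample=$(mktemp) || exit 1
printf '%s\n' 'first line' \
    'server _Marker_For_Checking_Errorlog_' \
    'server _Marker_End_' \
    'last line' > "$sample"
last_marker_line '_Marker_For_Checking_Errorlog_' "$sample"
rm -f "$sample"
```

With this sample file the marker is on line 2, so the call prints 2. A result of 0 means no marker was found.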
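For the past few days the script seems to have had trouble finding the last marker. This started after I disabled a monitoring tool that had been filling the error log with trace messages almost every second. Until then there were always many lines between the last marker and the new marker, and there was no problem. Now that the tool is disabled, there is no activity outside working hours, so the last marker and the new marker end up on consecutive lines.
Previously, with many messages, the error log looked like this: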
00:0005:00000:00514:2020/04/17 10:15:59.92 server _Marker_For_Checking_Errorlog_
00:0005:00000:00514:2020/04/17 10:15:59.92 server _Marker_End_
...
0:0002:00000:00608:2020/04/16 11:12:40.88 server DBCC TRACEON 3604, SPID 608
00:0002:00000:00608:2020/04/16 11:12:40.88 server DBCC TRACEOFF 3604, SPID 608
00:0006:00000:00660:2020/04/16 11:13:40.47 server DBCC TRACEON 3604, SPID 660
00:0006:00000:00660:2020/04/16 11:13:40.47 server DBCC TRACEOFF 3604, SPID 660
00:0006:00000:00664:2020/04/16 11:13:40.51 server DBCC TRACEON 3604, SPID 664
00:0006:00000:00664:2020/04/16 11:13:40.51 server DBCC TRACEOFF 3604, SPID 664
00:0002:00000:00608:2020/04/16 11:13:40.54 server DBCC TRACEON 3604, SPID 608
00:0002:00000:00608:2020/04/16 11:13:40.54 server DBCC TRACEOFF 3604, SPID 608
00:0006:00000:00660:2020/04/16 11:13:40.87 server DBCC TRACEON 3604, SPID 660
00:0006:00000:00660:2020/04/16 11:13:40.87 server DBCC TRACEOFF 3604, SPID 660
00:0004:00000:00608:2020/04/16 11:14:40.92 server DBCC TRACEOFF 3604, SPID 608
...
00:0005:00000:00514:2020/04/17 11:15:59.92 server _Marker_For_Checking_Errorlog_
00:0005:00000:00514:2020/04/17 11:15:59.92 server _Marker_End_
Now the error log looks like this:
00:0004:00000:00974:2020/04/17 09:15:28.80 server _Marker_For_Checking_Errorlog_
00:0004:00000:00974:2020/04/17 09:15:38.80 server _Marker_End_
00:0005:00000:00514:2020/04/17 10:15:59.92 server _Marker_For_Checking_Errorlog_
00:0005:00000:00514:2020/04/17 10:15:59.92 server _Marker_End_
00:0003:00000:00030:2020/04/17 11:16:01.51 server _Marker_For_Checking_Errorlog_
00:0003:00000:00030:2020/04/17 11:16:01.51 server _Marker_End_
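The tool cannot tell the previous marker from the last one, so it keeps sending errors that occurred 3 to 4 hours earlier. It should not send an error email at all, because nothing has been written to the error log since the previous check.
I am not a shell scripting expert, so I would appreciate any help with this.
Edit: The correct behavior of the tool is to send the following email at 4:15 (the scheduled time), because there were lines matching the predefined keywords in the preceding hour (3:15 to 4:15).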
Checking ASE errorlog
Fri Apr 17 04:16:06 WAT 2020
Server=Sybaseprd
Errorlog=/mount/ASE-15_0/install/Sybaseprd.log
00:0006:00000:00061:2020/04/17 04:03:37.15 server Error: 1621, Severity: 18, State: 1
00:0006:00000:00061:2020/04/17 04:03:37.15 server Type '16' not allowed before login.
00:0004:00000:00668:2020/04/17 04:03:42.17 server Error: 1621, Severity: 18, State: 1
00:0004:00000:00668:2020/04/17 04:03:42.17 server Type '16' not allowed before login.
00:0004:00000:00100:2020/04/17 04:03:42.17 server Error: 1621, Severity: 18, State: 1
00:0004:00000:00100:2020/04/17 04:03:42.17 server Type '16' not allowed before login.
00:0012:00000:00000:2020/04/17 04:03:49.30 kernel ksmask__rpacket: Invalid tdslength value 21536, kpid: 268895208
00:0003:00000:00932:2020/04/17 04:04:59.20 server Error: 1621, Severity: 18, State: 1
00:0003:00000:00932:2020/04/17 04:04:59.20 server Type '3' not allowed before login.
9 error lines found in errorlog for ASE server 'SybasePrd'
(end)
The incorrect behavior is as follows:
Checking ASE errorlog
Fri Apr 17 05:16:01 WAT 2020
Server=SybasePrd
Errorlog=/mount/ASE-15_0/install/Sybaseprd.log
00:0006:00000:00061:2020/04/17 04:03:37.15 server Error: 1621, Severity: 18, State: 1
00:0006:00000:00061:2020/04/17 04:03:37.15 server Type '16' not allowed before login.
00:0004:00000:00668:2020/04/17 04:03:42.17 server Error: 1621, Severity: 18, State: 1
00:0004:00000:00668:2020/04/17 04:03:42.17 server Type '16' not allowed before login.
00:0004:00000:00100:2020/04/17 04:03:42.17 server Error: 1621, Severity: 18, State: 1
00:0004:00000:00100:2020/04/17 04:03:42.17 server Type '16' not allowed before login.
00:0012:00000:00000:2020/04/17 04:03:49.30 kernel ksmask__rpacket: Invalid tdslength value 21536, kpid: 268895208
00:0003:00000:00932:2020/04/17 04:04:59.20 server Error: 1621, Severity: 18, State: 1
00:0003:00000:00932:2020/04/17 04:04:59.20 server Type '3' not allowed before login.
9 error lines found in errorlog for ASE server 'SybasePRD'
(end)
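The run above took place at 5:15. Since there were no matching lines between 4:15 and 5:15, it should not have reported anything. As mentioned, the program keeps sending this email for the next 5 scheduled runs, that is, until 10:15, and only stops once more than 40 entries have been written to the error log after the errors above.
So the desired result is to find and fix the error in the shell script above so that it checks exactly the past hour. In other words, if there are no entries between the last marker and the last line of the error log, no entries have been added since the last check, and the script should not check or report anything. That is the case when the end of the log looks like this: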
00:0004:00000:00974:2020/04/17 09:15:28.80 server _Marker_For_Checking_Errorlog_
00:0004:00000:00974:2020/04/17 09:15:38.80 server _Marker_End_
00:0005:00000:00514:2020/04/17 10:15:59.92 server _Marker_For_Checking_Errorlog_
00:0005:00000:00514:2020/04/17 10:15:59.92 server _Marker_End_
00:0003:00000:00030:2020/04/17 11:16:01.51 server _Marker_For_Checking_Errorlog_
00:0003:00000:00030:2020/04/17 11:16:01.51 server _Marker_End_
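To make the desired behavior concrete, here is a minimal sketch of the check I have in mind. It is not taken from the original script: the function name new_entries is hypothetical, and the marker strings are assumed to be the two shown in the log above.

```shell
#!/bin/sh
# Hypothetical helper: print only the lines that follow the last
# _Marker_For_Checking_Errorlog_ line, leaving out the marker lines.
# If the log contains no marker at all, every line is printed.
new_entries() {
    # $1 = errorlog file
    awk '
        /_Marker_For_Checking_Errorlog_/ { n = 0; next }  # new marker: forget older lines
        /_Marker_End_/                   { next }         # never report marker lines
        { buf[++n] = $0 }                                 # remember lines since the marker
        END { for (i = 1; i <= n; i++) print buf[i] }
    ' "$1"
}

# Illustrative usage: a log that ends in back-to-back markers.
log=$(mktemp) || exit 1
cat > "$log" <<'EOF'
00:0004:00000:00974:2020/04/17 09:15:28.80 server _Marker_For_Checking_Errorlog_
00:0004:00000:00974:2020/04/17 09:15:38.80 server _Marker_End_
00:0005:00000:00514:2020/04/17 10:15:59.92 server _Marker_For_Checking_Errorlog_
00:0005:00000:00514:2020/04/17 10:15:59.92 server _Marker_End_
EOF
if [ -z "$(new_entries "$log")" ]; then
    echo "nothing new since the last marker"
fi
rm -f "$log"
```

For a log like this one the function prints nothing, which is the case where no email should be sent.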
Answer 1
There are several issues here, but let's first see whether we can solve your immediate problem. Assuming the code you posted:
LAST_MARKER=$(${AWK} '/'$MARKER'/ { a=NR } END { print a }' $LOGFILE_COPY)
LAST_MARKER=`echo "$LAST_MARKER+0"|bc`
if [ ! "$LAST_MARKER" = "" ]
then
sed "1,${LAST_MARKER}d" $LOGFILE_COPY > $TMP.x
cp $TMP.x $LOGFILE_COPY
fi
is intended to delete everything in $LOGFILE_COPY up to and including the last line that contains $MARKER (if there is one), then, if you have tac:
tac "$LOGFILE_COPY" | awk -v m="$MARKER" '$0~m{exit} 1' | tac > "${TMP}.x" &&
mv "${TMP}.x" "$LOGFILE_COPY"
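Otherwise, if you do not have tac, the following two-pass, awk-only solution runs a little slower and does not work on input from a pipe, but it works for input files of any size, whereas the tac solution above could fail if the input file is too large.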
awk -v m="$MARKER" 'NR==FNR{if ($0~m) a=NR; next} FNR>a' "$LOGFILE_COPY" "$LOGFILE_COPY" > "${TMP}.x" &&
mv "${TMP}.x" "$LOGFILE_COPY"
If that is too slow (which would be surprising), the following might be a little faster (and it will be far faster than the script you started with):
start=$(awk -v m="$MARKER" '$0~m{a=NR} END{printf "%d\n", a+1; exit (a?0:1)}' "$LOGFILE_COPY") &&
tail -n +"$start" "$LOGFILE_COPY" > "${TMP}.x" &&
mv "${TMP}.x" "$LOGFILE_COPY"
Does that solve your problem?
As an aside, here is how to start fixing the original script, addressing the most basic of these issues and making it easier to read:
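If you want to try the last variant before wiring it into the cron job, a minimal sketch like the following wraps it in a function and runs it on a throwaway file. The function name trim_after_last_marker and the sample data are made up for this demonstration.

```shell
#!/bin/sh
# Hypothetical wrapper around the awk + tail variant shown above.
# Rewrites "$1" so that it contains only the lines after the last line
# matching the regex "$2". If nothing matches, awk exits non-zero, the
# && chain stops, and the file is left untouched.
trim_after_last_marker() {
    file=$1 marker=$2
    start=$(awk -v m="$marker" '$0~m{a=NR} END{printf "%d\n", a+1; exit (a?0:1)}' "$file") &&
    tail -n +"$start" "$file" > "${file}.tmp" &&
    mv "${file}.tmp" "$file"
}

# Illustrative usage with made-up log lines.
log=$(mktemp) || exit 1
printf '%s\n' 'old error' \
    'server _Marker_For_Checking_Errorlog_' \
    'server _Marker_End_' \
    'new error' > "$log"
trim_after_last_marker "$log" '_Marker_For_Checking_Errorlog_'
cat "$log"    # shows the _Marker_End_ line and 'new error'
rm -f "$log"
```

The function returns non-zero when no marker is found, so a caller can tell "nothing trimmed" apart from "trimmed to an empty file".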
#!/bin/sh
this_prog=$(basename "$0")
usage()
{
echo "Usage:"
echo " $this_prog <servername> <login> <passwd> [<errorlog-pathname> [\"all\"]]"
}
#---------------------------------------------------------------------------
# Check parameters
if [ $# -lt 3 ] || [ $# -gt 5 ]
then
usage
exit 1
fi
srv=$1
login=$2
psswd=$3
logfile=$4
opt=$5
#---------------------------------------------------------------------------
# Temp directory
tmp=$(mktemp -d) || exit 1
trap 'rm -f "$tmp"/*; rmdir "$tmp"; exit' 0
logfile_copy="${tmp}/errlog"
#---------------------------------------------------------------------------
# Some constants; do NOT change these !
dft_mailprog="your_mail_program" #DO NOT CHANGE -- go to the next section
dft_dba_mail="[email protected] [email protected]" #DO NOT CHANGE
# -- go to the next section
#---------------------------------------------------------------------------
# Some definitions
#
# mailprog must be set to your command-line mail program, like 'mail', 'mailx',
# etc. Later in this script, it is assumed that this mail program supports
# specifying the mail subject on the command line with the "-s" option.
# Should you use 'sendmail', you'll have to modify the script, or do without
# the mail subject, as 'sendmail' does not have this "-s" option.
# NT users may want to use 'ssmtp' (part of CygWin) as their mail
# program (also see comment below).
mailprog="$dft_mailprog" # define your own setting here
# Define a list of people receiving results by email:
dba_mail="$dft_dba_mail" # define your own setting here
skip_when_empty=NO # if YES, will not send mail when no errors were found
#---------------------------------------------------------------------------
# The marker strings below can be set to any arbitrary string, as long
# as this is unique and does not appear in the errorlog as part of any
# error message.
# These strings should not be changed anymore once you've started using
# this script.
marker="_Marker_For_Checking_Errorlog_" #do not change this !
marker2="_Marker_End_" #do not change this !
#--------------------------------------------------------------------------
# Change the below to 'gawk' (or 'nawk') if desired... This may be needed
# when hitting built-in max. string length limits in 'awk'. 'gawk' etc.
# tend to be more flexible.
AWK='awk' # awk|gawk
#---------------------------------------------------------------------
# Check the mail program and email addresses have been defined
if [ "$mailprog" = "$dft_mailprog" ]
then
echo ""
echo "You must first define the variable 'mailprog' in this script;"
echo "please set it to the name of your command-line mail program,"
echo "like 'mail', 'mailx', etc."
echo ""
exit 1
fi
if [ "$dba_mail" = "$dft_dba_mail" ]
then
echo ""
echo "You must first define the variable 'dba_mail' in this script;"
echo "please set it to a list of recipients."
echo ""
exit 1
fi
#--------------------------------------------------------------------------
# First locate the server errorlog
rm -f "$logfile_copy"
if [ "$logfile" = "" ]
then
# Pick up the server errorlog pathname; first check if this is 12.0
# or later to determine the method for doing this
#
cat << --EOF-- > "${tmp}/vchk.sql"
select name from sysobjects -- used for ASE version check
where name = "sysqueryplans"
go
dbcc traceon(3604)
go
dbcc resource -- contains errorlog pathname
go
--EOF--
# The below isql session also doubles as an ASE access and
# privilege check.
# Using 'cat' and piping the SQL to isql is done to make it run on
# Windows NT as well ('cos the NT version of 'isql' won't understand
# Unix-style pathnames)
#
< "${tmp}/vchk.sql" isql -S"$srv" -U"$login" -P"$psswd" -w500 > "${tmp}/vchk"
if grep -q "CT-LIBRARY error" "${tmp}/vchk"
then
cat "${tmp}/vchk"
echo ""
echo "*** Note: in case you cannot connect because the ASE server is down,"
echo "*** you can also specify the errorlog pathname explicitly."
echo ""
usage
exit 1
fi
if grep "You must have the following role(s) to" "${tmp}/vchk"
then
exit 1
fi
# 18-Sep-2001 Corrected the test below: it said "-ne 1" instead of "-eq 1",
# causing it to not to identify version pre-12.0 correctly
# (thanks to Jean Loesch)
#
if [ "$(grep -c "sysqueryplans" "${tmp}/vchk")" -eq 1 ]
then
#--------------------------------------------------------------------------
# This is ASE 12.0+, so locate the errorlog through @@errorlog (this isn't
# really necessary, as dbcc resource would still work fine), but let's do
# it anyway for educational purposes ...
cat << --EOF-- > "${tmp}/ataterrlog.sql"
print @@errorlog
go
--EOF--
< "${tmp}/ataterrlog.sql" isql -S"$srv" -U"$login" -P"$psswd" > "${tmp}/ataterrlog"
logfile=$( "$AWK" '{print $1}' "${tmp}/ataterrlog" )
#--------------------------------------------------------------------------
else # not 12.0+
# This is ASE pre-12.0, so locate the errorlog through dbcc resource (already
# executed above)
logfile=$( "$AWK" 'sub(/.*rerrfile=/,""){print $1}' "${tmp}/vchk" )
fi
fi # if $logfile = ""
#--------------------------------------------------------------------------
# Errorlog file name known now, check if it's there
if [ ! -f "$logfile" ]
then
echo "Error accessing server errorlog file [$logfile] - file not found"
echo "Note: this script must be run on the same host where the "
echo "ASE errorlog file is located."
exit 1
fi
cp "$logfile" "$logfile_copy"
#--------------------------------------------------------------------------
# Check option parameter
#
if [ "$opt" = "" ]
then
scan_all=N
else
scan_all=Y
echo "Scanning the entire ASE errorlog."
fi
#--------------------------------------------------------------------------
if [ "$scan_all" = "N" ]
then
# Skip the part of the errorlog until the last marker
# Note: if the next line gives an error message, use a different shell
last_marker=$("$AWK" -v marker="$marker" '$0 ~ marker { a=NR } END { print a+0 }' "$logfile_copy")
if [ "$last_marker" -gt 0 ] # awk prints 0 when no marker was found
then
sed "1,${last_marker}d" "$logfile_copy" > "${tmp}/x" &&
cp "${tmp}/x" "$logfile_copy"
fi
fi
#--------------------------------------------------------------------------
# Create output file
{
echo "Checking ASE errorlog"
date
echo "Server=$srv"
echo "Errorlog=$logfile"
echo ""
} > "${tmp}/out"
#--------------------------------------------------------------------------
# Finally... search for errors in the log file. The below set of search
# strings catches pretty much everything, but you can add any string here
# which you would also like to search for...
#
# Note that these strings indicate the presence of messages that should
# be investigated. Still, this may require further inspection of the
# errorlog, as more messages may be present which contain additional
# information.
grep -Ei '(warning|severity|fail|unmirror|mirror exit|not enough|error|suspect|corrupt|correct|deadlock|critical|allow|infect|error|full|problem|unable|not found|threshold|couldn|not valid|invalid|NO_LOG|logsegment|syslogs|stacktrace)' "$logfile_copy" |
grep -Evi '(successfull|_Marker_|(Suspect Granularity))' > "${tmp}/out2"
nrlines=$(wc -l "${tmp}/out2" | "$AWK" '{print $1}')
cat "${tmp}/out2" >> "${tmp}/out"
#--------------------------------------------------------------------------
#
echo "$nrlines error lines found in errorlog for ASE server '$srv'"
{
echo ""
echo "$nrlines error lines found in errorlog for ASE server '$srv'"
echo ""
echo "(end)"
} >> "${tmp}/out"
if [ "$skip_when_empty" = "NO" ] && [ "$nrlines" -eq 0 ]
then
nrlines=1 # to force it into mailing anyway
fi
if [ "$nrlines" -gt 0 ]
then
# Mail any error messages found to the list of recipients
# (note: assumption is that the -s "subject" option is available for
# your email program. Should you use "sendmail", it may not be
# available, and you'd have to remove this option; when you're familiar
# with 'sendmail', you can add the subject line yourself by inserting
# header lines into the message file)
#
# Note for NT users: if you need a command-line mail program on NT,
# consider 'ssmtp'. This is part of the CygWin package, which you need
# anyway to run this script on NT. The download location for CygWin
# is in the file header above.
subj="Results of ASE errorlog check for '$srv'"
"$mailprog" -s "$subj" "$dba_mail" < "${tmp}/out"
fi
#--------------------------------------------------------------------------
if [ "$scan_all" = "N" ]
then
# Write a new marker to the server errorlog to indicate we got till here
# Only do this when (i) no explicit errorlog pathname was specified and
# (ii) only the last part of the log was scanned.
cat << --EOF-- > "${tmp}/logprint.sql"
dbcc logprint ("$marker")
dbcc logprint ("$marker2") -- need a second line to avoid missing the last line
if @@error = 0 print "Writing marker to ASE errorlog."
-- note: in ASE 12.0, we could use the more tidy "dbcc printolog(string)" instead
go
--EOF--
< "${tmp}/logprint.sql" isql -S"$srv" -U"$login" -P"$psswd" | grep -Ev '(DBCC execution compl|(SA))'
fi
#--------------------------------------------------------------------------
# end
#
There are other parts that could be improved, and since this is untested it may contain bugs, but I hope it gives you an idea of how the original should be changed.