How to monitor the uptime and downtime of Linux services

Question 1

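Choose Nagios to monitor many applications across many servers, and Monit to monitor specific applications, file ownership, and the like.

Monit watches daemon processes and similar programs running on the local host. It is especially useful for monitoring daemons started at boot from /etc/init.d/, such as sendmail, sshd, apache, and mysql.

Unlike many monitoring systems, Monit can take action when an error condition occurs. For example, if sendmail is not running, Monit can restart it automatically; if Apache is using too many resources (for example, during a DoS attack), Monit can stop or restart Apache and send you an alert. Monit can also monitor process characteristics such as the memory and CPU cycles a process uses.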
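The restart-on-failure idea boils down to a simple check: read the PID from a pidfile, test whether that process is alive, and run the start command if it is not. The following is only a minimal bash sketch of that idea, not how Monit is implemented; the function name ensure_running is hypothetical, and it assumes it runs as root (kill -0 fails for processes you are not allowed to signal, which this sketch would misread as "dead").

```bash
#!/bin/bash
# Hypothetical sketch of a pidfile-based "restart if dead" check,
# similar in spirit to Monit's "check process ... with pidfile".
# Assumes it runs as root so kill -0 can probe any process.

# ensure_running PIDFILE START_COMMAND...
# Returns 0 if the process is alive; otherwise runs START_COMMAND
# and returns its exit status.
ensure_running() {
  local pidfile=$1
  shift
  local pid=""

  # Read the PID if the pidfile exists and is readable.
  if [[ -r $pidfile ]]; then
    read -r pid < "$pidfile"
  fi

  # kill -0 sends no signal; it only tests whether the PID exists.
  if [[ $pid =~ ^[0-9]+$ ]] && kill -0 "$pid" 2>/dev/null; then
    return 0
  fi

  echo "process from $pidfile is not running; restarting" >&2
  "$@"
}

# Example (pidfile path is an assumption; check your distribution):
#   ensure_running /var/run/sendmail.pid /etc/init.d/sendmail start
```

Monit adds what this sketch lacks: a polling loop, alerting, and limits such as "if 3 restarts within 5 cycles".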

UPDATE: Configuration section

Monit is easiest to install with aptitude or apt-get:

sudo aptitude install monit

Once monit is installed, you can add programs and processes to its configuration file:

vim /etc/monit/monitrc

set daemon 3                    # check services at 3-second intervals
set logfile /var/log/monit.log  # you can see what monit is doing
set alert admin@example.com     # receive all alerts (placeholder: use your own address)
include /etc/monit.d/*          # add monit script path

Next, write a monit script for each application; example scripts follow.

Save each monit script in /etc/monit.d/ (for example, /etc/monit.d/httpd.monit), reload the monit service (monit reload), and check the monit log:

tail -f /var/log/monit.log
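Those steps can be wrapped in a small helper. This is a hypothetical sketch: the function name install_monit_service and the environment variables MONIT_DIR (defaulting to /etc/monit.d, matching the include line in the monitrc above) and MONIT_RELOAD (set to 0 to skip calling monit) are assumptions, not part of Monit. It does use two real monit commands: monit -t, which syntax-checks the control file, and monit reload.

```bash
#!/bin/bash
# Hypothetical helper: copy a monit service file into the include
# directory, syntax-check the monit configuration, then reload monit.
# MONIT_DIR and MONIT_RELOAD are assumptions made for this sketch.

install_monit_service() {
  local src=$1
  local dir=${MONIT_DIR:-/etc/monit.d}

  [[ -f $src ]] || { echo "no such file: $src" >&2; return 1; }
  mkdir -p -- "$dir" && cp -- "$src" "$dir/" || return 1

  if [[ ${MONIT_RELOAD:-1} != 0 ]]; then
    monit -t || return 1   # refuse to reload a broken configuration
    monit reload           # ask the running daemon to re-read its config
  fi
}

# Example (as root):
#   install_monit_service ./httpd.monit
#   tail -f /var/log/monit.log
```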

For Apache:

check process apache with pidfile /usr/local/apache/logs/httpd.pid
   start program = "/etc/init.d/httpd start" with timeout 60 seconds
   stop program  = "/etc/init.d/httpd stop" 
   if cpu > 60% for 2 cycles then alert
   if cpu > 80% for 5 cycles then restart
   if totalmem > 200.0 MB for 5 cycles then restart
   if children > 250 then restart
   if loadavg(5min) greater than 10 for 8 cycles then stop
   if failed host www.tildeslash.com port 80 protocol http
      and request "/monit/doc/next.php"
      then restart
   if failed port 443 type tcpssl protocol http
      with timeout 15 seconds
      then restart
   if 3 restarts within 5 cycles then timeout
   depends on apache_bin
   group server

For the SafeSquid proxy:

# Check if the safesquid process is running by monitoring the PID recorded in /opt/safesquid/safesquid/run/safesquid.pid
check process safesquid with pidfile /opt/safesquid/safesquid/run/safesquid.pid
  group root
  start program = "/etc/init.d/safesquid start"
  stop program = "/etc/init.d/safesquid stop"
  mode active
# If safesquid process is active it must be updating the performance log at
# /opt/safesquid/safesquid/logs/performance/performance.log every 2 seconds.
# If the file is more than 3 seconds old we definitely have a problem

check file "safesquid-PERFORMANCELOG" with path /opt/safesquid/safesquid/logs/performance/performance.log
  if timestamp > 3 SECOND then alert
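The timestamp test above alerts when a file stops being updated. To illustrate the same check outside Monit, here is a bash sketch that compares the file's modification time with the current time. The function name file_is_stale is hypothetical, and it relies on GNU stat -c %Y, so it assumes Linux with GNU coreutils.

```bash
#!/bin/bash
# Hypothetical sketch of a "file not updated recently" check,
# similar to Monit's "if timestamp > N seconds then alert".

# file_is_stale FILE MAX_AGE_SECONDS
# Returns 0 (true) if FILE is missing or older than MAX_AGE_SECONDS.
file_is_stale() {
  local file=$1 max_age=$2
  local mtime now

  # A missing file counts as stale: the writer is clearly not running.
  [[ -e $file ]] || return 0

  mtime=$(stat -c %Y -- "$file") || return 0  # mtime in epoch seconds (GNU stat)
  now=$(date +%s)
  (( now - mtime > max_age ))
}

# Example, using the SafeSquid log path from the config above:
#   log=/opt/safesquid/safesquid/logs/performance/performance.log
#   file_is_stale "$log" 3 && echo "ALERT: $log is more than 3 seconds old" >&2
```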

Question 2

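If you know the PID of the service you want to monitor, here is something I wrote a while back to keep track of the resource usage of particular processes on a server:

http://cognitivedissonance.ca/cogware/plog

It is completely stable, very low-profile, and very easy to use. It reports a detailed version of what top shows, but less often and to a log file; for example, you can configure it to check the process every minute or every 5 minutes. It won't give you many clues about why a process died, but it will tell you when it stopped.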
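This is not plog itself, but the basic idea (periodically appending a top-style snapshot of one PID to a log file) can be sketched in bash with ps. The function name log_process_usage and the example PID and log path are hypothetical; the -o field names (%cpu, %mem, rss, etime, comm) are those of procps ps on Linux.

```bash
#!/bin/bash
# Hypothetical sketch: append one timestamped resource-usage line for a PID.

# log_process_usage PID LOGFILE
# Returns 1 (after logging a "not running" line) once the process is gone,
# which records roughly when it stopped.
log_process_usage() {
  local pid=$1 logfile=$2
  local stamp usage
  stamp=$(date '+%Y-%m-%d %H:%M:%S')

  # An empty header ("field=") suppresses the header line; rss is in KiB.
  if usage=$(ps -p "$pid" -o pid= -o %cpu= -o %mem= -o rss= -o etime= -o comm=); then
    echo "$stamp $usage" >> "$logfile"
  else
    echo "$stamp pid $pid not running" >> "$logfile"
    return 1
  fi
}

# Example: sample every 60 seconds until the process disappears.
#   while log_process_usage 1234 /var/log/myservice-usage.log; do sleep 60; done
```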

Question 3

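You mentioned in a comment that you are trying to monitor a JBoss web server.

You asked how to monitor your process, but what you really need to monitor is the service JBoss provides, not the process. If JBoss is still running but the process is wedged and no longer answering requests, that does you no good. You want to know when the service stops working, not merely when the process exits.

If you don't want to run a full-blown service monitoring package (Nagios, Icinga, Zabbix, OpenNMS, Shinken, or Zenoss), you can always get by with curl or wget.

Let's write a script, call it /root/bin/check_web, and run it from crontab: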

*/5 * * * * /root/bin/check_web http://www.example.com admin@example.com

Here is the script:

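#!/bin/bash
# Usage: check_web URL EMAIL
# Mails "OFFLINE" once when URL stops responding and "Clear" once when it
# comes back, using a flag file in /tmp to remember the last state.

# Bash has no !~ operator; negate =~ with ! instead.
if [[ ! $1 =~ ^https?://[a-z][a-z.]+ ]]; then
  echo "ERROR: that doesn't look like a URL ($1)" >&2
  exit 1
elif [[ ! $2 =~ .+@[a-z0-9.-]+ ]]; then
  echo "ERROR: that doesn't look like an email address ($2)" >&2
  exit 1
fi

# One flag file per URL; characters unsafe in file names become "_".
flag="/tmp/m-${1//[^[:alnum:]:.-]/_}"

# Try once and give up after 30 seconds so cron runs don't pile up
# (by default wget retries up to 20 times).
wget -O /dev/null -q --tries=1 --timeout=30 "$1"
result=$?

if [[ $result -eq 0 ]]; then
  # Site is up: if we previously reported it down, send an all-clear.
  if [ -f "$flag" ]; then
    date | Mail -s "Clear: $1" "$2"
    rm -f "$flag"
  fi
else
  # Site is down: notify only on the first failure.
  # Use $result here; $? would be the status of the [ ] test above.
  if [ ! -f "$flag" ]; then
    echo "error: wget exit status $result" | Mail -s "OFFLINE: $1" "$2"
    touch "$flag"
  fi
fi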

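The nested ifs cut down on email noise when something goes wrong: you don't need another notification every 5 minutes while you are troubleshooting. It is still worth being told when the site comes back up, though, in case the outage was caused by a sudden reboot or a brief network interruption.

Because the script is generic, you can use it to monitor several sites and send notifications to different email recipients.

Write a few more scripts like this, add the ability to report a WARNING for slow responses as distinct from CRITICAL when the service is completely offline, add a web front end to look up and manage the state of individual hosts, and run the checks from a dedicated daemon instead of cron, and you have Nagios. :-)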
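As a first step toward that WARNING/CRITICAL distinction, the check can use curl instead of wget so that response time can be measured. The sketch below is hypothetical (the function name check_http_timing and the thresholds are assumptions), but it follows the Nagios plugin exit-code convention: 0 = OK, 1 = WARNING, 2 = CRITICAL. It relies on curl's --write-out '%{time_total}' variable for the total transfer time in seconds.

```bash
#!/bin/bash
# Hypothetical sketch: classify a URL as OK, WARNING (slow) or CRITICAL (down),
# using Nagios-style exit codes.

# check_http_timing URL WARN_SECONDS [TIMEOUT_SECONDS]
check_http_timing() {
  local url=$1 warn=$2 timeout=${3:-30}
  local t

  # --fail treats HTTP errors (>= 400) as failures;
  # --max-time bounds the whole request.
  if ! t=$(curl --fail --silent --output /dev/null \
             --max-time "$timeout" --write-out '%{time_total}' "$url"); then
    echo "CRITICAL"
    return 2
  fi

  # Bash arithmetic is integer-only, so compare the float in awk.
  if awk -v t="$t" -v w="$warn" 'BEGIN { exit !(t > w) }'; then
    echo "WARNING: ${t}s"
    return 1
  fi

  echo "OK: ${t}s"
  return 0
}

# Example: warn if the page takes longer than 2 seconds.
#   check_http_timing http://www.example.com 2
```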
