Random reboots with unexpected crashes


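I have a physical server (Debian 11 Bullseye) that ran fine for the past year, but over the last few days it has started behaving very strangely. It crashes and reboots at random, and I can't work out what the problem is. Please see the system logs below.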

REBOOT CRASH 1

Nov 26 07:04:01 testing systemd[1]: prometheus-node-exporter-ipmitool-sensor.service: Succeeded.
Nov 26 07:04:01 testing systemd[1]: Finished Collect ipmitool sensor metrics for prometheus-node-exporter.
Nov 26 07:04:57 testing systemd[1]: Starting Collect ipmitool sensor metrics for prometheus-node-exporter...
Nov 26 07:05:01 testing CRON[320608]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Nov 26 07:05:02 testing systemd[1]: prometheus-node-exporter-ipmitool-sensor.service: Succeeded.
Nov 26 07:05:02 testing systemd[1]: Finished Collect ipmitool sensor metrics for prometheus-node-exporter.
Nov 26 07:05:20 testing smartd[1136]: Device: /dev/bus/0 [megaraid_disk_04], SMART Failure: FAILURE PREDICTION THRESHOLD EXCEEDED: ascq=0x5
Nov 26 07:05:57 testing systemd[1]: Starting Collect ipmitool sensor metrics for prometheus-node-exporter...
Nov 26 07:06:01 testing systemd[1]: prometheus-node-exporter-ipmitool-sensor.service: Succeeded.
Nov 26 07:06:01 testing systemd[1]: Finished Collect ipmitool sensor metrics for prometheus-node-exporter.
Nov 26 07:28:51 testing systemd-random-seed[456]: Kernel entropy pool is not initialized yet, waiting until it is.
Nov 26 07:28:51 testing systemd[1]: Starting Flush Journal to Persistent Storage...
Nov 26 07:28:51 testing systemd[1]: Finished Create System Users.
Nov 26 07:28:51 testing systemd[1]: Starting Create Static Device Nodes in /dev...
Nov 26 07:28:51 testing systemd[1]: [email protected]: Succeeded.
Nov 26 07:28:51 testing kernel: [    0.000000] Linux version 5.10.0-9-amd64 ([email protected]) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.70-1 (2021-09-30)
Nov 26 07:28:51 testing systemd[1]: Finished Load Kernel Module drm.
Nov 26 07:28:51 testing systemd[1]: Finished Coldplug All udev Devices.
Nov 26 07:28:51 testing kernel: [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.10.0-9-amd64 root=UUID=14f7f68b-d049-4637-8f99-5441121afaf2412 ro quiet crashkernel=2000M crashkernel=384M-:128M
Nov 26 07:28:51 testing systemd[1]: Starting Helper to synchronize boot up for ifupdown...
Nov 26 07:28:51 testing kernel: [    0.000000] x86/fpu: x87 FPU will use FXSAVE
Nov 26 07:28:51 testing kernel: [    0.000000] BIOS-provided physical RAM map:
Nov 26 07:28:51 testing kernel: [    0.000000] BIOS-e820: [mem 0x0000000000010000-0x000000000009ffff] usable
Nov 26 07:28:51 testing kernel: [    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bc767fff] usable
Nov 26 07:28:51 testing kernel: [    0.000000] BIOS-e820: [mem 0x00000000bc768000-0x00000000bc867fff] type 20
Nov 26 07:28:51 testing systemd[1]: Finished Set the console keyboard layout.
Nov 26 07:28:51 testing kernel: [    0.000000] BIOS-e820: [mem 0x00000000bc868000-0x00000000bc967fff] reserved
Nov 26 07:28:51 testing apparmor.systemd[962]: Restarting AppArmor
Nov 26 07:28:51 testing kernel: [    0.000000] BIOS-e820: [mem 0x00000000bc968000-0x00000000bca66fff] usable
Nov 26 07:28:51 testing kernel: [    0.000000] BIOS-e820: [mem 0x00000000bca67000-0x00000000bca6bfff] ACPI NVS
Nov 26 07:28:51 testing kernel: [    0.000000] BIOS-e820: [mem 0x00000000bca6c000-0x00000000bcaebfff] ACPI data
Nov 26 07:28:51 testing kernel: [    0.000000] BIOS-e820: [mem 0x00000000bcaec000-0x00000000bcf11fff] usable
Nov 26 07:28:51 testing kernel: [    0.000000] BIOS-e820: [mem 0x00000000bcf42000-0x00000000bcf68fff] usable
Nov 26 07:28:51 testing kernel: [    0.000000] BIOS-e820: [mem 0x00000000bd369000-0x00000000bf38efff] reserved

------------------------------------------------------------------------------------------------------------------------------------------------------

REBOOT CRASH 2

Nov 26 07:45:41 testing systemd[1]: prometheus-node-exporter-ipmitool-sensor.service: Succeeded.
Nov 26 07:45:41 testing systemd[1]: Finished Collect ipmitool sensor metrics for prometheus-node-exporter.
Nov 26 07:46:42 testing systemd[1]: Starting Collect ipmitool sensor metrics for prometheus-node-exporter...
Nov 26 07:46:46 testing systemd[1]: prometheus-node-exporter-ipmitool-sensor.service: Succeeded.
Nov 26 07:46:46 testing systemd[1]: Finished Collect ipmitool sensor metrics for prometheus-node-exporter.
Nov 26 07:47:42 testing systemd[1]: Starting Collect ipmitool sensor metrics for prometheus-node-exporter...
Nov 26 07:47:46 testing systemd[1]: prometheus-node-exporter-ipmitool-sensor.service: Succeeded.
Nov 26 07:47:46 testing systemd[1]: Finished Collect ipmitool sensor metrics for prometheus-node-exporter.
Nov 26 07:48:42 testing systemd[1]: Starting Collect ipmitool sensor metrics for prometheus-node-exporter...
Nov 26 07:48:43 testing ddclient[2198]: CONNECT:  checkip.dyndns.org
Nov 26 07:48:43 testing ddclient[2198]: CONNECTED:  using HTTP
Nov 26 07:48:43 testing ddclient[2198]: SENDING:  GET / HTTP/1.0
Nov 26 07:48:43 testing ddclient[2198]: SENDING:   Host: checkip.dyndns.org
Nov 26 07:48:43 testing ddclient[2198]: SENDING:   User-Agent: ddclient/3.9.1
Nov 26 07:48:43 testing ddclient[2198]: SENDING:   Connection: close
Nov 26 07:48:43 testing ddclient[2198]: SENDING:
Nov 26 07:48:43 testing ddclient[2198]: SENDING:
Nov 26 07:48:43 testing ddclient[2198]: RECEIVE:  HTTP/1.1 200 OK#015
Nov 26 07:48:43 testing ddclient[2198]: RECEIVE:  Date: Fri, 26 Nov 2021 06:48:43 GMT#015
Nov 26 07:48:43 testing ddclient[2198]: RECEIVE:  Content-Type: text/html#015
Nov 26 07:48:43 testing ddclient[2198]: RECEIVE:  Content-Length: 104#015
Nov 26 07:48:43 testing ddclient[2198]: RECEIVE:  Connection: close#015
Nov 26 07:48:43 testing ddclient[2198]: RECEIVE:  Cache-Control: no-cache#015
Nov 26 07:48:43 testing ddclient[2198]: RECEIVE:  Pragma: no-cache#015
Nov 26 07:48:43 testing ddclient[2198]: RECEIVE:  #015
Nov 26 07:48:43 testing ddclient[2198]: RECEIVE:  <html><head><title>Current IP Check</title></head><body>Current IP Address: 123.456.78.90</body></html>#015
Nov 26 07:48:43 testing ddclient[2198]: SUCCESS:  database.testing.com: skipped: IP address was already set to 123.456.78.90.
Nov 26 07:48:43 testing ddclient[2198]: SUCCESS:  jenkins.testing.com: skipped: IP address was already set to 123.456.78.90.
Nov 26 07:48:43 testing ddclient[2198]: SUCCESS:  monitors.testing.com: skipped: IP address was already set to 123.456.78.90.
Nov 26 07:48:46 testing systemd[1]: prometheus-node-exporter-ipmitool-sensor.service: Succeeded.
Nov 26 07:48:46 testing systemd[1]: Finished Collect ipmitool sensor metrics for prometheus-node-exporter.
Nov 26 07:49:11 testing kernel: [  356.406208] perf: interrupt took too long (2501 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
Nov 26 07:56:51 testing systemd-random-seed[448]: Kernel entropy pool is not initialized yet, waiting until it is.
Nov 26 07:56:51 testing systemd[1]: Starting Flush Journal to Persistent Storage...
Nov 26 07:56:51 testing systemd[1]: [email protected]: Succeeded.
Nov 26 07:56:51 testing systemd[1]: Finished Load Kernel Module drm.
Nov 26 07:56:51 testing kernel: [    0.000000] Linux version 5.10.0-9-amd64 ([email protected]) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.70-1 (2021-09-30)
Nov 26 07:56:51 testing systemd[1]: Finished Coldplug All udev Devices.
Nov 26 07:56:51 testing systemd[1]: Starting Helper to synchronize boot up for ifupdown...
Nov 26 07:56:51 testing kernel: [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.10.0-9-amd64 root=UUID=14f7f68b-d049-4637-1234-123456789 ro quiet crashkernel=2000M crashkernel=384M-:128M
Nov 26 07:56:51 testing systemd[1]: Finished Create Static Device Nodes in /dev.
Nov 26 07:56:51 testing kernel: [    0.000000] x86/fpu: x87 FPU will use FXSAVE
Nov 26 07:56:51 testing kernel: [    0.000000] BIOS-provided physical RAM map:
Nov 26 07:56:51 testing kernel: [    0.000000] BIOS-e820: [mem 0x0000000000010000-0x000000000009ffff] usable
Nov 26 07:56:51 testing kernel: [    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bc767fff] usable
Nov 26 07:56:51 testing kernel: [    0.000000] BIOS-e820: [mem 0x00000000bc768000-0x00000000bc867fff] type 20
Nov 26 07:56:51 testing systemd[1]: Starting Rule-based Manager for Device Events and Files...
Nov 26 07:56:51 testing kernel: [    0.000000] BIOS-e820: [mem 0x00000000bc868000-0x00000000bc967fff] reserved
Nov 26 07:56:51 testing kernel: [    0.000000] BIOS-e820: [mem 0x00000000bc968000-0x00000000bca66fff] usable
Nov 26 07:56:51 testing kernel: [    0.000000] BIOS-e820: [mem 0x00000000bca67000-0x00000000bca6bfff] ACPI NVS
Nov 26 07:56:51 testing kernel: [    0.000000] BIOS-e820: [mem 0x00000000bca6c000-0x00000000bcaebfff] ACPI data
Nov 26 07:56:51 testing kernel: [    0.000000] BIOS-e820: [mem 0x00000000bcaec000-0x00000000bcf11fff] usable
Nov 26 07:56:51 testing kernel: [    0.000000] BIOS-e820: [mem 0x00000000bcf42000-0x00000000bcf68fff] usable

Nov 23 11:05:13 myserver kernel: [    2.352549] ata_piix 0000:00:1f.2: version 2.13
Nov 23 11:05:13 myserver kernel: [    3.576528] sd 0:2:0:0: [sda] Mode Sense: 1f 00 10 08
Nov 23 11:05:13 myserver kernel: [    3.723306] sr 1:0:0:0: Attached scsi CD-ROM sr0
Nov 23 11:05:13 myserver kernel: [    4.093233] PM: Image not found (code -22)
Nov 23 11:05:13 myserver kernel: [   10.167638] checking generic (d5800000 130000) vs hw (d5800000 800000)
Nov 23 12:37:25 myserver PackageKit: daemon start
Nov 26 07:28:51 myserver kernel: [    0.002793] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
Nov 26 07:28:51 myserver kernel: [    0.002798] e820: remove [mem 0x000a0000-0x000fffff] usable
Nov 26 07:28:51 myserver kernel: [    0.002814] MTRR default type: uncachable
Nov 26 07:28:51 myserver kernel: [    0.002815] MTRR fixed ranges enabled:
Nov 26 07:28:51 myserver kernel: [    0.002817]   00000-9FFFF write-back
Nov 26 07:28:51 myserver kernel: [    0.002819]   A0000-BFFFF uncachable
Nov 26 07:28:51 myserver kernel: [    0.002820]   C0000-CBFFF write-protect
Nov 26 07:28:51 myserver kernel: [    0.002822]   CC000-D3FFF write-back
Nov 26 07:28:51 myserver kernel: [    0.002823]   D4000-EBFFF uncachable
Nov 26 07:28:51 myserver kernel: [    0.002825]   EC000-FFFFF write-protect
Nov 26 07:28:51 myserver kernel: [    0.002826] MTRR variable ranges enabled:
Nov 26 07:28:51 myserver kernel: [    0.002829]   0 base 0000000000 mask FF80000000 write-back
Nov 26 07:28:51 myserver kernel: [    0.002831]   1 base 0080000000 mask FFC0000000 write-back
Nov 26 07:28:51 myserver kernel: [    0.002833]   2 base 0100000000 mask FF00000000 write-back
Nov 26 07:28:51 myserver kernel: [    0.002834]   3 base 0200000000 mask FE00000000 write-back
Nov 26 07:28:51 myserver kernel: [    0.002836]   4 base 0400000000 mask FC00000000 write-back
Nov 26 07:28:51 myserver kernel: [    0.002838]   5 base 0800000000 mask F800000000 write-back
Nov 26 07:28:51 myserver kernel: [    0.002840]   6 base 1000000000 mask F800000000 write-back
Nov 26 07:28:51 myserver kernel: [    0.002842]   7 base 1800000000 mask FFC0000000 write-back
Nov 26 07:28:51 myserver kernel: [    0.002843]   8 disabled
Nov 26 07:28:51 myserver kernel: [    0.002844]   9 disabled
Nov 26 07:28:51 myserver kernel: [    0.004625] e820: update [mem 0xc0000000-0xffffffff] usable ==> reserved
Nov 26 07:28:51 myserver kernel: [    0.021048] e820: update [mem 0xba378000-0xba37afff] usable ==> reserved
Nov 26 07:28:51 myserver kernel: [    0.022384] ACPI: Local APIC address 0xfee00000
Nov 26 07:28:51 myserver kernel: [    0.023393] On node 0 totalpages: 12582912
Nov 26 07:28:51 myserver kernel: [    0.023395]   Normal zone: 196608 pages used for memmap
Nov 26 07:28:51 myserver kernel: [    0.023396]   Normal zone: 12582912 pages, LIFO batch:63
Nov 26 07:28:51 myserver kernel: [    0.023401] On node 1 totalpages: 12569668
Nov 26 07:28:51 myserver kernel: [    0.023402]   DMA zone: 64 pages used for memmap
Nov 26 07:28:51 myserver kernel: [    0.023404]   DMA zone: 3984 pages, LIFO batch:0
Nov 26 07:28:51 myserver kernel: [    0.023405]   DMA32 zone: 12019 pages used for memmap
Nov 26 07:28:51 myserver kernel: [    0.023407]   DMA32 zone: 769204 pages, LIFO batch:63
Nov 26 07:28:51 myserver kernel: [    0.023408]   Normal zone: 184320 pages used for memmap
Nov 26 07:28:51 myserver kernel: [    0.023410]   Normal zone: 11796480 pages, LIFO batch:63
Nov 26 07:28:51 myserver kernel: [    0.040393] ACPI: Local APIC address 0xfee00000
Nov 26 07:28:51 myserver kernel: [    0.040434] ACPI: IRQ0 used by override.
Nov 26 07:28:51 myserver kernel: [    0.040436] ACPI: IRQ9 used by override.
Nov 26 07:28:51 myserver kernel: [    0.049931] pcpu-alloc: s184152 r8192 d28840 u262144 alloc=1*2097152
Nov 26 07:28:51 myserver kernel: [    0.049933] pcpu-alloc: [0] 00 02 04 06 08 10 12 14 [0] 16 18 20 22 -- -- -- --
Nov 26 07:28:51 myserver kernel: [    0.049950] pcpu-alloc: [1] 01 03 05 07 09 11 13 15 [1] 17 19 21 23 -- -- -- --
Nov 26 07:28:51 myserver kernel: [    0.950514] PCI: root bus fe: using default resources
Nov 26 07:28:51 myserver kernel: [    0.950516] PCI: Probing PCI hardware (bus fe)
Nov 26 07:28:51 myserver kernel: [    0.952145] PCI: root bus ff: using default resources
Nov 26 07:28:51 myserver kernel: [    0.952146] PCI: Probing PCI hardware (bus ff)
Nov 26 07:28:51 myserver kernel: [    0.953705] PCI: pci_cache_line_size set to 64 bytes
Nov 26 07:28:51 myserver kernel: [    0.953817] e820: reserve RAM buffer [mem 0xba378000-0xbbffffff]
Nov 26 07:28:51 myserver kernel: [    0.953820] e820: reserve RAM buffer [mem 0xbc768000-0xbfffffff]
Nov 26 07:28:51 myserver kernel: [    0.953823] e820: reserve RAM buffer [mem 0xbca67000-0xbfffffff]
Nov 26 07:28:51 myserver kernel: [    0.953826] e820: reserve RAM buffer [mem 0xbcf12000-0xbfffffff]
Nov 26 07:28:51 myserver kernel: [    0.953828] e820: reserve RAM buffer [mem 0xbcf69000-0xbfffffff]
Nov 26 07:28:51 myserver kernel: [    0.974640] system 00:00: Plug and Play ACPI device, IDs PNP0c01 (active)
Nov 26 07:28:51 myserver kernel: [    0.974699] pnp 00:01: Plug and Play ACPI device, IDs PNP0b00 (active)
Nov 26 07:28:51 myserver kernel: [    0.975062] pnp 00:02: Plug and Play ACPI device, IDs PNP0501 (active)
Nov 26 07:28:51 myserver kernel: [    0.975413] pnp 00:03: Plug and Play ACPI device, IDs PNP0501 (active)
Nov 26 07:28:51 myserver kernel: [    0.976584] system 00:04: Plug and Play ACPI device, IDs PNP0c01 (active)
Nov 26 07:28:51 myserver kernel: [    0.976656] pnp 00:05: [irq 0 disabled]
Nov 26 07:28:51 myserver kernel: [    0.976725] system 00:05: Plug and Play ACPI device, IDs IPI0001 PNP0c01 (active)
Nov 26 07:28:51 myserver kernel: [    0.977664] system 00:06: Plug and Play ACPI device, IDs PNP0c02 (active)
Nov 26 07:28:51 myserver kernel: [    0.977799] system 00:07: Plug and Play ACPI device, IDs PNP0c02 (active)
Nov 26 07:28:51 myserver kernel: [    1.847200] intel_idle: MWAIT substates: 0x1120
Nov 26 07:28:51 myserver kernel: [    1.847270] Monitor-Mwait will be used to enter C-1 state
Nov 26 07:28:51 myserver kernel: [    1.847287] Monitor-Mwait will be used to enter C-3 state
Nov 26 07:28:51 myserver kernel: [    1.847390] intel_idle: v0.5.1 model 0x2C
Nov 26 07:28:51 myserver kernel: [    1.848984] intel_idle: Local APIC timer is reliable in all C-states
Nov 26 07:28:51 myserver kernel: [    2.168571]   with arguments:
Nov 26 07:28:51 myserver kernel: [    2.168572]     /init
Nov 26 07:28:51 myserver kernel: [    2.168573]   with environment:
Nov 26 07:28:51 myserver kernel: [    2.168575]     HOME=/
Nov 26 07:28:51 myserver kernel: [    2.168576]     TERM=linux
Nov 26 07:28:51 myserver kernel: [    2.168577]     BOOT_IMAGE=/boot/vmlinuz-5.10.0-9-amd64
Nov 26 07:28:51 myserver kernel: [    2.168579]     crashkernel=384M-:128M
Nov 26 07:28:51 myserver kernel: [    2.336535] megaraid_sas 0000:04:00.0: BAR:0x1  BAR's base_addr(phys):0x00000000df1bc000  mapped virt_addr:0x(____ptrval____)
Nov 26 07:28:51 myserver kernel: [    2.348825] libata version 3.00 loaded.
Nov 26 07:28:51 myserver kernel: [    2.353143] ata_piix 0000:00:1f.2: version 2.13
Nov 26 07:28:51 myserver kernel: [    3.577545] sd 0:2:0:0: [sda] Mode Sense: 1f 00 10 08
Nov 26 07:28:51 myserver kernel: [    3.697853] sr 1:0:0:0: Attached scsi CD-ROM sr0
Nov 26 07:28:51 myserver kernel: [    4.107713] PM: Image not found (code -22)
Nov 26 07:28:51 myserver kernel: [   12.677156] checking generic (d5800000 130000) vs hw (d5800000 800000)
Nov 26 07:43:30 myserver kernel: [    0.002794] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
Nov 26 07:43:30 myserver kernel: [    0.002799] e820: remove [mem 0x000a0000-0x000fffff] usable
Nov 26 07:43:30 myserver kernel: [    0.002814] MTRR default type: uncachable
Nov 26 07:43:30 myserver kernel: [    0.002815] MTRR fixed ranges enabled:
Nov 26 07:43:30 myserver kernel: [    0.002817]   00000-9FFFF write-back
Nov 26 07:43:30 myserver kernel: [    0.002819]   A0000-BFFFF uncachable
Nov 26 07:43:30 myserver kernel: [    0.002821]   C0000-CBFFF write-protect
Nov 26 07:43:30 myserver kernel: [    0.002822]   CC000-D3FFF write-back
Nov 26 07:43:30 myserver kernel: [    0.002824]   D4000-EBFFF uncachable
Nov 26 07:43:30 myserver kernel: [    0.002825]   EC000-FFFFF write-protect
Nov 26 07:43:30 myserver kernel: [    0.002827] MTRR variable ranges enabled:
Nov 26 07:43:30 myserver kernel: [    0.002829]   0 base 0000000000 mask FF80000000 write-back
Nov 26 07:43:30 myserver kernel: [    0.002831]   1 base 0080000000 mask FFC0000000 write-back

I have been monitoring CPU and RAM usage: the CPU temperature has never gone above 34 °C and the load has never gone above 30%. RAM usage is around 70 GB.

I can't figure out why it is rebooting. Any help would be much appreciated!

Answer 1

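Sometimes a system crashes and nothing about the kernel panic shows up in the logs. That is because the kernel that panicked has actually crashed and can no longer write anything to the log. Anything the kernel had not synced to disk before the crash is lost.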
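To see how much of the log is missing before each reboot, the pasted excerpts can be scanned for the kernel boot banner and compared with the last earlier timestamp. The following is only a minimal sketch: it assumes classic syslog timestamps without a year (the year is supplied by hand), an English locale for month names, and the helper names are made up for illustration.

```python
import re
from datetime import datetime

# Classic syslog prefix, e.g. "Nov 26 07:28:51 testing kernel: ..."
STAMP_RE = re.compile(r"^(\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s")
# The first kernel message of a fresh boot.
BOOT_RE = re.compile(r"kernel: \[\s*0\.000000\] Linux version")


def parse_stamp(text, year):
    # Syslog timestamps carry no year, so the caller has to supply one.
    return datetime.strptime(f"{year} {text}", "%Y %b %d %H:%M:%S")


def find_boot_gaps(lines, year=2021):
    """Return (last_line_before_boot, boot_time, gap_seconds) per boot banner.

    Early boot messages are written with the same timestamp as the banner and
    may appear before it, so the search looks for the nearest preceding line
    whose timestamp is strictly older than the banner.
    """
    entries = []
    for line in lines:
        match = STAMP_RE.match(line)
        if match:
            entries.append((parse_stamp(match.group(1), year), line.rstrip("\n")))

    gaps = []
    for index, (boot_time, line) in enumerate(entries):
        if not BOOT_RE.search(line):
            continue
        for prev_time, prev_line in reversed(entries[:index]):
            if prev_time < boot_time:
                gap = (boot_time - prev_time).total_seconds()
                gaps.append((prev_line, boot_time, gap))
                break
    return gaps


# Example use (the log path is an assumption and reading it may need root):
# with open("/var/log/syslog", errors="replace") as handle:
#     for last_line, boot_time, gap in find_boot_gaps(handle):
#         print(f"{boot_time}: {gap:.0f} s of silence after: {last_line}")
```

For the first excerpt in the question, the last entry before the banner is at 07:06:01 and the banner is stamped 07:28:51. Whatever happened in between never reached the disk.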

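You need to enable kernel core dumps with kdump. kdump uses a live kernel that, when a panic is triggered, writes a memory dump to a file on the local disk. After the machine has booted again, that dump can be analysed with crash.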
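The boot command line in the question already contains two crashkernel= entries, so memory for a crash kernel appears to have been requested. Whether a crash kernel is actually loaded is a separate question. The sketch below checks both. It assumes a kernel that exposes /sys/kernel/kexec_crash_loaded, and the function names are hypothetical.

```python
from pathlib import Path


def crashkernel_args(cmdline):
    """Return every crashkernel= value found on a kernel command line."""
    return [
        token.split("=", 1)[1]
        for token in cmdline.split()
        if token.startswith("crashkernel=")
    ]


def crash_kernel_loaded(sysfs_path="/sys/kernel/kexec_crash_loaded"):
    """Return True or False as reported by sysfs, or None if the file is absent."""
    path = Path(sysfs_path)
    if not path.exists():
        return None
    # The file contains "1" when a crash kernel is loaded, otherwise "0".
    return path.read_text().strip() == "1"


# Example use on the running system:
# print(crashkernel_args(Path("/proc/cmdline").read_text()))
# print(crash_kernel_loaded())
```

If the second call reports False or None, a panic will most likely not produce a dump, whatever the command line says.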

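You can read a guide on how to enable kernel core dumps here. If it doesn't work on your distribution, look for other articles that explain how to do it. Once the system has crashed and produced a core dump, you will need that dump to analyse the crash with crash. A good tutorial can be found at Dedoimedo. It is not always easy, but it is the only way to find clues about the crash. From the core dump you can also read the log messages that were not synced to disk before the crash.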
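Once a dump has been captured, the kernel messages saved alongside it are usually the quickest thing to read. The sketch below assumes the layout used by Debian's kdump-tools, as far as I know: one timestamped directory per crash under /var/crash, each holding a dmesg.<stamp> text file next to the dump. Adjust the path if your configuration differs. The helper names are made up for illustration.

```python
from pathlib import Path


def latest_saved_dmesg(crash_dir="/var/crash"):
    """Return the most recently modified <crash_dir>/*/dmesg.* file, or None."""
    candidates = [p for p in Path(crash_dir).glob("*/dmesg.*") if p.is_file()]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


def tail(path, count=40):
    """Return the last `count` lines of a text file."""
    text = Path(path).read_text(errors="replace")
    return text.splitlines()[-count:]


# Example use (reading /var/crash normally needs root):
# newest = latest_saved_dmesg()
# if newest is not None:
#     print("\n".join(tail(newest)))
```

The last lines of that file are typically where the panic message and call trace end up, which is exactly the part that never made it into syslog.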
