Ceph Storage MDS reports slow metadata IOs

I am running Ceph Storage in a lab with a single server, and I am trying to install all services (MON, OSD, MDS, and so on) on that one system.

I created two disks using loop devices. (The server has SSD disks, so they are very fast.)

root@ceph2# losetup -a
/dev/loop1: [64769]:26869770 (/root/100G-2.img)
/dev/loop0: [64769]:26869769 (/root/100G-1.img)

This is what my ceph -s output looks like:

root@ceph2# ceph -s
  cluster:
    id:     1106ae5c-e5bf-4316-8185-3e559d246ac5
    health: HEALTH_WARN
            1 MDSs report slow metadata IOs
            Reduced data availability: 65 pgs inactive
            Degraded data redundancy: 65 pgs undersized

  services:
    mon: 1 daemons, quorum ceph2 (age 8m)
    mgr: ceph2(active, since 9m)
    mds: 1/1 daemons up
    osd: 2 osds: 2 up (since 20m), 2 in (since 38m)

  data:
    volumes: 1/1 healthy
    pools:   3 pools, 65 pgs
    objects: 0 objects, 0 B
    usage:   11 MiB used, 198 GiB / 198 GiB avail
    pgs:     100.000% pgs not active
             65 undersized+peered

I don't understand where the MDS slow IO warning comes from, and ceph mds stat stays stuck in the creating state:

root@ceph2# ceph mds stat
cephfs:1 {0=ceph2=up:creating}

Here is the health detail:

root@ceph2# ceph health detail
HEALTH_WARN 1 MDSs report slow metadata IOs; Reduced data availability: 65 pgs inactive; Degraded data redundancy: 65 pgs undersized
[WRN] MDS_SLOW_METADATA_IO: 1 MDSs report slow metadata IOs
    mds.ceph2(mds.0): 31 slow metadata IOs are blocked > 30 secs, oldest blocked for 864 secs
[WRN] PG_AVAILABILITY: Reduced data availability: 65 pgs inactive
    pg 1.0 is stuck inactive for 22m, current state undersized+peered, last acting [1]
    pg 2.0 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 2.1 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 2.2 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 2.3 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 2.4 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 2.5 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 2.6 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 2.7 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 2.8 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 2.c is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 2.d is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 2.e is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 2.f is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 2.10 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 2.11 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 2.12 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 2.13 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 2.14 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 2.15 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 2.16 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 2.17 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 2.18 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 2.19 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 2.1a is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 2.1b is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.0 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.1 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 3.2 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.3 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 3.4 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.5 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.6 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 3.7 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.9 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 3.c is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 3.d is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.e is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.f is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 3.10 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.11 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 3.12 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 3.13 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.14 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.15 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 3.16 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.17 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 3.18 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.19 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.1a is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.1b is stuck inactive for 14m, current state undersized+peered, last acting [0]
[WRN] PG_DEGRADED: Degraded data redundancy: 65 pgs undersized
    pg 1.0 is stuck undersized for 22m, current state undersized+peered, last acting [1]
    pg 2.0 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 2.1 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 2.2 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 2.3 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 2.4 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 2.5 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 2.6 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 2.7 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 2.8 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 2.c is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 2.d is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 2.e is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 2.f is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 2.10 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 2.11 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 2.12 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 2.13 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 2.14 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 2.15 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 2.16 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 2.17 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 2.18 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 2.19 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 2.1a is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 2.1b is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.0 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.1 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 3.2 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.3 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 3.4 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.5 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.6 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 3.7 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.9 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 3.c is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 3.d is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.e is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.f is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 3.10 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.11 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 3.12 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 3.13 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.14 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.15 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 3.16 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.17 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 3.18 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.19 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.1a is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.1b is stuck undersized for 14m, current state undersized+peered, last acting [0]

What could be wrong here? Is it because I have only one server and only two OSDs?

Answer 1

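The MDS reports slow metadata IOs because it cannot reach any PGs: all of your PGs are "inactive". Once the PGs become active, the warning will eventually go away. The default CRUSH rule gives each pool a size of 3, which can never be satisfied with only two OSDs. You will also have to change osd_crush_chooseleaf_type to 0 so that the OSD, rather than the host, is the CRUSH failure domain. Then change the pool size to 2 so that all PGs fit on both OSDs. Be aware, however, that a pool size of 2 is only for testing purposes; it is not recommended for production use if you care about your data.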
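As a rough sketch of the steps above: each acting set in the health detail lists a single OSD, which is what you would expect when the failure domain is the host and there is only one host. On a cluster that already exists, osd_crush_chooseleaf_type may not be enough on its own, since as far as I know it is only used when the initial CRUSH map is created. One alternative is to create an OSD-level replicated rule and assign it to each pool. The helper below only prints the commands so they can be reviewed first. The rule name replicated_osd, the CRUSH root default, and the function names are assumptions for illustration; pool names are passed in because they do not appear in the question.

```shell
#!/bin/sh
# Dry-run helper: prints ceph commands, never executes them.

# How many replicas CRUSH can place: one per failure-domain bucket,
# capped at the pool size.
#   $1 = pool size, $2 = number of failure-domain buckets
placeable_replicas() {
    size=$1
    domains=$2
    if [ "$domains" -lt "$size" ]; then
        echo "$domains"
    else
        echo "$size"
    fi
}

# Print the commands that switch the given pools to an OSD-level
# CRUSH rule and set their size to 2 (test setups only).
#   $@ = pool names, e.g. from `ceph osd pool ls`
plan_fix() {
    rule=replicated_osd   # hypothetical rule name
    root=default          # assumed CRUSH root; check `ceph osd tree`
    echo "ceph osd crush rule create-replicated $rule $root osd"
    for pool in "$@"; do
        echo "ceph osd pool set $pool crush_rule $rule"
        echo "ceph osd pool set $pool size 2"
    done
}
```

With one host and the host as failure domain, placeable_replicas 3 1 gives 1, which is below the usual min_size of 2 for a size-3 pool, so the PGs stay inactive. With two OSDs as failure domains and size 2, placeable_replicas 2 2 gives 2. To apply the plan, something like plan_fix $(ceph osd pool ls) could be run, reviewed, and then piped to sh.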
