Is ZFS 4k randwrite performance on NVMe really this much lower than XFS?


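I have been a ZFS fan for a long time and use it on my home NAS, but when I tested its viability for production workloads, I found its performance to be dramatically lower than XFS on the same disk. I tested on an Intel P4510 8TB disk using fio 3.21 with the following settings: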

fio \
--name=xfs-fio \
--size=10G \
--group_reporting \
--time_based \
--runtime=300 \
--bs=4k \
--numjobs=64 \
--rw=randwrite \
--ioengine=sync \
--directory=/mnt/fio/

The results:

xfs-fio: (groupid=0, jobs=64): err= 0: pid=63: Mon Feb  1 21:46:44 2021
  write: IOPS=189k, BW=738MiB/s (774MB/s)(216GiB/300056msec); 0 zone resets
    clat (usec): min=2, max=2430.4k, avg=336.28, stdev=4745.39
     lat (usec): min=2, max=2430.4k, avg=336.38, stdev=4745.40
    clat percentiles (usec):
     |  1.00th=[     7],  5.00th=[    10], 10.00th=[    10], 20.00th=[    11],
     | 30.00th=[    12], 40.00th=[    14], 50.00th=[    23], 60.00th=[    35],
     | 70.00th=[    36], 80.00th=[    37], 90.00th=[    39], 95.00th=[    40],
     | 99.00th=[    44], 99.50th=[  8455], 99.90th=[ 66323], 99.95th=[ 70779],
     | 99.99th=[179307]
   bw (  KiB/s): min=95565, max=7139939, per=100.00%, avg=757400.32, stdev=21559.21, samples=38262
   iops        : min=23890, max=1784976, avg=189327.65, stdev=5389.87, samples=38262
  lat (usec)   : 4=0.03%, 10=13.41%, 20=36.22%, 50=49.56%, 100=0.12%
  lat (usec)   : 250=0.13%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (msec)   : 100=0.46%, 250=0.02%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2000=0.01%, >=2000=0.01%
  cpu          : usr=0.27%, sys=7.34%, ctx=793590, majf=0, minf=116620
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,56715776,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=738MiB/s (774MB/s), 738MiB/s-738MiB/s (774MB/s-774MB/s), io=216GiB (232GB), run=300056-300056msec

Disk stats (read/write):
  nvme7n1: ios=25/21951553, merge=0/173138, ticks=4/660308, in_queue=265520, util=21.39%

For ZFS, I created the zpool with:

# zpool create -o ashift=13 -o autoreplace=on nvme6 /dev/nvme6n1

And I created the dataset under test with:

zfs create              \
    -o mountpoint=/mnt/nvme6 \
    -o atime=off             \
    -o compression=lz4       \
    -o dnodesize=auto        \
    -o primarycache=metadata \
    -o recordsize=128k       \
    -o xattr=sa              \
    -o acltype=posixacl      \
    nvme6/test0

The results:

zfs-fio: (groupid=0, jobs=64): err= 0: pid=64: Mon Feb  1 23:00:41 2021
  write: IOPS=28.3k, BW=110MiB/s (116MB/s)(32.3GiB/300004msec); 0 zone resets
    clat (usec): min=7, max=314789, avg=2258.78, stdev=2509.17
     lat (usec): min=7, max=314790, avg=2259.28, stdev=2509.22
    clat percentiles (usec):
     |  1.00th=[   52],  5.00th=[   70], 10.00th=[   81], 20.00th=[  106],
     | 30.00th=[  225], 40.00th=[ 1057], 50.00th=[ 1713], 60.00th=[ 2606],
     | 70.00th=[ 3458], 80.00th=[ 4146], 90.00th=[ 4948], 95.00th=[ 5669],
     | 99.00th=[ 8455], 99.50th=[12256], 99.90th=[25560], 99.95th=[30540],
     | 99.99th=[39060]
   bw (  KiB/s): min=51047, max=455592, per=100.00%, avg=113196.01, stdev=702.99, samples=38272
   iops        : min=12761, max=113897, avg=28297.59, stdev=175.73, samples=38272
  lat (usec)   : 10=0.01%, 20=0.01%, 50=0.80%, 100=16.73%, 250=12.93%
  lat (usec)   : 500=2.45%, 750=2.97%, 1000=3.37%
  lat (msec)   : 2=14.91%, 4=23.92%, 10=21.20%, 20=0.50%, 50=0.19%
  lat (msec)   : 100=0.01%, 250=0.01%, 500=0.01%
  cpu          : usr=0.31%, sys=7.39%, ctx=11163058, majf=0, minf=32449
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,8476060,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=110MiB/s (116MB/s), 110MiB/s-110MiB/s (116MB/s-116MB/s), io=32.3GiB (34.7GB), run=300004-300004msec

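XFS reached 189k IOPS while ZFS reached only 28.3k IOPS (an 85% drop), with a corresponding drop in throughput. The CPUs are dual Xeon 6132s and the machine's kernel is 4.15.0-62-generic, though I have seen the same effect on 5.x kernels as well.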
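
As a sanity check on the 85% figure, here is a minimal shell sketch that recomputes the drop from the IOPS and bandwidth values in the two fio summaries. The helper name `pct_drop` is only an illustration and is not part of fio or ZFS; it assumes a POSIX shell with `awk` available.

```shell
# pct_drop BASE NEW
# Prints the percentage decrease from BASE to NEW, rounded to one decimal place.
pct_drop() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (base - new) / base * 100 }'
}

# IOPS: XFS 189k vs ZFS 28.3k (from the "write:" lines of the fio output)
pct_drop 189000 28300

# Bandwidth in MiB/s: XFS 738 vs ZFS 110 (from the same lines)
pct_drop 738 110
```

Both calls should come out at roughly 85%, which matches the throughput falling in step with the IOPS, as expected for a fixed 4k block size.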
