FreeNAS guest VM drops into KDB on a host with an AMD Ryzen 9 5950X

Problem description:

I used to have a host with an AMD R3 3100, on which I installed Ubuntu 20.04 and KVM and ran a FreeNAS VM, and everything worked fine. After I swapped the CPU, the FreeNAS guest no longer boots, while other guests running Ubuntu still work.

FreeNAS guest log:

db> reboot
cpu_reset: Restarting BSP
cpu_reset_proxy: Stopped CPU 1
GDB: no debug ports present
KDB: debugger backends: ddb
KDB: current backend: ddb
Copyright (c) 1992-2019 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 11.3-RELEASE-p14 #0 r325575+c936002dbe2(HEAD): Mon Sep 28 10:48:27 EDT 2020
    [email protected]:/freenas-releng/freenas/_BE/objs/freenas-releng/freenas/_BE/os/sys/FreeNAS.amd64-DEBUG amd64
FreeBSD clang version 8.0.0 (tags/RELEASE_800/final 356365) (based on LLVM 8.0.0)
WARNING: WITNESS option enabled, expect reduced performance.
VT(vga): text 80x25
CPU: AMD EPYC-Milan Processor (3400.05-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0xa00f11  Family=0x19  Model=0x1  Stepping=1
  Features=0x783fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2>
  Features2=0xfff83203<SSE3,PCLMULQDQ,SSSE3,FMA,CX16,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND,HV>
  AMD Features=0x2e500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM>
  AMD Features2=0xc003f7<LAHF,CMP,SVM,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,Topology,PCXC>
  Structured Extended Features=0x211c07ab<FSGSBASE,TSCADJ,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLWB,SHA>
  Structured Extended Features2=0x40060c<UMIP,PKU,RDPID>
  Structured Extended Features3=0xac000010<IBPB,STIBP,ARCH_CAP,SSBD>
  XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES>
  IA32_ARCH_CAPS=0x69<RDCL_NO,SKIP_L1DFL_VME>
  AMD Extended Feature Extensions ID EBX=0x300d205<CLZERO,XSaveErPtr>
  SVM: NP,NRIP,NAsids=16
Hypervisor: Origin = "KVMKVMKVM"
real memory  = 8489271296 (8096 MB)
avail memory = 8143572992 (7766 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <BOCHS  BXPCAPIC>
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 2 package(s)
WARNING: VIMAGE (virtualized network stack) is a highly experimental feature.
ioapic0 <Version 1.1> irqs 0-23 on motherboard
SMP: AP CPU #1 Launched!
random: entropy device external interface
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"
kbd1 at kbdmux0
mlx5en: Mellanox Ethernet driver 3.5.1 (April 2019)
nexus0
vtvga0: <VT VGA driver> on motherboard
cryptosoft0: <software crypto> on motherboard
aesni0: <AES-CBC,AES-XTS,AES-GCM,AES-ICM> on motherboard
padlock0: No ACE support.
acpi0: <BOCHS BXPCRSDT> on motherboard
acpi0: Power Button (fixed)
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
atrtc0: <AT realtime clock> port 0x70-0x71,0x72-0x77 irq 8 on acpi0
atrtc0: registered as a time-of-day clock, resolution 1.000000s
Event timer "RTC" frequency 32768 Hz quality 0
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x608-0x60b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
isab0: <PCI-ISA bridge> at device 1.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel PIIX3 WDMA2 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xc1a0-0xc1af at device 1.1 on pci0
ata0: <ATA channel> at channel 0 on atapci0
ata1: <ATA channel> at channel 1 on atapci0
pci0: <bridge> at device 1.3 (no driver attached)
vgapci0: <VGA-compatible display> port 0xc100-0xc11f mem 0xf4000000-0xf7ffffff,0xf8000000-0xfbffffff,0xfc094000-0xfc095fff irq 10 at device 2.0 on pci0
vgapci0: Boot video device
virtio_pci0: <VirtIO PCI Network adapter> port 0xc120-0xc13f mem 0xfc096000-0xfc096fff,0xfebf0000-0xfebf3fff irq 11 at device 3.0 on pci0
vtnet0: <VirtIO Networking Adapter> on virtio_pci0
vtnet0: Ethernet address: 52:54:00:9b:85:3a
pci0: <multimedia, HDA> at device 4.0 (no driver attached)
uhci0: <Intel 82801I (ICH9) USB controller> port 0xc140-0xc15f irq 10 at device 5.0 on pci0
usbus0 on uhci0
usbus0: 12Mbps Full Speed USB v1.0
uhci1: <Intel 82801I (ICH9) USB controller> port 0xc160-0xc17f irq 10 at device 5.1 on pci0
usbus1 on uhci1
usbus1: 12Mbps Full Speed USB v1.0
uhci2: <Intel 82801I (ICH9) USB controller> port 0xc180-0xc19f irq 11 at device 5.2 on pci0
usbus2 on uhci2
usbus2: 12Mbps Full Speed USB v1.0
ehci0: <Intel 82801I (ICH9) USB 2.0 controller> mem 0xfc097000-0xfc097fff irq 11 at device 5.7 on pci0
usbus3: EHCI version 1.0
usbus3 on ehci0
usbus3: 480Mbps High Speed USB v2.0
virtio_pci1: <VirtIO PCI Console adapter> port 0xc080-0xc0bf mem 0xfc098000-0xfc098fff,0xfebf4000-0xfebf7fff irq 10 at device 6.0 on pci0
virtio_pci2: <VirtIO PCI Balloon adapter> port 0xc0c0-0xc0ff mem 0xfebf8000-0xfebfbfff irq 11 at device 7.0 on pci0
vtballoon0: <VirtIO Balloon Adapter> on virtio_pci2
virtio_pci3: <VirtIO PCI Block adapter> port 0xc000-0xc07f mem 0xfc099000-0xfc099fff,0xfebfc000-0xfebfffff irq 11 at device 8.0 on pci0
vtblk0: <VirtIO Block Adapter> on virtio_pci3
vtblk0: 5723166MB (11721045168 512 byte sectors)
acpi_syscontainer0: <System Container> on acpi0
acpi_syscontainer1: <System Container> port 0xaf00-0xaf0b on acpi0
acpi_syscontainer2: <System Container> port 0xafe0-0xafe3 on acpi0
acpi_syscontainer3: <System Container> port 0xae00-0xae13 on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model IntelliMouse Explorer, device ID 4
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart0: console (9600,n,8,1)
orm0: <ISA Option ROM> at iomem 0xe9800-0xeffff on isa0
attimer0: <AT timer> at port 0x40 on isa0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
Timecounters tick every 10.000 msec
freenas_sysctl: adding account.
freenas_sysctl: adding directoryservice.
freenas_sysctl: adding middlewared.
freenas_sysctl: adding network.
freenas_sysctl: adding services.
ipfw2 (+ipv6) initialized, divert enabled, nat enabled, default to accept, logging disabled
ugen2.1: <Intel UHCI root HUB> at usbus2
ugen3.1: <Intel EHCI root HUB> at usbus3
uhub0: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2
ugen0.1: <Intel UHCI root HUB> at usbus0
uhub1: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus3
uhub2: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
ugen1.1: <Intel UHCI root HUB> at usbus1
uhub3: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus1
ada0 at ata0 bus 0 scbus0 target 0 lun 0
ada0: <QEMU HARDDISK 2.5+> ATA-7 device
ada0: Serial Number QM00001
ada0: 16.700MB/s transfers (WDMA2, PIO 8192bytes)
ada0: 61440MB (125829120 512 byte sectors)
cd0 at ata0 bus 0 scbus0 target 1 lun 0
cd0: <QEMU QEMU DVD-ROM 2.5+> Removable CD-ROM SCSI device
cd0: Serial Number QM00002
cd0: 16.700MB/s transfers (WDMA2, ATAPI 12bytes, PIO 65534bytes)
cd0: Attempt to query device size failed: NOT READY, Medium not present
WARNING: WITNESS option enabled, expect reduced performance.
Trying to mount root from zfs:freenas-boot/ROOT/default []...
Root mount waiting for: usbus3 usbus2 usbus1 usbus0
uhub0: 2 ports with 2 removable, self powered
uhub2: 2 ports with 2 removable, self powered
uhub3: 2 ports with 2 removable, self powered
Root mount waiting for: usbus3
Root mount waiting for: usbus3
uhub1: 6 ports with 6 removable, self powered
Root mount waiting for: usbus3
ugen3.2: <QEMU QEMU USB Tablet> at usbus3
Starting devd.
warning: KLD '/boot/kernel-debug/uhid.ko' is newer than the linker.hints file
lo0: link state changed to UP


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0xfffffe02311f30c0
fault code      = supervisor write data, page not present
instruction pointer = 0x20:0xffffffff81016d09


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0xfffffe02311c60c0
stack pointer           = 0x28:0xfffffe02311f1eb0
frame pointer           = 0x28:0xfffffe02311f1eb0
code segment        = base 0x0, limit 0xfffff, type 0x1b
            = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = interrupt enabled, resume, IOPL = 0
current process     = 99 (python3.7)
trap number     = 12
panic: page fault
cpuid = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe02311f1b70
vpanic() at vpanic+0x17e/frame 0xfffffe02311f1bd0
panic() at panic+0x43/frame 0xfffffe02311f1c30
trap_fatal() at trap_fatal+0x369/frame 0xfffffe02311f1c80
trap_pfault() at trap_pfault+0x62/frame 0xfffffe02311f1cd0
trap() at trap+0x2b3/frame 0xfffffe02311f1de0
calltrap() at calltrap+0x8/frame 0xfffffe02311f1de0
--- trap 0xc, rip = 0xffffffff81016d09, rsp = 0xfffffe02311f1eb0, rbp = 0xfffffe02311f1eb0 ---
bcopy() at bcopy+0x19/frame 0xfffffe02311f1eb0
fpugetregs() at fpugetregs+0x192/frame 0xfffffe02311f1f00
get_mcontext() at get_mcontext+0x1b4/frame 0xfffffe02311f1f50
sys_getcontext() at sys_getcontext+0x56/frame 0xfffffe02311f2300
amd64_syscall() at amd64_syscall+0x792/frame 0xfffffe02311f2430
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe02311f2430
--- syscall (421, FreeBSD ELF64, sys_getcontext), rip = 0x801c26280, rsp = 0x7fffffffd188, rbp = 0x7fffffffdcf0 ---
KDB: enter: panic
[ thread pid 99 tid 100490 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why

The CPU information reported by the BIOS (via dmidecode) is as follows:

dmidecode | grep "Processor Information" -A 54
Processor Information
    Socket Designation: AM4
    Type: Central Processor
    Family: Zen
    Manufacturer: Advanced Micro Devices, Inc.
    ID: 10 0F A2 00 FF FB 8B 17
    Signature: Family 25, Model 33, Stepping 0
    Flags:
        FPU (Floating-point unit on-chip)
        VME (Virtual mode extension)
        DE (Debugging extension)
        PSE (Page size extension)
        TSC (Time stamp counter)
        MSR (Model specific registers)
        PAE (Physical address extension)
        MCE (Machine check exception)
        CX8 (CMPXCHG8 instruction supported)
        APIC (On-chip APIC hardware supported)
        SEP (Fast system call)
        MTRR (Memory type range registers)
        PGE (Page global enable)
        MCA (Machine check architecture)
        CMOV (Conditional move instruction supported)
        PAT (Page attribute table)
        PSE-36 (36-bit page size extension)
        CLFSH (CLFLUSH instruction supported)
        MMX (MMX technology supported)
        FXSR (FXSAVE and FXSTOR instructions supported)
        SSE (Streaming SIMD extensions)
        SSE2 (Streaming SIMD extensions 2)
        HTT (Multi-threading)
    Version: AMD Ryzen 9 5950X 16-Core Processor
    Voltage: 1.1 V
    External Clock: 100 MHz
    Max Speed: 5050 MHz
    Current Speed: 3400 MHz
    Status: Populated, Enabled
    Upgrade: Socket AM4
    L1 Cache Handle: 0x0013
    L2 Cache Handle: 0x0014
    L3 Cache Handle: 0x0015
    Serial Number: Unknown
    Asset Tag: Unknown
    Part Number: Unknown
    Core Count: 16
    Core Enabled: 16
    Thread Count: 32
    Characteristics:
        64-bit capable
        Multi-Core
        Hardware Thread
        Execute Protection
        Enhanced Virtualization
        Power/Performance Control

After rebooting from the kdb prompt, the following message appeared:

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0xfffffe02311d00c0
fault code      = supervisor write data, page not present
instruction pointer = 0x20:0xffffffff81016d09
stack pointer           = 0x28:0xfffffe02311ceeb0
frame pointer           = 0x28:0xfffffe02311ceeb0
code segment        = base 0x0, limit 0xfffff, type 0x1b
            = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = interrupt enabled, resume, IOPL = 0
current process     = 99 (python3.7)
trap number     = 12
panic: page fault
cpuid = 1
KDB: stack backtrace:

What I have tried:

  1. Reinstalled the guest, but it failed with the same problem and dropped into kdb mode.
  2. Rebooted the host, which did not solve the problem.

Questions:

  1. How can I collect more detailed information from KDB?
  2. How can I solve the problem?
  3. Does FreeNAS not support the AMD Ryzen 9 5950X 16-Core Processor?

Answer 1

With Wu's help, I was able to create a test virtual machine from the FreeNAS OS image using the following command:

virt-install \
--name test \
--memory 8096 \
--vcpus 2 \
--cpu host-model-only \
--cdrom /var/lib/libvirt/isos/TrueNAS-12.0-U5.1.iso \
--disk size=30,bus=virtio \
--network type=direct,source=enp42s0,source_mode=bridge \
--os-type=linux  \
--os-variant freebsd11.3 \
--graphics vnc,listen=0.0.0.0,port=20012 \
--video vga --input tablet,bus=usb

After comparing the XML of the freeNas VM and the test VM, I changed the CPU element of the freeNas VM as follows:

  <cpu mode='custom' match='exact' check='partial'>
    <model fallback='allow'>EPYC-Rome</model>
    <feature policy='require' name='ibpb'/>
    <feature policy='require' name='spec-ctrl'/>
    <feature policy='require' name='ssbd'/>
    <feature policy='require' name='virt-ssbd'/>
  </cpu>
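
The comparison described above can be sketched roughly as follows. This is only an illustration: the helper names `extract_cpu` and `compare_cpu` are made up for this example, the domain names `freeNas` and `test` are the ones used in this post, and it assumes `virsh dumpxml` prints the `<cpu>` element on its own lines, as libvirt normally does.

```shell
#!/bin/sh
# Sketch only: the helper names below are illustrative, not part of libvirt.

# Print the <cpu> element of a libvirt domain XML read from stdin.
extract_cpu() {
    awk '
        /<cpu[ >]/      { inside = 1 }   # opening tag (does not match <cputune>)
        inside          { print }
        /<\/cpu>/       { inside = 0 }   # closing tag
        /<cpu[ >].*\/>/ { inside = 0 }   # self-closing form, e.g. <cpu mode="..."/>
    '
}

# Show a unified diff of the <cpu> elements of two domains.
# Usage: compare_cpu freeNas test
compare_cpu() {
    a=$(mktemp) && b=$(mktemp) || return 1
    virsh dumpxml "$1" | extract_cpu > "$a"
    virsh dumpxml "$2" | extract_cpu > "$b"
    diff -u "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return "$status"
}
```

Running `compare_cpu freeNas test` on the host should show where the two CPU definitions differ; the change itself can then be made with `virsh edit freeNas`.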

Then I ran the following commands:

virsh destroy freeNas
virsh start freeNas
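
To check that the restarted domain really uses the new CPU model, something like the following might help. It is a minimal sketch: `cpu_model` is a hypothetical helper, and it assumes the first `<model>` element with text content in the domain XML is the CPU model, which holds for the definition shown above.

```shell
#!/bin/sh
# Sketch only: cpu_model is an illustrative helper, not a libvirt command.

# Print the first <model>NAME</model> value found in a domain XML on stdin.
# Self-closing <model .../> elements (NICs, video) carry no text and are skipped.
cpu_model() {
    sed -n 's:.*<model[^>]*>\([^<]*\)</model>.*:\1:p' | head -n 1
}
```

On the host, `virsh dumpxml freeNas | cpu_model` should then print `EPYC-Rome`. Inside the FreeBSD guest, `sysctl -n hw.model` should report the CPU name the guest actually sees.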

The guest finally came back up.

This fix came from trial and error rather than from theory, so I still have no idea why the problem occurs.
