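I am trying to install and configure a highly available RKE2 (Rancher Kubernetes Engine 2) cluster. My architecture consists of four VMs: one is configured as the DNS server and load balancer, one is configured and running as a server node, and the remaining two are to be used as joining nodes.
The logs on the joining (agent) node show the following: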
Jul 27 13:10:22 ha-rancher-2 rke2[31465]: time="2023-07-27T13:10:22Z" level=fatal msg="starting kubernetes: preparing server: failed to get CA certs: https://rancher.inwi.priv:9345/cacerts: 503 Service Unavailable"
Jul 27 13:10:22 ha-rancher-2 systemd[1]: rke2-server.service: main process exited, code=exited, status=1/FAILURE
Jul 27 13:10:22 ha-rancher-2 systemd[1]: Failed to start Rancher Kubernetes Engine v2 (server).
Jul 27 13:10:22 ha-rancher-2 systemd[1]: Unit rke2-server.service entered failed state.
Jul 27 13:10:22 ha-rancher-2 systemd[1]: rke2-server.service failed.
Jul 27 13:10:28 ha-rancher-2 systemd[1]: rke2-server.service holdoff time over, scheduling restart.
Jul 27 13:10:28 ha-rancher-2 systemd[1]: Stopped Rancher Kubernetes Engine v2 (server).
Jul 27 13:10:28 ha-rancher-2 systemd[1]: Starting Rancher Kubernetes Engine v2 (server)...
Jul 27 13:10:28 ha-rancher-2 sh[31479]: + /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service
Jul 27 13:10:28 ha-rancher-2 sh[31479]: Failed to get unit file state for nm-cloud-setup.service: No such file or directory
Jul 27 13:10:28 ha-rancher-2 rke2[31485]: time="2023-07-27T13:10:28Z" level=warning msg="not running in CIS mode"
Jul 27 13:10:28 ha-rancher-2 rke2[31485]: time="2023-07-27T13:10:28Z" level=info msg="Starting rke2 v1.24.15+rke2r1 (8cf3a75d5ccd6e2aa0a99cdf869426f1decd970d)"
Jul 27 13:10:28 ha-rancher-2 rke2[31485]: time="2023-07-27T13:10:28Z" level=info msg="Managed etcd cluster not yet initialized"
Jul 27 13:10:28 ha-rancher-2 rke2[31485]: time="2023-07-27T13:10:28Z" level=fatal msg="starting kubernetes: preparing server: failed to validate server configuration: CA cert validation failed: https://rancher.inwi.priv:9345/cacerts: 503 Service Unavailable"
Jul 27 13:10:28 ha-rancher-2 systemd[1]: rke2-server.service: main process exited, code=exited, status=1/FAILURE
Jul 27 13:10:28 ha-rancher-2 systemd[1]: Failed to start Rancher Kubernetes Engine v2 (server).
Jul 27 13:10:28 ha-rancher-2 systemd[1]: Unit rke2-server.service entered failed state.
Jul 27 13:10:28 ha-rancher-2 systemd[1]: rke2-server.service failed.
I tried to investigate the problem, but strangely the `curl` command only works intermittently, which leaves me confused:
[root@HA-Rancher-2 ~]# curl -vks https://rancher.inwi.priv:9345/cacerts
* About to connect() to rancher.inwi.priv port 9345 (#0)
* Trying 172.20.10.210...
* Connected to rancher.inwi.priv (172.20.10.210) port 9345 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* skipping SSL peer certificate verification
* NSS: client certificate not found (nickname not specified)
* SSL connection using TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256
* Server certificate:
* subject: CN=rke2,O=rke2
* start date: Jul 25 19:12:23 2023 GMT
* expire date: Jul 25 22:54:08 2024 GMT
* common name: rke2
* issuer: CN=rke2-server-ca@1690312343
> GET /cacerts HTTP/1.1
> User-Agent: curl/7.29.0
> Host: rancher.inwi.priv:9345
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: text/plain
< Date: Thu, 27 Jul 2023 14:34:58 GMT
< Content-Length: 570
<
-----BEGIN CERTIFICATE-----
MIIBeTCCAR+gAwIBAgIBADAKBggqhkjOPQQDAjAkMSIwIAYDVQQDDBlya2UyLXNl
cnZlci1jYUAxNjkwMzEyMzQzMB4XDTIzMDcyNTE5MTIyM1oXDTMzMDcyMjE5MTIy
M1owJDEiMCAGA1UEAwwZcmtlMi1zZXJ2ZXItY2FAMTY5MDMxMjM0MzBZMBMGByqG
SM49AgEGCCqGSM49AwEHA0IABJdeIAgxOwLhgv7IH4hloybTf...
-----END CERTIFICATE-----
* Connection #0 to host rancher.inwi.priv left intact
[root@HA-Rancher-2 ~]# curl -vks https://rancher.inwi.priv:9345/cacerts
* About to connect() to rancher.inwi.priv port 9345 (#0)
* Trying 172.20.10.210...
* Connected to rancher.inwi.priv (172.20.10.210) port 9345 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* skipping SSL peer certificate verification
* NSS: client certificate not found (nickname not specified)
* SSL connection using TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256
* Server certificate:
* subject: CN=rke2,O=rke2
* start date: Jul 25 19:12:23 2023 GMT
* expire date: Jul 25 22:54:20 2024 GMT
* common name: rke2
* issuer: CN=rke2-server-ca@1690312343
> GET /cacerts HTTP/1.1
> User-Agent: curl/7.29.0
> Host: rancher.inwi.priv:9345
> Accept: */*
>
< HTTP/1.1 503 Service Unavailable
< Content-Type: text/plain; charset=utf-8
< X-Content-Type-Options: nosniff
< Date: Thu, 27 Jul 2023 14:35:00 GMT
< Content-Length: 9
<
starting
* Connection #0 to host rancher.inwi.priv left intact
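To put a number on how often the endpoint answers 200 versus 503, a small probe loop could be used. This is only a sketch: `probe_cacerts` and `tally_codes` are hypothetical helper names of my own, not part of RKE2 or curl, and the one-second pause between requests is an arbitrary choice.

```shell
# Hypothetical helper: request the URL repeatedly and print one HTTP status code per line.
# -k skips certificate verification and -s silences progress output, as in the curl call above.
probe_cacerts() {
  url=$1
  attempts=${2:-20}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # curl prints 000 as the status code when no HTTP response was received.
    curl -ks -o /dev/null -w '%{http_code}\n' "$url"
    i=$((i + 1))
    sleep 1
  done
}

# Hypothetical helper: read status codes on stdin and print "<code> <count>" per distinct code.
tally_codes() {
  awk '{ count[$1]++ } END { for (code in count) print code, count[code] }' | sort
}
```

It would be run as `probe_cacerts https://rancher.inwi.priv:9345/cacerts 20 | tally_codes`. A roughly even split between 200 and 503 would suggest that the load balancer is alternating between a backend that is ready and one that is still starting.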
Running `netstat` on the LB VM, I can see that connections on the ports in use (for example port 9345) never switch to the `ESTABLISHED` state and instead remain in `TIME_WAIT` for more than two minutes:
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 rancher:48916 172.20.10.11:9345 TIME_WAIT
tcp 0 0 rancher:48898 172.20.10.11:9345 TIME_WAIT
tcp 0 0 rancher:48958 172.20.10.11:9345 TIME_WAIT
tcp 0 0 rancher:56558 172.20.10.14:9345 TIME_WAIT
tcp 0 0 rancher:48988 172.20.10.11:9345 TIME_WAIT
tcp 0 0 rancher:ssh 172.20.10.200:44748 ESTABLISHED
tcp 0 0 rancher:48978 172.20.10.11:9345 TIME_WAIT
tcp 0 0 rancher:56568 172.20.10.14:9345 TIME_WAIT
tcp 0 0 rancher:9345 172.20.10.13:40856 TIME_WAIT
tcp 0 0 rancher:9345 172.20.10.13:40892 TIME_WAIT
tcp 0 0 rancher:56538 172.20.10.14:9345 TIME_WAIT
tcp 0 0 rancher:48924 172.20.10.11:9345 TIME_WAIT
tcp 0 0 rancher:56526 172.20.10.14:9345 TIME_WAIT
udp 0 0 rancher:34185 8.8.8.8:domain ESTABLISHED
udp 0 0 rancher:41489 8.8.4.4:domain ESTABLISHED
udp 0 0 rancher:47731 8.8.8.8:domain ESTABLISHED
udp 0 0 rancher:40760 8.8.8.8:domain ESTABLISHED
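To see how these connections are spread across the targets on port 9345 (presumably the backend servers behind the load balancer), the netstat output can be grouped by foreign address and state. This is a minimal sketch: `tally_backend_states` is a hypothetical helper name, and it assumes the column layout shown above, with the foreign address in column 5 and the state in column 6.

```shell
# Hypothetical helper: read netstat output on stdin and count TCP connections
# per "foreign address + state", keeping only those whose foreign port matches.
tally_backend_states() {
  port=${1:-9345}
  awk -v port="$port" '
    # Only TCP rows whose foreign address (column 5) ends in ":<port>".
    $1 ~ /^tcp/ && $5 ~ (":" port "$") { count[$5 " " $6]++ }
    END { for (key in count) print key, count[key] }
  ' | sort
}
```

It would be run on the LB VM as `netstat -tn | tally_backend_states 9345`. Applied to the output above, it counts six `TIME_WAIT` entries toward 172.20.10.11:9345 and four toward 172.20.10.14:9345.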