
With a highly available Kubernetes Pi cluster, you have full control over your production-grade environment on-premises.


HA Kubernetes Pi Cluster (Part I)

(Total Setup Time: 25 mins)

On this special day, I would like to wish all Singaporeans and Singapore a Happy 55th National Day!

With two newly purchased Raspberry Pi Model B 8GB boards and 64GB SD cards added to my collection, I will set up a Highly Available Kubernetes Pi Cluster. In this guide, I set up an external etcd key-value store. In the next article, I will continue with the HA configuration.

Preparing OS

(10 mins)

First, I am using Ubuntu Server (64-bit) as my OS. After burning the image onto the 64GB SD card, create an empty file named ssh in the boot partition (d:/boot in my case). Repeat this section on each master node.

Second, change the default password (ubuntu) of the default ubuntu user, then upgrade the OS:

sudo apt update
sudo apt upgrade

Third, change the hostname by running:

sudo vi /etc/hostname
sudo vi /etc/hosts
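
As a sketch of what the /etc/hosts edit might contain, the function below prints one entry per node. The name-to-IP mapping is inferred from the etcd settings used later in this guide. Mapping cluster-endpoint to the virtual IP 192.168.100.200 is an assumption, and print_cluster_hosts is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: prints /etc/hosts entries for the cluster nodes.
# The IP-to-name mapping is inferred from the etcd settings later in this
# guide; cluster-endpoint -> 192.168.100.200 (the virtual IP) is an assumption.
print_cluster_hosts() {
    cat <<'EOF'
192.168.100.119 master1
192.168.100.173 master2
192.168.100.100 master3
192.168.100.200 cluster-endpoint
EOF
}

# Review the output first; append it only once it matches your network:
#   print_cluster_hosts | sudo tee -a /etc/hosts
print_cluster_hosts
```

Printing the entries first, instead of appending them straight away, leaves room to adjust the addresses for your own network.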

Fourth, let iptables see bridged traffic:

# Checks if br_netfilter is loaded
lsmod | grep br_netfilter

# Loads it explicitly
sudo modprobe br_netfilter

# Lets iptables see bridged traffic
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF

sudo sysctl --system

Fifth, enable the memory cgroup by appending the following to the existing single line in /boot/firmware/cmdline.txt:

cgroup_enable=cpuset cgroup_enable=memory cgroup_memory=1 swapaccount=1

Finally, disable WiFi and Bluetooth by adding the following to /boot/firmware/usercfg.txt, then reboot:

dtoverlay=disable-wifi
dtoverlay=disable-bt

sudo reboot

# After the reboot, the memory cgroup's enabled flag should be 1
grep mem /proc/cgroups | awk '{ print $4 }'
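
The post-reboot check can also be scripted. This is a minimal sketch, assuming the usual /proc/cgroups layout where the fourth column is the enabled flag; memory_cgroup_enabled is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: reads /proc/cgroups-formatted text on stdin and
# succeeds only if the memory controller's "enabled" column (4th field) is 1.
memory_cgroup_enabled() {
    awk '$1 == "memory" { found = ($4 == 1) } END { exit (found ? 0 : 1) }'
}

if [ -r /proc/cgroups ] && memory_cgroup_enabled < /proc/cgroups; then
    echo "memory cgroup enabled"
else
    echo "memory cgroup NOT enabled; re-check cmdline.txt and reboot"
fi
```

Matching on the exact controller name avoids false positives from other lines that merely contain "mem".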

Creating Virtual IP

(5 mins)

First, install keepalived on all master nodes, following the LVS-NAT-Keepalived guide.

# Installs keepalived
sudo apt install keepalived

# Configures keepalived
sudo vi /etc/keepalived/keepalived.conf
# VRRP instance definitions
# Use state MASTER on the first master and BACKUP (with a lower priority) on the other master nodes
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 150
    authentication {
        auth_type PASS
        auth_pass seehiong
    }
    virtual_ipaddress {
        192.168.100.200
    }
}

# Virtual Servers definitions
virtual_server 192.168.100.200 6443 {
	delay_loop 6
	lb_algo rr
	lb_kind NAT
	protocol TCP
	real_server 192.168.100.119 6443 {
		weight 1
		TCP_CHECK {
			connect_timeout 3
			connect_port 6443
		}
	}
	real_server 192.168.100.173 6443 {
		weight 1
		TCP_CHECK {
			connect_timeout 3
			connect_port 6443
		}
	}
	real_server 192.168.100.100 6443 {
		weight 1
		TCP_CHECK {
			connect_timeout 3
			connect_port 6443
		}
	}
}
# Restarts keepalived
sudo systemctl restart keepalived
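
The three real_server blocks above differ only in their IP address, so they can be generated instead of typed by hand. The following is a sketch only: render_real_servers is a hypothetical helper, and its output still has to be pasted inside the virtual_server block yourself.

```shell
#!/bin/sh
# Hypothetical helper: prints one keepalived real_server block per IP,
# using the same weight and TCP_CHECK settings as the configuration above.
render_real_servers() {
    port=$1
    shift
    for ip in "$@"; do
        printf '\treal_server %s %s {\n' "$ip" "$port"
        printf '\t\tweight 1\n'
        printf '\t\tTCP_CHECK {\n'
        printf '\t\t\tconnect_timeout 3\n'
        printf '\t\t\tconnect_port %s\n' "$port"
        printf '\t\t}\n'
        printf '\t}\n'
    done
}

render_real_servers 6443 192.168.100.119 192.168.100.173 192.168.100.100
```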

Second, test the connection. It will fail at this point because nothing is listening on port 6443 yet:

nc -v 192.168.100.200 6443

# Expected result
nc: connect to 192.168.100.200 port 6443 (tcp) failed: Connection refused

Preparing certs for etcd

(5 mins)

First, following the openssl CA guide, configure openssl and create the root certificate:

sudo su

# Openssl configuration: set dir under [ CA_default ]
vi /usr/lib/ssl/openssl.cnf
[ CA_default ]
dir             = /root/ca

mkdir /root/ca
cd /root/ca
mkdir newcerts certs crl private requests
touch index.txt
echo '1234' > serial

# Root certificate
openssl genrsa -aes256 -out private/cakey.pem 4096
openssl req -new -x509 -key private/cakey.pem -out cacert.pem -days 3650 -set_serial 0 -subj '/C=SG/ST=SG/O=seehiong/CN=master1'

Second, create certs for all master nodes:

# Create master nodes' certificate
cd /root/ca/requests/

openssl genrsa -out etcd-key.pem
openssl req -new -key etcd-key.pem -out etcd.csr -subj '/C=SG/ST=SG/O=seehiong/CN=master1,master2,master3,localhost,cluster-endpoint'
openssl ca -in etcd.csr -out etcd.pem \
  -extfile <(printf "subjectAltName=IP:192.168.100.119,IP:192.168.100.173,IP:192.168.100.100,IP:192.168.100.200,IP:127.0.0.1,\
  DNS:master1,DNS:master2,DNS:master3,DNS:localhost,DNS:cluster-endpoint")  

openssl genrsa -out peer-etcd-key.pem
openssl req -new -key peer-etcd-key.pem -out peer-etcd.csr -subj '/C=SG/ST=SG/O=seehiong/CN=192.168.100.119,192.168.100.173,192.168.100.100,192.168.100.200'
openssl ca -in peer-etcd.csr -out peer-etcd.pem \
  -extfile <(printf "subjectAltName=IP:192.168.100.119,IP:192.168.100.173,IP:192.168.100.100,IP:192.168.100.200,IP:127.0.0.1,\
  DNS:master1,DNS:master2,DNS:master3,DNS:localhost,DNS:cluster-endpoint")
	
rm etcd.csr peer-etcd.csr
mv etcd-key.pem peer-etcd-key.pem /root/ca/private/
mv etcd.pem peer-etcd.pem /root/ca/certs/

# Protects /root/ca folder (capital X keeps directories traversable)
chmod -R u=rwX,go= /root/ca
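
The long subjectAltName strings above are easy to mistype. As a sketch, a small function can assemble the value from a list of IPs and a list of DNS names; build_san is a hypothetical helper name, not part of openssl.

```shell
#!/bin/sh
# Hypothetical helper: builds the subjectAltName value from a space-separated
# list of IPs and a space-separated list of DNS names.
build_san() {
    san=""
    for ip in $1; do
        san="${san:+$san,}IP:$ip"
    done
    for name in $2; do
        san="${san:+$san,}DNS:$name"
    done
    printf 'subjectAltName=%s\n' "$san"
}

build_san "192.168.100.119 192.168.100.173 192.168.100.100 192.168.100.200 127.0.0.1" \
          "master1 master2 master3 localhost cluster-endpoint"
```

The printed line can be saved to a file and passed to openssl ca with -extfile, in place of the inline printf.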

Third, copy all certs to /srv/etcd-certs/ on all master nodes.

# Copies certs to master1
mkdir -p /srv/etcd-certs/
cp /root/ca/cacert.pem /root/ca/private/etcd-key.pem /root/ca/private/peer-etcd-key.pem \
  /root/ca/certs/etcd.pem /root/ca/certs/peer-etcd.pem /srv/etcd-certs/

# Copies certs to other master nodes
scp /root/ca/cacert.pem /root/ca/private/etcd-key.pem /root/ca/private/peer-etcd-key.pem \
  /root/ca/certs/etcd.pem /root/ca/certs/peer-etcd.pem ubuntu@master2:/tmp
scp /root/ca/cacert.pem /root/ca/private/etcd-key.pem /root/ca/private/peer-etcd-key.pem \
  /root/ca/certs/etcd.pem /root/ca/certs/peer-etcd.pem ubuntu@master3:/tmp

# ssh into the other master nodes and perform these
sudo mkdir -p /srv/etcd-certs/
sudo mv /tmp/*.pem  /srv/etcd-certs/
# Needs the etcd system user, which is created in the etcd setup section below
sudo chown -R etcd:etcd /srv/etcd-certs/

Lastly, update the CA certificates on all nodes:

# update-ca-certificates only picks up files ending in .crt
sudo cp /srv/etcd-certs/cacert.pem /usr/local/share/ca-certificates/cacert.crt
sudo update-ca-certificates --fresh

Setting up etcd

(5 mins)

First, following the v3.4.10 release notes, set up etcd on each master as follows:

ETCD_VER=v3.4.10
GITHUB_URL=https://github.com/etcd-io/etcd/releases/download
DOWNLOAD_URL=${GITHUB_URL}

# Downloads the arm64 architecture for Raspberry Pi
rm -f /tmp/etcd-${ETCD_VER}-linux-arm64.tar.gz
rm -rf /tmp/etcd-download-test && mkdir -p /tmp/etcd-download-test
curl -L ${DOWNLOAD_URL}/${ETCD_VER}/etcd-${ETCD_VER}-linux-arm64.tar.gz -o /tmp/etcd-${ETCD_VER}-linux-arm64.tar.gz

# Extracts etcd
tar xzvf /tmp/etcd-${ETCD_VER}-linux-arm64.tar.gz -C /tmp/etcd-download-test --strip-components=1
rm -f /tmp/etcd-${ETCD_VER}-linux-arm64.tar.gz

# Checks version
/tmp/etcd-download-test/etcd --version    # Error: etcd on unsupported platform without ETCD_UNSUPPORTED_ARCH=arm64 set
/tmp/etcd-download-test/etcdctl version   # Success: etcdctl version: 3.4.10, API version: 3.4

# Moves to /usr/local/bin
sudo cp /tmp/etcd-download-test/etcd /usr/local/bin/
sudo cp /tmp/etcd-download-test/etcdctl /usr/local/bin/
export ETCD_UNSUPPORTED_ARCH=arm64

# Checks version again
etcd --version   # Success: running etcd on unsupported architecture "arm64" since ETCD_UNSUPPORTED_ARCH is set
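
Before installing a downloaded binary, it may be worth comparing the tarball's checksum against a published one. This is a minimal sketch, assuming sha256sum is available and that you take the expected digest from a source you trust, such as the release page; verify_sha256 is a hypothetical helper name. Run it before the tarball is removed.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the SHA-256 digest of a file
# matches the expected hex string.
verify_sha256() {
    file=$1
    expected=$2
    actual=$(sha256sum "$file" | awk '{ print $1 }')
    [ "$actual" = "$expected" ]
}

# Usage sketch (copy the expected digest from the release page yourself):
#   verify_sha256 /tmp/etcd-v3.4.10-linux-arm64.tar.gz "<expected digest>" \
#       && echo "checksum OK" || echo "checksum MISMATCH"
```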

Second, prepare etcd as a service on master1:

sudo vi /lib/systemd/system/etcd.service
# Inserts the following into etcd.service for master1
[Unit]
Description=etcd key-value store
Documentation=https://etcd.io/docs/v3.4.0/

[Service]
User=etcd
Type=notify
Environment=ETCD_UNSUPPORTED_ARCH=arm64
# Logging flags
Environment=ETCD_LOGGER=zap
# Member flags
Environment=ETCD_NAME=infra1
Environment=ETCD_DATA_DIR=/var/lib/etcd
Environment=ETCD_LISTEN_PEER_URLS=https://192.168.100.119:2380
Environment=ETCD_LISTEN_CLIENT_URLS=https://192.168.100.119:2379,https://127.0.0.1:2379
Environment=ETCD_HEARTBEAT_INTERVAL=1000
Environment=ETCD_ELECTION_TIMEOUT=5000
# Clustering flags
Environment=ETCD_INITIAL_ADVERTISE_PEER_URLS=https://192.168.100.119:2380
Environment=ETCD_INITIAL_CLUSTER_TOKEN=etcd-cluster-1
Environment=ETCD_INITIAL_CLUSTER=infra1=https://192.168.100.119:2380,infra2=https://192.168.100.173:2380,infra3=https://192.168.100.100:2380
Environment=ETCD_INITIAL_CLUSTER_STATE=new
Environment=ETCD_ADVERTISE_CLIENT_URLS=https://192.168.100.119:2379
# Security flags
Environment=ETCD_CLIENT_CERT_AUTH=true
Environment=ETCD_TRUSTED_CA_FILE=/srv/etcd-certs/cacert.pem
Environment=ETCD_CERT_FILE=/srv/etcd-certs/etcd.pem
Environment=ETCD_KEY_FILE=/srv/etcd-certs/etcd-key.pem
Environment=ETCD_PEER_CLIENT_CERT_AUTH=true
Environment=ETCD_PEER_TRUSTED_CA_FILE=/srv/etcd-certs/cacert.pem
Environment=ETCD_PEER_CERT_FILE=/srv/etcd-certs/peer-etcd.pem
Environment=ETCD_PEER_KEY_FILE=/srv/etcd-certs/peer-etcd-key.pem
ExecStart=/usr/local/bin/etcd
Restart=always
RestartSec=10s
LimitNOFILE=40000

[Install]
WantedBy=multi-user.target

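Third, create the etcd data folder and system account on master1:

sudo mkdir -p /var/lib/etcd

# etcd fails if file permissions are not set correctly
sudo chmod -R 700 /var/lib/etcd

# Creates system user
sudo adduser --system etcd
sudo addgroup etcd
sudo usermod -aG etcd etcd

# Lets the etcd user own its data folder and certs (as done on the other masters in step 5)
sudo chown -R etcd:etcd /var/lib/etcd /srv/etcd-certs/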

Fourth, install etcd as a service:

sudo systemctl daemon-reload
sudo systemctl enable etcd
sudo systemctl stop etcd
sudo systemctl start etcd.service
systemctl status etcd.service

# Check for logs
journalctl -xeu etcd

Fifth, ssh into the other master nodes (verify that step 1 is done by running etcd --version) and perform the following:

ssh ubuntu@master2
sudo vi /lib/systemd/system/etcd.service
# Variations for step 2 for master2
# Inserts the following into etcd.service
[Unit]
Description=etcd key-value store
Documentation=https://etcd.io/docs/v3.4.0/

[Service]
User=etcd
Type=notify
Environment=ETCD_UNSUPPORTED_ARCH=arm64
# Logging flags
Environment=ETCD_LOGGER=zap
# Member flags
Environment=ETCD_NAME=infra2
Environment=ETCD_DATA_DIR=/var/lib/etcd
Environment=ETCD_LISTEN_PEER_URLS=https://192.168.100.173:2380
Environment=ETCD_LISTEN_CLIENT_URLS=https://192.168.100.173:2379,https://127.0.0.1:2379
Environment=ETCD_HEARTBEAT_INTERVAL=1000
Environment=ETCD_ELECTION_TIMEOUT=5000
# Clustering flags
Environment=ETCD_INITIAL_ADVERTISE_PEER_URLS=https://192.168.100.173:2380
Environment=ETCD_INITIAL_CLUSTER_TOKEN=etcd-cluster-1
Environment=ETCD_INITIAL_CLUSTER=infra1=https://192.168.100.119:2380,infra2=https://192.168.100.173:2380,infra3=https://192.168.100.100:2380
Environment=ETCD_INITIAL_CLUSTER_STATE=new
Environment=ETCD_ADVERTISE_CLIENT_URLS=https://192.168.100.173:2379
# Security flags
Environment=ETCD_CLIENT_CERT_AUTH=true
Environment=ETCD_TRUSTED_CA_FILE=/srv/etcd-certs/cacert.pem
Environment=ETCD_CERT_FILE=/srv/etcd-certs/etcd.pem
Environment=ETCD_KEY_FILE=/srv/etcd-certs/etcd-key.pem
Environment=ETCD_PEER_CLIENT_CERT_AUTH=true
Environment=ETCD_PEER_TRUSTED_CA_FILE=/srv/etcd-certs/cacert.pem
Environment=ETCD_PEER_CERT_FILE=/srv/etcd-certs/peer-etcd.pem
Environment=ETCD_PEER_KEY_FILE=/srv/etcd-certs/peer-etcd-key.pem
ExecStart=/usr/local/bin/etcd
Restart=always
RestartSec=10s
LimitNOFILE=40000

[Install]
WantedBy=multi-user.target

ssh ubuntu@master3
sudo vi /lib/systemd/system/etcd.service
# Variations for step 2 for master3
# Inserts the following into etcd.service
[Unit]
Description=etcd key-value store
Documentation=https://etcd.io/docs/v3.4.0/

[Service]
User=etcd
Type=notify
Environment=ETCD_UNSUPPORTED_ARCH=arm64
# Logging flags
Environment=ETCD_LOGGER=zap
# Member flags
Environment=ETCD_NAME=infra3
Environment=ETCD_DATA_DIR=/var/lib/etcd
Environment=ETCD_LISTEN_PEER_URLS=https://192.168.100.100:2380
Environment=ETCD_LISTEN_CLIENT_URLS=https://192.168.100.100:2379,https://127.0.0.1:2379
Environment=ETCD_HEARTBEAT_INTERVAL=1000
Environment=ETCD_ELECTION_TIMEOUT=5000
# Clustering flags
Environment=ETCD_INITIAL_ADVERTISE_PEER_URLS=https://192.168.100.100:2380
Environment=ETCD_INITIAL_CLUSTER_TOKEN=etcd-cluster-1
Environment=ETCD_INITIAL_CLUSTER=infra1=https://192.168.100.119:2380,infra2=https://192.168.100.173:2380,infra3=https://192.168.100.100:2380
Environment=ETCD_INITIAL_CLUSTER_STATE=new
Environment=ETCD_ADVERTISE_CLIENT_URLS=https://192.168.100.100:2379
# Security flags
Environment=ETCD_CLIENT_CERT_AUTH=true
Environment=ETCD_TRUSTED_CA_FILE=/srv/etcd-certs/cacert.pem
Environment=ETCD_CERT_FILE=/srv/etcd-certs/etcd.pem
Environment=ETCD_KEY_FILE=/srv/etcd-certs/etcd-key.pem
Environment=ETCD_PEER_CLIENT_CERT_AUTH=true
Environment=ETCD_PEER_TRUSTED_CA_FILE=/srv/etcd-certs/cacert.pem
Environment=ETCD_PEER_CERT_FILE=/srv/etcd-certs/peer-etcd.pem
Environment=ETCD_PEER_KEY_FILE=/srv/etcd-certs/peer-etcd-key.pem
ExecStart=/usr/local/bin/etcd
Restart=always
RestartSec=10s
LimitNOFILE=40000

[Install]
WantedBy=multi-user.target

# Follows step 3 (on both master2 and master3)
sudo mkdir -p /var/lib/etcd
sudo chmod -R 700 /var/lib/etcd
sudo adduser --system etcd
sudo addgroup etcd
sudo usermod -aG etcd etcd
sudo mkdir -p /srv/etcd-certs/
# Moves any certs still in the home folder (skip if they are already in /srv/etcd-certs/)
sudo mv ~/*.pem /srv/etcd-certs/
sudo chown -R etcd:etcd /srv/etcd-certs/ /var/lib/etcd /usr/local/bin/etcd /usr/local/bin/etcdctl
# Follows step 4
sudo systemctl daemon-reload
sudo systemctl enable etcd
sudo systemctl stop etcd
sudo systemctl start etcd
systemctl status etcd
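
The three unit files above differ only in ETCD_NAME and the node's IP address. As a sketch, the varying lines can be generated from those two values and pasted into an otherwise shared template; render_member_env is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: prints the Environment= lines that differ between
# the three unit files, given a member name and that node's IP address.
render_member_env() {
    name=$1
    ip=$2
    cat <<EOF
Environment=ETCD_NAME=${name}
Environment=ETCD_LISTEN_PEER_URLS=https://${ip}:2380
Environment=ETCD_LISTEN_CLIENT_URLS=https://${ip}:2379,https://127.0.0.1:2379
Environment=ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${ip}:2380
Environment=ETCD_ADVERTISE_CLIENT_URLS=https://${ip}:2379
EOF
}

render_member_env infra2 192.168.100.173
```

Generating these lines from one name and one IP reduces the chance of a copy-and-paste slip, such as one URL still pointing at another node.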

Finally, following the etcd security guide (https://etcd.io/docs/v3.4.0/op-guide/security/), I test my etcd setup using:

sudo curl -L https://localhost:2379/metrics --cacert /srv/etcd-certs/cacert.pem --cert /srv/etcd-certs/etcd.pem --key /srv/etcd-certs/etcd-key.pem | grep -v debugging
sudo etcdctl --cacert /srv/etcd-certs/cacert.pem --cert /srv/etcd-certs/etcd.pem  --key /srv/etcd-certs/etcd-key.pem member list
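
To turn the member list into a quick pass/fail check, its output can be piped through a small counter. This is a sketch, assuming etcdctl's default comma-separated output with the status in the second column; count_started_members is a hypothetical helper, and the member IDs in the sample are made up.

```shell
#!/bin/sh
# Hypothetical helper: reads `etcdctl member list` output (default "simple"
# format) on stdin and prints how many members report the "started" status.
count_started_members() {
    awk -F', ' '$2 == "started" { n++ } END { print n + 0 }'
}

# Sample input with made-up member IDs, for illustration only.
count_started_members <<'EOF'
1111111111111111, started, infra1, https://192.168.100.119:2380, https://192.168.100.119:2379, false
2222222222222222, started, infra2, https://192.168.100.173:2380, https://192.168.100.173:2379, false
3333333333333333, unstarted, , https://192.168.100.100:2380, , false
EOF
```

In real use, pipe the output of the member list command above into count_started_members and compare the result with the expected member count of 3.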

Troubleshooting

Request sent was ignored by remote peer due to cluster ID mismatch

I solved mine by changing ETCD_INITIAL_CLUSTER_TOKEN to a different value (the same one on every member). You can then check the endpoint health:

sudo etcdctl --endpoints=https://cluster-endpoint:2379 --cacert=/srv/etcd-certs/cacert.pem --cert=/srv/etcd-certs/etcd.pem --key=/srv/etcd-certs/etcd-key.pem endpoint health

If it still fails, you may try re-creating the data folder (this deletes the member's existing etcd data):

sudo rm -rf /var/lib/etcd && sudo mkdir -p /var/lib/etcd
sudo chown -R etcd:etcd /var/lib/etcd && sudo chmod -R 700 /var/lib/etcd
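
For the endpoint health check above, it can help to query every member directly instead of a single name. As a sketch, the helper below builds the value for etcdctl's --endpoints flag from the node IPs; endpoints_csv is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: joins node IPs into the comma-separated value
# expected by etcdctl's --endpoints flag (client port 2379).
endpoints_csv() {
    csv=""
    for ip in "$@"; do
        csv="${csv:+$csv,}https://${ip}:2379"
    done
    printf '%s\n' "$csv"
}

endpoints_csv 192.168.100.119 192.168.100.173 192.168.100.100
```

The result can be substituted into the command above as --endpoints="$(endpoints_csv ...)", so that a failing member shows up by address.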

ERROR:There is already a certificate for /C=SG/ST=SG/O=seehiong/CN=192.168.100.119

You may try to revoke the old certificate and sign the CSR again:

openssl ca -revoke /root/ca/newcerts/1240.pem
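
To find which file under newcerts to revoke, the CA's index.txt can be searched by subject. This is a sketch, assuming the usual tab-separated index.txt layout; valid_serials_for_subject is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: prints the serial number(s) of still-valid ("V")
# certificates in an openssl index.txt whose subject matches exactly.
# index.txt is tab-separated: status, expiry, revocation date, serial,
# file name, subject.
valid_serials_for_subject() {
    index_file=$1
    subject=$2
    awk -F'\t' -v subj="$subject" '$1 == "V" && $6 == subj { print $4 }' "$index_file"
}

# Usage sketch:
#   valid_serials_for_subject /root/ca/index.txt '/C=SG/ST=SG/O=seehiong/CN=192.168.100.119'
# The printed serial names the file under /root/ca/newcerts/ to revoke.
```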

Replacing a faulty member

You may have to reconfigure your etcd cluster when an SD card fails:

# Get member list
sudo etcdctl --cacert /srv/etcd-certs/cacert.pem --cert /srv/etcd-certs/etcd.pem  --key /srv/etcd-certs/etcd-key.pem member list

# Delete the member based on the ID
sudo etcdctl --cacert /srv/etcd-certs/cacert.pem --cert /srv/etcd-certs/etcd.pem  --key /srv/etcd-certs/etcd-key.pem member remove 34ef554257cff34e

# Add the previous member (previous settings remain the same, e.g. IP address)
sudo etcdctl --cacert /srv/etcd-certs/cacert.pem --cert /srv/etcd-certs/etcd.pem  --key /srv/etcd-certs/etcd-key.pem member add infra3 --peer-urls=https://192.168.100.182:2380

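A message similar to this will appear:

(screenshot: add-new-etcd-member)

Then, on the replacement node, update the service file with the values from that message:

sudo vi /lib/systemd/system/etcd.service
# Make the following changes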
Environment=ETCD_INITIAL_CLUSTER_STATE=existing
# Add new configuration
Environment=ETCD_INITIAL_CLUSTER="infra2=https://192.168.100.181:2380,infra3=https://192.168.100.182:2380,infra1=https://192.168.100.180:2380"
# Restart etcd
sudo systemctl daemon-reload
sudo systemctl start etcd
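
The cluster-state change can also be applied without opening an editor. The following is a sketch only: with_existing_cluster_state is a hypothetical helper that prints the modified unit file, so you can review it before replacing the original.

```shell
#!/bin/sh
# Hypothetical helper: prints a unit file with ETCD_INITIAL_CLUSTER_STATE
# switched to "existing"; all other lines pass through unchanged.
with_existing_cluster_state() {
    sed 's/^Environment=ETCD_INITIAL_CLUSTER_STATE=.*/Environment=ETCD_INITIAL_CLUSTER_STATE=existing/' "$1"
}

# Usage sketch: write to a temporary file, review it, then copy it into place.
#   with_existing_cluster_state /lib/systemd/system/etcd.service > /tmp/etcd.service
#   diff /lib/systemd/system/etcd.service /tmp/etcd.service
```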