High Availability Openshift Origin cluster with Ceph persistent storage

Abstract

In this post, we’ll install a high availability microservices cluster based on OpenShift Origin 3.7, CentOS 7.4, Ceph persistent storage and HAProxy.

OpenShift bundles Docker, Kubernetes and other great open source projects such as HAProxy. It’s a very complex system, and for this kind of project you can’t realistically install it without a configuration management tool: Ansible.

Thanks to Ansible, this tutorial is very straightforward.

Architecture

See original image

Prerequisites

  • 3 physical hosts (optional, for real HA)
  • An admin VM, CentOS 7
  • 8 CentOS 7 x86_64 “minimal install” VMs
  • 2 additional VMs for high availability (HAProxy / keepalived)

You will have to provision the following VMs (assuming your admin node exists already):

Physical host : hyp03 (Intel Nuc 16GB RAM)

  • cm0 : master node : 2GB RAM, vda:16GB, vdb:3GB
  • ci0 : infrastructure node : 2GB RAM, vda:16GB

Physical host : hyp01 (X10SDV 8 cores 64GB RAM)

  • cm1 : master node : 2GB RAM, vda:16GB, vdb:3GB
  • ci1 : infrastructure node : 2GB RAM
  • cn0 : application node : 8GB RAM, vda:16GB, vdb:3GB

Physical host : hyp02 (X10SDV 4 cores 32GB RAM)

  • cm2 : master node : 2GB RAM, vda:16GB, vdb:3GB
  • ci2 : infrastructure node : 2GB RAM, vda:16GB
  • cn1 : application node : 8GB RAM, vda:16GB, vdb:3GB

HA

  • ha0 : tiny CentOS 7 VM with 512 MB RAM
  • ha1 : same as ha0

Procedure

Provision the virtual machines

automated way (optional)

You may want to do this quickly. I’m using libvirt (virt-install/virsh) for that task:

Here is the process for the first VM (cm0); repeat it for each of the others.

createcm0.sh :

virt-install --name=cm0 \
  --controller type=scsi,model=virtio-scsi \
  --disk path=/home/libvirt/cm0.qcow2,device=disk,size=16,bus=virtio \
  --disk path=/home/libvirt/cm0-1.qcow2,device=disk,size=3,bus=virtio \
  --graphics spice --vcpus=2 --ram=2048 --network bridge=br0 \
  --os-type=linux --os-variant="CentOS7.0" \
  --location=http://mirror.int.intra/centos/7/os/x86_64/ \
  --extra-args="ks=http://mirror.int.intra/install/cm0"
virsh autostart cm0
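
The script above only covers cm0. One possible shortcut (a sketch, not part of the original setup) is to derive the other scripts by substituting the VM name, then adjusting vCPUs, RAM and disks to the table above; each VM also needs its own kickstart file, since hostname and IP differ:

for vm in cm1 cm2 ci0 ci1 ci2 cn0 cn1; do
  # generate a per-VM script from the cm0 one, then edit --ram/--vcpus/--disk by hand
  sed "s/cm0/$vm/g" createcm0.sh > create${vm}.sh
done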

With the following kickstart file hosted on a web server of your choice (I use http://mirror.int.intra on a Synology NAS):

# Text mode or graphical mode?
text

# Install or upgrade?
install
eula --agreed

# Language support
lang en_US

# Keyboard
keyboard fr 

# Network
network --device=eth0 --onboot yes --hostname=cm0.int.intra --noipv6 --bootproto static --ip 10.0.1.1 --netmask 255.255.0.0 --gateway 10.0.0.1 --nameserver 10.0.0.1

# Network installation path
url --url http://mirror.int.intra/centos/7/os/x86_64/

# installation path
#repo --name="Centos-Base" --baseurl=http://mirror.int.intra/centos/7/os/x86_64/ --install --cost 1 --noverifyssl
#repo --name="Centos-Updates" --baseurl=http://mirror.int.intra/centos/7/updates/x86_64/ --install --cost 1 --noverifyssl
repo --name="Puppet" --baseurl="https://yum.puppetlabs.com/el/7/products/x86_64/" --noverifyssl

# Root password - change to a real password (use "grub-md5-crypt" to get the crypted version)
rootpw mypassword

# users
#user --name=cephadm --password=<your pass in md5 format> --iscrypted --homedir=/home/cephadm

# Firewall
firewall --disabled

# Authconfig
auth       --enableshadow --passalgo sha512

# SElinux
selinux --disabled

# Timezone
timezone --utc Europe/Paris --ntpservers=10.0.0.1

# Bootloader
#bootloader --location=mbr --driveorder=vda
#zerombr

# Partition table
clearpart --drives=vda --all
autopart

# Installation logging level
logging --level=info

# skip x window
skipx

services   --enable ntpd

# Reboot after installation?
reboot 

##############################################################################
#
# packages part of the KickStart configuration file
#
##############################################################################
%packages
# Minimal package set
@Base 
@Core 
yum-utils
%end

##############################################################################
#
# post installation part of the KickStart configuration file
#
##############################################################################
%post
#
# This section describes all the post-Anaconda steps to fine-tune the installation
#

# redirect the output to the log file
exec >/root/ks-post-anaconda.log 2>&1
# show the output on the 7th console
tail -f /root/ks-post-anaconda.log >/dev/tty7 &
# changing to VT 7 that we can see what's going on....
/usr/bin/chvt 7

#
# Update the RPMs
#
yum-config-manager --add-repo http://mirror.int.intra/install/CentOS-Base.repo 
yum update -y

rpm -ivh https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
yum update -y

rpm -Uivh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
yum --enablerepo=elrepo-kernel install -y kernel-ml
yum-config-manager --disable elrepo

rpm -ivh https://yum.puppetlabs.com/puppetlabs-release-pc1-el-7.noarch.rpm
yum -y install puppet-agent

/usr/sbin/grub2-set-default 0

# puppet
yum -y install puppet
systemctl enable puppet

# Done
exit 0

%end

manual way

Just create your VMs as you usually do.

more automated way

The OpenStack way (I’ll do this later…) :)

Prepare your environment

Everything that follows is done as the root user (yes… it’s for home use only).

On the Ansible VM (admin node)

Install ansible, and Openshift playbooks

# yum install centos-release-openshift-origin
# yum install openshift-ansible-playbooks

dsh, ssh…

Configure dsh :

On your admin node only :

I use dancer shell (dsh); see another post on this blog for how to set it up.
Create three group files under ~/.dsh/group/ (a sketch of their contents follows this list):

  • oc : all VMs with their short DNS names (cm0, …), including the ha[0..1] VMs
  • ocfqdn : the same VMs, but with their fully qualified domain names (cm0.int.intra, …)
  • ocd : all VMs that will run containers (for this example, every VM except ha[0..1])
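
As an illustration, the oc group file could simply look like this (one short hostname per line, taken from the architecture above); ocfqdn holds the same hosts with their FQDN (cm0.int.intra, …) and ocd the same list minus ha0/ha1:

cm0
cm1
cm2
ci0
ci1
ci2
cn0
cn1
ha0
ha1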

SSH key authentication

Copy your root user’s public SSH key to all hosts.

Run this on your admin node.

# cat .dsh/group/oc  | while read in; do ssh-copy-id root@$in ; done 
# cat .dsh/group/ocfqdn  | while read in; do ssh-copy-id root@$in ; done 

Note that you can automate this process..

#!/bin/sh
# sudo yum install moreutils sshpass openssh-clients
echo 'Enter password:';
read -s SSHPASS;
export SSHPASS;
cat ~/.dsh/group/oc  | while read in; do sshpass -e ssh-copy-id -o StrictHostKeyChecking=no root@$in -p 22 ; done
export SSHPASS=''

Important: for Ansible to run smoothly, also accept the SSH host keys for the FQDN names:

cat ~/.dsh/group/ocfqdn  | while read in; do yes | ssh -o StrictHostKeyChecking=no root@$in -p 22 uptime ; done

Check the result:

[root@admin openshift]# dsh -Mg oc uptime
cm0:  15:46:29 up 47 min,  1 user,  load average: 0.00, 0.00, 0.00
cm1:  15:46:29 up 50 min,  0 users,  load average: 0.00, 0.00, 0.00
cm2:  15:46:29 up 40 min,  0 users,  load average: 0.00, 0.00, 0.00
ci0:  15:46:29 up  1:01,  0 users,  load average: 0.00, 0.00, 0.00
ci1:  15:46:30 up  1:05,  0 users,  load average: 0.24, 0.05, 0.02
ci2:  15:46:30 up 58 min,  0 users,  load average: 0.00, 0.00, 0.00
cn0:  15:46:30 up 19 min,  0 users,  load average: 0.00, 0.00, 0.00
cn1:  15:46:30 up 24 min,  0 users,  load average: 0.00, 0.00, 0.00

Check the same with the ocfqdn group.
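
For example, the same uptime check against the FQDN names:

# dsh -Mg ocfqdn uptime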

Docker configuration

We will configure persistent storage for Docker (an LVM thin pool on /dev/vdb):

# dsh -Mg ocd "yum install -y docker"
# dsh -Mg ocd "vgcreate vgdocker /dev/vdb"
# dsh -Mg ocd 'printf "CONTAINER_THINPOOL=docker\nVG=vgdocker\n" | sudo tee /etc/sysconfig/docker-storage-setup'
# dsh -Mg ocd "docker-storage-setup"

And check the results :

# dsh -Mg ocd "lvs"

Enable and start docker

# dsh -Mg ocd "systemctl enable docker.service"
# dsh -Mg ocd "systemctl start docker.service"
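
Optionally, docker info should now report the devicemapper driver backed by the thin pool created above; something along these lines can confirm it on all nodes:

# dsh -Mg ocd "docker info | grep -A 2 'Storage Driver'"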

Deploy Openshift with Ansible

Define your configuration

hosts file

We will describe the whole cluster in a single Ansible inventory (hosts) file.

Create a file named “c1.hosts”:

# Create an OSEv3 group that contains the master, nodes, etcd, and lb groups.
# The lb group lets Ansible configure HAProxy as the load balancing solution.
# Comment lb out if your load balancer is pre-configured.
[OSEv3:children]
masters
nodes
etcd
lb

# Set variables common for all OSEv3 hosts
[OSEv3:vars]
ansible_ssh_user=root
openshift_deployment_type=origin
#openshift_release=v3.6

# Uncomment the following to enable htpasswd authentication; defaults to
# DenyAllPasswordIdentityProvider.
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/origin/htpasswd'}]
openshift_master_htpasswd_users={'admin': '$apr1$CkPI2pl3$aNvboUNKKONazHRjLLJc3/'}

# Native high availability cluster method with optional load balancer.
# If no lb group is defined installer assumes that a load balancer has
# been preconfigured. For installation the value of
# openshift_master_cluster_hostname must resolve to the load balancer
# or to one or all of the masters defined in the inventory if no load
# balancer is present.
openshift_master_cluster_method=native
openshift_master_cluster_hostname=cm.os.int.intra
openshift_master_cluster_public_hostname=cm.os.int.intra
openshift_public_hostname=cm.os.int.intra
openshift_master_api_port=443
openshift_master_console_port=443
openshift_docker_options='--selinux-enabled --insecure-registry 172.30.0.0/16'
openshift_router_selector='region=infra'
openshift_registry_selector='region=infra'

# other config options
openshift_master_default_subdomain=apps.os.int.intra
osm_default_subdomain=apps.os.int.intra
osm_default_node_selector='region=primary'
osm_cluster_network_cidr=10.128.0.0/14
openshift_portal_net=172.30.0.0/16

# apply updated node defaults
#openshift_node_kubelet_args={'pods-per-core': ['10'], 'max-pods': ['250'], 'image-gc-high-threshold': ['90'], 'image-gc-low-threshold': ['80']}

# enable ntp on masters to ensure proper failover
openshift_clock_enabled=true

# for ours, no need for the minimum cluster requirements.. we're at home...
openshift_disable_check=memory_availability,disk_availability,docker_storage
template_service_broker_install=false

# host group for masters
[masters]
cm[0:2].os.int.intra

# host group for etcd
[etcd]
cm[0:2].os.int.intra

# Specify load balancer host
[lb]
ha0.int.intra
ha1.int.intra

# host group for nodes, includes region info
[nodes]
cm[0:2].os.int.intra openshift_node_labels="{'region': 'master'}" openshift_schedulable=False
ci[0:2].os.int.intra openshift_node_labels="{'region': 'infra', 'zone': 'default'}"
cn0.os.int.intra openshift_node_labels="{'region': 'primary', 'zone': 'east'}"
cn1.os.int.intra openshift_node_labels="{'region': 'primary', 'zone': 'west'}"    
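
Before launching the installer, it doesn’t hurt to check that Ansible can reach every host of the inventory (assuming the file is saved as c1.hosts, as above):

# ansible -i c1.hosts OSEv3 -m ping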

haproxy/keepalived

The HA hosts (ha[0..1]) need both HAProxy and keepalived.

HA Proxy

This will be done by Ansible

keepalived

On each host, install keepalived:

yum install -y keepalived
systemctl enable keepalived

On both hosts, edit /etc/keepalived/keepalived.conf so that:

  • host ha0 has “state MASTER” and ha1 has “state BACKUP”
  • the IPs in the virtual_ipaddress blocks match your load-balanced IPs for cm.os.int.intra and ci.os.int.intra

Here is the ha1 conf file :

! Configuration File for keepalived

vrrp_script chk_haproxy {
  script "killall -0 haproxy" # check the haproxy process
  interval 2 # every 2 seconds
  weight 2 # add 2 points if OK
}

vrrp_instance VI_1 {
  interface eth0 # interface to monitor
  state BACKUP # MASTER on ha0, BACKUP on ha1
  virtual_router_id 50
  priority 100 # 101 on ha0, 100 on ha1
  virtual_ipaddress {
    10.0.1.50 # virtual ip address
  }
  track_script {
    chk_haproxy
  }
}

vrrp_instance VI_2 {
  interface eth0 # interface to monitor
  state BACKUP # MASTER on ha0, BACKUP on ha1
  virtual_router_id 51
  priority 100 # 101 on ha0, 100 on ha1
  virtual_ipaddress {
    10.0.1.51 # virtual ip address
  }
  track_script {
    chk_haproxy
  }
}

Start keepalived

systemctl start keepalived
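
Whichever node currently wins the VRRP election should now hold the virtual IPs (10.0.1.50 and 10.0.1.51 in this example); a quick way to check:

[root@ha0 ~]# ip addr show eth0 | grep -E '10\.0\.1\.5[01]'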

DNS entries

I chose to define a subdomain for my OpenShift cluster, so that apps live under this namespace: *.apps.os.int.intra

Define the VMs under the subdomain os.int.intra:

$ORIGIN os.int.intra.                                     
ci0                     A       10.0.1.200                
ci1                     A       10.0.1.201                
ci2                     A       10.0.1.202                
cm0                     A       10.0.1.1                  
cm1                     A       10.0.1.2                  
cm2                     A       10.0.1.3                  
cn0                     A       10.0.1.100                              
cn1                     A       10.0.1.101  

(not the HA nodes, because we reach them from the base namespace int.intra)

Define the apps IPs with a wildcard pointing at the three infrastructure nodes:

$ORIGIN apps.os.int.intra.                                
*                       A       10.0.1.200                
*                       A       10.0.1.201                
*                       A       10.0.1.202 

LB

Under the $ORIGIN os.int.intra, add the load-balanced entries (the keepalived VIPs):

kubernetes              CNAME   cm
cm                      A       10.0.1.50
ci                      A       10.0.1.51

etcd

In the $ORIGIN _tcp.os.int.intra. section, add the etcd SRV records:

$ORIGIN _tcp.os.int.intra.                         
$TTL 300        ; 5 minutes                               
_etcd-client            SRV     0 0 2379 cm0.os.int.intra.
                        SRV     0 0 2379 cm1.os.int.intra.
                        SRV     0 0 2379 cm2.os.int.intra.
_etcd-server            SRV     0 0 2380 cm0.os.int.intra.
                        SRV     0 0 2380 cm1.os.int.intra.
                        SRV     0 0 2380 cm2.os.int.intra.
_ceph-mon 60 IN SRV 10 60 6789 n0.int.intra.              
_ceph-mon 60 IN SRV 10 60 6789 n4.int.intra.              
_ceph-mon 60 IN SRV 10 60 6789 n8.int.intra.
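
Once the zone is reloaded, a few dig queries from the admin node can confirm that everything resolves as expected (“anything” below is just an arbitrary name matching the wildcard):

# dig +short cm.os.int.intra
# dig +short anything.apps.os.int.intra
# dig +short _etcd-server._tcp.os.int.intra SRV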

Run ansible

# ansible-playbook -i c1.hosts /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml

I’ve occasionally hit installation errors (for example when deploying the routers on the infrastructure nodes, or the service templates). In those cases, simply re-running the specific playbook resolved the issue: containers are sometimes slow to come up, which can trigger these transient failures, but a simple playbook restart fixes them, so don’t worry about it.

One hour later… Voilà !!

INSTALLER STATUS *************************************************************************************************************************************************************************************
Initialization             : Complete
Health Check               : Complete
etcd Install               : Complete
Load balancer Install      : Complete
Master Install             : Complete
Master Additional Install  : Complete
Node Install               : Complete
Hosted Install             : Complete
Service Catalog Install    : Complete

Check your brand new cluster :

See original image

Set your admin password (if not already set via your hosts file):

[root@cm0 ~]# htpasswd -c /etc/origin/htpasswd admin
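
With the native HA method there are three masters, and each one reads its own local htpasswd file, so the file should be kept identical on all of them; a simple copy does the job here (a sketch, adjust hostnames to your setup):

[root@cm0 ~]# scp /etc/origin/htpasswd cm1.os.int.intra:/etc/origin/
[root@cm0 ~]# scp /etc/origin/htpasswd cm2.os.int.intra:/etc/origin/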

Then assign the cluster-admin role to the admin user:

[root@cm0 ~]# oadm policy add-cluster-role-to-user cluster-admin admin
cluster role "cluster-admin" added: "admin"

See original image

Create a persistent Ceph Volume

Install ceph on all nodes

I have a Luminous Ceph cluster, so install the Luminous Ceph client libraries on the nodes.

On the admin node, create a ceph.repo file:

[Ceph]
name=Ceph packages for $basearch
baseurl=https://download.ceph.com/rpm-luminous/el7/$basearch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
priority=2

[Ceph-noarch]
name=Ceph noarch packages
baseurl=https://download.ceph.com/rpm-luminous/el7/noarch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
priority=2

[ceph-source]
name=Ceph source packages
baseurl=https://download.ceph.com/rpm-luminous/el7/SRPMS
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
priority=2

Deploy it on all nodes

cat ceph.repo | dsh -g oc -i -c 'sudo tee /etc/yum.repos.d/ceph.repo'

Install the latest client libs

dsh -Mg oc "yum install -y ceph-common"
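
A quick check that the Luminous client landed everywhere:

# dsh -Mg oc "rpm -q ceph-common"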

Ceph cluster part of the work

Create a pool

ceph osd pool create kube 1024

It will be associated with your default CRUSH rule. For my needs, I’ve defined two pools: one on the default SATA disks (kube), and another one on an SSD pool (kubessd).

The following describes how to make such an RBD pool usable from OpenShift.

Declare the user that will have access to that pool:

ceph auth get-or-create client.kube mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=kube' -o ceph.client.kube.keyring

Get the client.kube user’s secret key in base64 format with the following command:

[root@admin openshift]# ceph auth get-key client.kube |base64
<your key>

Keep the resulting secret for the following steps

Finally, copy the keyring of the Ceph user that manages the PVs to the application nodes (cn0/cn1). At the time of writing this is required because of a Kubernetes bug when removing locks on RBD images: Kube uses a keyring in /etc/ceph instead of the secret stored in the OpenShift configuration ( :( ).

You will also give this user the capabilities needed to remove locks (the "osd blacklist" command):

[root@admin openshift]# ceph auth caps client.kube mon 'allow r, allow command "osd blacklist"' osd 'allow class-read object_prefix rbd_children, allow rwx pool=kube, allow rwx pool=kubessd' -o ceph.client.kube.keyring
updated caps for client.kube

Then export the keyring and copy it to the application nodes:

[root@admin openshift]#  ceph auth get client.kube > ceph.client.kube.keyring
exported keyring for client.kube
[root@admin openshift]# cat ceph.client.kube.keyring | dsh -m cn0.os.int.intra -i -c 'sudo tee /etc/ceph/ceph.keyring'
[client.kube]
    key = <your key>
    caps mon = "allow r"
    caps osd = "allow class-read object_prefix rbd_children, allow rwx pool=kube, allow rwx pool=kubessd"
[root@admin openshift]# cat ceph.client.kube.keyring | dsh -m cn1.os.int.intra -i -c 'sudo tee /etc/ceph/ceph.keyring'
[client.kube]
    key = <your key> 
    caps mon = "allow r"
    caps osd = "allow class-read object_prefix rbd_children, allow rwx pool=kube, allow rwx pool=kubessd"

Openshift part of the work

You will now configure storage classes and persistent volume claims (requests for storage).

This part is done by issuing commands on a master node (e.g. cm0).

Create a new project with the help of the Web Interface

For this example, create a simple “test” project

Log on to a master and select the new project:

[root@cm0 ~]# oc project test
Now using project "test" on server "https://cm.os.int.intra".

Record the ceph user secret in the Openshift cluster

Create a ceph-kube-secret.yaml file:

apiVersion: v1
kind: Secret
metadata:
  name: ceph-kube-secret
  namespace: test 
data:
  key: <your key>
type: kubernetes.io/rbd

Import it :

[root@cm0 ~]# oc create -f ceph-kube-secret.yaml 
secret "ceph-kube-secret" created
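
You can verify that the secret is recorded in the test namespace:

[root@cm0 ~]# oc get secret ceph-kube-secret -n test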

Create a storage class

First of all, create a storage class tied to a Ceph user that can create RBD images in the kube pool we have just created.
The storage class references the name of the secret holding that user’s key.

Create a storage class YAML file on the master node (cm0 in my example).
Note the monitors line: it lists the Ceph monitors.

Later it would be nicer to rely on the DNS SRV entries instead; this seems to be planned upstream.

[root@cm0 ~]# vi storageclass.yaml

Insert this config :

apiVersion: storage.k8s.io/v1beta1
kind: StorageClass
metadata:
  name: dynamic
  annotations:
    storageclass.beta.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/rbd
parameters:
  monitors: 10.0.0.210:6789,10.0.0.214:6789,10.0.0.218:6789
  adminId: kube
  adminSecretName: ceph-kube-secret
  adminSecretNamespace: test 
  pool: kube
  userId: kube
  userSecretName: ceph-kube-secret

Create the class on your OpenShift cluster:

[root@cm0 ~]# oc create -f storageclass.yaml 
storageclass "dynamic" created
[root@cm0 ~]# oc get storageclass
NAME                TYPE
dynamic (default)   kubernetes.io/rbd   

At this point, you can create persistent volume claims from the web interface by selecting the storage class we have just created, or with a YAML file as shown below.

Create a Persistent Volume

To create a 2 GiB volume for your project, describe a PV claim (PVC); the provisioner will then create the PV for you.

The following shows how to create a PVC with a YAML file, but nowadays it’s very easy to do it from the web GUI: go to the Storage item of your newly created project, select the storage class, the name of the PVC and the size. The PV will automatically be created in the previously defined Ceph pools.

pvc.yaml:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: ceph-claim
  annotations:
    volume.beta.kubernetes.io/storage-class: dynamic
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi

Create it and check the result:

[root@cm0 ~]# oc create -f pvc.yaml 
persistentvolumeclaim "ceph-claim" created
[root@cm0 ~]# oc get pvc
NAME         STATUS    VOLUME                                     CAPACITY   ACCESSMODES   STORAGECLASS   AGE
ceph-claim   Bound     pvc-1c9444af-ef33-11e7-a68e-525400946207   2Gi        RWO           dynamic        3s

The PV has been created and bound, and is now usable in your test project.
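
On the Ceph side, the provisioner should have created a matching RBD image in the kube pool; from a host holding the Ceph admin keyring you can list it:

[root@admin openshift]# rbd ls kube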

Test your persistent volume with a PostgreSQL 9.4 container

Describe the pod in a file (ceph-pgsql.yaml for example):

apiVersion: v1
kind: Pod
metadata:
  name: postgresql94
spec:
  containers:
  - name: postgresql94 
    image: centos/postgresql-94-centos7      
    command: ["sleep", "60000"]
    volumeMounts:
    - name: ceph-claim
      mountPath: /var/lib/pgsql/data 
      readOnly: false
  volumes:
  - name: ceph-claim
    persistentVolumeClaim:
      claimName: ceph-claim

Create the pod :

[root@cm0 ~]# oc create -f ceph-pgsql.yaml 
pod "postgresql94" created

Look at the test project in the web interface: you will see the pod creation process (including the PostgreSQL image download).

The following image shows the result:

See original image

Look at the following snapshot: in the terminal tab you open a shell inside the pod, and you can see an RBD device (/dev/rbd0) mounted as persistent storage, formatted in ext4 and used by the PostgreSQL pod for its data!

See original image
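
If you prefer the command line, the same check can be done with oc rsh; the output should show a /dev/rbd* device mounted on /var/lib/pgsql/data:

[root@cm0 ~]# oc rsh postgresql94 df -h /var/lib/pgsql/data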

Done !!

elastic

If you plan to run Elasticsearch on the cluster (e.g. for aggregated logging), raise vm.max_map_count on the nodes:

sysctl -w vm.max_map_count=262144
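
To make the setting persistent across reboots, one option is a drop-in sysctl file on the container nodes (a sketch; the file name below is arbitrary):

# dsh -Mg ocd 'echo "vm.max_map_count=262144" > /etc/sysctl.d/99-elasticsearch.conf && sysctl --system'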