Nuxeo CMS on Openshift origin 3.7

Abstract

My Openshift Origin 3.7 cluster hosts some great apps I use every day. One of those great apps is Nuxeo, an open source CMS : I use Nuxeo to manage (store, index, ..) my family documents and other documents I want to keep for work purposes.

This post was created for my knowledge base (as usual), but I found it interesting to publish because it was not so easy to bring this app up with all services running in a prod-like environment.

Please refer to this blog post for the requirements before the following process.

Step by step..

Create our (empty) project on Openshift

On your openshift admin console, create a new project named Nuxeo

See original image

The project is created. Click on it..

See original image
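If you prefer the command line, the same project can be created with oc (a minimal sketch ; the display name is just an example) :

oc new-project nuxeo --display-name="Nuxeo"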

Now connect to a management node, for example cm0.os.int.intra, log on openshift and select your new project:

[root@cm0 ~]# oc login
Authentication required for https://cm.os.int.intra:443 (openshift)
Username: admin
Password: 
Login successful.

You have access to the following projects and can switch between them with 'oc project <projectname>':

    default
    kube-public
    kube-service-catalog
    kube-system
    logging
    management-infra
    nuxeo
    openshift
    openshift-ansible-service-broker
    openshift-infra
    openshift-node
  * test

Using project "test".
[root@cm0 ~]# oc project nuxeo
Now using project "nuxeo" on server "https://cm.os.int.intra:443".
[root@cm0 ~]# 

Create your persistent volumes

For my needs, I use two storage classes : the first one on a standard HDD ceph pool, the second one on a fast SSD pool.

Create an Openshift secret that will be used to provision persistent volumes on a Ceph cluster :

ceph-kube-secret.yaml :

apiVersion: v1
kind: Secret
metadata:
  name: ceph-kube-secret
  namespace: kube-system
data:
  key: QVFDUEV872JlPOuYd2MDJVNkYwMFA1bi9QS1ZkZ0E9PQ==
type:
  kubernetes.io/rbd

Create it :
[root@cm0 ~]# oc create -f ceph-kube-secret.yaml

The two YAML files I’ve used are provided below (use your own mon IP addresses and variables…)

storageclass.yaml
apiVersion: storage.k8s.io/v1beta1
kind: StorageClass
metadata:
  name: dynamic
  annotations:
    storageclass.beta.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/rbd
parameters:
  monitors: 10.0.0.210:6789,10.0.0.214:6789,10.0.0.218:6789
  adminId: kube
  adminSecretName: ceph-kube-secret
  adminSecretNamespace: kube-system
  pool: kube
  userId: kube
  userSecretName: ceph-kube-secret

storageclass-ssd.yaml
apiVersion: storage.k8s.io/v1beta1
kind: StorageClass
metadata:
  name: dynamic-ssd
  annotations:
    storageclass.beta.kubernetes.io/is-default-class: "false"
provisioner: kubernetes.io/rbd
parameters:
  monitors: 10.0.0.210:6789,10.0.0.214:6789,10.0.0.218:6789
  adminId: kube
  adminSecretName: ceph-kube-secret
  adminSecretNamespace: kube-system
  pool: kubessd
  userId: kube
  userSecretName: ceph-kube-secret

create the two classes :
[root@cm0 ~]# oc create -f storageclass.yaml
[root@cm0 ~]# oc create -f storageclass-ssd.yaml
[root@cm0 ~]# oc get storageclass
NAME                TYPE
dynamic (default)   kubernetes.io/rbd
dynamic-ssd         kubernetes.io/rbd

Finally, create a secret for your nuxeo project that will be used to mount the persistent volumes :

ceph-kube-secret-nuxeo.yaml

apiVersion: v1
kind: Secret
metadata:
  name: ceph-kube-secret
  namespace: nuxeo
data:
  key: QVFDUEV872JlPOuYd2MDJVNkYwMFA1bi9QS1ZkZ0E9PQ==
type:
  kubernetes.io/rbd

And import it

[root@cm0 ~]# oc create -f ceph-kube-secret-nuxeo.yaml
secret "ceph-kube-secret" created

Create an elasticsearch container

For this part of the work, you will need a recent Elasticsearch docker image. Elastic.co supports docker and provides up-to-date images.

But you might want to customize it for your environment in order to allow Nuxeo to use Elasticsearch.

Fortunately, openshift provides the toolbox to make it possible, including a smart embedded docker registry.

We will use a master node to do this work. Still on cm0.os.int.intra, first download a recent docker image :

[root@cm0 ~]#  docker pull docker.elastic.co/elasticsearch/elasticsearch:5.6.5
Trying to pull repository docker.elastic.co/elasticsearch/elasticsearch ... 
5.6.5: Pulling from docker.elastic.co/elasticsearch/elasticsearch

85432449fd0f: Downloading [================>                                  ]  24.3 MB/73.43 MB
e0cbfc3faa3e: Downloading [================>                                  ] 23.21 MB/68.79 MB
d2c62af14bd0: Download complete 
429fdbedf602: Downloading [===============================>                   ] 21.35 MB/33.82 MB
c8d14fb24468: Waiting 
8c7507908328: Waiting 
073ffc56cfbb: Waiting 
7e46ec36fb55: Waiting 
51f1e02ed125: Waiting 
eac8f57db5d3: Waiting 

Second : configure Openshift to allow fixed UIDs inside containers (by default, Openshift assigns random UIDs) :

[root@cm0 ~]# oc adm policy add-scc-to-group anyuid system:authenticated
scc "anyuid" added to groups: ["system:authenticated"]

Then, for Nuxeo's needs, I will customize this Elasticsearch image with a small Dockerfile.

I will use a very simple Elasticsearch config file (elasticsearch.yml)
(the following shows only the uncommented lines) :

[root@cm0 5.6.5]# grep -v '^ *#' elasticsearch.yml 
cluster.name: nuxeo
node.name: ec01 
node.attr.rack: r1
network.host: 0.0.0.0
thread_pool.bulk.queue_size: 500

Put your elasticsearch.yml config file in a working directory, together with the following (very simple) Dockerfile :

[root@cm0 5.6.5]# cat Dockerfile
FROM docker.elastic.co/elasticsearch/elasticsearch:5.6.5
USER root
COPY elasticsearch.yml /usr/share/elasticsearch/config/
RUN chown elasticsearch:elasticsearch /usr/share/elasticsearch/config/elasticsearch.yml
USER elasticsearch

Build the custom container:

[root@cm0 5.6.5]# docker build --tag=elasticsearch-5.6.5-custom  .
Sending build context to Docker daemon 12.29 kB
Step 1 : FROM docker.elastic.co/elasticsearch/elasticsearch:5.6.5
 ---> 6ffc97b39054
Step 2 : USER root
 ---> Running in c64a8f01836c
 ---> de907f5dd68c
Removing intermediate container c64a8f01836c
Step 3 : COPY elasticsearch.yml /usr/share/elasticsearch/config/
 ---> 839eb882c112
Removing intermediate container 7a7d2de751f6
Step 4 : RUN chown elasticsearch:elasticsearch /usr/share/elasticsearch/config/elasticsearch.yml
 ---> Running in 00a7333100c9
 ---> 3f47b5f294d9
Removing intermediate container 00a7333100c9
Step 5 : USER elasticsearch
 ---> Running in 002fe9d51ee8
 ---> 5d828e985436
Removing intermediate container 002fe9d51ee8
Successfully built 5d828e985436

The following command shows the result :

[root@cm0 5.6.5]# docker images
REPOSITORY                                      TAG                 IMAGE ID            CREATED             SIZE
elasticsearch-5.6.5-custom                      latest              78b0148a9731        4 seconds ago       555.1 MB
docker.io/openshift/origin-service-catalog      latest              88397ed6c5eb        8 days ago          283.7 MB
docker.elastic.co/elasticsearch/elasticsearch   5.6.5               6ffc97b39054        5 weeks ago         555.1 MB
docker.io/openshift/origin-pod                  v3.7.0              73b7557fbb3a        6 weeks ago         218.4 MB
[root@cm0 5.6.5]# 

Then you will upload the custom image to the Openshift embedded registry :

register your elastic custom container

To do this part of the work, you will have to log in to your registry.

Connect to the registry console :

This can be done with a URL provided by Openshift. The route is available in the default project :

See original image

Use the registry console URL, and connect using your cluster admin creds.

On the overview pane, Openshift shows you the two commands to run on your master node (docker login … and oc login --token …). Run these two commands on your master node, and you should be connected.
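For reference, those two commands typically look like the following (a sketch ; the registry hostname is the default route of my cluster, adapt it to yours) :

oc login https://cm.os.int.intra:443 -u admin
docker login -u admin -p $(oc whoami -t) docker-registry-default.apps.os.int.intra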

tag your image

[root@cm0 5.6.5]# docker tag elasticsearch-5.6.5-custom docker-registry-default.apps.os.int.intra/nuxeo/elasticsearch-5.6.5-custom

this command should return nothing.

push the custom elastic image to your registry
[root@cm0 5.6.5]# docker push docker-registry-default.apps.os.int.intra/nuxeo/elasticsearch-5.6.5-custom
The push refers to a repository [docker-registry-default.apps.os.int.intra/nuxeo/elasticsearch-5.6.5-custom]
432ec0861ba3: Mounted from test/elasticsearch-5.6.5-custom 
a459bbc01c5d: Mounted from test/elasticsearch-5.6.5-custom 
6573437ea578: Mounted from test/elasticsearch-5.6.5-custom 
f6db30a6d5da: Mounted from test/elasticsearch-5.6.5-custom 
b0ec2cfecedf: Mounted from test/elasticsearch-5.6.5-custom 
9f3ec9794aef: Mounted from test/elasticsearch-5.6.5-custom 
ed586282be90: Mounted from test/elasticsearch-5.6.5-custom 
274df55c1a7e: Mounted from test/elasticsearch-5.6.5-custom 
99e22142836e: Mounted from test/elasticsearch-5.6.5-custom 
620127729ace: Mounted from test/elasticsearch-5.6.5-custom 
d1be66a59bc5: Mounted from test/elasticsearch-5.6.5-custom 
latest: digest: sha256:8cae781492dc00412875cf5e74ecefeb10699cb6716bdb7189eec3829bb4097a size: 2617
[root@cm0 5.6.5]# 

Check the results on the registry console web ui :

See original image

Deploy your custom elastic container

Connect to the cluster admin console, select the nuxeo project and choose add to project/deploy image :

See original image

You will see that your custom container is available in the image streams :

See original image

No additional config is required -> choose “deploy” (you will have to provide a name without special characters)

See original image

Voila !
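If you prefer the CLI, a roughly equivalent deployment can be created from the image stream (a sketch ; "elasticsearch" is simply the name I give to the deployment) :

oc new-app -i elasticsearch-5.6.5-custom --name=elasticsearch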

But at this point, your new container uses non-persistent storage : if you stop the container, all the data is lost.

provide elasticsearch with a persistent ceph volume

In the admin console, go to the storage tab and click “create storage”
You will be asked to choose one of the storage classes you created before, and a PV size..

See original image

Click “create” and the PV should be created …

See original image
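The same PV claim can also be created from the command line ; here is a minimal sketch (the claim name, storage class and size are examples, pick your own) :

oc create -f - <<EOF
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: elasticsearch-data
  annotations:
    volume.beta.kubernetes.io/storage-class: dynamic-ssd
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
EOF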

Now add a persistent volume to your elastic deployment : go to the deployment tab, select elasticsearch and, in the configuration pane, select "add storage".

See original image

Fill in the form as follows :

See original image

Click OK, then check the deployment config ; it should be fine.
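The equivalent CLI command, if you prefer it, is oc set volume (a sketch ; the claim name matches the PVC created above and /usr/share/elasticsearch/data is the usual data path of the official image) :

oc set volume dc/elasticsearch --add --name=elasticsearch-storage \
  --type=persistentVolumeClaim --claim-name=elasticsearch-data \
  --mount-path=/usr/share/elasticsearch/data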

stop and restart the pod.

You will see back-off restarts. This is due to the security context of this pod : remember that we allowed containers to run with a fixed user id (the anyuid SCC).

As a consequence, the RBD persistent volume is not mounted with the matching user and group id, so the elasticsearch user cannot write to it.

We can adjust that at this point : on the master node, export the deployment config to a yaml file so we can edit it.

First, export the deployment config on the master :

[root@cm0 5.6.5]# oc get dc -o yaml > elasticdc.yaml

Then go to the deployment tab, click on the elasticsearch deployment and then on action/delete (button on the top/right side of the web ui)

edit the file elasticdc.yaml :

replace

securityContext: {}

by :

securityContext:
  runAsUser: 1000
  fsGroup: 1000

This will force the ceph rbd device to mount with the right permissions
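For reference, the same securityContext change could also be applied in place with oc patch, without deleting and recreating the deployment (a sketch) :

oc patch dc/elasticsearch -p '{"spec":{"template":{"spec":{"securityContext":{"runAsUser":1000,"fsGroup":1000}}}}}'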

Import the deployment :

[root@cm0 5.6.5]# oc create -f elasticdc.yaml 
deploymentconfig "elasticsearch" created

Then you will see the container start and mount the PV. Check the container :
(the default password for elastic applied by X-Pack is changeme ; we have not overridden it yet)

[root@cm0 5.6.5]# curl -u elastic -XGET 'elasticsearch.nuxeo.svc:9200/?pretty'
Enter host password for user 'elastic':
{
  "name" : "ec01",
  "cluster_name" : "nuxeo",
  "cluster_uuid" : "Lq0QyhqzSNybgOLhWHFkLA",
  "version" : {
    "number" : "5.6.5",
    "build_hash" : "6a37571",
    "build_date" : "2017-12-04T07:50:10.466Z",
    "build_snapshot" : false,
    "lucene_version" : "6.6.1"
  },
  "tagline" : "You Know, for Search"
}
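To get rid of the default changeme password, the X-Pack security API can change it ; something along these lines should work on 5.x (newpassword is a placeholder, adapt it) :

# change the password of the built-in "elastic" user
curl -u elastic:changeme -H 'Content-Type: application/json' \
  -XPUT 'elasticsearch.nuxeo.svc:9200/_xpack/security/user/elastic/_password' \
  -d '{"password": "newpassword"}'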

Create a postgresql container

Once again, go to the console and add an image to the project : search for "centos/postgresql-94-centos7" in the "image name" field and click on the magnifier icon to search for the image

add the following variables :

See original image

And deploy. It will work.
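For reference, the same deployment can be created from the CLI with oc new-app ; the environment variables below are the standard ones for this image, and the values are just examples :

oc new-app centos/postgresql-94-centos7 \
  -e POSTGRESQL_USER=nuxeo \
  -e POSTGRESQL_PASSWORD=changeme \
  -e POSTGRESQL_DATABASE=nuxeo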

Once again, we will want to add persistent storage : define a PV :

See original image

And scale down to 0 pod the postgresql deployment :

See original image
See original image

Remove the non persistent volume : click on “remove” in the volume area :

See original image

And mount the new storage :

See original image
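These GUI steps also have CLI equivalents ; a sketch (list the volumes first to get the name of the auto-generated emptyDir volume, and use your own PVC name) :

oc scale dc/postgresql-94-centos7 --replicas=0
oc set volume dc/postgresql-94-centos7            # list the current volumes
oc set volume dc/postgresql-94-centos7 --remove --name=<emptydir-volume-name>
oc set volume dc/postgresql-94-centos7 --add --name=postgresql-data \
  --type=persistentVolumeClaim --claim-name=postgresql-data \
  --mount-path=/var/lib/pgsql/data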

as above, export the deployment config and adjust the UID and GROUP so that the RBD device will be mounted with the correct permissions.

replace the following

securityContext: {}

by

securityContext:
  runAsUser: 1000
  fsGroup: 1000

(1000 for ex)

then destroy the deployment…

and recreate it :

[root@cm0 postgresql]# oc create -f postgresql-94-centos7.yaml 
deploymentconfig "postgresql-94-centos7" created

Then resume the rollouts (Actions button),

and scale the replicas back to 1 (it was scaled to 0 when we added the PV).
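The same two steps from the CLI, for reference (a sketch) :

oc rollout resume dc/postgresql-94-centos7
oc scale dc/postgresql-94-centos7 --replicas=1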

The pod will start a container, and PostgreSQL will be up and running.

Good.

High Availability Openshift Origin cluster with Ceph persistent storage

Abstract

In this post, we'll install a high availability microservices cluster based on Openshift Origin 3.7, CentOS 7.4, Ceph persistent storage and HAProxy.

Openshift provides docker, kubernetes and other great open source projects like haproxy, for example. It's a very complex system, and for this kind of project you can't install it without a configuration management tool : Ansible.

This tutorial is very straightforward, thanks to Ansible.

Architecture

See original image

Pre requisites

  • 3 Physical hosts (optional, for real HA)
  • An admin VM, CentOS 7
  • 8 VMs CentOS 7.0 x86_64 “minimal install” :
  • +2 VMs for high availability (haproxy / keepalived)

You will have to provision the following VMs (assuming your admin node exists already):

Physical host : hyp03 (Intel Nuc 16GB RAM)

  • cm0 : master node : 2GB RAM, vda:16GB, vdb:3GB
  • ci0 : infrastructure node : 2GB RAM, vda:16GB

Physical host : hyp01 (X10SDV 8 cores 64GB RAM)

  • cm1 : master node : 2GB RAM, vda:16GB, vdb:3GB
  • ci1 : infrastructure node : 2GB RAM
  • cn0 : application node : 8GB RAM, vda:16GB, vdb:3GB

Physical host : hyp02 (X10SDV 4 cores 32GB RAM)

  • cm2 : master node : 2GB RAM, vda:16GB, vdb:3GB
  • ci2 : infrastructure node : 2GB RAM, vda:16GB
  • cn1 : application node : 8GB RAM, vda:16GB, vdb:3GB

HA

  • ha0 : tiny CentOS 7 VM with 512 MB RAM
  • ha1 : idem

Procedure

Provision the virtual machines

automated way (optional)

You may want to do this very fast. I'm using virt-install and virsh for that task :

For example, for the first VM (cm0), follow this process (and repeat for the other VMs) :

createcm0.sh :

virt-install --name=cm0 --controller type=scsi,model=virtio-scsi --disk path=/home/libvirt/cm0.qcow2,device=disk,size=16,bus=virtio --disk path=/home/libvirt/cm0-1.qcow2,device=disk,size=3,bus=virtio --graphics spice --vcpus=2 --ram=2048 --network bridge=br0 --os-type=linux --os-variant="CentOS7.0" --location=http://mirror.int.intra/centos/7/os/x86_64/ --extra-args="ks=http://mirror.int.intra/install/cm0" 
virsh autostart cm0

With that kickstart file hosted on a web server of your choice (I use http://mirror.int.intra on a synology NAS) :

# Text mode or graphical mode?
text

# Install or upgrade?
install
eula --agreed

# Language support
lang en_US

# Keyboard
keyboard fr 

# Network
network --device=eth0 --onboot yes --hostname=cm0.int.intra --noipv6 --bootproto static --ip 10.0.1.1 --netmask 255.255.0.0 --gateway 10.0.0.1 --nameserver 10.0.0.1

# Network installation path
url --url http://mirror.int.intra/centos/7/os/x86_64/

# installation path
#repo --name="Centos-Base" --baseurl=http://mirror.int.intra/centos/7/os/x86_64/ --install --cost 1 --noverifyssl
#repo --name="Centos-Updates" --baseurl=http://mirror.int.intra/centos/7/updates/x86_64/ --install --cost 1 --noverifyssl
repo --name="Puppet" --baseurl="https://yum.puppetlabs.com/el/7/products/x86_64/" --noverifyssl

# Root password - change to a real password (use "grub-md5-crypt" to get the crypted version)
rootpw mypassword

# users
#user --name=cephadm --password=<your pass in md5 format> --iscrypted --homedir=/home/cephadm

# Firewall
firewall --disabled

# Authconfig
auth       --enableshadow --passalgo sha512

# SElinux
selinux --disabled

# Timezone
timezone --utc Europe/Paris --ntpservers=10.0.0.1

# Bootloader
#bootloader --location=mbr --driveorder=vda
#zerombr

# Partition table
clearpart --drives=vda --all
autopart

# Installation logging level
logging --level=info

# skip x window
skipx

services   --enable ntpd

# Reboot after installation?
reboot 

##############################################################################
#
# packages part of the KickStart configuration file
#
##############################################################################
%packages
#
# Use package groups from "Software Development Workstation (CERN Recommended Setup)"
# as defined in /afs/cern.ch/project/linux/cern/slc6X/i386/build/product.img/installclasses/slc.py
#
@Base 
@Core 
yum-utils
%end

##############################################################################
#
# post installation part of the KickStart configuration file
#
##############################################################################
%post
#
# This section describes all the post-Anaconda steps to fine-tune the installation
#

# redirect the output to the log file
exec >/root/ks-post-anaconda.log 2>&1
# show the output on the 7th console
tail -f /root/ks-post-anaconda.log >/dev/tty7 &
# changing to VT 7 that we can see what's going on....
/usr/bin/chvt 7

#
# Update the RPMs
#
yum-config-manager --add-repo http://mirror.int.intra/install/CentOS-Base.repo 
yum update -y

rpm -ivh https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
yum update -y

rpm -Uivh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
yum --enablerepo=elrepo-kernel install -y kernel-ml
yum-config-manager --disable elrepo

rpm -ivh https://yum.puppetlabs.com/puppetlabs-release-pc1-el-7.noarch.rpm
yum -y install puppet-agent

/usr/sbin/grub2-set-default 0

# puppet
yum -y install puppet
systemctl enable puppet

# Done
exit 0

%end

manual way

Just create your VMs as you usually do..

more automated ..

Openstack way (I'll do this later…) :)

Prepare your environment

All the following is done with the root user (yes, .. it’s for home use only..)

On the Ansible VM (admin node)

Install ansible, and Openshift playbooks

# yum install centos-release-openshift-origin
# yum install openshift-ansible-playbooks

dsh, ssh…

Configure dsh :

On your admin node only :

I use dancer shell, you will see how to do that in another post (same blog)
Create three group files under .dsh/group/ :

* oc : all VMs with their relative DNS names (cm0..), including the ha[0..1] VMs.
* ocfqdn : all VMs as above but with their full qualified domain name (cm0.int.intra...)
* ocd : all VMs you want to run containers (depending of your choice. for this example, we will use all VMs but the ha[0..1]) 
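For example, my oc group file simply lists the short names of all the VMs described above (ocfqdn holds the same list with the full domain names, and ocd drops the two ha hosts) :

[root@admin ~]# cat ~/.dsh/group/oc
cm0
cm1
cm2
ci0
ci1
ci2
cn0
cn1
ha0
ha1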

ssh certificate authentication

On all hosts, install your root user's ssh public key

Run this on your admin node.

# cat .dsh/group/oc  | while read in; do ssh-copy-id root@$in ; done 
# cat .dsh/group/ocfqdn  | while read in; do ssh-copy-id root@$in ; done 

Note that you can automate this process..

#!/bin/sh
# sudo yum install moreutils sshpass openssh-clients
echo 'Enter password:';
read -s SSHPASS;
export SSHPASS;
cat ~/.dsh/group/oc  | while read in; do sshpass -e ssh-copy-id -o StrictHostKeyChecking=no root@$in -p 22 ; done
export SSHPASS=''

Important : for Ansible to run well, accept the ssh host keys for the hosts in FQDN format

cat ~/.dsh/group/ocfqdn  | while read in; do yes | ssh -o StrictHostKeyChecking=no root@$in -p 22 uptime ; done

Check the result:

[root@admin openshift]# dsh -Mg oc uptime
cm0:  15:46:29 up 47 min,  1 user,  load average: 0.00, 0.00, 0.00
cm1:  15:46:29 up 50 min,  0 users,  load average: 0.00, 0.00, 0.00
cm2:  15:46:29 up 40 min,  0 users,  load average: 0.00, 0.00, 0.00
ci0:  15:46:29 up  1:01,  0 users,  load average: 0.00, 0.00, 0.00
ci1:  15:46:30 up  1:05,  0 users,  load average: 0.24, 0.05, 0.02
ci2:  15:46:30 up 58 min,  0 users,  load average: 0.00, 0.00, 0.00
cn0:  15:46:30 up 19 min,  0 users,  load average: 0.00, 0.00, 0.00
cn1:  15:46:30 up 24 min,  0 users,  load average: 0.00, 0.00, 0.00

check the same with the ocfqdn group

Docker configuration

We will configure the persistent storage for docker

# dsh -Mg ocd "yum install -y docker"
# dsh -Mg ocd "vgcreate vgdocker /dev/vdb"
# dsh -Mg ocd 'printf "CONTAINER_THINPOOL=docker\nVG=vgdocker\n" | sudo tee /etc/sysconfig/docker-storage-setup'
# dsh -Mg ocd "docker-storage-setup"

And check the results :

# dsh -Mg ocd "lvs"

Enable and start docker

# dsh -Mg ocd "systemctl enable docker.service"
# dsh -Mg ocd "systemctl start docker.service"

Deploy Openshift with Ansible

Define your configuration

hosts file

We will define our cluster in a single configuration file (hosts file) for Ansible

Create a file named “c1.hosts”

# Create an OSEv3 group that contains the master, nodes, etcd, and lb groups.
# The lb group lets Ansible configure HAProxy as the load balancing solution.
# Comment lb out if your load balancer is pre-configured.
[OSEv3:children]
masters
nodes
etcd
lb

# Set variables common for all OSEv3 hosts
[OSEv3:vars]
ansible_ssh_user=root
openshift_deployment_type=origin
#openshift_release=v3.6

# Uncomment the following to enable htpasswd authentication; defaults to
# DenyAllPasswordIdentityProvider.
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/origin/htpasswd'}]
openshift_master_htpasswd_users={'admin': '$apr1$CkPI2pl3$aNvboUNKKONazHRjLLJc3/'}

# Native high availability cluster method with optional load balancer.
# If no lb group is defined installer assumes that a load balancer has
# been preconfigured. For installation the value of
# openshift_master_cluster_hostname must resolve to the load balancer
# or to one or all of the masters defined in the inventory if no load
# balancer is present.
openshift_master_cluster_method=native
openshift_master_cluster_hostname=cm.os.int.intra
openshift_master_cluster_public_hostname=cm.os.int.intra
openshift_public_hostname=cm.os.int.intra
openshift_master_api_port=443
openshift_master_console_port=443
openshift_docker_options='--selinux-enabled --insecure-registry 172.30.0.0/16'
openshift_router_selector='region=infra'
openshift_registry_selector='region=infra'

# other config options
openshift_master_default_subdomain=apps.os.int.intra
osm_default_subdomain=apps.os.int.intra
osm_default_node_selector='region=primary'
osm_cluster_network_cidr=10.128.0.0/14
openshift_portal_net=172.30.0.0/16

# apply updated node defaults
#openshift_node_kubelet_args={'pods-per-core': ['10'], 'max-pods': ['250'], 'image-gc-high-threshold': ['90'], 'image-gc-low-threshold': ['80']}

# enable ntp on masters to ensure proper failover
openshift_clock_enabled=true

# for ours, no need for the minimum cluster requirements.. we're at home...
openshift_disable_check=memory_availability,disk_availability,docker_storage
template_service_broker_install=false

# host group for masters
[masters]
cm[0:2].os.int.intra

# host group for etcd
[etcd]
cm[0:2].os.int.intra

# Specify load balancer host
[lb]
ha0.int.intra
ha1.int.intra

# host group for nodes, includes region info
[nodes]
cm[0:2].os.int.intra openshift_node_labels="{'region': 'master'}" openshift_schedulable=False
ci[0:2].os.int.intra openshift_node_labels="{'region': 'infra', 'zone': 'default'}"
cn0.os.int.intra openshift_node_labels="{'region': 'primary', 'zone': 'east'}"
cn1.os.int.intra openshift_node_labels="{'region': 'primary', 'zone': 'west'}"    

haproxy/keepalived

You must install haproxy and keepalived on the HA hosts (ha[0..1])

HA Proxy

This will be done by Ansible

keepalived

On each hosts, install keepalived :

yum install -y keepalived
systemctl enable keepalived

Edit on the two hosts /etc/keepalived/keepalived.conf so that

  • on the host ha0 you have “state MASTER” and on ha1 you have “state BACKUP”
  • the IPs match your load balanced ips for cm.os.int.intra and ci.os.int.intra (virtual_ipaddress field)

Here is the ha1 conf file :

! Configuration File for keepalived

vrrp_script chk_haproxy {
  script "killall -0 haproxy" # check the haproxy process
  interval 2 # every 2 seconds
  weight 2 # add 2 points if OK
}

vrrp_instance VI_1 {
  interface eth0 # interface to monitor
  state BACKUP # MASTER on haproxy1, BACKUP on haproxy2
  virtual_router_id 50
  priority 100 # 101 on haproxy1, 100 on haproxy2
  virtual_ipaddress {
    10.0.1.50 # virtual ip address
  }
  track_script {
    chk_haproxy
  }
}

vrrp_instance VI_2 {
  interface eth0 # interface to monitor
  state BACKUP # MASTER on haproxy1, BACKUP on haproxy2
  virtual_router_id 51
  priority 100 # 101 on haproxy1, 100 on haproxy2
  virtual_ipaddress {
    10.0.1.51 # virtual ip address
  }
  track_script {
    chk_haproxy
  }
}

Start keepalived

systemctl start keepalived
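You can quickly check that the MASTER node holds the virtual IPs (10.0.1.50 and 10.0.1.51 as configured above) :

# run on ha0 : both VIPs should be attached to eth0
ip -4 addr show eth0 | grep -E '10\.0\.1\.5[01]'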

DNS entries

My choice is to define a subdomain for my openshift cluster so that apps will reside in this namespace : *.apps.os.int.intra

Define the VMs under the subdomain os.int.intra

$ORIGIN os.int.intra.                                     
ci0                     A       10.0.1.200                
ci1                     A       10.0.1.201                
ci2                     A       10.0.1.202                
cm0                     A       10.0.1.1                  
cm1                     A       10.0.1.2                  
cm2                     A       10.0.1.3                  
cn0                     A       10.0.1.100                              
cn1                     A       10.0.1.101  

(not the ha nodes, because we have to reach them from the base domain int.intra)

Define the apps IPs, with the help of a wildcard on the three infrastructure nodes :

$ORIGIN apps.os.int.intra.                                
*                       A       10.0.1.200                
*                       A       10.0.1.201                
*                       A       10.0.1.202 

LB

under the $ORIGIN os.int.intra :

kubernetes              CNAME   cm
cm                      A       10.0.1.50
ci                      A       10.0.1.51

etcd

in the $ORIGIN _tcp.os.int.intra. subsection:

$ORIGIN _tcp.os.int.intra.                         
$TTL 300        ; 5 minutes                               
_etcd-client            SRV     0 0 2379 cm0.os.int.intra.
                        SRV     0 0 2379 cm1.os.int.intra.
                        SRV     0 0 2379 cm2.os.int.intra.
_etcd-server            SRV     0 0 2380 cm0.os.int.intra.
                        SRV     0 0 2380 cm1.os.int.intra.
                        SRV     0 0 2380 cm2.os.int.intra.
_ceph-mon 60 IN SRV 10 60 6789 n0.int.intra.              
_ceph-mon 60 IN SRV 10 60 6789 n4.int.intra.              
_ceph-mon 60 IN SRV 10 60 6789 n8.int.intra.

Run ansible

# ansible-playbook -i hosts /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml

I have sometimes hit installation errors (for example when deploying routers on infrastructure nodes, or the service template..). In that case, simply restarting the specific playbook resolved the issue. Sometimes containers are not fast enough to come up, so these issues can occur… they can be fixed with a simple playbook restart, so don't worry about that.

1 hour later… Voila !!

INSTALLER STATUS *************************************************************************************************************************************************************************************
Initialization             : Complete
Health Check               : Complete
etcd Install               : Complete
Load balancer Install      : Complete
Master Install             : Complete
Master Additional Install  : Complete
Node Install               : Complete
Hosted Install             : Complete
Service Catalog Install    : Complete

Check your brand new cluster :

See original image

Set your admin pass (if not set in your hosts file..)

[root@cm0 ~]# htpasswd -c /etc/origin/htpasswd admin

and assign the admin role

[root@cm0 ~]# oadm policy add-cluster-role-to-user cluster-admin admin
cluster role "cluster-admin" added: "admin"

See original image

Create a persistent Ceph Volume

Install ceph on all nodes

I have a Luminous ceph cluster -> install the Luminous ceph client libraries on the nodes

On the admin node, create a ceph.repo file

[Ceph]
name=Ceph packages for $basearch
baseurl=https://download.ceph.com/rpm-luminous/el7/$basearch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
priority=2

[Ceph-noarch]
name=Ceph noarch packages
baseurl=https://download.ceph.com/rpm-luminous/el7/noarch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
priority=2

[ceph-source]
name=Ceph source packages
baseurl=https://download.ceph.com/rpm-luminous/el7/SRPMS
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
priority=2

Deploy it on all nodes

cat ceph.repo | dsh -g oc -i -c 'sudo tee /etc/yum.repos.d/ceph.repo'

Install the latest client libs

dsh -Mg oc "yum install -y ceph-common"

Ceph cluster part of the work

Create a pool

ceph osd pool create kube 1024

It will be associated with your default CRUSH rule. For my needs, I've defined two pools : one on a default SATA pool of disks, another one on an SSD pool.
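For example, the SSD pool used later in this post ("kubessd") can be created the same way ; a sketch, assuming a CRUSH rule named ssd already exists in your cluster :

ceph osd pool create kubessd 128 128 replicated ssd
# on Luminous, tag the pools for RBD usage
ceph osd pool application enable kube rbd
ceph osd pool application enable kubessd rbd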

The following describes the creation of an RBD-capable pool usable for Openshift

Declare the user which will have access to that pool :

ceph auth get-or-create client.kube mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=kube' -o ceph.client.kube.keyring

Get the client.kube user secret key in a base64 format by issuing the following command :

[root@admin openshift]# ceph auth get-key client.kube |base64
<your key>

Keep the resulting secret for the following steps

Finally, copy the keyring of the ceph user that can manage PVs to the worker nodes. This is required at the time of writing because of a bug in kubernetes when removing locks on rbd images : Kube uses a keyring in /etc/ceph rather than the secret stored in the openshift configuration ( :( )

You will give this user some capabilities to remove locks :

[root@admin openshift]# ceph auth caps client.kube mon 'allow r, allow command "osd blacklist"' osd 'allow class-read object_prefix rbd_children, allow rwx pool=kube, allow rwx pool=kubessd' -o ceph.client.kube.keyring
updated caps for client.kube

then export the key ..

[root@admin openshift]#  ceph auth get client.kube > ceph.client.kube.keyring
exported keyring for client.kube
[root@admin openshift]# cat ceph.client.kube.keyring | dsh -m cn0.os.int.intra -i -c 'sudo tee /etc/ceph/ceph.keyring'
[client.kube]
    key = <your key>
    caps mon = "allow r"
    caps osd = "allow class-read object_prefix rbd_children, allow rwx pool=kube, allow rwx pool=kubessd"
[root@admin openshift]# cat ceph.client.kube.keyring | dsh -m cn1.os.int.intra -i -c 'sudo tee /etc/ceph/ceph.keyring'
[client.kube]
    key = <your key> 
    caps mon = "allow r"
    caps osd = "allow class-read object_prefix rbd_children, allow rwx pool=kube, allow rwx pool=kubessd"

Openshift part of the work

You will configure storage classes and persistent volume claims (requests for storage).

This part is done by issuing commands on a master node (for ex cm0)

Create a new project with the help of the Web Interface

For this example, create a simple “test” project

log on to a master and select the new project

[root@cm0 ~]# oc project test
Now using project "test" on server "https://cm.os.int.intra".

Record the ceph user secret in the Openshift cluster

Create a ceph-user-secret.yaml file :

apiVersion: v1
kind: Secret
metadata:
  name: ceph-kube-secret
  namespace: test 
data:
  key: <your key>
type:
  kubernetes.io/rbd

Import it :

[root@cm0 ~]# oc create -f ceph-kube-secret.yaml 
secret "ceph-kube-secret" created

Create a storage class

First of all, create a storage class with a ceph user that can create RBDs in the kube pool we have just created.
The storage class references the admin and user secrets of this user.

create a storage class yaml file on the master node (cm0 in my example)
Note the monitors line : it lists the ceph monitors.

Later, it would be nice to point to DNS SRV entries instead ; this seems to be planned upstream..

[root@cm0 ~]# vi storageclass.yaml

Insert this config :

apiVersion: storage.k8s.io/v1beta1
kind: StorageClass
metadata:
  name: dynamic
  annotations:
    storageclass.beta.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/rbd
parameters:
  monitors: 10.0.0.210:6789,10.0.0.214:6789,10.0.0.218:6789
  adminId: kube
  adminSecretName: ceph-kube-secret
  adminSecretNamespace: test 
  pool: kube
  userId: kube
  userSecretName: ceph-kube-secret

create the class on your openshift cluster :

[root@cm0 ~]# oc create -f storageclass.yaml 
storageclass "dynamic" created
[root@cm0 ~]# oc get storageclass
NAME                TYPE
dynamic (default)   kubernetes.io/rbd   

At this point, you can create persistent volumes with the web interface by selecting the storage class we have just created. You can also create a PV claim with a YAML file.

create a Persistent Volume

To create a 2GB volume for your project, you will have to describe a PV claim that will create the PV.

The following shows how to create a PVC with a yaml file, but it's also very easy to create one with the Web GUI -> try it : go to the storage item in your newly created project, select the storage class, the name of the PV(C), and the size. It will automatically create the PV in the previously defined ceph pools.

pvc.yaml:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: ceph-claim
  annotations:
    volume.beta.kubernetes.io/storage-class: dynamic
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi

create it and check the result :

[root@cm0 ~]# oc create -f pvc.yaml 
persistentvolumeclaim "ceph-claim" created
[root@cm0 ~]# oc get pvc
NAME         STATUS    VOLUME                                     CAPACITY   ACCESSMODES   STORAGECLASS   AGE
ceph-claim   Bound     pvc-1c9444af-ef33-11e7-a68e-525400946207   2Gi        RWO           dynamic        3s

the PV has been created (bound), and is usable for your test project

Test your persistent volume with a postgresql 9.4 container

Describe the pod in a file (ceph-pgsql.yaml for example) :

apiVersion: v1
kind: Pod
metadata:
  name: postgresql94
spec:
  containers:
  - name: postgresql94 
    image: centos/postgresql-94-centos7      
    command: ["sleep", "60000"]
    volumeMounts:
    - name: ceph-claim
      mountPath: /var/lib/pgsql/data 
      readOnly: false
  volumes:
  - name: ceph-claim
    persistentVolumeClaim:
      claimName: ceph-claim

Create the pod :

[root@cm0 ~]# oc create -f ceph-pgsql.yaml 
pod "postgresql94" created

Look at the test project in the web interface : you will see the pod creation process (including the postgresql container download).

The following image shows the result :

See original image

Look at the following screenshot : in the terminal tab you can enter the pod with a shell, and you will see an RBD device (/dev/rbd0) mounted as persistent storage, formatted in ext4 and used by the postgresql pod for its data !!

See original image

Done !!

elastic

Don't forget to run sysctl -w vm.max_map_count=262144 on the nodes ; Elasticsearch needs it to start.
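To apply it on all container-running nodes at once and make it persist across reboots, something like this does the job (the sysctl.d file name is arbitrary) :

dsh -Mg ocd "sysctl -w vm.max_map_count=262144"
dsh -Mg ocd 'echo vm.max_map_count=262144 | sudo tee /etc/sysctl.d/99-elasticsearch.conf'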

Kubernetes on CoreOS with distributed shared storage (with Ceph) : step by step procedure

Abstract

Now that we have a running CoreOS/Etcd v3 cluster, we want to get the best of it. We have the means, but no great software to manage the containers automatically.

Kubernetes, like Mesos or Openshift, is that kind of software that can do the job. Powered by Google, benefiting from years of experience (Borg…), it’s a great platform as you will see in this post.

Requirements

  • a decent homelab. Example here
  • a running CoreOS/Etcd v3 cluster
  • access to a shared storage for your data. We will use a Ceph distributed storage here (details here)
  • access to the internet (until you have your own container registries and other software repositories)
  • You will want to get an automated tool to configure your cluster. I will use a little UNIX tool (dsh) in this tutorial.

Credits

This post is inspired by this one : https://www.upcloud.com/support/deploy-kubernetes-coreos/, but it was written for kubernetes 1.3.0. Since 1.6.0 a few options have been deprecated. Use the following guide if you want to install 1.6.

The rest of this guide has been written following the “source” documentations : https://coreos.com/kubernetes/docs/latest/getting-started.html

Architecture

See original image

Preparing the CoreOS nodes

Take 2 minutes to setup dsh in order to automate your deployment.

Use groups :

  1. One machines.list file that lists all your nodes
  2. Group files under ~/.dsh/group/ (one per node group)

    [core@admin .dsh]$ pwd
    /home/core/.dsh
    [core@admin .dsh]$ find
    .
    ./machines.list
    ./group
    ./group/km
    ./group/kw

The machines.list and group files have to list the nodes ; for example, the worker nodes :

[core@admin .dsh]$ cat ./group/kw
c1.int.intra
c2.int.intra

Create your CoreOS nodes

Follow this guide

A bit of a change since my previous post : etcd v3 does not seem to be supported very well by flannel 0.7.

I will use etcd v2 and, as you will see, with CoreOS and containers (etcd on CoreOS is containerized), it's very simple. Just change the user_data file and apply the procedure described in my previous post.

Only two changes :

  1. pin etcd v2 to the same version (2.3.7) as the client installed on CoreOS (at the time this post is published)
  2. remove one argument (auto compaction..) not supported by etcd 2.3.7
#cloud-config
ssh_authorized_keys:
  - "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDjgUAV27yJT6YRxzDfGiDB4fwZwmx7EWzcZU3LXRWjaaSgvlizQtHwG8OCJYQN0aG29CTQNgJs+EY40/VQyeidVOdVmaClzmVSMruB68msEuvMrz5DA/v1FXVrYJCAyy+3l719DI9eA++nYyDo//LEj5cf7/4Xcs+12o4ADCJzYMNXazQ8f/1d3EPqJhfcuL+spehCYzDGCYyDDGeIsUaZYnWdOY4z4+2wWtd++9WBfCrVE2g8I3k0+U0iVEM1tZXfrvIzj+fw17zs/FNuzAunQRAIagqUcIswc9aPJbN1H9W31N6gX1X5IV/SvqPl39QXV/wgXtc9M+0oEKJQOvOj core"
hostname: "@@VMNAME@@"
users:
  - name: "user"
    passwd: "$6$D6SWuIMCinNlXP4b$QXnT3RVvPwHcR6ElHP1hWRBxq03gA/TbM1iMKRz3NukT7AYGGf3uSGC9WIkBI1s0GlQ9a1wWUvil.e/OBu6ax/"
    groups:
      - "sudo"
write_files:
  - path: /etc/systemd/resolved.conf
    permissions: 0644
    owner: root
    content: |
      # This file is part of systemd.
      #
      # systemd is free software; you can redistribute it and/or modify it
      # under the terms of the GNU Lesser General Public License as published by
      # the Free Software Foundation; either version 2.1 of the License, or
      # (at your option) any later version.
      #
      # Entries in this file show the compile time defaults.
      # You can change settings by editing this file.
      # Defaults can be restored by simply deleting this file.
      #
      # See resolved.conf(5) for details
      [Resolve]
      #DNS=
      #FallbackDNS=
      Domains=int.intra
      #LLMNR=yes
      #DNSSEC=allow-downgrade
      #Cache=yes
      #DNSStubListener=udp
  - path: /etc/systemd/system/etcd-member.service.d/override.conf
    permissions: 0644
    owner: root
    content: |
      [Service]
      Environment="ETCD_IMAGE_TAG=v2.3.7"
      Environment="ETCD_DATA_DIR=/var/lib/etcd"
      Environment="ETCD_SSL_DIR=/etc/ssl/certs"
      Environment="ETCD_OPTS=--name=@@VMNAME@@ --listen-client-urls=http://@@IP@@:2379 --advertise-client-urls=http://@@IP@@:2379 --listen-peer-urls=http://@@IP@@:2380 --initial-advertise-peer-urls=http://@@IP@@:2380 --initial-cluster-token=my-etcd-token --discovery-srv=int.intra"
coreos:
  update:
    reboot-strategy: off
  units:
    - name: systemd-networkd.service
      command: stop
    - name: static.network
      runtime: true
      content: |
        [Match]
        Name=eth0
        [Network]
        Address=@@IP@@/16
        Gateway=10.0.0.1
        DNS=10.0.0.1
    - name: down-interfaces.service
      command: start
      content: |
        [Service]
        Type=oneshot
        ExecStart=/usr/bin/ip link set eth0 down
        ExecStart=/usr/bin/ip addr flush dev eth0
    - name: systemd-networkd.service
      command: restart
    - name: systemd-resolved.service
      command: restart
    - name: etcd-member.service
      command: start

Go..

So we have a running CoreOS cluster.

core@c0 ~ $ export ETCDCTL_ENDPOINTS=http://c0:2379; etcdctl cluster-health 
member 65ac5f05513b1262 is healthy: got healthy result from http://10.0.1.101:2379
member 6c2733a4aad35dac is healthy: got healthy result from http://10.0.1.100:2379
member 84d5ec72bbd714d2 is healthy: got healthy result from http://10.0.1.102:2379

We will do the following :

c0 : Kubernetes master
c1 : Kubernetes node #1
c2 : Kubernetes node #2
admin : admin node, hosting the user “core”, which has access to the cluster

Networking : only one network card per host (private NIC = public NIC = default NIC, private IP = public IP = default IP)

Remember that you have set up dsh on the admin node in order to execute commands on each node (previous posts).

create the Root CA public and private key

On the admin node as user “core” :

mkdir ~/kube-ssl
cd ~/kube-ssl
openssl genrsa -out ca-key.pem 2048
openssl req -x509 -new -nodes -key ca-key.pem -days 10000 -out ca.pem -subj "/CN=kube-ca"

Create a DNS CNAME record to match the name “kubernetes” for the master
I use bind on my network.

main zone for my internal DNS network int.intra:

c0                      A       10.0.1.100
kubernetes              CNAME   c0.int.intra.
c1                      A       10.0.1.101   
c2                      A       10.0.1.102
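A quick check that the records resolve as expected (assuming your resolver serves the int.intra zone) :

dig +short kubernetes.int.intra
# expected : c0.int.intra. followed by 10.0.1.100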

Configure the master node

Create the API server keys

vi openssl.cnf

[req]
req_extensions = v3_req
distinguished_name = req_distinguished_name
[req_distinguished_name]
[ v3_req ]
basicConstraints = CA:FALSE
keyUsage = nonRepudiation, digitalSignature, keyEncipherment
subjectAltName = @alt_names
[alt_names]
DNS.1 = kubernetes
DNS.2 = kubernetes.default
DNS.3 = kubernetes.default.svc
DNS.4 = kubernetes.default.svc.cluster.local
DNS.5 = kubernetes.int.intra
IP.1 = 10.0.1.100
IP.2 = 10.2.0.1
IP.3 = 10.3.0.1

and issue :

openssl genrsa -out apiserver-key.pem 2048
openssl req -new -key apiserver-key.pem -out apiserver.csr -subj "/CN=kube-apiserver" -config openssl.cnf
openssl x509 -req -in apiserver.csr -CA ca.pem -CAkey ca-key.pem -CAcreateserial -out apiserver.pem -days 365 -extensions v3_req -extfile openssl.cnf

Create the keys for remote management from the admin node

openssl genrsa -out admin-key.pem 2048
openssl req -new -key admin-key.pem -out admin.csr -subj "/CN=kube-admin"
openssl x509 -req -in admin.csr -CA ca.pem -CAkey ca-key.pem -CAcreateserial -out admin.pem -days 365

Apply the TLS assets

Deploy the root CA pem on all nodes

You're always managing from the “admin” machine, under user “core”

dsh -g km "sudo mkdir -p /etc/kubernetes/ssl"
cat ca.pem | dsh -g km -i -c 'sudo tee /etc/kubernetes/ssl/ca.pem'
cat apiserver.pem | dsh -g km -i -c 'sudo tee /etc/kubernetes/ssl/apiserver.pem'
cat apiserver-key.pem | dsh -g km -i -c 'sudo tee /etc/kubernetes/ssl/apiserver-key.pem'

And set the permissions

[core@admin ssl]$ dsh -g km "sudo chmod 600 /etc/kubernetes/ssl/*-key.pem ; sudo chown root:root /etc/kubernetes/ssl/*-key.pem"

Setting up flannel

This step could have been done using cloud-config when you set up coreos and etcd.

We will do this “manually” (in quotes, because it's just a few command lines automatically executed by dsh or the shell, once at setup time) -> not much of a cost.

Create the flannel conf directory on all nodes :

[core@admin ssl]$ dsh -g km "sudo mkdir /etc/flannel"

And the config file. Note : only valid for these three nodes ; review these commands if you need more nodes.

[core@admin ssl]$ dsh -g km 'rm -f options.env ; export IP=`ip addr show eth0 | grep -Po "inet \K[\d.]+"`; echo "FLANNELD_IFACE=$IP" > options.env ; echo "FLANNELD_ETCD_ENDPOINTS=http://10.0.1.100:2379,http://10.0.1.101:2379,http://10.0.1.102:2379" >> options.env ; sudo cp options.env /etc/flannel/'

We also need to create the drop-in for systemd, overriding the existing flanneld systemd service located in /usr/lib64/systemd/system.

[core@admin ssl]$ dsh -g km "sudo mkdir -p /etc/systemd/system/flanneld.service.d/"
[core@admin ssl]$ dsh -g km 'sudo rm -f /etc/systemd/system/flanneld.service.d/40-ExecStartPre-symlink.conf; printf "[Service]\nExecStartPre=/usr/bin/ln -sf /etc/flannel/options.env /run/flannel/options.env\n" | sudo tee -a /etc/systemd/system/flanneld.service.d/40-ExecStartPre-symlink.conf'

Set the flanneld configuration into the etcd cluster. For this operation, I use an etcd binary for linux available here

On the master node, run

dsh -m c0 'export ETCDCTL_API=2 ; export ETCDCTL_ENDPOINTS=http://c1:2379; etcdctl set /coreos.com/network/config "{\"Network\":\"10.2.0.0/16\",\"Backend\":{\"Type\":\"vxlan\"}}"'
dsh -m c0 'export ETCDCTL_API=2 ; export ETCDCTL_ENDPOINTS=http://c1:2379; etcdctl get /coreos.com/network/config'

Docker configuration

Create another systemd drop-in on all master nodes :

/etc/systemd/system/docker.service.d/40-flannel.conf :

[Unit]
Requires=flanneld.service
After=flanneld.service
[Service]
EnvironmentFile=/run/flannel/flannel_docker_opts.env

NOTE : at the time of writing, there's an error in the official documentation (coreos.com) : the environment file is actually /run/flannel/flannel_docker_opts.env and NOT /etc/kubernetes/cni/docker_opts_cni.env. Either specify it like I did, or create a link pointing from /etc/kubernetes/cni/docker_opts_cni.env to /run/flannel/flannel_docker_opts.env.

create the file on the admin node then push it on all master nodes (only one for me)

dsh -g km "sudo mkdir -p /etc/systemd/system/docker.service.d/"
cat docker.conf | dsh -g km -i -c 'sudo tee /etc/systemd/system/docker.service.d/40-flannel.conf'

Create the Docker CNI Options file:

/etc/kubernetes/cni/docker_opts_cni.env

dsh -g km 'sudo mkdir -p /etc/kubernetes/cni/; echo DOCKER_OPT_BIP="" | sudo tee  /etc/kubernetes/cni/docker_opts_cni.env ; echo DOCKER_OPT_IPMASQ="" | sudo tee -a /etc/kubernetes/cni/docker_opts_cni.env' 

Create the flannel conf file, we will use flannel for the overlay network

/etc/kubernetes/cni/net.d/10-flannel.conf

{
    "name": "podnet",
    "type": "flannel",
    "delegate": {
        "isDefaultGateway": true
    }
}

Again, don't do that by hand on each node ; use an automated way.

Here's another one..

[core@admin ~]$ dsh -g km 'sudo mkdir -p /etc/kubernetes/cni/net.d/ ; printf "{\n    \"name\": \"podnet\",\n    \"type\": \"flannel\",\n    \"delegate\": {\n        \"isDefaultGateway\": true\n    }\n}\n" | sudo tee /etc/kubernetes/cni/net.d/10-flannel.conf'

Create the kubelet Unit

Install the systemd unit service

/etc/systemd/system/kubelet.service

[Service]
Environment=KUBELET_IMAGE_TAG=v1.6.1_coreos.0
Environment="RKT_RUN_ARGS=--uuid-file-save=/var/run/kubelet-pod.uuid \
  --volume var-log,kind=host,source=/var/log \
  --mount volume=var-log,target=/var/log \
  --volume dns,kind=host,source=/etc/resolv.conf \
  --mount volume=dns,target=/etc/resolv.conf \
  --volume modprobe,kind=host,source=/usr/sbin/modprobe \
  --mount volume=modprobe,target=/usr/sbin/modprobe \
  --volume lib-modules,kind=host,source=/lib/modules \
  --mount volume=lib-modules,target=/lib/modules \
  --uuid-file-save=/var/run/kubelet-pod.uuid"
ExecStartPre=/usr/bin/mkdir -p /etc/kubernetes/manifests
ExecStartPre=/usr/bin/mkdir -p /var/log/containers
ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/run/kubelet-pod.uuid
ExecStart=/usr/lib/coreos/kubelet-wrapper \
  --api-servers=http://127.0.0.1:8080 \
  --register-schedulable=false \
  --cni-conf-dir=/etc/kubernetes/cni/net.d \
  --container-runtime=docker \
  --allow-privileged=true \
  --pod-manifest-path=/etc/kubernetes/manifests \
  --hostname-override=@@IP@@ \
  --cluster_dns=10.3.0.10 \
  --cluster_domain=cluster.local
ExecStop=-/usr/bin/rkt stop --uuid-file=/var/run/kubelet-pod.uuid
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

Then push this file (with the real IP) on the master nodes,

cat kubelet.service | dsh -g km -i -c 'export IP=`ip addr show eth0 | grep -Po "inet \K[\d.]+"`; sed s/@@IP@@/${IP}/g | sudo tee /etc/systemd/system/kubelet.service'

Set Up the kube-apiserver Pod

Create the manifests directory for all kubernetes pods.

dsh -g km "sudo mkdir -p /etc/kubernetes/manifests"

NOTE : you MUST add - --storage-backend=etcd2 to the apiserver YAML manifest. I lost 4 hours debugging that. I use etcd2 because flannel is not actually etcd v3 compliant..

NOTE : note the --insecure-bind-address=0.0.0.0 flag. Do not do this in a production environment ; disable it once you have secured the api server with cryptographic mechanisms.

/etc/kubernetes/manifests/kube-apiserver.yaml

apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: kube-apiserver
    image: quay.io/coreos/hyperkube:v1.6.1_coreos.0
    command:
    - /hyperkube
    - apiserver
    - --bind-address=0.0.0.0
    - --insecure-bind-address=0.0.0.0
    - --insecure-port=8080
    - --storage-media-type=application/json
    - --storage-backend=etcd2
    - --etcd-servers=http://10.0.1.100:2379,http://10.0.1.101:2379,http://10.0.1.102:2379
    - --allow-privileged=true
    - --service-cluster-ip-range=10.3.0.0/24
    - --secure-port=443
    - --advertise-address=10.0.1.100
    - --admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,ResourceQuota
    - --tls-cert-file=/etc/kubernetes/ssl/apiserver.pem
    - --tls-private-key-file=/etc/kubernetes/ssl/apiserver-key.pem
    - --client-ca-file=/etc/kubernetes/ssl/ca.pem
    - --service-account-key-file=/etc/kubernetes/ssl/apiserver-key.pem
    - --runtime-config=extensions/v1beta1/networkpolicies=true
    - --anonymous-auth=false
    livenessProbe:
      httpGet:
        host: 127.0.0.1
        port: 8080
        path: /healthz
      initialDelaySeconds: 15
      timeoutSeconds: 15
    ports:
    - containerPort: 443
      hostPort: 443
      name: https
    - containerPort: 8080
      hostPort: 8080
      name: local
    volumeMounts:
    - mountPath: /etc/kubernetes/ssl
      name: ssl-certs-kubernetes
      readOnly: true
    - mountPath: /etc/ssl/certs
      name: ssl-certs-host
      readOnly: true
  volumes:
  - hostPath:
      path: /etc/kubernetes/ssl
    name: ssl-certs-kubernetes
  - hostPath:
      path: /usr/share/ca-certificates
    name: ssl-certs-host

push it :

cat kube-apiserver.yaml | dsh -g km -i -c 'sudo tee /etc/kubernetes/manifests/kube-apiserver.yaml'

Set Up the kube-proxy Pod

/etc/kubernetes/manifests/kube-proxy.yaml

apiVersion: v1
kind: Pod
metadata:
  name: kube-proxy
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: kube-proxy
    image: quay.io/coreos/hyperkube:v1.6.1_coreos.0
    command:
    - /hyperkube
    - proxy
    - --master=http://127.0.0.1:8080
    securityContext:
      privileged: true
    volumeMounts:
    - mountPath: /etc/ssl/certs
      name: ssl-certs-host
      readOnly: true
  volumes:
  - hostPath:
      path: /usr/share/ca-certificates
    name: ssl-certs-host

Again, push it :

cat kube-proxy.yaml | dsh -g km -i -c 'sudo tee /etc/kubernetes/manifests/kube-proxy.yaml'

Set Up the kube-controller-manager Pod

/etc/kubernetes/manifests/kube-controller-manager.yaml

apiVersion: v1
kind: Pod
metadata:
  name: kube-controller-manager
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: kube-controller-manager
    image: quay.io/coreos/hyperkube:v1.6.1_coreos.0
    command:
    - /hyperkube
    - controller-manager
    - --master=http://127.0.0.1:8080
    - --leader-elect=true
    - --service-account-private-key-file=/etc/kubernetes/ssl/apiserver-key.pem
    - --root-ca-file=/etc/kubernetes/ssl/ca.pem
    resources:
      requests:
        cpu: 200m
    livenessProbe:
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10252
      initialDelaySeconds: 15
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /etc/kubernetes/ssl
      name: ssl-certs-kubernetes
      readOnly: true
    - mountPath: /etc/ssl/certs
      name: ssl-certs-host
      readOnly: true
  volumes:
  - hostPath:
      path: /etc/kubernetes/ssl
    name: ssl-certs-kubernetes
  - hostPath:
      path: /usr/share/ca-certificates
    name: ssl-certs-host

Set Up the kube-scheduler Pod

/etc/kubernetes/manifests/kube-scheduler.yaml

apiVersion: v1
kind: Pod
metadata:
  name: kube-scheduler
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: kube-scheduler
    image: quay.io/coreos/hyperkube:v1.6.1_coreos.0
    command:
    - /hyperkube
    - scheduler
    - --master=http://127.0.0.1:8080
    - --leader-elect=true
    resources:
      requests:
        cpu: 100m
    livenessProbe:
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10251
      initialDelaySeconds: 15
      timeoutSeconds: 15

push it :

cat kube-scheduler.yaml | dsh -g km -i -c 'sudo tee /etc/kubernetes/manifests/kube-scheduler.yaml'

Start the master node

sudo systemctl daemon-reload
sudo systemctl start flanneld

This last command can take minutes the first time, depending on the network speed between your master nodes and the container registry (internet or local, depending on your infrastructure), because the wrapper will download the container images needed to run flannel.
be patient…

sudo systemctl enable flanneld

Start kubernetes

sudo systemctl start kubelet
sudo systemctl enable kubelet

Again, the first time, this command runs a wrapper that downloads the kubernetes containers (approx 250MB)

be patient :)

you can see the progress like that :

c0 ~ # systemctl status kubelet
● kubelet.service
   Loaded: loaded (/etc/systemd/system/kubelet.service; disabled; vendor preset: disabled)
   Active: active (running) since Wed 2017-06-21 09:58:17 UTC; 3min 12s ago
  Process: 2087 ExecStartPre=/usr/bin/rkt rm --uuid-file=/var/run/kubelet-pod.uuid (code=exited, status=254)
  Process: 2083 ExecStartPre=/usr/bin/mkdir -p /var/log/containers (code=exited, status=0/SUCCESS)
  Process: 2076 ExecStartPre=/usr/bin/mkdir -p /etc/kubernetes/manifests (code=exited, status=0/SUCCESS)
 Main PID: 2098 (rkt)
    Tasks: 8 (limit: 32768)
   Memory: 122.3M
      CPU: 5.416s
   CGroup: /system.slice/kubelet.service
           └─2098 /usr/bin/rkt run --uuid-file-save=/var/run/kubelet-pod.uuid --volume var-log,kind=host,source=/var/log --mount volume=var-log,target=/var/log --volume dns,kind=host,source=/etc/resolv.conf --mount volume=dns,targe

Jun 21 10:01:20 c0 kubelet-wrapper[2098]: Downloading ACI:  94.3 MB/237 MB
Jun 21 10:01:21 c0 kubelet-wrapper[2098]: Downloading ACI:  95 MB/237 MB
Jun 21 10:01:22 c0 kubelet-wrapper[2098]: Downloading ACI:  95.7 MB/237 MB
Jun 21 10:01:23 c0 kubelet-wrapper[2098]: Downloading ACI:  96.4 MB/237 MB
Jun 21 10:01:24 c0 kubelet-wrapper[2098]: Downloading ACI:  97.2 MB/237 MB
Jun 21 10:01:25 c0 kubelet-wrapper[2098]: Downloading ACI:  98.3 MB/237 MB
Jun 21 10:01:26 c0 kubelet-wrapper[2098]: Downloading ACI:  99.5 MB/237 MB
Jun 21 10:01:27 c0 kubelet-wrapper[2098]: Downloading ACI:  101 MB/237 MB
Jun 21 10:01:29 c0 kubelet-wrapper[2098]: Downloading ACI:  101 MB/237 MB
Jun 21 10:01:30 c0 kubelet-wrapper[2098]: Downloading ACI:  102 MB/237 MB

You will have to be a bit patient after this download completes, so that kubelet can start all containers..

Finally, check that the api server responds to your requests :

run on the master node :

curl http://c0:8080/version
{
  "major": "1",
  "minor": "6",
  "gitVersion": "v1.6.1+coreos.0",
  "gitCommit": "9212f77ed8c169a0afa02e58dce87913c6387b3e",
  "gitTreeState": "clean",
  "buildDate": "2017-04-04T00:32:53Z",
  "goVersion": "go1.7.5",
  "compiler": "gc",
  "platform": "linux/amd64"
}
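Another quick sanity check is the health endpoint, which should simply return ok :

curl http://c0:8080/healthz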

Setup the workers

create the kubernetes dir and ssl dir

dsh -g kw "sudo mkdir -p /etc/kubernetes/ssl"

TLS assets

Go back to the kube-ssl directory on the admin node

Generate the pem files :

vi worker-openssl.cnf

[req]
req_extensions = v3_req
distinguished_name = req_distinguished_name
[req_distinguished_name]
[ v3_req ]
basicConstraints = CA:FALSE
keyUsage = nonRepudiation, digitalSignature, keyEncipherment
subjectAltName = @alt_names
[alt_names]
IP.1 = $ENV::WORKER_IP

for i in {1..2} ; do export FQDN=c$i.int.intra; export WORKER_IP=10.0.1.10$i; openssl genrsa -out ${FQDN}-worker-key.pem 2048 ; done
for i in {1..2} ; do export FQDN=c$i.int.intra; export WORKER_IP=10.0.1.10$i; WORKER_IP=${WORKER_IP} openssl req -new -key ${FQDN}-worker-key.pem -out ${FQDN}-worker.csr -subj "/CN=${FQDN}" -config worker-openssl.cnf ; done
for i in {1..2} ; do export FQDN=c$i.int.intra; export WORKER_IP=10.0.1.10$i; WORKER_IP=${WORKER_IP} openssl x509 -req -in ${FQDN}-worker.csr -CA ca.pem -CAkey ca-key.pem -CAcreateserial -out ${FQDN}-worker.pem -days 365 -extensions v3_req -extfile worker-openssl.cnf ; done
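
Optionally, before pushing the certificates, you can sanity-check them against the CA (a simple verification step, nothing more) :

for i in {1..2} ; do openssl verify -CAfile ca.pem c$i.int.intra-worker.pem ; done
openssl x509 -in c1.int.intra-worker.pem -noout -subject -dates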

copy the asset files on the worker nodes

cat ca.pem | dsh -g kw -i -c 'sudo tee /etc/kubernetes/ssl/ca.pem'
for i in {1..2} ; do cat ~/kube-ssl/c$i.int.intra-worker.pem | dsh -m c$i -i -c "sudo tee /etc/kubernetes/ssl/c$i.int.intra-worker.pem" ; done
for i in {1..2} ; do cat ~/kube-ssl/c$i.int.intra-worker-key.pem | dsh -m c$i -i -c "sudo tee /etc/kubernetes/ssl/c$i.int.intra-worker-key.pem" ; done

Set permissions

dsh -g kw "sudo chmod 600 /etc/kubernetes/ssl/*-key.pem ; sudo chown root:root /etc/kubernetes/ssl/*-key.pem"

Set links

for i in {1..2} ; do dsh -m c$i "sudo ln -s /etc/kubernetes/ssl/c$i.int.intra-worker.pem /etc/kubernetes/ssl/worker.pem"; done
for i in {1..2} ; do dsh -m c$i "sudo ln -s /etc/kubernetes/ssl/c$i.int.intra-worker-key.pem /etc/kubernetes/ssl/worker-key.pem"; done

Check :

[core@admin kube-ssl]$ dsh -g kw "sudo ls -l /etc/kubernetes/ssl"
total 32
-rw-------. 1 root root 1675 Jun 17 15:40 c1.int.intra-worker-key.pem
-rw-r--r--. 1 root root 1046 Jun 17 15:40 c1.int.intra-worker.pem
-rw-r--r--. 1 root root 1090 Jun 17 15:38 ca.pem
lrwxrwxrwx. 1 root root   47 Jun 17 15:42 worker-key.pem -> /etc/kubernetes/ssl/c1.int.intra-worker-key.pem
lrwxrwxrwx. 1 root root   43 Jun 17 15:42 worker.pem -> /etc/kubernetes/ssl/c1.int.intra-worker.pem
total 32
-rw-------. 1 root root 1679 Jun 17 15:40 c2.int.intra-worker-key.pem
-rw-r--r--. 1 root root 1046 Jun 17 15:40 c2.int.intra-worker.pem
-rw-r--r--. 1 root root 1090 Jun 17 15:38 ca.pem
lrwxrwxrwx. 1 root root   47 Jun 17 15:42 worker-key.pem -> /etc/kubernetes/ssl/c2.int.intra-worker-key.pem
lrwxrwxrwx. 1 root root   43 Jun 17 15:42 worker.pem -> /etc/kubernetes/ssl/c2.int.intra-worker.pem

Networking configuration

/etc/flannel/options.env :

dsh -g kw 'sudo mkdir /etc/flannel ; export IP=`ip addr show eth0 | grep -Po "inet \K[\d.]+"`; echo "FLANNELD_IFACE=$IP" | sudo tee /etc/flannel/options.env ; echo "FLANNELD_ETCD_ENDPOINTS=http://10.0.1.100:2379,http://10.0.1.101:2379,http://10.0.1.102:2379" | sudo tee -a /etc/flannel/options.env'

As for the master, create the systemd dropin for flanneld

dsh -g kw "sudo mkdir -p /etc/systemd/system/flanneld.service.d/"
dsh -g kw 'sudo rm -f /etc/systemd/system/flanneld.service.d/40-ExecStartPre-symlink.conf; printf "[Service]\nExecStartPre=/usr/bin/ln -sf /etc/flannel/options.env /run/flannel/options.env\n" | sudo tee -a /etc/systemd/system/flanneld.service.d/40-ExecStartPre-symlink.conf'
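
A quick optional check that the flannel configuration landed as expected on every worker :

dsh -g kw "cat /etc/flannel/options.env"
dsh -g kw "cat /etc/systemd/system/flanneld.service.d/40-ExecStartPre-symlink.conf"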

Docker configuration

/etc/systemd/system/docker.service.d/40-flannel.conf

dsh -g kw 'sudo mkdir -p /etc/systemd/system/docker.service.d/; printf "[Unit]\nRequires=flanneld.service\nAfter=flanneld.service\n[Service]\nEnvironmentFile=/run/flannel/flannel_docker_opts.env\n" | sudo tee /etc/systemd/system/docker.service.d/40-flannel.conf'

/etc/kubernetes/cni/docker_opts_cni.env

dsh -g kw 'sudo mkdir -p /etc/kubernetes/cni/ ; echo DOCKER_OPT_BIP="" | sudo tee  /etc/kubernetes/cni/docker_opts_cni.env ; echo DOCKER_OPT_IPMASQ="" | sudo tee -a /etc/kubernetes/cni/docker_opts_cni.env'


dsh -g kw 'sudo mkdir -p /etc/kubernetes/cni/net.d/ ; printf "{\n    \"name\": \"podnet\",\n    \"type\": \"flannel\",\n    \"delegate\": {\n        \"isDefaultGateway\": true\n    }\n}\n" | sudo tee /etc/kubernetes/cni/net.d/10-flannel.conf'
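
The resulting file must be valid JSON (with the quotes preserved), otherwise CNI will fail silently later; a simple check is to display it on each worker :

dsh -g kw "cat /etc/kubernetes/cni/net.d/10-flannel.conf"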

Create the kubelet Unit

/etc/systemd/system/kubelet.service

create this file on the admin node and push it with dsh to all worker nodes; the @@IP@@ placeholder will be replaced with each node's real IP address by the push command below.

[Service]
Environment=KUBELET_IMAGE_TAG=v1.6.1_coreos.0
Environment="RKT_RUN_ARGS=--uuid-file-save=/var/run/kubelet-pod.uuid \
  --volume dns,kind=host,source=/etc/resolv.conf \
  --mount volume=dns,target=/etc/resolv.conf \
  --volume var-log,kind=host,source=/var/log \
  --mount volume=var-log,target=/var/log"
ExecStartPre=/usr/bin/mkdir -p /etc/kubernetes/manifests
ExecStartPre=/usr/bin/mkdir -p /var/log/containers
ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/run/kubelet-pod.uuid
ExecStart=/usr/lib/coreos/kubelet-wrapper \
  --api-servers=http://c0.int.intra:8080 \
  --cni-conf-dir=/etc/kubernetes/cni/net.d \
  --container-runtime=docker \
  --register-node=true \
  --allow-privileged=true \
  --pod-manifest-path=/etc/kubernetes/manifests \
  --hostname-override=@@IP@@ \
  --cluster_dns=10.3.0.10 \
  --cluster_domain=cluster.local \
  --kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml \
  --tls-cert-file=/etc/kubernetes/ssl/worker.pem \
  --tls-private-key-file=/etc/kubernetes/ssl/worker-key.pem
ExecStop=-/usr/bin/rkt stop --uuid-file=/var/run/kubelet-pod.uuid
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

I will name this template kubelet.service.w, in the tmp dir of the admin node

cat kubelet.service.w | dsh -g kw -i -c 'export IP=`ip addr show eth0 | grep -Po "inet \K[\d.]+"`; sed s/@@IP@@/${IP}/g | sudo tee /etc/systemd/system/kubelet.service'
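
You can optionally verify that the @@IP@@ placeholder was replaced by each node's own address :

dsh -g kw "grep hostname-override /etc/systemd/system/kubelet.service"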

Set Up the kube-proxy Pod

create a kube-proxy.yaml.w on the admin node :

apiVersion: v1
kind: Pod
metadata:
  name: kube-proxy
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: kube-proxy
    image: quay.io/coreos/hyperkube:v1.6.1_coreos.0
    command:
    - /hyperkube
    - proxy
    - --master=http://c0.int.intra:8080
    - --kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml
    securityContext:
      privileged: true
    volumeMounts:
    - mountPath: /etc/ssl/certs
      name: "ssl-certs"
    - mountPath: /etc/kubernetes/worker-kubeconfig.yaml
      name: "kubeconfig"
      readOnly: true
    - mountPath: /etc/kubernetes/ssl
      name: "etc-kube-ssl"
      readOnly: true
  volumes:
  - name: "ssl-certs"
    hostPath:
      path: "/usr/share/ca-certificates"
  - name: "kubeconfig"
    hostPath:
      path: "/etc/kubernetes/worker-kubeconfig.yaml"
  - name: "etc-kube-ssl"
    hostPath:
      path: "/etc/kubernetes/ssl"

and push it on all worker nodes

dsh -g kw 'sudo mkdir -p /etc/kubernetes/manifests/'
cat kube-proxy.yaml.w | dsh -g kw -i -c 'sudo tee /etc/kubernetes/manifests/kube-proxy.yaml'

Set Up kubeconfig

/etc/kubernetes/worker-kubeconfig.yaml

Again, file to create on the admin node :

apiVersion: v1
kind: Config
clusters:
- name: local
  cluster:
    certificate-authority: /etc/kubernetes/ssl/ca.pem
users:
- name: kubelet
  user:
    client-certificate: /etc/kubernetes/ssl/worker.pem
    client-key: /etc/kubernetes/ssl/worker-key.pem
contexts:
- context:
    cluster: local
    user: kubelet
  name: kubelet-context
current-context: kubelet-context

and push :

cat worker-kubeconfig.yaml | dsh -g kw -i -c 'sudo tee /etc/kubernetes/worker-kubeconfig.yaml'

Start the worker services

dsh -g kw "sudo systemctl daemon-reload"

dsh -g kw "sudo systemctl start flanneld"
dsh -g kw "sudo systemctl start kubelet"

dsh -g kw "sudo systemctl enable flanneld"
dsh -g kw "sudo systemctl enable kubelet"

configure kubectl (kubernetes management tool)

On the admin node (or wherever you like, it's a single static binary), install kubectl :

curl -O https://storage.googleapis.com/kubernetes-release/release/v1.6.1/bin/linux/amd64/kubectl
chmod +x kubectl

then, as the doc suggests, put the binary in a directory included in your PATH env variable, for example :

mv kubectl /usr/local/bin/kubectl

export MASTER_HOST=kubernetes.int.intra
export CA_CERT=/home/core/kube-ssl/ca.pem
export ADMIN_KEY=/home/core/kube-ssl/apiserver-key.pem 
export ADMIN_CERT=/home/core/kube-ssl/apiserver.pem 

kubectl config set-cluster default-cluster --server=https://${MASTER_HOST} --certificate-authority=${CA_CERT}
kubectl config set-credentials default-admin --certificate-authority=${CA_CERT} --client-key=${ADMIN_KEY} --client-certificate=${ADMIN_CERT}
kubectl config set-context default-system --cluster=default-cluster --user=default-admin
kubectl config use-context default-system
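
Before going further, a simple way to verify that the client certificates are accepted is to query the server version and the cluster endpoints (both will fail if the TLS authentication is wrong) :

kubectl version
kubectl cluster-info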

Final checks

[core@admin ~]$ kubectl get nodes
NAME         STATUS                     AGE       VERSION
10.0.1.100   Ready,SchedulingDisabled   1h        v1.6.1+coreos.0
10.0.1.101   Ready                      3m        v1.6.1+coreos.0
10.0.1.102   Ready                      2m        v1.6.1+coreos.0

Got it to work…

addons

DNS service

apiVersion: v1
kind: Service
metadata:
  name: kube-dns
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: "KubeDNS"
spec:
  selector:
    k8s-app: kube-dns
  clusterIP: 10.3.0.10
  ports:
  - name: dns
    port: 53
    protocol: UDP
  - name: dns-tcp
    port: 53
    protocol: TCP


---


apiVersion: v1
kind: ReplicationController
metadata:
  name: kube-dns-v20
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    version: v20
    kubernetes.io/cluster-service: "true"
spec:
  replicas: 1
  selector:
    k8s-app: kube-dns
    version: v20
  template:
    metadata:
      labels:
        k8s-app: kube-dns
        version: v20
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
        scheduler.alpha.kubernetes.io/tolerations: '[{"key":"CriticalAddonsOnly", "operator":"Exists"}]'
    spec:
      containers:
      - name: kubedns
        image: gcr.io/google_containers/kubedns-amd64:1.8
        resources:
          limits:
            memory: 170Mi
          requests:
            cpu: 100m
            memory: 70Mi
        livenessProbe:
          httpGet:
            path: /healthz-kubedns
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 60
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 5
        readinessProbe:
          httpGet:
            path: /readiness
            port: 8081
            scheme: HTTP
          initialDelaySeconds: 3
          timeoutSeconds: 5
        args:
        - --domain=cluster.local.
        - --dns-port=10053
        ports:
        - containerPort: 10053
          name: dns-local
          protocol: UDP
        - containerPort: 10053
          name: dns-tcp-local
          protocol: TCP
      - name: dnsmasq
        image: gcr.io/google_containers/kube-dnsmasq-amd64:1.4
        livenessProbe:
          httpGet:
            path: /healthz-dnsmasq
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 60
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 5
        args:
        - --cache-size=1000
        - --no-resolv
        - --server=127.0.0.1#10053
        - --log-facility=-
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
      - name: healthz
        image: gcr.io/google_containers/exechealthz-amd64:1.2
        resources:
          limits:
            memory: 50Mi
          requests:
            cpu: 10m
            memory: 50Mi
        args:
        - --cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
        - --url=/healthz-dnsmasq
        - --cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1:10053 >/dev/null
        - --url=/healthz-kubedns
        - --port=8080
        - --quiet
        ports:
        - containerPort: 8080
          protocol: TCP
      dnsPolicy: Default

import this service into kubernetes

[core@admin tmp]$ kubectl create -f dns-addon.yml 
service "kube-dns" created
replicationcontroller "kube-dns-v20" created

Dashboard !

on the admin node, where kubectl is available, create the two following files :

kube-dashboard-rc.yaml

apiVersion: v1
kind: ReplicationController
metadata:
  name: kubernetes-dashboard-v1.6.0
  namespace: kube-system
  labels:
    k8s-app: kubernetes-dashboard
    version: v1.6.0
    kubernetes.io/cluster-service: "true"
spec:
  replicas: 1
  selector:
    k8s-app: kubernetes-dashboard
  template:
    metadata:
      labels:
        k8s-app: kubernetes-dashboard
        version: v1.6.0
        kubernetes.io/cluster-service: "true"
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
        scheduler.alpha.kubernetes.io/tolerations: '[{"key":"CriticalAddonsOnly", "operator":"Exists"}]'
    spec:
      containers:
      - name: kubernetes-dashboard
        image: gcr.io/google_containers/kubernetes-dashboard-amd64:v1.6.0
        resources:
          limits:
            cpu: 100m
            memory: 50Mi
          requests:
            cpu: 100m
            memory: 50Mi
        ports:
        - containerPort: 9090
        livenessProbe:
          httpGet:
            path: /
            port: 9090
          initialDelaySeconds: 30
          timeoutSeconds: 30

kube-dashboard-svc.yaml

apiVersion: v1
kind: Service
metadata:
  name: kubernetes-dashboard
  namespace: kube-system
  labels:
    k8s-app: kubernetes-dashboard
    kubernetes.io/cluster-service: "true"
spec:
  selector:
    k8s-app: kubernetes-dashboard
  ports:
  - port: 80
    targetPort: 9090

Then import the services into the cluster :

kubectl create -f kube-dashboard-rc.yaml
kubectl create -f kube-dashboard-svc.yaml
kubectl get pods --namespace=kube-system

You can access the dashboard like this :

kubectl port-forward kubernetes-dashboard-v1.6.0-SOME-ID 9090 --namespace=kube-system

result

Test your cluster

Flannel

We will test the communication between containers

First, deploy a simple app

kubectl run hello-world --replicas=2 --labels="run=load-balancer-example" --image=gcr.io/google-samples/node-hello:1.0  --port=8080

Be patient, check the progress :

[core@admin tmp]$  kubectl get pods  --all-namespaces
NAMESPACE     NAME                                 READY     STATUS              RESTARTS   AGE
default       hello-world-3272482377-15dng         0/1       ContainerCreating   0          3s
default       hello-world-3272482377-nmkcm         0/1       ContainerCreating   0          3s
default       nuxeo-1718889206-h8wlf               1/1       Running             1          21h
kube-system   kube-apiserver-10.0.1.100            1/1       Running             0          22h
kube-system   kube-controller-manager-10.0.1.100   1/1       Running             0          22h
kube-system   kube-dns-v20-hp25x                   3/3       Running             3          21h
kube-system   kube-proxy-10.0.1.100                1/1       Running             0          22h
kube-system   kube-proxy-10.0.1.101                1/1       Running             1          21h
kube-system   kube-proxy-10.0.1.102                1/1       Running             1          21h
kube-system   kube-scheduler-10.0.1.100            1/1       Running             0          22h
kube-system   kubernetes-dashboard-v1.6.0-46x5q    1/1       Running             1          21h

until the two pods are reported as running.
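
To see on which node each pod has been scheduled (useful for the next step, where we jump on each host) :

kubectl get pods -o wide -l run=load-balancer-example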

We will enter a pod on host c1, enter a pod on host c2 and, from each pod, ping the other one :

On the first host c1 :

run "docker ps" to find the id of the hello-world container. For me, it's e8ae791a00c2

c1 ~ # docker exec -ti e8ae791a00c2 bash
root@hello-world-3272482377-15dng:/# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
11: eth0@if12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default 
    link/ether 02:42:0a:02:61:03 brd ff:ff:ff:ff:ff:ff
    inet 10.2.97.3/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:aff:fe02:6103/64 scope link 
       valid_lft forever preferred_lft forever

On the second host c2 :

c2 ~ # docker exec -ti 54dbf976dbc6 bash
root@hello-world-3272482377-nmkcm:/# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
15: eth0@if16: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default 
    link/ether 02:42:0a:02:38:04 brd ff:ff:ff:ff:ff:ff
    inet 10.2.56.4/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:aff:fe02:3804/64 scope link 
       valid_lft forever preferred_lft forever
root@hello-world-3272482377-nmkcm:/# ping 10.2.97.3
PING 10.2.97.3 (10.2.97.3): 56 data bytes
64 bytes from 10.2.97.3: icmp_seq=0 ttl=62 time=1.960 ms
64 bytes from 10.2.97.3: icmp_seq=1 ttl=62 time=0.970 ms
64 bytes from 10.2.97.3: icmp_seq=2 ttl=62 time=0.957 ms

On c1 :

root@hello-world-3272482377-15dng:/# ping 10.2.56.4
PING 10.2.56.4 (10.2.56.4): 56 data bytes
64 bytes from 10.2.56.4: icmp_seq=0 ttl=62 time=0.887 ms
64 bytes from 10.2.56.4: icmp_seq=1 ttl=62 time=0.770 ms
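
As an optional extra check of kube-proxy and the service network, you can expose the hello-world deployment through a NodePort and query it from any node (a small sketch; the service name is mine) :

kubectl expose deployment hello-world --type=NodePort --name=example-service --port=8080
NODEPORT=$(kubectl get svc example-service -o jsonpath='{.spec.ports[0].nodePort}')
curl http://c1.int.intra:${NODEPORT}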

You are ready to host your apps.

Enjoy ;)

CoreOS cluster, etcd, a base for Kubernetes

Abstract

In this post, we will install the underlying requirements for a container cluster.

You can use several popular OSes, but if you want to save time, you will want to choose a lightweight OS, like DCOS, Atomic Host, or CoreOS.

In this example, we will choose CoreOS, a well known OS used for this purpose. It is a very simple OS to install, embedding all the technologies needed to run a container cluster (etcd, ..). Well designed for OpenStack, it reuses some of its mechanisms (user_data description files for cloud-init..)

Planning your installation

You will need to design your network infrastructure before running the following steps.

For my needs, I will partition a class A private IPv4 network (10.0.0.0/8) as follows:

Network subnet / range      Used for ..
10.0.0.0/16                 Terminals network (laptops, …)
10.0.1.0/16                 Servers and network appliances/routers
10.2.0.0/16                 Container PODs network
10.3.0.0/24                 Service IP range

Requirements (for this setup)

For this setup only, because there are lots of choices that can be made depending on your needs.

My needs : I'm at home, I want to understand, learn, and nevertheless use it in my "home production" environment to host applications (ECM, Mail, …) and data, and secure them.

My choices :

  • 3 KVM hosts, with Centos 7.3 64bit (see my datacenter in two boxes) :
    all 3 used as computing nodes
    2 of them also used as storage nodes

    • First host : 16GB RAM Intel NUC, 120 GB SSD Storage
    • Second host : 64 GB RAM supermicro X10SDV 8 cores ITX, .. 25 TB storage (5x4TB + 5x1TB)
    • Third host : 32 GB RAM supermicro X10SDV 4 cores ITX, .. 13 TB storage (2x5TB + 3x1TB)
  • Shared storage

    • easy with Ceph. Following this link.

All the software will run in virtual machines on top of that bare metal hosts

IMPORTANT NOTES :

  • I have no special need at home for strong security, so I will install this cluster without SSL/TLS mechanisms (as far as I can). Note that it's strongly recommended to use encryption mechanisms in a real production environment; the links provided in this document can lead you to this kind of setup
  • For this example, I do not need two separate networks (public/private). Again, it’s not required at home, but in production environments, it’s strongly recommended.

Install the OSes

We will install one CoreOS VM node on each KVM host at first

The names of my hosts are hyp01, hyp02, hyp03.

The CoreOS part is inspired from another blog

CoreOS deployment

Create the environment

First, create a subdirectory in the directory used to store your VMs.

In my hypervisors, this directory is /mnt/images/libvirt/

Create it under the base directory

mkdir /mnt/images/libvirt/coreos

Download the current production image of CoreOS and extract it as a kvm/qemu image :

wget https://stable.release.core-os.net/amd64-usr/current/coreos_production_qemu_image.img.bz2 -O - | bzcat > coreos.img

CoreOS uses some OpenStack mechanisms for its deployment process. Download the configuration template

wget https://gist.githubusercontent.com/dutchiechris/e5689a526d4651f6dec4880230fe3c89/raw/4576b956ab00430e644917f17080f4ccfd408615/user_data

Edit this file to suit your needs. Here is mine :

#cloud-config

ssh_authorized_keys:
 - "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDjgUAV27yJT6YRxzDfGiDB4fwZwmx7EWzcZU3LXRWjaaSgvlizQtHwG8OCJYQN0aG29CTQNgJs+EY40/VQyeidVOdVmaClzmVSMruB68msEuvMrz5DA/v1FXVrYJCAyy+3l719DI9eA++nYyDo//LEj5cf7/4Xcs+12o4ADCJzYMNXazQ8f/1d3EPqJhfcuL+spehCYzDGCYyDDGeIsUaZYnWdOY4z4+2wWtd++9WBfCrVE2g8I3k0+U0iVEM1tZXfrvIzj+fw17zs/FNuzAunQRAIagqUcIswc9aPJbN1H9W31N6gX1X5IV/SvqPl39QXV/wgXtc9M+0oEKJQOvOj core"

hostname: "@@VMNAME@@"

users:
  - name: "user"
    passwd: "$6$D6SWuIMCinNlXP4b$QXnT3RVvPwHcR6ElHP1hWRBxq03gA/TbM1iMKRz3NukT7AYGGf3uSGC9WIkBI1s0GlQ9a1wWUvil.e/OBu6ax/"
    groups:
      - "sudo"

write_files:
  - path: /etc/systemd/resolved.conf
    permissions: 0644
    owner: root
    content: |
      #  This file is part of systemd.
      #
      #  systemd is free software; you can redistribute it and/or modify it
      #  under the terms of the GNU Lesser General Public License as published by
      #  the Free Software Foundation; either version 2.1 of the License, or
      #  (at your option) any later version.
      #
      # Entries in this file show the compile time defaults.
      # You can change settings by editing this file.
      # Defaults can be restored by simply deleting this file.
      #
      # See resolved.conf(5) for details

      [Resolve]
      #DNS=
      #FallbackDNS=
      Domains=int.intra
      #LLMNR=yes
      #DNSSEC=allow-downgrade
      #Cache=yes
      #DNSStubListener=udp

coreos:
  update:
    reboot-strategy: off
  units:
    - name: systemd-networkd.service
      command: stop
    - name: static.network
      runtime: true
      content: |
        [Match]
        Name=eth0

        [Network]
        Address=@@IP@@/16
        Gateway=10.0.0.1
        DNS=10.0.0.1
    - name: down-interfaces.service
      command: start
      content: |
        [Service]
        Type=oneshot
        ExecStart=/usr/bin/ip link set eth0 down
        ExecStart=/usr/bin/ip addr flush dev eth0
    - name: systemd-networkd.service
      command: restart
  • The first line is REQUIRED for CoreOS to recognize a configuration file
  • The ssh_authorized_keys entry is the public key of the user you would like to use when connecting to your nodes. I've defined this user ("core") on an admin node, ran the command ssh-keygen and got this key in .ssh/id_rsa.pub
  • the hash of the password is generated with the help of a little python command
    python -c 'import crypt,getpass; print(crypt.crypt(getpass.getpass(), crypt.mksalt(crypt.METHOD_SHA512)))'

Finally, download this script (thanks to the blog author), which automates some things (virt-install..)

wget https://gist.githubusercontent.com/dutchiechris/98ba718278cca3c1b7ecbc546aad22f0/raw/a9b3707fd1fc6054b0e23ae308b473c0bec08fd1/manage.py
chmod 755 manage.py

Edit this script to suit your needs (my CoreOS nodes will have 2 CPUs and 4096MB of RAM; you have to specify it in the script)

DNS records

Make life easier : declare the 3 nodes in your DNS, so that the manage.py script is entirely automated.
It will query the DNS for the IPs to assign to the CoreOS hosts.
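
For example, a quick check from the admin node that the names resolve before launching the script (hostnames and domain are mine, adapt them) :

for h in c0 c1 c2 ; do dig +short $h.int.intra ; done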

CoreOS deployment

For my example, I will run the following command on each of my KVM hosts.
Here it is for the 2nd one : the host name prefix is "c" (for CoreOS or Container), I start at 2 and create 1 host, so the name will be c2, and the script automatically gets the IP from the DNS as described in the previous subchapter.

[root@hyp02 coreos]# ./manage.py --action create --start 2 --count 1 --prefix c --ip dns
INFO: c2: Creating...
INFO: c2: Using IP 192.168.10.192
INFO: c2: Creating cloud-config
INFO: c2: Creating ISO image of cloud-config
INFO: c2: Copying master boot image for VM

Starting install...
Creating domain...                                                                                                                                                                                                |    0 B  00:00:00     
Domain creation completed.
INFO: c2: VM created and booting...

Several seconds (not minutes..) later, you will be able to connect to your nodes over SSH, as the user you declared before, without any password thanks to your SSH public key

[core@admin .ssh]$ ssh core@c2
The authenticity of host 'c2 (192.168.10.192)' can't be established.
ECDSA key fingerprint is b2:18:42:8c:94:3f:c2:4d:29:08:e6:7e:44:9b:7f:1e.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'c2,192.168.10.192' (ECDSA) to the list of known hosts.
Container Linux by CoreOS stable (1353.8.0)
Update Strategy: No Reboots
core@c2 ~ $ sudo sh
sh-4.3#

If you want to debug the cloud-init process, you can use this command :
journalctl --identifier=coreos-cloudinit

Done ! And done quickly …

Add ETCD to your config.

Once you have 3 running CoreOS nodes, you will have to deploy ETCD.

It's strongly recommended to automate the installation, so we will reuse the cloud config file (user_data) we have just created to deploy our CoreOS nodes.

… Yes ! You have to destroy the three nodes we've just created, but remember : it's done, and done quickly :

Destroy your VMS

On all nodes : (using my second KVM host for the example)

[root@hyp02 coreos]# cd /mnt/images/libvirt/coreos
[root@hyp02 coreos]# virsh destroy c2
[root@hyp02 coreos]# virsh undefine c2
[root@hyp02 coreos]# rm -Rf ../c2

It was really quickly done, huh ?

configure ETCD discovery

I will use DNS for ETCD discovery (the mechanism the members use to find each other) instead of a web discovery service. Look here

I will define these three hosts :

(I use bind on openwrt)

10.0/16 rev zone :
100.1.0.10.in-addr.arpa. IN PTR c0.int.intra.
101.1.0.10.in-addr.arpa. IN PTR c1.int.intra.
102.1.0.10.in-addr.arpa. IN PTR c2.int.intra.

main zone :
c0 A 10.0.1.100
c1 A 10.0.1.101
c2 A 10.0.1.102

Add SRV records :

_etcd-server._tcp.int.intra. 300 IN  SRV  0 0 2380 c0.int.intra.
_etcd-server._tcp.int.intra. 300 IN  SRV  0 0 2380 c1.int.intra.
_etcd-server._tcp.int.intra. 300 IN  SRV  0 0 2380 c2.int.intra.
_etcd-client._tcp.int.intra. 300 IN SRV 0 0 2379 c0.int.intra.
_etcd-client._tcp.int.intra. 300 IN SRV 0 0 2379 c1.int.intra.
_etcd-client._tcp.int.intra. 300 IN SRV 0 0 2379 c2.int.intra.

And test :

[core@admin ~]$ dig +noall +answer SRV _etcd-client._tcp.int.intra
_etcd-client._tcp.int.intra. 300 IN    SRV    0 0 2379 c1.int.intra.
_etcd-client._tcp.int.intra. 300 IN    SRV    0 0 2379 c0.int.intra.
_etcd-client._tcp.int.intra. 300 IN    SRV    0 0 2379 c2.int.intra.
[core@admin ~]$ dig +noall +answer SRV _etcd-server._tcp.int.intra
_etcd-server._tcp.int.intra. 300 IN    SRV    0 0 2380 c0.int.intra.
_etcd-server._tcp.int.intra. 300 IN    SRV    0 0 2380 c1.int.intra.
_etcd-server._tcp.int.intra. 300 IN    SRV    0 0 2380 c2.int.intra.
[core@admin ~]$

Bootstrap your coreos/etcdv3 cluster

Hmm, I'm writing this part of the job 2 days after beginning to work on it. The documentation on coreos.com is not very "step by step" like, and this post will be useful for you if you want to go fast and not lose your time.

These are the steps I followed. You can go directly to the last item ("or etcd v3 in RKT container engine…") but knowing about the other options can be useful to you.

Another explanation : I had not read and understood ( ;) ) enough of the documentation to get the steps required to bring a cluster up quickly.
But I’m not alone.. : https://github.com/coreos/etcd/issues/7436

Use puppet ?

Declare an etcd class in your puppetmaster configuration

You already know how to do that

include etcd

class { 'etcd':
  ensure                     => 'latest',
  listen_client_urls          => 'http://0.0.0.0:2379',
  advertise_client_urls       => "http://${::fqdn}:2379,http://127.0.0.1:2379",
  listen_peer_urls            => 'http://0.0.0.0:2380',
  initial_advertise_peer_urls => "http://${::fqdn}:2380,http://127.0.0.1:2379",
  initial_cluster             => [
    "${::hostname}=http://${::fqdn}:2380",
    'c0=http://c0.int.intra:2380',
    'c1=http://c1.int.intra:2380',
    'c2=http://c2.int.intra:2380'
  ],
}

…but unfortunately, the existing documentation is not as complete as you might want. I've spent 3 hours debugging the puppet docker agent, and finally found that it advertised its OS as "Linux" instead of RedHat or Debian, while the etcd class expects RedHat or Debian on my existing puppetmaster. Perhaps the agent must be used with the docker-based puppetmaster; I will test it later.

For the moment, I chose a simpler way … see the next subchapter..

or etcd v3 in RKT container engine…

ETCD v3 is the latest version of this distributed registry; it's supported by Kubernetes 1.6 and later.

Although CoreOS is distributed with etcd and etcd2, these might disappear at the beginning of 2018.

CoreOS chose to embed ETCD v3 in RKT containers (not docker by default). It's easy, but the step by step procedure is quite hard to find : the current documentation does not lead you to the result as quickly as it should.

Please follow these steps to get a fully functional 3-node coreos/etcd cluster :

Enabling RKT to use DNS..

I had to go through a lot of web pages to get it to work :

3 hours later, I had put together a cloud-config file that does the trick :-) :

#cloud-config

ssh_authorized_keys:
 - "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDjgUAV27yJT6YRxzDfGiDB4fwZwmx7EWzcZU3LXRWjaaSgvlizQtHwG8OCJ    YQN0aG29CTQNgJs.....................v1FXVrYJCAyy+3l719DI9eA++nYyDo//LEj5cf7/4Xcs+12o4ADCJzYMNXazQ8f/1d3EPqJhfcuL+spehCYzDGCYyDDGeIsUaZYnWdOY4z4+2wWtd++9WBfCrVE2g8I3k0+U0iVEM1tZXfrvIzj+fw17zs/FNuzAunQRAIagqUcIswc9aPJbN1H9W31N6gX1X5IV/SvqPl39QXV/wgXtc9M+0oEKJQOvOj core"

hostname: "@@VMNAME@@"

users:
  - name: "user"
    passwd: "$6$D6SWuIMCinNlXP4b$QXnT3RVvPwHcR6ElHP1hWRBxq03gA/TbM1iMKRz3NukT7...a1wWUvil.e/OBu6ax/"
    groups:
      - "sudo"

write_files:
  - path: /etc/systemd/resolved.conf
    permissions: 0644
    owner: root
    content: |
      #  This file is part of systemd.
      #
      #  systemd is free software; you can redistribute it and/or modify it
      #  under the terms of the GNU Lesser General Public License as published by
      #  the Free Software Foundation; either version 2.1 of the License, or
      #  (at your option) any later version.
      #
      # Entries in this file show the compile time defaults.
      # You can change settings by editing this file.
      # Defaults can be restored by simply deleting this file.
      #
      # See resolved.conf(5) for details

      [Resolve]
      #DNS=
      #FallbackDNS=
      Domains=int.intra
      #LLMNR=yes
      #DNSSEC=allow-downgrade
      #Cache=yes
      #DNSStubListener=udp

  - path: /etc/systemd/system/etcd-member.service.d/override.conf
    permissions: 0644
    owner: root
    content: |
      [Service]
      Environment="ETCD_IMAGE_TAG=v3.2.0"
      Environment="ETCD_DATA_DIR=/var/lib/etcd"
      Environment="ETCD_SSL_DIR=/etc/ssl/certs"
      Environment="ETCD_OPTS=--name=@@VMNAME@@ --listen-client-urls=http://@@IP@@:2379    --advertise-client-urls=http://@@IP@@:2379 --listen-peer-urls=http://@@IP@@:2380     --initial-advertise-peer-urls=http://@@IP@@:2380 --initial-cluster-token=my-etcd-token --auto-compaction-retention=1 --discovery-srv=int.intra"

coreos:
  update:
    reboot-strategy: off
  units:
    - name: systemd-networkd.service
      command: stop
    - name: static.network
      runtime: true
      content: |
        [Match]
        Name=eth0

        [Network]
        Address=@@IP@@/16
        Gateway=10.0.0.1
        DNS=10.0.0.1
    - name: down-interfaces.service
      command: start
      content: |
        [Service]
        Type=oneshot
        ExecStart=/usr/bin/ip link set eth0 down
        ExecStart=/usr/bin/ip addr flush dev eth0
    - name: systemd-networkd.service
      command: restart
    - name: systemd-resolved.service
      command: restart
    - name: etcd-member.service
      command: start

Please take a look at all the config options of my user_data cloud-config file.
Replace any of my options with yours (int.intra is my internal DNS domain, the IPs are mine… )

Now just start your three nodes and hop, it's done.
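
On each node, you can follow the first start of the etcd member (the rkt image download and the DNS SRV discovery can take a little while) :

systemctl status etcd-member
journalctl -u etcd-member -f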

Check your new ETCD v3 cluster :

You need access to an etcdctl binary. You can build it and run it outside the cluster, or use the running containers.

For the example, I will run the etcdctl embedded in my brand new containers :

c0 ~ # rkt list
UUID        APP    IMAGE NAME            STATE    CREATED        STARTED        NETWORKS
21c1654b    etcd    quay.io/coreos/etcd:v3.2.0    running    23 minutes ago    23 minutes ago
c0 ~ # rkt enter 21c1654b /bin/sh
/ #

And issue these commands :

c0 ~ # rkt enter 21c1654b /bin/sh
/ # export ETCDCTL_API=3 ; export ETCDCTL_ENDPOINTS=http://c0:2379;     /usr/local/bin/etcdctl member list
65ac5f05513b1262, started, c1, http://c1.int.intra:2380, http://10.0.1.101:2379
6c2733a4aad35dac, started, c0, http://c0.int.intra:2380, http://10.0.1.100:2379
84d5ec72bbd714d2, started, c2, http://c2.int.intra:2380, http://10.0.1.102:2379
/ # export ETCDCTL_API=3 ; export ETCDCTL_ENDPOINTS=http://c0:2379; /usr/local/bin/etcdctl put foo bar
OK
/ # export ETCDCTL_API=3 ; export ETCDCTL_ENDPOINTS=http://c0:2379; /usr/local/bin/etcdctl get foo
foo
bar
/ #

Verifying on another node…

/ #  export ETCDCTL_API=3 ; export ETCDCTL_ENDPOINTS=http://c2:2379; /usr/local/bin/etcdctl get foo
foo
bar

Good..

/ # export ETCDCTL_API=3 ; export ETCDCTL_ENDPOINTS=http://c0:2379; /usr/local/bin/etcdctl check perf
 60 / 60 Booooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00%1m0s
PASS: Throughput is 150 writes/s
PASS: Slowest request took 0.264878s
PASS: Stddev is 0.033646s
PASS

Ready for Kubernetes !

Homelab : Resilient and low power "datacenter-in-two-boxes" with X10SDV Mini ITX Xeon D 1540/1518, Centos 7.3, Ceph Jewel, ZFS, KVM/Qemu and OpenvSwitch

In this article you will get an overview on how to build a powerful “home lab/datacenter” based on “very cool” open source technologies, in a very reduced footprint : two network racks, 180W power max.

The technical specs are as follows :

  • 3 compute nodes

  • 256GB RAM max capacity (96GB for now)

  • 40 TB of raw storage (30 usable), including redundancy and backups

  • 10GBE communication between nodes (for performance reasons : storage traffic and backups)

  • all these services for less than 200W, please. :)

for the following main goals :

Main goals

  • highly resilient data and capacities (definitely, I don't want to lose my family photos and videos, my work archives, my working environment and all the administrative stuff about my family)

  • small power footprint, because power is expensive for me (and the planet)

  • Hosting all mails, data (photos, videos, ..)

  • Hosting home automation systems

  • Manage telecom devices (IP phones, LTE and DSL internet access)

Hardware overview

See original image

Overall map

See original image

Computing and storage :

First node (compute/storage) :

Hardware

Mini ITX Chassis : In Win MS04.265P.ATA, 4 bay hot swap

Supermicro X10SDV-TLNF 8-core/16 threads Xeon D 1540 Mini ITX (TDP = 45W)

See original image

64GB RAM DDR4 ECC

256 GB NVMe SSD SM951 Samsung

2x 1Gb/s “Intel I350 Gigabit”

2x 10GBE “Intel Ethernet Connection X552/X557-AT 10GBASE-T”

Storage Shelf

U-NAS 8 Hard drives (5x4TB WD RED) + others

See original image

Software

Centos 7.3

ZFS, Ceph, KVM, OpenvSwitch

Second node (compute/storage) :

Hardware

Mini ITX Chassis : In Win MS04.265P.SATA, 4 bay hot swap

Supermicro X10SDV-4C+-TLN2F 4 core Xeon D 1518 (TDP = 35W)

See original image

32GB RAM DDR4 ECC

2x 1Gb/s “Intel I350 Gigabit”

2x 10GBE “Intel Ethernet Connection X552/X557-AT 10GBASE-T”

4 hard drives (2x5TB WD GREEN + 1TB Hitachi + SSD)

Software

Centos 7.3

ZFS, Ceph (in VMSs), KVM, OpenvSwitch

Third node (compute : management, home automation software..) :

Hardware

See original image

Intel NUC5i3RYK

16 GB RAM DDR3

120GB SSD M.2. Kingston

Software

Centos 7.3

KVM, OpenvSwitch

Rack mount

All the hardware is located in a garage, in two smart network racks (the UPS is located under the racks, near the power outlet)

The two compute/storage nodes in their final location

result

Network, compute and storage rack :

result

KVM/qemu management

I used to manage virtual machines with virt-manager, but I have Apple-oriented devices, and XQuartz is not a friend. I use wok on all nodes. While waiting for cloudstack, we'll see in the future :

result

Network topology between the two main compute nodes : bandwidth optimisation and loop management with RSTP

 

With two great Supermicro nodes embedding four 10GbE ports between them, you may want :

1. all the traffic to be switched (at layer 2) over the 10GbE links (for performance reasons : backups, data transfers, traffic between the Ceph storage nodes…)

2. when a node is shut down (for any reason), all the traffic to fail over to the 1 Gb/s network card.

To do that, your switch must support RSTP (other mechanisms exist, but RSTP works well, is supported by entry-level network switches, and is supported by openvswitch)

Here is my network topology :

Home Datacenter slide 1.jpg


Configuration for vs0 (the same for the two nodes) :

Create the switch

ovs-vsctl add-br vs0

Add the 1Gb/s port

ovs-vsctl add-port vs0 eno1
ovs-vsctl set Bridge vs0 rstp_enable=true
ovs-vsctl set Port eno4 other_config:rstp-port-priority=32
ovs-vsctl set Port eno4 other_config:rstp-path-cost=150

Then, when RSTP is configured (not before :) ), add the 10 Gb/s port

ovs-vsctl add-port vs0 eno3

This should work.
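
You can check that RSTP is really enabled and look at the per-port state reported by openvswitch (just a verification; the exact columns depend on your OVS version) :

ovs-vsctl list Bridge vs0 | grep rstp
ovs-vsctl list Port eno1 eno3 | grep -E 'name|rstp'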

Finally, set the management IP for the internal port of the switch in /etc/sysconfig/network-scripts/ifcfg-vs0

[root@hyp02 ~]# cat /etc/sysconfig/network-scripts/ifcfg-vs0
DEVICE=vs0
ONBOOT=yes
DEVICETYPE=ovs
TYPE=OVSBridge
BOOTPROTO=static
HOTPLUG=no
IPADDR=192.168.10.75
GATEWAY=192.168.10.1
PREFIX=24
DNS1=192.168.10.1
DOMAIN=localdomain

First test snapshot : worked well, huh ?

[root@hyp01 vol05]# iperf3 -c 192.168.11.2
Connecting to host 192.168.11.2, port 5201
[  4] local 192.168.11.1 port 59982 connected to 192.168.11.2 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  1.09 GBytes  9.35 Gbits/sec   39    656 KBytes
[  4]   1.00-2.00   sec  1.09 GBytes  9.39 Gbits/sec    0    663 KBytes
[  4]   2.00-3.00   sec  1.09 GBytes  9.40 Gbits/sec    0    686 KBytes
[  4]   3.00-4.00   sec  1.09 GBytes  9.34 Gbits/sec  635    447 KBytes
[  4]   4.00-5.00   sec  1.05 GBytes  9.00 Gbits/sec  144    691 KBytes
[  4]   5.00-6.00   sec  1.09 GBytes  9.36 Gbits/sec    0    707 KBytes
[  4]   6.00-7.00   sec  1.09 GBytes  9.40 Gbits/sec    0    723 KBytes
[  4]   7.00-8.00   sec  1.09 GBytes  9.38 Gbits/sec    0    754 KBytes
[  4]   8.00-9.00   sec  1.09 GBytes  9.35 Gbits/sec  270    632 KBytes
[  4]   9.00-10.00  sec  1.07 GBytes  9.16 Gbits/sec  176    635 KBytes
.........    
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  10.8 GBytes  9.31 Gbits/sec  1264             sender
[  4]   0.00-10.00  sec  10.8 GBytes  9.31 Gbits/sec                  receiver

iperf Done.
[root@hyp01 vol05]# iperf3 -s
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 192.168.11.2, port 33226
[  5] local 192.168.11.1 port 5201 connected to 192.168.11.2 port 33228
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-1.00   sec  1.04 GBytes  8.97 Gbits/sec
[  5]   1.00-2.00   sec  1.06 GBytes  9.13 Gbits/sec
[  5]   2.00-3.00   sec  1.07 GBytes  9.22 Gbits/sec
[  5]   3.00-4.00   sec  1.06 GBytes  9.10 Gbits/sec
[  5]   4.00-5.00   sec  1.07 GBytes  9.16 Gbits/sec
[  5]   5.00-6.00   sec  1.06 GBytes  9.15 Gbits/sec
[  5]   6.00-7.00   sec  1.08 GBytes  9.31 Gbits/sec
[  5]   7.00-8.00   sec  1.07 GBytes  9.22 Gbits/sec
[  5]   8.00-9.00   sec  1.05 GBytes  8.98 Gbits/sec
[  5]   9.00-10.00  sec  1.08 GBytes  9.31 Gbits/sec
[  5]  10.00-10.04  sec  40.6 MBytes  9.43 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-10.04  sec  0.00 Bytes  0.00 bits/sec                  sender
[  5]   0.00-10.04  sec  10.7 GBytes  9.16 Gbits/sec                  receiver
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------

Done.

Ceph storage cluster

I wrote a post to get a ceph cluster up and running, monitored with grafana/influxdb

Here is my current config :

See original image

Monitoring

I use grafana/telegraf/influxdb to collect all data on all hosts, including internet bandwidth on openwrt..

Internet bandwidth and the three hypervisors

See original image

UPS…

See original image

And Ceph ..

See original image

Costs

Here is the total amount of money (not time :) ) for building such a lab :

… working on it..

Low power/good performance Ceph (jewel) cluster monitored with grafana/influxdb/telegraf on Centos 7.3

 

The hardware..

In my Homelab : Highly resilient "datacenter-in-two-boxes" with Centos 7 and Ceph jewel article, I explained how to build a low power homelab.

With this hardware and a bunch of low power disks (2.5", 5400 rpm), you can build a low power virtualized storage system with Ceph, and store all your data with top-level NAS software

The software :

Centos 7.3 (1611) x86-64 “minimal”

Ceph “jewel” x86-64

Puppet (configuration management software)

Topology

Number of MONs

It’s recommended to install at least 3 MONs for resilience reasons.

For my needs, I will install 5 MONs on my 5 hosts.

In this example, all hosts are virtualized -> in my case, I have 3 physical hosts (see other pages..). One of them (the Intel NUC) cannot host an OSD : in my final cluster map, there is 1 mon on the NUC, and 1 on each storage host. This ensures that when a host is down, the quorum is still satisfied and my ceph cluster stays UP.

Installing the cluster

Preparing the hardware and the OS

Requirements :

This blog does not cover the OS installation procedure. Before you continue, be sure to configure your OS with these additional requirements :

  • Use a correct DNS configuration or configure manually each /etc/hosts file of the hosts.
  • You will need at least 3 nodes, plus an admin node (for cluster deployment, monitoring, ..)
  • You MUST install NTP on all nodes :

    root@n0:~# yum install -y ntp
    root@n0:~# systemctl enable ntpd

then configure /etc/ntp.conf with your preferred NTP servers

You can also use chronyd instead of ntpd (see the chrony.conf config file)

It’s safe and more efficient to have a time source close to your cluster. Wifi AP, DSL routers often provide such services. My configuration uses my ADSL router, based on openWRT (you can setup ntpd on openwrt…)

Then run :

root@n0:~# timedatectl set-ntp true
  • disable SELINUX (see /etc/selinux/config)
  • disable your firewalld (systemctl disable firewalld.service)

Finally, ensure everything’s ok when rebooting your node..

Create the ceph admin user on each node :

On each node, create a ceph admin user (used for deployment tasks). It’s important to choose a different user than “ceph” (used by the ceph installer..)

Note : you can omit the -s directive of useradd, it’s a personal choice to use bash.

root@n0:~# sudo useradd -d /home/cephadm -m cephadm -s /bin/bash
root@n0:~# sudo passwd cephadm
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully

root@n0:~# echo "cephadm ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/ceph
root@n0:~# chmod 444 /etc/sudoers.d/ceph

…and so on on the admin host, and nodes n1, n2 [and n3, …]

to automate this task, use dsh on the admin node, after having configured dsh for root (ssh-copy-id root@nodes)

echo "cephadm ALL = (root) NOPASSWD:ALL" | dsh -aM -i -c 'sudo tee /etc/sudoers.d/ceph'    
dsh -aM "chmod 444 /etc/sudoers.d/ceph"

Also, install lsb first, it will be useful later.

yum install redhat-lsb-core

Setup the ssh authentication with cryptographic keys

On the admin node :

Create the ssh key for the user cephadm

root@admin:~# su - cephadm
cephadm@admin:~$ ssh-keygen -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (/home/cephadm/.ssh/id_dsa):
Created directory '/home/cephadm/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/cephadm/.ssh/id_dsa.
Your public key has been saved in /home/cephadm/.ssh/id_dsa.pub.
The key fingerprint is:
ec:16:ad:b4:76:e4:32:c6:7c:14:45:bc:c3:78:5a:cf cephadm@admin
The key's randomart image is:
+---[DSA 1024]----+
|           oo    |
|           ..    |
|          .o .   |
|       . ...*    |
|        S ++ +   |
|       = B.   E  |
|        % +      |
|       + =       |
|                 |
+-----------------+
cephadm@admin:~$

Then push it on the nodes of the cluster:
[cephadm@admin ~]$ ssh-copy-id cephadm@n0
[cephadm@admin ~]$ ssh-copy-id cephadm@n1
[cephadm@admin ~]$ ssh-copy-id cephadm@n2
[cephadm@admin ~]$ ssh-copy-id cephadm@n3
[cephadm@admin ~]$ ssh-copy-id cephadm@n4

Or better to automate (if you do this a lot of times, :) ):

#!/bin/sh
# sudo yum install moreutils sshpass openssh-clients
echo 'Enter password:';
read -s SSHPASS;
export SSHPASS;
for i in {0..4}; do sshpass -e ssh-copy-id -o StrictHostKeyChecking=no cephadm@n$i.int.intra -p 22 ; done
export SSHPASS=''

Install and configure dsh (distributed shell)

[root@admin ~]# yum install -y gcc
[root@admin ~]# yum install -y gcc-c++
[root@admin ~]# yum install -y wget
[root@admin ~]# wget https://www.netfort.gr.jp/~dancer/software/downloads/dsh-0.25.9.tar.gz
[root@admin ~]# wget https://www.netfort.gr.jp/~dancer/software/downloads/libdshconfig-0.20.9.tar.gz
[root@admin ~]# tar xvfz libdshconfig-0.20.9.tar.gz
[root@admin ~]# cd libdshconfig-0.20.9
[root@admin libdshconfig-0.20.9]# ./configure
[root@admin libdshconfig-0.20.9]# make
[root@admin libdshconfig-0.20.9]# make install
[root@admin ~]# tar xvfz dsh-0.25.9.tar.gz
[root@admin ~]# cd dsh-0.25.9
[root@admin dsh-0.25.9]# ./configure
[root@admin dsh-0.25.9]# make
[root@admin dsh-0.25.9]# make install
[root@admin ~]# echo /usr/local/lib > /etc/ld.so.conf.d/dsh.conf
[root@admin ~]# ldconfig

Done. Then configure it :
[root@admin ~]# vi /usr/local/etc/dsh.conf

insert these lines :

remoteshell =ssh
waitshell=1  # whether to wait for execution

Create the default machine list file

[root@admin ~]# su - cephadm
cephadm@admin:~$ cd
cephadm@admin:~$ mkdir .dsh
cephadm@admin:~$ cd .dsh
cephadm@admin:~/.dsh$ for i in {0..4} ; do echo "n$i" >> machines.list ; done

Test…

[cephadm@admin ~]$ dsh -aM uptime
n0:  16:23:21 up 3 min,  0 users,  load average: 0.20, 0.39, 0.20
n1:  16:23:22 up 3 min,  0 users,  load average: 0.19, 0.40, 0.21
n2:  16:23:23 up 3 min,  0 users,  load average: 0.13, 0.38, 0.20
n3:  16:23:24 up 4 min,  0 users,  load average: 0.00, 0.02, 0.02
n4:  16:23:25 up 3 min,  0 users,  load average: 0.24, 0.38, 0.20

Another test :
[cephadm@admin ~]$ dsh -aM cat /proc/cpuinfo | grep model\ name
n0: model name : Intel(R) Atom(TM) x5-Z8300 CPU @ 1.44GHz
n0: model name : Intel(R) Atom(TM) x5-Z8300 CPU @ 1.44GHz
n0: model name : Intel(R) Atom(TM) x5-Z8300 CPU @ 1.44GHz
n0: model name : Intel(R) Atom(TM) x5-Z8300 CPU @ 1.44GHz
n1: model name : Intel(R) Atom(TM) x5-Z8300 CPU @ 1.44GHz
n1: model name : Intel(R) Atom(TM) x5-Z8300 CPU @ 1.44GHz
n1: model name : Intel(R) Atom(TM) x5-Z8300 CPU @ 1.44GHz
n1: model name : Intel(R) Atom(TM) x5-Z8300 CPU @ 1.44GHz
n2: model name : Intel(R) Atom(TM) x5-Z8300 CPU @ 1.44GHz
n2: model name : Intel(R) Atom(TM) x5-Z8300 CPU @ 1.44GHz
n2: model name : Intel(R) Atom(TM) x5-Z8300 CPU @ 1.44GHz
n2: model name : Intel(R) Atom(TM) x5-Z8300 CPU @ 1.44GHz
n3: model name : Intel Core Processor (Broadwell)
n3: model name : Intel Core Processor (Broadwell)
n3: model name : Intel Core Processor (Broadwell)
n3: model name : Intel Core Processor (Broadwell)
n4: model name : Intel(R) Atom(TM) x5-Z8300 CPU @ 1.44GHz
n4: model name : Intel(R) Atom(TM) x5-Z8300 CPU @ 1.44GHz
n4: model name : Intel(R) Atom(TM) x5-Z8300 CPU @ 1.44GHz
n4: model name : Intel(R) Atom(TM) x5-Z8300 CPU @ 1.44GHz

Good.. !!

Now you're ready to install your cluster with automated commands from your admin node. Note that several other solutions are good enough, like cssh (clustered ssh…). Choose the best for your needs ;)

Well,

Now I’m assuming you have followed the installation procedure and the requirements above :).

Here’s my configuration :

n0 : 192.168.10.210/24 1TB HGST 2.5” 5400rpm (data) + 20Gb on SM951 NVMe SSD (journal)
n1 : 192.168.10.211/24 1TB HGST 2.5” 5400rpm (data) + 20Gb on SM951 NVMe SSD (journal)
n2 : 192.168.10.212/24 1TB HGST 2.5” 5400rpm (data) + 20Gb on Crucial MX200 SSD (journal)
n3 : 192.168.10.213/24 1TB WD Red 2.5” 5400rpm (data) + 20Gb on SM951 NVMe SSD (journal)
n4 : 192.168.10.214/24 1TB Hitachi 3.5” 7200rpm (data) + 20Gb on Crucial MX100 SSD (journal)
n5 : 192.168.10.215/24 1TB ZFS (on 2x WD Green 5TB) + 20Gb on Crucial MX100 SSD (journal)
admin : 192.168.10.177/24 (VM)

Finally, don’t forget to change your yum repositories if you installed the OSes with a local media. They should now point to a mirror for all updates (security and software).

Reboot your nodes if you want to be very sure you haven't forgotten anything, and test them with dsh, for example for ntp :

[cephadm@admin ~]$ dsh -aM timedatectl|grep NTP
[cephadm@admin ~]$ dsh -aM timedatectl status|grep NTP
n0:      NTP enabled: yes
n0: NTP synchronized: yes
n1:      NTP enabled: yes
n1: NTP synchronized: yes
n2:      NTP enabled: yes
n2: NTP synchronized: yes
n3:      NTP enabled: yes
n3: NTP synchronized: yes
n4:      NTP enabled: yes
n4: NTP synchronized: yes
...

Install your Ceph cluster

Get the software

Ensure you are up to date on each node, at the very beginning of this procedure.

Feel free to use dsh from the admin node for each task you would like to apply to the nodes ;)

[cephadm@admin ~]$ dsh -aM "sudo yum -y upgrade"

Install the repos

On the admin node only, configure the ceph repos.

You have the choice : either do it like this if you want to download Ceph packages from the internet :

[cephadm@admin ~]$ sudo yum install https://download.ceph.com/rpm-jewel/el7/noarch/ceph-release-1-1.el7.noarch.rpm

Or, if you want a local mirror, look at the section below explaining how to set up Puppet (for example) to do that. I prefer this option myself, because I have a local mirror (for testing purposes, it's better to download locally)

Install Ceph-deploy

This tool is written in Python.

[cephadm@admin ~]$ sudo yum install ceph-deploy

Install Ceph

Always on the admin node, create a directory that will contain all the config for your cluster :

cephadm@admin:~$ mkdir cluster
cephadm@admin:~$ cd cluster

I have chosen to install 4 monitors (3 would be sufficient at home, but my needs aren't your needs).

cephadm@admin:~/cluster$ ceph-deploy new n{0,2,4,5}

(It generates a lot of stdout messages)

Now edit ceph.conf (in the "cluster" directory), tell Ceph you want to replicate x3, and add the cluster and public networks in the [global] section ; for myself : 10.0.0.0/8 and 192.168.10.0/24

The file ceph.conf should contain the following lines now :

[global]
fsid = 74a80a50-b7f9-4588-baa4-bb242c3d4cf0
mon_initial_members = n0, n1, n3
mon_host = 192.168.10.210,192.168.10.211,192.168.10.213
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
osd pool default size = 3
cluster network = 10.1.1.0/24
public network = 192.168.10.0/24

[osd]
osd mkfs options type=btrfs

osd journal size = 20000

Please note that I will use btrfs to store the data. My kernel is recent enough for that (4.9), and I've experienced obvious filesystem corruptions sometimes when simply rebooting my nodes that had kernel 3.10 and an XFS partition for the OSDs.

If you install from a local mirror :
cephadm@admin:~/cluster$ for i in {0..5}; do ceph-deploy install --repo-url {http mirror} --gpg-url {http gpg url} --release jewel n$i; done

For ex for me :

cephadm@admin:~/cluster$ for i in {0..5}; do ceph-deploy install --repo-url http://mirror/ceph/rpm-jewel/el7/ --release jewel n$i; done
Else :
cephadm@admin:~/cluster$ for i in {0..5}; do ceph-deploy install --release jewel n$i; done

This command generates a lot of logs (downloads, debug messages, warnings…) but should return without error. Otherwise check the error and google it. You can often just restart the ceph-deploy program; depending on the error, it will work the second time ;) (I've experienced some problems accessing the ceph repository, for example….)

Create the mons:
cephadm@admin:~/cluster$ ceph-deploy mon create-initial

Idem, a lot of logs… but no error..

Create the OSDs (storage units)

You have to know which device will be used for the data on each node and which device for the journal. If you are building a ceph cluster for a production environment, you should use SSDs for the journal partition. For testing purposes, you can use only one device.

In my case, I made sure the OSD data disks are /dev/vdb on all nodes, and the journal (SSD) is /dev/vdc..
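
To double-check the device names on every node from the admin node before zapping anything, something like this helps (ROTA=1 means a spinning disk, 0 an SSD) :

[cephadm@admin ~]$ dsh -aM "lsblk -d -o NAME,SIZE,ROTA,MODEL"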

Important note : if you previously installed ceph on a device, you MUST "zap" (erase) it first. Use the command "ceph-deploy disk zap n3:sdb" for example.

Execute this step if you don’t know anything about the past usage of your disks.

Zap the disks. If you have a separate device for the journal SSD (/dev/vdc, here):
cephadm@admin:~/cluster$ for i in {0..5}; do ceph-deploy disk zap n$i:vdb; done
cephadm@admin:~/cluster$ for i in {0..5}; do ceph-deploy disk zap n$i:vdc; done
If you use only one device :
cephadm@admin:~/cluster$ for i in {0..4}; do ceph-deploy disk zap n$i:vdb; done

Create the OSDs. Note : use --fs-type btrfs on "osd create" if you want (like me) another filesystem than xfs. I've had obvious problems with xfs (corruptions while rebooting..)

cephadm@admin:~/cluster$ for i in {0..5}; do ceph-deploy osd create --fs-type btrfs n$i:vdb:vdc; done

Else use the defaults (xfs)

cephadm@admin:~/cluster$ for i in {0..5}; do ceph-deploy osd create n$i:vdb:vdc; done

And remember, if you have only one device (vdb for ex), use this instead (defaults with xfs):

cephadm@admin:~/cluster$ for i in {0..4}; do ceph-deploy osd create n$i:vdb; done


Deploy the ceph configuration to all storage nodes
cephadm@admin:~/cluster$ for i in {0..5}; do ceph-deploy admin n$i; done

And check the permissions. For some reason the permissions are not correct:

cephadm@admin:~/cluster$ dsh -aM "ls -l /etc/ceph/*key*"
n0: -rw------- 1 root root 63 Oct 26 19:01 /etc/ceph/ceph.client.admin.keyring
n1: -rw------- 1 root root 63 Oct 26 19:01 /etc/ceph/ceph.client.admin.keyring
n2: -rw------- 1 root root 63 Oct 26 19:01 /etc/ceph/ceph.client.admin.keyring
n3: -rw------- 1 root root 63 Oct 26 19:01 /etc/ceph/ceph.client.admin.keyring

To correct this, issue the following command :

cephadm@admin:~/cluster$ dsh -aM "sudo chmod +r /etc/ceph/ceph.client.admin.keyring"

and check :

cephadm@admin:~/cluster$ dsh -aM "ls -l /etc/ceph/*key*"
n0: -rw-r--r-- 1 root root 63 Oct 26 19:01 /etc/ceph/ceph.client.admin.keyring
n1: -rw-r--r-- 1 root root 63 Oct 26 19:01 /etc/ceph/ceph.client.admin.keyring
n2: -rw-r--r-- 1 root root 63 Oct 26 19:01 /etc/ceph/ceph.client.admin.keyring
n3: -rw-r--r-- 1 root root 63 Oct 26 19:01 /etc/ceph/ceph.client.admin.keyring
Finally, install the metadata servers
cephadm@admin:~/cluster$ ceph-deploy mds create n0 n1 n5
and the rados gateway
cephadm@admin:~/cluster$ ceph-deploy rgw create n3 n5

(on n3 and n5 for me)

One more time ;) remember to check the ntp status of the nodes :

cephadm@admin:~/cluster$ dsh -aM "timedatectl|grep synchron"
n0: NTP synchronized: yes
n1: NTP synchronized: yes
n2: NTP synchronized: yes
n3: NTP synchronized: yes
n4: NTP synchronized: yes

check the cluster, on one node type :

[cephadm@n0 ~]$ ceph status
cluster 2a663a93-7150-43f5-a8d2-e40e2d9d175f
health HEALTH_OK
monmap e2: 5 mons at {n0=192.168.10.210:6789/0,n1=192.168.10.211:6789/0,n2=192.168.10.212:6789/0,n3=192.168.10.213:6789/0,n4=192.168.10.214:6789/0}
election epoch 8, quorum 0,1,2,3,4 n0,n1,n2,n3,n4
osdmap e32: 5 osds: 5 up, 5 in
flags sortbitwise,require_jewel_osds
pgmap v97: 104 pgs, 6 pools, 1588 bytes data, 171 objects
173 MB used, 3668 GB / 3668 GB avail
104 active+clean
Done !

Test your brand new ceph cluster

You can create a pool to test your new cluster :

[cephadm@n0 ~]$ rados mkpool test
successfully created pool test
[cephadm@n0 ~]$ rados lspools
rbd
.rgw.root
default.rgw.control
default.rgw.data.root
default.rgw.gc
default.rgw.log
test

[cephadm@n0 ~]$ rados put -p test .bashrc .bashrc
[cephadm@n0 ~]$ ceph osd map test .bashrc
osdmap e34 pool 'test' (6) object '.bashrc' -> pg 6.3d13d849 (6.1) -> up ([2,4,1], p2) acting ([2,4,1], p2)

A quick look at the cluster network, to make sure it is used as it should be (tcpdump on the cluster interface of n0):

22:43:17.137802 IP 10.1.1.12.50248 > n0.int.intra.acnet: Flags [P.], seq 646:655, ack 656, win 1424, options [nop,nop,TS val 3831166 ecr 3830943], length 9
22:43:17.177297 IP n0.int.intra.acnet > 10.1.1.12.50248: Flags [.], ack 655, win 235, options [nop,nop,TS val 3831203 ecr 3831166], length 0
22:43:17.205945 IP 10.1.1.13.42810 > n0.int.intra.acnet: Flags [P.], seq 393:515, ack 394, win 1424, options [nop,nop,TS val 4392067 ecr 3829192], length 122
22:43:17.205999 IP n0.int.intra.acnet > 10.1.1.13.42810: Flags [.], ack 515, win 252, options [nop,nop,TS val 3831231 ecr 4392067], length 0
22:43:17.206814 IP n0.int.intra.acnet > 10.1.1.13.42810: Flags [P.], seq 394:525, ack 515, win 252, options [nop,nop,TS val 3831232 ecr 4392067], length 131
22:43:17.207547 IP 10.1.1.13.42810 > n0.int.intra.acnet: Flags [.], ack 525, win 1424, options [nop,nop,TS val 4392069 ecr 3831232], length 0

….
Good !!
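When you are done testing, you can drop the test pool again. A quick sketch; note that rados wants the pool name twice plus an explicit confirmation flag:

[cephadm@n0 ~]$ rados rmpool test test --yes-i-really-really-mean-it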

Now, “really” test your new cluster

See http://docs.ceph.com/docs/giant/rbd/libvirt/ :

First, deploy the ceph admin configuration on the destination system that will be used to test the cluster.

On the admin node :

[cephadm@admin cluster]$ ceph-deploy --overwrite-conf admin hyp03

On a hypervisor with access to the ceph network, of course:

First, make the admin keyring readable so each process can access the cluster:

chmod +r /etc/ceph/ceph.client.admin.keyring
[root@hyp03 ~]# ceph osd pool create libvirt-pool 128 128
pool 'libvirt-pool' created

[root@hyp03 ~]# ceph auth get-or-create client.libvirt mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=libvirt-pool'
[client.libvirt]
key = AQDsdMYVYR0IdmlkKDLKMZYUifn+lvqMH3D7Q==

Create a 16G image on your new cluster

[root@hyp03 ~]# qemu-img create -f rbd rbd:libvirt-pool/new-libvirt-image 16G
Formatting 'rbd:libvirt-pool/new-libvirt-image', fmt=rbd size=17179869184 cluster_size=0

Important: Jewel enables RBD features that are not compatible with CentOS 7.3. Disable them, otherwise you won't be able to use your RBD image (either with rbd map or through qemu-img).

rbd feature disable libvirt-pool/new-libvirt-image exclusive-lock object-map fast-diff deep-flatten
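You can then verify which features remain enabled on the image with rbd info (just a quick check; the exact feature list depends on your defaults):

[root@hyp03 ~]# rbd info libvirt-pool/new-libvirt-image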

Create a secret

cat > secret.xml <<EOF
<secret ephemeral='no' private='no'>
        <usage type='ceph'>
                <name>client.libvirt secret</name>
        </usage>
</secret>
EOF

Then issue :

[root@hyp03 ~]# sudo virsh secret-define --file secret.xml
[root@hyp03 ~]# ceph auth get-key client.libvirt | sudo tee client.libvirt.key
sudo virsh secret-set-value --secret 12390708-973c-4f6e-b0be-aba963608006 --base64 $(cat client.libvirt.key) && rm client.libvirt.key secret.xml

Replicate the secret on all hosts where you want the VM to be able to run (especially if you use live migration). Repeat the previous steps on each of these hosts, but with a modified secret.xml file that includes the secret UUID generated on the first host.

<secret ephemeral='no' private='no'>
    <uuid>12390708-973c-4f6e-b0be-aba963608006</uuid>
        <usage type='ceph'>
            <name>client.libvirt secret</name>
        </usage>
</secret>

Follow the guide http://docs.ceph.com/docs/giant/rbd/libvirt/ for the VM configuration, then define the secret and start the domain:

sudo virsh secret-define --file secret.xml

[root@hyp03 ~]# virsh start dv03
Domain dv03 started
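For reference, the disk definition added to the domain XML (following the guide above) looks roughly like this. It is only a sketch: adapt the monitor address, target device and secret UUID to your own setup.

<disk type='network' device='disk'>
  <source protocol='rbd' name='libvirt-pool/new-libvirt-image'>
    <host name='192.168.10.210' port='6789'/>
  </source>
  <auth username='libvirt'>
    <secret type='ceph' uuid='12390708-973c-4f6e-b0be-aba963608006'/>
  </auth>
  <target dev='vda' bus='virtio'/>
</disk>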

You’re done !

Configure the cluster

Crush map

For my needs, I want my cluster to stay up even if one of my hosts is down.

In my home "datacenter", I have two "racks" (the two physical hosts) and 6 "ceph virtual hosts", each of them running a 1 TB OSD.

How do I ensure that replication never puts all copies of a piece of data on a single physical host? You do that by managing your ceph CRUSH map with rules.

First, organize your ceph hosts in your "datacenter".

Because my home is not really a datacenter, for this example I will call "hosts" the virtual machines running CentOS 7.3/ceph, with one OSD per VM.

I will call "racks" the two physical hosts that run those "hosts" (VMs).

I will call "datacenter" the rack where my two physical hosts are installed.

Create the datacenter, racks, and move them into the right place

ceph osd crush add-bucket rack1 rack
ceph osd crush move n0 rack=rack1
ceph osd crush move n1 rack=rack1
ceph osd crush move n2 rack=rack1
ceph osd crush move n3 rack=rack1
ceph osd crush move rack1 root=default
ceph osd crush add-bucket rack2 rack
ceph osd crush move rack2 root=default
ceph osd crush move n4 rack=rack2
ceph osd crush move n5 rack=rack2
ceph osd crush add-bucket dc datacenter
ceph osd crush move dc root=default
ceph osd crush move rack1 datacenter=dc
ceph osd crush move rack2 datacenter=dc

Look at the results

[root@hyp03 ~]# ceph osd tree
ID  WEIGHT  TYPE NAME             UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 5.54849 root default
-10 5.54849     datacenter dc
-8 3.63879         rack rack1
-2 0.90970             host n0
0 0.90970                 osd.0      up  1.00000          1.00000
-3 0.90970             host n1
1 0.90970                 osd.1      up  1.00000          1.00000
-4 0.90970             host n2
2 0.90970                 osd.2      up  1.00000          1.00000
-5 0.90970             host n3
3 0.90970                 osd.3      up  1.00000          1.00000
-9 1.90970         rack rack2
-6 0.90970             host n4
4 0.90970                 osd.4      up  1.00000          1.00000
-7 1.00000             host n5
5 1.00000                 osd.5      up  1.00000          1.00000

Final configuration in June 2017:

[cephadm@admin ~]$ ceph osd tree
ID  WEIGHT  TYPE NAME                     UP/DOWN REWEIGHT PRIMARY-AFFINITY 
 -1 7.36800 root default                                                    
-10 7.36800     datacenter dc                                               
-13 7.36800         room garage                                             
-14 4.54799             chassis chassis1                                    
 -8 4.54799                 rack rack1                                      
 -2 0.90999                     host n0                                     
  0 0.90999                         osd.0      up  1.00000          1.00000 
 -3 0.90999                     host n1                                     
  1 0.90999                         osd.1      up  1.00000          1.00000 
 -4 0.90999                     host n2                                     
  2 0.90999                         osd.2      up  1.00000          1.00000 
 -5 0.90999                     host n3                                     
  3 0.90999                         osd.3      up  1.00000          1.00000 
-11 0.90999                     host n6                                     
  6 0.90999                         osd.6      up  1.00000          1.00000 
-15 2.81898             chassis chassis2                                    
 -9 2.81898                 rack rack2                                      
 -6 0.90999                     host n4                                     
  4 0.90999                         osd.4      up  1.00000          1.00000 
 -7 1.00000                     host n5                                     
  5 1.00000                         osd.5      up  1.00000          1.00000 
-12 0.90999                     host n7                                     
  7 0.90999                         osd.7      up  1.00000          1.00000

Data replication: playing with the CRUSH map

In order to manage replicas, you create crush rules.

The CRUSH map is shared by the nodes and provided to the clients; it ensures that data placement and replication follow your policy.

Run the following commands with an authorized admin user of your cluster

First, extract the crush map of your cluster

ceph osd getcrushmap -o crush

It's a binary file; you have to decompile it so that you can edit the rules:

crushtool -d crush -o crush.txt

Edit the crush.txt file and add the following rule at the end.
We will place two copies of the data on rack 1 (which has more OSDs…) and the remaining one on rack 2.

rule 3_rep_2_racks {
    ruleset 1
    type replicated
    min_size 2
    max_size 3
    step take default
    step choose firstn 2 type rack 
    step chooseleaf firstn 2 type osd
    step emit
}

Then, recompile the rules

crushtool -c crush.txt -o crushnew
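If you want, you can simulate how the new rule would place replicas before going further (a sketch using crushtool's test mode; rule 1 and 3 replicas match the rule defined above):

crushtool -i crushnew --test --rule 1 --num-rep 3 --show-mappings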

The newly compiled crush map is now in the "crushnew" file. Inject it into your cluster:

ceph osd setcrushmap -i crushnew

Then make sure you apply your rule to the pools that should use it (it is not picked up automatically unless your configuration makes it the default rule).
For example, I want my VMs' blocks to be replicated with this rule (my VMs are stored in the libvirt-pool pool):

ceph osd pool set libvirt-pool crush_ruleset 1
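You can check which rule a pool currently uses with:

ceph osd pool get libvirt-pool crush_ruleset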

Execute the following command several times

ceph status

You will see the cluster rebalance your data dynamically until the health is OK again (ceph health):

[cephadm@admin ~]$ ceph status
    cluster cd687e36-5670-48f5-b324-22a25082bede
     health HEALTH_WARN
            51 pgs backfill_wait
            3 pgs backfilling
            13 pgs degraded
            13 pgs recovery_wait
            67 pgs stuck unclean
            recovery 2188/32729 objects degraded (6.685%)
            recovery 9450/32729 objects misplaced (28.873%)
     monmap e5: 3 mons at {n0=192.168.10.210:6789/0,n4=192.168.10.214:6789/0,n8=192.168.10.218:6789/0}
            election epoch 98, quorum 0,1,2 n0,n4,n8
      fsmap e56: 1/1/1 up {0=n1=up:active}, 2 up:standby
     osdmap e915: 8 osds: 8 up, 8 in; 54 remapped pgs
            flags sortbitwise,require_jewel_osds
      pgmap v3123382: 304 pgs, 10 pools, 35945 MB data, 9318 objects
            107 GB used, 7429 GB / 7544 GB avail
            2188/32729 objects degraded (6.685%)
            9450/32729 objects misplaced (28.873%)
                 237 active+clean
                  51 active+remapped+wait_backfill
                  13 active+recovery_wait+degraded
                   3 active+remapped+backfilling
recovery io 116 MB/s, 29 objects/s

And finally :

[cephadm@admin ~]$ ceph status
    cluster cd687e36-5670-48f5-b324-22a25082bede
     health HEALTH_OK
     monmap e5: 3 mons at {n0=192.168.10.210:6789/0,n4=192.168.10.214:6789/0,n8=192.168.10.218:6789/0}
            election epoch 98, quorum 0,1,2 n0,n4,n8
      fsmap e56: 1/1/1 up {0=n1=up:active}, 2 up:standby
     osdmap e1010: 8 osds: 8 up, 8 in
            flags sortbitwise,require_jewel_osds
      pgmap v3123673: 304 pgs, 10 pools, 35945 MB data, 9318 objects
            106 GB used, 7430 GB / 7544 GB avail
                 304 active+clean
recovery io 199 MB/s, 49 objects/s

Repeat this for all pools that should follow the new crush rule.

In my example, with the following osd tree, you can see more data on the rack that has fewer drives.

[cephadm@admin ~]$ ceph osd df tree
ID  WEIGHT  REWEIGHT SIZE  USE    AVAIL %USE VAR  PGS TYPE NAME                     
 -1 7.36800        - 7544G   106G 7430G 1.41 1.00   0 root default                  
-10 7.36800        - 7544G   106G 7430G 1.41 1.00   0     datacenter dc             
-13 7.36800        - 7544G   106G 7430G 1.41 1.00   0         room garage           
-14 4.54799        - 4657G 59074M 4595G 1.24 0.88   0             chassis chassis1  
 -8 4.54799        - 4657G 59074M 4595G 1.24 0.88   0                 rack rack1    
 -2 0.90999        -  931G 10986M  919G 1.15 0.82   0                     host n0   
  0 0.90999  1.00000  931G 10986M  919G 1.15 0.82  94                         osd.0 
 -3 0.90999        -  931G 11617M  919G 1.22 0.87   0                     host n1   
  1 0.90999  1.00000  931G 11617M  919G 1.22 0.87 105                         osd.1 
 -4 0.90999        -  931G 11640M  919G 1.22 0.87   0                     host n2   
  2 0.90999  1.00000  931G 11640M  919G 1.22 0.87 114                         osd.2 
 -5 0.90999        -  931G 12416M  918G 1.30 0.92   0                     host n3   
  3 0.90999  1.00000  931G 12416M  918G 1.30 0.92 111                         osd.3 
-11 0.90999        -  931G 12413M  918G 1.30 0.92   0                     host n6   
  6 0.90999  1.00000  931G 12413M  918G 1.30 0.92 103                         osd.6 
-15 2.81898        - 2887G 49665M 2835G 1.68 1.19   0             chassis chassis2  
 -9 2.81898        - 2887G 49665M 2835G 1.68 1.19   0                 rack rack2    
 -6 0.90999        -  931G 17310M  913G 1.81 1.29   0                     host n4   
  4 0.90999  1.00000  931G 17310M  913G 1.81 1.29 125                         osd.4 
 -7 1.00000        - 1023G 16977M 1006G 1.62 1.15   0                     host n5   
  5 1.00000  1.00000 1023G 16977M 1006G 1.62 1.15 133                         osd.5 
-12 0.90999        -  931G 15376M  915G 1.61 1.15   0                     host n7   
  7 0.90999  1.00000  931G 15376M  915G 1.61 1.15 127                         osd.7 
               TOTAL 7544G   106G 7430G 1.41                                        
MIN/MAX VAR: 0.82/1.29  STDDEV: 0.23

Useful command :

[cephadm@admin ~]$ ceph osd df 
ID WEIGHT  REWEIGHT SIZE  USE    AVAIL %USE VAR  PGS 
 0 0.90999  1.00000  931G 10985M  919G 1.15 0.82  87 
 1 0.90999  1.00000  931G 11615M  919G 1.22 0.87 106 
 2 0.90999  1.00000  931G 11639M  919G 1.22 0.87 106 
 3 0.90999  1.00000  931G 12414M  918G 1.30 0.93 114 
 6 0.90999  1.00000  931G 12412M  918G 1.30 0.93 105 
 4 0.90999  1.00000  931G 17044M  913G 1.79 1.27 134 
 5 1.00000  1.00000 1023G 16977M 1006G 1.62 1.15 130 
 7 0.90999  1.00000  931G 15374M  915G 1.61 1.15 130 
              TOTAL 7544G   105G 7430G 1.40          
MIN/MAX VAR: 0.82/1.27  STDDEV: 0.22

Configuration management : puppet

Now we have to install a configuration management tool. It saves a lot of time..

Master installation

On the admin node, we will install the master :

[root@admin ~]# sudo rpm -ivh https://yum.puppetlabs.com/puppetlabs-release-pc1-el-7.noarch.rpm
[root@admin ~]# sudo yum -y install puppetserver
[root@admin ~]# systemctl enable puppetserver
[root@admin ~]# sudo systemctl start puppetserver
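The agents reach the master on TCP port 8140; if a firewall is active on the admin node, open that port (a sketch for firewalld, only relevant if you actually run it):

[root@admin ~]# firewall-cmd --permanent --add-port=8140/tcp
[root@admin ~]# firewall-cmd --reload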

Agents installation :

Use dsh from the admin node :

[root@admin ~]# dsh -aM "sudo rpm -ivh https://yum.puppetlabs.com/puppetlabs-release-pc1-el-7.noarch.rpm"
[root@admin ~]# dsh -aM "sudo yum -y install puppet-agent"

Enable the agent

[root@admin ~]# dsh -aM "systemctl enable puppet"

Configure the agents: you need to set the server name if it's not "puppet" (the default). Use an FQDN; it's important.

[root@admin ~]# dsh -aM "sudo /opt/puppetlabs/bin/puppet config set server admin.int.intra"

Start the agent

[root@admin ~]# dsh -aM "systemctl start puppet"

Puppet configuration

On the admin node, check that all the agents have submitted their certificate requests to the server:

[root@admin ~]# sudo /opt/puppetlabs/bin/puppet cert list
"n0.int.intra" (SHA256) 95:6B:A3:07:DA:70:04:D7:9B:18:4D:64:30:39:A1:19:9E:68:B9:6B:9C:92:DC:AB:98:36:16:6D:F3:66:B3:56
"n1.int.intra" (SHA256) 07:E3:1B:1F:6F:80:33:6C:A9:A4:96:88:71:A0:74:19:B0:DE:3A:EA:B2:36:2A:38:43:B1:5D:3E:92:3C:D0:47
"n2.int.intra" (SHA256) 62:2E:7E:91:CE:75:53:0C:DA:16:28:C7:14:EA:05:33:CD:DA:8D:B8:A4:A3:59:1B:B0:78:3B:29:AE:A6:CB:C4
"n3.int.intra" (SHA256) 77:92:0F:75:2F:75:E2:8F:68:22:4A:43:4C:BB:79:C5:24:6D:BB:98:42:D0:87:A5:13:57:52:9C:3D:82:D8:74
"n4.int.intra" (SHA256) 55:F4:15:F3:83:3A:39:99:B6:15:EC:D6:09:24:6D:6D:D2:07:9B:54:F5:73:15:C5:C8:74:9F:8F:BB:A0:E2:43

Sign the certificates

[root@admin ~]#  for i in {0..4}; do /opt/puppetlabs/bin/puppet cert sign n$i.int.intra ; done

Finished! You can now list all the nodes with a valid certificate:

[root@admin ~]# sudo /opt/puppetlabs/bin/puppet cert list --all
+ "admin.int.intra" (SHA256) F5:13:EE:E9:C2:F1:A7:86:01:3C:95:EE:61:EE:53:21:E9:75:15:24:45:FB:67:B8:D9:60:60:FE:DE:93:59:F6 (alt names: "DNS:puppet", "DNS:admin.int.intra")
+ "n0.int.intra"    (SHA256) 9D:C0:3E:AB:FD:67:00:DB:B5:25:CD:23:71:A4:2F:C5:3F:A6:56:FE:55:CA:5D:27:95:C6:97:79:A9:B2:7F:CB
+ "n1.int.intra"    (SHA256) 4F:C6:C1:B9:CD:21:4C:3A:76:B5:CF:E4:56:0D:20:D2:1D:72:35:7B:D9:53:86:D9:CD:CB:8D:3C:E8:39:F4:C2
+ "n2.int.intra"    (SHA256) D7:6E:85:63:04:CC:C6:24:79:E3:C2:CE:F2:0F:5B:2E:FA:EE:D9:EF:9C:E3:46:6A:83:9F:AA:DA:5D:3F:F8:52
+ "n3.int.intra"    (SHA256) 1C:95:61:C8:F6:E2:AF:4F:A5:52:B3:E0:CE:87:CF:16:02:2B:39:2C:61:EC:20:21:D0:BD:33:70:42:7A:6E:D9
+ "n4.int.intra"    (SHA256) E7:B6:4B:1B:0A:22:F8:C4:F1:E5:A9:3B:EA:17:5F:54:41:97:68:AF:D0:EC:A6:DB:74:3E:F9:7E:BF:04:16:FF

You now have a working Puppet configuration management system.

Monitoring

Telegraf

Install Telegraf on the nodes, with a puppet manifest.

vi /etc/puppetlabs/code/environments/production/manifests/site.pp

include this text in the file site.pp :

node 'n0', 'n1', 'n2', 'n3', 'n4' {
    file {'/etc/yum.repos.d/influxdb.repo':
        ensure  => present,                                               # make sure it exists
        mode    => '0644',                                                # file permissions
        content => "[influxdb]\nname = InfluxDB Repository - RHEL \$releasever\nbaseurl = https://repos.influxdata.com/rhel/\$releasever/\$basearch/stable\nenabled = 1\ngpgcheck = 1\ngpgkey = https://repos.influxdata.com/influxdb.key\n",
    }
}

Install it on all nodes (we could do that with puppet, too):

dsh -aM "sudo yum install telegraf"

Create a puppet module for telegraf

[root@admin modules]# cd /etc/puppetlabs/code/modules
[root@admin modules]# mkdir -p telegraf_client/{files,manifests,templates}

Create a template for telegraf.conf

[root@admin telegraf_client]# vi templates/telegraf.conf.template

put the following in that file (note the fqdn variable) :

[tags]

# Configuration for telegraf agent
[agent]
debug = false
flush_buffer_when_full = true
flush_interval = "15s"
flush_jitter = "0s"
hostname = "&lt;%= fqdn %&gt;"
interval = "15s"
round_interval = true

Create a template for the inputs :

[root@admin telegraf_client]# vi templates/inputs_system.conf.template

put the following (no variables, yet. customize for your needs..) :

# Read metrics about CPU usage
[[inputs.cpu]]
percpu = false
totalcpu = true
fieldpass = [ "usage*" ]

# Read metrics about disk usage
[[inputs.disk]]
fielddrop = [ "inodes*" ]
mount_points=["/","/home"]

# Read metrics about diskio usage
[[inputs.diskio]]
devices = ["sda2","sda3"]
skip_serial_number = true

# Read metrics about network usage
[[inputs.net]]
interfaces = [ "eth0" ]
fielddrop = [ "icmp*", "ip*", "tcp*", "udp*" ]

# Read metrics about memory usage
[[inputs.mem]]
# no configuration

# Read metrics about swap memory usage
[[inputs.swap]]
# no configuration

# Read metrics about system load &amp; uptime
[[inputs.system]]
# no configuration

Create a template for the outputs :

[root@admin telegraf_client]# vi templates/outputs.conf.template

and put the following text in the file

[[outputs.influxdb]]
database = "telegraf"
precision = "s"
urls = [ "http://admin:8086" ]
username = "telegraf"
password = "your_pass"

create the manifest for your module

[root@admin ~]# vi /etc/puppetlabs/code/modules/telegraf_client/manifests/init.pp

and add the following contents :

class telegraf_client {

  package { 'telegraf':
    ensure => installed,
  }

  file { "/etc/telegraf/telegraf.conf":
    ensure  => present,
    owner   => root,
    group   => root,
    mode    => "644",
    content => template("telegraf_client/telegraf.conf.template"),
  }

  file { "/etc/telegraf/telegraf.d/outputs.conf":
    ensure  => present,
    owner   => root,
    group   => root,
    mode    => "644",
    content => template("telegraf_client/outputs.conf.template"),
  }

  file { "/etc/telegraf/telegraf.d/inputs_system.conf":
    ensure  => present,
    owner   => root,
    group   => root,
    mode    => "644",
    content => template("telegraf_client/inputs_system.conf.template"),
  }

  service { 'telegraf':
    ensure => running,
    enable => true,
  }
}
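Optionally, you can make telegraf restart automatically whenever Puppet updates one of these files by replacing the service resource above with a version that subscribes to them (a sketch, using the same resource titles as above):

service { 'telegraf':
  ensure    => running,
  enable    => true,
  subscribe => [
    File['/etc/telegraf/telegraf.conf'],
    File['/etc/telegraf/telegraf.d/outputs.conf'],
    File['/etc/telegraf/telegraf.d/inputs_system.conf'],
  ],
}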

And finally, include the module in the global puppet manifest file. Here is mine :

[root@admin ~]# vi /etc/puppetlabs/code/environments/production/manifests/site.pp

(its content is:)

node default {
  case $facts['os']['name'] {
    'Solaris':           { include solaris }
    'RedHat', 'CentOS':  { include centos  }
    /^(Debian|Ubuntu)$/: { include debian  }
    default:             { include generic }
  }
}

node 'n0','n1','n2','n3','n4' {
  include cephnode
}

class cephnode {
  include telegraf_client
}

class centos {
  yumrepo { "CentOS-OS-Local":
    baseurl  => "http://nas4/centos/\$releasever/os/\$basearch",
    descr    => "Centos int.intra mirror (os)",
    enabled  => 1,
    gpgcheck => 0,
    priority => 1
  }
  yumrepo { "CentOS-Updates-Local":
    baseurl  => "http://nas4/centos/\$releasever/updates/\$basearch",
    descr    => "Centos int.intra mirror (updates)",
    enabled  => 1,
    gpgcheck => 0,
    priority => 1
  }
  yumrepo { "InfluxDB":
    baseurl  => "https://repos.influxdata.com/rhel/\$releasever/\$basearch/stable",
    descr    => "InfluxDB Repository - RHEL \$releasever",
    enabled  => 1,
    gpgcheck => 1,
    gpgkey   => "https://repos.influxdata.com/influxdb.key"
  }
}

Wait a few minutes for puppet to apply your changes on the nodes, or run:

[root@admin ~]# dsh -aM "/opt/puppetlabs/bin/puppet agent --test"

Check that telegraf is up and running, then check the measurements in InfluxDB.
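A quick way to do both from the admin node (a sketch; the influx CLI flags assume InfluxDB 1.x and that the CLI is available where InfluxDB runs; add -username/-password if you enabled authentication):

[root@admin ~]# dsh -aM "systemctl is-active telegraf"
[root@admin ~]# influx -database telegraf -execute 'SHOW MEASUREMENTS'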

Result:

Monitoring a Ceph cluster and other things with InfluxDB, Grafana, collectd & Telegraf

 

Now that we have a working Ceph cluster (cf. Ceph Cluster), you will certainly want to monitor it.

Here is another cool open source suite of software :)

Using Debian 8.6 Jessie for this memo

This blog is written with the help of several web pages, including https://www.guillaume-leduc.fr/monitoring-de-votre-serveur-avec-telegraf-influxdb-et-grafana.html

Thanks Guillaume ;)

Influx DB

I have a virtualized admin node in my home "datacenter". This node will be used to collect and graph the stats.

InfluxDB is a great time series database. I will deploy it for my needs (monitoring all my systems, starting with my ceph cluster).

First add the InfluxDB repository:

cephadm@admin:~$ curl -sL https://repos.influxdata.com/influxdb.key | sudo apt-key add -

root@admin:~# echo "deb https://repos.influxdata.com/debian jessie stable" > /etc/apt/sources.list.d/influxdb.list

cephadm@admin:~$ sudo apt-get update
cephadm@admin:~$ sudo apt-get install influxdb
Installation done.

Now configure the databases. For my needs, I will create two databases that will become two datasources in Grafana (collectd and telegraf). Collectd and Telegraf are two well-known agents that collect statistics from hosts. Collectd is useful for hosts like routers (I own an OpenWrt internet router; it is very useful to monitor internet bandwidth…).

For Ceph and the nodes we will use Telegraf.

For other, more specialized things we will use collectd.

Enable and start influxdb, then enter the database shell like this:

root@admin:~# systemctl enable influxdb
root@admin:~# systemctl start influxdb

root@admin:~# influx
Visit https://enterprise.influxdata.com to register for updates, InfluxDB server management, and monitoring.
Connected to http://localhost:8086 version 1.0.2
InfluxDB shell version: 1.0.2
>
Create the databases :

> CREATE DATABASE telegraf
> CREATE DATABASE collectd_db

Check the databases :

> SHOW DATABASES;

name: databases

name
telegraf
_internal
collectd_db
Create a dedicated user for the monitoring activities (here, "telegraf"):

> CREATE USER telegraf WITH PASSWORD 'pass'
> GRANT ALL ON telegraf TO telegraf
> GRANT ALL ON collectd_db TO telegraf

Look at the InfluxDB retention policies if you want the database to purge old data automatically; this is not detailed here, but a minimal example is sketched after the next configuration block. Then configure InfluxDB to receive the collectd data on port 25826 (the default). In /etc/influxdb/influxdb.conf, insert this configuration:

[[collectd]]
enabled = true
bind-address = ":25826"
database = "collectd_db"
typesdb = "/usr/share/collectd/types.db"
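As mentioned above, retention policies are not covered here, but for reference a basic one can be created from the influx shell like this (a sketch; the policy name and the 30-day duration are arbitrary examples, not part of my actual setup):

> CREATE RETENTION POLICY "one_month" ON "telegraf" DURATION 30d REPLICATION 1 DEFAULT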

Then install collectd on the admin node to get /usr/share/collectd/types.db (you can just copy the file from an agent.. but you may want to monitor your admin node too ;) )
Telegraf

On the admin node, use dsh to install telegraf on the nodes. First, install curl:
cephadm@admin:~$ dsh -aM sudo apt-get install -y curl
Then, on all nodes, add the InfluxData repository:

cephadm@admin:~$ dsh -aM "sudo curl -sL https://repos.influxdata.com/influxdb.key | sudo apt-key add -"

cephadm@admin:~$ dsh -aM "echo 'deb https://repos.influxdata.com/debian jessie stable' | sudo tee /etc/apt/sources.list.d/influxdb.list"
Then install telegraf, for example from the admin node using dsh :

cephadm@admin:~$ dsh -aM sudo apt-get update


cephadm@admin:~$ dsh -aM sudo apt-get install -y telegraf

Configure the nodes: in /etc/telegraf/telegraf.conf, keep only this (example for my node n1):

[tags]

# Configuration for telegraf agent
[agent]
debug = false
flush_buffer_when_full = true
flush_interval = "15s"
flush_jitter = "0s"
hostname = "n1"
interval = "15s"
round_interval = true

hostname: replace it with the hostname of the node, as you want it to appear in InfluxDB.

Then configure the outputs (to the central influxDB server). In /etc/telegraf/telegraf.d/outputs.conf :

[[outputs.influxdb]]
database = "telegraf"
precision = "s"
urls = [ "http://admin:8086" ]
username = "telegraf"
password = "pass"

And configure the inputs. In /etc/telegraf/telegraf.d/inputs_system.conf:

# Read metrics about CPU usage
[[inputs.cpu]]
percpu = false
totalcpu = true
fieldpass = [ "usage*" ]

# Read metrics about disk usage
[[inputs.disk]]
fielddrop = [ "inodes*" ]
mount_points=["/","/home"]

# Read metrics about diskio usage
[[inputs.diskio]]
devices = ["sda2","sda3"]
skip_serial_number = true

# Read metrics about network usage
[[inputs.net]]
interfaces = [ "eth0" ]
fielddrop = [ "icmp*", "ip*", "tcp*", "udp*" ]

# Read metrics about memory usage
[[inputs.mem]]
# no configuration

# Read metrics about swap memory usage
[[inputs.swap]]
# no configuration

# Read metrics about system load & uptime
[[inputs.system]]
# no configuration

Enable and restart the telegraf service on all hosts

cephadm@admin:~$ dsh -aM sudo systemctl enable telegraf

cephadm@admin:~$ dsh -aM sudo systemctl start telegraf

Check the data in InfluxDB :
cephadm@admin:~$ influx
Visit https://enterprise.influxdata.com to register for updates, InfluxDB server management, and monitoring.
Connected to http://localhost:8086 version 1.0.2
InfluxDB shell version: 1.0.2
> use telegraf
Using database telegraf
> show measurements;
name: measurements
------------------
name
cpu
disk
diskio
kernel
mem
net
processes
swap
system
It worked !

Collectd

If you have other hosts, like an OpenWrt router, that you want to monitor:

Install collectd on the hosts

opkg update

opkg install collectd collectd-mod-network
Configure collectd to send your data to the admin node :

insert this text in /etc/collectd.conf

## CollectD Servers
LoadPlugin network
<Plugin network>
Server "admin.int.intra" "25826"
</Plugin>

Set the hostname, the polling interval and other options (user/password if you need them).
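For reference, those global settings at the top of /etc/collectd.conf look roughly like this (a sketch; the hostname, interval and plugin list are examples, not my router's actual configuration):

Hostname "openwrt"
Interval 30

LoadPlugin cpu
LoadPlugin interface
LoadPlugin load
LoadPlugin memory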

Restart collectd with this configuration, and check the data in the database:

cephadm@admin:~$ influx
Visit https://enterprise.influxdata.com to register for updates, InfluxDB server management, and monitoring.
Connected to http://localhost:8086 version 1.0.2
InfluxDB shell version: 1.0.2
> use collectd_db
Using database collectd_db
> show measurements;
name: measurements
------------------
name
conntrack_value
cpu_value
df_free
df_used
disk_read
disk_write
interface_rx
interface_tx
iwinfo
iwinfo_value
load_longterm
load_midterm
load_shortterm
memory_value
netlink_rx
netlink_tx
netlink_value
processes_majflt
processes_minflt
processes_processes
processes_syst
processes_threads
processes_user
processes_value
tcpconns_value
wireless_value
It worked ;)

Grafana

To do later.

Here are the results: