How to install and configure a Linux cluster with highly available failover nodes


INTRODUCTION:


There are a lot of different scenarios and types of clusters, but here I will focus on a simple, 2-node, high-availability cluster that serves a website. The focus is on availability, not on balancing the load over multiple nodes or improving performance.





To reach the services offered by our simple cluster, we will create a virtual IP which represents the cluster nodes, regardless of how many there are. The client only needs to know our virtual IP and doesn’t have to care about the “real” IP addresses of the nodes or which node is the active one.

There is one owner of the virtual IP at any moment in time (Active/Passive cluster). The owner of the virtual IP also provides the service for the cluster at that moment. When the active node fails, ownership of the virtual IP, and with it the service, moves to the other node.
For the client, nothing changes since the virtual IP remains the same. The client doesn’t know that the first node is no longer reachable and sees the same website as before (assuming that the webservers on node01 and node02 serve the same webpages).


BACKGROUND:


To build this simple cluster, we need a few basic components:

-- Service which you want to be highly available (webserver, mailserver, file server, …)
-- Resource manager that can start and stop resources (like Pacemaker)
-- Messaging component which is responsible for communication and membership (like Corosync or Heartbeat)
-- Optionally: file synchronization which keeps the filesystems equal on all cluster nodes (with DRBD or GlusterFS)
-- Optionally: cluster manager to easily manage the cluster settings on all nodes (like PCS)

The components we will use are Apache (webserver) as our service, Pacemaker as resource manager, Corosync as the messaging layer (Heartbeat is considered deprecated since CentOS 7) and PCS to manage our cluster easily.



CONSIDERATION:


These are the node names and IPs used in this article. Replace them with your own:

Keep the entries below in the /etc/hosts file on both nodes so that each node can be resolved by hostname from the other:

192.168.111.213   node01
192.168.111.214   node02
192.168.111.100   VIRTUAL_IP
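
-- Optionally, verify name resolution on both nodes (getent reads /etc/hosts, so the entries above should show up):

getent hosts node01 node02
ping -c 1 node02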


FIREWALL:


-- Run the below commands on both nodes:

# For corosync:
iptables -I INPUT -m state --state NEW -p udp -m multiport --dports 5404,5405 -j ACCEPT

# For Pacemaker:

iptables -I INPUT -p tcp -m state --state NEW -m tcp --dport 2224 -j ACCEPT

# For IGMP:

iptables -I INPUT -p igmp -j ACCEPT

# For multicast traffic: 

iptables -I INPUT -m addrtype --dst-type MULTICAST -j ACCEPT

# For the Apache default HTTP port:

iptables -I INPUT -p tcp -m state --state NEW -m tcp --dport 80 -j ACCEPT

# save the firewall configuration (default location: /etc/sysconfig/iptables)

service iptables save
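
-- Optionally, list the INPUT chain to verify that the rules are in place (adjust if you manage the firewall differently):

iptables -nL INPUT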



INSTALLATION:


-- Install the below packages on both nodes:

yum install corosync pcs pacemaker

-- Set the same password for the service user "hacluster" (a user created automatically during installation) on both nodes:

 passwd hacluster

-- Start pcsd on both nodes:
systemctl start pcsd

-- We will configure both nodes from node01, so the cluster nodes have to be authenticated from node01:

pcs cluster auth node01 node02
username: hacluster
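
Depending on your pcs version, the credentials can also be passed on the command line instead of interactively (replace the placeholder with the hacluster password you set earlier):

pcs cluster auth node01 node02 -u hacluster -p <password>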

==>> Now, all below cluster config commands will be run from node01 only.


-- Add both nodes to a cluster named cluster_web (the generated configuration can be found in /etc/corosync/corosync.conf):

pcs cluster setup --name cluster_web node01 node02

-- start cluster:

pcs cluster start --all

-- check cluster status:

pcs status cluster

-- Check the status of the nodes in the cluster:

pcs status nodes
corosync-cmapctl | grep members
pcs status corosync

-- Check the configuration for errors:

crm_verify -L -V

-- You may get errors in the output of the above command regarding STONITH (Shoot The Other Node In The Head). STONITH is a mechanism to ensure that you don’t end up with two nodes that both think they are active and claim to be the service and virtual IP owner, a situation also called split brain. Since we have a simple cluster, we’ll just disable the STONITH option:


pcs property set stonith-enabled=false


-- While configuring the behavior of the cluster, we can also configure the quorum settings. The quorum is the minimum number of nodes that need to be active for the cluster to be considered available. This is useful when many nodes provide computing power simultaneously: when too few nodes are available, it is better to stop the cluster than to deliver a non-working service. By default, the cluster has quorum only when more than half of the total number of nodes is active; for example, a 3-node cluster needs at least 2 active nodes. For a 2-node cluster that means both nodes need to be available for the cluster to be available, which in our case would completely defeat the purpose of the cluster.

To ignore the quorum:

pcs property set no-quorum-policy=ignore

pcs property

-- Set up the virtual IP as a cluster resource:

pcs resource create virtual_ip ocf:heartbeat:IPaddr2 ip=192.168.111.100 cidr_netmask=32 op monitor interval=30s

pcs status resources


-- To see who is the current owner of the resource/virtual IP:

pcs status|grep virtual_ip
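
As an extra check, on the node that pcs reports as the owner, the virtual IP should be visible as an additional address on the network interface:

ip addr show | grep 192.168.111.100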


APACHE WEB CONFIG:


-- Install Apache on both nodes:

yum install httpd

-- create /etc/httpd/conf.d/serverstatus.conf with the following contents on both nodes:

Listen 127.0.0.1:80
<Location /server-status>
    SetHandler server-status
    Order deny,allow
    Deny from all
    Allow from 127.0.0.1
</Location>

-- Disable the current Listen statement in the main Apache configuration to avoid listening multiple times on the same port. Then restart Apache and check that the status page works. Run the below commands on both nodes:
sed -i 's/Listen/#Listen/' /etc/httpd/conf/httpd.conf
systemctl restart httpd
wget http://127.0.0.1/server-status
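
If you prefer not to leave downloaded server-status files behind, you can print the status page to the terminal instead:

wget -qO- http://127.0.0.1/server-status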

-- Create a simple HTML page as /var/www/html/index.html with the below content, just to make it visible which node is active:

# for node01

 <html>
 <h1>node01</h1>
 </html>

# for node02

 <html>

 <h1>node02</h1>
 </html>

-- Now stop Apache on both nodes, since the cluster will manage the Apache service from here on, and make Apache listen on the virtual IP. Run the below commands on both nodes:
systemctl stop httpd
echo "Listen 192.168.111.100:80"|sudo tee --append /etc/httpd/conf/httpd.conf

-- run the below commands from node01:

pcs resource create webserver ocf:heartbeat:apache configfile=/etc/httpd/conf/httpd.conf statusurl="http://localhost/server-status" op monitor interval=1min

pcs constraint colocation add webserver virtual_ip INFINITY

pcs constraint order virtual_ip then webserver

# Optional: prefer node01 with a score of 50:

pcs constraint location webserver prefers node01=50

-- to see the configured constraints:

pcs constraint
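
-- If you later need to change or remove one of these constraints, you can list them with their IDs and delete by ID (the exact IDs depend on your cluster):

pcs constraint --full
pcs constraint remove <constraint_id>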

-- Restart the cluster services from node01:

pcs cluster stop --all && sudo pcs cluster start --all

-- Now browse to http://192.168.111.100. The webpage should be served and show which node is active.

-- To test the failover, you can stop the cluster on node01 and check whether the website is still reachable on the virtual IP, which should now be bound to node02:
pcs cluster stop node01
pcs status
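
-- To bring node01 back into the cluster after the test (because of the location preference set earlier, the resources may move back to node01 once it rejoins):

pcs cluster start node01
pcs status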

-- Enable the cluster components at start-up by running the below commands on both nodes:

systemctl enable pcsd
systemctl enable corosync
systemctl enable pacemaker
systemctl enable iptables
systemctl disable firewalld

If you want to confirm, reboot both nodes and check that the services come up and the cluster is running.
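
A quicker check, without rebooting, is to confirm that the units are enabled:

systemctl is-enabled pcsd corosync pacemaker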


-- Note: At this point your cluster should theoretically be up. However, due to a bug in the Red Hat release, you may need the workaround below; apply it ONLY if you are having issues with the cluster status (the added sleep simply delays corosync startup):


-- Edit the corosync unit file; only the ExecStartPre line marked below needs to be added, the rest of the file should already be there:

vim /usr/lib/systemd/system/corosync.service


###############################################

[Unit]
Description=Corosync Cluster Engine
ConditionKernelCommandLine=!nocluster
Requires=network-online.target
After=network-online.target

[Service]
# add only the following line:
ExecStartPre=/usr/bin/sleep 10
ExecStart=/usr/share/corosync/corosync start
ExecStop=/usr/share/corosync/corosync stop
Type=forking

[Install]
WantedBy=multi-user.target
#############################################
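
-- Instead of editing the packaged unit file (which a package update may overwrite), an alternative sketch is a systemd drop-in; the file name delay.conf is arbitrary:

mkdir -p /etc/systemd/system/corosync.service.d
cat > /etc/systemd/system/corosync.service.d/delay.conf <<'EOF'
[Service]
ExecStartPre=/usr/bin/sleep 10
EOF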

-- run the below command on both nodes:

 systemctl daemon-reload

-- Now do a final reboot and wait at least 2 minutes after the reboot completes for the cluster services to start. Then check the website by browsing to http://192.168.111.100.


EXTRA:
-- If you want to manage your cluster through the web-based UI, you can reach it from any client machine (e.g., your laptop) at:

https://ip_of_first_node:2224
