How to make cloud-native workloads truly resilient to disasters
Deploy and orchestrate disaster recovery for cloud-native applications running on OpenShift/Kubernetes
Resiliency for cloud-native applications depends on several components working together. Using a few utilities I have written, we will see how to achieve it. We will use the WordPress deploy/run utilities and Openshift DR Manager to perform disaster recovery.
In this section, we will auto-deploy WordPress on the primary and secondary sites using the wpdeployer executable. In the next section, we will simulate a disaster and recover the application at the other site.
Why WordPress
The official WordPress definition: WordPress is a free and open-source content management system written in PHP and paired with a MySQL or MariaDB database.
This makes it a classic open-source application with a front end and a back end that can be deployed in a cloud-native environment, and therefore a good candidate for demonstrating resiliency.
Problems moving WordPress
- Failing over WordPress to a secondary/disaster recovery site is very laborious; see Moving WordPress.
- Following the steps described in Moving WordPress would defeat the purpose of resiliency and hamper your business continuity plans, resulting in a high RPO/RTO.
How to mitigate
- The first step is deploying WordPress in a cloud-native environment. Deployment requires the wphelper service to be running; it generates dynamic properties files for our sidecar container to read. We will see how wpdeployer deploys both WordPress and MySQL in no time.
- We then use Velero for backups and Openshift DR Manager to achieve resiliency between the sites (Section 2).
WordPress Auto Deployment
Deploying WordPress with resiliency enabled means working with several K8s objects on two different K8s clusters, as well as with storage. Because WordPress and MySQL run in separate pods, each needs its own manifests for the deployment, service, persistent volume, and persistent volume claim. You also have to attach the appropriate storage, create a storage class, create multiple NFS directories, map each PV to the right NFS directory, and so on. Imagine doing this at production scale for multiple OCP/Kubernetes setups. To make life easier, I have written a program in Go, wpdeployer, which takes care of the entire deployment process.
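To give a feel for the manual work involved, here is a minimal Go sketch of one such pair of manifests: an NFS-backed PV and the PVC that binds to it. This is not wpdeployer's actual code. The function names, the NFS address 192.0.2.10, the export path, the 5Gi size, the ReadWriteMany access mode, and the Retain policy are all assumptions chosen for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// obj is shorthand for a JSON object.
type obj = map[string]interface{}

// nfsPersistentVolume returns a PersistentVolume manifest that points at
// one NFS directory. All argument values used in main are hypothetical.
func nfsPersistentVolume(name, server, path, size string) obj {
	return obj{
		"apiVersion": "v1",
		"kind":       "PersistentVolume",
		"metadata":   obj{"name": name},
		"spec": obj{
			"capacity":                      obj{"storage": size},
			"accessModes":                   []string{"ReadWriteMany"},
			"persistentVolumeReclaimPolicy": "Retain",
			"nfs":                           obj{"server": server, "path": path},
		},
	}
}

// persistentVolumeClaim returns a claim pinned to a specific volume.
// The empty storageClassName (an assumption here) asks for static binding
// to a PV that has no storage class.
func persistentVolumeClaim(name, namespace, volumeName, size string) obj {
	return obj{
		"apiVersion": "v1",
		"kind":       "PersistentVolumeClaim",
		"metadata":   obj{"name": name, "namespace": namespace},
		"spec": obj{
			"accessModes":      []string{"ReadWriteMany"},
			"storageClassName": "",
			"volumeName":       volumeName,
			"resources":        obj{"requests": obj{"storage": size}},
		},
	}
}

func main() {
	pv := nfsPersistentVolume("mysql-wp-auto-30839", "192.0.2.10",
		"/exports/wp-auto-30839/mysql", "5Gi")
	pvc := persistentVolumeClaim("mysql-wp-auto-30839", "wp-auto-30839",
		"mysql-wp-auto-30839", "5Gi")
	for _, manifest := range []obj{pv, pvc} {
		out, err := json.MarshalIndent(manifest, "", "  ")
		if err != nil {
			panic(err)
		}
		fmt.Println(string(out))
	}
}
```

The same pair has to be repeated for WordPress, and the PVs again on the second cluster, which is the repetition wpdeployer is meant to remove.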
Run “wphelper” service
Download wphelper and run it as a background job or as a service on the NFS server; make sure NFS is set up and running.
As a prerequisite for the wpdeployer executable, we have to run a simple helper daemon that dynamically generates dbdata.yml as the source for the wp-manager sidecar. This data needs to be generated in the relevant volumes so that the sidecar can read it and update the database entries dynamically (more on this in the next section).
[root@NFS]# systemctl status wphelper.service
● wphelper.service - Wordpress YAML Generator
Loaded: loaded (/etc/systemd/system/wphelper.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2020-11-27 11:58:53 IST; 1 weeks 3 days ago
Main PID: 19415 (wphelper)
Tasks: 6
CGroup: /system.slice/wphelper.service
└─19415 /bin/wphelper
Execute “wpdeployer”
wpdeployer reads its configuration from “/etc/wpdeployer/deployprop.ini”, whose contents are shown below. Update deployprop.ini and then run wpdeployer.
##### deployprop.ini #####
# Openshift cluster info, URL and Token Information.
[OCP_Cluster]
prURL = <Openshift Primary URL>
drURL = <Openshift Disaster Site URL>
prToken = <Primary Site Oauth Token>
drToken = <Secondary Site Oauth Token>
# Wordpress Service Information
[WP_URL]
prsvc = http://<primary site node IP>
drsvc = http://<secondary site node IP>
# NFS Information
[NFS]
ipaddr = <ip>
username = <username>
password = <password>
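As a rough illustration of how a file like this could be read and checked before any deployment starts, here is a small Go sketch using only the standard library. It assumes one section header or one key = value pair per line, with # starting a comment. The section and key names come from deployprop.ini; the function names and all sample values are hypothetical, and wpdeployer may well parse its configuration differently.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strings"
)

// parseINI reads "key = value" pairs grouped under "[section]" headers.
// Blank lines and lines starting with '#' are skipped.
func parseINI(text string) (map[string]map[string]string, error) {
	cfg := make(map[string]map[string]string)
	section := ""
	sc := bufio.NewScanner(strings.NewReader(text))
	for n := 1; sc.Scan(); n++ {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if strings.HasPrefix(line, "[") && strings.HasSuffix(line, "]") {
			section = strings.TrimSpace(line[1 : len(line)-1])
			cfg[section] = make(map[string]string)
			continue
		}
		parts := strings.SplitN(line, "=", 2)
		if len(parts) != 2 || section == "" {
			return nil, fmt.Errorf("line %d: expected key = value inside a section: %q", n, line)
		}
		cfg[section][strings.TrimSpace(parts[0])] = strings.TrimSpace(parts[1])
	}
	return cfg, sc.Err()
}

// required lists the sections and keys that appear in deployprop.ini.
var required = map[string][]string{
	"OCP_Cluster": {"prURL", "drURL", "prToken", "drToken"},
	"WP_URL":      {"prsvc", "drsvc"},
	"NFS":         {"ipaddr", "username", "password"},
}

// missingKeys returns "section.key" for every required entry that is
// absent or empty, sorted for stable output.
func missingKeys(cfg map[string]map[string]string) []string {
	var missing []string
	for section, keys := range required {
		for _, k := range keys {
			if cfg[section][k] == "" {
				missing = append(missing, section+"."+k)
			}
		}
	}
	sort.Strings(missing)
	return missing
}

// sample uses placeholder values only; the NFS password is left out on
// purpose so that the check has something to report.
const sample = `
[OCP_Cluster]
prURL = https://api.primary.example.invalid:6443
drURL = https://api.dr.example.invalid:6443
prToken = primary-token
drToken = dr-token

[WP_URL]
prsvc = http://192.0.2.10
drsvc = http://192.0.2.20

[NFS]
ipaddr = 192.0.2.30
username = nfsuser
`

func main() {
	cfg, err := parseINI(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println("missing keys:", missingKeys(cfg))
}
```

Failing early on a missing token or NFS credential is cheaper than discovering it halfway through a deployment that has already created directories and volumes.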
The wpdeployer application performs the following tasks:
[1] Storage Operations:
- 1a: Interact with NFS to create the appropriate directories
- 1b: Update the NFS exports file
- 1c: Restart the NFS service
- 1d: Update the wphelper service file
[2] OCP Primary Site Operations for MySQL and WordPress:
- 2(a-b): Create PVs
- 2(c-d): Create PVCs
- 2e: Create Deployments
- 2f: Create Services
[3] OCP Recovery Site Operations for MySQL and WordPress:
- 3(a-b): Create MySQL and WordPress PVs
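The primary-site operations in group 2 map onto plain calls to the Kubernetes REST API. The following Go sketch shows, for a single application, the order of the calls and how an authenticated create request could be built with the OAuth token from the config file. The endpoint paths are the standard Kubernetes collection paths; the function names, the example base URL, and the placeholder manifest are assumptions, and this is not taken from wpdeployer's source.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

// step is one create call in the deployment sequence.
type step struct {
	Label string
	Path  string
}

// primarySiteSteps returns the collection endpoints for one application
// (MySQL or WordPress) in the order PV, PVC, Deployment, Service.
// PersistentVolumes are cluster-scoped; the other objects are namespaced.
func primarySiteSteps(namespace string) []step {
	ns := "/namespaces/" + namespace
	return []step{
		{"PersistentVolume", "/api/v1/persistentvolumes"},
		{"PersistentVolumeClaim", "/api/v1" + ns + "/persistentvolumeclaims"},
		{"Deployment", "/apis/apps/v1" + ns + "/deployments"},
		{"Service", "/api/v1" + ns + "/services"},
	}
}

// newCreateRequest builds a POST request carrying the manifest as JSON
// and the cluster token as a bearer token. It does not send the request.
func newCreateRequest(baseURL, token, path string, manifest interface{}) (*http.Request, error) {
	body, err := json.Marshal(manifest)
	if err != nil {
		return nil, err
	}
	url := strings.TrimRight(baseURL, "/") + path
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// Hypothetical base URL and token; a real run would take prURL and
	// prToken from deployprop.ini and use complete manifests.
	const baseURL = "https://api.example.invalid:6443"
	for _, s := range primarySiteSteps("wp-auto-30839") {
		placeholder := map[string]string{"kind": s.Label}
		req, err := newCreateRequest(baseURL, "<token>", s.Path, placeholder)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-22s %s %s\n", s.Label, req.Method, req.URL.Path)
	}
}
```

Sending each request with an http.Client and comparing the response code against http.StatusCreated (201) would give the kind of per-object status that wpdeployer prints in the run below.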
[root@quayrhel wpdeployer]# wget https://github.com/i386kernel/wpdeployer/files/5658308/wpdeployer_v1.2.0_linux_amd64.gz
‘wpdeployer_v1.2.0_linux_amd64.gz’ saved [4008561/4008561]
[OpenShift-Cloud]# ./wpdeployer
------------Performing NFS Operations----------
Checking if the project exists
Proceeding with a new ---wp-auto-30839--- project
Performing NFS Directory Operations
Performing NFS Service Operations
Performing Wordpress helper service operations
---------Performing OCP operations----------
Creating Project in <Primary URL>
---Project Creation Status: 201---
MySql Operations
Creating Mysql Persistent Volume
Creating PV mysql-wp-auto-30839
---PV Status: 201---
Creating Mysql Persistent volume claim
Creating PVC mysql-wp-auto-30839
---PVC Status: 201---
Creating Mysql Deployment
Deploying Mysql mysql-wp-auto-30839
---MySql Deployment status: 201---
Creating Mysql Service
Creating MySql Service mysql-wp-auto-30839
---MySQL Service Status: 201---
Creating wordpress Persistent Volume
Creating PV wordpress-wp-auto-30839
---PV Status: 201---
Creating wordpress Persistent volume Volume
Creating PVC wordpress-wp-auto-30839
---PVC Status: 201---
Creating wordpress deployment
Deploying Wordpress wordpress-wp-auto-30839
---Wordpress Deployment status: 201---
Creating wordpress service
Creating Wordpress Service wordpress-wp-auto-30839
---Wordpress Service Status: 201---
Performing DR Volume Operations
Creating MySql Persistent Volume
Creating PV mysql-wp-auto-30839
---PV Status: 201---
Creating Wordpress Persistent Volume
Creating PV wordpress-wp-auto-30839
---PV Status: 201---
Time Elapsed: 4.2314755s
Wordpress PR URL: <PR Service URL>
Wordpress DR URL: <DR Service URL>
Working with the deployed WordPress site
Once WordPress is deployed with MySQL, we can access the site at http://<node-ip>:<nodeport>. The nodeport is a random integer generated within the NodePort range of 30000-32767; this unique integer is also the suffix of the project/namespace and of all the objects belonging to that project.
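The naming scheme is easy to reproduce. The Go sketch below picks a port in the NodePort range and derives the project, object names, and site URL from it, following the pattern visible in the wpdeployer output above (wp-auto-30839, mysql-wp-auto-30839, wordpress-wp-auto-30839). The function names and the node IP are assumptions; how wpdeployer actually picks the number and checks it for uniqueness is not shown here.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// Default Kubernetes NodePort range, inclusive on both ends.
const (
	nodePortMin = 30000
	nodePortMax = 32767
)

// randomNodePort returns a pseudo-random port within the NodePort range.
// The seed is a parameter so that the result can be reproduced.
func randomNodePort(seed int64) int {
	r := rand.New(rand.NewSource(seed))
	return nodePortMin + r.Intn(nodePortMax-nodePortMin+1)
}

// names holds the project name and the per-application object names.
type names struct {
	Project   string
	MySQL     string
	WordPress string
}

// namesFor derives all names from the port, which acts as the suffix.
func namesFor(port int) names {
	project := fmt.Sprintf("wp-auto-%d", port)
	return names{
		Project:   project,
		MySQL:     "mysql-" + project,
		WordPress: "wordpress-" + project,
	}
}

// siteURL builds the address at which the WordPress service is reachable.
func siteURL(nodeIP string, port int) string {
	return fmt.Sprintf("http://%s:%d", nodeIP, port)
}

func main() {
	port := randomNodePort(time.Now().UnixNano())
	n := namesFor(port)
	fmt.Println("project:  ", n.Project)
	fmt.Println("mysql:    ", n.MySQL)
	fmt.Println("wordpress:", n.WordPress)
	// 192.0.2.10 is a documentation address standing in for a node IP.
	fmt.Println("url:      ", siteURL("192.0.2.10", port))
}
```

Tying the port to the names means that one number identifies the service endpoint, the namespace, and the volumes, which keeps the primary and recovery sites easy to match up.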
Perform Fail-Over Operation
In the next section (in progress), we simulate a disaster and see how the failover is performed.