High Availability Case Study

An airline company had identified a number of critical applications that would benefit from an improved level of redundancy.



RHEIS were engaged to design and implement a high availability solution for the existing QA server environment. Because of the critical nature of these applications to the business, implementation of the solution in production would only occur after rigorous testing of the QA environment.

The solution was a two-node Replicated Data Cluster configuration providing both local high availability and disaster recovery in a single cluster. The cluster nodes would be located in different data centres to provide greater redundancy, with data replication between the nodes to assure data availability and consistency.

The main challenges were:

  • The existing Solaris 9 server ran multiple applications (Oracle databases, WebSphere Message Queue, Samba and bespoke customer applications) with numerous interdependencies.
  • The lack of SAN connectivity between the two data centres necessitated data replication over a dedicated WAN link.

The solution implemented Symantec Veritas Cluster Server (VCS) to manage application availability across the two nodes in a campus cluster configuration, and Symantec Veritas Volume Replicator (VVR) to provide asynchronous data replication over the WAN between the data centres. The integration of VCS and VVR enabled replication to be managed as part of the cluster failover process.
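The idea behind asynchronous replication can be illustrated with a small model. The sketch below is not VVR and does not use any Veritas API; the class and method names are hypothetical. It only shows the general pattern: a write completes on the primary straight away, is queued, and reaches the secondary later in its original order, so the secondary may lag behind the primary.

```python
from collections import deque


class AsyncReplicator:
    """Toy model of asynchronous, order-preserving replication (hypothetical)."""

    def __init__(self):
        self.primary = {}       # block number -> data at the primary site
        self.secondary = {}     # block number -> data at the secondary site
        self.pending = deque()  # writes applied locally but not yet shipped

    def write(self, block, data):
        """Apply a write on the primary and queue it for the secondary."""
        self.primary[block] = data
        self.pending.append((block, data))

    def drain(self, max_writes=None):
        """Ship queued writes to the secondary in their original order."""
        sent = 0
        while self.pending and (max_writes is None or sent < max_writes):
            block, data = self.pending.popleft()
            self.secondary[block] = data
            sent += 1
        return sent

    def lag(self):
        """Number of writes the secondary is behind the primary."""
        return len(self.pending)
```

In this model, a failover that happens while lag() is non-zero would start from a secondary that is consistent but slightly behind, which is the usual trade-off of asynchronous replication over a WAN.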

The recommended best-practice configuration for a campus cluster requires the deployment of multiple Coordination Point (CP) servers. In this design, three CP servers provide a cluster membership arbitration mechanism that integrates with the VCS I/O fencing module. I/O fencing is a feature that prevents data corruption when communication between cluster nodes breaks down, a condition known as split-brain. The CP server membership arbitration mechanism supports multiple VCS clusters and would be used by other clusters in the future.
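The arbitration principle can be sketched in a few lines. This is a simplified illustration, not the VCS fencing implementation: the function name, node names and CP names are hypothetical, and the "race" is modelled simply as the order in which nodes get to claim coordination points. The node that claims a majority of the coordination points survives; if no node reaches a majority, none does.

```python
from typing import Dict, List, Optional


def arbitrate(
    reachable: Dict[str, List[str]],
    coordination_points: List[str],
    race_order: List[str],
) -> Optional[str]:
    """Return the node that wins a majority of coordination points, or None.

    reachable maps each node to the coordination points it can contact.
    race_order lists the nodes in the order they reach the coordination points.
    """
    majority = len(coordination_points) // 2 + 1

    # Each coordination point is claimed by the first node that reaches it.
    claimed: Dict[str, str] = {}
    for node in race_order:
        for cp in reachable.get(node, []):
            if cp in coordination_points:
                claimed.setdefault(cp, node)

    # Count how many coordination points each node holds.
    wins: Dict[str, int] = {}
    for node in claimed.values():
        wins[node] = wins.get(node, 0) + 1

    for node in race_order:
        if wins.get(node, 0) >= majority:
            return node
    return None
```

With three coordination points the majority is two, so a single unreachable CP server does not prevent one node from winning. This is why an odd number of coordination points is the usual choice.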

The two-node campus cluster will run in an active/passive configuration, with all the application services running on one node. When the active server fails, the services will be failed over to the other node in the cluster.
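The active/passive behaviour can be sketched as follows. This is an illustrative model only, not VCS configuration or code: the class, the node names and in particular the dependency map are assumptions made for the example, since the case study does not list the actual interdependencies. It shows services moving to the surviving node and being brought online in dependency order.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: service -> services that must be online first.
DEPENDS_ON = {
    "oracle_db": set(),
    "message_queue": set(),
    "samba": set(),
    "customer_app": {"oracle_db", "message_queue"},
}


def online_order(depends_on):
    """Order services so that each one starts after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


class ActivePassiveCluster:
    """Toy two-node cluster where one node hosts every service."""

    def __init__(self, nodes, depends_on):
        self.nodes = list(nodes)
        self.depends_on = depends_on
        self.active = self.nodes[0]
        self.healthy = set(self.nodes)

    def fail_node(self, node):
        """Mark a node as failed; return the services restarted elsewhere."""
        self.healthy.discard(node)
        if node != self.active:
            return []  # the passive node failed; services keep running
        survivors = [n for n in self.nodes if n in self.healthy]
        if not survivors:
            raise RuntimeError("no healthy node left to host the services")
        self.active = survivors[0]
        return online_order(self.depends_on)
```

For example, calling fail_node("node1") on a cluster built from ["node1", "node2"] makes node2 the active node and returns a start order in which customer_app comes after oracle_db and message_queue.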



