High Availability for pSeries and RS/6000
Complete Availability Line-up
HACMP: can keep mission-critical applications
highly available within a location through application fallover and monitoring.
HAGEO: can quickly restore access to data
following a location failure. HAGEO provides the same functionality as GeoRM, and,
together with HACMP, it can automate failover, recovery and reintegration between
geographically separate locations.
GeoRM: can provide an environment to mirror
data to another location. It can provide remote data mirroring for backup of regional
(up to seven) locations to a single centralized server or between just two geographically
separate servers. GeoRM and HAGEO are mutually exclusive: GeoRM cannot be used in
conjunction with HAGEO.
HACMP
Highlights
- Combines world-class, easy-to-use, 24x7 clustering technology with IBM advanced
systems technologies
- Significantly reduces planned and unplanned outages, allowing for cluster upgrades
and system maintenance without interrupting operations
- Offers multiple data backup and recovery methods to meet disaster management needs
Need for High Availability
During the business day, IT investments are hard
at work: recording customer activities, tracking inventory, keeping company statistics,
providing employees with the computing power needed to generate business revenue.
But what happens when those systems fail? The cost of computer downtime is widely
documented; unplanned outages cost real money and increase the total cost of ownership
(TCO) for IT. Planned outages for system maintenance can also impact business performance.
Keeping systems highly available should be the top goal of every system administrator
or corporate CIO. Every business needs high-availability (HA) solutions that keep
the company's IT investment running 24x7, ensure that end users never experience
a system outage, and let system maintenance occur without causing downtime.
IBM HA Clustering Solution
Better protect critical business applications from failures with the capabilities
of IBM High Availability Cluster Multiprocessing for AIX 5L V5.1.0 (HACMP V5.1).
For over 10 years, HACMP has been providing reliable high-availability services,
monitoring capabilities and dependable detection of application failures. HACMP
manages the fallover of business application environments to backup servers. And
with the introduction of the new optional package, HACMP/XD (Extended Distance),
HACMP will also manage fallover to backup servers at remote sites. HACMP/XD provides
long distance remote fallover for ESS/PPRC peers, and unlimited distance fallover
for IP connected peers using proven IBM HAGEO (High Availability Geographic Cluster)
technology. Now there is a single, world-class source of protection for mission-critical
applications.
HACMP makes use of redundant hardware configured in a cluster to keep an application
running, restarting it on a backup server if necessary. This minimizes expensive
downtime for both planned and unplanned outages and provides flexibility to accommodate
changing business needs. Up to 32 servers can participate in an HACMP cluster -
ideal for an environment requiring horizontal growth with rock-solid reliability.
HACMP can also detect software problems that are not severe enough to interrupt
proper operation of the system, such as process failure or exhaustion of system
resources. HACMP monitors, detects and reacts to such failure events, allowing the
system to stay available during random, unexpected software problems. HACMP can
be configured to react to hundreds of system events.
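The detect-and-react behaviour described above can be illustrated with a minimal sketch. This is not HACMP code or its configuration interface: the Node and Cluster classes, the node names and the 10-second heartbeat timeout are all hypothetical, chosen only to show the general idea of moving an application to a live backup server when its current owner stops reporting in.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    last_heartbeat: float = 0.0   # time of the most recent heartbeat, in seconds


@dataclass
class Cluster:
    nodes: dict                   # node name -> Node
    owner: str                    # node currently running the application
    timeout: float = 10.0         # hypothetical: longer silence counts as a failure
    events: list = field(default_factory=list)

    def heartbeat(self, name: str, now: float) -> None:
        """Record that a node reported in at time `now`."""
        self.nodes[name].last_heartbeat = now

    def is_alive(self, name: str, now: float) -> bool:
        return now - self.nodes[name].last_heartbeat <= self.timeout

    def check(self, now: float) -> str:
        """Move the application to a live backup if its owner has gone silent."""
        if self.is_alive(self.owner, now):
            return self.owner
        for name in self.nodes:
            if name != self.owner and self.is_alive(name, now):
                self.events.append(f"fallover {self.owner} -> {name}")
                self.owner = name
                return self.owner
        self.events.append("no live backup available")
        return self.owner


cluster = Cluster(
    nodes={"node_a": Node("node_a"), "node_b": Node("node_b")},
    owner="node_a",
)
cluster.heartbeat("node_a", now=0.0)
cluster.heartbeat("node_b", now=0.0)
cluster.heartbeat("node_b", now=12.0)   # node_a stays silent
print(cluster.check(now=15.0))          # node_b
```

A real cluster manager also has to restart the application, move disks and network addresses, and guard against both nodes believing they own the application; the sketch leaves all of that out.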
Using HACMP can virtually eliminate planned outages, since users, applications and
data can be moved to backup systems during scheduled system maintenance. HACMP clusters
can be configured to meet complex and varied application availability and recovery
needs.
Benefits of HACMP
HACMP takes advantage of AIX 5L - the high-performance, scalable UNIX operating
system from IBM - and exploits its systems and network management capabilities.
AIX 5L is one of the world's most open UNIX operating systems and includes functions
to improve usability, security, system availability, and performance. These include
improved availability of mirrored data and enhancements to AIX Workload Manager
that help solve problems of mixed workloads by quickly and dynamically providing
resource availability to critical applications. Used across the IBM eServer pSeries
line of on demand servers along with the Reliable Scalable Cluster Technology (RSCT)
infrastructure technology layer in AIX 5L, HACMP can provide both horizontal and
vertical scalability without downtime.
High Availability Cluster Enhancements
HACMP V5.1 requires AIX 5L and builds upon its features. New HACMP V5.1 functions
include:
- Reduced fallover time through fast disk takeover, which completes within 10 seconds
- Streamlined configuration interface that requires only six user inputs to build
a simple HA cluster
- New non-IP heartbeating protection over disks, with no additional hardware required
- Enhanced security mechanism, removing the need for /.rhosts
- Increased administration productivity through faster cluster verification and synchronization
- Greater control over resources owning application startup and fallover behavior
- More cluster status information readily available in the cluster monitor
- Addition of multiple disaster recovery technologies to keep the system accessible
if disaster strikes
Business Continuity
The HACMP/XD (Extended Distance) optional feature is a must for customers with business-critical
data who want to mirror data between separate sites to aid in disaster recovery.
This applies to businesses of any size, with multiple sites or regional operations,
or wherever decentralization of data is desired. HACMP/XD is an attractive and affordable
high-availability solution for small- and medium-sized enterprises, and for small-
and medium-sized business units of large enterprises. "High availability"
should be a fundamental buying criterion for business-critical and e-business applications.
In a single package, HACMP/XD offers multiple technologies for achieving long distance
data mirroring, fallover, and resynchronization.
- HACMP/XD supports IBM Enterprise Storage Server (ESS) Peer-to-Peer Remote Copy (PPRC).
This allows HACMP clusters to support automatic fallover of disks that are PPRC
pairs and creates a powerful solution for customers on ESS with PPRC. By automating
the management of PPRC, recovery time is minimized after an outage, regardless of
whether the clustered environment is local or geographically dispersed. HACMP/XD
in combination with PPRC manages a clustered environment to ensure mirroring of
critical data is maintained at all times.
- HACMP/XD IP-based mirroring provides the well-known unlimited-distance data
mirroring of the IBM High Availability Geographic Cluster (HAGEO) for AIX product.
IP-based mirroring allows a cluster of pSeries servers to be placed in two widely
separated geographic locations, each maintaining an exact replica of the application
and data. Data synchronization during production, fallover, recovery, and restoration
is provided. HACMP/XD is independent of the disk storage used. RAID or mirroring
can be used for local protection. HACMP/XD IP-based mirroring is done at the logical
volume layer.
Complementary Cluster Software
IBM also offers a broad range of additional tools to aid in efficiently building,
managing and expanding HA clusters in AIX 5L environments. These include:
- Integrated Cluster File System utilizing General Parallel File System (GPFS) for
AIX V1.5. GPFS is a high-performance, shared-disk file system using standard UNIX
file system interfaces and providing concurrent access to data from all nodes in
a cluster.
- Workload Manager for AIX, which provides resource balancing between applications
- Geographic Remote Mirror (GeoRM) for AIX to provide unlimited distance data
mirroring for backup/recovery
- Tivoli for enterprise level systems management and monitoring
New Generation of On Demand Servers
HACMP runs on IBM eServer pSeries, the server platform of choice for UNIX-based
on demand applications. This technology-driven line of servers offers the availability,
scalability and range of performance demanded by today's growing on demand business
environments. It combines the benefits of high-performance copper chip and RISC
technology with AIX 5L for reliable handling of mission-critical applications.
pSeries is part of the IBM eServer product line, a generation of servers featuring
innovative technology, logical partitioning, outstanding scalability and availability,
broad support of open standards for application flexibility, and a full range of
new tools to manage IT infrastructure in an on demand world.
Gaining the IBM Advantage
HA solutions are often single-sourced to reduce the risk of failures, since each
element of the solution is designed and tested for reliability. This can be a critical
decision factor for business environments, and IBM provides the advantage of pSeries
servers, the AIX 5L operating system, IBM TotalStorage offerings and HACMP solutions
from a single source.
The IBM eServer product line is backed by comprehensive offerings and resources
that provide value at every stage of IT implementation. These include High Availability
Cluster Implementation Services, an offering which provides basic and customized
assistance for installation of HACMP clusters. This service can be customized with
the following elements:
- High Availability Cluster Proof of Concept Review
- Planning and design of a pSeries Availability Cluster
- Installation and configuration of a pSeries Availability Cluster
- Applications integration assistance
- Development and execution of a Cluster Test Plan
- Enhanced monitoring and reporting setup
- Operations planning and operations documentation development
- Migration/upgrades services
Based on an assessment of the complete system environment, IBM availability experts
can design a customer solution to meet the target availability level for on demand
business needs.
GeoRM for AIX and HAGEO for AIX
Highlights
- Provides disaster recovery and resynchronization capability for geographically separated
sites
- Protects data against total location failure by remote mirroring of data
- Supports unlimited distance between participating sites
- Performs automatic site takeover and recovery
- Integrates tightly with IBM's High Availability Cluster Multiprocessing (HACMP) for
AIX clustering software
GeoRM/HAGEO Key Features
- Support for both UDP and TCP transport options.
- 64-bit kernel support for the TCP protocol.
- Choice of "Write Ordering by Volume Group" under the TCP transport option
which can realize performance gains.
- Tighter integration with HACMP simplifies configuration of both products.
- Allows automatic detection and response to site and network failures in the geographic
cluster without user intervention.
- Provides load balancing across the links, enhanced by choosing the fastest path.
- Removes the AIX limitation of three mirror copies of a disk and allows three copies
at each geographic site.
- Wider range of data transmission rates, allowing more efficient use of networks
and better tuning of network utilization.
- Support for maximum-sized logical volumes.
Disaster Recovery Excellence
Today, keeping a business operational increasingly means keeping critical data and
information systems available around the clock. To compete successfully in the global
marketplace, companies are striving to protect critical information systems to help
minimize costly business impacts, such as lost sales, decreased customer satisfaction
and reduced employee productivity.
One aspect of high availability is protection against location disasters, such as
power outages, hardware or software failures, and natural disasters. This is accomplished
by eliminating the system and the site as points of failure.
Two software products provide differing levels of disaster recovery features for
IBM eServer pSeries and IBM RS/6000 UNIX systems. Geographic Remote Mirror (GeoRM)
for AIX protects critical data by duplicating the most up-to-date data reliably
and quickly at a remote location. High Availability Geographic Cluster (HAGEO) for
AIX helps keep mission-critical systems and applications operational in the event
of disasters.
HAGEO provides the geographic mirroring functions of GeoRM and adds automatic failover
and recovery capabilities.
GeoRM
GeoRM is a data mirroring product that provides a point-to-point method of duplicating
customer data in real time over unlimited geographic distances. Since GeoRM is both
database and file system independent, applications that use GeoRM's mirroring
capabilities require no modification.
Businesses can be assured that GeoRM is designed to mirror any data destined for
one server (the source server) across any IP-based network to another server (the
target server). A total failure (e.g., CPU, disk, network, power) of the source
server at the local site will not cause the loss of data on the target server at
the remote site.
GeoRM allows operations to continue during recovery from a server failure. Because
a target server can support up to seven source servers, GeoRM offers the flexibility
to design a backup configuration that suits each business's recovery needs, and
business applications can continue running on the takeover system during recovery
from a disaster or planned outage. Each of these source and target servers can be
as near (in the same room) or as far (halfway around the world) as required.
GeoRM offers a wide range of mirroring configurations, from the most stringent
data integrity mode to a higher performance mode. Data between the GeoRM sites can
be mirrored in three modes:
- Synchronous mode helps ensure that the same data exists on both sites at the completion
of every write. This mode provides a high level of data integrity.
- Synchronous with mirror write consistency helps ensure that both sites can be restored
with identical data, even in the event of a site failure in mid-transaction. This
mode provides data integrity and better performance results.
- Asynchronous mode writes on the local disk without waiting for the remote write
to complete. All data may not be on the remote site when a site failure occurs.
This mode is chosen when performance is the highest priority in disaster recovery.
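The trade-off between the synchronous and asynchronous modes can be shown with a small sketch. It is a toy model, not GeoRM code: the MirroredVolume class and its methods are hypothetical, Python dictionaries stand in for the local and remote disks, and the mirror write consistency mode is left out because its mechanics are not described here.

```python
from collections import deque


class MirroredVolume:
    """Toy model of a mirrored volume: dicts stand in for local and remote disks."""

    def __init__(self, mode: str):
        if mode not in ("sync", "async"):
            raise ValueError("mode must be 'sync' or 'async'")
        self.mode = mode
        self.local = {}
        self.remote = {}
        self.pending = deque()    # writes not yet applied at the remote site

    def write(self, block: int, data: bytes) -> None:
        self.local[block] = data
        if self.mode == "sync":
            # Synchronous: the write returns only after the remote copy is updated.
            self.remote[block] = data
        else:
            # Asynchronous: return at once; the remote write happens later.
            self.pending.append((block, data))

    def drain(self) -> None:
        """Apply queued writes to the remote site in their original order."""
        while self.pending:
            block, data = self.pending.popleft()
            self.remote[block] = data


sync_vol = MirroredVolume("sync")
sync_vol.write(0, b"order-1")

async_vol = MirroredVolume("async")
async_vol.write(0, b"order-1")
remote_before_drain = dict(async_vol.remote)   # what a site failure would leave behind
async_vol.drain()

print(sync_vol.remote == sync_vol.local)       # True
print(remote_before_drain)                     # {}
print(async_vol.remote == async_vol.local)     # True
```

The empty snapshot taken before drain() is the exposure the asynchronous mode accepts in exchange for not waiting on the remote write.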
GeoRM is suitable for all customers, from small and medium-sized companies to large
corporate enterprises. It is scalable and flexible across the entire range of IBM
AIX servers.
HAGEO
HAGEO supports the same critical data mirroring functions as GeoRM, such as point-to-point
mirroring, three mirroring modes, and backup configuration flexibility. Not only
is data protected, but HAGEO also has built-in features to automatically respond
to site and communication failures and provide for automatic site takeover.
An HAGEO cluster consists of two geographically separated sites, supporting a total
of eight systems. There are three types of disaster protection: remote hot backup,
remote mutual takeover and concurrent access.
Remote Hot Backup
A remote geographic site is designated as the hot backup site. This backup site
includes hardware, system and application software, and application data and files.
It is live and ready to take over the current workload. In the event of a failure,
the failed site's application workload automatically transfers to the remote hot
backup site.
Remote Mutual Takeover
Remote mutual takeover takes remote hot backup a little further and allows geographically
separated system sites to be designated as hot backups for each other. Should either
site experience a failure, the other acts as a hot backup and automatically takes
over the designated application workload of the failed site. Two different workloads
running at two different sites are protected!
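The difference between remote hot backup and remote mutual takeover comes down to which sites are designated as backups for which. The sketch below illustrates this under stated assumptions: the take_over function, the site names and the workload names are hypothetical and do not reflect how HAGEO is actually configured.

```python
def take_over(workloads: dict, backup_of: dict, failed_site: str) -> dict:
    """Return a new site -> workloads mapping after `failed_site` goes down.

    workloads: site name -> list of application names running there
    backup_of: site name -> site designated as its hot backup
    """
    if failed_site not in backup_of:
        raise ValueError(f"no hot backup designated for {failed_site}")
    target = backup_of[failed_site]
    result = {site: list(apps) for site, apps in workloads.items()}
    # The backup site keeps its own workload and adds the failed site's workload.
    result[target] = result.get(target, []) + result.get(failed_site, [])
    result[failed_site] = []
    return result


# Remote hot backup: only the production site has a designated backup.
hot_backup = take_over(
    workloads={"production": ["billing"], "backup": []},
    backup_of={"production": "backup"},
    failed_site="production",
)

# Remote mutual takeover: each site is the hot backup for the other.
mutual = take_over(
    workloads={"site_a": ["billing"], "site_b": ["inventory"]},
    backup_of={"site_a": "site_b", "site_b": "site_a"},
    failed_site="site_b",
)

print(hot_backup)   # {'production': [], 'backup': ['billing']}
print(mutual)       # {'site_a': ['billing', 'inventory'], 'site_b': []}
```

In the mutual case either site could have been the one to fail, which is why both workloads are protected.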
Concurrent Access
Concurrent access configurations have systems at both sites concurrently updating
the same database. Users run instances of the same application at both sites for
increased system throughput and extremely fast failover. HAGEO is one of the few
products to have this ability.
Remote System Recovery
With any of the above types of disaster protection, once a failed site has been
restored to operation, HAGEO can resynchronize mission-critical data and reintegrate
the failed system with the remote hot backup. HAGEO updates the failed system with
a current mirror of application data and files processed by the backup system after
the failed system ceased operations. Upon completing restoration of an up-to-date
data and file mirror, the HAGEO cluster will resume synchronized system operations,
including the mirroring of real-time data and files between the system sites. This
can occur while the remote backup is in user operation.
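One common way to bring a restored site up to date is to remember which blocks changed while it was down and copy only those. The sketch below shows that general technique; it is an assumption for illustration, not a description of HAGEO internals, and the SiteCopy and Mirror classes are hypothetical. It is also single-threaded, so it does not model user writes arriving in the middle of a resynchronization.

```python
class SiteCopy:
    """Toy model of one site's copy of the data, keyed by block number."""

    def __init__(self):
        self.blocks = {}


class Mirror:
    def __init__(self):
        self.active = SiteCopy()      # site currently serving users
        self.peer = SiteCopy()        # the other site
        self.peer_online = True
        self.dirty = set()            # blocks changed while the peer was down

    def write(self, block: int, data: bytes) -> None:
        self.active.blocks[block] = data
        if self.peer_online:
            self.peer.blocks[block] = data
        else:
            self.dirty.add(block)

    def peer_failed(self) -> None:
        self.peer_online = False

    def reintegrate(self) -> int:
        """Copy only the blocks the peer missed, then resume mirroring."""
        copied = 0
        for block in sorted(self.dirty):
            self.peer.blocks[block] = self.active.blocks[block]
            copied += 1
        self.dirty.clear()
        self.peer_online = True
        return copied


mirror = Mirror()
mirror.write(1, b"v1")        # mirrored to both sites
mirror.peer_failed()
mirror.write(2, b"v1")        # the peer misses this block
mirror.write(1, b"v2")        # and this update
copied = mirror.reintegrate()
mirror.write(3, b"v1")        # mirroring has resumed
print(copied, mirror.peer.blocks == mirror.active.blocks)   # 2 True
```

Tracking changed blocks keeps the resynchronization proportional to what changed during the outage rather than to the size of the whole volume.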
Complete Availability Line-Up
HAGEO is complemented by High Availability Cluster Multiprocessing (HACMP) for AIX,
which can be used for local or campus disaster survivability with real-time automated
fallover and reintegration for up to 32 servers. HACMP can protect against local
system and application failures, preserve data integrity and consistency, and maintain
cluster operations during unplanned and planned downtime. This strong line-up provides
IBM AIX system customers with a wide choice of high availability and disaster recovery
technologies.