Azure is a mature platform that provides a number of options for implementing high availability and scalability at multiple levels. An architect needs to know these options, how they differ, and what they cost in order to choose the solution that best meets the requirements. There is no single solution for everything, but there is a good one for each project.
Running applications and systems that are available to users whenever they need them is one of the top priorities for organizations. They want their applications to be operational and functional, and to remain available to their customers even when untoward events occur. High availability is the primary theme of this chapter. Keeping the lights on is the common metaphor for high availability. Achieving high availability for applications is not an easy task, and organizations have to spend considerable time, energy, resources, and money on it. Even then, there is still the risk that an organization’s implementation will not produce the desired results. Azure provides many high-availability features for virtual machines (VMs) and Platform as a Service (PaaS) services. In this chapter, we will go through the architectural and design features that Azure provides to ensure high availability for running applications and services.
High availability
High availability is one of the core non-functional technical requirements for any business-critical service and its deployment. It refers to the ability of a service or application to remain operational on a continuous basis by meeting or surpassing its promised service level agreement (SLA). Users are promised a certain SLA based on the service type, and the service should be available for consumption in line with that SLA. For example, an SLA can define 99% availability for an application over an entire year. This means that it should be available to users for at least 361.35 days of the year. If it fails to remain available for this period, that constitutes a breach of the SLA. Many mission-critical applications define their high-availability SLA as 99.999% for a year. This means the application should be up, running, and available throughout the year, and can be down and unavailable for only about 5.26 minutes in total. If the downtime goes beyond that, the customer is eligible for a credit, which is calculated based on the total uptime percentage. It is important to note here that high availability is defined over a period of time (yearly, monthly, weekly, or a combination of these).
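The downtime budget implied by an SLA percentage is simple arithmetic, and it can help to compute it explicitly when comparing targets. The following is a minimal sketch; the function name `allowed_downtime_minutes` and the 365-day year are assumptions made for illustration and do not come from any Azure API.

```python
def allowed_downtime_minutes(sla_percent: float, period_days: float = 365.0) -> float:
    """Return the downtime budget, in minutes, for an SLA over a period.

    Assumes a 365-day year by default; pass period_days=30 for a
    monthly budget, for example.
    """
    if not 0.0 <= sla_percent <= 100.0:
        raise ValueError("sla_percent must be between 0 and 100")
    unavailable_fraction = 1.0 - sla_percent / 100.0
    return unavailable_fraction * period_days * 24 * 60


if __name__ == "__main__":
    # Print the yearly downtime budget for a few common SLA targets.
    for sla in (99.0, 99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(sla)
        print(f"{sla}% -> {minutes:.2f} minutes per year ({minutes / 60:.2f} hours)")
```

Each additional "nine" shrinks the downtime budget by a factor of ten, which is one reason the cost of meeting an SLA rises so sharply at the high end.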
A service or application is made up of multiple components, and these components are deployed on separate tiers and layers. Moreover, a service or application is deployed on an operating system (OS) and hosted on a physical machine or VM. It consumes network and storage services for various purposes. It might even depend on external systems. For these services or applications to be highly available, it is important that networks, storage, OSes, VMs or physical machines, and each component of the application are designed with the SLA and high availability in mind. A well-defined application life cycle process should be used to ensure that high availability is baked in from the start of application planning through to its introduction to operations. This also involves introducing redundancy. Redundant resources should be included in the overall application and deployment architecture to ensure that if one resource goes down, another takes over and serves the requests of the customer.
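To see why every layer and its redundancy matter, it can help to estimate availability numerically. The sketch below relies on a common simplifying assumption that is not stated in this chapter: components fail independently of one another. Under that assumption, a chain of components that must all be up has the product of their availabilities, and a set of redundant replicas is down only when every replica is down. The function names and the figures are hypothetical and are not Azure SLA values.

```python
from math import prod
from typing import Iterable


def serial_availability(availabilities: Iterable[float]) -> float:
    """Availability of a chain in which every component must be up.

    Assumes independent failures, so the individual values multiply.
    """
    return prod(availabilities)


def redundant_availability(availability: float, replicas: int) -> float:
    """Availability of N identical replicas where any one can serve requests.

    Assumes independent failures: the group is down only if all replicas
    are down at the same time.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return 1.0 - (1.0 - availability) ** replicas


if __name__ == "__main__":
    # Hypothetical per-tier availabilities, for illustration only.
    tiers = {"network": 0.9999, "web": 0.999, "database": 0.9995}

    single = serial_availability(tiers.values())
    print(f"One instance per tier: {single:.6f}")

    # Adding a second web instance raises that tier's availability,
    # which in turn raises the end-to-end figure.
    tiers["web"] = redundant_availability(0.999, replicas=2)
    print(f"Two web instances:     {serial_availability(tiers.values()):.6f}")
```

In practice failures are often correlated (a shared rack, region, or deployment), so figures from a model like this are best read as an upper bound rather than a guarantee.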
Some of the major factors affecting the high availability of an application are as follows:
- Planned maintenance
- Unplanned maintenance
- Application deployment architecture