Relevance of the CAP-Theorem for Highly Available, Distributed Storage Systems and the Impact on Cloud-based Applications
This article was published as a university assignment in November 2012.
Abstract
The CAP-Theorem states that a distributed system can provide at most two of three desired properties at the same time: availability, consistency and partition tolerance. Network failures must be considered a given in any distributed system, so partition tolerance cannot be sacrificed; consistency guarantees are therefore relaxed in order to meet high availability requirements. The weak consistency properties of a distributed storage system propagate to the application level. Developers of cloud-based applications must be aware that the data in the underlying storage system may be in an inconsistent state, and must handle it accordingly. An important question in this context is how such inconsistencies affect the end user. Several approaches exist for adapting the application design so that these inconsistencies are hidden from the end user.
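The trade-off described above can be illustrated with a deliberately simplified simulation. The following is a minimal sketch under stated assumptions, not taken from the original assignment: the names `ReplicatedStore`, `Replica` and `Unavailable` are hypothetical, the store has exactly two replicas, and a network partition is modeled as a boolean flag. In "AP" mode the store stays available during a partition and accepts that replicas diverge; in "CP" mode it refuses requests rather than return possibly inconsistent data.

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Raised when the consistency-preferring (CP) store refuses a request."""


@dataclass
class Replica:
    data: dict = field(default_factory=dict)


class ReplicatedStore:
    """Hypothetical two-replica key-value store illustrating the CAP trade-off.

    mode="AP": remain available during a partition, tolerate stale reads.
    mode="CP": reject requests during a partition to avoid inconsistency.
    """

    def __init__(self, mode: str):
        if mode not in ("AP", "CP"):
            raise ValueError("mode must be 'AP' or 'CP'")
        self.mode = mode
        self.replicas = [Replica(), Replica()]
        self.partitioned = False  # True: the replicas cannot reach each other

    def write(self, replica_id: int, key: str, value) -> None:
        if self.partitioned:
            if self.mode == "CP":
                # With two replicas, neither side of a partition holds a
                # majority, so a consistent store must refuse the write.
                raise Unavailable("partition: write rejected")
            # AP: accept the write locally; the other replica is now stale.
            self.replicas[replica_id].data[key] = value
            return
        # No partition: replicate synchronously to both replicas.
        for replica in self.replicas:
            replica.data[key] = value

    def read(self, replica_id: int, key: str):
        if self.partitioned and self.mode == "CP":
            raise Unavailable("partition: read rejected")
        # In AP mode this may return an outdated value.
        return self.replicas[replica_id].data.get(key)


if __name__ == "__main__":
    ap = ReplicatedStore("AP")
    ap.write(0, "x", "v1")
    ap.partitioned = True
    ap.write(0, "x", "v2")
    print(ap.read(0, "x"), ap.read(1, "x"))  # v2 v1  (replicas diverged)

    cp = ReplicatedStore("CP")
    cp.write(0, "x", "v1")
    cp.partitioned = True
    try:
        cp.write(0, "x", "v2")
    except Unavailable as exc:
        print("CP store:", exc)  # CP store: partition: write rejected
```

The AP variant corresponds to the relaxed-consistency choice the abstract describes: the application keeps working, but a client reading from replica 1 sees stale data, which is exactly the inconsistency developers then have to hide from the end user.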
1 Introduction
Cloud computing offers an efficient, flexible and low-cost answer to ever-increasing business demands: IT resources as a service. The Internet makes it possible to reach a huge audience of potential users and customers. At the same time, these new possibilities bring new kinds of requirements that companies have to meet. Services offered through the cloud are often critical to the success of the entire enterprise. The worldwide e-commerce platform Amazon, for example, serves tens of millions of customers at peak times [1]. Thus, even a few minutes of downtime can have serious financial consequences and damage customer trust. Cloud providers therefore have to deliver the promised quality of service in order to contribute as much as possible to achieving business goals. As a consequence, application developers face new availability, scalability and reliability requirements for their applications. These applications rely on underlying, highly distributed storage systems that often consist of tens of thousands of servers. Naturally, these storage systems must themselves be highly scalable and highly available in order to serve the applications built on top of them. A simple storage system can be implemented…