Virtualization, the separation of the underlying hardware from the software running on it, is a key ingredient in the ability to provide efficient cloud services. Virtualization enables the execution of multiple Virtual Machines (VMs) over a shared physical host, thereby providing dynamicity and flexibility that are very limited otherwise. In order to exploit the full potential of virtualization, the mapping of VMs to their physical hosts is optimized for various workload patterns. This mapping, also known as VM placement, has a critical impact on the cost of cloud providers, and thus has been extensively studied. Typically, VM placement is periodically calculated, adjusting the placement to the current workload.
However, calculating the optimal placement is only the first step, since one has to actually migrate the VMs from their current location to the desired location determined by the placement algorithm. The execution of the migration plan must consider efficient transition schemes and resource availability. In a contemporary data center, hundreds of VM migrations may be needed in order to get from a given placement to the optimal one. This process sometimes involves transitions of a VM into a temporary location, which may be due to resource availability, but may also provide significant reduction of the overall migration time.
The migration plan should consider multiple criteria, including cost, resource availability, minimal disruption to cloud services, and of course the migration time. In this study we focus on the migration time, aiming at the fastest data center transition into the desired placement. Clearly, fast migration is essential for agile and adaptive datacenter performance, directly addressing the objectives of the Cloudwave project.
Starting with a simplified model of equal-size VMs, a single VM per host, and a fully connected network, the minimal migration plan is shown to be equivalent to a sequence of moves realizing a permutation over the number of VMs. This model necessitates additional temporary host slots. The minimal total migration time depends on the number of temporary hosts available, the time of single VM migration (between hosts, or when a possibly slower or remote temporary host is involved) and the total number of VM transitions. Interestingly, even this simplified model is computationally NP-hard, forcing us to develop approximation algorithms for achieving near-optimal provable performance.
The algorithm we designed for the simplified model involves transition cycles each handled by one temporary host. We then classify the transition cycles based on their completion speed into three categories, identify the slowest category and splitting its cycles among the fastest category. Computationally, this algorithm is proved to be a constant factor away from the optimal solution.
We then extend this simplified model to represent more realistic dynamic datacenter settings. In the first enhanced model, hosts and VMs can be added or removed. In this extended model, the target placement is not a permutation of the source placement, and the transition is better represented with the addition of a set of chains to the model. We designed an extended approximation algorithm for this setting. The chains and the transition cycles are now classified based on their completion speed, following iterative re-assignments of chains to categories, and achieving the same constant approximation bound as the algorithm for the simplified model.
As an additional extension to the model, we also accommodate different types of host hardware or connectivity setup, where the time duration for a single VM move involving the temporal hosts is not fixed. Our approximation algorithm for this setup avoids using temporal hosts with very high time values, and assigns the temporal hosts with lowest time values to the more critical sections of the chain transition stages, for achieving additive factor approximation algorithm (that is somewhat higher than the previous models).
In order to evaluate the practical benefits of our algorithms, we use real datacenter data to generate realistic tuples of current and desired placements, used event-driven simulator to simulate the network, and run our algorithms with these settings. The results indicate that our algorithms demonstrate a significant reduction in total migration time on realistic workloads, as compared with other published algorithms. The simulation results confirmed the small constant approximation bound as compared with the theoretical lower bound. We examined the performance of our algorithms using a few datacenter topologies, showing that our algorithms are robust to the actual topology. The addition of temporal locations, even in remote or slow parts of the datacenter, is very helpful in reducing the overall migration time.
In summary, the main contributions of our study are as follows:
- We formally study the minimum time execution of migration plans and provide near-optimal constant additive approximation algorithms for several restricted versions of the problem.
- We evaluate the practical expected performance of these algorithms in realistic scenarios and show that they over-perform any known algorithm.
- We provide strong evidence indicating that adding additional temporary hosts to be used when executing a migration plan in datacenters is very helpful to decrease the total migration time.
This work is the first published scientific contribution from the Technion – Israel to the Cloudwave project, written by Alexander Nus and Danny Raz. It was presented in IEEE/IFIP NOMS 2014 conference, 5-9 May, Krakow, Poland, http://noms2014.ieee-noms.org/.