LKSM is a system for physical machine migration. It migrates the running state of a server to a backup server with no downtime and minimal service interruption.
In today's datacenters, many services are implemented as virtual machines. These virtual machines run under a VM hypervisor on physical hardware. One advantage of virtual machines is that they can be migrated to different physical machines for load balancing and server maintenance. If a server shows signs of needing maintenance, its VMs can be migrated to other systems and the server can then be taken out of service.
Some services (such as file servers) suffer severe performance penalties when running in a VM. These services are run on so-called "bare metal" servers with no VM hypervisor. Taking such a system out of service for maintenance therefore means shutting it down and interrupting service.
LKSM provides a function similar to virtual machine migration for bare metal servers. When a system needs to be taken out of service, a backup system (the target server) can be configured and connected to it over a dedicated migration network. The target server is booted into a special mode to receive the migrated system state. The original system loads a special kernel module and runs a migration application, which copies its memory and system state to the target server. An important feature is that the copy operation runs while the original server is still up and serving requests. After the bulk of the system state has been copied, there is a brief "blackout" period during which operations are suspended and the remainder of the state is copied over. At this point operations resume on the target server and the original server can be taken out of service.
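The two phases described above (a bulk copy while the server runs, then a short blackout for the remainder) can be illustrated with a toy simulation. This is only a sketch under assumptions: it uses iterative pre-copy with dirty-page tracking, a technique common in virtual machine live migration, and nothing here confirms that LKSM works this way. The names `Source`, `migrate`, `shrinking_workload`, and `blackout_threshold` are hypothetical and are not part of LKSM.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """Toy model of the running source server (hypothetical, not LKSM's API)."""
    pages: dict                      # page number -> page contents
    dirty: set = field(default_factory=set)
    running: bool = True

    def write(self, page, data):
        # Any write made during migration marks the page as dirty.
        self.pages[page] = data
        self.dirty.add(page)


def migrate(source, target_pages, workload, blackout_threshold=2, max_rounds=10):
    """Copy source.pages into target_pages; return (live_rounds, blackout_pages)."""
    source.dirty = set(source.pages)     # first pass: every page must be sent
    rounds = 0
    # Live phase: the source stays in service between copy rounds.
    while len(source.dirty) > blackout_threshold and rounds < max_rounds:
        batch, source.dirty = source.dirty, set()
        for page in batch:
            target_pages[page] = source.pages[page]
        # Simulated here as running after each batch; a real server
        # would be modifying memory concurrently with the copy.
        workload(source, rounds)
        rounds += 1
    # Blackout phase: suspend operations and copy whatever is still dirty.
    source.running = False
    blackout_pages = len(source.dirty)
    for page in source.dirty:
        target_pages[page] = source.pages[page]
    source.dirty.clear()
    return rounds, blackout_pages


def shrinking_workload(source, round_no):
    """Hypothetical workload that touches fewer pages each round."""
    for page in range(max(0, 4 - 2 * round_no)):
        source.write(page, f"round{round_no}-page{page}")
```

Calling `migrate(Source(pages={...}), {}, shrinking_workload)` returns how many live rounds ran and how many pages were left for the blackout. In this model the blackout stays short because each live round leaves fewer dirty pages behind; `max_rounds` bounds the live phase in case the workload never settles.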
Current Status: LKSM is very much a work in progress. As of this writing (3/16/2011) it requires that the source and target systems have exactly the same configuration (CPU, chipset, memory, PCI devices). The protocol assumes this is the case and does not check platform compatibility. Local storage is not supported; successful migrations to date have been done either out of ramdisk or with multi-port Fibre Channel storage. Network storage (e.g. NFS) is also not supported at this time. We anticipate lifting some or all of these restrictions with future development.
News: View a video demo of LKSM in action!