Monday, 18 June 2012

Failover Considerations in Database Mirroring

This article gives a brief idea of what happens behind the scenes when a failover occurs in database mirroring. It assumes that the reader is familiar with database mirroring concepts.
Failover Considerations
Depending on the transaction safety setting and on whether a witness server is present, you can have automatic failover, manual failover, or both. A short reference follows:
Transaction safety level | Operating mode | Supported failover mode
FULL (with witness) | High safety with automatic failover (synchronous) | Automatic or manual
FULL (without witness) | High safety (synchronous) | Manual, or forced service (with possible data loss)
OFF | High performance (asynchronous) | Forced service (with possible data loss)
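
The reference above can also be written as a small lookup. This is only a sketch in Python: the function name `supported_failover` and its parameters are hypothetical and are not part of any SQL Server API. It simply encodes the relationship between the safety setting, the witness, and the failover modes.

```python
from typing import List, Tuple


def supported_failover(safety: str, has_witness: bool) -> Tuple[str, List[str]]:
    """Return (operating mode, supported failover types) for a mirroring session.

    Hypothetical helper for illustration only. `safety` is the transaction
    safety setting ("FULL" or "OFF"); `has_witness` says whether a witness
    server is configured for the session.
    """
    level = safety.strip().upper()
    if level == "FULL" and has_witness:
        # Automatic failover also needs the database to be synchronized
        # and the mirror and witness to be connected to each other.
        return ("High safety with automatic failover (synchronous)",
                ["automatic", "manual"])
    if level == "FULL":
        return ("High safety (synchronous)", ["manual", "forced service"])
    if level == "OFF":
        # Forced service may lose data because the mirror can lag behind.
        return ("High performance (asynchronous)", ["forced service"])
    raise ValueError("safety must be 'FULL' or 'OFF'")


if __name__ == "__main__":
    mode, failovers = supported_failover("FULL", has_witness=True)
    print(mode, "->", ", ".join(failovers))
```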

When a failover occurs, several events take place in the background, in sequence:
1) Failure occurs: The principal database becomes unavailable. Possible causes include:
a) Power failure
b) Network failure
c) Storage Failure
d) Hardware failure
2) Failure detection: The failure is detected by the witness and mirror servers. The default timeout for communication between the principal, mirror, and witness servers is 10 seconds. If the witness does not get a response from the principal within this timeout, the principal is considered down. There is usually no need to change the default timeout (10 seconds); if you do change it, do so very cautiously, because a value that is too low may lead to false failovers.
3) Redo phase: Because the mirror database is in the restoring state, all the log records in the redo queue need to be applied on the mirror server to recover it completely.
4) Decision making: The mirror coordinates with the witness, and together they decide that the database will now fail over to the mirror server. This usually takes about 1 second to confirm. If the principal server comes back before the redo phase, there is no need for failover.
5) Mirror takes the role of principal: The mirror server takes over the principal role and brings its database online. Clients now connect to the new principal server.
6) Undo phase: Any uncommitted transactions in the transaction log are rolled back.
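
To make the failure detection step more concrete, here is a minimal sketch in Python of a timeout check. It is not how SQL Server implements detection internally. The function name, the parameters, and the idea of tracking the time of the last response are assumptions made for illustration. It shows why a timeout that is too low can turn a short network stall into a false failover.

```python
DEFAULT_PARTNER_TIMEOUT_S = 10  # default mirroring timeout, in seconds


def partner_considered_down(last_response_s: float,
                            now_s: float,
                            timeout_s: float = DEFAULT_PARTNER_TIMEOUT_S) -> bool:
    """Return True if no response has arrived within the timeout period.

    Hypothetical helper: `last_response_s` and `now_s` are timestamps in
    seconds on the same clock.
    """
    if timeout_s <= 0:
        raise ValueError("timeout_s must be positive")
    return (now_s - last_response_s) >= timeout_s


if __name__ == "__main__":
    # A 6-second network stall: tolerated with the default timeout,
    # but treated as a failure if the timeout is lowered to 5 seconds.
    print(partner_considered_down(0.0, 6.0))
    print(partner_considered_down(0.0, 6.0, timeout_s=5))
```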
In general, the overall failover time runs from the moment the failure occurs until the mirror server assumes the principal role and serves its database to clients. The redo phase usually accounts for most of this time. You can estimate how long the redo phase will take from the System Monitor counters Redo Queue KB and Redo Bytes/sec. Because the queue is reported in KB and the rate in bytes, first multiply Redo Queue KB by 1024 and then divide the result by Redo Bytes/sec. This gives the estimated time, in seconds, needed to apply the log records in the redo queue of the mirror database.
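
The redo estimate can be worked through with a short Python sketch. The function name `estimate_redo_seconds` is hypothetical, and the sample counter values are made up for illustration; in practice you would read Redo Queue KB and Redo Bytes/sec from System Monitor on the mirror server.

```python
def estimate_redo_seconds(redo_queue_kb: float, redo_bytes_per_sec: float) -> float:
    """Estimate the time, in seconds, to apply the mirror's redo queue.

    redo_queue_kb      -- value of the Redo Queue KB counter
    redo_bytes_per_sec -- value of the Redo Bytes/sec counter
    """
    if redo_queue_kb < 0:
        raise ValueError("redo_queue_kb cannot be negative")
    if redo_bytes_per_sec <= 0:
        raise ValueError("redo_bytes_per_sec must be positive")
    # Convert the queue size from KB to bytes so the units match the rate.
    return (redo_queue_kb * 1024) / redo_bytes_per_sec


if __name__ == "__main__":
    # Made-up sample values: a 50 MB redo queue applied at 5 MB per second.
    seconds = estimate_redo_seconds(redo_queue_kb=51200,
                                    redo_bytes_per_sec=5242880)
    print(f"Estimated redo time: {seconds:.1f} s")
```

With these sample values the estimate comes to 10 seconds. The redo rate varies with load, so treat the result as a rough guide and not a guarantee.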
Another way to monitor failover is to run SQL Trace on both the principal and mirror servers and capture the Database Mirroring State Change event. The columns of interest are StartTime and TextData.
I hope this article helps you understand the events that take place in the background when a failover occurs in database mirroring.
