m4_comment([$Id: partition.so,v 1.6 2006/08/25 12:56:00 bostic Exp $])

m4_ref_title(m4_db Replication, Network partitions,, rep/trans, rep/faq)

m4_p([dnl
The m4_db replication implementation can be affected by network
partitioning problems.])

m4_p([dnl
For example, consider a replication group with N members. The network
partitions with the master on one side and more than N/2 of the sites
on the other side. The sites on the side with the master will continue
forward, and the master will continue to accept write queries for the
databases. Unfortunately, the sites on the other side of the partition,
realizing they no longer have a master, will hold an election. The
election will succeed as there are more than N/2 of the total sites
participating, and there will then be two masters for the replication
group. Since both masters are potentially accepting write queries, the
databases could diverge in incompatible ways.])

m4_p([dnl
If multiple masters are ever found to exist in a replication group, a
master detecting the problem will return m4_ref(DB_REP_DUPMASTER). If
the application sees this return, it should reconfigure itself as a
client (by calling m4_ref(rep_start)), and then call for an election
(by calling m4_ref(rep_elect)). The site that wins the election may be
one of the two previous masters, or it may be another site entirely.
Regardless, the winning system will bring all of the other systems into
conformance.])

m4_p([dnl
As another example, consider a replication group with a master
environment and two clients A and B, where client A may upgrade to
master status and client B cannot. Then, assume client A is partitioned
from the other two database environments, and it becomes out-of-date
with respect to the master. Then, assume the master crashes and does
not come back on-line. Subsequently, the network partition is restored,
and clients A and B hold an election.
As client B cannot win the
election, client A will win by default, and in order to get back into
sync with client B, possibly committed transactions on client B will be
unrolled until the two sites can once again move forward together.])

m4_p([dnl
In both of these examples, there is a phase where a newly elected master
brings the members of a replication group into conformance with itself
so that it can start sending new information to them. This can result
in the loss of information as previously committed transactions are
unrolled.])

m4_p([dnl
In architectures where network partitions are an issue, applications
may want to implement a heart-beat protocol to minimize the consequences
of a bad network partition. As long as a master is able to contact at
least half of the sites in the replication group, it is impossible for
there to be two masters. If the master can no longer contact a
sufficient number of systems, it should reconfigure itself as a client,
and hold an election. Replication Manager does not currently
implement such a feature, so this technique is only available to
applications which use the Base replication API.])

m4_p([dnl
There is another tool applications can use to minimize the damage in
the case of a network partition. By specifying an m4_arg(nsites)
argument to m4_ref(rep_elect) that is larger than the actual number of
database environments in the replication group, applications can keep
systems from declaring themselves the master unless they can talk to
a large percentage of the sites in the system. For example, if there
are 20 database environments in the replication group, and an argument
of 30 is specified to the m4_refT(rep_elect), then a system will have
to be able to talk to at least 16 of the sites to declare itself the
master.])

m4_p([dnl
Replication Manager uses the value of m4_arg(nsites) (configured by
the m4_refT(rep_set_nsites)) for elections as well as in calculating how
many acknowledgements to wait for when sending a
m4_ref(DB_REP_PERMANENT) message.
So this technique may be useful here
as well, unless the application uses the m4_ref(DB_REPMGR_ACKS_ALL) or
m4_ref(DB_REPMGR_ACKS_ALL_PEERS) acknowledgement policies.])

m4_p([dnl
Specifying a m4_arg(nsites) argument to m4_ref(rep_elect) that is
smaller than the actual number of database environments in the
replication group has its uses as well. For example, consider a
replication group with 2 environments. If they are partitioned from
each other, neither of the sites could ever get enough votes to become
the master. A reasonable alternative would be to specify a
m4_arg(nsites) argument of 2 to one of the systems and a m4_arg(nsites)
argument of 1 to the other. That way, one of the systems could win
elections even when partitioned, while the other one could not. This
would allow one of the systems to continue accepting write
queries after the partition.])

m4_p([dnl
In a 2-site group, Replication Manager reacts to the loss of
communication with the master by assuming the master has crashed: the
surviving client simply declares itself to be master. Thus it avoids
the problem of the survivor never being able to get enough votes to
prevail. But it does leave the group vulnerable to the risk of
multiple masters, if both sites are running but cannot communicate.])

m4_p([dnl
These scenarios stress the importance of good network infrastructure in
m4_db replicated environments. When replicating database environments
over sufficiently lossy networking, the best solution may well be to
pick a single master, and only hold elections when human intervention
has determined the selected master is unable to recover at all.])

m4_page_footer