m4_comment([$Id: elect.so,v 1.27 2007/04/05 20:37:29 bostic Exp $])

m4_ref_title(m4_db Replication, Elections,, rep/newsite, rep/mastersync)

m4_p([dnl
When using the Base replication API, it is the responsibility of the
application to initiate elections if desired. It is never dangerous
to hold an election, as the m4_db election process ensures there is
never more than a single master database environment. Clients should
initiate an election whenever they lose contact with the master
environment, whenever they see a return of m4_ref(DB_REP_HOLDELECTION)
from the m4_refT(rep_message), or when, for whatever reason, they do
not know who the master is. It is not necessary for applications to
immediately hold elections when they start, as any existing master
will be discovered after calling m4_ref(rep_start). If no master has
been found after a short wait period, then the application should call
for an election.])

m4_p([dnl
For a client to win an election, the replication group must currently
have no master, and the client must have the most recent log records.
When clients have equivalent log records, the priority of the database
environments participating in the election determines the winner. The
application specifies the minimum number of replication group members
that must participate in an election for a winner to be declared. We
recommend at least ((N/2) + 1) members, where N is the number of sites
in the replication group. If fewer than a simple majority are
specified, a warning will be given.])

m4_p([dnl
If an application's policy for what site should win an election can be
parameterized in terms of the database environment's information (that
is, the number of sites, available log records and a relative priority
are all that matter), then m4_db can handle all elections transparently.
However, there are cases where the application has more complete
knowledge and needs to affect the outcome of elections. For example,
applications may choose to handle master selection, explicitly
designating master and client sites.
Applications in these cases may never need to call for an election.
Alternatively, applications may choose to use m4_ref(rep_elect)'s
arguments to force the correct outcome to an election. That is, if an
application has three sites, A, B, and C, and after a failure of C
determines that A must become the winner, the application can guarantee
the election's outcome by specifying priorities appropriately when
calling for the election:])

m4_indent([dnl
on A: priority 100, nsites 2
on B: priority 0, nsites 2])

m4_p([dnl
It is dangerous to configure more than one master environment using the
m4_refT(rep_start), and applications should be careful not to do so.
Applications should only configure themselves as the master environment
if they are the only possible master, or if they have won an election.
An application knows it has won an election when it receives the
m4_ref(DB_EVENT_REP_ELECTED) event.])

m4_p([dnl
Normally, when a master failure is detected, the election should finish
quickly so that the application can continue to service updates; the
participating sites are already running and can take part immediately.
However, when an entire group is restarted after an administrative
shutdown, a slower-booting site may have more recent log records than
any other site. To cover that case, the application should give the
election more time, so that all sites have a chance to participate.
Because a starting site cannot determine which of these cases the whole
group is in, a long timeout gives all sites a reasonable chance to
participate. On the other hand, if an application that wants full
participation sets the m4_arg(nvotes) argument of the m4_refT(rep_elect)
to the number of sites in the group, and one site does not reboot, a
master can never be elected without manual intervention.])

m4_p([In those cases, the desired action at a group level is to hold
a full election if all sites crashed, and a majority election if
a subset of sites crashed or rebooted.
Because an individual site cannot know
how many votes to require, m4_db provides a mechanism based on
timeouts. By setting a long timeout (perhaps on the order of minutes)
using the m4_arg(DB_REP_FULL_ELECTION_TIMEOUT) flag to the
m4_refT(rep_set_timeout), an application can allow m4_db to elect a
master even without full participation. Sites may also want to set a
normal election timeout for majority-based elections using the
m4_arg(DB_REP_ELECTION_TIMEOUT) flag to the m4_refT(rep_set_timeout).])

m4_p([Consider three sites, A, B, and C, where A is the master. If
all three sites crash and all three reboot, every site sets a timeout
for a full election, say 10 minutes, but requires only a majority for
the m4_arg(nvotes) argument of the m4_refT(rep_elect). Once all three
sites are up, the election completes immediately, provided they
reboot within 10 minutes of each other. Now suppose all three sites
crash and only two reboot. The two sites enter the election, and
after the 10-minute timeout expires they elect a master with a
majority of two sites. The full election timeout therefore sets a
threshold for how long a site has to reboot and rejoin the group.])

m4_p([dnl
To add a database environment to the replication group with the intent
of it becoming the master, first add it as a client. Since it may be
out-of-date with respect to the current master, allow it to update
itself from the current master. Then, shut the current master down.
Presumably, the added client will win the subsequent election.
If the client does not win the election, it is likely that it was not
given sufficient time to update itself with respect to the current
master.])

m4_p([dnl
If a client is unable to find a master or win an election, it means that
the network has been partitioned and there are not enough environments
participating in the election for one of the participants to win.
In this case, the application should repeatedly call m4_ref(rep_start)
and m4_ref(rep_elect), alternating between attempting to discover an
existing master and holding an election to declare a new one. In
desperate circumstances, an application could simply declare itself the
master by calling m4_ref(rep_start), or by reducing the number of
participants required to win an election until the election is won.
Neither of these solutions is recommended: in the case of a network
partition, either of these choices can result in there being two masters
in one replication group, and the databases in the environment might
irretrievably diverge as they are modified in different ways by the
masters. In the case of a two-system replication group, the application
may want to require access to a remote network site, or to some other
external tie-breaker, to allow a system to declare itself master.])

m4_p([dnl
It is possible for a less-preferred database environment to win an
election if a number of systems crash at the same time. Because an
election winner is declared as soon as enough environments participate
in the election, the environment on a slow-booting but well-connected
machine might lose to an environment on a badly connected but
faster-booting machine.
In the case of a number of environments crashing at
the same time (for example, a set of replicated servers in a single
machine room), applications should bring the database environments
online as clients initially (which will allow them to process read
queries immediately), and then hold an election after sufficient time
has passed for the slower-booting machines to catch up.])

m4_p([dnl
If, for any reason, a less-preferred database environment becomes the
master, it is possible to switch masters in a replicated environment.
For example, suppose the preferred master crashes and one of the
replication group clients becomes the group master. In order to restore
the preferred master to master status, take the following steps:])

m4_nlistbegin
m4_nlist([dnl
The preferred master should reboot and re-join the replication group
as a client.])
m4_nlistns([dnl
Once the preferred master has caught up with the replication group, the
application on the current master should complete all active transactions
and reconfigure itself as a client using the m4_refT(rep_start).])
m4_nlistns([dnl
Then, the current or preferred master should call for an election using
the m4_refT(rep_elect).])
m4_nlistend

m4_p([dnl
Replication Manager automatically conducts elections when necessary,
based on configuration information supplied to the
m4_refT(rep_set_priority) and the m4_refT(rep_set_nsites).])

m4_page_footer