
📄 trans.so

Source code of Berkeley DB 4.6.21. Berkeley DB is a simple database management system.
m4_comment([$Id: trans.so,v 1.18 2006/08/24 17:58:53 bostic Exp $])

m4_ref_title(m4_db Replication,
    Transactional guarantees,, rep/bulk, rep/partition)

m4_p([dnl
It is important to consider replication in the context of the overall
database environment's transactional guarantees.  To briefly review,
transactional guarantees in a non-replicated application are based on
the writing of log file records to "stable storage", usually a disk
drive.  If the application or system then fails, the m4_db logging
information is reviewed during recovery, and the databases are updated
so that all changes made as part of committed transactions appear, and
all changes made as part of uncommitted transactions do not appear.  In
this case, no information will have been lost.])

m4_p([dnl
If a database environment does not require the log be flushed to
stable storage on transaction commit (using the m4_ref(DB_TXN_NOSYNC)
flag to increase performance at the cost of sacrificing transactional
durability), m4_db recovery will only be able to restore the system to
the state of the last commit found on stable storage.  In this case,
information may have been lost (for example, the changes made by some
committed transactions may not appear in the databases after recovery).])

m4_p([dnl
Further, if there is database or log file loss or corruption (for
example, if a disk drive fails), then catastrophic recovery is
necessary, and m4_db recovery will only be able to restore the system
to the state of the last archived log file.  In this case, information
may also have been lost.])

m4_p([dnl
Replicating the database environment extends this model, by adding a
new component to "stable storage": the client's replicated information.
If a database environment is replicated, there is no lost information
in the case of database or log file loss, because the replicated system
can be configured to contain a complete set of databases and log records
up to the point of failure.
A database environment that loses a disk
drive can have the drive replaced, and it can then rejoin the
replication group.])

m4_p([dnl
Because of this new component of stable storage, specifying
m4_ref(DB_TXN_NOSYNC) in a replicated environment no longer sacrifices
durability, as long as one or more clients have acknowledged receipt of
the messages sent by the master.  Since network connections are often
faster than local synchronous disk writes, replication becomes a way
for applications to significantly improve their performance as well as
their reliability.])

m4_p([dnl
The return status from the application's m4_arg(send) function must be
set by the application to ensure the transactional guarantees the
application wants to provide.  Whenever the m4_arg(send) function
returns failure, the local database environment's log is flushed as
necessary to ensure that any information critical to database integrity
is not lost.  Because this flush is an expensive operation in terms of
database performance, applications should avoid returning an error from
the m4_arg(send) function, if at all possible.])

m4_p([dnl
The only interesting message type for replication transactional
guarantees is when the application's m4_arg(send) function was called
with the m4_ref(DB_REP_PERMANENT) flag specified.
There is no reason
for the m4_arg(send) function to ever return failure unless the
m4_ref(DB_REP_PERMANENT) flag was specified -- messages without the
m4_ref(DB_REP_PERMANENT) flag do not make visible changes to databases,
and the m4_arg(send) function can return success to m4_db as soon as
the message has been sent to the client(s) or even just copied to local
application memory in preparation for being sent.])

m4_p([dnl
When a client receives a m4_ref(DB_REP_PERMANENT) message, the client
will flush its log to stable storage before returning (unless the client
environment has been configured with the m4_ref(DB_TXN_NOSYNC) option).
If the client is unable to flush a complete transactional record to disk
for any reason (for example, there is a missing log record before the
flagged message), the call to the m4_refT(rep_message) on the client
will return m4_ref(DB_REP_NOTPERM) and return the LSN of this record
to the application in the m4_arg(ret_lsnp) parameter.
The application's client or master
message handling loops should take proper action to ensure the correct
transactional guarantees in this case.  When missing records arrive
and allow subsequent processing of previously stored permanent
records, the call to the m4_refT(rep_message) on the client will
return m4_ref(DB_REP_ISPERM) and return the largest LSN of the
permanent records that were flushed to disk.  Client applications
can use these LSNs to know definitively if any particular LSN is
permanently stored or not.])

m4_p([dnl
An application relying on a client's ability to become a master and
guarantee that no data has been lost will need to write the m4_arg(send)
function to return an error whenever it cannot guarantee the site that
will win the next election has the record.
Applications not requiring
this level of transactional guarantees need not have the m4_arg(send)
function return failure (unless the master's database environment has
been configured with m4_ref(DB_TXN_NOSYNC)), as any information critical
to database integrity has already been flushed to the local log before
m4_arg(send) was called.])

m4_p([dnl
To sum up, the only reason for the m4_arg(send) function to return
failure is when the master database environment has been configured to
not synchronously flush the log on transaction commit (that is,
m4_ref(DB_TXN_NOSYNC) was configured on the master), the
m4_ref(DB_REP_PERMANENT) flag is specified for the message, and the
m4_arg(send) function was unable to determine that some number of
clients have received the current message (and all messages preceding
the current message).  How many clients need to receive the message
before the m4_arg(send) function can return success is an application
choice (and may depend less on a specific number of clients reporting
success than on one or more geographically distributed clients doing
so).])

m4_p([dnl
If, however, the application does require on-disk durability on the master,
the master should be configured to synchronously flush the log on commit.
If clients are not configured to synchronously flush the log,
that is, if a client is running with m4_ref(DB_TXN_NOSYNC) configured,
then it is up to the application to reconfigure that client
appropriately when it becomes a master.  That is, the
application must explicitly call m4_ref(dbenv_set_flags) to
disable asynchronous log flushing as part of re-configuring
the client as the new master.])

m4_p([dnl
Of course, it is important to ensure that the replicated master and
client environments are truly independent of each other.
For example,
it does not help matters that a client has acknowledged receipt of a
message if both master and clients are on the same power supply, as the
failure of the power supply will still potentially lose information.])

m4_p([dnl
Configuring your replication-based application to achieve the proper
mix of performance and transactional guarantees can be complex.  In
brief, there are a few controls an application can set to configure the
guarantees it makes: specification of m4_ref(DB_TXN_NOSYNC) for the
master environment, specification of m4_ref(DB_TXN_NOSYNC) for the
client environment, the priorities of different sites participating in
an election, and the behavior of the application's m4_arg(send)
function.])

m4_p([dnl
Applications using Replication Manager are free to use
m4_ref(DB_TXN_NOSYNC) at the master and/or clients as they see fit.  The
behavior of the m4_arg(send) function that Replication Manager provides
on the application's behalf is determined by an "acknowledgement
policy", which is configured by the m4_refT(repmgr_set_ack_policy).
Clients always send acknowledgements for m4_ref(DB_REP_PERMANENT)
messages (unless the acknowledgement policy in effect indicates that the
master doesn't care about them).  For a m4_ref(DB_REP_PERMANENT)
message, the master blocks the sending thread until either it receives
the proper number of acknowledgements, or the m4_ref(DB_REP_ACK_TIMEOUT)
expires.  In the case of timeout, Replication Manager returns an error
code from the m4_arg(send) function, causing m4_db to flush the
transaction log before returning to the application, as previously
described.  The default acknowledgement policy is
m4_ref(DB_REPMGR_ACKS_QUORUM), which ensures that the effect of a
permanent record remains durable following an election.])

m4_p([dnl
It is rarely useful to write and synchronously flush the log when
a transaction commits on a replication client.  It may be useful where
systems share resources and multiple systems commonly fail at the same
time.
By default, all m4_db database environments, whether master or
client, synchronously flush the log on transaction commit or prepare.
Generally, replication masters and clients turn log flush off for
transaction commit using the m4_ref(DB_TXN_NOSYNC) flag.])

m4_p([dnl
Consider two systems connected by a network interface.  One acts as the
master, the other as a read-only client.  The client takes over as
master if the master crashes, and the master rejoins the replication
group after such a failure.  Both master and client are configured to
not synchronously flush the log on transaction commit (that is,
m4_ref(DB_TXN_NOSYNC) was configured on both systems).  The
application's m4_arg(send) function never returns failure to the m4_db
library, simply forwarding messages to the client (perhaps over a
broadcast mechanism), and always returning success.  On the client, any
m4_ref(DB_REP_NOTPERM) returns from the client's m4_refT(rep_message)
are ignored, as well.  This system configuration has excellent
performance, but may lose data in some failure modes.])

m4_p([dnl
If both the master and the client crash at once, it is possible to lose
committed transactions, that is, transactional durability is not being
maintained.  Reliability can be increased by providing separate power
supplies for the systems and placing them in separate physical locations.])

m4_p([dnl
If the connection between the two machines fails (or just some number
of messages are lost), and subsequently the master crashes, it is
possible to lose committed transactions.  Again, this is because
transactional durability is not being maintained.  Reliability can be
improved in a couple of ways:])

m4_nlistbegin
m4_nlist([dnl
Use a reliable network protocol (for example, TCP/IP instead of UDP).])
m4_nlist([dnl
Increase the number of clients and network paths to make it less likely
that a message will be lost.
In this case, it is important to also make
sure a client that did receive the message wins any subsequent election.
If a client that did not receive the message wins a subsequent election,
data can still be lost.])
m4_nlistend

m4_p([dnl
Further, systems may want to guarantee message delivery to the client(s)
(for example, to prevent a network connection from simply discarding
messages).  Some systems may want to ensure clients never return
out-of-date information, that is, once a transaction commit returns
success on the master, no client will return old information to a
read-only query.  Some of the following changes may be used to address
these issues:])

m4_nlistbegin
m4_nlist([dnl
Write the application's m4_arg(send) function to not return to m4_db
until one or more clients have acknowledged receipt of the message.
The number of clients chosen will be dependent on the application: you
will want to consider likely network partitions (ensure that a client
at each physical site receives the message) and geographical diversity
(ensure that a client on each coast receives the message).])
m4_nlist([dnl
Write the client's message processing loop to not acknowledge receipt
of the message until a call to the m4_refT(rep_message) has returned
success.  Messages resulting in a return of m4_ref(DB_REP_NOTPERM) from
the m4_refT(rep_message) mean the message could not be flushed to the
client's disk.
If the client does not acknowledge receipt of such
messages to the master until a subsequent call to the
m4_refT(rep_message) returns m4_ref(DB_REP_ISPERM) and the LSN
returned is at least as large as this message's LSN, then the master's
m4_arg(send) function will not return success to the m4_db library.
This means the thread committing the transaction on the master will not
be allowed to proceed based on the transaction having committed until
the selected set of clients have received the message and consider it
complete.
m4_p([dnl
Alternatively, the client's message processing loop could acknowledge
the message to the master, but with an error code indicating that the
application's m4_arg(send) function should not return to the m4_db
library until a subsequent acknowledgement from the same client
indicates success.])
m4_p([dnl
The application send callback function invoked by m4_db contains
an LSN of the record being sent (if appropriate for that record).
When m4_refT(rep_message) returns indicators that a permanent
record has been written, it also returns the maximum LSN of the
permanent record written.])])
m4_nlistend

m4_p([dnl
There is one final pair of failure scenarios to consider.  First, it is
not possible to abort transactions after the application's m4_arg(send)
function has been called, as the master may have already written the
commit log records to disk, and so abort is no longer an option.
Second, a related problem is that even though the master will attempt
to flush the local log if the m4_arg(send) function returns failure,
that flush may fail (for example, when the local disk is full).  Again,
the transaction cannot be aborted as one or more clients may have
committed the transaction even if m4_arg(send) returns failure.  Rare
applications may not be able to tolerate these unlikely failure modes.
In that case the application may want to:])

m4_nlistbegin
m4_nlist([dnl
Configure the master to always do local synchronous commits (turning
off the m4_ref(DB_TXN_NOSYNC) configuration).
This will decrease
performance significantly, of course (one of the reasons to use
replication is to avoid local disk writes).  In this configuration,
failure to write the local log will cause the transaction to abort in
all cases.])
m4_nlist([dnl
Do not return from the application's m4_arg(send) function under any
conditions, until the selected set of clients has acknowledged the
message.  Until the m4_arg(send) function returns to the m4_db library,
the thread committing the transaction on the master will wait, and so
no application will be able to act on the knowledge that the transaction
has committed.])
m4_nlistend

m4_p([dnl
The final alternative for applications concerned about these types of
failure is to use distributed transactions as an alternative means of
replication, guaranteeing full consistency at the cost of implementing
a Global Transaction Manager and performing two-phase commit across
multiple m4_db database environments.  More information on this topic
can be found in the m4_link(M4RELDIR/ref/xa/intro, [Distributed
Transactions]) chapter.])

m4_page_footer
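
The deferred-acknowledgement rule described above for the client's message processing loop can be sketched in C as follows. This is only an illustration under stated assumptions, not Berkeley DB code: `lsn_t`, `rep_result_t`, `ack_tracker_t`, `tracker_update` and `ack_ready` are hypothetical names standing in for the library's LSN type and for the DB_REP_NOTPERM and DB_REP_ISPERM return values. The sketch records the largest LSN reported as permanent and allows an acknowledgement only for messages whose LSN is at or below it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the library's LSN: log file number plus offset. */
typedef struct { uint32_t file; uint32_t offset; } lsn_t;

/* Hypothetical stand-ins for the outcomes of one message-processing call. */
typedef enum { REP_OK, REP_NOTPERM, REP_ISPERM } rep_result_t;

/* Per-client state: the largest LSN reported as flushed to stable storage. */
typedef struct {
    lsn_t max_perm;
    bool  have_perm;   /* false until the first REP_ISPERM is seen */
} ack_tracker_t;

/* Order LSNs by file number first, then by offset within the file. */
int lsn_cmp(lsn_t a, lsn_t b)
{
    if (a.file != b.file)
        return a.file < b.file ? -1 : 1;
    if (a.offset != b.offset)
        return a.offset < b.offset ? -1 : 1;
    return 0;
}

/*
 * Record the outcome of one processing call.  Only REP_ISPERM advances
 * the high-water mark; REP_NOTPERM means the record is not yet durable,
 * so the tracker is left unchanged and the acknowledgement stays deferred.
 */
void tracker_update(ack_tracker_t *t, rep_result_t res, lsn_t ret_lsn)
{
    assert(t != NULL);
    if (res != REP_ISPERM)
        return;
    if (!t->have_perm || lsn_cmp(ret_lsn, t->max_perm) > 0) {
        t->max_perm = ret_lsn;
        t->have_perm = true;
    }
}

/* A message may be acknowledged once its LSN is covered by max_perm. */
bool ack_ready(const ack_tracker_t *t, lsn_t msg_lsn)
{
    assert(t != NULL);
    return t->have_perm && lsn_cmp(msg_lsn, t->max_perm) <= 0;
}
```

In a real message loop, the client would call something like `tracker_update` after each processing call and send acknowledgements to the master for every pending message for which `ack_ready` holds; how acknowledgements are transported is left to the application.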
