m4_comment([$Id: app.so,v 10.28 2006/02/28 16:30:53 bostic Exp $])

m4_ref_title(m4_tam Applications, Architecting Transactional Data Store applications,, transapp/fail, transapp/env_open)

m4_p([dnl
When building Transactional Data Store applications, the architecture
decisions involve application startup (running recovery) and handling
system or application failure. For details on performing recovery,
see the m4_link(recovery, [Recovery procedures]).])

m4_p([dnl
Recovery in a database environment is a single-threaded procedure; that
is, one thread of control or process must complete database environment
recovery before any other thread of control or process operates in the
m4_db environment. It may simplify matters that m4_db serializes
recovery and creation of a new database environment.])

m4_p([dnl
Performing recovery first marks any existing database environment as
"failed" and then removes it, causing threads of control running in the
database environment to fail and return to the application. This
feature allows applications to recover environments without concern for
threads of control that might still be running in the removed
environment. The subsequent re-creation of the database environment is
serialized, so multiple threads of control attempting to create a
database environment will serialize behind a single creating thread.])

m4_p([dnl
One consideration in removing (as part of recovering) a database
environment that may be in use by another thread is the type of mutex
being used by the m4_db library. In the case of database environment
failure when using test-and-set mutexes, threads of control waiting on
a mutex when the environment is marked "failed" will quickly notice the
failure and will return an error from the m4_db API.
In the case of
environment failure when using blocking mutexes, where the underlying
system mutex implementation does not unblock mutex waiters after the
thread of control holding the mutex dies, threads waiting on a mutex
when an environment is recovered might hang forever. Applications
blocked on events (for example, an application blocked on a network
socket, or a GUI event) may also fail to notice environment recovery
within a reasonable amount of time. Systems with such mutex
implementations are rare, but do exist; applications on such systems
should use an application architecture where the thread recovering the
database environment can explicitly terminate any process using the
failed environment, or configure m4_db for test-and-set mutexes, or
incorporate some form of long-running timer or watchdog process to wake
or kill blocked processes should they block for too long.])

m4_p([dnl
Regardless, it makes little sense for multiple threads of control to
simultaneously attempt recovery of a database environment, since the
last one to run will remove all database environments created by the
threads of control that ran before it. However, for some applications,
it may make sense to have a single thread of control
that performs recovery and then removes the database environment, after
which the application launches a number of processes, any of which will
create the database environment and continue forward.])

m4_p([dnl
There are three common ways to architect m4_db Transactional Data Store
applications.
The one chosen is usually based on whether the
application consists of a single process or group of processes
descended from a single process (for example, a server started when the
system first boots), or of unrelated
processes (for example, processes started by web connections or users
logged into the system).])

m4_nlistbegin

m4_nlist([dnl
The first way to architect Transactional Data Store applications is as
a single process (the process may or may not be multithreaded).
m4_p([dnl
When this process starts, it runs recovery on the database environment
and then opens its databases. The application can subsequently create
new threads of control as it chooses. Those threads of control can
either share already open m4_db m4_ref(DbEnv) and m4_ref(Db) handles,
or create their own. In this architecture, databases are rarely opened
or closed when more than a single thread of control is running; that is,
they are opened when only a single thread is running, and closed after
all threads but one have exited. The last thread of control to exit
closes the databases and the database environment.])
m4_p([dnl
This architecture is simplest to implement because thread serialization
is easy and failure detection does not require monitoring multiple
processes.])
m4_p([dnl
If the application's thread model allows processes to continue after
thread failure, the m4_refT(dbenv_failchk) can be used to determine if
the database environment is usable after thread failure. If the
application does not call m4_ref(dbenv_failchk), or
m4_ref(dbenv_failchk) returns m4_ref(DB_RUNRECOVERY), the application
must behave as if there has been a system failure, performing recovery
and re-creating the database environment.
Once these actions have been
taken, other threads of control can continue (as long as all existing
m4_db handles are first discarded), or be restarted.])])

m4_nlist([dnl
The second way to architect Transactional Data Store applications is as
a group of related processes (the processes may or may not be
multithreaded).
m4_p([dnl
This architecture requires that the order in which threads of control
are created be controlled to serialize database environment recovery.])
m4_p([dnl
In addition, this architecture requires that threads of control be
monitored. If any thread of control exits with open m4_db handles, the
application may call the m4_refT(dbenv_failchk) to detect lost mutexes
and locks and determine if the application can continue. If the
application does not call m4_ref(dbenv_failchk), or
m4_ref(dbenv_failchk) returns that the database environment can no
longer be used, the application must behave as if there has been a
system failure, performing recovery and creating a new database
environment. Once these actions have been taken, other threads of
control can be continued (as long as all existing m4_db handles are
first discarded), or restarted.])
m4_p([dnl
The easiest way to structure groups of related processes is to first
create a single "watcher" process (often a script) that starts when the
system first boots, runs recovery on the database environment and then
creates the processes or threads that will actually perform work. The
initial thread has no further responsibilities other than to wait on the
threads of control it has started, to ensure that none of them exits
unexpectedly. If a thread of control exits, the watcher process optionally
calls the m4_refT(dbenv_failchk).
If the application does not call
m4_ref(dbenv_failchk) or if m4_ref(dbenv_failchk) returns that the
environment can no longer be used, the watcher kills all of the threads
of control using the failed environment, runs recovery, and starts new
threads of control to perform work.])])

m4_nlist([dnl
The third way to architect Transactional Data Store applications is as
a group of unrelated processes (the processes may or may not be
multithreaded). This is the most difficult architecture to implement
because, on some systems, it is difficult to find and
monitor unrelated processes.
m4_p([dnl
One solution is to log a thread of control ID when a new m4_db handle
is opened. For example, an initial "watcher" process could run recovery
on the database environment and then create a sentinel file. Any
"worker" process wanting to use the environment would check for the
sentinel file. If the sentinel file does not exist, the worker would
fail or wait for the sentinel file to be created. Once the sentinel
file exists, the worker would register its process ID with the watcher
(via shared memory, IPC or some other registry mechanism), and then the
worker would open its m4_ref(DbEnv) handles and proceed. When the
worker finishes using the environment, it would unregister its process
ID with the watcher. The watcher periodically checks to ensure that no
worker has failed while using the environment. If a worker fails while
using the environment, the watcher removes the sentinel file, kills all
of the workers currently using the environment, runs recovery on the
environment, and finally creates a new sentinel file.])
m4_p([dnl
The weakness of this approach is that, on some systems, it is difficult
to determine if an unrelated process is still running. For example,
POSIX systems generally disallow sending signals to unrelated processes.
The trick to monitoring unrelated processes is to find a system resource
held by the process that will be modified if the process dies.
On POSIX
systems, flock- or fcntl-style locking will work, as will LockFile on
Windows systems. Other systems may have to use other process-related
information such as file reference counts or modification times. In the
worst case, threads of control can be required to periodically
re-register with the watcher process: if the watcher has not heard from
a thread of control in a specified period of time, the watcher will take
action, recovering the environment.])
m4_p([dnl
The m4_db library includes one built-in implementation of this approach,
the m4_refT(dbenv_open)'s m4_ref(DB_REGISTER) flag:])
m4_p([dnl
If the m4_ref(DB_REGISTER) flag is set, each process opening the
database environment first checks to see if recovery needs to be
performed. If recovery needs to be performed for any reason (including
the initial creation of the database environment), and
m4_ref(DB_RECOVER) is also specified, recovery will be performed and
then the open will proceed normally. If recovery needs to be performed
and m4_ref(DB_RECOVER) is not specified, m4_ref(DB_RUNRECOVERY) will be
returned. If recovery does not need to be performed, m4_ref(DB_RECOVER)
will be ignored.])
m4_p([dnl
There are two additional requirements for the m4_ref(DB_REGISTER)
architecture to work: First, all applications using the database
environment must specify the m4_ref(DB_REGISTER) flag when opening the
environment. However, there is no additional requirement that the
application choose a single process to recover the environment, as the
first process to open the database environment will know to perform
recovery. Second, there can only be a single m4_ref(DbEnv) handle per
database environment in each process. As the m4_ref(DB_REGISTER)
locking is per-process, not per-thread, multiple m4_ref(DbEnv) handles
in a single environment could race with each other, potentially causing
data corruption.])
m4_p([dnl
A second solution for groups of unrelated processes is also based on a
"watcher process".
This solution is intended for systems where it is
not practical to monitor the processes sharing a database environment,
but it is possible to monitor the environment to detect if a thread of
control has failed holding open m4_db handles. This would be done by
having a "watcher" process periodically call the m4_refT(dbenv_failchk).
If m4_ref(dbenv_failchk) returns that the environment can no longer be
used, the watcher would then take action, recovering the environment.])
m4_p([dnl
The weakness of this approach is that all threads of control using the
environment must specify an "ID" function and an "is-alive" function
using the m4_refT(dbenv_set_thread_id). (In other words, the m4_db
library must be able to assign a unique ID to each thread of control,
and additionally determine if the thread of control is still running.
It can be difficult to portably provide that information in applications
using a variety of different programming languages and running on a
variety of different platforms.)])
m4_p([dnl
The two described approaches are different, and should not be combined.
Applications might use either the m4_ref(DB_REGISTER) approach or the
m4_ref(dbenv_failchk) approach, but not both together in the same
application. For example, a POSIX application written as a library
underneath a wide variety of interfaces and differing APIs might choose
the m4_ref(DB_REGISTER) approach for a few reasons: first, it does not
require making periodic calls to the m4_refT(dbenv_failchk); second,
when implementing in a variety of languages, it may be more difficult
to specify unique IDs for each thread of control; third, it may be more
difficult to determine if a thread of control is still running, as any
particular thread of control is likely to lack sufficient permissions
to signal other processes.
Alternatively, an application with a
dedicated watcher process, running with appropriate permissions, might
choose the m4_ref(dbenv_failchk) approach as supporting higher overall
throughput and reliability, as that approach allows the application to
abort unresolved transactions and continue forward without having to
recover the database environment.])])

m4_nlistend

m4_p([dnl
Obviously, when implementing a process to monitor other threads of
control, it is important that the watcher process's code be as simple and
well-tested as possible, because the application may hang if it fails.])

m4_page_footer