draft-ietf-dhc-failover-07.txt
client that gets a lease from one server while that server is unable to communicate with its failover partner. Then, assume that after that client reboots it is able to communicate only with the other failover server. If the failover servers have not been able to communicate with each other during this process, then the DHCP client will get a new IP address instead of being able to continue to use its existing IP address. This will affect no applications on the DHCP client, since it is rebooting. However, it will use up an additional IP address in this marginal case.

3.1.3. Stable storage update before DHCPACK

   The DHCP protocol allocates resources. To operate correctly, it requires that a DHCP server update some form of stable storage before sending the DHCPACK that grants a DHCP client a lease on an IP address. One of the goals of the failover protocol is that it not add significant time to this already time-consuming requirement. In particular, requiring a server to communicate with its partner before sending a DHCPACK would greatly simplify the failover protocol. However, it would unacceptably limit the potential scalability of any DHCP server which employed the failover protocol.

3.2. BOOTP relay agent implementation

   Many DHCP clients are not resident on the same network segment as a DHCP server. To support this form of network architecture, most contemporary routers implement what is known as a BOOTP Relay Agent. This capability inside a router listens for all broadcasts to the DHCP server port, port 67. It relays any broadcast it receives to a DHCP server whose IP address must have been previously configured into the router. As part of the relay process, the relay agent places the address of the interface on which it received the broadcast into the giaddr field of the DHCP packet.
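   The relay behavior described above can be illustrated with a minimal sketch. It is an assumption-laden model, not a router implementation:

   o  The DhcpPacket class and the relay_broadcast function are hypothetical.

   o  The packet is reduced to the one field the relay touches.

   o  Actual UDP transmission is replaced by returning one (server, port, packet) tuple per configured server.

   Following RFC 1542, the sketch fills in giaddr only when the field is still zero. The server addresses in the usage lines are made up.

```python
from dataclasses import dataclass, replace
from ipaddress import IPv4Address
from typing import List, Tuple

DHCP_SERVER_PORT = 67  # clients broadcast to the DHCP/BOOTP server port
ZERO_ADDRESS = IPv4Address("0.0.0.0")


@dataclass(frozen=True)
class DhcpPacket:
    # Hypothetical, heavily reduced packet: only the fields this sketch needs.
    giaddr: IPv4Address
    payload: bytes


def relay_broadcast(packet: DhcpPacket,
                    receiving_interface: IPv4Address,
                    configured_servers: List[IPv4Address]
                    ) -> List[Tuple[IPv4Address, int, DhcpPacket]]:
    """Return one (server, port, packet) tuple per configured server."""
    # Stamp giaddr with the receiving interface only if no earlier relay
    # has already done so; servers use it to identify the client's subnet.
    if packet.giaddr == ZERO_ADDRESS:
        packet = replace(packet, giaddr=receiving_interface)
    # Forward a copy to every configured server address.
    return [(server, DHCP_SERVER_PORT, packet) for server in configured_servers]


# Usage with hypothetical addresses for a primary and a secondary server.
servers = [IPv4Address("10.0.0.10"), IPv4Address("10.0.1.10")]
out = relay_broadcast(DhcpPacket(ZERO_ADDRESS, b"DHCPDISCOVER"),
                      IPv4Address("192.168.5.1"), servers)
for server, port, copy in out:
    print(server, port, copy.giaddr)
```

   Returning the copies instead of sending them keeps the sketch testable. A real relay agent would transmit each copy as a unicast UDP datagram.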
   The failover protocol requires two DHCP servers to receive any broadcast DHCP message. To work with DHCP clients that are not local to the DHCP servers, the BOOTP relay agent on the router closest to the DHCP client must therefore be configured to point at more than one DHCP server. Most BOOTP relay agent implementations allow this duplication of packets. If this is not possible, an administrator might be able to configure the relay agent with a subnet broadcast address. In that case, however, the primary and secondary DHCP servers in a failover pair must both reside on the same subnet.

Droms, et. al.            Expires January 2001                [Page 11]

Internet Draft            DHCP Failover Protocol              July 2000

3.3. What does it mean if a server can't communicate with its partner?

   In any protocol designed to allow one server to take over some responsibilities from a partner server in the event of "failure" of that partner server, there is an inherent difficulty in determining when that partner server has failed. In fact, it is fundamentally impossible for one server to distinguish a network communications failure from the outright failure of the server with which it is trying to communicate. Each server is handing out resources (in this case IP addresses) to a client community. Mistaking an inability to communicate with a partner server for failure of that partner server could therefore easily cause both servers to hand out the same IP addresses to different clients.

   One way this is sometimes handled is for there to be more than two servers. With an odd number of servers, a server that can still communicate with enough other servers to form a majority of the whole group considers itself operational. Any server which cannot reach such a majority must immediately cease operations. While this technique works in some domains, it can cause the only server with which a DHCP client can communicate to shut itself down voluntarily. That outcome seems worth avoiding.
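   The majority rule described above can be sketched as follows. The function has_quorum is a hypothetical helper that illustrates the general technique; the failover protocol itself deliberately does not use it. The example calls also show why the rule fits a two-server failover pair poorly.

```python
def has_quorum(total_servers: int, reachable_peers: int) -> bool:
    """True if this server plus the peers it can reach form a strict majority."""
    if not 0 <= reachable_peers < total_servers:
        raise ValueError("reachable_peers must be in [0, total_servers)")
    group_size = reachable_peers + 1  # count this server as well
    return 2 * group_size > total_servers


# Three servers partitioned 2 + 1: only the pair keeps running.
print(has_quorum(3, 1))  # True  (2 of 3)
print(has_quorum(3, 0))  # False (1 of 3)

# Two servers that lose contact: neither side holds a majority, so both
# would stop, even if each is the only server some clients can reach.
print(has_quorum(2, 0))  # False (1 of 2)
```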
   The failover protocol will operate correctly while both servers are unable to communicate, whether they are both running or not. At some point there may be resource contention. If one of the servers is actually down, the operator can inform the operational server, which will then be able to use all of the failed server's resources. The protocol also allows detection of an orderly shutdown of a participating server.

3.4. Challenging scenarios for a Failover protocol

   There exist two failure scenarios which provide particular challenges to the correctness guarantees of a failover protocol.

3.4.1. Primary Server crash before "lazy" update:

   Suppose the primary server sends a DHCPACK to a client for a newly allocated IP address. It then crashes before sending the corresponding update to the secondary server. The secondary server will have no record of the IP address allocation, and when it takes over it may well try to allocate that IP address to a different client. The first client's lease may still be valid while that client is not on the network. In that case an ICMP echo (i.e., ping) will not prevent the secondary server from allocating the IP address to a different client, since the absent client cannot answer it.

   The failover protocol deals with this situation by having the primary and secondary servers allocate addresses for new clients from disjoint address pools. See section 5.4 for details.

   A more subtle version of this problem is also more likely, since lease renewals (which DHCP clients send as DHCPREQUEST messages) are presumably more common than DHCPDISCOVERs. In this version, the primary server crashes after extending a client's lease time but before updating the secondary with the new time using a lazy update. After the secondary takes over, if the client is not connected to the network, the secondary will believe the client's lease has expired when, in fact, it has not.
   In this case as well, the IP address might be reallocated to a different client while the first client is still using it. This scenario is handled by the failover protocol through control of the lease time and the use of the maximum client lead time (MCLT). See section 5.2.1 for details.

3.4.2. Network partition where DHCP servers can't communicate but each can talk to clients:

   Several conditions are required for this situation to occur. First, due to a network failure, the primary and secondary servers cannot communicate. Second, some of the DHCP clients must be able to communicate only with the primary server, and others only with the secondary server.

   When this condition occurs, both primary and secondary servers could attempt to allocate IP addresses for new clients from the same pool of available addresses. At some point, then, two clients will end up being allocated the same IP address. This will cause problems when the network failure that created this situation is corrected.

   The failover protocol deals with this situation by having the primary and secondary servers allocate addresses for new clients from disjoint address pools. See section 5.4 for details.

3.5. Using TCP to detect partner server failure

   The failover protocol uses one TCP connection both for bulk data transfer and to assess communications integrity with the other server. Several characteristics of TCP are important to this use, chief among them reliable and ordered message delivery. It would be convenient to rely on the capabilities built into TCP to determine whether communications integrity exists with the failover partner. However, this strategy has some problems which require analysis. There are three fundamental cases for an open TCP connection that must be examined:

   1. When no data is being sent, no messages are traveling across the TCP connection.

   2. When data is queued to be sent, and the receiver has not blocked the sending of additional data, messages containing the application's data are flowing across the TCP connection.

   3. When data is queued to be sent, and the receiver has blocked the transmission of additional data by advertising a zero window, the sender periodically transmits window probes (persist messages) to the receiver. These probes ensure that the sender does not miss the receiver reopening the window for further transmissions.

   The first case can be turned into the second by sending application-level keep-alive messages periodically when there is no other data queued to be sent. Note that TCP keep-alive messages might be used as well, but they present additional problems. Thus, it is fairly easy to ensure that messages flow periodically across the TCP connection.

   The question remains as to what TCP will do if the other end of the connection fails to respond, either because of a network partition or because the receiving server crashes. TCP will retransmit a message with exponential backoff and will eventually time out that retransmission. However, the length of that timeout cannot, in general, be set on a per-connection basis. It is frequently as long as nine minutes, though in some cases it may be as short as two minutes. On some systems it can be set system-wide, while on other systems it cannot be changed at all. A value appropriate for the failover protocol, say less than 1 minute, could have unpleasant side-effects on other applications running on the same server, assuming that it could be changed at all on the host operating system.
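   The keep-alive idea above can be combined with a timer owned by the application rather than by TCP. The following is a minimal sketch of that approach, not the mechanism specified in section 8.3:

   o  The PartnerMonitor class, its methods, and the default interval and timeout values are hypothetical.

   o  The clock is injectable so the logic can be exercised without waiting.

   o  As section 3.3 explains, a timeout can only show that the partner is unreachable, not that it has crashed.

```python
import time
from typing import Callable


class PartnerMonitor:
    """Track partner reachability from application-level messages.

    Hypothetical sketch: interval and timeout values are illustrative only.
    """

    def __init__(self, keepalive_interval: float = 10.0,
                 partner_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.keepalive_interval = keepalive_interval
        self.partner_timeout = partner_timeout
        self._clock = clock
        now = clock()
        self._last_received = now
        self._last_sent = now

    def on_message_received(self) -> None:
        # Any message from the partner, data or keep-alive, shows it is reachable.
        self._last_received = self._clock()

    def on_message_sent(self) -> None:
        self._last_sent = self._clock()

    def keepalive_due(self) -> bool:
        # Turn the idle case (1) into the data-flowing case (2) by sending a
        # keep-alive whenever nothing has been sent for a full interval.
        return self._clock() - self._last_sent >= self.keepalive_interval

    def partner_unreachable(self) -> bool:
        # Decide on our own timer instead of waiting for TCP's retransmission
        # timeout, which may be minutes long and is often not tunable.
        return self._clock() - self._last_received > self.partner_timeout


class FakeClock:
    """Manually advanced clock, used here only to demonstrate the logic."""

    def __init__(self) -> None:
        self.t = 0.0

    def __call__(self) -> float:
        return self.t


clock = FakeClock()
mon = PartnerMonitor(clock=clock)
clock.t = 10.0
print(mon.keepalive_due())        # True: nothing sent for 10 s
clock.t = 25.0
mon.on_message_received()
clock.t = 50.0
print(mon.partner_unreachable())  # False: 25 s since the last message
clock.t = 56.0
print(mon.partner_unreachable())  # True: 31 s exceeds the 30 s timeout
```

   The timeout here is a per-connection application setting. Shortening it therefore has none of the system-wide side-effects that tuning TCP's retransmission timeout would have.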
   Nine minutes is a long time for the DHCP service to be unavailable to new clients that were being served by a crashed server. This is especially so when another running server could respond to them as soon as it determines that its partner is not operational.

   The conclusion drawn from this analysis is that TCP provides very useful support for the failover protocol in the areas of reliable and ordered message delivery. However, TCP cannot by itself be relied upon to detect partner server failure in a fashion acceptable to the needs of the failover protocol. Additional failover protocol capabilities have been created to support timely detection of partner server failure. See section 8.3 for details on this mechanism.

4. Design Goals

   This section lists the design goals and the limitations of the failover protocol.

4.1. Design goals for this protocol

   The following is a list of goals that are met by this protocol. They are listed in priority order.

   1. Implementations of this protocol must work with existing DHCP client implementations based on the DHCP protocol [1].

   2. Implementations of the protocol must work with existing BOOTP relay agent implementations.

   3. The protocol must provide failover redundancy between servers that are not located on the same subnet.

   4. Provide for continued service to DHCP clients through an automated mechanism in the event of failure of the primary server.

   5. Avoid binding an IP address to a client while that binding is currently valid for another client. In other words, do not allocate the same IP address to two clients.

   6. Minimize any need for manual administrative intervention.

   7. Introduce no additional delays in server response time as a result of the network communications required to implement the failover protocol. That is, don't require communications with the partner between the receipt of a DHCPREQUEST and the corresponding DHCPACK.

   8. Share IP address ranges between primary and secondary servers; i.e., impose no requirement that the pool of available