Network Working Group                                          A. Kumar
Request for Comments: 1536                                     J. Postel
Category: Informational                                        C. Neuman
                                                                     ISI
                                                               P. Danzig
                                                               S. Miller
                                                                     USC
                                                            October 1993
Common DNS Implementation Errors and Suggested Fixes
Status of this Memo
This memo provides information for the Internet community. It does
not specify an Internet standard. Distribution of this memo is
unlimited.
Abstract
This memo describes common errors seen in DNS implementations and
suggests some fixes. Where applicable, violations of recommendations
from STD 13, RFC 1034 and STD 13, RFC 1035 are mentioned. The memo
also describes, where relevant, the algorithms followed in BIND
(versions 4.8.3 and 4.9 which the authors referred to) to serve as an
example.
Introduction
The last few years have seen a virtual explosion of DNS traffic on
the NSFnet backbone. Various DNS implementations and various
versions of these implementations interact with each other, producing
huge amounts of unnecessary traffic. Researchers all over the
internet are attempting to document the nature of these interactions
and the symptomatic traffic patterns, and to devise remedies for the
sick pieces of software.
This memo is an attempt to document fixes for known DNS problems so
people know what problems to watch out for and how to repair broken
software.
1. Fast Retransmissions
DNS implements the classic request-response scheme of client-server
interaction. UDP is, therefore, the chosen protocol for communication
though TCP is used for zone transfers. The onus of requerying, in
case no response is seen in a "reasonable" period of time, lies with
the client. Although RFC 1034 and RFC 1035 do not recommend any
retransmission policy, RFC 1035 does recommend that resolvers cycle
through a list of servers. Both name servers and stub resolvers
should, therefore, implement some kind of retransmission policy based
on round-trip time estimates for the name servers. The client should
back off exponentially, probably up to a maximum timeout value.
However, clients might not implement either of these. They might not
wait a sufficient amount of time before retransmitting, or they might
not back off their inter-query times sufficiently.
Thus, what the server sees is a series of queries from the same
querying entity, spaced very close together. Of course, a correctly
implemented server discards all duplicate queries, but the queries
nevertheless contribute to wide-area traffic.
We classify a retransmission of a query as a pure Fast retry timeout
problem when a series of query packets meets the following conditions
(a short classification sketch follows the list).
a. Query packets are seen within a time less than a "reasonable
waiting period" of each other.
b. No response to the original query was seen, i.e., we see two or
more queries, back to back.
c. The query packets share the same query identifier.
d. The server eventually responds to the query.
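For concreteness, these conditions can be checked mechanically
against a packet trace. The following Python sketch is our own
illustration, not part of any DNS implementation; the trace format,
the field names and the 2-second "reasonable waiting period" are all
assumptions made for the example.

REASONABLE_WAIT = 2.0   # assumed "reasonable waiting period", seconds

def fast_retries(trace):
    """Return the times of queries classified as pure fast-retry
    retransmissions: same query id, spaced less than REASONABLE_WAIT
    apart, no response seen in between, and eventually answered."""
    flagged = []
    answered = {qid for (_, kind, qid) in trace if kind == "response"}
    last_query_time = {}           # query id -> time of previous query
    for t, kind, qid in sorted(trace):
        if kind == "response":
            last_query_time.pop(qid, None)   # a reply breaks the run
            continue
        prev = last_query_time.get(qid)
        if (prev is not None and t - prev < REASONABLE_WAIT
                and qid in answered):
            flagged.append(t)
        last_query_time[qid] = t
    return flagged

# Example: three queries with id 7 sent 0.5 s apart, answered at t=2.0.
trace = [(0.0, "query", 7), (0.5, "query", 7), (1.0, "query", 7),
         (2.0, "response", 7)]
print(fast_retries(trace))         # -> [0.5, 1.0]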
A GOOD IMPLEMENTATION:
BIND (we looked at versions 4.8.3 and 4.9) implements a good
retransmission algorithm which solves or limits all of these
problems. The Berkeley stub-resolver queries servers at an interval
that starts at the greater of 4 seconds and 5 seconds divided by the
number of servers the resolver queries. The resolver cycles through
the servers and, at the end of each cycle, backs off the time-out
exponentially.
The Berkeley full-service resolver (built into the program
"named") starts with a time-out equal to the greater of 4 seconds and
two times the round-trip time estimate of the server. The time-out
is backed off with each cycle, exponentially, to a ceiling value of
45 seconds.
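The schedule described above is easy to reproduce. The sketch below
is our own illustration in Python, not BIND source code; the function
names and the number of cycles shown are assumptions made for the
example.

def stub_schedule(num_servers, cycles=4):
    """Per-try timeouts for the stub resolver: start at the greater of
    4 seconds and 5 seconds divided by the number of servers, cycle
    through the servers, and back off exponentially after each cycle."""
    timeout = max(4.0, 5.0 / num_servers)
    plan = []
    for _ in range(cycles):
        for server in range(num_servers):
            plan.append((server, timeout))
        timeout *= 2                 # exponential back-off per cycle
    return plan

def named_schedule(rtt_estimate, cycles=5, ceiling=45.0):
    """Per-cycle timeouts for the full-service resolver: start at the
    greater of 4 seconds and twice the round-trip time estimate, and
    back off exponentially to the 45-second ceiling."""
    timeout = max(4.0, 2.0 * rtt_estimate)
    plan = []
    for _ in range(cycles):
        plan.append(min(timeout, ceiling))
        timeout *= 2
    return plan

print(stub_schedule(num_servers=2, cycles=2))
# -> [(0, 4.0), (1, 4.0), (0, 8.0), (1, 8.0)]
print(named_schedule(rtt_estimate=0.3))
# -> [4.0, 8.0, 16.0, 32.0, 45.0]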
FIXES:
a. Estimate round-trip times or set a reasonably high initial
time-out.
b. Back-off timeout periods exponentially.
c. Yet another fundamental though difficult fix is to send the
client an acknowledgement of a query, with a round-trip time
estimate.
Since UDP is used, no response is expected by the client until the
query is complete. Thus, the client is less likely to have
information about previous packets on which to base its back-off
time, unless it maintains state across queries so that subsequent
queries to the same server can use information from previous ones.
Unfortunately, such estimates are likely to be inaccurate for chained
requests, since the variance is likely to be high.
The fix chosen in the ARDP library used by Prospero is that the
server will send an initial acknowledgement to the client in those
cases where the server expects the query to take a long time (as
might be the case for chained queries). This initial acknowledgement
can include an expected time to wait before retrying.
This fix is more difficult since it requires that the client software
also be trained to expect the acknowledgement packet. This, in an
internet of millions of hosts, is at best a hard problem.
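In outline, such an acknowledgement scheme might look like the sketch
below. The message layout, the field names and the five-second hint
are assumptions made for illustration; they are not the ARDP wire
format. The point is simply that a server expecting a slow, chained
lookup answers the datagram immediately with "received, retry no
sooner than N seconds", and a cooperating client uses that hint in
place of its default retransmission timer.

EXPECTED_SLOW = 5.0   # assumed wait hint, in seconds, for chained queries

def server_handle(query, will_chain):
    """If the lookup is expected to take long (for instance, it must
    be chained to other servers), first hand back an acknowledgement
    that carries a wait hint; the real answer follows later."""
    messages = []
    if will_chain:
        messages.append({"type": "ack", "id": query["id"],
                         "retry_after": EXPECTED_SLOW})
    # ... the chained lookup would run here ...
    messages.append({"type": "answer", "id": query["id"],
                     "data": "<records>"})
    return messages

def client_timeout(messages, default=4.0):
    """A cooperating client stretches its retransmission timer when it
    sees the acknowledgement, instead of retrying at its usual interval."""
    for m in messages:
        if m["type"] == "ack":
            return max(default, m["retry_after"])
    return default

replies = server_handle({"id": 17, "name": "example.com."}, will_chain=True)
print(client_timeout(replies))      # -> 5.0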
2. Recursion Bugs
When a server receives a client request, it first looks up its zone
data and the cache to check if the query can be answered. If the
answer is unavailable in either place, the server seeks names of
servers that are more likely to have the information, in its cache or
zone data. It then does one of two things. If the client desires the
server to recurse and the server architecture allows recursion, the
server chains this request to these known servers closest to the
queried name. If the client doesn't seek recursion or if the server
cannot handle recursion, it returns the list of name servers to the
client assuming the client knows what to do with these records.
The client queries this new list of name servers to get either the
answer, or names of another set of name servers to query. This
process repeats until the client is satisfied. A server might also
go through this chaining process if the lookup returns a CNAME record
for the queried name. Some servers reprocess the canonical name to
try to obtain the desired record type.
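The decision just described fits in a few lines. The sketch below is
a simplification with made-up record structures, not any server's
actual code; it only shows the three outcomes: answer from
authoritative data or the cache, chain the query when recursion is
both desired and available, or hand back the closest known NS records
and let the client continue on its own.

def handle_query(name, rd, zone, cache, closest_ns, recursion_available,
                 forward):
    """Return (rcode, records).  `rd` is the query's recursion-desired
    flag; `forward` stands in for chaining the query to other servers."""
    if name in zone:                    # authoritative data
        return "NOERROR", zone[name]
    if name in cache:                   # previously learned answer
        return "NOERROR", cache[name]
    if rd and recursion_available:      # recurse on the client's behalf
        return forward(name, closest_ns(name))
    # Otherwise: a referral, i.e., the NS records closest to the name.
    return "NOERROR", closest_ns(name)

# Tiny illustration with stand-in data:
zone = {"a.example.": ["A 192.0.2.1"]}
ns_for = lambda name: ["NS ns1.example.", "NS ns2.example."]
chain = lambda name, servers: ("NOERROR", ["A 192.0.2.7"])
print(handle_query("a.example.", False, zone, {}, ns_for, True, chain))
# -> ('NOERROR', ['A 192.0.2.1'])                     the answer itself
print(handle_query("b.example.", False, zone, {}, ns_for, True, chain))
# -> ('NOERROR', ['NS ns1.example.', 'NS ns2.example.'])   a referral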
However, in certain cases, this chain of events may not be good. For
example, a broken or malicious name server might list itself as one
of the name servers to query again. The unsuspecting client resends
the same query to the same server.
In another situation, more difficult to detect, a set of servers
might form a loop wherein A refers to B and B refers to A. This loop
might involve more than two servers.
Yet another error is where the client does not know how to process
the list of name servers returned, and requeries the same server
since that is one (of the few) servers it knows.
We, therefore, classify recursion bugs into three distinct
categories:
a. Ignored referral: Client did not know how to handle NS records
in the AUTHORITY section.
b. Too many referrals: Client called on a server too many times,
beyond a "reasonable" number, with the same query. This is
different from a Fast retransmission problem and a Server
Failure detection problem in that a response is seen for every
query. Also, the identifiers are always different. It implies
the client is in a loop and should have detected that and broken
it. (RFC 1035 mentions that a client should not recurse beyond
a certain depth.)
c. Malicious Server: a server refers to itself in the authority
section. If a server does not have an answer now, it is very
unlikely it will be any better the next time you query it,
especially when it claims to be authoritative over a domain.
RFC 1034 warns against such situations, on page 35.
"Bound the amount of work (packets sent, parallel processes
started) so that a request can't get into an infinite loop or
start off a chain reaction of requests or queries with other
implementations EVEN IF SOMEONE HAS INCORRECTLY CONFIGURED
SOME DATA."
A GOOD IMPLEMENTATION:
BIND fixes at least one of these problems. It places an upper limit
on the number of recursive queries it will make, to answer a
question. It chases a maximum of 20 referral links and 8 canonical
name translations.
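A client-side loop with these limits might look like the sketch
below. The 20-referral and 8-CNAME ceilings are the BIND values
noted above; everything else (the query() callback, the shape of the
result tuples) is an assumption made for illustration. The loop also
refuses a referral that points back at the server that produced it,
the "malicious server" case described earlier.

MAX_REFERRALS = 20      # referral-link ceiling noted above
MAX_CNAMES = 8          # CNAME-link ceiling noted above

def resolve(name, server, query):
    """query(name, server) must return one of:
         ("answer", records)
         ("cname", canonical_name)
         ("referral", next_server)
    The loop gives up rather than chase links forever."""
    referrals = cnames = 0
    while True:
        kind, payload = query(name, server)
        if kind == "answer":
            return payload
        if kind == "cname":
            cnames += 1
            if cnames > MAX_CNAMES:
                raise RuntimeError("too many CNAME links")
            name = payload               # continue with the new name
        elif kind == "referral":
            referrals += 1
            if referrals > MAX_REFERRALS:
                raise RuntimeError("too many referral links")
            if payload == server:        # self-referring server
                raise RuntimeError("server referred to itself")
            server = payload
        else:
            raise RuntimeError("unexpected response")

# Example: one referral, one CNAME, then an answer.
script = iter([("referral", "ns2"), ("cname", "real.example."),
               ("answer", ["A 192.0.2.9"])])
print(resolve("alias.example.", "ns1", lambda n, s: next(script)))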
FIXES:
a. Set an upper limit on the number of referral links and CNAME
links you are willing to chase.
Note that this is not guaranteed to break only recursion loops.
It could, in a rare case, prune off a very long search path,
prematurely. We know, however, with high probability, that if
the number of links crosses a certain threshold (two times the
depth of the DNS tree), it is a recursion problem.
b. Watch out for self-referring servers. Avoid them whenever
possible.
c. Make sure you never pass off an authority NS record with your
own name on it!
d. Fix clients to accept iterative answers from servers not built
to provide recursion. Such clients should either be happy with
the non-authoritative answer or be willing to chase the
referral links themselves.
3. Zero Answer Bugs:
Name servers sometimes return an authoritative NOERROR with no
ANSWER, AUTHORITY or ADDITIONAL records. This happens when the
queried name is valid but it does not have a record of the desired
type. Of course, the server has authority over the domain.
However, once again, some implementations of resolvers do not
interpret this kind of a response reasonably. They always expect an
answer record when they see an authoritative NOERROR. These entities
continue to resend their queries, possibly endlessly.
A GOOD IMPLEMENTATION
The BIND resolver code does not query a server more than 3 times. If
it is unable to get an answer from 4 servers, querying them three
times each, it returns an error.
Of course, it treats a zero-answer response the way it should be
treated: with respect!
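A resolver that follows this advice treats an authoritative NOERROR
with zero answer records as a definitive "no data of that type" and
bounds its retries the way the BIND resolver does: no more than 3
tries per server, and an error once all of (up to) 4 servers have
failed. The sketch below illustrates that logic; the response
dictionary and the send() callback are assumed shapes, not a real
resolver API.

TRIES_PER_SERVER = 3    # the BIND limits noted above
MAX_SERVERS = 4

def lookup(name, rrtype, servers, send):
    """send(name, rrtype, server) returns None on timeout, or a dict
    like {"rcode": "NOERROR", "aa": True, "answers": [...]}."""
    for server in servers[:MAX_SERVERS]:
        for _ in range(TRIES_PER_SERVER):
            reply = send(name, rrtype, server)
            if reply is None:
                continue               # timeout: retry, then next server
            if (reply["rcode"] == "NOERROR" and reply["aa"]
                    and not reply["answers"]):
                return []   # authoritative "name exists, no such type"
            return reply["answers"]
    raise RuntimeError("no server answered")   # give up, do not loop

# An authoritative NOERROR with zero answers is a final answer:
send = lambda n, t, s: {"rcode": "NOERROR", "aa": True, "answers": []}
print(lookup("example.com.", "MX", ["ns1", "ns2"], send))    # -> []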
FIXES:
a. Set an upper limit on the number of retransmissions for a given
query, at the very least.
b. Fix resolvers to interpret such a response as an authoritative
statement of non-existence of the record type for the given
name.
4. Inability to detect server failure:
Servers in the internet are not very reliable (they go down every
once in a while) and resolvers are expected to adapt to the changed
scenario by not querying the server for a while. Thus, when a server
does not respond to a query, resolvers should try another server.
Also, non-stub resolvers should update their round trip time estimate
for the server to a large value so that the server is not tried again
before other, faster servers.
Stub resolvers, however, cycle through a fixed set of servers and if,
unfortunately, a server is down while others do not respond for other
reasons (high load, recursive resolution of query is taking more time
than the resolver's time-out, ....), the resolver queries the dead
server again! In fact, some resolvers might not set an upper limit on
the number of query retransmissions they will send and continue to
query dead servers indefinitely.
Name servers running system or chained queries might also suffer from
the same problem. They store the names of servers they should query
for a given domain. They cycle through these names and, in case none
of them answers, hit each one more than once. It is, once again,
important that there be an upper limit on the number of
retransmissions, to prevent network overload.
This behavior is clearly in violation of the dictum in RFC 1035 (page
46)
"If a resolver gets a server error or other bizarre response
from a name server, it should remove it from SLIST, and may
wish to schedule an immediate transmission to the next
candidate server address."
Removal from SLIST implies that the server is not queried again for
some time.
Correctly implemented full-service resolvers should, as pointed out
before, update round trip time values for servers that do not respond
and query them only after other, good servers. Full-service resolvers
might, however, not follow any of these common sense directives. They
query dead servers, and they query them endlessly.