The table changes are not written back until the `iptc_commit()'
function is called. This means it is possible for two library users
operating on the same chain to race each other; locking would be
required to prevent this, and it is not currently done.
There is no race with counters, however; counters are added back in to
the kernel in such a way that counter increments between the reading
and writing of the table still show up in the new table.
There are various helper functions:

iptc_first_chain()
    This function returns the first chain name in the table.

iptc_next_chain()
    This function returns the next chain name in the table: NULL
    means no more chains.

iptc_builtin()
    Returns true if the given chain name is the name of a builtin
    chain.

iptc_first_rule()
    This returns a pointer to the first rule in the given chain
    name: NULL for an empty chain.

iptc_next_rule()
    This returns a pointer to the next rule in the chain: NULL means
    the end of the chain.

iptc_get_target()
    This gets the target of the given rule. If it's an extended
    target, the name of that target is returned. If it's a jump to
    another chain, the name of that chain is returned. If it's a
    verdict (eg. DROP), that name is returned. If it has no target
    (an accounting-style rule), then the empty string is returned.
    Note that this function should be used instead of using the
    value of the `verdict' field of the ipt_entry structure
    directly, as it offers the above further interpretations of the
    standard verdict.

iptc_get_policy()
    This gets the policy of a builtin chain, and fills in the
    `counters' argument with the hit statistics on that policy.

iptc_strerror()
    This function returns a more meaningful explanation of a failure
    code in the iptc library. If a function fails, it will always
    set errno: this value can be passed to iptc_strerror() to yield
    an error message.

4.3. Understanding NAT
Welcome to Network Address Translation in the kernel. Note that the
infrastructure offered is designed more for completeness than raw
efficiency, and that future tweaks may increase the efficiency
markedly. For the moment I'm happy that it works at all.
NAT is separated into connection tracking (which doesn't manipulate
packets at all), and the NAT code itself. Connection tracking is also
designed to be used by an iptables module, so it makes subtle
distinctions in states which NAT doesn't care about.

4.3.1. Connection Tracking
Connection tracking hooks into high-priority NF_IP_LOCAL_OUT and
NF_IP_PRE_ROUTING hooks, in order to see packets before they enter the
system.
The nfct field in the skb points inside the struct ip_conntrack, at
one element of its infos[] array. Hence we can tell the state of the
skb by which element of this array it points to: the pointer encodes
both the state structure and the relationship of this skb to that
state.
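
The pointer trick described above can be modelled in ordinary C. The
following is a minimal user-space sketch, not the kernel's
definitions: the names `model_conntrack', `model_ct_info' and
`model_conntrack_lookup()' are hypothetical, and the array size of 5
is an assumption made only for illustration.

```c
#include <stddef.h>

#define MODEL_NUM_INFOS 5   /* assumed size, for illustration only */

struct model_conntrack;

/* One slot of the per-connection array; each slot points back to its owner. */
struct model_ct_info {
    struct model_conntrack *master;
};

/* Stand-in for the connection structure that owns the array. */
struct model_conntrack {
    struct model_ct_info infos[MODEL_NUM_INFOS];
};

/* Make every slot point back at the connection that contains it. */
void model_conntrack_init(struct model_conntrack *ct)
{
    for (size_t i = 0; i < MODEL_NUM_INFOS; i++)
        ct->infos[i].master = ct;
}

/*
 * Given a packet's pointer into some connection's infos[] array, recover
 * both the connection and the index of the slot it points at.
 * Returns NULL (and leaves *index alone) if the pointer is unset.
 */
struct model_conntrack *
model_conntrack_lookup(const struct model_ct_info *nfct, size_t *index)
{
    struct model_conntrack *ct;

    if (nfct == NULL)
        return NULL;
    ct = nfct->master;
    *index = (size_t)(nfct - ct->infos);
    return ct;
}
```

After model_conntrack_init(&ct), passing &ct.infos[3] to
model_conntrack_lookup() yields &ct and an index of 3: one pointer
carries both pieces of information.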
The best way to extract the `nfct' field is to call
`ip_conntrack_get()', which returns NULL if it's not set, or the
connection pointer, and fills in ctinfo which describes the
relationship of the packet to that connection. This enumerated type
has several values:

IP_CT_ESTABLISHED
    The packet is part of an established connection, in the original
    direction.

IP_CT_RELATED
    The packet is related to the connection, and is passing in the
    original direction.

IP_CT_NEW
    The packet is trying to create a new connection (obviously, it
    is in the original direction).

IP_CT_ESTABLISHED + IP_CT_IS_REPLY
    The packet is part of an established connection, in the reply
    direction.

IP_CT_RELATED + IP_CT_IS_REPLY
    The packet is related to the connection, and is passing in the
    reply direction.

Hence a reply packet can be identified by testing for >=
IP_CT_IS_REPLY.
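
The reply test can be sketched as follows. This is a local model, not
a copy of the kernel header: the numeric values are an assumption
that simply follows the order of the list above, and the helpers
`ctinfo_is_reply()' and `ctinfo_state()' are hypothetical names.

```c
/*
 * Local model of the ctinfo values. The numbering is ASSUMED from the
 * order in which the states are listed; consult the kernel header for
 * the authoritative definition.
 */
enum {
    IP_CT_ESTABLISHED,  /* 0 */
    IP_CT_RELATED,      /* 1 */
    IP_CT_NEW,          /* 2 */
    IP_CT_IS_REPLY      /* 3: added to a state for the reply direction */
};

/* Nonzero if ctinfo describes a packet travelling in the reply direction. */
int ctinfo_is_reply(int ctinfo)
{
    return ctinfo >= IP_CT_IS_REPLY;
}

/* Strip the direction, leaving only the state. */
int ctinfo_state(int ctinfo)
{
    return ctinfo_is_reply(ctinfo) ? ctinfo - IP_CT_IS_REPLY : ctinfo;
}
```

Because IP_CT_IS_REPLY is larger than every original-direction value,
a single comparison separates the two directions, and subtracting it
recovers the underlying state.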
4.4. Extending Connection Tracking/NAT
These frameworks are designed to accommodate any number of protocols
and different mapping types. Some of these mapping types might be
quite specific, such as a load-balancing/fail-over mapping type.
Internally, connection tracking converts a packet to a "tuple",
representing the interesting parts of the packet, before searching for
bindings or rules which match it. This tuple has a manipulatable
part, and a non-manipulatable part; called "src" and "dst", as this is
the view for the first packet in the Source NAT world (it'd be a reply
packet in the Destination NAT world). The tuple for every packet in
the same packet stream in that direction is the same.
For example, a TCP packet's tuple contains the manipulatable part
(source IP and source port) and the non-manipulatable part
(destination IP and destination port). The manipulatable and
non-manipulatable parts do not need to be the same type, though; for
example, an ICMP packet's tuple contains the manipulatable part
(source IP and the ICMP id) and the non-manipulatable part
(destination IP and the ICMP type and code).
Every tuple has an inverse, which is the tuple of the reply packets in
the stream. For example, the inverse of an ICMP ping packet, icmp id
12345, from 192.168.1.1 to 1.2.3.4, is a ping-reply packet, icmp id
12345, from 1.2.3.4 to 192.168.1.1.
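
Tuple inversion can be sketched with a flattened model. The struct
below is hypothetical and much simpler than the kernel's `struct
ip_conntrack_tuple'; addresses are kept in host byte order for
readability, and only TCP and ICMP echo/echo-reply are handled. All
`model_' and `MODEL_' names are assumptions made for this example.

```c
#include <stdint.h>

#define MODEL_PROTO_ICMP       1
#define MODEL_PROTO_TCP        6
#define MODEL_ICMP_ECHO_REPLY  0
#define MODEL_ICMP_ECHO        8

/* Hypothetical flattened tuple, not the kernel layout. */
struct model_tuple {
    uint32_t src_ip;     /* manipulatable: source address ...        */
    uint16_t src_id;     /* ... plus TCP source port or ICMP id      */
    uint32_t dst_ip;     /* non-manipulatable: destination address   */
    uint16_t dst_port;   /* ... TCP destination port (TCP only)      */
    uint8_t  icmp_type;  /* ... ICMP type (ICMP only)                */
    uint8_t  protonum;   /* MODEL_PROTO_TCP or MODEL_PROTO_ICMP      */
};

/*
 * Write into *inv the tuple a reply to *orig would carry.
 * Returns 1 on success, 0 if this sketch does not know the inverse.
 */
int model_invert_tuple(struct model_tuple *inv, const struct model_tuple *orig)
{
    *inv = *orig;
    inv->src_ip = orig->dst_ip;
    inv->dst_ip = orig->src_ip;

    switch (orig->protonum) {
    case MODEL_PROTO_TCP:
        /* Ports swap along with the addresses. */
        inv->src_id = orig->dst_port;
        inv->dst_port = orig->src_id;
        return 1;
    case MODEL_PROTO_ICMP:
        /* The id is unchanged; only the type flips. */
        if (orig->icmp_type == MODEL_ICMP_ECHO)
            inv->icmp_type = MODEL_ICMP_ECHO_REPLY;
        else if (orig->icmp_type == MODEL_ICMP_ECHO_REPLY)
            inv->icmp_type = MODEL_ICMP_ECHO;
        else
            return 0;
        return 1;
    default:
        return 0;
    }
}
```

Applied to the ping example above (id 12345, 192.168.1.1 to 1.2.3.4),
the function swaps the two addresses, keeps the id, and turns the
echo type into echo-reply.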
These tuples, represented by the `struct ip_conntrack_tuple', are used
widely. In fact, together with the hook the packet came in on (which
has an effect on the type of manipulation expected), and the device
involved, this is the complete information on the packet.
Most tuples are contained within a `struct ip_conntrack_tuple_hash',
which adds a doubly linked list entry, and a pointer to the connection
that the tuple belongs to.
A connection is represented by the `struct ip_conntrack': it has two
`struct ip_conntrack_tuple_hash' fields: one referring to the
direction of the original packet (tuplehash[IP_CT_DIR_ORIGINAL]), and
one referring to packets in the reply direction
(tuplehash[IP_CT_DIR_REPLY]).
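
The relationship between a connection and its two hashed tuples can
be sketched as below. This is a simplified model under stated
assumptions: the `flow_' names are hypothetical, the tuple is reduced
to addresses and ports, and recovering the direction from the array
position is a choice made for this sketch (the kernel may record the
direction differently).

```c
#include <stddef.h>
#include <stdint.h>

enum flow_dir { FLOW_DIR_ORIGINAL = 0, FLOW_DIR_REPLY = 1, FLOW_DIR_MAX = 2 };

/* Hypothetical flattened tuple: addresses and ports only. */
struct flow_tuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
};

struct flow_conn;

/* A tuple plus list linkage and a pointer back to its connection. */
struct flow_tuple_hash {
    struct flow_tuple_hash *prev, *next;
    struct flow_tuple tuple;
    struct flow_conn *ctrack;
};

/* A connection holds one hashed tuple per direction. */
struct flow_conn {
    struct flow_tuple_hash tuplehash[FLOW_DIR_MAX];
};

/* Fill in both directions from the tuple of the first packet. */
void flow_conn_init(struct flow_conn *ct, const struct flow_tuple *orig)
{
    struct flow_tuple *reply = &ct->tuplehash[FLOW_DIR_REPLY].tuple;

    for (int d = 0; d < FLOW_DIR_MAX; d++) {
        ct->tuplehash[d].prev = NULL;
        ct->tuplehash[d].next = NULL;
        ct->tuplehash[d].ctrack = ct;
    }
    ct->tuplehash[FLOW_DIR_ORIGINAL].tuple = *orig;
    reply->src_ip = orig->dst_ip;
    reply->dst_ip = orig->src_ip;
    reply->src_port = orig->dst_port;
    reply->dst_port = orig->src_port;
}

/* Recover the direction of an entry from its position in tuplehash[]. */
enum flow_dir flow_direction(const struct flow_tuple_hash *h)
{
    return (enum flow_dir)(h - h->ctrack->tuplehash);
}
```

A lookup that finds either hashed tuple can follow its back pointer
to reach the connection, and from there the tuple of the opposite
direction.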
The first thing the NAT code does is to see if the connection
tracking code managed to extract a tuple and find an existing
connection, by looking at the skbuff's nfct field. This tells us
whether the packet is an attempt at a new connection or, if not,
which direction it is travelling in; in the latter case, the
manipulations determined previously for that connection are applied.

If it is the start of a new connection, we look for a rule for that
tuple, using the standard iptables traversal mechanism, on the `nat'
table. If a rule matches, it is used to initialize the manipulations
for both that direction and the reply; the connection-tracking code is
told that the reply it should expect has changed. The packet is then
manipulated as above.
If there is no rule, a `null' binding is created: this usually does
not map the packet, but exists to ensure we don't map another stream
over an existing one. Sometimes, the null binding cannot be created,
because we have already mapped an existing stream over it, in which
case the per-protocol manipulation may try to remap it, even though
it's nominally a `null' binding.

4.4.1. Standard NAT Targets
NAT targets are like any other iptables target extensions, except they
insist on being used only in the `nat' table. Both the SNAT and DNAT
targets take a `struct ip_nat_multi_range' as their extra data; this
is used to specify the range of addresses a mapping is allowed to bind
into. A range element, `struct ip_nat_range', consists of an inclusive
minimum and maximum IP address, and an inclusive minimum and maximum
protocol-specific value (eg. TCP ports). There is also room for
flags: one says whether the IP address can be mapped (sometimes we
only want to map the protocol-specific part of a tuple, not the IP),
and another says that the protocol-specific part of the range is
valid.
A multi-range is an array of these `struct ip_nat_range' elements;
this means that a range could be "1.1.1.1-1.1.1.2 ports 50-55 AND
1.1.1.3 port 80". Each range element adds to the range (a union, for
those who like set theory).
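
The union semantics can be sketched as a membership test. The struct
and flag names below are hypothetical stand-ins rather than the
kernel's `struct ip_nat_range'; addresses are in host byte order, and
treating a part whose flag is clear as unconstrained is an assumption
of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_RANGE_MAP_IPS          1u  /* hypothetical: IP bounds apply   */
#define MODEL_RANGE_PROTO_SPECIFIED  2u  /* hypothetical: port bounds apply */

/* Hypothetical range element with inclusive bounds. */
struct model_range {
    unsigned int flags;
    uint32_t min_ip, max_ip;
    uint16_t min_port, max_port;
};

/* Nonzero if (ip, port) falls inside one range element. */
int model_in_range(const struct model_range *r, uint32_t ip, uint16_t port)
{
    if ((r->flags & MODEL_RANGE_MAP_IPS) &&
        (ip < r->min_ip || ip > r->max_ip))
        return 0;
    if ((r->flags & MODEL_RANGE_PROTO_SPECIFIED) &&
        (port < r->min_port || port > r->max_port))
        return 0;
    return 1;
}

/* A multi-range is a union: matching any one element is enough. */
int model_in_multi_range(const struct model_range *ranges, size_t n,
                         uint32_t ip, uint16_t port)
{
    for (size_t i = 0; i < n; i++)
        if (model_in_range(&ranges[i], ip, port))
            return 1;
    return 0;
}
```

With two elements encoding "1.1.1.1-1.1.1.2 ports 50-55" and "1.1.1.3
port 80", the pair (1.1.1.2, 53) and the pair (1.1.1.3, 80) are both
inside the multi-range, while (1.1.1.3, 53) is not.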
4.4.2. New Protocols
4.4.2.1. Inside The Kernel
Implementing a new protocol first means deciding what the
manipulatable and non-manipulatable parts of the tuple should be.
Everything in the tuple has the property that it identifies the stream
uniquely. The manipulatable part of the tuple is the part you can do
NAT with: for TCP this is the source port, for ICMP it's the icmp ID;
something to use as a "stream identifier". The non-manipulatable part
is the rest of the packet that uniquely identifies the stream, but we
can't play with (eg. TCP destination port, ICMP type).
Once you've decided this, you can write an extension to the
connection-tracking code in the directory, and go about populating the
`ip_conntrack_protocol' structure which you need to pass to
`ip_conntrack_register_protocol()'.
The fields of `struct ip_conntrack_protocol' are:

list
    Set it to '{ NULL, NULL }'; used to sew you into the list.

proto
    Your protocol number; see `/etc/protocols'.

name
    The name of your protocol. This is the name the user will see;
    it's usually best if it's the canonical name in
    `/etc/protocols'.

pkt_to_tuple
    The function which fills out the protocol specific parts of the
    tuple, given the packet. The `datah' pointer points to the
    start of your header (just past the IP header), and the datalen
    is the length of the packet. If the packet isn't long enough to
    contain the header information, return 0; datalen will always be
    at least 8 bytes though (enforced by framework).

invert_tuple
    This function is simply used to change the protocol-specific
    part of the tuple into the way a reply to that packet would
    look.

print_tuple
    This function is used to print out the protocol-specific part of
    a tuple; usually it's sprintf()'d into the buffer provided. The
    number of buffer characters used is returned. This is used to
    print the states for the /proc entry.

print_conntrack
    This function is used to print the private part of the conntrack
    structure, if any, also used for printing the states in /proc.

packet
    This function is called when a packet is seen which is part of
    an established connection. You get a pointer to the conntrack
    structure, the IP header, the length, and the ctinfo. You
    return a verdict for the packet (usually NF_ACCEPT), or -1 if
    the packet is not a valid part of the connection. You can
    delete the connection inside this function if you wish, but you
    must use the following idiom to avoid races (see
    ip_conntrack_proto_icmp.c):

        if (del_timer(&ct->timeout))
                ct->timeout.function((unsigned long)ct);

new
    This function is called when a packet creates a connection for
    the first time; there is no ctinfo arg, since the first packet
    is of ctinfo IP_CT_NEW by definition. It returns 0 to fail to
    create the connection, or a connection timeout in jiffies.
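
As an illustration of the pkt_to_tuple contract described above, here
is a minimal sketch for an imaginary protocol. Everything in it is an
assumption: the 12-byte header with two big-endian 16-bit ports at
its start, the `sketch_' names, and the reduced tuple type (the real
callback fills in part of a `struct ip_conntrack_tuple'). Ports are
converted to host order here purely for readability.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_HDR_LEN 12   /* assumed header length of the imaginary protocol */

/* Hypothetical protocol-specific part of a tuple. */
struct sketch_tuple_proto {
    uint16_t src_port;
    uint16_t dst_port;
};

/*
 * Fill in the protocol-specific part of the tuple from the header that
 * starts at datah. Returns 1 on success, 0 if the packet is too short
 * to contain the whole header.
 */
int sketch_pkt_to_tuple(const void *datah, size_t datalen,
                        struct sketch_tuple_proto *tuple)
{
    const uint8_t *hdr = datah;

    if (datalen < SKETCH_HDR_LEN)
        return 0;

    /* Read the two big-endian 16-bit ports byte by byte. */
    tuple->src_port = (uint16_t)((hdr[0] << 8) | hdr[1]);
    tuple->dst_port = (uint16_t)((hdr[2] << 8) | hdr[3]);
    return 1;
}
```

The length check still matters even with the framework's 8-byte
guarantee, because this imaginary header is longer than 8 bytes.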
Once you've written and tested that you can track your new protocol,
it's time to teach NAT how to translate it. This means writing a new
module, an extension to the NAT code, and populating the
`ip_nat_protocol' structure which you need to pass to
`ip_nat_protocol_register()'. Its fields include:

list
    Set it to '{ NULL, NULL }'; used to sew you into the list.

name
    The name of your protocol. This is the name the user will see;
    it's best if it's the canonical name in `/etc/protocols' for
    userspace auto-loading, as we'll see later.

protonum
    Your protocol number; see `/etc/protocols'.

manip_pkt
    This is the other half of connection tracking's pkt_to_tuple
    function: you can think of it as "tuple_to_pkt". There are some
    differences though: you get a pointer to the start of the IP
    header, and the total packet length. This is because some
    protocols (UDP, TCP) need to know the IP header. You're given
    the ip_nat_tuple_manip field from the tuple (i.e., the "src"
    field), rather than the entire tuple, and the type of
    manipulation you are to perform.

in_range
    This function is used to tell if the manipulatable part of the
    given tuple is in the given range. This function is a bit
    tricky: we're given the manipulation type which has been applied
    to the tuple, which tells us how to interpret the range (is it a
    source range or a destination range we're aiming for?).
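
The in_range logic can be sketched for a port-based protocol as
follows. This is a simplified model: the `nat_sketch_' names, the
two-value manipulation type, and the reduced tuple holding only ports
are all assumptions for illustration, not the kernel's signature.

```c
#include <stdint.h>

/* Hypothetical manipulation type: which side of the tuple was rewritten. */
enum nat_sketch_manip { NAT_SKETCH_MANIP_SRC, NAT_SKETCH_MANIP_DST };

/* Hypothetical protocol-specific part of a tuple. */
struct nat_sketch_ports {
    uint16_t src_port;
    uint16_t dst_port;
};

/*
 * Nonzero if the port selected by the manipulation type lies within
 * the inclusive range [min, max]. A source manipulation means the
 * range describes source ports; a destination manipulation means it
 * describes destination ports.
 */
int nat_sketch_in_range(const struct nat_sketch_ports *tuple,
                        enum nat_sketch_manip maniptype,
                        uint16_t min, uint16_t max)
{
    uint16_t port = (maniptype == NAT_SKETCH_MANIP_SRC)
                        ? tuple->src_port
                        : tuple->dst_port;

    return port >= min && port <= max;
}
```

For a tuple with source port 1025 and destination port 80, the range
1024-2000 matches under a source manipulation but not under a
destination manipulation.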