<html><head><title>Network Buffers And Memory Management</title>
<link rel="owner" href="mailto:">
<script language="JavaScript">
<!-- hide this
function help(message) {
self.status = message;
return true;
}
// stop hiding -->
</script></head>
<body>
<strong>The
HyperNews <a href="http://tldp.org/LDP/khg/HyperNews/get/khg.html">Linux KHG</a>
Discussion Pages</strong>
<hr>
<h2>Network Buffers And Memory Management</h2>
<blockquote><i>Reprinted with permission of </i><a href="http://www.ssc.com/lj">
Linux Journal,</a><i> from issue 29, September 1996.
Some changes have been made to accommodate the web.
This article was originally written for the Kernel Korner column.
The Kernel Korner series has included
many other articles of interest to Linux kernel hackers, as well.</i></blockquote>
<h4>by Alan Cox</h4>
<p>The Linux operating system implements the industry-standard
Berkeley socket API, which has its origins in the BSD unix
developments (4.2/4.3/4.4 BSD). In this article, we will look
at the way the memory management and buffering is implemented
for network layers and network device drivers under the
existing Linux kernel, as well as explain how and why some
things have changed over time.
</p><h3>Core Concepts</h3>
<p>The networking layer tries to be fairly object-oriented in
its design, as indeed is much of the Linux kernel. The core
structure of the networking code goes back to the initial
networking and socket implementations by Ross Biro and Orest
Zborowski respectively. The key objects are:
</p><dl>
<dt><b>Device or Interface:</b>
</dt><dd>A network interface represents a thing which sends and
receives packets. This is normally interface code for a
physical device like an ethernet card. However some devices are
software only such as the loopback device which is used for
sending data to yourself.
</dd><dt><b>Protocol:</b>
</dt><dd>Each protocol is effectively a different language of
networking. Some protocols exist purely because vendors chose
to use proprietary networking schemes, others are designed for
special purposes. Within the Linux kernel each protocol is a
separate module of code which provides services to the socket
layer.
</dd><dt><b>Socket:</b>
</dt><dd>So called from the notion of plugs and sockets. A socket is
a network connection that provides unix file I/O and appears
to the user program as a file descriptor. In the kernel
each socket is a pair of structures representing the high
level socket interface and the low level protocol interface.
</dd><dt><b>sk_buff:</b>
</dt><dd>All the buffers used by the networking layers are
<tt>sk_buff</tt>s. The control for these is provided by core
low-level library routines available to the whole of the
networking. <tt>sk_buff</tt>s provide the general buffering and
flow control facilities needed by network protocols.
</dd></dl>
<h3>Implementation of <tt>sk_buff</tt>s</h3>
<p>The primary goal of the <tt>sk_buff</tt> routines is to
provide a consistent and efficient buffer handling method for
all of the network layers, and by being consistent to make it
possible to provide higher level <tt>sk_buff</tt> and socket
handling facilities to all the protocols.
</p><p>An <tt>sk_buff</tt> is a control structure with a block of
memory attached. There are two primary sets of functions
provided in the <tt>sk_buff</tt> library. Firstly routines to
manipulate doubly linked lists of <tt>sk_buffs</tt>, secondly
functions for controlling the attached memory. The buffers are
held on linked lists optimised for the common network
operations of append to end and remove from start. As so much
of the networking functionality occurs during interrupts these
routines are written to be atomic. The small extra overhead
this causes is well worth the pain it saves in bug hunting.
</p><p>We use the list operations to manage groups of packets as they
arrive from the network, and as we send them to the physical
interfaces. We use the memory manipulation routines for
handling the contents of packets in a standardised and
efficient manner.
</p><p>At its most basic level, a list of buffers is managed using
functions like this:
</p><pre>void append_frame(char *buf, int len)
{
        struct sk_buff *skb=alloc_skb(len, GFP_ATOMIC);
        if(skb==NULL)
                my_dropped++;
        else
        {
                skb_put(skb,len);
                memcpy(skb->data,buf,len);
                skb_append(&my_list, skb);
        }
}

void process_queue(void)
{
        struct sk_buff *skb;
        while((skb=skb_dequeue(&my_list))!=NULL)
        {
                process_data(skb);
                kfree_skb(skb, FREE_READ);
        }
}
</pre>
These two fairly simplistic pieces of code actually demonstrate
the receive packet mechanism quite accurately. The
<tt>append_frame()</tt> function is similar to the code called
from an interrupt by a device driver receiving a packet, and
<tt>process_queue()</tt> is similar to the code called to feed
data into the protocols. If you go and look in net/core/dev.c
at <tt>netif_rx()</tt> and <tt>net_bh()</tt>, you will see that
they manage buffers similarly. They are far more complex, as
they have to feed packets to the right protocol and manage flow
control, but the basic operations are the same. This is just as
true if you look at buffers going from the protocol code to a
user application.
<p>The example also shows the use of one of the data control
functions, <tt>skb_put()</tt>. Here it is used to reserve space
in the buffer for the data we wish to pass down.
</p><p>Let's look at <tt>append_frame()</tt>. The <tt>alloc_skb()</tt>
function obtains a buffer of <tt>len</tt> bytes
(<a href="#fig1">Figure 1</a>),
which consists of:
</p><ul>
<li>0 bytes of room at the head of the buffer
</li><li>0 bytes of data, and
</li><li><tt>len</tt> bytes of room at the end of the data.
</li></ul>
The <tt>skb_put()</tt> function (<a href="#fig4">Figure 4</a>)
grows the <b>data</b>
area upwards in memory through the free space at the buffer end
and thus reserves space for the <tt>memcpy()</tt>. Many network
operations used in sending add to the start of the frame each
time in order to add headers to packets, so the
<tt>skb_push()</tt> function (<a href="#fig5">Figure 5</a>)
is provided to allow
you to move the start of the data frame down through memory,
providing enough space has been reserved to leave room for
doing this.
<p>Immediately after a buffer has been allocated, all the
available room is at the end. A further function named
<tt>skb_reserve()</tt> (<a href="#fig2">Figure 2</a>)
can be called before data is
added to specify that some of the room should be left at
the beginning. Thus, many sending routines start with something
like:
</p><pre> skb=alloc_skb(len+headspace, GFP_KERNEL);
skb_reserve(skb, headspace);
skb_put(skb,len);
memcpy_fromfs(skb->data,data,len);
pass_to_m_protocol(skb);
</pre>
<p>In systems such as BSD unix you don't need to know in
advance how much space you will need as it uses chains of small
buffers (mbufs) for its network buffers. Linux chooses to use
linear buffers and save space in advance (often wasting a few
bytes to allow for the worst case) because linear buffers make
many other things much faster.
</p><p>Now to return to the list functions. Linux provides the
following operations:
</p><ul>
<li><tt>skb_dequeue()</tt> takes the first buffer
from a list. If the list is empty a <tt>NULL</tt> pointer is
returned. This is used to pull buffers off queues. The buffers
are added with the routines <tt>skb_queue_head()</tt> and
<tt>skb_queue_tail()</tt>.
</li><li><tt>skb_queue_head()</tt> places a buffer at
the start of a list. As with all the list operations, it is
atomic.
</li><li><tt>skb_queue_tail()</tt> places a buffer at
the end of a list, which is the most commonly used function.
Almost all the queues are handled with one set of routines
queueing data with this function and another set removing items
from the same queues with <tt>skb_dequeue()</tt>.
</li><li><tt>skb_unlink()</tt> removes a buffer from
whatever list it was on. The buffer is not freed, merely
removed from the list. To make some operations easier, you need
not know what list the buffer is on, and you can always call
<tt>skb_unlink()</tt> on a buffer which is not in a list. This
enables network code to pull a buffer out of use even when the
network protocol has no idea who is currently using it. A
separate locking mechanism is provided so device drivers do not
find someone removing a buffer they are using at that moment.
</li><li>Some more complex protocols like TCP keep
frames in order and re-order their input as data is received.
Two functions, <tt>skb_insert()</tt> and <tt>skb_append()</tt>,
exist to allow users to place <tt>sk_buff</tt>s before or after
a specific buffer in a list.
</li><li><tt>alloc_skb()</tt> creates a new
<tt>sk_buff</tt> and initialises it. The returned buffer is
ready to use but does assume you will fill in a few fields to
indicate how the buffer should be freed. Normally this is
<tt>skb->free=1</tt>. A buffer can be told not to be freed when
<tt>kfree_skb()</tt> (see below) is called.
</li><li><tt>kfree_skb()</tt> releases a buffer, and if
<tt>skb->sk</tt> is set it lowers the memory use counts of the
socket (<tt>sk</tt>). It is up to the socket and protocol-level
routines to have incremented these counts and to avoid freeing
a socket with outstanding buffers. The memory counts are very
important, as the kernel networking layers need to know how
much memory is tied up by each connection in order to prevent
remote machines or local processes from using too much memory.
</li><li><tt>skb_clone()</tt> makes a copy of an
<tt>sk_buff</tt> but does not copy the data area, which must be
considered read only.
</li><li>For some things a copy of the data is needed
for editing, and <tt>skb_copy()</tt> provides the same
facilities but also copies the data (and thus has a much higher
overhead).
</li></ul>
<p><a name="fig1"><center><img src="network%20device%20drivers_files/fig1.gif"><br>
Figure 1: After alloc_skb</center></a>
</p><p><a name="fig2"><center><img src="network%20device%20drivers_files/fig2.gif"><br>
Figure 2: After skb_reserve</center></a>
</p><p><a name="fig3"><center><img src="network%20device%20drivers_files/fig3.gif"><br>
Figure 3: An sk_buff containing data</center></a>
</p><p><a name="fig4"><center><img src="network%20device%20drivers_files/fig4.gif"><br>
Figure 4: After skb_put has been called on the buffer</center></a>
</p><p><a name="fig5"><center><img src="network%20device%20drivers_files/fig5.gif"><br>
Figure 5: After an skb_push has occurred on the previous buffer</center></a>
</p><p><a name="fig6"><center><img src="network%20device%20drivers_files/fig6.gif"><br>
Figure 6: Network device data flow</center></a>
</p><h3>Higher Level Support Routines</h3>
<p>The semantics of allocating and queueing buffers for sockets
also involve flow control rules and, on the sending side, a whole
list of interactions with signals and optional settings such as
non-blocking I/O. Two routines are designed to make this easy for
most protocols.
</p><p>The <tt>sock_queue_rcv_skb()</tt> function is used to handle
incoming data flow control and is normally used in the form:
</p><pre>        sk=my_find_socket(whatever);
        if(sock_queue_rcv_skb(sk,skb)==-1)
        {
                myproto_stats.dropped++;
                kfree_skb(skb,FREE_READ);
                return;
        }
</pre>
</pre>
This function uses the socket read queue counters to prevent
too much data from being queued to a socket; when the socket's
limit is reached it returns <tt>-1</tt> and the caller drops the
packet, as shown above.
</body></html>