The Spidernet Device Driver
===========================

Written by Linas Vepstas <linas@austin.ibm.com>

Version of 7 June 2007

Abstract
========
This document sketches the structure of portions of the spidernet
device driver in the Linux kernel tree. The spidernet is a gigabit
ethernet device built into the Toshiba southbridge commonly used
in the SONY Playstation 3 and the IBM QS20 Cell blade.

The Structure of the RX Ring.
=============================
The receive (RX) ring is a circular linked list of RX descriptors,
together with three pointers into the ring that are used to manage its
contents.

The elements of the ring are called "descriptors" or "descrs"; they
describe the received data. This includes a pointer to a buffer
containing the received data, the buffer size, and various status bits.

There are three primary states that a descriptor can be in: "empty",
"full" and "not-in-use". An "empty" or "ready" descriptor is ready
to receive data from the hardware. A "full" descriptor has data in it,
and is waiting to be emptied and processed by the OS. A "not-in-use"
descriptor is neither empty nor full; it is simply not ready. It may
not even have a data buffer in it, or is otherwise unusable.

During normal operation, on device startup, the OS (specifically, the
spidernet device driver) allocates a set of RX descriptors and RX
buffers. These are all marked "empty", ready to receive data. This
ring is handed off to the hardware, which sequentially fills in the
buffers, and marks them "full". The OS follows up, taking the full
buffers, processing them, and re-marking them empty.

This filling and emptying is managed by three pointers: the "head"
and "tail" pointers, managed by the OS, and a hardware current
descriptor pointer (GDACTDPA). The GDACTDPA points at the descr
currently being filled. When this descr is filled, the hardware
marks it full, and advances the GDACTDPA by one.
Thus, when there is flowing RX traffic, every descr behind the
GDACTDPA should be marked "full", and everything in front of it
should be "empty". If the hardware discovers that the current descr
is not empty, it will signal an interrupt, and halt processing.

The tail pointer tails or trails the hardware pointer. When the
hardware is ahead, the tail pointer will be pointing at a "full"
descr. The OS will process this descr, and then mark it "not-in-use",
and advance the tail pointer. Thus, when there is flowing RX traffic,
all of the descrs in front of the tail pointer should be "full", and
all of those behind it should be "not-in-use". When RX traffic is not
flowing, then the tail pointer can catch up to the hardware pointer.
The OS will then note that the current tail is "empty", and halt
processing.

The head pointer (somewhat mis-named) follows after the tail pointer.
When traffic is flowing, then the head pointer will be pointing at
a "not-in-use" descr. The OS will perform various housekeeping duties
on this descr. This includes allocating a new data buffer and
dma-mapping it so as to make it visible to the hardware. The OS will
then mark the descr as "empty", ready to receive data. Thus, when there
is flowing RX traffic, everything in front of the head pointer should
be "not-in-use", and everything behind it should be "empty". If no
RX traffic is flowing, then the head pointer can catch up to the tail
pointer, at which point the OS will notice that the head descr is
"empty", and it will halt processing.

Thus, in an idle system, the GDACTDPA, tail and head pointers will
all be pointing at the same descr, which should be "empty". All of the
other descrs in the ring should be "empty" as well.

The show_rx_chain() routine will print out the locations of the
GDACTDPA, tail and head pointers.
It will also summarize the contents
of the ring, starting at the tail pointer, and listing the status
of the descrs that follow.

A typical example of the output, for a nearly idle system, might be

net eth1: Total number of descrs=256
net eth1: Chain tail located at descr=20
net eth1: Chain head is at 20
net eth1: HW curr desc (GDACTDPA) is at 21
net eth1: Have 1 descrs with stat=x40800101
net eth1: HW next desc (GDACNEXTDA) is at 22
net eth1: Last 255 descrs with stat=xa0800000

In the above, the hardware has filled in one descr, number 20. Both
head and tail are pointing at 20, because it has not yet been emptied.
Meanwhile, hw is pointing at 21, which is free.

The "Have nnn descrs" refers to the descr starting at the tail: in this
case, nnn=1 descr, starting at descr 20. The "Last nnn descrs" refers
to all of the rest of the descrs, from the last status change. The "nnn"
is a count of how many descrs have exactly the same status.

The status x4... corresponds to "full" and status xa... corresponds
to "empty". The actual value printed is RXCOMST_A.

In the device driver source code, a different set of names is
used for these same concepts, so that

"empty" == SPIDER_NET_DESCR_CARDOWNED == 0xa
"full" == SPIDER_NET_DESCR_FRAME_END == 0x4
"not in use" == SPIDER_NET_DESCR_NOT_IN_USE == 0xf


The RX RAM full bug/feature
===========================
As long as the OS can empty out the RX buffers at a rate faster than
the hardware can fill them, there is no problem. If, for some reason,
the OS fails to empty the RX ring fast enough, the hardware GDACTDPA
pointer will catch up to the head, notice the not-empty condition,
and stop. However, RX packets may still continue arriving on the wire.
The spidernet chip can save some limited number of these in local RAM.
When this local RAM fills up, the spider chip will issue an interrupt
indicating this (GHIINT0STS will show ERRINT, and the GRMFLLINT bit
will be set in GHIINT1STS).
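
The ring mechanics of the previous section, and the halt that occurs
when the hardware catches up with descrs the OS has not yet serviced,
can be sketched as a small simulation. This is a toy model in Python,
not driver code: the class RxRing and its method names are hypothetical,
and only the three state codes are taken from the text. It models
normal operation and the hardware halting on a non-empty descr. It does
not model the chip's local RAM, the interrupts, or how the hardware
restarts after a halt.

```python
# Toy model of the RX ring; not driver code. RxRing and its methods are
# hypothetical names. Only the three state codes come from the text.
EMPTY, FULL, NOT_IN_USE = 0xA, 0x4, 0xF


class RxRing:
    def __init__(self, size):
        self.state = [EMPTY] * size  # on startup every descr is "empty"
        self.hw = 0                  # models GDACTDPA
        self.tail = 0                # OS: next descr to process
        self.head = 0                # OS: next descr to refill
        self.hw_halted = False

    def _next(self, i):
        return (i + 1) % len(self.state)

    def hw_receive(self):
        """Hardware fills the current descr, or halts if it is not empty."""
        if self.state[self.hw] != EMPTY:
            self.hw_halted = True
            return False
        self.state[self.hw] = FULL
        self.hw = self._next(self.hw)
        return True

    def os_process(self):
        """Tail side: consume "full" descrs, marking them "not-in-use"."""
        done = 0
        while self.state[self.tail] == FULL:
            self.state[self.tail] = NOT_IN_USE
            self.tail = self._next(self.tail)
            done += 1
        return done

    def os_refill(self):
        """Head side: re-arm "not-in-use" descrs, marking them "empty"."""
        done = 0
        while self.state[self.head] == NOT_IN_USE:
            self.state[self.head] = EMPTY
            self.head = self._next(self.head)
            done += 1
        return done


ring = RxRing(8)

# Normal operation: three packets arrive and the OS keeps up.
for _ in range(3):
    ring.hw_receive()
ring.os_process()
ring.os_refill()
idle = ring.hw == ring.tail == ring.head and set(ring.state) == {EMPTY}

# The OS stops servicing the ring while packets keep arriving.
accepted = 0
while ring.hw_receive():
    accepted += 1

print("idle after normal traffic:", idle)
print("packets accepted before halt:", accepted)
print("hardware halted:", ring.hw_halted)
```

In this model, once the OS has caught up, all three pointers sit on the
same descr and every descr is "empty", matching the idle-system
description. When the OS stops servicing the ring, the hardware fills
each descr once, wraps around, finds a "full" descr, and halts.
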
When the RX RAM full condition occurs,
a certain bug/feature is triggered that has to be specially handled.
This section describes the special handling for this condition.

When the OS finally has a chance to run, it will empty out the RX ring.
In particular, it will clear the descriptor on which the hardware had
stopped. However, once the hardware has decided that a certain
descriptor is invalid, it will not restart at that descriptor; instead
it will restart at the next descr. This potentially will lead to a
deadlock condition, as the tail pointer will be pointing at this descr,
which, from the OS point of view, is empty; the OS will be waiting for
this descr to be filled. However, the hardware has skipped this descr,
and is filling the next descrs. Since the OS doesn't see this, there
is a potential deadlock, with the OS waiting for one descr to fill,
while the hardware is waiting for a different set of descrs to become
empty.

A call to show_rx_chain() at this point indicates the nature of the
problem. A typical print when the network is hung shows the following:

net eth1: Spider RX RAM full, incoming packets might be discarded!
net eth1: Total number of descrs=256
net eth1: Chain tail located at descr=255
net eth1: Chain head is at 255
net eth1: HW curr desc (GDACTDPA) is at 0
net eth1: Have 1 descrs with stat=xa0800000
net eth1: HW next desc (GDACNEXTDA) is at 1
net eth1: Have 127 descrs with stat=x40800101
net eth1: Have 1 descrs with stat=x40800001
net eth1: Have 126 descrs with stat=x40800101
net eth1: Last 1 descrs with stat=xa0800000

Both the tail and head pointers are pointing at descr 255, which is
marked xa... which is "empty". Thus, from the OS point of view, there
is nothing to be done. In particular, there is the implicit assumption
that everything in front of the "empty" descr must surely also be empty,
as explained in the previous section. The OS is waiting for descr 255 to
become non-empty, which, in this case, will never happen.

The HW pointer is at descr 0.
This descr is marked 0x4... or "full".
Since it's already full, the hardware can do nothing more, and thus has
halted processing. Notice that descrs 0 through 253 are all marked
"full", while descrs 254 and 255 are empty. (The "Last 1 descrs" is
descr 254, since tail was at 255.) Thus, the system is deadlocked,
and there can be no forward progress; the OS thinks there's nothing
to do, and the hardware has nowhere to put incoming data.

This bug/feature is worked around with the spider_net_resync_head_ptr()
routine. When the driver receives RX interrupts, but an examination
of the RX chain seems to show it is empty, then it is probable that
the hardware has skipped a descr or two (sometimes dozens under heavy
network conditions). The spider_net_resync_head_ptr() subroutine will
search the ring for the next full descr, and the driver will resume
operations there. Since this will leave "holes" in the ring, there
is also a spider_net_resync_tail_ptr() that will skip over such holes.

As of this writing, the spider_net_resync() strategy seems to work very
well, even under heavy network loads.


The TX ring
===========
The TX ring uses a low-watermark interrupt scheme to make sure that
the TX queue is appropriately serviced for large packet sizes.

For packet sizes greater than about 1 KByte, the kernel can fill
the TX ring quicker than the device can drain it. Once the ring
is full, the netdev is stopped. When there is room in the ring,
the netdev needs to be reawakened, so that more TX packets are placed
in the ring. The hardware can empty the ring about four times per jiffy,
so it's not appropriate to wait for the poll routine to refill, since
the poll routine runs only once per jiffy. The low-watermark mechanism
marks a descr about 1/4 of the way from the bottom of the queue, so
that an interrupt is generated when the descr is processed. This
interrupt wakes up the netdev, which can then refill the queue.
For large packets, this mechanism generates a relatively small number
of interrupts, about 1K/sec.
For smaller packets, this will drop to zero
interrupts, as the hardware can empty the queue faster than the kernel
can fill it.


 ======= END OF DOCUMENT ========