napi_howto.txt (from Linux Kernel 2.6.9 for OMAP1710)
HISTORY:
February 16/2002 -- revision 0.2.1:
COR typo corrected
February 10/2002 -- revision 0.2:
some spell checking ;->
January 12/2002 -- revision 0.1
This is still work in progress so may change.
To keep up to date please watch this space.

Introduction to NAPI
====================

NAPI is a proven (www.cyberus.ca/~hadi/usenix-paper.tgz) technique
to improve network performance on Linux. For more details please
read that paper.
NAPI provides an "inherent mitigation" which is bound by system capacity,
as can be seen from the following data collected by Robert on Gigabit
ethernet (e1000):

 Psize    Ipps       Tput     Rxint     Txint    Done     Ndone
 ---------------------------------------------------------------
   60    890000     409362        17     27622        7     6823
  128    758150     464364        21      9301       10     7738
  256    445632     774646        42     15507       21    12906
  512    232666     994445    241292     19147   241192     1062
 1024    119061    1000003    872519     19258   872511        0
 1440     85193    1000003    946576     19505   946569        0

Legend:
"Ipps" stands for input packets per second.
"Tput" == packets out of total 1M that made it out.
"Rxint" == receive interrupts seen
"Txint" == transmit completion interrupts seen
"Done" == The number of times that the poll() managed to pull all
packets out of the rx ring. Note from this that the lower the
load the more we could clean up the rx ring.
"Ndone" == is the converse of "Done". Note again, that the higher
the load the more times we couldn't clean up the rx ring.

Observe that:
when the NIC receives 890K packets/sec only 17 rx interrupts are generated.
The system can't handle the processing at 1 interrupt/packet at that load level.
At lower rates on the other hand, rx interrupts go up and therefore the
interrupt/packet ratio goes up (as observable from that table). So there is
a possibility that under low enough input, you get one poll call for each
input packet caused by a single interrupt each time.
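
The interrupt/packet ratio discussed above can be worked out directly from
the table. What follows is only a rough sketch: it assumes every run injected
the 1M packets mentioned in the legend, and the names napi_row, napi_rows,
rx_ints_per_packet() and print_rx_ratios() are made up for illustration (they
are not part of the kernel).

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: every run injected 1M packets (see "Tput" in the legend). */
#define TOTAL_PACKETS 1000000UL

/* Hypothetical container for the two table columns used here. */
struct napi_row {
	unsigned int  psize;	/* packet size in bytes */
	unsigned long rxint;	/* rx interrupts seen during the run */
};

/* Psize and Rxint columns copied from the table above. */
const struct napi_row napi_rows[] = {
	{   60,     17 },
	{  128,     21 },
	{  256,     42 },
	{  512, 241292 },
	{ 1024, 872519 },
	{ 1440, 946576 },
};

#define NAPI_NROWS (sizeof(napi_rows) / sizeof(napi_rows[0]))

/* rx interrupts generated per injected packet */
double rx_ints_per_packet(unsigned long rxint, unsigned long packets)
{
	assert(packets > 0);
	return (double)rxint / (double)packets;
}

/* Print one ratio per table row. */
void print_rx_ratios(void)
{
	size_t i;

	for (i = 0; i < NAPI_NROWS; i++)
		printf("%4u bytes: %.6f rx interrupts/packet\n",
		       napi_rows[i].psize,
		       rx_ints_per_packet(napi_rows[i].rxint, TOTAL_PACKETS));
}
```

Under that 1M assumption, calling print_rx_ratios() gives 0.000017 for the
60 byte row and 0.946576 for the 1440 byte row, which is the "mitigation
bound by system capacity" seen from the other side.
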
And if the system can't handle an interrupt per packet ratio of 1, then it will
just have to chug along ....

0) Prerequisites:
=================

A driver MAY continue using the old 2.4 technique for interfacing
to the network stack and not benefit from the NAPI changes.
NAPI additions to the kernel do not break backward compatibility.
NAPI, however, requires the following features to be available:

A) DMA ring or enough RAM to store packets in software devices.

B) Ability to turn off interrupts or maybe events that send packets up
the stack.

NAPI processes packet events in what is known as the dev->poll() method.
Typically, only packet receive events are processed in dev->poll().
The rest of the events MAY be processed by the regular interrupt handler
to reduce processing latency (justified also because there are not that
many of them).
Note, however, NAPI does not enforce that dev->poll() only processes
receive events.
Tests with the tulip driver indicated slightly increased latency if
all of the interrupt handler is moved to dev->poll(). Also MII handling
gets a little trickier.
The example used in this document is to move the receive processing only
to dev->poll(); this is shown with the patch for the tulip driver.
For an example of code that moves all of the interrupt handler to
dev->poll() look at the ported e1000 code.

There are caveats that might force you to go with moving everything to
dev->poll(). Different NICs work differently depending on their status/event
acknowledgement setup.
There are two types of event register ACK mechanisms.
	I)  what is known as Clear-on-read (COR).
	When you read the status/event register, it clears everything!
	The natsemi and sunbmac NICs are known to do this.
	In this case your only choice is to move all to dev->poll()

	II) Clear-on-write (COW)
	 i) you clear the status by writing a 1 in the bit-location you want.
		These are the majority of the NICs and work the best with NAPI.
		Put only receive events in dev->poll(); leave the rest in
		the old interrupt handler.
	 ii) whatever you write in the status register clears everything ;->
		Can't seem to find any supported by Linux which do this. If
		someone knows such a chip, email us please.
		Move all to dev->poll()

C) Ability to detect new work correctly.
NAPI works by shutting down event interrupts when there's work and
turning them on when there's none.
New packets might show up in the small window while interrupts were being
re-enabled (refer to appendix 2). A packet might sneak in during the period
we are enabling interrupts. We only get to know about such a packet when the
next new packet arrives and generates an interrupt.
Essentially, there is a small window of opportunity for a race condition
which for clarity we'll refer to as the "rotting packet".

This is a very important topic and appendix 2 is dedicated to more
discussion.

Locking rules and environmental guarantees
==========================================

- Guarantee: Only one CPU at any time can call dev->poll(); this is because
only one CPU can pick the initial interrupt and hence the initial
netif_rx_schedule(dev);
- The core layer invokes devices to send packets in a round robin format.
This implies receive is totally lockless because of the guarantee that only
one CPU is executing it.
- Contention can only be the result of some other CPU accessing the rx
ring. This happens only in close() and suspend() (when these methods
try to clean the rx ring);
****guarantee: driver authors need not worry about this; synchronization
is taken care of for them by the top net layer.
- Local interrupts are enabled (if you don't move all to dev->poll()). For
example link/MII and txcomplete continue functioning just the same old way.
This improves the latency of processing these events. It is also assumed that
the receive interrupt is the largest cause of noise. Note this might not
always be true.
[according to Manfred Spraul, the winbond insists on sending one
txmitcomplete interrupt for each packet (although this can be mitigated)].
For these broken drivers, move all to dev->poll().

For the rest of this text, we'll assume that dev->poll() only
processes receive events.

new methods introduced by NAPI
==============================

a) netif_rx_schedule(dev)
Called by an IRQ handler to schedule a poll for the device.

b) netif_rx_schedule_prep(dev)
Puts the device in a state which allows for it to be added to the
CPU polling list if it is up and running. You can look at this as
the first half of netif_rx_schedule(dev) above; the second half
being c) below.

c) __netif_rx_schedule(dev)
Add device to the poll list for this CPU; assuming that _prep above
has already been called and returned 1.

d) netif_rx_reschedule(dev, undo)
Called to reschedule polling for the device, specifically for some
deficient hardware. Read Appendix 2 for more details.

e) netif_rx_complete(dev)
Remove interface from the CPU poll list: it must be in the poll list
on the current CPU. This primitive is called by dev->poll() when
it completes its work. The device cannot be out of the poll list at this
call; if it is, then clearly it is a BUG(). You'll know ;->

All of the above methods are used below. So keep reading for clarity.

Device driver changes to be made when porting NAPI
==================================================

Below we describe what kind of changes are required for NAPI to work.

1) introduction of dev->poll() method
=====================================

This is the method that is invoked by the network core when it requests
new packets from the driver.
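
Before going further, it may help to picture how primitives a), b), c) and e)
from the "new methods" list fit together. The following is a single-threaded
user-space model and nothing more: struct model_dev and the model_*()
functions are hypothetical names invented here, and the real kernel does the
equivalent state changes atomically on struct net_device.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for struct net_device. */
struct model_dev {
	bool running;		/* device is up */
	bool rx_sched;		/* already claimed for polling */
	bool on_poll_list;	/* linked into this CPU's poll list */
};

/* Models b): returns 1 only if the device is up and not yet claimed. */
int model_rx_schedule_prep(struct model_dev *dev)
{
	if (!dev->running || dev->rx_sched)
		return 0;
	dev->rx_sched = true;
	return 1;
}

/* Models c): the caller must have seen the _prep model return 1. */
void model_rx_schedule_add(struct model_dev *dev)
{
	assert(dev->rx_sched && !dev->on_poll_list);
	dev->on_poll_list = true;
}

/* Models a): the _prep half followed by the list insertion half. */
void model_rx_schedule(struct model_dev *dev)
{
	if (model_rx_schedule_prep(dev))
		model_rx_schedule_add(dev);
}

/* Models e): the device must be on the poll list, else it is a bug. */
void model_rx_complete(struct model_dev *dev)
{
	assert(dev->on_poll_list);
	dev->on_poll_list = false;
	dev->rx_sched = false;
}
```

In this model a second model_rx_schedule_prep() call on an already claimed
device returns 0, which is the situation the "interrupt while in poll" check
in the NAPI-enabled interrupt handler is looking for.
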
A driver is allowed to send up to dev->quota packets by the current CPU
before yielding to the network subsystem (so other devices can also get
an opportunity to send to the stack).

dev->poll() prototype looks as follows:
int my_poll(struct net_device *dev, int *budget)

budget is the remaining number of packets the network subsystem on the
current CPU can send up the stack before yielding to other system tasks.
*Each driver is responsible for decrementing budget by the total number of
packets sent.
	Total number of packets cannot exceed dev->quota.

The dev->poll() method is invoked by the top layer; the driver just sends,
if it can, the packet quantity requested to the stack.

More on dev->poll() below, after the interrupt changes are explained.

2) registering dev->poll() method
=================================

dev->poll should be set in the dev->probe() method.
e.g:
dev->open = my_open;
.
.
/* two new additions */
/* first register my poll method */
dev->poll = my_poll;
/* next register my weight/quanta; can be overridden in /proc */
dev->weight = 16;
.
.
dev->stop = my_close;

3) scheduling dev->poll()
=========================

This involves modifying the interrupt handler and the code
path which takes the packets off the NIC and sends them to the
stack.

It's important at this point to introduce the classical D Becker
interrupt processor:

------------------
static irqreturn_t
netdevice_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
	struct net_device *dev = (struct net_device *)dev_id;
	struct my_private *tp = (struct my_private *)dev->priv;
	int work_count = my_work_count;
	u32 status;

	status = read_interrupt_status_reg();
	if (status == 0)
		return IRQ_NONE;	/* Shared IRQ: not us */
	if (status == 0xffff)
		return IRQ_HANDLED;	/* Hot unplug */
	if (status & error)
		do_some_error_handling();

	do {
		acknowledge_ints_ASAP();

		if (status & link_interrupt) {
			spin_lock(&tp->link_lock);
			do_some_link_stat_stuff();
			spin_unlock(&tp->link_lock);
		}

		if (status & rx_interrupt) {
			receive_packets(dev);
		}

		if (status & rx_nobufs) {
			make_rx_buffs_avail();
		}

		if (status & tx_related) {
			spin_lock(&tp->lock);
			tx_ring_free(dev);
			if (tx_died)
				restart_tx();
			spin_unlock(&tp->lock);
		}

		status = read_interrupt_status_reg();

	} while (!(status & error) || more_work_to_be_done);
	return IRQ_HANDLED;
}

----------------------------------------------------------------------

We now change this to what is shown below to NAPI-enable it:

----------------------------------------------------------------------
static irqreturn_t
netdevice_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
	struct net_device *dev = (struct net_device *)dev_id;
	struct my_private *tp = (struct my_private *)dev->priv;
	u32 status;

	status = read_interrupt_status_reg();
	if (status == 0)
		return IRQ_NONE;	/* Shared IRQ: not us */
	if (status == 0xffff)
		return IRQ_HANDLED;	/* Hot unplug */
	if (status & error)
		do_some_error_handling();

	do {
/************************ start note *********************************/
		acknowledge_ints_ASAP();  /* don't ack rx and rxnobuff here */
/************************ end note *********************************/

		if (status & link_interrupt) {
			spin_lock(&tp->link_lock);
			do_some_link_stat_stuff();
			spin_unlock(&tp->link_lock);
		}
/************************ start note *********************************/
		if ((status & rx_interrupt) || (status & rx_nobufs)) {
			if (netif_rx_schedule_prep(dev)) {

				/* disable interrupts caused
				 * by arriving packets */
				disable_rx_and_rxnobuff_ints();
				/* tell system we have work to be done. */
				__netif_rx_schedule(dev);
			} else {
				printk("driver bug! interrupt while in poll\n");
				/* FIX by disabling interrupts */
				disable_rx_and_rxnobuff_ints();
			}
		}
/************************ end note *********************************/

		if (status & tx_related) {
			spin_lock(&tp->lock);
			tx_ring_free(dev);

			if (tx_died)
				restart_tx();
			spin_unlock(&tp->lock);
		}

		status = read_interrupt_status_reg();

/************************ start note *********************************/
	} while (!(status & error) || more_work_to_be_done(status));
/************************ end note *********************************/
	return IRQ_HANDLED;
}

---------------------------------------------------------------------

We note several things from above:

I) Any interrupt source which is caused by arriving packets is now
turned off when it occurs. Depending on the hardware, there could be
several reasons that arriving packets would cause interrupts; these are the
interrupt sources we wish to avoid. The two common ones are a) a packet
arriving (rxint) b) a packet arriving and finding no DMA buffers available
(rxnobuff).
This also means acknowledge_ints_ASAP() will not clear the status
register for those two items above; clearing is done in the place where
proper work is done within NAPI; at the poll() and refill_rx_ring()
discussed further below.
netif_rx_schedule_prep() returns 1 if the device is in running state and
gets successfully added to the core poll list. If we get a zero value
we can _almost_ assume we are already added to the list (instead of not running.
Logic based on the fact that you shouldn't get an interrupt if not running.)
We rectify this by disabling rx and rxnobuf interrupts.

II) that receive_packets(dev) and make_rx_buffs_avail() may have disappeared.
These functionalities are still around actually......

In fact, receive_packets(dev) is very close to my_poll()
and make_rx_buffs_avail() is invoked from my_poll()

4) converting receive_packets() to dev->poll()
==============================================

We need to convert the classical D Becker receive_packets(dev) to my_poll()

First the typical receive_packets() below:
-------------------------------------------------------------------

/* this is called by interrupt handler */
static void receive_packets (struct net_device *dev)
{

	struct my_private *tp = (struct my_private *)dev->priv;
	rx_ring = tp->rx_ring;
	cur_rx = tp->cur_rx;
	int entry = cur_rx % RX_RING_SIZE;
	int received = 0;
	int rx_work_limit = tp->dirty_rx + RX_RING_SIZE - tp->cur_rx;

	while (rx_ring_not_empty) {
		u32 rx_status;
		unsigned int rx_size;
		unsigned int pkt_size;
		struct sk_buff *skb;
		/* read size+status of next frame from DMA ring buffer */
		/* the number 16 and 4 are just examples */
		rx_status = le32_to_cpu (*(u32 *) (rx_ring + ring_offset));
