dma-mapping.txt
kernel logs when the PCI controller hardware detects violation of the
permission setting.

Only streaming mappings specify a direction, consistent mappings
implicitly have a direction attribute setting of
PCI_DMA_BIDIRECTIONAL.

The SCSI subsystem provides mechanisms for you to easily obtain
the direction to use, in the SCSI command:

	scsi_to_pci_dma_dir(SCSI_DIRECTION)

Where SCSI_DIRECTION is obtained from the 'sc_data_direction'
member of the SCSI command your driver is working on.  The
interface mentioned above returns a value suitable for passing
into the streaming DMA mapping interfaces below.

For networking drivers, it's a rather simple affair.  For transmit
packets, map/unmap them with the PCI_DMA_TODEVICE direction
specifier.  For receive packets, just the opposite, map/unmap them
with the PCI_DMA_FROMDEVICE direction specifier.

		  Using Streaming DMA mappings

The streaming DMA mapping routines can be called from interrupt
context.  There are two versions of each map/unmap, one which will
map/unmap a single memory region, and one which will map/unmap a
scatterlist.

To map a single region, you do:

	struct pci_dev *pdev = mydev->pdev;
	dma_addr_t dma_handle;
	void *addr = buffer->ptr;
	size_t size = buffer->len;

	dma_handle = pci_map_single(pdev, addr, size, direction);

and to unmap it:

	pci_unmap_single(pdev, dma_handle, size, direction);

You should call pci_unmap_single when the DMA activity is finished,
e.g. from the interrupt which told you that the DMA transfer is done.

Using cpu pointers like this for single mappings has a disadvantage:
you cannot reference HIGHMEM memory in this way.  Thus, there is a
map/unmap interface pair akin to pci_{map,unmap}_single.  These
interfaces deal with page/offset pairs instead of cpu pointers.
Specifically:

	struct pci_dev *pdev = mydev->pdev;
	dma_addr_t dma_handle;
	struct page *page = buffer->page;
	unsigned long offset = buffer->offset;
	size_t size = buffer->len;

	dma_handle = pci_map_page(pdev, page, offset, size, direction);

	...

	pci_unmap_page(pdev, dma_handle, size, direction);

Here, "offset" means byte offset within the given page.

With scatterlists, you map a region gathered from several regions by:

	int i, count = pci_map_sg(pdev, sglist, nents, direction);
	struct scatterlist *sg;

	for (i = 0, sg = sglist; i < count; i++, sg++) {
		hw_address[i] = sg_dma_address(sg);
		hw_len[i] = sg_dma_len(sg);
	}

where nents is the number of entries in the sglist.

The implementation is free to merge several consecutive sglist entries
into one (e.g. if DMA mapping is done with PAGE_SIZE granularity, any
consecutive sglist entries can be merged into one provided the first one
ends and the second one starts on a page boundary - in fact this is a
huge advantage for cards which either cannot do scatter-gather or have a
very limited number of scatter-gather entries) and returns the actual
number of sg entries it mapped them to.

Then you should loop count times (note: this can be less than nents
times) and use the sg_dma_address() and sg_dma_len() macros where you
previously accessed sg->address and sg->length as shown above.

To unmap a scatterlist, just call:

	pci_unmap_sg(pdev, sglist, nents, direction);

Again, make sure DMA activity has already finished.

PLEASE NOTE:  The 'nents' argument to the pci_unmap_sg call must be
	      the _same_ one you passed into the pci_map_sg call,
	      it should _NOT_ be the 'count' value _returned_ from
	      the pci_map_sg call.

Every pci_map_{single,sg} call should have its pci_unmap_{single,sg}
counterpart, because the bus address space is a shared resource
(although in some ports the mapping is per each BUS so fewer devices
contend for the same bus address space) and you could render the
machine unusable by eating all bus addresses.

If you need to use the same streaming DMA region multiple times and
touch the data in between the DMA transfers, just map it with
pci_map_{single,sg}, and after each DMA transfer call either:

	pci_dma_sync_single(pdev, dma_handle, size, direction);

or:

	pci_dma_sync_sg(pdev, sglist, nents, direction);

as appropriate.

After the last DMA transfer call one of the DMA unmap routines
pci_unmap_{single,sg}.  If you don't touch the data from the first
pci_map_* call till pci_unmap_*, then you don't have to call the
pci_dma_sync_* routines at all.

Here is pseudo code which shows a situation in which you would need
to use the pci_dma_sync_*() interfaces.

	my_card_setup_receive_buffer(struct my_card *cp, char *buffer, int len)
	{
		dma_addr_t mapping;

		mapping = pci_map_single(cp->pdev, buffer, len,
					 PCI_DMA_FROMDEVICE);

		cp->rx_buf = buffer;
		cp->rx_len = len;
		cp->rx_dma = mapping;

		give_rx_buf_to_card(cp);
	}

	...

	my_card_interrupt_handler(int irq, void *devid, struct pt_regs *regs)
	{
		struct my_card *cp = devid;

		...
		if (read_card_status(cp) == RX_BUF_TRANSFERRED) {
			struct my_card_header *hp;

			/* Examine the header to see if we wish
			 * to accept the data.  But synchronize
			 * the DMA transfer with the CPU first
			 * so that we see updated contents.
			 */
			pci_dma_sync_single(cp->pdev, cp->rx_dma,
					    cp->rx_len,
					    PCI_DMA_FROMDEVICE);

			/* Now it is safe to examine the buffer. */
			hp = (struct my_card_header *) cp->rx_buf;
			if (header_is_ok(hp)) {
				pci_unmap_single(cp->pdev, cp->rx_dma,
						 cp->rx_len,
						 PCI_DMA_FROMDEVICE);
				pass_to_upper_layers(cp->rx_buf);
				make_and_setup_new_rx_buf(cp);
			} else {
				/* Just give the buffer back to the card. */
				give_rx_buf_to_card(cp);
			}
		}
	}

Drivers converted fully to this interface should not use virt_to_bus
any longer, nor should they use bus_to_virt.
Some drivers have to be changed a little bit, because there is no
longer an equivalent to bus_to_virt in the dynamic DMA mapping scheme -
you have to always store the DMA addresses returned by the
pci_alloc_consistent, pci_pool_alloc, and pci_map_single calls
(pci_map_sg stores them in the scatterlist itself if the platform
supports dynamic DMA mapping in hardware) in your driver structures
and/or in the card registers.

All PCI drivers should be using these interfaces with no exceptions.
It is planned to completely remove virt_to_bus() and bus_to_virt() as
they are entirely deprecated.  Some ports already do not provide these
as it is impossible to correctly support them.

		64-bit DMA and DAC cycle support

Do you understand all of the text above?  Great, then you already
know how to use 64-bit DMA addressing under Linux.  Simply make
the appropriate pci_set_dma_mask() calls based upon your card's
capabilities, then use the mapping APIs above.

It is that simple.

Well, not for some odd devices.  See the next section for information
about that.

	DAC Addressing for Address Space Hungry Devices

There exists a class of devices which do not mesh well with the PCI
DMA mapping API.  By definition these "mappings" are a finite
resource.  The number of total available mappings per bus is platform
specific, but there will always be a reasonable amount.

What is "reasonable"?  Reasonable means that networking and block I/O
devices need not worry about using too many mappings.

As an example of a problematic device, consider compute cluster cards.
They can potentially need to access gigabytes of memory at once via
DMA.  Dynamic mappings are unsuitable for this kind of access pattern.

To this end we've provided a small API by which a device driver
may use DAC cycles to directly address all of physical memory.
Not all platforms support this, but most do.
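The addressability test behind pci_set_dma_mask() is plain bit
arithmetic: an address is reachable by the device only if it sets no
bits outside the mask.  A minimal userspace sketch, not kernel code;
the helper name fits_dma_mask is made up for illustration:

```c
/* Hypothetical helper: returns nonzero if 'addr' can be expressed
 * within 'mask', i.e. no address bit falls outside the mask.  This
 * mirrors the addressability test implied by pci_set_dma_mask(). */
static int fits_dma_mask(unsigned long long addr, unsigned long long mask)
{
	return (addr & ~mask) == 0;
}
```

For example, a card limited to 32-bit SAC addressing (mask
0xffffffff) cannot reach a buffer at physical address 0x100000000.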
It is easy to determine whether the platform will work properly at
probe time.

First, understand that there may be a SEVERE performance penalty for
using these interfaces on some platforms.  Therefore, you MUST only
use these interfaces if it is absolutely required.  99% of devices can
use the normal APIs without any problems.

Note that for streaming type mappings you must either use these
interfaces, or the dynamic mapping interfaces above.  You may not mix
usage of both for the same device.  Such an act is illegal and is
guaranteed to put a banana in your tailpipe.

However, consistent mappings may in fact be used in conjunction with
these interfaces.  Remember that, as defined, consistent mappings are
always going to be SAC addressable.

The first thing your driver needs to do is query the PCI platform
layer with your device's DAC addressing capabilities:

	int pci_dac_set_dma_mask(struct pci_dev *pdev, u64 mask);

This routine behaves identically to pci_set_dma_mask.  You may not
use the following interfaces if this routine fails.

Next, DMA addresses using this API are kept track of using the
dma64_addr_t type.  It is guaranteed to be big enough to hold any
DAC address the platform layer will give to you from the following
routines.  If you have consistent mappings as well, you still
use plain dma_addr_t to keep track of those.

All mappings obtained here will be direct.  The mappings are not
translated, and this is the purpose of this dialect of the DMA API.

All routines work with page/offset pairs.  This is the _ONLY_ way to
portably refer to any piece of memory.  If you have a cpu pointer
(which may be validly DMA'd too) you may easily obtain the page
and offset using something like this:

	struct page *page = virt_to_page(ptr);
	unsigned long offset = ((unsigned long)ptr & ~PAGE_MASK);

Here are the interfaces:

	dma64_addr_t pci_dac_page_to_dma(struct pci_dev *pdev,
					 struct page *page,
					 unsigned long offset,
					 int direction);

The DAC address for the tuple PAGE/OFFSET is returned.
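The page/offset decomposition shown above is pure bit arithmetic and
can be checked in a userspace sketch.  A 4096-byte page size is
assumed for illustration (virt_to_page itself needs the kernel's
mem_map, so only the offset and page-frame arithmetic is reproduced;
the DEMO_* names are made up here):

```c
#define DEMO_PAGE_SHIFT 12			/* assume 4096-byte pages */
#define DEMO_PAGE_SIZE  (1UL << DEMO_PAGE_SHIFT)
#define DEMO_PAGE_MASK  (~(DEMO_PAGE_SIZE - 1))

/* Byte offset within the page, as in: (unsigned long)ptr & ~PAGE_MASK */
static unsigned long demo_page_offset(unsigned long addr)
{
	return addr & ~DEMO_PAGE_MASK;
}

/* Page frame number the address falls in (what virt_to_page
 * ultimately indexes mem_map by). */
static unsigned long demo_page_index(unsigned long addr)
{
	return addr >> DEMO_PAGE_SHIFT;
}
```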
The direction argument is the same as for pci_{map,unmap}_single().
The same rules for cpu/device access apply here as for the streaming
mapping interfaces.  To reiterate:

	The cpu may touch the buffer before pci_dac_page_to_dma.
	The device may touch the buffer after pci_dac_page_to_dma
	is made, but the cpu may NOT.

When the DMA transfer is complete, invoke:

	void pci_dac_dma_sync_single(struct pci_dev *pdev,
				     dma64_addr_t dma_addr,
				     size_t len, int direction);

This must be done before the CPU looks at the buffer again.
This interface behaves identically to pci_dma_sync_{single,sg}().

If you need to get back to the PAGE/OFFSET tuple from a dma64_addr_t
the following interfaces are provided:

	struct page *pci_dac_dma_to_page(struct pci_dev *pdev,
					 dma64_addr_t dma_addr);
	unsigned long pci_dac_dma_to_offset(struct pci_dev *pdev,
					    dma64_addr_t dma_addr);

This is possible with the DAC interfaces purely because they are
not translated in any way.

		Optimizing Unmap State Space Consumption

On many platforms, pci_unmap_{single,page}() is simply a nop.
Therefore, keeping track of the mapping address and length is a waste
of space.  Instead of filling your drivers up with ifdefs and the like
to "work around" this (which would defeat the whole purpose of a
portable API) the following facilities are provided.

Actually, instead of describing the macros one by one, we'll
transform some example code.

1) Use DECLARE_PCI_UNMAP_{ADDR,LEN} in state saving structures.
   Example, before:

	struct ring_state {
		struct sk_buff *skb;
		dma_addr_t mapping;
		__u32 len;
	};

   after:

	struct ring_state {
		struct sk_buff *skb;
		DECLARE_PCI_UNMAP_ADDR(mapping)
		DECLARE_PCI_UNMAP_LEN(len)
	};

   NOTE: DO NOT put a semicolon at the end of the DECLARE_*()
	 macro.

2) Use pci_unmap_{addr,len}_set to set these values.
   Example, before:

	ringp->mapping = FOO;
	ringp->len = BAR;

   after:

	pci_unmap_addr_set(ringp, mapping, FOO);
	pci_unmap_len_set(ringp, len, BAR);

3) Use pci_unmap_{addr,len} to access these values.
   Example, before:

	pci_unmap_single(pdev, ringp->mapping, ringp->len,
			 PCI_DMA_FROMDEVICE);

   after:

	pci_unmap_single(pdev,
			 pci_unmap_addr(ringp, mapping),
			 pci_unmap_len(ringp, len),
			 PCI_DMA_FROMDEVICE);

It really should be self-explanatory.  We treat the ADDR and LEN
separately, because it is possible for an implementation to only
need the address in order to perform the unmap operation.

			Platform Issues

If you are just writing drivers for Linux and do not maintain
an architecture port for the kernel, you can safely skip down
to "Closing".

1) Struct scatterlist requirements.

   Struct scatterlist must contain, at a minimum, the following
   members:

	char *address;
	struct page *page;
	unsigned int offset;
	unsigned int length;

   The "address" member will disappear in 2.5.x.

   This means that your pci_{map,unmap}_sg() and all other
   interfaces dealing with scatterlists must be able to cope
   properly with page being non-NULL.

   A scatterlist is in one of two states.  The base address is
   either specified by "address" or by a "page+offset" pair.
   If "address" is NULL, then "page+offset" is being used.
   If "page" is NULL, then "address" is being used.

   In 2.5.x, all scatterlists will use "page+offset".  But during
   2.4.x we still have to support the old method.

2) More to come...

			   Closing

This document, and the API itself, would not be in its current form
without the feedback and suggestions from numerous individuals.  We
would like to specifically mention, in no particular order, the
following people:

	Russell King <rmk@arm.linux.org.uk>
	Leo Dagum <dagum@barrel.engr.sgi.com>
	Ralf Baechle <ralf@oss.sgi.com>
	Grant Grundler <grundler@cup.hp.com>
	Jay Estabrook <Jay.Estabrook@compaq.com>
	Thomas Sailer <sailer@ife.ee.ethz.ch>
	Andrea Arcangeli <andrea@suse.de>
	Jens Axboe <axboe@suse.de>
	David Mosberger-Tang <davidm@hpl.hp.com>