                        Dynamic DMA mapping
                        ===================

                 David S. Miller <davem@redhat.com>
                 Richard Henderson <rth@cygnus.com>
                  Jakub Jelinek <jakub@redhat.com>

Most of the 64bit platforms have special hardware that translates bus
addresses (DMA addresses) into physical addresses.  This is similar to
how page tables and/or a TLB translate virtual addresses to physical
addresses on a cpu.  This is needed so that e.g. PCI devices can
access with a Single Address Cycle (32bit DMA address) any page in the
64bit physical address space.  Previously in Linux those 64bit
platforms had to set artificial limits on the maximum RAM size in the
system, so that the virt_to_bus() static scheme works (the DMA address
translation tables were simply filled on bootup to map each bus
address to the physical page __pa(bus_to_virt())).

So that Linux can use the dynamic DMA mapping, it needs some help from the
drivers, namely it has to take into account that DMA addresses should be
mapped only for the time they are actually used and unmapped after the DMA
transfer.

The following API will work of course even on platforms where no such
hardware exists, see e.g. include/asm-i386/pci.h for how it is implemented
on top of the virt_to_bus interface.

First of all, you should make sure

        #include <linux/pci.h>

is in your driver.  This file will obtain for you the definition of the
dma_addr_t type (which can hold any valid DMA address for the platform),
which should be used everywhere you hold a DMA (bus) address returned
from the DMA mapping functions.

                        What memory is DMA'able?

The first piece of information you must know is what kernel memory can
be used with the DMA mapping facilities.  There has been an unwritten
set of rules regarding this, and this text is an attempt to finally
write them down.

If you acquired your memory via the page allocator
(i.e. __get_free_page*()) or the generic memory allocators
(i.e. kmalloc() or kmem_cache_alloc()) then you may DMA to/from
that memory using the addresses returned from those routines.

This means specifically that you may _not_ use the memory/addresses
returned from vmalloc() for DMA.  It is possible to DMA to the
_underlying_ memory mapped into a vmalloc() area, but this requires
walking page tables to get the physical addresses, and then
translating each of those pages back to a kernel address using
something like __va().  [ EDIT: Update this when we integrate
Gerd Knorr's generic code which does this. ]

This rule also means that you may not use kernel image addresses
(ie. items in the kernel's data/text/bss segments, or your driver's)
nor may you use kernel stack addresses for DMA.  Both of these items
might be mapped somewhere entirely different than the rest of physical
memory.

Also, this means that you cannot take the return of a kmap()
call and DMA to/from that.  This is similar to vmalloc().

What about block I/O and networking buffers?  The block I/O and
networking subsystems make sure that the buffers they use are valid
for you to DMA from/to.
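To make these rules concrete, here is a minimal, hypothetical sketch
(the function and buffer names are invented for illustration) contrasting
memory that may be handed to the DMA mapping functions with memory that
may not:

        #include <linux/slab.h>
        #include <linux/vmalloc.h>

        static char driver_buf[4096];   /* kernel image (bss) address: not DMA'able */

        void dma_memory_rules_example(void)
        {
                char stack_buf[64];     /* kernel stack address: not DMA'able */
                void *ok, *bad;

                ok = kmalloc(4096, GFP_KERNEL); /* page/slab memory: DMA'able */
                bad = vmalloc(4096);            /* only virtually contiguous: not DMA'able */

                /* Only "ok" may be used with the DMA mapping functions
                 * described below; "bad", driver_buf and stack_buf may not.
                 */

                vfree(bad);
                kfree(ok);
        }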
                        DMA addressing limitations

Does your device have any DMA addressing limitations?  For example, is
your device only capable of driving the low order 24-bits of address
on the PCI bus for SAC DMA transfers?  If so, you need to inform the
PCI layer of this fact.

By default, the kernel assumes that your device can address the full
32-bits in a SAC cycle.  For a 64-bit DAC capable device, this needs
to be increased.  And for a device with limitations, as discussed in
the previous paragraph, it needs to be decreased.

For correct operation, you must interrogate the PCI layer in your
device probe routine to see if the PCI controller on the machine can
properly support the DMA addressing limitation your device has.  It is
good style to do this even if your device holds the default setting,
because this shows that you did think about these issues wrt. your
device.

The query is performed via a call to pci_set_dma_mask():

        int pci_set_dma_mask(struct pci_dev *pdev, u64 device_mask);

Here, pdev is a pointer to the PCI device struct of your device, and
device_mask is a bit mask describing which bits of a PCI address your
device supports.  It returns zero if your card can perform DMA
properly on the machine given the address mask you provided.

If it returns non-zero, your device cannot perform DMA properly on
this platform, and attempting to do so will result in undefined
behavior.  You must either use a different mask, or not use DMA.

This means that in the failure case, you have three options:

1) Use another DMA mask, if possible (see below).
2) Use some non-DMA mode for data transfer, if possible.
3) Ignore this device and do not initialize it.

It is recommended that your driver print a KERN_WARNING message
when you end up performing either #2 or #3.  In this manner, if a user
of your driver reports that performance is bad or that the device is not
even detected, you can ask them for the kernel messages to find out
exactly why.

The standard 32-bit addressing PCI device would do something like
this:

        if (pci_set_dma_mask(pdev, 0xffffffff)) {
                printk(KERN_WARNING
                       "mydev: No suitable DMA available.\n");
                goto ignore_this_device;
        }

Another common scenario is a 64-bit capable device.  The approach
here is to try for 64-bit DAC addressing, but back down to a
32-bit mask should that fail.  The PCI platform code may fail the
64-bit mask not because the platform is not capable of 64-bit
addressing.  Rather, it may fail in this case simply because
32-bit SAC addressing is done more efficiently than DAC addressing.
Sparc64 is one platform which behaves in this way.

Here is how you would handle a 64-bit capable device which can drive
all 64-bits during a DAC cycle:

        int using_dac;

        if (!pci_set_dma_mask(pdev, 0xffffffffffffffff)) {
                using_dac = 1;
        } else if (!pci_set_dma_mask(pdev, 0xffffffff)) {
                using_dac = 0;
        } else {
                printk(KERN_WARNING
                       "mydev: No suitable DMA available.\n");
                goto ignore_this_device;
        }

If your 64-bit device is going to be an enormous consumer of DMA
mappings, this can be problematic since the DMA mappings are a
finite resource on many platforms.  Please see the "DAC Addressing
for Address Space Hungry Devices" section near the end of this
document for how to handle this case.

Finally, if your device can only drive the low 24-bits of
address during PCI bus mastering you might do something like:

        if (pci_set_dma_mask(pdev, 0x00ffffff)) {
                printk(KERN_WARNING
                       "mydev: 24-bit DMA addressing not available.\n");
                goto ignore_this_device;
        }

When pci_set_dma_mask() is successful, and returns zero, the PCI layer
saves away this mask you have provided.  The PCI layer will use this
information later when you make DMA mappings.
There is a case which we are aware of at this time, which is worth
mentioning in this documentation.  If your device supports multiple
functions (for example a sound card provides playback and record
functions) and the various different functions have _different_
DMA addressing limitations, you may wish to probe each mask and
only provide the functionality which the machine can handle.  It
is important that the last call to pci_set_dma_mask() be for the
most specific mask.

Here is pseudo-code showing how this might be done:

        #define PLAYBACK_ADDRESS_BITS   0xffffffff
        #define RECORD_ADDRESS_BITS     0x00ffffff

        struct my_sound_card *card;
        struct pci_dev *pdev;

        ...
        if (!pci_set_dma_mask(pdev, PLAYBACK_ADDRESS_BITS)) {
                card->playback_enabled = 1;
        } else {
                card->playback_enabled = 0;
                printk(KERN_WARNING "%s: Playback disabled due to DMA limitations.\n",
                       card->name);
        }
        if (!pci_set_dma_mask(pdev, RECORD_ADDRESS_BITS)) {
                card->record_enabled = 1;
        } else {
                card->record_enabled = 0;
                printk(KERN_WARNING "%s: Record disabled due to DMA limitations.\n",
                       card->name);
        }

A sound card was used as an example here because this genre of PCI
devices seems to be littered with ISA chips given a PCI front end,
and thus retaining the 16MB DMA addressing limitations of ISA.

                        Types of DMA mappings

There are two types of DMA mappings:

- Consistent DMA mappings which are usually mapped at driver
  initialization, unmapped at the end and for which the hardware should
  guarantee that the device and the cpu can access the data
  in parallel and will see updates made by each other without any
  explicit software flushing.

  Think of "consistent" as "synchronous" or "coherent".

  Consistent DMA mappings are always SAC addressable.  That is
  to say, consistent DMA addresses given to the driver will always
  be in the low 32-bits of the PCI bus space.

  Good examples of what to use consistent mappings for are:

        - Network card DMA ring descriptors.
        - SCSI adapter mailbox command data structures.
        - Device firmware microcode executed out of
          main memory.

  The invariant these examples all require is that any cpu store
  to memory is immediately visible to the device, and vice
  versa.  Consistent mappings guarantee this.

  IMPORTANT: Consistent DMA memory does not preclude the usage of
             proper memory barriers.  The cpu may reorder stores to
             consistent memory just as it may normal memory.  Example:
             if it is important for the device to see the first word
             of a descriptor updated before the second, you must do
             something like:

                desc->word0 = address;
                wmb();
                desc->word1 = DESC_VALID;

             in order to get correct behavior on all platforms.
             (A fuller sketch of this pattern appears below, following
             this list.)

- Streaming DMA mappings which are usually mapped for one DMA transfer,
  unmapped right after it (unless you use pci_dma_sync below) and for which
  hardware can optimize for sequential accesses.

  Think of "streaming" as "asynchronous" or "outside the coherency
  domain".

  Good examples of what to use streaming mappings for are:

        - Networking buffers transmitted/received by a device.
        - Filesystem buffers written/read by a SCSI device.

  The interfaces for using this type of mapping were designed in
  such a way that an implementation can make whatever performance
  optimizations the hardware allows.  To this end, when using
  such mappings you must be explicit about what you want to happen.

Neither type of DMA mapping has alignment restrictions that come
from PCI, although some devices may have such restrictions.
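Expanding the descriptor snippet above into a slightly fuller,
hypothetical sketch (the descriptor layout, the DESC_VALID value and the
function name are invented for illustration; it assumes the descriptor
lives in a consistent DMA mapping and that wmb() is available, e.g. via
<asm/system.h> on kernels of this vintage):

        #define DESC_VALID      0x80000000

        struct mydev_desc {             /* hypothetical two-word descriptor */
                u32 word0;              /* DMA address of the data buffer */
                u32 word1;              /* flags, including DESC_VALID */
        };

        static void mydev_post_desc(struct mydev_desc *desc, dma_addr_t buf_dma)
        {
                /* Consistent DMA addresses are SAC (32-bit), as noted above. */
                desc->word0 = (u32) buf_dma;

                /* Ensure the device sees word0 before it sees DESC_VALID. */
                wmb();

                desc->word1 = DESC_VALID;
        }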
                     Using Consistent DMA mappings

To allocate and map large (PAGE_SIZE or so) consistent DMA regions,
you should do:

        dma_addr_t dma_handle;

        cpu_addr = pci_alloc_consistent(dev, size, &dma_handle);

where dev is a struct pci_dev *.  You should pass NULL for PCI-like buses
where devices don't have struct pci_dev (like ISA, EISA).  This may be
called in interrupt context.

This argument is needed because the DMA translations may be bus
specific (and often are private to the bus which the device is attached
to).

Size is the length of the region you want to allocate, in bytes.

This routine will allocate RAM for that region, so it acts similarly to
__get_free_pages (but takes size instead of a page order).  If your
driver needs regions sized smaller than a page, you may prefer using
the pci_pool interface, described below.

The consistent DMA mapping interfaces, for non-NULL dev, will always
return a DMA address which is SAC (Single Address Cycle) addressable.
Even if the device indicates (via PCI dma mask) that it may address
the upper 32-bits and thus perform DAC cycles, consistent allocation
will still only return 32-bit PCI addresses for DMA.  This is true
of the pci_pool interface as well.

In fact, as mentioned above, all consistent memory provided by the
kernel DMA APIs is always SAC addressable.

pci_alloc_consistent returns two values: the virtual address which you
can use to access it from the CPU and dma_handle which you pass to the
card.

The cpu return address and the DMA bus master address are both
guaranteed to be aligned to the smallest PAGE_SIZE order which
is greater than or equal to the requested size.  This invariant
exists (for example) to guarantee that if you allocate a chunk
which is smaller than or equal to 64 kilobytes, the extent of the
buffer you receive will not cross a 64K boundary.

To unmap and free such a DMA region, you call:

        pci_free_consistent(dev, size, cpu_addr, dma_handle);

where dev, size are the same as in the above call and cpu_addr and
dma_handle are the values pci_alloc_consistent returned to you.
This function may not be called in interrupt context.

If your driver needs lots of smaller memory regions, you can write
custom code to subdivide pages returned by pci_alloc_consistent,
or you can use the pci_pool API to do that.  A pci_pool is like
a kmem_cache, but it uses pci_alloc_consistent not __get_free_pages.
Also, it understands common hardware constraints for alignment,
like queue heads needing to be aligned on N byte boundaries.

Create a pci_pool like this:

        struct pci_pool *pool;

        pool = pci_pool_create(name, dev, size, align, alloc, flags);

The "name" is for diagnostics (like a kmem_cache name); dev and size
are as above.  The device's hardware alignment requirement for this
type of data is "align" (which is expressed in bytes, and must be a
power of two).  The flags are SLAB_ flags as you'd pass to
kmem_cache_create.  Not all flags are understood, but SLAB_POISON may
help you find driver bugs.  If you call this in a non-sleeping
context (e.g. in_interrupt is true or while holding SMP locks), pass
SLAB_ATOMIC.  If your device has no boundary crossing restrictions,
pass 0 for alloc; passing 4096 says memory allocated from this pool
must not cross 4KByte boundaries (but at that time it may be better to
go for pci_alloc_consistent directly instead).

Allocate memory from a pci pool like this:

        cpu_addr = pci_pool_alloc(pool, flags, &dma_handle);

flags are SLAB_KERNEL if blocking is permitted (not in_interrupt nor
holding SMP locks), SLAB_ATOMIC otherwise.  Like pci_alloc_consistent,
this returns two values, cpu_addr and dma_handle.

Free memory that was allocated from a pci_pool like this:

        pci_pool_free(pool, cpu_addr, dma_handle);

where pool is what you passed to pci_pool_alloc, and cpu_addr and
dma_handle are the values pci_pool_alloc returned.  This function
may be called in interrupt context.

Destroy a pci_pool by calling:

        pci_pool_destroy(pool);

Make sure you've called pci_pool_free for all memory allocated
from a pool before you destroy the pool.  This function may not
be called in interrupt context.
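Putting the pci_pool calls together, here is a minimal, hypothetical
sketch of the whole lifecycle (the pool name, object size, alignment and
the surrounding pdev variable are invented for illustration, and error
handling is abbreviated):

        struct pci_pool *cmd_pool;
        void *cmd;
        dma_addr_t cmd_dma;

        /* Probe time: a pool of 64-byte command blocks aligned to 16
         * bytes, with no boundary crossing restriction (alloc == 0) and
         * default flags.  Pass SLAB_ATOMIC for flags instead if this
         * ran in a non-sleeping context, as described above.
         */
        cmd_pool = pci_pool_create("mydev_cmds", pdev, 64, 16, 0, 0);
        if (cmd_pool == NULL)
                return -ENOMEM;

        /* Runtime: grab one command block; SLAB_KERNEL since we may block. */
        cmd = pci_pool_alloc(cmd_pool, SLAB_KERNEL, &cmd_dma);
        if (cmd == NULL)
                return -ENOMEM;

        /* ... fill in the command and hand cmd_dma to the device ... */

        /* Completion: return the block to the pool. */
        pci_pool_free(cmd_pool, cmd, cmd_dma);

        /* Remove time: destroy the pool once every block has been freed. */
        pci_pool_destroy(cmd_pool);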
                          DMA Direction

The interfaces described in subsequent portions of this document
take a DMA direction argument, which is an integer and takes on
one of the following values:

        PCI_DMA_BIDIRECTIONAL
        PCI_DMA_TODEVICE
        PCI_DMA_FROMDEVICE
        PCI_DMA_NONE

You should provide the exact DMA direction if you know it.

PCI_DMA_TODEVICE means "from main memory to the PCI device"
PCI_DMA_FROMDEVICE means "from the PCI device to main memory"
It is the direction in which the data moves during the DMA
transfer.

You are _strongly_ encouraged to specify this as precisely
as you possibly can.

If you absolutely cannot know the direction of the DMA transfer,
specify PCI_DMA_BIDIRECTIONAL.  It means that the DMA can go in
either direction.  The platform guarantees that you may legally
specify this, and that it will work, but this may be at the
cost of performance for example.

The value PCI_DMA_NONE is to be used for debugging.  You can
hold this in a data structure before you come to know the
precise direction, and this will help catch cases where your
direction tracking logic has failed to set things up properly.

Another advantage of specifying this value precisely (outside of
potential platform-specific optimizations of such) is for debugging.
Some platforms actually have a write permission boolean which DMA
mappings can be marked with, much like page protections in the user
program address space.  Such platforms can and do report errors in the
kernel log when the PCI controller hardware detects violation of the
permission setting.
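As a small, hypothetical illustration of the PCI_DMA_NONE debugging idea
above (the structure, field and function names are invented; a real
driver would pass the chosen direction to the streaming mapping calls
described later in this document):

        struct mydev_request {
                void *buf;
                size_t len;
                int dma_dir;            /* one of the PCI_DMA_* values */
        };

        static void mydev_init_request(struct mydev_request *req)
        {
                /* Direction not known yet: poison it so that a
                 * forgotten assignment is caught later.
                 */
                req->dma_dir = PCI_DMA_NONE;
        }

        static void mydev_prepare_write(struct mydev_request *req)
        {
                /* Data flows from main memory to the device. */
                req->dma_dir = PCI_DMA_TODEVICE;
        }

        static void mydev_prepare_read(struct mydev_request *req)
        {
                /* Data flows from the device to main memory. */
                req->dma_dir = PCI_DMA_FROMDEVICE;
        }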