The normal i/o submission interfaces, e.g. submit_bio, could be bypassed
for specially crafted requests which such ioctl or diagnostics
interfaces would typically use. The elevator add_request routine
can instead be used to directly insert such requests in the queue, or, preferably,
the blk_do_rq routine can be used to place the request on the queue and
wait for completion. Alternatively, the caller might sometimes just
invoke a lower level driver specific interface with the request as a
parameter.

If the request is a means for passing on special information associated with
the command, then such information is associated with the request->special
field (rather than misusing the request->buffer field, which is meant for the
request data buffer's virtual mapping).

For passing request data, the caller must build up a bio descriptor
representing the concerned memory buffer if the underlying driver interprets
bio segments or uses the block layer end*request* functions for i/o
completion. Alternatively, one could directly use the request->buffer field to
specify the virtual address of the buffer, if the driver expects buffer
addresses passed in this way and ignores bio entries for the request type
involved. In the latter case, the driver would modify and manage the
request->buffer, request->sector and request->nr_sectors or
request->current_nr_sectors fields itself rather than using the block layer
end_request or end_that_request_first completion interfaces.
(See 2.3 or Documentation/block/request.txt for a brief explanation of
the request structure fields)

[TBD: end_that_request_last should be usable even in this case;
Perhaps an end_that_direct_request_first routine could be implemented to make
handling direct requests easier for such drivers; also, for drivers that
expect bios, a helper function could be provided for setting up a bio
corresponding to a data buffer]

<JENS: I don't understand the above, why is end_that_request_first() not
usable? Or _last for that matter.
I must be missing something>

<SUP: What I meant here was that if the request doesn't have a bio, then
 end_that_request_first doesn't modify nr_sectors or current_nr_sectors,
 and hence can't be used for advancing request state settings on the
 completion of partial transfers. The driver has to modify these fields
 directly by hand.
 This is because end_that_request_first only iterates over the bio list,
 and always returns 0 if there are none associated with the request.
 _last works OK in this case, and is not a problem, as I mentioned earlier
>

1.3.1 Pre-built Commands

A request can be created with a pre-built custom command to be sent directly
to the device. The cmd block in the request structure has room for filling
in the command bytes. (i.e. rq->cmd is now 16 bytes in size, and meant for
command pre-building, and the type of the request is now indicated
through rq->flags instead of via rq->cmd)

The request structure flags can be set up to indicate the type of request
in such cases (REQ_PC: direct packet command passed to driver, REQ_BLOCK_PC:
packet command issued via blk_do_rq, REQ_SPECIAL: special request).

It can help to pre-build device commands for requests in advance.
Drivers can now specify a request prepare function (q->prep_rq_fn) that the
block layer would invoke to pre-build device commands for a given request,
or perform other preparatory processing for the request. This routine is
called by elv_next_request(), i.e. typically just before servicing a request.
(The prepare function would not be called for requests that have REQ_DONTPREP
enabled)

Aside:
  Pre-building could possibly even be done early, i.e. before placing the
  request on the queue, rather than constructing the command on the fly in the
  driver while servicing the request queue, when it may affect latencies in
  interrupt context or responsiveness in general. One way to add early
  pre-building would be to do it whenever we fail to merge on a request.
  Now REQ_NOMERGE is set in the request flags to skip this one in the future,
  which means that it will not change before we feed it to the device. So
  the pre-builder hook can be invoked there.


2. Flexible and generic but minimalist i/o structure/descriptor

2.1 Reason for a new structure and requirements addressed

Prior to 2.5, buffer heads were used as the unit of i/o at the generic block
layer, and the low level request structure was associated with a chain of
buffer heads for a contiguous i/o request. This led to certain inefficiencies
when it came to large i/o requests and readv/writev style operations, as it
forced such requests to be broken up into small chunks before being passed
on to the generic block layer, only to be merged by the i/o scheduler
when the underlying device was capable of handling the i/o in one shot.
Also, using the buffer head as an i/o structure for i/os that didn't originate
from the buffer cache unnecessarily added to the weight of the descriptors
which were generated for each such chunk.

The following were some of the goals and expectations considered in the
redesign of the block i/o data structure in 2.5.

i.   Should be appropriate as a descriptor for both raw and buffered i/o -
     avoid cache related fields which are irrelevant in the direct/page i/o
     path, or filesystem block size alignment restrictions which may not be
     relevant for raw i/o.
ii.  Ability to represent high-memory buffers (which do not have a virtual
     address mapping in kernel address space).
iii. Ability to represent large i/os w/o unnecessarily breaking them up (i.e.
     greater than PAGE_SIZE chunks in one shot).
iv.  At the same time, ability to retain independent identity of i/os from
     different sources or i/o units requiring individual completion (e.g. for
     latency reasons).
v.   Ability to represent an i/o involving multiple physical memory segments
     (including non-page aligned page fragments, as specified via readv/writev)
     without unnecessarily breaking it up, if the underlying device is capable
     of handling it.
vi.  Preferably should be based on a memory descriptor structure that can be
     passed around different types of subsystems or layers, maybe even
     networking, without duplication or extra copies of data/descriptor fields
     themselves in the process.
vii. Ability to handle the possibility of splits/merges as the structure passes
     through layered drivers (lvm, md, evms), with minimal overhead.

The solution was to define a new structure (bio) for the block layer,
instead of using the buffer head structure (bh) directly, the idea being
avoidance of some associated baggage and limitations. The bio structure
is uniformly used for all i/o at the block layer; it forms a part of the
bh structure for buffered i/o, and in the case of raw/direct i/o kiobufs are
mapped to bio structures.

2.2 The bio struct

The bio structure uses a vector representation pointing to an array of tuples
of <page, offset, len> to describe the i/o buffer, and has various other
fields describing i/o parameters and state that needs to be maintained for
performing the i/o.

Notice that this representation means that a bio has no virtual address
mapping at all (unlike buffer heads).

struct bio_vec {
       struct page     *bv_page;
       unsigned short  bv_len;
       unsigned short  bv_offset;
};

/*
 * main unit of I/O for the block layer and lower layers (ie drivers)
 */
struct bio {
       sector_t            bi_sector;
       struct bio          *bi_next;    /* request queue link */
       struct block_device *bi_bdev;    /* target device */
       unsigned long       bi_flags;    /* status, command, etc */
       unsigned long       bi_rw;       /* low bits: r/w, high: priority */

       unsigned int        bi_vcnt;     /* how many bio_vecs */
       unsigned int        bi_idx;      /* current index into bio_vec array */

       unsigned int        bi_size;     /* total size in bytes */
       unsigned short      bi_phys_segments; /* segments after physaddr coalesce */
       unsigned short      bi_hw_segments;   /* segments after DMA remapping */
       unsigned int        bi_max;      /* max bio_vecs we can hold,
                                           used as index into pool */
       struct bio_vec      *bi_io_vec;  /* the actual vec list */
       bio_end_io_t        *bi_end_io;  /* bi_end_io (bio) */
       atomic_t            bi_cnt;      /* pin count: free when it hits zero */
       void                *bi_private;
       bio_destructor_t    *bi_destructor; /* bi_destructor (bio) */
};

With this multipage bio design:

- Large i/os can be sent down in one go using a bio_vec list consisting
  of an array of <page, offset, len> fragments (similar to the way fragments
  are represented in the zero-copy network code)
- Splitting of an i/o request across multiple devices (as in the case of
  lvm or raid) is achieved by cloning the bio (where the clone points to
  the same bi_io_vec array, but with the index and size accordingly modified)
- A linked list of bios is used as before for unrelated merges (*) - this
  avoids reallocs and makes independent completions easier to handle.
- Code that traverses the req list can find all the segments of a bio
  by using rq_for_each_segment. This handles the fact that a request
  has multiple bios, each of which can have multiple segments.
- Drivers which can't process a large bio in one shot can use the bi_idx
  field to keep track of the next bio_vec entry to process.
  (e.g. a 1MB bio_vec needs to be handled in chunks of at most 128kB for IDE)
  [TBD: Should preferably also have a bi_voffset and bi_vlen to avoid
   modifying the bv_offset and bv_len fields]

(*) unrelated merges -- a request ends up containing two or more bios that
    didn't originate from the same place.

The bi_end_io() i/o callback gets called on i/o completion of the entire bio.

At a lower level, drivers build a scatter gather list from the merged bios.
The scatter gather list is in the form of an array of <page, offset, len>
entries with their corresponding dma address mappings filled in at the
appropriate time. As an optimization, contiguous physical pages can be
covered by a single entry where <page> refers to the first page and <len>
covers the range of pages (up to 16 contiguous pages could be covered this
way). There is a helper routine (blk_rq_map_sg) which drivers can use to build
the sg list.

Note: Right now the only user of bios with more than one page is ll_rw_kio,
which in turn means that only raw I/O uses it (direct i/o may not work
right now). The intent, however, is to make clustering of pages etc.
possible. The pagebuf abstraction layer from SGI also uses multi-page
bios, but that is currently not included in the stock development kernels.
The same is true of Andrew Morton's work-in-progress multipage bio writeout
and readahead patches.

2.3 Changes in the Request Structure

The request structure is the structure that gets passed down to low level
drivers. The block layer make_request function builds up a request structure,
places it on the queue and invokes the driver's request_fn. The driver makes
use of the block layer helper routine elv_next_request to pull the next
request off the queue.
Control or diagnostic functions might bypass the block layer and directly
invoke underlying driver entry points, passing in a specially constructed
request structure.

Only some relevant fields (mainly those which changed or may be referred
to in some of the discussion here) are listed below, not necessarily in
the order in which they occur in the structure (see include/linux/blkdev.h).
Refer to Documentation/block/request.txt for details about all the request
structure fields and a quick reference about the layers which are
supposed to use or modify those fields.

struct request {
	struct list_head queuelist;  /* Not meant to be directly accessed by
					the driver.
					Used by q->elv_next_request_fn
					rq->queue is gone
					*/
	.
	.
	unsigned char cmd[16]; /* prebuilt command data block */
	unsigned long flags;   /* also includes earlier rq->cmd settings */
	.
	.
	sector_t sector; /* this field is now of type sector_t instead of int,
			    in preparation for 64 bit sectors */
	.
	.

	/* Number of scatter-gather DMA addr+len pairs after
	 * physical address coalescing is performed.
	 */
	unsigned short nr_phys_segments;

	/* Number of scatter-gather addr+len pairs after
	 * physical and DMA remapping hardware coalescing is performed.
	 * This is the number of scatter-gather entries the driver
	 * will actually have to deal with after DMA mapping is done.
	 */
	unsigned short nr_hw_segments;

	/* Various sector counts */
	unsigned long nr_sectors;  /* no. of sectors left: driver modifiable */
	unsigned long hard_nr_sectors;  /* block internal copy of above */
	unsigned int current_nr_sectors; /* no. of sectors left in the
					   current segment: driver modifiable */
	unsigned long hard_cur_sectors; /* block internal copy of the above */
	.
	.
	int tag;	/* command tag associated with request */
	void *special;  /* same as before */
	char *buffer;   /* valid only for low memory buffers up to
			 current_nr_sectors */
	.
	.
	struct bio *bio, *biotail;  /* bio list instead of bh */
	struct request_list *rl;
};

See the rq_flag_bits definitions for an explanation of the various flags
available. Some bits are used by the block layer or i/o scheduler.

The behaviour of the various sector counts is almost the same as before,
except that since we have multi-segment bios, current_nr_sectors refers
to the number of sectors in the current segment being processed, which could
be one of the many segments in the current bio (i.e. i/o completion unit).
The nr_sectors value refers to the total number of sectors in the whole
request that remain to be transferred (no change). The purpose of the
hard_xxx values is for block to remember these counts every time it hands
over the request to the driver. These values are updated by block on
end_that_request_first, i.e. every time the driver completes a part of the
transfer and invokes block end*request helpers to mark this. The
driver should not modify these values. The block layer sets up the
nr_sectors and current_nr_sectors fields (based on the corresponding
hard_xxx values and the number of bytes transferred) and updates them on
every transfer that invokes end_that_request_first. It does the same for the
buffer, bio and bio->bi_idx fields too.

The buffer field is just a virtual address mapping of the current segment
of the i/o buffer in cases where the buffer resides in low memory. For high
memory i/o, this field is not valid and must not be used by drivers.

Code that sets up its own request structures and passes them down to
a driver needs to be careful about interoperation with the block layer helper
functions which the driver uses. (Section 1.3)

3. Using bios

3.1 Setup/Teardown

There are routines for managing the allocation, reference counting and
freeing of bios (bio_alloc, bio_get, bio_put).
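The pin-count semantics of bi_cnt ("free when it hits zero", with bi_destructor doing the freeing) can be illustrated with a small user-space model. This is a hedged sketch, not the kernel's implementation. struct model_bio and the model_bio_alloc/model_bio_get/model_bio_put functions are hypothetical stand-ins for bio_alloc/bio_get/bio_put, and C11 atomics stand in for atomic_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for struct bio: only the fields
 * needed to show reference counting are modelled. */
struct model_bio {
    atomic_int bi_cnt;                            /* pin count */
    void (*bi_destructor)(struct model_bio *bio); /* called when count hits 0 */
};

static int model_bios_freed; /* counts destructor calls, for illustration */

static void model_bio_destructor(struct model_bio *bio)
{
    model_bios_freed++;
    free(bio);
}

/* Allocation returns a bio holding one reference (the caller's). */
static struct model_bio *model_bio_alloc(void)
{
    struct model_bio *bio = malloc(sizeof(*bio));
    if (!bio)
        return NULL;
    atomic_init(&bio->bi_cnt, 1);
    bio->bi_destructor = model_bio_destructor;
    return bio;
}

/* Take an extra reference, e.g. before handing the bio to another layer. */
static void model_bio_get(struct model_bio *bio)
{
    atomic_fetch_add(&bio->bi_cnt, 1);
}

/* Drop a reference; the put that drops the count to zero runs the destructor. */
static void model_bio_put(struct model_bio *bio)
{
    int old = atomic_fetch_sub(&bio->bi_cnt, 1);
    assert(old > 0); /* a put without a matching reference is a bug */
    if (old == 1)
        bio->bi_destructor(bio);
}
```

In this model, every holder that keeps a pointer to the bio takes its own reference. The bio is therefore released only after the last holder has dropped its reference, whichever holder that turns out to be.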
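The following hedged user-space model shows how the hard_xxx copies relate to the driver-visible sector counts described in section 2.3. It is not the block layer's code. struct model_request and the model_* helpers are hypothetical. Bios are flattened into a single array of per-segment sector counts, and the buffer, bio and bi_idx updates are omitted.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_SEGS 8

/* Hypothetical model of the request sector counters; segments of all
 * bios are flattened into one array of per-segment sector counts. */
struct model_request {
    unsigned int seg_sectors[MODEL_MAX_SEGS];
    size_t nr_segs;
    size_t cur_seg;                  /* segment currently being transferred */

    unsigned long hard_nr_sectors;   /* block-internal copies */
    unsigned long hard_cur_sectors;
    unsigned long nr_sectors;        /* driver-visible counts */
    unsigned int current_nr_sectors;
};

/* Copy the block-internal counts into the driver-visible fields. */
static void model_sync_counts(struct model_request *rq)
{
    rq->nr_sectors = rq->hard_nr_sectors;
    rq->current_nr_sectors = (unsigned int)rq->hard_cur_sectors;
}

static void model_request_init(struct model_request *rq,
                               const unsigned int *segs, size_t nr_segs)
{
    assert(nr_segs > 0 && nr_segs <= MODEL_MAX_SEGS);
    rq->nr_segs = nr_segs;
    rq->cur_seg = 0;
    rq->hard_nr_sectors = 0;
    for (size_t i = 0; i < nr_segs; i++) {
        rq->seg_sectors[i] = segs[i];
        rq->hard_nr_sectors += segs[i];
    }
    rq->hard_cur_sectors = segs[0];
    model_sync_counts(rq);
}

/* Account for 'done' transferred sectors, possibly spanning several
 * segments, loosely in the spirit of end_that_request_first.
 * Returns the number of sectors still left in the request. */
static unsigned long model_complete_sectors(struct model_request *rq,
                                            unsigned long done)
{
    assert(done <= rq->hard_nr_sectors);
    rq->hard_nr_sectors -= done;
    while (done > 0) {
        unsigned long step =
            done < rq->hard_cur_sectors ? done : rq->hard_cur_sectors;
        rq->hard_cur_sectors -= step;
        done -= step;
        /* current segment finished: advance to the next one, if any */
        if (rq->hard_cur_sectors == 0 && rq->cur_seg + 1 < rq->nr_segs) {
            rq->cur_seg++;
            rq->hard_cur_sectors = rq->seg_sectors[rq->cur_seg];
        }
    }
    model_sync_counts(rq);
    return rq->nr_sectors;
}
```

For example, with segments of 8, 4 and 16 sectors, completing 3 sectors leaves nr_sectors at 25 and current_nr_sectors at 5. Completing 5 more moves on to the second segment, with current_nr_sectors at 4.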
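The scatter gather coalescing rule from section 2.2 can be sketched as follows: physically contiguous <page, offset, len> entries are folded into one entry whose <page> is the first page. This is an illustrative user-space model under stated assumptions, not blk_rq_map_sg itself. struct model_vec and model_map_sg are hypothetical. A page frame number stands in for struct page, and a 4096-byte page size is assumed. The 16-page cap from the text stands in for whatever limits the real helper applies. DMA address mapping is not modelled.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE    4096UL /* assumed page size */
#define MODEL_MAX_SG_PAGES 16UL   /* cap taken from the text: up to 16 pages */

/* Hypothetical <page, offset, len> tuple; a page frame number stands in
 * for struct page so that physical contiguity can be computed. */
struct model_vec {
    unsigned long pfn;
    unsigned long offset;
    unsigned long len;
};

static unsigned long model_phys(const struct model_vec *v)
{
    return v->pfn * MODEL_PAGE_SIZE + v->offset;
}

/* Build an sg list from vecs. An entry is merged into the previous one
 * when it starts exactly where the previous one ends and the merged
 * length stays within 16 pages' worth of bytes. Returns the entry count. */
static size_t model_map_sg(const struct model_vec *vecs, size_t n,
                           struct model_vec *sg)
{
    size_t nsg = 0;
    for (size_t i = 0; i < n; i++) {
        if (nsg > 0) {
            struct model_vec *prev = &sg[nsg - 1];
            int contiguous =
                model_phys(prev) + prev->len == model_phys(&vecs[i]);
            int fits = prev->len + vecs[i].len <=
                       MODEL_MAX_SG_PAGES * MODEL_PAGE_SIZE;
            if (contiguous && fits) {
                prev->len += vecs[i].len; /* <page> stays the first page */
                continue;
            }
        }
        sg[nsg++] = vecs[i];
    }
    return nsg;
}
```

Merging only looks at the previous output entry, so the sg list is built in a single pass over the vecs. The caller must provide an output array at least as long as the input, because in the worst case nothing merges.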
