... this operation can block (GFP_KERNEL allocation), so it must drop the inode_lock spinlock which guards the hashtable. Since it dropped the spinlock, it must retry searching for the inode in the hashtable; if it is found this time, it returns the one found in the hashtable (after incrementing its reference count by __iget) and destroys the newly allocated one. If it is still not found in the hashtable, then the new inode we have just allocated is the one to be used, so it is initialised to the required values and the fs-specific sb->s_op->read_inode() method is invoked to populate the rest of the inode. This brings us from the inode cache back to the filesystem code - remember that we came to the inode cache when the filesystem-specific lookup() method invoked iget(). While the s_op->read_inode() method is reading the inode from disk, the inode is locked (i_state = I_LOCK); after it returns, the inode is unlocked and all the waiters for it are woken up.

Now, let's see what happens when we close this file descriptor. The close(2) system call is implemented in the fs/open.c:sys_close() function, which calls do_close(fd, 1), which rips the descriptor out of the process' file descriptor table (replaces it with NULL) and invokes the filp_close() function, which does most of the work. The interesting things happen in fput(), which checks if this was the last reference to the file and if so calls fs/file_table.c:_fput(), which calls __fput(), which is where the interaction with dcache (and therefore with the inode cache - remember dcache is a Master of inode cache!) happens. The fs/dcache.c:dput() does dentry_iput(), which brings us back to the inode cache via iput(inode), so let us understand fs/inode.c:iput(inode):

1. If the parameter passed to us is NULL, we do absolutely nothing and return.
2. If there is an fs-specific sb->s_op->put_inode() method, it is invoked now with no spinlocks held (so it can block).
3. The inode_lock spinlock is taken and i_count is decremented. If this was NOT the last reference to this inode, then we simply check whether there are so many references to it that i_count could wrap around the 32 bits allocated to it; if so, we print a warning and return. Note that we call printk() while holding the inode_lock spinlock - this is fine because printk() can never block, so it may be called in absolutely any context (even from interrupt handlers!).
4. If this was the last active reference, then some work needs to be done.

The work performed by iput() on the last inode reference is rather complex, so we separate it into a list of its own:

1. If i_nlink == 0 (e.g. the file was unlinked while we held it open), then the inode is removed from the hashtable and from its type list, and if there are any data pages held in the page cache for this inode, they are removed by means of truncate_all_inode_pages(&inode->i_data). Then the filesystem-specific s_op->delete_inode() method is invoked, which typically deletes the on-disk copy of the inode. If there is no s_op->delete_inode() method registered by the filesystem (e.g. ramfs), then we call clear_inode(inode), which invokes s_op->clear_inode() if registered; if the inode corresponds to a block device, the device's reference count is dropped by bdput(inode->i_bdev).
2. If i_nlink != 0, then we check if there are other inodes in the same hash bucket. If there are none and the inode is not dirty, we delete it from its type list and add it to the inode_unused list, incrementing inodes_stat.nr_unused. If there are inodes in the same hash bucket, then we delete it from the type list and add it to the inode_unused list. If this was an anonymous inode (NetApp .snapshot), then we delete it from the type list and clear/destroy it completely.

3.2 Filesystem Registration/Unregistration

The Linux kernel provides a mechanism for new filesystems to be written with minimum effort. The historical reasons for this are:

1. In a world where people still use non-Linux operating systems to protect their investment in legacy software, Linux had to provide interoperability by supporting a great multitude of different filesystems - most of which would not deserve to exist on their own but only for compatibility with existing non-Linux operating systems.
2. The interface for filesystem writers had to be very simple so that people could try to reverse engineer existing proprietary filesystems by writing read-only versions of them. Therefore the Linux VFS makes it very easy to implement read-only filesystems - 95% of the work is to finish them by adding full write support. As a concrete example, I wrote a read-only BFS filesystem for Linux in about 10 hours, but it took several weeks to complete it to have full write support (and even today some purists claim that it is not complete because "it doesn't have compactification support").
3. All Linux filesystems can be implemented as modules, so the VFS interface is exported.

Let us consider the steps required to implement a filesystem under Linux. The code implementing a filesystem can be either a dynamically loadable module or statically linked into the kernel, and the way it is done under Linux is very transparent. All that is needed is to fill in a 'struct file_system_type' structure and register it with the VFS using the register_filesystem() function, as in the following example from fs/bfs/inode.c:

    #include <linux/module.h>
    #include <linux/init.h>

    static struct super_block *bfs_read_super(struct super_block *, void *, int);

    static DECLARE_FSTYPE_DEV(bfs_fs_type, "bfs", bfs_read_super);

    static int __init init_bfs_fs(void)
    {
            return register_filesystem(&bfs_fs_type);
    }

    static void __exit exit_bfs_fs(void)
    {
            unregister_filesystem(&bfs_fs_type);
    }

    module_init(init_bfs_fs)
    module_exit(exit_bfs_fs)

The module_init()/module_exit() macros ensure that for modules the functions init_bfs_fs() and exit_bfs_fs() turn into init_module() and cleanup_module() respectively, and that for statically linked objects the exit_bfs_fs() code vanishes as it is unnecessary.

The 'struct file_system_type' is declared in include/linux/fs.h:

    struct file_system_type {
            const char *name;
            int fs_flags;
            struct super_block *(*read_super) (struct super_block *, void *, int);
            struct module *owner;
            struct vfsmount *kern_mnt; /* For kernel mount, if it's FS_SINGLE fs */
            struct file_system_type * next;
    };

The fields thereof are explained thus:

· name - human readable name; it appears in the /proc/filesystems file and is used as a key to find a filesystem by name (the type argument of mount(2)) and to refuse to register a different filesystem under the name of one already registered - so there can (obviously) be only one filesystem with a given name. For modules, name points into the module's address space and is not copied - this means cat /proc/filesystems can oops if the module was unloaded but the filesystem is still ...
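
To make the role of the name field as a registration key more concrete, here is a minimal user-space sketch of a name-keyed registry kept on a singly linked list. It is not the kernel's code: the names toy_fs_type, toy_register_fs(), toy_unregister_fs(), toy_find_fs() and toy_find_link() are hypothetical, the -1 return value is a placeholder rather than a real kernel error code, and no locking is modelled.

```c
/* User-space sketch only: models a registry keyed by name.
 * All toy_* names are hypothetical and are not kernel APIs. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_fs_type {
    const char *name;            /* key; stored by pointer, not copied */
    struct toy_fs_type *next;    /* singly linked list of registered types */
};

static struct toy_fs_type *toy_fs_list = NULL;

/* Return the address of the link that points at the entry called `name`,
 * or the address of the terminating NULL link if there is no such entry. */
static struct toy_fs_type **toy_find_link(const char *name)
{
    struct toy_fs_type **p;

    for (p = &toy_fs_list; *p != NULL; p = &(*p)->next)
        if (strcmp((*p)->name, name) == 0)
            break;
    return p;
}

/* Register `fs`; returns 0 on success, -1 if the name is already taken
 * or the structure is already linked into a list. */
int toy_register_fs(struct toy_fs_type *fs)
{
    struct toy_fs_type **p;

    assert(fs != NULL && fs->name != NULL);
    if (fs->next != NULL)
        return -1;
    p = toy_find_link(fs->name);
    if (*p != NULL)
        return -1;               /* duplicate name refused */
    *p = fs;                     /* append at the tail */
    return 0;
}

/* Unregister `fs`; returns 0 on success, -1 if it was not registered. */
int toy_unregister_fs(struct toy_fs_type *fs)
{
    struct toy_fs_type **p;

    assert(fs != NULL);
    for (p = &toy_fs_list; *p != NULL; p = &(*p)->next) {
        if (*p == fs) {
            *p = fs->next;       /* unlink the entry */
            fs->next = NULL;
            return 0;
        }
    }
    return -1;
}

/* Look a type up by name, as a mount-by-type request would. */
struct toy_fs_type *toy_find_fs(const char *name)
{
    return *toy_find_link(name);
}
```

Like the structure described above, the sketch stores name by pointer rather than copying it, so the string has to stay valid for as long as the entry is on the list. Registering a second entry under an existing name fails until the first one has been unregistered.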