📄 qsnet-rhel4-2.6.patch
+environment where paging occurs and do not require memory to be locked
+down. The advantage of this is that the user process can expose large
+portions of its address space without having to worry about physical
+memory constraints.
+
+However, should the operating system decide to swap a page to disk,
+then the NIC must be made aware that it should no longer read/write
+from this memory, but should generate a translation fault instead.
+
+The ioproc patch has been developed to provide a mechanism whereby the
+device driver for a NIC can be aware of when a user process's address
+translations change, either by paging or by explicitly mapping or
+unmapping memory.
+
+The patch involves inserting callbacks where translations are being
+invalidated to notify the NIC that the memory behind those
+translations is no longer visible to the application (and so should
+not be visible to the NIC). This callback is then responsible for
+ensuring that the NIC will not access the physical memory that was
+being mapped.
+
+An ioproc invalidate callback in the kswapd code could be utilised to
+prevent memory from being paged out if the NIC is unable to support
+network page faulting.
+
+For NICs which support network page faulting, there is no requirement
+for a user level pin down cache, since they are able to page-in their
+translations on the first communication using a buffer. However, this
+is likely to be inefficient, resulting in slow first use of the
+buffer. If the communication buffers were continually allocated and
+freed using mmap based malloc() calls then this would lead to all
+communications being slower than desirable.
+
+To optimise these warm-up cases the ioproc patch adds calls to
+ioproc_update wherever the kernel is creating translations for a user
+process. This then allows the device driver to preload translations
+so that they are already present for the first network communication
+from a buffer.
+
+Linux 2.6 IOPROC implementation details
+=======================================
+
+The Linux IOPROC patch adds hooks to the Linux VM code wherever page
+table entries are being created and/or invalidated. IOPROC device
+drivers can register their interest in being informed of such changes
+by registering an ioproc_ops structure which is defined as follows:
+
+extern int ioproc_register_ops(struct mm_struct *mm, struct ioproc_ops *ip);
+extern int ioproc_unregister_ops(struct mm_struct *mm, struct ioproc_ops *ip);
+
+typedef struct ioproc_ops {
+        struct ioproc_ops *next;
+        void *arg;
+
+        void (*release)(void *arg, struct mm_struct *mm);
+        void (*sync_range)(void *arg, struct vm_area_struct *vma, unsigned long start, unsigned long end);
+        void (*invalidate_range)(void *arg, struct vm_area_struct *vma, unsigned long start, unsigned long end);
+        void (*update_range)(void *arg, struct vm_area_struct *vma, unsigned long start, unsigned long end);
+
+        void (*change_protection)(void *arg, struct vm_area_struct *vma, unsigned long start, unsigned long end, pgprot_t newprot);
+
+        void (*sync_page)(void *arg, struct vm_area_struct *vma, unsigned long address);
+        void (*invalidate_page)(void *arg, struct vm_area_struct *vma, unsigned long address);
+        void (*update_page)(void *arg, struct vm_area_struct *vma, unsigned long address);
+
+} ioproc_ops_t;
+
+ioproc_register_ops
+===================
+This function should be called by the IOPROC device driver to register
+its interest in PTE changes for the process associated with the passed
+in mm_struct.
+
+The ioproc registration is not inherited across fork() and should be
+called once for each process that IOPROC is interested in.
+
+This function must be called whilst holding the mm->page_table_lock.
+
+ioproc_unregister_ops
+=====================
+This function should be called by the IOPROC device driver when it no
+longer needs to be informed of PTE changes in the process associated
+with the supplied mm_struct.
+
+This function does not normally need to be called, as the ioproc_ops
+struct is unlinked from the associated mm_struct during the
+ioproc_release() call.
+
+This function must be called whilst holding the mm->page_table_lock.
+
+ioproc_ops struct
+=================
+A linked list of ioproc_ops structures is hung off the user process
+mm_struct (linux/sched.h). At each hook point in the patched kernel,
+the ioproc patch will call the associated ioproc_ops callback function
+pointer in turn for each registered structure.
+
+The intention of the callbacks is to allow the IOPROC device driver to
+inspect the new or modified PTE entry via the Linux kernel
+(e.g. find_pte_map()). These callbacks should not modify the Linux
+kernel VM state or PTE entries.
+
+The ioproc_ops callback function pointers are:
+
+ioproc_release
+==============
+The release hook is called when a program exits and all its vma areas
+are torn down and unmapped, i.e. during exit_mmap(). Before each
+release hook is called the ioproc_ops structure is unlinked from the
+mm_struct.
+
+No locks are required as the process has the only reference to the mm
+at this point.
+
+ioproc_sync_[range|page]
+========================
+The sync hooks are called when a memory map is synchronised with its
+disk image, i.e. when the msync() syscall is invoked. Any future read
+or write by the IOPROC device to the associated pages should cause the
+page to be marked as referenced or modified.
+
+Called holding the mm->page_table_lock.
+
+ioproc_invalidate_[range|page]
+==============================
+The invalidate hooks are called whenever a valid PTE is unloaded,
+e.g. when a page is unmapped by the user or paged out by the
+kernel. After this call the IOPROC must not access the physical memory
+again unless a new translation is loaded.
+
+Called holding the mm->page_table_lock.
+
+ioproc_update_[range|page]
+==========================
+The update hooks are called whenever a valid PTE is loaded,
+e.g. mmapping memory, moving the brk up, when breaking COW or faulting
+in an anonymous page of memory. These give the IOPROC device the
+opportunity to load translations speculatively, which can improve
+performance by avoiding device translation faults.
+
+Called holding the mm->page_table_lock.
+
+ioproc_change_protection
+========================
+This hook is called when the protection on a region of memory is
+changed, i.e. when the mprotect() syscall is invoked.
+
+The IOPROC must not be able to write to a read-only page, so if the
+permissions are downgraded then it must honour them. If they are
+upgraded it can treat this in the same way as the
+ioproc_update_[range|page]() calls.
+
+Called holding the mm->page_table_lock.
+
+
+Linux 2.6 IOPROC patch details
+==============================
+
+Here are the specific details of each ioproc hook added to the Linux
+2.6 VM system and the reasons for doing so:
+
+++++ FILE
+        mm/fremap.c
+
+==== FUNCTION
+        zap_pte
+
+CALLED FROM
+        install_page
+        install_file_pte
+
+PTE MODIFICATION
+        ptep_clear_flush
+
+ADDED HOOKS
+        ioproc_invalidate_page
+
+==== FUNCTION
+        install_page
+
+CALLED FROM
+        filemap_populate, shmem_populate
+
+PTE MODIFICATION
+        set_pte
+
+ADDED HOOKS
+        ioproc_update_page
+
+==== FUNCTION
+        install_file_pte
+
+CALLED FROM
+        filemap_populate, shmem_populate
+
+PTE MODIFICATION
+        set_pte
+
+ADDED HOOKS
+        ioproc_update_page
+
+
+++++ FILE
+        mm/memory.c
+
+==== FUNCTION
+        zap_page_range
+
+CALLED FROM
+        read_zero_pagealigned, madvise_dontneed, unmap_mapping_range,
+        unmap_mapping_range_list, do_mmap_pgoff
+
+PTE MODIFICATION
+        set_pte (unmap_vmas)
+
+ADDED HOOKS
+        ioproc_invalidate_range
+
+
+==== FUNCTION
+        zeromap_page_range
+
+CALLED FROM
+        read_zero_pagealigned, mmap_zero
+
+PTE MODIFICATION
+        set_pte (zeromap_pte_range)
+
+ADDED HOOKS
+        ioproc_invalidate_range
+        ioproc_update_range
+
+
+==== FUNCTION
+        remap_page_range
+
+CALLED FROM
+        many device drivers
+
+PTE MODIFICATION
+        set_pte (remap_pte_range)
+
+ADDED HOOKS
+        ioproc_invalidate_range
+        ioproc_update_range
+
+
+==== FUNCTION
+        break_cow
+
+CALLED FROM
+        do_wp_page
+
+PTE MODIFICATION
+        ptep_establish
+
+ADDED HOOKS
+        ioproc_invalidate_page
+        ioproc_update_page
+
+
+==== FUNCTION
+        do_wp_page
+
+CALLED FROM
+        do_swap_page, handle_pte_fault
+
+PTE MODIFICATION
+        ptep_set_access_flags
+
+ADDED HOOKS
+        ioproc_update_page
+
+
+==== FUNCTION
+        do_swap_page
+
+CALLED FROM
+        handle_pte_fault
+
+PTE MODIFICATION
+        set_pte
+
+ADDED HOOKS
+        ioproc_update_page
+
+
+==== FUNCTION
+        do_anonymous_page
+
+CALLED FROM
+        do_no_page
+
+PTE MODIFICATION
+        set_pte
+
+ADDED HOOKS
+        ioproc_update_page
+
+
+==== FUNCTION
+        do_no_page
+
+CALLED FROM
+        do_file_page, handle_pte_fault
+
+PTE MODIFICATION
+        set_pte
+
+ADDED HOOKS
+        ioproc_update_page
+
+
+++++ FILE
+        mm/mmap.c
+
+==== FUNCTION
+        unmap_region
+
+CALLED FROM
+        do_munmap
+
+PTE MODIFICATION
+        set_pte (unmap_vmas)
+
+ADDED HOOKS
+        ioproc_invalidate_range
+
+
+==== FUNCTION
+        exit_mmap
+
+CALLED FROM
+        mmput
+
+PTE MODIFICATION
+        set_pte (unmap_vmas)
+
+ADDED HOOKS
+        ioproc_release
+
+
+++++ FILE
+        mm/mprotect.c
+
+==== FUNCTION
+        change_protection
+
+CALLED FROM
+        mprotect_fixup
+
+PTE MODIFICATION
+        set_pte (change_pte_range)
+
+ADDED HOOKS
+        ioproc_change_protection
+
+
+++++ FILE
+        mm/mremap.c
+
+==== FUNCTION
+        move_page_tables
+
+CALLED FROM
+        move_vma
+
+PTE MODIFICATION
+        ptep_clear_flush (move_one_page)
+
+ADDED HOOKS
+        ioproc_invalidate_range
+        ioproc_invalidate_range
+
+
+++++ FILE
+        mm/rmap.c
+
+==== FUNCTION
+        try_to_unmap_one
+
+CALLED FROM
+        try_to_unmap_anon, try_to_unmap_file
+
+PTE MODIFICATION
+        ptep_clear_flush
+
+ADDED HOOKS
+        ioproc_invalidate_page
+
+
+==== FUNCTION
+        try_to_unmap_cluster
+
+CALLED FROM
+        try_to_unmap_file
+
+PTE MODIFICATION
+        ptep_clear_flush
+
+ADDED HOOKS
+        ioproc_invalidate_page
+
+
+
+++++ FILE
+        mm/msync.c
+
+==== FUNCTION
+        filemap_sync
+
+CALLED FROM
+        msync_interval
+
+PTE MODIFICATION
+        ptep_clear_flush_dirty (filemap_sync_pte)
+
+ADDED HOOKS
+        ioproc_sync_range
+
+
+++++ FILE
+        mm/hugetlb.c
+
+==== FUNCTION
+        zap_hugepage_range
+
+CALLED FROM
+        hugetlb_vmtruncate_list
+
+PTE MODIFICATION
+        ptep_get_and_clear (unmap_hugepage_range)
+
+ADDED HOOKS
+        ioproc_invalidate_range
+
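
The registration and dispatch scheme described in the documentation above can be modelled in plain userspace C. The sketch below is only an illustration, not the patch's implementation: the structs are cut-down stand-ins for the kernel types, the mm_struct field name ioproc_ops is an assumption, locking (mm->page_table_lock) is omitted, the -1 error return is a placeholder, and toy_nic with its toy_* callbacks and toy_demo() are hypothetical names invented for the example.

```c
/* Userspace model only: simplified stand-ins for the kernel structures. */
#include <assert.h>
#include <stddef.h>

struct mm_struct;
struct vm_area_struct { struct mm_struct *vm_mm; };

/* Reduced ioproc_ops: only the page-granularity hooks and release. */
struct ioproc_ops {
    struct ioproc_ops *next;
    void *arg;
    void (*release)(void *arg, struct mm_struct *mm);
    void (*invalidate_page)(void *arg, struct vm_area_struct *vma,
                            unsigned long address);
    void (*update_page)(void *arg, struct vm_area_struct *vma,
                        unsigned long address);
};

/* Assumption: the list head is a field named ioproc_ops. */
struct mm_struct { struct ioproc_ops *ioproc_ops; };

/* Push the ops structure onto the per-mm list. */
int ioproc_register_ops(struct mm_struct *mm, struct ioproc_ops *ip)
{
    assert(mm != NULL && ip != NULL);
    ip->next = mm->ioproc_ops;
    mm->ioproc_ops = ip;
    return 0;
}

/* Unlink the ops structure; nonzero (placeholder value) if not found. */
int ioproc_unregister_ops(struct mm_struct *mm, struct ioproc_ops *ip)
{
    struct ioproc_ops **link;
    for (link = &mm->ioproc_ops; *link != NULL; link = &(*link)->next) {
        if (*link == ip) {
            *link = ip->next;
            return 0;
        }
    }
    return -1;
}

/* Hook points: call each registered callback in turn. */
void ioproc_update_page(struct vm_area_struct *vma, unsigned long address)
{
    struct ioproc_ops *cp;
    for (cp = vma->vm_mm->ioproc_ops; cp != NULL; cp = cp->next)
        if (cp->update_page)
            cp->update_page(cp->arg, vma, address);
}

void ioproc_invalidate_page(struct vm_area_struct *vma, unsigned long address)
{
    struct ioproc_ops *cp;
    for (cp = vma->vm_mm->ioproc_ops; cp != NULL; cp = cp->next)
        if (cp->invalidate_page)
            cp->invalidate_page(cp->arg, vma, address);
}

/* Each structure is unlinked before its release hook runs. */
void ioproc_release(struct mm_struct *mm)
{
    struct ioproc_ops *cp;
    while ((cp = mm->ioproc_ops) != NULL) {
        mm->ioproc_ops = cp->next;
        if (cp->release)
            cp->release(cp->arg, mm);
    }
}

/* Hypothetical driver state: a NIC caching a single translation. */
struct toy_nic { unsigned long addr; int valid; int released; };

void toy_update_page(void *arg, struct vm_area_struct *vma, unsigned long address)
{
    struct toy_nic *nic = arg;
    (void)vma;
    nic->addr = address;   /* preload the translation */
    nic->valid = 1;
}

void toy_invalidate_page(void *arg, struct vm_area_struct *vma, unsigned long address)
{
    struct toy_nic *nic = arg;
    (void)vma;
    if (nic->valid && nic->addr == address)
        nic->valid = 0;    /* NIC must fault on the next access */
}

void toy_release(void *arg, struct mm_struct *mm)
{
    struct toy_nic *nic = arg;
    (void)mm;
    nic->valid = 0;
    nic->released = 1;
}

/* Walks one register/update/invalidate/release cycle; returns 0 on success. */
int toy_demo(void)
{
    struct toy_nic nic = { 0, 0, 0 };
    struct mm_struct mm = { NULL };
    struct vm_area_struct vma = { &mm };
    struct ioproc_ops ops = { NULL, &nic, toy_release,
                              toy_invalidate_page, toy_update_page };

    if (ioproc_register_ops(&mm, &ops) != 0) return 1;
    ioproc_update_page(&vma, 0x1000UL);
    if (!nic.valid || nic.addr != 0x1000UL) return 2;
    ioproc_invalidate_page(&vma, 0x1000UL);
    if (nic.valid) return 3;
    ioproc_release(&mm);
    if (!nic.released || mm.ioproc_ops != NULL) return 4;
    return 0;
}
```

Calling toy_demo() exercises the sequence the documentation describes: an update hook preloads a translation, an invalidate hook drops it, and release unlinks the structure so that a later ioproc_unregister_ops() has nothing left to remove.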