numa_memory_policy.txt
a per node list of sibling nodes -- called zonelists -- built at boot time, or when nodes or memory are added or removed from the system [memory hotplug]. These per node zonelists are constructed with nodes in order of increasing distance based on information provided by the platform firmware.

When a task/process policy or a shared policy contains the Default mode, this also means "local allocation", as described above. In the context of a VMA, Default mode means "fall back to task policy" -- which may or may not specify Default mode. Thus, Default mode cannot be counted on to mean local allocation when used on a non-shared region of the address space. However, see MPOL_PREFERRED below. The Default mode does not use the optional set of nodes.

    MPOL_BIND: This mode specifies that memory must come from the set of nodes specified by the policy. The memory policy APIs do not specify an order in which the nodes will be searched. However, unlike "local allocation", the Bind policy does not consider the distance between the nodes. Rather, allocations will fall back to the nodes specified by the policy in order of numeric node id. Like everything in Linux, this is subject to change.

    MPOL_PREFERRED: This mode specifies that the allocation should be attempted from the single node specified in the policy. If that allocation fails, the kernel will search other nodes in order of increasing distance from the preferred node, exactly as it would for a local allocation that started at the preferred node. "Local" allocation policy can be viewed as a Preferred policy that starts at the node containing the cpu where the allocation takes place.

    Internally, the Preferred policy uses a single node -- the preferred_node member of struct mempolicy. A "distinguished value" of this preferred_node, currently '-1', is interpreted as "the node containing the cpu where the allocation takes place" -- local allocation. This is the way to specify local allocation for a specific range of addresses -- i.e. for VMA policies.
    MPOL_INTERLEAVED: This mode specifies that page allocations be interleaved, on a page granularity, across the nodes specified in the policy. This mode also behaves slightly differently, based on the context where it is used:

    For allocation of anonymous pages and shared memory pages, Interleave mode indexes the set of nodes specified by the policy using the page offset of the faulting address into the segment [VMA] containing the address, modulo the number of nodes specified by the policy. It then attempts to allocate a page, starting at the selected node, as if the node had been specified by a Preferred policy or had been selected by a local allocation. That is, allocation will follow the per node zonelist.

    For allocation of page cache pages, Interleave mode indexes the set of nodes specified by the policy using a node counter maintained per task. This counter wraps around to the lowest specified node after it reaches the highest specified node. This will tend to spread the pages out over the nodes specified by the policy based on the order in which they are allocated, rather than based on any page offset into an address range or file. During system boot up, the temporary interleaved system default policy works in this mode.

MEMORY POLICY APIs

Linux supports 3 system calls for controlling memory policy. These APIs always affect only the calling task, the calling task's address space, or some shared object mapped into the calling task's address space.

Note: the headers that define these APIs and the parameter data types for user space applications reside in a package that is not part of the Linux kernel.
The kernel system call interfaces, with the 'sys_' prefix, are defined in <linux/syscalls.h>; the mode and flag definitions are defined in <linux/mempolicy.h>.

Set [Task] Memory Policy:

    long set_mempolicy(int mode, const unsigned long *nmask,
                       unsigned long maxnode);

Sets the calling task's "task/process memory policy" to the mode specified by the 'mode' argument and the set of nodes defined by 'nmask'. 'nmask' points to a bit mask of node ids containing at least 'maxnode' ids. See the set_mempolicy(2) man page for more details.

Get [Task] Memory Policy or Related Information:

    long get_mempolicy(int *mode, const unsigned long *nmask,
                       unsigned long maxnode, void *addr, int flags);

Queries the "task/process memory policy" of the calling task, or the policy or location of a specified virtual address, depending on the 'flags' argument. See the get_mempolicy(2) man page for more details.

Install VMA/Shared Policy for a Range of Task's Address Space:

    long mbind(void *start, unsigned long len, int mode,
               const unsigned long *nmask, unsigned long maxnode,
               unsigned flags);

mbind() installs the policy specified by (mode, nmask, maxnode) as a VMA policy for the range of the calling task's address space specified by the 'start' and 'len' arguments. Additional actions may be requested via the 'flags' argument. See the mbind(2) man page for more details.

MEMORY POLICY COMMAND LINE INTERFACE

Although not strictly part of the Linux implementation of memory policy, a command line tool, numactl(8), exists that allows one to:

+ set the task policy for a specified program via set_mempolicy(2), fork(2) and exec(2)

+ set the shared policy for a shared memory segment via mbind(2)

The numactl(8) tool is packaged with the run-time version of the library containing the memory policy system call wrappers. Some distributions package the headers and compile-time libraries in a separate development package.

MEMORY POLICIES AND CPUSETS

Memory policies work within cpusets as described above.
For memory policies that require a node or set of nodes, the nodes are restricted to the set of nodes whose memories are allowed by the cpuset constraints. If the nodemask specified for the policy contains nodes that are not allowed by the cpuset, or the intersection of the set of nodes specified for the policy and the set of nodes with memory is the empty set, the policy is considered invalid and cannot be installed.

The interaction of memory policies and cpusets can be problematic for a couple of reasons:

1) the memory policy APIs take physical node ids as arguments. As mentioned above, it is illegal to specify nodes that are not allowed in the cpuset. The application must query the allowed nodes using the get_mempolicy() API with the MPOL_F_MEMS_ALLOWED flag to determine the allowed nodes and restrict itself to those nodes. However, the resources available to a cpuset can be changed by the system administrator, or a workload manager application, at any time. So, a task may still get errors attempting to specify policy nodes, and must query the allowed memories again.

2) when tasks in two cpusets share access to a memory region, such as shared memory segments created by shmget() or mmap() with the MAP_ANONYMOUS and MAP_SHARED flags, and any of the tasks install shared policy on the region, only nodes whose memories are allowed in both cpusets may be used in the policies. Obtaining this information requires "stepping outside" the memory policy APIs to use the cpuset information and requires that one know in what cpusets other tasks might be attaching to the shared region. Furthermore, if the cpusets' allowed memory sets are disjoint, "local" allocation is the only valid policy.