hrtimers - subsystem for high-resolution kernel timers
------------------------------------------------------

This patch introduces a new subsystem for high-resolution kernel timers.

One might ask the question: we already have a timer subsystem
(kernel/timers.c), why do we need two timer subsystems? After a lot of
back and forth trying to integrate high-resolution and high-precision
features into the existing timer framework, and after testing various
such high-resolution timer implementations in practice, we came to the
conclusion that the timer wheel code is fundamentally not suitable for
such an approach. We initially didn't believe this ('there must be a way
to solve this'), and spent a considerable effort trying to integrate
things into the timer wheel, but we failed. In hindsight, there are
several reasons why such integration is hard/impossible:

- the forced handling of low-resolution and high-resolution timers in
  the same way leads to a lot of compromises, macro magic and #ifdef
  mess. The timers.c code is very "tightly coded" around jiffies and
  32-bitness assumptions, and has been honed and micro-optimized for a
  relatively narrow use case (jiffies in a relatively narrow HZ range)
  for many years - and thus even small extensions to it easily break
  the wheel concept, leading to even worse compromises. The timer wheel
  code is very good and tight code, there's zero problems with it in its
  current usage - but it is simply not suitable to be extended for
  high-res timers.

- the unpredictable [O(N)] overhead of cascading leads to delays which
  necessitate a more complex handling of high resolution timers, which
  in turn decreases robustness. Such a design still led to rather large
  timing inaccuracies.
  Cascading is a fundamental property of the timer wheel concept, it
  cannot be 'designed out' without inevitably degrading other portions
  of the timers.c code in an unacceptable way.

- the implementation of the current posix-timer subsystem on top of
  the timer wheel has already introduced a quite complex handling of
  the required readjusting of absolute CLOCK_REALTIME timers at
  settimeofday or NTP time - further underlining our experience by
  example: that the timer wheel data structure is too rigid for high-res
  timers.

- the timer wheel code is most optimal for use cases which can be
  identified as "timeouts". Such timeouts are usually set up to cover
  error conditions in various I/O paths, such as networking and block
  I/O. The vast majority of those timers never expire and are rarely
  recascaded because the expected correct event arrives in time so they
  can be removed from the timer wheel before any further processing of
  them becomes necessary. Thus the users of these timeouts can accept
  the granularity and precision tradeoffs of the timer wheel, and
  largely expect the timer subsystem to have near-zero overhead.
  Accurate timing for them is not a core purpose - in fact most of the
  timeout values used are ad-hoc. For them it is at most a necessary
  evil to guarantee the processing of actual timeout completions
  (because most of the timeouts are deleted before completion), which
  should thus be as cheap and unintrusive as possible.

The primary users of precision timers are user-space applications that
utilize nanosleep, posix-timers and itimer interfaces. Also, in-kernel
users like drivers and subsystems which require precise timed events (e.g.
multimedia) can benefit from the availability of a separate
high-resolution timer subsystem as well.

While this subsystem does not offer high-resolution clock sources just
yet, the hrtimer subsystem can be easily extended with high-resolution
clock capabilities, and patches for that exist and are maturing quickly.
The increasing demand for realtime and multimedia applications along
with other potential users for precise timers gives another reason to
separate the "timeout" and "precise timer" subsystems.

Another potential benefit is that such a separation allows even more
special-purpose optimization of the existing timer wheel for the low
resolution and low precision use cases - once the precision-sensitive
APIs are separated from the timer wheel and are migrated over to
hrtimers. E.g. we could decrease the frequency of the timeout subsystem
from 250 Hz to 100 Hz (or even lower).

hrtimer subsystem implementation details
----------------------------------------

the basic design considerations were:

- simplicity

- data structure not bound to jiffies or any other granularity. All the
  kernel logic works at 64-bit nanoseconds resolution - no compromises.

- simplification of existing, timing related kernel code

another basic requirement was the immediate enqueueing and ordering of
timers at activation time. After looking at several possible solutions
such as radix trees and hashes, we chose the red black tree as the basic
data structure. Rbtrees are available as a library in the kernel and are
used in various performance-critical areas of e.g. memory management and
file systems.
The rbtree is solely used for time sorted ordering, while
a separate list is used to give the expiry code fast access to the
queued timers, without having to walk the rbtree.

(This separate list is also useful for later when we'll introduce
high-resolution clocks, where we need separate pending and expired
queues while keeping the time-order intact.)

Time-ordered enqueueing is not purely for the purposes of
high-resolution clocks though, it also simplifies the handling of
absolute timers based on a low-resolution CLOCK_REALTIME. The existing
implementation needed to keep an extra list of all armed absolute
CLOCK_REALTIME timers along with complex locking. In case of
settimeofday and NTP, all the timers (!) had to be dequeued, the
time-changing code had to fix them up one by one, and all of them had to
be enqueued again. The time-ordered enqueueing and the storage of the
expiry time in absolute time units remove all this complex and poorly
scaling code from the posix-timer implementation - the clock can simply
be set without having to touch the rbtree. This also makes the handling
of posix-timers simpler in general.

The locking and per-CPU behavior of hrtimers was mostly taken from the
existing timer wheel code, as it is mature and well suited. Sharing code
was not really a win, due to the different data structures. Also, the
hrtimer functions now have clearer behavior and clearer names - such as
hrtimer_try_to_cancel() and hrtimer_cancel() [which are roughly
equivalent to del_timer() and del_timer_sync()] - so there's no direct
1:1 mapping between them on the algorithmic level, and thus no real
potential for code sharing either.

Basic data types: every time value, absolute or relative, is in a
special nanosecond-resolution type: ktime_t. The kernel-internal
representation of ktime_t values and operations is implemented via
macros and inline functions, and can be switched between a "hybrid
union" type and a plain "scalar" 64bit nanoseconds representation (at
compile time).
The hybrid union type optimizes time conversions on 32bit
CPUs. This build-time-selectable ktime_t storage format was implemented
to avoid the performance impact of 64-bit multiplications and divisions
on 32bit CPUs. Such operations are frequently necessary to convert
between the storage formats provided by kernel and userspace interfaces
and the internal time format. (See include/linux/ktime.h for further
details.)

hrtimers - rounding of timer values
-----------------------------------

the hrtimer code will round timer events to lower-resolution clocks
because it has to. Otherwise it will do no artificial rounding at all.

one question is, what resolution value should be returned to the user by
the clock_getres() interface. This will return whatever real resolution
a given clock has - be it low-res, high-res, or artificially-low-res.

hrtimers - testing and verification
-----------------------------------

We used the high-resolution clock subsystem on top of hrtimers to verify
the hrtimer implementation details in practice, and we also ran the posix
timer tests in order to ensure specification compliance. We also ran
tests on low-resolution clocks.

The hrtimer patch converts the following kernel functionality to use
hrtimers:

 - nanosleep
 - itimers
 - posix-timers

The conversion of nanosleep and posix-timers enabled the unification of
nanosleep and clock_nanosleep.

The code was successfully compiled for the following platforms:

 i386, x86_64, ARM, PPC, PPC64, IA64

The code was run-tested on the following platforms:

 i386(UP/SMP), x86_64(UP/SMP), ARM, PPC

hrtimers were also integrated into the -rt tree, along with a
hrtimers-based high-resolution clock implementation, so the hrtimers
code got a healthy amount of testing and use in practice.

	Thomas Gleixner, Ingo Molnar
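
Illustrative note (not part of the hrtimer patch): the user-visible side
of the text above can be probed from user space. The following is a
minimal sketch, assuming a Linux/glibc system that provides
clock_getres(), clock_gettime() and clock_nanosleep(). It reads the
resolution a clock reports, and sleeps until an absolute deadline so the
wakeup lateness can be measured. The helper names (timespec_to_ns,
ns_to_timespec, clock_resolution_ns, sleep_until_and_measure) are
hypothetical and exist only for this example. The scalar 64-bit
nanosecond arithmetic is merely analogous to the "scalar" ktime_t idea,
it is not the kernel's ktime_t implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Convert a timespec to a scalar 64-bit nanosecond count. */
static int64_t timespec_to_ns(struct timespec ts)
{
	return (int64_t)ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec;
}

/* Convert a non-negative nanosecond count to a normalized timespec. */
static struct timespec ns_to_timespec(int64_t ns)
{
	struct timespec ts;

	assert(ns >= 0);
	ts.tv_sec = (time_t)(ns / NSEC_PER_SEC);
	ts.tv_nsec = (long)(ns % NSEC_PER_SEC);
	return ts;
}

/* Resolution reported by clock_getres() in nanoseconds, -1 on error. */
static int64_t clock_resolution_ns(clockid_t clk)
{
	struct timespec res;

	if (clock_getres(clk, &res) != 0)
		return -1;
	return timespec_to_ns(res);
}

/*
 * Sleep until "now + delay_ns" on CLOCK_MONOTONIC using an absolute
 * deadline, then return how late the wakeup was in nanoseconds
 * (-1 on error). An absolute deadline does not drift when the sleep
 * is interrupted by a signal and restarted.
 */
static int64_t sleep_until_and_measure(int64_t delay_ns)
{
	struct timespec now, deadline;
	int64_t deadline_ns;
	int rc;

	if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
		return -1;
	deadline_ns = timespec_to_ns(now) + delay_ns;
	deadline = ns_to_timespec(deadline_ns);

	/* clock_nanosleep() returns the error number directly. */
	do {
		rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME,
				     &deadline, NULL);
	} while (rc == EINTR);
	if (rc != 0)
		return -1;

	if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
		return -1;
	return timespec_to_ns(now) - deadline_ns;
}
```

Calling clock_resolution_ns(CLOCK_MONOTONIC) and
sleep_until_and_measure(2000000) (a 2 ms sleep) from a small main() gives
a rough picture of what a given kernel configuration delivers. The
numbers depend on the clock source and kernel options, so none are
quoted here.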
