eCos SMP Support
================

eCos contains support for limited Symmetric Multi-Processing (SMP). This is only available on selected architectures and platforms.

The first part of this document describes the platform-independent parts of the SMP support. Annexes at the end of this document describe any details that are specific to a particular platform.

Target Hardware Limitations
---------------------------

To allow a reasonable implementation of SMP, and to reduce the disruption to the existing source base, a number of assumptions have been made about the features of the target hardware.

- Modest multiprocessing. The typical number of CPUs supported is two to four, with an upper limit of around eight. While there are no inherent limits in the code, hardware and algorithmic limitations will probably become significant beyond this point.

- SMP synchronization support. The hardware must supply a mechanism that allows software running on different CPUs to synchronize. This is normally provided as part of the instruction set in the form of test-and-set, compare-and-swap or load-link/store-conditional instructions. An alternative approach is the provision of hardware semaphore registers, which can be used to serialize implementations of these operations. Whatever hardware facilities are available, eCos uses them to implement spinlocks.

- Coherent caches. It is assumed that no extra effort is required to access shared memory from any processor. This means that either there are no caches, or the caches are shared by all processors, or they are maintained in a coherent state by the hardware. It would be too disruptive to the eCos sources if every memory access had to be bracketed by cache load/flush operations. Any hardware that requires this is not supported.

- Uniform addressing. It is assumed that all memory that is shared between CPUs is addressed at the same location from all CPUs.
  As with non-coherent caches, dealing with CPU-specific address translation is considered too disruptive to the eCos source base. This does not, however, preclude systems with non-uniform access costs for different CPUs.

- Uniform device addressing. As with access to memory, it is assumed that all devices are equally accessible to all CPUs. Since device access is often made from thread contexts, and there is currently no support for binding or migrating threads to particular CPUs, it is not possible to restrict access to device control registers to certain CPUs.

- Interrupt routing. The target hardware must have an interrupt controller that can route interrupts to specific CPUs. It is acceptable for all interrupts to be delivered to just one CPU, for some interrupts to be bound to specific CPUs, or for some interrupts to be local to each CPU. At present, dynamic routing, where a different CPU may be chosen each time an interrupt is delivered, is not supported. eCos cannot support hardware where all interrupts are delivered to all CPUs simultaneously with the expectation that software will resolve any conflicts.

- Inter-CPU interrupts. A mechanism that allows one CPU to interrupt another is needed, so that events on one CPU can cause rescheduling on other CPUs.

- CPU identifiers. Code running on a CPU must be able to determine which CPU it is running on. The CPU id is usually provided either in a CPU status register or in a register associated with the inter-CPU interrupt delivery subsystem. eCos expects CPU ids to be small non-negative integers, although alternative representations, such as bitmaps, can be converted relatively easily. Complex mechanisms for getting the CPU id cannot be supported. Getting the CPU id must be a cheap operation, since it is done often, and in performance-critical places such as interrupt handlers and the scheduler.
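
To make the spinlock requirement concrete, here is a minimal user-space sketch of a spinlock built on test-and-set. It assumes a C11 compiler with <stdatomic.h> and POSIX threads: atomic_flag_test_and_set_explicit() stands in for the hardware test-and-set instruction, and two threads stand in for two CPUs. The names tas_spinlock, tas_lock(), tas_unlock() and run_spinlock_demo() are hypothetical; this is not the eCos HAL spinlock interface.

```c
/* Minimal sketch (hypothetical names, not the eCos HAL API): a spinlock
 * built on test-and-set. C11 atomic_flag_test_and_set_explicit() stands in
 * for the hardware instruction; POSIX threads stand in for CPUs. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct {
    atomic_flag locked;   /* set while some "CPU" holds the lock */
} tas_spinlock;

static void tas_init(tas_spinlock *s)
{
    atomic_flag_clear(&s->locked);
}

static void tas_lock(tas_spinlock *s)
{
    /* test-and-set returns the previous value, so keep retrying until
     * this caller is the one that changed the flag from clear to set. */
    while (atomic_flag_test_and_set_explicit(&s->locked, memory_order_acquire))
        ; /* busy-wait while another "CPU" holds the lock */
}

static void tas_unlock(tas_spinlock *s)
{
    atomic_flag_clear_explicit(&s->locked, memory_order_release);
}

#define DEMO_ITERATIONS 100000

static tas_spinlock demo_lock = { ATOMIC_FLAG_INIT };
static long demo_counter;  /* deliberately non-atomic; guarded by demo_lock */

static void *demo_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < DEMO_ITERATIONS; i++) {
        tas_lock(&demo_lock);
        demo_counter++;           /* read-modify-write inside the lock */
        tas_unlock(&demo_lock);
    }
    return NULL;
}

/* Runs two workers ("CPU 0" and "CPU 1") concurrently and returns the
 * final counter value. */
static long run_spinlock_demo(void)
{
    pthread_t cpu0, cpu1;
    int rc;

    tas_init(&demo_lock);
    demo_counter = 0;
    rc = pthread_create(&cpu0, NULL, demo_worker, NULL);
    assert(rc == 0);
    rc = pthread_create(&cpu1, NULL, demo_worker, NULL);
    assert(rc == 0);
    (void)rc;  /* keeps NDEBUG builds free of unused-variable warnings */
    pthread_join(cpu0, NULL);
    pthread_join(cpu1, NULL);
    return demo_counter;
}
```

run_spinlock_demo() returns 200000 (two workers of 100000 increments each). Because demo_counter++ is a non-atomic read-modify-write, removing the lock would allow updates to be lost. A real port would use the target's own instruction or semaphore registers rather than C11 atomics.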

Kernel Support
--------------

This section describes how SMP is handled in the kernel, and where system behaviour differs from a single-CPU system.

System Startup
~~~~~~~~~~~~~~

System startup takes place on only one CPU, called the primary CPU. All other CPUs, the secondary CPUs, are either placed in a suspended state at reset, or are captured by the HAL and put into a spin as they start up.

The primary CPU is responsible for copying the DATA segment and zeroing the BSS (if required), calling the HAL variant and platform initialization routines, and invoking constructors. It then calls cyg_start() to enter the application. The application may then create extra threads and other objects.

It is only when the application calls Cyg_Scheduler::start() that the secondary CPUs are initialized. This routine scans the list of available secondary CPUs and calls HAL_SMP_CPU_START() to start each one. Finally, it calls Cyg_Scheduler::start_cpu().

Each secondary CPU starts in the HAL, where it completes any per-CPU initialization before calling into the kernel at cyg_kernel_cpu_startup(). Here it claims the scheduler lock and calls Cyg_Scheduler::start_cpu().

Cyg_Scheduler::start_cpu() is common to both the primary and secondary CPUs. The first thing this code does is to install an interrupt object for this CPU's inter-CPU interrupt. From this point on the code is the same as for the single-CPU case: an initial thread is chosen and entered.

Thereafter the CPUs are all equal: eCos makes no further distinction between the primary and secondary CPUs. However, the hardware may still distinguish them as far as interrupt delivery is concerned.

Scheduling
~~~~~~~~~~

To function correctly, an operating system kernel must protect its vital data structures, such as the run queues, from concurrent access. In a single-CPU system the only concurrent activities to worry about are asynchronous interrupts. The kernel can easily guard its data structures against these by disabling interrupts.
However, in a multi-CPU system this is inadequate, since disabling interrupts on one CPU does not block access by other CPUs.

The eCos kernel protects its vital data structures using the scheduler lock. In single-CPU systems this is a simple counter that is atomically incremented to acquire the lock and decremented to release it. If the lock is decremented to zero, the scheduler may be invoked to choose a different thread to run. Because interrupts may continue to be serviced while the scheduler lock is claimed, ISRs are not allowed to access kernel data structures, or to call kernel routines that do so. Instead, all such operations are deferred to an associated DSR routine that is run during the lock release operation, when the data structures are in a consistent state.

Because this kernel locking mechanism does not rely on interrupt manipulation to protect data structures, eCos is easier to convert to SMP than would otherwise be the case. The principal change needed to make eCos SMP-safe is to convert the scheduler lock into a nestable spinlock. This is done by adding a spinlock and a CPU id to the original counter.

The algorithm for acquiring the scheduler lock is very simple. If the scheduler lock's CPU id matches the current CPU, that CPU already holds the lock, so it can increment the counter and continue. If it does not match, the CPU must spin on the spinlock, after which it may increment the counter and store its own identity in the CPU id.

To release the lock, the counter is decremented. If it reaches zero, the CPU id must be set to NONE and the spinlock cleared.

To protect these sequences against interrupts, they must be performed with interrupts disabled. However, since these are very short code sequences, they do not have an adverse effect on interrupt latency.

Beyond converting the scheduler lock, further preparing the kernel for SMP is a relatively minor matter. The main changes are to convert various scalar housekeeping variables into arrays indexed by CPU id.
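
The scheduler lock acquire and release rules described above can be sketched in C11 as follows. This is a user-space model under stated assumptions, not the eCos kernel source: the CPU id is passed in as a parameter instead of being read from hardware, the interrupt disabling that must surround each sequence is omitted, and all names (model_sched_lock, sched_lock_claim(), sched_lock_try(), sched_lock_release(), CPU_NONE) are hypothetical.

```c
/* Hypothetical model of the nestable SMP scheduler lock: the original
 * counter plus a spinlock and an owner CPU id. A real kernel would read
 * the CPU id from hardware and disable local interrupts around these
 * sequences; both are omitted here. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define CPU_NONE (-1)

typedef struct {
    atomic_flag spin;   /* held by the owning CPU while count > 0 */
    atomic_int  owner;  /* id of the CPU holding the lock, or CPU_NONE */
    int         count;  /* nesting depth; only touched by the owner */
} model_sched_lock;

static void sched_lock_init(model_sched_lock *l)
{
    atomic_flag_clear(&l->spin);
    atomic_init(&l->owner, CPU_NONE);
    l->count = 0;
}

/* Claim the lock for CPU `cpu`, spinning if another CPU holds it. */
static void sched_lock_claim(model_sched_lock *l, int cpu)
{
    /* Only this CPU ever stores its own id in owner, so owner == cpu
     * means we already hold the lock and can simply nest. */
    if (atomic_load(&l->owner) != cpu) {
        while (atomic_flag_test_and_set_explicit(&l->spin, memory_order_acquire))
            ; /* spin until the holder releases the lock */
        atomic_store(&l->owner, cpu);
    }
    l->count++;
}

/* As sched_lock_claim(), but gives up instead of spinning. */
static bool sched_lock_try(model_sched_lock *l, int cpu)
{
    if (atomic_load(&l->owner) != cpu) {
        if (atomic_flag_test_and_set_explicit(&l->spin, memory_order_acquire))
            return false; /* another CPU holds the lock */
        atomic_store(&l->owner, cpu);
    }
    l->count++;
    return true;
}

/* Release one level of nesting; the last release frees the lock. */
static void sched_lock_release(model_sched_lock *l, int cpu)
{
    assert(atomic_load(&l->owner) == cpu && l->count > 0);
    (void)cpu; /* only used by the assertion */
    if (--l->count == 0) {
        /* The text above notes that DSRs run and the scheduler may be
         * invoked during release; this model does neither. */
        atomic_store(&l->owner, CPU_NONE);
        atomic_flag_clear_explicit(&l->spin, memory_order_release);
    }
}
```

Claiming twice on CPU 0 leaves the count at 2, and sched_lock_try() for CPU 1 keeps failing until CPU 0 has released both levels; only then is the owner reset to CPU_NONE and the spinlock freed.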

The per-CPU variables converted in this way include the current thread pointer, the need_reschedule flag and the timeslice counter.

At present, only the Multi-Level Queue (MLQ) scheduler is capable of supporting SMP configurations. The main change made to this scheduler is to cope with having several threads in execution at the same time. Running threads are marked with the CPU they are executing on. When scheduling a thread, the scheduler skips past any running threads until it finds a thread that is pending. While this is not a constant-time algorithm, as it is in the single-CPU case, it is still deterministic, since the worst-case time is bounded by the number of CPUs in the system.

A second change to the scheduler is in the code used to decide when the scheduler should be called to choose a new thread. The scheduler attempts to keep the *n* CPUs running the *n* highest-priority threads. Since an event or interrupt on one CPU may require a reschedule on another CPU, there must be a mechanism for deciding this. The algorithm currently implemented is very simple. Given a thread that has just been awakened (or had its priority changed), the scheduler scans the CPUs, starting with the one it is currently running on, for a current thread that is of lower priority than the new one. If one is found, a reschedule interrupt is sent to that CPU and the scan continues, but now using the current thread of the rescheduled CPU as the candidate thread. In this way the new thread gets to run as quickly as possible, hopefully on the current CPU, and the remaining CPUs pick up the remaining highest-priority threads as a consequence of processing the reschedule interrupt.

The final change to the scheduler is in the handling of timeslicing. Only one CPU receives timer interrupts, although all CPUs must handle timeslicing. To make this work, the CPU that receives the timer interrupt decrements the timeslice counter for all CPUs, not just its own. If the counter for a CPU reaches zero, it sends a timeslice interrupt to that CPU.
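
The wake-up scan and the timer-side timeslice accounting described above can be modelled in a few lines of C. This is a sketch under assumptions, not the MLQ scheduler source: each CPU is represented only by the priority of its current thread, a numerically smaller value is taken to mean a higher priority, the interrupts to be sent are returned as bits in a mask, and the names reschedule_scan(), timeslice_tick() and MAX_CPUS are hypothetical.

```c
/* Hypothetical model of the SMP reschedule scan and timeslice tick.
 * Assumption: a numerically smaller priority value is a higher priority. */
#include <assert.h>

#define MAX_CPUS 8  /* illustrative bound for this model only */

/* Wake-up scan: returns a bitmask of CPUs to send a reschedule interrupt. */
static unsigned reschedule_scan(const int cur_pri[], int ncpus,
                                int this_cpu, int woken_pri)
{
    assert(ncpus > 0 && ncpus <= MAX_CPUS);
    unsigned mask = 0;
    int candidate = woken_pri;
    for (int i = 0; i < ncpus; i++) {
        int cpu = (this_cpu + i) % ncpus;   /* start with the current CPU */
        if (cur_pri[cpu] > candidate) {     /* running thread is lower priority */
            mask |= 1u << cpu;              /* send it a reschedule interrupt */
            candidate = cur_pri[cpu];       /* displaced thread becomes candidate */
        }
    }
    return mask;
}

/* Timer CPU: decrement every CPU's timeslice counter, not just its own.
 * Returns a bitmask of CPUs that should get a timeslice interrupt. */
static unsigned timeslice_tick(int slice[], int ncpus, int reload)
{
    assert(ncpus > 0 && ncpus <= MAX_CPUS);
    unsigned mask = 0;
    for (int cpu = 0; cpu < ncpus; cpu++) {
        if (--slice[cpu] == 0) {
            mask |= 1u << cpu;    /* timeslice interrupt for this CPU */
            slice[cpu] = reload;  /* simplification: reload immediately */
        }
    }
    return mask;
}
```

With four CPUs running threads of priority 10, 20, 5 and 30, waking a priority-8 thread on CPU 0 yields the mask 0xB (CPUs 0, 1 and 3), after which the threads of priority 5, 8, 10 and 20 can all be running. In this model the simple scan interrupts three CPUs even though only CPU 3's thread strictly had to give way; the payoff, as the text says, is that the new thread can run on the waking CPU as quickly as possible.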

On receiving the timeslice interrupt, the destination CPU enters the scheduler and looks for another thread at the same priority to run. This is somewhat more efficient than distributing clock ticks to all CPUs, since an inter-CPU interrupt is only needed when a timeslice expires.

Device Drivers
~~~~~~~~~~~~~~

The area where the SMP nature of a system will be most apparent is in device drivers. It is quite possible for the ISR, DSR and thread components of a device driver to execute on different CPUs. For this reason it is much more important that SMP-capable device drivers use the driver API routines correctly.

Synchronization between threads and DSRs continues to require that the thread-side code use cyg_drv_dsr_lock() and cyg_drv_dsr_unlock() to protect access to shared data. Synchronization between ISRs and DSRs or threads requires that access to sensitive data be protected, in all places, by calls to cyg_drv_isr_lock() and cyg_drv_isr_unlock().

On SMP systems, the ISR lock not only disables local interrupts but also acquires a spinlock to protect against concurrent access from other CPUs. This is necessary because ISRs are not run with the scheduler lock claimed, so they can run in parallel with other components of the device driver.

The ISR lock provided by the driver API is just a shared spinlock that is available for use by all drivers. If a driver needs finer-grained locking, it can use private spinlocks, accessed via the cyg_drv_spinlock_*() functions (see the API description later in this document).

API Extensions
--------------

In general, the SMP support is invisible to application code. All synchronization and communication operations function exactly as before. The main areas where code needs to be SMP-aware are the handling of interrupt routing and the synchronization of ISRs, DSRs and threads.

The following sections contain brief descriptions of the API extensions added for SMP support.
More details can be found in the Kernel C API and Device Driver API documentation.

Interrupt Routing
~~~~~~~~~~~~~~~~~

Two new functions have been added to each of the Kernel API and the device driver API to control interrupt routing. These are:

    void cyg_interrupt_set_cpu( cyg_vector_t vector, cyg_cpu_t cpu );
    void cyg_drv_interrupt_set_cpu( cyg_vector_t vector, cyg_cpu_t cpu );

    cyg_cpu_t cyg_interrupt_get_cpu( cyg_vector_t vector );
    cyg_cpu_t cyg_drv_interrupt_get_cpu( cyg_vector_t vector );

The *_set_cpu() functions cause the given interrupt to be handled by the nominated CPU.

The *_get_cpu() functions return the CPU to which the vector is routed.

Although not currently supported, special values for the cpu argument may in future be used to indicate that the interrupt is routed dynamically or is CPU-local.

Once a vector has been routed to a new CPU, all other interrupt masking and configuration operations are relative to that CPU, where relevant.

Synchronization
~~~~~~~~~~~~~~~