eCos SMP Support
================

eCos contains support for limited Symmetric Multi-Processing
(SMP). This is only available on selected architectures and platforms.

The first part of this document describes the platform-independent
parts of the SMP support. Annexes at the end of this document describe
any details that are specific to a particular platform.

Target Hardware Limitations
---------------------------

To allow a reasonable implementation of SMP, and to reduce the
disruption to the existing source base, a number of assumptions have
been made about the features of the target hardware.

- Modest multiprocessing. The typical number of CPUs supported is two
  to four, with an upper limit around eight. While there are no
  inherent limits in the code, hardware and algorithmic limitations
  will probably become significant beyond this point.

- SMP synchronization support. The hardware must supply a mechanism to
  allow software on two CPUs to synchronize. This is normally provided
  as part of the instruction set in the form of test-and-set,
  compare-and-swap or load-link/store-conditional instructions. An
  alternative approach is the provision of hardware semaphore
  registers which can be used to serialize implementations of these
  operations. Whatever hardware facilities are available, they are
  used in eCos to implement spinlocks.

- Coherent caches. It is assumed that no extra effort will be required
  to access shared memory from any processor. This means that either
  there are no caches, or they are shared by all processors, or they
  are maintained in a coherent state by the hardware. It would be too
  disruptive to the eCos sources if every memory access had to be
  bracketed by cache load/flush operations. Any hardware that requires
  this is not supported.

- Uniform addressing. It is assumed that all memory that is
  shared between CPUs is addressed at the same location from all
  CPUs. Like non-coherent caches, dealing with CPU-specific address
  translation is considered too disruptive to the eCos source
  base. This does not, however, preclude systems with non-uniform
  access costs for different CPUs.

- Uniform device addressing. As with access to memory, it is assumed
  that all devices are equally accessible to all CPUs. Since device
  access is often made from thread contexts, it is not possible to
  restrict access to device control registers to certain CPUs, since
  there is currently no support for binding or migrating threads to CPUs.
  
- Interrupt routing. The target hardware must have an interrupt
  controller that can route interrupts to specific CPUs. It is
  acceptable for all interrupts to be delivered to just one CPU, or
  for some interrupts to be bound to specific CPUs, or for some
  interrupts to be local to each CPU. At present dynamic routing,
  where a different CPU may be chosen each time an interrupt is
  delivered, is not supported. eCos cannot support hardware where all
  interrupts are delivered to all CPUs simultaneously with the
  expectation that software will resolve any conflicts.

- Inter-CPU interrupts. A mechanism to allow one CPU to interrupt
  another is needed. This is necessary so that events on one CPU can
  cause rescheduling on other CPUs.

- CPU Identifiers. Code running on a CPU must be able to determine
  which CPU it is running on. The CPU Id is usually provided either in
  a CPU status register, or in a register associated with the
  inter-CPU interrupt delivery subsystem. eCos expects CPU Ids to be
  small positive integers, although alternative representations, such
  as bitmaps, can be converted relatively easily. Complex mechanisms
  for getting the CPU Id cannot be supported. Getting the CPU Id must
  be a cheap operation, since it is done often, and in
  performance-critical places such as interrupt handlers and the
  scheduler.
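
As an illustration of the synchronization requirement listed above, the
following is a minimal sketch of a spinlock built on an atomic
test-and-set operation, written with C11 atomics so that it can be tried
on a host. The demo_* names are hypothetical and this is not the eCos
HAL implementation, which uses whatever the target hardware provides.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spinlock type for illustration; not the eCos HAL type. */
typedef struct {
    atomic_bool held;               /* true while some CPU owns the lock */
} demo_spinlock_t;

void demo_spinlock_init(demo_spinlock_t *lock)
{
    atomic_init(&lock->held, false);
}

/* Test-and-set: atomically write true and report whether the lock was
 * previously free, in which case the caller now owns it. */
bool demo_spinlock_try(demo_spinlock_t *lock)
{
    return !atomic_exchange_explicit(&lock->held, true, memory_order_acquire);
}

/* Busy-wait until the test-and-set succeeds. */
void demo_spinlock_spin(demo_spinlock_t *lock)
{
    while (!demo_spinlock_try(lock)) {
        /* another CPU holds the lock; keep retrying */
    }
}

void demo_spinlock_clear(demo_spinlock_t *lock)
{
    assert(atomic_load(&lock->held));   /* clearing a free lock is a bug */
    atomic_store_explicit(&lock->held, false, memory_order_release);
}
```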
  
Kernel Support
--------------

This section describes how SMP is handled in the kernel, and where
system behaviour differs from a single CPU system.

System Startup
~~~~~~~~~~~~~~

System startup takes place on only one CPU, called the primary
CPU. All other CPUs, the secondary CPUs, are either placed in a
suspended state at reset, or are captured by the HAL and put into
a spin as they start up.

The primary CPU is responsible for copying the DATA segment and
zeroing the BSS (if required), calling HAL variant and platform
initialization routines and invoking constructors. It then calls
cyg_start() to enter the application. The application may then create
extra threads and other objects.

It is only when the application calls Cyg_Scheduler::start() that the
secondary CPUs are initialized. This routine scans the list of
available secondary CPUs and calls HAL_SMP_CPU_START() to start each one.
Finally it calls Cyg_Scheduler::start_cpu().

Each secondary CPU starts in the HAL, where it completes any per-CPU
initialization before calling into the kernel at
cyg_kernel_cpu_startup(). Here it claims the scheduler lock and calls 
Cyg_Scheduler::start_cpu().

Cyg_Scheduler::start_cpu() is common to both the primary and secondary
CPUs. The first thing this code does is to install an interrupt object
for this CPU's inter-CPU interrupt. From this point on the code is the
same as for the single CPU case: an initial thread is chosen and
entered.

From this point on the CPUs are all equal; eCos makes no further
distinction between the primary and secondary CPUs. However, the
hardware may still distinguish them as far as interrupt delivery is
concerned.


Scheduling
~~~~~~~~~~

To function correctly an operating system kernel must protect its
vital data structures, such as the run queues, from concurrent
access. In a single CPU system the only concurrent activities to worry
about are asynchronous interrupts. The kernel can easily guard its
data structures against these by disabling interrupts. However, in a
multi-CPU system, this is inadequate since it does not block access by
other CPUs.

The eCos kernel protects its vital data structures using the scheduler
lock. In single CPU systems this is a simple counter that is
atomically incremented to acquire the lock and decremented to release
it. If the lock is decremented to zero then the scheduler may be
invoked to choose a different thread to run. Because interrupts may
continue to be serviced while the scheduler lock is claimed, ISRs are
not allowed to access kernel data structures, or call kernel routines
that can. Instead all such operations are deferred to an associated
DSR routine that is run during the lock release operation, when the
data structures are in a consistent state.

By choosing a kernel locking mechanism that does not rely on interrupt
manipulation to protect data structures, it is easier to convert eCos
to SMP than would otherwise be the case. The principal change needed to
make eCos SMP-safe is to convert the scheduler lock into a nestable
spin lock. This is done by adding a spinlock and a CPU id to the
original counter.

The algorithm for acquiring the scheduler lock is very simple. If the
scheduler lock's CPU id matches the current CPU then it can increment
the counter and continue. If it does not match, the CPU must spin on
the spinlock, after which it may increment the counter and store its
own identity in the CPU id.

To release the lock, the counter is decremented. If it goes to zero
the CPU id value must be set to NONE and the spinlock cleared.

To protect these sequences against interrupts, they must be performed
with interrupts disabled. However, since these are very short code
sequences, they will not have an adverse effect on the interrupt
latency.
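
The acquire and release sequences described above can be sketched as
follows. This is a host-side illustration using C11 atomics, not the
eCos source: the demo_* names, the DEMO_CPU_NONE value and the no-op
interrupt stand-ins are all assumptions, and the scheduler invocation
and DSR processing that happen when the count reaches zero are omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_CPU_NONE (-1)          /* hypothetical "no owner" value */

typedef struct {
    atomic_bool spinlock;           /* held while any CPU owns the lock */
    atomic_int  owner;              /* owning CPU id, or DEMO_CPU_NONE */
    unsigned    count;              /* nesting depth, touched only by the owner */
} demo_sched_lock_t;

/* Stand-ins for the HAL interrupt control; they do nothing on a host. */
int  demo_irq_save(void)       { return 0; }
void demo_irq_restore(int old) { (void)old; }

void demo_sched_lock_init(demo_sched_lock_t *l)
{
    atomic_init(&l->spinlock, false);
    atomic_init(&l->owner, DEMO_CPU_NONE);
    l->count = 0;
}

void demo_sched_lock(demo_sched_lock_t *l, int cpu)
{
    int old = demo_irq_save();
    if (atomic_load(&l->owner) != cpu) {
        /* Not the owner: spin until the spinlock is ours, then record
         * our identity so that nested claims take the fast path. */
        while (atomic_exchange_explicit(&l->spinlock, true,
                                        memory_order_acquire)) {
            /* busy-wait */
        }
        atomic_store(&l->owner, cpu);
    }
    l->count++;
    demo_irq_restore(old);
}

void demo_sched_unlock(demo_sched_lock_t *l, int cpu)
{
    int old = demo_irq_save();
    assert(atomic_load(&l->owner) == cpu && l->count > 0);
    if (--l->count == 0) {
        /* Last release: forget the owner, then free the spinlock. */
        atomic_store(&l->owner, DEMO_CPU_NONE);
        atomic_store_explicit(&l->spinlock, false, memory_order_release);
    }
    demo_irq_restore(old);
}
```

The owner check is what makes the lock nestable: only the CPU that
already holds the spinlock can see its own id in the owner field, so it
never spins against itself.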

Beyond converting the scheduler lock, further preparing the kernel for
SMP is a relatively minor matter. The main changes are to convert
various scalar housekeeping variables into arrays indexed by CPU
id. These include the current thread pointer, the need_reschedule
flag and the timeslice counter.

At present only the Multi-Level Queue (MLQ) scheduler is capable of
supporting SMP configurations. The main change made to this scheduler
is to cope with having several threads in execution at the same
time. Running threads are marked with the CPU they are executing on.
When scheduling a thread, the scheduler skips past any running threads
until it finds a thread that is pending. Unlike the single CPU case,
this is not a constant-time algorithm, but it is still deterministic,
since the worst case time is bounded by the number of CPUs in the
system.
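
The skip-past-running-threads step might look like the sketch below.
It assumes a simplified model, with a single list of runnable threads
already in scheduling order and a hypothetical demo_thread_t type; the
real MLQ scheduler keeps one queue per priority level.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NCPUS    4             /* hypothetical CPU count */
#define DEMO_CPU_NONE (-1)          /* thread is pending, not running */

typedef struct demo_thread {
    int cpu;                        /* CPU executing it, or DEMO_CPU_NONE */
    struct demo_thread *next;       /* next runnable thread in scheduling order */
} demo_thread_t;

/* Return the first pending thread and mark it as running on `cpu`,
 * or NULL if every runnable thread is already executing somewhere. */
demo_thread_t *demo_pick_next(demo_thread_t *queue, int cpu)
{
    int skipped = 0;

    for (demo_thread_t *t = queue; t != NULL; t = t->next) {
        if (t->cpu == DEMO_CPU_NONE) {
            t->cpu = cpu;
            return t;
        }
        /* At most one running thread per CPU can be skipped, which is
         * what bounds the worst-case search time. */
        skipped++;
        assert(skipped <= DEMO_NCPUS);
    }
    return NULL;
}
```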

A second change to the scheduler is in the code used to decide when
the scheduler should be called to choose a new thread. The scheduler
attempts to keep the *n* CPUs running the *n* highest priority
threads. Since an event or interrupt on one CPU may require a
reschedule on another CPU, there must be a mechanism for deciding
this. The algorithm currently implemented is very simple. Given a
thread that has just been awakened (or had its priority changed), the
scheduler scans the CPUs, starting with the one it is currently
running on, for a current thread that is of lower priority than the
new one. If one is found then a reschedule interrupt is sent to that
CPU and the scan continues, but now using the current thread of the
rescheduled CPU as the candidate thread. In this way the new thread
gets to run as quickly as possible, hopefully on the current CPU, and
the remaining CPUs will pick up the remaining highest priority
threads as a consequence of processing the reschedule interrupt.
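
A minimal sketch of this scan is given below. It is an illustration
only: the demo_* names are hypothetical, priorities are assumed to
follow the eCos convention that a smaller number means a higher
priority, and sending a reschedule interrupt is modelled by setting a
flag.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NCPUS 4                 /* hypothetical CPU count */

typedef struct {
    int  current_prio[DEMO_NCPUS];   /* priority of each CPU's current thread */
    bool resched_sent[DEMO_NCPUS];   /* records "reschedule interrupt sent" */
} demo_sched_t;

/* Called when a thread of priority `new_prio` becomes runnable on
 * `this_cpu`. Smaller numbers mean higher priority (assumed). */
void demo_request_reschedule(demo_sched_t *s, int this_cpu, int new_prio)
{
    int candidate = new_prio;

    assert(this_cpu >= 0 && this_cpu < DEMO_NCPUS);
    for (int i = 0; i < DEMO_NCPUS; i++) {
        int cpu = (this_cpu + i) % DEMO_NCPUS;   /* start with this CPU */

        if (s->current_prio[cpu] > candidate) {
            /* This CPU runs something less important than the candidate:
             * ask it to reschedule, then carry on looking for a home
             * for the thread that it will displace. */
            s->resched_sent[cpu] = true;
            candidate = s->current_prio[cpu];
        }
    }
}
```

For example, with current priorities 5, 10, 3 and 20 on CPUs 0 to 3 and
a new thread of priority 4 awakened on CPU 0, the scan marks CPUs 0, 1
and 3 for rescheduling and leaves CPU 2 alone.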

The final change to the scheduler is in the handling of
timeslicing. Only one CPU receives timer interrupts, although all CPUs
must handle timeslicing. To make this work, the CPU that receives the
timer interrupt decrements the timeslice counter for all CPUs, not
just its own. If the counter for a CPU reaches zero, then it sends a
timeslice interrupt to that CPU. On receiving the interrupt the
destination CPU enters the scheduler and looks for another thread at
the same priority to run. This is somewhat more efficient than
distributing clock ticks to all CPUs, since the interrupt is only
needed when a timeslice expires.
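
The tick handling can be sketched as below. The names, the timeslice
length and the point at which a counter is reloaded are assumptions
made for illustration; a timeslice interrupt is modelled by
incrementing a per-CPU count.

```c
#include <assert.h>

#define DEMO_NCPUS           4       /* hypothetical CPU count */
#define DEMO_TIMESLICE_TICKS 5       /* hypothetical timeslice length in ticks */

typedef struct {
    unsigned count[DEMO_NCPUS];      /* ticks left in each CPU's timeslice */
    unsigned irqs_sent[DEMO_NCPUS];  /* timeslice interrupts "sent" to each CPU */
} demo_timeslice_t;

void demo_timeslice_init(demo_timeslice_t *ts)
{
    for (int cpu = 0; cpu < DEMO_NCPUS; cpu++) {
        ts->count[cpu] = DEMO_TIMESLICE_TICKS;
        ts->irqs_sent[cpu] = 0;
    }
}

/* Runs on the one CPU that receives timer interrupts. */
void demo_timer_tick(demo_timeslice_t *ts)
{
    for (int cpu = 0; cpu < DEMO_NCPUS; cpu++) {
        /* Decrement every CPU's counter; notify a CPU only on expiry. */
        if (ts->count[cpu] > 0 && --ts->count[cpu] == 0)
            ts->irqs_sent[cpu]++;
    }
}

/* Runs on the destination CPU when the timeslice interrupt arrives.
 * Choosing another thread of the same priority is omitted here; the
 * sketch only reloads the counter (an assumption). */
void demo_timeslice_handler(demo_timeslice_t *ts, int cpu)
{
    assert(cpu >= 0 && cpu < DEMO_NCPUS);
    ts->count[cpu] = DEMO_TIMESLICE_TICKS;
}
```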

Device Drivers
~~~~~~~~~~~~~~

The SMP nature of a system is most apparent in device drivers. It is
quite possible for the ISR, DSR and thread
components of a device driver to execute on different CPUs. For this
reason it is much more important that SMP-capable device drivers use
the driver API routines correctly.

Synchronization between threads and DSRs continues to require that the
thread-side code use cyg_drv_dsr_lock() and cyg_drv_dsr_unlock() to
protect access to shared data. Synchronization between ISRs and DSRs
or threads requires that access to sensitive data be protected, in all
places, by calls to cyg_drv_isr_lock() and cyg_drv_isr_unlock().

The ISR lock, for SMP systems, not only disables local interrupts, but
also acquires a spinlock to protect against concurrent access from
other CPUs. This is necessary because ISRs are not run with the
scheduler lock claimed. Hence they can run in parallel with other
components of the device driver.

The ISR lock provided by the driver API is just a shared spinlock that
is available for use by all drivers. If a driver needs to implement a
finer grain of locking, it can use private spinlocks, accessed via the
cyg_drv_spinlock_*() functions (see API later).
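
As a sketch of the ISR locking rule, the fragment below protects a
counter shared between an ISR and a thread with cyg_drv_isr_lock() and
cyg_drv_isr_unlock() in both places. So that it can be compiled on a
host, the two lock functions are replaced by trivial stand-ins that
only track nesting; on a target they come from the driver API. The
demo_* names and the shared counter are hypothetical.

```c
#include <assert.h>

/* Host stand-ins for the driver API calls; they only track nesting.
 * On a target these are provided by eCos and must not be redefined. */
int demo_isr_lock_depth;
void cyg_drv_isr_lock(void)   { demo_isr_lock_depth++; }
void cyg_drv_isr_unlock(void) { assert(demo_isr_lock_depth > 0);
                                demo_isr_lock_depth--; }

/* Data shared between the ISR and thread-level code (hypothetical). */
volatile unsigned demo_rx_pending;

/* ISR side: may run on any CPU, possibly in parallel with the thread. */
void demo_rx_isr_body(void)
{
    cyg_drv_isr_lock();
    demo_rx_pending = demo_rx_pending + 1;
    cyg_drv_isr_unlock();
}

/* Thread side: fetch and reset the count under the same lock. */
unsigned demo_rx_take_pending(void)
{
    unsigned n;

    cyg_drv_isr_lock();
    n = demo_rx_pending;
    demo_rx_pending = 0;
    cyg_drv_isr_unlock();
    return n;
}
```

On a single CPU, disabling interrupts on the thread side would be
enough. On SMP the ISR may be running on another CPU, which is why the
ISR side takes the lock as well.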


API Extensions
--------------

In general, the SMP support is invisible to application code. All
synchronization and communication operations function exactly as
before. The main area where code needs to be SMP aware is in the
handling of interrupt routing, and in the synchronization of ISRs,
DSRs and threads.

The following sections contain brief descriptions of the API
extensions added for SMP support. More details will be found in the
Kernel C API and Device Driver API documentation.

Interrupt Routing
~~~~~~~~~~~~~~~~~

Two new functions have been added to each of the Kernel API and the
device driver API to do interrupt routing. These are:

void cyg_interrupt_set_cpu( cyg_vector_t vector, cyg_cpu_t cpu );
void cyg_drv_interrupt_set_cpu( cyg_vector_t vector, cyg_cpu_t cpu );

cyg_cpu_t cyg_interrupt_get_cpu( cyg_vector_t vector );
cyg_cpu_t cyg_drv_interrupt_get_cpu( cyg_vector_t vector );

The *_set_cpu() functions cause the given interrupt to be handled by
the nominated CPU.

The *_get_cpu() functions return the CPU to which the vector is
routed.

Although not currently supported, special values for the cpu argument
may be used to indicate that the interrupt is being routed dynamically
or is CPU-local.

Once a vector has been routed to a new CPU, all other interrupt
masking and configuration operations are relative to that CPU, where
relevant.
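
A short usage sketch of the kernel API pair follows. To keep it
compilable on a host, the two functions and their argument types are
replaced by stand-ins backed by a small routing table; the typedefs,
DEMO_NVECTORS, DEMO_UART_VECTOR and demo_route_uart() are assumptions
made for illustration and are not part of eCos.

```c
#include <assert.h>

/* Host stand-ins for the eCos types and calls (assumed representations). */
typedef unsigned int cyg_vector_t;
typedef unsigned int cyg_cpu_t;

#define DEMO_NVECTORS    32          /* hypothetical number of vectors */
#define DEMO_UART_VECTOR 4           /* hypothetical vector number */

cyg_cpu_t demo_route[DEMO_NVECTORS]; /* all vectors start on CPU 0 */

void cyg_interrupt_set_cpu(cyg_vector_t vector, cyg_cpu_t cpu)
{
    assert(vector < DEMO_NVECTORS);
    demo_route[vector] = cpu;
}

cyg_cpu_t cyg_interrupt_get_cpu(cyg_vector_t vector)
{
    assert(vector < DEMO_NVECTORS);
    return demo_route[vector];
}

/* Route the UART interrupt to `cpu` and return the previous target, so
 * that the caller can restore the old routing later. */
cyg_cpu_t demo_route_uart(cyg_cpu_t cpu)
{
    cyg_cpu_t old = cyg_interrupt_get_cpu(DEMO_UART_VECTOR);

    cyg_interrupt_set_cpu(DEMO_UART_VECTOR, cpu);
    return old;
}
```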

Synchronization
~~~~~~~~~~~~~~~
