Real-time operating system - AbsoluteAstronomy.com

A real-time operating system (RTOS) is an operating system

Operating system

An operating system is a set of programs that manage computer hardware resources and provide common services for application software. The operating system is the most important type of system software in a computer system...

(OS) intended to serve real-time

Real-time computing

In computer science, real-time computing , or reactive computing, is the study of hardware and software systems that are subject to a "real-time constraint"— e.g. operational deadlines from event to system response. Real-time programs must guarantee response within strict time constraints...

application requests.

A key characteristic of a RTOS is the level of its consistency concerning the amount of time it takes to accept and complete an application's task; the variability is jitter
Jitter
Jitter is the undesired deviation from true periodicity of an assumed periodic signal in electronics and telecommunications, often in relation to a reference clock source. Jitter may be observed in characteristics such as the frequency of successive pulses, the signal amplitude, or phase of...

. A hard real-time operating system has less jitter than a soft real-time operating system. The chief design goal is not high throughput

Throughput

In communication networks, such as Ethernet or packet radio, throughput or network throughput is the average rate of successful message delivery over a communication channel. This data may be delivered over a physical or logical link, or pass through a certain network node...

, but rather a guarantee of a soft or hard performance category. A RTOS that can usually or generally meet a deadline is a soft real-time OS, but if it can meet a deadline deterministically

Deterministic algorithm

In computer science, a deterministic algorithm is an algorithm which, in informal terms, behaves predictably. Given a particular input, it will always produce the same output, and the underlying machine will always pass through the same sequence of states...

it is a hard real-time OS.

A real-time OS has an advanced algorithm for scheduling

Scheduling (computing)

In computer science, a scheduling is the method by which threads, processes or data flows are given access to system resources . This is usually done to load balance a system effectively or achieve a target quality of service...

. Scheduler flexibility enables a wider, computer-system orchestration of process priorities, but a real-time OS is more frequently dedicated to a narrow set of applications. Key factors in a real-time OS are minimal interrupt latency

Interrupt latency

In real-time operating systems, interrupt latency is the time between the generation of an interrupt by a device and the servicing of the device which generated the interrupt. For many operating systems, devices are serviced as soon as the device's interrupt handler is executed...

and minimal thread switching latency

Thread switching latency

The thread switching latency is the time needed by the operating system to switch the CPU to another thread. In some operating systems running on some hardware, switching between threads belonging to the same process is much faster than switching to a thread from different process .-See also:*...

, but a real-time OS is valued more for how quickly or how predictably it can respond than for the amount of work it can perform in a given period of time.

Design philosophies

The most common designs are:

Event-driven which switches tasks
Context switch
A context switch is the computing process of storing and restoring the state of a CPU so that execution can be resumed from the same point at a later time. This enables multiple processes to share a single CPU. The context switch is an essential feature of a multitasking operating system...

only when an event of higher priority needs servicing, called preemptive priority
Preemption (computing)
In computing, preemption is the act of temporarily interrupting a task being carried out by a computer system, without requiring its cooperation, and with the intention of resuming the task at a later time. Such a change is known as a context switch...

, or priority scheduling.
Time-sharing designs switch tasks on a regular clock interrupt, and on events, called round robin
Round-robin scheduling
Round-robin is one of the simplest scheduling algorithms for processes in an operating system. As the term is generally used, time slices are assigned to each process in equal portions and in circular order, handling all processes without priority . Round-robin scheduling is simple, easy to...

.

Time-sharing designs switch tasks more often than strictly needed, but give smoother multitasking

Computer multitasking

In computing, multitasking is a method where multiple tasks, also known as processes, share common processing resources such as a CPU. In the case of a computer with a single CPU, only one task is said to be running at any point in time, meaning that the CPU is actively executing instructions for...

, giving the illusion that a process or user has sole use of a machine.

Early CPU design

CPU design

CPU design is the design engineering task of creating a central processing unit , a component of computer hardware. It is a subfield of electronics engineering and computer engineering.- Overview :CPU design focuses on these areas:...

s needed many cycles to switch tasks, during which the CPU could do nothing else useful. For example, with a 20 MHz 68000

Motorola 68000

The Motorola 68000 is a 16/32-bit CISC microprocessor core designed and marketed by Freescale Semiconductor...

processor (typical of late 1980s), task switch times are roughly 20 microseconds. (In contrast, a 100 MHz ARM

ARM architecture

ARM is a 32-bit reduced instruction set computer instruction set architecture developed by ARM Holdings. It was named the Advanced RISC Machine, and before that, the Acorn RISC Machine. The ARM architecture is the most widely used 32-bit ISA in numbers produced...

CPU (from 2008) switches in less than 3 microseconds.) Because of this, early OSes tried to minimize wasting CPU time by avoiding unnecessary task switching.

Scheduling

In typical designs, a task has three states:

Running (executing on the CPU);
Ready (ready to be executed);
Blocked (waiting for input/output).

Most tasks are blocked or ready most of the time because generally only one task can run at a time per CPU. The number of items in the ready queue can greatly vary, depending on the number of tasks the system needs to perform and the type of scheduler that the system uses. On simpler non-preemptive but still multitasking systems, a task has to give up its time on the CPU to other tasks, which can cause the ready queue to have a greater number of overall tasks in the ready to be executed state (resource starvation

Resource starvation

In computer science, starvation is a multitasking-related problem, where a process is perpetually denied necessary resources. Without those resources, the program can never finish its task....

).

Usually the data structure of the ready list in the scheduler is designed to minimize the worst-case length of time spent in the scheduler's critical section, during which preemption is inhibited, and, in some cases, all interrupts are disabled. But the choice of data structure depends also on the maximum number of tasks that can be on the ready list.

If there are never more than a few tasks on the ready list, then a doubly linked list of ready tasks is likely optimal. If the ready list usually contains only a few tasks but occasionally contains more, then the list should be sorted by priority. That way, finding the highest priority task to run does not require iterating through the entire list. Inserting a task then requires walking the ready list until reaching either the end of the list, or a task of lower priority than that of the task being inserted.

Care must be taken not to inhibit preemption during this search. Longer critical sections should be divided into small pieces. If an interrupt occurs that makes a high priority task ready during the insertion of a low priority task, that high priority task can be inserted and run immediately before the low priority task is inserted.

The critical response time, sometimes called the flyback time, is the time it takes to queue a new ready task and restore the state of the highest priority task to running. In a well-designed RTOS, readying a new task will take 3 to 20 instructions per ready-queue entry, and restoration of the highest-priority ready task will take 5 to 30 instructions.

In more advanced systems, real-time tasks share computing resources with many non-real-time tasks, and the ready list can be arbitrarily long. In such systems, a scheduler ready list implemented as a linked list would be inadequate.

Algorithms

Some commonly used RTOS scheduling algorithms are:

Cooperative scheduling

Preemptive scheduling
Preemption (computing)
In computing, preemption is the act of temporarily interrupting a task being carried out by a computer system, without requiring its cooperation, and with the intention of resuming the task at a later time. Such a change is known as a context switch...
- Rate-monotonic scheduling
  Rate-monotonic scheduling
  In computer science, rate-monotonic scheduling is a scheduling algorithm used in real-time operating systems with a static-priority scheduling class...
- Round-robin scheduling
  Round-robin scheduling
  Round-robin is one of the simplest scheduling algorithms for processes in an operating system. As the term is generally used, time slices are assigned to each process in equal portions and in circular order, handling all processes without priority . Round-robin scheduling is simple, easy to...
- Fixed priority pre-emptive scheduling
  Fixed priority pre-emptive scheduling
  Fixed-priority pre-emptive scheduling is a scheduling system commonly used in real-time systems. With fixed priority pre-emptive scheduling, the scheduler ensures that at any given time, the processor executes the highest priority task of all those tasks that are currently ready to execute.The...
  
  , an implementation of preemptive time slicing
- Fixed-Priority Scheduling with Deferred Preemption
- Fixed-Priority Non-preemptive Scheduling
- Critical section preemptive scheduling
- Static time scheduling
Earliest Deadline First
Earliest deadline first scheduling
Earliest deadline first or least time to go is a dynamic scheduling algorithm used in real-time operating systems. It places processes in a priority queue. Whenever a scheduling event occurs the queue will be searched for the process closest to its deadline...

approach
Stochastic
Stochastic
Stochastic refers to systems whose behaviour is intrinsically non-deterministic. A stochastic process is one whose behavior is non-deterministic, in that a system's subsequent state is determined both by the process's predictable actions and by a random element. However, according to M. Kac and E...

digraph
Directed graph
A directed graph or digraph is a pair G= of:* a set V, whose elements are called vertices or nodes,...

s with multi-threaded
Thread (computer science)
In computer science, a thread of execution is the smallest unit of processing that can be scheduled by an operating system. The implementation of threads and processes differs from one operating system to another, but in most cases, a thread is contained inside a process...

graph traversal
Tree traversal
In computer science, tree-traversal refers to the process of visiting each node in a tree data structure, exactly once, in a systematic way. Such traversals are classified by the order in which the nodes are visited...

Intertask communication and resource sharing

Multitasking systems must manage sharing data and hardware resources among multiple tasks. It is usually "unsafe" for two tasks to access the same specific data or hardware resource simultaneously. "Unsafe" means the results are inconsistent or unpredictable. There are three common approaches to resolve this problem:

Temporarily masking/disabling interrupts

General-purpose operating systems usually do not allow user programs to mask (disable) interrupt

Interrupt

In computing, an interrupt is an asynchronous signal indicating the need for attention or a synchronous event in software indicating the need for a change in execution....

s, because the user program could control the CPU for as long as it wishes. Modern CPUs don't allow user mode code to disable interrupts as such control is considered a key operating system resource. Many embedded systems and RTOSs, however, allow the application itself to run in kernel mode for greater system call

System call

In computing, a system call is how a program requests a service from an operating system's kernel. This may include hardware related services , creating and executing new processes, and communicating with integral kernel services...

efficiency and also to permit the application to have greater control of the operating environment without requiring OS intervention.

On single-processor systems, if the application runs in kernel mode and can mask interrupts, often interrupt disablement is the best (lowest overhead) solution to prevent simultaneous access to a shared resource. While interrupts are masked, the current task has exclusive use of the CPU since no other task or interrupt can take control, so the critical section

Critical section

In concurrent programming a critical section is a piece of code that accesses a shared resource that must not be concurrently accessed by more than one thread of execution. A critical section will usually terminate in fixed time, and a thread, task or process will have to wait a fixed time to...

is protected. When the task exits its critical section, it must unmask interrupts; pending interrupts, if any, will then execute. Temporarily masking interrupts should only be done when the longest path through the critical section is shorter than the desired maximum interrupt latency

Interrupt latency

, or else this method increases the system's maximum interrupt latency. Typically this method of protection is used only when the critical section is just a few instructions and contains no loops. This method is ideal for protecting hardware bit-mapped registers when the bits are controlled by different tasks.

Binary semaphores

When the critical section is longer than a few source code lines or involves lengthy looping, an embedded/real-time algorithm must resort to using mechanisms identical or similar to those available on general-purpose operating systems, such as semaphores

Semaphore (programming)

In computer science, a semaphore is a variable or abstract data type that provides a simple but useful abstraction for controlling access by multiple processes to a common resource in a parallel programming environment....

and OS-supervised interprocess messaging. Such mechanisms involve system calls, and usually invoke the OS's dispatcher code on exit, so they typically take hundreds of CPU instructions to execute, while masking interrupts may take as few as one instruction on some processors. But for longer critical sections, there may be no choice; interrupts cannot be masked for long periods without increasing the system's interrupt latency.

A binary semaphore is either locked or unlocked. When it is locked, tasks must wait for the semaphore to unlock. A binary semaphore is therefore equivalent to a mutex. Typically a task will set a timeout on its wait for a semaphore. There are several well-known problems with semaphore based designs such as priority inversion

Priority inversion

In computer science, priority inversion is a problematic scenario in scheduling when a higher priority task is indirectly preempted by a lower priority task effectively "inverting" the relative priorities of the two tasks....

and deadlock

Deadlock

A deadlock is a situation where in two or more competing actions are each waiting for the other to finish, and thus neither ever does. It is often seen in a paradox like the "chicken or the egg"...

s.

In priority inversion

Priority inversion

a high priority task waits because a low priority task has a semaphore. A typical solution is to have the task that owns a semaphore run at (inherit) the priority of the highest waiting task. But this simplistic approach fails when there are multiple levels of waiting: task A waits for a binary semaphore locked by task B, which waits for a binary semaphore locked by task C. Handling multiple levels of inheritance without introducing instability in cycles is complex and problematic.

In a deadlock

Deadlock

A deadlock is a situation where in two or more competing actions are each waiting for the other to finish, and thus neither ever does. It is often seen in a paradox like the "chicken or the egg"...

, two or more tasks lock semaphores without timeouts and then wait forever for the other task's semaphore, creating a cyclic dependency. The simplest deadlock scenario occurs when two tasks alternately lock two semaphores, but in the opposite order. Deadlock is prevented by careful design or by having floored semaphores, which pass control of a semaphore to the higher priority task on defined conditions.

Message passing

The other approach to resource sharing is for tasks to send messages in an organized message passing

Message passing

Message passing in computer science is a form of communication used in parallel computing, object-oriented programming, and interprocess communication. In this model, processes or objects can send and receive messages to other processes...

scheme. In this paradigm, the resource is managed directly by only one task. When another task wants to interrogate or manipulate the resource, it sends a message to the managing task. Although their real-time behavior is less crisp than semaphore systems, simple message-based systems avoid most protocol deadlock hazards, and are generally better-behaved than semaphore systems. However, problems like those of semaphores are possible. Priority inversion can occur when a task is working on a low-priority message and ignores a higher-priority message (or a message originating indirectly from a high priority task) in its incoming message queue. Protocol deadlocks can occur when two or more tasks wait for each other to send response messages.

Interrupt handlers and the scheduler

Since an interrupt handler blocks the highest priority task from running, and since real time operating systems are designed to keep thread latency to a minimum, interrupt handlers are typically kept as short as possible. The interrupt handler defers all interaction with the hardware as long as possible; typically all that is necessary is to acknowledge or disable the interrupt (so that it won't occur again when the interrupt handler returns). The interrupt handler then queues work to be done at a lower priority level, such as unblocking a driver task through releasing a semaphore or sending a message. A scheduler often provides the ability to unblock a task from interrupt handler context.

An OS maintains catalogs of objects it manages such as threads, mutexes, memory, and so on. Updates to this catalog must be strictly controlled. For this reason it can be problematic when an interrupt handler calls an OS function while the application is in the act of also doing so. The OS function called from an interrupt handler could find the object database to be in an inconsistent state because of the application's update. There are two major approaches to deal with this problem: the unified architecture and the segmented architecture. RTOSs implementing the unified architecture solve the problem by simply disabling interrupts while the internal catalog is updated. The downside of this is that interrupt latency increases, potentially losing interrupts. The segmented architecture does not make direct OS calls but delegates the OS related work to a separate handler. This handler runs at a higher priority than any thread but lower than the interrupt handlers. The advantage of this architecture is that it adds very few cycles to interrupt latency. As a result, OSes which implement the segmented architecture are more predictable and can deal with higher interrupt rates compared to the unified architecture.

Memory allocation

Memory allocation is more critical in an RTOS than in other operating systems.

First, speed of allocation is important. A standard memory allocation scheme scans a linked list of indeterminate length to find a suitable free memory block. This is unacceptable in an RTOS since memory allocation has to occur within a certain amount of time.

The simple fixed-size-blocks algorithm works quite well for simple embedded system

Embedded system

An embedded system is a computer system designed for specific control functions within a larger system. often with real-time computing constraints. It is embedded as part of a complete device often including hardware and mechanical parts. By contrast, a general-purpose computer, such as a personal...

s because of its low overhead.

See memory allocation for more details.

Examples

An early example of a large-scale real-time operating system was the Transaction Processing Facility

Transaction Processing Facility

TPF is an IBM real-time operating system for mainframes descended from the IBM System/360 family, including zSeries and System z9. The name is an initialism for Transaction Processing Facility....

developed by American Airlines

American Airlines

American Airlines, Inc. is the world's fourth-largest airline in passenger miles transported and operating revenues. American Airlines is a subsidiary of the AMR Corporation and is headquartered in Fort Worth, Texas adjacent to its largest hub at Dallas/Fort Worth International Airport...

and IBM for the Sabre Airline Reservations System

Sabre (computer system)

Sabre Global Distribution System , owned by Sabre Holdings, is used by more than 55,000 travel agencies around the world with more than 400 airlines, 88,000 hotels, 24 car rental brands, and 13 cruise lines...

.

Currently the best known, most widely deployed, real-time operating systems are

LynxOS
LynxOS
The LynxOS RTOS is a Unix-like real-time operating system from LynuxWorks . Sometimes known as the Lynx Operating System, LynxOS features full POSIX conformance and, more recently, Linux compatibility...
OSE
Operating System Embedded
The Operating System Embedded is a real-time embedded operating system created by the Swedish information technology company ENEA AB. Bengt Eliasson, who at the time was a consultant from ENEA with an assignment at Ericsson, wrote the basic parts of the kernel...
QNX
QNX
QNX is a commercial Unix-like real-time operating system, aimed primarily at the embedded systems market. The product was originally developed by Canadian company, QNX Software Systems, which was later acquired by Canadian BlackBerry-producer Research In Motion.-Description:As a microkernel-based...
RTLinux
RTLinux
RTLinux or RTCore is a hard realtime RTOS microkernel that runs the entire Linux operating system as a fully preemptive process.It was developed by Victor Yodaiken , Michael Barabanov , Cort Dougan and others at the New Mexico Institute of Mining and Technology and then as a commercial product at...
VxWorks
VxWorks
VxWorks is a real-time operating system developed as proprietary software by Wind River Systems of Alameda, California, USA. First released in 1987, VxWorks is designed for use in embedded systems.- History :...
Windows CE
Windows CE
Microsoft Windows CE is an operating system developed by Microsoft for embedded systems. Windows CE is a distinct operating system and kernel, rather than a trimmed-down version of desktop Windows...

See the list of real-time operating systems for a comprehensive list. Also, see the list of operating systems for all types of operating systems.