Memory management unit



 
 
A memory management unit (MMU), sometimes called paged memory management unit (PMMU), is a computer hardware component responsible for handling accesses to memory requested by the central processing unit (CPU). Its functions include translation of virtual addresses to physical addresses (i.e., virtual memory management), memory protection, cache control, bus arbitration, and, in simpler computer architectures (especially 8-bit systems), bank switching.

How it works


Modern MMUs typically divide the virtual address space (the range of addresses used by the processor) into pages, each having a size which is a power of 2 (2^n bytes), usually a few kilobytes. The bottom n bits of the address (the offset within a page) are left unchanged. The upper address bits are the (virtual) page number. The MMU normally translates virtual page numbers to physical page numbers via an associative cache called a Translation Lookaside Buffer (TLB). When the TLB lacks a translation, a slower mechanism involving hardware-specific data structures or software assistance is used. The data found in such data structures are typically called page table entries (PTEs), and the data structure itself is typically called a page table. The physical page number is combined with the page offset to give the complete physical address.
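
The lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not a model of any particular MMU: the 4 KiB page size, the dictionary-based TLB and page table, and the names translate and PageFault are all assumptions made for the example.

```python
PAGE_BITS = 12                  # assumed 4 KiB pages (2**12 bytes)
OFFSET_MASK = (1 << PAGE_BITS) - 1


class PageFault(Exception):
    """Raised when no translation exists for a virtual page (hypothetical name)."""


def translate(vaddr, tlb, page_table):
    """Translate a virtual address to a physical address.

    tlb and page_table are plain dicts mapping virtual page numbers to
    physical page numbers; real hardware uses an associative cache and
    architecture-specific table formats.
    """
    vpn = vaddr >> PAGE_BITS        # upper bits: virtual page number
    offset = vaddr & OFFSET_MASK    # bottom n bits pass through unchanged
    if vpn in tlb:                  # TLB hit: fast path
        ppn = tlb[vpn]
    elif vpn in page_table:         # TLB miss: slower page table lookup
        ppn = page_table[vpn]
        tlb[vpn] = ppn              # cache the translation for next time
    else:
        raise PageFault(hex(vaddr))
    return (ppn << PAGE_BITS) | offset


page_table = {0x12345: 0x00042}
tlb = {}
paddr = translate(0x12345ABC, tlb, page_table)
```

After the first call the translation for page 0x12345 is held in tlb, so a second access to the same page takes the fast path.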

A PTE or TLB entry may also include information about whether the page has been written to (the dirty bit), when it was last used (the accessed bit, for a least recently used page replacement algorithm), what kind of processes (user mode, supervisor mode) may read and write it, and whether it should be cached.

Sometimes, a TLB entry or PTE prohibits access to a virtual page, perhaps because no physical random access memory has been allocated to that virtual page. In this case the MMU signals a page fault to the CPU. The operating system (OS) then handles the situation, perhaps by trying to find a spare frame of RAM and set up a new PTE to map it to the requested virtual address. If no RAM is free, it may be necessary to choose an existing page, using some replacement algorithm, and save it to disk (this is called "paging"). With some MMUs, there can also be a shortage of PTEs or TLB entries, in which case the OS will have to free one for the new mapping.

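The page-fault handling just described can be illustrated with a small simulation. This is only a sketch under stated assumptions: the class name DemandPager, its methods, and the choice of least-recently-used replacement are invented for the example, and "saving to disk" is modelled as appending the evicted page number to a list.

```python
from collections import OrderedDict


class DemandPager:
    """Toy model of an OS servicing page faults with LRU replacement."""

    def __init__(self, num_frames):
        self.free_frames = list(range(num_frames))
        self.resident = OrderedDict()   # vpn -> frame, least recently used first
        self.paged_out = []             # pages "saved to disk", for illustration

    def access(self, vpn):
        if vpn in self.resident:
            self.resident.move_to_end(vpn)   # record the use (accessed bit)
            return self.resident[vpn]
        return self.handle_page_fault(vpn)

    def handle_page_fault(self, vpn):
        if self.free_frames:
            frame = self.free_frames.pop(0)          # a spare frame exists
        else:
            victim, frame = self.resident.popitem(last=False)  # evict LRU page
            self.paged_out.append(victim)
        self.resident[vpn] = frame                   # set up the new mapping
        return frame


pager = DemandPager(num_frames=2)
frames = [pager.access(vpn) for vpn in (10, 11, 10, 12)]
```

With two frames, the access to page 12 finds no free RAM, so page 11 (the least recently used) is evicted and its frame is reused.
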
In some cases a "page fault" may indicate a software bug. A key benefit of an MMU is memory protection: an OS can use it to protect against errant programs, by disallowing access to memory that a particular program should not have access to. Typically, an OS assigns each program its own virtual address space.

An MMU also reduces the problem of fragmentation of memory. After blocks of memory have been allocated and freed, the free memory may become fragmented (discontinuous) so that the largest contiguous block of free memory may be much smaller than the total amount. With virtual memory, a contiguous range of virtual addresses can be mapped to several non-contiguous blocks of physical memory.

In some early microprocessor designs, memory management was performed by a separate integrated circuit such as the MC68851 used with the Motorola 68020 CPU in the Macintosh II or the Z8015 used with the Zilog Z8000 family of processors. Later microprocessors such as the Motorola 68030 and the Zilog Z280 placed the MMU together with the CPU on the same integrated circuit, as did the Intel 80286 and later x86 microprocessors.

While this article concentrates on modern MMUs, commonly based on pages, early systems used a similar concept for base-limit addressing, which further developed into segmentation. Both are occasionally also present on modern architectures. The x86 architecture provided segmentation rather than paging in the 80286, and provides both paging and segmentation in the 80386 and later processors.

Examples

Most modern systems divide memory into pages that are 4 KiB to 64 KiB in size, often with the possibility to use huge pages from 2 MiB to 512 MiB in size. Page translations are cached in a TLB. Some systems, mainly older RISC designs, trap into the OS when a page translation is not found in the TLB. Most systems use a hardware-based tree walker. Most systems allow the MMU to be disabled; some disable the MMU when trapping into OS code.

VAX

VAX pages are 512 bytes, which is very small. An OS may treat multiple pages as if they were a single larger page; for example, Linux on VAX groups 8 pages together, so that the system is viewed as having 4 KiB pages. The VAX divides memory into 4 fixed-purpose regions, each 1 GiB in size. They are:
  • P0 space, which is used for general-purpose per-process memory such as heaps,
  • P1 space, or control space, which is also per-process and is typically used for supervisor, executive, kernel, and user stacks and other per-process control structures managed by the operating system,
  • S0 space, or system space, which is global to all processes and stores operating system code and data, whether paged or not, including page tables,
  • S1 space, which is unused and "Reserved to Digital".


Page tables are big linear arrays. Normally this would be very wasteful when addresses are used at both ends of the possible range, but the page table for applications is itself stored in the kernel's paged memory. Thus there is effectively a 2-level tree, allowing applications to have sparse memory layout without wasting lots of space on unused page table entries. The VAX MMU is notable for lacking an accessed bit. OSes which implement paging must find some way to emulate the accessed bit if they are to operate efficiently. Typically, the OS will periodically unmap pages so that page-not-present faults can be used to let the OS set an accessed bit.
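
As a rough illustration of the region layout and of page grouping, the sketch below decodes which region a 32-bit address falls in and lists the hardware pages behind one OS-level page. It assumes the four regions occupy the address space in the order listed above and are selected by the top two address bits; the function names are hypothetical.

```python
VAX_PAGE_BITS = 9                     # 512-byte hardware pages
GROUP = 8                             # hardware pages treated as one OS page
REGIONS = ("P0", "P1", "S0", "S1")    # assumed order, selected by bits 31..30


def vax_region(vaddr):
    """Return the name of the 1 GiB region containing a 32-bit address."""
    return REGIONS[(vaddr >> 30) & 0b11]


def hardware_pages(os_page_number):
    """Hardware page numbers backing one grouped (4 KiB) OS-level page."""
    first = os_page_number * GROUP
    return list(range(first, first + GROUP))


os_page_size = GROUP << VAX_PAGE_BITS   # 8 * 512 bytes
```

Grouping only changes how the OS views memory: the hardware still translates each 512-byte page separately, so the OS has to keep the 8 underlying entries consistent.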

ARM

ARM architecture-based application processors implement an MMU defined by ARM's Virtual Memory System Architecture. The current architecture defines PTEs for describing 4 KiB and 64 KiB pages, 1 MiB sections and 16 MiB super-sections; legacy versions also defined a 1 KiB tiny page.

TLB updates are performed automatically by page-table walking hardware.

PTEs include read/write access permission based on privilege, cacheability information, an XN bit (NX bit), and a non-secure bit.

IBM System/370 and successors

The IBM System/370 has had an MMU since the early 1970s; it was initially known as a DAT box. It has the unusual feature of storing accessed and dirty bits outside of the page table. These bits refer to physical memory rather than virtual memory, and are accessed by special-purpose instructions. This reduces overhead for the OS, which would otherwise need to propagate accessed and dirty bits from the page tables to a more physically-oriented data structure. This makes OS-level virtualization easier. These features have been inherited by succeeding mainframe architectures, up to the current z/Architecture.

DEC Alpha

The DEC Alpha processor divides memory into 8192-byte pages. After a TLB miss, low-level firmware machine code (here called PALcode) walks a 3-level tree-structured page table. Addresses are broken down as follows: 21 bits unused, 10 bits to index the root level of the tree, 10 bits to index the middle level of the tree, 10 bits to index the leaf level of the tree, and 13 bits that pass through to the physical address without modification. Full read/write/execute permission bits are supported.
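
The address breakdown above can be expressed as a small three-level walk. The sketch below is illustrative only: nested dictionaries stand in for the real in-memory page table format, and alpha_split and alpha_walk are hypothetical names, not PALcode routines.

```python
def alpha_split(vaddr):
    """Split a 64-bit virtual address into (root, middle, leaf, offset)."""
    offset = vaddr & 0x1FFF            # low 13 bits pass through unchanged
    leaf = (vaddr >> 13) & 0x3FF       # 10 bits: leaf-level index
    middle = (vaddr >> 23) & 0x3FF     # 10 bits: middle-level index
    root = (vaddr >> 33) & 0x3FF       # 10 bits: root-level index
    return root, middle, leaf, offset  # the top 21 bits are unused


def alpha_walk(vaddr, root_table):
    """Walk a 3-level tree; a KeyError models a missing translation."""
    root, middle, leaf, offset = alpha_split(vaddr)
    page_frame = root_table[root][middle][leaf]
    return (page_frame << 13) | offset


vaddr = (1 << 33) | (2 << 23) | (3 << 13) | 0x123
root_table = {1: {2: {3: 0x40}}}
paddr = alpha_walk(vaddr, root_table)
```

Because each level is indexed by 10 bits, the three levels together cover 30 bits of page number, which with the 13-bit offset accounts for the 43 used address bits.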

Sun 1

The original Sun 1 was a single-board computer built around the Motorola 68000 microprocessor and introduced in 1982. It included the original Sun 1 Memory Management Unit, which provided address translation, memory protection, memory sharing and memory allocation for multiple processes running on the CPU. All accesses by the CPU to private on-board RAM, external Multibus memory, on-board I/O and Multibus I/O ran through the MMU, where they were translated and protected in uniform fashion. The MMU was implemented in hardware on the CPU board.

The MMU consisted of a context register, a segment map and a page map. Virtual addresses from the CPU were translated into intermediate addresses by the segment map, which in turn were translated into physical addresses by the page map. The page size was 2 KiB and the segment size was 32 KiB which gave 16 pages per segment. Up to 16 contexts could be mapped concurrently. The maximum logical address space for a context was 1024 pages or 2 MiB. The maximum physical address that could be mapped simultaneously was also 2 MiB.

The context register was important in a multitasking operating system because it allowed the CPU to switch between processes without reloading all the translation state information. The 4-bit context register could switch between 16 sections of the segment map under supervisor control, which allowed 16 contexts to be mapped concurrently. Each context had its own virtual address space. Sharing of virtual address space and inter-context communications could be provided by writing the same values into the segment or page maps of different contexts. Additional contexts could be handled by treating the segment map as a context cache and replacing out-of-date contexts on a least-recently-used basis.
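
The two-stage translation can be sketched as follows. The field widths follow from the sizes given above (2 KiB pages give an 11-bit offset, 16 pages per segment give 4 bits, and 1024 pages per context give 6 segment bits), but the exact format of the map entries is an assumption: here a segment map entry is simply a number that, combined with the page-in-segment bits, indexes the page map. All names are hypothetical.

```python
def sun1_translate(context, vaddr, segment_map, page_map):
    """Translate a 21-bit virtual address via segment map, then page map."""
    offset = vaddr & 0x7FF                # 11 bits: offset in a 2 KiB page
    page_in_segment = (vaddr >> 11) & 0xF   # 4 bits: 16 pages per segment
    segment = (vaddr >> 15) & 0x3F          # 6 bits: 64 segments per context
    # Stage 1: the 4-bit context register selects a section of the segment map.
    segment_entry = segment_map[(context & 0xF, segment)]
    # Stage 2: the intermediate address indexes the page map.
    intermediate_page = (segment_entry << 4) | page_in_segment
    physical_page = page_map[intermediate_page]
    return (physical_page << 11) | offset


# Two contexts share memory by holding the same segment map value.
segment_map = {(0, 2): 9, (1, 7): 9}
page_map = {(9 << 4) | 5: 0x30}
addr_a = sun1_translate(0, (2 << 15) | (5 << 11) | 0x7A, segment_map, page_map)
addr_b = sun1_translate(1, (7 << 15) | (5 << 11) | 0x7A, segment_map, page_map)
```

Both translations reach the same physical address even though the contexts use different virtual segments, which is the sharing mechanism described above.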

The context register made no distinction between user and supervisor states. Interrupts and traps did not switch contexts, which required that all valid interrupt vectors, as well as the valid Supervisor Stack, always be mapped in page 0 of each context.

PowerPC

In PowerPC G1, G2, G3, and G4, pages are normally 4 KiB. After a TLB miss, the standard PowerPC MMU begins two simultaneous lookups. One lookup attempts to match the address with one of 4 or 8 Data Block Address Translation (DBAT) registers, or 4 or 8 Instruction Block Address Translation (IBAT) registers, as appropriate. The BAT registers can map linear chunks of memory as large as 256 MiB, and are normally used by an OS to map large portions of the address space for the OS kernel's own use. If the BAT lookup succeeds, the other lookup is halted and ignored.

The other lookup, not directly supported by all processors in this family, is via a so-called "inverted page table" which acts as a hashed off-chip extension of the TLB. First, the top 4 bits of the address are used to select one of 16 segment registers. 24 bits from the segment register replace those 4 bits, producing a 52-bit address. The use of segment registers allows multiple processes to share the same hash table. The 52-bit address is hashed, then used as an index into the off-chip table. There, a group of 8 page table entries is scanned for one that matches. If none match due to excessive hash collisions, the processor tries again with a slightly different hash function. If this too fails, the CPU traps into the OS (with MMU disabled) so that the problem may be resolved. The OS needs to discard an entry from the hash table to make space for a new entry. The OS may generate the new entry from a more-normal tree-like page table or from per-mapping data structures which are likely to be slower and more space-efficient. Support for no-execute control is in the segment registers, leading to 256-MiB granularity.
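
The steps of the hashed lookup can be sketched as below. The formation of the 52-bit address follows the description above; everything else is an assumption made for illustration. In particular toy_hash is an invented stand-in, not the hash function the hardware uses, and the table is a plain list of 8-entry groups holding (virtual page number, physical page number) pairs.

```python
GROUP_SIZE = 8   # page table entries scanned per hash bucket


def ppc_virtual_address(eaddr, segment_registers):
    """Replace the top 4 bits of a 32-bit address with 24 segment bits."""
    index = (eaddr >> 28) & 0xF                   # selects 1 of 16 registers
    segment_bits = segment_registers[index] & 0xFFFFFF
    return (segment_bits << 28) | (eaddr & 0x0FFFFFFF)   # 52-bit result


def toy_hash(vpn, num_groups, secondary=False):
    """Hypothetical hash; the second attempt uses a different function."""
    h = (vpn ^ (vpn >> 16)) % num_groups
    return (num_groups - 1 - h) if secondary else h


def lookup(vpn, table):
    """Scan the primary group, then the secondary group, for a match."""
    for secondary in (False, True):
        group = table[toy_hash(vpn, len(table), secondary)]
        for entry in group:
            if entry is not None and entry[0] == vpn:
                return entry[1]
    raise LookupError("no matching entry: trap into the OS")


segment_registers = [0] * 16
segment_registers[3] = 0xABCDEF
vaddr = ppc_virtual_address(0x30001234, segment_registers)
vpn = vaddr >> 12                                 # 4 KiB pages
table = [[None] * GROUP_SIZE for _ in range(16)]
table[toy_hash(vpn, 16, secondary=True)][0] = (vpn, 0x55)
ppn = lookup(vpn, table)                          # found on the second try
```

The example deliberately stores the entry in the secondary group, so the lookup succeeds only after the first hash fails, mirroring the retry described in the text.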

A major problem with this design is poor cache locality caused by the hash function. Tree-based designs avoid this by placing the page table entries for adjacent pages in adjacent locations. An operating system running on the PowerPC may minimize the size of the hash table to reduce this problem.

It is also somewhat slow to remove the page table entries of a process; the OS may avoid reusing segment values to delay facing this or it may elect to suffer the waste of memory associated with per-process hash tables. G1 chips do not search for page table entries, but they do generate the hash with the expectation that an OS will search the standard hash table via software. The OS can write to the TLB. G2, G3, and early G4 chips use hardware to search the hash table. The latest chips allow the OS to choose either method. On chips that make this optional or do not support it at all, the OS may choose to use a tree-based page table exclusively.

x86

The x86 architecture has evolved over a long time while maintaining full software compatibility even for OS code. Thus the MMU is extremely complex, with many different possible operating modes. Normal operation of the traditional 80386 CPU and its successors is described here.

The CPU primarily divides memory into 4 KiB pages. Segment registers, fundamental to the older 8088 and 80286 MMU designs, are avoided as much as possible by modern OSes. There is one major exception to this: access to thread-specific data for applications or CPU-specific data for OS kernels, which is done with explicit use of the FS and GS segment registers. All memory access involves a segment register, chosen according to the code being executed. The segment register acts as an index into a table, which provides an offset to be added to the virtual address. Except when using FS or GS as described above, the OS ensures that the offset will be zero. After the offset is added, the address is masked to be no larger than 32 bits. The result may be looked up via a tree-structured page table, with the bits of the address being split as follows: 10 bits for the root of the tree, 10 bits for the leaves of the tree, and the 12 lowest bits being directly copied to the result.
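
The segment-offset step and the 10/10/12 split can be shown with two short helper functions. This is a simplified sketch: the function names are hypothetical, and the segment table lookup is reduced to a segment_base parameter.

```python
def x86_linear_address(address, segment_base=0):
    """Add the segment's offset, then mask the result to 32 bits."""
    return (segment_base + address) & 0xFFFFFFFF


def x86_split(linear):
    """Split a 32-bit address into (root index, leaf index, page offset)."""
    root_index = (linear >> 22) & 0x3FF   # 10 bits: root of the tree
    leaf_index = (linear >> 12) & 0x3FF   # 10 bits: leaves of the tree
    offset = linear & 0xFFF               # 12 bits copied to the result
    return root_index, leaf_index, offset


fields = x86_split(x86_linear_address(0xC0101ABC))
wrapped = x86_linear_address(0x10, segment_base=0xFFFFFFF8)
```

With a zero segment base, as most OSes arrange, the first step leaves the address unchanged; the second call shows the 32-bit wrap-around that the masking produces.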

Minor revisions of the MMU introduced with the Pentium have allowed very large 2 MiB or 4 MiB pages by skipping the bottom level of the tree. Minor revisions of the MMU introduced with the Pentium Pro have allowed 36-bit physical addresses with the Physical Address Extension (PAE) feature and have allowed specification of cacheability by looking up a few high bits in a small on-CPU table.

No-execute support was originally only provided on a per-segment basis, making it very awkward to use. More recent x86 chips provide a per-page no-execute bit in the PAE mode. PaX is one way to emulate per-page non-execute support via the segments, with a performance loss and halving the available address space.

x86-64

x86-64 is a 64-bit extension of x86 that uses long mode. In long mode, all segment offsets are ignored, except FS and GS. The page table tree has four levels, instead of the three used in PAE mode. Virtual addresses are divided up as follows: 16 bits unused, 9 bits each for 4 tree levels (total: 36 bits), and the 12 lowest bits unmodified. The 16 highest bits are required to be equal to the 48th bit; in other words, the low 48 bits are sign extended to the higher bits. This is done to allow a future expansion of the addressable range without compromising backwards compatibility.

In the page table, the highest bit is a per-page no-execute bit.

Unisys MCP Systems (Burroughs B5000)


Tanenbaum et al. recently stated that the B5000 (and descendant systems) have no MMU. To understand the functionality provided by an MMU, it is instructive to study a counterexample: a system that achieves this functionality by other means.

The B5000 was the first commercial system to support virtual memory after the Atlas. It provides the two functions of an MMU in different ways. The first is the mapping of virtual memory addresses. Instead of needing an MMU, the MCP systems are descriptor based. Each allocated memory block is given a master descriptor with the properties of the block, i.e., its size, its address, and whether it is present in memory. When a request is made to access the block for reading or writing, the hardware checks its presence via the presence bit (pbit) in the descriptor.

A pbit of 1 indicates the presence of the block. In this case the block can be accessed via the physical address in the descriptor. If the pbit is zero, an interrupt is generated for the MCP (operating system) to make the block present. If the address field is zero, this is the first access to this block and it is allocated (an "init pbit"). If the address field is non-zero, it is the disk address of the block, which has previously been rolled out, so the block is fetched from disk, the pbit is set to 1, and the physical memory address is updated to point to the block in memory (an "other pbit"). This makes a descriptor equivalent to a page-table entry in an MMU system. System performance can be monitored through the number of pbits. Init pbits indicate initial allocations, but a high level of other pbits indicates that the system may be thrashing.
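The pbit handling described above can be modelled with a small sketch. It is a toy model under stated assumptions, not the MCP implementation: the `Descriptor` fields, the `ToyMCP` class, its bump allocator and its dictionary-based "memory" and "disk" are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    size: int
    address: int = 0   # 0 = never allocated; otherwise memory or disk address
    pbit: int = 0      # 1 = block is present in memory


class ToyMCP:
    """Hypothetical model of presence-bit interrupt handling."""

    def __init__(self):
        self.init_pbits = 0       # first-touch allocations
        self.other_pbits = 0      # blocks fetched back from disk
        self.next_free = 0x1000   # assumed bump allocator for memory
        self.memory = {}          # memory address -> list of words
        self.disk = {}            # disk address -> list of words

    def _allocate(self, size):
        addr = self.next_free
        self.next_free += size
        return addr

    def touch(self, d):
        """Return the memory address of the block, making it present if needed."""
        if d.pbit == 1:
            return d.address                      # present: no interrupt
        addr = self._allocate(d.size)
        if d.address == 0:                        # init pbit: first access
            self.init_pbits += 1
            self.memory[addr] = [0] * d.size
        else:                                     # other pbit: rolled out earlier
            self.other_pbits += 1
            self.memory[addr] = self.disk.pop(d.address)
        d.address, d.pbit = addr, 1
        return addr

    def roll_out(self, d, disk_address):
        """Move a present block to disk and clear its pbit."""
        self.disk[disk_address] = self.memory.pop(d.address)
        d.address, d.pbit = disk_address, 0


# Usage: first touch allocates, a later touch after roll-out fetches from disk.
mcp = ToyMCP()
block = Descriptor(size=4)
mcp.touch(block)
mcp.roll_out(block, disk_address=77)
mcp.touch(block)
print(mcp.init_pbits, mcp.other_pbits)
```

Counting the two kinds of pbit separately mirrors the monitoring idea in the text: a rising count of other pbits relative to init pbits suggests blocks are being rolled out and fetched back repeatedly.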

Note that all memory allocation is therefore completely automatic (one of the features of modern systems) and there is no way to allocate blocks other than through this mechanism. There are no such calls as malloc or dealloc, since memory blocks are also automatically discarded. The scheme is also lazy, since a block will not be allocated until it is actually referenced. When memory is nearly full, the MCP examines the working set, trying compaction (since the system is segmented, not paged), deallocating read-only segments (such as code segments, which can be restored from their original copy), and, as a last resort, rolling dirty data segments out to disk.

The second is protection. Since all accesses are via the descriptor, the hardware can check that all accesses are within bounds and, in the case of a write, that the process has write permission. The MCP system is inherently secure and thus has no need of an MMU to provide this level of memory protection. Descriptors are read only to user processes and may only be updated by the system (hardware or MCP). (Descriptors have a tag of 5 and odd-tagged words are read only; code words have a tag of 3.)
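The two checks named above, bounds and write permission, can be sketched in software. This is only an illustration of the idea: on the real systems the checks are performed by hardware, and the `BlockDescriptor` fields and the `checked_read` and `checked_write` helpers are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)          # frozen: user code cannot modify a descriptor
class BlockDescriptor:
    base: int                    # start of the block in memory
    size: int                    # number of words in the block
    writable: bool               # whether this process may write the block


def checked_read(memory, desc, index):
    """Read word `index` of the block, refusing out-of-bounds accesses."""
    if not 0 <= index < desc.size:
        raise IndexError("index %d outside block of size %d" % (index, desc.size))
    return memory[desc.base + index]


def checked_write(memory, desc, index, value):
    """Write word `index` of the block after permission and bounds checks."""
    if not desc.writable:
        raise PermissionError("block is read only for this process")
    if not 0 <= index < desc.size:
        raise IndexError("index %d outside block of size %d" % (index, desc.size))
    memory[desc.base + index] = value


# Usage: two views of the same block with different permissions.
memory = [0] * 16
writer = BlockDescriptor(base=4, size=4, writable=True)
reader = BlockDescriptor(base=4, size=4, writable=False)
checked_write(memory, writer, 2, 99)
print(checked_read(memory, reader, 2))
```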

Blocks can be shared between processes via copy descriptors in the process stack; thus some processes may have write permission while others do not. A code segment is read only, and thus reentrant and shared between processes. Copy descriptors contain a 20-bit address field giving the index of the master descriptor in the master descriptor array. This also implements a very efficient and secure IPC mechanism. Blocks can easily be relocated, since only the master descriptor needs updating when a block's status changes.
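The indirection through the master descriptor array can be sketched as below. The sketch assumes a plain Python list standing in for the array; `MasterDescriptor`, `CopyDescriptor` and `resolve` are hypothetical names, and only the 20-bit width of the index field comes from the text.

```python
from dataclasses import dataclass


@dataclass
class MasterDescriptor:
    address: int                 # current location of the block
    size: int


@dataclass(frozen=True)
class CopyDescriptor:
    master_index: int            # index into the master descriptor array
    writable: bool               # per-process permission

    def __post_init__(self):
        # The text gives a 20-bit field for the index.
        if not 0 <= self.master_index < (1 << 20):
            raise ValueError("master index does not fit in 20 bits")


def resolve(masters, copy):
    """Follow a copy descriptor to the block's current address."""
    return masters[copy.master_index].address


# Usage: two processes share one block; relocating it touches only the master.
masters = [MasterDescriptor(address=0x2000, size=64)]
producer = CopyDescriptor(master_index=0, writable=True)
consumer = CopyDescriptor(master_index=0, writable=False)
masters[0].address = 0x8000      # relocate the block
print(hex(resolve(masters, producer)), hex(resolve(masters, consumer)))
```

Because the copies hold an index rather than an address, neither process's stack needs to be scanned or patched when the block moves.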

The only other aspect is performance: do MMU-based or non-MMU-based systems provide better performance? MCP systems may be implemented on top of standard hardware that does have an MMU (e.g., a standard PC). Even if the system implementation uses the MMU in some way, this will not be at all visible at the MCP level.

See also

  • Memory management
  • Memory segmentation
  • Virtual memory