Stack machine
A stack machine may be
  • A real or emulated computer that evaluates each sub-expression of a program statement via a pushdown data stack and uses a reverse Polish notation instruction set.
  • Any computer that uses a call stack in memory to manage the local variables of nested calls. (Nearly all computers and programming languages do this now.)
  • In automata theory, a rudimentary computer model lacking regular random-access memory.

Practical Expression-Stack Machines

In its most direct hardware form, a stack machine implements its stack with registers: the operands of the ALU are always the top two entries of the stack, and the result from the ALU is stored back in the top entry of the stack.
'Stack machine' commonly refers to computers which use a last-in, first-out stack to hold short-lived temporary values while executing individual program statements. The instruction set carries out most ALU actions with postfix (reverse Polish notation) operations that work only on the expression stack, not on data registers or main memory cells.

For a typical instruction like Add, both operands implicitly come from the topmost (most recent) values of the stack, and those two values get replaced by the result of the Add. The instruction's operands are 'popped' off the stack, and its result(s) are then 'pushed' back onto the stack, ready for the next instruction. Most stack instructions are encoded as just an opcode, with no additional fields to specify a register number or memory address or literal constant. This encoding is easily extended to richer operations with more than two inputs or more than one result. Integer constant operands are pushed by separate Load Immediate instructions. All accessing of program variables in main memory RAM is segregated into separate Load or Store instructions containing one memory address or some way to calculate that address from stacked operands.
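
As an illustration of this pop-and-push discipline, here is a minimal sketch of a postfix evaluator in Python (the opcode names and tuple encoding are invented for the example, not taken from any particular machine's instruction set):

    # Each arithmetic op implicitly pops its operands and pushes its result;
    # constants arrive via an explicit Load Immediate ("push").
    def run(program):
        stack = []
        for op, *args in program:
            if op == "push":                  # Load Immediate: operand is in the instruction
                stack.append(args[0])
            elif op == "add":                 # pop two operands, push one result
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)
            elif op == "mul":
                b, a = stack.pop(), stack.pop()
                stack.append(a * b)
        return stack[-1]                      # value of the whole expression

    # (2 + 3) * 4 in postfix: 2 3 + 4 *
    print(run([("push", 2), ("push", 3), ("add",), ("push", 4), ("mul",)]))  # 20

Note that the two Add operands are never named by the instruction: they are simply whatever the previous instructions left on top of the stack.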

The stack machine style is in contrast to register file machines which hold temporary values in a small fast visible array of similar registers, or accumulator machines which have only one visible general-purpose temp register, or memory-to-memory machines which have no visible temp registers.

Some machines have a stack of very limited size, implemented as a register file and a dynamic register renumbering scheme. Some machines have a stack of unlimited size, implemented as an array in RAM accessed by a 'top of stack' address register. Its topmost N values may be cached by invisible data registers for speed.

A few machines have both an expression stack in memory and a separate visible register stack.

Stack machines may have their expression stack and their call-return stack

Some technical handheld calculators use reverse Polish notation in their keyboard interface, instead of having parenthesis keys. This is a form of stack machine. The Plus key relies on its two operands already being at the correct topmost positions of the user-visible stack: computing (2 + 3) × 4, for example, takes the keystrokes 2 ENTER 3 + 4 ×, with each operator consuming the top two stacked values and leaving its result for the next operator.

Very Compact Object Code

Stack machines have much smaller instructions than the other styles of machines. But operand loads are separate, and so stack code requires roughly twice as many instructions as the equivalent code for register machines. The total code size (in bytes) is still less for stack machines.

In stack machine code, the most frequent instructions consist of just an opcode and can easily fit in 6 bits or less. Branches, load immediates, and load/store instructions require an argument field, but stack machines often arrange that the frequent cases of these still fit, together with a thin opcode, into a single byte or syllable. The selection of operands from prior results is done implicitly by the ordering of the stack ops. In contrast, register machines require two or three register-number fields per ALU instruction to select its operands; the densest register machines average about 16 bits per instruction.

The instructions for accumulator or memory-to-memory machines are not padded out with multiple register fields. Instead, they use compiler-managed anonymous variables for subexpression values. These temps require extra memory reference instructions which take more code space than for the stack machine.

All stack machines have variants of the load/store opcodes for accessing local variables and formal parameters without explicit address calculations. This can be by offsets from the current top-of-stack address, or by offsets from a stable frame-base register. Register machines handle this with a register+offset address mode, but use a wider offset field.

Dense machine code was very valuable in the 1960s, when main memory was very expensive and very limited even on mainframes. It became important again on the initially-tiny memories of minicomputers and then microprocessors. Density remains important today, for smart phone apps and for Java apps downloaded into browsers over slow internet connections. Density also improves the effectiveness of caches and instruction prefetch cycles. Density allows smaller ROMs in embedded applications. But extreme density often comes with compromised program performance. And extreme performance often requires code that is several times bigger than stack code.

Some of the density of Burroughs B6700 code was due to moving vital operand information elsewhere, to 'tags' on every data word or into tables of pointers. The Add instruction itself was generic or polymorphic. It had to fetch the operand to discover whether this was an integer add or floating point add. The Load instruction could find itself tripping on an indirect address, or worse, a disguised call to a call-by-name thunk routine. The generic opcodes required fewer opcode bits but made the hardware more like an interpreter, with less opportunity to pipeline the common cases.

Simple Compilers

Compilers for stack machines are simpler and quicker to build than compilers for other machines. Code generation is trivial and independent of prior or subsequent code. It can be easily integrated into the parsing pass. No register management is needed, and no optimizations for constants or repeated simple memory references are needed (or even allowed). The same opcode that handles the frequent common case of Add, Indexed Load, or Function Call will also handle the general case involving complex subexpressions and nested calls. The compiler and the machine need not deal separately with corner cases.

This simplicity has allowed compilers to fit onto very small machines. The simple compilers allowed new product lines to get to market quickly, and allowed new operating systems to be written entirely in a new high level language rather than in assembly. The UCSD p-System supported a complete student programming environment on early 8-bit microprocessors with poor instruction sets and little RAM, by compiling to a virtual stack machine rather than to the actual hardware.

The downside to the simplicity of compilers for stack machines is that pure stack machines have benefited little from subsequent advances in compiler optimizer technology. However, optimisation of compiled stack code is quite possible. Back-end optimisation of compiler output has been demonstrated to significantly improve code, and potentially performance, whilst global optimisation within the compiler itself achieves further gains.

Simple Interpreters

Some stack machine instruction sets are intended for interpretive execution of a virtual machine, rather than driving hardware directly. Interpreters for virtual stack machines are easier to build than interpreters for register or memory-to-memory machines; the logic for handling memory address modes is in just one place rather than repeated in many instructions. Stack machines also tend to have fewer variations of an opcode; one generalized opcode will handle both frequent cases and obscure corner cases of memory references or function call setup. (But code density is often improved by adding short and long forms for the same operation.)
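
The shape of such an interpreter is a single decode-dispatch loop. A minimal sketch in Python (hypothetical opcode names; a real virtual machine would encode opcodes as bytes rather than strings):

    def interpret(code, memory):
        stack, pc = [], 0
        while pc < len(code):
            op = code[pc]; pc += 1
            if op == "LIT":                    # push the literal that follows the opcode
                stack.append(code[pc]); pc += 1
            elif op == "LOAD":                 # the one place that handles load addressing
                stack.append(memory[code[pc]]); pc += 1
            elif op == "STORE":                # the one place that handles store addressing
                memory[code[pc]] = stack.pop(); pc += 1
            elif op == "ADD":
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)
            elif op == "HALT":
                break
        return memory

    # X = X + 1, with X at address 0:
    print(interpret(["LOAD", 0, "LIT", 1, "ADD", "STORE", 0, "HALT"], {0: 41}))  # {0: 42}

All memory addressing funnels through the LOAD and STORE cases, which is why such interpreters stay small.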

Minimal Processor State

A machine with an expression stack can get by with just two visible registers: the top-of-stack address and the next-instruction address. The minimal hardware implementation needs only a few bits of state in flip-flops or registers. Faster implementations buffer the topmost N stack cells into invisible temp registers to reduce memory stack cycles.

Responding to an interrupt involves pushing the visible registers and branching to the interrupt handler. This is faster than storing most or all of the visible registers of a register machine, giving a quicker response to the interrupt. Some register machines deal with this by having multiple register files that can be instantly swapped, but this increases costs and slows down the register file.

More Memory References

Some in the industry believe that stack machines execute more data cache cycles for temporary values and local variables than do register machines. However, published counter-arguments dispute many of the commonly held assumptions behind such claims. What matters are the conditions under which a stack machine can or cannot get away with pure stack access patterns rather than mixed stack/memory access patterns; the paragraphs below examine those conditions.

On stack machines, temporary values often get spilled into memory, whereas on machines with many registers these temps usually remain in registers. (However, these values often need to be spilled into "activation frames" at the end of a procedure's definition, basic block, or at the very least, into a memory buffer during interrupt processing). Values spilled to memory add more cache cycles. This spilling effect depends on the number of hidden registers used to buffer top-of-stack values, upon the frequency of nested procedure calls, and upon host computer interrupt processing rates.

Some simple stack machines or stack interpreters use no top-of-stack hardware registers. Those minimal implementations are always slower than standard register machines. A typical expression like X+1 compiles to 'Load X; Load 1; Add'. This does implicit writes and reads of the memory stack which weren't needed:
  • Load X, push to memory
  • Load 1, push to memory
  • Pop 2 values from memory, add, and push result to memory

for a total of 5 data cache references.
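
A tiny instrumented model of that count (illustrative; it tallies only the traffic on the in-memory stack, since the read of the variable X itself would be needed on any style of machine):

    class MemStack:
        def __init__(self):
            self.cells, self.refs = [], 0
        def push(self, v):
            self.refs += 1              # one data cache write
            self.cells.append(v)
        def pop(self):
            self.refs += 1              # one data cache read
            return self.cells.pop()

    X = 7
    s = MemStack()
    s.push(X)                           # Load X
    s.push(1)                           # Load 1
    b, a = s.pop(), s.pop()             # Add: pop both operands...
    s.push(a + b)                       # ...and push the result
    print(s.cells[-1], s.refs)          # 8 5 -> five stack references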

The next step up from this is a stack machine or interpreter with a single top-of-stack register. The above code then does:
  • Load X into empty TOS register (if hardware machine), or
  • Push TOS register to memory, Load X into TOS register (if interpreter)
  • Push TOS register to memory, Load 1 into TOS register
  • Pop left operand from memory, add to TOS register and leave it there

for a total of 5 data cache references, worst-case. Generally, interpreters don't track emptiness, because they don't have to: anything below the stack pointer is a non-empty value, and the TOS cache register is always kept hot. Typical Java interpreters do not buffer the top-of-stack this way, however, because the program and stack have a mix of short and wide data values.

If the hardwired stack machine has N registers to cache the topmost memory stack words, then all spills and refills are avoided in this example and there is only 1 data cache cycle, the same as for a register or accumulator machine.

On register machines using optimizing compilers, it is very common for the most-used local variables to live in registers rather than in stack frame memory cells. This eliminates all data cache cycles for reading and writing those values, except for their initial load and final store upon procedure termination. The development of 'stack scheduling' for performing live-variable analysis, and thus retaining key variables on the stack for extended periods, goes a long way to answering this concern.

On the other hand, register machines must spill many of their registers to memory across nested procedure calls. The decision of which registers to spill, and when, is made statically at compile time rather than on the dynamic depth of the calls. This can lead to more data cache traffic than in an advanced stack machine implementation.

Factoring Out Common Subexpressions Has High Cost

In register machines, a subexpression which is used multiple times with the same result value can be evaluated just once and its result saved in a fast register. The subsequent reuses have no time or code cost, just a register reference that would have happened anyhow. This optimization wins for common simple expressions (e.g. loading variable X or pointer P) as well as less-common complex expressions.

With stack machines, in contrast, the results of a subexpression can be stored in one of two ways. The first way involves a temporary variable in memory. Storing and subsequent retrievals cost additional instructions and additional data cache cycles, so doing this is only a win if the subexpression computation costs more in time than fetching from memory, which is almost always the case on most stack CPUs. It is never worthwhile for simple variables and pointer fetches, because those already have the same cost of one data cache cycle per access. It is only marginally worthwhile for expressions like X+1. These simpler expressions make up the majority of the redundant, optimizable expressions in programs written in non-concatenative languages. An optimizing compiler can only win on redundancies that the programmer could have avoided in the source code.

The second way involves just leaving a computed value on the data stack, and duplicating it on an as-needed basis. This requires some amount of stack permutation, at the very least, an instruction to duplicate the results. This approach wins only if you can keep your data stack depth shallow enough for a "DUP", "ROT", or "OVER" type of instruction to gain access to the desired computed value. Some virtual machines support a general purpose permutation primitive, "PICK," which allows one to select arbitrarily any item in the data stack for duplication. Despite how limiting this approach sounds, hand-written stack code tends to make extensive use of this approach, resulting in software with runtime overheads comparable to those of general-purpose register-register architectures. Unfortunately, algorithms for optimal "stack scheduling" of values aren't known to exist in general, making such stack optimizations difficult to impossible to automate for non-concatenative programming languages.
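
A sketch of this second way in Python, using the Forth-style names DUP, OVER and PICK (the helper functions are invented for the example). It computes T = A + B once and then T * T without any memory temporary:

    def dup(s):  s.append(s[-1])           # copy the top item (Forth: 0 PICK)
    def over(s): s.append(s[-2])           # copy the second item (Forth: 1 PICK)
    def pick(s, n): s.append(s[-1 - n])    # copy the n-th item down to the top

    stack = [3, 4]                             # A, B already pushed
    stack.append(stack.pop() + stack.pop())    # ADD -> [7]
    dup(stack)                                 # DUP -> [7, 7]
    stack.append(stack.pop() * stack.pop())    # MUL -> [49]
    print(stack)                               # [49]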

As a result, it is very common for compilers for stack machines to never bother applying code-factoring optimizations. It is too much trouble, despite the significant payoff.

Rigid Code Order

In modern machines, the time to fetch a variable from the data cache is often several times longer than the time needed for basic ALU operations. A program runs faster without stalls if its memory loads can be started several cycles before the instruction which needs that variable, while also working on independent instructions. Complex machines can do that with a deep pipeline and "out-of-order execution" that examines and runs many instructions at once. Register machines can also do this with much simpler "in-order" hardware, a shallow pipeline, and slightly smarter compilers.
The load step becomes a separate instruction, and that instruction is statically scheduled much earlier in the code sequence. The compiler puts independent steps in between.

This scheduling trick requires explicit, spare registers. It is not possible on stack machines without exposing some aspect of the micro-architecture to the programmer. For the expression A-B, the right operand must be evaluated and pushed immediately prior to the Minus step. Without stack permutation or hardware multithreading, relatively little useful code can be put in between while waiting for the Load B to finish. Stack machines can work around the memory delay by having a deep out-of-order execution pipeline covering many instructions at once; more likely, they can permute the stack so that they can work on other workloads while the load completes, or they can interleave the execution of different program threads, as in the Unisys A9 system. Today's increasingly parallel computational loads suggest, however, that this might not be the disadvantage it has been made out to be in the past.

Hides a Faster Register Machine Inside

Some simple stack machines have a chip design which is fully customized all the way down to the level of individual registers. The top of stack address register and the N top of stack data buffers are built from separate individual register circuits, with separate adders and ad hoc connections.

However, most stack machines are built from larger circuit components where the N data buffers are stored together within a register file and share read/write buses. The decoded stack instructions are mapped into one or more sequential actions on that hidden register file. Loads and ALU ops act on a few topmost registers, and implicit spills and fills act on the bottommost registers. The decoder allows the instruction stream to be compact. But if the code stream instead had explicit register-select fields which directly manipulated the underlying register file, the compiler could make better use of all registers and the program would run faster.

Microprogrammed stack machines are an example of this. The inner microcode engine is some kind of RISC-like register machine or a VLIW-like machine using multiple register files. When controlled directly by task-specific microcode, that engine gets much more work completed per cycle than when controlled indirectly by equivalent stack code for that same task.

The object code translators for the HP 3000 and Tandem T/16 are another example.
They translated stack code sequences into equivalent sequences of RISC code. Minor 'local' optimizations removed much of the overhead of a stack architecture. Spare registers were used to factor out repeated address calculations. The translated code still retained plenty of emulation overhead from the mismatch between original and target machines. Despite that burden, the cycle efficiency of the translated code matched the cycle efficiency of the original stack code. And when the source code was recompiled directly to the register machine via optimizing compilers, the efficiency doubled. This shows that the stack architecture and its non-optimizing compilers were wasting over half of the power of the underlying hardware.

Register files are good tools for computing because they have high bandwidth and very low latency, compared to memory references via data caches. In a simple machine, the register file allows reading two independent registers and writing of a third, all in one ALU cycle with one-cycle or less latency. Whereas the corresponding data cache can start only one read or one write (not both) per cycle, and the read typically has a latency of two ALU cycles. That's one third of the throughput at twice the pipeline delay. In a complex machine like Athlon that completes two or more instructions per cycle, the register file allows reading of four or more independent registers and writing of two others, all in one ALU cycle with one-cycle latency. Whereas the corresponding dual-ported data cache can start only two reads or writes per cycle, with multiple cycles of latency. Again, that's one third of the throughput of registers. It is very expensive to build a cache with additional ports.

More Instructions, Slower Interpreters

Interpreters for virtual stack machines are often slower than interpreters for other styles of virtual machine. This slowdown is worst when running on host machines with deep execution pipelines, such as current x86 chips.

A program has to execute more instructions when compiled to a stack machine than when compiled to a register machine or memory-to-memory machine. Every variable load or constant requires its own separate Load instruction, instead of being bundled within the instruction which uses that value. The separated instructions may be simple and faster running, but the total instruction count is still higher.

In every interpreter, the interpreter must execute an N-way switch jump to decode the next opcode and branch to its steps for that particular opcode. The host machine's prefetch mechanisms are unable to predict and fetch the target of that indexed or indirect jump. So the host machine's execution pipeline must restart each time the hosted interpreter decodes another virtual instruction. This happens more often for virtual stack machines than for other styles of virtual machine.

Android's Dalvik virtual machine for Java uses a virtual-register 16-bit instruction set instead of Java's usual 8-bit stack code, to minimize instruction count and opcode dispatch stalls. Arithmetic instructions directly fetch or store local variables via 4-bit (or larger) instruction fields. For example, c = a + b compiles to four stack bytecodes for the JVM (iload_1; iload_2; iadd; istore_3) but to a single Dalvik instruction (add-int v3, v1, v2).

Hybrid Machines

Pure stack machines are quite inefficient for procedures which access multiple fields from the same object. The stack machine code must reload the object pointer for each pointer+offset calculation. A common fix for this is to add some register-machine features to the stack machine: a visible register file dedicated to holding addresses, and register-style instructions for doing loads and simple address calculations. It is uncommon to have the registers be fully general purpose, because then there is no strong reason to have an expression stack and postfix instructions.

Another common hybrid is to start with a register machine architecture, and add another memory address mode which emulates the push or pop operations of stack machines: 'memaddress = reg; reg += instr.displ'. This was first used in DEC's PDP-11 minicomputer. This feature was carried forward in VAX computers and in Motorola 6800 and M68000 microprocessors. This allowed the use of simpler stack methods in early compilers. It also efficiently supported virtual machines using stack interpreters or threaded code. However, this feature did not help the register machine's own code to become as compact as pure stack machine code. And the execution speed was less than when compiling well to the register architecture. It is faster to change the top-of-stack pointer only occasionally (once per call or return) rather than constantly stepping it up and down throughout each program statement. And even faster to avoid memory references entirely.
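
A sketch of how such an address mode yields push and pop (on the PDP-11 these were spelled -(SP) and (SP)+; the Python below only models the register update and memory access):

    memory = [0] * 16
    sp = 16                         # stack grows downward, as on the PDP-11

    def push(value):
        global sp
        sp -= 1                     # auto-decrement: -(SP)
        memory[sp] = value

    def pop():
        global sp
        value = memory[sp]          # auto-increment: (SP)+
        sp += 1
        return value

    push(2); push(3)
    print(pop() + pop(), sp)        # 5 16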

More recently, so-called second-generation stack machines have adopted a dedicated collection of registers to serve as address registers, off-loading the task of memory addressing from the data stack. For example, MuP21 relies on a register called "A", while the more recent GreenArrays processors rely on two registers: A and B.

The Intel x86 family of microprocessors has a register-style instruction set for most operations, but uses stack instructions for its oldest x87 form of floating point arithmetic.

Commercial stack machines

Examples of stack instruction sets directly executed in hardware include
  • GA144 multi-computer chip by GreenArrays, Inc., http://greenarrays.com/
  • the Burroughs large systems architecture (since 1961)
  • the English Electric KDF9 machine. Introduced 1964, the KDF9 had a 16-deep pushdown stack of arithmetic registers, and a similar stack for subroutine return addresses
  • the Collins Radio Collins Adaptive Processing System minicomputer (CAPS, since 1969) and Rockwell Collins Advanced Architecture Microprocessor (AAMP, since 1981).
  • the UCSD Pascal p-machine (as the Pascal MicroEngine and many others)
  • MU5 and ICL 2900 Series. Hybrid stack and accumulator machines. The accumulator register buffered the memory stack's top data value. Variants of load and store opcodes controlled when that register was spilled to the memory stack or reloaded from there.
  • HP 3000 (Classic, not PA-RISC)
  • Tandem Computers T/16. Like HP 3000, except that compilers, not microcode, controlled when the register stack spilled to the memory stack or was refilled from the memory stack.
  • the Atmel MARC4 microcontroller
  • Several "Forth chips" such as the RTX2000, the RTX2010, the F21 and the PSC1000
  • The 4stack processor by Bernd Paysan has four stacks.
  • Patriot Scientific's Ignite stack machine designed by Charles H. Moore holds a leading functional density benchmark.
  • Saab Ericsson Space Thor radiation hardened microprocessor
  • Inmos transputers.

Virtual stack machines

Examples of virtual stack machines interpreted in software:
  • the UCSD Pascal p-machine (which closely resembled Burroughs)
  • the Java virtual machine instruction set
  • the VES (Virtual Execution System) for the CIL (Common Intermediate Language) instruction set of the ECMA 335 (Microsoft .NET environment)
  • the Forth programming language, in particular the Forth virtual machine
  • Adobe's PostScript
  • Parakeet programming language
  • Sun Microsystems' SwapDrop programming language for Sun Ray smartcard identification
  • Adobe's AVM2

See also

  • Stack-oriented programming language
  • Concatenative programming language
  • Comparison of application virtual machines


Computers Using Call Stacks and Stack Frames

Most current computers (of any instruction set style) and most compilers use a large call-return stack in memory to organize the short-lived local variables and return links for all currently active procedures or functions. Each nested call creates a new stack frame in memory, which persists until that call completes. This call-return stack may be entirely managed by the hardware via specialized address registers and special address modes in the instructions. Or it may be merely a set of conventions followed by the compilers, using generic registers and register+offset address modes. Or it may be something in between.

Since this technique is now nearly universal, even on register machines, it is not helpful to refer to all these machines as stack machines. That term is commonly reserved for machines which also use an expression stack and stack-only arithmetic instructions to evaluate the pieces of a single statement.

Computers commonly provide direct, efficient access to the program's global variables and to the local variables of only the current innermost procedure or function, the topmost stack frame. 'Up level' addressing of the contents of callers' stack frames is usually not needed and not supported as directly by the hardware. If needed, compilers support this by passing in frame pointers as additional, hidden parameters.

Some Burroughs stack machines do support up-level refs directly in the hardware, with specialized address modes and a special 'display' register file holding the frame addresses of all outer scopes. No subsequent computer lines have done this in hardware. When Niklaus Wirth developed the first Pascal compiler on a CDC 6000, he found that it was faster overall to pass in the frame pointers as a chain, rather than constantly updating complete arrays of frame pointers. This software method also adds no overhead for common languages like C which lack up-level refs.
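
A sketch of the chain idea (names invented for the example): each frame carries a static link to the frame of its enclosing scope, and an up-level reference simply follows that chain outward:

    class Frame:
        def __init__(self, static_link, locals_):
            self.static_link = static_link    # frame of the lexically enclosing scope
            self.locals = locals_

    def uplevel(frame, hops):
        # Follow the static-link chain 'hops' scopes outward.
        for _ in range(hops):
            frame = frame.static_link
        return frame

    outer = Frame(None, {"x": 10})
    middle = Frame(outer, {"y": 20})
    inner = Frame(middle, {"z": 30})
    print(uplevel(inner, 2).locals["x"])      # 10: a variable two scopes out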

The same Burroughs machines also supported nesting of tasks or threads. The task and its creator share the stack frames that existed at the time of task creation, but not the creator's subsequent frames nor the task's own frames. This was supported by a cactus stack, whose layout diagram resembled the trunk and arms of a saguaro cactus. Each task had its own memory segment holding its stack and the frames that it owns. The base of this stack is linked to the middle of its creator's stack. In machines with a conventional flat address space, the creator stack and task stacks would be separate heap objects in one heap.

In some programming languages, the outer-scope data environments are not always nested in time. These languages organize their procedure 'activation records' as separate heap objects rather than as stack frames appended to a linear stack.

In simple languages like Forth that lack local variables and naming of parameters, stack frames would contain nothing more than return branch addresses and frame management overhead. So their return stack holds bare return addresses rather than frames. The return stack is separate from the data value stack, to improve the flow of call setup and returns.

Stack Machines in Automata Theory

In the automata theory branch of theoretical computer science, a 'stack machine' is a hypothetical rudimentary computer that lacks random-access memory but does have one or more pure LIFO stacks for data. This is used in theories about what can be computed. (No practical computers are built this way. In all real computers or useful virtual computers, the bulk of active data memory is physically a large RAM.)

The computer's input is the initial content of stack 1; all the other stacks start empty. Each state of a stack machine is either a read state or a write state; and each state specifies a stack number to read (pop) from, or write (push) onto. In addition, a write state specifies the symbol to write, and the next state to transition to. A read state specifies, for each symbol in the alphabet, what state it would transition to if that symbol were read; in addition, it also specifies what state to transition to if the stack were empty. A stack machine halts when it transitions into a special halting state.
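
A direct sketch of this formal model in Python (state names and encoding invented for the example): each state either pops a named stack and branches on the symbol read, with a separate case for an empty stack, or pushes one fixed symbol and moves to its next state. This two-stack instance moves the input from stack 1 onto stack 2, reversing its stack order:

    def run(states, start, input1):
        stacks = {1: list(input1), 2: []}
        name = start
        while not states[name].get("halt"):
            st = states[name]
            if st["kind"] == "read":
                stk = stacks[st["stack"]]
                sym = stk.pop() if stk else "empty"
                name = st["next"][sym]                # next state depends on the symbol
            else:                                     # write state: symbol fixed per state
                stacks[st["stack"]].append(st["symbol"])
                name = st["next"]
        return stacks

    states = {
        "read1":   {"kind": "read", "stack": 1,
                    "next": {"a": "write_a", "b": "write_b", "empty": "done"}},
        "write_a": {"kind": "write", "stack": 2, "symbol": "a", "next": "read1"},
        "write_b": {"kind": "write", "stack": 2, "symbol": "b", "next": "read1"},
        "done":    {"halt": True},
    }
    print(run(states, "read1", "aab"))   # {1: [], 2: ['b', 'a', 'a']}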

A stack machine with 1 stack is a very weak model of computation. For example, it can be shown via pumping arguments that no 1-stack machine can recognize the language 0^n1^n0^n (a number of 0s, followed by the same number of 1s, followed by the same number of 0s). The computational power of 1-stack machines is strictly greater than that of finite automata (a 1-stack machine can recognize 0^n1^n, which no finite automaton can), but strictly less than that of deterministic pushdown automata.

A stack machine with multiple stacks, on the other hand, is equivalent to a Turing machine. For example, a 2-stack machine can emulate a Turing machine by using one stack for the tape portion to the left of the Turing machine's current head position and the other stack for the portion to the right. And a Turing machine can, given unbounded time, emulate any conceivable deterministic computer of any design.
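
A sketch of that two-stack tape encoding (helper names invented for the example): the top of 'left' is the cell just left of the head, and the top of 'right' is the cell under the head:

    left, right = [], list(reversed("abcde"))   # head starts on 'a'

    def move_right():
        left.append(right.pop())    # the head's old cell slides onto the left stack

    def move_left():
        right.append(left.pop())

    def read():
        return right[-1]

    def write(sym):
        right[-1] = sym

    move_right(); move_right()
    write("X")                      # tape is now a b X d e, head on 'X'
    print(read(), "".join(left) + "".join(reversed(right)))   # X abXde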
