An assembly language is a low-level programming language for computers, microprocessors, microcontrollers, and other programmable devices. It implements a symbolic representation of the machine codes and other constants needed to program a given CPU architecture. This representation is usually defined by the hardware manufacturer, and is based on mnemonics that symbolize processing steps (instructions), processor registers, memory locations, and other language features. An assembly language is thus specific to a certain physical (or virtual) computer architecture. This is in contrast to most high-level programming languages, which, ideally, are portable.
A utility program called an assembler is used to translate assembly language statements into the target computer's machine code. The assembler performs a more or less isomorphic translation (a one-to-one mapping) from mnemonic statements into machine instructions and data. This is in contrast with high-level languages, in which a single statement generally results in many machine instructions.
Many advanced assemblers offer additional mechanisms to facilitate program development, control the assembly process, and aid debugging. In particular, most modern assemblers include a macro facility (described below), and are called macro assemblers.
Why use an assembly language?
A simple assembler converts each assembly language statement into the corresponding machine-language statement, so at first glance it seems merely a minor convenience, substituting easily remembered names for obscure machine instructions. However, consider a machine-language program loaded into memory from location 0, which has an instruction that is the machine-code equivalent of "jump to instruction 25" or "jump forward skipping 5 instructions", followed by other instructions. If written in assembly language, assembled, and loaded, it will produce exactly the same code, although it will have convenient mnemonics and use symbolic labels rather than absolute locations, as in "jumpto label4", instead of something like the hexadecimal number A13528CD, the machine code for the instruction. The difference appears when the program is modified: if the number of instructions changes, the destination of the jump may no longer be at location 25 or 5 instructions ahead, so every jump destination in the machine-language program must be corrected by hand each time such a change is made. If symbols have been used in an assembly language to identify the destination of the jump, the programmer need only work on the changed parts with no regard to anything else; the assembler will assemble the modified program with all jumps and so on adjusted to remain correct. Similarly, if memory availability requires the program to be loaded starting at location 123 instead of 0, the assembler will adjust all references to suit. Assembly language has many more such advantages over machine language.
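The bookkeeping that the assembler takes over can be sketched in Python. The toy format is an assumption made for illustration: every instruction occupies one word, labels are written as `name:` on their own line, and a hypothetical `jump` instruction takes a label as its operand. A first pass records the address of each label relative to the load origin; a second pass substitutes those addresses into the jumps.

```python
def assemble(lines, origin=0):
    """Resolve symbolic jump targets in a toy program (hypothetical format).

    Assumes one word per instruction and labels written as 'name:'.
    """
    # Pass 1: record the address of every label, counting from the load origin.
    symbols = {}
    address = origin
    for line in lines:
        if line.endswith(":"):
            symbols[line[:-1]] = address
        else:
            address += 1

    # Pass 2: emit instructions, replacing each jump label with its address.
    output = []
    for line in lines:
        if line.endswith(":"):
            continue
        op, *args = line.split()
        if op == "jump":
            output.append((op, symbols[args[0]]))
        else:
            output.append((op, *args))
    return output


program = ["start:", "load x", "jump done", "add y", "done:", "halt"]
print(assemble(program))              # jump target resolves to 3
print(assemble(program, origin=123))  # same program loaded at 123: target 126
```

Inserting another instruction before `done:` and calling `assemble` again moves the jump target automatically, which is the correction a machine-language programmer would otherwise make by hand.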
Assembler
Compare with: Microassembler.
Typically a modern assembler creates object code by translating assembly instruction mnemonics into opcodes, and by resolving symbolic names for memory locations and other entities. The use of symbolic references is a key feature of assemblers, saving tedious calculations and manual address updates after program modifications. Most assemblers also include macro facilities for performing textual substitution, for example to generate common short sequences of instructions inline instead of calling subroutines.
Assemblers are generally simpler to write than compilers for high-level languages, and have been available since the 1950s. Modern assemblers, especially for RISC architectures such as SPARC or POWER, as well as x86 and x86-64, optimize instruction scheduling to exploit the CPU pipeline efficiently.
Number of passes
There are two types of assemblers based on how many passes through the source are needed to produce the executable program.
- One-pass assemblers go through the source code once. Any symbol used before it is defined will require "errata" at the end of the object code (or, at least, no earlier than the point where the symbol is defined) telling the linker or the loader to "go back" and overwrite a placeholder which had been left where the as yet undefined symbol was used.
- Two-pass assemblers create a table with all symbols and their values in the first pass, then use the table in a second pass to generate code.
- In both cases, the assembler must be able to determine the size of each instruction on the first or only pass in order to calculate the addresses of symbols. This means that if the size of an operation referring to an operand defined later depends on the type or distance of the operand, the assembler will make a pessimistic estimate when first encountering the operation, and if necessary pad it with one or more "no-operation" instructions in the second pass or the errata.
The original reason for the use of one-pass assemblers was speed of assembly; however, modern computers perform two-pass assembly without unacceptable delay. The advantage of the two-pass assembler is that the absence of a need for errata makes the linker (or the loader if the assembler directly produces executable code) simpler and faster.
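A one-pass assembler with "errata" can be sketched the same way, again assuming a hypothetical one-word-per-instruction format with a `jump label` instruction. Forward references get a placeholder plus a fix-up record. In this simplified sketch the assembler applies the fix-ups itself at the end of the pass, whereas a real one-pass assembler would emit them for the linker or loader. Because every instruction here has the same size, the pessimistic-size problem described above does not arise.

```python
def assemble_one_pass(lines):
    """Assemble a toy program in a single pass, back-patching forward jumps."""
    code = []     # emitted instructions; list index == address
    symbols = {}  # label -> address, known only once the label has been seen
    fixups = []   # (placeholder address, label) pairs: the "errata"
    for line in lines:
        if line.endswith(":"):
            symbols[line[:-1]] = len(code)
            continue
        op, *args = line.split()
        if op == "jump" and args[0] not in symbols:
            fixups.append((len(code), args[0]))   # forward reference
            code.append([op, None])               # placeholder operand
        elif op == "jump":
            code.append([op, symbols[args[0]]])   # backward reference
        else:
            code.append([op, *args])
    # Apply the errata: overwrite each placeholder with the resolved address.
    for address, label in fixups:
        code[address][1] = symbols[label]
    return [tuple(instr) for instr in code]


program = ["start:", "load x", "jump done", "add y", "jump start", "done:", "halt"]
print(assemble_one_pass(program))
```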
High-level assemblers
More sophisticated high-level assemblers provide language abstractions such as:
- Advanced control structures
- High-level procedure/function declarations and invocations
- High-level abstract data types, including structures/records, unions, classes, and sets
- Sophisticated macro processing (although available on ordinary assemblers since the late 1950s for the IBM 700 series and since the 1960s for the IBM System/360, amongst other machines)
- Object-oriented programming features such as classes, objects, abstraction, polymorphism, and inheritance
See Language design below for more details.
Use of the term
Note that, in normal professional usage, the term assembler is used to refer both to an assembly language and to software that assembles an assembly-language program. Thus: "CP/CMS was written in S/360 assembler" as well as "ASM-H was a widely used S/370 assembler."
Assembly language
A program written in assembly language consists of a series of (mnemonic) processor instructions and meta-statements (known variously as directives, pseudo-instructions and pseudo-ops), comments and data. Assembly language instructions usually consist of an opcode mnemonic followed by a comma-separated list of data, arguments or parameters. These are translated by an assembler to a stream of executable instructions that can be loaded into memory and executed. Assemblers can also be used to produce blocks of data from formatted and commented source code, to be used by other code.
Take, for example, the instruction that tells an x86/IA-32 processor to move an immediate 8-bit value into a register. The binary code for this instruction is 10110 followed by a 3-bit identifier for which register to use. The identifier for the AL register is 000, so the following machine code loads the AL register with the data 01100001.
10110000 01100001
This binary computer code can be made more human-readable by expressing it in hexadecimal as follows:
B0 61
Here, B0 means 'Move a copy of the following value into AL', and 61 is a hexadecimal representation of the value 01100001, which is 97 in decimal. Intel assembly language provides the mnemonic MOV (an abbreviation of move) for instructions such as this, so the machine code above can be written as follows in assembly language, complete with an explanatory comment if required, after the semicolon. This is much easier to read and to remember.
MOV AL, 61h ; Load AL with 97 decimal (61 hex)
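The arithmetic in this example can be checked mechanically. The hypothetical helper `decode_mov_imm8` below recognizes only this one instruction form: it checks that the top five bits of the first byte are 10110, uses the low three bits to pick the register (in the standard x86 order for 8-bit registers), and reads the second byte as the immediate value.

```python
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]  # 3-bit codes 000..111

def decode_mov_imm8(machine_code: bytes) -> str:
    """Decode a two-byte x86 'MOV r8, imm8' instruction (opcodes B0-B7 only)."""
    opcode, imm = machine_code[0], machine_code[1]
    if opcode >> 3 != 0b10110:
        raise ValueError(f"not a MOV r8, imm8 opcode: {opcode:02X}")
    register = REG8[opcode & 0b111]   # low 3 bits select the register
    return f"MOV {register}, {imm:02X}h ; {imm} decimal"


code = bytes([0b10110000, 0b01100001])   # the binary form shown above
print(code.hex(" ").upper())             # B0 61
print(decode_mov_imm8(code))             # MOV AL, 61h ; 97 decimal
```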
At one time many assembly language mnemonics were three-letter abbreviations, such as JMP for jump, INC for increment, etc. Modern processors have a much larger instruction set and many mnemonics are now longer, for example FPATAN for "floating-point partial arctangent" and BOUND for "check array index against bounds".
The same mnemonic MOV refers to a family of related opcodes to do with loading, copying and moving data, whether these are immediate values, values in registers, or memory locations pointed to by values in registers. The opcode 10110000 (B0) copies an 8-bit value into the AL register, while 10110001 (B1) moves it into CL and 10110010 (B2) does so into DL. Assembly language examples for these follow.
MOV AL, 1h ; Load AL with immediate value 1
MOV CL, 2h ; Load CL with immediate value 2
MOV DL, 3h ; Load DL with immediate value 3
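Going the other way, what an assembler does with these three lines can be sketched for this single instruction form. The function `encode_mov_imm8` is hypothetical: it accepts only `MOV r8, imm8` in Intel syntax, treats an `h` suffix as hexadecimal, and ignores everything after a semicolon. A real assembler would also choose among the other MOV opcodes based on the operands.

```python
REG8_CODES = {"AL": 0, "CL": 1, "DL": 2, "BL": 3, "AH": 4, "CH": 5, "DH": 6, "BH": 7}

def parse_imm(text: str) -> int:
    """Parse an immediate written in decimal or with an 'h' hex suffix."""
    text = text.strip()
    if text.lower().endswith("h"):
        return int(text[:-1], 16)
    return int(text)

def encode_mov_imm8(statement: str) -> bytes:
    """Encode 'MOV r8, imm8' as opcode B0+reg followed by the immediate byte."""
    code = statement.split(";", 1)[0]            # drop any trailing comment
    mnemonic, operands = code.split(None, 1)
    if mnemonic.upper() != "MOV":
        raise ValueError("only MOV is supported in this sketch")
    dest, src = (part.strip() for part in operands.split(","))
    value = parse_imm(src)
    if not 0 <= value <= 0xFF:
        raise ValueError("immediate does not fit in 8 bits")
    return bytes([0xB0 + REG8_CODES[dest.upper()], value])


for line in ["MOV AL, 1h", "MOV CL, 2h", "MOV DL, 3h"]:
    print(line, "->", encode_mov_imm8(line).hex(" ").upper())
```

The three lines print as B0 01, B1 02 and B2 03, matching the opcode pattern described above.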
The syntax of MOV can also be more complex as the following examples show.
MOV EAX, [EBX] ; Move the 4 bytes in memory at the address contained in EBX into EAX
MOV [ESI+EAX], CL ; Move the contents of CL into the byte at address ESI+EAX
In each case, the MOV mnemonic is translated directly into an opcode in the ranges 88-8E, A0-A3, B0-BF, C6 or C7 by an assembler, and the programmer does not have to know or remember which.
Transforming assembly language into machine code is the job of an assembler, and the reverse can at least partially be achieved by a disassembler. Unlike high-level languages, there is usually a one-to-one correspondence between simple assembly statements and machine language instructions. However, in some cases, an assembler may provide pseudoinstructions (essentially macros) which expand into several machine language instructions to provide commonly needed functionality. For example, for a machine that lacks a "branch if greater or equal" instruction, an assembler may provide a pseudoinstruction that expands to the machine's "set if less than" and "branch if zero (on the result of the set instruction)". Most full-featured assemblers also provide a rich macro language (discussed below) which is used by vendors and programmers to generate more complex code and data sequences.
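The "branch if greater or equal" case can be sketched using MIPS-style mnemonics, where `slt` (set on less than) and `beq` (branch if equal) are real instructions and `$at` is the register reserved for the assembler's own use. The expansion function is a hypothetical illustration, not the code of any particular assembler.

```python
def expand_pseudo(statement):
    """Expand the 'bge' pseudoinstruction into two real MIPS-style instructions.

    Any other statement is passed through unchanged.
    """
    parts = statement.split(None, 1)
    if not parts or parts[0] != "bge":
        return [statement]
    rs, rt, label = (operand.strip() for operand in parts[1].split(","))
    return [
        f"slt $at, {rs}, {rt}",       # $at = 1 if rs < rt, otherwise 0
        f"beq $at, $zero, {label}",   # branch when $at == 0, i.e. when rs >= rt
    ]


for line in ["bge $t0, $t1, done", "addi $t0, $t0, 1"]:
    print(expand_pseudo(line))
```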
Each computer architecture and processor architecture usually has its own machine language.
On this level, each instruction is simple enough to be executed using a relatively small number of electronic circuits. Computers differ by the number and type of operations they support. For example, a machine with a 64-bit word length would have different circuitry from a 32-bit machine. They may also have different sizes and numbers of registers, and different representations of data types in storage. While most general-purpose computers are able to carry out essentially the same functionality, the ways they do so differ; the corresponding assembly languages may reflect these differences.
Multiple sets of mnemonics or assembly-language syntax may exist for a single instruction set, typically instantiated in different assembler programs. In these cases, the most popular one is usually that supplied by the manufacturer and used in its documentation.
Basic elements
There is a large degree of diversity in the way the authors of assemblers categorize statements and in the nomenclature that they use. In particular, some describe anything other than a machine mnemonic or extended mnemonic as a pseudo-operation (pseudo-op). A typical assembly language consists of three types of instruction statements that are used to define program operations:
- Opcode mnemonics
- Data sections
- Assembly directives
Opcode mnemonics and extended mnemonics
Instructions (statements) in assembly language are generally very simple, unlike those in high-level languages. Generally, a mnemonic is a symbolic name for a single executable machine language instruction (an opcode), and there is at least one opcode mnemonic defined for each machine language instruction. Each instruction typically consists of an operation or opcode plus zero or more operands. Most instructions refer to a single value, or a pair of values. Operands can be immediate (typically one-byte values, coded in the instruction itself), registers specified in the instruction, implied, or the addresses of data located elsewhere in storage. This is determined by the underlying processor architecture: the assembler merely reflects how this architecture works. Extended mnemonics are often used to specify a combination of an opcode with a specific operand, e.g., the System/360 assemblers use B as an extended mnemonic for BC with a mask of 15 and NOP for BC with a mask of 0.
Extended mnemonics are often used to support specialized uses of instructions, often for purposes not obvious from the instruction name. For example, many CPUs do not have an explicit NOP instruction, but do have instructions that can be used for the purpose. In 8086 CPUs the instruction xchg ax,ax is used for nop, with nop being a pseudo-opcode to encode the instruction xchg ax,ax. Some disassemblers recognize this and will decode the xchg ax,ax instruction as nop. Similarly, IBM assemblers for System/360 and System/370 use the extended mnemonics NOP and NOPR for BC and BCR with zero masks.
Some assemblers also support simple built-in macro-instructions that generate two or more machine instructions. For instance, with some Z80 assemblers the instruction ld hl,bc is recognized to generate ld l,c followed by ld h,b. These are sometimes known as pseudo-opcodes.
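Both conveniences can be sketched as table lookups. The tables contain only the examples named in the text (System/360 `B`, `NOP` and `NOPR`, and the Z80 register-pair copy); the function names and the simplified operand handling are assumptions for illustration.

```python
# System/360-style extended mnemonics: one real instruction with a fixed mask.
S360_EXTENDED = {
    "B": ("BC", 15),     # mask 15 matches every condition: branch always
    "NOP": ("BC", 0),    # mask 0 matches no condition: the branch is never taken
    "NOPR": ("BCR", 0),
}

def expand_s360(mnemonic, operand):
    """Rewrite an extended mnemonic as its base instruction plus mask."""
    base, mask = S360_EXTENDED[mnemonic]
    return f"{base} {mask},{operand}"

# Z80 register pairs as (high byte, low byte).
Z80_PAIRS = {"bc": ("b", "c"), "de": ("d", "e"), "hl": ("h", "l")}

def expand_z80_pair_load(dest, src):
    """Expand the pseudo-opcode 'ld dest,src' on 16-bit pairs into two 8-bit loads."""
    (dest_hi, dest_lo), (src_hi, src_lo) = Z80_PAIRS[dest], Z80_PAIRS[src]
    return [f"ld {dest_lo},{src_lo}", f"ld {dest_hi},{src_hi}"]


print(expand_s360("B", "LOOP"))          # BC 15,LOOP
print(expand_z80_pair_load("hl", "bc"))  # ['ld l,c', 'ld h,b']
```

The first table is a one-to-one rewrite that still yields a single machine instruction; the second generates two instructions from one statement, which is why such forms are sometimes called pseudo-opcodes.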
Data sections
There are instructions used to define data elements to hold data and variables. They define the type of data, the length and the alignment of data. These instructions can also define whether the data is available to outside programs (programs assembled separately) or only to the program in which the data section is defined. Some assemblers classify these as pseudo-ops.
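How an assembler might turn such definitions into offsets can be sketched as follows. The data items, their names, and the rule that each item is aligned to its own size are hypothetical; real assemblers typically take alignment from the data type or from an explicit alignment directive.

```python
def layout(items, start=0):
    """Assign offsets to (name, size, alignment) items, inserting padding.

    Returns the symbol table (name -> offset) and the total length in bytes.
    """
    symbols = {}
    offset = start
    for name, size, align in items:
        offset += (-offset) % align   # pad up to the next multiple of align
        symbols[name] = offset
        offset += size
    return symbols, offset - start


data = [("flag", 1, 1), ("count", 4, 4), ("ratio", 8, 8), ("code", 2, 2)]
symbols, length = layout(data)
print(symbols, length)   # {'flag': 0, 'count': 4, 'ratio': 8, 'code': 16} 18
```

The three bytes of padding after `flag` are the cost of keeping `count` on a 4-byte boundary.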
Assembly directives
Assembly directives, also called pseudo opcodes, pseudo-operations or pseudo-ops, are instructions that are executed by an assembler at assembly time, not by a CPU at run time. They can make the assembly of the program dependent on parameters input by a programmer, so that one program can be assembled different ways, perhaps for different applications. They also can be used to manipulate presentation of a program to make it easier to read and maintain.
(For example, directives would be used to reserve storage areas and optionally their initial contents.) The names of directives often start with a dot to distinguish them from machine instructions.
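Conditional assembly is one directive-driven mechanism of this kind. The sketch below assumes a hypothetical dot-prefixed syntax (`.if NAME`, `.else`, `.endif`), where NAME is a parameter supplied at assembly time; the exact spelling differs between real assemblers. Nested conditionals are tracked with a stack of flags.

```python
def assemble_conditionally(lines, params):
    """Keep or drop source lines according to .if/.else/.endif directives.

    The directive spelling and the truthiness test on params are assumptions.
    """
    output = []
    active = []  # one flag per open .if: is the current branch being kept?
    for line in lines:
        words = line.split()
        directive = words[0] if words else ""
        if directive == ".if":
            enclosing = all(active)
            active.append(enclosing and bool(params.get(words[1], 0)))
        elif directive == ".else":
            enclosing = all(active[:-1])
            active[-1] = enclosing and not active[-1]
        elif directive == ".endif":
            active.pop()
        elif all(active):
            output.append(line)
    if active:
        raise ValueError("missing .endif")
    return output


source = [".if DEBUG", "call trace", ".else", "nop", ".endif", "ret"]
print(assemble_conditionally(source, {"DEBUG": 1}))  # ['call trace', 'ret']
print(assemble_conditionally(source, {}))            # ['nop', 'ret']
```

The directives themselves never reach the output; they are consumed at assembly time, which is what distinguishes them from machine instructions.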
Symbolic assemblers let programmers associate arbitrary names (labels or symbols) with memory locations. Usually, every constant and variable is given a name so instructions can reference those locations by name, thus promoting self-documenting code. In executable code, the name of each subroutine is associated with its entry point, so any calls to a subroutine can use its name. Inside subroutines, GOTO destinations are given labels. Some assemblers support local symbols which are lexically distinct from normal symbols (e.g., the use of "10$" as a GOTO destination).
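One way an assembler might keep such local symbols apart is to qualify each one with the most recent normal label, so that `10$` after `LOOPA:` and `10$` after `LOOPB:` become different symbols. The scoping rule assumed here (a local symbol is valid until the next normal label), the `LOOPA.10$` naming scheme, and the sample program are illustrative assumptions; operands are split on whitespace only.

```python
import re

LOCAL_SYMBOL = re.compile(r"\d+\$")   # local symbols look like 10$, 20$, ...

def qualify(word, scope):
    """Prefix a local symbol with its enclosing scope; leave other words alone."""
    return f"{scope}.{word}" if LOCAL_SYMBOL.fullmatch(word) else word

def qualify_local_labels(lines):
    """Give each local symbol a unique name based on the last normal label."""
    scope = ""
    out = []
    for line in lines:
        if line.endswith(":"):                      # a label definition
            name = line[:-1]
            if not LOCAL_SYMBOL.fullmatch(name):
                scope = name                        # a normal label opens a new scope
            out.append(qualify(name, scope) + ":")
        else:                                       # operands split on whitespace only
            out.append(" ".join(qualify(word, scope) for word in line.split()))
    return out


source = ["LOOPA:", "10$:", "DEC R0", "BNE 10$", "LOOPB:", "10$:", "BR 10$"]
print(qualify_local_labels(source))
```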
Some assemblers provide flexible symbol management, letting programmers manage different namespaces, automatically calculate offsets within data structures, and assign labels that refer to literal values or the result of simple computations performed by the assembler. Labels can also be used to initialize constants and variables with relocatable addresses.
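Labels whose values are computed by the assembler can be sketched with a small evaluator. The `(name, expression)` pairs loosely mimic the EQU directive found in several assemblers, but the restriction to decimal numbers and symbols joined by `+` and `-`, the symbol names, and the buffer address are all hypothetical.

```python
import re

TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|[+-]")

def evaluate(expr, symbols):
    """Evaluate numbers and previously defined symbols joined by + and -."""
    total, sign = 0, 1
    for token in TOKEN.findall(expr):
        if token in ("+", "-"):
            sign = 1 if token == "+" else -1
        else:
            value = int(token) if token.isdigit() else symbols[token]
            total += sign * value
            sign = 1
    return total

def define_equates(equates, symbols):
    """Process (name, expression) pairs in order, as an assembler would."""
    for name, expr in equates:
        symbols[name] = evaluate(expr, symbols)
    return symbols


symbols = {"BUFFER": 0x200}   # hypothetical label address
define_equates([("BUFSIZE", "64"),
                ("BUFEND", "BUFFER+BUFSIZE"),
                ("LAST", "BUFEND-1")], symbols)
print(symbols)
```

Because every value is fixed at assembly time, instructions that use `BUFEND` or `LAST` cost nothing extra at run time.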
Assembly languages, like most other computer languages, allow comments to be added to assembly source code; the assembler ignores them. Good use of comments is even more important with assembly code than with higher-level languages, as the meaning and purpose of a sequence of instructions is harder to decipher from the code itself.
Wise use of these facilities can greatly simplify the problems of coding and maintaining low-level code. Raw assembly source code as generated by compilers or disassemblers—code without any comments, meaningful symbols, or data definitions—is quite difficult to read when changes must be made.
Macros
Many assemblers support predefined macros, and others support programmer-defined (and repeatedly re-definable) macros involving sequences of text lines in which variables and constants are embedded. This sequence of text lines may include opcodes or directives. Once a macro has been defined its name may be used in place of a mnemonic. When the assembler processes such a statement, it replaces the statement with the text lines associated with that macro, then processes them as if they existed in the source code file (including, in some assemblers, expansion of any macros existing in the replacement text).
Since macros can have 'short' names but expand to several or indeed many lines of code, they can be used to make assembly language programs appear to be far shorter, requiring fewer lines of source code, as with higher level languages. They can also be used to add higher levels of structure to assembly programs and, through parameters and similar features, to introduce optional embedded debugging code.
Many assemblers have built-in (or predefined) macros for system calls and other special code sequences, such as the generation and storage of data realized through advanced bitwise and boolean operations used in gaming, software security, data management, and cryptography.
Macro assemblers often allow macros to take parameters. Some assemblers include quite sophisticated macro languages, incorporating such high-level language elements as optional parameters, symbolic variables, conditionals, string manipulation, and arithmetic operations, all usable during the execution of a given macro, and allowing macros to save context or exchange information. Thus a macro might generate a large number of assembly language instructions or data definitions, based on the macro arguments. This could be used to generate record-style data structures or "unrolled" loops, for example, or could generate entire algorithms based on complex parameters. An organization using assembly language that has been heavily extended using such a macro suite can be considered to be working in a higher-level language, since such programmers are not working with a computer's lowest-level conceptual elements.
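The expansion step can be sketched as pure text substitution performed before any instruction is assembled. The macro table, the `CLEAR` and `CLEAR2` macros, and the `&name` parameter notation (loosely modeled on IBM-style macro assemblers) are hypothetical; the depth limit is an arbitrary safeguard against a macro that invokes itself forever.

```python
import re

# Hypothetical macro table: name -> (parameter names, body lines).
MACROS = {
    "CLEAR": (["reg"], ["xor &reg, &reg"]),
    "CLEAR2": (["r1", "r2"], ["CLEAR &r1", "CLEAR &r2"]),
}

def expand(lines, depth=0):
    """Replace macro invocations with their bodies, substituting arguments."""
    if depth > 16:
        raise RecursionError("macro expansion nested too deeply")
    out = []
    for line in lines:
        name, _, arg_text = line.strip().partition(" ")
        if name not in MACROS:
            out.append(line)            # ordinary instruction: keep as-is
            continue
        params, body = MACROS[name]
        args = [a.strip() for a in arg_text.split(",")] if arg_text else []
        if len(args) != len(params):
            raise ValueError(f"{name} expects {len(params)} argument(s)")
        bindings = dict(zip(params, args))
        substituted = [
            re.sub(r"&(\w+)", lambda m: bindings.get(m.group(1), m.group(0)), body_line)
            for body_line in body
        ]
        out.extend(expand(substituted, depth + 1))   # bodies may invoke macros
    return out


print(expand(["CLEAR2 eax, ebx", "ret"]))
# ['xor eax, eax', 'xor ebx, ebx', 'ret']
```

Nothing in `expand` knows what `xor` means, which illustrates the later point that macro processing is independent of assembly.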
Macros were used to customize large-scale software systems for specific customers in the mainframe era and were also used by customer personnel to satisfy their employers' needs by making specific versions of manufacturer operating systems. This was done, for example, by systems programmers working with IBM's Conversational Monitor System / Virtual Machine (CMS/VM) and with IBM's "real time transaction processing" add-ons: CICS (Customer Information Control System) and ACP/TPF, the airline/financial system that began in the 1970s and still runs many large computer reservations systems (CRS) and credit card systems today.
It was also possible to use solely the macro processing abilities of an assembler to generate code written in completely different languages, for example, to generate a version of a program in COBOL using a pure macro assembler program containing lines of COBOL code inside assembly time operators instructing the assembler to generate arbitrary code.
This was possible because, as was realized in the 1960s, the concept of "macro processing" is independent of the concept of "assembly". In modern terms, the former is closer to word processing or text processing than to generating object code. The concept of macro processing appeared, and still appears, in the C programming language, whose preprocessor directives can define symbols and make conditional tests on their values. Unlike certain earlier macro processors built into assemblers, however, the C preprocessor is not Turing-complete, because it lacks the ability to loop or to "go to" (a "go to" would itself allow programs to loop).
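The preprocessor directives mentioned above can be seen in a minimal sketch with a hypothetical configuration symbol `BUFFER_KB`. The sketch defines a value, tests it with `#if`/`#elif`, and keeps exactly one branch. The discarded text never reaches the compiler, and there is no directive that repeats a block.

```c
/* Hypothetical build-time setting ("setting a variable" with #define).
   It can be overridden on the command line, e.g. -DBUFFER_KB=8. */
#ifndef BUFFER_KB
#define BUFFER_KB 4
#endif

/* Conditional tests on the value, evaluated before compilation begins. */
#if BUFFER_KB <= 0
#error "BUFFER_KB must be positive"
#elif BUFFER_KB > 64
#define BUFFER_SIZE (64 * 1024)          /* clamp oversized requests */
#else
#define BUFFER_SIZE (BUFFER_KB * 1024)
#endif

/* Only the selected branch above contributes to this declaration. */
static unsigned char buffer[BUFFER_SIZE];

static unsigned long buffer_size(void) {
    return (unsigned long)sizeof buffer;
}
```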
Despite the power of macro processing, it fell into disuse in many high-level languages (major exceptions being C, C++, and PL/I) while remaining a perennial feature of assemblers.
Macro parameter substitution is strictly by name: at macro processing time, the value of a parameter is textually substituted for its name. The most famous resulting class of bugs arises when a caller passes an expression, rather than a simple name, for a parameter where the macro writer expected a name. In the macro:
foo: macro a
load a*b
the intention was that the caller would provide the name of a variable, and that the "global" variable or constant b would be used to multiply "a". If foo is called with the parameter a-c, the macro expansion load a-c*b occurs. With the usual operator precedence, in which multiplication binds more tightly than subtraction, this computes a-(c*b) rather than the intended (a-c)*b. To avoid any possible ambiguity, users of macro processors can parenthesize formal parameters inside macro definitions, or callers can parenthesize the input parameters.
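The same pitfall exists in the C preprocessor, which also substitutes parameters as text. The sketch below mirrors the foo example, assuming a hypothetical constant b = 10 and the values a = 7 and c = 3. The checks run at compile time through C11 `static_assert`. The unparenthesized macro yields 7 - 3*10 = -23, while parenthesizing either the formal parameter or the caller's argument yields (7 - 3)*10 = 40.

```c
#include <assert.h>

/* The "global" constant b from the example (value chosen arbitrarily). */
enum { b = 10 };

/* Formal parameter not parenthesized: FOO_UNSAFE(a - c) expands to
   (a - c * b), which C parses as a - (c * b). */
#define FOO_UNSAFE(x) (x * b)

/* Formal parameter parenthesized: FOO_SAFE(a - c) expands to ((a - c) * b). */
#define FOO_SAFE(x) ((x) * b)

/* Compile-time checks with a = 7, c = 3. */
static_assert(FOO_UNSAFE(7 - 3) == 7 - 30, "textual substitution changed the grouping");
static_assert(FOO_SAFE(7 - 3) == 40, "definition-side parentheses keep the intent");
static_assert(FOO_UNSAFE((7 - 3)) == 40, "caller-side parentheses also keep the intent");
```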
PL/I and C/C++ feature macros, but this facility can only manipulate text. On the other hand, homoiconic languages, such as Lisp, Prolog, and Forth, retain the power of assembly language macros because they are able to manipulate their own code as data.
Support for structured programming
Some assemblers have incorporated structured programming elements to encode execution flow. The earliest example of this approach was in the Concept-14 macro set. Dr. H.D. Mills originally proposed it (March 1970), and Marvin Kessler implemented it at IBM's Federal Systems Division. It extended the S/360 macro assembler with IF/ELSE/ENDIF and similar control flow blocks. This was a way to reduce or eliminate the use of GOTO operations in assembly code, one of the main causes of spaghetti code in assembly language. This approach was widely accepted in the early 1980s (the latter days of large-scale assembly language use).
A curious design was A-natural, a "stream-oriented" assembler for 8080/Z80 processors from Whitesmiths Ltd. (developers of the Unix-like Idris operating system, and of what was reported to be the first commercial C compiler). The language was classified as an assembler because it worked with raw machine elements such as opcodes, registers, and memory references, but it incorporated an expression syntax to indicate execution order. Parentheses and other special symbols, along with block-oriented structured programming constructs, controlled the sequence of the generated instructions. A-natural was built as the object language of a C compiler, rather than for hand-coding, but its logical syntax won some fans.
There has been little apparent demand for more sophisticated assemblers since the decline of large-scale assembly language development. In spite of that, they are still being developed and applied in cases where resource constraints or peculiarities in the target system's architecture prevent the effective use of higher-level languages.
Historical perspective
Assembly languages were first developed in the 1950s, when they were referred to as second-generation programming languages. Nathaniel Rochester wrote the first assembler for the IBM 701.
Another early example was SOAP (Symbolic Optimal Assembly Program), a 1957 assembly language for the IBM 650 computer. Assembly languages eliminated much of the error-prone and time-consuming first-generation programming needed with the earliest computers. They freed programmers from tedium such as remembering numeric codes and calculating addresses. They were once widely used for all sorts of programming. However, by the 1980s (1990s on microcomputers), high-level languages had largely supplanted them in the search for improved programming productivity. Today most assembly language is generated and processed by compilers rather than written by hand. It is still used for direct hardware manipulation, for access to specialized processor instructions, and to address critical performance issues. Typical uses are device drivers, low-level embedded systems, and real-time systems.
Historically, numerous programs have been written entirely in assembly language. Operating systems were almost exclusively written in assembly language until the widespread acceptance of C in the 1970s and early 1980s. Many commercial applications were written in assembly language as well, including a large amount of the IBM mainframe software written by large corporations. COBOL and FORTRAN eventually displaced much of this work, although a number of large organizations retained assembly-language application infrastructures well into the 1990s.
Most early microcomputers relied on hand-coded assembly language, including most operating systems and large applications. This was because these systems had severe resource constraints, imposed idiosyncratic memory and display architectures, and provided limited, buggy system services. Perhaps more important was the lack of first-class high-level language compilers suitable for microcomputer use. A psychological factor may have also played a role: the first generation of microcomputer programmers retained a hobbyist, "wires and pliers" attitude.
In a more commercial context, the biggest reasons for using assembly language were minimal bloat (size), minimal overhead, greater speed, and reliability.
Typical examples of large assembly language programs from this time are IBM PC DOS operating systems and early applications such as the spreadsheet program Lotus 1-2-3. Even into the 1990s, most console video games were written in assembly, including most games for the Mega Drive/Genesis and the Super Nintendo Entertainment System. According to some industry insiders, assembly language was the language of choice for extracting maximum performance from the Sega Saturn, a console that was notoriously challenging to develop and program games for. The popular arcade game NBA Jam (1993) is another example. Assembly language was long the primary development language for many popular home computers of the 1980s and 1990s (such as the Sinclair ZX Spectrum, Commodore 64, Commodore Amiga, and Atari ST). This was in large part because BASIC dialects on these systems offered insufficient execution speed and insufficient facilities to take full advantage of the available hardware. Some systems, most notably the Amiga, even had IDEs with highly advanced debugging and macro facilities, comparable to those of Microsoft Visual Studio. One example is the freeware ASM-One assembler (ASM-One predates Microsoft Visual Studio).
The Assembler for the VIC-20 was written by Don French and published by French Silk. At 1639 bytes in length, its author believes it is the smallest symbolic assembler ever written. The assembler supported the usual symbolic addressing and the definition of character strings or hex strings. It also allowed address expressions which could be combined with addition, subtraction, multiplication, division, logical AND, logical OR, and exponentiation operators.
Current usage
There have always been debates over the usefulness and performance of assembly language relative to high-level languages. Assembly language has specific niche uses where it is important (see below). In general, however, modern optimizing compilers are claimed to render high-level languages into code that can run as fast as hand-written assembly, although counter-examples can be found. The complexity of modern processors and memory subsystems makes effective optimization increasingly difficult for compilers as well as for assembly programmers. Moreover, increasing processor performance has meant that most CPUs sit idle most of the time, with delays caused by predictable bottlenecks such as I/O operations and paging. This has made raw code execution speed a non-issue for many programmers.
There are some situations in which practitioners might choose to use assembly language, such as when:
- a stand-alone binary executable of compact size is required, i.e. one that must execute without recourse to the run-time components or libraries associated with a high-level language. This is perhaps the most common situation. Such programs are typically embedded and single-tasking, and use only a relatively small amount of memory. Examples include firmware for telephones, automobile fuel and ignition systems, air-conditioning control systems, security systems, and sensors.
- a system has severe resource constraints (e.g., an embedded system) and must be hand-coded to make the most of limited resources, although this is becoming less common as processor prices fall and performance improves.
- interacting directly with the hardware, for example in device drivers and interrupt handlers.
- using processor-specific instructions not implemented in a compiler. A common example is the bitwise rotation instruction at the core of many encryption algorithms.
- creating vectorized functions for programs in higher-level languages such as C. In the higher-level language this is sometimes aided by compiler intrinsic functions that map directly to SIMD mnemonics, but nevertheless result in a one-to-one assembly conversion specific to the given vector processor.
- extreme optimization is required, e.g., in an inner loop of a processor-intensive algorithm. Game programmers exploit specific hardware features of a system to make games run faster. Large scientific simulations likewise require highly optimized algorithms, e.g. linear algebra with BLAS or the discrete cosine transform (e.g. the SIMD assembly version in x264).
- no high-level language exists, for example on a new or specialized processor.
- programs need precise timing, such as:
- real-time programs that need precise timing and responses, such as simulations, flight navigation systems, and medical equipment. For example, in a fly-by-wire system, telemetry must be interpreted and acted upon within strict time constraints. Such systems must eliminate sources of unpredictable delays, which may be created by (some) interpreted languages, automatic garbage collection, paging operations, or preemptive multitasking. Some higher-level languages incorporate run-time components and operating system interfaces that can introduce such delays. Choosing assembly or lower-level languages for such systems therefore gives programmers greater visibility and control over processing details.
- cryptographic algorithms that must always take strictly the same time to execute, preventing timing attacks.
- complete control over the environment is required, in extremely high security situations where nothing can be taken for granted.
- writing computer viruses, bootloaders, certain device drivers, or other items very close to the hardware or low-level operating system.
- writing instruction set simulators for monitoring, tracing, and debugging, where additional overhead must be kept to a minimum.
- reverse-engineering and modifying program files, such as:
- existing binaries that may or may not originally have been written in a high-level language. Examples include recreating programs whose source code is unavailable or has been lost, or cracking copy protection of proprietary software.
- video games (modifying them is also termed ROM hacking), which is possible via several methods. The most widely employed is altering program code at the assembly language level.
- writing self-modifying code, to which assembly language lends itself well.
- writing games and other software for graphing calculators.
- writing compilers that generate assembly code, which requires their programmers to be expert assembly language programmers so that the generated code is correct.
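ISO C has no rotation operator, so the bitwise rotation mentioned in the list above is usually written as a pair of shifts. GCC and Clang typically recognize this idiom and emit a single rotate instruction where the target has one (for example, `rol` on x86). C++20 exposes the operation directly as `std::rotl`. The sketch below uses the idiom inside the ChaCha quarter round from RFC 8439, one of the encryption primitives built around rotation. The function names `rotl32` and `chacha_quarter_round` are illustrative, not taken from any particular library.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (n is taken modulo 32).
   Masking the right-shift count avoids undefined behavior when n == 0. */
static inline uint32_t rotl32(uint32_t x, unsigned n) {
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
}

/* ChaCha quarter round (RFC 8439, section 2.1): add, xor, rotate, four times. */
static void chacha_quarter_round(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d) {
    assert(a && b && c && d);  /* all four state words must be supplied */
    *a += *b; *d ^= *a; *d = rotl32(*d, 16);
    *c += *d; *b ^= *c; *b = rotl32(*b, 12);
    *a += *b; *d ^= *a; *d = rotl32(*d, 8);
    *c += *d; *b ^= *c; *b = rotl32(*b, 7);
}
```

Writing the rotation in portable C and relying on the compiler to recognize it is a common alternative to hand-written assembly. Whether a single instruction is actually emitted depends on the compiler, its version, and the target, so the generated assembly is worth inspecting when it matters.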
Assembly language is still taught in most computer science and electronic engineering programs. Although few programmers today regularly work with assembly language as a tool, the underlying concepts remain very important. Such fundamental topics as binary arithmetic, memory allocation, stack processing, character set encoding, interrupt processing, and compiler design would be hard to study in detail without a grasp of how a computer operates at the hardware level. Since a computer's behavior is fundamentally defined by its instruction set, the logical way to learn such concepts is to study an assembly language. Most modern computers have similar instruction sets, so studying a single assembly language is sufficient to (I) learn the basic concepts, (II) recognize situations where the use of assembly language might be appropriate, and (III) see how efficient executable code can be created from high-level languages.
This is analogous to children needing to learn the basic arithmetic operations (e.g., long division), although calculators are widely used for all except the most trivial calculations.
Typical applications
Hard-coded assembly language is typically used in a system's boot ROM (the BIOS on IBM-compatible PC systems). This low-level code is used, among other things, to initialize and test the system hardware prior to booting the OS, and is stored in ROM. Once a certain level of hardware initialization has taken place, execution transfers to other code, typically written in higher-level languages. The code running immediately after power is applied, however, is usually written in assembly language. The same is true of most boot loaders.
Many compilers translate high-level languages into assembly language as an intermediate step before producing machine code. They can also output that assembly (GCC's -S option, for example) so that it can be inspected for debugging and optimization. Relatively low-level languages, such as C, often provide special syntax to embed assembly language directly in the source code. Programs using such facilities, such as the Linux kernel, can then build abstractions using different assembly language on each hardware platform. The system's portable code can then use these processor-specific components through a uniform interface.
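The sketch below is a toy illustration of compiling to assembly. It uses Python's standard `ast` module to translate an arithmetic expression into mnemonics for a hypothetical stack machine. The mnemonics (`load`, `push`, `add`, `sub`, `mul`) and the function name `compile_expr` are invented for this example. A real compiler targets an actual processor's instruction set and handles far more than expressions.

```python
import ast

# Mnemonics of a hypothetical stack machine, keyed by Python operator node type.
OPCODES = {ast.Add: "add", ast.Sub: "sub", ast.Mult: "mul"}


def compile_expr(source: str) -> list[str]:
    """Translate an arithmetic expression into hypothetical stack-machine assembly."""
    tree = ast.parse(source, mode="eval")
    lines: list[str] = []

    def emit(node: ast.AST) -> None:
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            lines.append(f"push {node.value}")       # literal operand
        elif isinstance(node, ast.Name):
            lines.append(f"load {node.id}")          # variable read from memory
        elif isinstance(node, ast.BinOp) and type(node.op) in OPCODES:
            emit(node.left)                          # operands first (postorder walk)
            emit(node.right)
            lines.append(OPCODES[type(node.op)])     # then the operator itself
        else:
            raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    emit(tree.body)
    return lines


for line in compile_expr("(a + b) * 2"):
    print(line)
```

The operands are emitted before their operator, which is a postorder walk of the syntax tree. That ordering is what makes the output valid for a stack machine. A register-based target would instead need register allocation.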
Assembly language is also valuable in reverse engineering, since many programs are distributed only in machine code form. Machine code is usually easy to translate into assembly language and examine carefully in that form, but it is very difficult to translate into a higher-level language. Tools such as the Interactive Disassembler (IDA) make extensive use of disassembly for this purpose.
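The sketch below makes disassembly concrete by decoding a few 32-bit words from the object code column of the example listing of assembly language source code in this article. It assumes SPARC-style field layouts: a 2-bit op, 5-bit register fields, a 6-bit op3, and a 13-bit or 22-bit signed immediate. That assumption is inferred from the fact that it reproduces the listing's instructions. Only the handful of opcodes needed here are supported, and `disassemble` and `sign_extend` are hypothetical helper names.

```python
# Opcode tables for the few instructions in the listing (assumed SPARC-style encoding).
BRANCH_CONDS = {0b0001: "be", 0b1000: "ba"}        # format 2: op = 00, op2 = 010
ARITH_OPS = {0b010000: "addcc", 0b111000: "jmpl"}  # format 3: op = 10
MEMORY_OPS = {0b000000: "ld"}                      # format 3: op = 11


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def disassemble(word: int, address: int) -> str:
    """Decode one 32-bit instruction word located at `address`.

    Unsupported opcodes raise KeyError or ValueError.
    """
    op = word >> 30
    if op == 0b00 and ((word >> 22) & 0b111) == 0b010:
        cond = (word >> 25) & 0b1111
        disp = sign_extend(word & 0x3FFFFF, 22)       # PC-relative, counted in words
        return f"{BRANCH_CONDS[cond]} {address + 4 * disp}"
    rd = (word >> 25) & 0x1F
    op3 = (word >> 19) & 0x3F
    rs1 = (word >> 14) & 0x1F
    if (word >> 13) & 1:                              # i = 1: 13-bit signed immediate
        src2 = str(sign_extend(word & 0x1FFF, 13))
    else:                                             # i = 0: second source register
        src2 = f"%r{word & 0x1F}"
    if op == 0b10:
        name = ARITH_OPS[op3]
        if name == "jmpl":
            return f"jmpl %r{rs1}+{src2},%r{rd}"
        return f"{name} %r{rs1},{src2},%r{rd}"
    if op == 0b11:
        return f"{MEMORY_OPS[op3]} [%r{rs1}+{src2}],%r{rd}"
    raise ValueError(f"not a supported instruction: {word:#010x}")


# Object code from the listing, keyed by address (data words omitted).
listing = {2064: 0x02800006, 2068: 0x82807FFC, 2080: 0x10BFFFFB, 2088: 0x81C3E004}
for addr, word in listing.items():
    print(addr, disassemble(word, addr))
```

Branch targets come out as numeric addresses, because a disassembler working from bare machine code has no label names to recover. Loads are printed in an explicit [register+register] form.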
One niche that makes use of assembly language is the demoscene. Certain competitions require contestants to restrict their creations to a very small size (e.g. 256 B, 1 KB, 4 KB or 64 KB), and assembly language is the language of choice to achieve this goal. When resources are a concern, especially on CPU-constrained systems such as the earlier Amiga models and the Commodore 64, assembly coding is a must. Optimized assembly code is written by hand, and programmers sequence instructions manually in an attempt to minimize the number of CPU cycles used. The CPU constraints are so great that every CPU cycle counts. Such methods have nonetheless enabled systems like the Commodore 64 to produce real-time 3D graphics with advanced effects, a feat that might be considered unlikely or even impossible for a system with a 1.02 MHz processor.
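A rough cycle budget shows why every cycle counts. The estimate assumes the 1.02 MHz clock mentioned above and a display refreshed 60 times per second. The refresh rate is an assumption matching NTSC television; PAL machines used 50 Hz and a slightly slower clock. Under those assumptions, an effect that updates every frame has about:

$$
\frac{1.02 \times 10^{6}\ \text{cycles/s}}{60\ \text{frames/s}} = 17\,000\ \text{cycles per frame}.
$$

6502-family instructions take between 2 and 7 cycles each. That leaves only a few thousand instructions per frame for all drawing, logic, and sound.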
Related terminology
- Assembly language or assembler language is commonly called assembly, assembler, ASM, or symbolic machine code. A generation of IBM mainframe programmers called it ALC for Assembly Language Code or BAL for Basic Assembly Language.
- Note: Calling the language assembler is potentially confusing and ambiguous, since this is also the name of the utility program that translates assembly language statements into machine code. Some may regard this as imprecision or error. However, this usage has been common among professionals and in the literature for decades. Similarly, some early computers called their assembler their assembly program.
- The computational step where an assembler is run, including all macro processing, is termed assembly time.
- The use of the word assembly dates from the early years of computers (cf. Short Code, Speedcode).
- A cross assembler (see cross compiler) is functionally just an assembler. The term stresses that the assembler runs on a computer or operating system of a different type, incompatible with the system on which the resulting code is to run. Cross-assembling may be necessary if the target system cannot run an assembler itself, as is typically the case for small embedded systems. A cross assembler must provide, or interface to, facilities that transport the code to the target processor, e.g. to reside in flash or EPROM memory. It typically generates a binary image or an Intel HEX file rather than an object file.
- An assembler directive or pseudo-opcode is a command given to an assembler. These directives may do anything from telling the assembler to include other source files, to telling it to allocate memory for constant data.
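Cross assemblers often emit Intel HEX files, and a minimal encoder for that record format might look like the sketch below. It assumes the basic record layout: a colon, byte count, 16-bit address, record type, data bytes, and a checksum. The checksum is the two's complement of the sum of all preceding bytes. The sketch emits only data (type 00) and end-of-file (type 01) records and leaves out the extended-address record types needed for images larger than 64 KB. The function names are hypothetical.

```python
def ihex_record(address: int, data: bytes, record_type: int = 0x00) -> str:
    """Encode one Intel HEX record: ':' + count + address + type + data + checksum."""
    if len(data) > 255 or not 0 <= address <= 0xFFFF:
        raise ValueError("data too long or address outside the 16-bit range")
    body = bytes([len(data), address >> 8, address & 0xFF, record_type]) + data
    checksum = -sum(body) & 0xFF        # two's complement of the byte sum, mod 256
    return ":" + body.hex().upper() + f"{checksum:02X}"


def ihex_image(base: int, image: bytes, chunk: int = 16) -> list[str]:
    """Split a binary image into data records (type 00) plus an EOF record (type 01)."""
    records = [ihex_record(base + offset, image[offset:offset + chunk])
               for offset in range(0, len(image), chunk)]
    records.append(ihex_record(0, b"", 0x01))
    return records


# The jmpl word at address 2088 in the example listing, assuming big-endian byte order.
word = 0b10000001_11000011_11100000_00000100
for record in ihex_image(2088, word.to_bytes(4, "big")):
    print(record)
```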
List of assemblers for different computer architectures
A separate list of assemblers covers the different computer architectures, along with associated information for each assembler.
Further details
For any given personal computer, mainframe, embedded system, and game console, past and present, at least one assembler, and possibly dozens, has been written. For some examples, see the list of assemblers.
On Unix systems, the assembler is traditionally called "as", although it is not a single body of code, being typically written anew for each port. A number of Unix variants use GAS, the GNU Assembler.
Within processor groups, each assembler has its own dialect. Some assemblers can read another assembler's dialect; for example, TASM can read old MASM code, but not the reverse. FASM and NASM have similar syntax, but each supports different macros, which can make code difficult to translate from one to the other. The basics are the same, but the advanced features differ.
Assembly code can also sometimes be portable across different operating systems on the same type of CPU. Calling conventions between operating systems often differ slightly or not at all. With care, it is possible to gain some portability in assembly language, usually by linking with a C library that does not change between operating systems. An instruction set simulator (which would ideally be written in an assembly language) can, in theory, process the object code or binary produced by any assembler to achieve portability even across platforms. Its overhead would be no greater than that of a typical bytecode interpreter. This is essentially what microcode achieves when a hardware platform changes internally.
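At its core, an instruction set simulator is a fetch-decode-execute loop over an emulated register file and memory, as in the minimal sketch below. The instruction set here (`li`, `add`, `addi`, `ld`, `beqz`, `jmp`, `halt`) and the function name `run` are hypothetical. Instructions are stored as Python tuples instead of binary words, so a simulator for real object code would also need a decoder. The sample program sums a five-word array with a loop similar to the one in the example listing of assembly language source code in this article.

```python
def run(program: dict[int, tuple], memory: dict[int, int], max_steps: int = 10_000) -> list[int]:
    """Fetch-decode-execute loop for a hypothetical ISA; returns the final registers."""
    regs = [0] * 8
    pc = min(program)                      # execution starts at the lowest address
    for _ in range(max_steps):
        op, *args = program[pc]            # fetch and decode
        next_pc = pc + 4                   # every instruction occupies one 4-byte word
        if op == "li":                     # rd <- immediate
            rd, imm = args
            regs[rd] = imm
        elif op == "add":                  # rd <- rs1 + rs2
            rd, rs1, rs2 = args
            regs[rd] = regs[rs1] + regs[rs2]
        elif op == "addi":                 # rd <- rs1 + immediate
            rd, rs1, imm = args
            regs[rd] = regs[rs1] + imm
        elif op == "ld":                   # rd <- memory[rs1]
            rd, rs1 = args
            regs[rd] = memory[regs[rs1]]
        elif op == "beqz":                 # branch to target if rs == 0
            rs, target = args
            if regs[rs] == 0:
                next_pc = target
        elif op == "jmp":                  # unconditional jump
            (next_pc,) = args
        elif op == "halt":
            return regs
        else:
            raise ValueError(f"illegal instruction {op!r} at address {pc}")
        pc = next_pc
    raise RuntimeError("step limit exceeded")


# Sum a five-word array stored at address 3000 (sample data).
SUM_PROGRAM = {
    2048: ("li", 1, 20),          # r1 <- bytes remaining (5 words)
    2052: ("li", 2, 3000),        # r2 <- base address of the array
    2056: ("li", 3, 0),           # r3 <- running sum
    2060: ("beqz", 1, 2084),      # loop: finished when r1 == 0
    2064: ("addi", 1, 1, -4),     # r1 <- r1 - 4
    2068: ("add", 4, 1, 2),       # r4 <- address of the next element
    2072: ("ld", 5, 4),           # r5 <- memory[r4]
    2076: ("add", 3, 3, 5),       # r3 <- r3 + r5
    2080: ("jmp", 2060),          # repeat
    2084: ("halt",),
}
memory = {3000 + 4 * i: value for i, value in enumerate([25, -10, 33, -5, 7])}
print(run(SUM_PROGRAM, memory)[3])  # prints 50
```

Interpreting every instruction in software like this is where the bytecode-interpreter-like overhead comes from.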
Portability through a shared C library has limits, however. Many things in libc depend on the preprocessor to make OS-specific and C-specific changes to the program before compiling, and some functions and symbols are not even guaranteed to exist outside of the preprocessor. Worse, the size and field order of structs, as well as the size of certain typedefs such as off_t, are entirely unavailable in assembly language without help from a configure script. They also differ even between versions of Linux. This makes it impossible to portably call libc functions other than those that take only simple integers and pointers as parameters. To address this issue, the FASMLIB project provides a portable assembly library for Win32 and Linux platforms, but it is still very incomplete.
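Python's standard `ctypes` module can report the layout that the platform's C ABI assigns to a structure. This illustrates the problem: the offsets and sizes a C compiler computes automatically are numbers an assembly programmer must know in advance. The struct below is hypothetical, and `c_longlong` stands in for a 64-bit field such as off_t, since ctypes has no off_t type. Results vary by platform. For example, the offset of `length` is typically 8 on x86-64 Linux but 4 on 32-bit x86 Linux, where long long is only 4-byte aligned inside structs.

```python
import ctypes


class Record(ctypes.Structure):
    """Hypothetical C struct: struct record { char tag; long long length; int count; };"""
    _fields_ = [
        ("tag", ctypes.c_char),
        ("length", ctypes.c_longlong),
        ("count", ctypes.c_int),
    ]


# The platform's C ABI decides the padding, offsets, and total size. Assembly code
# that passes this struct to a C function has to hard-code the same numbers.
for name, _ in Record._fields_:
    descriptor = getattr(Record, name)
    print(f"{name:6s} offset={descriptor.offset:2d} size={descriptor.size}")
print("sizeof(struct record) =", ctypes.sizeof(Record))
# Even a basic type varies: 8 bytes on 64-bit Linux (LP64), 4 on 64-bit Windows (LLP64).
print("sizeof(long) =", ctypes.sizeof(ctypes.c_long))
```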
Some higher-level computer languages, such as C and Borland Pascal, support inline assembly, where sections of assembly code, in practice usually brief, can be embedded into the high-level language code. The Forth language commonly contains an assembler used in CODE words.
An emulator can be used to debug assembly-language programs.
Example listing of assembly language source code
| Address | Label | Instruction (AT&T syntax) | Object code |
|---|---|---|---|
| | | .begin | |
| | | .org 2048 | |
| | a_start | .equ 3000 | |
| 2048 | | ld length,% | |
| 2064 | | be done | 00000010 10000000 00000000 00000110 |
| 2068 | | addcc %r1,-4,%r1 | 10000010 10000000 01111111 11111100 |
| 2072 | | addcc %r1,%r2,%r4 | 10001000 10000000 01000000 00000010 |
| 2076 | | ld %r4,%r5 | 11001010 00000001 00000000 00000000 |
| 2080 | | ba loop | 00010000 10111111 11111111 11111011 |
| 2084 | | addcc %r3,%r5,%r3 | 10000110 10000000 11000000 00000101 |
| 2088 | done: | jmpl %r15+4,%r0 | 10000001 11000011 11100000 00000100 |
| 2092 | length: | 20 | 00000000 00000000 00000000 00010100 |
| 2096 | address: | a_start | 00000000 00000000 00001011 10111000 |
| | | .org a_start | |
| 3000 | a: | | |
Example of a selection of instructions (for a virtual computer) with the corresponding address in memory where each instruction will be placed. These addresses are not static; see memory management. Accompanying each instruction is the object code generated by the assembler, which corresponds to the virtual computer's architecture (or ISA).
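A deliberately tiny two-pass assembler can reproduce the listing's object code. Pass one walks the source to assign addresses to labels and `.equ` symbols, honoring `.org`. Pass two can then resolve forward references such as `be done` into PC-relative displacements. This is a sketch under stated assumptions, not the assembler that produced the listing:

- The SPARC-style field layouts are assumed because they reproduce the object code column.
- `ld %r4,%r5` is treated as a load from address %r4 + %r0.
- The listing has no row defining `loop`, so the source below gives it a hypothetical `.equ 2060`, the address implied by the `ba loop` displacement.
- The helper names are invented for this example.

```python
# Opcode tables for the instructions used in the listing (assumed SPARC-style encoding).
BRANCH = {"be": 0b0001, "ba": 0b1000}             # format 2 condition codes
ARITH = {"addcc": 0b010000, "jmpl": 0b111000}     # format 3 op3 values, op = 10
MEMORY = {"ld": 0b000000}                         # format 3 op3 values, op = 11


def reg(token: str) -> int:
    """Register number from a token such as '%r15'."""
    return int(token.strip().removeprefix("%r"))


def format3(op: int, rd: int, op3: int, rs1: int, src2: str) -> int:
    """Encode a three-operand instruction; src2 is a register or a 13-bit immediate."""
    word = (op << 30) | (rd << 25) | (op3 << 19) | (rs1 << 14)
    src2 = src2.strip()
    if src2.startswith("%r"):
        return word | reg(src2)                       # i = 0: register operand
    return word | (1 << 13) | (int(src2) & 0x1FFF)    # i = 1: immediate operand


def assemble(source: str) -> dict[int, int]:
    """Two-pass assembly of a tiny subset; returns {address: 32-bit word}."""
    lines = [text.split("!")[0].strip() for text in source.splitlines()]  # drop comments

    # Pass 1: build the symbol table (labels, .equ) while tracking addresses.
    symbols: dict[str, int] = {}
    address = 0
    for line in lines:
        label, _, stmt = line.rpartition(":")
        if label:
            symbols[label.strip()] = address
        parts = stmt.split()
        if not parts:
            continue
        if parts[0] == ".org":
            address = int(parts[1])
        elif len(parts) == 3 and parts[1] == ".equ":
            symbols[parts[0]] = int(parts[2])
        else:
            address += 4                              # one word per instruction

    # Pass 2: encode each instruction now that every symbol has a value.
    words: dict[int, int] = {}
    address = 0
    for line in lines:
        stmt = line.rpartition(":")[2].strip()
        if not stmt or ".equ" in stmt.split():
            continue
        mnemonic, _, rest = stmt.partition(" ")
        ops = [o.strip() for o in rest.split(",")]
        if mnemonic == ".org":
            address = int(rest)
            continue
        if mnemonic in BRANCH:
            disp = (symbols[ops[0]] - address) // 4   # PC-relative, in words
            word = (BRANCH[mnemonic] << 25) | (0b010 << 22) | (disp & 0x3FFFFF)
        elif mnemonic == "jmpl":
            base, _, offset = ops[0].partition("+")
            word = format3(0b10, reg(ops[1]), ARITH["jmpl"], reg(base), offset)
        elif mnemonic in ARITH:
            word = format3(0b10, reg(ops[2]), ARITH[mnemonic], reg(ops[0]), ops[1])
        elif mnemonic in MEMORY:                      # "ld %rs,%rd" means ld [%rs+%r0],%rd
            word = format3(0b11, reg(ops[1]), MEMORY[mnemonic], reg(ops[0]), "%r0")
        else:
            raise ValueError(f"unsupported statement: {stmt}")
        words[address] = word
        address += 4
    return words


SOURCE = """
loop    .equ 2060          ! assumed value: the listing omits the row defining loop
        .org 2064
        be done
        addcc %r1,-4,%r1
        addcc %r1,%r2,%r4
        ld %r4,%r5
        ba loop
        addcc %r3,%r5,%r3
done:   jmpl %r15+4,%r0
"""

for addr, word in assemble(SOURCE).items():
    bits = f"{word:032b}"
    print(addr, " ".join(bits[i:i + 8] for i in range(0, 32, 8)))
```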
See also
- Compiler
- Disassembler
- Instruction set
- Little man computer – an educational computer model with a base-10 assembly language
- Microassembler
- Typed assembly language
Further reading
- ASM Community: ASM Community Book, "An online book full of helpful ASM info, tutorials and code examples".
- Jonathan Bartlett: Programming from the Ground Up. Bartlett Publishing, 2004. ISBN 0-9752838-4-7. Also available online as PDF.
- Robert Britton: MIPS Assembly Language Programming. Prentice Hall, 2003. ISBN 0-13-142044-5.
- Paul Carter: PC Assembly Language. Free ebook, 2001. Available online.
- Jeff Duntemann: Assembly Language Step-by-Step. Wiley, 2000. ISBN 0-471-37523-3.
- Randall Hyde: The Art of Assembly Language. No Starch Press, 2003. ISBN 1-886411-97-2. Draft versions available online as PDF and HTML.
- Peter Norton, John Socha: Peter Norton's Assembly Language Book for the IBM PC. Brady Books, NY, 1986.
- Michael Singer: PDP-11. Assembler Language Programming and Machine Organization. John Wiley & Sons, NY, 1980.
- Dominic Sweetman: See MIPS Run. Morgan Kaufmann Publishers, 1999. ISBN 1-55860-410-3.
- John Waldron: Introduction to RISC Assembly Language Programming. Addison Wesley, 1998. ISBN 0-201-39828-1.
External links