Machine code or machine language is a system of indivisible instructions executed directly by a computer's central processing unit (CPU). Each instruction performs a very specific task, typically either an operation on a unit of data (in a register or in memory, e.g. add or move), or a jump operation (deciding which instruction executes next, often conditional on the results of a previous instruction). Every executable program is made up of a series of these atomic instructions. Machine code may be regarded as a primitive (and cumbersome) programming language or as the lowest-level representation of a compiled and/or assembled computer program. While it is possible to write programs in machine code, managing CPU resources by hand is so tedious that it is rarely done any more, except in situations that require the most extreme optimization.
Almost all executable programs are written in higher-level languages, and translated to executable machine code by a compiler and linker. Machine code is sometimes called native code when referring to platform-dependent parts of language features or libraries.
Programs in interpreted languages are not represented by machine code; however, their interpreter (which may be seen as a processor executing the higher-level program) often is. Machine code should not be confused with so-called "bytecode", which is executed by an interpreter.
Machine code instructions
Every processor or processor family has its own machine code instruction set. Instructions are patterns of bits that by physical design correspond to different commands to the machine. The instruction set is thus specific to a class of processors using (much) the same architecture. Successor or derivative processor designs often include all the instructions of a predecessor and may add additional instructions. Occasionally a successor design will discontinue or alter the meaning of some instruction code (typically because it is needed for new purposes), affecting code compatibility to some extent; even nearly completely compatible processors may show slightly different behavior for some instructions, but this is seldom a problem. Systems may also differ in other details, such as memory arrangement, operating systems, or peripheral devices. Because a program normally relies on such factors, different systems will typically not run the same machine code, even when the same type of processor is used.
A machine code instruction set may have all instructions of the same length, or it may have variable-length instructions. How the patterns are organized varies strongly with the particular architecture and often also with the type of instruction. Most instructions have one or more opcode fields which specify the basic instruction type (such as arithmetic, logical, jump, etc.) and the actual operation (such as add or compare), and other fields that may give the type of the operand(s), the addressing mode(s), the addressing offset(s) or index, or the actual value itself (such constant operands contained in an instruction are called immediates).
Programs
A computer program is a sequence of instructions that are executed by a CPU. While simple processors execute instructions one after the other, superscalar processors are capable of executing several instructions at once.
Program flow may be influenced by special 'jump' instructions that transfer execution to an instruction other than the numerically following one. Conditional jumps are taken (execution continues at another address) or not (execution continues at the next instruction) depending on some condition.
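The effect of a conditional jump on program flow can be sketched with a toy machine. The instruction set below (LOAD, DEC, JNZ, HALT and their numeric codes) is invented purely for illustration and does not correspond to any real processor; the function name run and the sample program are likewise assumptions.

```python
# Hypothetical toy machine: these opcodes are made up for illustration only.
LOAD, DEC, JNZ, HALT = 0, 1, 2, 3


def run(program):
    """Execute a list of (opcode, operand) pairs.

    Returns (final accumulator value, number of instructions executed).
    """
    acc = 0    # single accumulator register
    pc = 0     # program counter: index of the next instruction
    steps = 0
    while True:
        opcode, operand = program[pc]
        steps += 1
        if opcode == LOAD:
            acc = operand
            pc += 1                      # continue with the following instruction
        elif opcode == DEC:
            acc -= 1
            pc += 1
        elif opcode == JNZ:
            # Conditional jump: taken if acc is non-zero, otherwise fall through.
            pc = operand if acc != 0 else pc + 1
        elif opcode == HALT:
            return acc, steps
        else:
            raise ValueError(f"unknown opcode {opcode} at address {pc}")


# Count down from 3: the jump at address 2 goes back to address 1 until acc is 0.
countdown = [
    (LOAD, 3),   # address 0
    (DEC, 0),    # address 1 (operand unused)
    (JNZ, 1),    # address 2
    (HALT, 0),   # address 3 (operand unused)
]

result = run(countdown)
```

The jump is taken twice and falls through once, so the loop body runs three times before execution reaches HALT.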
Assembly languages
A much more readable rendition of machine language, called assembly language, uses mnemonic codes to refer to machine code instructions, rather than using the instructions' numeric values directly. For example, on the Zilog Z80 processor, the machine code 00000101, which causes the CPU to decrement the B processor register, would be represented in assembly language as DEC B.
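The mapping between mnemonics and numeric codes can be sketched as a simple lookup table. Only the DEC B entry comes from the text above; the NOP and INC B entries are further Z80 single-byte opcodes added for illustration, and the function names assemble and disassemble are assumptions of this sketch, not part of any real assembler.

```python
# Tiny subset of Z80 single-byte opcodes. DEC B is the example from the text;
# NOP and INC B are added here only to make the table non-trivial.
OPCODES = {
    "NOP":   0b00000000,
    "INC B": 0b00000100,
    "DEC B": 0b00000101,
}

# Reverse table for turning numeric codes back into mnemonics.
MNEMONICS = {code: name for name, code in OPCODES.items()}


def assemble(lines):
    """Translate mnemonic lines into machine code bytes (no operands, no labels)."""
    return bytes(OPCODES[line.strip().upper()] for line in lines)


def disassemble(code):
    """Translate machine code bytes back into mnemonics."""
    return [MNEMONICS[byte] for byte in code]


machine_code = assemble(["DEC B"])
as_bits = format(machine_code[0], "08b")   # the bit pattern for DEC B
```

A real assembler also handles operands, labels and multi-byte instructions; the table only shows the one-to-one correspondence between a mnemonic and its numeric value.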
Example
The MIPS architecture provides a specific example for a machine code whose instructions are always 32 bits long. The general type of instruction is given by the op (operation) field, the highest 6 bits. J-type (jump) and I-type (immediate) instructions are fully specified by op. R-type (register) instructions include an additional field funct to determine the exact operation. The fields used in these types are:
   6      5     5     5     5      6 bits
[  op  |  rs |  rt |  rd |shamt| funct] R-type
[  op  |  rs |  rt | address/immediate] I-type
[  op  |        target address        ] J-type
rs, rt, and rd indicate register operands; shamt gives a shift amount; and the address or immediate fields contain an operand directly.
For example adding the registers 1 and 2 and placing the result in register 6 is encoded:
[  op  |  rs |  rt |  rd |shamt| funct]
   0      1     2     6     0     32  decimal
 000000 00001 00010 00110 00000 100000  binary
Load a value into register 8, taken from the memory cell 68 cells after the location listed in register 3:
[  op  |  rs |  rt | address/immediate]
  35      3     8           68          decimal
 100011 00011 01000 00000 00001 000100  binary
Jumping to the address 1024:
[  op  |        target address        ]
   2                 1024               decimal
 000010 00000 00000 00000 10000 000000  binary
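The three encodings above can be reproduced by shifting each field into its bit position. The following is a minimal sketch: the function names are assumptions made for illustration, field values are not range-checked, and the jump target is placed in the field exactly as in the example above.

```python
def encode_r(op, rs, rt, rd, shamt, funct):
    """Pack R-type fields (6, 5, 5, 5, 5, 6 bits) into one 32-bit word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i(op, rs, rt, immediate):
    """Pack I-type fields (6, 5, 5, 16 bits) into one 32-bit word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)


def encode_j(op, target):
    """Pack J-type fields (6, 26 bits) into one 32-bit word."""
    return (op << 26) | (target & 0x3FFFFFF)


def decode_r(word):
    """Split a 32-bit word back into (op, rs, rt, rd, shamt, funct)."""
    return (
        (word >> 26) & 0x3F,
        (word >> 21) & 0x1F,
        (word >> 16) & 0x1F,
        (word >> 11) & 0x1F,
        (word >> 6) & 0x1F,
        word & 0x3F,
    )


def bits(word):
    """Render a word as a 32-character binary string."""
    return format(word, "032b")


add_word = encode_r(0, 1, 2, 6, 0, 32)   # add registers 1 and 2 into register 6
load_word = encode_i(35, 3, 8, 68)       # load register 8 from 68 past register 3
jump_word = encode_j(2, 1024)            # jump, target field 1024
```

Calling bits() on each word yields the binary rows shown above with the spaces removed, and decode_r(add_word) recovers the original six field values.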
Relationship to microcode
In some computer architectures, the machine code is implemented by a more fundamental underlying layer of programs called microprograms, providing a common machine language interface across a line or family of different models of computer with widely different underlying dataflows. This is done to facilitate porting of machine language programs between different models. An example of this use is the IBM System/360 family of computers and their successors. With dataflow path widths of 8 bits to 64 bits and beyond, they nevertheless present a common architecture at the machine language level across the entire line.
Using a microcode layer to implement an emulator enables the computer to present the architecture of an entirely different computer. The System/360 line used this to allow porting programs from earlier IBM machines to the new family of computers, e.g. an IBM 1401/1440/1460 emulator on the IBM S/360 model 40.
Storing in memory
The Harvard architecture is a computer architecture with physically separate storage and signal pathways for the code (instructions) and data. Today, most processors implement such separate signal pathways for performance reasons but actually implement a Modified Harvard architecture, so they can support tasks like loading a program from disk storage as data and then executing it. The Harvard architecture is contrasted with the Von Neumann architecture, where data and code are stored in the same memory.
From the point of view of a process, the code space is the part of its address space where code in execution is stored. In a multi-threading environment, threads share code space along with data space, which reduces the overhead of context switching considerably as compared to process switching.
Readability by humans
It has been said that machine code is so unreadable that the Copyright Office cannot even identify whether a particular encoded program is an original work of authorship. Hofstadter writes, "Looking at a program written in machine language is vaguely comparable to looking at a DNA molecule atom by atom."
See also
- Reduced Instruction Set Computer (RISC)
- VLIW
- P-code machine
- Endianness
- Teaching Machine Code: Microprofessor I