Perl virtual machine
The Perl virtual machine is a stack-based process virtual machine implemented as an opcode interpreter which runs previously compiled Perl programs. The opcode interpreter is part of the Perl interpreter, which also contains a compiler (lexer, parser and optimizer) in one executable file, commonly /usr/bin/perl on various Unix-like systems or perl.exe on Microsoft Windows systems.

Opcodes

The Perl compiler outputs the compiled program into memory as an internal structure that can be represented as a tree, in which each node represents an opcode. Opcodes are represented internally by C typedefs. Each opcode has next/other and first/sibling pointers, so the opcode tree can be drawn either as a basic OP tree starting from the root node or as a flat OP list in the order in which the opcodes would normally execute, starting from the start node. The opcode tree can be mapped back to the source code, so it is possible to decompile it into high-level source code.
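A minimal C sketch can make the two views concrete. The `toy_op` struct, its fields and the helper functions below are hypothetical and only loosely modeled on the idea behind Perl's OP structure, not on its real layout; the tree is hand-built in the shape B::Concise reports for `say "Hello, world!"`. Following `first`/`sibling` links from the root gives the tree view, while following `next` links from the start node gives the flat execution order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical op node: loosely modeled on the idea behind Perl's OP,
   not on its real field names or memory layout. */
typedef struct toy_op {
    const char    *name;
    struct toy_op *first;    /* first child in the tree */
    struct toy_op *sibling;  /* next child of the same parent */
    struct toy_op *next;     /* next op in execution order */
} toy_op;

/* Append a name to buf, separating entries with a single space. */
static void append_name(char *buf, size_t len, const char *name) {
    assert(buf != NULL && len > 0);
    if (buf[0] != '\0')
        strncat(buf, " ", len - strlen(buf) - 1);
    strncat(buf, name, len - strlen(buf) - 1);
}

/* Tree view: visit each node, then its children (first/sibling links). */
static void tree_order(const toy_op *op, char *buf, size_t len) {
    for (; op != NULL; op = op->sibling) {
        append_name(buf, len, op->name);
        tree_order(op->first, buf, len);
    }
}

/* Execution view: follow the next links from the start op. */
static void exec_order(const toy_op *start, char *buf, size_t len) {
    for (const toy_op *op = start; op != NULL; op = op->next)
        append_name(buf, len, op->name);
}

/* Hand-built op tree for `say "Hello, world!"`. */
static toy_op ops[6];

static toy_op *build_hello(toy_op **start) {
    toy_op *enter = &ops[0], *nextstate = &ops[1], *pushmark = &ops[2],
           *cnst = &ops[3], *say = &ops[4], *leave = &ops[5];
    /*                     name         first     sibling    next      */
    *enter     = (toy_op){ "enter",     NULL,     nextstate, nextstate };
    *nextstate = (toy_op){ "nextstate", NULL,     say,       pushmark  };
    *pushmark  = (toy_op){ "pushmark",  NULL,     cnst,      cnst      };
    *cnst      = (toy_op){ "const",     NULL,     NULL,      say       };
    *say       = (toy_op){ "say",       pushmark, NULL,      leave     };
    *leave     = (toy_op){ "leave",     enter,    NULL,      NULL      };
    *start = enter;
    return leave;   /* root of the tree */
}
```

Calling `tree_order` on the returned root yields `leave enter nextstate say pushmark const`, while `exec_order` from the start node yields `enter nextstate pushmark const say leave`, the same order as the numbered lines of an `-exec` dump.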

Perl's opcode interpreter is implemented as a tree walker that traverses the opcode tree in execution order from the start node, following the next or other pointers. Each opcode has a function pointer to a pp_opname function; for example, the say opcode calls the pp_say function of the internal Perl API.
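The dispatch idea can be sketched in C as follows. Everything here is hypothetical: each `toy_op` holds a function pointer to its `toy_pp_*` function, which does the work and returns the op to run next, either `next` or, for a logical op, `other`. Perl's own standard run loop has roughly the same shape, repeatedly calling the current op's pp function and keeping the op it returns, although real pp functions take no op argument and read the current op from interpreter state instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct toy_op toy_op;

/* Each op carries a pointer to its "pp" function, which does the op's
   work and returns the op to run next (NULL ends the program). */
typedef toy_op *(*toy_pp)(toy_op *op);

struct toy_op {
    toy_pp      pp;
    toy_op     *next;   /* normal successor */
    toy_op     *other;  /* alternative successor, used by logical ops */
    const char *text;   /* operand for toy_pp_print */
};

static bool toy_flag;       /* hypothetical condition tested by toy_pp_and */
static char toy_out[64];    /* where toy_pp_print writes */

static toy_op *toy_pp_print(toy_op *op) {
    strncat(toy_out, op->text, sizeof toy_out - strlen(toy_out) - 1);
    return op->next;
}

/* Like `and`: if the condition is true, continue with the other branch. */
static toy_op *toy_pp_and(toy_op *op) {
    return toy_flag ? op->other : op->next;
}

/* The run loop: execute the current op and move to whatever it returns. */
static void toy_runops(toy_op *start) {
    assert(start != NULL);
    toy_op *op = start;
    while ((op = op->pp(op)) != NULL)
        ;
}

/* Hypothetical program: `flag and print "yes "; print "done";` */
static toy_op op_done = { .pp = toy_pp_print, .text = "done" };
static toy_op op_yes  = { .pp = toy_pp_print, .next = &op_done, .text = "yes " };
static toy_op op_and  = { .pp = toy_pp_and, .next = &op_done, .other = &op_yes };

static const char *toy_run(bool flag) {
    toy_flag = flag;
    toy_out[0] = '\0';
    toy_runops(&op_and);
    return toy_out;
}
```

With the flag set, the `and` op jumps to its `other` branch and the program writes `yes done`; with it cleared, execution falls through to `next` and writes only `done`.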

The compilation phase of a Perl program is hidden from the end user, but it can be exposed with the B Perl module or with more specialized modules such as B::Concise.

The following example shows a simple "Hello world" program compiled and dumped in execution order with the help of the B::Concise module:


$ perl -MO=Concise,-exec -E 'say "Hello, world!"'
1 <0> enter
2 <;> nextstate(main 46 -e:1) v:%,{
3 <0> pushmark s
4 <$> const[PV "Hello, world!"] s
5 <@> say vK
6 <@> leave[1 ref] vKP/REFC


Some opcodes (entereval, dofile, require) call Perl compiler functions at run time, which generate new opcodes in the same Perl virtual machine.

Variables

Perl variables can be global, dynamic (local keyword), or lexical (my keyword). The our keyword declares a lexically scoped name for a global package variable.

Global variables are accessible via the stash and the corresponding typeglob.

Local (dynamic) variables are ordinary global variables, but a special opcode is generated to save their current value on the save stack and restore it when the enclosing scope exits.

Lexical variables are stored in the padlist.

Data structures

Perl VM data structures are represented internally by C typedefs.

The internal data structures can be examined with the B Perl module or with more specialized tools such as the Devel::Peek module.

Data types

Perl has three typedefs that handle its three main data types: Scalar Value (SV), Array Value (AV) and Hash Value (HV). Perl also uses special typedefs for a signed integer (IV), an unsigned integer (UV), a floating-point number (NV) and a string (PV).
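As a rough illustration of how one scalar can carry several of these representations, here is a hypothetical C sketch. `toy_sv`, its flag names and its functions are invented for this example and are much simpler than a real SV; the flags only echo the idea that a scalar records which of its integer, floating-point and string forms are currently valid, so that a string converted to a number once can reuse the cached result.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag bits: which representations are currently valid. */
enum { TOY_IOK = 1, TOY_UOK = 2, TOY_NOK = 4, TOY_POK = 8 };

/* Hypothetical toy scalar; the slot names only echo IV/UV/NV/PV. */
typedef struct {
    unsigned  flags;
    intmax_t  iv;   /* signed integer   (cf. IV) */
    uintmax_t uv;   /* unsigned integer (cf. UV) */
    double    nv;   /* floating point   (cf. NV) */
    char     *pv;   /* string           (cf. PV) */
} toy_sv;

/* Create a string scalar, copying the text. */
static toy_sv toy_sv_from_pv(const char *s) {
    toy_sv sv = { 0 };
    size_t n = strlen(s) + 1;
    sv.pv = malloc(n);
    if (sv.pv != NULL) {
        memcpy(sv.pv, s, n);
        sv.flags = TOY_POK;
    }
    return sv;
}

/* Numeric view of a scalar; a string is parsed once and the result cached. */
static double toy_sv_nv(toy_sv *sv) {
    if (sv->flags & TOY_NOK) return sv->nv;
    if (sv->flags & TOY_IOK) return (double)sv->iv;
    if (sv->flags & TOY_UOK) return (double)sv->uv;
    if (sv->flags & TOY_POK) {
        sv->nv = strtod(sv->pv, NULL);
        sv->flags |= TOY_NOK;      /* remember the numeric form */
        return sv->nv;
    }
    return 0.0;                    /* no value at all: treated as 0 */
}

/* Release the string buffer and mark every representation invalid. */
static void toy_sv_free(toy_sv *sv) {
    free(sv->pv);
    sv->pv = NULL;
    sv->flags = 0;
}
```

After `toy_sv_nv` has parsed `"3.5"`, both the string flag and the numeric flag are set, so a second numeric read returns the cached value without parsing again.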

Perl uses reference counting for garbage collection. SVs, AVs and HVs start their life with a reference count of 1. If the reference count of a value ever drops to 0, the value is destroyed and its memory is made available for reuse.
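The counting rule can be sketched in C. The type `toy_val` and the functions `toy_new`, `toy_refcnt_inc` and `toy_refcnt_dec` are hypothetical; their names only echo Perl's `SvREFCNT_inc` and `SvREFCNT_dec` macros, and real SVs carry far more state.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted value. */
typedef struct {
    unsigned refcnt;
    int      payload;
} toy_val;

static int toy_destroyed;   /* counts destructions, for demonstration */

/* New values start life with a reference count of 1. */
static toy_val *toy_new(int payload) {
    toy_val *v = malloc(sizeof *v);
    if (v != NULL) {
        v->refcnt = 1;
        v->payload = payload;
    }
    return v;
}

/* Taking another reference bumps the count. */
static toy_val *toy_refcnt_inc(toy_val *v) {
    if (v != NULL)
        v->refcnt++;
    return v;
}

/* Dropping a reference; the last one frees the value. */
static void toy_refcnt_dec(toy_val *v) {
    if (v == NULL)
        return;
    assert(v->refcnt > 0);
    if (--v->refcnt == 0) {
        free(v);
        toy_destroyed++;
    }
}
```

A known consequence of pure reference counting, in this toy as in Perl, is that values which refer to each other in a cycle never drop to zero and are not reclaimed on their own.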

Other typedefs are the Glob Value (GV), which contains named references to the various objects; the Code Value (CV), which contains a reference to a Perl subroutine; the I/O handle (IO); the reference to a regular expression (REGEXP; RV in Perl before 5.11); the reference to a compiled output format (FM); and the simple reference (RV), a special type of scalar that points to another data type.

stash

A special Hash Value is the stash, a hash that contains all variables defined within a package. Each value in this hash is a Glob Value (GV).
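A stash and its globs can be pictured with the following hypothetical C sketch. `toy_stash`, `toy_gv` and `toy_gv_fetch` are invented for illustration; a real stash is a full hash table, while this one uses a small linear table to stay short. Each glob has one slot per kind of value a single name can denote, which is how `$x`, `@x`, `%x` and `&x` in the same package can coexist under one stash entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy glob: one slot per kind of value a name can denote. */
typedef struct {
    const char *name;
    void *sv;   /* $name */
    void *av;   /* @name */
    void *hv;   /* %name */
    void *cv;   /* &name */
} toy_gv;

#define TOY_STASH_MAX 16

/* Hypothetical toy stash: a package's table of name -> glob. */
typedef struct {
    const char *package;
    size_t      count;
    toy_gv      globs[TOY_STASH_MAX];
} toy_stash;

/* Find the glob for `name`, creating an empty one on first use. */
static toy_gv *toy_gv_fetch(toy_stash *st, const char *name) {
    for (size_t i = 0; i < st->count; i++)
        if (strcmp(st->globs[i].name, name) == 0)
            return &st->globs[i];
    if (st->count == TOY_STASH_MAX)
        return NULL;                      /* toy table is full */
    toy_gv *gv = &st->globs[st->count++];
    *gv = (toy_gv){ .name = name };       /* all slots start empty */
    return gv;
}
```

Storing a pointer in the `sv` slot of the glob fetched for `"x"` plays the role of defining `$main::x`; later lookups of the same name return the same glob, and its other slots stay empty until something fills them.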

padlist

A special Array Value is the padlist, which is an array of arrays. Its 0th element points to an AV containing the names of all lexical variables (with their sigils) used within the subroutine. Its first element points to a scratchpad AV whose elements contain the values of the lexical variables named in the 0th row. Further elements of the padlist are created when the subroutine recurses or a new thread is created.
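The following hypothetical C sketch shows why a recursive subroutine needs one scratchpad per call depth. `toy_padlist`, `toy_pad_index` and `toy_fact` are invented for this example and simplify the real layout: the `names` array stands in for the 0th element, and `pads[d]` holds the values for call depth `d`. Because each nested call writes to its own pad, the caller's `$n` survives the recursive call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_NPAD   2   /* lexicals per sub in this toy */
#define TOY_MAXDEP 8   /* deepest recursion this toy supports */

/* Hypothetical toy padlist: `names` plays the role of row 0, and
   pads[d] is the scratchpad for call depth d (pads[0] is unused). */
typedef struct {
    const char *names[TOY_NPAD];
    long        pads[TOY_MAXDEP + 1][TOY_NPAD];
    int         depth;
} toy_padlist;

/* Compile-time side: resolve a variable name to its slot index. */
static int toy_pad_index(const toy_padlist *pl, const char *name) {
    for (int i = 0; i < TOY_NPAD; i++)
        if (pl->names[i] != NULL && strcmp(pl->names[i], name) == 0)
            return i;
    return -1;
}

/* Padlist for a hypothetical recursive factorial sub with `my $n` and `my $r`. */
static toy_padlist fact_pl = { .names = { "$n", "$r" } };

static long toy_fact(long n) {
    toy_padlist *pl = &fact_pl;
    assert(pl->depth < TOY_MAXDEP);
    long *pad = pl->pads[++pl->depth];   /* this call depth's own pad */
    pad[0] = n;                          /* my $n */
    pad[1] = (n <= 1) ? 1 : pad[0] * toy_fact(n - 1);   /* my $r */
    pl->depth--;
    return pad[1];
}
```

After `toy_fact(5)` returns 120, `pads[1]` still holds the outermost call's `$n` (5) and `pads[5]` the innermost call's (1).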

Argument stack

Arguments are passed to opcodes, and results are returned from them, using the argument stack. The typical way for an opcode to handle its arguments is to pop them off the stack and then push its result back onto the stack.
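A minimal C sketch of this pop-then-push convention, using a hypothetical integer stack. `toy_push`, `toy_pop` and the `toy_pp_*` functions are invented here; Perl's real pp functions manipulate a stack of SV pointers through macros such as `dSP` and `PUTBACK`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy argument stack of integers. */
#define TOY_STACK_MAX 32
static long   toy_stack[TOY_STACK_MAX];
static size_t toy_sp;   /* number of values currently on the stack */

static void toy_push(long v) {
    assert(toy_sp < TOY_STACK_MAX);
    toy_stack[toy_sp++] = v;
}

static long toy_pop(void) {
    assert(toy_sp > 0);
    return toy_stack[--toy_sp];
}

/* An "add"-like op: pop two operands, push their sum. */
static void toy_pp_add(void) {
    long right = toy_pop();
    long left  = toy_pop();
    toy_push(left + right);
}

/* A "subtract"-like op, where operand order matters. */
static void toy_pp_subtract(void) {
    long right = toy_pop();
    long left  = toy_pop();
    toy_push(left - right);
}
```

Evaluating `(10 - 3) + 4` pushes 10 and 3, runs the subtract op, pushes 4 and runs the add op, leaving 11 as the only value on the stack. Popping the right operand first matters for non-commutative ops such as subtraction.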

Mark stack

This stack saves bookmarks to locations in the argument stack, so that each function can find where its own arguments begin and does not have to treat the whole argument stack as its own.
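The following hypothetical C sketch shows how a mark lets a list operator find its own arguments, mirroring the pushmark, const, say sequence in the B::Concise dump above. `toy_pushmark`, `toy_push` and `toy_pp_join` are invented names; the list op consumes only the values pushed since the most recent mark and leaves anything below the mark untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX 32
static const char *toy_args[TOY_MAX];   /* argument stack (strings) */
static size_t      toy_sp;
static size_t      toy_marks[TOY_MAX];  /* mark stack: saved toy_sp values */
static size_t      toy_mp;

/* Bookmark the current top of the argument stack. */
static void toy_pushmark(void) {
    assert(toy_mp < TOY_MAX);
    toy_marks[toy_mp++] = toy_sp;
}

static void toy_push(const char *s) {
    assert(toy_sp < TOY_MAX);
    toy_args[toy_sp++] = s;
}

/* A list op: its arguments are everything pushed since the most recent
   mark; it concatenates them into `out`, pops them and pushes `out`. */
static void toy_pp_join(char *out, size_t len) {
    assert(toy_mp > 0 && len > 0);
    size_t mark = toy_marks[--toy_mp];
    out[0] = '\0';
    for (size_t i = mark; i < toy_sp; i++)
        strncat(out, toy_args[i], len - strlen(out) - 1);
    toy_sp = mark;     /* pop this op's arguments */
    toy_push(out);     /* push the result */
}
```

If `"outer"` is pushed before the mark and `"Hello, "` and `"world!"` after it, the join sees only the last two, produces `"Hello, world!"`, and leaves `"outer"` in place below its result.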

Save stack

This stack is used for saving and restoring the values of dynamically scoped (local) variables.
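A hypothetical C sketch of how a save stack can implement `local`: `toy_local` records a variable's address and old value before overwriting it, and `toy_leave_scope` pops entries back to the level recorded at scope entry, restoring the values in reverse order. The names are invented for illustration, and Perl's real save stack stores many kinds of entries, not just integers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical save-stack entry: where a variable lives and the value
   it held before it was localized. */
typedef struct {
    long *addr;
    long  old;
} toy_save_entry;

#define TOY_SAVE_MAX 32
static toy_save_entry toy_savestack[TOY_SAVE_MAX];
static size_t         toy_save_ix;   /* number of entries in use */

/* Rough analogue of `local $x = value;`: save the old value, then assign. */
static void toy_local(long *var, long value) {
    assert(toy_save_ix < TOY_SAVE_MAX);
    toy_savestack[toy_save_ix++] = (toy_save_entry){ var, *var };
    *var = value;
}

/* Scope exit: undo every save made since `base`, newest first. */
static void toy_leave_scope(size_t base) {
    while (toy_save_ix > base) {
        toy_save_entry e = toy_savestack[--toy_save_ix];
        *e.addr = e.old;
    }
}
```

Localizing the same variable twice in one scope and then leaving the scope restores its original value, because the entries are undone last-in, first-out.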

Scope stack

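This stack records the position of the save stack each time a scope is entered, so that leaving the scope can restore everything saved since then; debugging builds additionally record scope names on it.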

Other implementations

There is no standardization of the Perl language or the Perl virtual machine. The internal API should be considered unstable, since it changes from version to version, and the Perl virtual machine is tied closely to the compiler. Together, these factors make it very hard to reimplement the Perl virtual machine.

The best-known and most stable implementation is the B::C Perl module, which translates the opcode tree into a C-language representation and adds its own tree walker.

Another implementation is the Acme::Perl::VM module, which is written entirely in Perl but is still tied to the original Perl virtual machine through the B:: modules.

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.