Shunting yard algorithm

The shunting yard algorithm is a method for parsing mathematical expressions specified in infix notation. It can be used to produce output in Reverse Polish notation (RPN) or as an abstract syntax tree (AST). The algorithm was invented by Edsger Dijkstra and named the "shunting yard" algorithm because its operation resembles that of a railroad shunting yard.

Like the evaluation of RPN, the shunting yard algorithm is stack-based. Infix expressions are the form of math most people are used to, for instance 3+4 or 3+4*(2-1). For the conversion there are two text variables (strings): the input and the output. There is also a stack that holds operators not yet added to the output queue. To convert, the program reads each symbol in order and processes it according to the rules that follow.
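The description assumes the input has already been split into symbols. As a hedged sketch of that step (the `tokenize` helper and its regular expression are hypothetical, not part of the algorithm), integers and decimals, identifiers such as function names, and any other single non-space character can each be treated as one token:

```python
import re

# Assumed token shapes: a number (integer or decimal), an identifier,
# or any single non-whitespace character (operator, parenthesis, comma).
TOKEN_PATTERN = re.compile(r"\d+(?:\.\d+)?|[A-Za-z_]\w*|\S")


def tokenize(expression: str) -> list[str]:
    """Split an infix expression into token strings, discarding whitespace."""
    return TOKEN_PATTERN.findall(expression)


print(tokenize("3+4*(2-1)"))  # ['3', '+', '4', '*', '(', '2', '-', '1', ')']
```

This simple scheme does not distinguish unary from binary minus; a real tokenizer would need extra context for that.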

A simple conversion


Input: 3+4
  1. Add 3 to the output queue (whenever a number is read, it is added to the output).
  2. Push + (or its ID) onto the operator stack.
  3. Add 4 to the output queue.
  4. After reading the expression, pop the operators off the stack and add them to the output.
  5. In this case there is only one, "+".
  6. Output: 3 4 +


This already shows a couple of rules:
  • All numbers are added to the output when they are read.
  • At the end of reading the expression, pop all operators off the stack and onto the output.
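As a minimal sketch (the name `simple_convert` is hypothetical, and tokens are assumed to be pre-split strings), these two rules plus the push in step 2 are enough for an input with a single operator such as 3+4. They say nothing about precedence or parentheses, so longer inputs need the full algorithm.

```python
def simple_convert(tokens: list[str]) -> list[str]:
    """Apply only the two rules above; correct for an input with one operator."""
    output: list[str] = []
    operator_stack: list[str] = []
    for token in tokens:
        if token.isdigit():
            output.append(token)          # rule 1: numbers go straight to the output
        else:
            operator_stack.append(token)  # operators wait on the stack
    while operator_stack:
        output.append(operator_stack.pop())  # rule 2: flush the stack at the end
    return output


print(" ".join(simple_convert(["3", "+", "4"])))  # 3 4 +
```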


The algorithm in detail


  • While there are tokens to be read:
      • Read a token.
      • If the token is a number, then add it to the output queue.
      • If the token is a function token, then push it onto the stack.
      • If the token is a function argument separator (e.g., a comma):
          • Until the token at the top of the stack is a left parenthesis, pop operators off the stack onto the output queue. If no left parenthesis is encountered, either the separator was misplaced or the parentheses were mismatched.
      • If the token is an operator, o1, then:
          • while there is an operator, o2, at the top of the stack, and either
              o1 is associative or left-associative and its precedence is less than or equal to that of o2, or
              o1 is right-associative and its precedence is less than that of o2,
            pop o2 off the stack, onto the output queue;
          • push o1 onto the stack.
      • If the token is a left parenthesis, then push it onto the stack.
      • If the token is a right parenthesis:
          • Until the token at the top of the stack is a left parenthesis, pop operators off the stack onto the output queue.
          • Pop the left parenthesis from the stack, but not onto the output queue.
          • If the token at the top of the stack is a function token, pop it onto the output queue.
          • If the stack runs out without finding a left parenthesis, then there are mismatched parentheses.
  • When there are no more tokens to read:
      • While there are still operator tokens in the stack:
          • If the operator token on the top of the stack is a parenthesis, then there are mismatched parentheses.
          • Pop the operator onto the output queue.
  • Exit.
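The steps above translate fairly directly into code. The following is a sketch under stated assumptions, not a reference implementation: tokens arrive as a list of strings; the precedence table (`^` above `*` and `/`, which are above `+` and `-`), the choice of `^` as the only right-associative operator, and the function names in `FUNCTIONS` are all hypothetical; every other token is treated as an operand, so variables pass straight through like numbers; unary minus is not supported; and errors are reported by raising `ValueError`.

```python
# Assumed operator table: a higher number binds more tightly.
PRECEDENCE = {"^": 4, "*": 3, "/": 3, "+": 2, "-": 2}
RIGHT_ASSOCIATIVE = {"^"}
# Hypothetical set of names recognised as function tokens.
FUNCTIONS = {"max", "min", "sin", "cos"}


def should_pop(o1: str, o2: str) -> bool:
    """Decide whether operator o2 on the stack must be output before o1 is pushed."""
    if o1 in RIGHT_ASSOCIATIVE:
        return PRECEDENCE[o1] < PRECEDENCE[o2]
    return PRECEDENCE[o1] <= PRECEDENCE[o2]  # associative or left-associative


def to_rpn(tokens: list[str]) -> list[str]:
    """Convert a list of infix tokens into a list of RPN tokens."""
    output: list[str] = []  # the output queue
    stack: list[str] = []   # the operator stack; stack[-1] is the top
    for token in tokens:
        if token in FUNCTIONS:
            stack.append(token)
        elif token == ",":
            # Pop operators until the "(" that opened the argument list.
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("misplaced separator or mismatched parentheses")
        elif token in PRECEDENCE:
            while stack and stack[-1] in PRECEDENCE and should_pop(token, stack[-1]):
                output.append(stack.pop())
            stack.append(token)
        elif token == "(":
            stack.append(token)
        elif token == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("mismatched parentheses")
            stack.pop()  # discard the "(" without adding it to the output
            if stack and stack[-1] in FUNCTIONS:
                output.append(stack.pop())
        else:
            output.append(token)  # number (or variable): straight to the output
    while stack:
        top = stack.pop()
        if top == "(":  # ")" is never pushed, so only "(" can be left over
            raise ValueError("mismatched parentheses")
        output.append(top)
    return output


print(" ".join(to_rpn("3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3".split())))
# 3 4 2 * 1 5 - 2 3 ^ ^ / +
```

The `should_pop` helper isolates the associativity test, so changing which operators are right-associative only touches `RIGHT_ASSOCIATIVE`. The printed result matches the output of the complex example.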


To analyze the running time of this algorithm, note that each token is read once; each number, function, or operator is added to the output once; and each function, operator, or parenthesis is pushed onto the stack and popped off it once. Therefore at most a constant number of operations is executed per token, and the running time is O(n), linear in the size of the input.
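Written as a bound, with n the number of input tokens and c an assumed upper bound on the cost of any single read, output, push, or pop (each term below over-counts, since parentheses are never output and numbers are never pushed):

```latex
T(n) \;\le\; c\,\bigl(\underbrace{n}_{\text{reads}} + \underbrace{n}_{\text{outputs}} + \underbrace{n}_{\text{pushes}} + \underbrace{n}_{\text{pops}}\bigr) \;=\; 4cn \;\in\; O(n)
```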

Complex example

Input: 3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3
Token  Action                      Output (in RPN)             Operator stack (top first)  Notes
3      Add token to output         3
+      Push token to stack         3                           +
4      Add token to output         3 4                         +
*      Push token to stack         3 4                         * +                         * has higher precedence than +
2      Add token to output         3 4 2                       * +
/      Pop stack to output         3 4 2 *                     +                           / and * have same precedence
       Push token to stack         3 4 2 *                     / +                         / has higher precedence than +
(      Push token to stack         3 4 2 *                     ( / +
1      Add token to output         3 4 2 * 1                   ( / +
-      Push token to stack         3 4 2 * 1                   - ( / +
5      Add token to output         3 4 2 * 1 5                 - ( / +
)      Pop stack to output         3 4 2 * 1 5 -               ( / +                       Repeated until "(" found
       Pop stack                   3 4 2 * 1 5 -               / +                         Discard matching parenthesis
^      Push token to stack         3 4 2 * 1 5 -               ^ / +                       ^ has higher precedence than /
2      Add token to output         3 4 2 * 1 5 - 2             ^ / +
^      Push token to stack         3 4 2 * 1 5 - 2             ^ ^ / +                     ^ is evaluated right-to-left
3      Add token to output         3 4 2 * 1 5 - 2 3           ^ ^ / +
end    Pop entire stack to output  3 4 2 * 1 5 - 2 3 ^ ^ / +
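One way to check the final output of this example is to evaluate it. A minimal RPN evaluator follows, assuming the hypothetical name `eval_rpn`, binary operators only, floating-point arithmetic, and `^` meaning exponentiation:

```python
import operator

# Assumed meaning of each binary operator; "^" is exponentiation.
BINARY_OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "^": operator.pow,
}


def eval_rpn(tokens: list[str]) -> float:
    """Evaluate an RPN token list using a single operand stack."""
    stack: list[float] = []
    for token in tokens:
        if token in BINARY_OPS:
            right = stack.pop()  # the right operand was pushed last
            left = stack.pop()
            stack.append(BINARY_OPS[token](left, right))
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed RPN expression")
    return stack[0]


print(eval_rpn("3 4 2 * 1 5 - 2 3 ^ ^ / +".split()))  # 3.0001220703125
```

The result is 3 + 8 / (-4)^8 = 3 + 8/65536. Because the two ^ tokens come last, 2 ^ 3 is computed first, which is exactly what right associativity requires; grouping left-to-right would have given ((-4)^2)^3 = 4096 as the divisor instead.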


If you were writing an interpreter, this output would be tokenized and written to a compiled file to be interpreted later. Conversion from infix to RPN can also make expressions easier to simplify. To do this, evaluate the RPN expression as usual, except that every variable has a null (unknown) value. Whenever an operator has a null parameter, write the operator and its parameters to the output (this is a simplification; problems arise when the parameters are themselves operators). When an operator has no null parameters, its value can simply be written to the output. This method does not perform every possible simplification; it is essentially a constant folding optimization.
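A sketch of that constant-folding pass, with its assumptions spelled out: the function names are hypothetical, only + - * / are supported, and any token that is neither a number nor an operator is treated as a variable with a null value. Instead of writing an operator and its parameters straight to the output, this variant keeps each unresolved subexpression on the stack as a list of RPN tokens, which sidesteps the problem mentioned above when the parameters are themselves operators.

```python
import operator

# Assumed operator set for this sketch.
BINARY_OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def as_tokens(entry) -> list[str]:
    """Turn a stack entry (known float or residual token list) back into RPN tokens."""
    if isinstance(entry, float):
        return [str(int(entry)) if entry.is_integer() else repr(entry)]
    return entry


def fold_constants(tokens: list[str]) -> list[str]:
    """Evaluate RPN as far as possible, treating variables as null (unknown) values.

    Each stack entry is either a float (a known value) or a list of RPN tokens
    for a subexpression that depends on at least one variable.
    """
    stack: list = []
    for token in tokens:
        if token in BINARY_OPS:
            right = stack.pop()
            left = stack.pop()
            if isinstance(left, float) and isinstance(right, float):
                # No null parameters: compute the value now.
                stack.append(BINARY_OPS[token](left, right))
            else:
                # A null parameter: keep the operator and its parameters as RPN.
                stack.append(as_tokens(left) + as_tokens(right) + [token])
        else:
            try:
                stack.append(float(token))  # numeric literal
            except ValueError:
                stack.append([token])  # variable: value unknown
    (result,) = stack  # a well-formed expression leaves exactly one entry
    return as_tokens(result)


print(fold_constants("x 2 3 * +".split()))  # ['x', '6', '+']
```

Here x + 2*3 is reduced to x + 6; the folded subexpression 2 3 * becomes the single token 6, while the part involving x is kept in RPN form.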

See also

  • Operator-precedence parser
  • Reverse Polish notation


External links

Theodore Norvell (C) 1999–2001. Access date September 14, 2006.