AWK (programming language)
Encyclopedia
The AWK utility is a data extraction and reporting tool that uses a data-driven scripting language
Scripting language
A scripting language, script language, or extension language is a programming language that allows control of one or more applications. "Scripts" are distinct from the core code of the application, as they are usually written in a different language and are often created or at least modified by the...

 consisting of a set of actions to be taken against textual data (either in files or data streams) for the purpose of producing formatted reports. The language used by awk extensively uses the string
String (computer science)
In formal languages, which are used in mathematical logic and theoretical computer science, a string is a finite sequence of symbols that are chosen from a set or alphabet....

 datatype, associative array
Associative array
In computer science, an associative array is an abstract data type composed of a collection of pairs, such that each possible key appears at most once in the collection....

s (that is, arrays indexed by key strings), and regular expression
Regular expression
In computing, a regular expression provides a concise and flexible means for "matching" strings of text, such as particular characters, words, or patterns of characters. Abbreviations for "regular expression" include "regex" and "regexp"...

s.

AWK is one of the early tools to appear in Version 7 Unix
Version 7 Unix
Seventh Edition Unix, also called Version 7 Unix, Version 7 or just V7, was an important early release of the Unix operating system. V7, released in 1979, was the last Bell Laboratories release to see widespread distribution before the commercialization of Unix by AT&T in the early 1980s...

 and gained popularity as a way to add computational features to a Unix pipeline
Pipeline (Unix)
In Unix-like computer operating systems , a pipeline is the original software pipeline: a set of processes chained by their standard streams, so that the output of each process feeds directly as input to the next one. Each connection is implemented by an anonymous pipe...

. A version of the AWK language is a standard feature of nearly every modern Unix-like
Unix-like
A Unix-like operating system is one that behaves in a manner similar to a Unix system, while not necessarily conforming to or being certified to any version of the Single UNIX Specification....

 operating system
Operating system
An operating system is a set of programs that manage computer hardware resources and provide common services for application software. The operating system is the most important type of system software in a computer system...

 available today. AWK is mentioned in the Single UNIX Specification
Single UNIX Specification
The Single UNIX Specification is the collective name of a family of standards for computer operating systems to qualify for the name "Unix"...

 as one of the mandatory utilities of a Unix
Unix
Unix is a multitasking, multi-user computer operating system originally developed in 1969 by a group of AT&T employees at Bell Labs, including Ken Thompson, Dennis Ritchie, Brian Kernighan, Douglas McIlroy, and Joe Ossanna...

 operating system
Operating system
An operating system is a set of programs that manage computer hardware resources and provide common services for application software. The operating system is the most important type of system software in a computer system...

. Besides the Bourne shell
Bourne shell
The Bourne shell, or sh, was the default Unix shell of Unix Version 7 and most Unix-like systems continue to have /bin/sh - which will be the Bourne shell, or a symbolic link or hard link to a compatible shell - even when more modern shells are used by most users.Developed by Stephen Bourne at AT&T...

, AWK is the only other scripting language available in a standard Unix environment. It is also present amongst the commands required by the Linux Standard Base
Linux Standard Base
The Linux Standard Base is a joint project by several Linux distributions under the organizational structure of the Linux Foundation to standardize the software system structure, including the filesystem hierarchy, used with Linux operating system...

 specification. Implementations of AWK exist as installed software for almost all other operating systems.

AWK was created at Bell Labs
Bell Labs
Bell Laboratories is the research and development subsidiary of the French-owned Alcatel-Lucent and previously of the American Telephone & Telegraph Company , half-owned through its Western Electric manufacturing subsidiary.Bell Laboratories operates its...

 in the 1970s, and its name is derived from the family name
Family name
A family name is a type of surname and part of a person's name indicating the family to which the person belongs. The use of family names is widespread in cultures around the world...

s of its authors — Alfred Aho, Peter Weinberger
Peter J. Weinberger
Peter Jay Weinberger is a computer scientist best known for his early work at Bell Labs. He now works at Google.Weinberger was an undergraduate at Swarthmore College, graduating in 1964...

, and Brian Kernighan
Brian Kernighan
Brian Wilson Kernighan is a Canadian computer scientist who worked at Bell Labs alongside Unix creators Ken Thompson and Dennis Ritchie and contributed to the development of Unix. He is also coauthor of the AWK and AMPL programming languages. The 'K' of K&R C and the 'K' in AWK both stand for...

. The name is not commonly pronounced as a string of separate letters but rather to sound the same as the name of the bird, auk
Auk
An auk is a bird of the family Alcidae in the order Charadriiformes. Auks are superficially similar to penguins due to their black-and-white colours, their upright posture and some of their habits...

 (which acts as an emblem of the language such as on The AWK Programming Language book cover - the book is often referred to by the abbreviation TAPL). awk, when written in all lowercase letters, refers to the Unix
Unix
Unix is a multitasking, multi-user computer operating system originally developed in 1969 by a group of AT&T employees at Bell Labs, including Ken Thompson, Dennis Ritchie, Brian Kernighan, Douglas McIlroy, and Joe Ossanna...

 or Plan 9
Plan 9 from Bell Labs
Plan 9 from Bell Labs is a distributed operating system. It was developed primarily for research purposes as the successor to Unix by the Computing Sciences Research Center at Bell Labs between the mid-1980s and 2002...

 program that runs other programs written in the AWK programming language.

The power, terseness, and limitations of early AWK programs inspired Larry Wall
Larry Wall
Larry Wall is a programmer and author, most widely known for his creation of the Perl programming language in 1987.-Education:Wall earned his bachelor's degree from Seattle Pacific University in 1976....

 to write Perl
Perl
Perl is a high-level, general-purpose, interpreted, dynamic programming language. Perl was originally developed by Larry Wall in 1987 as a general-purpose Unix scripting language to make report processing easier. Since then, it has undergone many changes and revisions and become widely popular...

 just as a new, more powerful POSIX AWK and gawk (GNU AWK) were being defined. Although AWK and sed
Sed
sed is a Unix utility that parses text and implements a programming language which can apply transformations to such text. It reads input line by line , applying the operation which has been specified via the command line , and then outputs the line. It was developed from 1973 to 1974 as a Unix...

 were designed to support one-liner program
One-liner program
A one-liner is textual input to the command-line of an operating system shell that performs some function in just one line of input.The one liner can be# An expression written in the language of the shell....

s, even the early Bell Labs users of AWK often wrote well-structured large AWK programs. Despite its limited intended area of use, AWK is Turing-complete.

Structure of AWK programs


"AWK is a language for processing files of text. A file is treated as a sequence of records, and by default each line is a record. Each line is broken up into a sequence of fields, so we can think of the first word in a line as the first field, the second word as the second field, and so on. An AWK program is of a sequence of pattern-action statements. AWK reads the input a line at a time. A line is scanned for each pattern in the program, and for each pattern that matches, the associated action is executed." - Alfred V. Aho


An AWK program is a series of pattern action pairs, written as:

condition { action }

where condition is typically an expression and action is a series of commands.
The input is split into records, where by default records are separated by newline characters so that the input is split into lines. The program tests each record against each of the conditions in turn, and executes the action for each expression that is true.
Either the condition or the action may be omitted.
The condition defaults to matching every record.
The default action is to print the record.

In addition to a simple AWK expression, such as foo 1 or /^foo/, the condition can be BEGIN or END causing the action to be executed before or after all records have been read, or pattern1, pattern2 which matches the range of records starting with a record that matches pattern1 up to and including the record that matches pattern2 before again trying to match against pattern1 on future lines.

In addition to normal arithmetic and logical operators, AWK expressions include the tilde operator, ~, which matches a regular expression
Regular expression
In computing, a regular expression provides a concise and flexible means for "matching" strings of text, such as particular characters, words, or patterns of characters. Abbreviations for "regular expression" include "regex" and "regexp"...

 against a string.
As handy syntactic sugar
Syntactic sugar
Syntactic sugar is a computer science term that refers to syntax within a programming language that is designed to make things easier to read or to express....

, /regexp/ without using the tilde operator matches against the current record.
AWK commands
AWK commands are the statement that is substituted for action in the examples above. AWK commands can include function calls, variable assignments, calculations, or any combination thereof. AWK contains built-in support for many functions; many more are provided by the various flavors of AWK. Also, some flavors support the inclusion of dynamically linked libraries, which can also provide more functions.

For brevity, the enclosing curly braces ( { } ) will be omitted from these examples.

The print command

The print command is used to output text. The output text is always terminated with a predefined string called the output record separator (ORS) whose default value is a newline. The simplest form of this command is:

print
This displays the contents of the current record. In AWK, records are broken down into fields, and these can be displayed separately:


print $1
Displays the first field of the current record


print $1, $3
Displays the first and third fields of the current record, separated by a predefined string called the output field separator (OFS) whose default value is a single space character


Although these fields ($X) may bear resemblance to variables (the $ symbol indicates variables in perl
Perl
Perl is a high-level, general-purpose, interpreted, dynamic programming language. Perl was originally developed by Larry Wall in 1987 as a general-purpose Unix scripting language to make report processing easier. Since then, it has undergone many changes and revisions and become widely popular...

), they actually refer to the fields of the current record. A special case, $0, refers to the entire record. In fact, the commands "print" and "print $0" are identical in functionality.

The print command can also display the results of calculations and/or function calls:

print 3+2
print foobar(3)
print foobar(variable)
print sin(3-2)

Output may be sent to a file:

print "expression" > "file name"

or through a pipe:

print "expression" | "command"

Built-in variables

Awk's built-in variables include the field variables: $1, $2, $3, and so on ($0 represents the entire record). They hold the text or values in the individual text-fields in a record.

Other variables include:
  • NR: Keeps a current count of the number of input records.
  • NF: Keeps a count of the number of fields in an input record. The last field in the input record can be designated by $NF.
  • FILENAME: Contains the name of the current input-file.
  • FS: Contains the "field separator" character used to divide fields on the input record. The default, "white space", includes any space and tab characters. FS can be reassigned to another character to change the field separator.
  • RS: Stores the current "record separator" character. Since, by default, an input line is the input record, the default record separator character is a "newline".
  • OFS: Stores the "output field separator", which separates the fields when Awk prints them. The default is a "space" character.
  • ORS: Stores the "output record separator", which separates the output records when Awk prints them. The default is a "newline" character.
  • OFMT: Stores the format for numeric output. The default format is "%.6g".

Variables and syntax

Variable names can use any of the characters [A-Za-z0-9_], with the exception of language keywords. The operators + - * / represent addition, subtraction, multiplication, and division, respectively. For string concatenation
Concatenation
In computer programming, string concatenation is the operation of joining two character strings end-to-end. For example, the strings "snow" and "ball" may be concatenated to give "snowball"...

, simply place two variables (or string constants) next to each other. It is optional to use a space in between if string constants are involved, but two variable names cannot be placed adjacent to each other without having a space in between. String constants are delimited
Delimited
Formats that use delimiter-separated values store two-dimensional arrays of data by separating the values in each row with specific delimiter characters...

 by double quotes. Statements need not end with semicolons. Finally, comments can be added to programs by using # as the first character on a line.

User-defined functions

In a format similar to C
C (programming language)
C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....

, function definitions consist of the keyword function, the function name, argument names and the function body. Here is an example of a function.

function add_three (number) {
return number + 3
}

This statement can be invoked as follows:

print add_three(36) # Outputs 39

Functions can have variables that are in the local scope. The names of these are added to the end of the argument list, though values for these should be omitted when calling the function. It is convention to add some whitespace in the argument list before the local variables, in order to indicate where the parameters end and the local variables begin

Hello World

Here is the customary "Hello world" program written in AWK:

BEGIN { print "Hello, world!" }

Note that an explicit exit statement is not needed here; since the only pattern is BEGIN, no command-line arguments are processed.

Print lines longer than 80 characters

Print all lines longer than 80 characters. Note that the default action is to print the current line.

length($0) > 80

Print a count of words

Count words in the input, and print the number of lines, words, and characters (like wc
Wc (Unix)
wc is a command in Unix-like operating systems.The program reads either standard input or a list of files and generates one or more of the following statistics: number of bytes, number of words, and number of lines...

)

{
w += NF
c += length + 1
}
END { print NR, w, c }

As there is no pattern for the first line of the program, every line of input matches by default so the increment actions are executed for every line. Note that w += NF is shorthand for w = w + NF.

Sum last word

{ s += $NF }
END { print s + 0 }

s is incremented by the numeric value of $NF which is the last word on the line as defined by AWK's field separator, by default white-space. NF is the number of fields in the current line, e.g. 4. Since $4 is the value of the fourth field, $NF is the value of the last field in the line regardless of how many fields this line has, or whether it has more or fewer fields than surrounding lines. $ is actually a unary operator with the highest operator precedence. (If the line has no fields then NF is 0, $0 is the whole line, which in this case is empty apart from possible white-space, and so has the numeric value 0

At the end of the input the END pattern matches so s is printed. However, since there may have been no lines of input at all, in which case no value has ever been assigned to s, it will by default be an empty string. Adding zero to a variable is an AWK idiom for coercing it from a string to a numeric value. (Concatenating an empty string is to coerce from a number to a string, e.g. s "". Note, there's no operator to concatenate strings, they're just placed adjacently.) With the coercion the program prints 0 on an empty input, without it an empty line is printed.

Match a range of input lines

$ yes Wikipedia | awk 'NR % 4 1, NR % 4 3 { printf "%6d %s\n", NR, $0 }' | sed 7q
1 Wikipedia
2 Wikipedia
3 Wikipedia
5 Wikipedia
6 Wikipedia
7 Wikipedia
9 Wikipedia
$

The yes
Yes (Unix)
yes is a Unix command, which outputs an affirmative response, or a user-defined string of text continuously until killed.-Description:By itself, the yes command outputs 'y' or whatever is specified as an argument, followed by a newline repeatedly until stopped by the user or otherwise killed; when...

 command repeatedly prints its argument (by default the letter "y") on a line. In this case, we tell the command to print the word "Wikipedia". The action statement prints each line numbered. The printf function emulates the standard C printf
Printf
Printf format string refers to a control parameter used by a class of functions typically associated with some types of programming languages. The format string specifies a method for rendering an arbitrary number of varied data type parameter into a string...

, and works similarly to the print command described above. The pattern to match, however, works as follows: NR is the number of records, typically lines of input, AWK has so far read, i.e. the current line number, starting at 1 for the first line of input. % is the modulo
Modulo operation
In computing, the modulo operation finds the remainder of division of one number by another.Given two positive numbers, and , a modulo n can be thought of as the remainder, on division of a by n...

 operator. NR % 4 1 is true for the first, fifth, ninth, etc., lines of input. Likewise, NR % 4 3 is true for the third, seventh, eleventh, etc., lines of input. The range pattern is false until the first part matches, on line 1, and then remains true up to and including when the second part matches, on line 3. It then stays false until the first part matches again on line 5. The sed
Sed
sed is a Unix utility that parses text and implements a programming language which can apply transformations to such text. It reads input line by line , applying the operation which has been specified via the command line , and then outputs the line. It was developed from 1973 to 1974 as a Unix...

 command is used to print the first 7 lines, to prevent yes running forever. It is equivalent to head -n7 if the head
Head (Unix)
head is a program on Unix and Unix-like systems used to display the first few lines of a text file or piped data. The command syntax is: head [options] <file_name>...

 command is available.

The first part of a range pattern being constantly true, e.g. 1, can be used to start the range at the beginning of input. Similarly, if the second part is constantly false, e.g. 0, the range continues until the end of input:
/^--cut here--$/, 0
prints lines of input from the first line matching the regular expression ^--cut here--$, that is, a line containing only the phrase "--cut here--", to the end.

Calculate word frequencies

Word frequency uses associative array
Associative array
In computer science, an associative array is an abstract data type composed of a collection of pairs, such that each possible key appears at most once in the collection....

s:

BEGIN {
FS="[^a-zA-Z]+"
}
{
for (i=1; i<=NF; i++)
words[tolower($i)]++
}
END {
for (i in words)
print i, words[i]
}

The BEGIN block sets the field separator to any sequence of non-alphabetic characters. Note that separators can be regular expressions. After that, we get to a bare action, which performs the action on every input line. In this case, for every field on the line, we add one to the number of times that word, first converted to lowercase, appears. Finally, in the END block, we print the words with their frequencies. The line
for (i in words)
creates a loop that goes through the array words, setting i to each subscript of the array. This is different from most languages, where such a loop goes through each value in the array. The loop thus prints out each word followed by its frequency count. tolower was an addition to the One True awk (see below) made after the book was published.

Match pattern from command line

This program can be represented in several ways. The first one uses the Bourne shell
Bourne shell
The Bourne shell, or sh, was the default Unix shell of Unix Version 7 and most Unix-like systems continue to have /bin/sh - which will be the Bourne shell, or a symbolic link or hard link to a compatible shell - even when more modern shells are used by most users.Developed by Stephen Bourne at AT&T...

 to make a shell script that does everything. It is the shortest of these methods:

$ cat grepinawk
pattern=$1
shift
awk '/'$pattern'/ { print FILENAME ":" $0 }' "$@"
$

The $pattern in the awk command is not protected by quotes. A pattern by itself in the usual way checks to see if the whole line ($0) matches. FILENAME contains the current filename. awk has no explicit concatenation operator; two adjacent strings concatenate them. $0 expands to the original unchanged input line.

There are alternate ways of writing this. This shell script accesses the environment directly from within awk:

$ cat grepinawk
pattern=$1
shift
awk '$0 ~ ENVIRON["pattern"] { print FILENAME ":" $0 }' "$@"
$

This is a shell script that uses ENVIRON, an array introduced in a newer version of the One True awk after the book was published. The subscript of ENVIRON is the name of an environment variable; its result is the variable's value. This is like the getenv function in various standard libraries and POSIX
POSIX
POSIX , an acronym for "Portable Operating System Interface", is a family of standards specified by the IEEE for maintaining compatibility between operating systems...

. The shell script makes an environment variable pattern containing the first argument, then drops that argument and has awk look for the pattern in each file.

~ checks to see if its left operand matches its right operand; !~ is its inverse. Note that a regular expression is just a string and can be stored in variables.

The next way uses command-line variable assignment, in which an argument to awk can be seen as an assignment to a variable:

$ cat grepinawk
pattern=$1
shift
awk '$0 ~ pattern { print FILENAME ":" $0 }' "pattern=$pattern" "$@"
$

Finally, this is written in pure awk, without help from a shell or without the need to know too much about the implementation of the awk script (as the variable assignment on command line one does), but is a bit lengthy:

BEGIN {
pattern = ARGV[1]
for (i = 1; i < ARGC; i++) # remove first argument
ARGV[i] = ARGV[i + 1]
ARGC--
if (ARGC 1) { # the pattern was the only thing, so force read from standard input (used by book)
ARGC = 2
ARGV[1] = "-"
}
}
$0 ~ pattern { print FILENAME ":" $0 }

The BEGIN is necessary not only to extract the first argument, but also to prevent it from being interpreted as a filename after the BEGIN block ends. ARGC, the number of arguments, is always guaranteed to be ≥1, as ARGV[0] is the name of the command that executed the script, most often the string "awk". Also note that ARGV[ARGC] is the empty string, "". # initiates a comment that expands to the end of the line.

Note the if block. awk only checks to see if it should read from standard input before it runs the command. This means that
awk 'prog'
only works because the fact that there are no filenames is only checked before prog is run! If you explicitly set ARGC to 1 so that there are no arguments, awk will simply quit because it feels there are no more input files. Therefore, you need to explicitly say to read from standard input with the special filename -.

Self-contained AWK scripts

As with many other programming languages, self-contained AWK script can be constructed using the so-called "shebang
Shebang (Unix)
In computing, a shebang is the character sequence consisting of the characters number sign and exclamation point , when it occurs as the first two characters on the first line of a text file...

" syntax.

For example, a UNIX command called hello.awk that prints the string "Hello, world!" may be built by creating a file named hello.awk containing the following lines:

#!/usr/bin/awk -f
BEGIN { print "Hello, world!" }

The -f tells awk that the argument that follows is the file to read the awk program from, which is placed there by the shell when running.

Versions and implementations

AWK was originally written in 1977, and distributed with Version 7 Unix
Version 7 Unix
Seventh Edition Unix, also called Version 7 Unix, Version 7 or just V7, was an important early release of the Unix operating system. V7, released in 1979, was the last Bell Laboratories release to see widespread distribution before the commercialization of Unix by AT&T in the early 1980s...

.

In 1985 its authors started expanding the language, most significantly by adding user-defined functions. The language is described in the book The AWK Programming Language, published 1988, and its implementation was made available in releases of UNIX System V
UNIX System V
Unix System V, commonly abbreviated SysV , is one of the first commercial versions of the Unix operating system. It was originally developed by American Telephone & Telegraph and first released in 1983. Four major versions of System V were released, termed Releases 1, 2, 3 and 4...

. To avoid confusion with the incompatible older version, this version was sometimes known as "new awk" or nawk. This implementation was released under a free software license in 1996, and is still maintained by Brian Kernighan. (see external links below)

Old versions of Unix, such as UNIX/32V
UNIX/32V
UNIX/32V was an early version of the Unix operating system from Bell Laboratories, released in June 1979. 32V was a direct port of the PDP-11 Seventh Edition Unix to the DEC VAX architecture....

, included awkcc, which converted AWK to C. Kernighan wrote a program to turn awk into C++; its state is not known.
  • BWK awk refers to the version by Brian W. Kernighan
    Brian Kernighan
    Brian Wilson Kernighan is a Canadian computer scientist who worked at Bell Labs alongside Unix creators Ken Thompson and Dennis Ritchie and contributed to the development of Unix. He is also coauthor of the AWK and AMPL programming languages. The 'K' of K&R C and the 'K' in AWK both stand for...

    , also known as nawk. It has been dubbed the "One True AWK" because of the use of the term in association with the book that originally described the language and the fact that Kernighan was one of the original authors of AWK. FreeBSD refers to this version as one-true-awk. This version also has features not in the book, such as tolower and ENVIRON that are explained above; see the FIXES file in the source archive for details. This version is used by e.g. FreeBSD
    FreeBSD
    FreeBSD is a free Unix-like operating system descended from AT&T UNIX via BSD UNIX. Although for legal reasons FreeBSD cannot be called “UNIX”, as the direct descendant of BSD UNIX , FreeBSD’s internals and system APIs are UNIX-compliant...

    , NetBSD
    NetBSD
    NetBSD is a freely available open source version of the Berkeley Software Distribution Unix operating system. It was the second open source BSD descendant to be formally released, after 386BSD, and continues to be actively developed. The NetBSD project is primarily focused on high quality design,...

    , OpenBSD
    OpenBSD
    OpenBSD is a Unix-like computer operating system descended from Berkeley Software Distribution , a Unix derivative developed at the University of California, Berkeley. It was forked from NetBSD by project leader Theo de Raadt in late 1995...

     or MacOS X.
  • gawk (GNU
    GNU
    GNU is a Unix-like computer operating system developed by the GNU project, ultimately aiming to be a "complete Unix-compatible software system"...

     awk) is another free software implementation and the only implementation that made serious attempts at implementing Internationalization and localization
    Internationalization and localization
    In computing, internationalization and localization are means of adapting computer software to different languages, regional differences and technical requirements of a target market...

     and TCP/IP networking. It also allows the user to extend the functionality of the program via user-written shared libraries. It was written before the original implementation became freely available, and is still widely used. Many Linux distribution
    Linux distribution
    A Linux distribution is a member of the family of Unix-like operating systems built on top of the Linux kernel. Such distributions are operating systems including a large collection of software applications such as word processors, spreadsheets, media players, and database applications...

    s come with a recent version of gawk and it is widely recognized as the de facto standard implementation in the Linux
    Linux
    Linux is a Unix-like computer operating system assembled under the model of free and open source software development and distribution. The defining component of any Linux system is the Linux kernel, an operating system kernel first released October 5, 1991 by Linus Torvalds...

     world. gawk version 3.0 was included as awk in FreeBSD
    FreeBSD
    FreeBSD is a free Unix-like operating system descended from AT&T UNIX via BSD UNIX. Although for legal reasons FreeBSD cannot be called “UNIX”, as the direct descendant of BSD UNIX , FreeBSD’s internals and system APIs are UNIX-compliant...

     prior to version 5.0. Subsequent versions of FreeBSD use BWK awk in order to avoid the GPL
    GNU General Public License
    The GNU General Public License is the most widely used free software license, originally written by Richard Stallman for the GNU Project....

    , a more restrictive license than the BSD license
    BSD licenses
    BSD licenses are a family of permissive free software licenses. The original license was used for the Berkeley Software Distribution , a Unix-like operating system after which it is named....

    .

  • mawk is a very fast AWK implementation by Mike Brennan based on a byte code interpreter.

  • libmawk is a fork of mawk, allowing applications to embed multiple parallel instances of awk interpreters.

  • awka (whose front end is written on top of the mawk program) is another translator of AWK scripts into C code. When compiled, statically including the author's libawka.a, the resulting executables are considerably sped up and, according to the author's tests, compare very well with other versions of AWK, Perl
    Perl
    Perl is a high-level, general-purpose, interpreted, dynamic programming language. Perl was originally developed by Larry Wall in 1987 as a general-purpose Unix scripting language to make report processing easier. Since then, it has undergone many changes and revisions and become widely popular...

    , or Tcl
    Tcl
    Tcl is a scripting language created by John Ousterhout. Originally "born out of frustration", according to the author, with programmers devising their own languages intended to be embedded into applications, Tcl gained acceptance on its own...

    . Small scripts will turn into programs of 160-170 kB.

  • tawk (Thompson AWK) is an AWK compiler
    Compiler
    A compiler is a computer program that transforms source code written in a programming language into another computer language...

     for Solaris (operating system), DOS
    DOS
    DOS, short for "Disk Operating System", is an acronym for several closely related operating systems that dominated the IBM PC compatible market between 1981 and 1995, or until about 2000 if one includes the partially DOS-based Microsoft Windows versions 95, 98, and Millennium Edition.Related...

    , OS/2
    OS/2
    OS/2 is a computer operating system, initially created by Microsoft and IBM, then later developed by IBM exclusively. The name stands for "Operating System/2," because it was introduced as part of the same generation change release as IBM's "Personal System/2 " line of second-generation personal...

    , and Windows
    Microsoft Windows
    Microsoft Windows is a series of operating systems produced by Microsoft.Microsoft introduced an operating environment named Windows on November 20, 1985 as an add-on to MS-DOS in response to the growing interest in graphical user interfaces . Microsoft Windows came to dominate the world's personal...

    , previously sold by Thompson Automation Software (which has ceased its activities).

  • Jawk is a SourceForge project to implement AWK in Java
    Java (programming language)
    Java is a programming language originally developed by James Gosling at Sun Microsystems and released in 1995 as a core component of Sun Microsystems' Java platform. The language derives much of its syntax from C and C++ but has a simpler object model and fewer low-level facilities...

    . Extensions to the language are added to provide access to Java
    Java (programming language)
    Java is a programming language originally developed by James Gosling at Sun Microsystems and released in 1995 as a core component of Sun Microsystems' Java platform. The language derives much of its syntax from C and C++ but has a simpler object model and fewer low-level facilities...

     features within AWK scripts (i.e., Java threads, sockets, Collections, etc.).

  • jawk (Josh's Awk) is a post-modern and perly implementation of AWK in Perl, which supports ranges, indexing columns by negative numbers, a perl mode, and more.

  • xgawk is a SourceForge project based on gawk. It extends gawk with dynamically loadable libraries.

  • QSEAWK is an embedded AWK interpreter implementation included in the QSE library that provides embedding API for C
    C (programming language)
    C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....

     and C++
    C++
    C++ is a statically typed, free-form, multi-paradigm, compiled, general-purpose programming language. It is regarded as an intermediate-level language, as it comprises a combination of both high-level and low-level language features. It was developed by Bjarne Stroustrup starting in 1979 at Bell...

    .

  • BusyBox
    BusyBox
    BusyBox provides several stripped-down Unix tools in a single executable. It runs in a variety of POSIX environments such as Linux, Android, FreeBSD and others, such as proprietary kernels, although many of the tools it provides are designed to work with interfaces provided by the Linux kernel. It...

     includes a sparsely documented AWK implementation that appears to be complete, written by Dmitry Zakharov. This is a very small implementation suitable for embedded systems.

Books

The book's webpage includes downloads of the current implementation of Awk and links to others. Arnold Robbins maintained the GNU Awk implementation of AWK for more than 10 years. The free GNU Awk manual was also published by O'Reilly in May 2001. Free download of this manual is possible through the following book references.

See also

  • Data transformation
    Data transformation
    In metadata and data warehouse, a data transformation converts data from a source data format into destination data.Data transformation can be divided into two steps:...

  • Event-driven programming
    Event-driven programming
    In computer programming, event-driven programming or event-based programming is a programming paradigm in which the flow of the program is determined by events—i.e., sensor outputs or user actions or messages from other programs or threads.Event-driven programming can also be defined as an...

  • List of Unix programs
  • Procedural programming
    Procedural programming
    Procedural programming can sometimes be used as a synonym for imperative programming , but can also refer to a programming paradigm, derived from structured programming, based upon the concept of the procedure call...

  • sed
    Sed
    sed is a Unix utility that parses text and implements a programming language which can apply transformations to such text. It reads input line by line , applying the operation which has been specified via the command line , and then outputs the line. It was developed from 1973 to 1974 as a Unix...


Further reading

 – Interview with Alfred V. Aho on AWK

External links

  • The Amazing Awk Assembler by Henry Spencer
    Henry Spencer
    Henry Spencer is a Canadian computer programmer and space enthusiast. He wrote "regex", a widely-used software library for regular expressions, and co-wrote C News, a Usenet server program. He also authored The Ten Commandments for C Programmers. He is coauthor, with David Lawrence, of the book...

    .
  • Awk Community Portal
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK