Comparison of regular expression engines

Libraries

List of regular expression libraries
Library | Official website | Programming language | Software license
Boost.Regex | Boost C++ Libraries | C++ | Boost Software License
Boost.Xpressive | Boost C++ Libraries | C++ | Boost Software License
CL-PPCRE | Edi Weitz | Common Lisp | BSD
cppre | Jeff Stuart | C++ | GPL
DEELX | RegExLab | C++ | "free for personal use and commercial use"
FREJ | Fuzzy Regular Expressions for Java | Java | LGPL
GLib/GRegex | Marco Barisione | C | LGPL
GRETA | Microsoft Research | C++ |
ICU | International Components for Unicode | C/C++/Java | ICU license
Jakarta/Regexp | The Apache Jakarta Project | Java | Apache License
JRegex | JRegex | Java | BSD
Oniguruma | Kosako | C | BSD
Pattwo | Stevesoft | Java (compatible with Java 1.0) | LGPL
PCRE | Philip Hazel | C/C++ | BSD
Qt/QRegExp | http://doc.trolltech.com/4.7/qregexp.html | C++ | Qt GNU GPL v. 3.0 / Qt GNU LGPL v. 2.1 / Qt Commercial
regex - Henry Spencer's regular expression libraries | ArgList | C | BSD
re2 | Google Code | C++ | BSD
TRE | Ville Laurikari | C | BSD
TPerlRegEx | TPerlRegEx VCL Component | Object Pascal | MPLv1.1
TRegExpr | RegExp Studio | Object Pascal | double licensed: Freeware or LGPL with static linking exception
RGX | RGX | C++ based component library | P6R license


Languages

List of languages and frameworks that come with regular expression support
Language | Official website | Software license | Remarks
.NET | MSDN | Proprietary |
C++ | | | Since ISO/IEC 14882:2011
D | D | Boost Software License |
Go | Golang.org | BSD-style license |
Haskell | Haskell.org | BSD3 | Not included in the language report, nor in GHC's Hierarchical Libraries.
Java | Java | GNU General Public License | REs are written as strings in source code (all backslashes must be doubled, hurting readability).
JavaScript/ECMAScript | | | Limited, but REs are first-class citizens of the language with a specific /.../mod syntax.
Lua | Lua.org | MIT License | Uses a simplified, limited dialect. Can be bound to a more powerful library, like PCRE, or to an alternative parser like LPeg.
Object Pascal (Free Pascal) | www.freepascal.org | LGPL with static linking exception | Free Pascal 2.6+ ships with TRegExpr from Sorokin as well as with 2 other regular expression libraries. See http://wiki.lazarus.freepascal.org/Regexpr
Objective-C (Cocoa on iOS only) | Apple | Proprietary | Currently only available on iOS 4+.
OCaml | Caml | LGPL |
Perl | Perl.com | Artistic License or the GNU General Public License | Full, central part of the language.
PHP | PHP.net | PHP License | Has two implementations, with PCRE being the more efficient (speed, functionality).
Python | python.org | Python Software Foundation License |
Ruby | ruby-doc.org | GNU Library General Public License | Ruby 1.8 and 1.9 use different engines; Ruby 1.9 integrates Oniguruma.
SAP ABAP | SAP.com | |
Tcl 8.4 | tcl.tk | Tcl/Tk License (permissive, similar to BSD) |
ActionScript 3 | | |
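
The remark in the Java row about doubled backslashes applies to any language in which a pattern is written as an ordinary string literal. The following is a minimal sketch in Python, chosen here only for illustration (the pattern and sample text are made up for the example). It contrasts an ordinary string literal, where each regex backslash must be doubled, with a raw string literal, which avoids the doubling.

```python
import re

# Pattern for a literal dot followed by digits, e.g. ".14".
# In an ordinary string literal every regex backslash has to be doubled.
doubled = "\\.\\d+"

# A raw string literal passes backslashes through unchanged.
raw = r"\.\d+"

# Both literals denote the same five-character pattern: \ . \ d +
same_pattern = doubled == raw

# Use the pattern on an illustrative input.
match = re.search(raw, "version 3.14")
found = match.group(0) if match else None
```

Languages with a dedicated regex literal syntax, such as the /.../mod form noted in the JavaScript row, sidestep the issue in a similar way.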


Language features

NOTE: An application that uses a library for regular expression support does not necessarily expose the library's full feature set. For example, GNU Grep, which uses PCRE, does not offer lookahead support, though PCRE does.

Part 1

Language feature comparison (part 1)
Engine | "+" quantifier | Negated character classes | Recursion | Lookahead | Lookbehind | >9 indexable captures
Boost.Regex
Boost.Xpressive
CL-PPCRE
EmEditor
FREJ
GLib/GRegex
GNU Grep
Haskell
Java
ICU Regex
JGsoft
.NET
OCaml
OmniOutliner 3.6.2
PCRE
Perl
PHP
Python
Qt/QRegExp
re2
Ruby
TRE
Vim
RGX
TRegExpr
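
To make the column headings of part 1 concrete, the following sketch shows what each feature looks like in one engine, Python's standard `re` module. The patterns and sample strings are illustrative assumptions, not taken from the comparison. Recursion is omitted because the standard `re` module does not provide it.

```python
import re

# "+" quantifier: one or more repetitions of the preceding item.
digits = re.findall(r"\d+", "a1 b22 c333")

# Negated character class: runs of characters that are NOT digits.
non_digits = re.findall(r"[^0-9]+", "ab12cd")

# Lookahead: match a word only if it is followed by ":" (the colon is not consumed).
keys = re.findall(r"\w+(?=:)", "key: value")

# Lookbehind: match digits only if they are preceded by "$".
prices = re.findall(r"(?<=\$)\d+", "cost $42, qty 7")

# More than 9 indexable captures: the tenth group is addressable by number.
m = re.match(r"(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)", "abcdefghij")
tenth = m.group(10)
```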


Part 2

Language feature comparison (part 2)
Engine | Conditionals | Comments | Embedded code | Fuzzy matching | Unicode property support (http://www.unicode.org/reports/tr18/)
Boost.Regex
Boost.Xpressive
CL-PPCRE
EmEditor
FREJ
GLib/GRegex
GNU Grep
Haskell
Java
ICU Regex
JGsoft
.NET
OCaml
OmniOutliner 3.6.2
PCRE
Perl
PHP
Python
Qt/QRegExp
re2 ?
Ruby
TRE
Vim
RGX
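
As an illustration of two of the part 2 features, the sketch below uses Python's standard `re` module to show a conditional and two styles of in-pattern comment. The patterns and inputs are made up for the example. Embedded code, fuzzy matching and Unicode property classes are not shown, since the standard `re` module does not provide them.

```python
import re

# Conditional: "(?(1)>)" requires a closing ">" only if group 1 (the "<") matched.
tag = re.compile(r"(<)?\w+(?(1)>)")
bracketed = tag.fullmatch("<tag>") is not None   # "<" present, ">" required and found
bare = tag.fullmatch("tag") is not None          # no "<", so no ">" required
unbalanced = tag.fullmatch("<tag") is not None   # "<" present but ">" missing

# Inline comments: "(?#...)" is ignored by the engine.
inline = re.search(r"\d{4}(?# year)-\d{2}(?# month)", "on 2011-08-12").group(0)

# Verbose mode: whitespace and "#" comments inside the pattern are ignored.
verbose = re.compile(r"""
    \d{4}   # year
    -       # separator
    \d{2}   # month
""", re.VERBOSE)
verbose_hit = verbose.search("on 2011-08-12").group(0)
```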


API features

API feature comparison
Engine | Native UTF-16 support | Native UTF-8 support | Non-linear input support | Dot-matches-newline option | Anchor-matches-newline option
Boost.Regex
GLib/GRegex
ICU Regex
Java
.NET
PCRE
Qt/QRegExp
TRE
RGX
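
The last two columns refer to options that change how "." and the anchors "^" and "$" treat line breaks. As a rough sketch of what such options do, the example below uses Python's standard `re` module, where they are called `re.DOTALL` and `re.MULTILINE`. Other engines name and expose these options differently, and the sample text is an assumption for illustration.

```python
import re

text = "first\nsecond"

# By default "." does not match a newline, so this search fails.
dot_default = re.search(r"first.second", text) is not None

# With the dot-matches-newline option, "." also matches "\n".
dot_all = re.search(r"first.second", text, re.DOTALL) is not None

# By default "^" and "$" anchor only at the start and end of the whole input.
anchors_default = re.findall(r"^\w+$", text)

# With the anchor-matches-newline option they also match at each line boundary.
anchors_multiline = re.findall(r"^\w+$", text, re.MULTILINE)
```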


The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.