Re: Future of our regular expression code - Mailing list pgsql-hackers

From Jay Levitt
Subject Re: Future of our regular expression code
Date
Msg-id 4F41E39B.8010502@gmail.com
In response to Re: Future of our regular expression code  (Stephen Frost <sfrost@snowman.net>)
Responses Re: Future of our regular expression code  (Billy Earney <billy.earney@gmail.com>)
List pgsql-hackers
Stephen Frost wrote:
> Alright, I'll bite..  Which existing regexp implementation that's well
> written, well maintained, and which is well protected against malicious
> regexes should we be considering then?

FWIW, there's a benchmark here that compares a number of regexp engines, 
including PCRE, TRE and Russ Cox's RE2:

http://lh3lh3.users.sourceforge.net/reb.shtml

The fastest backtracking-style engine seems to be Oniguruma, which is native 
to Ruby 1.9 and thus not only supports Unicode but, I'd bet, performs pretty 
well on it, since it's developed in Japan.  But it goes pathological 
on regexen containing '|'; the only safe choice among PCRE-style engines is 
RE2, but of course that doesn't support backreferences.
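
To make "goes pathological" concrete, here is a minimal sketch in Python. It is a toy of my own, not code from Oniguruma, RE2, or any engine in the benchmark: both function names are hypothetical, and the pattern ^(a|aa)+$ is hard-wired. It counts the work a naive backtracking matcher does on an alternation and compares it with tracking the set of reachable positions, which is roughly the idea behind automaton-based engines.

```python
def backtracking_steps(text):
    """Match ^(a|aa)+$ by naive backtracking; return (matched, steps)."""
    steps = 0

    def match_from(pos, matched_once):
        nonlocal steps
        steps += 1
        # Success: at least one alternative consumed and we are at the end.
        if matched_once and pos == len(text):
            return True
        # Try each alternative in order, backtracking on failure.
        for alt in ("a", "aa"):
            if text.startswith(alt, pos) and match_from(pos + len(alt), True):
                return True
        return False

    return match_from(0, False), steps


def state_set_steps(text):
    """Match ^(a|aa)+$ by tracking reachable positions; return (matched, steps)."""
    steps = 0
    seen = set()      # positions reachable after one or more alternatives
    frontier = [0]    # positions still to expand; each is expanded once
    while frontier:
        pos = frontier.pop()
        for alt in ("a", "aa"):
            steps += 1
            new_pos = pos + len(alt)
            if text.startswith(alt, pos) and new_pos not in seen:
                seen.add(new_pos)
                frontier.append(new_pos)
    return len(text) in seen, steps


if __name__ == "__main__":
    # A near-miss input: many 'a's, then a character that forces failure.
    text = "a" * 20 + "b"
    print(backtracking_steps(text))
    print(state_set_steps(text))
```

On the near-miss input the backtracking version retries every way of splitting the run of a's before giving up, so its step count grows exponentially with the input length, while the position-set version stays linear.  Real backtracking engines add optimizations that this toy lacks, so take it as an illustration of the failure mode rather than a model of any particular engine.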

Russ's page on RE2 (http://code.google.com/p/re2/) says:

"If you absolutely need backreferences and generalized assertions, then RE2 
is not for you, but you might be interested in irregexp, Google Chrome's 
regular expression engine."

That's here:

http://blog.chromium.org/2009/02/irregexp-google-chromes-new-regexp.html

Sadly, it's built into V8, Chrome's JavaScript engine, rather than shipped 
as a standalone library.  Seems like if you need a safe, performant regexp 
implementation, your choice is (a) finish PLv8 and support it on all 
platforms, or (b) add backreferences to RE2 and precompile it to C with 
Comeau (if that's still around), or...

Jay

