Re: Future of our regular expression code - Mailing list pgsql-hackers
From | Billy Earney |
---|---|
Subject | Re: Future of our regular expression code |
Date | |
Msg-id | CAB1ii-f83hQvC7mpbQQa5UuuvYdgCSpw6E1+wXghXEzWf=_YZg@mail.gmail.com |
In response to | Re: Future of our regular expression code (Jay Levitt <jay.levitt@gmail.com>) |
Responses |
Re: Future of our regular expression code
|
List | pgsql-hackers |
Jay,<br /><br /> Good links, and I've also looked at a few others with benchmarks. I believe most of the benchmarks were done before PCRE implemented JIT. I haven't found a benchmark with JIT enabled, so I'm not sure whether it will make a difference. Also, I'm not sure how accurately the benchmarks reflect how the engines would perform in an RDBMS environment. The optimizer is probably a very important variable in many complex queries. I'm leaning towards trying to implement RE2 and PCRE and running some benchmarks to see which performs best. <br /><br /> Also, would it be possible to set a session variable (let's say PGREGEXTYPE) to ARE (current alg), RE2, or PCRE, so that users could choose which implementation they want (unless we find a single implementation that beats the others in almost all categories)? Or is this a bad idea?<br /><br /> Just a thought.<br /><br /><br /><div class="gmail_quote">On Mon, Feb 20, 2012 at 12:09 AM, Jay Levitt <span dir="ltr">&lt;<a href="mailto:jay.levitt@gmail.com">jay.levitt@gmail.com</a>&gt;</span> wrote:<br/><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">Stephen Frost wrote:<br /><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Alright, I'll bite.. Which existing regexp implementation that's well<br /> written, well maintained, and which is well protected against malicious<br /> regexes should we be considering then?<br /></blockquote><br/></div> FWIW, there's a benchmark here that compares a number of regexp engines, including PCRE, TRE and Russ Cox's RE2:<br /><br /><a href="http://lh3lh3.users.sourceforge.net/reb.shtml" target="_blank">http://lh3lh3.users.sourceforge.net/reb.shtml</a><br/><br /> The fastest backtracking-style engine seems to be oniguruma, which is native to Ruby 1.9 and thus not only supports Unicode but I'd bet performs pretty well on it, on account of it's developed in Japan. But it goes pathological on regexen containing '|'; the only safe choice among PCRE-style engines is RE2, but of course that doesn't support backreferences.<br /><br /> Russ's page on re2 (<a href="http://code.google.com/p/re2/" target="_blank">http://code.google.com/p/re2/</a>) says:<br /><br /> "If you absolutely need backreferences and generalized assertions, then RE2 is not for you, but you might be interested in irregexp, Google Chrome's regular expression engine."<br /><br /> That's here:<br /><br /><a href="http://blog.chromium.org/2009/02/irregexp-google-chromes-new-regexp.html" target="_blank">http://blog.chromium.org/2009/02/irregexp-google-chromes-new-regexp.html</a><br/><br /> Sadly, it's in JavaScript. Seems like if you need a safe, performant regexp implementation, your choice is (a) finish PLv8 and support it on all platforms, or (b) add backreferences to RE2 and precompile it to C with Comeau (if that's still around), or...<span class="HOEnZb"><font color="#888888"><br /><br /> Jay</font></span><div class="HOEnZb"><div class="h5"><br/><br /> -- <br /> Sent via pgsql-hackers mailing list (<a href="mailto:pgsql-hackers@postgresql.org" target="_blank">pgsql-hackers@postgresql.org</a>)<br/> To make changes to your subscription:<br /><a href="http://www.postgresql.org/mailpref/pgsql-hackers" target="_blank">http://www.postgresql.org/mailpref/pgsql-hackers</a><br/></div></div></blockquote></div><br />
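To make the session-variable idea above a bit more concrete, here is a rough sketch of how a per-session engine choice might be validated and dispatched. It is only an illustration under stated assumptions: PGREGEXTYPE is the hypothetical name floated in this message, the `Session` class and `ENGINES` table are invented for the example, and Python's own `re` module stands in for all three engines, since none of ARE, RE2, or PCRE is actually wired in here.

```python
import re
from typing import Callable, Dict

# Engine names taken from the proposal; "ARE" is the current engine.
VALID_REGEX_TYPES = ("ARE", "RE2", "PCRE")


def _stand_in_match(pattern: str, text: str) -> bool:
    # Stand-in only: this is Python's `re`, NOT any of the three engines.
    return re.search(pattern, text) is not None


# In a real backend each entry would call into a different library.
ENGINES: Dict[str, Callable[[str, str], bool]] = {
    name: _stand_in_match for name in VALID_REGEX_TYPES
}


class Session:
    """Holds a per-session engine choice (hypothetical PGREGEXTYPE)."""

    def __init__(self) -> None:
        # Default to the current engine so existing behaviour is unchanged.
        self.regex_type = "ARE"

    def set_regex_type(self, value: str) -> None:
        # Accept any letter case, but reject names outside the known set.
        name = value.strip().upper()
        if name not in ENGINES:
            raise ValueError(f"invalid value for PGREGEXTYPE: {value!r}")
        self.regex_type = name

    def regex_match(self, pattern: str, text: str) -> bool:
        # Dispatch to whichever engine this session selected.
        return ENGINES[self.regex_type](pattern, text)
```

One design point the sketch surfaces: with a setting like this, the same pattern could behave differently from session to session (for example, a backreference that ARE accepts but RE2 rejects), so the benchmarks would need to be weighed against that cost.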