Re: Future of our regular expression code - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: Future of our regular expression code
Msg-id 20120220033805.GM17355@tamriel.snowman.net
In response to Re: Future of our regular expression code  (Greg Stark <stark@mit.edu>)
Responses Re: Future of our regular expression code  (Jay Levitt <jay.levitt@gmail.com>)
List pgsql-hackers
Greg,

* Greg Stark (stark@mit.edu) wrote:
> I can't see how your first claim that the Spencer code is worth
> keeping around because it's just a superior regex implementation has
> much force unless we can accomplish the latter. If the library can be
> split off into a standalone library then it might have some longevity.
> But if we're the only ones maintaining it then it's just prolonging
> the inevitable. I can't see Postgres having its own special brand of
> regexes that nobody else uses being an acceptable situation forever.
>
> One thing that concerns me more and more is that most sufficiently
> powerful regex implementations are susceptible to DOS attacks. A
> database application is quite likely to allow users to decide directly
> or indirectly what regexes to apply and it can be hard to predict
> which regexes will cause which implementations to explode their CPU or
> memory requirements. We need a library that can be used to defend
> against malicious regexes and I suspect neither Perl's nor Python's
> library will suffice for this.

Alright, I'll bite.  Which existing regexp implementation, one that's well
written, well maintained, and well protected against malicious regexes,
should we be considering, then?

While we might not be able to turn the regex code into a formal
stand-alone library, my bet is that the Tcl folks (and anyone else using
this code) will be paying attention to the changes and improvements we're
making.  Sure, it'd be easier for them to incorporate those changes if
they could just pull in a new version of the library, but we can't all
have our cake and eat it too.
Thanks,
    Stephen
