Re: Future of our regular expression code - Mailing list pgsql-hackers

From Greg Stark
Subject Re: Future of our regular expression code
Msg-id CAM-w4HN1abmWjaPD7i0jqBYC2FOiq--W=f=QdCkggfttWGnH3g@mail.gmail.com
In response to Future of our regular expression code  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Future of our regular expression code
Re: Future of our regular expression code
Re: Future of our regular expression code
List pgsql-hackers
On Sat, Feb 18, 2012 at 6:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>  A larger point is that it'd be a real shame
> for the Spencer regex engine to die off, because it is in fact one of
> the best pieces of regex technology on the planet.
...
> Another possible long-term answer is to finish the work Henry never did,
> that is make the code into a standalone library.  That would make it
> available to more projects and perhaps attract other people to help
> maintain it.  However, that looks like a lot of work too, with distant
> and uncertain payoff.

I don't see how your first point, that the Spencer code is worth
keeping around simply because it's a superior regex implementation,
carries much weight unless we can accomplish the latter. If the code
can be split off into a standalone library then it might have some
longevity. But if we're the only ones maintaining it then it's just
prolonging the inevitable. I can't see Postgres having its own special
brand of regexes that nobody else uses being acceptable forever.

One thing that concerns me more and more is that most sufficiently
powerful regex implementations are susceptible to DoS attacks. A
database application is quite likely to let users decide, directly
or indirectly, which regexes to apply, and it can be hard to predict
which regexes will make which implementations blow up their CPU or
memory usage. We need a library that can be used to defend against
malicious regexes, and I suspect neither Perl's nor Python's library
will suffice for this.
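To make the DoS concern concrete, here is a minimal sketch, assuming a toy backtracking matcher in the style of the Kernighan/Pike matcher (literals, '.', '*', '^', '$' only). It is not PostgreSQL's or Spencer's engine, and the names BoundedMatcher, BudgetExceeded and max_steps are hypothetical. A pattern like "a*a*a*a*a*a*b" against a run of a's forces the matcher to try every way of splitting the a's among the stars, so work grows polynomially (and with nested quantifiers, exponentially) in the input length. One crude defense, shown here, is a hard step budget that aborts the match instead of letting it run away.

```python
class BudgetExceeded(Exception):
    """Raised when a match attempt uses more steps than allowed (hypothetical)."""


class BoundedMatcher:
    """Toy backtracking matcher supporting literals, '.', 'c*', '^' and '$',
    with a hard cap on the number of matching steps (hypothetical sketch)."""

    def __init__(self, max_steps):
        self.max_steps = max_steps
        self.steps = 0

    def _tick(self):
        # Count one unit of work; abort once the budget is exhausted.
        self.steps += 1
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"gave up after {self.max_steps} steps")

    def search(self, pattern, text):
        """Return True if pattern matches anywhere in text."""
        self.steps = 0
        if pattern.startswith("^"):
            return self._match_here(pattern[1:], text, 0)
        # Unanchored: try every starting position, including the end.
        for start in range(len(text) + 1):
            if self._match_here(pattern, text, start):
                return True
        return False

    def _match_here(self, pat, text, i):
        self._tick()
        if not pat:
            return True
        if len(pat) >= 2 and pat[1] == "*":
            return self._match_star(pat[0], pat[2:], text, i)
        if pat == "$":
            return i == len(text)
        if i < len(text) and (pat[0] == "." or pat[0] == text[i]):
            return self._match_here(pat[1:], text, i + 1)
        return False

    def _match_star(self, c, rest, text, i):
        # Try zero repetitions of c, then one more each time; every choice
        # point is a place the matcher may later backtrack into.
        while True:
            if self._match_here(rest, text, i):
                return True
            if i < len(text) and (c == "." or text[i] == c):
                i += 1
            else:
                return False


if __name__ == "__main__":
    m = BoundedMatcher(max_steps=10_000)
    print(m.search("ab*c", "abbbc"))  # benign: a handful of steps
    try:
        m.search("a*" * 6 + "b", "a" * 20)  # hostile: far more than 10,000 steps
    except BudgetExceeded as e:
        print("rejected:", e)
```

A step budget only caps the damage after the fact; it still cannot tell in advance which patterns are dangerous. The alternative is an automaton-based engine (for example Thompson-style NFA simulation), whose matching time is bounded by pattern size times input length regardless of how the pattern is written.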

--
greg

