Re: Some regular-expression performance hacking - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Some regular-expression performance hacking
Date
Msg-id 1929732.1613321140@sss.pgh.pa.us
In response to Re: Some regular-expression performance hacking  ("Joel Jacobson" <joel@compiler.org>)
List pgsql-hackers
"Joel Jacobson" <joel@compiler.org> writes:
> I've successfully tested both patches against the 1.5M regexes-in-the-wild dataset.
> Out of the 1489489 (pattern, text string) pairs tested,
> there was only one single deviation:
> This 100577 bytes big regex (pattern_id = 207811)...
> ...
> ...previously raised...
>     error invalid regular expression: regular expression is too complex
> ...but now goes through:

> Nice. The patched regex engine is apparently capable of handling even more complex regexes than before.

Yeah.  There are various limitations that can lead to REG_ETOOBIG, but the
main ones are "too many states" and "too many arcs".  The RAINBOW change
directly reduces the number of arcs and thus makes larger regexes feasible.
I'm sure it's coincidental that the one such example you captured happens
to be fixed by this change, but hey I'll take it.
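
To make the arc-count point concrete, here is a toy sketch. It is not PostgreSQL's regex code: the function names, the state labels, and the numbers of colors and transitions are all hypothetical. It assumes only that a RAINBOW arc stands for "every color", so a transition that matches any character needs one arc rather than one arc per color.

```python
# Toy model only: NOT PostgreSQL's regex engine. All names are hypothetical.
RAINBOW = "RAINBOW"  # stand-in label meaning "matches every color"


def any_char_arcs(colors, use_rainbow):
    """Return the arcs needed for one 'match any character' transition."""
    if use_rainbow:
        # A single arc covers all colors at once.
        return [("s0", RAINBOW, "s1")]
    # Without it, the same transition needs one arc per color.
    return [("s0", color, "s1") for color in colors]


def total_arcs(num_colors, num_any_transitions, use_rainbow):
    """Count arcs for a regex with many 'match any character' transitions."""
    colors = range(num_colors)
    return num_any_transitions * len(any_char_arcs(colors, use_rainbow))


# Illustrative sizes, not measurements from the dataset discussed above.
before = total_arcs(num_colors=50, num_any_transitions=1000, use_rainbow=False)
after = total_arcs(num_colors=50, num_any_transitions=1000, use_rainbow=True)
print(before, after)
```

In this model the arc count for such transitions drops by a factor equal to the number of colors, which is the kind of saving that can bring a very large regex back under a "too many arcs" limit.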

            regards, tom lane


