Thread: pgsql: Avoid assertion due to disconnected NFA sub-graphs in regex parsing.

Avoid assertion due to disconnected NFA sub-graphs in regex parsing.

In commit 08c0d6ad6, which introduced "rainbow" arcs in regex NFAs,
I didn't think terribly hard about what to do when creating the color
complement of a rainbow arc.  Clearly, the complement cannot match any
characters, and I took the easy way out by just not building any arcs
at all in the complement arc set.  That mostly works, but Nikolay
Shaplov found a case where it doesn't: if we decide to delete that
sub-NFA later because it's inside a "{0}" quantifier, delsub()
suffered an assertion failure.  That's because delsub() relies on
the target sub-NFA being fully connected.  That was always true
before, and the best fix seems to be to restore that property.
Hence, invent a new arc type CANTMATCH that can be generated in
place of an empty color complement, and drop it again later when we
start NFA optimization.  (At that point we don't need to do delsub()
any more, and besides there are other cases where NFA optimization can
lead to disconnected subgraphs.)
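
To illustrate why a traversal-based delete depends on connectivity, here is a
toy model in C.  It is only a sketch and is not the code in regc_nfa.c: the
struct layout, the three-state topology, and every toy_* name are hypothetical,
and only the name CANTMATCH is taken from the message above.  The assumption
is that the delete walks out-arcs from the left state and then checks that the
left state has no remaining out-arcs and the right state no remaining in-arcs.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ARCS   16
#define MAX_STATES 8

/* Toy arc types; CANTMATCH stands for a placeholder arc that matches nothing. */
enum arc_type { PLAIN, CANTMATCH };

struct arc { int from, to; enum arc_type type; bool live; };

struct toy_nfa { struct arc arcs[MAX_ARCS]; int narcs; };

static void toy_add_arc(struct toy_nfa *n, int from, int to, enum arc_type type)
{
    assert(n->narcs < MAX_ARCS && from < MAX_STATES && to < MAX_STATES);
    n->arcs[n->narcs++] = (struct arc){from, to, type, true};
}

/* Count live arcs entering (ins = true) or leaving (ins = false) state s. */
static int toy_count(const struct toy_nfa *n, int s, bool ins)
{
    int count = 0;
    for (int i = 0; i < n->narcs; i++)
        if (n->arcs[i].live && (ins ? n->arcs[i].to : n->arcs[i].from) == s)
            count++;
    return count;
}

/* Delete every arc reachable from s, never walking past the end state rp. */
static void toy_deltraverse(struct toy_nfa *n, int s, int rp, bool *seen)
{
    if (s == rp || seen[s])
        return;
    seen[s] = true;
    for (int i = 0; i < n->narcs; i++) {
        if (n->arcs[i].live && n->arcs[i].from == s) {
            n->arcs[i].live = false;
            toy_deltraverse(n, n->arcs[i].to, rp, seen);
        }
    }
}

/* Returns true if the sub-NFA between lp and rp was fully removed. */
static bool toy_delsub(struct toy_nfa *n, int lp, int rp)
{
    bool seen[MAX_STATES] = {false};
    toy_deltraverse(n, lp, rp, seen);
    return toy_count(n, lp, false) == 0 && toy_count(n, rp, true) == 0;
}

/* Drop all placeholder arcs, as one would before optimization; returns count. */
static int toy_drop_cantmatch(struct toy_nfa *n)
{
    int dropped = 0;
    for (int i = 0; i < n->narcs; i++) {
        if (n->arcs[i].live && n->arcs[i].type == CANTMATCH) {
            n->arcs[i].live = false;
            dropped++;
        }
    }
    return dropped;
}

/* States: 0 = left, 1 = inner, 2 = right.  Without the placeholder, nothing
 * connects state 0 to the rest, mimicking an empty color complement. */
static struct toy_nfa toy_build(bool use_placeholder)
{
    struct toy_nfa n = {.narcs = 0};
    if (use_placeholder)
        toy_add_arc(&n, 0, 1, CANTMATCH);
    toy_add_arc(&n, 1, 2, PLAIN);
    return n;
}
```

In this model, toy_delsub() on toy_build(false) leaves the arc from state 1 to
state 2 behind and reports failure, while on toy_build(true) the placeholder
arc lets the traversal reach and delete everything.  Calling
toy_drop_cantmatch() afterwards corresponds to discarding the placeholders once
deletion is no longer needed.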

It appears that this bug has no consequences in a non-assert-enabled
build: there will be some transiently leaked NFA states/arcs, but
they'll get cleaned up eventually.  Still, we don't like assertion
failures, so back-patch to v14 where rainbow arcs were introduced.

Per bug #18708 from Nikolay Shaplov.

Discussion: https://postgr.es/m/18708-f94f2599c9d2c005@postgresql.org

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/b69bdcee9c9cfd8550c4e847d035f441fcee7d01

Modified Files
--------------
src/backend/regex/regc_color.c                     | 18 +++++++++-
src/backend/regex/regc_nfa.c                       | 40 ++++++++++++++++++++++
src/backend/regex/regcomp.c                        |  3 ++
src/include/regex/regguts.h                        |  2 ++
.../modules/test_regex/expected/test_regex.out     | 14 ++++++++
src/test/modules/test_regex/sql/test_regex.sql     |  3 ++
6 files changed, 79 insertions(+), 1 deletion(-)