Thread: [COMMITTERS] pgsql: Fix contrib/pg_trgm's extraction of trigrams from regularexpres
From: Tom Lane
Date:
Fix contrib/pg_trgm's extraction of trigrams from regular expressions.

The logic for removing excess trigrams from the result was faulty.
It intends to avoid merging the initial and final states of the NFA,
which is necessary, but in testing whether removal of a specific trigram
would cause that, it failed to consider the combined effects of all the
state merges that that trigram's removal would cause. This could result
in a broken final graph that would never match anything, leading to GIN
or GiST indexscans not finding anything.

To fix, add a "tentParent" field that is used only within this loop,
and set it to show state merges that we are tentatively going to do.
While examining a particular arc, we must chase up through tentParent
links as well as regular parent links (the former can only appear atop
the latter), and we must account for state init/fin flag merges that
haven't actually been done yet.

To simplify the latter, combine the separate init and fin bool fields
into a bitmap flags field. I also chose to get rid of the "children"
state list, which seems entirely inessential.

Per bug #14563 from Alexey Isayko, which the added test cases are based on.
Back-patch to 9.3 where this code was added.

Report: https://postgr.es/m/20170222111446.1256.67547@wrigleys.postgresql.org
Discussion: https://postgr.es/m/8816.1487787594@sss.pgh.pa.us

Branch
------
REL9_5_STABLE

Details
-------
http://git.postgresql.org/pg/commitdiff/513c9f9de2a95f89150e8191a9a0eddc40403bf0

Modified Files
--------------
contrib/pg_trgm/expected/pg_trgm.out |  18 ++++++
contrib/pg_trgm/sql/pg_trgm.sql      |   3 +
contrib/pg_trgm/trgm_regexp.c        | 107 +++++++++++++++++++++++------------
3 files changed, 91 insertions(+), 37 deletions(-)
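
For readers who want to see the shape of the fix, here is a minimal standalone sketch of the tentative-merge check described in the message. It is not the code from trgm_regexp.c: the names SketchState, SketchArc, SK_INIT, SK_FIN, sketch_resolve and sketch_can_merge are hypothetical, and the tentFlags field is an assumption (the message names only "tentParent" and a bitmap flags field). The sketch walks all arcs that one trigram's removal would collapse, records each merge tentatively, and refuses as soon as the accumulated merges would join the initial and final states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits standing in for the old separate init/fin booleans. */
#define SK_INIT 0x01
#define SK_FIN  0x02

typedef struct SketchState
{
    int         flags;              /* SK_* bits from merges already committed */
    int         tentFlags;          /* bits gained from tentative merges (assumed field) */
    struct SketchState *parent;     /* committed merge link, NULL for a root */
    struct SketchState *tentParent; /* tentative merge link, NULL if none */
} SketchState;

typedef struct SketchArc
{
    SketchState *source;
    SketchState *target;
} SketchArc;

/* Chase committed links first, then tentative ones, which sit atop them. */
static SketchState *
sketch_resolve(SketchState *s)
{
    assert(s != NULL);
    while (s->parent)
        s = s->parent;
    while (s->tentParent)
        s = s->tentParent;
    return s;
}

/*
 * Return true if merging the endpoints of every arc in arcs[] would still
 * keep the initial and final states apart.  Nothing is committed here; only
 * the tentative fields are written, and they are reset on entry.
 */
static bool
sketch_can_merge(SketchState *states, size_t nstates,
                 const SketchArc *arcs, size_t narcs)
{
    const int   both = SK_INIT | SK_FIN;
    size_t      i;

    for (i = 0; i < nstates; i++)
    {
        states[i].tentParent = NULL;
        states[i].tentFlags = 0;
    }

    for (i = 0; i < narcs; i++)
    {
        SketchState *src = sketch_resolve(arcs[i].source);
        SketchState *tgt = sketch_resolve(arcs[i].target);
        int         sflags = src->flags | src->tentFlags;
        int         tflags = tgt->flags | tgt->tentFlags;

        /* Combined effect of all merges so far, not just this one arc. */
        if (((sflags | tflags) & both) == both)
            return false;

        if (src != tgt)
        {
            tgt->tentParent = src;      /* record the merge tentatively */
            src->tentFlags |= tflags;   /* and the flags it would bring along */
        }
    }
    return true;
}
```

With three states A (initial), B, and C (final), the arcs A->B and B->C are each harmless on their own, but checking them together makes the sketch return false, which is the combined effect the original logic overlooked.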