Re: trgm regex index peculiarity - Mailing list pgsql-hackers

From Tom Lane
Subject Re: trgm regex index peculiarity
Date
Msg-id 29409.1396745529@sss.pgh.pa.us
In response to Re: trgm regex index peculiarity  (Alexander Korotkov <aekorotkov@gmail.com>)
List pgsql-hackers
Alexander Korotkov <aekorotkov@gmail.com> writes:
> The next revision of the patch is attached. The changes are as follows:
> 1) The notion of "penalty" is used instead of "size".
> 2) We try to reduce the total penalty to WISH_TRGM_PENALTY, subject to a
> limit of MAX_TRGM_COUNT on the total trigram count.
> 3) Penalties are assigned to particular color trigram classes. I.e.
> separate penalties for __a, _aa, _a_, aa_. It's based on analysis of
> trigram frequencies in Oscar Wilde writings. We can end up with different
> numbers, but I don't think they will be dramatically different.

Committed with cosmetic improvements (adjusting the comments mostly).

The new whitespace penalties look reasonably sane to me.  I wonder though
if WISH_TRGM_PENALTY is too small --- it seems like this code will tend to
select many fewer trigrams than the old code did.  What testing did you do
that led you to select the specific value of 16?
        regards, tom lane
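
For readers following the thread, the selection rule described in the quoted
patch notes can be sketched roughly as below. This is a minimal illustration
under stated assumptions, not the committed code: the struct ColorTrgm, the
function select_trigrams, the "removable" flag, and the choice to drop the
highest-penalty entries first are all hypothetical. Only the names
WISH_TRGM_PENALTY and MAX_TRGM_COUNT and the value 16 come from the thread;
the value used for MAX_TRGM_COUNT here is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define WISH_TRGM_PENALTY 16.0f /* value mentioned in the thread */
#define MAX_TRGM_COUNT    256   /* assumed value, for illustration only */

/* Hypothetical stand-in for one color trigram. */
typedef struct
{
    int   count;     /* simple trigrams this entry expands to */
    float penalty;   /* cost assigned to the entry (assumed precomputed) */
    int   removable; /* nonzero if dropping it is allowed (assumption) */
    int   kept;      /* output: nonzero if the entry survives selection */
} ColorTrgm;

/* Order entries by descending penalty. */
static int
cmp_penalty_desc(const void *a, const void *b)
{
    const ColorTrgm *x = a;
    const ColorTrgm *y = b;

    if (x->penalty < y->penalty)
        return 1;
    if (x->penalty > y->penalty)
        return -1;
    return 0;
}

/*
 * Drop removable entries, most expensive first, until the total penalty is
 * at most WISH_TRGM_PENALTY and the total count is at most MAX_TRGM_COUNT,
 * or until no candidates remain. Sorts the array in place and returns the
 * number of entries kept.
 */
size_t
select_trigrams(ColorTrgm *t, size_t n, float *penalty_out, int *count_out)
{
    float  total_penalty = 0.0f;
    int    total_count = 0;
    size_t kept = n;
    size_t i;

    assert(t != NULL || n == 0);

    for (i = 0; i < n; i++)
    {
        t[i].kept = 1;
        total_penalty += t[i].penalty;
        total_count += t[i].count;
    }

    if (n > 0)
        qsort(t, n, sizeof(ColorTrgm), cmp_penalty_desc);

    for (i = 0; i < n; i++)
    {
        if (total_penalty <= WISH_TRGM_PENALTY &&
            total_count <= MAX_TRGM_COUNT)
            break;              /* both targets met */
        if (!t[i].removable)
            continue;           /* must keep this one */
        t[i].kept = 0;
        total_penalty -= t[i].penalty;
        total_count -= t[i].count;
        kept--;
    }

    if (penalty_out)
        *penalty_out = total_penalty;
    if (count_out)
        *count_out = total_count;
    return kept;
}
```

In this sketch the penalty target is a wish (selection stops trimming once it
is met, and may end above it if entries are not removable), while the count
limit is checked alongside it. A smaller WISH_TRGM_PENALTY makes the loop drop
more entries, which is the effect the question above is about.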


