Thread: Supporting = operator in gin/gist_trgm_ops

Supporting = operator in gin/gist_trgm_ops

From

Julien Rouhaud

Date:

25 October 2020, 11:32:29

Hello,

A french user recently complained that with an index created using
gin_trgm_ops (or gist_trgm_ops), you can use the index with a clause
like

col LIKE 'something'

but not

col = 'something'

even though both clauses are technically identical.  That's clearly
not a high priority thing to support, but looking at the code it seems
to me that this could be achieved quite simply: just adding a new
operator = in the opclass, with an operator strategy number that falls
back doing exactly what LikeStrategyNumber is doing and that's it.
There shouldn't be any wrong results, even using wildcards as the
recheck will remove any incorrect one.

Did I miss something? And if not would such a patch be welcome?

Re: Supporting = operator in gin/gist_trgm_ops

From

Tom Lane

Date:

25 October 2020, 21:03:21

Julien Rouhaud <rjuju123@gmail.com> writes:
> A french user recently complained that with an index created using
> gin_trgm_ops (or gist_trgm_ops), you can use the index with a clause
> like
> col LIKE 'something'
> but not
> col = 'something'

Huh, I'd supposed we did that already.

> even though both clauses are technically identical.  That's clearly
> not a high priority thing to support, but looking at the code it seems
> to me that this could be achieved quite simply: just adding a new
> operator = in the opclass, with an operator strategy number that falls
> back doing exactly what LikeStrategyNumber is doing and that's it.
> There shouldn't be any wrong results, even using wildcards as the
> recheck will remove any incorrect one.

I think you may be overoptimistic about being able to use the identical
code path without regard for LIKE wildcards; but certainly it should be
possible to do this with not a lot of new code.  +1.

            regards, tom lane

Re: Supporting = operator in gin/gist_trgm_ops

From

Julien Rouhaud

Date:

26 October 2020, 04:02:46

On Mon, Oct 26, 2020 at 5:03 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Julien Rouhaud <rjuju123@gmail.com> writes:
> > A french user recently complained that with an index created using
> > gin_trgm_ops (or gist_trgm_ops), you can use the index with a clause
> > like
> > col LIKE 'something'
> > but not
> > col = 'something'
>
> Huh, I'd supposed we did that already.
>
> > even though both clauses are technically identical.  That's clearly
> > not a high priority thing to support, but looking at the code it seems
> > to me that this could be achieved quite simply: just adding a new
> > operator = in the opclass, with an operator strategy number that falls
> > back doing exactly what LikeStrategyNumber is doing and that's it.
> > There shouldn't be any wrong results, even using wildcards as the
> > recheck will remove any incorrect one.
>
> I think you may be overoptimistic about being able to use the identical
> code path without regard for LIKE wildcards; but certainly it should be
> possible to do this with not a lot of new code.  +1.

Well, that's what I was thinking too, but I tried all the possible
wildcard combinations I could think of and I couldn't find any case
yielding wrong results.  As far as I can see the index scans return at
least all the required rows, and all extraneous rows are correctly
removed either by heap recheck or index recheck.

I'm attaching a patch POC pach with regression tests covering those
combinations.  I also found a typo in the 1.4--1.5 pg_trgm upgrade
script, so I'm also attaching a patch for that.

Attachment

Re: Supporting = operator in gin/gist_trgm_ops

From

Julien Rouhaud

Date:

26 October 2020, 04:10:12

On Mon, Oct 26, 2020 at 12:02 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Mon, Oct 26, 2020 at 5:03 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Julien Rouhaud <rjuju123@gmail.com> writes:
> > > A french user recently complained that with an index created using
> > > gin_trgm_ops (or gist_trgm_ops), you can use the index with a clause
> > > like
> > > col LIKE 'something'
> > > but not
> > > col = 'something'
> >
> > Huh, I'd supposed we did that already.
> >
> > > even though both clauses are technically identical.  That's clearly
> > > not a high priority thing to support, but looking at the code it seems
> > > to me that this could be achieved quite simply: just adding a new
> > > operator = in the opclass, with an operator strategy number that falls
> > > back doing exactly what LikeStrategyNumber is doing and that's it.
> > > There shouldn't be any wrong results, even using wildcards as the
> > > recheck will remove any incorrect one.
> >
> > I think you may be overoptimistic about being able to use the identical
> > code path without regard for LIKE wildcards; but certainly it should be
> > possible to do this with not a lot of new code.  +1.
>
> Well, that's what I was thinking too, but I tried all the possible
> wildcard combinations I could think of and I couldn't find any case
> yielding wrong results.  As far as I can see the index scans return at
> least all the required rows, and all extraneous rows are correctly
> removed either by heap recheck or index recheck.
>
> I'm attaching a patch POC pach with regression tests covering those
> combinations.  I also found a typo in the 1.4--1.5 pg_trgm upgrade
> script, so I'm also attaching a patch for that.

Oops, I forgot to git-add the 1.5--1.6.sql upgrade script in the previous patch.

Attachment

Re: Supporting = operator in gin/gist_trgm_ops

From

Tom Lane

Date:

26 October 2020, 04:19:48

Julien Rouhaud <rjuju123@gmail.com> writes:
> On Mon, Oct 26, 2020 at 5:03 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think you may be overoptimistic about being able to use the identical
>> code path without regard for LIKE wildcards; but certainly it should be
>> possible to do this with not a lot of new code.  +1.

> Well, that's what I was thinking too, but I tried all the possible
> wildcard combinations I could think of and I couldn't find any case
> yielding wrong results.  As far as I can see the index scans return at
> least all the required rows, and all extraneous rows are correctly
> removed either by heap recheck or index recheck.

But "does it get the right answers" isn't the only figure of merit.
If the index scan visits far more rows than necessary, that's bad.
Maybe it's OK given that we only make trigrams from alphanumerics,
but I'm not quite sure.

            regards, tom lane

Re: Supporting = operator in gin/gist_trgm_ops

From

Julien Rouhaud

Date:

26 October 2020, 04:38:08

On Mon, Oct 26, 2020 at 12:19 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Julien Rouhaud <rjuju123@gmail.com> writes:
> > On Mon, Oct 26, 2020 at 5:03 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> I think you may be overoptimistic about being able to use the identical
> >> code path without regard for LIKE wildcards; but certainly it should be
> >> possible to do this with not a lot of new code.  +1.
>
> > Well, that's what I was thinking too, but I tried all the possible
> > wildcard combinations I could think of and I couldn't find any case
> > yielding wrong results.  As far as I can see the index scans return at
> > least all the required rows, and all extraneous rows are correctly
> > removed either by heap recheck or index recheck.
>
> But "does it get the right answers" isn't the only figure of merit.
> If the index scan visits far more rows than necessary, that's bad.
> Maybe it's OK given that we only make trigrams from alphanumerics,
> but I'm not quite sure.

Ah, yes this might lead to bad performance if the "fake wildcard"
matches too many rows, but this shouldn't be a very common use case,
and the only alternative for that might be to create trigrams for non
alphanumerics characters.  I didn't try to do that because it would
mean meaningful overhead for mainstream usage of pg_trgm, and would
also mean on-disk format break.  In my opinion supporting = should be
a best effort, especially for such corner cases.

Re: Supporting = operator in gin/gist_trgm_ops

From

Alexander Korotkov

Date:

26 October 2020, 17:50:05

On Mon, Oct 26, 2020 at 7:38 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
> Ah, yes this might lead to bad performance if the "fake wildcard"
> matches too many rows, but this shouldn't be a very common use case,
> and the only alternative for that might be to create trigrams for non
> alphanumerics characters.  I didn't try to do that because it would
> mean meaningful overhead for mainstream usage of pg_trgm, and would
> also mean on-disk format break.  In my opinion supporting = should be
> a best effort, especially for such corner cases.

It would be more efficient to generate trigrams for equal operator
using generate_trgm() instead of generate_wildcard_trgm().  It some
cases it would generate more trigrams.  For instance generate_trgm()
would generate '__a', '_ab', 'ab_' for '%ab%' while
generate_wildcard_trgm() would generate nothing.

Also I wonder how our costing would work if there are multiple indices
of the same column.  We should clearly prefer btree than pg_trgm
gist/gin, and I believe our costing provides this.  But we also should
prefer btree_gist/btree_gin than pg_trgm gist/gin, and I'm not sure
our costing provides this especially for gist.

------
Regards,
Alexander Korotkov

Re: Supporting = operator in gin/gist_trgm_ops

From

Georgios Kokolatos

Date:

11 November 2020, 12:33:11

The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            not tested

Hi,

this patch implements a useful and missing feature. Thank you.

It includes documentation, which to a non-native speaker as myself seems appropriate.
It includes comprehensive tests that cover the implemented cases.

In the thread Alexander has pointed out, quote:
"It would be more efficient to generate trigrams for equal operator
using generate_trgm() instead of generate_wildcard_trgm()"

I will echo the sentiment, though from a slightly different and possibly not
as important point of view. The method used to extract trigrams from the query
should match the method used to extract trigrams from the values when they
get added to the index. This is gin_extract_value_trgm() and is indeed using
generate_trgm().

I have no opinion over Alexander's second comment regarding costing.

I change the status to 'Waiting on Author', but please feel free to override
my opinion if you feel I am wrong and reset it to 'Needs review'.

Cheers,
//Georgios

The new status of this patch is: Waiting on Author

Re: Supporting = operator in gin/gist_trgm_ops

From

Julien Rouhaud

Date:

13 November 2020, 09:50:09

On Wed, Nov 11, 2020 at 8:34 PM Georgios Kokolatos
<gkokolatos@protonmail.com> wrote:
>
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:       tested, passed
> Spec compliant:           not tested
> Documentation:            not tested
>
> Hi,
>
> this patch implements a useful and missing feature. Thank you.
>
> It includes documentation, which to a non-native speaker as myself seems appropriate.
> It includes comprehensive tests that cover the implemented cases.
>
> In the thread Alexander has pointed out, quote:
> "It would be more efficient to generate trigrams for equal operator
> using generate_trgm() instead of generate_wildcard_trgm()"
>
> I will echo the sentiment, though from a slightly different and possibly not
> as important point of view. The method used to extract trigrams from the query
> should match the method used to extract trigrams from the values when they
> get added to the index. This is gin_extract_value_trgm() and is indeed using
> generate_trgm().
>
> I have no opinion over Alexander's second comment regarding costing.
>
> I change the status to 'Waiting on Author', but please feel free to override
> my opinion if you feel I am wrong and reset it to 'Needs review'.

Thanks for the reminder Georgios!  Thanks a lot Alexander for the review!

Indeed, I should have used generate_trgm() rather than
generate_wildcard_trgm().  IIUC, the rest of the code should still be
doing the same as [I]LikeStrategyNumber.  I attach a v3 with that
modification.

For the costing, I tried this naive dataset:

CREATE TABLE t1 AS select md5(random()::text) AS val from
generate_series(1, 100000);
CREATE INDEX t1_btree ON t1 (val);
CREATE INDEX t1_gist ON t1 USING gist (val gist_trgm_ops);

Cost are like this (all default configuration, using any random existing entry):

# EXPLAIN ANALYZE SELECT * FROM t1 where val =
'8dcf324ce38428e4d27a363953ac1c51';
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Index Only Scan using t1_btree on t1  (cost=0.42..4.44 rows=1
width=33) (actual time=0.192..0.194 rows=1 loops=1)
   Index Cond: (val = '8dcf324ce38428e4d27a363953ac1c51'::text)
   Heap Fetches: 0
 Planning Time: 0.133 ms
 Execution Time: 0.222 ms
(5 rows)

# EXPLAIN ANALYZE SELECT * FROM t1 where val =
'8dcf324ce38428e4d27a363953ac1c51';
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Index Scan using t1_gist on t1  (cost=0.28..8.30 rows=1 width=33)
(actual time=0.542..2.359 rows=1 loops=1)
   Index Cond: (val = '8dcf324ce38428e4d27a363953ac1c51'::text)
 Planning Time: 0.189 ms
 Execution Time: 2.382 ms
(4 rows)

# EXPLAIN ANALYZE SELECT * FROM t1 where val =
'8dcf324ce38428e4d27a363953ac1c51';
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on t1  (cost=400.01..404.02 rows=1 width=33) (actual
time=2.486..2.488 rows=1 loops=1)
   Recheck Cond: (val = '8dcf324ce38428e4d27a363953ac1c51'::text)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on t1_gin  (cost=0.00..400.01 rows=1 width=0)
(actual time=2.474..2.474 rows=1 loops=1)
         Index Cond: (val = '8dcf324ce38428e4d27a363953ac1c51'::text)
 Planning Time: 0.206 ms
 Execution Time: 2.611 ms

So assuming that this dataset is representative enough, costing indeed
prefers a btree index over a gist/gin index, which should avoid
regression with this change.

Gin is however quite off, likely because it's a bitmap index scan
rather than an index scan, so gist is preferred in this scenario.
That's not ideal, but I'm not sure that there are many people having
both gin_trgm_ops and gist_trgm_ops.

Attachment

v3-0001-Handle-operator-in-pg_trgm.patch

Re: Supporting = operator in gin/gist_trgm_ops

From

Georgios

Date:

13 November 2020, 10:47:35


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, November 13, 2020 10:50 AM, Julien Rouhaud <julien.rouhaud@free.fr> wrote:

> On Wed, Nov 11, 2020 at 8:34 PM Georgios Kokolatos
> gkokolatos@protonmail.com wrote:
>
> > The following review has been posted through the commitfest application:
> > make installcheck-world: tested, passed
> > Implements feature: tested, passed
> > Spec compliant: not tested
> > Documentation: not tested
> > Hi,
> > this patch implements a useful and missing feature. Thank you.
> > It includes documentation, which to a non-native speaker as myself seems appropriate.
> > It includes comprehensive tests that cover the implemented cases.
> > In the thread Alexander has pointed out, quote:
> > "It would be more efficient to generate trigrams for equal operator
> > using generate_trgm() instead of generate_wildcard_trgm()"
> > I will echo the sentiment, though from a slightly different and possibly not
> > as important point of view. The method used to extract trigrams from the query
> > should match the method used to extract trigrams from the values when they
> > get added to the index. This is gin_extract_value_trgm() and is indeed using
> > generate_trgm().
> > I have no opinion over Alexander's second comment regarding costing.
> > I change the status to 'Waiting on Author', but please feel free to override
> > my opinion if you feel I am wrong and reset it to 'Needs review'.
>
> Thanks for the reminder Georgios! Thanks a lot Alexander for the review!
>
> Indeed, I should have used generate_trgm() rather than
> generate_wildcard_trgm(). IIUC, the rest of the code should still be
> doing the same as [I]LikeStrategyNumber. I attach a v3 with that
> modification.

Great, thanks!

v3 looks good.

>
> For the costing, I tried this naive dataset:
>
> CREATE TABLE t1 AS select md5(random()::text) AS val from
> generate_series(1, 100000);
> CREATE INDEX t1_btree ON t1 (val);
> CREATE INDEX t1_gist ON t1 USING gist (val gist_trgm_ops);
>
> Cost are like this (all default configuration, using any random existing entry):
>
> EXPLAIN ANALYZE SELECT * FROM t1 where val =
>
> =============================================
>
> '8dcf324ce38428e4d27a363953ac1c51';
> QUERY PLAN
>
> -----------------------------------------------
>
> Index Only Scan using t1_btree on t1 (cost=0.42..4.44 rows=1
> width=33) (actual time=0.192..0.194 rows=1 loops=1)
> Index Cond: (val = '8dcf324ce38428e4d27a363953ac1c51'::text)
> Heap Fetches: 0
> Planning Time: 0.133 ms
> Execution Time: 0.222 ms
> (5 rows)
>
> EXPLAIN ANALYZE SELECT * FROM t1 where val =
>
> =============================================
>
> '8dcf324ce38428e4d27a363953ac1c51';
> QUERY PLAN
>
> -----------------------------------------------
>
> Index Scan using t1_gist on t1 (cost=0.28..8.30 rows=1 width=33)
> (actual time=0.542..2.359 rows=1 loops=1)
> Index Cond: (val = '8dcf324ce38428e4d27a363953ac1c51'::text)
> Planning Time: 0.189 ms
> Execution Time: 2.382 ms
> (4 rows)
>
> EXPLAIN ANALYZE SELECT * FROM t1 where val =
>
> =============================================
>
> '8dcf324ce38428e4d27a363953ac1c51';
> QUERY PLAN
>
> -----------------------------------------------
>
> Bitmap Heap Scan on t1 (cost=400.01..404.02 rows=1 width=33) (actual
> time=2.486..2.488 rows=1 loops=1)
> Recheck Cond: (val = '8dcf324ce38428e4d27a363953ac1c51'::text)
> Heap Blocks: exact=1
> -> Bitmap Index Scan on t1_gin (cost=0.00..400.01 rows=1 width=0)
> (actual time=2.474..2.474 rows=1 loops=1)
> Index Cond: (val = '8dcf324ce38428e4d27a363953ac1c51'::text)
> Planning Time: 0.206 ms
> Execution Time: 2.611 ms
>
> So assuming that this dataset is representative enough, costing indeed
> prefers a btree index over a gist/gin index, which should avoid
> regression with this change.

It sounds reasonable, although I would leave it to a bit more cost savvy
people to argue the point.

>
> Gin is however quite off, likely because it's a bitmap index scan
> rather than an index scan, so gist is preferred in this scenario.
> That's not ideal, but I'm not sure that there are many people having
> both gin_trgm_ops and gist_trgm_ops.

Same as above.

In short, I think v3 of the patch looks good to change to 'RFC' status.
Given the possible costing concerns, I will refrain from changing the
status just yet, to give the opportunity to more reviewers to chime in.
If in the next few days there are no more reviews, I will update the
status to 'RFC' to move the patch forward.

Thoughts?

Cheers,
//Georgios

Re: Supporting = operator in gin/gist_trgm_ops

From

Alexander Korotkov

Date:

14 November 2020, 05:30:51

Hi!

On Fri, Nov 13, 2020 at 1:47 PM Georgios <gkokolatos@protonmail.com> wrote:
> In short, I think v3 of the patch looks good to change to 'RFC' status.
> Given the possible costing concerns, I will refrain from changing the
> status just yet, to give the opportunity to more reviewers to chime in.
> If in the next few days there are no more reviews, I will update the
> status to 'RFC' to move the patch forward.
>
> Thoughts?

I went through and revised this patch.  I made the documentation
statement less categorical.  pg_trgm gist/gin indexes might have lower
performance of equality operator search than B-tree.  So, we can't
claim the B-tree index is always not needed.  Also, simple comparison
operators are <, <=, >, >=, and they are not supported.

I also have checked that btree_gist is preferred over pg_trgm gist
index for equality search.  Despite our gist cost estimate is quite
dumb, it selects btree_gist index due to its lower size.  So, this
part also looks good to me.

I'm going to push this if no objections.

------
Regards,
Alexander Korotkov

Attachment

v4-0001-Handle-equality-operator-in-contrib-pg_trgm.patch

Re: Supporting = operator in gin/gist_trgm_ops

From

Erik Rijkers

Date:

14 November 2020, 08:37:09

On 2020-11-14 06:30, Alexander Korotkov wrote:

> [v4-0001-Handle-equality...in-contrib-pg_trgm.patch (~]
> 
> I'm going to push this if no objections.
> 

About the sgml, in doc/src/sgml/pgtrgm.sgml :

Beginning in <productname>PostgreSQL</productname> 14, these indexes 
also support equality operator (simple comparison operators are not 
supported).

should be:

Beginning in <productname>PostgreSQL</productname> 14, these indexes 
also support the equality operator (simple comparison operators are not 
supported).

(added 'the')

And:

Although these indexes might have lower the performance of equality 
operator
search than regular B-tree indexes.

should be (I think - please check the meaning)

Although these indexes might have a lower performance with equality 
operator
search than with regular B-tree indexes.

I am not sure I understood this last sentence correctly. Does this mean 
the slower trgm index might be chosen over the faster btree?

Thanks,

Erik Rijkers

Re: Supporting = operator in gin/gist_trgm_ops

From

Alexander Korotkov

Date:

14 November 2020, 10:07:22

Hi, Erik!

On Sat, Nov 14, 2020 at 11:37 AM Erik Rijkers <er@xs4all.nl> wrote:
> On 2020-11-14 06:30, Alexander Korotkov wrote:
>
> > [v4-0001-Handle-equality...in-contrib-pg_trgm.patch (~]
> >
> > I'm going to push this if no objections.
> >
>
> About the sgml, in doc/src/sgml/pgtrgm.sgml :
>
>
> Beginning in <productname>PostgreSQL</productname> 14, these indexes
> also support equality operator (simple comparison operators are not
> supported).
>
> should be:
>
> Beginning in <productname>PostgreSQL</productname> 14, these indexes
> also support the equality operator (simple comparison operators are not
> supported).
>
> (added 'the')
>
>
> And:
>
> Although these indexes might have lower the performance of equality
> operator
> search than regular B-tree indexes.
>
> should be (I think - please check the meaning)
>
> Although these indexes might have a lower performance with equality
> operator
> search than with regular B-tree indexes.
>
>
> I am not sure I understood this last sentence correctly. Does this mean
> the slower trgm index might be chosen over the faster btree?

Thank you for your review.

I mean searching for an equal string with pg_trgm GiST/GIN might be
slower than the same search with B-tree.  If you need both pg_trgm
similarity/pattern search and equal search on your column, you have
choice.  You can run with a single pg_trgm index, but your search for
an equal string wouldn't be as fast as with B-tree.  Alternatively you
can have two indexes: pg_trgm index for similarity/pattern search and
B-tree index for equality search.  Second option gives you a fast
equality search, but the second B-tree index would take extra space
and maintenance overhead.  For equality search, the B-tree index
should be chosen by the planner (and that was tested).

------
Regards,
Alexander Korotkov

Re: Supporting = operator in gin/gist_trgm_ops

From

Julien Rouhaud

Date:

14 November 2020, 11:53:59

On Sat, Nov 14, 2020 at 6:07 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> Hi, Erik!
>
> On Sat, Nov 14, 2020 at 11:37 AM Erik Rijkers <er@xs4all.nl> wrote:
> > On 2020-11-14 06:30, Alexander Korotkov wrote:
> >
> > > [v4-0001-Handle-equality...in-contrib-pg_trgm.patch (~]
> > >
> > > I'm going to push this if no objections.
> > >
> >
> > About the sgml, in doc/src/sgml/pgtrgm.sgml :
> >
> >
> > Beginning in <productname>PostgreSQL</productname> 14, these indexes
> > also support equality operator (simple comparison operators are not
> > supported).
> >
> > should be:
> >
> > Beginning in <productname>PostgreSQL</productname> 14, these indexes
> > also support the equality operator (simple comparison operators are not
> > supported).
> >
> > (added 'the')
> >
> >
> > And:
> >
> > Although these indexes might have lower the performance of equality
> > operator
> > search than regular B-tree indexes.
> >
> > should be (I think - please check the meaning)
> >
> > Although these indexes might have a lower performance with equality
> > operator
> > search than with regular B-tree indexes.
> >
> >
> > I am not sure I understood this last sentence correctly. Does this mean
> > the slower trgm index might be chosen over the faster btree?
>
> Thank you for your review.
>
> I mean searching for an equal string with pg_trgm GiST/GIN might be
> slower than the same search with B-tree.  If you need both pg_trgm
> similarity/pattern search and equal search on your column, you have
> choice.  You can run with a single pg_trgm index, but your search for
> an equal string wouldn't be as fast as with B-tree.  Alternatively you
> can have two indexes: pg_trgm index for similarity/pattern search and
> B-tree index for equality search.  Second option gives you a fast
> equality search, but the second B-tree index would take extra space
> and maintenance overhead.  For equality search, the B-tree index
> should be chosen by the planner (and that was tested).

Thanks everyone for the review, and thanks Alexander for the modifications!

I agree that it's important to document that those indexes may be less
performant than btree indexes.   I also agree that there's an
extraneous "the" in the documentation.  Maybe this rewrite could be
better?

   Note that those indexes may not be as afficient as regulat B-tree indexes
   for equality operator.

While at it, there's a small grammar error:

        case EqualStrategyNumber:
-           /* Wildcard search is inexact */
+           /* Wildcard and equal search is inexact */

It should be /* Wildcard and equal search are inexact */

Re: Supporting = operator in gin/gist_trgm_ops

From

Erik Rijkers

Date:

14 November 2020, 11:57:56

On 2020-11-14 12:53, Julien Rouhaud wrote:
> On Sat, Nov 14, 2020 at 6:07 PM Alexander Korotkov 
> <aekorotkov@gmail.com> >

>    Note that those indexes may not be as afficient as regulat B-tree 
> indexes
>    for equality operator.


'afficient as regulat'  should be
'efficient as regular'


Sorry to be nitpicking - it's the one thing I'm really good at :P

Erik

Re: Supporting = operator in gin/gist_trgm_ops

From

Julien Rouhaud

Date:

14 November 2020, 17:26:36

On Sat, Nov 14, 2020 at 7:58 PM Erik Rijkers <er@xs4all.nl> wrote:
>
> On 2020-11-14 12:53, Julien Rouhaud wrote:
> > On Sat, Nov 14, 2020 at 6:07 PM Alexander Korotkov
> > <aekorotkov@gmail.com> >
>
> >    Note that those indexes may not be as afficient as regulat B-tree
> > indexes
> >    for equality operator.
>
>
> 'afficient as regulat'  should be
> 'efficient as regular'
>
>
> Sorry to be nitpicking - it's the one thing I'm really good at :P

Oops indeed :)

Re: Supporting = operator in gin/gist_trgm_ops

From

Alexander Korotkov

Date:

15 November 2020, 05:55:13

On Sat, Nov 14, 2020 at 8:26 PM Julien Rouhaud <julien.rouhaud@free.fr> wrote:
> On Sat, Nov 14, 2020 at 7:58 PM Erik Rijkers <er@xs4all.nl> wrote:
> >
> > On 2020-11-14 12:53, Julien Rouhaud wrote:
> > > On Sat, Nov 14, 2020 at 6:07 PM Alexander Korotkov
> > > <aekorotkov@gmail.com> >
> >
> > >    Note that those indexes may not be as afficient as regulat B-tree
> > > indexes
> > >    for equality operator.
> >
> >
> > 'afficient as regulat'  should be
> > 'efficient as regular'
> >
> >
> > Sorry to be nitpicking - it's the one thing I'm really good at :P
>
> Oops indeed :)

Pushed with all the corrections above.  Thanks!

------
Regards,
Alexander Korotkov

Re: Supporting = operator in gin/gist_trgm_ops

From

Julien Rouhaud

Date:

15 November 2020, 07:18:25

On Sun, Nov 15, 2020 at 1:55 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> On Sat, Nov 14, 2020 at 8:26 PM Julien Rouhaud <julien.rouhaud@free.fr> wrote:
> > On Sat, Nov 14, 2020 at 7:58 PM Erik Rijkers <er@xs4all.nl> wrote:
> > >
> > > On 2020-11-14 12:53, Julien Rouhaud wrote:
> > > > On Sat, Nov 14, 2020 at 6:07 PM Alexander Korotkov
> > > > <aekorotkov@gmail.com> >
> > >
> > > >    Note that those indexes may not be as afficient as regulat B-tree
> > > > indexes
> > > >    for equality operator.
> > >
> > >
> > > 'afficient as regulat'  should be
> > > 'efficient as regular'
> > >
> > >
> > > Sorry to be nitpicking - it's the one thing I'm really good at :P
> >
> > Oops indeed :)
>
> Pushed with all the corrections above.  Thanks!

Thanks a lot!

Re: Supporting = operator in gin/gist_trgm_ops

From

Erik Rijkers

Date:

15 November 2020, 10:44:23

On 2020-11-15 06:55, Alexander Korotkov wrote:

>> > Sorry to be nitpicking - it's the one thing I'm really good at :P

Hi Alexander,

The last touch... (you forgot the missing 'the')

thanks!

Erik Rijkers

Attachment

pgtrgm-20201115.sgml.diff

Re: Supporting = operator in gin/gist_trgm_ops

From

Jeff Janes

Date:

15 November 2020, 23:13:15

On Sat, Nov 14, 2020 at 12:31 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:

I went through and revised this patch. I made the documentation
statement less categorical. pg_trgm gist/gin indexes might have lower
performance of equality operator search than B-tree. So, we can't
claim the B-tree index is always not needed. Also, simple comparison
operators are <, <=, >, >=, and they are not supported.

Is "simple comparison" here a well-known term of art? If I read the doc as committed (which doesn't include the sentence above), and if I didn't already know what it was saying, I would be left wondering which comparisons those are. Could we just say "inequality operators"?

Cheers,

Jeff

Re: Supporting = operator in gin/gist_trgm_ops

From

Alexander Korotkov

Date:

16 November 2020, 06:12:07

On Mon, Nov 16, 2020 at 2:13 AM Jeff Janes <jeff.janes@gmail.com> wrote:
> On Sat, Nov 14, 2020 at 12:31 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>> I went through and revised this patch.  I made the documentation
>> statement less categorical.  pg_trgm gist/gin indexes might have lower
>> performance of equality operator search than B-tree.  So, we can't
>> claim the B-tree index is always not needed.  Also, simple comparison
>> operators are <, <=, >, >=, and they are not supported.
>
> Is "simple comparison" here a well-known term of art?  If I read the doc as committed (which doesn't include the
sentenceabove), and if I didn't already know what it was saying, I would be left wondering which comparisons those are.
Could we just say "inequality operators"? 

You're right.  "Simple comparison" is vague, let's replace it with
"inequality".  Pushed, thanks!

------
Regards,
Alexander Korotkov