Thread: Why is this query not using GIN index?

Why is this query not using GIN index?

From
Aaron Lewis
Date:
I have a simple table, and a gin index,

create table mytable(hash char(40), title varchar(500));
create index name_fts on mytable using gin(to_tsvector('english', 'title'));
create unique index md5_uniq_idx on mytable(hash);

When I execute a query with tsquery, the GIN index was not in use:

test=# explain analyze select * from mytable where
to_tsvector('english', title) @@ 'abc | def'::tsquery limit 10;
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..277.35 rows=10 width=83) (actual
time=0.111..75.549 rows=10 loops=1)
   ->  Seq Scan on mytable  (cost=0.00..381187.45 rows=13744 width=83)
(actual time=0.110..75.546 rows=10 loops=1)
         Filter: (to_tsvector('english'::regconfig, (title)::text) @@
'''abc'' | ''def'''::tsquery)
         Rows Removed by Filter: 10221
 Planning time: 0.176 ms
 Execution time: 75.564 ms
(6 rows)

Any ideas?


--
Best Regards,
Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
Finger Print:   9F67 391B B770 8FF6 99DC  D92D 87F6 2602 1371 4D33


Re: Why is this query not using GIN index?

From
Oleg Bartunov
Date:


On Sun, Nov 13, 2016 at 2:50 PM, Aaron Lewis <the.warl0ck.1989@gmail.com> wrote:
I have a simple table, and a gin index,

create table mytable(hash char(40), title varchar(500));
create index name_fts on mytable using gin(to_tsvector('english', 'title'));

                                                                                                            ^^^^^
 
create unique index md5_uniq_idx on mytable(hash);

When I execute a query with tsquery, the GIN index was not in use:

test=# explain analyze select * from mytable where
to_tsvector('english', title) @@ 'abc | def'::tsquery limit 10;
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..277.35 rows=10 width=83) (actual
time=0.111..75.549 rows=10 loops=1)
   ->  Seq Scan on mytable  (cost=0.00..381187.45 rows=13744 width=83)
(actual time=0.110..75.546 rows=10 loops=1)
         Filter: (to_tsvector('english'::regconfig, (title)::text) @@
'''abc'' | ''def'''::tsquery)
         Rows Removed by Filter: 10221
 Planning time: 0.176 ms
 Execution time: 75.564 ms
(6 rows)

Any ideas?


--
Best Regards,
Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
Finger Print:   9F67 391B B770 8FF6 99DC  D92D 87F6 2602 1371 4D33


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: Why is this query not using GIN index?

From
Aaron Lewis
Date:
Hi Oleg,

Can you elaborate on the title column? I don't get it.

On Sun, Nov 13, 2016 at 10:10 PM, Oleg Bartunov <obartunov@gmail.com> wrote:
>
>
> On Sun, Nov 13, 2016 at 2:50 PM, Aaron Lewis <the.warl0ck.1989@gmail.com>
> wrote:
>>
>> I have a simple table, and a gin index,
>>
>> create table mytable(hash char(40), title varchar(500));
>> create index name_fts on mytable using gin(to_tsvector('english',
>> 'title'));
>
>
>
> ^^^^^
>
>>
>> create unique index md5_uniq_idx on mytable(hash);
>>
>> When I execute a query with tsquery, the GIN index was not in use:
>>
>> test=# explain analyze select * from mytable where
>> to_tsvector('english', title) @@ 'abc | def'::tsquery limit 10;
>>                                                      QUERY PLAN
>>
>> --------------------------------------------------------------------------------------------------------------------
>>  Limit  (cost=0.00..277.35 rows=10 width=83) (actual
>> time=0.111..75.549 rows=10 loops=1)
>>    ->  Seq Scan on mytable  (cost=0.00..381187.45 rows=13744 width=83)
>> (actual time=0.110..75.546 rows=10 loops=1)
>>          Filter: (to_tsvector('english'::regconfig, (title)::text) @@
>> '''abc'' | ''def'''::tsquery)
>>          Rows Removed by Filter: 10221
>>  Planning time: 0.176 ms
>>  Execution time: 75.564 ms
>> (6 rows)
>>
>> Any ideas?
>>
>>
>> --
>> Best Regards,
>> Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
>> Finger Print:   9F67 391B B770 8FF6 99DC  D92D 87F6 2602 1371 4D33
>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>
>



--
Best Regards,
Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
Finger Print:   9F67 391B B770 8FF6 99DC  D92D 87F6 2602 1371 4D33


Re: Why is this query not using GIN index?

From
Julien Rouhaud
Date:
On 13/11/2016 15:26, Aaron Lewis wrote:
> Hi Oleg,
>
> Can you elaborate on the title column? I don't get it.
>

>>> create table mytable(hash char(40), title varchar(500));
>>> create index name_fts on mytable using gin(to_tsvector('english',
>>> 'title'));

You created an index on the text 'title', not on the title column, so
the index is useless.

Drop the existing index and create this one instead:

create index name_fts on mytable using gin(to_tsvector('english', title));

> On Sun, Nov 13, 2016 at 10:10 PM, Oleg Bartunov <obartunov@gmail.com> wrote:
>>
>>
>> On Sun, Nov 13, 2016 at 2:50 PM, Aaron Lewis <the.warl0ck.1989@gmail.com>
>> wrote:
>>>
>>> I have a simple table, and a gin index,
>>>
>>> create table mytable(hash char(40), title varchar(500));
>>> create index name_fts on mytable using gin(to_tsvector('english',
>>> 'title'));
>>
>>
>>
>> ^^^^^
>>
>>>
>>> create unique index md5_uniq_idx on mytable(hash);
>>>
>>> When I execute a query with tsquery, the GIN index was not in use:
>>>
>>> test=# explain analyze select * from mytable where
>>> to_tsvector('english', title) @@ 'abc | def'::tsquery limit 10;
>>>                                                      QUERY PLAN
>>>
>>>
--------------------------------------------------------------------------------------------------------------------
>>>  Limit  (cost=0.00..277.35 rows=10 width=83) (actual
>>> time=0.111..75.549 rows=10 loops=1)
>>>    ->  Seq Scan on mytable  (cost=0.00..381187.45 rows=13744 width=83)
>>> (actual time=0.110..75.546 rows=10 loops=1)
>>>          Filter: (to_tsvector('english'::regconfig, (title)::text) @@
>>> '''abc'' | ''def'''::tsquery)
>>>          Rows Removed by Filter: 10221
>>>  Planning time: 0.176 ms
>>>  Execution time: 75.564 ms
>>> (6 rows)
>>>
>>> Any ideas?
>>>

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org


Re: Why is this query not using GIN index?

From
Aaron Lewis
Date:
Sigh, didn't notice that. Thanks for the heads up.

It takes 500ms with 10m rows, could it be faster?
I've increased work_mem to 256MB

test=# explain analyze select * from mytable where
to_tsvector('english', title) @@ 'x264'::tsquery limit 1000 offset 10;
                                                              QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=684.06..2949.42 rows=1000 width=83) (actual
time=348.506..536.483 rows=1000 loops=1)
   ->  Bitmap Heap Scan on mytable  (cost=661.41..158917.22 rows=69859
width=83) (actual time=345.354..536.199 rows=1010 loops=1)
         Recheck Cond: (to_tsvector('english'::regconfig, title) @@
'''x264'''::tsquery)
         Rows Removed by Index Recheck: 12242
         Heap Blocks: exact=20 lossy=186
         ->  Bitmap Index Scan on name_fts  (cost=0.00..643.95
rows=69859 width=0) (actual time=333.703..333.703 rows=1044673
loops=1)
               Index Cond: (to_tsvector('english'::regconfig, title)
@@ '''x264'''::tsquery)
 Planning time: 0.144 ms
 Execution time: 537.212 ms
(9 rows)

On Sun, Nov 13, 2016 at 10:33 PM, Julien Rouhaud
<julien.rouhaud@dalibo.com> wrote:
> On 13/11/2016 15:26, Aaron Lewis wrote:
>> Hi Oleg,
>>
>> Can you elaborate on the title column? I don't get it.
>>
>
>>>> create table mytable(hash char(40), title varchar(500));
>>>> create index name_fts on mytable using gin(to_tsvector('english',
>>>> 'title'));
>
> You created an index on the text 'title', not on the title column, so
> the index is useless.
>
> Drop the existing index and create this one instead:
>
> create index name_fts on mytable using gin(to_tsvector('english', title));
>
>> On Sun, Nov 13, 2016 at 10:10 PM, Oleg Bartunov <obartunov@gmail.com> wrote:
>>>
>>>
>>> On Sun, Nov 13, 2016 at 2:50 PM, Aaron Lewis <the.warl0ck.1989@gmail.com>
>>> wrote:
>>>>
>>>> I have a simple table, and a gin index,
>>>>
>>>> create table mytable(hash char(40), title varchar(500));
>>>> create index name_fts on mytable using gin(to_tsvector('english',
>>>> 'title'));
>>>
>>>
>>>
>>> ^^^^^
>>>
>>>>
>>>> create unique index md5_uniq_idx on mytable(hash);
>>>>
>>>> When I execute a query with tsquery, the GIN index was not in use:
>>>>
>>>> test=# explain analyze select * from mytable where
>>>> to_tsvector('english', title) @@ 'abc | def'::tsquery limit 10;
>>>>                                                      QUERY PLAN
>>>>
>>>>
--------------------------------------------------------------------------------------------------------------------
>>>>  Limit  (cost=0.00..277.35 rows=10 width=83) (actual
>>>> time=0.111..75.549 rows=10 loops=1)
>>>>    ->  Seq Scan on mytable  (cost=0.00..381187.45 rows=13744 width=83)
>>>> (actual time=0.110..75.546 rows=10 loops=1)
>>>>          Filter: (to_tsvector('english'::regconfig, (title)::text) @@
>>>> '''abc'' | ''def'''::tsquery)
>>>>          Rows Removed by Filter: 10221
>>>>  Planning time: 0.176 ms
>>>>  Execution time: 75.564 ms
>>>> (6 rows)
>>>>
>>>> Any ideas?
>>>>
>
> --
> Julien Rouhaud
> http://dalibo.com - http://dalibo.org



--
Best Regards,
Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
Finger Print:   9F67 391B B770 8FF6 99DC  D92D 87F6 2602 1371 4D33


Re: Why is this query not using GIN index?

From
Oleg Bartunov
Date:


On Sun, Nov 13, 2016 at 6:05 PM, Aaron Lewis <the.warl0ck.1989@gmail.com> wrote:
Sigh, didn't notice that. Thanks for the heads up.

It takes 500ms with 10m rows, could it be faster?

sure.  Recheck with function call is pretty expensive, so I'd not recommend to create functional index, just create separate column of type tsvector (materialize to_tsvector) and create gin index on it.  You should surprise.
 
I've increased work_mem to 256MB

test=# explain analyze select * from mytable where
to_tsvector('english', title) @@ 'x264'::tsquery limit 1000 offset 10;
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=684.06..2949.42 rows=1000 width=83) (actual
time=348.506..536.483 rows=1000 loops=1)
   ->  Bitmap Heap Scan on mytable  (cost=661.41..158917.22 rows=69859
width=83) (actual time=345.354..536.199 rows=1010 loops=1)
         Recheck Cond: (to_tsvector('english'::regconfig, title) @@
'''x264'''::tsquery)
         Rows Removed by Index Recheck: 12242
         Heap Blocks: exact=20 lossy=186
         ->  Bitmap Index Scan on name_fts  (cost=0.00..643.95
rows=69859 width=0) (actual time=333.703..333.703 rows=1044673
loops=1)
               Index Cond: (to_tsvector('english'::regconfig, title)
@@ '''x264'''::tsquery)
 Planning time: 0.144 ms
 Execution time: 537.212 ms
(9 rows)

On Sun, Nov 13, 2016 at 10:33 PM, Julien Rouhaud
<julien.rouhaud@dalibo.com> wrote:
> On 13/11/2016 15:26, Aaron Lewis wrote:
>> Hi Oleg,
>>
>> Can you elaborate on the title column? I don't get it.
>>
>
>>>> create table mytable(hash char(40), title varchar(500));
>>>> create index name_fts on mytable using gin(to_tsvector('english',
>>>> 'title'));
>
> You created an index on the text 'title', not on the title column, so
> the index is useless.
>
> Drop the existing index and create this one instead:
>
> create index name_fts on mytable using gin(to_tsvector('english', title));
>
>> On Sun, Nov 13, 2016 at 10:10 PM, Oleg Bartunov <obartunov@gmail.com> wrote:
>>>
>>>
>>> On Sun, Nov 13, 2016 at 2:50 PM, Aaron Lewis <the.warl0ck.1989@gmail.com>
>>> wrote:
>>>>
>>>> I have a simple table, and a gin index,
>>>>
>>>> create table mytable(hash char(40), title varchar(500));
>>>> create index name_fts on mytable using gin(to_tsvector('english',
>>>> 'title'));
>>>
>>>
>>>
>>> ^^^^^
>>>
>>>>
>>>> create unique index md5_uniq_idx on mytable(hash);
>>>>
>>>> When I execute a query with tsquery, the GIN index was not in use:
>>>>
>>>> test=# explain analyze select * from mytable where
>>>> to_tsvector('english', title) @@ 'abc | def'::tsquery limit 10;
>>>>                                                      QUERY PLAN
>>>>
>>>> --------------------------------------------------------------------------------------------------------------------
>>>>  Limit  (cost=0.00..277.35 rows=10 width=83) (actual
>>>> time=0.111..75.549 rows=10 loops=1)
>>>>    ->  Seq Scan on mytable  (cost=0.00..381187.45 rows=13744 width=83)
>>>> (actual time=0.110..75.546 rows=10 loops=1)
>>>>          Filter: (to_tsvector('english'::regconfig, (title)::text) @@
>>>> '''abc'' | ''def'''::tsquery)
>>>>          Rows Removed by Filter: 10221
>>>>  Planning time: 0.176 ms
>>>>  Execution time: 75.564 ms
>>>> (6 rows)
>>>>
>>>> Any ideas?
>>>>
>
> --
> Julien Rouhaud
> http://dalibo.com - http://dalibo.org



--
Best Regards,
Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
Finger Print:   9F67 391B B770 8FF6 99DC  D92D 87F6 2602 1371 4D33

Re: Why is this query not using GIN index?

From
Tom Lane
Date:
Oleg Bartunov <obartunov@gmail.com> writes:
> On Sun, Nov 13, 2016 at 6:05 PM, Aaron Lewis <the.warl0ck.1989@gmail.com>
>> It takes 500ms with 10m rows, could it be faster?

> sure.  Recheck with function call is pretty expensive, so I'd not recommend
> to create functional index, just create separate column of type tsvector
> (materialize to_tsvector) and create gin index on it.  You should surprise.

I doubt it'll help that much --- more than half the time is going into the
bitmap indexscan, and with over 1m candidate matches, there's no way
that's going to be super cheap.

I wonder whether a gist index would be better here, since it would support
a plain indexscan which should require scanning much less of the index
given the small LIMIT.

(Materializing the tsvector would probably help for gist, too, by reducing
the cost of lossy-index rechecks.)

BTW, it still looks like the performance is being significantly hurt by
inadequate work_mem.

            regards, tom lane


Re: Why is this query not using GIN index?

From
Aaron Lewis
Date:
Hey guys,

I'm trying to understand the performance impact of "Index Recheck", I
googled for Index Recheck, but didn't find much details about it,
where can I know more about it?

And how did you know the performance is being significantly hurt by
inadequate work_mem?

I'm running PG 9.6.1, built from source.


On Mon, Nov 14, 2016 at 2:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Oleg Bartunov <obartunov@gmail.com> writes:
>> On Sun, Nov 13, 2016 at 6:05 PM, Aaron Lewis <the.warl0ck.1989@gmail.com>
>>> It takes 500ms with 10m rows, could it be faster?
>
>> sure.  Recheck with function call is pretty expensive, so I'd not recommend
>> to create functional index, just create separate column of type tsvector
>> (materialize to_tsvector) and create gin index on it.  You should surprise.
>
> I doubt it'll help that much --- more than half the time is going into the
> bitmap indexscan, and with over 1m candidate matches, there's no way
> that's going to be super cheap.
>
> I wonder whether a gist index would be better here, since it would support
> a plain indexscan which should require scanning much less of the index
> given the small LIMIT.
>
> (Materializing the tsvector would probably help for gist, too, by reducing
> the cost of lossy-index rechecks.)
>
> BTW, it still looks like the performance is being significantly hurt by
> inadequate work_mem.
>
>                         regards, tom lane



--
Best Regards,
Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
Finger Print:   9F67 391B B770 8FF6 99DC  D92D 87F6 2602 1371 4D33