Re: Why is this query not using GIN index? - Mailing list pgsql-general

From Oleg Bartunov
Subject Re: Why is this query not using GIN index?
Date
Msg-id CAF4Au4xW0428ownCkAYZ46p_dS8ggDs5c=409T_Dkpbip1g7kw@mail.gmail.com
Whole thread Raw
In response to Re: Why is this query not using GIN index?  (Aaron Lewis <the.warl0ck.1989@gmail.com>)
Responses Re: Why is this query not using GIN index?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general


On Sun, Nov 13, 2016 at 6:05 PM, Aaron Lewis <the.warl0ck.1989@gmail.com> wrote:
Sigh, didn't notice that. Thanks for the heads up.

It takes 500ms with 10m rows, could it be faster?

sure.  Recheck with function call is pretty expensive, so I'd not recommend to create functional index, just create separate column of type tsvector (materialize to_tsvector) and create gin index on it.  You should surprise.
 
I've increased work_mem to 256MB

test=# explain analyze select * from mytable where
to_tsvector('english', title) @@ 'x264'::tsquery limit 1000 offset 10;
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=684.06..2949.42 rows=1000 width=83) (actual
time=348.506..536.483 rows=1000 loops=1)
   ->  Bitmap Heap Scan on mytable  (cost=661.41..158917.22 rows=69859
width=83) (actual time=345.354..536.199 rows=1010 loops=1)
         Recheck Cond: (to_tsvector('english'::regconfig, title) @@
'''x264'''::tsquery)
         Rows Removed by Index Recheck: 12242
         Heap Blocks: exact=20 lossy=186
         ->  Bitmap Index Scan on name_fts  (cost=0.00..643.95
rows=69859 width=0) (actual time=333.703..333.703 rows=1044673
loops=1)
               Index Cond: (to_tsvector('english'::regconfig, title)
@@ '''x264'''::tsquery)
 Planning time: 0.144 ms
 Execution time: 537.212 ms
(9 rows)

On Sun, Nov 13, 2016 at 10:33 PM, Julien Rouhaud
<julien.rouhaud@dalibo.com> wrote:
> On 13/11/2016 15:26, Aaron Lewis wrote:
>> Hi Oleg,
>>
>> Can you elaborate on the title column? I don't get it.
>>
>
>>>> create table mytable(hash char(40), title varchar(500));
>>>> create index name_fts on mytable using gin(to_tsvector('english',
>>>> 'title'));
>
> You created an index on the text 'title', not on the title column, so
> the index is useless.
>
> Drop the existing index and create this one instead:
>
> create index name_fts on mytable using gin(to_tsvector('english', title));
>
>> On Sun, Nov 13, 2016 at 10:10 PM, Oleg Bartunov <obartunov@gmail.com> wrote:
>>>
>>>
>>> On Sun, Nov 13, 2016 at 2:50 PM, Aaron Lewis <the.warl0ck.1989@gmail.com>
>>> wrote:
>>>>
>>>> I have a simple table, and a gin index,
>>>>
>>>> create table mytable(hash char(40), title varchar(500));
>>>> create index name_fts on mytable using gin(to_tsvector('english',
>>>> 'title'));
>>>
>>>
>>>
>>> ^^^^^
>>>
>>>>
>>>> create unique index md5_uniq_idx on mytable(hash);
>>>>
>>>> When I execute a query with tsquery, the GIN index was not in use:
>>>>
>>>> test=# explain analyze select * from mytable where
>>>> to_tsvector('english', title) @@ 'abc | def'::tsquery limit 10;
>>>>                                                      QUERY PLAN
>>>>
>>>> --------------------------------------------------------------------------------------------------------------------
>>>>  Limit  (cost=0.00..277.35 rows=10 width=83) (actual
>>>> time=0.111..75.549 rows=10 loops=1)
>>>>    ->  Seq Scan on mytable  (cost=0.00..381187.45 rows=13744 width=83)
>>>> (actual time=0.110..75.546 rows=10 loops=1)
>>>>          Filter: (to_tsvector('english'::regconfig, (title)::text) @@
>>>> '''abc'' | ''def'''::tsquery)
>>>>          Rows Removed by Filter: 10221
>>>>  Planning time: 0.176 ms
>>>>  Execution time: 75.564 ms
>>>> (6 rows)
>>>>
>>>> Any ideas?
>>>>
>
> --
> Julien Rouhaud
> http://dalibo.com - http://dalibo.org



--
Best Regards,
Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
Finger Print:   9F67 391B B770 8FF6 99DC  D92D 87F6 2602 1371 4D33

pgsql-general by date:

Previous
From: John R Pierce
Date:
Subject: Re: pg_dumpall: could not connect to database "template1": FATAL:
Next
From: Tom Lane
Date:
Subject: Re: Why is this query not using GIN index?