Thread: Why is this query not using GIN index?
I have a simple table, and a gin index, create table mytable(hash char(40), title varchar(500)); create index name_fts on mytable using gin(to_tsvector('english', 'title')); create unique index md5_uniq_idx on mytable(hash); When I execute a query with tsquery, the GIN index was not in use: test=# explain analyze select * from mytable where to_tsvector('english', title) @@ 'abc | def'::tsquery limit 10; QUERY PLAN -------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..277.35 rows=10 width=83) (actual time=0.111..75.549 rows=10 loops=1) -> Seq Scan on mytable (cost=0.00..381187.45 rows=13744 width=83) (actual time=0.110..75.546 rows=10 loops=1) Filter: (to_tsvector('english'::regconfig, (title)::text) @@ '''abc'' | ''def'''::tsquery) Rows Removed by Filter: 10221 Planning time: 0.176 ms Execution time: 75.564 ms (6 rows) Any ideas? -- Best Regards, Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/ Finger Print: 9F67 391B B770 8FF6 99DC D92D 87F6 2602 1371 4D33
On Sun, Nov 13, 2016 at 2:50 PM, Aaron Lewis <the.warl0ck.1989@gmail.com> wrote:
^^^^^
I have a simple table, and a gin index,
create table mytable(hash char(40), title varchar(500));
create index name_fts on mytable using gin(to_tsvector('english', 'title'));
^^^^^
create unique index md5_uniq_idx on mytable(hash);
When I execute a query with tsquery, the GIN index was not in use:
test=# explain analyze select * from mytable where
to_tsvector('english', title) @@ 'abc | def'::tsquery limit 10;
QUERY PLAN
------------------------------------------------------------ ------------------------------ --------------------------
Limit (cost=0.00..277.35 rows=10 width=83) (actual
time=0.111..75.549 rows=10 loops=1)
-> Seq Scan on mytable (cost=0.00..381187.45 rows=13744 width=83)
(actual time=0.110..75.546 rows=10 loops=1)
Filter: (to_tsvector('english'::regconfig, (title)::text) @@
'''abc'' | ''def'''::tsquery)
Rows Removed by Filter: 10221
Planning time: 0.176 ms
Execution time: 75.564 ms
(6 rows)
Any ideas?
--
Best Regards,
Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
Finger Print: 9F67 391B B770 8FF6 99DC D92D 87F6 2602 1371 4D33
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Hi Oleg, Can you elaborate on the title column? I don't get it. On Sun, Nov 13, 2016 at 10:10 PM, Oleg Bartunov <obartunov@gmail.com> wrote: > > > On Sun, Nov 13, 2016 at 2:50 PM, Aaron Lewis <the.warl0ck.1989@gmail.com> > wrote: >> >> I have a simple table, and a gin index, >> >> create table mytable(hash char(40), title varchar(500)); >> create index name_fts on mytable using gin(to_tsvector('english', >> 'title')); > > > > ^^^^^ > >> >> create unique index md5_uniq_idx on mytable(hash); >> >> When I execute a query with tsquery, the GIN index was not in use: >> >> test=# explain analyze select * from mytable where >> to_tsvector('english', title) @@ 'abc | def'::tsquery limit 10; >> QUERY PLAN >> >> -------------------------------------------------------------------------------------------------------------------- >> Limit (cost=0.00..277.35 rows=10 width=83) (actual >> time=0.111..75.549 rows=10 loops=1) >> -> Seq Scan on mytable (cost=0.00..381187.45 rows=13744 width=83) >> (actual time=0.110..75.546 rows=10 loops=1) >> Filter: (to_tsvector('english'::regconfig, (title)::text) @@ >> '''abc'' | ''def'''::tsquery) >> Rows Removed by Filter: 10221 >> Planning time: 0.176 ms >> Execution time: 75.564 ms >> (6 rows) >> >> Any ideas? >> >> >> -- >> Best Regards, >> Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/ >> Finger Print: 9F67 391B B770 8FF6 99DC D92D 87F6 2602 1371 4D33 >> >> >> -- >> Sent via pgsql-general mailing list (pgsql-general@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-general > > -- Best Regards, Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/ Finger Print: 9F67 391B B770 8FF6 99DC D92D 87F6 2602 1371 4D33
On 13/11/2016 15:26, Aaron Lewis wrote: > Hi Oleg, > > Can you elaborate on the title column? I don't get it. > >>> create table mytable(hash char(40), title varchar(500)); >>> create index name_fts on mytable using gin(to_tsvector('english', >>> 'title')); You created an index on the text 'title', not on the title column, so the index is useless. Drop the existing index and create this one instead: create index name_fts on mytable using gin(to_tsvector('english', title)); > On Sun, Nov 13, 2016 at 10:10 PM, Oleg Bartunov <obartunov@gmail.com> wrote: >> >> >> On Sun, Nov 13, 2016 at 2:50 PM, Aaron Lewis <the.warl0ck.1989@gmail.com> >> wrote: >>> >>> I have a simple table, and a gin index, >>> >>> create table mytable(hash char(40), title varchar(500)); >>> create index name_fts on mytable using gin(to_tsvector('english', >>> 'title')); >> >> >> >> ^^^^^ >> >>> >>> create unique index md5_uniq_idx on mytable(hash); >>> >>> When I execute a query with tsquery, the GIN index was not in use: >>> >>> test=# explain analyze select * from mytable where >>> to_tsvector('english', title) @@ 'abc | def'::tsquery limit 10; >>> QUERY PLAN >>> >>> -------------------------------------------------------------------------------------------------------------------- >>> Limit (cost=0.00..277.35 rows=10 width=83) (actual >>> time=0.111..75.549 rows=10 loops=1) >>> -> Seq Scan on mytable (cost=0.00..381187.45 rows=13744 width=83) >>> (actual time=0.110..75.546 rows=10 loops=1) >>> Filter: (to_tsvector('english'::regconfig, (title)::text) @@ >>> '''abc'' | ''def'''::tsquery) >>> Rows Removed by Filter: 10221 >>> Planning time: 0.176 ms >>> Execution time: 75.564 ms >>> (6 rows) >>> >>> Any ideas? >>> -- Julien Rouhaud http://dalibo.com - http://dalibo.org
Sigh, didn't notice that. Thanks for the heads up. It takes 500ms with 10m rows, could it be faster? I've increased work_mem to 256MB test=# explain analyze select * from mytable where to_tsvector('english', title) @@ 'x264'::tsquery limit 1000 offset 10; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=684.06..2949.42 rows=1000 width=83) (actual time=348.506..536.483 rows=1000 loops=1) -> Bitmap Heap Scan on mytable (cost=661.41..158917.22 rows=69859 width=83) (actual time=345.354..536.199 rows=1010 loops=1) Recheck Cond: (to_tsvector('english'::regconfig, title) @@ '''x264'''::tsquery) Rows Removed by Index Recheck: 12242 Heap Blocks: exact=20 lossy=186 -> Bitmap Index Scan on name_fts (cost=0.00..643.95 rows=69859 width=0) (actual time=333.703..333.703 rows=1044673 loops=1) Index Cond: (to_tsvector('english'::regconfig, title) @@ '''x264'''::tsquery) Planning time: 0.144 ms Execution time: 537.212 ms (9 rows) On Sun, Nov 13, 2016 at 10:33 PM, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote: > On 13/11/2016 15:26, Aaron Lewis wrote: >> Hi Oleg, >> >> Can you elaborate on the title column? I don't get it. >> > >>>> create table mytable(hash char(40), title varchar(500)); >>>> create index name_fts on mytable using gin(to_tsvector('english', >>>> 'title')); > > You created an index on the text 'title', not on the title column, so > the index is useless. > > Drop the existing index and create this one instead: > > create index name_fts on mytable using gin(to_tsvector('english', title)); > >> On Sun, Nov 13, 2016 at 10:10 PM, Oleg Bartunov <obartunov@gmail.com> wrote: >>> >>> >>> On Sun, Nov 13, 2016 at 2:50 PM, Aaron Lewis <the.warl0ck.1989@gmail.com> >>> wrote: >>>> >>>> I have a simple table, and a gin index, >>>> >>>> create table mytable(hash char(40), title varchar(500)); >>>> create index name_fts on mytable using gin(to_tsvector('english', >>>> 'title')); >>> >>> >>> >>> ^^^^^ >>> >>>> >>>> create unique index md5_uniq_idx on mytable(hash); >>>> >>>> When I execute a query with tsquery, the GIN index was not in use: >>>> >>>> test=# explain analyze select * from mytable where >>>> to_tsvector('english', title) @@ 'abc | def'::tsquery limit 10; >>>> QUERY PLAN >>>> >>>> -------------------------------------------------------------------------------------------------------------------- >>>> Limit (cost=0.00..277.35 rows=10 width=83) (actual >>>> time=0.111..75.549 rows=10 loops=1) >>>> -> Seq Scan on mytable (cost=0.00..381187.45 rows=13744 width=83) >>>> (actual time=0.110..75.546 rows=10 loops=1) >>>> Filter: (to_tsvector('english'::regconfig, (title)::text) @@ >>>> '''abc'' | ''def'''::tsquery) >>>> Rows Removed by Filter: 10221 >>>> Planning time: 0.176 ms >>>> Execution time: 75.564 ms >>>> (6 rows) >>>> >>>> Any ideas? >>>> > > -- > Julien Rouhaud > http://dalibo.com - http://dalibo.org -- Best Regards, Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/ Finger Print: 9F67 391B B770 8FF6 99DC D92D 87F6 2602 1371 4D33
On Sun, Nov 13, 2016 at 6:05 PM, Aaron Lewis <the.warl0ck.1989@gmail.com> wrote:
Sigh, didn't notice that. Thanks for the heads up.
It takes 500ms with 10m rows, could it be faster?
sure. Recheck with function call is pretty expensive, so I'd not recommend to create functional index, just create separate column of type tsvector (materialize to_tsvector) and create gin index on it. You should surprise.
I've increased work_mem to 256MB
test=# explain analyze select * from mytable where
to_tsvector('english', title) @@ 'x264'::tsquery limit 1000 offset 10;
QUERY PLAN
------------------------------------------------------------ ------------------------------ ------------------------------ ---------------
Limit (cost=684.06..2949.42 rows=1000 width=83) (actual
time=348.506..536.483 rows=1000 loops=1)
-> Bitmap Heap Scan on mytable (cost=661.41..158917.22 rows=69859
width=83) (actual time=345.354..536.199 rows=1010 loops=1)
Recheck Cond: (to_tsvector('english'::regconfig, title) @@
'''x264'''::tsquery)
Rows Removed by Index Recheck: 12242
Heap Blocks: exact=20 lossy=186
-> Bitmap Index Scan on name_fts (cost=0.00..643.95
rows=69859 width=0) (actual time=333.703..333.703 rows=1044673
loops=1)
Index Cond: (to_tsvector('english'::regconfig, title)
@@ '''x264'''::tsquery)
Planning time: 0.144 ms
Execution time: 537.212 ms
(9 rows)
On Sun, Nov 13, 2016 at 10:33 PM, Julien Rouhaud
<julien.rouhaud@dalibo.com> wrote:
> On 13/11/2016 15:26, Aaron Lewis wrote:
>> Hi Oleg,
>>
>> Can you elaborate on the title column? I don't get it.
>>
>
>>>> create table mytable(hash char(40), title varchar(500));
>>>> create index name_fts on mytable using gin(to_tsvector('english',
>>>> 'title'));
>
> You created an index on the text 'title', not on the title column, so
> the index is useless.
>
> Drop the existing index and create this one instead:
>
> create index name_fts on mytable using gin(to_tsvector('english', title));
>
>> On Sun, Nov 13, 2016 at 10:10 PM, Oleg Bartunov <obartunov@gmail.com> wrote:
>>>
>>>
>>> On Sun, Nov 13, 2016 at 2:50 PM, Aaron Lewis <the.warl0ck.1989@gmail.com>
>>> wrote:
>>>>
>>>> I have a simple table, and a gin index,
>>>>
>>>> create table mytable(hash char(40), title varchar(500));
>>>> create index name_fts on mytable using gin(to_tsvector('english',
>>>> 'title'));
>>>
>>>
>>>
>>> ^^^^^
>>>
>>>>
>>>> create unique index md5_uniq_idx on mytable(hash);
>>>>
>>>> When I execute a query with tsquery, the GIN index was not in use:
>>>>
>>>> test=# explain analyze select * from mytable where
>>>> to_tsvector('english', title) @@ 'abc | def'::tsquery limit 10;
>>>> QUERY PLAN
>>>>
>>>> ------------------------------------------------------------ ------------------------------ --------------------------
>>>> Limit (cost=0.00..277.35 rows=10 width=83) (actual
>>>> time=0.111..75.549 rows=10 loops=1)
>>>> -> Seq Scan on mytable (cost=0.00..381187.45 rows=13744 width=83)
>>>> (actual time=0.110..75.546 rows=10 loops=1)
>>>> Filter: (to_tsvector('english'::regconfig, (title)::text) @@
>>>> '''abc'' | ''def'''::tsquery)
>>>> Rows Removed by Filter: 10221
>>>> Planning time: 0.176 ms
>>>> Execution time: 75.564 ms
>>>> (6 rows)
>>>>
>>>> Any ideas?
>>>>
>
> --
> Julien Rouhaud
> http://dalibo.com - http://dalibo.org--
Best Regards,
Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
Finger Print: 9F67 391B B770 8FF6 99DC D92D 87F6 2602 1371 4D33
Oleg Bartunov <obartunov@gmail.com> writes: > On Sun, Nov 13, 2016 at 6:05 PM, Aaron Lewis <the.warl0ck.1989@gmail.com> >> It takes 500ms with 10m rows, could it be faster? > sure. Recheck with function call is pretty expensive, so I'd not recommend > to create functional index, just create separate column of type tsvector > (materialize to_tsvector) and create gin index on it. You should surprise. I doubt it'll help that much --- more than half the time is going into the bitmap indexscan, and with over 1m candidate matches, there's no way that's going to be super cheap. I wonder whether a gist index would be better here, since it would support a plain indexscan which should require scanning much less of the index given the small LIMIT. (Materializing the tsvector would probably help for gist, too, by reducing the cost of lossy-index rechecks.) BTW, it still looks like the performance is being significantly hurt by inadequate work_mem. regards, tom lane
Hey guys, I'm trying to understand the performance impact of "Index Recheck", I googled for Index Recheck, but didn't find much details about it, where can I know more about it? And how did you know the performance is being significantly hurt by inadequate work_mem? I'm running PG 9.6.1, built from source. On Mon, Nov 14, 2016 at 2:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Oleg Bartunov <obartunov@gmail.com> writes: >> On Sun, Nov 13, 2016 at 6:05 PM, Aaron Lewis <the.warl0ck.1989@gmail.com> >>> It takes 500ms with 10m rows, could it be faster? > >> sure. Recheck with function call is pretty expensive, so I'd not recommend >> to create functional index, just create separate column of type tsvector >> (materialize to_tsvector) and create gin index on it. You should surprise. > > I doubt it'll help that much --- more than half the time is going into the > bitmap indexscan, and with over 1m candidate matches, there's no way > that's going to be super cheap. > > I wonder whether a gist index would be better here, since it would support > a plain indexscan which should require scanning much less of the index > given the small LIMIT. > > (Materializing the tsvector would probably help for gist, too, by reducing > the cost of lossy-index rechecks.) > > BTW, it still looks like the performance is being significantly hurt by > inadequate work_mem. > > regards, tom lane -- Best Regards, Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/ Finger Print: 9F67 391B B770 8FF6 99DC D92D 87F6 2602 1371 4D33