Thread: Question about gin index not used on a tsv column

Question about gin index not used on a tsv column

From
Patrick Dung
Date:
Hello Postgresql users,

In my setting, I found that sometimes the query does not use the gin index built for a tsv column.

Attached file provide more info (with explain analyze).

Thanks and regards,
Patrick
Attachment

Re: Question about gin index not used on a tsv column

From
David G Johnston
Date:
Patrick Dung-2 wrote
> Hello Postgresql users,
>
> In my setting, I found that sometimes the query does not use the gin index
> built for a tsv column.
>
> Attached file provide more info (with explain analyze).

So the difference between the first and third queries shown is the fact that
the data is now in-memory when the third query is run; in both plans only
sequential scans are used.

The difference between the first and second queries, where the index is
used, is therefore at least partially - if not wholly - due to those same
caching effects.

On a reasonably small table it is not that unreasonable for the planner to
choose a single retrieve-and-filter sequential scan as opposed to a
piecemeal lookup-and-retrieve index scan.

Your example is not controlled enough to prove that the planner has made an
incorrect decision.  The estimated costs of 34.5k vs 35.3k are extremely
small and since the 34.5k is the result of changing random_page_cost you
cannot directly compare them anyway.

You need to use "enable_seqscan" and related configuration parameters so
that you can force the planner to choose different plans without changing
the underlying costs.

http://www.postgresql.org/docs/9.3/static/runtime-config-query.html

David J.




--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Question-about-gin-index-not-used-on-a-tsv-column-tp5817433p5817438.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: Question about gin index not used on a tsv column

From
Patrick Dung
Date:
Thanks for reply, David.

I have searched internet and changed one parameter cpu_tuple_cost from 0.01 to 0.08. I would see if it helped.
I found the problem occurred randomly.

For tsv, I thought if there is an index already built, postgresql should try to make use of it because I think for most of the time it would be faster than full table scan.

Thanks and regards,
Patrick


On Wednesday, September 3, 2014 3:00 AM, David G Johnston <david.g.johnston@gmail.com> wrote:


Patrick Dung-2 wrote
> Hello Postgresql users,
>
> In my setting, I found that sometimes the query does not use the gin index
> built for a tsv column.
>
> Attached file provide more info (with explain analyze).

So the difference between the first and third queries shown is the fact that
the data is now in-memory when the third query is run; in both plans only
sequential scans are used.

The difference between the first and second queries, where the index is
used, is therefore at least partially - if not wholly - due to those same
caching effects.

On a reasonably small table it is not that unreasonable for the planner to
choose a single retrieve-and-filter sequential scan as opposed to a
piecemeal lookup-and-retrieve index scan. 

Your example is not controlled enough to prove that the planner has made an
incorrect decision.  The estimated costs of 34.5k vs 35.3k are extremely
small and since the 34.5k is the result of changing random_page_cost you
cannot directly compare them anyway.

You need to use "enable_seqscan" and related configuration parameters so
that you can force the planner to choose different plans without changing
the underlying costs.

http://www.postgresql.org/docs/9.3/static/runtime-config-query.html

David J.




--
View this message in context: http://postgresql.1045698.n5.nabble.com/Question-about-gin-index-not-used-on-a-tsv-column-tp5817433p5817438.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: Question about gin index not used on a tsv column

From
Patrick Dung
Date:
Hello all,

I have encounter problems:
1) While testing, I found that in a particular case, LIMIT 1 is slower (gin index is not used).

2) It is about constraint exclusion.
The query is ordered by (DESC) post_timestamp.
In human way of thinking, it only need to search from the 2014 partition table and only need to provide one record. The explain statement show that the 2013 partition table is also scanned.

So for PostgreSQL in the current version(s), the query has to match the constraint of the partition in order to take advantage of the constraint exclusion, right?


jobs=> explain analyze SELECT * FROM jobs WHERE tsv @@ to_tsquery('english', 'BARCODE') ORDER BY post_timestamp DESC;
                                                                    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=5165.49..5170.49 rows=2002 width=397) (actual time=3.067..3.101 rows=321 loops=1)
   Sort Key: jobs.post_timestamp
   Sort Method: quicksort  Memory: 202kB
   ->  Append  (cost=0.00..5055.70 rows=2002 width=397) (actual time=2.090..2.849 rows=321 loops=1)
         ->  Seq Scan on jobs  (cost=0.00..0.00 rows=1 width=390) (actual time=0.001..0.001 rows=0 loops=1)
               Filter: (tsv @@ '''barcod'''::tsquery)
         ->  Bitmap Heap Scan on jobs_2014p  (cost=1337.46..3986.36 rows=1479 width=384) (actual time=2.088..2.434 rows=218 loops=1)
               Recheck Cond: (tsv @@ '''barcod'''::tsquery)
               ->  Bitmap Index Scan on jobs_2014p_tsv_gin_idx  (cost=0.00..1337.09 rows=1479 width=0) (actual time=2.053..2.053 rows=218 loops=1)
                     Index Cond: (tsv @@ '''barcod'''::tsquery)
         ->  Bitmap Heap Scan on jobs_2013p  (cost=126.05..1069.34 rows=522 width=432) (actual time=0.236..0.377 rows=103 loops=1)
               Recheck Cond: (tsv @@ '''barcod'''::tsquery)
               ->  Bitmap Index Scan on jobs_2013p_tsv_gin_idx  (cost=0.00..125.92 rows=522 width=0) (actual time=0.220..0.220 rows=103 loops=1)
                     Index Cond: (tsv @@ '''barcod'''::tsquery)
 Total runtime: 3.173 ms
(15 rows)

jobs=> explain analyze SELECT * FROM jobs WHERE tsv @@ to_tsquery('english', 'BARCODE') ORDER BY post_timestamp DESC LIMIT 1;
                                                                               QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
 Limit  (cost=0.99..55.43 rows=1 width=397) (actual time=19.992..19.993 rows=1 loops=1)
   ->  Merge Append  (cost=0.99..109001.45 rows=2002 width=397) (actual time=19.991..19.991 rows=1 loops=1)
         Sort Key: jobs.post_timestamp
         ->  Index Scan Backward using jobs_post_timestamp_idx on jobs  (cost=0.12..4.23 rows=1 width=390) (actual time=0.004..0.004 rows=0 loops=1)
               Filter: (tsv @@ '''barcod'''::tsquery)
         ->  Index Scan Backward using jobs_2014p_post_timestamp_idx on jobs_2014p  (cost=0.42..79205.48 rows=1479 width=384) (actual time=6.845..6.845 rows=1 loops=1)
               Filter: (tsv @@ '''barcod'''::tsquery)
               Rows Removed by Filter: 718
         ->  Index Scan Backward using jobs_2013p_post_timestamp_idx on jobs_2013p  (cost=0.42..29754.98 rows=522 width=432) (actual time=13.138..13.138 rows=1 loops=1)
               Filter: (tsv @@ '''barcod'''::tsquery)
               Rows Removed by Filter: 1664
 Total runtime: 20.051 ms
(12 rows)


Thanks and regards,
Patrick


On Wednesday, September 3, 2014 11:03 AM, Patrick Dung <patrick_dkt@yahoo.com.hk> wrote:


Thanks for reply, David.

I have searched internet and changed one parameter cpu_tuple_cost from 0.01 to 0.08. I would see if it helped.
I found the problem occurred randomly.

For tsv, I thought if there is an index already built, postgresql should try to make use of it because I think for most of the time it would be faster than full table scan.

Thanks and regards,
Patrick


On Wednesday, September 3, 2014 3:00 AM, David G Johnston <david.g.johnston@gmail.com> wrote:


Patrick Dung-2 wrote
> Hello Postgresql users,
>
> In my setting, I found that sometimes the query does not use the gin index
> built for a tsv column.
>
> Attached file provide more info (with explain analyze).

So the difference between the first and third queries shown is the fact that
the data is now in-memory when the third query is run; in both plans only
sequential scans are used.

The difference between the first and second queries, where the index is
used, is therefore at least partially - if not wholly - due to those same
caching effects.

On a reasonably small table it is not that unreasonable for the planner to
choose a single retrieve-and-filter sequential scan as opposed to a
piecemeal lookup-and-retrieve index scan. 

Your example is not controlled enough to prove that the planner has made an
incorrect decision.  The estimated costs of 34.5k vs 35.3k are extremely
small and since the 34.5k is the result of changing random_page_cost you
cannot directly compare them anyway.

You need to use "enable_seqscan" and related configuration parameters so
that you can force the planner to choose different plans without changing
the underlying costs.

http://www.postgresql.org/docs/9.3/static/runtime-config-query.html

David J.




--
View this message in context: http://postgresql.1045698.n5.nabble.com/Question-about-gin-index-not-used-on-a-tsv-column-tp5817433p5817438.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general