Thread: Different execution times with GIN index, prepared statement and literals.
Hello,
One of our customers is facing an issue involving a prepared statement and a GIN index.
There is a significant gap in execution time between executing the prepared statement and running the same query with plain literals.
Here is the test that shows the issue:
Init table (note: a composite GIN index over plain scalar columns like these requires the btree_gin extension):

create table tmp_tk_test_index (
    sync_id bigint,
    line_id varchar(50),
    chk_upgrade_index smallint default 0
);

create index idx_tmp_tk_test_index_1 on tmp_tk_test_index using gin (sync_id, line_id);

insert into tmp_tk_test_index
SELECT tb_seq as sync_id,
       case when tb_seq = 950000 then 'the-test-value-fa529a621a15'
            else gen_random_uuid()::text
       end as line_id
FROM generate_series(1,1000000) as tb_seq;
Prepare the query:

prepare stmt(bigint, text, int) as
  delete from tmp_tk_test_index
  where sync_id <= $1 and line_id = $2 and chk_upgrade_index = $3;
And then execute it:

postgres=# begin;
BEGIN
postgres=*# explain (analyse) execute stmt(950000, 'the-test-value-fa529a621a15', 0);
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Delete on tmp_tk_test_index  (cost=212.52..2662.59 rows=0 width=0) (actual time=60.766..60.767 rows=0 loops=1)
   ->  Bitmap Heap Scan on tmp_tk_test_index  (cost=212.52..2662.59 rows=4 width=6) (actual time=60.756..60.758 rows=1 loops=1)
         Recheck Cond: ((sync_id <= '950000'::bigint) AND ((line_id)::text = 'the-test-value-fa529a621a15'::text))
         Filter: (chk_upgrade_index = 0)
         Heap Blocks: exact=1
         ->  Bitmap Index Scan on idx_tmp_tk_test_index_1  (cost=0.00..212.52 rows=810 width=0) (actual time=60.745..60.745 rows=1 loops=1)
               Index Cond: ((sync_id <= '950000'::bigint) AND ((line_id)::text = 'the-test-value-fa529a621a15'::text))
 Planning Time: 6.765 ms
 Execution Time: 61.160 ms
(9 rows)
postgres=*# rollback ;
ROLLBACK
It takes 61.160 ms to execute. However, the "same" query without a prepared statement is far faster:
postgres=# begin;
BEGIN
postgres=*# explain analyze delete from tmp_tk_test_index where sync_id <= 950000 and line_id = 'the-test-value-fa529a621a15' and chk_upgrade_index = 0;
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Delete on tmp_tk_test_index  (cost=21.36..25.38 rows=0 width=0) (actual time=0.084..0.085 rows=0 loops=1)
   ->  Bitmap Heap Scan on tmp_tk_test_index  (cost=21.36..25.38 rows=1 width=6) (actual time=0.042..0.043 rows=1 loops=1)
         Recheck Cond: ((line_id)::text = 'the-test-value-fa529a621a15'::text)
         Filter: ((sync_id <= 950000) AND (chk_upgrade_index = 0))
         Heap Blocks: exact=1
         ->  Bitmap Index Scan on idx_tmp_tk_test_index_1  (cost=0.00..21.36 rows=1 width=0) (actual time=0.027..0.028 rows=1 loops=1)
               Index Cond: ((line_id)::text = 'the-test-value-fa529a621a15'::text)
 Planning Time: 0.325 ms
 Execution Time: 0.148 ms
(9 rows)
postgres=*# rollback ;
ROLLBACK
Only 0.148 ms to execute...
What can cause such a large gap (0.148 ms vs 61.160 ms)?
In the explain plans, I can see that the Index Cond is different: in the slow case, (sync_id <= '950000'::bigint) is part of the index condition.
Why isn't it present in the second case?
Two more tests.

First, the prepared statement is tweaked with sync_id+0. The query is fast:

prepare stmt3(bigint, text, int) as
  delete from tmp_tk_test_index
  where sync_id+0 <= $1 and line_id = $2 and chk_upgrade_index = $3;

explain (analyse) execute stmt3(950000, 'the-test-value-fa529a621a15', 0);
 Delete on tmp_tk_test_index  (cost=21.36..25.38 rows=0 width=0) (actual time=0.057..0.058 rows=0 loops=1)
   ->  Bitmap Heap Scan on tmp_tk_test_index  (cost=21.36..25.38 rows=1 width=6) (actual time=0.040..0.042 rows=1 loops=1)
         Recheck Cond: ((line_id)::text = 'the-test-value-fa529a621a15'::text)
         Filter: ((chk_upgrade_index = 0) AND ((sync_id + 0) <= '950000'::bigint))
         Heap Blocks: exact=1
         ->  Bitmap Index Scan on idx_tmp_tk_test_index_1  (cost=0.00..21.36 rows=1 width=0) (actual time=0.020..0.020 rows=1 loops=1)
               Index Cond: ((line_id)::text = 'the-test-value-fa529a621a15'::text)
 Planning Time: 0.355 ms
 Execution Time: 0.101 ms
(9 rows)
Second, a test with an index that does not include sync_id. The prepared statement (stmt) is unchanged:

create index idx_tmp_tk_test_index_2 on tmp_tk_test_index using gin (line_id);

 Delete on tmp_tk_test_index  (cost=21.36..25.38 rows=0 width=0) (actual time=0.052..0.053 rows=0 loops=1)
   ->  Bitmap Heap Scan on tmp_tk_test_index  (cost=21.36..25.38 rows=1 width=6) (actual time=0.036..0.038 rows=1 loops=1)
         Recheck Cond: ((line_id)::text = 'the-test-value-fa529a621a15'::text)
         Filter: ((sync_id <= '950000'::bigint) AND (chk_upgrade_index = 0))
         Heap Blocks: exact=1
         ->  Bitmap Index Scan on idx_tmp_tk_test_index_2  (cost=0.00..21.36 rows=1 width=0) (actual time=0.020..0.021 rows=1 loops=1)
               Index Cond: ((line_id)::text = 'the-test-value-fa529a621a15'::text)
 Planning Time: 0.439 ms
 Execution Time: 0.098 ms
(9 rows)
Quite fast as well...
Do you have an idea about the initial issue? Why does the execution time "explode" when using a prepared statement and a GIN index?
Is it something to do with the planner or the optimizer?
(We ran the same test with a btree index, and execution times are the same in both cases.)
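For reference, the btree comparison mentioned above could be run like this (a sketch: the index name and exact column list are assumptions, since the original btree test script was not shown):

```sql
-- Hypothetical btree equivalent of the GIN index from the test setup.
create index idx_tmp_tk_test_index_btree
    on tmp_tk_test_index using btree (sync_id, line_id);

-- With btree, the planner can cost a range condition on the leading
-- column properly, so prepared and literal forms behave the same.
explain analyze delete from tmp_tk_test_index
 where sync_id <= 950000
   and line_id = 'the-test-value-fa529a621a15'
   and chk_upgrade_index = 0;
```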
Regards
Re: Different execution times with GIN index, prepared statement and literals.
On 7/16/24 17:43, Pierrick Chovelon wrote:
> ...
>
> Quite fast as well...
>
> Have you got an idea on the initial issue ? Why when using a prepared
> statement and a gin index the execution time "explode" ?
> Something to do with the planner ? optimizer ?
>
> (We executed the same test with a btree index and execution times are
> the same in both cases).
>

The reason the two queries end up with different plans is pretty simple: the condition ends up matching different operators, because of a data type difference.

In the case of the prepared query, the (x <= 950000) condition matches the <=(bigint,bigint) operator, and thus it matches the index. That happens because the query is prepared with a bigint parameter.

For the regular query, the 950000 literal gets treated as int, so the condition matches <=(bigint,int), which does not match the index - hence it's treated as a filter, not an index condition. If you cast the literal to bigint (by adding ::bigint) in the regular query, it'll use the same plan as the prepared query - but that's the slower one, unfortunately :-(

Which gets us to why that plan is slower than the plan using fewer index conditions. I think the problem is that <= 950000 matches most of the table, which means the GIN index has to load and process a pretty long TID list, and that is clearly not cheap.

I don't think there's much you can do - we don't consider this when matching conditions to the index, we simply match as many conditions as possible. And the GIN code is not smart enough to make judgements about which columns to process first - it just goes column by column and builds the bitmap, and building a bitmap on 95% of the table is costly.

If this is a systemic problem for most/all queries (i.e. it's enough to have a condition on line_id), I believe the +0 trick is a good way to make sure the condition is treated as a filter.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
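The operator-matching behaviour described above can be reproduced directly with a literal cast (a sketch against the test table from the first message; exact costs and timings will vary):

```sql
-- Literal treated as int: <=(bigint,int) does not match the GIN index,
-- so sync_id ends up in the Filter (the fast plan).
explain delete from tmp_tk_test_index
 where sync_id <= 950000
   and line_id = 'the-test-value-fa529a621a15'
   and chk_upgrade_index = 0;

-- Literal cast to bigint: <=(bigint,bigint) matches the index,
-- so sync_id becomes part of the Index Cond (the slow plan,
-- same shape as the prepared statement with a bigint parameter).
explain delete from tmp_tk_test_index
 where sync_id <= 950000::bigint
   and line_id = 'the-test-value-fa529a621a15'
   and chk_upgrade_index = 0;
```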
Re: Different execution times with GIN index, prepared statement and literals.
Hello,
Thanks a lot for your clear answer.
I tried the following thing:
postgres=# prepare stmt(int, text, int) as delete from tmp_tk_test_index where sync_id <= $1 and line_id = $2 and chk_upgrade_index = $3;
PREPARE
postgres=# begin ;
BEGIN
postgres=*# explain (analyse) execute stmt(950000, 'the-test-value-fa529a621a15', 0);
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Delete on tmp_tk_test_index  (cost=21.36..25.38 rows=0 width=0) (actual time=0.148..0.149 rows=0 loops=1)
   ->  Bitmap Heap Scan on tmp_tk_test_index  (cost=21.36..25.38 rows=1 width=6) (actual time=0.146..0.147 rows=0 loops=1)
         Recheck Cond: ((line_id)::text = 'the-test-value-fa529a621a15'::text)
         Filter: ((sync_id <= 950000) AND (chk_upgrade_index = 0))
         Heap Blocks: exact=1
         ->  Bitmap Index Scan on idx_tmp_tk_test_index_1  (cost=0.00..21.36 rows=1 width=0) (actual time=0.099..0.099 rows=1 loops=1)
               Index Cond: ((line_id)::text = 'the-test-value-fa529a621a15'::text)
 Planning Time: 9.412 ms
 Execution Time: 1.570 ms
(9 rows)
postgres=*# rollback ;
ROLLBACK
So preparing the query with a parameter type different from the column type (int parameter vs bigint column) is faster in our case :/
That doesn't sound obvious to me :)
Thanks again for your answer Tomas.
--
Pierrick Chovelon
Consultant DBA PostgreSQL - Dalibo