Re: Poor man's partitioned index .... not being used? - Mailing list pgsql-performance

From Gunther
Subject Re: Poor man's partitioned index .... not being used?
Date
Msg-id 82475aad-cc70-4aab-d2ac-dd193c6ec4c5@gusw.net
Whole thread Raw
In response to Re: Poor man's partitioned index .... not being used?  (Andrew Gierth <andrew@tao11.riddles.org.uk>)
Responses Re: Poor man's partitioned index .... not being used?
List pgsql-performance

Thanks David Rowley and Andrew Gierth.

On 3/20/2019 23:46, Andrew Gierth wrote:
If you did  WHERE mod(id,2) = mod(8934,2) AND id = 8934

then the index would likely be used - because the prover can then treat
mod(id,2) as an atom (call it X), constant-fold mod(8934,2) to 0 because
mod() is immutable, and then observe that (X = 0) proves that (X = 0).
foo=# EXPLAIN SELECT * FROM Test WHERE mod(id,2) = mod(8934,2) AND id = 8934;                               QUERY PLAN
--------------------------------------------------------------------------Index Only Scan using test_pk0 on test  (cost=0.42..8.44 rows=1 width=4)  Index Cond: (id = 8934)
(2 rows)

Yes indeed! It's being used that way! Interesting. Only that we can't use it if id was a variable? Hmm ...

foo=# PREPARE testplan(int) AS 
foo-#    SELECT * FROM Test WHERE mod(id,2) = mod($1,2) AND id = $1;
PREPARE
foo=# EXPLAIN EXECUTE testplan(8934);                               QUERY PLAN
--------------------------------------------------------------------------Index Only Scan using test_pk0 on test  (cost=0.42..8.44 rows=1 width=4)  Index Cond: (id = 8934)
(2 rows)

That's quite alright actually. Now the questions is, could we use this in a nested loop query plan? That's where I think it can't work:

foo=# CREATE TABLE Test2 AS SELECT * FROM Test WHERE random() < 0.01 ORDER BY id DESC;
SELECT 3730
integrator=# ANALYZE Test2;
ANALYZE
foo=# EXPLAIN SELECT * FROM Test2 a LEFT OUTER JOIN Test b ON( mod(b.id,2) = mod(a.id,2) AND b.id = a.id) LIMIT 10;                                QUERY PLAN
-----------------------------------------------------------------------------Limit  (cost=110.25..135.67 rows=10 width=8)  ->  Hash Right Join  (cost=110.25..9591.02 rows=3730 width=8)        Hash Cond: ((mod(b.id, 2) = mod(a.id, 2)) AND (b.id = a.id))        ->  Seq Scan on test b  (cost=0.00..5599.29 rows=388129 width=4)        ->  Hash  (cost=54.30..54.30 rows=3730 width=4)              ->  Seq Scan on test2 a  (cost=0.00..54.30 rows=3730 width=4)
(6 rows)

foo=# SET enable_hashjoin TO off;
SET
foo=# EXPLAIN SELECT * FROM Test2 a LEFT OUTER JOIN Test b ON( mod(b.id,2) = mod(a.id,2) AND b.id = a.id) LIMIT 10;                                  QUERY PLAN
--------------------------------------------------------------------------------Limit  (cost=47214.73..47227.86 rows=10 width=8)  ->  Merge Right Join  (cost=47214.73..52113.16 rows=3730 width=8)        Merge Cond: (((mod(b.id, 2)) = (mod(a.id, 2))) AND (b.id = a.id))        ->  Sort  (cost=46939.15..47909.47 rows=388129 width=4)              Sort Key: (mod(b.id, 2)), b.id              ->  Seq Scan on test b  (cost=0.00..5599.29 rows=388129 width=4)        ->  Sort  (cost=275.58..284.91 rows=3730 width=4)              Sort Key: (mod(a.id, 2)), a.id              ->  Seq Scan on test2 a  (cost=0.00..54.30 rows=3730 width=4)
(9 rows)

foo=# SET enable_mergejoin TO off;
SET
foo=# EXPLAIN SELECT * FROM Test2 a LEFT OUTER JOIN Test b ON( mod(b.id,2) = mod(a.id,2) AND b.id = a.id) LIMIT 10;                                  QUERY PLAN
--------------------------------------------------------------------------------Limit  (cost=0.00..102516.78 rows=10 width=8)  ->  Nested Loop Left Join  (cost=0.00..38238760.24 rows=3730 width=8)        Join Filter: ((b.id = a.id) AND (mod(b.id, 2) = mod(a.id, 2)))        ->  Seq Scan on test2 a  (cost=0.00..54.30 rows=3730 width=4)        ->  Materialize  (cost=0.00..9056.93 rows=388129 width=4)              ->  Seq Scan on test b  (cost=0.00..5599.29 rows=388129 width=4)
(6 rows)

It looks like it doesn't want to evaluate the mod(a.id, 2) before it moves to the index query for the nested loop.

Notably the partitioned table approach should do that, but it has a different expression for the partition. No mod function but MODULUS and REMAINDER.

I wonder if there was a way of marking such expressions as safe in the query, like suggesting a certain evaluation order, i.e.,

SELECT * FROM Test2 a LEFT OUTER JOIN Test b ON(mod(b.id,2) = EVAL(mod(a.id,2)) AND b.id = a.id) LIMIT 10;

It's OK though. It just goes to show that in a case like this, it is best to just go with the partitioned table anyway.

regards,
-Gunther

pgsql-performance by date:

Previous
From: Andrew Gierth
Date:
Subject: Re: Poor man's partitioned index .... not being used?
Next
From: Gunther
Date:
Subject: EXPLAIN PLAN for DELETE CASCADE or DELETE not using pkey indexdespite EXPLAINing that it would?