Re: Poor man's partitioned index .... not being used? - Mailing list pgsql-performance
From | Gunther |
---|---|
Subject | Re: Poor man's partitioned index .... not being used? |
Date | |
Msg-id | 82475aad-cc70-4aab-d2ac-dd193c6ec4c5@gusw.net Whole thread Raw |
In response to | Re: Poor man's partitioned index .... not being used? (Andrew Gierth <andrew@tao11.riddles.org.uk>) |
Responses |
Re: Poor man's partitioned index .... not being used?
|
List | pgsql-performance |
Thanks David Rowley and Andrew Gierth.
If you did WHERE mod(id,2) = mod(8934,2) AND id = 8934 then the index would likely be used - because the prover can then treat mod(id,2) as an atom (call it X), constant-fold mod(8934,2) to 0 because mod() is immutable, and then observe that (X = 0) proves that (X = 0).
foo=# EXPLAIN SELECT * FROM Test WHERE mod(id,2) = mod(8934,2) AND id = 8934; QUERY PLAN --------------------------------------------------------------------------Index Only Scan using test_pk0 on test (cost=0.42..8.44 rows=1 width=4) Index Cond: (id = 8934) (2 rows)
Yes indeed! It's being used that way! Interesting. Only that we can't use it if id was a variable? Hmm ...
foo=# PREPARE testplan(int) AS foo-# SELECT * FROM Test WHERE mod(id,2) = mod($1,2) AND id = $1; PREPARE foo=# EXPLAIN EXECUTE testplan(8934); QUERY PLAN --------------------------------------------------------------------------Index Only Scan using test_pk0 on test (cost=0.42..8.44 rows=1 width=4) Index Cond: (id = 8934) (2 rows)
That's quite alright actually. Now the questions is, could we use this in a nested loop query plan? That's where I think it can't work:
foo=# CREATE TABLE Test2 AS SELECT * FROM Test WHERE random() < 0.01 ORDER BY id DESC; SELECT 3730 integrator=# ANALYZE Test2; ANALYZE foo=# EXPLAIN SELECT * FROM Test2 a LEFT OUTER JOIN Test b ON( mod(b.id,2) = mod(a.id,2) AND b.id = a.id) LIMIT 10; QUERY PLAN -----------------------------------------------------------------------------Limit (cost=110.25..135.67 rows=10 width=8) -> Hash Right Join (cost=110.25..9591.02 rows=3730 width=8) Hash Cond: ((mod(b.id, 2) = mod(a.id, 2)) AND (b.id = a.id)) -> Seq Scan on test b (cost=0.00..5599.29 rows=388129 width=4) -> Hash (cost=54.30..54.30 rows=3730 width=4) -> Seq Scan on test2 a (cost=0.00..54.30 rows=3730 width=4) (6 rows) foo=# SET enable_hashjoin TO off; SET foo=# EXPLAIN SELECT * FROM Test2 a LEFT OUTER JOIN Test b ON( mod(b.id,2) = mod(a.id,2) AND b.id = a.id) LIMIT 10; QUERY PLAN --------------------------------------------------------------------------------Limit (cost=47214.73..47227.86 rows=10 width=8) -> Merge Right Join (cost=47214.73..52113.16 rows=3730 width=8) Merge Cond: (((mod(b.id, 2)) = (mod(a.id, 2))) AND (b.id = a.id)) -> Sort (cost=46939.15..47909.47 rows=388129 width=4) Sort Key: (mod(b.id, 2)), b.id -> Seq Scan on test b (cost=0.00..5599.29 rows=388129 width=4) -> Sort (cost=275.58..284.91 rows=3730 width=4) Sort Key: (mod(a.id, 2)), a.id -> Seq Scan on test2 a (cost=0.00..54.30 rows=3730 width=4) (9 rows) foo=# SET enable_mergejoin TO off; SET foo=# EXPLAIN SELECT * FROM Test2 a LEFT OUTER JOIN Test b ON( mod(b.id,2) = mod(a.id,2) AND b.id = a.id) LIMIT 10; QUERY PLAN --------------------------------------------------------------------------------Limit (cost=0.00..102516.78 rows=10 width=8) -> Nested Loop Left Join (cost=0.00..38238760.24 rows=3730 width=8) Join Filter: ((b.id = a.id) AND (mod(b.id, 2) = mod(a.id, 2))) -> Seq Scan on test2 a (cost=0.00..54.30 rows=3730 width=4) -> Materialize (cost=0.00..9056.93 rows=388129 width=4) -> Seq Scan on test b (cost=0.00..5599.29 rows=388129 width=4) (6 rows)
It looks like it doesn't want to evaluate the mod(a.id, 2) before it moves to the index query for the nested loop.
Notably the partitioned table approach should do that, but it has a different expression for the partition. No mod function but MODULUS and REMAINDER.
I wonder if there was a way of marking such expressions as safe in the query, like suggesting a certain evaluation order, i.e.,
SELECT * FROM Test2 a LEFT OUTER JOIN Test b ON(mod(b.id,2) = EVAL(mod(a.id,2)) AND b.id = a.id) LIMIT 10;
It's OK though. It just goes to show that in a case like this, it is best to just go with the partitioned table anyway.
regards,
-Gunther
pgsql-performance by date: