Re: force partition pruning - Mailing list pgsql-general
From | Vijaykumar Jain |
---|---|
Subject | Re: force partition pruning |
Date | |
Msg-id | CAM+6J96R9i2zU0CrvHSwK+0LdAMeW=kQ0VnznRxh_9h-NO3tjg@mail.gmail.com Whole thread Raw |
In response to | Re: force partition pruning (Vijaykumar Jain <vijaykumarjain.github@gmail.com>) |
Responses |
SV: force partition pruning
|
List | pgsql-general |
During initialization of the query plan. Partition pruning can be performed here for parameter values which are known during the initialization phase of execution. Partitions which are pruned during this stage will not show up in the query's
EXPLAIN
orEXPLAIN ANALYZE
. It is possible to determine the number of partitions which were removed during this phase by observing the “Subplans Removed” property in theEXPLAIN
output.During actual execution of the query plan. Partition pruning may also be performed here to remove partitions using values which are only known during actual query execution. This includes values from subqueries and values from execution-time parameters such as those from parameterized nested loop joins. Since the value of these parameters may change many times during the execution of the query, partition pruning is performed whenever one of the execution parameters being used by partition pruning changes. Determining if partitions were pruned during this phase requires careful inspection of the
loops
property in theEXPLAIN ANALYZE
output. Subplans corresponding to different partitions may have different values for it depending on how many times each of them was pruned during execution. Some may be shown as(never executed)
if they were pruned every time.
I do not know how to put this in words,but see below when the predicate is explicitly applied to the main table with partition.postgres=# \d+ prt1Partitioned table "public.prt1"Column | Type | Collation | Nullable | Default | Storage | Stats target | Description--------+-------------------+-----------+----------+---------+----------+--------------+-------------a | integer | | not null | | plain | |b | integer | | | | plain | |c | character varying | | | | extended | |Partition key: RANGE (a)Partitions: prt1_p1 FOR VALUES FROM (0) TO (250),prt1_p2 FOR VALUES FROM (250) TO (500),prt1_p3 FOR VALUES FROM (500) TO (600)(failed reverse-i-search)`': ^Cpostgres=# \d+ bTable "public.b"Column | Type | Collation | Nullable | Default | Storage | Stats target | Description--------+---------+-----------+----------+---------+---------+--------------+-------------id | integer | | not null | | plain | |Indexes:"b_id_idx" btree (id)Access method: heappostgres=# table b;id-----200400(2 rows)-- basically if the table is joined and predicate can be applied to the outer table which has constraints matching,partition pruning takes place.I do not know the theory, or even what i did is correct, but just FYI.postgres=# explain analyze select prt1.* from prt1 where a in ( select id from b where id in (1, 100, 200) );QUERY PLAN-------------------------------------------------------------------------------------------------------------------------Hash Semi Join (cost=1.05..9.36 rows=2 width=13) (actual time=0.034..0.074 rows=1 loops=1)Hash Cond: (prt1.a = b.id)-> Append (cost=0.00..7.50 rows=300 width=13) (actual time=0.006..0.043 rows=300 loops=1)-> Seq Scan on prt1_p1 prt1_1 (cost=0.00..2.25 rows=125 width=13) (actual time=0.005..0.013 rows=125 loops=1)-> Seq Scan on prt1_p2 prt1_2 (cost=0.00..2.25 rows=125 width=13) (actual time=0.003..0.009 rows=125 loops=1)-> Seq Scan on prt1_p3 prt1_3 (cost=0.00..1.50 rows=50 width=13) (actual time=0.002..0.004 rows=50 loops=1)-> Hash (cost=1.03..1.03 rows=2 width=4) (actual time=0.005..0.005 rows=1 loops=1)Buckets: 1024 Batches: 1 Memory Usage: 9kB-> Seq Scan on b (cost=0.00..1.03 rows=2 width=4) (actual time=0.003..0.003 rows=1 loops=1)Filter: (id = ANY ('{1,100,200}'::integer[]))Rows Removed by Filter: 1Planning Time: 0.181 msExecution Time: 0.089 ms(13 rows)postgres=# explain analyze select prt1.* from prt1 where a in ( select id from b where b.id = prt1.a) and a in (1, 100, 200);QUERY PLAN---------------------------------------------------------------------------------------------------------------------------Index Scan using iprt1_p1_a on prt1_p1 prt1 (cost=0.14..14.03 rows=2 width=13) (actual time=0.024..0.025 rows=1 loops=1)Index Cond: (a = ANY ('{1,100,200}'::integer[]))Filter: (SubPlan 1)Rows Removed by Filter: 1SubPlan 1-> Seq Scan on b (cost=0.00..1.02 rows=1 width=4) (actual time=0.003..0.003 rows=0 loops=2)Filter: (id = prt1.a)Rows Removed by Filter: 1Planning Time: 0.120 msExecution Time: 0.041 ms(10 rows)postgres=# explain analyze select prt1.* from prt1 where exists ( select 1 from b where b.id = prt1.a) and a in (1, 100, 200);QUERY PLAN-------------------------------------------------------------------------------------------------------------Hash Semi Join (cost=1.04..3.79 rows=2 width=13) (actual time=0.024..0.028 rows=1 loops=1)Hash Cond: (prt1.a = b.id)-> Seq Scan on prt1_p1 prt1 (cost=0.00..2.72 rows=3 width=13) (actual time=0.011..0.017 rows=2 loops=1)Filter: (a = ANY ('{1,100,200}'::integer[]))Rows Removed by Filter: 123-> Hash (cost=1.02..1.02 rows=2 width=4) (actual time=0.004..0.004 rows=2 loops=1)Buckets: 1024 Batches: 1 Memory Usage: 9kB-> Seq Scan on b (cost=0.00..1.02 rows=2 width=4) (actual time=0.001..0.002 rows=2 loops=1)Planning Time: 0.192 msExecution Time: 0.043 ms(10 rows)postgres=# explain analyze select prt1.* from prt1 inner join b on prt1.a = b.id where a in (1, 100, 200);QUERY PLAN-------------------------------------------------------------------------------------------------------------Hash Join (cost=1.04..3.79 rows=2 width=13) (actual time=0.024..0.028 rows=1 loops=1)Hash Cond: (prt1.a = b.id)-> Seq Scan on prt1_p1 prt1 (cost=0.00..2.72 rows=3 width=13) (actual time=0.012..0.018 rows=2 loops=1)Filter: (a = ANY ('{1,100,200}'::integer[]))Rows Removed by Filter: 123-> Hash (cost=1.02..1.02 rows=2 width=4) (actual time=0.003..0.004 rows=2 loops=1)Buckets: 1024 Batches: 1 Memory Usage: 9kB-> Seq Scan on b (cost=0.00..1.02 rows=2 width=4) (actual time=0.001..0.002 rows=2 loops=1)Planning Time: 0.181 msExecution Time: 0.043 ms(10 rows)On Mon, 10 May 2021 at 17:09, Niels Jespersen <NJN@dst.dk> wrote:Hi all
I need a litte advice on how to
Postgres 13.2
A metadata query pulls partition keys:
select m.period_version from register.register_metadata m where m.current_version and m.period between '201712' and '201912' ;
A query using these in an in-list easily makes the planner do partition pruning.
select * from register.register d where d.period_version in ('201712_1', '201812_1', '201912_1');
However combining the metadataquery into the dataquery makes the planner decide to scan all partitions.
select * from register.register d where d.period_version in (select m.period_version from register.register_metadata m where m.current_version and m.period between '201712' and '201912');
I am quite aware that the latter query requires partition pruning to take place during execution not during planning.
My question here is how do I package the two-step proces into an interface that analysts can actually use?
One possibility is to have a prepare step that creates a temporary view with the hard-coded values built-in. And then query data via the temp view. This works ok, but there is an issue with possible naming conflicts on the temp view (not that this could not be worked around).
Ideally I would like a function to figure out the query and then return the data from that dynamically executed query. Complicating matters is the fact that there are more than one set of data/metatable tables and each datatable has a different set of columns. This excludes a table returning function since that must list the columns present.
Best regards
Niels Jespersen
Chief Adviser
IT Center
Mobile phone:+45 42 42 93 73
Email: njn@dst.dk
Statistics Denmark, Sejrøgade 11, DK-2100 Copenhagenwww.dst.dk/en | Twitter | LinkedIn | Facebook
--Thanks,VijayMumbai, India
Attachment
pgsql-general by date: