why memoize is not used for correlated subquery - Mailing list pgsql-hackers
From | Pavel Stehule |
---|---|
Subject | why memoize is not used for correlated subquery |
Date | |
Msg-id | CAFj8pRBRyML16xf4W-UJGg+O1OaH4WDR9BcwZhVzW0YLCD4XBA@mail.gmail.com Whole thread Raw |
Responses |
Re: why memoize is not used for correlated subquery
Re: why memoize is not used for correlated subquery |
List | pgsql-hackers |
Hi
I am playing with examples for P2D2, and I found few issues related to memoize
1. I use dataset https://pgsql.cz/files/obce.sql - it is data about czech population
Dictionary - "obec" -> "village", "pocet_muzu" -> "number_of_men", "pocet_zen" -> "number_of_woman", "okres" -> "district", "nazev" -> "name"
I wrote the query - biggest village per district
select nazev
from obce o
where pocet_muzu + pocet_zen = (select max(pocet_muzu + pocet_zen)
from obce
where o.okres_id = okres_id);
I expected usage of memoize, because in this query, it can be very effective https://explain.depesz.com/s/0ubC
(2024-05-28 09:09:58) postgres=# explain select nazev from obce o where pocet_muzu + pocet_zen = (select max(pocet_muzu + pocet_zen) from obce where o.okres_id = okres_id);
QUERY PLAN
══════════════════════════════════════════════════════════════════════════════════════════════════════════
Seq Scan on obce o (cost=0.00..33588.33 rows=31 width=10)
Filter: ((pocet_muzu + pocet_zen) = (SubPlan 2))
SubPlan 2
-> Result (cost=5.34..5.35 rows=1 width=4)
InitPlan 1
-> Limit (cost=0.28..5.34 rows=1 width=4)
-> Index Scan Backward using obce_expr_idx on obce (cost=0.28..409.92 rows=81 width=4)
Index Cond: ((pocet_muzu + pocet_zen) IS NOT NULL)
Filter: ((o.okres_id)::text = (okres_id)::text)
(9 rows)
QUERY PLAN
══════════════════════════════════════════════════════════════════════════════════════════════════════════
Seq Scan on obce o (cost=0.00..33588.33 rows=31 width=10)
Filter: ((pocet_muzu + pocet_zen) = (SubPlan 2))
SubPlan 2
-> Result (cost=5.34..5.35 rows=1 width=4)
InitPlan 1
-> Limit (cost=0.28..5.34 rows=1 width=4)
-> Index Scan Backward using obce_expr_idx on obce (cost=0.28..409.92 rows=81 width=4)
Index Cond: ((pocet_muzu + pocet_zen) IS NOT NULL)
Filter: ((o.okres_id)::text = (okres_id)::text)
(9 rows)
But it doesn't do. I rewrote this query to lateral join, and memoize was used, but the result was not good, because filter wa pushed to subquery
explain select * from obce o, lateral (select max(pocet_zen + pocet_muzu) from obce where o.okres_id = okres_id) where pocet_zen + pocet_muzu = max;
QUERY PLAN
══════════════════════════════════════════════════════════════════════════════════════════════════════
Nested Loop (cost=12.83..19089.82 rows=31 width=45)
-> Seq Scan on obce o (cost=0.00..121.50 rows=6250 width=41)
-> Memoize (cost=12.83..12.85 rows=1 width=4)
Cache Key: (o.pocet_zen + o.pocet_muzu), o.okres_id
Cache Mode: binary
-> Subquery Scan on unnamed_subquery (cost=12.82..12.84 rows=1 width=4)
Filter: ((o.pocet_zen + o.pocet_muzu) = unnamed_subquery.max)
-> Aggregate (cost=12.82..12.83 rows=1 width=4)
-> Index Scan using obce_okres_id_idx on obce (cost=0.28..12.41 rows=81 width=8)
Index Cond: ((okres_id)::text = (o.okres_id)::text)
(10 rows)
QUERY PLAN
══════════════════════════════════════════════════════════════════════════════════════════════════════
Nested Loop (cost=12.83..19089.82 rows=31 width=45)
-> Seq Scan on obce o (cost=0.00..121.50 rows=6250 width=41)
-> Memoize (cost=12.83..12.85 rows=1 width=4)
Cache Key: (o.pocet_zen + o.pocet_muzu), o.okres_id
Cache Mode: binary
-> Subquery Scan on unnamed_subquery (cost=12.82..12.84 rows=1 width=4)
Filter: ((o.pocet_zen + o.pocet_muzu) = unnamed_subquery.max)
-> Aggregate (cost=12.82..12.83 rows=1 width=4)
-> Index Scan using obce_okres_id_idx on obce (cost=0.28..12.41 rows=81 width=8)
Index Cond: ((okres_id)::text = (o.okres_id)::text)
(10 rows)
and then the effect of memoize is almost negative https://explain.depesz.com/s/TKLL
When I used optimization fence, then memoize was used effectively https://explain.depesz.com/s/hhgi
explain select * from (select * from obce o, lateral (select max(pocet_zen + pocet_muzu) from obce where o.okres_id = okres_id) offset 0) where pocet_zen + pocet_muzu = max;
QUERY PLAN
══════════════════════════════════════════════════════════════════════════════════════════════════════
Subquery Scan on unnamed_subquery (cost=12.83..1371.93 rows=31 width=45)
Filter: ((unnamed_subquery.pocet_zen + unnamed_subquery.pocet_muzu) = unnamed_subquery.max)
-> Nested Loop (cost=12.83..1278.18 rows=6250 width=45)
-> Seq Scan on obce o (cost=0.00..121.50 rows=6250 width=41)
-> Memoize (cost=12.83..12.84 rows=1 width=4)
Cache Key: o.okres_id
Cache Mode: binary
-> Aggregate (cost=12.82..12.83 rows=1 width=4)
-> Index Scan using obce_okres_id_idx on obce (cost=0.28..12.41 rows=81 width=8)
Index Cond: ((okres_id)::text = (o.okres_id)::text)
(10 rows)
QUERY PLAN
══════════════════════════════════════════════════════════════════════════════════════════════════════
Subquery Scan on unnamed_subquery (cost=12.83..1371.93 rows=31 width=45)
Filter: ((unnamed_subquery.pocet_zen + unnamed_subquery.pocet_muzu) = unnamed_subquery.max)
-> Nested Loop (cost=12.83..1278.18 rows=6250 width=45)
-> Seq Scan on obce o (cost=0.00..121.50 rows=6250 width=41)
-> Memoize (cost=12.83..12.84 rows=1 width=4)
Cache Key: o.okres_id
Cache Mode: binary
-> Aggregate (cost=12.82..12.83 rows=1 width=4)
-> Index Scan using obce_okres_id_idx on obce (cost=0.28..12.41 rows=81 width=8)
Index Cond: ((okres_id)::text = (o.okres_id)::text)
(10 rows)
My question is - does memoize support subqueries? And can be enhanced to support this exercise without LATERAL and optimization fences?
Regards
Pavel
pgsql-hackers by date: