Home > mailing lists

why memoize is not used for correlated subquery - Mailing list pgsql-hackers

From	Pavel Stehule
Subject	why memoize is not used for correlated subquery
Date	May 28, 2024 07:31:03
Msg-id	CAFj8pRBRyML16xf4W-UJGg+O1OaH4WDR9BcwZhVzW0YLCD4XBA@mail.gmail.com Whole thread
Responses	Re: why memoize is not used for correlated subquery Re: why memoize is not used for correlated subquery
List	pgsql-hackers

Tree view

I am playing with examples for P2D2, and I found few issues related to memoize

1. I use dataset https://pgsql.cz/files/obce.sql - it is data about czech population

Dictionary - "obec" -> "village", "pocet_muzu" -> "number_of_men", "pocet_zen" -> "number_of_woman", "okres" -> "district", "nazev" -> "name"

I wrote the query - biggest village per district

select nazev

from obce o

where pocet_muzu + pocet_zen = (select max(pocet_muzu + pocet_zen)

from obce

where o.okres_id = okres_id);

I expected usage of memoize, because in this query, it can be very effective https://explain.depesz.com/s/0ubC

(2024-05-28 09:09:58) postgres=# explain select nazev from obce o where pocet_muzu + pocet_zen = (select max(pocet_muzu + pocet_zen) from obce where o.okres_id = okres_id);
QUERY PLAN
══════════════════════════════════════════════════════════════════════════════════════════════════════════
Seq Scan on obce o (cost=0.00..33588.33 rows=31 width=10)
Filter: ((pocet_muzu + pocet_zen) = (SubPlan 2))
SubPlan 2
-> Result (cost=5.34..5.35 rows=1 width=4)
InitPlan 1
-> Limit (cost=0.28..5.34 rows=1 width=4)
-> Index Scan Backward using obce_expr_idx on obce (cost=0.28..409.92 rows=81 width=4)
Index Cond: ((pocet_muzu + pocet_zen) IS NOT NULL)
Filter: ((o.okres_id)::text = (okres_id)::text)
(9 rows)

But it doesn't do. I rewrote this query to lateral join, and memoize was used, but the result was not good, because filter wa pushed to subquery

explain select * from obce o, lateral (select max(pocet_zen + pocet_muzu) from obce where o.okres_id = okres_id) where pocet_zen + pocet_muzu = max;
QUERY PLAN
══════════════════════════════════════════════════════════════════════════════════════════════════════
Nested Loop (cost=12.83..19089.82 rows=31 width=45)
-> Seq Scan on obce o (cost=0.00..121.50 rows=6250 width=41)
-> Memoize (cost=12.83..12.85 rows=1 width=4)
Cache Key: (o.pocet_zen + o.pocet_muzu), o.okres_id
Cache Mode: binary
-> Subquery Scan on unnamed_subquery (cost=12.82..12.84 rows=1 width=4)
Filter: ((o.pocet_zen + o.pocet_muzu) = unnamed_subquery.max)
-> Aggregate (cost=12.82..12.83 rows=1 width=4)
-> Index Scan using obce_okres_id_idx on obce (cost=0.28..12.41 rows=81 width=8)
Index Cond: ((okres_id)::text = (o.okres_id)::text)
(10 rows)

and then the effect of memoize is almost negative https://explain.depesz.com/s/TKLL

When I used optimization fence, then memoize was used effectively https://explain.depesz.com/s/hhgi

explain select * from (select * from obce o, lateral (select max(pocet_zen + pocet_muzu) from obce where o.okres_id = okres_id) offset 0) where pocet_zen + pocet_muzu = max;
QUERY PLAN
══════════════════════════════════════════════════════════════════════════════════════════════════════
Subquery Scan on unnamed_subquery (cost=12.83..1371.93 rows=31 width=45)
Filter: ((unnamed_subquery.pocet_zen + unnamed_subquery.pocet_muzu) = unnamed_subquery.max)
-> Nested Loop (cost=12.83..1278.18 rows=6250 width=45)
-> Seq Scan on obce o (cost=0.00..121.50 rows=6250 width=41)
-> Memoize (cost=12.83..12.84 rows=1 width=4)
Cache Key: o.okres_id
Cache Mode: binary
-> Aggregate (cost=12.82..12.83 rows=1 width=4)
-> Index Scan using obce_okres_id_idx on obce (cost=0.28..12.41 rows=81 width=8)
Index Cond: ((okres_id)::text = (o.okres_id)::text)
(10 rows)

My question is - does memoize support subqueries? And can be enhanced to support this exercise without LATERAL and optimization fences?

Regards

Pavel

pgsql-hackers by date:

From: Pradeep Kumar
Date: 28 May 2024, 07:10:47
Subject: Re: Need clarification on compilation errors in PG 16.2

From: Tender Wang
Date: 28 May 2024, 07:46:21
Subject: Re: why memoize is not used for correlated subquery

why memoize is not used for correlated subquery - Mailing list pgsql-hackers

Previous

Next