I ran into a query that shows a performance regression related to the
Memoize node.
create table t (a int, b int, c int);
insert into t select i%3, i, i from generate_series(1,500)i;
analyze t;
explain (analyze, costs off, timing off)
select * from t t1 join lateral
(select t2.a, t2.b, t1.a x from t t2, t t3 offset 0) s
on s.a = t1.a;
with enable_memoize set to on:
Planning Time: 2.470 ms
Execution Time: 98869.240 ms
with enable_memoize set to off:
Planning Time: 1.791 ms
Execution Time: 55754.080 ms
This shows a 77.3% performance regression with Memoize enabled.
The stats of the Memoize node show some clues:
-> Memoize (actual rows=83334.00 loops=500)
Cache Key: t1.a
Cache Mode: binary
      Hits: 0  Misses: 500  Evictions: 498  Overflows: 0  Memory Usage: 8193kB
There are zero cache hits, and a very large number of evictions (498
evictions for 500 misses).
So I suspect that, while the Memoize cache is being filled, its memory
usage exceeds the specified limit, causing cache entries to be evicted
before they can ever be reused.  With each entry holding roughly 83334
rows, only a fraction of the three distinct key values can be cached at
once, so the cache keeps thrashing.
While cost_memoize_rescan() does account for the eviction ratio when
estimating the cost of Memoize, the estimate does not seem to be
accurate enough in this case to prevent the planner from choosing a
Memoize node.
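For reference, here is a back-of-the-envelope model of the hit/evict
ratio estimate, based on my reading of cost_memoize_rescan() in
costsize.c (names paraphrased; treat this as a sketch, not the actual
implementation):

```python
# Sketch of the hit/evict ratio estimate in cost_memoize_rescan(),
# paraphrased from costsize.c as I recall it.
def memoize_ratios(calls, ndistinct, est_cache_entries):
    evict_ratio = 1.0 - min(est_cache_entries, ndistinct) / ndistinct
    hit_ratio = ((calls - ndistinct) / calls) * (
        est_cache_entries / max(ndistinct, est_cache_entries))
    return evict_ratio, max(0.0, hit_ratio)

# If the planner believes all 3 distinct keys fit in the cache, it
# predicts ~99.4% hits and no evictions, while the actual hit rate is 0:
print(memoize_ratios(calls=500, ndistinct=3, est_cache_entries=3))
# Even if it knew only one oversized entry fits at a time, the formula
# still predicts ~33% hits:
print(memoize_ratios(calls=500, ndistinct=3, est_cache_entries=1))
```

So even a correct est_cache_entries may not be enough to make Memoize
lose here, which seems consistent with the estimate being "not accurate
enough" in this case.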
Any thoughts on how we might improve this?
Thanks
Richard