Re: A performance issue with Memoize - Mailing list pgsql-hackers

From Richard Guo
Subject Re: A performance issue with Memoize
Date
Msg-id CAMbWs48ePZupHo01wAek9Qv+naf0ZY+CUF0fHW1OHXrXL_fMzQ@mail.gmail.com
Whole thread Raw
In response to Re: A performance issue with Memoize  (Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>)
List pgsql-hackers

On Thu, Feb 1, 2024 at 3:43 PM Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> wrote:
Hi,

I've seen a similar issue with the following query (tested on the current head):

EXPLAIN ANALYZE SELECT * FROM tenk1 t1
LEFT JOIN LATERAL (SELECT t1.two, tenk2.hundred, tenk2.two FROM tenk2) t2
ON t1.hundred = t2.hundred WHERE t1.hundred < 5;
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=8.46..1495.10 rows=50000 width=256)
(actual time=0.860..111.013 rows=50000 loops=1)
   ->  Bitmap Heap Scan on tenk1 t1  (cost=8.16..376.77 rows=500
width=244) (actual time=0.798..1.418 rows=500 loops=1)
         Recheck Cond: (hundred < 5)
         Heap Blocks: exact=263
         ->  Bitmap Index Scan on tenk1_hundred  (cost=0.00..8.04
rows=500 width=0) (actual time=0.230..0.230 rows=500 loops=1)
               Index Cond: (hundred < 5)
   ->  Memoize  (cost=0.30..4.89 rows=100 width=12) (actual
time=0.009..0.180 rows=100 loops=500)
         Cache Key: t1.hundred
         Cache Mode: logical
         Hits: 0  Misses: 500  Evictions: 499  Overflows: 0  Memory Usage: 5kB
         ->  Index Scan using tenk2_hundred on tenk2  (cost=0.29..4.88
rows=100 width=12) (actual time=0.007..0.124 rows=100 loops=500)
               Index Cond: (hundred = t1.hundred)
 Planning Time: 0.661 ms
 Execution Time: 113.076 ms
(14 rows)

The memoize's cache key only uses t1.hundred while the nested loop has
two changed parameters: the lateral var t1.two and t1.hundred. This
leads to a chgParam that is always different and the cache is purged
on each rescan.

Thanks for the report!  This issue is caused by that we fail to check
PHVs for lateral references, and hence cannot know that t1.two should
also be included in the cache keys.

I reported exactly the same issue in [1], and verified that it can be
fixed by the patch in [2] (which seems to require a rebase).

[1] https://www.postgresql.org/message-id/CAMbWs4_imG5C8rXt7xdU7zf6whUDc2rdDun%2BVtrowcmxb41CzA%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAMbWs49%2BCjoy0S0xkCRDcHXGHvsYLOdvr9jq9OTONOBnsgzXOg%40mail.gmail.com

Thanks
Richard

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Fix bugs not to discard statistics when changing stats_fetch_consistency
Next
From: David Rowley
Date:
Subject: Re: set_cheapest without checking pathlist