Re: Bug in row_number() optimization - Mailing list pgsql-hackers

From Richard Guo
Subject Re: Bug in row_number() optimization
Date
Msg-id CAMbWs48DLWK5dkA6ZS0Z42-b7Hzu7-CT1_hBmB7YyeAY6Em+_w@mail.gmail.com
Whole thread Raw
In response to Re: Bug in row_number() optimization  (Richard Guo <guofenglinux@gmail.com>)
Responses Re: Bug in row_number() optimization
List pgsql-hackers

On Tue, Nov 22, 2022 at 3:44 PM Richard Guo <guofenglinux@gmail.com> wrote:
On Wed, Nov 16, 2022 at 7:38 AM Sergey Shinderuk <s.shinderuk@postgrespro.ru> wrote:
The failing query is:
SELECT * FROM
   (SELECT *,
           count(salary) OVER (PARTITION BY depname || '') c1, -- w1
           row_number() OVER (PARTITION BY depname) rn, -- w2
           count(*) OVER (PARTITION BY depname) c2, -- w2
           count(*) OVER (PARTITION BY '' || depname) c3 -- w3
    FROM empsalary
) e WHERE rn <= 1 AND c1 <= 3;
As far as I understand, ExecWindowAgg for the intermediate WindowAgg
node switches into pass-through mode, stops evaluating row_number(), and
returns the previous value instead. But if int8 is passed by reference,
the previous value stored in econtext->ecxt_aggvalues becomes a dangling
pointer when the per-output-tuple memory context is reset.
 
Yeah, you're right.  In this example the window function row_number()
goes into pass-through mode after the second evaluation because its
run condition does not hold true any more.  The remaining run would just
return the result from the second evaluation, which is stored in
econtext->ecxt_aggvalues[wfuncno].

If int8 is configured as pass-by-ref, the precomputed value from the
second evaluation is actually located in a memory area from context
ecxt_per_tuple_memory, with its pointer stored in ecxt_aggvalues.  As
this memory context is reset once per tuple, we would be prone to wrong
results.
 
Regarding how to fix this problem, firstly I believe we need to evaluate
window functions in the per-tuple memory context, as the HEAD does.
When we decide we need to go into pass-through mode, I'm thinking that
we can just copy out the results of the last evaluation to the per-query
memory context, while still storing their pointers in ecxt_aggvalues.

Does this idea work?

Thanks
Richard

pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: Collation version tracking for macOS
Next
From: Andrey Chudnovsky
Date:
Subject: Re: [PoC] Federated Authn/z with OAUTHBEARER