Re: Optimize WindowAgg's use of tuplestores - Mailing list pgsql-hackers

From David Rowley
Subject Re: Optimize WindowAgg's use of tuplestores
Date
Msg-id CAApHDvpJyvaKpye+5CzE8=22zJ5UhEynCo-KY8-gteSHctMvyA@mail.gmail.com
Whole thread Raw
In response to Re: Optimize WindowAgg's use of tuplestores  (Andy Fan <zhihuifan1213@163.com>)
List pgsql-hackers
On Thu, 18 Jul 2024 at 19:56, Andy Fan <zhihuifan1213@163.com> wrote:
> I'm curious about why a 'unlikey' change can cause noticeable change,
> especially there is just one function call in the 'if-statement' (I am
> thinking more instrucments in the if-statement body, more changes it can
> cause).
>
> +       if (unlikely(winstate->buffer == NULL))
> +               prepare_tuplestore(winstate);

This isn't the branch being discussed.  We've not talked about whether
the above one is useful or not. The branch we were discussing is the
"if (winstate->all_first)", of which has a large number of
instructions inside it.

However, for this one, my understanding of this particular case is,
it's a very predictable branch as, even if the first prediction gets
it wrong the first time, it's not going to be wrong for long as the
condition is false for all subsequent calls. So, from there, if you
assume that the instruction decoder is always going to fetch the
correct instructions according to the branch predictor's correct
prediction (the ones for branch not taken), then the only benefit
possible is that the next instructions to execute are the next
instructions in the code rather than instructions located far away,
possibly on another cacheline that needs to be loaded from RAM or
higher cache levels.  Adding the unlikely() should coax the compiler
into laying out the code so the branch's instructions at the end of
the function and that, in theory, should reduce the likelihood of
frontend stalls waiting for cachelines for further instructions to
arrive from RAM or higher cache levels. That's my theory, at least.  I
expected to see perf stat show me a lower "stalled-cycles-frontend"
with the v2 patch than v1, but didn't see that when I looked, so my
theory might be wrong.

For the case you're mentioning above, calling the function isn't very
many instructions, so the benefits are likely very small with a branch
this predictable.

David



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Next
From: Joe Conway
Date:
Subject: Re: CI, macports, darwin version problems