Thread: BUG #16968: Planner does not recognize optimization

BUG #16968: Planner does not recognize optimization

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      16968
Logged by:          Eugen Konkov
Email address:      kes-kes@yandex.ru
PostgreSQL version: 13.1
Operating system:   Linux Mint 19.3
Description:

TLDR;
If I refer to same column by different ways planner may or may not recognize
optimization

select * from order_total_suma() ots where agreement_id = 3943;
-- fast
select * from order_total_suma() ots where (ots.o).agreement_id = 3943;  --
slow

Where `order_total_suma` is sql function:

        SELECT
         sum( ocd.item_suma     ) OVER( PARTITION BY (ocd.o).agreement_id
       ) AS agreement_suma,
         sum( ocd.item_suma     ) OVER( PARTITION BY (ocd.o).agreement_id,
(ocd.o).id                           ) AS order_suma,
         sum( ocd.item_cost     ) OVER( PARTITION BY (ocd.o).agreement_id,
(ocd.o).id, (ocd.ic).consumed_period ) AS group_cost,
         sum( ocd.item_suma     ) OVER( PARTITION BY (ocd.o).agreement_id,
(ocd.o).id, (ocd.ic).consumed_period ) AS group_suma,
         max( (ocd.ic).consumed ) OVER( PARTITION BY (ocd.o).agreement_id,
(ocd.o).id, (ocd.ic).consumed_period ) AS consumed,
         ocd.item_qty, ocd.item_price, ocd.item_cost, ocd.item_suma,
         ocd.o, ocd.c, ocd.p, ocd.ic,
         (ocd.o).id as order_id,
         (ocd.o).agreement_id as agreement_id
        FROM order_cost_details( _target_range ) ocd

Problem is window function, because ID can not go through. But this occur
not always.
When I filter by field I partition result by then optimization occur 
BUT only when I create an alias for this field and do filtering via this
alias.

Expected: apply optimization not only when I do `WHERE agreement_id = XXX`

but and for `WHERE (ots.o).agreement_id = XXX`

Thank you.


Re: BUG #16968: Planner does not recognize optimization

From
Eugen Konkov
Date:
Now I attarch plans for both queries.

tucha=> \out f2
tucha=>  EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON) select * from order_total_suma() ots where agreement_id = 3943;
tucha=>  EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON) select * from order_total_suma() ots where (ots.o).agreement_id = 3943;
tucha=>  EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) select * from order_total_suma() ots where agreement_id = 3943;
tucha=>  EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) select * from order_total_suma() ots where (ots.o).agreement_id = 3943;



Friday, April 16, 2021, 10:18:45 PM, you wrote:

> The following bug has been logged on the website:

> Bug reference:      16968
> Logged by:          Eugen Konkov
> Email address:      
kes-kes@yandex.ru
> PostgreSQL version: 13.1
> Operating system:   Linux Mint 19.3
> Description:        

> TLDR;
> If I refer to same column by different ways planner may or may not recognize
> optimization

> select * from order_total_suma() ots where agreement_id = 3943;    
> -- fast
> select * from order_total_suma() ots where (ots.o).agreement_id = 3943;  --
> slow

> Where `order_total_suma` is sql function:

>                 SELECT
>                  sum( ocd.item_suma     ) OVER( PARTITION BY (ocd.o).agreement_id
>        ) AS agreement_suma,
>                  sum( ocd.item_suma     ) OVER( PARTITION BY (ocd.o).agreement_id,
> (ocd.o).id                           ) AS order_suma,
>                  sum( ocd.item_cost     ) OVER( PARTITION BY (ocd.o).agreement_id,
> (ocd.o).id, (ocd.ic).consumed_period ) AS group_cost,
>                  sum( ocd.item_suma     ) OVER( PARTITION BY (ocd.o).agreement_id,
> (ocd.o).id, (ocd.ic).consumed_period ) AS group_suma,
>                  max( (ocd.ic).consumed ) OVER( PARTITION BY (ocd.o).agreement_id,
> (ocd.o).id, (ocd.ic).consumed_period ) AS consumed,
>                  ocd.item_qty, ocd.item_price, ocd.item_cost, ocd.item_suma,
>                  ocd.o, ocd.c, ocd.p, ocd.ic,
>                  (ocd.o).id as order_id,
>                  (ocd.o).agreement_id as agreement_id
>                 FROM order_cost_details( _target_range ) ocd

> Problem is window function, because ID can not go through. But this occur
> not always.
> When I filter by field I partition result by then optimization occur
> BUT only when I create an alias for this field and do filtering via this
> alias.

> Expected: apply optimization not only when I do `WHERE agreement_id = XXX`

> but and for `WHERE (ots.o).agreement_id = XXX`

> Thank you.




--
Best regards,
Eugen Konkov
Attachment

Re: BUG #16968: Planner does not recognize optimization

From
Eugen Konkov
Date:
Now I create minimal reproducible test case.
https://dbfiddle.uk/?rdbms=postgres_13&fiddle=761a00fb599789d3db31b120851d6341

Optimization is not applyed when I filter/partition by column using composite type name.

Looking at this comparison table, we can see that optimization work only when I refer to column using alias:

    (t.ag).ag_id as agreement_id  -- making an alias

PARTITION | FILTER | IS USED?
------------------------------
ALIAS     | ORIG   | NO
ALIAS     | ALIAS  | YES
ORIG      | ALIAS  | NO
ORIG      | ORIG   | NO

link to original problem with EXPLAIN ANALYZE:
https://stackoverflow.com/q/67492673/4632019
Links to similar problems:
https://stackoverflow.com/a/26237464/4632019
https://stackoverflow.com/q/65780112/4632019


Friday, April 16, 2021, 10:27:35 PM, you wrote:


Now I attarch plans for both queries.

tucha=> \out f2
tucha=>  EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON) select * from order_total_suma() ots where agreement_id = 3943;
tucha=>  EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON) select * from order_total_suma() ots where (ots.o).agreement_id = 3943;
tucha=>  EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) select * from order_total_suma() ots where agreement_id = 3943;
tucha=>  EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) select * from order_total_suma() ots where (ots.o).agreement_id = 3943;



Friday, April 16, 2021, 10:18:45 PM, you wrote:

> The following bug has been logged on the website:

> Bug reference:      16968
> Logged by:          Eugen Konkov
> Email address:      
kes-kes@yandex.ru
> PostgreSQL version: 13.1
> Operating system:   Linux Mint 19.3
> Description:        

> TLDR;
> If I refer to same column by different ways planner may or may not recognize
> optimization

> select * from order_total_suma() ots where agreement_id = 3943;    
> -- fast
> select * from order_total_suma() ots where (ots.o).agreement_id = 3943;  --
> slow

> Where `order_total_suma` is sql function:

>                 SELECT
>                  sum( ocd.item_suma     ) OVER( PARTITION BY (ocd.o).agreement_id
>        ) AS agreement_suma,
>                  sum( ocd.item_suma     ) OVER( PARTITION BY (ocd.o).agreement_id,
> (ocd.o).id                           ) AS order_suma,
>                  sum( ocd.item_cost     ) OVER( PARTITION BY (ocd.o).agreement_id,
> (ocd.o).id, (ocd.ic).consumed_period ) AS group_cost,
>                  sum( ocd.item_suma     ) OVER( PARTITION BY (ocd.o).agreement_id,
> (ocd.o).id, (ocd.ic).consumed_period ) AS group_suma,
>                  max( (ocd.ic).consumed ) OVER( PARTITION BY (ocd.o).agreement_id,
> (ocd.o).id, (ocd.ic).consumed_period ) AS consumed,
>                  ocd.item_qty, ocd.item_price, ocd.item_cost, ocd.item_suma,
>                  ocd.o, ocd.c, ocd.p, ocd.ic,
>                  (ocd.o).id as order_id,
>                  (ocd.o).agreement_id as agreement_id
>                 FROM order_cost_details( _target_range ) ocd

> Problem is window function, because ID can not go through. But this occur
> not always.
> When I filter by field I partition result by then optimization occur
> BUT only when I create an alias for this field and do filtering via this
> alias.

> Expected: apply optimization not only when I do `WHERE agreement_id = XXX`

> but and for `WHERE (ots.o).agreement_id = XXX`

> Thank you.




--
Best regards,
Eugen Konkov



--
Best regards,
Eugen Konkov

Re: BUG #16968: Planner does not recognize optimization

From
David Rowley
Date:
On Fri, 14 May 2021 at 02:38, Eugen Konkov <kes-kes@yandex.ru> wrote:
> Now I create minimal reproducible test case.
> https://dbfiddle.uk/?rdbms=postgres_13&fiddle=761a00fb599789d3db31b120851d6341
>
> Optimization is not applyed when I filter/partition by column using composite type name.

You probably already know this part, but let me explain it just in
case it's not clear.

The pushdown of the qual from the top-level query into the subquery,
or function, in this case, is only legal when the qual references a
column that's in the PARTITION BY clause of all window functions in
the subquery.  The reason for this is, if we filter rows before
calling the window function, then it could affect which rows are in
see in the window's frame. If it did filter, that could cause
incorrect results.  We can relax the restriction a bit if we can
eliminate entire partitions at once. The window function results are
independent between partitions, so we can allow qual pushdowns that
are in all PARTITION BY clauses.

As for the reason you're having trouble getting this to work, it's
down to the way you're using whole-row vars in your targetlist.

A slightly simplified case which shows this problem is:

create table ab(a int, b int);
explain select * from (select ab as wholerowvar,row_number() over
(partition by a) from ab) ab where (ab.wholerowvar).a=1;

The reason it does not work is down to how this is implemented
internally.   The details are, transformGroupClause() not assigning a
ressortgroupref to the whole-row var.  It's unable to because there is
no way to track which actual column within the whole row var is in the
partition by clause.  When it comes to the code that tries to push the
qual down into the subquery, check_output_expressions() checks if the
column in the subquery is ok to accept push downs or not. One of the
checks is to see if the query has windowing functions and to ensure
that the column is in all the PARTITION BY clauses of each windowing
function.  That check is done by checking if a ressortgroupref is
assigned and matches a tleSortGroupRef in the PARTITION BY clause.  In
this case, it does not match.  We didn't assign any ressortgroupref to
the whole-row var.

Unfortunately, whole-row vars are a bit to 2nd class citizen when it
comes to the query planner. Also, it would be quite a bit of effort to
make the planner push down the qual in this case. We'd need some sort
of ability to assign ressortgroupref to a particular column within a
whole-row var and we'd need to adjust the code to check for that when
doing subquery pushdowns to allow it to mention which columns within
whole-row vars can legally accept pushdowns.  I imagine that's
unlikely to be fixed any time soon.  Whole-row vars just don't seem to
be used commonly enough to warrant going to the effort of making this
stuff work.

To work around this, you should include a reference to the actual
column in the targetlist of the subquery, or your function, in this
case, and ensure you use that same column in the PARTITION BY clause.
You'll then need to write that column in your condition that you need
pushed into the subquery. I'm sorry if that messes up your design.
However, I imagine this is not the only optimisation that you'll miss
out on by doing things the way you are.

David



Re: BUG #16968: Planner does not recognize optimization

From
David Rowley
Date:
On Fri, 14 May 2021 at 02:38, Eugen Konkov <kes-kes@yandex.ru> wrote:
> Now I create minimal reproducible test case.
> https://dbfiddle.uk/?rdbms=postgres_13&fiddle=761a00fb599789d3db31b120851d6341
>
> Optimization is not applyed when I filter/partition by column using composite type name.

You probably already know this part, but let me explain it just in
case it's not clear.

The pushdown of the qual from the top-level query into the subquery,
or function, in this case, is only legal when the qual references a
column that's in the PARTITION BY clause of all window functions in
the subquery.  The reason for this is, if we filter rows before
calling the window function, then it could affect which rows are in
see in the window's frame. If it did filter, that could cause
incorrect results.  We can relax the restriction a bit if we can
eliminate entire partitions at once. The window function results are
independent between partitions, so we can allow qual pushdowns that
are in all PARTITION BY clauses.

As for the reason you're having trouble getting this to work, it's
down to the way you're using whole-row vars in your targetlist.

A slightly simplified case which shows this problem is:

create table ab(a int, b int);
explain select * from (select ab as wholerowvar,row_number() over
(partition by a) from ab) ab where (ab.wholerowvar).a=1;

The reason it does not work is down to how this is implemented
internally.   The details are, transformGroupClause() not assigning a
ressortgroupref to the whole-row var.  It's unable to because there is
no way to track which actual column within the whole row var is in the
partition by clause.  When it comes to the code that tries to push the
qual down into the subquery, check_output_expressions() checks if the
column in the subquery is ok to accept push downs or not. One of the
checks is to see if the query has windowing functions and to ensure
that the column is in all the PARTITION BY clauses of each windowing
function.  That check is done by checking if a ressortgroupref is
assigned and matches a tleSortGroupRef in the PARTITION BY clause.  In
this case, it does not match.  We didn't assign any ressortgroupref to
the whole-row var.

Unfortunately, whole-row vars are a bit to 2nd class citizen when it
comes to the query planner. Also, it would be quite a bit of effort to
make the planner push down the qual in this case. We'd need some sort
of ability to assign ressortgroupref to a particular column within a
whole-row var and we'd need to adjust the code to check for that when
doing subquery pushdowns to allow it to mention which columns within
whole-row vars can legally accept pushdowns.  I imagine that's
unlikely to be fixed any time soon.  Whole-row vars just don't seem to
be used commonly enough to warrant going to the effort of making this
stuff work.

To work around this, you should include a reference to the actual
column in the targetlist of the subquery, or your function, in this
case, and ensure you use that same column in the PARTITION BY clause.
You'll then need to write that column in your condition that you need
pushed into the subquery. I'm sorry if that messes up your design.
However, I imagine this is not the only optimisation that you'll miss
out on by doing things the way you are.

David



Re: BUG #16968: Planner does not recognize optimization

From
Eugen Konkov
Date:
Hello David,

I found a case when `not assigning a ressortgroupref to the whole-row var` cause
wrong window function calculations.

I  use  same  query.  The  difference  come  when I wrap my query into
function. (see full queries in attachment)

1.
SELECT *
FROM agreement_totals( tstzrange( '2020-07-01', '2020-08-01' ) )
WHERE agreement_id = 161::int  AND  (o).period_id = 10::int

2.
SELECT *
  sum( .... ) over wagreement
FROM ....
WHERE agreement_id = 161::int  AND  (o).period_id = 10::int
WINDOW wagreement AS ( PARTITION BY agreement_id )

For  first  query  window function calculates SUM over all agreements,
then some are filtered out by (o).period_id condition.

But for second query agreements with "wrong" (o).period_id are filtered out,
then SUM is calculated.

I suppose here is problem with `not assigning a ressortgroupref to the whole-row var`
which cause different calculation when I try to filter: (o).period_id

I will also attach plans for both queries.


Friday, May 14, 2021, 2:52:33 AM, you wrote:

> On Fri, 14 May 2021 at 02:38, Eugen Konkov <kes-kes@yandex.ru> wrote:
>> Now I create minimal reproducible test case.
>> https://dbfiddle.uk/?rdbms=postgres_13&fiddle=761a00fb599789d3db31b120851d6341

>> Optimization is not applyed when I filter/partition by column using composite type name.

> You probably already know this part, but let me explain it just in
> case it's not clear.

> The pushdown of the qual from the top-level query into the subquery,
> or function, in this case, is only legal when the qual references a
> column that's in the PARTITION BY clause of all window functions in
> the subquery.  The reason for this is, if we filter rows before
> calling the window function, then it could affect which rows are in
> see in the window's frame. If it did filter, that could cause
> incorrect results.  We can relax the restriction a bit if we can
> eliminate entire partitions at once. The window function results are
> independent between partitions, so we can allow qual pushdowns that
> are in all PARTITION BY clauses.

> As for the reason you're having trouble getting this to work, it's
> down to the way you're using whole-row vars in your targetlist.

> A slightly simplified case which shows this problem is:

> create table ab(a int, b int);
> explain select * from (select ab as wholerowvar,row_number() over
> (partition by a) from ab) ab where (ab.wholerowvar).a=1;

> The reason it does not work is down to how this is implemented
> internally.   The details are, transformGroupClause() not assigning a
> ressortgroupref to the whole-row var.  It's unable to because there is
> no way to track which actual column within the whole row var is in the
> partition by clause.  When it comes to the code that tries to push the
> qual down into the subquery, check_output_expressions() checks if the
> column in the subquery is ok to accept push downs or not. One of the
> checks is to see if the query has windowing functions and to ensure
> that the column is in all the PARTITION BY clauses of each windowing
> function.  That check is done by checking if a ressortgroupref is
> assigned and matches a tleSortGroupRef in the PARTITION BY clause.  In
> this case, it does not match.  We didn't assign any ressortgroupref to
> the whole-row var.

> Unfortunately, whole-row vars are a bit to 2nd class citizen when it
> comes to the query planner. Also, it would be quite a bit of effort to
> make the planner push down the qual in this case. We'd need some sort
> of ability to assign ressortgroupref to a particular column within a
> whole-row var and we'd need to adjust the code to check for that when
> doing subquery pushdowns to allow it to mention which columns within
> whole-row vars can legally accept pushdowns.  I imagine that's
> unlikely to be fixed any time soon.  Whole-row vars just don't seem to
> be used commonly enough to warrant going to the effort of making this
> stuff work.

> To work around this, you should include a reference to the actual
> column in the targetlist of the subquery, or your function, in this
> case, and ensure you use that same column in the PARTITION BY clause.
> You'll then need to write that column in your condition that you need
> pushed into the subquery. I'm sorry if that messes up your design.
> However, I imagine this is not the only optimisation that you'll miss
> out on by doing things the way you are.

> David



-- 
Best regards,
Eugen Konkov
Attachment

Re: BUG #16968: Planner does not recognize optimization

From
Eugen Konkov
Date:
Hello David,

I found a case when `not assigning a ressortgroupref to the whole-row var` cause
wrong window function calculations.

I  use  same  query.  The  difference  come  when I wrap my query into
function. (see full queries in attachment)

1.
SELECT *
FROM agreement_totals( tstzrange( '2020-07-01', '2020-08-01' ) )
WHERE agreement_id = 161::int  AND  (o).period_id = 10::int

2.
SELECT *
  sum( .... ) over wagreement
FROM ....
WHERE agreement_id = 161::int  AND  (o).period_id = 10::int
WINDOW wagreement AS ( PARTITION BY agreement_id )

For  first  query  window function calculates SUM over all agreements,
then some are filtered out by (o).period_id condition.

But for second query agreements with "wrong" (o).period_id are filtered out,
then SUM is calculated.

I suppose here is problem with `not assigning a ressortgroupref to the whole-row var`
which cause different calculation when I try to filter: (o).period_id

I will also attach plans for both queries.


Friday, May 14, 2021, 2:52:33 AM, you wrote:

> On Fri, 14 May 2021 at 02:38, Eugen Konkov <kes-kes@yandex.ru> wrote:
>> Now I create minimal reproducible test case.
>> https://dbfiddle.uk/?rdbms=postgres_13&fiddle=761a00fb599789d3db31b120851d6341

>> Optimization is not applyed when I filter/partition by column using composite type name.

> You probably already know this part, but let me explain it just in
> case it's not clear.

> The pushdown of the qual from the top-level query into the subquery,
> or function, in this case, is only legal when the qual references a
> column that's in the PARTITION BY clause of all window functions in
> the subquery.  The reason for this is, if we filter rows before
> calling the window function, then it could affect which rows are in
> see in the window's frame. If it did filter, that could cause
> incorrect results.  We can relax the restriction a bit if we can
> eliminate entire partitions at once. The window function results are
> independent between partitions, so we can allow qual pushdowns that
> are in all PARTITION BY clauses.

> As for the reason you're having trouble getting this to work, it's
> down to the way you're using whole-row vars in your targetlist.

> A slightly simplified case which shows this problem is:

> create table ab(a int, b int);
> explain select * from (select ab as wholerowvar,row_number() over
> (partition by a) from ab) ab where (ab.wholerowvar).a=1;

> The reason it does not work is down to how this is implemented
> internally.   The details are, transformGroupClause() not assigning a
> ressortgroupref to the whole-row var.  It's unable to because there is
> no way to track which actual column within the whole row var is in the
> partition by clause.  When it comes to the code that tries to push the
> qual down into the subquery, check_output_expressions() checks if the
> column in the subquery is ok to accept push downs or not. One of the
> checks is to see if the query has windowing functions and to ensure
> that the column is in all the PARTITION BY clauses of each windowing
> function.  That check is done by checking if a ressortgroupref is
> assigned and matches a tleSortGroupRef in the PARTITION BY clause.  In
> this case, it does not match.  We didn't assign any ressortgroupref to
> the whole-row var.

> Unfortunately, whole-row vars are a bit to 2nd class citizen when it
> comes to the query planner. Also, it would be quite a bit of effort to
> make the planner push down the qual in this case. We'd need some sort
> of ability to assign ressortgroupref to a particular column within a
> whole-row var and we'd need to adjust the code to check for that when
> doing subquery pushdowns to allow it to mention which columns within
> whole-row vars can legally accept pushdowns.  I imagine that's
> unlikely to be fixed any time soon.  Whole-row vars just don't seem to
> be used commonly enough to warrant going to the effort of making this
> stuff work.

> To work around this, you should include a reference to the actual
> column in the targetlist of the subquery, or your function, in this
> case, and ensure you use that same column in the PARTITION BY clause.
> You'll then need to write that column in your condition that you need
> pushed into the subquery. I'm sorry if that messes up your design.
> However, I imagine this is not the only optimisation that you'll miss
> out on by doing things the way you are.

> David



-- 
Best regards,
Eugen Konkov

Re: BUG #16968: Planner does not recognize optimization

From
David Rowley
Date:
On Sun, 16 May 2021 at 02:34, Eugen Konkov <kes-kes@yandex.ru> wrote:
> I found a case when `not assigning a ressortgroupref to the whole-row var` cause
> wrong window function calculations.
>
> I  use  same  query.  The  difference  come  when I wrap my query into
> function. (see full queries in attachment)
>
> 1.
> SELECT *
> FROM agreement_totals( tstzrange( '2020-07-01', '2020-08-01' ) )
> WHERE agreement_id = 161::int  AND  (o).period_id = 10::int
>
> 2.
> SELECT *
>   sum( .... ) over wagreement
> FROM ....
> WHERE agreement_id = 161::int  AND  (o).period_id = 10::int
> WINDOW wagreement AS ( PARTITION BY agreement_id )
>
> For  first  query  window function calculates SUM over all agreements,
> then some are filtered out by (o).period_id condition.

This is unrelated to the optimisation that you were asking about before.

All that's going on here is that WHERE is evaluated before SELECT.
This means that your filtering is done before the window functions are
executed.  This is noted in the docs in [1]:

> The rows considered by a window function are those of the “virtual table” produced by the query's FROM clause as
filteredby its WHERE, GROUP BY, and HAVING clauses if any. For example, a row removed because it does not meet the
WHEREcondition is not seen by any window function. A query can contain multiple window functions that slice up the data
indifferent ways using different OVER clauses, but they all act on the same collection of rows defined by this virtual
table.

If you want to filter rows after the window functions are evaluated
then you'll likely want to use a subquery.

David

[1] https://www.postgresql.org/docs/13/tutorial-window.html



Re: BUG #16968: Planner does not recognize optimization

From
David Rowley
Date:
On Sun, 16 May 2021 at 02:34, Eugen Konkov <kes-kes@yandex.ru> wrote:
> I found a case when `not assigning a ressortgroupref to the whole-row var` cause
> wrong window function calculations.
>
> I  use  same  query.  The  difference  come  when I wrap my query into
> function. (see full queries in attachment)
>
> 1.
> SELECT *
> FROM agreement_totals( tstzrange( '2020-07-01', '2020-08-01' ) )
> WHERE agreement_id = 161::int  AND  (o).period_id = 10::int
>
> 2.
> SELECT *
>   sum( .... ) over wagreement
> FROM ....
> WHERE agreement_id = 161::int  AND  (o).period_id = 10::int
> WINDOW wagreement AS ( PARTITION BY agreement_id )
>
> For  first  query  window function calculates SUM over all agreements,
> then some are filtered out by (o).period_id condition.

This is unrelated to the optimisation that you were asking about before.

All that's going on here is that WHERE is evaluated before SELECT.
This means that your filtering is done before the window functions are
executed.  This is noted in the docs in [1]:

> The rows considered by a window function are those of the “virtual table” produced by the query's FROM clause as
filteredby its WHERE, GROUP BY, and HAVING clauses if any. For example, a row removed because it does not meet the
WHEREcondition is not seen by any window function. A query can contain multiple window functions that slice up the data
indifferent ways using different OVER clauses, but they all act on the same collection of rows defined by this virtual
table.

If you want to filter rows after the window functions are evaluated
then you'll likely want to use a subquery.

David

[1] https://www.postgresql.org/docs/13/tutorial-window.html



Re: BUG #16968: Planner does not recognize optimization

From
David Rowley
Date:
On Sat, 15 May 2021 at 00:39, KES <kes-kes@yandex.ru> wrote:
>
> Thank you for detailed explanation. I glad to hear that I can use aliases and this will be recognized and
optimizationis applied.
 
>
> >We'd need some sort of ability to assign ressortgroupref to a particular column within a
> whole-row var
> Could it be possible to create hidden alias in same way as I did that manually?
>
> Algorithm seems not complex:
> 1. User refer column from composite type/whole-row: (o).agreement_id
> 2. Create hidden column at select: _o_agreement_id
> 3. Replace other references to (o).agreement_id by _o_agreement_id
> 4. Process query as usual after replacements

Internally Postgresql does use a hidden column for columns that are
required for calculations which are not in the SELECT list. e.g ones
that are in the GROUP BY / ORDER BY, or in your case a window
function's PARTITION BY. We call these "resjunk" columns.  The problem
is you can't reference those from the parent query. If you explicitly
had listed that column in the SELECT clause, it won't cost you
anything more since the planner will add it regardless and just hide
it from you.  When you add it yourself you'll be able to use it in the
subquery and you'll be able to filter out the partitions that you
don't want.

I really think you're driving yourself down a difficult path by
expecting queries with whole-row vars to be optimised just as well as
using select * or explicitly listing the columns.

David



Re: BUG #16968: Planner does not recognize optimization

From
David Rowley
Date:
On Sat, 15 May 2021 at 00:39, KES <kes-kes@yandex.ru> wrote:
>
> Thank you for detailed explanation. I glad to hear that I can use aliases and this will be recognized and
optimizationis applied.
 
>
> >We'd need some sort of ability to assign ressortgroupref to a particular column within a
> whole-row var
> Could it be possible to create hidden alias in same way as I did that manually?
>
> Algorithm seems not complex:
> 1. User refer column from composite type/whole-row: (o).agreement_id
> 2. Create hidden column at select: _o_agreement_id
> 3. Replace other references to (o).agreement_id by _o_agreement_id
> 4. Process query as usual after replacements

Internally Postgresql does use a hidden column for columns that are
required for calculations which are not in the SELECT list. e.g ones
that are in the GROUP BY / ORDER BY, or in your case a window
function's PARTITION BY. We call these "resjunk" columns.  The problem
is you can't reference those from the parent query. If you explicitly
had listed that column in the SELECT clause, it won't cost you
anything more since the planner will add it regardless and just hide
it from you.  When you add it yourself you'll be able to use it in the
subquery and you'll be able to filter out the partitions that you
don't want.

I really think you're driving yourself down a difficult path by
expecting queries with whole-row vars to be optimised just as well as
using select * or explicitly listing the columns.

David



Re: BUG #16968: Planner does not recognize optimization

From
Eugen Konkov
Date:
Hello David,

Saturday, May 15, 2021, 5:52:47 PM, you wrote:

> On Sun, 16 May 2021 at 02:34, Eugen Konkov <kes-kes@yandex.ru> wrote:
>> I found a case when `not assigning a ressortgroupref to the whole-row var` cause
>> wrong window function calculations.
>>
>> I  use  same  query.  The  difference  come  when I wrap my query into
>> function. (see full queries in attachment)
>>
>> 1.
>> SELECT *
>> FROM agreement_totals( tstzrange( '2020-07-01', '2020-08-01' ) )
>> WHERE agreement_id = 161::int  AND  (o).period_id = 10::int
>>
>> 2.
>> SELECT *
>>   sum( .... ) over wagreement
>> FROM ....
>> WHERE agreement_id = 161::int  AND  (o).period_id = 10::int
>> WINDOW wagreement AS ( PARTITION BY agreement_id )
>>
>> For  first  query  window function calculates SUM over all agreements,
>> then some are filtered out by (o).period_id condition.

> This is unrelated to the optimisation that you were asking about before.

> All that's going on here is that WHERE is evaluated before SELECT.
> This means that your filtering is done before the window functions are
> executed.  This is noted in the docs in [1]:

>> The rows considered by a window function are those of the “virtual table” produced by the query's FROM clause as
filteredby its WHERE, GROUP BY, and HAVING clauses if any. For example, a row removed because it does not meet the
WHEREcondition is not seen by any window function. A query can contain multiple window functions that slice up the data
indifferent ways using different OVER clauses, but they all act on the same collection of rows defined by this virtual
table.

> If you want to filter rows after the window functions are evaluated
> then you'll likely want to use a subquery.

> David

> [1] https://www.postgresql.org/docs/13/tutorial-window.html


Sorry,  I  miss  that WHERE works first and after it
window function.

>This is unrelated to the optimisation that you were asking about before.
So, yes,  unrelated.

Thank you for your answers.

--
Best regards,
Eugen Konkov




Re: BUG #16968: Planner does not recognize optimization

From
Eugen Konkov
Date:
Hello David,

Saturday, May 15, 2021, 5:52:47 PM, you wrote:

> On Sun, 16 May 2021 at 02:34, Eugen Konkov <kes-kes@yandex.ru> wrote:
>> I found a case when `not assigning a ressortgroupref to the whole-row var` cause
>> wrong window function calculations.
>>
>> I  use  same  query.  The  difference  come  when I wrap my query into
>> function. (see full queries in attachment)
>>
>> 1.
>> SELECT *
>> FROM agreement_totals( tstzrange( '2020-07-01', '2020-08-01' ) )
>> WHERE agreement_id = 161::int  AND  (o).period_id = 10::int
>>
>> 2.
>> SELECT *
>>   sum( .... ) over wagreement
>> FROM ....
>> WHERE agreement_id = 161::int  AND  (o).period_id = 10::int
>> WINDOW wagreement AS ( PARTITION BY agreement_id )
>>
>> For  first  query  window function calculates SUM over all agreements,
>> then some are filtered out by (o).period_id condition.

> This is unrelated to the optimisation that you were asking about before.

> All that's going on here is that WHERE is evaluated before SELECT.
> This means that your filtering is done before the window functions are
> executed.  This is noted in the docs in [1]:

>> The rows considered by a window function are those of the “virtual table” produced by the query's FROM clause as
filteredby its WHERE, GROUP BY, and HAVING clauses if any. For example, a row removed because it does not meet the
WHEREcondition is not seen by any window function. A query can contain multiple window functions that slice up the data
indifferent ways using different OVER clauses, but they all act on the same collection of rows defined by this virtual
table.

> If you want to filter rows after the window functions are evaluated
> then you'll likely want to use a subquery.

> David

> [1] https://www.postgresql.org/docs/13/tutorial-window.html


Sorry,  I  miss  that WHERE works first and after it
window function.

>This is unrelated to the optimisation that you were asking about before.
So, yes,  unrelated.

Thank you for your answers.

--
Best regards,
Eugen Konkov




Re: BUG #16968: Planner does not recognize optimization

From
Eugen Konkov
Date:
Hello David,

> I really think you're driving yourself down a difficult path by
> expecting queries with whole-row vars to be optimised just as well as
> using select * or explicitly listing the columns.
Yes, I was expect that. I use whole-row because do not want repeat all
10+  columns  at select. I do not use (row1).*, (row2).*, because rows
could   have   same   columns.  eg:  row1.name, row2.name both will be
named as 'name' and then I can not distinguish them.
So I select whole-row and put myself into problems ((

It  would be nice if (row1).** will be expanded to: row1_id, row1_name
etc.  But this is other question which I already ask at different
thread.



Saturday, May 15, 2021, 5:59:41 PM, you wrote:

> On Sat, 15 May 2021 at 00:39, KES <kes-kes@yandex.ru> wrote:
>>
>> Thank you for detailed explanation. I glad to hear that I can use aliases and this will be recognized and
optimizationis applied.
 
>>
>> >We'd need some sort of ability to assign ressortgroupref to a particular column within a
>> whole-row var
>> Could it be possible to create hidden alias in same way as I did that manually?
>>
>> Algorithm seems not complex:
>> 1. User refer column from composite type/whole-row: (o).agreement_id
>> 2. Create hidden column at select: _o_agreement_id
>> 3. Replace other references to (o).agreement_id by _o_agreement_id
>> 4. Process query as usual after replacements

> Internally Postgresql does use a hidden column for columns that are
> required for calculations which are not in the SELECT list. e.g ones
> that are in the GROUP BY / ORDER BY, or in your case a window
> function's PARTITION BY. We call these "resjunk" columns.  The problem
> is you can't reference those from the parent query. If you explicitly
> had listed that column in the SELECT clause, it won't cost you
> anything more since the planner will add it regardless and just hide
> it from you.  When you add it yourself you'll be able to use it in the
> subquery and you'll be able to filter out the partitions that you
> don't want.

> I really think you're driving yourself down a difficult path by
> expecting queries with whole-row vars to be optimised just as well as
> using select * or explicitly listing the columns.

> David



-- 
Best regards,
Eugen Konkov




Re: BUG #16968: Planner does not recognize optimization

From
Eugen Konkov
Date:
Hello David,

> I really think you're driving yourself down a difficult path by
> expecting queries with whole-row vars to be optimised just as well as
> using select * or explicitly listing the columns.
Yes, I was expect that. I use whole-row because do not want repeat all
10+  columns  at select. I do not use (row1).*, (row2).*, because rows
could   have   same   columns.  eg:  row1.name, row2.name both will be
named as 'name' and then I can not distinguish them.
So I select whole-row and put myself into problems ((

It  would be nice if (row1).** will be expanded to: row1_id, row1_name
etc.  But this is other question which I already ask at different
thread.



Saturday, May 15, 2021, 5:59:41 PM, you wrote:

> On Sat, 15 May 2021 at 00:39, KES <kes-kes@yandex.ru> wrote:
>>
>> Thank you for detailed explanation. I glad to hear that I can use aliases and this will be recognized and
optimizationis applied.
 
>>
>> >We'd need some sort of ability to assign ressortgroupref to a particular column within a
>> whole-row var
>> Could it be possible to create hidden alias in same way as I did that manually?
>>
>> Algorithm seems not complex:
>> 1. User refer column from composite type/whole-row: (o).agreement_id
>> 2. Create hidden column at select: _o_agreement_id
>> 3. Replace other references to (o).agreement_id by _o_agreement_id
>> 4. Process query as usual after replacements

> Internally Postgresql does use a hidden column for columns that are
> required for calculations which are not in the SELECT list. e.g ones
> that are in the GROUP BY / ORDER BY, or in your case a window
> function's PARTITION BY. We call these "resjunk" columns.  The problem
> is you can't reference those from the parent query. If you explicitly
> had listed that column in the SELECT clause, it won't cost you
> anything more since the planner will add it regardless and just hide
> it from you.  When you add it yourself you'll be able to use it in the
> subquery and you'll be able to filter out the partitions that you
> don't want.

> I really think you're driving yourself down a difficult path by
> expecting queries with whole-row vars to be optimised just as well as
> using select * or explicitly listing the columns.

> David



-- 
Best regards,
Eugen Konkov