Thread: BUG #16968: Planner does not recognize optimization
The following bug has been logged on the website: Bug reference: 16968 Logged by: Eugen Konkov Email address: kes-kes@yandex.ru PostgreSQL version: 13.1 Operating system: Linux Mint 19.3 Description: TLDR; If I refer to same column by different ways planner may or may not recognize optimization select * from order_total_suma() ots where agreement_id = 3943; -- fast select * from order_total_suma() ots where (ots.o).agreement_id = 3943; -- slow Where `order_total_suma` is sql function: SELECT sum( ocd.item_suma ) OVER( PARTITION BY (ocd.o).agreement_id ) AS agreement_suma, sum( ocd.item_suma ) OVER( PARTITION BY (ocd.o).agreement_id, (ocd.o).id ) AS order_suma, sum( ocd.item_cost ) OVER( PARTITION BY (ocd.o).agreement_id, (ocd.o).id, (ocd.ic).consumed_period ) AS group_cost, sum( ocd.item_suma ) OVER( PARTITION BY (ocd.o).agreement_id, (ocd.o).id, (ocd.ic).consumed_period ) AS group_suma, max( (ocd.ic).consumed ) OVER( PARTITION BY (ocd.o).agreement_id, (ocd.o).id, (ocd.ic).consumed_period ) AS consumed, ocd.item_qty, ocd.item_price, ocd.item_cost, ocd.item_suma, ocd.o, ocd.c, ocd.p, ocd.ic, (ocd.o).id as order_id, (ocd.o).agreement_id as agreement_id FROM order_cost_details( _target_range ) ocd Problem is window function, because ID can not go through. But this occur not always. When I filter by field I partition result by then optimization occur BUT only when I create an alias for this field and do filtering via this alias. Expected: apply optimization not only when I do `WHERE agreement_id = XXX` but and for `WHERE (ots.o).agreement_id = XXX` Thank you.
Now I attarch plans for both queries.
tucha=> \out f2
tucha=> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON) select * from order_total_suma() ots where agreement_id = 3943;
tucha=> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON) select * from order_total_suma() ots where (ots.o).agreement_id = 3943;
tucha=> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) select * from order_total_suma() ots where agreement_id = 3943;
tucha=> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) select * from order_total_suma() ots where (ots.o).agreement_id = 3943;
Friday, April 16, 2021, 10:18:45 PM, you wrote:
> The following bug has been logged on the website:
> Bug reference: 16968
> Logged by: Eugen Konkov
> Email address: kes-kes@yandex.ru
> PostgreSQL version: 13.1
> Operating system: Linux Mint 19.3
> Description:
> TLDR;
> If I refer to same column by different ways planner may or may not recognize
> optimization
> select * from order_total_suma() ots where agreement_id = 3943;
> -- fast
> select * from order_total_suma() ots where (ots.o).agreement_id = 3943; --
> slow
> Where `order_total_suma` is sql function:
> SELECT
> sum( ocd.item_suma ) OVER( PARTITION BY (ocd.o).agreement_id
> ) AS agreement_suma,
> sum( ocd.item_suma ) OVER( PARTITION BY (ocd.o).agreement_id,
> (ocd.o).id ) AS order_suma,
> sum( ocd.item_cost ) OVER( PARTITION BY (ocd.o).agreement_id,
> (ocd.o).id, (ocd.ic).consumed_period ) AS group_cost,
> sum( ocd.item_suma ) OVER( PARTITION BY (ocd.o).agreement_id,
> (ocd.o).id, (ocd.ic).consumed_period ) AS group_suma,
> max( (ocd.ic).consumed ) OVER( PARTITION BY (ocd.o).agreement_id,
> (ocd.o).id, (ocd.ic).consumed_period ) AS consumed,
> ocd.item_qty, ocd.item_price, ocd.item_cost, ocd.item_suma,
> ocd.o, ocd.c, ocd.p, ocd.ic,
> (ocd.o).id as order_id,
> (ocd.o).agreement_id as agreement_id
> FROM order_cost_details( _target_range ) ocd
> Problem is window function, because ID can not go through. But this occur
> not always.
> When I filter by field I partition result by then optimization occur
> BUT only when I create an alias for this field and do filtering via this
> alias.
> Expected: apply optimization not only when I do `WHERE agreement_id = XXX`
> but and for `WHERE (ots.o).agreement_id = XXX`
> Thank you.
--
Best regards,
Eugen Konkov
tucha=> \out f2
tucha=> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON) select * from order_total_suma() ots where agreement_id = 3943;
tucha=> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON) select * from order_total_suma() ots where (ots.o).agreement_id = 3943;
tucha=> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) select * from order_total_suma() ots where agreement_id = 3943;
tucha=> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) select * from order_total_suma() ots where (ots.o).agreement_id = 3943;
Friday, April 16, 2021, 10:18:45 PM, you wrote:
> The following bug has been logged on the website:
> Bug reference: 16968
> Logged by: Eugen Konkov
> Email address: kes-kes@yandex.ru
> PostgreSQL version: 13.1
> Operating system: Linux Mint 19.3
> Description:
> TLDR;
> If I refer to same column by different ways planner may or may not recognize
> optimization
> select * from order_total_suma() ots where agreement_id = 3943;
> -- fast
> select * from order_total_suma() ots where (ots.o).agreement_id = 3943; --
> slow
> Where `order_total_suma` is sql function:
> SELECT
> sum( ocd.item_suma ) OVER( PARTITION BY (ocd.o).agreement_id
> ) AS agreement_suma,
> sum( ocd.item_suma ) OVER( PARTITION BY (ocd.o).agreement_id,
> (ocd.o).id ) AS order_suma,
> sum( ocd.item_cost ) OVER( PARTITION BY (ocd.o).agreement_id,
> (ocd.o).id, (ocd.ic).consumed_period ) AS group_cost,
> sum( ocd.item_suma ) OVER( PARTITION BY (ocd.o).agreement_id,
> (ocd.o).id, (ocd.ic).consumed_period ) AS group_suma,
> max( (ocd.ic).consumed ) OVER( PARTITION BY (ocd.o).agreement_id,
> (ocd.o).id, (ocd.ic).consumed_period ) AS consumed,
> ocd.item_qty, ocd.item_price, ocd.item_cost, ocd.item_suma,
> ocd.o, ocd.c, ocd.p, ocd.ic,
> (ocd.o).id as order_id,
> (ocd.o).agreement_id as agreement_id
> FROM order_cost_details( _target_range ) ocd
> Problem is window function, because ID can not go through. But this occur
> not always.
> When I filter by field I partition result by then optimization occur
> BUT only when I create an alias for this field and do filtering via this
> alias.
> Expected: apply optimization not only when I do `WHERE agreement_id = XXX`
> but and for `WHERE (ots.o).agreement_id = XXX`
> Thank you.
--
Best regards,
Eugen Konkov
Attachment
Now I create minimal reproducible test case.
https://dbfiddle.uk/?rdbms=postgres_13&fiddle=761a00fb599789d3db31b120851d6341
Optimization is not applyed when I filter/partition by column using composite type name.
Looking at this comparison table, we can see that optimization work only when I refer to column using alias:
(t.ag).ag_id as agreement_id -- making an alias
PARTITION | FILTER | IS USED?
------------------------------
ALIAS | ORIG | NO
ALIAS | ALIAS | YES
ORIG | ALIAS | NO
ORIG | ORIG | NO
link to original problem with EXPLAIN ANALYZE: https://stackoverflow.com/q/67492673/4632019
Links to similar problems:
https://stackoverflow.com/a/26237464/4632019
https://stackoverflow.com/q/65780112/4632019
Friday, April 16, 2021, 10:27:35 PM, you wrote:
--
Best regards,
Eugen Konkov
https://dbfiddle.uk/?rdbms=postgres_13&fiddle=761a00fb599789d3db31b120851d6341
Optimization is not applyed when I filter/partition by column using composite type name.
Looking at this comparison table, we can see that optimization work only when I refer to column using alias:
(t.ag).ag_id as agreement_id -- making an alias
PARTITION | FILTER | IS USED?
------------------------------
ALIAS | ORIG | NO
ALIAS | ALIAS | YES
ORIG | ALIAS | NO
ORIG | ORIG | NO
link to original problem with EXPLAIN ANALYZE: https://stackoverflow.com/q/67492673/4632019
Links to similar problems:
https://stackoverflow.com/a/26237464/4632019
https://stackoverflow.com/q/65780112/4632019
Friday, April 16, 2021, 10:27:35 PM, you wrote:
Now I attarch plans for both queries. tucha=> \out f2 tucha=> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON) select * from order_total_suma() ots where agreement_id = 3943; tucha=> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON) select * from order_total_suma() ots where (ots.o).agreement_id = 3943; tucha=> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) select * from order_total_suma() ots where agreement_id = 3943; tucha=> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) select * from order_total_suma() ots where (ots.o).agreement_id = 3943; Friday, April 16, 2021, 10:18:45 PM, you wrote: > The following bug has been logged on the website: > Bug reference: 16968 > Logged by: Eugen Konkov > Email address: kes-kes@yandex.ru > PostgreSQL version: 13.1 > Operating system: Linux Mint 19.3 > Description: > TLDR; > If I refer to same column by different ways planner may or may not recognize > optimization > select * from order_total_suma() ots where agreement_id = 3943; > -- fast > select * from order_total_suma() ots where (ots.o).agreement_id = 3943; -- > slow > Where `order_total_suma` is sql function: > SELECT > sum( ocd.item_suma ) OVER( PARTITION BY (ocd.o).agreement_id > ) AS agreement_suma, > sum( ocd.item_suma ) OVER( PARTITION BY (ocd.o).agreement_id, > (ocd.o).id ) AS order_suma, > sum( ocd.item_cost ) OVER( PARTITION BY (ocd.o).agreement_id, > (ocd.o).id, (ocd.ic).consumed_period ) AS group_cost, > sum( ocd.item_suma ) OVER( PARTITION BY (ocd.o).agreement_id, > (ocd.o).id, (ocd.ic).consumed_period ) AS group_suma, > max( (ocd.ic).consumed ) OVER( PARTITION BY (ocd.o).agreement_id, > (ocd.o).id, (ocd.ic).consumed_period ) AS consumed, > ocd.item_qty, ocd.item_price, ocd.item_cost, ocd.item_suma, > ocd.o, ocd.c, ocd.p, ocd.ic, > (ocd.o).id as order_id, > (ocd.o).agreement_id as agreement_id > FROM order_cost_details( _target_range ) ocd > Problem is window function, because ID can not go through. But this occur > not always. > When I filter by field I partition result by then optimization occur > BUT only when I create an alias for this field and do filtering via this > alias. > Expected: apply optimization not only when I do `WHERE agreement_id = XXX` > but and for `WHERE (ots.o).agreement_id = XXX` > Thank you. -- Best regards, Eugen Konkov |
--
Best regards,
Eugen Konkov
On Fri, 14 May 2021 at 02:38, Eugen Konkov <kes-kes@yandex.ru> wrote: > Now I create minimal reproducible test case. > https://dbfiddle.uk/?rdbms=postgres_13&fiddle=761a00fb599789d3db31b120851d6341 > > Optimization is not applyed when I filter/partition by column using composite type name. You probably already know this part, but let me explain it just in case it's not clear. The pushdown of the qual from the top-level query into the subquery, or function, in this case, is only legal when the qual references a column that's in the PARTITION BY clause of all window functions in the subquery. The reason for this is, if we filter rows before calling the window function, then it could affect which rows are in see in the window's frame. If it did filter, that could cause incorrect results. We can relax the restriction a bit if we can eliminate entire partitions at once. The window function results are independent between partitions, so we can allow qual pushdowns that are in all PARTITION BY clauses. As for the reason you're having trouble getting this to work, it's down to the way you're using whole-row vars in your targetlist. A slightly simplified case which shows this problem is: create table ab(a int, b int); explain select * from (select ab as wholerowvar,row_number() over (partition by a) from ab) ab where (ab.wholerowvar).a=1; The reason it does not work is down to how this is implemented internally. The details are, transformGroupClause() not assigning a ressortgroupref to the whole-row var. It's unable to because there is no way to track which actual column within the whole row var is in the partition by clause. When it comes to the code that tries to push the qual down into the subquery, check_output_expressions() checks if the column in the subquery is ok to accept push downs or not. One of the checks is to see if the query has windowing functions and to ensure that the column is in all the PARTITION BY clauses of each windowing function. That check is done by checking if a ressortgroupref is assigned and matches a tleSortGroupRef in the PARTITION BY clause. In this case, it does not match. We didn't assign any ressortgroupref to the whole-row var. Unfortunately, whole-row vars are a bit to 2nd class citizen when it comes to the query planner. Also, it would be quite a bit of effort to make the planner push down the qual in this case. We'd need some sort of ability to assign ressortgroupref to a particular column within a whole-row var and we'd need to adjust the code to check for that when doing subquery pushdowns to allow it to mention which columns within whole-row vars can legally accept pushdowns. I imagine that's unlikely to be fixed any time soon. Whole-row vars just don't seem to be used commonly enough to warrant going to the effort of making this stuff work. To work around this, you should include a reference to the actual column in the targetlist of the subquery, or your function, in this case, and ensure you use that same column in the PARTITION BY clause. You'll then need to write that column in your condition that you need pushed into the subquery. I'm sorry if that messes up your design. However, I imagine this is not the only optimisation that you'll miss out on by doing things the way you are. David
On Fri, 14 May 2021 at 02:38, Eugen Konkov <kes-kes@yandex.ru> wrote: > Now I create minimal reproducible test case. > https://dbfiddle.uk/?rdbms=postgres_13&fiddle=761a00fb599789d3db31b120851d6341 > > Optimization is not applyed when I filter/partition by column using composite type name. You probably already know this part, but let me explain it just in case it's not clear. The pushdown of the qual from the top-level query into the subquery, or function, in this case, is only legal when the qual references a column that's in the PARTITION BY clause of all window functions in the subquery. The reason for this is, if we filter rows before calling the window function, then it could affect which rows are in see in the window's frame. If it did filter, that could cause incorrect results. We can relax the restriction a bit if we can eliminate entire partitions at once. The window function results are independent between partitions, so we can allow qual pushdowns that are in all PARTITION BY clauses. As for the reason you're having trouble getting this to work, it's down to the way you're using whole-row vars in your targetlist. A slightly simplified case which shows this problem is: create table ab(a int, b int); explain select * from (select ab as wholerowvar,row_number() over (partition by a) from ab) ab where (ab.wholerowvar).a=1; The reason it does not work is down to how this is implemented internally. The details are, transformGroupClause() not assigning a ressortgroupref to the whole-row var. It's unable to because there is no way to track which actual column within the whole row var is in the partition by clause. When it comes to the code that tries to push the qual down into the subquery, check_output_expressions() checks if the column in the subquery is ok to accept push downs or not. One of the checks is to see if the query has windowing functions and to ensure that the column is in all the PARTITION BY clauses of each windowing function. That check is done by checking if a ressortgroupref is assigned and matches a tleSortGroupRef in the PARTITION BY clause. In this case, it does not match. We didn't assign any ressortgroupref to the whole-row var. Unfortunately, whole-row vars are a bit to 2nd class citizen when it comes to the query planner. Also, it would be quite a bit of effort to make the planner push down the qual in this case. We'd need some sort of ability to assign ressortgroupref to a particular column within a whole-row var and we'd need to adjust the code to check for that when doing subquery pushdowns to allow it to mention which columns within whole-row vars can legally accept pushdowns. I imagine that's unlikely to be fixed any time soon. Whole-row vars just don't seem to be used commonly enough to warrant going to the effort of making this stuff work. To work around this, you should include a reference to the actual column in the targetlist of the subquery, or your function, in this case, and ensure you use that same column in the PARTITION BY clause. You'll then need to write that column in your condition that you need pushed into the subquery. I'm sorry if that messes up your design. However, I imagine this is not the only optimisation that you'll miss out on by doing things the way you are. David
Hello David, I found a case when `not assigning a ressortgroupref to the whole-row var` cause wrong window function calculations. I use same query. The difference come when I wrap my query into function. (see full queries in attachment) 1. SELECT * FROM agreement_totals( tstzrange( '2020-07-01', '2020-08-01' ) ) WHERE agreement_id = 161::int AND (o).period_id = 10::int 2. SELECT * sum( .... ) over wagreement FROM .... WHERE agreement_id = 161::int AND (o).period_id = 10::int WINDOW wagreement AS ( PARTITION BY agreement_id ) For first query window function calculates SUM over all agreements, then some are filtered out by (o).period_id condition. But for second query agreements with "wrong" (o).period_id are filtered out, then SUM is calculated. I suppose here is problem with `not assigning a ressortgroupref to the whole-row var` which cause different calculation when I try to filter: (o).period_id I will also attach plans for both queries. Friday, May 14, 2021, 2:52:33 AM, you wrote: > On Fri, 14 May 2021 at 02:38, Eugen Konkov <kes-kes@yandex.ru> wrote: >> Now I create minimal reproducible test case. >> https://dbfiddle.uk/?rdbms=postgres_13&fiddle=761a00fb599789d3db31b120851d6341 >> Optimization is not applyed when I filter/partition by column using composite type name. > You probably already know this part, but let me explain it just in > case it's not clear. > The pushdown of the qual from the top-level query into the subquery, > or function, in this case, is only legal when the qual references a > column that's in the PARTITION BY clause of all window functions in > the subquery. The reason for this is, if we filter rows before > calling the window function, then it could affect which rows are in > see in the window's frame. If it did filter, that could cause > incorrect results. We can relax the restriction a bit if we can > eliminate entire partitions at once. The window function results are > independent between partitions, so we can allow qual pushdowns that > are in all PARTITION BY clauses. > As for the reason you're having trouble getting this to work, it's > down to the way you're using whole-row vars in your targetlist. > A slightly simplified case which shows this problem is: > create table ab(a int, b int); > explain select * from (select ab as wholerowvar,row_number() over > (partition by a) from ab) ab where (ab.wholerowvar).a=1; > The reason it does not work is down to how this is implemented > internally. The details are, transformGroupClause() not assigning a > ressortgroupref to the whole-row var. It's unable to because there is > no way to track which actual column within the whole row var is in the > partition by clause. When it comes to the code that tries to push the > qual down into the subquery, check_output_expressions() checks if the > column in the subquery is ok to accept push downs or not. One of the > checks is to see if the query has windowing functions and to ensure > that the column is in all the PARTITION BY clauses of each windowing > function. That check is done by checking if a ressortgroupref is > assigned and matches a tleSortGroupRef in the PARTITION BY clause. In > this case, it does not match. We didn't assign any ressortgroupref to > the whole-row var. > Unfortunately, whole-row vars are a bit to 2nd class citizen when it > comes to the query planner. Also, it would be quite a bit of effort to > make the planner push down the qual in this case. We'd need some sort > of ability to assign ressortgroupref to a particular column within a > whole-row var and we'd need to adjust the code to check for that when > doing subquery pushdowns to allow it to mention which columns within > whole-row vars can legally accept pushdowns. I imagine that's > unlikely to be fixed any time soon. Whole-row vars just don't seem to > be used commonly enough to warrant going to the effort of making this > stuff work. > To work around this, you should include a reference to the actual > column in the targetlist of the subquery, or your function, in this > case, and ensure you use that same column in the PARTITION BY clause. > You'll then need to write that column in your condition that you need > pushed into the subquery. I'm sorry if that messes up your design. > However, I imagine this is not the only optimisation that you'll miss > out on by doing things the way you are. > David -- Best regards, Eugen Konkov
Attachment
Hello David, I found a case when `not assigning a ressortgroupref to the whole-row var` cause wrong window function calculations. I use same query. The difference come when I wrap my query into function. (see full queries in attachment) 1. SELECT * FROM agreement_totals( tstzrange( '2020-07-01', '2020-08-01' ) ) WHERE agreement_id = 161::int AND (o).period_id = 10::int 2. SELECT * sum( .... ) over wagreement FROM .... WHERE agreement_id = 161::int AND (o).period_id = 10::int WINDOW wagreement AS ( PARTITION BY agreement_id ) For first query window function calculates SUM over all agreements, then some are filtered out by (o).period_id condition. But for second query agreements with "wrong" (o).period_id are filtered out, then SUM is calculated. I suppose here is problem with `not assigning a ressortgroupref to the whole-row var` which cause different calculation when I try to filter: (o).period_id I will also attach plans for both queries. Friday, May 14, 2021, 2:52:33 AM, you wrote: > On Fri, 14 May 2021 at 02:38, Eugen Konkov <kes-kes@yandex.ru> wrote: >> Now I create minimal reproducible test case. >> https://dbfiddle.uk/?rdbms=postgres_13&fiddle=761a00fb599789d3db31b120851d6341 >> Optimization is not applyed when I filter/partition by column using composite type name. > You probably already know this part, but let me explain it just in > case it's not clear. > The pushdown of the qual from the top-level query into the subquery, > or function, in this case, is only legal when the qual references a > column that's in the PARTITION BY clause of all window functions in > the subquery. The reason for this is, if we filter rows before > calling the window function, then it could affect which rows are in > see in the window's frame. If it did filter, that could cause > incorrect results. We can relax the restriction a bit if we can > eliminate entire partitions at once. The window function results are > independent between partitions, so we can allow qual pushdowns that > are in all PARTITION BY clauses. > As for the reason you're having trouble getting this to work, it's > down to the way you're using whole-row vars in your targetlist. > A slightly simplified case which shows this problem is: > create table ab(a int, b int); > explain select * from (select ab as wholerowvar,row_number() over > (partition by a) from ab) ab where (ab.wholerowvar).a=1; > The reason it does not work is down to how this is implemented > internally. The details are, transformGroupClause() not assigning a > ressortgroupref to the whole-row var. It's unable to because there is > no way to track which actual column within the whole row var is in the > partition by clause. When it comes to the code that tries to push the > qual down into the subquery, check_output_expressions() checks if the > column in the subquery is ok to accept push downs or not. One of the > checks is to see if the query has windowing functions and to ensure > that the column is in all the PARTITION BY clauses of each windowing > function. That check is done by checking if a ressortgroupref is > assigned and matches a tleSortGroupRef in the PARTITION BY clause. In > this case, it does not match. We didn't assign any ressortgroupref to > the whole-row var. > Unfortunately, whole-row vars are a bit to 2nd class citizen when it > comes to the query planner. Also, it would be quite a bit of effort to > make the planner push down the qual in this case. We'd need some sort > of ability to assign ressortgroupref to a particular column within a > whole-row var and we'd need to adjust the code to check for that when > doing subquery pushdowns to allow it to mention which columns within > whole-row vars can legally accept pushdowns. I imagine that's > unlikely to be fixed any time soon. Whole-row vars just don't seem to > be used commonly enough to warrant going to the effort of making this > stuff work. > To work around this, you should include a reference to the actual > column in the targetlist of the subquery, or your function, in this > case, and ensure you use that same column in the PARTITION BY clause. > You'll then need to write that column in your condition that you need > pushed into the subquery. I'm sorry if that messes up your design. > However, I imagine this is not the only optimisation that you'll miss > out on by doing things the way you are. > David -- Best regards, Eugen Konkov
On Sun, 16 May 2021 at 02:34, Eugen Konkov <kes-kes@yandex.ru> wrote: > I found a case when `not assigning a ressortgroupref to the whole-row var` cause > wrong window function calculations. > > I use same query. The difference come when I wrap my query into > function. (see full queries in attachment) > > 1. > SELECT * > FROM agreement_totals( tstzrange( '2020-07-01', '2020-08-01' ) ) > WHERE agreement_id = 161::int AND (o).period_id = 10::int > > 2. > SELECT * > sum( .... ) over wagreement > FROM .... > WHERE agreement_id = 161::int AND (o).period_id = 10::int > WINDOW wagreement AS ( PARTITION BY agreement_id ) > > For first query window function calculates SUM over all agreements, > then some are filtered out by (o).period_id condition. This is unrelated to the optimisation that you were asking about before. All that's going on here is that WHERE is evaluated before SELECT. This means that your filtering is done before the window functions are executed. This is noted in the docs in [1]: > The rows considered by a window function are those of the “virtual table” produced by the query's FROM clause as filteredby its WHERE, GROUP BY, and HAVING clauses if any. For example, a row removed because it does not meet the WHEREcondition is not seen by any window function. A query can contain multiple window functions that slice up the data indifferent ways using different OVER clauses, but they all act on the same collection of rows defined by this virtual table. If you want to filter rows after the window functions are evaluated then you'll likely want to use a subquery. David [1] https://www.postgresql.org/docs/13/tutorial-window.html
On Sun, 16 May 2021 at 02:34, Eugen Konkov <kes-kes@yandex.ru> wrote: > I found a case when `not assigning a ressortgroupref to the whole-row var` cause > wrong window function calculations. > > I use same query. The difference come when I wrap my query into > function. (see full queries in attachment) > > 1. > SELECT * > FROM agreement_totals( tstzrange( '2020-07-01', '2020-08-01' ) ) > WHERE agreement_id = 161::int AND (o).period_id = 10::int > > 2. > SELECT * > sum( .... ) over wagreement > FROM .... > WHERE agreement_id = 161::int AND (o).period_id = 10::int > WINDOW wagreement AS ( PARTITION BY agreement_id ) > > For first query window function calculates SUM over all agreements, > then some are filtered out by (o).period_id condition. This is unrelated to the optimisation that you were asking about before. All that's going on here is that WHERE is evaluated before SELECT. This means that your filtering is done before the window functions are executed. This is noted in the docs in [1]: > The rows considered by a window function are those of the “virtual table” produced by the query's FROM clause as filteredby its WHERE, GROUP BY, and HAVING clauses if any. For example, a row removed because it does not meet the WHEREcondition is not seen by any window function. A query can contain multiple window functions that slice up the data indifferent ways using different OVER clauses, but they all act on the same collection of rows defined by this virtual table. If you want to filter rows after the window functions are evaluated then you'll likely want to use a subquery. David [1] https://www.postgresql.org/docs/13/tutorial-window.html
On Sat, 15 May 2021 at 00:39, KES <kes-kes@yandex.ru> wrote: > > Thank you for detailed explanation. I glad to hear that I can use aliases and this will be recognized and optimizationis applied. > > >We'd need some sort of ability to assign ressortgroupref to a particular column within a > whole-row var > Could it be possible to create hidden alias in same way as I did that manually? > > Algorithm seems not complex: > 1. User refer column from composite type/whole-row: (o).agreement_id > 2. Create hidden column at select: _o_agreement_id > 3. Replace other references to (o).agreement_id by _o_agreement_id > 4. Process query as usual after replacements Internally Postgresql does use a hidden column for columns that are required for calculations which are not in the SELECT list. e.g ones that are in the GROUP BY / ORDER BY, or in your case a window function's PARTITION BY. We call these "resjunk" columns. The problem is you can't reference those from the parent query. If you explicitly had listed that column in the SELECT clause, it won't cost you anything more since the planner will add it regardless and just hide it from you. When you add it yourself you'll be able to use it in the subquery and you'll be able to filter out the partitions that you don't want. I really think you're driving yourself down a difficult path by expecting queries with whole-row vars to be optimised just as well as using select * or explicitly listing the columns. David
On Sat, 15 May 2021 at 00:39, KES <kes-kes@yandex.ru> wrote: > > Thank you for detailed explanation. I glad to hear that I can use aliases and this will be recognized and optimizationis applied. > > >We'd need some sort of ability to assign ressortgroupref to a particular column within a > whole-row var > Could it be possible to create hidden alias in same way as I did that manually? > > Algorithm seems not complex: > 1. User refer column from composite type/whole-row: (o).agreement_id > 2. Create hidden column at select: _o_agreement_id > 3. Replace other references to (o).agreement_id by _o_agreement_id > 4. Process query as usual after replacements Internally Postgresql does use a hidden column for columns that are required for calculations which are not in the SELECT list. e.g ones that are in the GROUP BY / ORDER BY, or in your case a window function's PARTITION BY. We call these "resjunk" columns. The problem is you can't reference those from the parent query. If you explicitly had listed that column in the SELECT clause, it won't cost you anything more since the planner will add it regardless and just hide it from you. When you add it yourself you'll be able to use it in the subquery and you'll be able to filter out the partitions that you don't want. I really think you're driving yourself down a difficult path by expecting queries with whole-row vars to be optimised just as well as using select * or explicitly listing the columns. David
Hello David, Saturday, May 15, 2021, 5:52:47 PM, you wrote: > On Sun, 16 May 2021 at 02:34, Eugen Konkov <kes-kes@yandex.ru> wrote: >> I found a case when `not assigning a ressortgroupref to the whole-row var` cause >> wrong window function calculations. >> >> I use same query. The difference come when I wrap my query into >> function. (see full queries in attachment) >> >> 1. >> SELECT * >> FROM agreement_totals( tstzrange( '2020-07-01', '2020-08-01' ) ) >> WHERE agreement_id = 161::int AND (o).period_id = 10::int >> >> 2. >> SELECT * >> sum( .... ) over wagreement >> FROM .... >> WHERE agreement_id = 161::int AND (o).period_id = 10::int >> WINDOW wagreement AS ( PARTITION BY agreement_id ) >> >> For first query window function calculates SUM over all agreements, >> then some are filtered out by (o).period_id condition. > This is unrelated to the optimisation that you were asking about before. > All that's going on here is that WHERE is evaluated before SELECT. > This means that your filtering is done before the window functions are > executed. This is noted in the docs in [1]: >> The rows considered by a window function are those of the “virtual table” produced by the query's FROM clause as filteredby its WHERE, GROUP BY, and HAVING clauses if any. For example, a row removed because it does not meet the WHEREcondition is not seen by any window function. A query can contain multiple window functions that slice up the data indifferent ways using different OVER clauses, but they all act on the same collection of rows defined by this virtual table. > If you want to filter rows after the window functions are evaluated > then you'll likely want to use a subquery. > David > [1] https://www.postgresql.org/docs/13/tutorial-window.html Sorry, I miss that WHERE works first and after it window function. >This is unrelated to the optimisation that you were asking about before. So, yes, unrelated. Thank you for your answers. -- Best regards, Eugen Konkov
Hello David, Saturday, May 15, 2021, 5:52:47 PM, you wrote: > On Sun, 16 May 2021 at 02:34, Eugen Konkov <kes-kes@yandex.ru> wrote: >> I found a case when `not assigning a ressortgroupref to the whole-row var` cause >> wrong window function calculations. >> >> I use same query. The difference come when I wrap my query into >> function. (see full queries in attachment) >> >> 1. >> SELECT * >> FROM agreement_totals( tstzrange( '2020-07-01', '2020-08-01' ) ) >> WHERE agreement_id = 161::int AND (o).period_id = 10::int >> >> 2. >> SELECT * >> sum( .... ) over wagreement >> FROM .... >> WHERE agreement_id = 161::int AND (o).period_id = 10::int >> WINDOW wagreement AS ( PARTITION BY agreement_id ) >> >> For first query window function calculates SUM over all agreements, >> then some are filtered out by (o).period_id condition. > This is unrelated to the optimisation that you were asking about before. > All that's going on here is that WHERE is evaluated before SELECT. > This means that your filtering is done before the window functions are > executed. This is noted in the docs in [1]: >> The rows considered by a window function are those of the “virtual table” produced by the query's FROM clause as filteredby its WHERE, GROUP BY, and HAVING clauses if any. For example, a row removed because it does not meet the WHEREcondition is not seen by any window function. A query can contain multiple window functions that slice up the data indifferent ways using different OVER clauses, but they all act on the same collection of rows defined by this virtual table. > If you want to filter rows after the window functions are evaluated > then you'll likely want to use a subquery. > David > [1] https://www.postgresql.org/docs/13/tutorial-window.html Sorry, I miss that WHERE works first and after it window function. >This is unrelated to the optimisation that you were asking about before. So, yes, unrelated. Thank you for your answers. -- Best regards, Eugen Konkov
Hello David, > I really think you're driving yourself down a difficult path by > expecting queries with whole-row vars to be optimised just as well as > using select * or explicitly listing the columns. Yes, I was expect that. I use whole-row because do not want repeat all 10+ columns at select. I do not use (row1).*, (row2).*, because rows could have same columns. eg: row1.name, row2.name both will be named as 'name' and then I can not distinguish them. So I select whole-row and put myself into problems (( It would be nice if (row1).** will be expanded to: row1_id, row1_name etc. But this is other question which I already ask at different thread. Saturday, May 15, 2021, 5:59:41 PM, you wrote: > On Sat, 15 May 2021 at 00:39, KES <kes-kes@yandex.ru> wrote: >> >> Thank you for detailed explanation. I glad to hear that I can use aliases and this will be recognized and optimizationis applied. >> >> >We'd need some sort of ability to assign ressortgroupref to a particular column within a >> whole-row var >> Could it be possible to create hidden alias in same way as I did that manually? >> >> Algorithm seems not complex: >> 1. User refer column from composite type/whole-row: (o).agreement_id >> 2. Create hidden column at select: _o_agreement_id >> 3. Replace other references to (o).agreement_id by _o_agreement_id >> 4. Process query as usual after replacements > Internally Postgresql does use a hidden column for columns that are > required for calculations which are not in the SELECT list. e.g ones > that are in the GROUP BY / ORDER BY, or in your case a window > function's PARTITION BY. We call these "resjunk" columns. The problem > is you can't reference those from the parent query. If you explicitly > had listed that column in the SELECT clause, it won't cost you > anything more since the planner will add it regardless and just hide > it from you. When you add it yourself you'll be able to use it in the > subquery and you'll be able to filter out the partitions that you > don't want. > I really think you're driving yourself down a difficult path by > expecting queries with whole-row vars to be optimised just as well as > using select * or explicitly listing the columns. > David -- Best regards, Eugen Konkov
Hello David, > I really think you're driving yourself down a difficult path by > expecting queries with whole-row vars to be optimised just as well as > using select * or explicitly listing the columns. Yes, I was expect that. I use whole-row because do not want repeat all 10+ columns at select. I do not use (row1).*, (row2).*, because rows could have same columns. eg: row1.name, row2.name both will be named as 'name' and then I can not distinguish them. So I select whole-row and put myself into problems (( It would be nice if (row1).** will be expanded to: row1_id, row1_name etc. But this is other question which I already ask at different thread. Saturday, May 15, 2021, 5:59:41 PM, you wrote: > On Sat, 15 May 2021 at 00:39, KES <kes-kes@yandex.ru> wrote: >> >> Thank you for detailed explanation. I glad to hear that I can use aliases and this will be recognized and optimizationis applied. >> >> >We'd need some sort of ability to assign ressortgroupref to a particular column within a >> whole-row var >> Could it be possible to create hidden alias in same way as I did that manually? >> >> Algorithm seems not complex: >> 1. User refer column from composite type/whole-row: (o).agreement_id >> 2. Create hidden column at select: _o_agreement_id >> 3. Replace other references to (o).agreement_id by _o_agreement_id >> 4. Process query as usual after replacements > Internally Postgresql does use a hidden column for columns that are > required for calculations which are not in the SELECT list. e.g ones > that are in the GROUP BY / ORDER BY, or in your case a window > function's PARTITION BY. We call these "resjunk" columns. The problem > is you can't reference those from the parent query. If you explicitly > had listed that column in the SELECT clause, it won't cost you > anything more since the planner will add it regardless and just hide > it from you. When you add it yourself you'll be able to use it in the > subquery and you'll be able to filter out the partitions that you > don't want. > I really think you're driving yourself down a difficult path by > expecting queries with whole-row vars to be optimised just as well as > using select * or explicitly listing the columns. > David -- Best regards, Eugen Konkov