Thread: Re: [BUGS] BUG #14890: Error grouping by same column twice using FDW
sean.johnston@edgeintelligence.com writes: > select col1, col1 from loop_tbl group by 1, 2; > psql:t.sql:7: ERROR: targetlist item has multiple sortgroupref labels This is the fault of commit 7012b132d ("Push down aggregates to remote servers"), which imagined that postgres_fdw can use apply_pathtarget_labeling_to_tlist() for situations well beyond that function's limited abilities. I kinda suspect that foreign_grouping_ok() has got more bugs than this with complicated sortgroupref situations, too; its willingness to scribble on the sortgrouprefs doesn't look like a great idea to me. regards, tom lane -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
On Wed, Nov 8, 2017 at 3:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
In the attached patch, I have removed apply_pathtarget_labeling_to_tlist()
call. Since we are anyways checking that whether the GROUP BY expression is
entirely present in a target list or not, we can safely create a tle for that
GROUP BY expression and append that to the new targetlist by making sure that
its sortgroupref is also transferred, eliminating the need for relabeling
afterwords.
Now with these changes i.e. as we have moved setting sortgroupref for tle
inside the if block itself, we don't have to reset the sortgroupref for any
ORDER BY expression. That code is removed. This also eliminated the need for
copying the pathtarget.
Please have a look at the changes and let me know if I missed any.
Thanks
sean.johnston@edgeintelligence.com writes:
> select col1, col1 from loop_tbl group by 1, 2;
> psql:t.sql:7: ERROR: targetlist item has multiple sortgroupref labels
This is the fault of commit 7012b132d ("Push down aggregates to remote
servers"), which imagined that postgres_fdw can use
apply_pathtarget_labeling_to_tlist() for situations well beyond that
function's limited abilities. I kinda suspect that foreign_grouping_ok()
has got more bugs than this with complicated sortgroupref situations, too;
its willingness to scribble on the sortgrouprefs doesn't look like a great
idea to me.
In the attached patch, I have removed apply_pathtarget_labeling_to_tlist()
call. Since we are anyways checking that whether the GROUP BY expression is
entirely present in a target list or not, we can safely create a tle for that
GROUP BY expression and append that to the new targetlist by making sure that
its sortgroupref is also transferred, eliminating the need for relabeling
afterwords.
Now with these changes i.e. as we have moved setting sortgroupref for tle
inside the if block itself, we don't have to reset the sortgroupref for any
ORDER BY expression. That code is removed. This also eliminated the need for
copying the pathtarget.
Please have a look at the changes and let me know if I missed any.
Thanks
regards, tom lane
--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
Attachment
On Thu, Nov 9, 2017 at 5:29 PM, Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote: > > > In the attached patch, I have removed apply_pathtarget_labeling_to_tlist() > call. Since we are anyways checking that whether the GROUP BY expression is > entirely present in a target list or not, we can safely create a tle for > that > GROUP BY expression and append that to the new targetlist by making sure > that > its sortgroupref is also transferred, eliminating the need for relabeling > afterwords. > > Now with these changes i.e. as we have moved setting sortgroupref for tle > inside the if block itself, we don't have to reset the sortgroupref for any > ORDER BY expression. That code is removed. This also eliminated the need for > copying the pathtarget. Some review comments. + Remote SQL: SELECT c2, c2 FROM "S 1"."T 1" WHERE ((c2 = 6)) GROUP BY c2, c2 GROUP BY 1, 2 is changed to GROUP BY c2, c2 which is technically wrong. The remote planner will think that both the GROUP BY entries refer to one of the (possibly the first) entry in the SELECT clause. That's not what really it is. This isn't a real problem, since we push only immutable expressions, but nonetheless something to avoid if we can. That brings another question, do we really bother to pass duplicate entries in the GROUP BY clause as they are? Can we just pass a single one? The result will be the same since both these entries are immutable. May be we should pass repeated entries in GROUP BY clause just in case, we somehow manage to ship non-immutable expressions in future. Thoughts? - * have those arising from ORDER BY clause. These sortgrouprefs may be - * different from those in the plan's targetlist. May be we were not explaining this correctly earlier. The sortgrouprefs of GROUP BY clause can not be different between those two tlists. The difference is really the absence of ORDER BY entries. May be we should add some tests where there some entries common between ORDER BY and GROUP BY. + /* + * Pushable, add to tlist. We need to create a tle for this + * expression and need to transfer the sortgroupref too. We cannot + * use add_to_flat_tlist() here as it avoids the duplicate entries + * in the targetlist but here we want those duplicate entries as + * there can be multiple GROUP BY expressions pointing to the same + * column at different positions. + */ + tle = makeTargetEntry((Expr *) expr, list_length(tlist) + 1, NULL, + false); + tle->ressortgroupref = sgref; + tlist = lappend(tlist, tle); May be we want to use add_flat_to_tlist() when sgref is not present to avoid duplicate entries. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes: > On Thu, Nov 9, 2017 at 5:29 PM, Jeevan Chalke > <jeevan.chalke@enterprisedb.com> wrote: >> + Remote SQL: SELECT c2, c2 FROM "S 1"."T 1" WHERE ((c2 = 6)) GROUP BY c2, c2 > GROUP BY 1, 2 is changed to GROUP BY c2, c2 which is technically wrong. The > remote planner will think that both the GROUP BY entries refer to one of the > (possibly the first) entry in the SELECT clause. That's not what really it is. Yeah. I'm inclined to think that part of what needs to happen here is for postgres_fdw to change over to always emitting GROUP BY column-number, so that the grouping columns are clearly matched up with the tlist entries it's considering, and the remote parser is certain to build ressortgrouprefs that match what we thought was happening locally. As you say, we can probably get away without that as long as we don't push mutable grouping expressions ... but just because we think a grouping expression is immutable at our end doesn't necessarily mean that it is at the far end. Also, in view of the (as yet unfixed) bug discussed in https://www.postgresql.org/message-id/flat/7dbdcf5c-b5a6-ef89-4958-da212fe10176@iki.fi there's no hope of extending postgres_fdw to push GROUPING SETS correctly unless it is able to distinguish textually-equal grouping columns. > May be we were not explaining this correctly earlier. The sortgrouprefs of > GROUP BY clause can not be different between those two tlists. The difference > is really the absence of ORDER BY entries. May be we should add some tests > where there some entries common between ORDER BY and GROUP BY. As I alluded to upthread, I suspect that dropping ORDER BY markings from the tlist is likely to break some cases (that is, the planner may expect the output of the foreign scan to include those columns). If this isn't fully exercised by the existing tests then we definitely need more tests. regards, tom lane -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
On Fri, Nov 10, 2017 at 6:14 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
+ /*
+ * Pushable, add to tlist. We need to create a tle for this
+ * expression and need to transfer the sortgroupref too. We cannot
+ * use add_to_flat_tlist() here as it avoids the duplicate entries
+ * in the targetlist but here we want those duplicate entries as
+ * there can be multiple GROUP BY expressions pointing to the same
+ * column at different positions.
+ */
+ tle = makeTargetEntry((Expr *) expr, list_length(tlist) + 1, NULL,
+ false);
+ tle->ressortgroupref = sgref;
+ tlist = lappend(tlist, tle);
May be we want to use add_flat_to_tlist() when sgref is not present to avoid
duplicate entries.
These code changes are already inside sgref check. And for other places we do use add_to_flat_tlist().
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
On Sat, Nov 11, 2017 at 12:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:
> On Thu, Nov 9, 2017 at 5:29 PM, Jeevan Chalke
> <jeevan.chalke@enterprisedb.com> wrote:
>> + Remote SQL: SELECT c2, c2 FROM "S 1"."T 1" WHERE ((c2 = 6)) GROUP BY c2, c2
> GROUP BY 1, 2 is changed to GROUP BY c2, c2 which is technically wrong. The
> remote planner will think that both the GROUP BY entries refer to one of the
> (possibly the first) entry in the SELECT clause. That's not what really it is.
Yeah. I'm inclined to think that part of what needs to happen here is for
postgres_fdw to change over to always emitting GROUP BY column-number,
so that the grouping columns are clearly matched up with the tlist entries
it's considering, and the remote parser is certain to build
ressortgrouprefs that match what we thought was happening locally.
Understood.
Along with the last patch changes which does put duplicate entries if they
were referred in GROUP BY clause, in attached patch I have modified deparsing
logic to emit GROUP BY column-number.
were referred in GROUP BY clause, in attached patch I have modified deparsing
logic to emit GROUP BY column-number.
As you say, we can probably get away without that as long as we don't push
mutable grouping expressions ... but just because we think a grouping
expression is immutable at our end doesn't necessarily mean that it is at
the far end. Also, in view of the (as yet unfixed) bug discussed in
https://www.postgresql.org/message-id/flat/7dbdcf5c-b5a6- ef89-4958-da212fe10176@iki.fi
there's no hope of extending postgres_fdw to push GROUPING SETS correctly
unless it is able to distinguish textually-equal grouping columns.
> May be we were not explaining this correctly earlier. The sortgrouprefs of
> GROUP BY clause can not be different between those two tlists. The difference
> is really the absence of ORDER BY entries. May be we should add some tests
> where there some entries common between ORDER BY and GROUP BY.
As I alluded to upthread, I suspect that dropping ORDER BY markings from
the tlist is likely to break some cases (that is, the planner may expect
the output of the foreign scan to include those columns). If this isn't
fully exercised by the existing tests then we definitely need more tests.
We do have test-coverage where ORDER BY and GROUP BY have common entries.
All ORDER BY expressions are part of the grouping targets and they must be
either plain vars or aggregate function. Plain vars are anyways part of the
GROUP BY expression otherwsie we will end-up in a error. And for aggregate
function, we do include those in a targetlist.
either plain vars or aggregate function. Plain vars are anyways part of the
GROUP BY expression otherwsie we will end-up in a error. And for aggregate
function, we do include those in a targetlist.
regards, tom lane
Let me know if I missed any or wrongly interpreted anything over here.
Thanks
--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
Attachment
On Sat, Nov 11, 2017 at 12:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > As you say, we can probably get away without that as long as we don't push > mutable grouping expressions ... but just because we think a grouping > expression is immutable at our end doesn't necessarily mean that it is at > the far end. Well, I thought of that case but then we may have problems without GROUP BY. use_physical_tlist() doesn't list ForeignScan as an exception. deparseTargetList() constructs the tlist based on attrs_used, which doesn't know if the same column was repeated in the SELECT clause. build_tlist_to_deparse() flattens the tlist to be deparsed. I think the assumption is that the user won't use immutable expression as a column. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
On Wed, Nov 15, 2017 at 2:36 PM, Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote: >> >> As I alluded to upthread, I suspect that dropping ORDER BY markings from >> the tlist is likely to break some cases (that is, the planner may expect >> the output of the foreign scan to include those columns). If this isn't >> fully exercised by the existing tests then we definitely need more tests. > > > We do have test-coverage where ORDER BY and GROUP BY have common entries. > > All ORDER BY expressions are part of the grouping targets and they must be > either plain vars or aggregate function. Plain vars are anyways part of the > GROUP BY expression otherwsie we will end-up in a error. And for aggregate > function, we do include those in a targetlist. All the ORDER BY expressions are part of the targetlist. If the whole expression is pushable, we check if the Vars and Aggrefs involved in that expressions are pushable and include those in the targetlist to the pushed down. If they are not pushable, we do not push aggregate/grouping. So, we are good wrt that. I also verified that the current tests are enough. I am fine with the patch. It introduces a lot of expected output diff, since we always deparse GROUP BY clause in positional notation. That's fine. There is no point in adding extra logic to use positional notation only when there are duplicate entries in the GROUP BY clause. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Hi,
I have added this to the 2018-01 Commitfest.
Here is the link to the entry:I have added this to the 2018-01 Commitfest.
https://commitfest.postgresql.
Earlier attached patch still applies cleanly on the latest HEAD.
Thanks--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes: > I am fine with the patch. It introduces a lot of expected output diff, > since we always deparse GROUP BY clause in positional notation. That's > fine. There is no point in adding extra logic to use positional > notation only when there are duplicate entries in the GROUP BY clause. Yeah, agreed, particularly since that behavior has only been there since 10.0 anyway. Pushed with a few cosmetic adjustments (mostly, improving shaky English in the comments in foreign_grouping_ok()). regards, tom lane