Thread: GROUP BY and inheritance issue

GROUP BY and inheritance issue

From
Manuel Rigger
Date:
Hi everyone,

Consider the example below:

CREATE TABLE t0(c0 INT PRIMARY KEY, c1 INT);
CREATE TABLE t1(c0 INT) INHERITS (t0);
INSERT INTO t0(c0, c1) VALUES(0, 0);
INSERT INTO t1(c0, c1) VALUES(0, 1);
SELECT c0, c1 FROM t0 GROUP BY c0, c1; -- expected: 0|0 and 0|1, actual: 0|0

Note that column c0 in t0 and t1 are merged. The GROUP BY clause above
causes only one row to be fetched, while I'd expect that both are
fetched (which is the behavior when no GROUP BY is used). Section
5.9.1 [1] in the documentation mentions some caveats of using
inheritance, also stating that the PRIMARY KEY is not inherited. Is
this some implication of this or a bug?

Best,
Manuel

[1] https://www.postgresql.org/docs/11/ddl-inherit.html



Re: GROUP BY and inheritance issue

From
David Rowley
Date:
On Wed, 3 Jul 2019 at 00:47, Manuel Rigger <rigger.manuel@gmail.com> wrote:
> Consider the example below:
>
> CREATE TABLE t0(c0 INT PRIMARY KEY, c1 INT);
> CREATE TABLE t1(c0 INT) INHERITS (t0);
> INSERT INTO t0(c0, c1) VALUES(0, 0);
> INSERT INTO t1(c0, c1) VALUES(0, 1);
> SELECT c0, c1 FROM t0 GROUP BY c0, c1; -- expected: 0|0 and 0|1, actual: 0|0
>
> Note that column c0 in t0 and t1 are merged. The GROUP BY clause above
> causes only one row to be fetched, while I'd expect that both are
> fetched (which is the behavior when no GROUP BY is used). Section
> 5.9.1 [1] in the documentation mentions some caveats of using
> inheritance, also stating that the PRIMARY KEY is not inherited. Is
> this some implication of this or a bug?

Thanks for the report.  This is a bug.

Basically, there is some code in remove_useless_groupby_columns() that
thinks because t0 has a primary key on c0, that it can just GROUP BY
c0 instead of c0, c1. If you look at the EXPLAIN you'll see the
planner removed the c1 column from the GROUP BY.  Really the planner
needs to consider that the relation might be an inheritance parent and
skip the optimisation in that case.

It might be a simple fix to just skip anything with rte->inh in
foreach(lc, parse->rtable) loop in remove_useless_groupby_columns(),
but it's late here, so will look a bit harder tomorrow. It'll need a
bit more thought about partitioned tables as the optimisation might be
valid for those.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: GROUP BY and inheritance issue

From
Manuel Rigger
Date:
Hi David,

Okay, thank you for the quick response and the explanation!

Best,
Manuel

On Tue, Jul 2, 2019 at 3:13 PM David Rowley
<david.rowley@2ndquadrant.com> wrote:
>
> On Wed, 3 Jul 2019 at 00:47, Manuel Rigger <rigger.manuel@gmail.com> wrote:
> > Consider the example below:
> >
> > CREATE TABLE t0(c0 INT PRIMARY KEY, c1 INT);
> > CREATE TABLE t1(c0 INT) INHERITS (t0);
> > INSERT INTO t0(c0, c1) VALUES(0, 0);
> > INSERT INTO t1(c0, c1) VALUES(0, 1);
> > SELECT c0, c1 FROM t0 GROUP BY c0, c1; -- expected: 0|0 and 0|1, actual: 0|0
> >
> > Note that column c0 in t0 and t1 are merged. The GROUP BY clause above
> > causes only one row to be fetched, while I'd expect that both are
> > fetched (which is the behavior when no GROUP BY is used). Section
> > 5.9.1 [1] in the documentation mentions some caveats of using
> > inheritance, also stating that the PRIMARY KEY is not inherited. Is
> > this some implication of this or a bug?
>
> Thanks for the report.  This is a bug.
>
> Basically, there is some code in remove_useless_groupby_columns() that
> thinks because t0 has a primary key on c0, that it can just GROUP BY
> c0 instead of c0, c1. If you look at the EXPLAIN you'll see the
> planner removed the c1 column from the GROUP BY.  Really the planner
> needs to consider that the relation might be an inheritance parent and
> skip the optimisation in that case.
>
> It might be a simple fix to just skip anything with rte->inh in
> foreach(lc, parse->rtable) loop in remove_useless_groupby_columns(),
> but it's late here, so will look a bit harder tomorrow. It'll need a
> bit more thought about partitioned tables as the optimisation might be
> valid for those.
>
> --
>  David Rowley                   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services



Re: GROUP BY and inheritance issue

From
David Rowley
Date:
On Wed, 3 Jul 2019 at 01:13, David Rowley <david.rowley@2ndquadrant.com> wrote:
>
> On Wed, 3 Jul 2019 at 00:47, Manuel Rigger <rigger.manuel@gmail.com> wrote:
> > Consider the example below:
> >
> > CREATE TABLE t0(c0 INT PRIMARY KEY, c1 INT);
> > CREATE TABLE t1(c0 INT) INHERITS (t0);
> > INSERT INTO t0(c0, c1) VALUES(0, 0);
> > INSERT INTO t1(c0, c1) VALUES(0, 1);
> > SELECT c0, c1 FROM t0 GROUP BY c0, c1; -- expected: 0|0 and 0|1, actual: 0|0
> >
> > Note that column c0 in t0 and t1 are merged. The GROUP BY clause above
> > causes only one row to be fetched, while I'd expect that both are
> > fetched (which is the behavior when no GROUP BY is used). Section
> > 5.9.1 [1] in the documentation mentions some caveats of using
> > inheritance, also stating that the PRIMARY KEY is not inherited. Is
> > this some implication of this or a bug?
>
> Thanks for the report.  This is a bug.

I've pushed a fix for this.


--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: GROUP BY and inheritance issue

From
Manuel Rigger
Date:
Great, thanks a lot!

Best,
Manuel

On Thu, Jul 4, 2019 at 12:12 AM David Rowley
<david.rowley@2ndquadrant.com> wrote:
>
> On Wed, 3 Jul 2019 at 01:13, David Rowley <david.rowley@2ndquadrant.com> wrote:
> >
> > On Wed, 3 Jul 2019 at 00:47, Manuel Rigger <rigger.manuel@gmail.com> wrote:
> > > Consider the example below:
> > >
> > > CREATE TABLE t0(c0 INT PRIMARY KEY, c1 INT);
> > > CREATE TABLE t1(c0 INT) INHERITS (t0);
> > > INSERT INTO t0(c0, c1) VALUES(0, 0);
> > > INSERT INTO t1(c0, c1) VALUES(0, 1);
> > > SELECT c0, c1 FROM t0 GROUP BY c0, c1; -- expected: 0|0 and 0|1, actual: 0|0
> > >
> > > Note that column c0 in t0 and t1 are merged. The GROUP BY clause above
> > > causes only one row to be fetched, while I'd expect that both are
> > > fetched (which is the behavior when no GROUP BY is used). Section
> > > 5.9.1 [1] in the documentation mentions some caveats of using
> > > inheritance, also stating that the PRIMARY KEY is not inherited. Is
> > > this some implication of this or a bug?
> >
> > Thanks for the report.  This is a bug.
>
> I've pushed a fix for this.
>
>
> --
>  David Rowley                   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services