Thread: GROUP BY using tablename.* does not work if tablename has 1 column with NULL values

GROUP BY using tablename.* does not work if tablename has 1 column with NULL values

From
"Narayanan Iyer"
Date:

Hi,

 

In the below example, t1 points to a table with just 1 column (lastName) and so I expect the 2 SELECT queries (pasted below) using t1.lastName or t1.* syntax to produce the exact same results. But the latter produces one extra row of output (3 rows vs 2 rows).

 

$ psql --version

psql (PostgreSQL) 13.4 (Ubuntu 13.4-0ubuntu0.21.04.1)

 

$ psql db

db=> DROP TABLE IF EXISTS tmp;

DROP TABLE

db=> CREATE TABLE tmp (id INTEGER PRIMARY KEY, firstName VARCHAR(30), lastName VARCHAR(30));

CREATE TABLE

db=> INSERT INTO tmp VALUES (1,'Acid','Burn');

INSERT 0 1

db=> INSERT INTO tmp VALUES (2,'Joey',NULL);

INSERT 0 1

db=> SELECT COUNT(t1.lastName) FROM tmp t2 LEFT JOIN (SELECT lastName FROM tmp) AS t1 ON (t2.firstName <= 'Acid') GROUP BY t1.lastName;

count

0

1

(2 rows)

db=> SELECT COUNT(t1.*) FROM tmp t2 LEFT JOIN (SELECT lastName FROM tmp) AS t1 ON (t2.firstName <= 'Acid') GROUP BY t1.*;

count

1

1

0

(3 rows)

 

It seems like a Postgres bug to me. Not sure if there is a Postgres setting that I need to enable in order to avoid this discrepancy.

 

Let me know if you need any more information.

 

Thanks,

Narayanan.

On Fri, 8 Oct 2021 at 17:54, Narayanan Iyer <nars@yottadb.com> wrote:
> In the below example, t1 points to a table with just 1 column (lastName) and so I expect the 2 SELECT queries (pasted
below)using t1.lastName or t1.* syntax to produce the exact same results. But the latter produces one extra row of
output(3 rows vs 2 rows). 

Given your inner query select three single value rows, one of them is
null, and https://www.postgresql.org/docs/13/functions-aggregate.html
says:

count ( * ) → bigint  Computes the number of input rows.
count ( "any" ) → bigint Computes the number of input rows in which
the input value is not null.

It does not seem like a bug ( * does not mean the same everywhere ).
It maybe a missfeature or a missinterpretation of the std, but given
how careful developers are on this one I highly doubt it, I would vote
for erroneous expectations.

Francisco Olarte.



"Narayanan Iyer" <nars@yottadb.com> writes:
> In the below example, t1 points to a table with just 1 column (lastName) and so I expect the 2 SELECT queries (pasted
below)using t1.lastName or t1.* syntax to produce the exact same results. But the latter produces one extra row of
output(3 rows vs 2 rows). 

I don't see any bug here.  If you take out the grouping it's a bit easier
to see what's going on:

SELECT *, t1.*::record FROM tmp t2 LEFT JOIN (SELECT lastName FROM tmp) AS t1 ON (t2.firstName <= 'Acid');
 id | firstname | lastname | lastname |   t1
----+-----------+----------+----------+--------
  1 | Acid      | Burn     | Burn     | (Burn)
  1 | Acid      | Burn     |          | ()
  2 | Joey      |          |          |
(3 rows)

The first row comes from joining t2 Acid,Burn to t1 Acid,Burn.
The second row comes from joining t2 Acid,Burn to t1 Joey,NULL.
The third row comes from failing to join t2 Joey,NULL to anything.

So in the second row, there is a matched t1 row, and t1.*
therefore represents a non-null composite value that happens
to contain one null field.  In the third row, there is no
matched t1 row at all, so we consider that t1.* is a composite NULL,
which is different from a composite containing NULL fields.

You can argue about whether composite NULL ought to be considered
identical to ROW(NULL).  But there are a lot of contexts where
that would be a bad idea, and very few where it'd be good.

> It seems like a Postgres bug to me. Not sure if there is a Postgres setting that I need to enable in order to avoid
thisdiscrepancy. 

You might be able to get the results you want by explicitly
converting to row-containing-nulls, which is not very hard:

 SELECT *, row(t1.*) FROM tmp t2 LEFT JOIN (SELECT lastName FROM tmp) AS t1 ON (t2.firstName <= 'Acid');
 id | firstname | lastname | lastname |  row
----+-----------+----------+----------+--------
  1 | Acid      | Burn     | Burn     | (Burn)
  1 | Acid      | Burn     |          | ()
  2 | Joey      |          |          | ()
(3 rows)

            regards, tom lane



Re: GROUP BY using tablename.* does not work if tablename has 1 column with NULL values

From
"David G. Johnston"
Date:
On Friday, October 8, 2021, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Narayanan Iyer" <nars@yottadb.com> writes:
> In the below example, t1 points to a table with just 1 column (lastName) and so I expect the 2 SELECT queries (pasted below) using t1.lastName or t1.* syntax to produce the exact same results. But the latter produces one extra row of output (3 rows vs 2 rows).

SELECT *, t1.*::record FROM tmp t2 LEFT JOIN (SELECT lastName FROM tmp) AS t1 ON (t2.firstName <= 'Acid');

The third row comes from failing to join t2 Joey,NULL to anything.

This is the part that seems the most unusual (from a “why did you write the query that way” perspective, not the result).  A left join’s on clause does not act as a filter for the left side table, so a record with t1.firstName=Joey can still be output.  So, on the whole, this is just a poorly written query that takes too much effort for someone to understand due to the non-traditional use of left join and an on clause that doesn’t actually join and oddly decides to restrict the left side.

The fact that nulls are not counted and are also not equal to each other in group by does indeed explain the rest.

David J.
Tom, thank you for the detailed response. Did not realize t1.* creates a ROW(NULL) value whereas t1.lastName does not.
Thatexplains the difference. 

I also did not know the ::record syntax. Nice tool. Thanks.

I agree it is not a bug. Please close the issue.

Narayanan.

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Friday, October 8, 2021 12:35 PM
To: Narayanan Iyer <nars@yottadb.com>
Cc: pgsql-bugs@lists.postgresql.org; support@yottadb.com
Subject: Re: GROUP BY using tablename.* does not work if tablename has 1 column with NULL values

"Narayanan Iyer" <nars@yottadb.com> writes:
> In the below example, t1 points to a table with just 1 column (lastName) and so I expect the 2 SELECT queries (pasted
below)using t1.lastName or t1.* syntax to produce the exact same results. But the latter produces one extra row of
output(3 rows vs 2 rows). 

I don't see any bug here.  If you take out the grouping it's a bit easier
to see what's going on:

SELECT *, t1.*::record FROM tmp t2 LEFT JOIN (SELECT lastName FROM tmp) AS t1 ON (t2.firstName <= 'Acid');
 id | firstname | lastname | lastname |   t1
----+-----------+----------+----------+--------
  1 | Acid      | Burn     | Burn     | (Burn)
  1 | Acid      | Burn     |          | ()
  2 | Joey      |          |          |
(3 rows)

The first row comes from joining t2 Acid,Burn to t1 Acid,Burn.
The second row comes from joining t2 Acid,Burn to t1 Joey,NULL.
The third row comes from failing to join t2 Joey,NULL to anything.

So in the second row, there is a matched t1 row, and t1.*
therefore represents a non-null composite value that happens
to contain one null field.  In the third row, there is no
matched t1 row at all, so we consider that t1.* is a composite NULL,
which is different from a composite containing NULL fields.

You can argue about whether composite NULL ought to be considered
identical to ROW(NULL).  But there are a lot of contexts where
that would be a bad idea, and very few where it'd be good.

> It seems like a Postgres bug to me. Not sure if there is a Postgres setting that I need to enable in order to avoid
thisdiscrepancy. 

You might be able to get the results you want by explicitly
converting to row-containing-nulls, which is not very hard:

 SELECT *, row(t1.*) FROM tmp t2 LEFT JOIN (SELECT lastName FROM tmp) AS t1 ON (t2.firstName <= 'Acid');
 id | firstname | lastname | lastname |  row
----+-----------+----------+----------+--------
  1 | Acid      | Burn     | Burn     | (Burn)
  1 | Acid      | Burn     |          | ()
  2 | Joey      |          |          | ()
(3 rows)

            regards, tom lane




David,

 

This query was the simplest I could come up with to illustrate what I thought was the issue. It is not a real world query.

 

Narayanan.

 

From: David G. Johnston [mailto:david.g.johnston@gmail.com]
Sent: Friday, October 8, 2021 12:51 PM
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: Narayanan Iyer <nars@yottadb.com>; pgsql-bugs@lists.postgresql.org; support@yottadb.com
Subject: Re: GROUP BY using tablename.* does not work if tablename has 1 column with NULL values

 

On Friday, October 8, 2021, Tom Lane <tgl@sss.pgh.pa.us> wrote:

"Narayanan Iyer" <nars@yottadb.com> writes:
> In the below example, t1 points to a table with just 1 column (lastName) and so I expect the 2 SELECT queries (pasted below) using t1.lastName or t1.* syntax to produce the exact same results. But the latter produces one extra row of output (3 rows vs 2 rows).

SELECT *, t1.*::record FROM tmp t2 LEFT JOIN (SELECT lastName FROM tmp) AS t1 ON (t2.firstName <= 'Acid');

The third row comes from failing to join t2 Joey,NULL to anything.

 

This is the part that seems the most unusual (from a “why did you write the query that way” perspective, not the result).  A left join’s on clause does not act as a filter for the left side table, so a record with t1.firstName=Joey can still be output.  So, on the whole, this is just a poorly written query that takes too much effort for someone to understand due to the non-traditional use of left join and an on clause that doesn’t actually join and oddly decides to restrict the left side.

 

The fact that nulls are not counted and are also not equal to each other in group by does indeed explain the rest.

 

David J.

Narayanan.

On Fri, 8 Oct 2021 at 18:38, Narayanan Iyer <nars@yottadb.com> wrote:
> The tmp table only has 2 rows so the inner query should have selected two (not three) single value rows.
> So I still do not understand where the 3rd row in the final output comes from.

Your quoting style make this a bit difficult to follow. I may be
confused, but from what I remember your inner query was an outer join
returning three rows, paste it again so we are sure of of which query
we are talking about.

FOS



Francisco,

Sorry about the quoting style. I am used to including the original email in my response for better context. Did not
realizethat would get pasted in the discussion thread. I don't have a way of editing it. If you do, please remove those
prioremails in my response.  

You are right. My inner query was an outer join returning 3 rows. But what I did not understand was that the NULLs in 2
ofthe 3 rows were different because 1 was a composite NULL and 1 was a ROW(NULL) and hence they showed up as 2
differentNULLs/rows even after a GROUP BY. Tom's explanation cleared it up for me. 

So no issues. Like I mentioned in another response, please close this bug report. I don't think I can.

Narayanan.




Narayanan:

On Mon, 11 Oct 2021 at 14:10, Narayanan Iyer <nars@yottadb.com> wrote:
...
> You are right. My inner query was an outer join returning 3 rows. But what I did not understand was that the NULLs in
2of the 3 rows were different because 1 was a composite NULL and 1 was a ROW(NULL) and hence they showed up as 2
differentNULLs/rows even after a GROUP BY. Tom's explanation cleared it up for me. 

Perfect. null rows, row(null) and siblings are extremely tricky, I
always try to avoid them, just pointed it in case you did not run the
subquery first and counted results.

Francisco Olarte.