Thread: GROUP BY using tablename.* does not work if tablename has 1 column with NULL values
GROUP BY using tablename.* does not work if tablename has 1 column with NULL values
Hi,
In the below example, t1 points to a table with just 1 column (lastName) and so I expect the 2 SELECT queries (pasted below) using t1.lastName or t1.* syntax to produce the exact same results. But the latter produces one extra row of output (3 rows vs 2 rows).
$ psql --version
psql (PostgreSQL) 13.4 (Ubuntu 13.4-0ubuntu0.21.04.1)
$ psql db
db=> DROP TABLE IF EXISTS tmp;
DROP TABLE
db=> CREATE TABLE tmp (id INTEGER PRIMARY KEY, firstName VARCHAR(30), lastName VARCHAR(30));
CREATE TABLE
db=> INSERT INTO tmp VALUES (1,'Acid','Burn');
INSERT 0 1
db=> INSERT INTO tmp VALUES (2,'Joey',NULL);
INSERT 0 1
db=> SELECT COUNT(t1.lastName) FROM tmp t2 LEFT JOIN (SELECT lastName FROM tmp) AS t1 ON (t2.firstName <= 'Acid') GROUP BY t1.lastName;
count
0
1
(2 rows)
db=> SELECT COUNT(t1.*) FROM tmp t2 LEFT JOIN (SELECT lastName FROM tmp) AS t1 ON (t2.firstName <= 'Acid') GROUP BY t1.*;
count
1
1
0
(3 rows)
It seems like a Postgres bug to me. Not sure if there is a Postgres setting that I need to enable in order to avoid this discrepancy.
Let me know if you need any more information.
Thanks,
Narayanan.
Re: GROUP BY using tablename.* does not work if tablename has 1 column with NULL values
On Fri, 8 Oct 2021 at 17:54, Narayanan Iyer <nars@yottadb.com> wrote: > In the below example, t1 points to a table with just 1 column (lastName) and so I expect the 2 SELECT queries (pasted below)using t1.lastName or t1.* syntax to produce the exact same results. But the latter produces one extra row of output(3 rows vs 2 rows). Given your inner query select three single value rows, one of them is null, and https://www.postgresql.org/docs/13/functions-aggregate.html says: count ( * ) → bigint Computes the number of input rows. count ( "any" ) → bigint Computes the number of input rows in which the input value is not null. It does not seem like a bug ( * does not mean the same everywhere ). It maybe a missfeature or a missinterpretation of the std, but given how careful developers are on this one I highly doubt it, I would vote for erroneous expectations. Francisco Olarte.
Re: GROUP BY using tablename.* does not work if tablename has 1 column with NULL values
"Narayanan Iyer" <nars@yottadb.com> writes: > In the below example, t1 points to a table with just 1 column (lastName) and so I expect the 2 SELECT queries (pasted below)using t1.lastName or t1.* syntax to produce the exact same results. But the latter produces one extra row of output(3 rows vs 2 rows). I don't see any bug here. If you take out the grouping it's a bit easier to see what's going on: SELECT *, t1.*::record FROM tmp t2 LEFT JOIN (SELECT lastName FROM tmp) AS t1 ON (t2.firstName <= 'Acid'); id | firstname | lastname | lastname | t1 ----+-----------+----------+----------+-------- 1 | Acid | Burn | Burn | (Burn) 1 | Acid | Burn | | () 2 | Joey | | | (3 rows) The first row comes from joining t2 Acid,Burn to t1 Acid,Burn. The second row comes from joining t2 Acid,Burn to t1 Joey,NULL. The third row comes from failing to join t2 Joey,NULL to anything. So in the second row, there is a matched t1 row, and t1.* therefore represents a non-null composite value that happens to contain one null field. In the third row, there is no matched t1 row at all, so we consider that t1.* is a composite NULL, which is different from a composite containing NULL fields. You can argue about whether composite NULL ought to be considered identical to ROW(NULL). But there are a lot of contexts where that would be a bad idea, and very few where it'd be good. > It seems like a Postgres bug to me. Not sure if there is a Postgres setting that I need to enable in order to avoid thisdiscrepancy. You might be able to get the results you want by explicitly converting to row-containing-nulls, which is not very hard: SELECT *, row(t1.*) FROM tmp t2 LEFT JOIN (SELECT lastName FROM tmp) AS t1 ON (t2.firstName <= 'Acid'); id | firstname | lastname | lastname | row ----+-----------+----------+----------+-------- 1 | Acid | Burn | Burn | (Burn) 1 | Acid | Burn | | () 2 | Joey | | | () (3 rows) regards, tom lane
Re: GROUP BY using tablename.* does not work if tablename has 1 column with NULL values
"Narayanan Iyer" <nars@yottadb.com> writes:
> In the below example, t1 points to a table with just 1 column (lastName) and so I expect the 2 SELECT queries (pasted below) using t1.lastName or t1.* syntax to produce the exact same results. But the latter produces one extra row of output (3 rows vs 2 rows).
SELECT *, t1.*::record FROM tmp t2 LEFT JOIN (SELECT lastName FROM tmp) AS t1 ON (t2.firstName <= 'Acid');
The third row comes from failing to join t2 Joey,NULL to anything.
RE: GROUP BY using tablename.* does not work if tablename has 1 column with NULL values
Tom, thank you for the detailed response. Did not realize t1.* creates a ROW(NULL) value whereas t1.lastName does not. Thatexplains the difference. I also did not know the ::record syntax. Nice tool. Thanks. I agree it is not a bug. Please close the issue. Narayanan. -----Original Message----- From: Tom Lane [mailto:tgl@sss.pgh.pa.us] Sent: Friday, October 8, 2021 12:35 PM To: Narayanan Iyer <nars@yottadb.com> Cc: pgsql-bugs@lists.postgresql.org; support@yottadb.com Subject: Re: GROUP BY using tablename.* does not work if tablename has 1 column with NULL values "Narayanan Iyer" <nars@yottadb.com> writes: > In the below example, t1 points to a table with just 1 column (lastName) and so I expect the 2 SELECT queries (pasted below)using t1.lastName or t1.* syntax to produce the exact same results. But the latter produces one extra row of output(3 rows vs 2 rows). I don't see any bug here. If you take out the grouping it's a bit easier to see what's going on: SELECT *, t1.*::record FROM tmp t2 LEFT JOIN (SELECT lastName FROM tmp) AS t1 ON (t2.firstName <= 'Acid'); id | firstname | lastname | lastname | t1 ----+-----------+----------+----------+-------- 1 | Acid | Burn | Burn | (Burn) 1 | Acid | Burn | | () 2 | Joey | | | (3 rows) The first row comes from joining t2 Acid,Burn to t1 Acid,Burn. The second row comes from joining t2 Acid,Burn to t1 Joey,NULL. The third row comes from failing to join t2 Joey,NULL to anything. So in the second row, there is a matched t1 row, and t1.* therefore represents a non-null composite value that happens to contain one null field. In the third row, there is no matched t1 row at all, so we consider that t1.* is a composite NULL, which is different from a composite containing NULL fields. You can argue about whether composite NULL ought to be considered identical to ROW(NULL). But there are a lot of contexts where that would be a bad idea, and very few where it'd be good. > It seems like a Postgres bug to me. Not sure if there is a Postgres setting that I need to enable in order to avoid thisdiscrepancy. You might be able to get the results you want by explicitly converting to row-containing-nulls, which is not very hard: SELECT *, row(t1.*) FROM tmp t2 LEFT JOIN (SELECT lastName FROM tmp) AS t1 ON (t2.firstName <= 'Acid'); id | firstname | lastname | lastname | row ----+-----------+----------+----------+-------- 1 | Acid | Burn | Burn | (Burn) 1 | Acid | Burn | | () 2 | Joey | | | () (3 rows) regards, tom lane
RE: GROUP BY using tablename.* does not work if tablename has 1 column with NULL values
David,
This query was the simplest I could come up with to illustrate what I thought was the issue. It is not a real world query.
Narayanan.
From: David G. Johnston [mailto:david.g.johnston@gmail.com]
Sent: Friday, October 8, 2021 12:51 PM
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: Narayanan Iyer <nars@yottadb.com>; pgsql-bugs@lists.postgresql.org; support@yottadb.com
Subject: Re: GROUP BY using tablename.* does not work if tablename has 1 column with NULL values
On Friday, October 8, 2021, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Narayanan Iyer" <nars@yottadb.com> writes:
> In the below example, t1 points to a table with just 1 column (lastName) and so I expect the 2 SELECT queries (pasted below) using t1.lastName or t1.* syntax to produce the exact same results. But the latter produces one extra row of output (3 rows vs 2 rows).
SELECT *, t1.*::record FROM tmp t2 LEFT JOIN (SELECT lastName FROM tmp) AS t1 ON (t2.firstName <= 'Acid');
The third row comes from failing to join t2 Joey,NULL to anything.
This is the part that seems the most unusual (from a “why did you write the query that way” perspective, not the result). A left join’s on clause does not act as a filter for the left side table, so a record with t1.firstName=Joey can still be output. So, on the whole, this is just a poorly written query that takes too much effort for someone to understand due to the non-traditional use of left join and an on clause that doesn’t actually join and oddly decides to restrict the left side.
The fact that nulls are not counted and are also not equal to each other in group by does indeed explain the rest.
David J.
Re: GROUP BY using tablename.* does not work if tablename has 1 column with NULL values
Narayanan. On Fri, 8 Oct 2021 at 18:38, Narayanan Iyer <nars@yottadb.com> wrote: > The tmp table only has 2 rows so the inner query should have selected two (not three) single value rows. > So I still do not understand where the 3rd row in the final output comes from. Your quoting style make this a bit difficult to follow. I may be confused, but from what I remember your inner query was an outer join returning three rows, paste it again so we are sure of of which query we are talking about. FOS
RE: GROUP BY using tablename.* does not work if tablename has 1 column with NULL values
Francisco, Sorry about the quoting style. I am used to including the original email in my response for better context. Did not realizethat would get pasted in the discussion thread. I don't have a way of editing it. If you do, please remove those prioremails in my response. You are right. My inner query was an outer join returning 3 rows. But what I did not understand was that the NULLs in 2 ofthe 3 rows were different because 1 was a composite NULL and 1 was a ROW(NULL) and hence they showed up as 2 differentNULLs/rows even after a GROUP BY. Tom's explanation cleared it up for me. So no issues. Like I mentioned in another response, please close this bug report. I don't think I can. Narayanan.
Re: GROUP BY using tablename.* does not work if tablename has 1 column with NULL values
Narayanan: On Mon, 11 Oct 2021 at 14:10, Narayanan Iyer <nars@yottadb.com> wrote: ... > You are right. My inner query was an outer join returning 3 rows. But what I did not understand was that the NULLs in 2of the 3 rows were different because 1 was a composite NULL and 1 was a ROW(NULL) and hence they showed up as 2 differentNULLs/rows even after a GROUP BY. Tom's explanation cleared it up for me. Perfect. null rows, row(null) and siblings are extremely tricky, I always try to avoid them, just pointed it in case you did not run the subquery first and counted results. Francisco Olarte.