Thread: BUG #5025: Aggregate function with subquery in 8.3 and 8.4.
The following bug has been logged online: Bug reference: 5025 Logged by: Sheng Y. Cheng Email address: scheng@adconion.com PostgreSQL version: 8.4.0 / 8.3.1 Operating system: Red Hat 4.1.1-52 Description: Aggregate function with subquery in 8.3 and 8.4. Details: Here are some facts and questions about the aggregate function with subquery in 8.3 and 8.4. ================= Question 1. ================== I though the following query would give me the same results in 8.4.0 and 8.3.1. vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv BEGIN; SELECT version(); CREATE TEMPORARY TABLE t1 (f1 text ) on commit drop ; CREATE TEMPORARY TABLE t2 (f1 text ) on commit drop ; INSERT INTO t1 (f1) VALUES ('aaa'); INSERT INTO t1 (f1) VALUES ('bbb'); INSERT INTO t1 (f1) VALUES ('ccc'); INSERT INTO t2 (f1) VALUES ('bbb'); SELECT t1.f1, COUNT(ts.*) FROM t1 LEFT JOIN (SELECT CASE WHEN f1 = '111' THEN '111' ELSE f1 END FROM t2) AS ts ON t1.f1 = ts.f1 GROUP BY t1.f1; ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ However, In 8.3.1 I got the following. vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv BEGIN version ---------------------------------------------------------------------------- ------------------------------- PostgreSQL 8.3.1 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.1 20070105 (Red Hat 4.1.1-52) (1 row) CREATE TABLE CREATE TABLE INSERT 0 1 INSERT 0 1 INSERT 0 1 INSERT 0 1 f1 | count -----+------- aaa | 0 bbb | 1 ccc | 0 (3 rows) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Whereas, in 8.4.0 I got the following. vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv BEGIN version ---------------------------------------------------------------------------- --------------------------------------- PostgreSQL 8.4.0 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.1 20070105 (Red Hat 4.1.1-52), 64-bit (1 row) CREATE TABLE CREATE TABLE INSERT 0 1 INSERT 0 1 INSERT 0 1 INSERT 0 1 f1 | count -----+------- aaa | 1 bbb | 1 ccc | 1 (3 rows) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The Session 4.2.7. Aggregate Expressions in 8.3 document at http://www.postgresql.org/docs/8.3/static/sql-expressions.html states "The last form invokes the aggregate once for each input row regardless of null or non-null values." I am wondering if the result I saw from 8.4.0 is a bug fix for 8.3.1? ================= Question 2. ================== vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv BEGIN; SELECT version(); CREATE TEMPORARY TABLE t1 (f1 text ) on commit drop ; CREATE TEMPORARY TABLE t2 (f1 text ) on commit drop ; INSERT INTO t1 (f1) VALUES ('aaa'); INSERT INTO t1 (f1) VALUES ('bbb'); INSERT INTO t1 (f1) VALUES ('ccc'); INSERT INTO t2 (f1) VALUES ('bbb'); SELECT t1.f1, COUNT(ts.*) FROM t1 LEFT JOIN (SELECT f1 FROM t2) AS ts ON t1.f1 = ts.f1 GROUP BY t1.f1; ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ I though the result of the above query would be the following. f1 | count -----+------- aaa | 0 bbb | 1 ccc | 0 however, I got the following in both 8.4.0 and 8.3.1. Result from 8.3.1. vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv BEGIN version ---------------------------------------------------------------------------- ------------------------------- PostgreSQL 8.3.1 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.1 20070105 (Red Hat 4.1.1-52) (1 row) CREATE TABLE CREATE TABLE INSERT 0 1 INSERT 0 1 INSERT 0 1 INSERT 0 1 f1 | count -----+------- aaa | 1 bbb | 1 ccc | 1 (3 rows) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Result from 8.4.0. vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv BEGIN version ---------------------------------------------------------------------------- --------------------------------------- PostgreSQL 8.4.0 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.1 20070105 (Red Hat 4.1.1-52), 64-bit (1 row) CREATE TABLE CREATE TABLE INSERT 0 1 INSERT 0 1 INSERT 0 1 INSERT 0 1 f1 | count -----+------- aaa | 1 bbb | 1 ccc | 1 (3 rows) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Is this how Postgres works for aggregate function? Thank you, Sheng
Sheng Y. Cheng wrote: > The Session 4.2.7. Aggregate Expressions in 8.3 document at > http://www.postgresql.org/docs/8.3/static/sql-expressions.html states "The > last form invokes the aggregate once for each input row regardless of null > or non-null values." I am wondering if the result I saw from 8.4.0 is a bug > fix for 8.3.1? Well, a COUNT(ts.*) is in fact not of the last form, but the first. "ts.*" is a whole-row reference to t2, like just "ts" would be (as in "COUNT(t2)"). But there indeed seems to be something wrong. The query can be reduced into: SELECT t1.f1, COUNT(ts) FROM t1 LEFT JOIN (SELECT f1 As f1 FROM t2 OFFSET 0) AS ts ON t1.f1 = ts.f1 GROUP BY t1.f1; With this you can reproduce the discrepancy in CVS HEAD alone - the query produces a different result if you remove the "OFFSET 0": postgres=# SELECT t1.f1, COUNT(ts) FROM t1 postgres-# LEFT JOIN postgres-# (SELECT f1 As f1 FROM t2 OFFSET 0) AS ts postgres-# ON t1.f1 = ts.f1 postgres-# GROUP BY t1.f1; f1 | count -----+------- aaa | 0 bbb | 1 ccc | 0 (3 rows) postgres=# SELECT t1.f1, COUNT(ts) FROM t1 LEFT JOIN (SELECT f1 As f1 FROM t2 /* OFFSET 0 */) AS ts ON t1.f1 = ts.f1 GROUP BY t1.f1; f1 | count -----+------- aaa | 1 bbb | 1 ccc | 1 (3 rows) Without the OFFSET, the subquery is "pulled up" into the top query, and that optimization makes the difference. PostgreSQL 8.4 is more aggressive at that optimization, which is why you saw different results on 8.3 and 8.4. The "pullup" code transforms the "ts" reference into a ROW constructor: postgres=# explain verbose SELECT t1.f1, COUNT(ts) FROM t1 LEFT JOIN (SELECT f1 As f1 FROM t2 /* OFFSET 0 */) AS ts ON t1.f1 = ts.f1 GROUP BY t1.f1; QUERY PLAN -------------------------------------------------------------------------------- GroupAggregate (cost=181.86..362.51 rows=200 width=64) Output: t1.f1, count(ROW(t2.f1)) ... That transformation isn't 100% accurate. A ROW expression with all NULL columns is not the same thing as a NULL whole-row expression when it comes to STRICT functions - a strict function is invoked with the former, but not the latter, even though both return true for an IS NULL test. That let's us write the test case as: CREATE FUNCTION stricttest (a anyelement) RETURNS boolean AS $$ SELECT true; $$ LANGUAGE SQL STRICT; postgres=# SELECT t1.f1, stricttest(ts) FROM t1 LEFT JOIN (SELECT f1 As f1 FROM t2 OFFSET 0) AS ts ON t1.f1 = ts.f1; f1 | stricttest -----+------------ aaa | bbb | t ccc | (3 rows) postgres=# SELECT t1.f1, stricttest(ts) FROM t1 LEFT JOIN (SELECT f1 As f1 FROM t2 /* OFFSET 0 */) AS ts ON t1.f1 = ts.f1; f1 | stricttest -----+------------ aaa | t bbb | t ccc | t (3 rows) I can see two possible interpretations for this: 1. The subquery pull-up code is broken, the transformation of a whole-row reference to ROW(...) is not valid. 2. The semantics of STRICT with row arguments is broken. It should be made consistent with IS NULL. Strict function should not be called if the argument is a row value with all NULL columns. I'm not sure which interpretation is correct. Thoughts? The SQL spec probably has something to say about this. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
2009/9/1 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>: > I can see two possible interpretations for this: > > 1. The subquery pull-up code is broken, the transformation of a > whole-row reference to ROW(...) is not valid. > > 2. The semantics of STRICT with row arguments is broken. It should be > made consistent with IS NULL. Strict function should not be called if > the argument is a row value with all NULL columns. > > I'm not sure which interpretation is correct. Thoughts? The SQL spec > probably has something to say about this. I suppose ts.* that wasn't joined is NULL. Not "a row value with all NULL columns" but "a NULL row value". contrib_regression=# SELECT t1.f1, ts.* IS NULL, ts.* FROM t1 LEFT JOIN (SELECT f1 FROM t2 -- offset 0 ) AS ts ON t1.f1 = ts.f1 ; f1 | ?column? | f1 -----+----------+----- aaa | t | bbb | f | bbb ccc | t | (3 rows) So the 1. ROW(...) construction seems not valid. > > -- > Heikki Linnakangas > EnterpriseDB http://www.enterprisedb.com > > -- > Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-bugs > Regards, -- Hitoshi Harada
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > I can see two possible interpretations for this: > 1. The subquery pull-up code is broken, the transformation of a > whole-row reference to ROW(...) is not valid. I think the problem is probably that we need a PlaceHolderVar wrapper around the ROW() constructor. > 2. The semantics of STRICT with row arguments is broken. It should be > made consistent with IS NULL. Well, that's a different argument. The immediate problem is that this case doesn't behave consistently with pre-8.4 behavior, which was not an intended change --- so I think we'd better make it work like before. regards, tom lane
On Tue, Sep 1, 2009 at 4:11 AM, Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> wrote: > 2. The semantics of STRICT with row arguments is broken. It should be > made consistent with IS NULL. Strict function should not be called if > the argument is a row value with all NULL columns. not just STRICT, but coalesce(), libpq 'is null' bit, plpgsql row variables, type input/output, joins, etc. see recent threads on -hackers and -bugs note that fixing this would break code in the field (like mine for example) :-). merlin
I wrote: > I think the problem is probably that we need a PlaceHolderVar wrapper > around the ROW() constructor. I looked into this a bit more. The issue is that flattening of subqueries that are inside an outer join first puts PlaceHolderVars around non-nullable output expressions of the subquery, and then applies ResolveNew() to substitute those expressions for upper references to the subquery outputs. However, a whole-row Var referencing the subquery is expanded by ResolveNew into a ROW() expression over the subquery outputs. To preserve compatibility with pre-8.4 behavior, what we really need here is to have a PlaceHolderVar around the ROW(), not on the individual expressions inside it. While it's not that hard to put in the PlaceHolderVar, the easy way to do it would require passing the PlannerInfo "root" struct to ResolveNew, which goes well beyond my threshold of pain from a modularity standpoint --- ResolveNew isn't part of the planner and has no business using that struct. ResolveNew's API is a study in ugliness already, so I'm thinking it's time to bite the bullet and refactor it. The idea that comes to mind is to provide a callback function and "void *" context argument, which ResolveNew would call upon finding a Var that needs substitution. The existing guts of ResolveNew would move into a callback that is specific to rewriteHandler.c's uses, and we'd write a different one for the planner. The PlannerInfo link would be hidden within the "void *" argument so we'd avoid exposing it to rewriter code. Comments? I believe BTW that there are related issues in other places where we expand composites into RowExprs. But the other places have been doing that for awhile. I think that for 8.4 our goals should be limited to not changing the behavior compared to prior releases. If any consensus is reached on the general issue of how we want "row() is null" to behave, then it'll be time to reconsider the other stuff. regards, tom lane
I blithely opined: > I believe BTW that there are related issues in other places where we > expand composites into RowExprs. But the other places have been doing > that for awhile. I think that for 8.4 our goals should be limited to > not changing the behavior compared to prior releases. So while I was testing my fix for this, I found out that that's more complicated than I thought. Consider these examples in the regression database: select t1.q2, count(t2.*) from int8_tbl t1 left join int8_tbl t2 on (t1.q2 = t2.q1) group by t1.q2 order by 1; select t1.q2, count(t2.*) from int8_tbl t1 left join (select * from int8_tbl) t2 on (t1.q2 = t2.q1) group by t1.q2 order by 1; select t1.q2, count(t2.*) from int8_tbl t1 left join (select * from int8_tbl offset 0) t2 on (t1.q2 = t2.q1) group by t1.q2 order by 1; select t1.q2, count(t2.*) from int8_tbl t1 left join (select q1, case when q2=1 then 1 else q2 end as q2 from int8_tbl) t2 on (t1.q2 = t2.q1) group by t1.q2 order by 1; If you believe that "t2.*" should go to NULL in a join-extended row, then the correct answer for all four of these is q2 | count -------------------+------- -4567890123456789 | 0 123 | 2 456 | 0 4567890123456789 | 6 (4 rows) However, the actual behavior of every release since 8.0 has been that the second case gives q2 | count -------------------+------- -4567890123456789 | 1 123 | 2 456 | 1 4567890123456789 | 6 (4 rows) ie, t2.* fails to go to NULL because it's expanded as ROW(t2.q1,t2.q2). The OFFSET 0 in the third case restores expected behavior by preventing flattening of the subquery, and up till 8.4 the CASE expression in the fourth case did too. With the fix I was just about to apply, all four cases give the first set of results. This clearly satisfies the principle of least astonishment, at least more nearly than what we have; but it equally clearly is *not* going to restore 8.4 to work just like 8.3. I'm inclined to apply the patch to 8.4 anyway, because it seems like a bug fix. I would consider patching further back except there's no chance of making it work in older branches, at least not without destabilizing them quite a bit (the PlaceHolderVar mechanism would have to be back-ported). It might be possible to fix the older branches by not flattening subqueries that have whole-row references; but even that would take nontrivial work, and it'd be sacrificing performance to fix a corner case no one has previously complained about. So I'm leaning to patching 8.4 and leaving the older branches alone. Thoughts? regards, tom lane
Tom Lane wrote: > With the fix I was just about to apply, all four cases give the first > set of results. This clearly satisfies the principle of least > astonishment, at least more nearly than what we have; but it equally > clearly is *not* going to restore 8.4 to work just like 8.3. Right, 8.3 had the same underlying problem, 8.4 just makes it more visible as it's better at flattening subqueries. > I'm inclined to apply the patch to 8.4 anyway, because it seems like a > bug fix. I would consider patching further back except there's no > chance of making it work in older branches, at least not without > destabilizing them quite a bit (the PlaceHolderVar mechanism would have > to be back-ported). > > It might be possible to fix the older branches by not flattening > subqueries that have whole-row references; but even that would take > nontrivial work, and it'd be sacrificing performance to fix a corner > case no one has previously complained about. So I'm leaning to patching > 8.4 and leaving the older branches alone. > > Thoughts? Seems reasonable. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > Tom Lane wrote: >> With the fix I was just about to apply, all four cases give the first >> set of results. This clearly satisfies the principle of least >> astonishment, at least more nearly than what we have; but it equally >> clearly is *not* going to restore 8.4 to work just like 8.3. > Right, 8.3 had the same underlying problem, 8.4 just makes it more > visible as it's better at flattening subqueries. What is interesting is that the CASE in the OP's original submission is apparently only there to dodge the visible-since-8.0 version of the problem; at least I can't see that it does anything else useful. The complaint apparently is not so much that 8.3 was right, as that the workaround for its bug stopped working ... >> ... So I'm leaning to patching >> 8.4 and leaving the older branches alone. >> >> Thoughts? > Seems reasonable. Will apply fix shortly --- I thought of one minor improvement to make (the code as it stands is generating redundant PlaceHolderVars). regards, tom lane
Tom Lane wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > > Tom Lane wrote: > >> With the fix I was just about to apply, all four cases give the first > >> set of results. This clearly satisfies the principle of least > >> astonishment, at least more nearly than what we have; but it equally > >> clearly is *not* going to restore 8.4 to work just like 8.3. > > > Right, 8.3 had the same underlying problem, 8.4 just makes it more > > visible as it's better at flattening subqueries. > > What is interesting is that the CASE in the OP's original submission > is apparently only there to dodge the visible-since-8.0 version of > the problem; at least I can't see that it does anything else useful. > The complaint apparently is not so much that 8.3 was right, as that > the workaround for its bug stopped working ... In that light, it probably doesn't make much sense to backport the fix further back, given that the people (person?) bitten by the bug surely must already be working around it. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes: > Tom Lane wrote: >> What is interesting is that the CASE in the OP's original submission >> is apparently only there to dodge the visible-since-8.0 version of >> the problem; at least I can't see that it does anything else useful. >> The complaint apparently is not so much that 8.3 was right, as that >> the workaround for its bug stopped working ... > In that light, it probably doesn't make much sense to backport the fix > further back, given that the people (person?) bitten by the bug surely > must already be working around it. Yeah, and there might even be people relying on the incorrect behavior. I'm not too worried about changing it in 8.4 since that's such a young release, but it's a bit harder to make a case for that in older branches. regards, tom lane