Thread: Re: [SQL] MAX() of 0 records.
Paul McGarry <paulm@opentec.com.au> writes: > CREATE TABLE entry_stats > ( > entry_id INT4 NOT NULL REFERENCES entry ON DELETE CASCADE, > entry_minprice INT4 NOT NULL DEFAULT 0 > ); > > CREATE TABLE item( > item_id INT4 PRIMARY KEY, > item_entry_id INT4 NOT NULL REFERENCES entry ON DELETE NO ACTION, > item_price INT4 NOT NULL, > item_live bool NOT NULL DEFAULT 'n' > ); > > [trigger using] > > UPDATE entry_stats > SET entry_minprice=min(item_price) > FROM item where item_entry_id=NEW.item_entry_id AND item_live='t'; > > ERROR: ExecutePlan: (junk) `ctid' is NULL! Hmm. There are several things going on here, but one thing that needs clarification is whether this UPDATE is written correctly. Since it has no constraint on entry_stats, it seems to me that *every* row of entry_stats will have entry_minprice set to the same value, namely the minimum item_price over those item rows that satisfy the WHERE condition. Surely that wasn't what you wanted? Shouldn't there be an additional WHERE clause like entry_id = item_entry_id? Anyway, the proximate cause of the error message is as follows. A cross-table UPDATE like this is actually implemented as if it were a SELECT:SELECT entry_stats.ctid, min(item_price)FROM entry_stats, item WHERE ...; For each row emitted by this underlying SELECT, the executor takes the ctid result column (which identifies the particular target tuple in the target table) and updates that tuple by stuffing the additional SELECT result column(s) into the specified fields of that tuple. Now, if you try a SELECT like the above in a situation where there are no tuples matching the WHERE clause, what you get out is a row of all NULLs --- because that's what you get from SELECT if there's an aggregate function with no GROUP BY and no input rows. The executor gets this dummy row, tries to do a tuple update using it, and chokes because the ctid is NULL. So that explains why the error message is what it is. Next question is what if anything should be done differently. We could just have the executor ignore result rows where ctid is NULL, but that seems like patching around the problem not fixing it. The thing that jumps out at me is that if you actually try the SELECT illustrated above, you do not get any row, null or otherwise; you get ERROR: Attribute entry_stats.ctid must be GROUPed or used in an aggregate function which is a mighty valid complaint. If you are aggregating rows to get the MIN() then you don't have a unique ctid to deliver, so which row ought to be updated? This is the system's way of expressing the same concern I started with: this query doesn't seem to be well-posed. You don't see this complaint when you try the UPDATE, because ctid is added to the implicit select result in a back-door way that doesn't get checked for GROUP-BY validity. I wonder whether that is the bug. If so, we'd basically be saying that no query like this is valid (since UPDATE doesn't have a GROUP BY option, there'd be no way to pass the grouping check). Another way to look at it is that perhaps an UPDATE involving aggregate functions ought to be implicitly treated as GROUP BY targetTable.ctid. In other words, the MIN() or other aggregate function is implicitly evaluated over only those join tuples that are formed for a single target tuple. Intuitively that seems to make sense, and it solves the problem you're complaining of, because no matching tuples = no groups = no result tuples = update does nothing = no problem. But I have a sneaking suspicion that I'm missing some nasty problem with this idea too. Comments anyone? What exactly *should* be the behavior of an UPDATE that uses an aggregate function and a join to another table? Over what set of tuples should the aggregate be evaluated? regards, tom lane
Another observation is that if the WHERE clause is successful, it seems to update the first record in the target relation that it finds which is a pretty random result. pghack=# create table e(ee text, eee integer); CREATE pghack=# create table f(ff text, fff integer); CREATE pghack=# insert into e values('e', 1); INSERT 18871 1 pghack=# insert into e values('ee', 2); INSERT 18872 1 pghack=# insert into e values('eee', 3); INSERT 18873 1 pghack=# insert into f values('fff', 3); INSERT 18874 1 pghack=# insert into f values('ff', 2); INSERT 18875 1 pghack=# insert into f values('f', 1); INSERT 18876 1 pghack=# update e set eee=min(f.fff) from f; UPDATE 1 pghack=# select * from e;ee | eee -----+-----ee | 2eee | 3e | 1 (3 rows) pghack=# select min(f.fff) from f;min ----- 1 (1 row) pghack=# update e set eee=min(f.fff) from f; UPDATE 1 pghack=# select min(f.fff) from f;min ----- 1 (1 row) pghack=# select * from e;ee | eee -----+-----eee | 3e | 1ee | 1 (3 rows) pghack=# update e set eee=min(f.fff) from f; UPDATE 1 pghack=# select * from e;ee | eee -----+-----e | 1ee | 1eee | 1 (3 rows) Tom Lane wrote: > > Paul McGarry <paulm@opentec.com.au> writes: > > CREATE TABLE entry_stats > > ( > > entry_id INT4 NOT NULL REFERENCES entry ON DELETE CASCADE, > > entry_minprice INT4 NOT NULL DEFAULT 0 > > ); > > > > CREATE TABLE item( > > item_id INT4 PRIMARY KEY, > > item_entry_id INT4 NOT NULL REFERENCES entry ON DELETE NO ACTION, > > item_price INT4 NOT NULL, > > item_live bool NOT NULL DEFAULT 'n' > > ); > > > > [trigger using] > > > > UPDATE entry_stats > > SET entry_minprice=min(item_price) > > FROM item where item_entry_id=NEW.item_entry_id AND item_live='t'; > > > > ERROR: ExecutePlan: (junk) `ctid' is NULL! > > Hmm. There are several things going on here, but one thing that needs > clarification is whether this UPDATE is written correctly. Since it > has no constraint on entry_stats, it seems to me that *every* row of > entry_stats will have entry_minprice set to the same value, namely > the minimum item_price over those item rows that satisfy the WHERE > condition. Surely that wasn't what you wanted? Shouldn't there be an > additional WHERE clause like entry_id = item_entry_id? > > Anyway, the proximate cause of the error message is as follows. > A cross-table UPDATE like this is actually implemented as if it were > a SELECT: > SELECT entry_stats.ctid, min(item_price) > FROM entry_stats, item WHERE ...; > For each row emitted by this underlying SELECT, the executor takes > the ctid result column (which identifies the particular target tuple > in the target table) and updates that tuple by stuffing the additional > SELECT result column(s) into the specified fields of that tuple. > > Now, if you try a SELECT like the above in a situation where there are > no tuples matching the WHERE clause, what you get out is a row of all > NULLs --- because that's what you get from SELECT if there's an > aggregate function with no GROUP BY and no input rows. The executor > gets this dummy row, tries to do a tuple update using it, and chokes > because the ctid is NULL. So that explains why the error message is > what it is. Next question is what if anything should be done > differently. We could just have the executor ignore result rows where > ctid is NULL, but that seems like patching around the problem not fixing > it. > > The thing that jumps out at me is that if you actually try the SELECT > illustrated above, you do not get any row, null or otherwise; you get > ERROR: Attribute entry_stats.ctid must be GROUPed or used in an > aggregate function > which is a mighty valid complaint. If you are aggregating rows to get > the MIN() then you don't have a unique ctid to deliver, so which row > ought to be updated? This is the system's way of expressing the same > concern I started with: this query doesn't seem to be well-posed. > > You don't see this complaint when you try the UPDATE, because ctid > is added to the implicit select result in a back-door way that doesn't > get checked for GROUP-BY validity. I wonder whether that is the bug. > If so, we'd basically be saying that no query like this is valid > (since UPDATE doesn't have a GROUP BY option, there'd be no way to > pass the grouping check). > > Another way to look at it is that perhaps an UPDATE involving aggregate > functions ought to be implicitly treated as GROUP BY targetTable.ctid. > In other words, the MIN() or other aggregate function is implicitly > evaluated over only those join tuples that are formed for a single > target tuple. Intuitively that seems to make sense, and it solves the > problem you're complaining of, because no matching tuples = no groups = > no result tuples = update does nothing = no problem. But I have a > sneaking suspicion that I'm missing some nasty problem with this idea > too. > > Comments anyone? What exactly *should* be the behavior of an UPDATE > that uses an aggregate function and a join to another table? Over what > set of tuples should the aggregate be evaluated? > > regards, tom lane
Chris Bitmead <chris@bitmead.com> writes: > Another observation is that if the WHERE clause is successful, it seems > to update the first record in the target relation that it finds which is > a pretty random result. Wouldn't surprise me --- leastwise, you will get a random one of the input ctid values emitted into the aggregated SELECT row. Offhand I'd have expected the last-scanned one, not the first-scanned, but the point is that the behavior is dependent on the implementation's choice of scanning order. This is exactly the uncertainty that the check for "attribute must be GROUPed or used in an aggregate function" is designed to protect you from. But ctid is (currently) escaping that check. It seems to me that we have two reasonable ways to proceed: 1. Forbid aggregates at the top level of UPDATE. Then you'd need to do a subselect, perhaps something likeUPDATE fooSET bar = (SELECT min(f1) FROM othertab WHERE othertab.keycol = foo.keycol)WHEREcondition-determining-which-foo-rows-to-update if you wanted to use an aggregate. This is pretty ugly, especially so if the outer WHERE condition is itself dependent on scanning othertab to see if there are matches to the foo row. 2. Do an implicit GROUP BY ctid as I suggested last night. I still don't see any holes in that idea, but I am still worried that there might be one. regards, tom lane
On Fri, 07 Jul 2000, Tom Lane wrote: > Chris Bitmead <chris@bitmead.com> writes: > UPDATE foo > SET bar = (SELECT min(f1) FROM othertab > WHERE othertab.keycol = foo.keycol) > WHERE condition-determining-which-foo-rows-to-update > if you wanted to use an aggregate. This is pretty ugly, especially so If you use min(x) or max(x) frequently, isn't it best to make a trigger that intercepts x on insert and update, then check it and store it somewhere rather than scanning for it everytime? (not that this fixes any db problem thats being discussed here) -- Robert
Paul McGarry <paulm@opentec.com.au> writes: > Would that mean that any update that used an aggregate function > would be invalid? That would be a bit scary seeing as I am doing > this in part to get around using aggregate functions in a view. You'd have to embed the aggregate in a sub-select if we did things that way. I'd rather not have such a restriction, but only if we can understand clearly what it means to put an aggregate directly into UPDATE. The executive summary of what I said before is "exactly what SHOULD this query do, anyway?" I think it's not well-defined without some additional assumptions. >> Another way to look at it is that perhaps an UPDATE involving aggregate >> functions ought to be implicitly treated as GROUP BY targetTable.ctid. > What exactly is a ctid? Physical location of the tuple, expressed as block# and tuple# within the file. Try "select ctid,* from sometable" ... regards, tom lane
"Robert B. Easter" <reaster@comptechnews.com> writes: > If you use min(x) or max(x) frequently, isn't it best to make a > trigger that intercepts x on insert and update, then check it and > store it somewhere rather than scanning for it everytime? I believe that's exactly what the original questioner is trying to do... regards, tom lane
I wrote: > Comments anyone? What exactly *should* be the behavior of an UPDATE > that uses an aggregate function and a join to another table? Over what > set of tuples should the aggregate be evaluated? Further note on this: SQL99 specifies: <update statement: searched> ::= UPDATE <target table> SET <set clause list> [ WHERE <search condition> ] ... 5) A <value expression> in a <set clause> shall not directly contain a <set function specification>. so the construct is definitely not SQL-compliant. Maybe we should just forbid it. However, if you are joining against another table (which itself is not an SQL feature) then it seems like there is some potential use in it. What do people think of my implicit-GROUP-BY-ctid idea? That would basically say that the aggregate is computed over all the tuples that join to a single target tuple. regards, tom lane
At 14:35 9/07/00 -0400, Tom Lane wrote: > >so the construct is definitely not SQL-compliant. Maybe we should just >forbid it. However, if you are joining against another table (which >itself is not an SQL feature) then it seems like there is some potential >use in it. What do people think of my implicit-GROUP-BY-ctid idea? >That would basically say that the aggregate is computed over all the >tuples that join to a single target tuple. Sounds perfect to me... ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Philip Warner <pjw@rhyme.com.au> writes: >> What do people think of my implicit-GROUP-BY-ctid idea? >> That would basically say that the aggregate is computed over all the >> tuples that join to a single target tuple. > Sounds perfect to me... Note that it would not meet your expectation that update t1 set f2=count(*) from t2 where t1.f1=2 and t2.f1=t1.f1 ; means the same as update t1 set f2=(Select Count(*) from t2 where t2.f1=t1.f1) where t1.f1 = 2 ... at least not without some kind of outer-join support too. With an inner join, t1 tuples not matching any t2 tuple wouldn't be modified at all. regards, tom lane
At 21:21 9/07/00 -0400, Tom Lane wrote: > >> Sounds perfect to me... > >Note that it would not meet your expectation that This seems OK; the 'update...from' syntax does also seemingly implies that the rows affected will only be those rows that match the predicate, so your interpretation is probably more in keeping with intuitive expectation. > >... at least not without some kind of outer-join support too. With >an inner join, t1 tuples not matching any t2 tuple wouldn't be modified >at all. This sounds good, but even when OJ come along, I can't see how I would get the same behaviour as: update t1 set f2=(Select Count(*) from t2 where t2.f1=t1.f1) where t1.f1 = 2 since in an OJ, count(*) will, I think, always be at least 1. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Here's the attachment I said I was going to attach to the last message. TFIF! -- Paul McGarry mailto:paulm@opentec.com.au Systems Integrator http://www.opentec.com.au Opentec Pty Ltd http://www.iebusiness.com.au 6 Lyon Park Road Phone: (02) 9878 1744 North Ryde NSW 2113 Fax: (02) 9878 1755
Hi Tom, > Hmm. There are several things going on here, but one thing that needs > clarification is whether this UPDATE is written correctly. Since it My goofup (you said you wanted a compact script!). You are correct there should have been an extra where condition in the triggered function. ====UPDATE entry_stats SET entry_minprice=min(item_price) FROM item where item_entry_id=NEW.item_entry_id AND item_live='f'; ==== Should really have been: ====UPDATE entry_stats SET entry_minprice=min(item_price) FROM item where item_entry_id=NEW.item_entry_id AND entry_stats.entry_id=item_entry_id AND item_live='f'; ==== which still generates the same error message (as the 'problem' is caused by the where clause, not what is being updated). FWIW I've attached the real function that I've implemented to get around the error message. In all probability the way I'm handling it is the right way: 1. Check I'm going to get a valid response from my aggregate 2a. If so perform the update with the aggregate 2b. If not perform the update with zeros(default value) Originally I was just wondering if I could do it all in one go, Try to perform the update and automatically get the aggregate result if it were 'available' and default to zeros if not. If I forget about aggregate functions for a moment and just consider an update where nothing matches the where criterion then I'd still use the same logic above to reset the values to their default. The only differences between using the aggregate function and not is that one throws an error and the other just updates 0 rows. > The thing that jumps out at me is that if you actually try the SELECT > illustrated above, you do not get any row, null or otherwise; you get > ERROR: Attribute entry_stats.ctid must be GROUPed or used in an > aggregate function > which is a mighty valid complaint. If you are aggregating rows to get > the MIN() then you don't have a unique ctid to deliver, so which row > ought to be updated? This is the system's way of expressing the same > concern I started with: this query doesn't seem to be well-posed. > > You don't see this complaint when you try the UPDATE, because ctid > is added to the implicit select result in a back-door way that doesn't > get checked for GROUP-BY validity. I wonder whether that is the bug. > If so, we'd basically be saying that no query like this is valid > (since UPDATE doesn't have a GROUP BY option, there'd be no way to > pass the grouping check). Would that mean that any update that used an aggregate function would be invalid? That would be a bit scary seeing as I am doing this in part to get around using aggregate functions in a view. > Another way to look at it is that perhaps an UPDATE involving aggregate > functions ought to be implicitly treated as GROUP BY targetTable.ctid. What exactly is a ctid? Thanks for your response Tom, it has been enlightening. I feel I'm getting a better understanding of what's going inside pgsql by the day from yourself and other peoples posts on the various lists. -- Paul McGarry mailto:paulm@opentec.com.au Systems Integrator http://www.opentec.com.au Opentec Pty Ltd http://www.iebusiness.com.au 6 Lyon Park Road Phone: (02) 9878 1744 North Ryde NSW 2113 Fax: (02) 9878 1755