Thread: Re: [SQL] MAX() of 0 records.

Re: [SQL] MAX() of 0 records.

From
Paul McGarry
Date:
Tom Lane wrote:

> This seems like a backend bug to me, but being an overworked hacker
> I'm too lazy to try to reconstruct the scenario from your sketch.
> Could I trouble you to submit a formal bug report with a specific,
> hopefully compact script that triggers the problem?

I've attached it here, along with the output I see. I am running 7.0.2
and the problem occurs on both my x86 Linux and Sparc Solaris
installations.

In addition to the output attached the postmaster console adds:
====
DEBUG: Last error occured while executing PL/pgSQL function
setentrystats
DEBUG: line 4 at SQL statement
====

Thanks,

--
Paul McGarry            mailto:paulm@opentec.com.au
Systems Integrator      http://www.opentec.com.au
Opentec Pty Ltd         http://www.iebusiness.com.au
6 Lyon Park Road        Phone: (02) 9878 1744
North Ryde NSW 2113     Fax:   (02) 9878 1755
Attachment

Re: [SQL] MAX() of 0 records.

From
Paul McGarry
Date:
Hi Tom,

> Hmm.  There are several things going on here, but one thing that needs
> clarification is whether this UPDATE is written correctly.  Since it

My goofup (you said you wanted a compact script!). 
You are correct there should have been an extra where condition in
the triggered function.
====UPDATE entry_stats    SET entry_minprice=min(item_price)  FROM item where item_entry_id=NEW.item_entry_id AND
item_live='f';
====
Should really have been:
====UPDATE entry_stats    SET entry_minprice=min(item_price)  FROM item where item_entry_id=NEW.item_entry_id   AND
entry_stats.entry_id=item_entry_id  AND item_live='f';
 
====
which still generates the same error message (as the 'problem' is
caused by the where clause, not what is being updated).

FWIW I've attached the real function that I've implemented to get 
around the error message. In all probability the way I'm handling
it is the right way:

1. Check I'm going to get a valid response from my aggregate
2a. If so perform the update with the aggregate
2b. If not perform the update with zeros(default value)

Originally I was just wondering if I could do it all in one go,
Try to perform the update and automatically get the aggregate
result if it were 'available' and default to zeros if not.

If I forget about aggregate functions for a moment and just
consider an update where nothing matches the where criterion
then I'd still use the same logic above to reset the values
to their default. The only differences between using the
aggregate function and not is that one throws an error and
the other just updates 0 rows.

> The thing that jumps out at me is that if you actually try the SELECT
> illustrated above, you do not get any row, null or otherwise; you get
> ERROR:  Attribute entry_stats.ctid must be GROUPed or used in an
>         aggregate function
> which is a mighty valid complaint.  If you are aggregating rows to get
> the MIN() then you don't have a unique ctid to deliver, so which row
> ought to be updated?  This is the system's way of expressing the same
> concern I started with: this query doesn't seem to be well-posed.
> 
> You don't see this complaint when you try the UPDATE, because ctid
> is added to the implicit select result in a back-door way that doesn't
> get checked for GROUP-BY validity.  I wonder whether that is the bug.
> If so, we'd basically be saying that no query like this is valid
> (since UPDATE doesn't have a GROUP BY option, there'd be no way to
> pass the grouping check).

Would that mean that any update that used an aggregate function
would be invalid? That would be a bit scary seeing as I am doing
this in part to get around using aggregate functions in a view.

> Another way to look at it is that perhaps an UPDATE involving aggregate
> functions ought to be implicitly treated as GROUP BY targetTable.ctid.

What exactly is a ctid?

Thanks for your response Tom, it has been enlightening. I feel I'm
getting a better understanding of what's going inside pgsql by the
day from yourself and other peoples posts on the various lists.

-- 
Paul McGarry            mailto:paulm@opentec.com.au 
Systems Integrator      http://www.opentec.com.au 
Opentec Pty Ltd         http://www.iebusiness.com.au
6 Lyon Park Road        Phone: (02) 9878 1744 
North Ryde NSW 2113     Fax:   (02) 9878 1755


Re: [SQL] MAX() of 0 records.

From
Tom Lane
Date:
Paul McGarry <paulm@opentec.com.au> writes:
> CREATE TABLE entry_stats
> (
>     entry_id INT4 NOT NULL REFERENCES entry ON DELETE CASCADE,
>     entry_minprice INT4 NOT NULL DEFAULT 0
> );
>
> CREATE TABLE item(
>     item_id INT4 PRIMARY KEY,
>     item_entry_id INT4 NOT NULL REFERENCES entry ON DELETE NO ACTION,
>     item_price INT4 NOT NULL,
>     item_live bool NOT NULL DEFAULT 'n'
> );
> 
> [trigger using]
>
>     UPDATE entry_stats 
>     SET entry_minprice=min(item_price)
>     FROM item where item_entry_id=NEW.item_entry_id AND item_live='t';
>
> ERROR:  ExecutePlan: (junk) `ctid' is NULL!

Hmm.  There are several things going on here, but one thing that needs
clarification is whether this UPDATE is written correctly.  Since it
has no constraint on entry_stats, it seems to me that *every* row of
entry_stats will have entry_minprice set to the same value, namely
the minimum item_price over those item rows that satisfy the WHERE
condition.  Surely that wasn't what you wanted?  Shouldn't there be an
additional WHERE clause like entry_id = item_entry_id?

Anyway, the proximate cause of the error message is as follows.
A cross-table UPDATE like this is actually implemented as if it were
a SELECT:SELECT entry_stats.ctid, min(item_price)FROM entry_stats, item WHERE ...;
For each row emitted by this underlying SELECT, the executor takes
the ctid result column (which identifies the particular target tuple
in the target table) and updates that tuple by stuffing the additional
SELECT result column(s) into the specified fields of that tuple.

Now, if you try a SELECT like the above in a situation where there are
no tuples matching the WHERE clause, what you get out is a row of all
NULLs --- because that's what you get from SELECT if there's an
aggregate function with no GROUP BY and no input rows.  The executor
gets this dummy row, tries to do a tuple update using it, and chokes
because the ctid is NULL.  So that explains why the error message is
what it is.  Next question is what if anything should be done
differently.  We could just have the executor ignore result rows where
ctid is NULL, but that seems like patching around the problem not fixing
it.

The thing that jumps out at me is that if you actually try the SELECT
illustrated above, you do not get any row, null or otherwise; you get
ERROR:  Attribute entry_stats.ctid must be GROUPed or used in an       aggregate function
which is a mighty valid complaint.  If you are aggregating rows to get
the MIN() then you don't have a unique ctid to deliver, so which row
ought to be updated?  This is the system's way of expressing the same
concern I started with: this query doesn't seem to be well-posed.

You don't see this complaint when you try the UPDATE, because ctid
is added to the implicit select result in a back-door way that doesn't
get checked for GROUP-BY validity.  I wonder whether that is the bug.
If so, we'd basically be saying that no query like this is valid
(since UPDATE doesn't have a GROUP BY option, there'd be no way to
pass the grouping check).

Another way to look at it is that perhaps an UPDATE involving aggregate
functions ought to be implicitly treated as GROUP BY targetTable.ctid.
In other words, the MIN() or other aggregate function is implicitly
evaluated over only those join tuples that are formed for a single
target tuple.  Intuitively that seems to make sense, and it solves the
problem you're complaining of, because no matching tuples = no groups =
no result tuples = update does nothing = no problem.  But I have a
sneaking suspicion that I'm missing some nasty problem with this idea
too.

Comments anyone?  What exactly *should* be the behavior of an UPDATE
that uses an aggregate function and a join to another table?  Over what
set of tuples should the aggregate be evaluated?
        regards, tom lane


Re: [SQL] MAX() of 0 records.

From
Paul McGarry
Date:
Here's the attachment I said I was going to attach to the last message.

TFIF!

-- 
Paul McGarry            mailto:paulm@opentec.com.au 
Systems Integrator      http://www.opentec.com.au 
Opentec Pty Ltd         http://www.iebusiness.com.au
6 Lyon Park Road        Phone: (02) 9878 1744 
North Ryde NSW 2113     Fax:   (02) 9878 1755

Re: [HACKERS] Re: [SQL] MAX() of 0 records.

From
Chris Bitmead
Date:
Another observation is that if the WHERE clause is successful, it seems
to update the first record in the target relation that it finds which is
a pretty random result.

pghack=# create table e(ee text, eee integer);
CREATE
pghack=# create table f(ff text, fff integer);
CREATE
pghack=# insert into e values('e', 1);
INSERT 18871 1
pghack=# insert into e values('ee', 2);
INSERT 18872 1
pghack=# insert into e values('eee', 3);
INSERT 18873 1
pghack=# insert into f values('fff', 3);
INSERT 18874 1
pghack=# insert into f values('ff', 2);
INSERT 18875 1
pghack=# insert into f values('f', 1);
INSERT 18876 1
pghack=# update e set eee=min(f.fff) from f;
UPDATE 1
pghack=# select * from e;ee  | eee 
-----+-----ee  |   2eee |   3e   |   1
(3 rows)

pghack=# select min(f.fff) from f;min 
-----  1
(1 row)

pghack=# update e set eee=min(f.fff) from f;
UPDATE 1
pghack=# select min(f.fff) from f;min 
-----  1
(1 row)

pghack=# select * from e;ee  | eee 
-----+-----eee |   3e   |   1ee  |   1
(3 rows)

pghack=# update e set eee=min(f.fff) from f;
UPDATE 1
pghack=# select * from e;ee  | eee 
-----+-----e   |   1ee  |   1eee |   1
(3 rows)




Tom Lane wrote:
> 
> Paul McGarry <paulm@opentec.com.au> writes:
> > CREATE TABLE entry_stats
> > (
> >     entry_id INT4 NOT NULL REFERENCES entry ON DELETE CASCADE,
> >     entry_minprice INT4 NOT NULL DEFAULT 0
> > );
> >
> > CREATE TABLE item(
> >     item_id INT4 PRIMARY KEY,
> >     item_entry_id INT4 NOT NULL REFERENCES entry ON DELETE NO ACTION,
> >     item_price INT4 NOT NULL,
> >     item_live bool NOT NULL DEFAULT 'n'
> > );
> >
> > [trigger using]
> >
> >     UPDATE entry_stats
> >     SET entry_minprice=min(item_price)
> >     FROM item where item_entry_id=NEW.item_entry_id AND item_live='t';
> >
> > ERROR:  ExecutePlan: (junk) `ctid' is NULL!
> 
> Hmm.  There are several things going on here, but one thing that needs
> clarification is whether this UPDATE is written correctly.  Since it
> has no constraint on entry_stats, it seems to me that *every* row of
> entry_stats will have entry_minprice set to the same value, namely
> the minimum item_price over those item rows that satisfy the WHERE
> condition.  Surely that wasn't what you wanted?  Shouldn't there be an
> additional WHERE clause like entry_id = item_entry_id?
> 
> Anyway, the proximate cause of the error message is as follows.
> A cross-table UPDATE like this is actually implemented as if it were
> a SELECT:
>         SELECT entry_stats.ctid, min(item_price)
>         FROM entry_stats, item WHERE ...;
> For each row emitted by this underlying SELECT, the executor takes
> the ctid result column (which identifies the particular target tuple
> in the target table) and updates that tuple by stuffing the additional
> SELECT result column(s) into the specified fields of that tuple.
> 
> Now, if you try a SELECT like the above in a situation where there are
> no tuples matching the WHERE clause, what you get out is a row of all
> NULLs --- because that's what you get from SELECT if there's an
> aggregate function with no GROUP BY and no input rows.  The executor
> gets this dummy row, tries to do a tuple update using it, and chokes
> because the ctid is NULL.  So that explains why the error message is
> what it is.  Next question is what if anything should be done
> differently.  We could just have the executor ignore result rows where
> ctid is NULL, but that seems like patching around the problem not fixing
> it.
> 
> The thing that jumps out at me is that if you actually try the SELECT
> illustrated above, you do not get any row, null or otherwise; you get
> ERROR:  Attribute entry_stats.ctid must be GROUPed or used in an
>         aggregate function
> which is a mighty valid complaint.  If you are aggregating rows to get
> the MIN() then you don't have a unique ctid to deliver, so which row
> ought to be updated?  This is the system's way of expressing the same
> concern I started with: this query doesn't seem to be well-posed.
> 
> You don't see this complaint when you try the UPDATE, because ctid
> is added to the implicit select result in a back-door way that doesn't
> get checked for GROUP-BY validity.  I wonder whether that is the bug.
> If so, we'd basically be saying that no query like this is valid
> (since UPDATE doesn't have a GROUP BY option, there'd be no way to
> pass the grouping check).
> 
> Another way to look at it is that perhaps an UPDATE involving aggregate
> functions ought to be implicitly treated as GROUP BY targetTable.ctid.
> In other words, the MIN() or other aggregate function is implicitly
> evaluated over only those join tuples that are formed for a single
> target tuple.  Intuitively that seems to make sense, and it solves the
> problem you're complaining of, because no matching tuples = no groups =
> no result tuples = update does nothing = no problem.  But I have a
> sneaking suspicion that I'm missing some nasty problem with this idea
> too.
> 
> Comments anyone?  What exactly *should* be the behavior of an UPDATE
> that uses an aggregate function and a join to another table?  Over what
> set of tuples should the aggregate be evaluated?
> 
>                         regards, tom lane


Re: Re: [HACKERS] Re: [SQL] MAX() of 0 records.

From
Tom Lane
Date:
Chris Bitmead <chris@bitmead.com> writes:
> Another observation is that if the WHERE clause is successful, it seems
> to update the first record in the target relation that it finds which is
> a pretty random result.

Wouldn't surprise me --- leastwise, you will get a random one of the
input ctid values emitted into the aggregated SELECT row.  Offhand I'd
have expected the last-scanned one, not the first-scanned, but the
point is that the behavior is dependent on the implementation's choice
of scanning order.  This is exactly the uncertainty that the check for
"attribute must be GROUPed or used in an aggregate function" is designed
to protect you from.  But ctid is (currently) escaping that check.

It seems to me that we have two reasonable ways to proceed:

1. Forbid aggregates at the top level of UPDATE.  Then you'd need to do
a subselect, perhaps something likeUPDATE fooSET bar = (SELECT min(f1) FROM othertab           WHERE othertab.keycol =
foo.keycol)WHEREcondition-determining-which-foo-rows-to-update
 
if you wanted to use an aggregate.  This is pretty ugly, especially so
if the outer WHERE condition is itself dependent on scanning othertab
to see if there are matches to the foo row.

2. Do an implicit GROUP BY ctid as I suggested last night.  I still
don't see any holes in that idea, but I am still worried that there
might be one.
        regards, tom lane


Re: Re: [HACKERS] Re: [SQL] MAX() of 0 records.

From
"Robert B. Easter"
Date:
On Fri, 07 Jul 2000, Tom Lane wrote:
> Chris Bitmead <chris@bitmead.com> writes:
>     UPDATE foo
>     SET bar = (SELECT min(f1) FROM othertab
>                WHERE othertab.keycol = foo.keycol)
>     WHERE condition-determining-which-foo-rows-to-update
> if you wanted to use an aggregate.  This is pretty ugly, especially so

If you use min(x) or max(x) frequently, isn't it best to make a trigger that
intercepts x on insert and update, then check it and store it somewhere rather
than scanning for it everytime?  (not that this fixes any db problem thats being
discussed here)
--         Robert


Re: Re: [SQL] MAX() of 0 records.

From
Tom Lane
Date:
Paul McGarry <paulm@opentec.com.au> writes:
> Would that mean that any update that used an aggregate function
> would be invalid? That would be a bit scary seeing as I am doing
> this in part to get around using aggregate functions in a view.

You'd have to embed the aggregate in a sub-select if we did things
that way.  I'd rather not have such a restriction, but only if we can
understand clearly what it means to put an aggregate directly into
UPDATE.  The executive summary of what I said before is "exactly what
SHOULD this query do, anyway?"  I think it's not well-defined without
some additional assumptions.

>> Another way to look at it is that perhaps an UPDATE involving aggregate
>> functions ought to be implicitly treated as GROUP BY targetTable.ctid.

> What exactly is a ctid?

Physical location of the tuple, expressed as block# and tuple# within
the file.  Try "select ctid,* from sometable" ...
        regards, tom lane


Re: Re: [HACKERS] Re: [SQL] MAX() of 0 records.

From
Tom Lane
Date:
"Robert B. Easter" <reaster@comptechnews.com> writes:
> If you use min(x) or max(x) frequently, isn't it best to make a
> trigger that intercepts x on insert and update, then check it and
> store it somewhere rather than scanning for it everytime?

I believe that's exactly what the original questioner is trying to do...
        regards, tom lane


Re: [SQL] MAX() of 0 records.

From
Tom Lane
Date:
I wrote:
> Comments anyone?  What exactly *should* be the behavior of an UPDATE
> that uses an aggregate function and a join to another table?  Over what
> set of tuples should the aggregate be evaluated?

Further note on this: SQL99 specifies:
        <update statement: searched> ::=             UPDATE <target table>               SET <set clause list>
    [ WHERE <search condition> ]
 
        ...
        5) A <value expression> in a <set clause> shall not directly           contain a <set function specification>.

so the construct is definitely not SQL-compliant.  Maybe we should just
forbid it.  However, if you are joining against another table (which
itself is not an SQL feature) then it seems like there is some potential
use in it.  What do people think of my implicit-GROUP-BY-ctid idea?
That would basically say that the aggregate is computed over all the
tuples that join to a single target tuple.
        regards, tom lane


Re: [HACKERS] Re: [SQL] MAX() of 0 records.

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
>> What do people think of my implicit-GROUP-BY-ctid idea?
>> That would basically say that the aggregate is computed over all the
>> tuples that join to a single target tuple.

> Sounds perfect to me...

Note that it would not meet your expectation that
   update t1 set f2=count(*) from t2 where t1.f1=2 and t2.f1=t1.f1 ;

means the same as
   update t1 set f2=(Select Count(*) from t2 where t2.f1=t1.f1) where
t1.f1 = 2

... at least not without some kind of outer-join support too.  With
an inner join, t1 tuples not matching any t2 tuple wouldn't be modified
at all.
        regards, tom lane


Re: [HACKERS] Re: [SQL] MAX() of 0 records.

From
Philip Warner
Date:
At 14:35 9/07/00 -0400, Tom Lane wrote:
>
>so the construct is definitely not SQL-compliant.  Maybe we should just
>forbid it.  However, if you are joining against another table (which
>itself is not an SQL feature) then it seems like there is some potential
>use in it.  What do people think of my implicit-GROUP-BY-ctid idea?
>That would basically say that the aggregate is computed over all the
>tuples that join to a single target tuple.

Sounds perfect to me...


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: [HACKERS] Re: [SQL] MAX() of 0 records.

From
Philip Warner
Date:
At 21:21 9/07/00 -0400, Tom Lane wrote:
>
>> Sounds perfect to me...
>
>Note that it would not meet your expectation that

This seems OK; the 'update...from' syntax does also seemingly implies that
the rows affected will only be those rows that match the predicate, so your
interpretation is probably more in keeping with intuitive expectation.

>
>... at least not without some kind of outer-join support too.  With
>an inner join, t1 tuples not matching any t2 tuple wouldn't be modified
>at all.

This sounds good, but even when OJ come along, I can't see how I would get
the same behaviour as:
   update t1 set f2=(Select Count(*) from t2 where t2.f1=t1.f1)        where t1.f1 = 2

since in an OJ, count(*) will, I think, always be at least 1.


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/