Re: PG 7.2b4 bug? - Mailing list pgsql-hackers

From Don Baccus
Subject Re: PG 7.2b4 bug?
Date
Msg-id 3C20FF98.8080005@pacifier.com
Whole thread Raw
In response to PG 7.2b4 bug?  (Don Baccus <dhogaza@pacifier.com>)
List pgsql-hackers
Some more bug-or-not-bug thoughts ...

I thought I'd add a quote from Date that furthers my belief that the 
subselect example I posted does indeed expose a bug:

(T1 is the table conceptually created by the various joins, etc)

"[if] the select-item takes the form "scalar-expression [[AS] column]"

...

For each row in T1 the scalar-expression is evaluated .."

(page 151 Date & Darwin)

SQL92 didn't support subselects in the select-item-list.  SQL3 extends 
the expression to include one-row selects that return a single scalar 
value.  It does NOT however add any wording that allows the subselect to 
be yanked and evaluated once rather than evaluated for each row.  The 
standard uses the word "applied" not "evaluated".  I interpret this to 
mean "evaluated" and it appears that Date does, too.

On the other hand the view example is giving the proper result in PG 
7.2, though only by luck, as Tom pointed out earler.  For (given the 
view "create view foo as select nextval('foo_sequence') as nextval;")

select foo.nextval
from multiple_rows;

isn't actually legal SQL.  It must be stated as:

select foo.nextval
from foo, multiple_rows;

(all PG does is add "foo" to the from clause for me if I leave it out).

The semantics of this are obvious when you think about it - materialize 
"foo" then cross-join the resulting table with multiple_rows.  Since 
"foo" returns a single row computed by "nextval('foo_sequence')" 
obviously the result seen with PG 7.2 is correct.  Date is quite clear 
on the semantics of this and it makes tons of sense since views are 
meant to be treated like tables.

So:

1. If an explicit scalar subselect appears in the target list, it should   be executed for every row in the result
set.

2. A view referenced in the target list is actually supposed to be   materialized in the FROM clause (even if implictly
addedto it for   you) then joined to the other tables in the query, if any.  Meaning   it should always be executed
onceand only once.  The standard   doesn't have PG-style rules, of course, but such tables are also   should be in the
FROMclause, evaluated and joined afterwards   IMO.
 

At least that's my reading and I've spent quite a bit of time on this now.

Unfortunately PG currently doesn't use the form of the query to decide 
whether or not to execute the subselect or view once or for each row, 
but rather does so depending on the estimated cost of each approach.

That's the real bug it seems.  The form of the query, not the whim of 
the optimizer, is the determinant.

Neither of these cases is likely to arise frequently in practice, so if 
I ruled Middle Earth I'd decree that:

1. It be filed as a bug

2. It not be assigned a high priority.

However it's not merely of academic interest.  The semantics of the view 
example is such that you should be able to force single-evaluation of a 
function by simply wrapping it in a view, regardless of whether or not 
it has side-effects.

Meanwhile I get to go off and inspect the roughly 750 queries that use 
this particular style view and determine which ones incorrectly assume 
that the view's evaluated more than once per query! :)


-- 
Don Baccus
Portland, OR
http://donb.photo.net, http://birdnotes.net, http://openacs.org



pgsql-hackers by date:

Previous
From: "Andrew G. Hammond"
Date:
Subject: Re: Explicit config patch 7.2B4
Next
From: Vince Vielhaber
Date:
Subject: Re: Explicit config patch 7.2B4