Re: Problem with a rule on upgrade to v7.1.1 - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Problem with a rule on upgrade to v7.1.1
Date
Msg-id 6336.989606061@sss.pgh.pa.us
Whole thread Raw
In response to Problem with a rule on upgrade to v7.1.1  (Jon Lapham <lapham@extracta.com.br>)
Responses Re: Problem with a rule on upgrade to v7.1.1  (Jon Lapham <lapham@extracta.com.br>)
List pgsql-hackers
Jon Lapham <lapham@extracta.com.br> writes:
> But, there is definitely something wrong here, b/c the rule that is 
> causing this *should* only need to run the subselect [SELECT count(*) FROM 
> tplantorgan WHERE tplantid=NEW.tplantid AND sampleid<>NEW.sampleid AND 
> active='t'] one time!  My understanding is that the first conditional in 
> the statement (WHERE tplantid=NEW.tplantid) would be evaluated before the 
> subselect, and there is only 1 tuple in which tplantid=NEW.tplantid.

Well, I've figured it out.  The problem is that the expensive subselect
is actually being evaluated *first* among the several available WHERE
clauses, and thus it's run at basically every row of tplantorgan.  The
tplantid=NEW.tplantid clause does not help because it's a join clause
and is not evaluated until we do the join between tplant and
tplantorgan.  The subselect is a restriction clause and so is evaluated
lower down in the plan tree.

There are other restriction clauses available, however: OLD.active='t'
from the rule qual and sampleid=100430 from the original query both
end up in the same list of restriction clauses for the tplantorgan scan.
But in 7.1.1 they get put behind the subselect clause.  Had they come
first, the subselect would get evaluated at very few tplantorgan rows.

Postgres has never been particularly careful about the ordering of WHERE
clauses, and depending on release version, phase of the moon, etc etc
it's perfectly possible that the subselect would have ended up last in
the other versions you happened to try.  I was able to make current
sources run quickly by backing out the rev 1.59 change seen at
http://www.ca.postgresql.org/cgi/cvsweb.cgi/pgsql/src/backend/optimizer/plan/initsplan.c
This explains why you saw different results between 7.1RC1 and 7.1.1.
There was probably some other change between 7.0.2 and 7.0.3 that caused
7.0.3 to put the clauses in the "right" order whereas 7.0.2 didn't, but
I don't feel like trawling the revision history right now to find it.

The long-term solution to this is for the planner to pay attention to
the execution cost of WHERE clauses and try to put the expensive ones
last in whatever list they end up in.

Meanwhile, I don't really recommend that you hack up the code to reverse
the ordering yet again.  The query is a mess anyway, and rewriting it
seems the better pathway.

> I'm beginning to suspect that my rule is just simply designed poorly...

Yes.  Why not replace both of those rules with

ON UPDATE to tplantorgan DO UPDATE tplant SET active = EXISTS (SELECT 1 FROM tplantorgan WHERE
tplantid=NEW.tplantidAND active) WHERE tplantid=NEW.tplantid;
 

which seems a lot more obvious as well as quicker.  BTW, an index on
tplantorgan(tplantid) would likely help too...
        regards, tom lane


pgsql-hackers by date:

Previous
From: Ian Lance Taylor
Date:
Subject: Re: REL7_1_STABLE tag/branch
Next
From: Kaare Rasmussen
Date:
Subject: Search