Inheritance and query plans - Mailing list pgsql-general

From apb18@cornell.edu
Subject Inheritance and query plans
Date
Msg-id Pine.SOL.3.91.1030922144054.24160D-100000@travelers.mail.cornell.edu
Whole thread Raw
List pgsql-general
A bit of query plan strangeness.  Suppose you have an inheritance tree
such that the columns 'ID' and 'field' appear in the top level table,
call that table XXX.  tables YYY and ZZZ both inherit XXX.  Now suppose
there exists some query that returns a set of IDs that match some criteria
(that query may involve various tests/joins/etc on other arbitrary
tables).  Executing that query alone produces an optimal plan and the
exact result set desired.   If you want to display the contents
of the 'field' column for each element of XXX, YYY, and ZZZ that contains
an ID that is in this set,  have found (with 7.3.4 at least) that the
planner appears to be making a suboptimal plan when joining this set with
an inherited table, whereas without inheritance it makes a good plan.
Here are what the plans look like:


GOOD Query: Select field from only XXX, (SELECT ID from [subquery that
returns set of IDs]) as success where success.ID = XXX.ID
 ------
Nested loop
  -> Subquery scan
     [good plan that gets set of IDs that meet some criteria]
     ...
  -> Index scan on XXX


BAD Query: Select field from XXX, (SELECT ID from [subquery query that
returns set of IDs]) as success where success.ID = XXX.ID
---------
Hash Join
  Hash Cond:("outer".id = "inner".id)
    -> Append
      -> Seq scan on XXX
      -> Seq scan on YYY
      -> Seq scan on ZZZ
  Hash
    [good plan that gets set of IDs that meet some criteria]
        ...

In the first 'good' one, I essentially disable inheritance, and it
produces a good, fast plan.  For the query that uses inheritance, a
terrible plan is produced.  All tables XXX, YYY, and ZZZ are indexed
properly, and either one when subsituted into the query alone produces a
good plan (everything is vacuumed & analyzed and all ID fields are of
the same type).  Disabling hash joins, merge joins, sequential scans, etc
will not ever turn a "bad" plan into a "good" one.  Wouldn't a 'good'
plan with inheritance append the results of index scans on XXX, YYY, and
ZZZ instead of going to sequential scans (assume all tables are very big)?

I tried this with 7.4 and basically the same thing happens (with 7.4 the
'good' plans can be reformulated to use 'IN' and they're still good,
but the 'bad' plan is still bad).

So..  Is this as much of a bad thing as I believe, or does it look like I
made a mistake somewhere or could have done something differently?  After
a fair bit of head scratching, I'm pretty sure it's Postgres's fault, but
I'd like a second opinion.  In the case that it is.. I've seen quite a
TODO list for inheritance, should possibly be added to the
pile?*

    -Aaron


*...one that I'd be interested in working on.  I have some ideas that
would possibly solve multiple problems, but have not proven to myself
that they are good enough to be worth the trouble... mostly because I
haven't defined 'trouble' or 'good' yet :)

pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: State of Beta 2
Next
From: "Marc G. Fournier"
Date:
Subject: Re: State of Beta 2