Thread: Joins on inherited tables

Joins on inherited tables

From
apb18@cornell.edu
Date:
Hi,
    In some situations, it looks like the optimizer does not chose
efficient paths for joining inherited tables.  For I created a
rather trivial formulation to serve as an example.  I created the table
'numbers' comprising of the columns id (int) and value (text).  I also
created the table 'evens' and 'odds' that inherit numbers, with no
additional columns.  Into 'evens' I placed 50000 entries, each one with
an even (unique) id and random 'value'.  Likewise, for 'odds', I created
50000 odd (and unique) id fields id fields and random 'value', and
created index on all ID fields of every table that has any rows (and
analyzed).

so.. my tables look like this:

    Table "public.numbers"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer |
 value  | text    |

     Table "public.evens"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer |
 value  | text    |
Indexes:
    "ei" btree (id)
Inherits: numbers

     Table "public.odds"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer |
 value  | text    |
Indexes:
    "oi" btree (id)
Inherits: numbers


As per the above construction, 'evens' and 'odds' both have 50000
rows. 'numbers' contains none.



Now, I created a trivial query that would use 'numbers' as an inheritor
table in a join (a very stupid one, but a join nevertheless) as
follows, which produces a terrible, but correct, plan:

select value from (SELECT 1::integer as id) as ids JOIN numbers on
(numbers.id = ids.id);
                                   QUERY PLAN
---------------------------------------------------------------------------
Hash Join  (cost=0.02..2195.79 rows=501 width=19)
   Hash Cond: ("outer".id = "inner".id)
   ->  Append  (cost=0.00..1690.50 rows=100051 width=23)
         ->  Seq Scan on numbers  (cost=0.00..0.00 rows=1 width=23)
         ->  Seq Scan on evens numbers (cost=0.00..845.25 rows=50025 width=23)
         ->  Seq Scan on odds numbers (cost=0.00..845.25 rows=50025 width=23)
   ->  Hash  (cost=0.02..0.02 rows=1 width=4)
         ->  Subquery Scan ids  (cost=0.00..0.02 rows=1 width=4)
               ->  Result  (cost=0.00..0.01 rows=1 width=0)




 Now, I subsitute 'evens' for 'numbers', so I am now joining with a normal,
non-inherited table.  The plan is much better:

select value from (SELECT 1::integer as id) as ids JOIN evens on
(evens.id = ids.id);

                              QUERY PLAN
-----------------------------------------------------------------------
 Nested Loop  (cost=0.00..3.05 rows=2 width=19)
   ->  Subquery Scan ids  (cost=0.00..0.02 rows=1 width=4)
         ->  Result  (cost=0.00..0.01 rows=1 width=0)
   ->  Index Scan using ei on evens  (cost=0.00..3.01 rows=1 width=23)
         Index Cond: (evens.id = "outer".id)


I would think that the ideal plan for the first query should be a nested
loop like the second that considers indexes where possible, and it would
look as follows:

            HYPOTHETICAL PLAN
------------------------------------------------------------------------
 Nested Loop
   ->  Subquery Scan ids
         ->  Result
   ->  Append
      ->  Seq Scan on numbers
      ->  Index Scan using ei on evens
            Index Cond: (evens.id = "outer.id")
      ->  Index Scan using oi on odds
            Index Cond: (odds.id = "outer.id")

So.. Why wouldn't postgres use such a plan?  I could think of three
reasons:
    - The planner wasn't considering this plan due to some
      fault of its own
    - That plan makes no sense and would not be able to be run in the
      executor, and therefore it would be wasteful to consider it.
    - It truly is more expensive than a hash join

I've pretty much ruled out the third and I suspect the second is also
untrue (though have not proven that to myself), leaving the first.  If it
is indeed the second, that the plan makes no sense, someone please let me
know!

OK, so I took a look into the optimizer and over time got a better
understanding of what's going on, though I still don't understand it
completely.  Here's my theory as to what's happening:

For this query, most of the path consideration takes place in
match_unsorted_outer() in path/joinpath.c.  For the 'good' query against
the non-inherited 'evens' table, the good plan is generated in the line:

    bestinnerjoin = best_inner_indexscan(...);

Since an inherited table doesn't have one single index over all its
inherited tables, this call produces a null bestinnerjoin.

Later on in match_unsorted_inner(), various access paths are considered
for a nested loop.  One is bestinnerjoin (when it exists), and that is
how the 'good' query gets its nested loop with an index scan.

Other paths considered for inclusion in the nested loop are
'inner_cheapest_total' and 'inner_cheapest_startup';  These plans,
presumably, contain sequential scans, which are expensive enough in the
nested loop that the nested loop plan is suboptimal compared to a hash
join, which is what happens in the 'bad' query that joins against the
inheritor table.

Now, it seems to me as if the solution would be one of the following:

    - create a new bestinnerjoin plan when best_inner_indexscan
      returned null, yet the current relation is an inheritor.
          This bestinnerjoin plan will use the optimal Append plan
          that comprises of the optimal plans for the inherited
          tables that are relevant for the joins (as in the
      case of the hypothetical query above where the append consists
          of a number of index and sequential scans)

    - Consider the optimal Append path relevant to the join
          later on when considering paths for use in a nested loop.
      i.e. inner_cheapest_total or inner_cheapest_startup
      will have to be this good append plan.

For 2, the problem seems to be that in creating the inner_cheapest_total,
joininfo nodes for each child relation do not exist.  I experimented
by just copying the parents joininfo nodes to each child, and it looked
like it was considering index scans somewhere along the line, but the
final plan chosen did not use index scans.  I didn't see where it
failed.   I'm kinda skeptical of 3 anyway, because the
inner_cheapest_total in the 'good' query with out inheritance does not
appear to involve index scans at all.

So.. does anybody have any advice?  Am I totally off base with
my assertions that a better plan can exist when joining inherited
tables?  Does my analysis of the problem and possible solutions in the
optimizer make any sense?  I basically pored over the source code, read
the documentation, and did little tests here and there to arrive at my
conclusions, but cannot claim to have a good global view of how
everything works.

    Thanks very much

        -Aaron


Re: Joins on inherited tables

From
Tom Lane
Date:
apb18@cornell.edu writes:
> So.. does anybody have any advice?

Look at set_inherited_rel_pathlist() in allpaths.c --- it forms the best
plan for fully scanning the inheritance-tree table.  Currently that's
the *only* plan considered, and it does not make any use of join
clauses.  It's possible that something could be done with providing a
similar routine for best inner indexscans taken across the whole
inheritance tree.  You'd have to figure out how to locate the applicable
joinclauses though.  I think they'd be attached to the inheritance-tree
parent relation and not to the individual inheritance tree member
relations.  Also, if you are wondering why best_inner_indexscan() is so
tense about caching its results, that's because it gets called *a lot*
in large join problems.  If you don't achieve a similar level of
efficiency then you'll be seeing some nasty performance problems in
larger queries.

I think you'd find there is executor work to do as well; I'm not sure
how the outer-relation values would get propagated down into the
indexscans when there's an Append node between.  Maybe the existing code
would Just Work, but I bet not.

<digression>

Not sure if this will help you, but:

Once upon a time the planner did the APPEND for an inheritance tree at
the top of the plan not the bottom.  (It still does when the tree is the
target of an update/delete query.)  In 7.0 for example I get a plan like
this:

create table pt (f1 int primary key);
create table ct1 (f2 int) inherits (pt);
create table ct2 (f2 int) inherits (pt);
create index ct1i on ct1(f1);
create table bar(f1 int);

explain select * from pt*, bar where pt.f1 = bar.f1;
NOTICE:  QUERY PLAN:

Append  (cost=69.83..474.33 rows=30000 width=8)
    ->  Merge Join  (cost=69.83..154.83 rows=10000 width=8)
          ->  Index Scan using pt_pkey on pt  (cost=0.00..60.00 rows=1000 width=4)
          ->  Sort  (cost=69.83..69.83 rows=1000 width=4)
                ->  Seq Scan on bar  (cost=0.00..20.00 rows=1000 width=4)
    ->  Merge Join  (cost=69.83..154.83 rows=10000 width=8)
          ->  Index Scan using ct1i on ct1 pt  (cost=0.00..60.00 rows=1000 width=4)
          ->  Sort  (cost=69.83..69.83 rows=1000 width=4)
                ->  Seq Scan on bar  (cost=0.00..20.00 rows=1000 width=4)
    ->  Merge Join  (cost=139.66..164.66 rows=10000 width=8)
          ->  Sort  (cost=69.83..69.83 rows=1000 width=4)
                ->  Seq Scan on bar  (cost=0.00..20.00 rows=1000 width=4)
          ->  Sort  (cost=69.83..69.83 rows=1000 width=4)
                ->  Seq Scan on ct2 pt  (cost=0.00..20.00 rows=1000 width=4)

whereas the same test in CVS tip produces

                                 QUERY PLAN
----------------------------------------------------------------------------
 Merge Join  (cost=303.09..353.09 rows=3000 width=8)
   Merge Cond: ("outer".f1 = "inner".f1)
   ->  Sort  (cost=69.83..72.33 rows=1000 width=4)
         Sort Key: bar.f1
         ->  Seq Scan on bar  (cost=0.00..20.00 rows=1000 width=4)
   ->  Sort  (cost=233.26..240.76 rows=3000 width=4)
         Sort Key: public.pt.f1
         ->  Append  (cost=0.00..60.00 rows=3000 width=4)
               ->  Seq Scan on pt  (cost=0.00..20.00 rows=1000 width=4)
               ->  Seq Scan on ct1 pt  (cost=0.00..20.00 rows=1000 width=4)
               ->  Seq Scan on ct2 pt  (cost=0.00..20.00 rows=1000 width=4)

The fact that 7.0 could actually adapt to different index sets for
different child tables was kinda cool, but the append-at-the-top
strategy failed completely for outer joins, so we had to abandon it.
In practice I think the generated plan was usually worse anyway (note
that bar gets scanned three times in 7.0's plan), but for the specific
case where the inheritance tree is on the inside of a nestloop that
could be indexed, the new approach is not as good.  If you can come up
with a fix that doesn't break things in other respects, it'd be great.

[digs in CVS logs]  The patch that altered the APPEND-at-the-top
behavior was this one:

2000-11-11 19:36  tgl

    * src/: backend/commands/command.c, backend/commands/copy.c,
    backend/commands/explain.c, backend/executor/execMain.c,
    backend/executor/execQual.c, backend/executor/execTuples.c,
    backend/executor/execUtils.c, backend/executor/functions.c,
    backend/executor/nodeAppend.c, backend/executor/nodeSeqscan.c,
    backend/nodes/copyfuncs.c, backend/nodes/equalfuncs.c,
    backend/nodes/outfuncs.c, backend/nodes/readfuncs.c,
    backend/optimizer/path/allpaths.c,
    backend/optimizer/path/pathkeys.c,
    backend/optimizer/plan/createplan.c,
    backend/optimizer/plan/planmain.c,
    backend/optimizer/plan/planner.c,
    backend/optimizer/prep/prepunion.c,
    backend/optimizer/util/pathnode.c,
    backend/optimizer/util/relnode.c, backend/parser/parse_clause.c,
    backend/tcop/pquery.c, include/catalog/catversion.h,
    include/executor/executor.h, include/executor/tuptable.h,
    include/nodes/execnodes.h, include/nodes/nodes.h,
    include/nodes/parsenodes.h, include/nodes/plannodes.h,
    include/nodes/relation.h, include/optimizer/pathnode.h,
    include/optimizer/planmain.h, include/optimizer/planner.h,
    include/optimizer/prep.h, backend/optimizer/README: Restructure
    handling of inheritance queries so that they work with outer joins,
    and clean things up a good deal at the same time.  Append plan node
    no longer hacks on rangetable at runtime --- instead, all child
    tables are given their own RT entries during planning.    Concept of
    multiple target tables pushed up into execMain, replacing bug-prone
    implementation within nodeAppend.  Planner now supports generating
    Append plans for inheritance sets either at the top of the plan
    (the old way) or at the bottom.  Expanding at the bottom is
    appropriate for tables used as sources, since they may appear
    inside an outer join; but we must still expand at the top when the
    target of an UPDATE or DELETE is an inheritance set, because we
    actually need a different targetlist and junkfilter for each target
    table in that case.  Fortunately a target table can't be inside an
    outer join...  Bizarre mutual recursion between union_planner and
    prepunion.c is gone --- in fact, union_planner doesn't really have
    much to do with union queries anymore, so I renamed it
    grouping_planner.

Not sure if studying that diff would teach you anything useful, but
there it is...

</digression>

            regards, tom lane

Re: Joins on inherited tables

From
apb18@cornell.edu
Date:
OK, so I've had a bit of time to look things over, and appear to be
making headway.  Here's how things stand right now:

I added a function called best_inner_scan used the same way as
best_inner_indexscan, but it's a bit more generalized in the sense that
it can make append plans comprising of the best scans for each
constituent table (it makes calls to best_inner_indexscan for each
child), or just return the best simple index scan (or null) for plain
relations.

In order to make that work, I gave the child tables modified join
clauses from the parent... modified in the sense that I had to make the
operands match the inherited child table (they would match the parent
otherwise if I were to simply copy the Joininfo nodes, and thus fail in
finding an appropriate index in the child table).  I'm not entirely
comfortable with that solution yet, as I'm not absolutely certain those
additonal modified join clauses wont' affect something else in the code
that I'm not aware of,  but it appears to be having the desired effect.

Basically, with optimizer debug enabled, I'm getting plans that look like
this (with the same queries as before)

RELOPTINFO (1 2): rows=501 width=19
    cheapest total path:
    NestLoop(1 2) rows=501 cost=0.00..1253.67
      clauses: numbers.id = ids.id
        SeqScan(1) rows=1 cost=0.00..0.02
        Append(2)  rows=100051 cost=0.00..3.01

As opposed to this:

RELOPTINFO (1 2): rows=501 width=19
        cheapest total path:
        HashJoin(1 2) rows=501 cost=0.00..2195.79
          clauses: numbers.id = ids.id
                SeqScan(1) rows=1 cost=0.00..0.02
                Append(2)  rows=100051 cost=0.00..1690.50

The total cost seems a high for the nestloop.. its constituents are
certainly cheap. I need to look to see if I missed keeping track of
costs somewhere.

When I EXPLAIN, though, I get an error from the executor:
"ERROR: both left and right operands are rel-vars".  I haven't looked
into that yet, but the results so far are encouraging enough to press on
and get this completed.

There was one hairy part, though, which will have to be addressed at some
later point:  Right now there is a boolean 'inh' in the RangeTblEntry
struct which indicates "inheritance requested".  When the inheritance root
is first expanded by expand_inherited_rtentry(), the rte->inh is
nulled in order to prevent expansion of an UPDATE/DELETE target.  This
presented problems for me when I wanted to detect which relation was an
inheritor one in order to expand it into the append path.  For my testing
purposes, I just commented out the line, but for a real
solution, that's not an acceptable solution and some struct might have to
be changed slightly in order to convey the inheritance knowledge..

So, I guess the next step is to see what the executor is complaining
about and see if it's something that would need attention in the executor
of if it's something I did wrong.. If everything appears to work after
that point, then I'll check for efficiency and use of cache in generating
the inner scan plans.

Thanks for the advice and historical perspective so far, Tom. It has been
quite helpful.

    -Aaron