Re: Joins on inherited tables - Mailing list pgsql-performance

From apb18@cornell.edu
Subject Re: Joins on inherited tables
Date
Msg-id Pine.SOL.3.91.1031003152332.15800J-100000@travelers.mail.cornell.edu
Whole thread Raw
In response to Re: Joins on inherited tables  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
OK, so I've had a bit of time to look things over, and appear to be
making headway.  Here's how things stand right now:

I added a function called best_inner_scan used the same way as
best_inner_indexscan, but it's a bit more generalized in the sense that
it can make append plans comprising of the best scans for each
constituent table (it makes calls to best_inner_indexscan for each
child), or just return the best simple index scan (or null) for plain
relations.

In order to make that work, I gave the child tables modified join
clauses from the parent... modified in the sense that I had to make the
operands match the inherited child table (they would match the parent
otherwise if I were to simply copy the Joininfo nodes, and thus fail in
finding an appropriate index in the child table).  I'm not entirely
comfortable with that solution yet, as I'm not absolutely certain those
additonal modified join clauses wont' affect something else in the code
that I'm not aware of,  but it appears to be having the desired effect.

Basically, with optimizer debug enabled, I'm getting plans that look like
this (with the same queries as before)

RELOPTINFO (1 2): rows=501 width=19
    cheapest total path:
    NestLoop(1 2) rows=501 cost=0.00..1253.67
      clauses: numbers.id = ids.id
        SeqScan(1) rows=1 cost=0.00..0.02
        Append(2)  rows=100051 cost=0.00..3.01

As opposed to this:

RELOPTINFO (1 2): rows=501 width=19
        cheapest total path:
        HashJoin(1 2) rows=501 cost=0.00..2195.79
          clauses: numbers.id = ids.id
                SeqScan(1) rows=1 cost=0.00..0.02
                Append(2)  rows=100051 cost=0.00..1690.50

The total cost seems a high for the nestloop.. its constituents are
certainly cheap. I need to look to see if I missed keeping track of
costs somewhere.

When I EXPLAIN, though, I get an error from the executor:
"ERROR: both left and right operands are rel-vars".  I haven't looked
into that yet, but the results so far are encouraging enough to press on
and get this completed.

There was one hairy part, though, which will have to be addressed at some
later point:  Right now there is a boolean 'inh' in the RangeTblEntry
struct which indicates "inheritance requested".  When the inheritance root
is first expanded by expand_inherited_rtentry(), the rte->inh is
nulled in order to prevent expansion of an UPDATE/DELETE target.  This
presented problems for me when I wanted to detect which relation was an
inheritor one in order to expand it into the append path.  For my testing
purposes, I just commented out the line, but for a real
solution, that's not an acceptable solution and some struct might have to
be changed slightly in order to convey the inheritance knowledge..

So, I guess the next step is to see what the executor is complaining
about and see if it's something that would need attention in the executor
of if it's something I did wrong.. If everything appears to work after
that point, then I'll check for efficiency and use of cache in generating
the inner scan plans.

Thanks for the advice and historical perspective so far, Tom. It has been
quite helpful.

    -Aaron


pgsql-performance by date:

Previous
From: Dror Matalon
Date:
Subject: Speeding up Aggregates
Next
From: Rob Nagler
Date:
Subject: reindex/vacuum locking/performance?