Thread: Problem with inherited tables vs query planning

Problem with inherited tables vs query planning

From
"Dave Golombek"
Date:
I have a query which runs a join between a table with multiple child tables
and a second table. The planner is doing a sequential scan of the children,
then doing the join on the result, despite having indices that are relevant.
If I do the join between any of the children and the second table, the index
is used. Note that the planner does well when it doesn't have to deal with
the join -- it handles the child tables (and their indices) fine normally.

I think the simple example below demonstrates the problem I'm having:

create table base (file integer, data integer);
create table child_0 () inherits (base);
create table child_1 () inherits (base);
create index child_0_file_index on child_0 using btree (file);
create index child_1_file_index on child_1 using btree (file);
create table other (file integer, stuff integer);
analyze;
<insert lots of data here>

testing=> explain SELECT * from base join other using (file) where stuff =
1;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Hash Join  (cost=34.27..285.26 rows=597 width=12)
   Hash Cond: ("outer".file = "inner".file)
   ->  Append  (cost=0.00..185.34 rows=11934 width=8)
         ->  Seq Scan on base  (cost=0.00..29.40 rows=1940 width=8)
         ->  Seq Scan on child_0 base  (cost=0.00..77.98 rows=4998 width=8)
         ->  Seq Scan on child_1 base  (cost=0.00..77.96 rows=4996 width=8)
   ->  Hash  (cost=34.25..34.25 rows=10 width=8)
         ->  Seq Scan on other  (cost=0.00..34.25 rows=10 width=8)
               Filter: (stuff = 1)


testing=> explain SELECT * from child_1 join other using (file) where stuff
= 1;
                                       QUERY PLAN
----------------------------------------------------------------------------
------------
 Nested Loop  (cost=0.00..64.51 rows=10 width=12)
   ->  Seq Scan on other  (cost=0.00..34.25 rows=10 width=8)
         Filter: (stuff = 1)
   ->  Index Scan using child_1_file_index on child_1  (cost=0.00..3.01
rows=1 width=8)
         Index Cond: (child_1.file = "outer".file)


Is there a way I can reformulate the query to help the planner use the
indices? I can clearly write a plpgsql function to run the query over all
child tables separately, but was hoping to learn more.

I'm running:
PostgreSQL 8.1.4 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 3.4.3
20041212 (Red Hat 3.4.3-9.EL4)

I also tried this on 8.1.9 and 8.2.4, same results.

Thanks,
Dave Golombek
Senior Software Engineer
Black Duck Software, Inc.
http://www.blackducksoftware.com



Re: Problem with inherited tables vs query planning

From
Tom Lane
Date:
"Dave Golombek" <daveg@blackducksoftware.com> writes:
> Is there a way I can reformulate the query to help the planner use the
> indices?

Use 8.2.  Also put an index on the base table, not only the children ---
the forced seqscan on the base weighs down the cost estimate for the
plan you would like to have.  (With sufficiently large child tables,
that might not matter, but it sure does for this toy example.)

            regards, tom lane

Re: Problem with inherited tables vs query planning

From
Richard Huxton
Date:
Dave Golombek wrote:
>
> create table base (file integer, data integer);
> create table child_0 () inherits (base);
> create table child_1 () inherits (base);
> create index child_0_file_index on child_0 using btree (file);
> create index child_1_file_index on child_1 using btree (file);
> create table other (file integer, stuff integer);
> analyze;
> <insert lots of data here>
>
> testing=> explain SELECT * from base join other using (file) where stuff =
> 1;
>                                  QUERY PLAN
> ----------------------------------------------------------------------------
>  Hash Join  (cost=34.27..285.26 rows=597 width=12)
>    Hash Cond: ("outer".file = "inner".file)
>    ->  Append  (cost=0.00..185.34 rows=11934 width=8)
>          ->  Seq Scan on base  (cost=0.00..29.40 rows=1940 width=8)
>          ->  Seq Scan on child_0 base  (cost=0.00..77.98 rows=4998 width=8)
>          ->  Seq Scan on child_1 base  (cost=0.00..77.96 rows=4996 width=8)
                                                            ^^^^^^^^^
Why does it think it's going to match almost 5000 rows here? You don't
say how many rows your test table has, but when I tried to reproduce it
with 10,000 rows (see sql below) it used the child_x indexes. Is
"stuff=1" particularly non-selective in your test?

SQL: INSERT INTO child_1 SELECT g, round(g/2) FROM (SELECT
generate_series(1,10000) as g) as foo;

>    ->  Hash  (cost=34.25..34.25 rows=10 width=8)
>          ->  Seq Scan on other  (cost=0.00..34.25 rows=10 width=8)
>                Filter: (stuff = 1)


--
   Richard Huxton
   Archonet Ltd

Re: Problem with inherited tables vs query planning

From
"Dave Golombek"
Date:
Tom Lane writes:
> "Dave Golombek" <daveg@blackducksoftware.com> writes:
> > Is there a way I can reformulate the query to help the planner use the
> > indices?
>
> Use 8.2.  Also put an index on the base table, not only the children ---
> the forced seqscan on the base weighs down the cost estimate for the
> plan you would like to have.  (With sufficiently large child tables,
> that might not matter, but it sure does for this toy example.)

Ah, I forgot to try the index on the base table using 8.2, which does indeed
solve the problem. It unfortunately doesn't help with 8.1.4, which we have
in the field; any thoughts on workarounds for older versions or should I
just use a function until we can upgrade everywhere? We have 150 million
rows spread across 16 child tables, which should help cost estimation.

Thanks,
Dave



Re: Problem with inherited tables vs query planning

From
Tom Lane
Date:
"Dave Golombek" <daveg@blackducksoftware.com> writes:
> Ah, I forgot to try the index on the base table using 8.2, which does indeed
> solve the problem. It unfortunately doesn't help with 8.1.4, which we have
> in the field; any thoughts on workarounds for older versions or should I
> just use a function until we can upgrade everywhere? We have 150 million
> rows spread across 16 child tables, which should help cost estimation.

Sorry, 8.1 is just not capable of generating that type of plan --- it
won't push join clauses down into an Append.

            regards, tom lane