Thread: Recursive query gets slower when adding an index

Recursive query gets slower when adding an index

From
Thomas Kellerer
Date:
Hi,

I have a self-referencing table that defines a hierarchy of projects and sub-projects.

This is the table definition:

CREATE TABLE project
(
    project_id    integer primary key,
    project_name  text,
    pl_name       text,
    parent_id     integer
);

ALTER TABLE project
   ADD CONSTRAINT project_parent_id_fkey FOREIGN KEY (parent_id)
   REFERENCES project (project_id)
   ON UPDATE NO ACTION
   ON DELETE NO ACTION;


The table contains ~11000 rows

The following statement:

with recursive project_tree as (
    select project_id,
           parent_id,
           pl_name      as root_pl,
           pl_name      as sub_pl,
           1 as lvl
      from project
     where parent_id is null
    union all
    select c.project_id,
           c.parent_id,
           coalesce(p.root_pl, c.pl_name) as root_pl,
           coalesce(c.pl_name, p.sub_pl)  as sub_pl,
           p.lvl + 1
      from project c
      join project_tree p on p.project_id = c.parent_id
)
select count(*), max(lvl)
   from project_tree
  where root_pl <> sub_pl;

usually runs in something like 60-80ms when the parent_id column is *not* indexed.

This is the execution plan without index: http://explain.depesz.com/s/ecCT

When I create an index on parent_id execution time increases to something between 110ms and 130ms

This is the execution plan with index: http://explain.depesz.com/s/xiL

As far as I can tell, the choice for the nested loop is the reason for the (slightly) slower execution.
I increased the statistics for the parent_id column to 10000 (and did an analyze of course) but that didn't change
anything.

I have no problem with that performance, so this is more a "I'm curious on why this happens" type of question.
(And I thought you might be interested in this behaviour as well)

My environment:

   *Windows 7 Professional 64bit
   * PostgreSQL 9.2.1, compiled by Visual C++ build 1600, 64-bit


Regards
Thomas



Re: Recursive query gets slower when adding an index

From
Tom Lane
Date:
Thomas Kellerer <spam_eater@gmx.net> writes:
> This is the execution plan without index: http://explain.depesz.com/s/ecCT
> When I create an index on parent_id execution time increases to something between 110ms and 130ms
> This is the execution plan with index: http://explain.depesz.com/s/xiL

The reason you get a bad plan choice here is the severe underestimate of
the average number of rows coming out of the worktable scan (ie, the
size of the "recursive" result carried forward in each iteration).

Unfortunately, it's really hard to see how we might make that number
better.  The current rule of thumb is "10 times the size of the
nonrecursive term", which is why you get 10 here.  We could choose
another multiplier but it'd be just as bogus as the current one
(unless somebody has some evidence about typical expansion factors?)

I suppose though that there's some argument for discouraging the planner
from assuming that the carried-forward result is small; so maybe we
should use something larger than 10.

            regards, tom lane


Re: Recursive query gets slower when adding an index

From
Thomas Kellerer
Date:
Tom Lane wrote on 19.10.2012 16:20:
> Thomas Kellerer <spam_eater@gmx.net> writes:
>> This is the execution plan without index: http://explain.depesz.com/s/ecCT
>> When I create an index on parent_id execution time increases to something between 110ms and 130ms
>> This is the execution plan with index: http://explain.depesz.com/s/xiL
>
> The reason you get a bad plan choice here is the severe underestimate of
> the average number of rows coming out of the worktable scan (ie, the
> size of the "recursive" result carried forward in each iteration).
>
> Unfortunately, it's really hard to see how we might make that number
> better.  The current rule of thumb is "10 times the size of the
> nonrecursive term", which is why you get 10 here.  We could choose
> another multiplier but it'd be just as bogus as the current one
> (unless somebody has some evidence about typical expansion factors?)
>
> I suppose though that there's some argument for discouraging the planner
> from assuming that the carried-forward result is small; so maybe we
> should use something larger than 10.
>

Thanks for the feedback.

I just noticed this behaviour because we ran the same query on SQL Server 2008 and that took well over 30seconds
withoutthe index 
SQL Server *really* improved with the index and returned the result in 0.5 seconds whith the index in place.

So I was curious how much faster Postgres would be *with* the index ;)

Regards
Thomas