Thread: Recursive query gets slower when adding an index
Hi, I have a self-referencing table that defines a hierarchy of projects and sub-projects. This is the table definition: CREATE TABLE project ( project_id integer primary key, project_name text, pl_name text, parent_id integer ); ALTER TABLE project ADD CONSTRAINT project_parent_id_fkey FOREIGN KEY (parent_id) REFERENCES project (project_id) ON UPDATE NO ACTION ON DELETE NO ACTION; The table contains ~11000 rows The following statement: with recursive project_tree as ( select project_id, parent_id, pl_name as root_pl, pl_name as sub_pl, 1 as lvl from project where parent_id is null union all select c.project_id, c.parent_id, coalesce(p.root_pl, c.pl_name) as root_pl, coalesce(c.pl_name, p.sub_pl) as sub_pl, p.lvl + 1 from project c join project_tree p on p.project_id = c.parent_id ) select count(*), max(lvl) from project_tree where root_pl <> sub_pl; usually runs in something like 60-80ms when the parent_id column is *not* indexed. This is the execution plan without index: http://explain.depesz.com/s/ecCT When I create an index on parent_id execution time increases to something between 110ms and 130ms This is the execution plan with index: http://explain.depesz.com/s/xiL As far as I can tell, the choice for the nested loop is the reason for the (slightly) slower execution. I increased the statistics for the parent_id column to 10000 (and did an analyze of course) but that didn't change anything. I have no problem with that performance, so this is more a "I'm curious on why this happens" type of question. (And I thought you might be interested in this behaviour as well) My environment: *Windows 7 Professional 64bit * PostgreSQL 9.2.1, compiled by Visual C++ build 1600, 64-bit Regards Thomas
Thomas Kellerer <spam_eater@gmx.net> writes: > This is the execution plan without index: http://explain.depesz.com/s/ecCT > When I create an index on parent_id execution time increases to something between 110ms and 130ms > This is the execution plan with index: http://explain.depesz.com/s/xiL The reason you get a bad plan choice here is the severe underestimate of the average number of rows coming out of the worktable scan (ie, the size of the "recursive" result carried forward in each iteration). Unfortunately, it's really hard to see how we might make that number better. The current rule of thumb is "10 times the size of the nonrecursive term", which is why you get 10 here. We could choose another multiplier but it'd be just as bogus as the current one (unless somebody has some evidence about typical expansion factors?) I suppose though that there's some argument for discouraging the planner from assuming that the carried-forward result is small; so maybe we should use something larger than 10. regards, tom lane
Tom Lane wrote on 19.10.2012 16:20: > Thomas Kellerer <spam_eater@gmx.net> writes: >> This is the execution plan without index: http://explain.depesz.com/s/ecCT >> When I create an index on parent_id execution time increases to something between 110ms and 130ms >> This is the execution plan with index: http://explain.depesz.com/s/xiL > > The reason you get a bad plan choice here is the severe underestimate of > the average number of rows coming out of the worktable scan (ie, the > size of the "recursive" result carried forward in each iteration). > > Unfortunately, it's really hard to see how we might make that number > better. The current rule of thumb is "10 times the size of the > nonrecursive term", which is why you get 10 here. We could choose > another multiplier but it'd be just as bogus as the current one > (unless somebody has some evidence about typical expansion factors?) > > I suppose though that there's some argument for discouraging the planner > from assuming that the carried-forward result is small; so maybe we > should use something larger than 10. > Thanks for the feedback. I just noticed this behaviour because we ran the same query on SQL Server 2008 and that took well over 30seconds withoutthe index SQL Server *really* improved with the index and returned the result in 0.5 seconds whith the index in place. So I was curious how much faster Postgres would be *with* the index ;) Regards Thomas