directory tree query with big planner variation - Mailing list pgsql-performance

From Axel Rau
Subject directory tree query with big planner variation
Date
Msg-id 79BB9B3D-BCE5-46AA-82DF-BB6B1D0FE2A7@Chaos1.DE
Whole thread Raw
Responses Re: directory tree query with big planner variation
List pgsql-performance
Hi group,

this is a directory tree query for a backup system (http://
sourceforge.net/projects/bacula).
You provide a path and get back the names of the children plus a
boolean telling if the child has itself children.
The "%@" stands for the initial path:
---------------------------------------------------------------
explain analyze  SELECT X.name AS name, COUNT(CH) > 1 AS children
   FROM
     ( SELECT  RTRIM( REPLACE( NLPC.path, '%@/', ''),'/') AS name,
                                                  FN.name AS CH
         FROM
           ( SELECT P.path,P.pathid
               FROM bacula.path P
               WHERE P.path ~ '^%@/[^/]*/$' ) AS NLPC
           LEFT OUTER JOIN bacula.file F
             ON
               NLPC.pathid = F.pathid
           LEFT OUTER JOIN bacula.filename FN
             ON
               F.filenameid = FN.filenameid
         GROUP BY NLPC.path, FN.name
       UNION
       SELECT  FN.name AS name, FN.name AS CH
         FROM
           bacula.path P, bacula.file F, bacula.filename FN
         WHERE
           P.path = '%@/'   AND
           P.pathid = F.pathid                           AND
           F.filenameid = FN.filenameid
     ) AS X
   WHERE X.name <> ''
   GROUP BY X.name
---------------------------------------------------------------
The 1st part of the union takes care of directories, the 2nd one of
flat files.
Application allows user navigation in a browser (clicking on a name
in one column triggers the query and fills the next browser column).
Initial path of "/Users/axel/Library/Preferences" results in:
---------------------------------------------------------------
  Sort  (cost=1295.24..1295.47 rows=92 width=64) (actual
time=818.987..819.871 rows=527 loops=1)
    Sort Key: upper(x.name)
    ->  GroupAggregate  (cost=1288.56..1292.24 rows=92 width=64)
(actual time=784.069..814.059 rows=527 loops=1)
          ->  Unique  (cost=1288.56..1289.25 rows=92 width=112)
(actual time=783.931..809.708 rows=684 loops=1)
                ->  Sort  (cost=1288.56..1288.79 rows=92 width=112)
(actual time=783.921..793.150 rows=5350 loops=1)
                      Sort Key: name, ch
                      ->  Append  (cost=642.03..1285.55 rows=92
width=112) (actual time=335.134..723.917 rows=5350 loops=1)
                            ->  Subquery Scan "*SELECT*
1"  (cost=642.03..643.18 rows=46 width=112) (actual
time=335.130..338.564 rows=184 loops=1)
                                  ->  HashAggregate
(cost=642.03..642.72 rows=46 width=112) (actual time=335.121..337.843
rows=184 loops=1)
                                        ->  Nested Loop Left Join
(cost=2.00..641.80 rows=46 width=112) (actual time=39.293..326.831
rows=1685 loops=1)
                                              ->  Nested Loop Left
Join  (cost=0.00..502.63 rows=46 width=97) (actual
time=21.026..202.206 rows=1685 loops=1)
                                                    ->  Index Scan
using path_name_idx on path p  (cost=0.00..3.02 rows=1 width=97)
(actual time=15.480..56.935 rows=27 loops=1)
                                                          Index Cond:
((path >= '/Users/axel/Library/Preferences/'::text) AND (path < '/
Users/axel/Library/Preferences0'::text))
                                                          Filter:
((path ~ '^/Users/axel/Library/Preferences/[^/]*/$'::text) AND (rtrim
("replace"(path, '/Users/axel/Library/Preferences/'::text, ''::text),
'/'::text) <> ''::text))
                                                    ->  Index Scan
using file_path_idx on file f  (cost=0.00..493.28 rows=506 width=8)
(actual time=0.473..5.119 rows=62 loops=27)
                                                          Index Cond:
("outer".pathid = f.pathid)
                                              ->  Bitmap Heap Scan on
filename fn  (cost=2.00..3.01 rows=1 width=23) (actual
time=0.058..0.061 rows=1 loops=1685)
                                                    Recheck Cond:
("outer".filenameid = fn.filenameid)
                                                    ->  Bitmap Index
Scan on filename_pkey  (cost=0.00..2.00 rows=1 width=0) (actual
time=0.030..0.030 rows=1 loops=1685)
                                                          Index Cond:
("outer".filenameid = fn.filenameid)
                            ->  Nested Loop  (cost=2.00..641.91
rows=46 width=19) (actual time=3.349..377.758 rows=5166 loops=1)
                                  ->  Nested Loop  (cost=0.00..502.62
rows=46 width=4) (actual time=3.118..97.375 rows=5200 loops=1)
                                        ->  Index Scan using
path_name_idx on path p  (cost=0.00..3.01 rows=1 width=4) (actual
time=0.045..0.052 rows=1 loops=1)
                                              Index Cond: (path = '/
Users/axel/Library/Preferences/'::text)
                                        ->  Index Scan using
file_path_idx on file f  (cost=0.00..493.28 rows=506 width=8) (actual
time=3.058..76.014 rows=5200 loops=1)
                                              Index Cond:
("outer".pathid = f.pathid)
                                  ->  Bitmap Heap Scan on filename
fn  (cost=2.00..3.02 rows=1 width=23) (actual time=0.037..0.039
rows=1 loops=5200)
                                        Recheck Cond:
("outer".filenameid = fn.filenameid)
                                        Filter: (name <> ''::text)
                                        ->  Bitmap Index Scan on
filename_pkey  (cost=0.00..2.00 rows=1 width=0) (actual
time=0.018..0.018 rows=1 loops=5200)
                                              Index Cond:
("outer".filenameid = fn.filenameid)
Total runtime: 832.458 ms
---------------------------------------------------------------
which is ok, but initial path of "/Users/axel" results in (which is
not ok):
---------------------------------------------------------------
  Sort  (cost=125533.67..125534.17 rows=200 width=64) (actual
time=84273.963..84274.260 rows=181 loops=1)
    Sort Key: upper(x.name)
    ->  GroupAggregate  (cost=123493.01..125526.03 rows=200 width=64)
(actual time=84263.411..84272.427 rows=181 loops=1)
          ->  Unique  (cost=123493.01..124169.51 rows=90201
width=112) (actual time=84263.215..84270.129 rows=522 loops=1)
                ->  Sort  (cost=123493.01..123718.51 rows=90201
width=112) (actual time=84263.208..84265.632 rows=1432 loops=1)
                      Sort Key: name, ch
                      ->  Append  (cost=113172.83..116069.08
rows=90201 width=112) (actual time=83795.274..84251.660 rows=1432
loops=1)
                            ->  Subquery Scan "*SELECT*
1"  (cost=113172.83..115426.71 rows=90155 width=112) (actual
time=83795.270..83803.996 rows=410 loops=1)
                                  ->  HashAggregate
(cost=113172.83..114525.16 rows=90155 width=112) (actual
time=83795.258..83802.369 rows=410 loops=1)
                                        ->  Hash Left Join
(cost=3124.38..112722.06 rows=90155 width=112) (actual
time=56254.547..83779.903 rows=3648 loops=1)
                                              Hash Cond:
("outer".filenameid = "inner".filenameid)
                                              ->  Merge Left Join
(cost=0.00..106216.87 rows=90155 width=97) (actual
time=54926.198..82430.621 rows=3648 loops=1)
                                                    Merge Cond:
("outer".pathid = "inner".pathid)
                                                    ->  Index Scan
using path_pkey on path p  (cost=0.00..2567.57 rows=1941 width=97)
(actual time=527.805..1521.911 rows=69 loops=1)
                                                          Filter:
((path ~ '^/Users/axel/[^/]*/$'::text) AND (rtrim("replace"(path, '/
Users/axel/'::text, ''::text), '/'::text) <> ''::text))
                                                    ->  Index Scan
using file_path_idx on file f  (cost=0.00..95191.99 rows=3020363
width=8) (actual time=17.561..74392.318 rows=2941790 loops=1)
                                              ->  Hash
(cost=2723.30..2723.30 rows=160430 width=23) (actual
time=1299.103..1299.103 rows=160430 loops=1)
                                                    ->  Seq Scan on
filename fn  (cost=0.00..2723.30 rows=160430 width=23) (actual
time=3.884..684.918 rows=160430 loops=1)
                            ->  Nested Loop  (cost=2.00..641.91
rows=46 width=19) (actual time=93.252..442.196 rows=1022 loops=1)
                                  ->  Nested Loop  (cost=0.00..502.62
rows=46 width=4) (actual time=49.375..209.694 rows=1050 loops=1)
                                        ->  Index Scan using
path_name_idx on path p  (cost=0.00..3.01 rows=1 width=4) (actual
time=29.455..29.462 rows=1 loops=1)
                                              Index Cond: (path = '/
Users/axel/'::text)
                                        ->  Index Scan using
file_path_idx on file f  (cost=0.00..493.28 rows=506 width=8) (actual
time=19.898..175.869 rows=1050 loops=1)
                                              Index Cond:
("outer".pathid = f.pathid)
                                  ->  Bitmap Heap Scan on filename
fn  (cost=2.00..3.02 rows=1 width=23) (actual time=0.206..0.208
rows=1 loops=1050)
                                        Recheck Cond:
("outer".filenameid = fn.filenameid)
                                        Filter: (name <> ''::text)
                                        ->  Bitmap Index Scan on
filename_pkey  (cost=0.00..2.00 rows=1 width=0) (actual
time=0.087..0.087 rows=1 loops=1050)
                                              Index Cond:
("outer".filenameid = fn.filenameid)
  Total runtime: 84295.927 ms
---------------------------------------------------------------
It happened once that the planner resolved the 2nd query with the 1st
plan, but this is not reproducible.
How can I avoid the 2nd plan?

This is 8.1.4 on OpenBSD 3.9 with 2x1GHz PIII and 2GB.
Axel
Axel Rau, ☀Frankfurt , Germany                       +49-69-951418-0



pgsql-performance by date:

Previous
From: Richard Huxton
Date:
Subject: Re: Query 200x slower on server [PART 2]
Next
From: Michael Stone
Date:
Subject: Re: directory tree query with big planner variation