Thread: Planner problem

Planner problem

From
Mikael Kjellström
Date:
Hi!

I am having a bit of problem with the plan that the planner produces.

First a little info:

OS: Linux Debian
Version: PostgreSQL 7.4.2 on i686-pc-linux-gnu, compiled by GCC gcc GCC)
3.3.3 (Debian 20040401)
CPU: AMD Athlon XP 2000+
Memory: 1GB
Disk: 4 SCSI-3 UW 10000rpm in RAID 1 (mirror) mode

shared_mem: 20000
Sort mem: 8192
effective_cache_size: 80000


Row count for the tables involved:
priceavailable: 8564
product: 3
productCodeAlias: 7
locationIATA: 3402
locationConnection: 64
price: 4402


I have runned both vacuum full and vacuum analyze before I run my query
and shared_mem is set to 20000


Here is the result I am getting:

EXPLAIN ANALYZE
SELECT
   CASE WHEN 'D' = 'A' THEN price.priceArrival ELSE price.priceDeparture
END AS price,
   price.vatPercent AS vatPercent
FROM
   priceavailable pa,
   product product,
   productCodeAlias productAlias,
   locationiata la,
   locationconnection lc,
   price price
WHERE
   pa.direction IN('D', 'B')
   AND pa.productId = product.productId
   AND product.productId = productAlias.productId
   AND productAlias.productCode = 'TAXI'
   AND pa.locationConnectionId = lc.locationConnectionId
   AND lc.locationConnectionCode = 'RNB'
   AND pa.locationId = la.locationId
   AND la.iataCode = 'KB8'
   AND price.pricegroupId = pa.priceGroupId
   AND price.productId = product.productId
   AND '2004-06-01 05:30:00.000000+02' BETWEEN price.validstartdate AND
price.validstopdate
   AND product.organizationId = 1
   AND price.organizationId = product.organizationId
   AND pa.deletionDate IS NULL
   AND product.deletionDate IS NULL
   AND la.deletionDate IS NULL
   AND lc.deletionDate IS NULL
   AND price.deletionDate IS NULL
;

  QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------
  Nested Loop  (cost=2.14..17.16 rows=1 width=8) (actual
time=3274.196..4620.299 rows=1 loops=1)
    ->  Nested Loop  (cost=2.14..13.49 rows=1 width=12) (actual
time=2643.441..4617.246 rows=47 loops=1)
          Join Filter: (("outer".productid = "inner".productid) AND
("inner".pricegroupid = "outer".pricegroupid))
          ->  Nested Loop  (cost=0.00..7.72 rows=1 width=12) (actual
time=0.238..5.455 rows=111 loops=1)
                ->  Seq Scan on locationconnection lc  (cost=0.00..1.80
rows=1 width=4) (actual time=0.153..0.245 rows=1 loops=1)
                      Filter: ((locationconnectioncode = 'RNB'::text)
AND (deletiondate IS NULL))
                ->  Index Scan using priceavailableix1 on priceavailable
pa  (cost=0.00..5.91 rows=1 width=16) (actual time=0.067..4.182 rows=111
loops=1)
                      Index Cond: (pa.locationconnectionid =
"outer".locationconnectionid)
                      Filter: (((direction = 'D'::text) OR (direction =
'B'::text)) AND (deletiondate IS NULL))
          ->  Hash Join  (cost=2.14..5.74 rows=2 width=24) (actual
time=0.058..39.116 rows=1243 loops=111)
                Hash Cond: ("outer".productid = "inner".productid)
                ->  Index Scan using priceix2 on price  (cost=0.00..3.41
rows=23 width=20) (actual time=0.042..25.811 rows=4402 loops=111)
                      Index Cond: ('2004-06-01'::date <= validstopdate)
                      Filter: (('2004-06-01'::date >= validstartdate)
AND (deletiondate IS NULL) AND (organizationid = 1))
                ->  Hash  (cost=2.14..2.14 rows=1 width=12) (actual
time=0.132..0.132 rows=0 loops=1)
                      ->  Nested Loop  (cost=0.00..2.14 rows=1 width=12)
(actual time=0.088..0.123 rows=1 loops=1)
                            Join Filter: ("outer".productid =
"inner".productid)
                            ->  Seq Scan on product  (cost=0.00..1.04
rows=1 width=8) (actual time=0.013..0.022 rows=3 loops=1)
                                  Filter: ((organizationid = 1) AND
(deletiondate IS NULL))
                            ->  Seq Scan on productcodealias
productalias  (cost=0.00..1.09 rows=1 width=4) (actual time=0.022..0.024
rows=1 loops=3)
                                  Filter: (productcode = 'TAXI'::text)
    ->  Index Scan using locationiataix4 on locationiata la
(cost=0.00..3.66 rows=1 width=4) (actual time=0.052..0.052 rows=0 loops=47)
          Index Cond: ("outer".locationid = la.locationid)
          Filter: ((iatacode = 'KB8'::text) AND (deletiondate IS NULL))
  Total runtime: 4620.852 ms


If I do an "set enable_nestloop = 0" and run the exact same question I get:

      QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=150.55..154.20 rows=1 width=8) (actual
time=68.095..103.006 rows=1 loops=1)
    Hash Cond: ("outer".locationid = "inner".locationid)
    ->  Hash Join  (cost=145.65..149.29 rows=1 width=12) (actual
time=66.304..102.709 rows=47 loops=1)
          Hash Cond: (("outer".productid = "inner".productid) AND
("outer".pricegroupid = "inner".pricegroupid))
          ->  Hash Join  (cost=2.15..5.75 rows=2 width=24) (actual
time=0.343..38.844 rows=1243 loops=1)
                Hash Cond: ("outer".productid = "inner".productid)
                ->  Index Scan using priceix2 on price  (cost=0.00..3.41
rows=23 width=20) (actual time=0.053..26.225 rows=4402 loops=1)
                      Index Cond: ('2004-06-01'::date <= validstopdate)
                      Filter: (('2004-06-01'::date >= validstartdate)
AND (deletiondate IS NULL) AND (organizationid = 1))
                ->  Hash  (cost=2.14..2.14 rows=1 width=12) (actual
time=0.191..0.191 rows=0 loops=1)
                      ->  Hash Join  (cost=1.09..2.14 rows=1 width=12)
(actual time=0.174..0.184 rows=1 loops=1)
                            Hash Cond: ("outer".productid =
"inner".productid)
                            ->  Seq Scan on product  (cost=0.00..1.04
rows=1 width=8) (actual time=0.017..0.025 rows=3 loops=1)
                                  Filter: ((organizationid = 1) AND
(deletiondate IS NULL))
                            ->  Hash  (cost=1.09..1.09 rows=1 width=4)
(actual time=0.073..0.073 rows=0 loops=1)
                                  ->  Seq Scan on productcodealias
productalias  (cost=0.00..1.09 rows=1 width=4) (actual time=0.042..0.045
rows=1 loops=1)
                                        Filter: (productcode = 'TAXI'::text)
          ->  Hash  (cost=143.50..143.50 rows=1 width=12) (actual
time=61.650..61.650 rows=0 loops=1)
                ->  Merge Join  (cost=1.81..143.50 rows=1 width=12)
(actual time=59.914..61.419 rows=111 loops=1)
                      Merge Cond: ("outer".locationconnectionid =
"inner".locationconnectionid)
                      ->  Index Scan using priceavailableix1 on
priceavailable pa  (cost=0.00..141.57 rows=43 width=16) (actual
time=0.048..48.739 rows=6525 loops=1)
                            Filter: (((direction = 'D'::text) OR
(direction = 'B'::text)) AND (deletiondate IS NULL))
                      ->  Sort  (cost=1.81..1.81 rows=1 width=4) (actual
time=0.215..0.290 rows=1 loops=1)
                            Sort Key: lc.locationconnectionid
                            ->  Seq Scan on locationconnection lc
(cost=0.00..1.80 rows=1 width=4) (actual time=0.125..0.194 rows=1 loops=1)
                                  Filter: ((locationconnectioncode =
'RNB'::text) AND (deletiondate IS NULL))
    ->  Hash  (cost=4.89..4.89 rows=1 width=4) (actual time=0.122..0.122
rows=0 loops=1)
          ->  Index Scan using locationiataix1 on locationiata la
(cost=0.00..4.89 rows=1 width=4) (actual time=0.108..0.112 rows=1 loops=1)
                Index Cond: (iatacode = 'KB8'::text)
                Filter: (deletiondate IS NULL)
Total runtime: 103.563 ms

Which is a lot faster than the original question.

If I then enable nestloop again and lower the geqo_threshold from 11
(default) to 5 and run the same query again I get:


           QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Nested Loop  (cost=7.72..17.19 rows=1 width=8) (actual
time=9.752..42.278 rows=1 loops=1)
    ->  Nested Loop  (cost=7.72..13.52 rows=1 width=12) (actual
time=5.698..41.079 rows=47 loops=1)
          Join Filter: ("outer".productid = "inner".productid)
          ->  Nested Loop  (cost=7.72..12.47 rows=1 width=28) (actual
time=5.663..39.992 rows=47 loops=1)
                Join Filter: ("inner".productid = "outer".productid)
                ->  Hash Join  (cost=7.72..11.37 rows=1 width=24)
(actual time=5.470..36.848 rows=111 loops=1)
                      Hash Cond: (("outer".pricegroupid =
"inner".pricegroupid) AND ("outer".productid = "inner".productid))
                      ->  Index Scan using priceix2 on price
(cost=0.00..3.41 rows=23 width=20) (actual time=0.050..26.475 rows=4402
loops=1)
                            Index Cond: ('2004-06-01'::date <=
validstopdate)
                            Filter: (('2004-06-01'::date >=
validstartdate) AND (deletiondate IS NULL) AND (organizationid = 1))
                      ->  Hash  (cost=7.72..7.72 rows=1 width=12)
(actual time=1.809..1.809 rows=0 loops=1)
                            ->  Nested Loop  (cost=0.00..7.72 rows=1
width=12) (actual time=0.253..1.587 rows=111 loops=1)
                                  ->  Seq Scan on locationconnection lc
  (cost=0.00..1.80 rows=1 width=4) (actual time=0.163..0.234 rows=1 loops=1)
                                        Filter: ((locationconnectioncode
= 'RNB'::text) AND (deletiondate IS NULL))
                                  ->  Index Scan using priceavailableix1
on priceavailable pa  (cost=0.00..5.91 rows=1 width=16) (actual
time=0.070..0.965 rows=111 loops=1)
                                        Index Cond:
(pa.locationconnectionid = "outer".locationconnectionid)
                                        Filter: (((direction =
'D'::text) OR (direction = 'B'::text)) AND (deletiondate IS NULL))
                ->  Seq Scan on productcodealias productalias
(cost=0.00..1.09 rows=1 width=4) (actual time=0.020..0.022 rows=1 loops=111)
                      Filter: (productcode = 'TAXI'::text)
          ->  Seq Scan on product  (cost=0.00..1.04 rows=1 width=8)
(actual time=0.005..0.013 rows=3 loops=47)
                Filter: ((organizationid = 1) AND (deletiondate IS NULL))
    ->  Index Scan using locationiataix4 on locationiata la
(cost=0.00..3.66 rows=1 width=4) (actual time=0.022..0.022 rows=0 loops=47)
          Index Cond: ("outer".locationid = la.locationid)
          Filter: ((iatacode = 'KB8'::text) AND (deletiondate IS NULL))
Total runtime: 42.852 ms

Which is even faster than disable:ing nestloop.

So my question is why is the planner making such a bad choice and how
can I make it choose a better plan?


/Mikael


Re: Planner problem

From
Tom Lane
Date:
=?ISO-8859-1?Q?Mikael_Kjellstr=F6m?= <mikael.kjellstrom@mksoft.nu> writes:
> I am having a bit of problem with the plan that the planner produces.

Actually, your problem is with the row-count estimates.  Some of them
are pretty wildly off, which inevitably leads to bad plan choices.
In particular the price row estimate is off by a factor of 200 in all
three plans:

>   ->  Index Scan using priceix2 on price  (cost=0.00..3.41 rows=23 width=20) (actual time=0.042..25.811 rows=4402
loops=111)
>         Index Cond: ('2004-06-01'::date <= validstopdate)
>         Filter: (('2004-06-01'::date >= validstartdate) AND (deletiondate IS NULL) AND (organizationid = 1))

>   ->  Index Scan using priceix2 on price  (cost=0.00..3.41 rows=23 width=20) (actual time=0.053..26.225 rows=4402
loops=1)
>         Index Cond: ('2004-06-01'::date <= validstopdate)
>         Filter: (('2004-06-01'::date >= validstartdate) AND (deletiondate IS NULL) AND (organizationid = 1))

>         ->  Index Scan using priceix2 on price (cost=0.00..3.41 rows=23 width=20) (actual time=0.050..26.475
rows=4402loops=1) 
>               Index Cond: ('2004-06-01'::date <= validstopdate)
>               Filter: (('2004-06-01'::date >= validstartdate) AND (deletiondate IS NULL) AND (organizationid = 1))

and priceavailable is off by a factor of 100:

>   ->  Index Scan using priceavailableix1 on priceavailable pa  (cost=0.00..5.91 rows=1 width=16) (actual
time=0.067..4.182rows=111 loops=1) 
>         Index Cond: (pa.locationconnectionid = "outer".locationconnectionid)
>         Filter: (((direction = 'D'::text) OR (direction = 'B'::text)) AND (deletiondate IS NULL))

>         ->  Index Scan using priceavailableix1 on priceavailable pa  (cost=0.00..141.57 rows=43 width=16) (actual
time=0.048..48.739rows=6525 loops=1) 
>               Filter: (((direction = 'D'::text) OR (direction = 'B'::text)) AND (deletiondate IS NULL))

>       ->  Index Scan using priceavailableix1 on priceavailable pa  (cost=0.00..5.91 rows=1 width=16) (actual
time=0.070..0.965rows=111 loops=1) 
>             Index Cond: (pa.locationconnectionid = "outer".locationconnectionid)
>             Filter: (((direction = 'D'::text) OR (direction = 'B'::text)) AND (deletiondate IS NULL))

Are you sure you've vacuum analyzed these two tables recently?  If so,
what may be needed is to increase ANALYZE's statistics target for
the columns used in the conditions.  (See ALTER TABLE SET STATISTICS)

I suspect that part of the story here has to do with cross-column
correlations, which the present planner will never figure out since it
has no cross-column statistics.  But it's hard to believe that that's
the problem for cases as simple as

>               Filter: (((direction = 'D'::text) OR (direction = 'B'::text)) AND (deletiondate IS NULL))

            regards, tom lane