Thread: huge execution time difference with almost same plan

huge execution time difference with almost same plan

From
Tatsuo Ishii
Date:
Hi,

I've been playing with OSDL DBT-3 for a while and found very strange
phenomemon. The query in question is Q4:

select o_orderpriority, count(*) as order_count from orders where
o_orderdate >= date '1997-10-01' and o_orderdate < (date
'1997-10-01' + interval '3 month')::date and o_orderkey in (
select l_orderkey from lineitem where l_commitdate <
l_receiptdate ) group by o_orderpriority order by
o_orderpriority;

(I modified the original query a little bit so that it could use
i_o_orderdate index)

First I turn off enable_seqscan since old plan used seq scan on orders.
Sort  (cost=86050.12..86050.12 rows=1 width=19) (actual time=302476.572..302476.575 rows=5 loops=1)  Sort Key:
orders.o_orderpriority ->  HashAggregate  (cost=86050.10..86050.11 rows=1 width=19) (actual time=302476.525..302476.538
rows=5loops=1)        ->  Nested Loop IN Join  (cost=0.00..86010.61 rows=7898 width=19) (actual time=64.341..302136.493
rows=52544loops=1)              ->  Index Scan using orders_pkey on orders  (cost=0.00..65387.00 rows=53357 width=30)
(actualtime=44.687..36364.312 rows=57306 loops=1)                    Filter: ((o_orderdate >= '1997-10-01'::date) AND
(o_orderdate< '1998-01-01'::date))              ->  Index Scan using i_l_orderkey on lineitem  (cost=0.00..3.14 rows=3
width=11)(actual time=4.628..4.628 rows=1 loops=57306)                    Index Cond: ("outer".o_orderkey =
lineitem.l_orderkey)                   Filter: (l_commitdate < l_receiptdate)Total runtime: 302476.797 ms
 
(10 rows)

This took about 5 minutes. no so bad. (running on PostgreSQL 7.4.5
BTW)

Then I changed following variables.

enable_seqscan to on
effective_cache_size = 100000
random_page_cost = 1.5

This seems improve the query plan in the following part:

Index Cond: ((o_orderdate >= '1997-10-01'::date) AND (o_orderdate < '1998-01-01'::date))

However actual execution time was almost 6 times slower than above:
Sort  (cost=66763.68..66763.68 rows=1 width=19) (actual time=1915554.845..1915554.849 rows=5 loops=1)  Sort Key:
orders.o_orderpriority ->  HashAggregate  (cost=66763.66..66763.67 rows=1 width=19) (actual
time=1915554.797..1915554.811rows=5 loops=1)        ->  Nested Loop IN Join  (cost=0.00..66724.17 rows=7898 width=19)
(actualtime=277.150..1915137.592 rows=52544 loops=1)              ->  Index Scan using i_o_orderdate on orders
(cost=0.00..46100.56rows=53357 width=30) (actual time=215.502..650031.049 rows=57306 loops=1)                    Index
Cond:((o_orderdate >= '1997-10-01'::date) AND (o_orderdate < '1998-01-01'::date))              ->  Index Scan using
i_l_orderkeyon lineitem  (cost=0.00..3.14 rows=3 width=11) (actual time=22.064..22.064 rows=1 loops=57306)
     Index Cond: ("outer".o_orderkey = lineitem.l_orderkey)                    Filter: (l_commitdate <
l_receiptdate)Totalruntime: 1915555.070 ms
 
(10 rows)

Could anybody explain why? For me, these query plans are almost same,
and I don't understand how could that difference come from.
--
Tatsuo Ishii


Re: huge execution time difference with almost same plan

From
Hannu Krosing
Date:
On L, 2004-09-04 at 17:35, Tatsuo Ishii wrote:
> Hi,
> 
> I've been playing with OSDL DBT-3 for a while and found very strange
> phenomemon. The query in question is Q4:
> 
> select o_orderpriority, count(*) as order_count from orders where
> o_orderdate >= date '1997-10-01' and o_orderdate < (date
> '1997-10-01' + interval '3 month')::date and o_orderkey in (
> select l_orderkey from lineitem where l_commitdate <
> l_receiptdate ) group by o_orderpriority order by
> o_orderpriority;
> 
> (I modified the original query a little bit so that it could use
> i_o_orderdate index)
> 
> First I turn off enable_seqscan since old plan used seq scan on orders.

Did it run even faster when it used seq scan on orders ?

>  Sort  (cost=86050.12..86050.12 rows=1 width=19) (actual time=302476.572..302476.575 rows=5 loops=1)
>    Sort Key: orders.o_orderpriority
>    ->  HashAggregate  (cost=86050.10..86050.11 rows=1 width=19) (actual time=302476.525..302476.538 rows=5 loops=1)
>          ->  Nested Loop IN Join  (cost=0.00..86010.61 rows=7898 width=19) (actual time=64.341..302136.493 rows=52544
loops=1)
>                ->  Index Scan using orders_pkey on orders  (cost=0.00..65387.00 rows=53357 width=30) (actual
time=44.687..36364.312rows=57306 loops=1)
 
>                      Filter: ((o_orderdate >= '1997-10-01'::date) AND (o_orderdate < '1998-01-01'::date))
>                ->  Index Scan using i_l_orderkey on lineitem  (cost=0.00..3.14 rows=3 width=11) (actual
time=4.628..4.628rows=1 loops=57306)
 
>                      Index Cond: ("outer".o_orderkey = lineitem.l_orderkey)
>                      Filter: (l_commitdate < l_receiptdate)
>  Total runtime: 302476.797 ms
> (10 rows)
> 
> This took about 5 minutes. no so bad. (running on PostgreSQL 7.4.5
> BTW)
> 
> Then I changed following variables.
> 
> enable_seqscan to on
> effective_cache_size = 100000
> random_page_cost = 1.5
> 
> This seems improve the query plan in the following part:
> 
> Index Cond: ((o_orderdate >= '1997-10-01'::date) AND (o_orderdate < '1998-01-01'::date))
> 
> However actual execution time was almost 6 times slower than above:
> 
>  Sort  (cost=66763.68..66763.68 rows=1 width=19) (actual time=1915554.845..1915554.849 rows=5 loops=1)
>    Sort Key: orders.o_orderpriority
>    ->  HashAggregate  (cost=66763.66..66763.67 rows=1 width=19) (actual time=1915554.797..1915554.811 rows=5
loops=1)
>          ->  Nested Loop IN Join  (cost=0.00..66724.17 rows=7898 width=19) (actual time=277.150..1915137.592
rows=52544loops=1)
 
>                ->  Index Scan using i_o_orderdate on orders  (cost=0.00..46100.56 rows=53357 width=30) (actual
time=215.502..650031.049rows=57306 loops=1)
 
>                      Index Cond: ((o_orderdate >= '1997-10-01'::date) AND (o_orderdate < '1998-01-01'::date))
>                ->  Index Scan using i_l_orderkey on lineitem  (cost=0.00..3.14 rows=3 width=11) (actual
time=22.064..22.064rows=1 loops=57306)
 
>                      Index Cond: ("outer".o_orderkey = lineitem.l_orderkey)
>                      Filter: (l_commitdate < l_receiptdate)
>  Total runtime: 1915555.070 ms
> (10 rows)
> 
> Could anybody explain why? For me, these query plans are almost same,
> and I don't understand how could that difference come from.
> --
> Tatsuo Ishii
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
> 
>                http://www.postgresql.org/docs/faqs/FAQ.html


Re: huge execution time difference with almost same plan

From
Tom Lane
Date:
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> I've been playing with OSDL DBT-3 for a while and found very strange
> phenomemon.

What can you tell us about the physical ordering of the tables involved?
The only way I can think of to explain these results is that "orders"
is more or less physically in order by its primary key, and "lineitem"
is also more or less ordered by the orders primary key (which might or
might not be the same as l_orderkey; I forget the schema of that
database).

You have

>                ->  Index Scan using orders_pkey on orders  (cost=0.00..65387.00 rows=53357 width=30) (actual
time=44.687..36364.312rows=57306 loops=1)
 
>                      Filter: ((o_orderdate >= '1997-10-01'::date) AND (o_orderdate < '1998-01-01'::date))

versus

>                ->  Index Scan using i_o_orderdate on orders  (cost=0.00..46100.56 rows=53357 width=30) (actual
time=215.502..650031.049rows=57306 loops=1)
 
>                      Index Cond: ((o_orderdate >= '1997-10-01'::date) AND (o_orderdate < '1998-01-01'::date))

Now in the first case that's going to be a full-table scan because
there are no index constraints.  The second one actually is being
selective and so should visit less than all the table rows.  The
only way the first one can be twenty times faster is if it's doing
nearly sequential access to the table while the other one is jumping
all over the place.

(BTW, what happens if you take your thumb off the scales and allow
the planner to use a seqscan like it wanted to?  Seems like that
should be faster than a full-table index scan.)

The costs for the other table are striking too:

>                ->  Index Scan using i_l_orderkey on lineitem  (cost=0.00..3.14 rows=3 width=11) (actual
time=4.628..4.628rows=1 loops=57306)
 
>                      Index Cond: ("outer".o_orderkey = lineitem.l_orderkey)
>                      Filter: (l_commitdate < l_receiptdate)

versus

>                ->  Index Scan using i_l_orderkey on lineitem  (cost=0.00..3.14 rows=3 width=11) (actual
time=22.064..22.064rows=1 loops=57306)
 
>                      Index Cond: ("outer".o_orderkey = lineitem.l_orderkey)
>                      Filter: (l_commitdate < l_receiptdate)

These are the identical plan but the second case is five times slower
(which adds up to a lot, over 57000+ iterations).  Again, I think the
only possible explanation is that the first plan is benefiting from
locality of access while the second is not.  That would have to be due
to the order of the probes in the first case being well-matched to the
physical table order whereas in the second case not.

Can you look at the data and see whether the pkey is correlated to
physical order while the orderdate is not?  (That seems a bit weird
but it's the only explanation I can see.)

BTW, the planner is aware of physical order considerations; it would
not be smart enough to realize the correlation between physical order of
two different tables, but it would have accounted for this in its
preference of pkey over orderdate for scanning the orders table.
That's why you had to knock random_page_cost down so far in order
to mislead it into choosing the slower scan...
        regards, tom lane


Re: huge execution time difference with almost same plan

From
Tatsuo Ishii
Date:
Sorry for the delay. I was out of town last night.

> What can you tell us about the physical ordering of the tables involved?
> The only way I can think of to explain these results is that "orders"
> is more or less physically in order by its primary key, and "lineitem"
> is also more or less ordered by the orders primary key (which might or
> might not be the same as l_orderkey; I forget the schema of that
> database).

Yes, it seems orders table is in phsysically in order by
o_orderkey(primary key) and lineitem is in same order as o_orderkey.

BTW ere is the schema for orders and lineitem:
               Table "public.orders"    Column      |         Type          | Modifiers 
-----------------+-----------------------+-----------o_orderkey      | numeric(10,0)         | not nullo_custkey
|numeric(10,0)         | o_orderstatus   | character(1)          | o_totalprice    | numeric(12,2)         |
o_orderdate    | date                  | o_orderpriority | character(15)         | o_clerk         | character(15)
  | o_shippriority  | numeric(10,0)         | o_comment       | character varying(79) | 
 
Indexes:   "orders_pkey" primary key, btree (o_orderkey)   "i_o_custkey" btree (o_custkey)   "i_o_orderdate" btree
(o_orderdate)
              Table "public.lineitem"    Column      |         Type          | Modifiers 
-----------------+-----------------------+-----------l_orderkey      | numeric(10,0)         | not nulll_partkey
|numeric(10,0)         | l_suppkey       | numeric(10,0)         | l_linenumber    | numeric(10,0)         | not
nulll_quantity     | numeric(12,2)         | l_extendedprice | numeric(12,2)         | l_discount      | numeric(12,2)
      | l_tax           | numeric(12,2)         | l_returnflag    | character(1)          | l_linestatus    |
character(1)         | l_shipdate      | date                  | l_commitdate    | date                  |
l_receiptdate  | date                  | l_shipinstruct  | character(25)         | l_shipmode      | character(10)
  | l_comment       | character varying(44) | 
 
Indexes:   "lineitem_pkey" primary key, btree (l_orderkey, l_linenumber)   "i_l_orderkey" btree (l_orderkey)
"i_l_orderkey_quantity"btree (l_orderkey, l_quantity)   "i_l_partkey" btree (l_partkey)   "i_l_receiptdate" btree
(l_receiptdate)  "i_l_shipdate" btree (l_shipdate)   "i_l_suppkey" btree (l_suppkey)   "i_l_suppkey_partkey" btree
(l_partkey,l_suppkey)
 

> You have
> 
> >                ->  Index Scan using orders_pkey on orders  (cost=0.00..65387.00 rows=53357 width=30) (actual
time=44.687..36364.312rows=57306 loops=1)
 
> >                      Filter: ((o_orderdate >= '1997-10-01'::date) AND (o_orderdate < '1998-01-01'::date))
> 
> versus
> 
> >                ->  Index Scan using i_o_orderdate on orders  (cost=0.00..46100.56 rows=53357 width=30) (actual
time=215.502..650031.049rows=57306 loops=1)
 
> >                      Index Cond: ((o_orderdate >= '1997-10-01'::date) AND (o_orderdate < '1998-01-01'::date))
> 
> Now in the first case that's going to be a full-table scan because
> there are no index constraints.  The second one actually is being
> selective and so should visit less than all the table rows.  The
> only way the first one can be twenty times faster is if it's doing
> nearly sequential access to the table while the other one is jumping
> all over the place.
> 
> (BTW, what happens if you take your thumb off the scales and allow
> the planner to use a seqscan like it wanted to?  Seems like that
> should be faster than a full-table index scan.)

Yes, it was 278438.037 ms, about 7-8% faster than the 1st case.

> The costs for the other table are striking too:
> 
> >                ->  Index Scan using i_l_orderkey on lineitem  (cost=0.00..3.14 rows=3 width=11) (actual
time=4.628..4.628rows=1 loops=57306)
 
> >                      Index Cond: ("outer".o_orderkey = lineitem.l_orderkey)
> >                      Filter: (l_commitdate < l_receiptdate)
> 
> versus
> 
> >                ->  Index Scan using i_l_orderkey on lineitem  (cost=0.00..3.14 rows=3 width=11) (actual
time=22.064..22.064rows=1 loops=57306)
 
> >                      Index Cond: ("outer".o_orderkey = lineitem.l_orderkey)
> >                      Filter: (l_commitdate < l_receiptdate)
> 
> These are the identical plan but the second case is five times slower
> (which adds up to a lot, over 57000+ iterations).  Again, I think the
> only possible explanation is that the first plan is benefiting from
> locality of access while the second is not.  That would have to be due
> to the order of the probes in the first case being well-matched to the
> physical table order whereas in the second case not.
> 
> Can you look at the data and see whether the pkey is correlated to
> physical order while the orderdate is not?  (That seems a bit weird
> but it's the only explanation I can see.)

Your guess was correct. The pkey is physically in the same order as
orders, but o_orderdate seems in very random order.

> BTW, the planner is aware of physical order considerations; it would
> not be smart enough to realize the correlation between physical order of
> two different tables, but it would have accounted for this in its
> preference of pkey over orderdate for scanning the orders table.
> That's why you had to knock random_page_cost down so far in order
> to mislead it into choosing the slower scan...

I think statistics for orders shows too that your guess is correct.

DBT3=# select attname,correlation from pg_stats where tablename = 'orders';    attname     | correlation 
-----------------+-------------o_orderkey      |           1o_custkey       |    0.028476o_orderstatus   |
0.469662o_totalprice   |  -0.0439993o_orderdate     |  -0.0133034o_orderpriority |    0.197801o_clerk         |
-0.0169652o_shippriority |           1o_comment       |  0.00423846
 
(9 rows)

BTW, in 8.0 table spaces allow a table and an index to be placed on
different disk devices, which may improve the performance. Is it
possible that planner accounts for this in the future. Do existing
commercial DBMSs already do this?
--
Tatsuo Ishii


Re: huge execution time difference with almost same plan

From
Tom Lane
Date:
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> BTW, in 8.0 table spaces allow a table and an index to be placed on
> different disk devices, which may improve the performance. Is it
> possible that planner accounts for this in the future.

I'm not sure how we could model it in any useful way ...
        regards, tom lane


Re: huge execution time difference with almost same plan

From
Gaetano Mendola
Date:
Tom Lane wrote:

> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> 
>>BTW, in 8.0 table spaces allow a table and an index to be placed on
>>different disk devices, which may improve the performance. Is it
>>possible that planner accounts for this in the future.
> 
> 
> I'm not sure how we could model it in any useful way ...
> 

Isn't enough have the related costs for table spaces ?


Regards
Gaetano Mendola