Performance weirdness with/without vacuum analyze

From
Harry Broomhall
Date:
   It has been suggested to me that I resubmit this question to this list,
rather than the GENERAL list it was originally sent to.

   I asked earlier about ways of doing an UPDATE involving a left outer
join and got some very useful feedback.

   This has thrown up a (to me) strange anomaly about the speed of such
an update.

   The input to this query is a fairly large (the example I'm working
with has 335,000 rows) set of records containing numbers to be looked
up in the lookup table.  This lookup table has 239 rows.
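
   To give a rough idea, the statement is broadly of the following shape (this
is only a sketch - the SET list and the translated-number column are
illustrative guesses, not my exact query; the full details are in the file
mentioned below):

    UPDATE import_cdrs
       SET xlated_number = x.xlated_number    -- illustrative column name
      FROM import_cdrs un
           LEFT OUTER JOIN num_xlate x
                  ON un.interim_cli = x.interim_num
                 AND un.starttime BETWEEN x.starttime AND x.endtime
     WHERE import_cdrs.cdr_id = un.cdr_id;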

   I'm always reading the suggestion that doing a 'VACUUM ANALYZE' on a
database is 'A Good Thing' as it helps the planner to do the best thing, so
I arranged a vacuum analyze on the input records.
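
   (Specifically, something along the lines of

    VACUUM ANALYZE import_cdrs;

run against the freshly imported table.)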

   Running the query takes about 13 mins or so.

   If, however, I *don't* do an analyze, but leave the input table as
it was when imported, the run takes about 2.5 mins!

   Looking at the output from 'explain' I can see that the main difference
in the way the planner does it is that it does a merge join in the non-analyze
case, and a hash join in the analyze case.

   Unfortunately I don't really know what this is implying, hence the call
for assistance.

   I have a file with all sorts of info about the problem (details of tables,
output of 'explain' etc) but as it is about 5K in size, and wide as well, I
didn't want to dump it in the list without any warning!

   However, it has been suggested that it should be OK to include it here, so I
have now done so - hopefully it is attached to this message.

   Regards,
       Harry.



Re: Performance weirdness with/without vacuum analyze

From
Josh Berkus
Date:
Harry,

>    It has been suggested to me that I resubmit this question to this list,
> rather than the GENERAL list it was originally sent to.
>
>    I asked earlier about ways of doing an UPDATE involving a left outer
> join and got some very useful feedback.

The query you posted will always be somewhat slow due to the forced join
order, which is unavoidable with a left outer join.

However, regarding your peculiar behaviour, please post:

1) Your random_page_cost and effective_cache_size settings
2) The EXPLAIN ANALYZE of each query instead of just the EXPLAIN
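
If you're not sure what is actually in effect, something like the following
from psql will show it; for (2), just put EXPLAIN ANALYZE in front of the
statement where you had EXPLAIN (bearing in mind that EXPLAIN ANALYZE really
executes the query):

    SHOW random_page_cost;
    SHOW effective_cache_size;
    EXPLAIN ANALYZE <your update statement>;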

Thanks!

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: Performance weirdness with/without vacuum analyze

From
Harry Broomhall
Date:
Josh Berkus writes:
> Harry,


   Many thanks for your response,

>
> >    It has been suggested to me that I resubmit this question to this list,
> > rather than the GENERAL list it was originally sent to.
> >
> >    I asked earlier about ways of doing an UPDATE involving a left outer
> > join and got some very useful feedback.
>
> The query you posted will always be somewhat slow due to the forced join
> order, which is unavoidable with a left outer join.

  Yes - I rather suspected that!  It is a shame it takes two joins to do
the work.

>
> However, regarding your peculiar behaviour, please post:
>
> 1) Your random_page_cost and effective_cache_size settings

#effective_cache_size = 1000    # typically 8KB each
#random_page_cost = 4       # units are one sequential page fetch cost

  i.e. - still set to their defaults.

> 2) The EXPLAIN ANALYZE of each query instead of just the EXPLAIN

  First the case with no vacuum analyze:

                                                                      QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=99.32..171.32 rows=1000 width=259) (actual time=18579.92..48277.69 rows=335671 loops=1)
   Merge Cond: ("outer".cdr_id = "inner".cdr_id)
   ->  Index Scan using import_cdrs_cdr_id_key on import_cdrs  (cost=0.00..52.00 rows=1000 width=164) (actual time=0.42..11479.51 rows=335671 loops=1)
   ->  Sort  (cost=99.32..101.82 rows=1000 width=95) (actual time=18578.71..21155.65 rows=335671 loops=1)
         Sort Key: un.cdr_id
         ->  Hash Join  (cost=6.99..49.49 rows=1000 width=95) (actual time=4.70..10011.35 rows=335671 loops=1)
               Hash Cond: ("outer".interim_cli = "inner".interim_num)
               Join Filter: (("outer".starttime >= "inner".starttime) AND ("outer".starttime <= "inner".endtime))
               ->  Seq Scan on import_cdrs un  (cost=0.00..20.00 rows=1000 width=49) (actual time=0.02..4265.63 rows=335671 loops=1)
               ->  Hash  (cost=6.39..6.39 rows=239 width=46) (actual time=4.57..4.57 rows=0 loops=1)
                     ->  Seq Scan on num_xlate  (cost=0.00..6.39 rows=239 width=46) (actual time=0.12..2.77 rows=239 loops=1)
 Total runtime: 80408.42 msec
(12 rows)

  And now the case *with* the vacuum analyze:

                                                               QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=15335.91..49619.57 rows=335671 width=202) (actual time=12383.44..49297.58 rows=335671 loops=1)
   Hash Cond: ("outer".cdr_id = "inner".cdr_id)
   ->  Seq Scan on import_cdrs  (cost=0.00..8496.71 rows=335671 width=126) (actual time=0.15..9504.24 rows=335671 loops=1)
   ->  Hash  (cost=10398.73..10398.73 rows=335671 width=76) (actual time=12371.13..12371.13 rows=0 loops=1)
         ->  Hash Join  (cost=6.99..10398.73 rows=335671 width=76) (actual time=4.91..9412.55 rows=335671 loops=1)
               Hash Cond: ("outer".interim_cli = "inner".interim_num)
               Join Filter: (("outer".starttime >= "inner".starttime) AND ("outer".starttime <= "inner".endtime))
               ->  Seq Scan on import_cdrs un  (cost=0.00..8496.71 rows=335671 width=30) (actual time=0.09..3813.54 rows=335671 loops=1)
               ->  Hash  (cost=6.39..6.39 rows=239 width=46) (actual time=4.71..4.71 rows=0 loops=1)
                     ->  Seq Scan on num_xlate  (cost=0.00..6.39 rows=239 width=46) (actual time=0.22..2.90 rows=239 loops=1)
 Total runtime: 432543.73 msec
(11 rows)

   Please note that since I first posted I have been slightly adjusting the
schema of the tables, but the disparity remains.

   Many thanks for your assistance.

   Regards,
       Harry.


Re: Performance weirdness with/without vacuum analyze

From
Shridhar Daithankar
Date:
Harry Broomhall wrote:
 > #effective_cache_size = 1000    # typically 8KB each
 > #random_page_cost = 4       # units are one sequential page fetch cost

You must tune the first one at least. Try
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html to tune these
parameters.
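
As a very rough sketch only (the numbers are illustrative starting points, not
recommendations - effective_cache_size is counted in 8KB pages, so 256MB works
out to 32768):

 effective_cache_size = 32768    # about 256MB expressed in 8KB pages
 random_page_cost = 3            # lower it if most of your data is cached

in postgresql.conf; I believe both can also be SET per session while you
experiment.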

 >>2) The EXPLAIN ANALYZE of each query instead of just the EXPLAIN
 >
 >
 >   First the case with no vacuum analyze:
 >
 >                                                                       QUERY PLAN
 > -------------------------------------------------------------------------------------------------------------------------------------------------------
 >  Merge Join  (cost=99.32..171.32 rows=1000 width=259) (actual time=18579.92..48277.69 rows=335671 loops=1)
 >    Merge Cond: ("outer".cdr_id = "inner".cdr_id)
 >    ->  Index Scan using import_cdrs_cdr_id_key on import_cdrs  (cost=0.00..52.00 rows=1000 width=164) (actual time=0.42..11479.51 rows=335671 loops=1)
 >    ->  Sort  (cost=99.32..101.82 rows=1000 width=95) (actual time=18578.71..21155.65 rows=335671 loops=1)
 >          Sort Key: un.cdr_id
 >          ->  Hash Join  (cost=6.99..49.49 rows=1000 width=95) (actual time=4.70..10011.35 rows=335671 loops=1)
 >                Hash Cond: ("outer".interim_cli = "inner".interim_num)
 >                Join Filter: (("outer".starttime >= "inner".starttime) AND ("outer".starttime <= "inner".endtime))
 >                ->  Seq Scan on import_cdrs un  (cost=0.00..20.00 rows=1000 width=49) (actual time=0.02..4265.63 rows=335671 loops=1)
 >                ->  Hash  (cost=6.39..6.39 rows=239 width=46) (actual time=4.57..4.57 rows=0 loops=1)
 >                      ->  Seq Scan on num_xlate  (cost=0.00..6.39 rows=239 width=46) (actual time=0.12..2.77 rows=239 loops=1)
 >  Total runtime: 80408.42 msec
 > (12 rows)

You are lucky to get a better plan here because the planner is way off w.r.t.
the estimated number of rows.
 >
 >   And now the case *with* the vacuum analyze:
 >
 >                                                                QUERY PLAN
 > -----------------------------------------------------------------------------------------------------------------------------------------
 >  Hash Join  (cost=15335.91..49619.57 rows=335671 width=202) (actual time=12383.44..49297.58 rows=335671 loops=1)
 >    Hash Cond: ("outer".cdr_id = "inner".cdr_id)
 >    ->  Seq Scan on import_cdrs  (cost=0.00..8496.71 rows=335671 width=126) (actual time=0.15..9504.24 rows=335671 loops=1)
 >    ->  Hash  (cost=10398.73..10398.73 rows=335671 width=76) (actual time=12371.13..12371.13 rows=0 loops=1)
 >          ->  Hash Join  (cost=6.99..10398.73 rows=335671 width=76) (actual time=4.91..9412.55 rows=335671 loops=1)
 >                Hash Cond: ("outer".interim_cli = "inner".interim_num)
 >                Join Filter: (("outer".starttime >= "inner".starttime) AND ("outer".starttime <= "inner".endtime))
 >                ->  Seq Scan on import_cdrs un  (cost=0.00..8496.71 rows=335671 width=30) (actual time=0.09..3813.54 rows=335671 loops=1)
 >                ->  Hash  (cost=6.39..6.39 rows=239 width=46) (actual time=4.71..4.71 rows=0 loops=1)
 >                      ->  Seq Scan on num_xlate  (cost=0.00..6.39 rows=239 width=46) (actual time=0.22..2.90 rows=239 loops=1)
 >  Total runtime: 432543.73 msec
 > (11 rows)
 >

What happens if you turn off hash joins? Also bump sort memory to something
reasonable, around 16MB, and see what difference it makes to performance.
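
For example, per session, something along these lines:

 SET enable_hashjoin TO off;
 SET sort_mem = 16384;            -- sort_mem is in KB, so about 16MB
 EXPLAIN ANALYZE <your update>;   -- EXPLAIN ANALYZE really runs the statement

and remember to turn enable_hashjoin back on afterwards.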

  Shridhar



Re: Performance weirdness with/without vacuum analyze

From
Harry Broomhall
Date:
Shridhar Daithankar writes:
> Harry Broomhall wrote:
>  > #effective_cache_size = 1000    # typically 8KB each
>  > #random_page_cost = 4       # units are one sequential page fetch cost
>
> You must tune the first one at least. Try
> http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html to tune these
> parameters.

   Wow.  Many thanks for the pointer.  I'm going to be spending some time
trying to get my head around all of that!

[SNIP]

>  >  Total runtime: 80408.42 msec
>  > (12 rows)
>
> You are lucky to get a better plan here because the planner is way off w.r.t.
> the estimated number of rows.

   Yes!  I thought that, which was why I was so surprised at the difference.

>  >
>  >   And now the case *with* the vacuum analyze:
>  >
[SNIP]
>
> What happens if you turn off hash joins? Also bump sort memory to something
> reasonable, around 16MB, and see what difference it makes to performance.



   Lots of things to try there.....


   It will probably take me some time  <grin>.

   Regards,
      Harry.


Re: Performance weirdness with/without vacuum analyze

From
Harry Broomhall
Date:
Shridhar Daithankar writes:

  First - many thanks for your suggestions and pointers to further info.

  I have been trying some of them with some interesting results!

> Harry Broomhall wrote:
>  > #effective_cache_size = 1000    # typically 8KB each
>  > #random_page_cost = 4       # units are one sequential page fetch cost
>
> You must tune the first one at least. Try
> http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html to tune these
> parameters.

  Changing effective_cache_size seemed to have very little effect.  I took it
in steps up to 300MB (the machine has 640MB memory), and the differences
in speed were less than 10%.
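
  (Since the setting is counted in 8KB pages, 300MB works out to roughly
300 * 1024 / 8 = 38400, i.e. the largest step was something along the lines of

    SET effective_cache_size = 38400;

in the session running the query.)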

[SNIP]
>
> What happens if you turn off hash joins?

   This makes the non-vacuum version about 40% slower, and brings the vacuum
version to the same speed (i.e. about 4X faster than it had been!).

> Also bump sort memory to something
> reasonable, around 16MB, and see what difference it makes to performance.


  This was interesting.  Taking it to 10MB made a slight improvement.  At
20MB the vacuum case improved to about 5X its previous speed, but the
non-vacuum version slowed down.  Putting it up to 40MB slowed both down again.

  I will need to test with some of the other scripts and functions I have
written, but it looks as if selective use of more sort memory will be
useful.
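
  By 'selective' I mean something like bumping it only around the big update
and putting it back afterwards, rather than raising it globally, e.g.:

    SET sort_mem = 20480;    -- about 20MB, for this session only
    -- ... run the big update here ...
    RESET sort_mem;

but I still need to experiment with the exact value.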

  Regards,
      Harry.


Re: Performance weirdness with/without vacuum analyze

From
Tom Lane
Date:
Harry Broomhall <harry.broomhall@uk.easynet.net> writes:

>    ->  Index Scan using import_cdrs_cdr_id_key on import_cdrs  (cost=0.00..52.00 rows=1000 width=164) (actual time=0.42..11479.51 rows=335671 loops=1)

>    ->  Seq Scan on import_cdrs  (cost=0.00..8496.71 rows=335671 width=126) (actual time=0.15..9504.24 rows=335671 loops=1)

Hm.  The planner's default cost parameters assume that a full-table
index scan will be much slower than a full-table seq scan.  That's
evidently not the case in your test situation.  You could probably
bring the estimates more in line with reality (and thereby improve the
choice of plan) by reducing random_page_cost towards 1 and increasing
effective_cache_size to represent some realistic fraction of your
available RAM (though I concur with your observation that the
latter doesn't change the estimates all that much).

Beware however that test-case reality and production reality are not the
same thing.  You are evidently testing with tables that fit in RAM.
If your production tables will not, you'd better be wary of being overly
aggressive about reducing random_page_cost.  I believe the default value
(4.0) is fairly representative for situations where many actual disk
fetches are needed, ie, the tables are much larger than RAM.  1.0 would
be appropriate if all your tables are always fully cached in RAM (since
RAM has by definition no random-access penalty).  In intermediate cases
you need to select intermediate values.
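
(For experimentation both parameters can be set per-session rather than in
postgresql.conf - the numbers here are purely illustrative:

    SET random_page_cost = 1.5;
    SET effective_cache_size = 50000;    -- roughly 400MB expressed in 8KB pages

and then re-run the EXPLAIN ANALYZE in the same session.)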

            regards, tom lane