Thread:
Hey All,

I am testing an upgrade of our database from version 8.1 to 8.2. I ran our
worst-performing query against it, an outer join with an "is null"
condition, and was happy to see it ran over four times faster. I also
noticed that the explain analyze showed the planner chose to do sequential
scans on both tables. I then realized I had forgotten to increase
default_statistics_target from the default of 10, so I increased it to 100
and ran "analyze". In 8.1 this sped things up significantly, but in 8.2,
when I ran the query again, it was actually slower. These tests were done
with 8.2.3.1, so I also loaded 8.2.4.1 for comparison.

Here is the explain analyze with default_statistics_target set to 10 on 8.2.3.1:

mdsdb=# explain analyze select backupobjects.record_id from backupobjects left outer join backup_location using (record_id) where backup_location.record_id is null;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=7259801.24..12768737.71 rows=1 width=8) (actual time=651003.121..1312717.249 rows=10411 loops=1)
   Hash Cond: (backupobjects.record_id = backup_location.record_id)
   Filter: (backup_location.record_id IS NULL)
   ->  Seq Scan on backupobjects  (cost=0.00..466835.63 rows=13716963 width=8) (actual time=0.030..95981.895 rows=13706121 loops=1)
   ->  Hash  (cost=3520915.44..3520915.44 rows=215090944 width=8) (actual time=527345.024..527345.024 rows=215090786 loops=1)
         ->  Seq Scan on backup_location  (cost=0.00..3520915.44 rows=215090944 width=8) (actual time=0.048..333944.886 rows=215090786 loops=1)
 Total runtime: 1312727.200 ms

And again with default_statistics_target set to 100 on 8.2.3.1:

mdsdb=# explain analyze select backupobjects.record_id from backupobjects left outer join backup_location using (record_id) where backup_location.record_id is null;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Merge Left Join  (cost=38173940.88..41468823.19 rows=1 width=8) (actual time=3256548.988..4299922.345 rows=10411 loops=1)
   Merge Cond: (backupobjects.record_id = backup_location.record_id)
   Filter: (backup_location.record_id IS NULL)
   ->  Sort  (cost=2258416.72..2292675.79 rows=13703629 width=8) (actual time=74450.897..85651.707 rows=13706121 loops=1)
         Sort Key: backupobjects.record_id
         ->  Seq Scan on backupobjects  (cost=0.00..466702.29 rows=13703629 width=8) (actual time=0.024..40939.762 rows=13706121 loops=1)
   ->  Sort  (cost=35915524.17..36453251.53 rows=215090944 width=8) (actual time=3182075.661..4094748.788 rows=215090786 loops=1)
         Sort Key: backup_location.record_id
         ->  Seq Scan on backup_location  (cost=0.00..3520915.44 rows=215090944 width=8) (actual time=17.905..790499.303 rows=215090786 loops=1)
 Total runtime: 4302591.325 ms

With 8.2.4.1 I get the same plan and performance with default_statistics_target set to either 10 or 100:

mdsdb=# explain analyze select backupobjects.record_id from backupobjects left outer join backup_location using (record_id) where backup_location.record_id is null;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Merge Left Join  (cost=37615266.46..40910145.54 rows=1 width=8) (actual time=2765729.582..3768519.658 rows=10411 loops=1)
   Merge Cond: (backupobjects.record_id = backup_location.record_id)
   Filter: (backup_location.record_id IS NULL)
   ->  Sort  (cost=2224866.79..2259124.25 rows=13702985 width=8) (actual time=101118.216..113245.942 rows=13706121 loops=1)
         Sort Key: backupobjects.record_id
         ->  Seq Scan on backupobjects  (cost=0.00..466695.85 rows=13702985 width=8) (actual time=10.003..67604.564 rows=13706121 loops=1)
   ->  Sort  (cost=35390399.67..35928127.03 rows=215090944 width=8) (actual time=2664596.049..3540048.500 rows=215090786 loops=1)
         Sort Key: backup_location.record_id
         ->  Seq Scan on backup_location  (cost=0.00..3520915.44 rows=215090944 width=8) (actual time=7.110..246561.900 rows=215090786 loops=1)
 Total runtime: 3770428.750 ms

And for reference, here is the same query with 8.1.5.6:

mdsdb=# explain analyze select backupobjects.record_id from backupobjects left outer join backup_location using (record_id) where backup_location.record_id is null;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Merge Left Join  (cost=37490897.67..41269533.13 rows=13705356 width=8) (actual time=5096492.430..6588745.386 rows=10411 loops=1)
   Merge Cond: ("outer".record_id = "inner".record_id)
   Filter: ("inner".record_id IS NULL)
   ->  Index Scan using backupobjects_pkey on backupobjects  (cost=0.00..518007.92 rows=13705356 width=8) (actual time=32.020..404517.133 rows=13706121 loops=1)
   ->  Sort  (cost=37490897.67..38028625.03 rows=215090944 width=8) (actual time=5096460.396..6058937.259 rows=215090786 loops=1)
         Sort Key: backup_location.record_id
         ->  Seq Scan on backup_location  (cost=0.00..3520915.44 rows=215090944 width=8) (actual time=0.020..389038.442 rows=215090786 loops=1)
 Total runtime: 6599215.268 ms
(8 rows)

Based on all this we will be going with 8.2.4.1, but it seems like
currently the query planner isn't choosing the best plan for this case.

Thanks,
Ed
* Tyrrill, Ed (tyrrill_ed@emc.com) wrote:
> Based on all this we will be going with 8.2.4.1, but it seems like
> currently the query planner isn't choosing the best plan for this case.

Was the 'work_mem' set to the same thing on all these runs?  Also, you
might try increasing 'work_mem' under 8.2.4, at least for this query
(you can set it by just doing: set work_mem = '2GB'; or similar in psql,
or you can change the default in postgresql.conf).

The big thing of note, it seems, is that you've got enough memory and the
query comes out faster with a hash join than with a sort + merge join.
That could well be because the planner doesn't think there's enough work
memory available for the hash, which can change based on what the
statistics say about how frequently values show up, etc.

	Enjoy,

		Stephen
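As a rough sketch of the per-session form of that suggestion (nothing here
touches postgresql.conf, and the 2GB figure is only the example value from
the message above, not a recommendation for this box):

mdsdb=# show work_mem;            -- the value the planner is currently working with
mdsdb=# set work_mem = '2GB';     -- affects only this session
mdsdb=# -- re-run the explain analyze from the original post and compare the join method
mdsdb=# reset work_mem;           -- back to the configured default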
"Tyrrill, Ed" <tyrrill_ed@emc.com> writes:
> ... With 8.2.4.1 I get the same plan and performance with
> default_statistics_target set to either 10 or 100:

There's something fishy about that, because AFAICS from the CVS logs,
there are no relevant planner changes between 8.2.3 and 8.2.4.  You
should have gotten exactly the same behavior with both.  Maybe the
version difference you think you see is due to noise in ANALYZE's
random sampling --- are the plan choices stable if you repeat ANALYZE
several times at the same statistics target?

I'm also noticing some rather large variation in what ought to be
essentially the same seqscan cost:

>   ->  Seq Scan on backup_location  (cost=0.00..3520915.44 rows=215090944 width=8) (actual time=0.048..333944.886 rows=215090786 loops=1)

>   ->  Seq Scan on backup_location  (cost=0.00..3520915.44 rows=215090944 width=8) (actual time=17.905..790499.303 rows=215090786 loops=1)

>   ->  Seq Scan on backup_location  (cost=0.00..3520915.44 rows=215090944 width=8) (actual time=7.110..246561.900 rows=215090786 loops=1)

Got any idea what's up with that --- heavy background activity maybe,
or partially cached table data?  It's pretty tough to blame the plan for
a 3x variation in the cost of reading data.

Also, what do you have work_mem set to?  Have you changed any of the
planner cost parameters from their defaults?

			regards, tom lane
On Mon, 2007-06-25 at 17:56 -0400, Stephen Frost wrote:
> Was the 'work_mem' set to the same thing on all these runs?  Also, you
> might try increasing 'work_mem' under 8.2.4, at least for this query
> (you can set it by just doing: set work_mem = '2GB'; or similar in psql,
> or you can change the default in postgresql.conf).
>
> The big thing of note, it seems, is that you've got enough memory and the
> query comes out faster with a hash join than with a sort + merge join.
> That could well be because the planner doesn't think there's enough work
> memory available for the hash, which can change based on what the
> statistics say about how frequently values show up, etc.
>
> 	Enjoy,
>
> 		Stephen

Yes, work_mem was set to 128MB for all runs.  All settings were the same
except for the change to default_statistics_target.  I'm certainly memory
constrained, but giving 2GB to one session doesn't allow other sessions to
do anything.  Possibly when we upgrade to 16GB. :-)
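A quick way to double-check that nothing else differs between the runs is
to look at the relevant parameters from the same session; a small sketch
against pg_settings (the parameter list here is just the ones mentioned in
this thread):

mdsdb=# select name, setting
mdsdb-# from pg_settings
mdsdb-# where name in ('work_mem', 'default_statistics_target',
mdsdb(#                'shared_buffers', 'random_page_cost', 'effective_cache_size');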
* Ed Tyrrill (tyrrill_ed@emc.com) wrote:
> Yes, work_mem was set to 128MB for all runs.  All settings were the same
> except for the change to default_statistics_target.  I'm certainly memory
> constrained, but giving 2GB to one session doesn't allow other sessions to
> do anything.  Possibly when we upgrade to 16GB. :-)

You might consider a smaller increase, say to 256MB, to see if that'll
switch it to a hash join (and then watch the *actual* memory usage, of
course), if you're looking for performance for this query at least.

Yeah, 2GB is what I typically run on our data warehouse box, which is a
nice dual-proc/dual-core DL385 w/ 16GB of ram. :)  The annoying thing is
that I can still run it out of memory sometimes, even w/ 16GB. :/

	Thanks,

		Stephen
On Mon, 2007-06-25 at 18:10 -0400, Tom Lane wrote:
> "Tyrrill, Ed" <tyrrill_ed@emc.com> writes:
> > ... With 8.2.4.1 I get the same plan and performance with
> > default_statistics_target set to either 10 or 100:
>
> There's something fishy about that, because AFAICS from the CVS logs,
> there are no relevant planner changes between 8.2.3 and 8.2.4.  You
> should have gotten exactly the same behavior with both.  Maybe the
> version difference you think you see is due to noise in ANALYZE's
> random sampling --- are the plan choices stable if you repeat ANALYZE
> several times at the same statistics target?
>
> I'm also noticing some rather large variation in what ought to be
> essentially the same seqscan cost:
>
> >   ->  Seq Scan on backup_location  (cost=0.00..3520915.44 rows=215090944 width=8) (actual time=0.048..333944.886 rows=215090786 loops=1)
> >   ->  Seq Scan on backup_location  (cost=0.00..3520915.44 rows=215090944 width=8) (actual time=17.905..790499.303 rows=215090786 loops=1)
> >   ->  Seq Scan on backup_location  (cost=0.00..3520915.44 rows=215090944 width=8) (actual time=7.110..246561.900 rows=215090786 loops=1)
>
> Got any idea what's up with that --- heavy background activity maybe,
> or partially cached table data?  It's pretty tough to blame the plan for
> a 3x variation in the cost of reading data.
>
> Also, what do you have work_mem set to?  Have you changed any of the
> planner cost parameters from their defaults?
>
> 			regards, tom lane

I would expect the seqscan actual time to go down from the first explain
to the second, because at least some of the data should be in the file
cache, but instead the time goes up on the second run.  There are no other
applications running on this machine besides standard Linux services; it's
possible one of them was doing something at the time, but none of them
should have this large an impact.

After loading the data dump from 8.1, I ran analyze once, ran the first
query, changed default_statistics_target to 100 in postgresql.conf,
restarted postmaster, analyzed again, and ran the second query.  I then
did the same with 8.2.4.1; the third explain analyze shows the run with
default_statistics_target set to 100.  The 8.2.4.1 run with
default_statistics_target set to 10 was very similar to the run at 100, so
I didn't include it.

work_mem was set to 128MB for all runs.  I also have random_page_cost = 2.

It seems to me that the first plan is the optimal one for this case, but
when the planner has more information about the table it chooses not to
use it.  Do you think that if work_mem were higher it might choose the
first plan again?

Thanks,
Ed
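One way to test the plan stability Tom asked about, as a rough sketch: keep
the statistics target fixed, re-ANALYZE a few times, and see whether the
chosen join method flips.  Plain EXPLAIN is enough here, since only the
plan choice matters:

mdsdb=# set default_statistics_target = 100;   -- session-local; keep it fixed across repeats
mdsdb=# analyze backup_location;
mdsdb=# analyze backupobjects;
mdsdb=# explain
mdsdb-# select backupobjects.record_id
mdsdb-# from backupobjects left outer join backup_location using (record_id)
mdsdb-# where backup_location.record_id is null;
mdsdb=# -- repeat the ANALYZE + EXPLAIN pair several times; if the top node alternates
mdsdb=# -- between Hash Left Join and Merge Left Join, the choice is sample-sensitive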
* Ed Tyrrill (tyrrill_ed@emc.com) wrote:
> It seems to me that the first plan is the optimal one for this case, but
> when the planner has more information about the table it chooses not to
> use it.  Do you think that if work_mem were higher it might choose the
> first plan again?

Seems likely to me.  You understand that you can set work_mem whenever you
want, right?  It's a GUC, so you could issue a 'set work_mem = blah' in
the application code right before and right after this particular query
(assuming you're going to continue using the session), or just do it in a
separate session using 'explain' to play around with what the planner does
given different arguments.  'explain' is pretty cheap/easy, and you can
play around with various settings to see what PG will do in various cases.
Of course, you won't know the runtimes without doing 'explain analyze',
but I think you have a good idea of the best plan for this query
already...

	Thanks,

		Stephen
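A sketch of that kind of separate-session experiment, stepping work_mem
and looking only at the estimated plan; the particular values are
arbitrary illustrations, not recommendations:

mdsdb=# set work_mem = '256MB';
mdsdb=# explain
mdsdb-# select backupobjects.record_id
mdsdb-# from backupobjects left outer join backup_location using (record_id)
mdsdb-# where backup_location.record_id is null;
mdsdb=# set work_mem = '512MB';
mdsdb=# -- re-run the same EXPLAIN at each step and note where the plan switches to a hash join
mdsdb=# reset work_mem;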
Ed Tyrrill <tyrrill_ed@emc.com> writes:
> It seems to me that the first plan is the optimal one for this case, but
> when the planner has more information about the table it chooses not to
> use it.  Do you think that if work_mem were higher it might choose the
> first plan again?

It's worth fooling around with work_mem just to see what happens.  The
other thing that would be interesting is to force the other plan (set
enable_mergejoin = off) just to see what the planner is costing it at.
My suspicion is that the estimated costs are pretty close.

The ANALYZE stats affect this choice only in second-order ways AFAIR.
The planner penalizes hashes if it thinks there will be a lot of
duplicate values in the inner relation, but IIRC there is also a penalty
for inner duplicates in the mergejoin cost estimate.  So I'm a bit
surprised that there'd be a change.

Can you show us the pg_stats rows for the join columns after analyzing
at target 10 and target 100?

			regards, tom lane
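A sketch of the experiment Tom describes: enable_mergejoin is an ordinary
planner GUC, so turning it off in a throwaway session shows what the
planner thinks the hash plan would cost, without affecting anything else:

mdsdb=# set enable_mergejoin = off;
mdsdb=# explain
mdsdb-# select backupobjects.record_id
mdsdb-# from backupobjects left outer join backup_location using (record_id)
mdsdb-# where backup_location.record_id is null;
mdsdb=# -- compare the Hash Left Join's estimated total cost with the Merge Left Join's above
mdsdb=# reset enable_mergejoin;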
On Mon, 2007-06-25 at 21:07 -0400, Tom Lane wrote:
> It's worth fooling around with work_mem just to see what happens.  The
> other thing that would be interesting is to force the other plan (set
> enable_mergejoin = off) just to see what the planner is costing it at.
> My suspicion is that the estimated costs are pretty close.
>
> The ANALYZE stats affect this choice only in second-order ways AFAIR.
> The planner penalizes hashes if it thinks there will be a lot of
> duplicate values in the inner relation, but IIRC there is also a penalty
> for inner duplicates in the mergejoin cost estimate.  So I'm a bit
> surprised that there'd be a change.
>
> Can you show us the pg_stats rows for the join columns after analyzing
> at target 10 and target 100?
>
> 			regards, tom lane

I wasn't able to work on this for a couple of days, but now I am back on
it.  I increased work_mem to 1GB and decreased default_statistics_target
back to 10.  postmaster now takes 74.8% of RAM (out of 4GB), with
shared_buffers at 1GB as well.  I have not been able to get the database
to use the plan that was really fast the first time, so perhaps the random
sampling in ANALYZE is what caused it to choose the faster plan that first
time.

Tom, as you requested, here are the pg_stats rows for the join columns,
first with default_statistics_target = 10:

mdsdb=# select * from pg_stats where attname = 'record_id';
-[ RECORD 1 ]-----+--------------------------------------------------------------------------------------------------
schemaname        | public
tablename         | backup_location
attname           | record_id
null_frac         | 0
avg_width         | 8
n_distinct        | 4.40637e+06
most_common_vals  | {6053595}
most_common_freqs | {0.000666667}
histogram_bounds  | {25859,1940711,2973201,4592467,5975199,8836423,10021178,10261007,11058355,12087662,14349748}
correlation       | 0.165715
-[ RECORD 2 ]-----+--------------------------------------------------------------------------------------------------
schemaname        | public
tablename         | backupobjects
attname           | record_id
null_frac         | 0
avg_width         | 8
n_distinct        | -1
most_common_vals  |
most_common_freqs |
histogram_bounds  | {10565,1440580,2736075,4140418,5600863,7412501,8824407,10136590,11560512,13069900,14456128}
correlation       | 0.902336

and then with default_statistics_target = 100:

mdsdb=# select * from pg_stats where attname = 'record_id';
-[ RECORD 1 ]-----+--------------------------------------------------------------------------------------------------
schemaname        | public
tablename         | backup_location
attname           | record_id
null_frac         | 0
avg_width         | 8
n_distinct        | 5.82337e+06
most_common_vals  | {235096,295262,1553025,1612535,1635617,1803461,2000641,2036309,2507381,2904177,2921981,3089088,3146908,3224744,3253356,3580055,3647668,4660094,4661032,4752775,4801371,5116051,5173423,9891458,9895966,9897668,9905497,9907478,9908664,9913842,9916856,9929495,9946579,9957084,9962904,9963807,9971068,9980253,9985117,9985892,10007476,10010352,10010808,10025192,10075013,10103597,10115103,10116781,10120165,10137641,10141427,10144210,10148637,10369082,10395553,10418593,10435057,10441855,10497439,10499683,10509766,10515351,10521300,10522302,10525281,10538714,10542612,10544981,10546440,10678033,10995462,11101727,11132055,12664343,12967575}
most_common_freqs | {6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05}
histogram_bounds  | {11545,295289,430988,565826,912519,1179853,1442886,1590646,1649901,1709198,1773893,1831398,1966887,2026312,2087874,2151518,2316639,2474245,2571004,2769480,2863837,2952117,3100933,3182580,3259831,3338022,3412802,3517509,3671705,3758894,4106521,4549800,4620521,4699748,4772724,4851063,4927467,5028209,5105421,5183582,5364296,5454952,5965286,6081539,6528031,6798065,7192136,7518897,7854942,8169821,8527085,8867514,9318637,9812968,9896732,9915321,9933027,9950345,9969581,9987324,10004114,10022269,10040935,10059618,10077611,10096111,10114682,10132165,10151207,10168791,10232857,10299111,10370156,10441842,10497303,10514993,10531984,10678040,10953841,11030018,11088408,11153327,11214573,11443648,11507997,11566711,11615011,11683984,11909042,12014715,12106151,12194283,12284176,12373145,12456035,12545752,12628686,12723672,13022336,13621556,14449465}
correlation       | 0.210513
-[ RECORD 2 ]-----+--------------------------------------------------------------------------------------------------
schemaname        | public
tablename         | backupobjects
attname           | record_id
null_frac         | 0
avg_width         | 8
n_distinct        | -1
most_common_vals  |
most_common_freqs |
histogram_bounds  | {621,167329,364075,495055,629237,768429,906683,1036819,1168225,1304782,1446441,1635583,1776623,1919568,2058804,2213573,2384816,2516367,2654165,2777015,2913726,3045319,3179436,3326044,3449751,3584737,3705100,3849567,3983587,4119532,4255086,4400700,4522294,4676257,4803235,4930094,5065599,5212568,5341881,5476010,5610455,5750156,5876952,6009086,6341074,6663749,6792397,6913638,7035450,7166345,7309759,7449436,7579067,7717768,7852692,7992611,8107334,8232850,8376448,8510463,8654839,8785467,8930354,9065437,9219398,9347145,9497479,9694222,9829935,9962878,10107465,10246453,10406586,10548493,10690983,10827832,10978600,11111459,11257696,11462706,11593369,11738262,11918473,12065317,12208496,12340088,12483168,12631769,12754208,12907042,13037605,13176218,13312853,13440791,13600318,13749132,13884632,14018915,14174415,14328234,14458641}
correlation       | 0.911416
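When rows are this wide, psql's expanded display makes them much easier to
read; a small sketch pulling just the summary fields for the two join
columns (the column list is only an illustration):

mdsdb=# \x
Expanded display is on.
mdsdb=# select schemaname, tablename, attname, null_frac, avg_width, n_distinct, correlation
mdsdb-# from pg_stats
mdsdb-# where attname = 'record_id'
mdsdb-#   and tablename in ('backup_location', 'backupobjects');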