Thread:
Hey All,

I am testing an upgrade of our database from version 8.1 to 8.2. I ran our
worst-performing query against it, an outer join with an "is null"
condition, and was happy to see it ran over four times faster. I also
noticed that the explain analyze showed the planner chose to do sequential
scans on both tables. I then realized I had forgotten to increase
default_statistics_target from the default of 10, so I increased it to 100
and ran "analyze". In 8.1 this sped things up significantly, but in 8.2,
when I ran the query again, it was actually slower. These tests were done
with 8.2.3.1, so I also loaded 8.2.4.1 for comparison.

Here is the explain analyze with default_statistics_target set to 10 on 8.2.3.1:

mdsdb=# explain analyze select backupobjects.record_id from backupobjects left outer join backup_location using (record_id) where backup_location.record_id is null;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=7259801.24..12768737.71 rows=1 width=8) (actual time=651003.121..1312717.249 rows=10411 loops=1)
   Hash Cond: (backupobjects.record_id = backup_location.record_id)
   Filter: (backup_location.record_id IS NULL)
   ->  Seq Scan on backupobjects  (cost=0.00..466835.63 rows=13716963 width=8) (actual time=0.030..95981.895 rows=13706121 loops=1)
   ->  Hash  (cost=3520915.44..3520915.44 rows=215090944 width=8) (actual time=527345.024..527345.024 rows=215090786 loops=1)
         ->  Seq Scan on backup_location  (cost=0.00..3520915.44 rows=215090944 width=8) (actual time=0.048..333944.886 rows=215090786 loops=1)
 Total runtime: 1312727.200 ms

And again with default_statistics_target set to 100 on 8.2.3.1:

mdsdb=# explain analyze select backupobjects.record_id from backupobjects left outer join backup_location using (record_id) where backup_location.record_id is null;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Merge Left Join  (cost=38173940.88..41468823.19 rows=1 width=8) (actual time=3256548.988..4299922.345 rows=10411 loops=1)
   Merge Cond: (backupobjects.record_id = backup_location.record_id)
   Filter: (backup_location.record_id IS NULL)
   ->  Sort  (cost=2258416.72..2292675.79 rows=13703629 width=8) (actual time=74450.897..85651.707 rows=13706121 loops=1)
         Sort Key: backupobjects.record_id
         ->  Seq Scan on backupobjects  (cost=0.00..466702.29 rows=13703629 width=8) (actual time=0.024..40939.762 rows=13706121 loops=1)
   ->  Sort  (cost=35915524.17..36453251.53 rows=215090944 width=8) (actual time=3182075.661..4094748.788 rows=215090786 loops=1)
         Sort Key: backup_location.record_id
         ->  Seq Scan on backup_location  (cost=0.00..3520915.44 rows=215090944 width=8) (actual time=17.905..790499.303 rows=215090786 loops=1)
 Total runtime: 4302591.325 ms

With 8.2.4.1 I get the same plan and performance with default_statistics_target set to either 10 or 100:

mdsdb=# explain analyze select backupobjects.record_id from backupobjects left outer join backup_location using (record_id) where backup_location.record_id is null;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Merge Left Join  (cost=37615266.46..40910145.54 rows=1 width=8) (actual time=2765729.582..3768519.658 rows=10411 loops=1)
   Merge Cond: (backupobjects.record_id = backup_location.record_id)
   Filter: (backup_location.record_id IS NULL)
   ->  Sort  (cost=2224866.79..2259124.25 rows=13702985 width=8) (actual time=101118.216..113245.942 rows=13706121 loops=1)
         Sort Key: backupobjects.record_id
         ->  Seq Scan on backupobjects  (cost=0.00..466695.85 rows=13702985 width=8) (actual time=10.003..67604.564 rows=13706121 loops=1)
   ->  Sort  (cost=35390399.67..35928127.03 rows=215090944 width=8) (actual time=2664596.049..3540048.500 rows=215090786 loops=1)
         Sort Key: backup_location.record_id
         ->  Seq Scan on backup_location  (cost=0.00..3520915.44 rows=215090944 width=8) (actual time=7.110..246561.900 rows=215090786 loops=1)
 Total runtime: 3770428.750 ms

And for reference, here is the same query with 8.1.5.6:

mdsdb=# explain analyze select backupobjects.record_id from backupobjects left outer join backup_location using (record_id) where backup_location.record_id is null;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Merge Left Join  (cost=37490897.67..41269533.13 rows=13705356 width=8) (actual time=5096492.430..6588745.386 rows=10411 loops=1)
   Merge Cond: ("outer".record_id = "inner".record_id)
   Filter: ("inner".record_id IS NULL)
   ->  Index Scan using backupobjects_pkey on backupobjects  (cost=0.00..518007.92 rows=13705356 width=8) (actual time=32.020..404517.133 rows=13706121 loops=1)
   ->  Sort  (cost=37490897.67..38028625.03 rows=215090944 width=8) (actual time=5096460.396..6058937.259 rows=215090786 loops=1)
         Sort Key: backup_location.record_id
         ->  Seq Scan on backup_location  (cost=0.00..3520915.44 rows=215090944 width=8) (actual time=0.020..389038.442 rows=215090786 loops=1)
 Total runtime: 6599215.268 ms
(8 rows)

Based on all this we will be going with 8.2.4.1, but it seems like
currently the query planner isn't choosing the best plan for this case.

Thanks,
Ed
* Tyrrill, Ed (tyrrill_ed@emc.com) wrote:
> Based on all this we will be going with 8.2.4.1, but it seems like
> currently the query planner isn't choosing the best plan for this case.

Was the 'work_mem' set to the same thing on all these runs?  Also, you
might try increasing 'work_mem' under 8.2.4, at least for this query
(you can set it by just doing: set work_mem = '2GB'; or similar in psql,
or you can change the default in postgresql.conf).

The big thing of note, it seems, is that you've got enough memory and the
query comes out faster with a hash join than with a sort + merge join.
That could well be because the planner doesn't think there's enough work
memory available for the hash, which can change based on what the
statistics say about how frequently values show up, etc.

	Enjoy,

		Stephen
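As a rough sketch of the per-session form of that suggestion (nothing here
touches postgresql.conf, and the 2GB figure is only the example value from
the message above, not a recommendation for this box):

mdsdb=# show work_mem;            -- the value the planner is currently working with
mdsdb=# set work_mem = '2GB';     -- affects only this session
mdsdb=# -- re-run the explain analyze from the original post and compare the join method
mdsdb=# reset work_mem;           -- back to the configured default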
"Tyrrill, Ed" <tyrrill_ed@emc.com> writes:
> ... With 8.2.4.1 I get the same plan and performance with
> default_statistics_target set to either 10 or 100:

There's something fishy about that, because AFAICS from the CVS logs,
there are no relevant planner changes between 8.2.3 and 8.2.4.  You
should have gotten exactly the same behavior with both.  Maybe the
version difference you think you see is due to noise in ANALYZE's
random sampling --- are the plan choices stable if you repeat ANALYZE
several times at the same statistics target?

I'm also noticing some rather large variation in what ought to be
essentially the same seqscan cost:

>   ->  Seq Scan on backup_location  (cost=0.00..3520915.44 rows=215090944 width=8) (actual time=0.048..333944.886 rows=215090786 loops=1)

>   ->  Seq Scan on backup_location  (cost=0.00..3520915.44 rows=215090944 width=8) (actual time=17.905..790499.303 rows=215090786 loops=1)

>   ->  Seq Scan on backup_location  (cost=0.00..3520915.44 rows=215090944 width=8) (actual time=7.110..246561.900 rows=215090786 loops=1)

Got any idea what's up with that --- heavy background activity maybe,
or partially cached table data?  It's pretty tough to blame the plan for
a 3x variation in the cost of reading data.

Also, what do you have work_mem set to?  Have you changed any of the
planner cost parameters from their defaults?

			regards, tom lane
On Mon, 2007-06-25 at 17:56 -0400, Stephen Frost wrote:
> Was the 'work_mem' set to the same thing on all these runs?  Also, you
> might try increasing 'work_mem' under 8.2.4, at least for this query
> (you can set it by just doing: set work_mem = '2GB'; or similar in psql,
> or you can change the default in postgresql.conf).
>
> The big thing of note, it seems, is that you've got enough memory and the
> query comes out faster with a hash join than with a sort + merge join.
> That could well be because the planner doesn't think there's enough work
> memory available for the hash, which can change based on what the
> statistics say about how frequently values show up, etc.
>
> 	Enjoy,
>
> 		Stephen

Yes, work_mem was set to 128MB for all runs.  All settings were the same
except for the change to default_statistics_target.  I'm certainly memory
constrained, but giving 2GB to one session doesn't allow other sessions to
do anything.  Possibly when we upgrade to 16GB. :-)
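A quick way to double-check that nothing else differs between the runs is
to look at the relevant parameters from the same session; a small sketch
against pg_settings (the parameter list here is just the ones mentioned in
this thread):

mdsdb=# select name, setting
mdsdb-# from pg_settings
mdsdb-# where name in ('work_mem', 'default_statistics_target',
mdsdb(#                'shared_buffers', 'random_page_cost', 'effective_cache_size');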
* Ed Tyrrill (tyrrill_ed@emc.com) wrote:
> Yes, work_mem was set to 128MB for all runs.  All settings were the same
> except for the change to default_statistics_target.  I'm certainly memory
> constrained, but giving 2GB to one session doesn't allow other sessions to
> do anything.  Possibly when we upgrade to 16GB. :-)

You might consider a smaller increase, say to 256MB, to see if that'll
switch it to a hash join (and then watch the *actual* memory usage, of
course), if you're looking for performance for this query at least.

Yeah, 2GB is what I typically run on our data warehouse box, which is a
nice dual-proc/dual-core DL385 w/ 16GB of ram. :)  The annoying thing is
that I can still run it out of memory sometimes, even w/ 16GB. :/

	Thanks,

		Stephen
On Mon, 2007-06-25 at 18:10 -0400, Tom Lane wrote:
> "Tyrrill, Ed" <tyrrill_ed@emc.com> writes:
> > ... With 8.2.4.1 I get the same plan and performance with
> > default_statistics_target set to either 10 or 100:
>
> There's something fishy about that, because AFAICS from the CVS logs,
> there are no relevant planner changes between 8.2.3 and 8.2.4.  You
> should have gotten exactly the same behavior with both.  Maybe the
> version difference you think you see is due to noise in ANALYZE's
> random sampling --- are the plan choices stable if you repeat ANALYZE
> several times at the same statistics target?
>
> I'm also noticing some rather large variation in what ought to be
> essentially the same seqscan cost:
>
> >   ->  Seq Scan on backup_location  (cost=0.00..3520915.44 rows=215090944 width=8) (actual time=0.048..333944.886 rows=215090786 loops=1)
> >   ->  Seq Scan on backup_location  (cost=0.00..3520915.44 rows=215090944 width=8) (actual time=17.905..790499.303 rows=215090786 loops=1)
> >   ->  Seq Scan on backup_location  (cost=0.00..3520915.44 rows=215090944 width=8) (actual time=7.110..246561.900 rows=215090786 loops=1)
>
> Got any idea what's up with that --- heavy background activity maybe,
> or partially cached table data?  It's pretty tough to blame the plan for
> a 3x variation in the cost of reading data.
>
> Also, what do you have work_mem set to?  Have you changed any of the
> planner cost parameters from their defaults?
>
> 			regards, tom lane

I would expect the seqscan actual time to go down from the first explain
to the second, because at least some of the data should be in the file
cache, but instead the time goes up on the second run.  There are no other
applications running on this machine besides standard Linux services; it's
possible one of them was doing something at the time, but none of them
should have this large an impact.

After loading the data dump from 8.1, I ran analyze once, ran the first
query, changed default_statistics_target to 100 in postgresql.conf,
restarted postmaster, analyzed again, and ran the second query.  I then
did the same with 8.2.4.1; the third explain analyze shows the run with
default_statistics_target set to 100.  The 8.2.4.1 run with
default_statistics_target set to 10 was very similar to the run at 100, so
I didn't include it.

work_mem was set to 128MB for all runs.  I also have random_page_cost = 2.

It seems to me that the first plan is the optimal one for this case, but
when the planner has more information about the table it chooses not to
use it.  Do you think that if work_mem were higher it might choose the
first plan again?

Thanks,
Ed
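One way to test the plan stability Tom asked about, as a rough sketch: keep
the statistics target fixed, re-ANALYZE a few times, and see whether the
chosen join method flips.  Plain EXPLAIN is enough here, since only the
plan choice matters:

mdsdb=# set default_statistics_target = 100;   -- session-local; keep it fixed across repeats
mdsdb=# analyze backup_location;
mdsdb=# analyze backupobjects;
mdsdb=# explain
mdsdb-# select backupobjects.record_id
mdsdb-# from backupobjects left outer join backup_location using (record_id)
mdsdb-# where backup_location.record_id is null;
mdsdb=# -- repeat the ANALYZE + EXPLAIN pair several times; if the top node alternates
mdsdb=# -- between Hash Left Join and Merge Left Join, the choice is sample-sensitive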
* Ed Tyrrill (tyrrill_ed@emc.com) wrote:
> It seems to me that the first plan is the optimal one for this case, but
> when the planner has more information about the table it chooses not to
> use it.  Do you think that if work_mem were higher it might choose the
> first plan again?

Seems likely to me.  You understand that you can set work_mem whenever you
want, right?  It's a GUC, so you could issue a 'set work_mem = blah' in
the application code right before and right after this particular query
(assuming you're going to continue using the session), or just do it in a
separate session using 'explain' to play around with what the planner does
given different arguments.  'explain' is pretty cheap/easy, and you can
play around with various settings to see what PG will do in various cases.
Of course, you won't know the runtimes without doing 'explain analyze',
but I think you have a good idea of the best plan for this query
already...

	Thanks,

		Stephen
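A sketch of that kind of separate-session experiment, stepping work_mem
and looking only at the estimated plan; the particular values are
arbitrary illustrations, not recommendations:

mdsdb=# set work_mem = '256MB';
mdsdb=# explain
mdsdb-# select backupobjects.record_id
mdsdb-# from backupobjects left outer join backup_location using (record_id)
mdsdb-# where backup_location.record_id is null;
mdsdb=# set work_mem = '512MB';
mdsdb=# -- re-run the same EXPLAIN at each step and note where the plan switches to a hash join
mdsdb=# reset work_mem;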
Ed Tyrrill <tyrrill_ed@emc.com> writes:
> It seems to me that the first plan is the optimal one for this case, but
> when the planner has more information about the table it chooses not to
> use it.  Do you think that if work_mem were higher it might choose the
> first plan again?

It's worth fooling around with work_mem just to see what happens.  The
other thing that would be interesting is to force the other plan (set
enable_mergejoin = off) just to see what the planner is costing it at.
My suspicion is that the estimated costs are pretty close.

The ANALYZE stats affect this choice only in second-order ways AFAIR.
The planner penalizes hashes if it thinks there will be a lot of
duplicate values in the inner relation, but IIRC there is also a penalty
for inner duplicates in the mergejoin cost estimate.  So I'm a bit
surprised that there'd be a change.

Can you show us the pg_stats rows for the join columns after analyzing
at target 10 and target 100?

			regards, tom lane
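A sketch of the experiment Tom describes: enable_mergejoin is an ordinary
planner GUC, so turning it off in a throwaway session shows what the
planner thinks the hash plan would cost, without affecting anything else:

mdsdb=# set enable_mergejoin = off;
mdsdb=# explain
mdsdb-# select backupobjects.record_id
mdsdb-# from backupobjects left outer join backup_location using (record_id)
mdsdb-# where backup_location.record_id is null;
mdsdb=# -- compare the Hash Left Join's estimated total cost with the Merge Left Join's above
mdsdb=# reset enable_mergejoin;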
On Mon, 2007-06-25 at 21:07 -0400, Tom Lane wrote:
> It's worth fooling around with work_mem just to see what happens.  The
> other thing that would be interesting is to force the other plan (set
> enable_mergejoin = off) just to see what the planner is costing it at.
> My suspicion is that the estimated costs are pretty close.
>
> The ANALYZE stats affect this choice only in second-order ways AFAIR.
> The planner penalizes hashes if it thinks there will be a lot of
> duplicate values in the inner relation, but IIRC there is also a penalty
> for inner duplicates in the mergejoin cost estimate.  So I'm a bit
> surprised that there'd be a change.
>
> Can you show us the pg_stats rows for the join columns after analyzing
> at target 10 and target 100?
>
> 			regards, tom lane

I wasn't able to work on this for a couple of days, but now I am back on
it.  I increased work_mem to 1GB and decreased default_statistics_target
back to 10.  postmaster now takes 74.8% of RAM (out of 4GB), with
shared_buffers at 1GB as well.  I have not been able to get the database
to use the plan that was really fast the first time, so perhaps the random
sampling in ANALYZE is what caused it to choose the faster plan that first
time.

Tom, as you requested, here are the pg_stats rows for the join columns,
first with default_statistics_target = 10:

mdsdb=# select * from pg_stats where attname = 'record_id';
-[ RECORD 1 ]-----+--------------------------------------------------------------------------------------------------
schemaname        | public
tablename         | backup_location
attname           | record_id
null_frac         | 0
avg_width         | 8
n_distinct        | 4.40637e+06
most_common_vals  | {6053595}
most_common_freqs | {0.000666667}
histogram_bounds  | {25859,1940711,2973201,4592467,5975199,8836423,10021178,10261007,11058355,12087662,14349748}
correlation       | 0.165715
-[ RECORD 2 ]-----+--------------------------------------------------------------------------------------------------
schemaname        | public
tablename         | backupobjects
attname           | record_id
null_frac         | 0
avg_width         | 8
n_distinct        | -1
most_common_vals  |
most_common_freqs |
histogram_bounds  | {10565,1440580,2736075,4140418,5600863,7412501,8824407,10136590,11560512,13069900,14456128}
correlation       | 0.902336

and then with default_statistics_target = 100:

mdsdb=# select * from pg_stats where attname = 'record_id';
-[ RECORD 1 ]-----+--------------------------------------------------------------------------------------------------
schemaname        | public
tablename         | backup_location
attname           | record_id
null_frac         | 0
avg_width         | 8
n_distinct        | 5.82337e+06
most_common_vals  | {235096,295262,1553025,1612535,1635617,1803461,2000641,2036309,2507381,2904177,2921981,3089088,3146908,3224744,3253356,3580055,3647668,4660094,4661032,4752775,4801371,5116051,5173423,9891458,9895966,9897668,9905497,9907478,9908664,9913842,9916856,9929495,9946579,9957084,9962904,9963807,9971068,9980253,9985117,9985892,10007476,10010352,10010808,10025192,10075013,10103597,10115103,10116781,10120165,10137641,10141427,10144210,10148637,10369082,10395553,10418593,10435057,10441855,10497439,10499683,10509766,10515351,10521300,10522302,10525281,10538714,10542612,10544981,10546440,10678033,10995462,11101727,11132055,12664343,12967575}
most_common_freqs | {6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05}
histogram_bounds  | {11545,295289,430988,565826,912519,1179853,1442886,1590646,1649901,1709198,1773893,1831398,1966887,2026312,2087874,2151518,2316639,2474245,2571004,2769480,2863837,2952117,3100933,3182580,3259831,3338022,3412802,3517509,3671705,3758894,4106521,4549800,4620521,4699748,4772724,4851063,4927467,5028209,5105421,5183582,5364296,5454952,5965286,6081539,6528031,6798065,7192136,7518897,7854942,8169821,8527085,8867514,9318637,9812968,9896732,9915321,9933027,9950345,9969581,9987324,10004114,10022269,10040935,10059618,10077611,10096111,10114682,10132165,10151207,10168791,10232857,10299111,10370156,10441842,10497303,10514993,10531984,10678040,10953841,11030018,11088408,11153327,11214573,11443648,11507997,11566711,11615011,11683984,11909042,12014715,12106151,12194283,12284176,12373145,12456035,12545752,12628686,12723672,13022336,13621556,14449465}
correlation       | 0.210513
-[ RECORD 2 ]-----+--------------------------------------------------------------------------------------------------
schemaname        | public
tablename         | backupobjects
attname           | record_id
null_frac         | 0
avg_width         | 8
n_distinct        | -1
most_common_vals  |
most_common_freqs |
histogram_bounds  | {621,167329,364075,495055,629237,768429,906683,1036819,1168225,1304782,1446441,1635583,1776623,1919568,2058804,2213573,2384816,2516367,2654165,2777015,2913726,3045319,3179436,3326044,3449751,3584737,3705100,3849567,3983587,4119532,4255086,4400700,4522294,4676257,4803235,4930094,5065599,5212568,5341881,5476010,5610455,5750156,5876952,6009086,6341074,6663749,6792397,6913638,7035450,7166345,7309759,7449436,7579067,7717768,7852692,7992611,8107334,8232850,8376448,8510463,8654839,8785467,8930354,9065437,9219398,9347145,9497479,9694222,9829935,9962878,10107465,10246453,10406586,10548493,10690983,10827832,10978600,11111459,11257696,11462706,11593369,11738262,11918473,12065317,12208496,12340088,12483168,12631769,12754208,12907042,13037605,13176218,13312853,13440791,13600318,13749132,13884632,14018915,14174415,14328234,14458641}
correlation       | 0.911416
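When rows are this wide, psql's expanded display makes them much easier to
read; a small sketch pulling just the summary fields for the two join
columns (the column list is only an illustration):

mdsdb=# \x
Expanded display is on.
mdsdb=# select schemaname, tablename, attname, null_frac, avg_width, n_distinct, correlation
mdsdb-# from pg_stats
mdsdb-# where attname = 'record_id'
mdsdb-#   and tablename in ('backup_location', 'backupobjects');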