Thread: Planner problems in 8.2.4 and 8.2.5 (was: Possible planner bug/regression introduced in 8.2.5)
From: Greg Sabino Mullane
I don't have a full test case yet, but I did finally manage to get an explain analyze to finish in a sane amount of time on 8.2.5. Attached are two cleaned up explain analyze results, using the exact same data directory but different executables: one is 8.2.3 and returns as expected, the other is 8.2.5, which generates a slow plan despite any fiddling with geqo/join_collapse_limit, etc. The cost is the same, but it makes a wrong turn partway through the plan. This 8.2.5 build already has the patches Tom posted earlier applied to it (8.2.4 and 8.2.5 without those patches were also tested, with the same result).

An earlier version of the query, with the tables truncated to 100000 rows, ran in 70 seconds on 8.2.3 and did not finish after an hour on 8.2.5. This version has the tables truncated to a mere 10000 rows each. The query itself is a view calling some large views, which in turn call other views and functions, etc. I can post a version of it if needed, but my energy is mostly focused now on producing a reproducible test case. Now that this is locally reproducible in a finite amount of time, patches and tweaking suggestions are welcome.

(Postgres was built from source with no special flags, all tables have been analyzed, and '***' in the explain analyze plans marks places that were manually edited for readability.)
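For readers trying to reproduce this kind of comparison, the geqo/join_collapse_limit fiddling mentioned above amounts to roughly the following; this is only a sketch, with "problem_view" standing in for the real stack of views:

    -- Rough sketch only; "problem_view" is a placeholder, not the actual schema.
    SET geqo = off;                  -- take the genetic optimizer out of the picture
    SET join_collapse_limit = 20;    -- let the planner consider the full join order
    SET from_collapse_limit = 20;    -- likewise for subqueries pulled up from the views

    EXPLAIN ANALYZE
    SELECT * FROM problem_view;      -- placeholder for the actual reporting query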
Re: Planner problems in 8.2.4 and 8.2.5 (was: Possible planner bug/regression introduced in 8.2.5)
From: Tom Lane
Greg Sabino Mullane <greg@turnstep.com> writes:
> I don't have a full test case yet, but I did finally manage to get an
> explain analyze to finish in a sane amount of time on 8.2.5. Attached
> are two cleaned up explain analyze results, using the exact same data
> directory but different executables: one is 8.2.3 and returns as
> expected, the other is 8.2.5, which generates a slow plan despite any
> fiddling with geqo/join_collapse_limit, etc. The cost is the same, but
> it makes a wrong turn partway through the plan.

Is there a reason you rounded off most of the costs? It looks like the estimated costs of the two join types are nearly equal, and so it's pure chance which one gets chosen.

The real problem seems to be the misestimation here:

> -> Seq Scan on orders_smaller m (C=0..742 R=1) (AT=0..11 R=9247 L=1)
>        Filter: ((order_number)::text !~~ '%.%'::text)

With a base scan estimate that's off by four orders of magnitude, there's no reason at all to expect that the join plan above it will be very suitable :-(

This might be a bug in the LIKE estimator (if so, it's in 8.2.3 as well). I don't have time to look closer right now, but can you show us the pg_stats row for orders_smaller.order_number?

        regards, tom lane
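In case it helps anyone following along, the pg_stats row Tom asks for can be pulled with a query along these lines (a sketch; the column list is trimmed to the statistics that matter for a LIKE estimate):

    SELECT null_frac, n_distinct,
           most_common_vals, most_common_freqs,
           histogram_bounds
    FROM pg_stats
    WHERE tablename = 'orders_smaller'
      AND attname   = 'order_number';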
Re: Planner problems in 8.2.4 and 8.2.5 (was: Possible planner bug/regression introduced in 8.2.5)
From: Tom Lane
I wrote:
> This might be a bug in the LIKE estimator (if so, it's in 8.2.3 as
> well). I don't have time to look closer right now, but can you show us
> the pg_stats row for orders_smaller.order_number?

Oh, never mind that ... on inspection, the NOT LIKE selectivity estimator is obviously broken: patternsel() doesn't realize it needs to negate the results of the operator when applying same to histogram entries.

        regards, tom lane
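The class of misestimate being described can be seen with a plain EXPLAIN on a NOT LIKE filter of this shape. A hypothetical reproduction sketch, not Greg's actual schema; the point is only that the row estimate for the negated pattern comes out wildly low on the affected versions:

    -- Hypothetical test table: no dots in the data, so NOT LIKE '%.%' matches every row.
    CREATE TABLE orders_demo AS
    SELECT 'ORD' || g::text AS order_number
    FROM generate_series(1, 10000) AS g;

    ANALYZE orders_demo;

    -- On a broken build the estimated row count for this scan is far below 10000,
    -- even though the filter rejects nothing.
    EXPLAIN SELECT * FROM orders_demo WHERE order_number NOT LIKE '%.%';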
Re: Planner problems in 8.2.4 and 8.2.5 (was: Possible planner bug/regression introduced in 8.2.5)
From: Tom Lane
I wrote:
>> This might be a bug in the LIKE estimator (if so, it's in 8.2.3 as
>> well). I don't have time to look closer right now, but can you show us
>> the pg_stats row for orders_smaller.order_number?

> Oh, never mind that ... on inspection, the NOT LIKE selectivity
> estimator is obviously broken: patternsel() doesn't realize it needs
> to negate the results of the operator when applying same to histogram
> entries.

I've applied a patch that should make this better:
http://archives.postgresql.org/pgsql-committers/2007-11/msg00101.php

        regards, tom lane
Re: Planner problems in 8.2.4 and 8.2.5 (was: Possible planner bug/regression introduced in 8.2.5)
From: Greg Sabino Mullane
> Is there a reason you rounded off most of the costs? It looks like the
> estimated costs of the two join types are nearly equal, and so it's pure
> chance which one gets chosen.

No real reason, it's just a post-processing script used to make explain output a little more readable. I'll leave in all the sigfigs next time.

> This might be a bug in the LIKE estimator (if so, it's in 8.2.3 as
> well). I don't have time to look closer right now, but can you show us
> the pg_stats row for orders_smaller.order_number?

I tried the patch you sent, with no change. However, I then changed the default_statistics_target to 100, reanalyzed, and it came back with the "good" plan. Trying this on the original larger query (which pulls from tables with millions of rows, not the 10,000 subsets I created) worked fine too. Very odd.

I confirmed that a lower stats worked fine on 8.2.3, and then narrowed it down to 99 - at 98 and below on 8.2.5, the "bad" plan is used, and at 99 and above, the "good" one is. I guess as a rule of thumb we'll crank up the default estimate on our 8.2 boxes. Any particular number recommended? Any reason why 99 is so magical? I could have sworn I tried it with 100 last week and saw the bad plan. Guess I should also boost my default target testing up a bit as well.

Thanks for the quick patch, we'll definitely apply that as well for safety.

--
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200711081137
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
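For reference, the knob Greg is turning here can be set either session-wide before reanalyzing or pinned to a single column; a sketch of both, reusing the table and column names from earlier in the thread (the 100 is just the value under discussion, not a recommendation):

    -- Session-wide; affects subsequent ANALYZE runs in this session.
    SET default_statistics_target = 100;
    ANALYZE orders_smaller;

    -- Or pin it on just the problem column, which persists across sessions.
    ALTER TABLE orders_smaller ALTER COLUMN order_number SET STATISTICS 100;
    ANALYZE orders_smaller (order_number);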
Re: Planner problems in 8.2.4 and 8.2.5 (was: Possible planner bug/regression introduced in 8.2.5)
From: Tom Lane
"Greg Sabino Mullane" <greg@turnstep.com> writes: > I tried the patch you sent, with no change. However, I then changed the > default_statistics_target to 100, reanalyzed, and it came back with the > "good" plan. Trying this on the original larger query (which pulls from > tables with millions of rows, not the 10,000 subsets I created) worked > fine too. Very odd. I confirmed that a lower stats worked fine on 8.2.3, > and then narrowed it down to 99 - at 98 and below on 8.2.5, the "bad" > plan is used, and at 99 and above, the "good" one is. I guess as a rule > of thumb we'll crank up the default estimate on our 8.2 boxes. Any particular > number recommended? Any reason why 99 is so magical? If there are 100 or more histogram entries it'll do the estimation by counting how many of the histogram entries match the pattern, rather than using the prefix-range-based estimator (which is pretty much all-fantasy anyway for a pattern with leading % :-(). http://archives.postgresql.org/pgsql-committers/2006-09/msg00331.php I'm too lazy to go check, but I think the stats target is interpreted as the number of histogram bins rather than values, which is why you'd see the switchover at 99 not 100. > I could have sworn I > tried it with 100 last week and saw the bad plan. Without that patch, 8.2.x's NOT LIKE estimator is completely bogus anyway :-(. It has been broken right along --- I'm not sure why your query produced a different plan in 8.2.3 than later, but it wasn't as a result of changes here. regards, tom lane
Re: Planner problems in 8.2.4 and 8.2.5 (was: Possible planner bug/regression introduced in 8.2.5)
From: Greg Sabino Mullane
Tom Lane replied:
> If there are 100 or more histogram entries it'll do the estimation by
> counting how many of the histogram entries match the pattern, rather
> than using the prefix-range-based estimator (which is pretty much
> all-fantasy anyway for a pattern with leading % :-().
>
> http://archives.postgresql.org/pgsql-committers/2006-09/msg00331.php

Ugh, that's some gotcha. Now that the patches are in place to fix the planner problems, can I strongly recommend that an 8.2.6 release be made? These are serious planner problems, reported by real-world users, and the only other option may be to go back to 8.1.

> I'm too lazy to go check, but I think the stats target is interpreted as
> the number of histogram bins rather than values, which is why you'd see
> the switchover at 99 not 100.

Can we switch the default_statistics_target to a default of 100? Is there any reason not to do so at this point? Ten has always seemed very low to me, and the 98/99 gotcha only makes the case for at least 100 as the default stronger.

--
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200711132325
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
Re: Planner problems in 8.2.4 and 8.2.5 (was: Possible planner bug/regression introduced in 8.2.5)
From: Tom Lane
"Greg Sabino Mullane" <greg@turnstep.com> writes: > Can we switch the default_statistics_target to a default of 100? Is > there any reason not to do so at this point? Other than a 10x increase in the cost of ANALYZE, and in the cost of histogram-based operations in the planner? It's been clear for quite awhile that a stats target of 10 is often too low, but no one has done the legwork to establish what a more reasonable tradeoff point would be. I'm not for increasing the overhead by a factor of 10 ... or even a factor of 2 ... without some real evidence about the implications. I'd be inclined to approach it by first trying to establish what's a sane default for operations unrelated to LIKE estimation. If that comes out as 50 or more, I'd just lower the LIKE threshold to whatever it comes out at. The 100 number was really just a guess in the first place. regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > It's been clear for quite awhile that a stats target of 10 is often > too low, but no one has done the legwork to establish what a more > reasonable tradeoff point would be. Any ideas on what measurements would be interesting for this? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning
Re: Planner problems in 8.2.4 and 8.2.5 (was: Possible planner bug/regression introduced in 8.2.5)
From: Tom Lane

Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> It's been clear for quite awhile that a stats target of 10 is often
>> too low, but no one has done the legwork to establish what a more
>> reasonable tradeoff point would be.

> Any ideas on what measurements would be interesting for this?

Time to run ANALYZE on large tables, extra planning time for queries of various complexities; versus whether you actually get a better plan or not.

In a quick look at selfuncs.c, it appears that the worst planner hit would be for eqjoin selectivities between non-unique columns (ie, those having MCV lists). The number of operations is O(N^2) if there are N values in each MCV list.

        regards, tom lane
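A rough starting point for the measurement side might look like the sketch below, run from psql; "big_table" and the join query are placeholders, and \timing is the psql client's timing toggle, so this only gives ballpark numbers for ANALYZE cost and planning cost at different targets:

    \timing on

    SET default_statistics_target = 10;
    ANALYZE big_table;                 -- placeholder table

    SET default_statistics_target = 100;
    ANALYZE big_table;

    -- Planner-side overhead can be eyeballed by timing a plain EXPLAIN
    -- (not EXPLAIN ANALYZE) of a representative join under each setting.
    EXPLAIN SELECT *
    FROM big_table b
    JOIN other_table o ON o.id = b.id; -- placeholder query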