Thread: Bad plan by Planner (Already resolved?)
Hi,

I stumbled upon a situation where the planner comes up with a bad query plan, but I wanted to mention upfront that I'm using a dated PG version, and I already see an update that mentions planner improvements. I just wanted to check whether this issue is already resolved, and if so, which version I should be eyeing.

My PG version: 8.4.7
Probably solved in: 8.4.8 / 9.0.4 ?

Issue: It seems that the planner is unable to flatten the IN sub-query, causing it to pick a bad plan that takes ages (>2500 seconds) and expects to produce a 100 million row output, where in fact it should return a six-row output. The same IN query, when flattened manually, gives the correct result in a fraction of a second.

Do let me know if this is a new case. I could try to give you the EXPLAIN ANALYSE outputs / approximate table sizes if required.

EXISTING QUERY:

SELECT field_b FROM large_table_a
JOIN large_table_b USING (field_b)
WHERE field_a IN (SELECT large_table_b.field_a
                  FROM large_table_b WHERE field_b = 2673056)

RECOMMENDED QUERY:

SELECT s1.field_b FROM large_table_a
JOIN large_table_b s1 USING (field_b)
JOIN large_table_b s2 ON s1.field_a = s2.field_a
WHERE s2.field_b = 2673056

-- Robins Tharakan
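[For readers following along: besides the manual self-join above, a correlated EXISTS is another equivalent formulation worth trying when an IN sub-query is not flattened. This is only a sketch reusing the table and column names from the query above; it assumes the unqualified field_a in the IN clause refers to the outer large_table_b row, matching the recommended query's join condition.]

```sql
-- Sketch only: same tables/columns as the thread's example.
-- Equivalent to the IN form provided field_a is non-null.
SELECT field_b
FROM large_table_a
JOIN large_table_b USING (field_b)
WHERE EXISTS (
    SELECT 1
    FROM large_table_b sub
    WHERE sub.field_b = 2673056
      AND sub.field_a = large_table_b.field_a
);
```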
Robins Tharakan <robins.tharakan@comodo.com> wrote:

> I stumbled upon a situation where the planner comes with a bad
> query plan, but I wanted to mention upfront that I'm using a dated
> PG version and I already see an update which mentions about
> improving planner performance. I just wanted to check if this
> issue is already resolved, and if so, which version should I be
> eyeing.
>
> My PG Version: 8.4.7
> Probably solved in: 8.4.8 / 9.0.4 ?

First off, did you use pg_upgrade from an earlier major release? If so, be sure you've dealt with this issue:

http://wiki.postgresql.org/wiki/20110408pg_upgrade_fix

Second, the releases you should be considering are on this page:

http://www.postgresql.org/

So 8.4.9, 9.0.5, or 9.1.1.

If anybody recognized the issue from your description, they probably would have posted by now. The fact that there has been no such post doesn't necessarily mean it's not fixed -- the description is a little vague without table definitions and EXPLAIN ANALYZE output, so people might just not be sure. Since it's arguably in your best interest to update at least to 8.4.9 anyway, the easiest way to get your answer might be to do so and test it.

http://www.postgresql.org/support/versioning

> Do let me know if this is a new case. I could try to give you the
> EXPLAIN ANALYSE outputs / approximate table sizes if required.

If you establish that the latest versions of the software still show the issue, please post with more information, as described here:

http://wiki.postgresql.org/wiki/SlowQueryQuestions

-Kevin
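[A quick aside for readers: before and after a minor update, the exact release a server is running can be confirmed with a standard query -- no assumptions here beyond a working connection:]

```sql
-- Reports the full server version string, e.g. "PostgreSQL 8.4.7 on ..."
SELECT version();

-- Or just the version number:
SHOW server_version;
```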
Hi,

I'll try to answer in-line.

On 10/17/2011 09:32 PM, Kevin Grittner wrote:
> First off, did you use pg_upgrade from an earlier major release? If
> so, be sure you've dealt with this issue:

Although I joined recently, I doubt whether pg_upgrade was used here. And this doesn't look like the issue either: there are no data loss issues, and this seems primarily a planner-specific bug.

> the description is a
> little vague without table definitions and EXPLAIN ANALYZE output,
> so people might just not be sure.

Makes sense. It's just that I thought I shouldn't drop in a large mail in case the issue was a well-known one. Please find below the EXPLAIN ANALYSE output. I've changed the table names / field names and provided other details as well.

large_table_a: ~20 million rows; n_dead_tuples / reltuples: ~7%; analysed: <2 weeks ago
large_table_b: ~140 million rows; n_dead_tuples / reltuples: ~0%; analysed: <2 days ago
default_statistics_target: 1000
field_a: int (indexed)
field_b: int (indexed)

> Since it's arguably in your best
> interest to update at least to 8.4.9 anyway, the easiest way to get
> your answer might be to do so and test it.

Frankly, it's slightly difficult to just try out versions. The DB is >1 TB, and getting that kind of resources just to try out versions for a query is not that simple; hope you would understand. I have the workaround implemented, but just wanted to be sure that this is accommodated in a newer version.
===============
EXISTING QUERY:

SELECT field_a FROM large_table_a
JOIN large_table_b USING (field_a)
WHERE field_b IN (SELECT large_table_b.field_b
                  FROM large_table_b WHERE field_a = 2673056)

ANALYSE:

Hash Join  (cost=273247.23..6460088.89 rows=142564896 width=4)
  Hash Cond: (public.large_table_b.field_b = public.large_table_b.field_b)
  ->  Merge Join  (cost=273112.62..5925331.24 rows=142564896 width=8)
        Merge Cond: (large_table_a.field_a = public.large_table_b.field_a)
        ->  Index Scan using "PK_large_table_a" on large_table_a  (cost=0.00..570804.30 rows=22935395 width=4)
        ->  Index Scan using "IX_large_table_b_field_a" on large_table_b  (cost=0.00..4381499.54 rows=142564896 width=8)
  ->  Hash  (cost=133.32..133.32 rows=103 width=4)
        ->  HashAggregate  (cost=132.29..133.32 rows=103 width=4)
              ->  Index Scan using "IX_large_table_b_field_a" on large_table_b  (cost=0.00..131.87 rows=165 width=4)
                    Index Cond: (field_a = 2673056)

=====================
ALTERNATE QUERY:

SELECT s1.field_a FROM large_table_a
JOIN large_table_b s1 USING (field_a)
JOIN large_table_b s2 ON s1.field_b = s2.field_b
WHERE s2.field_a = 2673056

ANALYSE:

Nested Loop  (cost=0.00..2368.74 rows=469 width=4) (actual time=0.090..0.549 rows=6 loops=1)
  ->  Nested Loop  (cost=0.00..1784.06 rows=469 width=4) (actual time=0.057..0.350 rows=16 loops=1)
        ->  Index Scan using "IX_large_table_b_field_a" on large_table_b s2  (cost=0.00..131.87 rows=165 width=4) (actual time=0.033..0.046 rows=6 loops=1)
              Index Cond: (field_a = 2673056)
        ->  Index Scan using "IX_large_table_b_SampleId" on large_table_b s1  (cost=0.00..9.99 rows=2 width=8) (actual time=0.037..0.047 rows=3 loops=6)
              Index Cond: (s1.field_b = s2.field_b)
  ->  Index Scan using "PK_large_table_a" on large_table_a  (cost=0.00..1.23 rows=1 width=4) (actual time=0.011..0.011 rows=0 loops=16)
        Index Cond: (large_table_a.field_a = s1.field_a)
Total runtime: 0.620 ms

-- Robins Tharakan
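[For readers: the dead-tuple ratios and analyze recency quoted above come from the statistics collector. A sketch of how to pull them, using the thread's renamed table names; pg_stat_user_tables is a standard view in 8.4 and later:]

```sql
-- n_dead_tup vs. n_live_tup approximates the dead-tuple ratio quoted above;
-- the last_* columns show when autovacuum / autoanalyze last ran.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       round(100.0 * n_dead_tup / NULLIF(n_live_tup, 0), 1) AS dead_pct,
       last_autovacuum,
       last_autoanalyze
FROM pg_stat_user_tables
WHERE relname IN ('large_table_a', 'large_table_b');
```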
On 17/10/11 19:28, Robins Tharakan wrote:
> Hi,
>
> I stumbled upon a situation where the planner comes with a bad query
> plan, but I wanted to mention upfront that I'm using a dated PG
> version and I already see an update which mentions about improving
> planner performance. I just wanted to check if this issue is already
> resolved, and if so, which version should I be eyeing.
>
> My PG Version: 8.4.7
> Probably solved in: 8.4.8 / 9.0.4 ?
>
> Issue: It seems that the planner is unable to flatten the IN sub-query
> causing the planner to take a bad plan and take ages (>2500 seconds)
> and expects to give a 100 million row output, where in-fact it should
> get a six row output. The same IN query, when flattened, PG gives the
> correct result in a fraction of a second.
>
> Do let me know if this is a new case. I could try to give you the
> EXPLAIN ANALYSE outputs / approximate table sizes if required.
>
> EXISTING QUERY:
> SELECT field_b FROM large_table_a
> JOIN large_table_b USING (field_b)
> WHERE field_a IN (SELECT large_table_b.field_a
>                   FROM large_table_b WHERE field_b = 2673056)
>
> RECOMMENDED QUERY:
> SELECT s1.field_b FROM large_table_a
> JOIN large_table_b s1 USING (field_b)
> JOIN large_table_b s2 ON s1.field_a = s2.field_a
> WHERE s2.field_b = 2673056

Poor plans being generated for the subquery variant above were specifically targeted in 8.4.9. It may be that you don't need the workaround in that (or corresponding later) versions - 9.0.5, 9.1.0.

Regards

Mark
Robins Tharakan wrote:

> I'll try to answer in-line.

Thanks; that's the preferred style on PostgreSQL lists.

> On 10/17/2011 09:32 PM, Kevin Grittner wrote:
>> First off, did you use pg_upgrade from an earlier major release?
>> If so, be sure you've dealt with this issue:

> Although I joined recently, I doubt whether pg_upgrade was used
> here. And this doesn't look like the issue either. There are no
> data loss issues and this seems primarily a planner specific bug.

The data loss doesn't happen until transaction ID wraparound -- so if you had used pg_upgrade to get to where you are, and not used the recovery techniques I pointed to, you could suddenly start losing data at a time long after the conversion. Since you're on a version which came out before that was discovered, I thought it would be friendly to try to save you that trouble; but if you're sure you're not in a vulnerable state, that's great.

>> Since it's arguably in your best interest to update at least to
>> 8.4.9 anyway, the easiest way to get your answer might be to do so
>> and test it.

> Frankly, its slightly difficult to just try out versions. DB>1Tb
> and getting that kind of resources to just try out versions for a
> query is not that simple. Hope you would understand.

That I don't understand. We have found that it takes no longer to upgrade to a new minor release on a 2.5 TB database cluster than on a tiny 300 MB cluster. (With pg_upgrade, it only takes five or ten minutes of down time to upgrade to a new *major* release on a multi-TB database, but that's not what we're talking about to get to 8.4.9.)

We build from source, and we include the minor release number in the prefix for the build, so we can have both old and new software installed side-by-side. The path for the client-side executables we do through a symbolic link, so we can switch that painlessly. And we assign the prefix used for the server to an environment variable in our services script.
So here is our process:

- Build and install the new minor release.
- Change the symlink to use it for clients (like pg_dump and psql).
- Change the service script line that sets the prefix to point to the new minor release.
- Run the service script with "stop" and then run the service script with "start". (Unless your service script does a restart by using stop and start, don't run it with "restart", because a PostgreSQL restart won't pick up the new executables.)

There is literally no more down time than it takes to stop and start the database service. Our client software retries on a broken connection, so we can even do this while users are running and they just get a clock for a few seconds; but we usually prefer not to cause even that much disruption, at least during normal business hours. We have enough hardware to load balance off of one machine at a time to do this without interruption of service.

There are sometimes bugs fixed in a minor release which require cleanup of possibly damaged data, like what I mentioned above. You may need to vacuum or reindex something to recover from the damage caused by the now-fixed bug, but the alternative is to continue to run with the damage. I don't understand why someone would knowingly choose that.

Really, it is worthwhile to keep up on minor releases.

http://www.postgresql.org/support/versioning

Perhaps the difference is that you feel I'm suggesting that you upgrade in order to see if performance improves. I'm not. I'm suggesting that you upgrade to get the bug fixes and security fixes. After the upgrade, it would make sense to see if it also fixed your performance problem.

> I have the workaround implemented, but just wanted to be sure that
> this is accommodated in a newer version.

You've already gotten feedback on that; I don't have anything to add there.

-Kevin
Thanks Kevin,

That's a pretty neat way of managing (at least) minor upgrades. Like I said, this place is new to me, and so although I'm quite positive about upgrading to the latest, I should probably take things one at a time and bring in this idea of implementing regular updates sometime in the future.

As for the query, I tried the same query on an alternate machine, and this is how EXPLAIN ANALYZE came up. It's much faster than the earlier slow query, but nowhere near the performance of the second query shown earlier. Do I have to live with that (if I am only doing a minor version upgrade), or am I missing something else here? I've provided the EXPLAIN ANALYZE output, as well as a web link to a pretty-printed version, for your review.

ORIGINAL QUERY (on PostgreSQL 8.4.9):
http://explain.depesz.com/s/bTm

EXPLAIN ANALYZE SELECT field_a FROM large_table_a
JOIN large_table_b USING (field_a)
WHERE field_b IN (SELECT large_table_b.field_b
                  FROM large_table_b WHERE field_a = 2673056);

Nested Loop  (cost=132.97..194243.54 rows=156031 width=4) (actual time=6.612..43179.524 rows=2120 loops=1)
  ->  Nested Loop  (cost=132.97..1107.63 rows=156031 width=4) (actual time=6.576..29122.017 rows=6938 loops=1)
        ->  HashAggregate  (cost=132.97..133.96 rows=99 width=4) (actual time=6.543..12.726 rows=2173 loops=1)
              ->  Index Scan using "IX_large_table_b_SigId" on large_table_b  (cost=0.00..132.56 rows=164 width=4) (actual time=0.029..3.425 rows=2173 loops=1)
                    Index Cond: (field_a = 2673056)
        ->  Index Scan using "IX_large_table_b_field_b" on large_table_b  (cost=0.00..9.81 rows=2 width=8) (actual time=6.732..13.384 rows=3 loops=2173)
              Index Cond: (public.large_table_b.field_b = public.large_table_b.field_b)
  ->  Index Scan using "PK_large_table_a" on large_table_a  (cost=0.00..1.23 rows=1 width=4) (actual time=2.021..2.021 rows=0 loops=6938)
        Index Cond: (large_table_a.field_a = public.large_table_b.field_a)
Total runtime: 43182.975 ms

OPTIMIZED QUERY (on PostgreSQL 8.4.7):
http://explain.depesz.com/s/emO

EXPLAIN ANALYZE SELECT s1.field_a FROM large_table_a
JOIN large_table_b s1 USING (field_a)
JOIN large_table_b s2 ON s1.field_b = s2.field_b
WHERE s2.field_a = 2673056;

Nested Loop  (cost=0.00..2356.98 rows=494 width=4) (actual time=0.086..96.056 rows=2120 loops=1)
  ->  Nested Loop  (cost=0.00..1745.51 rows=494 width=4) (actual time=0.051..48.900 rows=6938 loops=1)
        ->  Index Scan using "IX_large_table_b_SigId" on large_table_b s2  (cost=0.00..132.56 rows=164 width=4) (actual time=0.028..3.411 rows=2173 loops=1)
              Index Cond: (field_a = 2673056)
        ->  Index Scan using "IX_large_table_b_field_b" on large_table_b s1  (cost=0.00..9.81 rows=2 width=8) (actual time=0.007..0.012 rows=3 loops=2173)
              Index Cond: (s1.field_b = s2.field_b)
  ->  Index Scan using "PK_large_table_a" on large_table_a  (cost=0.00..1.23 rows=1 width=4) (actual time=0.004..0.004 rows=0 loops=6938)
        Index Cond: (large_table_a.field_a = s1.field_a)
Total runtime: 98.165 ms

-- Robins Tharakan

On 10/18/2011 06:16 PM, Kevin Grittner wrote:
> We build from source, and we include the minor release number in the
> prefix for the build, so we can have both old and new software
> installed side-by-side. The path for the client-side executables we
> do through a symbolic link, so we can switch that painlessly. And we
> assign the prefix used for the server to an environment variable in
> our services script. So here is our process:
>
> - Build and install the new minor release.
> - Change the symlink to use it for clients (like pg_dump and psql).
> - Change the service script line that sets the prefix to point to
>   the new minor release.
> - Run the service script with "stop" and then run the service script
>   with "start". (Unless your service script does a restart by using
>   stop and start, don't run it with "restart", because a PostgreSQL
>   restart won't pick up the new executables.)
>
> There is literally no more down time than it takes to stop and start
> the database service. Our client software retries on a broken
> connection, so we can even do this while users are running and they
> just get a clock for a few seconds; but we usually prefer not to
> cause even that much disruption, at least during normal business
> hours. We have enough hardware to load balance off of one machine at
> a time to do this without interruption of service.
>
> There are sometimes bugs fixed in a minor release which require
> cleanup of possibly damaged data, like what I mentioned above. You
> may need to vacuum or reindex something to recover from the damage
> caused by the now-fixed bug, but the alternative is to continue to
> run with the damage. I don't understand why someone would knowingly
> choose that.
>
> Really, it is worthwhile to keep up on minor releases.
>
> -Kevin
Robins Tharakan <robins.tharakan@comodo.com> writes:

> ORIGINAL QUERY (on PostgreSQL 8.4.9):
> http://explain.depesz.com/s/bTm
>
> EXPLAIN ANALYZE SELECT field_a FROM large_table_a JOIN large_table_b
> USING (field_a) WHERE field_b IN (SELECT large_table_b.field_b FROM
> large_table_b WHERE field_a = 2673056);
>
> Nested Loop  (cost=132.97..194243.54 rows=156031 width=4) (actual time=6.612..43179.524 rows=2120 loops=1)
>   ->  Nested Loop  (cost=132.97..1107.63 rows=156031 width=4) (actual time=6.576..29122.017 rows=6938 loops=1)
>         ->  HashAggregate  (cost=132.97..133.96 rows=99 width=4) (actual time=6.543..12.726 rows=2173 loops=1)
>               ->  Index Scan using "IX_large_table_b_SigId" on large_table_b  (cost=0.00..132.56 rows=164 width=4) (actual time=0.029..3.425 rows=2173 loops=1)
>                     Index Cond: (field_a = 2673056)
>         ->  Index Scan using "IX_large_table_b_field_b" on large_table_b  (cost=0.00..9.81 rows=2 width=8) (actual time=6.732..13.384 rows=3 loops=2173)
>               Index Cond: (public.large_table_b.field_b = public.large_table_b.field_b)
>   ->  Index Scan using "PK_large_table_a" on large_table_a  (cost=0.00..1.23 rows=1 width=4) (actual time=2.021..2.021 rows=0 loops=6938)
>         Index Cond: (large_table_a.field_a = public.large_table_b.field_a)
> Total runtime: 43182.975 ms
>
> OPTIMIZED QUERY (on PostgreSQL 8.4.7):
> http://explain.depesz.com/s/emO
>
> EXPLAIN ANALYZE SELECT s1.field_a FROM large_table_a JOIN large_table_b
> s1 USING (field_a) JOIN large_table_b s2 ON s1.field_b = s2.field_b
> WHERE s2.field_a = 2673056;
>
> Nested Loop  (cost=0.00..2356.98 rows=494 width=4) (actual time=0.086..96.056 rows=2120 loops=1)
>   ->  Nested Loop  (cost=0.00..1745.51 rows=494 width=4) (actual time=0.051..48.900 rows=6938 loops=1)
>         ->  Index Scan using "IX_large_table_b_SigId" on large_table_b s2  (cost=0.00..132.56 rows=164 width=4) (actual time=0.028..3.411 rows=2173 loops=1)
>               Index Cond: (field_a = 2673056)
>         ->  Index Scan using "IX_large_table_b_field_b" on large_table_b s1  (cost=0.00..9.81 rows=2 width=8) (actual time=0.007..0.012 rows=3 loops=2173)
>               Index Cond: (s1.field_b = s2.field_b)
>   ->  Index Scan using "PK_large_table_a" on large_table_a  (cost=0.00..1.23 rows=1 width=4) (actual time=0.004..0.004 rows=0 loops=6938)
>         Index Cond: (large_table_a.field_a = s1.field_a)
> Total runtime: 98.165 ms

I suspect that you're just fooling yourself here, and the "optimized" query is no such thing. Those plans are identical except for the insertion of the HashAggregate step, which in itself adds less than 10 msec to the runtime, and we can see it's not eliminating any rows either. So why does the second one run so much faster? I can think of three theories:

1. The tables are horrendously bloated on the first database, so that many more pages have to be touched to get the same number of tuples. This would likely indicate an improper autovacuum configuration.

2. You failed to account for caching effects, ie the first example is being run "cold" and has to actually read everything from disk, whereas the second example has everything it needs already in RAM. In that case the speed differential is quite illusory.

3. The HashAggregate would likely spit out the rows in a completely different order than it received them. If scanning large_table_b in the order of IX_large_table_b_SigId happens to yield field_b values that are very well ordered, it's possible that locality of access in the other indexscans would be enough better in the second plan to account for the speedup. This seems the least likely theory, though.
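[For readers: theory 2 can be probed from the standard statistics views. A sketch using the thread's renamed tables; note the counters in pg_statio_user_tables are cumulative since the last stats reset, so compare snapshots taken before and after running the query:]

```sql
-- A low hit fraction suggests the query's reads went to disk (cold cache);
-- a fraction near 1.0 suggests the pages were already in shared buffers.
SELECT relname,
       heap_blks_read,
       heap_blks_hit,
       round(heap_blks_hit::numeric
             / NULLIF(heap_blks_hit + heap_blks_read, 0), 2) AS hit_fraction
FROM pg_statio_user_tables
WHERE relname IN ('large_table_a', 'large_table_b');
```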
BTW, how is it that "SELECT large_table_b.field_b FROM large_table_b WHERE field_a = 2673056" produces no duplicate field_b values? Is that just luck? Is there a unique constraint on the table that implies it will happen?

			regards, tom lane
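[For readers: that question can be answered empirically. A sketch using the thread's renamed tables; if this returns no rows, the subquery happens to yield distinct field_b values for that field_a, though only a unique constraint would guarantee it in general:]

```sql
-- Lists any field_b values that appear more than once for this field_a.
SELECT field_b, count(*) AS n
FROM large_table_b
WHERE field_a = 2673056
GROUP BY field_b
HAVING count(*) > 1;
```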
Thanks Tom!

Regret the delay in replying, but two of the three guesses were spot-on and resolved the doubt. 8.4.9 does take care of this case very well.

On 10/27/2011 01:27 AM, Tom Lane wrote:
> I suspect that you're just fooling yourself here, and the "optimized"
> query is no such thing.

:) I actually meant the 'faster' query, but well...

> 1. The tables are horrendously bloated on the first database, so that
> many more pages have to be touched to get the same number of tuples.
> This would likely indicate an improper autovacuum configuration.

I believe you've nailed it pretty accurately. The tables are horrendously bloated, and I may need to tune autovacuum to be much more aggressive than it is. I did see that the HashAggregate makes only a minor difference, but what didn't strike me was that the slowness could be bloat.

> 2. You failed to account for caching effects, ie the first example
> is being run "cold" and has to actually read everything from disk,
> whereas the second example has everything it needs already in RAM.
> In that case the speed differential is quite illusory.

In hindsight, this was a miss; I should have warmed the caches before posting. Re-running the query multiple times brought out the result in ~100 ms.

> BTW, how come is it that "SELECT large_table_b.field_b FROM
> large_table_b WHERE field_a = 2673056" produces no duplicate field_b
> values? Is that just luck? Is there a unique constraint on the table
> that implies it will happen?

It's just luck. Sometimes the corresponding values genuinely don't exist in the other table, so that's ok.

-- Robins Tharakan