Thread: 8.2 Query 10 times slower than 8.1 (view-heavy)
Dearest dragon hunters and mortal wanna-bes, I recently upgraded a system from Apache2/mod_perl2 to Lighttpd/fastcgi. The upgrade went about as rough as can be. While in the midst of a bad day, I decided to make it worse, and upgrade Pg 8.1 to 8.2. Most people I talk to seem to think 8.1 was a lemon release; not I. It worked excellent for me for the longest time, and I had no good reason to upgrade it, other than to just have done so. In the process, A query that took a matter of 2minutes, started taking hours. I broke that query up into something more atomic and used it as a sample. The following material is provided for your assisting-me-pleasure: the original SQL; the \ds for all pertinent views and tables; the output of Explain Analyze; and the original query. The original query both trials was: SELECT * FROM test_view where U_ID = 8; test_view.sql = http://rafb.net/p/HhT9g489.html 8.1_explain_analyze = http://rafb.net/p/uIyY1s44.html 8.2_explain_analzye = http://rafb.net/p/mxHWi340.html \d table/views = http://rafb.net/p/EPnyB229.html Yes, I ran vacuum full after loading both dbs. Thanks again, ask and I will provide anything else. I'm on freenode, in #postgresql, and can be found at all times with the nick EvanCarroll. -- Evan Carroll System Lord of the Internets me@evancarroll.com 832-445-8877
On 8/28/07, Evan Carroll <lists@evancarroll.com> wrote: > the midst of a bad day, I decided to make it worse, and upgrade Pg 8.1 > to 8.2. Most people I talk to seem to think 8.1 was a lemon release; > not I. 8.0 was the release that had more issues for me, as it was the first version with all the backend work done to make it capable of running windows. for that reason I stayed on 7.4 until 8.1.4 or so was out. 8.2 was a nice incremental upgrade, and I migrated to it around 8.2.3 and have been happy every since. > Yes, I ran vacuum full after loading both dbs. Did you run analyze? It's not built into vacuum.
"Evan Carroll" <lists@evancarroll.com> writes: "Evan Carroll" <lists@evancarroll.com> writes: > Dearest dragon hunters and mortal wanna-bes, > > I recently upgraded a system from Apache2/mod_perl2 to > Lighttpd/fastcgi. The upgrade went about as rough as can be. While in > the midst of a bad day, I decided to make it worse, and upgrade Pg 8.1 > to 8.2. Most people I talk to seem to think 8.1 was a lemon release; > not I. It worked excellent for me for the longest time, and I had no > good reason to upgrade it, other than to just have done so. I assume you mean 8.1.9 and 8.2.4? > The following material is provided for your assisting-me-pleasure: the > original SQL; the \ds for all pertinent views and tables; the output > of Explain Analyze; and the original query. While I do in fact enjoy analyzing query plans I have to say that 75-line plans push the bounds of my assisting-you-pleasure. Have you experimented with simplifying this query? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Gregory Stark wrote: > "Evan Carroll" <lists@evancarroll.com> writes: > > "Evan Carroll" <lists@evancarroll.com> writes: > >> Dearest dragon hunters and mortal wanna-bes, >> >> I recently upgraded a system from Apache2/mod_perl2 to >> Lighttpd/fastcgi. The upgrade went about as rough as can be. While in >> the midst of a bad day, I decided to make it worse, and upgrade Pg 8.1 >> to 8.2. Most people I talk to seem to think 8.1 was a lemon release; >> not I. It worked excellent for me for the longest time, and I had no >> good reason to upgrade it, other than to just have done so. > > I assume you mean 8.1.9 and 8.2.4? > >> The following material is provided for your assisting-me-pleasure: the >> original SQL; the \ds for all pertinent views and tables; the output >> of Explain Analyze; and the original query. > > While I do in fact enjoy analyzing query plans I have to say that 75-line > plans push the bounds of my assisting-you-pleasure. Have you experimented with > simplifying this query? Although simplifying the query is probably in order, doesn't it stand to reason that there may be a problem here. 10x difference (in the worse) from a lower version to a higher, is likely wrong :) Joshua D. Drake > - -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240 PostgreSQL solutions since 1997 http://www.commandprompt.com/ UNIQUE NOT NULL Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFG1EqYATb/zqfZUUQRAighAJ9g+Py+CRwsW7f5QWuA4uZ5G26a9gCcCXG2 0Le2KBGpdhDZyu4ZT30y8RA= =MfQw -----END PGP SIGNATURE-----
>>> On Tue, Aug 28, 2007 at 10:22 AM, in message <428b865e0708280822u522391ecr5366bb5a6647f1aa@mail.gmail.com>, "Evan Carroll" <lists@evancarroll.com> wrote: > Yes, I ran vacuum full after loading both dbs. Have you run VACUUM ANALYZE or ANALYZE? -Kevin
---------- Forwarded message ---------- From: Evan Carroll <me@evancarroll.com> Date: Aug 28, 2007 11:23 AM Subject: Re: [PERFORM] 8.2 Query 10 times slower than 8.1 (view-heavy) To: Scott Marlowe <scott.marlowe@gmail.com> On 8/28/07, Scott Marlowe <scott.marlowe@gmail.com> wrote: > I looked through your query plan, and this is what stood out in the 8.2 plan: > > -> Nested Loop Left Join (cost=8830.30..10871.27 rows=1 > width=102) (actual time=2148.444..236018.971 rows=62 loops=1) > Join Filter: ((public.contact.pkid = > public.contact.pkid) AND (public.event.ts_in > public.event.ts_in)) > Filter: (public.event.pkid IS NULL) > > Notice the misestimation is by a factor of 62, and the actual time > goes from 2149 to 236018 ms. > > Again, have you analyzed your tables / databases? > contacts=# \o scott_marlowe_test contacts=# VACUUM FULL ANALYZE; contacts=# SELECT * FROM test_view WHERE U_ID = 8; Cancel request sent ERROR: canceling statement due to user request contacts=# EXPLAIN ANALYZE SELECT * FROM test_view WHERE U_ID = 8; output found at http://rafb.net/p/EQouMI82.html -- Evan Carroll System Lord of the Internets me@evancarroll.com 832-445-8877 -- Evan Carroll System Lord of the Internets me@evancarroll.com 832-445-8877
From: "Evan Carroll" <me@evancarroll.com> To: "Kevin Grittner" <Kevin.Grittner@wicourts.gov>, pgsql-performance@postgresql.org Date: Tue, 28 Aug 2007 11:21:54 -0500 Subject: Re: [PERFORM] 8.2 Query 10 times slower than 8.1 (view-heavy) On 8/28/07, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > >>> On Tue, Aug 28, 2007 at 10:22 AM, in message > <428b865e0708280822u522391ecr5366bb5a6647f1aa@mail.gmail.com>, "Evan Carroll" > <lists@evancarroll.com> wrote: > > Yes, I ran vacuum full after loading both dbs. > > Have you run VACUUM ANALYZE or ANALYZE? VACUUM FULL ANALYZE on both tables, out of habit. > > -Kevin -- Evan Carroll System Lord of the Internets me@evancarroll.com 832-445-8877
It looks like your view is using a left join to look for rows in one table without matching rows in the other, i.e. a SQL construct similar in form to the query below: SELECT ... FROM A LEFT JOIN B ON (...) WHERE B.primary_key IS NULL Unfortunately there has been a planner regression in 8.2 in some cases with these forms of queries. This was discussed a few weeks (months?) ago on this forum. I haven't looked closely enough to confirm that this is the problem in your case, but it seems likely. Is it possible to refactor the query to avoid using this construct to see if that helps? We've been holding back from upgrading to 8.2 because this one is a show-stopper for us. -- Mark Lewis On Tue, 2007-08-28 at 11:24 -0500, Evan Carroll wrote: > ---------- Forwarded message ---------- > From: Evan Carroll <me@evancarroll.com> > Date: Aug 28, 2007 11:23 AM > Subject: Re: [PERFORM] 8.2 Query 10 times slower than 8.1 (view-heavy) > To: Scott Marlowe <scott.marlowe@gmail.com> > > > On 8/28/07, Scott Marlowe <scott.marlowe@gmail.com> wrote: > > I looked through your query plan, and this is what stood out in the 8.2 plan: > > > > -> Nested Loop Left Join (cost=8830.30..10871.27 rows=1 > > width=102) (actual time=2148.444..236018.971 rows=62 loops=1) > > Join Filter: ((public.contact.pkid = > > public.contact.pkid) AND (public.event.ts_in > public.event.ts_in)) > > Filter: (public.event.pkid IS NULL) > > > > Notice the misestimation is by a factor of 62, and the actual time > > goes from 2149 to 236018 ms. > > > > Again, have you analyzed your tables / databases? > > > contacts=# \o scott_marlowe_test > contacts=# VACUUM FULL ANALYZE; > contacts=# SELECT * FROM test_view WHERE U_ID = 8; > Cancel request sent > ERROR: canceling statement due to user request > contacts=# EXPLAIN ANALYZE SELECT * FROM test_view WHERE U_ID = 8; > > output found at http://rafb.net/p/EQouMI82.html > > -- > Evan Carroll > System Lord of the Internets > me@evancarroll.com > 832-445-8877 > >
Mark Lewis <mark.lewis@mir3.com> writes: > Unfortunately there has been a planner regression in 8.2 in some cases > with these forms of queries. This was discussed a few weeks (months?) > ago on this forum. I haven't looked closely enough to confirm that this > is the problem in your case, but it seems likely. Yeah, the EXPLAIN ANALYZE output clearly shows a drastic underestimate of the number of rows out of a join like this, and a consequent choice of a nestloop above it that performs terribly. > We've been holding back from upgrading to 8.2 because this one is a > show-stopper for us. Well, you could always make your own version with this patch reverted: http://archives.postgresql.org/pgsql-committers/2006-11/msg00066.php I might end up doing that in the 8.2 branch if a better solution seems too large to back-patch. regards, tom lane
On 8/28/07, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > >>> On Tue, Aug 28, 2007 at 10:22 AM, in message > <428b865e0708280822u522391ecr5366bb5a6647f1aa@mail.gmail.com>, "Evan Carroll" > <lists@evancarroll.com> wrote: > > Yes, I ran vacuum full after loading both dbs. > > Have you run VACUUM ANALYZE or ANALYZE? VACUUM FULL ANALYZE on both tables, out of habit. > > -Kevin > > > > -- Evan Carroll System Lord of the Internets me@evancarroll.com 832-445-8877
I wrote: > Mark Lewis <mark.lewis@mir3.com> writes: >> We've been holding back from upgrading to 8.2 because this one is a >> show-stopper for us. > Well, you could always make your own version with this patch reverted: > http://archives.postgresql.org/pgsql-committers/2006-11/msg00066.php > I might end up doing that in the 8.2 branch if a better solution > seems too large to back-patch. I thought of a suitably small hack that should cover at least the main problem without going so far as to revert that patch entirely. What we can do is have the IS NULL estimator recognize when the clause is being applied at an outer join, and not believe the table statistics in that case. I've applied the attached patch for this --- are you interested in trying it out on your queries before 8.2.5 comes out? regards, tom lane Index: src/backend/optimizer/path/clausesel.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/optimizer/path/clausesel.c,v retrieving revision 1.82 diff -c -r1.82 clausesel.c *** src/backend/optimizer/path/clausesel.c 4 Oct 2006 00:29:53 -0000 1.82 --- src/backend/optimizer/path/clausesel.c 31 Aug 2007 23:29:01 -0000 *************** *** 218,224 **** s2 = rqlist->hibound + rqlist->lobound - 1.0; /* Adjust for double-exclusion of NULLs */ ! s2 += nulltestsel(root, IS_NULL, rqlist->var, varRelid); /* * A zero or slightly negative s2 should be converted into a --- 218,226 ---- s2 = rqlist->hibound + rqlist->lobound - 1.0; /* Adjust for double-exclusion of NULLs */ ! /* HACK: disable nulltestsel's special outer-join logic */ ! s2 += nulltestsel(root, IS_NULL, rqlist->var, ! varRelid, JOIN_INNER); /* * A zero or slightly negative s2 should be converted into a *************** *** 701,707 **** s1 = nulltestsel(root, ((NullTest *) clause)->nulltesttype, (Node *) ((NullTest *) clause)->arg, ! varRelid); } else if (IsA(clause, BooleanTest)) { --- 703,710 ---- s1 = nulltestsel(root, ((NullTest *) clause)->nulltesttype, (Node *) ((NullTest *) clause)->arg, ! varRelid, ! jointype); } else if (IsA(clause, BooleanTest)) { Index: src/backend/utils/adt/selfuncs.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/utils/adt/selfuncs.c,v retrieving revision 1.214.2.5 diff -c -r1.214.2.5 selfuncs.c *** src/backend/utils/adt/selfuncs.c 5 May 2007 17:05:55 -0000 1.214.2.5 --- src/backend/utils/adt/selfuncs.c 31 Aug 2007 23:29:02 -0000 *************** *** 1386,1396 **** */ Selectivity nulltestsel(PlannerInfo *root, NullTestType nulltesttype, ! Node *arg, int varRelid) { VariableStatData vardata; double selec; examine_variable(root, arg, varRelid, &vardata); if (HeapTupleIsValid(vardata.statsTuple)) --- 1386,1409 ---- */ Selectivity nulltestsel(PlannerInfo *root, NullTestType nulltesttype, ! Node *arg, int varRelid, JoinType jointype) { VariableStatData vardata; double selec; + /* + * Special hack: an IS NULL test being applied at an outer join should not + * be taken at face value, since it's very likely being used to select the + * outer-side rows that don't have a match, and thus its selectivity has + * nothing whatever to do with the statistics of the original table + * column. We do not have nearly enough context here to determine its + * true selectivity, so for the moment punt and guess at 0.5. Eventually + * the planner should be made to provide enough info about the clause's + * context to let us do better. + */ + if (IS_OUTER_JOIN(jointype) && nulltesttype == IS_NULL) + return (Selectivity) 0.5; + examine_variable(root, arg, varRelid, &vardata); if (HeapTupleIsValid(vardata.statsTuple)) Index: src/include/utils/selfuncs.h =================================================================== RCS file: /cvsroot/pgsql/src/include/utils/selfuncs.h,v retrieving revision 1.36 diff -c -r1.36 selfuncs.h *** src/include/utils/selfuncs.h 4 Oct 2006 00:30:11 -0000 1.36 --- src/include/utils/selfuncs.h 31 Aug 2007 23:29:02 -0000 *************** *** 149,155 **** extern Selectivity booltestsel(PlannerInfo *root, BoolTestType booltesttype, Node *arg, int varRelid, JoinType jointype); extern Selectivity nulltestsel(PlannerInfo *root, NullTestType nulltesttype, ! Node *arg, int varRelid); extern Selectivity scalararraysel(PlannerInfo *root, ScalarArrayOpExpr *clause, bool is_join_clause, --- 149,155 ---- extern Selectivity booltestsel(PlannerInfo *root, BoolTestType booltesttype, Node *arg, int varRelid, JoinType jointype); extern Selectivity nulltestsel(PlannerInfo *root, NullTestType nulltesttype, ! Node *arg, int varRelid, JoinType jointype); extern Selectivity scalararraysel(PlannerInfo *root, ScalarArrayOpExpr *clause, bool is_join_clause,
On Fri, 2007-08-31 at 19:39 -0400, Tom Lane wrote: > I wrote: > > Mark Lewis <mark.lewis@mir3.com> writes: > >> We've been holding back from upgrading to 8.2 because this one is a > >> show-stopper for us. > > > Well, you could always make your own version with this patch reverted: > > http://archives.postgresql.org/pgsql-committers/2006-11/msg00066.php > > I might end up doing that in the 8.2 branch if a better solution > > seems too large to back-patch. > > I thought of a suitably small hack that should cover at least the main > problem without going so far as to revert that patch entirely. What we > can do is have the IS NULL estimator recognize when the clause is being > applied at an outer join, and not believe the table statistics in that > case. I've applied the attached patch for this --- are you interested > in trying it out on your queries before 8.2.5 comes out? Wish I could, but I'm afraid that I'm not going to be in a position to try out the patch on the application that exhibits the problem for at least the next few weeks. -- Mark