Thread: add_path optimization
I've been doing some benchmarking and profiling on the PostgreSQL query analyzer, and it seems that (at least for the sorts of queries that I typically run) the dominant cost is add_path(). I've been able to find two optimizations that seem to help significantly:

1. add_path() often calls compare_fuzzy_path_costs() twice on the same pair of paths, and when the paths compare equal on one criterion, some comparisons are duplicated. I've refactored this function to return the results of both calculations without repeating any floating-point arithmetic.

2. match_unsorted_outer() adds as many as 5 nested loop joins at a time with the same set of pathkeys. In my tests, it tended to be ~3 - cheapest inner, cheapest inner materialized, and cheapest inner index. Since these all have the same pathkeys, clearly only the one with the cheapest total cost is in the running for cheapest total cost for that set of pathkeys, and likewise for startup cost (and the two may be the same). Yet we compare all of them against the whole pathlist, one after the other, including (for the most part) the rather expensive pathkey comparison. I've added a function add_similar_paths() and refactored match_unsorted_outer() to use it.

On a couple of complex (and proprietary) queries with 12+ joins each, I measure a planning time improvement of 8-12% with the attached patch applied. It would be interesting to try to replicate this on a publicly available data set, but I don't know of a good one to use. Suggestions welcome - results of performance testing on your own favorite big queries even more welcome.

Simple test harness also attached. I took the approach of dropping caches, starting the server, and then running this 5 times each on several queries, dropping top and bottom results.

...Robert
Attachment
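(For illustration: a minimal, self-contained sketch of the two ideas described above, using simplified stand-in names and types - SimplePath, compare_both_costs, pick_survivors - rather than the real pathnode.c/joinpath.c code. It is only meant to show the shape of the changes, not the patch itself.)

    #include <stdio.h>

    /* Simplified stand-in for the planner's Path: just the two compared costs. */
    typedef struct SimplePath
    {
        const char *name;
        double      startup_cost;
        double      total_cost;
    } SimplePath;

    #define FUZZ_FACTOR 1.01        /* costs within 1% count as "fuzzily equal" */

    /*
     * Idea 1: perform both fuzzy comparisons in a single call so that no
     * floating-point comparison has to be repeated.  Returns -1/0/+1 for the
     * total-cost criterion; the startup-cost result comes back via *startup_cmp.
     */
    static int
    compare_both_costs(const SimplePath *p1, const SimplePath *p2, int *startup_cmp)
    {
        int         s,
                    t;

        if (p1->startup_cost > p2->startup_cost * FUZZ_FACTOR)
            s = +1;
        else if (p2->startup_cost > p1->startup_cost * FUZZ_FACTOR)
            s = -1;
        else
            s = 0;

        if (p1->total_cost > p2->total_cost * FUZZ_FACTOR)
            t = +1;
        else if (p2->total_cost > p1->total_cost * FUZZ_FACTOR)
            t = -1;
        else
            t = 0;

        *startup_cmp = s;
        return t;
    }

    /*
     * Idea 2: among several candidate paths that all share the same pathkeys,
     * only the cheapest-total and cheapest-startup members can possibly
     * survive, so pick those two out first; only the winners would then be
     * handed to add_path(), instead of running every candidate through the
     * full (pathkey-comparing) pathlist scan.
     */
    static void
    pick_survivors(SimplePath *paths[], int npaths)
    {
        SimplePath *cheapest_total = paths[0];
        SimplePath *cheapest_startup = paths[0];
        int         i;

        for (i = 1; i < npaths; i++)
        {
            int         scmp;

            if (compare_both_costs(paths[i], cheapest_total, &scmp) < 0)
                cheapest_total = paths[i];
            (void) compare_both_costs(paths[i], cheapest_startup, &scmp);
            if (scmp < 0)
                cheapest_startup = paths[i];
        }

        /* In the planner these two would now go to add_path(); here we print. */
        printf("cheapest total:   %s\n", cheapest_total->name);
        printf("cheapest startup: %s\n", cheapest_startup->name);
    }

    int
    main(void)
    {
        SimplePath  a = {"nestloop, cheapest inner", 0.0, 120.0};
        SimplePath  b = {"nestloop, materialized inner", 25.0, 90.0};
        SimplePath  c = {"nestloop, inner indexscan", 5.0, 100.0};
        SimplePath *candidates[] = {&a, &b, &c};

        pick_survivors(candidates, 3);
        return 0;
    }

Note that the loop tracks the startup winner independently of the total winner; as it turns out later in the thread, getting that detail wrong is exactly the defect found in the posted patch during review.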
On Sat, Jan 31, 2009 at 11:37:39PM -0500, Robert Haas wrote:
> I've been doing some benchmarking and profiling on the PostgreSQL
> query analyzer, and it seems that (at least for the sorts of queries
> that I typically run) the dominant cost is add_path(). I've been
> able to find two optimizations that seem to help significantly:

Are there any cases you've found where this change significantly impairs performance, and if so, how did you find them? If not, would you be up for trying to find some?

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter  Skype: davidfetter
XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On Sun, Feb 1, 2009 at 12:03 PM, David Fetter <david@fetter.org> wrote:
> On Sat, Jan 31, 2009 at 11:37:39PM -0500, Robert Haas wrote:
>> I've been doing some benchmarking and profiling on the PostgreSQL
>> query analyzer, and it seems that (at least for the sorts of queries
>> that I typically run) the dominant cost is add_path(). I've been
>> able to find two optimizations that seem to help significantly:
>
> Are there any cases you've found where this change significantly
> impairs performance, and if so, how did you find them? If not, would
> you be up for trying to find some?

Basically, the patch is just performing the same operations with less overhead. For example, add_similar_path() is pretty much the same thing as repeated calls to add_path(), but you save the cost of unnecessary pathkey comparisons and maybe some ListCell alloc/free cycles. So I'm not really sure how it could make things worse, but I'd be interested in knowing if there's a case that you're worried about. It's pretty low-level code, so I don't think there's room for a lot of surprises.

...Robert
On Sat, Jan 31, 2009 at 11:37 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> I've been doing some benchmarking and profiling on the PostgreSQL
> query analyzer, and it seems that (at least for the sorts of queries
> that I typically run) the dominant cost is add_path(). I've been able
> to find two optimizations that seem to help significantly:
>
> 1. add_path() often calls compare_fuzzy_path_costs() twice on the same
>
> 2. match_unsorted_outer() adds as many as 5 nested loop joins at a

if there are two optimizations maybe two different patches are better to test them individually

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157
On Sun, Feb 1, 2009 at 1:34 PM, Jaime Casanova <jcasanov@systemguards.com.ec> wrote:
> On Sat, Jan 31, 2009 at 11:37 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I've been doing some benchmarking and profiling on the PostgreSQL
>> query analyzer, and it seems that (at least for the sorts of queries
>> that I typically run) the dominant cost is add_path(). I've been able
>> to find two optimizations that seem to help significantly:
>>
>> 1. add_path() often calls compare_fuzzy_path_costs() twice on the same
>>
>> 2. match_unsorted_outer() adds as many as 5 nested loop joins at a
>
> if there are two optimizations maybe two different patches are better
> to test them individually

I did test the changes independently and either one alone has a measurable benefit (with sufficiently careful measuring), but they're closely related changes so I think it makes more sense as one patch. It's only 84 insertions and 46 deletions, so we're not talking about some massive patch that will be difficult to review. There's also some synergy between the two changes: I don't think either works as well without the other.

But please feel free to test it for yourself and let me know what you find. The changes are all very simple - the hard part was figuring out which changes would actually produce a benefit.

...Robert
Disclaimer: I don't know that bit of postgresql code; in fact, this is the first time I see it.

    *** a/src/backend/optimizer/path/joinpath.c
    --- b/src/backend/optimizer/path/joinpath.c
    ***************
    *** 473,478 **** match_unsorted_outer(PlannerInfo *root,
    --- 473,481 ----
          if (nestjoinOK)
          {
    +         Path       *paths[5];

I don't like the fact that you hardcoded that here. I know that you are trying to pass on a few calls in one go here, but still... ugly.

    static int
    compare_fuzzy_path_costs(Path *path1, Path *path2, int *startup_cost)
    {
        ....
        *startup_cost = (s == 0) ? t : s;

Why not *startup_cost = s, and let the caller decide which value it wants to use? Or just return both, from a single call (which would ?

        ...
        return t;
    }

To be fair, I don't see the compare_fuzzy_path_costs change saving too much time in the planner. I would myself probably convert that function into two defines/inline funcs, but that's just me.
On Sun, Feb 1, 2009 at 3:25 PM, Grzegorz Jaskiewicz <gj@pointblue.com.pl> wrote:
> I don't like the fact that you hardcoded that here. I know that you are
> trying to pass on a few calls in one go here, but still... ugly.

Well, I think you'll find that using a dynamically sized data structure destroys the possibility of squeezing any additional performance out of this part of the code. The nice thing about fixed-size data structures is that they cost essentially nothing to stack-allocate; you just move the stack pointer and away you go. We should in fact be looking for MORE places where we can avoid the use of constructs like List, since the second-highest CPU hog in my tests was AllocSetAlloc(), beaten out only by add_path(). With this patch applied, AllocSetAlloc() moves up to first.

(It would really be rather better to include all the paths generated in each pass of the loop in the call to add_similar_path(), but that looked rather more complicated because we can't predict how many of them there will be, and so adding a fixed-size data structure is not so easy. Plus, the code would all have to be rewritten not to assume that "continue" was the right way to move on to the next iteration of the loop. What would potentially be better still is to try to figure out which nested loop will be the winner without allocating all of the NestPath nodes in the first place, but that didn't seem possible without much more invasive changes, and it's not clear that you would actually still have a winner by the time you got done beating on it.)

I am also somewhat mystified as to why using an array of size 5 to hold up to 5 data structures allocated in nearly-consecutive lines of C code would qualify as ugly (completely apart from the fact that it's a clear performance win).

> static int
> compare_fuzzy_path_costs(Path *path1, Path *path2, int *startup_cost)
> {
>     ....
>     *startup_cost = (s == 0) ? t : s;
>
> Why not *startup_cost = s, and let the caller decide which value it wants to
> use? Or just return both, from a single call (which would ?
>     ...
>     return t;
> }

You're essentially suggesting removing logic from compare_fuzzy_path_costs() and duplicating it at the two call sites of that function. If there's a point to that, I don't see it. You might also take a look at the widely used function compare_path_costs().

> To be fair, I don't see the compare_fuzzy_path_costs change saving too much
> time in the planner.

Hmm, well I didn't either, but there's this handy tool called gprof that you might want to try out. I wouldn't be wasting my time patching this part of the code if it didn't make a difference, and in fact if you do 10% of the amount of benchmarking that I did in the process of creating this patch, you will find that it in fact does make a difference.

> I would myself probably convert that function into two defines/inline funcs,
> but that's just me.

It's already static to that .c file, so the compiler likely will inline it. In fact, I suspect you will find that removing the static keyword from the implementation of that function in CVS HEAD is itself sufficient to produce a small but measurable slowdown in planning of large join trees, exactly because it will defeat inlining.

...Robert
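(For illustration: a rough sketch of the shape of the match_unsorted_outer() change being described - a fixed, stack-allocated array collecting the handful of same-pathkey nestloop candidates and handing them over in one batch. The typedef and build_nestloop_candidate() are placeholders invented for this sketch, not the actual planner code; add_similar_paths() is the function the patch adds.)

    typedef struct Path Path;       /* opaque stand-in for the planner's Path */

    /* Hypothetical helpers standing in for the real path-construction calls. */
    extern Path *build_nestloop_candidate(int variant);
    extern void add_similar_paths(Path **paths, int npaths);

    void
    consider_nestloop_candidates(void)
    {
        /*
         * At most 5 variants are ever built for one set of pathkeys (cheapest
         * inner, materialized inner, inner indexscans, ...), so a fixed array
         * in the stack frame suffices: "allocating" it is just part of setting
         * up the frame, with no palloc/AllocSetAlloc traffic and no ListCells
         * to free afterwards.
         */
        Path       *paths[5];
        int         npaths = 0;
        int         variant;

        for (variant = 0; variant < 5; variant++)
        {
            Path       *p = build_nestloop_candidate(variant);

            if (p != NULL)
                paths[npaths++] = p;    /* a plain store, no heap allocation */
        }

        if (npaths > 0)
            add_similar_paths(paths, npaths);   /* one batched pathlist pass */
    }

A List-based version of the same loop would instead allocate a cell per candidate, which is where the AllocSetAlloc() time mentioned above comes from.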
On 1 Feb 2009, at 21:35, Robert Haas wrote:
> On Sun, Feb 1, 2009 at 3:25 PM, Grzegorz Jaskiewicz <gj@pointblue.com.pl> wrote:
>> I don't like the fact that you hardcoded that here. I know that you are
>> trying to pass on a few calls in one go here, but still... ugly.
>
> Well, I think you'll find that using a dynamically sized data
> structure destroys the possibility of squeezing any additional
> performance out of this part of the code. The nice thing about
> fixed-size data structures is that they cost essentially nothing to
> stack-allocate; you just move the stack pointer and away you go. We
> should in fact be looking for MORE places where we can avoid the use
> of constructs like List, since the second-highest CPU hog in my tests
> was AllocSetAlloc(), beaten out only by add_path(). With this patch
> applied, AllocSetAlloc() moves up to first.

Well, true - but a statically allocated array, without any predefined size (with #DEFINE) and no boundary check, is bad as well. I suppose this code is easy enough to let it be with your changes, but I would still call it not pretty.

> Hmm, well I didn't either, but there's this handy tool called gprof
> that you might want to try out. I wouldn't be wasting my time
> patching this part of the code if it didn't make a difference, and in
> fact if you do 10% of the amount of benchmarking that I did in the
> process of creating this patch, you will find that it in fact does
> make a difference.

To be honest, I really didn't have time to run it down with your patch and gprof. I believe that you did that already, hence your suggestion, right? Actually, if you did profile postgresql with a bunch of queries, I wouldn't mind seeing the results - I don't know whether it makes sense to send that to the list (I would think it does), but if it is too big or something, you could send it to me in private.

> It's already static to that .c file, so the compiler likely will
> inline it. In fact, I suspect you will find that removing the static
> keyword from the implementation of that function in CVS HEAD is itself
> sufficient to produce a small but measurable slowdown in planning of
> large join trees, exactly because it will defeat inlining.

That depends on many things, including whether optimizations are on or not. Because that function basically consists of two ifs, it could easily be turned into two separate inlines/macros - that would remove any function-call overhead (stack alloc, etc.).
> Well, true - but a statically allocated array, without any predefined
> size (with #DEFINE) and no boundary check, is bad as well.
> I suppose this code is easy enough to let it be with your changes, but I
> would still call it not pretty.

Well, it might merit a comment.

> Actually, if you did profile postgresql with a bunch of queries, I wouldn't
> mind seeing the results - I don't know whether it makes sense to send that
> to the list (I would think it does), but if it is too big or something,
> you could send it to me in private.

What I'd really like to do is develop some tests based on a publicly available dataset. Any suggestions?

...Robert
On 2 Feb 2009, at 14:50, Robert Haas wrote:
>> Well, true - but a statically allocated array, without any predefined
>> size (with #DEFINE) and no boundary check, is bad as well.
>> I suppose this code is easy enough to let it be with your changes, but I
>> would still call it not pretty.
>
> Well, it might merit a comment.

:)

> What I'd really like to do is develop some tests based on a publicly
> available dataset. Any suggestions?

I would say it wouldn't hurt to do benchmarking/profiling regression tests on real hardware - but someone will have to generate a quite substantial amount of data, so we could test everything from small queries up to 20+ join/sort/window function/aggregation queries, with various indexes and data types. The more real the data, the better. I could make some of my stuff public - but without the lookup tables (id -> some real data, like names, surnames, MAC addresses, etc).
>>> Robert Haas <robertmhaas@gmail.com> wrote:
> running this 5 times each on several queries,
> dropping top and bottom results.

Running a complex query (posted in previous threads, runs about 300,000 times per day in a production web application), I got these timings on a production-quality machine (4 quad-core CPU chips, that is 16 CPUs like this: Intel(R) Xeon(R) CPU X7350 @ 2.93GHz, 128 GB RAM, big RAID with BBU). I ran explain in each environment 5 times, tossed high and low, and averaged. The 8.4devel was from today's (2009-02-02) snapshot, built the same way we did 8.3.5.

8.3.5, statistics target 10: 36.188 ms
8.4devel without patch, statistics target 100: 109.862 ms
8.4devel with patch, statistics target 100: 104.015 ms

After seeing that, I re-analyzed to eliminate the statistics target as the cause of the 8.4 increase.

8.4devel with patch, statistics target 10: 99.421 ms

-Kevin
On Mon, Feb 2, 2009 at 8:10 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
>>>> Robert Haas <robertmhaas@gmail.com> wrote:
>> running this 5 times each on several queries,
>> dropping top and bottom results.
>
> Running a complex query (posted in previous threads, runs about
> 300,000 times per day in a production web application), I got these
> timings on a production-quality machine (4 quad-core CPU chips, that is 16
> CPUs like this: Intel(R) Xeon(R) CPU X7350 @ 2.93GHz, 128 GB RAM, big
> RAID with BBU). I ran explain in each environment 5 times, tossed
> high and low, and averaged. The 8.4devel was from today's
> (2009-02-02) snapshot, built the same way we did 8.3.5.
>
> 8.3.5, statistics target 10: 36.188 ms
> 8.4devel without patch, statistics target 100: 109.862 ms
> 8.4devel with patch, statistics target 100: 104.015 ms
>
> After seeing that, I re-analyzed to eliminate the statistics target as
> the cause of the 8.4 increase.
>
> 8.4devel with patch, statistics target 10: 99.421 ms

Yikes! The impact of the patch is about what I'd expect, but the fact that planning time has nearly tripled is... way poor. Can you repost the query and the EXPLAIN output for 8.3.5 and CVS HEAD?

...Robert
>> Running a complex query (posted in previous threads, runs about
>> 300,000 times per day in a production web application), I got these
>> timings on a production-quality machine (4 quad-core CPU chips, that is 16
>> CPUs like this: Intel(R) Xeon(R) CPU X7350 @ 2.93GHz, 128 GB RAM, big
>> RAID with BBU). I ran explain in each environment 5 times, tossed
>> high and low, and averaged. The 8.4devel was from today's
>> (2009-02-02) snapshot, built the same way we did 8.3.5.
>>
>> 8.3.5, statistics target 10: 36.188 ms
>> 8.4devel without patch, statistics target 100: 109.862 ms
>> 8.4devel with patch, statistics target 100: 104.015 ms
>>
>> After seeing that, I re-analyzed to eliminate the statistics target as
>> the cause of the 8.4 increase.
>>
>> 8.4devel with patch, statistics target 10: 99.421 ms
>
> Yikes! The impact of the patch is about what I'd expect, but the fact
> that planning time has nearly tripled is... way poor. Can you repost
> the query and the EXPLAIN output for 8.3.5 and CVS HEAD?

FYI, I retested my queries on REL8_3_STABLE and the results were not all that different from CVS HEAD. So the problem is apparently specific to something your query is doing that mine isn't, rather than a general slowdown in planning (or else one of us goofed up the testing).

...Robert
Robert Haas <robertmhaas@gmail.com> writes:
> Yikes! The impact of the patch is about what I'd expect, but the fact
> that planning time has nearly tripled is... way poor.

We're going to need to see the test case, because I don't see that in some simple tests here.

			regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Yikes! The impact of the patch is about what I'd expect, but the fact
>> that planning time has nearly tripled is... way poor.
>
> We're going to need to see the test case, because I don't see that in
> some simple tests here.

A good data set, plus complex queries against it, might be the data from the US Census, specifically the TIGER data and the TIGER geocoder. I've been following this thread with the intention of putting together a large-data test set, but I just haven't found the time to yet. Right now there are a lot of dependencies on PostGIS (which aren't really required to just do the queries to pull out the street segment) which I figure people would want ripped out. It'd also be nice to include the other Census data besides just the road data.

If people really are interested, I'll see what I can put together. It's *a lot* of data (around 23G total in PG), though perhaps just doing 1 state would be enough for a good test; I keep the states split up anyway using CHECK constraints. Don't think that would change this case, though there might be cases where it does affect things..

Thanks,

	Stephen
> A good data set, plus complex queries against it, might be the data from
> the US Census, specifically the TIGER data and the TIGER geocoder. I've
> been following this thread with the intention of putting together a
> large-data test set, but I just haven't found the time to yet. Right now
> there are a lot of dependencies on PostGIS (which aren't really required to
> just do the queries to pull out the street segment) which I figure
> people would want ripped out. It'd also be nice to include the other
> Census data besides just the road data.
>
> If people really are interested, I'll see what I can put together. It's
> *a lot* of data (around 23G total in PG), though perhaps just doing 1
> state would be enough for a good test; I keep the states split up
> anyway using CHECK constraints. Don't think that would change this
> case, though there might be cases where it does affect things..

I'm interested, but I need maybe a 1GB data set, or smaller. The thing that we are benchmarking is the planner, and planning times are related to the complexity of the database and the accompanying queries, not the raw volume of data. (It's not size that matters, it's how you use it?) In fact, in a large database, one could argue that there is less reason to care about the planner, because the execution time will dominate anyway. I'm interested in complex queries in web/OLTP type applications, where you need the query to be planned and executed in 400 ms at the outside (and preferably less than half of that).

...Robert
* Robert Haas (robertmhaas@gmail.com) wrote:
> I'm interested, but I need maybe a 1GB data set, or smaller. The
> thing that we are benchmarking is the planner, and planning times are
> related to the complexity of the database and the accompanying
> queries, not the raw volume of data. (It's not size that matters,
> it's how you use it?) In fact, in a large database, one could argue
> that there is less reason to care about the planner, because the
> execution time will dominate anyway. I'm interested in complex
> queries in web/OLTP type applications, where you need the query to be
> planned and executed in 400 ms at the outside (and preferably less
> than half of that).

We prefer that our geocoding be fast... :) Doing 1 state should give you about the right size (half to 1G of data). I'll try to put together a good test set this week.

	Stephen
Robert Haas <robertmhaas@gmail.com> writes:
> I'm interested, but I need maybe a 1GB data set, or smaller. The
> thing that we are benchmarking is the planner, and planning times are
> related to the complexity of the database and the accompanying
> queries, not the raw volume of data.

In fact, the only reason to care whether there is any data in the DB *at all* is that you need some realistic content in pg_statistic. So it should be possible to set up a planner test DB with very little data bulk, which would surely make testing a lot less painful.

			regards, tom lane
>>> Tom Lane <tgl@sss.pgh.pa.us> wrote:
> In fact, the only reason to care whether there is any data in the DB
> *at all* is that you need some realistic content in pg_statistic.
> So it should be possible to set up a planner test DB with very little
> data bulk, which would surely make testing a lot less painful.

Can you suggest a query (or queries) which, together with a schema dump, would give you enough to duplicate my results?

-Kevin
>>> Robert Haas <robertmhaas@gmail.com> wrote:
> FYI, I retested my queries on REL8_3_STABLE and the results were not
> all that different from CVS HEAD. So the problem is apparently
> specific to something your query is doing that mine isn't, rather
> than a general slowdown in planning (or else one of us goofed up the
> testing).

I know you said size doesn't matter, but just for the record, the ten tables I loaded for this test put the database at 56G. I'm pulling information together to share on this, but I was wondering: is there any possibility that the tendency to use index scans in nested loops (given the table sizes and the availability of useful indexes) contributes to the difference?

Other possible factors: Most keys are multi-column and include varchar-based data types. Most columns are defined via domains.

(More info to follow.)

-Kevin
>>> Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Yikes! The impact of the patch is about what I'd expect, but the fact
>> that planning time has nearly tripled is... way poor.
>
> We're going to need to see the test case, because I don't see that in
> some simple tests here.

Built the snapshot with this:

BINDIR = /usr/local/pgsql-8.4devel-20090202/bin
DOCDIR = /usr/local/pgsql-8.4devel-20090202/share/doc
HTMLDIR = /usr/local/pgsql-8.4devel-20090202/share/doc
INCLUDEDIR = /usr/local/pgsql-8.4devel-20090202/include
PKGINCLUDEDIR = /usr/local/pgsql-8.4devel-20090202/include
INCLUDEDIR-SERVER = /usr/local/pgsql-8.4devel-20090202/include/server
LIBDIR = /usr/local/pgsql-8.4devel-20090202/lib
PKGLIBDIR = /usr/local/pgsql-8.4devel-20090202/lib
LOCALEDIR = /usr/local/pgsql-8.4devel-20090202/share/locale
MANDIR = /usr/local/pgsql-8.4devel-20090202/share/man
SHAREDIR = /usr/local/pgsql-8.4devel-20090202/share
SYSCONFDIR = /usr/local/pgsql-8.4devel-20090202/etc
PGXS = /usr/local/pgsql-8.4devel-20090202/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--prefix=/usr/local/pgsql-8.4devel-20090202' '--enable-integer-datetimes' '--enable-debug' '--disable-nls' '--with-libxml'
CC = gcc
CPPFLAGS = -D_GNU_SOURCE -I/usr/include/libxml2
CFLAGS = -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv -g
CFLAGS_SL = -fpic
LDFLAGS = -Wl,--as-needed -Wl,-rpath,'/usr/local/pgsql-8.4devel-20090202/lib'
LDFLAGS_SL =
LIBS = -lpgport -lxml2 -lz -lreadline -lcrypt -ldl -lm
VERSION = PostgreSQL 8.4devel

Ran it with this:

max_connections = 10
shared_buffers = 250MB
max_prepared_transactions = 0
work_mem = 50MB
maintenance_work_mem = 1GB
checkpoint_segments = 20
checkpoint_timeout = 30min
effective_cache_size = 100GB
geqo = off
from_collapse_limit = 20
join_collapse_limit = 20
datestyle = 'iso, mdy'
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
default_text_search_config = 'pg_catalog.english'
escape_string_warning = off
sql_inheritance = off
standard_conforming_strings = on

In a new cluster the database was created with this:

create user cirowner;
create user cir;
create user viewer;
create database cir with owner cirowner;
\c cir
revoke create on database cir from public;
revoke create on schema public from public;
grant create on schema public to cirowner;
checkpoint;

Schema dump attached. The ten tables which are used by the query were loaded through pg_dump piped to psql. I then ran VACUUM FREEZE ANALYZE. I used \timing to capture run times.
Here is the query: explain ( SELECT "C"."caseNo", "C"."filingDate", "CY"."countyName", "S"."descr" AS "statusCodeDescr", "P"."nameF", "P"."nameM", "P"."nameL", "P"."suffix", "P"."dob", "C"."caption", "CY"."countyNo", "C"."caseType", "C"."isSeal", "C"."isPartySeal", "LCM"."lastModified", "P"."searchName", CASE WHEN (("C"."filingDate" >= DATE '2008-11-01') AND ("C"."wcisClsCode" = '30709')) OR (("C"."caseType" IN ('CV','FA')) AND ("C"."wcisClsCode" = '30711') AND (EXISTS ( SELECT 1 FROM "CaseHist" "CH" WHERE ( ("C"."countyNo" = "CH"."countyNo") AND ("C"."caseNo" = "CH"."caseNo") AND ("CH"."eventType" IN ('FWBCA','CCTRO'))) ))) THEN TRUE ELSE FALSE END AS "isPetitionerSeal" FROM "Case" "C" JOIN "Party" "P" ON (("C"."caseNo" = "P"."caseNo") AND ("C"."countyNo" = "P"."countyNo") AND TRUE AND ((("P"."partyType" IN ('JV','CH')) AND ("C"."caseType" = 'ZZ')) OR ("P"."partyType" NOT IN ('JV','CH')))) JOIN "StatusCode" "S" ON ("C"."statusCode" = "S"."statusCode") JOIN "County" "CY" ON ("C"."countyNo" = "CY"."countyNo") LEFT OUTER JOIN "CaseDispo" "CD" ON (("CD"."caseNo" = "C"."caseNo") AND ("CD"."countyNo" = "C"."countyNo") AND (NOT (EXISTS ( SELECT 1 FROM "CaseDispo" "CD2" WHERE ( ("CD2"."caseNo" = "CD"."caseNo") AND ("CD2"."countyNo" = "CD"."countyNo") AND ("CD2"."dispoDate" > "CD"."dispoDate")) )))) LEFT OUTER JOIN "LastCaseMod" "LCM" ON (("LCM"."caseNo" = "C"."caseNo") AND ("LCM"."countyNo" = "C"."countyNo") AND ("LCM"."updateTypeId" = 0)) LEFT OUTER JOIN "WccaPermCaseType" "WPCT" ON (("C"."caseType" = "WPCT"."caseType") AND ("C"."countyNo" = "WPCT"."countyNo") AND ("WPCT"."profileName" = 'PUBLIC')) WHERE ( (("WPCT"."profileName" IS NOT NULL) OR (("C"."caseType" IN ('PA','JD')) AND ("C"."isConfidential" = FALSE))) AND ("C"."isExpunge" <> TRUE) AND (NOT (("WPCT"."profileName" = 'PUBLIC') AND ("C"."caseType" = 'FA') AND ("C"."wcisClsCode" = '40501'))) AND (NOT (EXISTS ( SELECT 1 FROM "HiddenCase" "HCA" WHERE ( ("HCA"."caseNo" = "C"."caseNo") AND ("HCA"."countyNo" = "C"."countyNo")) ))) AND ("P"."isSeal" = FALSE) AND (NOT (("P"."partyType" IN ('PE','PL','JP')) AND ((("C"."filingDate" >= DATE '2008-11-01') AND ("C"."wcisClsCode" = '30709')) OR (("C"."caseType" IN ('CV','FA')) AND ("C"."wcisClsCode" = '30711') AND (EXISTS ( SELECT 1 FROM "CaseHist" "CHPET" WHERE ( ("C"."countyNo" = "CHPET"."countyNo") AND ("C"."caseNo" = "CHPET"."caseNo") AND ("CHPET"."eventType" IN ('FWBCA','CCTRO'))) )))))) AND TRUE AND ("P"."searchName" LIKE 'HILL,J%') AND TRUE AND (NOT (("C"."caseType" IN ('CF','CI','CM','CT','FO','TR')) AND ("P"."partyType" <> 'DE'))) AND ((("C"."caseType" IN ('JA','JC','JG','JM','JO','JV','JI','TP')) AND ("P"."partyType" IN ('CH','JV'))) OR (("C"."caseType" NOT IN ('JA','JC','JG','JM','JO','JV','JI','TP')) AND ("P"."partyType" NOT IN ('CH','JV')))) AND TRUE AND TRUE AND TRUE AND TRUE AND TRUE AND TRUE AND TRUE AND TRUE AND TRUE AND TRUE) UNION SELECT "C"."caseNo", "C"."filingDate", "CY"."countyName", "S"."descr" AS "statusCodeDescr", "AL"."nameF", "AL"."nameM", "AL"."nameL", "AL"."suffix", "AL"."dob", "C"."caption", "CY"."countyNo", "C"."caseType", "C"."isSeal", "C"."isPartySeal", "LCM"."lastModified", "AL"."searchName", CASE WHEN (("C"."filingDate" >= DATE '2008-11-01') AND ("C"."wcisClsCode" = '30709')) OR (("C"."caseType" IN ('CV','FA')) AND ("C"."wcisClsCode" = '30711') AND (EXISTS ( SELECT 1 FROM "CaseHist" "CH" WHERE ( ("C"."countyNo" = "CH"."countyNo") AND ("C"."caseNo" = "CH"."caseNo") AND ("CH"."eventType" IN ('FWBCA','CCTRO'))) ))) THEN TRUE ELSE FALSE END AS "isPetitionerSeal" FROM 
"Case" "C" JOIN "Party" "P" ON (("C"."caseNo" = "P"."caseNo") AND ("C"."countyNo" = "P"."countyNo") AND ((("P"."partyType" IN ('JV','CH')) AND ("C"."caseType" = 'ZZ')) OR ("P"."partyType" NOT IN ('JV','CH')))) JOIN "Alias" "AL" ON (("P"."caseNo" = "AL"."caseNo") AND ("P"."countyNo" = "AL"."countyNo") AND ("P"."partyNo" = "AL"."partyNo") AND TRUE) JOIN "StatusCode" "S" ON ("C"."statusCode" = "S"."statusCode") JOIN "County" "CY" ON ("C"."countyNo" = "CY"."countyNo") LEFT OUTER JOIN "CaseDispo" "CD" ON (("CD"."caseNo" = "C"."caseNo") AND ("CD"."countyNo" = "C"."countyNo") AND (NOT (EXISTS ( SELECT 1 FROM "CaseDispo" "CD2" WHERE ( ("CD2"."caseNo" = "CD"."caseNo") AND ("CD2"."countyNo" = "CD"."countyNo") AND ("CD2"."dispoDate" > "CD"."dispoDate")) )))) LEFT OUTER JOIN "LastCaseMod" "LCM" ON (("LCM"."caseNo" = "C"."caseNo") AND ("LCM"."countyNo" = "C"."countyNo") AND ("LCM"."updateTypeId" = 0)) LEFT OUTER JOIN "WccaPermCaseType" "WPCT" ON (("C"."caseType" = "WPCT"."caseType") AND ("C"."countyNo" = "WPCT"."countyNo") AND ("WPCT"."profileName" = 'PUBLIC')) WHERE ( (("WPCT"."profileName" IS NOT NULL) OR (("C"."caseType" IN ('PA','JD')) AND ("C"."isConfidential" = FALSE))) AND ("C"."isExpunge" <> TRUE) AND (NOT (("WPCT"."profileName" = 'PUBLIC') AND ("C"."caseType" = 'FA') AND ("C"."wcisClsCode" = '40501'))) AND (NOT (EXISTS ( SELECT 1 FROM "HiddenCase" "HCA" WHERE ( ("HCA"."caseNo" = "C"."caseNo") AND ("HCA"."countyNo" = "C"."countyNo")) ))) AND ("P"."isSeal" = FALSE) AND (NOT (("P"."partyType" IN ('PE','PL','JP')) AND ((("C"."filingDate" >= DATE '2008-11-01') AND ("C"."wcisClsCode" = '30709')) OR (("C"."caseType" IN ('CV','FA')) AND ("C"."wcisClsCode" = '30711') AND (EXISTS ( SELECT 1 FROM "CaseHist" "CHPET" WHERE ( ("C"."countyNo" = "CHPET"."countyNo") AND ("C"."caseNo" = "CHPET"."caseNo") AND ("CHPET"."eventType" IN ('FWBCA','CCTRO'))) )))))) AND TRUE AND ("AL"."searchName" LIKE 'HILL,J%') AND TRUE AND (NOT (("C"."caseType" IN ('CF','CI','CM','CT','FO','TR')) AND ("P"."partyType" <> 'DE'))) AND ((("C"."caseType" IN ('JA','JC','JG','JM','JO','JV','JI','TP')) AND ("P"."partyType" IN ('CH','JV'))) OR (("C"."caseType" NOT IN ('JA','JC','JG','JM','JO','JV','JI','TP')) AND ("P"."partyType" NOT IN ('CH','JV')))) AND TRUE AND TRUE AND TRUE AND TRUE AND TRUE AND TRUE AND TRUE AND TRUE AND TRUE AND TRUE)) ORDER BY "caseNo" ; Plans from 8.3.5 and 8.4devel attached. If you need something else, let me know. -Kevin
Attachment
>>> I wrote:
> Ran it with this:
> effective_cache_size = 100GB

Actually, the timings shown in the previous post were run with the default for this setting. I updated it after yesterday evening's tests when I noticed I'd missed it, but had to leave before I could rerun the tests. I forgot that until now. Sorry for the confusion.

Surprisingly (to me), this setting affects plan time significantly. The 8.4devel with patch, statistics target 10, runs in an average 85.393 ms instead of the 99.421 ms I saw with the default. The 8.3.5 settings were run with this set. Both of the plans attached to the previous message were run with this set.

-Kevin
On Tue, Feb 3, 2009 at 10:17 AM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
>> We're going to need to see the test case, because I don't see that in
>> some simple tests here.
>
> Plans from 8.3.5 and 8.4devel attached.
>
> If you need something else, let me know.

I had a suspicion we were going to see something like this. You're using several NOT EXISTS clauses and 8.4devel is converting those into Anti Joins. Aside from the longer planning time, the resulting plan appears to have a much higher estimated cost, so I'm suspicious of a bug in the new Semi/Anti join stuff, but I don't have time to track it down right now (and I suspect Tom will figure it out faster than I can anyway).

...Robert
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > [ test case ] It looks to me like the reason for the planning time difference is that this query contains four NOT EXISTS subqueries, which 8.3 was not very smart about but 8.4 has converted into antijoins. That gives it more flexibility to consider different join orders, which means more paths to sort through, so it takes longer. But in principle you are more likely to get a good plan. (You didn't say anything about the actual runtimes --- I'd be interested to know about the runtimes and the quality of the rowcount estimates in both cases.) So as far as the fact that planning is slower is concerned, it's pretty much "nothing to see here, move along". I notice though that the profile shows add_path is eating even more run-time percentage wise than before, because it's getting called more. (It's up from about 14% to 21%, counting subroutines --- see below.) So there might be an argument for installing Robert's optimization or something like it in 8.4 to buy some of that back, rather than waiting for 8.5. regards, tom lane 8.3: 0.00 0.00 3700/2893200 set_rel_pathlist <cycle 5> [327] 0.00 0.00 4500/2893200 create_index_paths <cycle 5> [132] 0.51 0.17 2885000/2893200 add_paths_to_joinrel <cycle5> [14] [16] 14.7 0.51 0.18 2893200 add_path [16] 0.13 0.00 6401100/10760100 compare_pathkeys[29] 0.00 0.02 454600/621400 list_delete_cell [112] 0.01 0.00 453400/4243582 AllocSetFree [48] 0.01 0.00 453400/4242980 pfree [66] 0.00 0.00 85700/512901 lcons [98] 0.00 0.00 208700/1934900 compare_path_costs [196] 8.4: 0.00 0.00 4100/10605500 set_rel_pathlist <cycle 8> [200] 0.00 0.00 4300/10605500 create_index_paths <cycle 8> [207] 2.20 0.57 10597100/10605500 add_paths_to_joinrel<cycle 8> [14] [16] 21.7 2.20 0.57 10605500 add_path [16] 0.45 0.00 30231600/47490100 compare_pathkeys[24] 0.02 0.05 1584000/1909000 list_delete_cell [81] 0.03 0.00 1582800/13590386 AllocSetFree [46] 0.01 0.00 1014900/10462300 compare_path_costs [53] 0.01 0.00 1582800/13589684 pfree [62] 0.00 0.00 169400/833901 lcons [108]
Robert Haas <robertmhaas@gmail.com> writes:
> I had a suspicion we were going to see something like this. You're
> using several NOT EXISTS clauses and 8.4devel is converting those into
> Anti Joins. Aside from the longer planning time, the resulting plan
> appears to have a much higher estimated cost, so I'm suspicious of a
> bug in the new Semi/Anti join stuff,

More likely the other way round --- I think 8.3's rowcounts and hence costs are flights of fancy, and 8.4 is more likely to be in the right ballpark. We need to see EXPLAIN ANALYZE to find out though.

			regards, tom lane
>>> Robert Haas <robertmhaas@gmail.com> wrote:
> I had a suspicion we were going to see something like this. You're
> using several NOT EXISTS clauses and 8.4devel is converting those into
> Anti Joins. Aside from the longer planning time, the resulting plan
> appears to have a much higher estimated cost, so I'm suspicious of a
> bug in the new Semi/Anti join stuff, but I don't have time to track it
> down right now (and I suspect Tom will figure it out faster than I can
> anyway).

The 8.4 plan also doesn't seem to run as fast. Attached is EXPLAIN ANALYZE output from the 8.3.5 database I dumped from for my 8.4devel tests. I tried the same run on 8.4devel and it is still running after 20 minutes. I will let it cook for a while.

-Kevin
Attachment
>>> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote: > Attached is EXPLAIN > ANALYZE output from the 8.3.5 database I dumped from for my 8.4devel > tests. Actually, that one is from the sibling machine which is in production. Attached is the one on standby where I've been running the rest of this. Apparently, differences in the random samples lead to different plans, so maybe having both will be of some use. Apologies, but I'm having to work this in among other work. -Kevin
Attachment
>>> I wrote:
> I tried the same run on 8.4devel and it is still running after
> 20 minutes. I will let it cook for a while.

It's now been an hour and 30 minutes; so, while 8.4 does a much better job of estimating how many rows will be returned, the plan it generates is much slower for this query.

FWIW, I'll attach the results of vmstat 1 below. The machine is a replication target and is receiving a little over 100 database transactions per second, but is otherwise fairly idle, except for this long-running query.

-Kevin
Attachment
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > It's now been an hour and 30 minutes; so, while 8.4 does a much better > job of estimating how many rows will be returned, the plan it > generates is much slower for this query. Can you let it run to completion? Without explain analyze results it's going to be pretty difficult to isolate the problem. regards, tom lane
>>> Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Can you let it run to completion? Without explain analyze results
> it's going to be pretty difficult to isolate the problem.

Barring some currently unforeseen need to switch it into service to back the web site, yes.

-Kevin
>>> I wrote:
>>>> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Can you let it run to completion? Without explain analyze results
>> it's going to be pretty difficult to isolate the problem.
>
> Barring some currently unforeseen need to switch it into service to
> back the web site, yes.

It's been about 23 hours and it's still running. No apparent memory leakage. No significant disk activity. One CPU pegged (of the 16 on the machine).

Sooner or later we're likely to need to swap this box into production, but there's a good chance it could be left this way for days or even a couple weeks. Unfortunately I didn't have the foresight to use nohup when running this, so a network glitch might knock me off, too. (And they've been working on the network due to a move of the server room to a new floor.)

Do I just leave it? Would a stack trace be of any value? If you had row counts or dumps of the statistics tables, could you figure anything out? Let me know.

-Kevin
On Wed, Feb 4, 2009 at 3:07 PM, Kevin Grittner wrote:
> It's been about 23 hours and it's still running.

@snide(Gee, it sure would be nice if we continued with that explain-in-progress patch I had sent in earlier...:)

--
greg
On Wed, Feb 4, 2009 at 10:12 AM, Greg Stark <stark@enterprisedb.com> wrote:
> On Wed, Feb 4, 2009 at 3:07 PM, Kevin Grittner wrote:
>> It's been about 23 hours and it's still running.
>
> @snide(Gee, it sure would be nice if we continued with that
> explain-in-progress patch I had sent in earlier...:)

Agreed.

--
Jonah H. Harris, Senior DBA
myYearbook.com
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > It's been about 23 hours and it's still running. No apparent memory > leakage. No significant disk activity. One CPU pegged (of the 16 on > the machine). Hmmm ... I wonder if it could be stuck in an infinite loop? It's probably just a horribly bad plan choice, but ... > Would a stack trace be of any value? If you can attach to the backend with gdb, try "bt", then "cont", then wait a few seconds, then control-C and "bt" again. Repeat five or ten times and see if there's any consistency to the traces. You should be able to just "quit" and leave the backend running afterwards. regards, tom lane
>>> Tom Lane <tgl@sss.pgh.pa.us> wrote:
> If you can attach to the backend with gdb, try "bt", then "cont",
> then wait a few seconds, then control-C and "bt" again. Repeat
> five or ten times and see if there's any consistency to the traces.

Attached.

-Kevin
Attachment
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> If you can attach to the backend with gdb, try "bt", then "cont", >> then wait a few seconds, then control-C and "bt" again. Repeat >> five or ten times and see if there's any consistency to the traces. > Attached. Hmm, it seems to be spending all its time in this part of the plan: -> Merge Anti Join (cost=0.00..2069318.25 rows=11628008 width=15) MergeCond: (("CD"."countyNo")::smallint = ("CD2"."countyNo")::smallint) Join Filter: ((("CD2"."dispoDate")::date> ("CD"."dispoDate")::date) AND (("CD2"."caseNo")::text = ("CD"."caseNo")::text)) -> Index Scan using "CaseDispo_pkey" on "CaseDispo" "CD" (cost=0.00..1068306.86 rows=17442012 width=19) -> Index Scan using "CaseDispo_CountyNoDispoDate" on "CaseDispo" "CD2" (cost=0.00..913801.31rows=17442012 width=19) ie SELECT ... FROM "CaseDispo" "CD" WHERE (NOT (EXISTS ( SELECT 1 FROM "CaseDispo" "CD2" WHERE ( ("CD2"."caseNo" = "CD"."caseNo") AND ("CD2"."countyNo" = "CD"."countyNo") AND ("CD2"."dispoDate" > "CD"."dispoDate")) ))) which in fact the planner thought would be pretty expensive; but not 24 hours worth. I'm not sure at the moment if the cost estimate is just badly off, or if there's some sort of thinko in the logic. Can you estimate how many rows would come out of that join? Is the rowcount for the underlying table (17442012 rows in CaseDispo) accurate? regards, tom lane
>>> Tom Lane <tgl@sss.pgh.pa.us> wrote:
> SELECT ... FROM "CaseDispo" "CD"
> WHERE (NOT (EXISTS (
>     SELECT 1
>     FROM "CaseDispo" "CD2"
>     WHERE (("CD2"."caseNo" = "CD"."caseNo")
>        AND ("CD2"."countyNo" = "CD"."countyNo")
>        AND ("CD2"."dispoDate" > "CD"."dispoDate")))))
>
> which in fact the planner thought would be pretty expensive; but not 24
> hours worth. I'm not sure at the moment if the cost estimate is just
> badly off, or if there's some sort of thinko in the logic. Can you
> estimate how many rows would come out of that join?

Well, of the cases which are selected based on the other criteria, there would be about one CaseDispo row each. The main selection criterion is the Party.searchName, with various security limitations added. Where one or more CaseDispo rows exist (it's only included through a LEFT JOIN), we want only the one with the latest dispoDate. Based on the queries which ran under 8.3.5, it's pretty safe to assume that the number of CaseDispo rows matching the join criteria to Case would be between 2,300 and 2,400, with only a handful having multiple matches to the same Case. There can never be more than one related to a specific Case for any one dispoDate.

> Is the rowcount for
> the underlying table (17442012 rows in CaseDispo) accurate?

cir=# select count(*) from "CaseDispo";
  count
----------
 17433234
(1 row)
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Well, of the cases which are selected based on the other criteria, > there would be about one CaseDispo row each. The main selection > criterion is the Party.searchName, with various security limitations > added. Where one or more CaseDispo rows exist (it's only included > through a LEFT JOIN), we want only the one with the latest dispoDate. > Based on the queries which ran under 8.3.5, it's pretty safe to assume > that the number of CaseDispo rows matching the join criteria to Case > would be between 2,300 and 2,400, with only a handful having multiple > matches to the same Case. Hmm ... but that's all irrelevant because this is being done as the innermost join, ie, it's running the EXISTS test for every row of CaseDispo regardless of whether that row will be joined to later. Based on your comments that seems like a really stupid plan choice, so I guess Robert is right: there's some sort of logic bug here someplace. Will look. regards, tom lane
>>> Tom Lane <tgl@sss.pgh.pa.us> wrote:
> there's some sort of logic bug here someplace.

Keep in mind that this is running with the patch that started this thread. I didn't try actually running this query on 8.4devel without the patch. Should I kill this query, revert the software to pre-patch, and try it again to see, or are you confident that that's not the issue?

-Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> there's some sort of logic bug here someplace. > Keep in mind that this is running with the patch that started this > thread. I didn't try actually running this query on 8.4devel without > the patch. Should I kill this query, revert the software to > pre-patch, and try it again to see, or are you confident that that's > not the issue? Hmm. It would be worth checking whether you get the same plan without the patch (you should, but let's check). If it is the same plan then it's not going to be any faster. I don't see any value in trying to let the query run to conclusion either. regards, tom lane
On Wed, Feb 4, 2009 at 12:15 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
>>>> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> there's some sort of logic bug here someplace.
>
> Keep in mind that this is running with the patch that started this
> thread. I didn't try actually running this query on 8.4devel without
> the patch. Should I kill this query, revert the software to
> pre-patch, and try it again to see, or are you confident that that's
> not the issue?

Oh, dear. If this turns out to be my bug Tom will kick my ass!

...Robert
On Feb 4, 2009, at 9:25 AM, Robert Haas wrote:
> Oh, dear. If this turns out to be my bug Tom will kick my ass!

Can I buy tickets to this event?

Best,

David
>>> Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It would be worth checking whether you get the same plan without
> the patch (you should, but let's check).

Same plan. (Robert can relax, and David can forget about those tickets.)

-Kevin
Robert Haas <robertmhaas@gmail.com> writes:
> Oh, dear. If this turns out to be my bug Tom will kick my ass!

Hmm ... one of the things that struck me as odd was that it was doing a merge join on just the countyNo, which is presumably very far from unique. Testing the query here with Kevin's schema but no data, I get

    ->  Merge Anti Join  (cost=0.00..102.51 rows=233 width=34)
          Merge Cond: ((("CD"."countyNo")::smallint = ("CD2"."countyNo")::smallint) AND (("CD"."caseNo")::text = ("CD2"."caseNo")::text))
          Join Filter: (("CD2"."dispoDate")::date > ("CD"."dispoDate")::date)
          ->  Index Scan using "CaseDispo_pkey" on "CaseDispo" "CD"  (cost=0.00..49.50 rows=350 width=38)
          ->  Index Scan using "CaseDispo_pkey" on "CaseDispo" "CD2"  (cost=0.00..49.50 rows=350 width=38)

ie it's using the first two columns of the pkey, not only the first column, as the merge key (and not arbitrarily using two different indexes to accomplish the same scan, which is another weird thing about that plan). There's no visible reason for it not to have done that in Kevin's test, unless there's something wrong with your patch.

There might be more than one bug here though. The other question is why it wants to do this join first at all, and I'm not convinced that add_path could be at fault for that. I'm suspecting that the logic that considers join order restrictions for antijoins might be overly restrictive.

			regards, tom lane
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> It would be worth checking whether you get the same plan without >> the patch (you should, but let's check). > Same plan. Yeah, I just managed to reproduce a similar behavior in unpatched HEAD. Now, since I'm running without any stats, it might be that it's estimating similar costs for the one-key and two-key merges; but I don't see why that would happen for you. Off to do some debugging. regards, tom lane
>>> Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Hmm ... one of the things that struck me as odd was that it was doing a
> merge join on just the countyNo, which is presumably very far from
> unique.

There are 72 possible values for any columns in that domain. In most large tables, Milwaukee County (value 40) is used in about 20% of the rows.

> ... using two different indexes to
> accomplish the same scan, which is another weird thing about that plan).
> There's no visible reason for it not to have done that

Well, for a human who understands the data, I certainly would have expected it to use the primary key for both. There are never duplicate case numbers within a county, and there can certainly be a lot of cases disposed on a given date for a county.

It occurred to me that this had been run with the last ANALYZE having run with a default_statistics_target of 10, but that wasn't it -- I got new statistics with a target of 100 and this part of the plan looks the same. Some other aspects of the plan changed, but if this part was the killer, a higher target doesn't help.

-Kevin
I wrote:
> Now, since I'm running without any stats, it might be that it's
> estimating similar costs for the one-key and two-key merges; but I don't
> see why that would happen for you. Off to do some debugging.

... well, actually, it's because I blew off applying any cost correction for this case in cost_mergejoin:

     * For SEMI and ANTI joins, only one inner tuple need be rescanned for
     * each group of same-keyed outer tuples (assuming that all joinquals
     * are merge quals). This makes the effect small enough to ignore,
     * so we just set rescannedtuples = 0.

Obviously that's not going to be good enough for production --- the parenthetical assumption here is just wrong in the case at hand. That doesn't seem to be the only issue in your example, but it's definitely one of 'em.

			regards, tom lane
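(For illustration only: one way the missing correction could be sketched - fall back to a plain-mergejoin-style rescan estimate when the quoted assumption does not hold. The function name, parameters, and formula below are stand-ins invented for this sketch, not the correction that was actually committed.)

    /*
     * Illustrative only: estimate how many inner tuples a merge SEMI/ANTI
     * join will rescan.  If every join clause is usable as a merge qual,
     * the quoted comment's assumption holds and rescans are negligible;
     * otherwise fall back to a crude plain-join-style estimate (total join
     * tuples minus inner rows, clamped at zero).
     */
    double
    estimate_rescanned_tuples(int all_quals_are_mergequals,
                              double mergejointuples,
                              double inner_path_rows)
    {
        double      rescannedtuples;

        if (all_quals_are_mergequals)
            return 0.0;         /* only one inner tuple rescanned per group */

        rescannedtuples = mergejointuples - inner_path_rows;
        if (rescannedtuples < 0)
            rescannedtuples = 0;    /* clamp a possible underestimate */
        return rescannedtuples;
    }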
>>> Tom Lane <tgl@sss.pgh.pa.us> wrote:
> That doesn't seem to be the only issue in your example, but it's
> definitely one of 'em.

FWIW, the pattern causing the problem here is a pretty common one in court business logic: join (often outer join or an exists test) to a set of candidate rows WHERE NOT EXISTS the same set of candidate rows with tests to see if there is a better choice from among the candidates. In the simplest cases (like the current example) this could be rewritten using a correlated subquery with a MAX function; however, the test for a better candidate often is too complex for that technique to work, that technique has (until now) been slower, and programmers were much more prone to writing code with logic errors that way.

I don't know if that matters to you in your current debugging, but I just wanted to point out that we do it in hundreds of places in the code.

-Kevin
Robert Haas wrote:
> On Mon, Feb 2, 2009 at 8:10 PM, Kevin Grittner
> <Kevin.Grittner@wicourts.gov> wrote:
>>>>> Robert Haas <robertmhaas@gmail.com> wrote:
>>> running this 5 times each on several queries,
>>> dropping top and bottom results.
>>
>> Running a complex query (posted in previous threads, runs about
>> 300,000 times per day in a production web application), I got these
>> timings on a production-quality machine (4 quad-core CPU chips, that is 16
>> CPUs like this: Intel(R) Xeon(R) CPU X7350 @ 2.93GHz, 128 GB RAM, big
>> RAID with BBU). I ran explain in each environment 5 times, tossed
>> high and low, and averaged. The 8.4devel was from today's
>> (2009-02-02) snapshot, built the same way we did 8.3.5.
>>
>> 8.3.5, statistics target 10: 36.188 ms
>> 8.4devel without patch, statistics target 100: 109.862 ms
>> 8.4devel with patch, statistics target 100: 104.015 ms
>>
>> After seeing that, I re-analyzed to eliminate the statistics target as
>> the cause of the 8.4 increase.
>>
>> 8.4devel with patch, statistics target 10: 99.421 ms
>
> Yikes! The impact of the patch is about what I'd expect, but the fact
> that planning time has nearly tripled is... way poor. Can you repost
> the query and the EXPLAIN output for 8.3.5 and CVS HEAD?

Where are we on this: the original patch, and Kevin's slow queries?

--
Bruce Momjian <bruce@momjian.us>  http://momjian.us
EnterpriseDB                      http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
On Sat, Feb 7, 2009 at 2:44 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> Yikes! The impact of the patch is about what I'd expect, but the fact
>> that planning time has nearly tripled is... way poor. Can you repost
>> the query and the EXPLAIN output for 8.3.5 and CVS HEAD?
>
> Where are we on this: the original patch, and Kevin's slow queries?

The original patch is still in the queue for 8.5, and I'll probably do more work on it between now and the next CommitFest. If someone decides to commit some portion of the existing patch first, I never complain about getting to cut in front of the line - but I don't ask for it, either.

Tom started a separate thread to discuss Kevin's planner issues:

http://archives.postgresql.org/message-id/21139.1233853501@sss.pgh.pa.us

and has committed a fix for one of the two problems Kevin found in testing my patch:

http://archives.postgresql.org/message-id/20090206234324.11A417559ED@cvs.postgresql.org

...Robert
>>> Bruce Momjian <bruce@momjian.us> wrote:
> Where are we on this: the original patch, and Kevin's slow queries?

Robert's patch is not the cause of the 8.4 problems with my queries, and (as Robert pointed out) a separate thread has been started to discuss those issues.

From my perspective, Robert's patch has improved plan time in every test of a complex query that I've run. I have compared plans for some queries with and without the patch, and when I have done so the EXPLAIN output has been byte-for-byte identical; it just got to that plan faster.

In this post:

http://archives.postgresql.org/pgsql-hackers/2009-02/msg00118.php

Tom points out that the additional optimizations included in 8.4 can increase plan time a bit, "So there might be an argument for installing Robert's optimization or something like it in 8.4 to buy some of that back, rather than waiting for 8.5."

-Kevin
After a lot of distractions, I've finished applying the planner fixes that seem necessary in view of your report about poorer planning in 8.4 than 8.3. When you have a chance, it would be useful to do a thorough test of CVS HEAD on your data and query mix --- are there any other places where we have regressed relative to 8.3? regards, tom lane
[ This patch is on the 2009-First commitfest list, but there was earlier discussion to the effect that it might be a good idea to include in 8.4, so I made some time to look at it. ]

Robert Haas <robertmhaas@gmail.com> writes:
> I've been doing some benchmarking and profiling on the PostgreSQL
> query analyzer, and it seems that (at least for the sorts of queries
> that I typically run) the dominant cost is add_path(). I've been able
> to find two optimizations that seem to help significantly:

I got around to testing this patch a bit today, and I'm afraid it doesn't look very good from here. The test case I used was a psql script with 10000 repetitions of

explain select * from information_schema.table_constraints join information_schema.referential_constraints using (constraint_catalog,constraint_schema,constraint_name);

which isn't terribly creative, but those two views incorporate multiple joins so I figured this was a reasonably middle-of-the-road planner exercise.

I first tried just the compare_fuzzy_path_costs() change and really couldn't measure any reliable difference. oprofile output for CVS HEAD looks like

CPU: P4 / Xeon with 2 hyper-threads, speed 2792.99 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 100000
samples  %        image name      symbol name
149294    9.7519  postgres        AllocSetAlloc
85437     5.5808  postgres        SearchCatCache
45520     2.9734  postgres        copyObject
39525     2.5818  postgres        add_path
35878     2.3436  postgres        expression_tree_walker
29733     1.9422  libc-2.8.so     memcpy
25560     1.6696  postgres        MemoryContextAllocZeroAligned
20879     1.3638  libc-2.8.so     strlen
20521     1.3404  postgres        add_paths_to_joinrel
20054     1.3099  postgres        AllocSetFree
20023     1.3079  postgres        cost_mergejoin
19363     1.2648  postgres        nocachegetattr
17689     1.1554  libc-2.8.so     vfprintf
17335     1.1323  postgres        generate_join_implied_equalities
16085     1.0507  postgres        MemoryContextAlloc

and with the compare_fuzzy_path_costs() change

CPU: P4 / Xeon with 2 hyper-threads, speed 2792.99 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 100000
samples  %        image name      symbol name
144882    9.6951  postgres        AllocSetAlloc
84205     5.6348  postgres        SearchCatCache
43956     2.9414  postgres        copyObject
37418     2.5039  postgres        add_path
34865     2.3331  postgres        expression_tree_walker
28718     1.9217  libc-2.8.so     memcpy
23643     1.5821  postgres        MemoryContextAllocZeroAligned
20835     1.3942  libc-2.8.so     strlen
20065     1.3427  postgres        cost_mergejoin
19672     1.3164  postgres        AllocSetFree
18992     1.2709  postgres        add_paths_to_joinrel
18802     1.2582  postgres        nocachegetattr
17109     1.1449  libc-2.8.so     __printf_fp
16901     1.1310  libc-2.8.so     vfprintf
16159     1.0813  postgres        hash_search_with_hash_value
16039     1.0733  postgres        generate_join_implied_equalities
15554     1.0408  postgres        MemoryContextAlloc
14981     1.0025  postgres        expression_tree_mutator

Note that the compiler is inlining compare_fuzzy_path_costs in both cases.
I then added the add_similar_paths change, and got this:

CPU: P4 / Xeon with 2 hyper-threads, speed 2792.99 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 100000
samples  %        image name    symbol name
142645    9.6668  postgres      AllocSetAlloc
 83161    5.6357  postgres      SearchCatCache
 44660    3.0265  postgres      copyObject
 35762    2.4235  postgres      expression_tree_walker
 28427    1.9264  postgres      add_path
 27970    1.8955  libc-2.8.so   memcpy
 23317    1.5801  postgres      MemoryContextAllocZeroAligned
 20685    1.4018  libc-2.8.so   strlen
 19646    1.3314  postgres      add_paths_to_joinrel
 19129    1.2963  postgres      AllocSetFree
 18065    1.2242  postgres      cost_mergejoin
 17768    1.2041  postgres      nocachegetattr
 17249    1.1689  postgres      generate_join_implied_equalities
 16784    1.1374  libc-2.8.so   vfprintf
 15740    1.0667  libc-2.8.so   __printf_fp
 15642    1.0600  postgres      MemoryContextAlloc
 15194    1.0297  postgres      expression_tree_mutator
 15009    1.0171  postgres      hash_search_with_hash_value
...
  7746    0.5249  postgres      add_similar_paths

where at least you can see a change in add_path's percentage, although overall runtime did not change measurably; and in any case whatever was saved seems to have gone into add_similar_paths.

It gets worse though: add_similar_paths is flat *wrong*. It compares each successive path to only the current cheapest-total candidate, and thus is quite likely to seize on the wrong cheapest-startup path. I tried fixing its loop logic like so:

	for (i = 1; i < npath; ++i)
	{
		if (compare_fuzzy_path_costs(cheapest_total, paths[i], TOTAL_COST) > 0)
			cheapest_total = paths[i];
		if (compare_fuzzy_path_costs(cheapest_startup, paths[i], STARTUP_COST) > 0)
			cheapest_startup = paths[i];
	}

(obviously, this is with compare_fuzzy_path_costs reverted to its form in HEAD) and got this:

CPU: P4 / Xeon with 2 hyper-threads, speed 2792.99 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 100000
samples  %        image name    symbol name
143675    9.6105  postgres      AllocSetAlloc
 83809    5.6060  postgres      SearchCatCache
 43837    2.9323  postgres      copyObject
 35790    2.3940  postgres      expression_tree_walker
 29870    1.9980  postgres      compare_fuzzy_path_costs
 28521    1.9078  libc-2.8.so   memcpy
 23448    1.5685  postgres      MemoryContextAllocZeroAligned
 21049    1.4080  libc-2.8.so   strlen
 18343    1.2270  postgres      add_path
 18154    1.2143  postgres      AllocSetFree
 18119    1.2120  postgres      cost_mergejoin
 18036    1.2064  postgres      add_paths_to_joinrel
 17791    1.1901  postgres      nocachegetattr
 17277    1.1557  postgres      generate_join_implied_equalities
 16956    1.1342  libc-2.8.so   vfprintf
 16125    1.0786  postgres      hash_search_with_hash_value
 15733    1.0524  libc-2.8.so   __printf_fp
 15387    1.0292  postgres      MemoryContextAlloc
...
  5759    0.3852  postgres      add_similar_paths

Here we have a total of 3.6% spent in code that had used ~2.5% before we started. Not very promising.

Now, maybe if we could force the compiler to inline compare_fuzzy_path_costs we could buy some of that back. Another possibility is to contort add_similar_paths some more so that it avoids double comparisons so long as cheapest_total and cheapest_startup are the same path; I have a feeling that's not going to win much though.

So it's kind of looking like a dead end from here. But in any case the patch isn't acceptable as-is with that fundamental logic error in add_similar_paths, so I'm bouncing it back for rework.

			regards, tom lane
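Spelled out as a complete function, the corrected logic above would look roughly like the following. This is only an illustrative sketch, not the submitted patch: the signature is invented, and it assumes the function lives in src/backend/optimizer/util/pathnode.c, where compare_fuzzy_path_costs(), add_path(), and the CostSelector constants are visible.

	/*
	 * Hedged sketch of a corrected add_similar_paths(): the caller passes
	 * a set of candidate paths that all share the same pathkeys; we pick
	 * the cheapest-total and cheapest-startup members by scanning the
	 * whole array, and hand only the winner(s) to add_path().  Invented
	 * signature, illustrative only; freeing of the losing candidates (as
	 * add_path() does for rejected paths) is omitted.
	 */
	static void
	add_similar_paths(RelOptInfo *parent_rel, Path **paths, int npath)
	{
		Path	   *cheapest_total;
		Path	   *cheapest_startup;
		int			i;

		if (npath <= 0)
			return;

		cheapest_total = cheapest_startup = paths[0];

		for (i = 1; i < npath; ++i)
		{
			/* track both winners independently, per the corrected loop */
			if (compare_fuzzy_path_costs(cheapest_total, paths[i], TOTAL_COST) > 0)
				cheapest_total = paths[i];
			if (compare_fuzzy_path_costs(cheapest_startup, paths[i], STARTUP_COST) > 0)
				cheapest_startup = paths[i];
		}

		/* only these winners can matter for this set of pathkeys */
		add_path(parent_rel, cheapest_total);
		if (cheapest_startup != cheapest_total)
			add_path(parent_rel, cheapest_startup);
	}

Even in this corrected form, each surviving path still goes through the normal add_path() comparisons against the rest of the pathlist, which matches the profiles above: the work largely moves from add_path() into add_similar_paths() rather than disappearing.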
Thanks for taking a look at it.

> I first tried just the compare_fuzzy_path_costs() change and really
> couldn't measure any reliable difference.  oprofile output for CVS HEAD
> looks like

Well, there's obviously something different between your case and mine, because in my query add_path was the #1 CPU hog and it wasn't close. I suspect my query had more joins - I'll take a look at yours.

> It gets worse though: add_similar_paths is flat *wrong*.  It compares
> each successive path to only the current cheapest-total candidate,
> and thus is quite likely to seize on the wrong cheapest-startup path.
> I tried fixing its loop logic like so:

Nuts. That's probably a fatal defect.

> Now, maybe if we could force the compiler to inline
> compare_fuzzy_path_costs we could buy some of that back.  Another
> possibility is to contort add_similar_paths some more so that it avoids
> double comparisons so long as cheapest_total and cheapest_startup are
> the same path; I have a feeling that's not going to win much though.

Agreed.

> So it's kind of looking like a dead end from here.  But in any case the
> patch isn't acceptable as-is with that fundamental logic error in
> add_similar_paths, so I'm bouncing it back for rework.

...Robert
Robert Haas <robertmhaas@gmail.com> writes:
>> I first tried just the compare_fuzzy_path_costs() change and really
>> couldn't measure any reliable difference.  oprofile output for CVS HEAD
>> looks like

> Well, there's obviously something different between your case and
> mine, because in my query add_path was the #1 CPU hog and it wasn't
> close.  I suspect my query had more joins - I'll take a look at yours.

Interesting. I'd like to see your example too, if you can publish it. You had noted that add_path seemed to iterate clear to the end of the list in most cases, which it's designed not to do, so there might be something else going on in your example.

One other thought to roll around in your head: at the time that the current add_path logic was designed, compare_pathkeys was ungodly expensive, which is why the code tries to compare costs first. We've since introduced the "canonical pathkey" representation, which makes compare_pathkeys enough cheaper that it might not be insane to do it in the other order. I don't immediately see an easy win there, because as things are structured we typically want the cost comparison result anyway for list sorting purposes. But maybe there's a way around that.

			regards, tom lane
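To make that reordering concrete, the dominance test inside add_path()'s loop might be restructured along these lines. This is a rough sketch only: compare_pathkeys(), compare_path_costs(), and the PathKeysComparison values are existing planner names, but the wrapper function here is invented for illustration, and the interaction with the cost-ordered pathlist mentioned above is ignored.

	#include "postgres.h"
	#include "optimizer/pathnode.h"		/* compare_path_costs() */
	#include "optimizer/paths.h"		/* compare_pathkeys() */

	/*
	 * Illustrative only: does new_path make old_path redundant?  Check the
	 * pathkey relationship first (cheap with canonical pathkeys, since it
	 * is just a walk over two lists of shared pointers), and pay for the
	 * floating-point cost comparisons only when the pathkeys don't already
	 * rule out dominance.
	 */
	static bool
	new_path_dominates(Path *new_path, Path *old_path)
	{
		PathKeysComparison keyscmp;

		keyscmp = compare_pathkeys(new_path->pathkeys, old_path->pathkeys);
		if (keyscmp == PATHKEYS_DIFFERENT || keyscmp == PATHKEYS_BETTER2)
			return false;		/* old_path has orderings new_path lacks */

		/* pathkeys equal or better; now the cost checks decide */
		return compare_path_costs(new_path, old_path, STARTUP_COST) <= 0 &&
			   compare_path_costs(new_path, old_path, TOTAL_COST) <= 0;
	}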
On Fri, Feb 27, 2009 at 11:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> One other thought to roll around in your head: at the time that the
> current add_path logic was designed, compare_pathkeys was ungodly
> expensive, which is why the code tries to compare costs first.
> We've since introduced the "canonical pathkey" representation, which
> makes compare_pathkeys enough cheaper that it might not be insane
> to do it in the other order.  I don't immediately see an easy win
> there, because as things are structured we typically want the cost
> comparison result anyway for list sorting purposes.  But maybe there's
> a way around that.

I think the best thing to do with the pathkeys comparison is avoid it altogether as frequently as possible.

One thing I've been thinking about is trying to refactor add_paths_to_joinrel() so that it doesn't need to be called twice for every pair of input relations. This saves some cycles in a few different places: you avoid computing the merge clauses and hash clauses twice, and because you now generate all of the unsorted hash-paths in the same piece of code, you can do an add_similar_paths()-type thing where you compare only costs first and then throw just the winners into the main list. I think there are some minor economies in sort_inner_and_outer() as well.

I haven't been able to convince myself that all of this adds up to a measurable win, though. The problem is that sort_inner_and_outer() and hash_inner_and_outer() actually generate only a relatively small number of paths. The latter generates at most two, and the former generates length(mergeclause_list), which at least for the kinds of queries that I usually write is most often one. Maybe two? On the other hand, match_unsorted_outer() generates up to 5 nestloops plus potentially several merge joins PER OUTER PATH. That's a much bigger number. Ergo, if you want to make join planning fast, you need to make match_unsorted_outer() consider fewer paths or consider them more quickly. And, unfortunately, it's not clear how to do that.

I have a nagging feeling that the logic around when to try materializing the inner path could be improved. We skip it when the cheapest total inner path is an index scan, bitmap heap scan, tid scan, material path, function scan, cte scan, or work table scan (which is an irritatingly long list... is there a better way to do this?), but cost_nestloop() only costs the thing differently if the inner side is a material path or hash path. I sort of wonder if cost_nestloop() could be made to decide when materialization of the inner path is warranted. That might help some, but I still think the fact that match_unsorted_outer() generates a number of paths that is linear in the size of the outer pathlist (rather than a constant) will tend to make it the most expensive part of join planning.

...Robert
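As an illustration of one way to shorten that list, the node-tag check could be centralized in a single predicate, roughly as below. The helper name and its placement are hypothetical, not existing backend code; the tag list is simply the one given in the message above.

	#include "postgres.h"
	#include "nodes/nodes.h"

	/*
	 * Hypothetical helper: does a path of this node type already
	 * materialize (or cheaply reproduce) its output, so that an explicit
	 * Material node on top of it would buy nothing for repeated inner
	 * rescans?  Centralizing the list would let the callers ask one
	 * question instead of repeating the switch.
	 */
	static bool
	pathtype_materializes_output(NodeTag pathtype)
	{
		switch (pathtype)
		{
			case T_IndexScan:
			case T_BitmapHeapScan:
			case T_TidScan:
			case T_Material:
			case T_FunctionScan:
			case T_CteScan:
			case T_WorkTableScan:
				return true;	/* rescans are already cheap or cached */
			default:
				return false;	/* a Material node might pay off */
		}
	}

The call site in match_unsorted_outer() would then just test !pathtype_materializes_output(inner_cheapest_total->pathtype) before considering create_material_path(), and the same predicate would remain available if the decision ever moved into cost_nestloop() as suggested above.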
>>> Tom Lane <tgl@sss.pgh.pa.us> wrote:
> After a lot of distractions, I've finished applying the planner fixes
> that seem necessary in view of your report about poorer planning in 8.4
> than 8.3.  When you have a chance, it would be useful to do a thorough
> test of CVS HEAD on your data and query mix --- are there any other
> places where we have regressed relative to 8.3?

You probably already know this, but on the query referenced earlier in the thread, a great plan is now generated! Even when not cached, this executed in just under seven seconds. (I chose these values for testing this query because it had been logged as exceeding 20 seconds under 8.3.) Cached, EXPLAIN ANALYZE runs between 225 and 250 ms. Running it without EXPLAIN, the psql \timing is between 265 and 277 ms. EXPLAIN gives a \timing averaging 80 ms.

I will see what kind of testing I can put together to try to shake out any remaining issues.

-Kevin
Attachment
Did this ever get applied/resolved?

---------------------------------------------------------------------------

Robert Haas wrote:
> I've been doing some benchmarking and profiling on the PostgreSQL
> query analyzer, and it seems that (at least for the sorts of queries
> that I typically run) the dominant cost is add_path().  I've been able
> to find two optimizations that seem to help significantly:
>
> 1. add_path() often calls compare_fuzzy_path_costs() twice on the same
> pair of paths, and when the paths compare equal on one criterion, some
> comparisons are duplicated.  I've refactored this function to return
> the results of both calculations without repeating any floating-point
> arithmetic.
>
> 2. match_unsorted_outer() adds as many as 5 nested loop joins at a
> time with the same set of pathkeys.  In my tests, it tended to be ~3 -
> cheapest inner, cheapest inner materialized, and cheapest inner index.
> Since these all have the same pathkeys, clearly only the one with the
> cheapest total cost is in the running for cheapest total cost for that
> set of pathkeys, and likewise for startup cost (and the two may be the
> same).  Yet we compare all of them against the whole pathlist, one
> after the other, including (for the most part) the rather expensive
> pathkey comparison.  I've added a function add_similar_paths() and
> refactored match_unsorted_outer() to use it.
>
> On a couple of complex (and proprietary) queries with 12+ joins each,
> I measure a planning time improvement of 8-12% with the attached patch
> applied.  It would be interesting to try to replicate this on a
> publicly available data set, but I don't know of a good one to use.
> Suggestions welcome - results of performance testing on your own
> favorite big queries even more welcome.  Simple test harness also
> attached.  I took the approach of dropping caches, starting the
> server, and then running this 5 times each on several queries,
> dropping top and bottom results.
>
> ...Robert

[ Attachment, skipping... ]

[ Attachment, skipping... ]

>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com
  PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do
  + If your life is a hard drive, Christ can be your backup. +
On Fri, Feb 26, 2010 at 10:07 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Did this ever get applied/resolved?

No, I never went back and tried to fix the brokenness Tom found.

...Robert