Thread: Where does the time go?
I have some odd results from timing two versions of an update query, and was hoping to get a better handle on how to interpret this. The query does an update of one table. One version does three NOT IN tests against three related tables. The other version does the logically equivalent NOT EXISTS tests. Here are the timings:

(a) NOT IN EXPLAIN ANALYZE reports: 150 ms.
(b) Difference between SELECT CURRENT_TIMESTAMP values before and after the NOT IN EXPLAIN ANALYZE: 171 ms.
(c) Difference between SELECT CURRENT_TIMESTAMP values before and after the NOT IN query without EXPLAIN ANALYZE: 140 ms.
(d) NOT EXISTS EXPLAIN ANALYZE reports: 9 ms.
(e) Difference between SELECT CURRENT_TIMESTAMP values before and after the NOT EXISTS EXPLAIN ANALYZE: 62 ms.
(f) Difference between SELECT CURRENT_TIMESTAMP values before and after the NOT EXISTS query without EXPLAIN ANALYZE: 62 ms.

I repeated these tests many times. After each test I updated the table back to its original state and ran CLUSTER and VACUUM ANALYZE, then allowed a minute of settling time. The timings are remarkably consistent from one try to the next, with an occasional outlier on the high side.

It seems to me that these results indicate that EXPLAIN ANALYZE adds measurable run-time overhead to the NOT IN query (b - c = 31 ms), but not to the NOT EXISTS query (e == f). The time to bracket the UPDATE with the SELECT CURRENT_TIMESTAMP queries can't exceed 21 ms (b - a). That seems like a long time, though, so I suspect that some of that time is going to the same place that most of the 53 ms (e - d) goes on the NOT EXISTS query. Where would that be? Parse and plan phases? Is there a way to pin that down better?

I was planning on posting the queries with plans and timings to illustrate a costing issue, but when I saw these timings I figured I'd ask about this first. It raises what seems like a more fundamental question, which would have an impact on collecting all of the interesting information. If execution time is 9 ms, but something else is adding several times that, perhaps that something else is interesting in itself.

Thanks for any info.

-Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > I have some odd results from timing two versions of an update query, and > was hoping to get a better handle on how to interpret this. You didn't show us the explain analyze results, but I'm betting that a big part of your issue is that the EXPLAIN ANALYZE instrumentation overhead is (1) significant and (2) different for the two query plans. The instrumentation overhead should be about the same for any one plan node execution, but the two plans could involve very different numbers of plan node executions ... regards, tom lane
>>> On Wed, Mar 22, 2006 at 8:59 pm, in message <27386.1143082792@sss.pgh.pa.us>, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> You didn't show us the explain analyze results,

The below is cut & paste directly from a psql run without editing.

bigbird=> UPDATE "User" SET "isActive" = TRUE WHERE "countyNo" = 13 and "isActive" = FALSE;
UPDATE 0
bigbird=> cluster "User";
CLUSTER
bigbird=> vacuum analyze "User";
VACUUM
bigbird=> select current_timestamp;
             now
-----------------------------
 2006-03-22 16:30:55.875-06
(1 row)

bigbird=> explain analyze
bigbird-> UPDATE "User"
bigbird-> SET "isActive" = FALSE
bigbird-> WHERE "countyNo" = 13
bigbird-> AND ("userId", "countyNo") NOT IN (SELECT "userId", "countyNo" FROM "UserAuthority")
bigbird-> AND ("userId", "countyNo") NOT IN (SELECT "userId", "countyNo" FROM "UserDivision")
bigbird-> AND ("userId", "countyNo") NOT IN (SELECT "userId", "countyNo" FROM "UserCtofcAuthority");
                                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Index Scan using "User_pkey" on "User"  (cost=2365.43..2499.34 rows=44 width=126) (actual time=145.968..147.061 rows=153 loops=1)
   Index Cond: (("countyNo")::smallint = 13)
   Filter: ((NOT (hashed subplan)) AND (NOT (hashed subplan)) AND (NOT (hashed subplan)))
   SubPlan
     ->  Seq Scan on "UserCtofcAuthority"  (cost=0.00..332.65 rows=15565 width=14) (actual time=0.003..10.105 rows=15565 loops=1)
     ->  Seq Scan on "UserDivision"  (cost=0.00..326.65 rows=15765 width=14) (actual time=0.003..10.409 rows=15764 loops=1)
     ->  Seq Scan on "UserAuthority"  (cost=0.00..1451.24 rows=70624 width=14) (actual time=0.007..45.823 rows=70626 loops=1)
 Total runtime: 150.340 ms
(8 rows)

bigbird=> select current_timestamp;
             now
-----------------------------
 2006-03-22 16:30:56.046-06
(1 row)

bigbird=> UPDATE "User" SET "isActive" = TRUE WHERE "countyNo" = 13 and "isActive" = FALSE;
UPDATE 153
bigbird=> cluster "User";
CLUSTER
bigbird=> vacuum analyze "User";
VACUUM
bigbird=> select current_timestamp;
             now
-----------------------------
 2006-03-22 16:30:56.203-06
(1 row)

bigbird=> UPDATE "User"
bigbird-> SET "isActive" = FALSE
bigbird-> WHERE "countyNo" = 13
bigbird-> AND ("userId", "countyNo") NOT IN (SELECT "userId", "countyNo" FROM "UserAuthority")
bigbird-> AND ("userId", "countyNo") NOT IN (SELECT "userId", "countyNo" FROM "UserDivision")
bigbird-> AND ("userId", "countyNo") NOT IN (SELECT "userId", "countyNo" FROM "UserCtofcAuthority");
UPDATE 153
bigbird=> select current_timestamp;
             now
-----------------------------
 2006-03-22 16:30:56.343-06
(1 row)

bigbird=> UPDATE "User" SET "isActive" = TRUE WHERE "countyNo" = 13 and "isActive" = FALSE;
UPDATE 153
bigbird=> cluster "User";
CLUSTER
bigbird=> vacuum analyze "User";
VACUUM
bigbird=> select current_timestamp;
             now
-----------------------------
 2006-03-22 16:30:56.484-06
(1 row)

bigbird=> explain analyze
bigbird-> UPDATE "User"
bigbird-> SET "isActive" = FALSE
bigbird-> WHERE "countyNo" = 13
bigbird-> AND NOT EXISTS (SELECT * FROM "UserAuthority" a where a."countyNo" = "User"."countyNo" and a."userId" = "User"."userId")
bigbird-> AND NOT EXISTS (SELECT * FROM "UserDivision" b where b."countyNo" = "User"."countyNo" and b."userId" = "User"."userId")
bigbird-> AND NOT EXISTS (SELECT * FROM "UserCtofcAuthority" c where c."countyNo" = "User"."countyNo" and c."userId" = "User"."userId");
                                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using "User_pkey" on "User"  (cost=0.00..3650.67 rows=42 width=111) (actual time=0.057..5.722 rows=153 loops=1)
   Index Cond: (("countyNo")::smallint = 13)
   Filter: ((NOT (subplan)) AND (NOT (subplan)) AND (NOT (subplan)))
   SubPlan
     ->  Index Scan using "UserCtofcAuthority_pkey" on "UserCtofcAuthority" c  (cost=0.00..3.48 rows=1 width=50) (actual time=0.004..0.004 rows=0 loops=153)
           Index Cond: ((("countyNo")::smallint = ($0)::smallint) AND (("userId")::bpchar = ($1)::bpchar))
     ->  Index Scan using "UserDivision_pkey" on "UserDivision" b  (cost=0.00..3.53 rows=1 width=42) (actual time=0.006..0.006 rows=0 loops=156)
           Index Cond: ((("countyNo")::smallint = ($0)::smallint) AND (("userId")::bpchar = ($1)::bpchar))
     ->  Index Scan using "UserAuthority_pkey" on "UserAuthority" a  (cost=0.00..3.60 rows=1 width=42) (actual time=0.007..0.007 rows=1 loops=341)
           Index Cond: ((("countyNo")::smallint = ($0)::smallint) AND (("userId")::bpchar = ($1)::bpchar))
 Total runtime: 9.136 ms
(11 rows)

bigbird=> select current_timestamp;
             now
-----------------------------
 2006-03-22 16:30:56.546-06
(1 row)

bigbird=> UPDATE "User" SET "isActive" = TRUE WHERE "countyNo" = 13 and "isActive" = FALSE;
UPDATE 153
bigbird=> cluster "User";
CLUSTER
bigbird=> vacuum analyze "User";
VACUUM
bigbird=> select current_timestamp;
             now
-----------------------------
 2006-03-22 16:30:56.703-06
(1 row)

bigbird=> UPDATE "User"
bigbird-> SET "isActive" = FALSE
bigbird-> WHERE "countyNo" = 13
bigbird-> AND NOT EXISTS (SELECT * FROM "UserAuthority" a where a."countyNo" = "User"."countyNo" and a."userId" = "User"."userId")
bigbird-> AND NOT EXISTS (SELECT * FROM "UserDivision" b where b."countyNo" = "User"."countyNo" and b."userId" = "User"."userId")
bigbird-> AND NOT EXISTS (SELECT * FROM "UserCtofcAuthority" c where c."countyNo" = "User"."countyNo" and c."userId" = "User"."userId");
UPDATE 153
bigbird=> select current_timestamp;
             now
-----------------------------
 2006-03-22 16:30:56.765-06
(1 row)

> I'm betting that a
> big part of your issue is that the EXPLAIN ANALYZE instrumentation
> overhead is (1) significant and (2) different for the two query plans.

The run time of the NOT IN query, as measured by elapsed time between SELECT CURRENT_TIMESTAMP executions, increased by 31 ms. The run time of the NOT EXISTS query did not increase measurably. I factored out the EXPLAIN ANALYZE overhead in my original post on this topic. My question was about where the REST of the time goes.

On this topic, I'm interested in the difference between the time reported by EXPLAIN ANALYZE and the elapsed time to run the query. I've run a series of tests to determine the minimum time for a query bracketed by SELECT CURRENT_TIMESTAMP statements in this environment, and it is 12 ms (on average -- the timer increments 15 to 16 ms at a time, which helps explain the odd consistency of the timings). That leaves an unaccounted difference between the time reported by EXPLAIN ANALYZE and the timestamp elapsed time of (on average) 9 ms for the NOT IN form of the query, and 41 ms for the NOT EXISTS form of the query. (In the run shown above, it's higher.) I'm guessing that this is the time spent in parsing and planning the query. Since for one query it is many times the actual run time reported by EXPLAIN ANALYZE, it has piqued my interest.
It's also interesting that this time would be so much higher for one query than the other. What is the best way to see where this time is going? (Or, if it is obvious to those more familiar with the internals, please share.)

-Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> You didn't show us the explain analyze results, > The below is cut & paste directly from a psql run without editing. OK, so the two plans do indeed have much different node execution counts. The EXPLAIN ANALYZE instrumentation overhead is basically proportional to (rows+1)*loops summed over all the nodes in the plan, so I count about 102112 node executions in the NOT IN plan versus 1145 in the NOT EXISTS plan --- in other words, 100x more overhead for the former. > The run time of the NOT IN query, as measured by elapsed time between > SELECT CURRENT_TIMESTAMP executions, increased by 31 ms. Works out to about 30 microsec per node execution, which seems a bit high for modern machines ... and the coarse quantization of the CURRENT_TIMESTAMP results is odd too. What platform is this on exactly? > That leaves an unaccounted difference between the time > reported by EXPLAIN ANALYZE and the timestamp elapsed time of (on > average) 9 ms for the NOT IN form of the query, and 41 ms for the NOT > EXISTS for of the query. (In the run shown above, it's higher.) I'm > guessing that this is the time spent in parsing and planning the query. Parse/plan time is one component, and another is the time spent by EXPLAIN preparing its output display, which is not an area we've spent any time at all optimizing --- I wouldn't be surprised if it's kinda slow. However, these plans are relatively similar in terms of the complexity of the display, so it is odd that there'd be so much difference. > What is the best way to see where this time is going? Profiling with gprof or some such tool might be educational. regards, tom lane
>>> On Thu, Mar 23, 2006 at 11:27 am, in message <10223.1143134839@sss.pgh.pa.us>, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The run time of the NOT IN query, as measured by elapsed time between
>> SELECT CURRENT_TIMESTAMP executions, increased by 31 ms.
>
> Works out to about 30 microsec per node execution, which seems a bit
> high for modern machines ... and the coarse quantization of the
> CURRENT_TIMESTAMP results is odd too. What platform is this on exactly?

This is a smaller machine with a copy of the full production database: a single 3.6 GHz Xeon with 4 GB RAM running Windows Server 2003. It was being used to test update scripts before applying them to the production machines. I stumbled across a costing issue I thought worth posting, and in the course of gathering data noticed this time difference I didn't understand.

>> What is the best way to see where this time is going?
>
> Profiling with gprof or some such tool might be educational.

Our builds are all done with --enable-debug, but this machine doesn't even have msys installed. I'll try to put together some way to profile it on this machine or some other. (It might be easier to move it to a Linux machine and confirm the problem there, then profile.)

Thanks,

-Kevin
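Before (or instead of) a full gprof run, one cheap data point on whatever Unix box the test case lands on is the raw cost of a single gettimeofday() call, since EXPLAIN ANALYZE pays roughly two of those, plus a little bookkeeping, per plan node entry. A throwaway C program along these lines (an illustrative sketch, not part of PostgreSQL) reports the per-call cost; if it comes out at only a microsecond or two, the roughly 30 microseconds per node execution seen above looks even more suspicious.

    /*
     * Throwaway check of what a single gettimeofday() call costs on a given
     * Unix box.  Not PostgreSQL code.
     */
    #include <stdio.h>
    #include <sys/time.h>

    int
    main(void)
    {
        const int       n = 1000000;
        struct timeval  before, after, scratch;
        double          elapsed;
        int             i;

        gettimeofday(&before, NULL);
        for (i = 0; i < n; i++)
            gettimeofday(&scratch, NULL);
        gettimeofday(&after, NULL);

        elapsed = (after.tv_sec - before.tv_sec)
                + (after.tv_usec - before.tv_usec) / 1e6;
        printf("%d calls in %.3f sec => %.3f usec per gettimeofday()\n",
               n, elapsed, elapsed / n * 1e6);
        return 0;
    }

Compile it with a plain cc and run it on the same machine used for the timings.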
>>> On Thu, Mar 23, 2006 at 11:27 am, in message <10223.1143134839@sss.pgh.pa.us>, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Profiling with gprof or some such tool might be educational.

I've never used gprof before, and from a quick scan of the info, it appears that I need to compile and link a special version of the software to generate the file that gprof needs. Is this correct? Does it work on a Windows build, or will I need to use Linux? Any tips?

Thanks, all.

-Kevin
On Thu, Mar 23, 2006 at 12:29:27PM -0600, Kevin Grittner wrote:
> > Works out to about 30 microsec per node execution, which seems a
> > bit high for modern machines ... and the coarse quantization of the
> > CURRENT_TIMESTAMP results is odd too. What platform is this on
> > exactly?
>
> This is a smaller machine with a copy of the full production database.
> A single 3.6 GHz Xeon with 4 GB RAM running Windows Server 2003. It was
> being used to test update scripts before applying them to the production
> machines. I stumbled across a costing issue I thought worth posting,
> and in the course of gathering data noticed this time difference I
> didn't understand.

This may be the cause of the problem (Windows). On UNIX platforms the gettimeofday() call is used to calculate the timings in both cases. On Windows, EXPLAIN ANALYZE measures time in a different way, using the CPU performance counters. It uses the performance counter interface, but it will run into issues if the CPU speed is not properly calculated or there is drift between the different CPUs. Here's one person who claims that the performance counter frequency is often wrong:

http://archives.postgresql.org/pgsql-hackers-win32/2005-03/msg00063.php

It's also been pointed out before that the code actually divides by the wrong number (it uses GetTimerFrequency() rather than QueryPerformanceFrequency()). If you can find the values of these two functions on your machine, see how they compare to your actual clock speed.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
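Along the lines of that suggestion, a small standalone Windows C program (a sanity-check sketch, not PostgreSQL code) can print the frequency QueryPerformanceFrequency() reports and time a known interval with QueryPerformanceCounter(); if the nominally one-second Sleep() comes out far from 1000 ms, the scaling applied to the counter is suspect.

    /*
     * Sanity check for the Windows high-resolution timer.  Not PostgreSQL
     * code.  Prints the reported counter frequency, then measures a
     * Sleep(1000), which should come out near 1000 ms if counter ticks are
     * being converted to time with the right frequency.
     */
    #include <stdio.h>
    #include <windows.h>

    int
    main(void)
    {
        LARGE_INTEGER   freq, start, stop;

        if (!QueryPerformanceFrequency(&freq))
        {
            printf("high-resolution performance counter not available\n");
            return 1;
        }
        printf("QueryPerformanceFrequency: %.0f ticks/sec\n",
               (double) freq.QuadPart);

        QueryPerformanceCounter(&start);
        Sleep(1000);                        /* nominally one second */
        QueryPerformanceCounter(&stop);

        printf("Sleep(1000) measured as %.1f ms\n",
               (double) (stop.QuadPart - start.QuadPart) * 1000.0 /
               (double) freq.QuadPart);
        return 0;
    }

Comparing the reported frequency against the machine's actual clock speed (3.6 GHz here) is the check Martijn describes; either a TSC-based or a slower timer frequency is fine as long as the same value is used to scale the counts.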
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > I've never used gprof before, and from a quick scan of the info, it > appears that I need to compile and link a special version of the > software to generate the file that gprof needs. Is this correct? Does > it work on a Windows build, or will I need to use Linux? Any tips? I dunno anything about profiling on Windows. If you don't mind moving the test case to Linux it's pretty easy: ./configure --enable-debug --whatever-other-optionsmake PROFILE="-pg -DLINUX_PROFILE"install postgres executable (On non-Linux Unixen you can omit that -D flag, but otherwise the recipe is the same.) Run the test case (you'll want a test script that does the same thing over and over, enough times to build up a decent set of statistics; I like to have about a minute's worth of accumulated CPU time in a profile run). Exit the session --- the gmon.out file will only be dumped at backend process exit. Then do gprof /path/to/postgres-executable $PGDATA/gmon.out >outfile BTW, in 8.1 you want to be sure to do this with autovacuum off, else exit of the autovacuum process might clobber the gmon.out file before you can run gprof. regards, tom lane
On Wed, 2006-03-22 at 21:59 -0500, Tom Lane wrote:
> I'm betting that a big part of your issue is that the EXPLAIN ANALYZE
> instrumentation overhead is (1) significant

I would like to implement an additional mode for EXPLAIN ANALYZE that does no timing instrumentation at all. Most of the time, just looking for differences between estimated and actual row counts is all you need. For long queries, the additional time can make the execution infeasible, yet those are frequently the very queries you want to see the output for.

EXPLAIN ANALYZE NOTIMING sounds a little dry; any ideas?

Best Regards, Simon Riggs
Simon Riggs <simon@2ndquadrant.com> writes:
> I would like to implement an additional mode for EXPLAIN ANALYZE that
> does no timing instrumentation at all. Most of the time just looking for
> differences between estimated and actual row counts is all you need.

I don't really agree with that premise ... without timings, you cannot for instance tell if the planner has over/underestimated the cost of an index fetch.

regards, tom lane
On Sat, Mar 25, 2006 at 10:00:51AM -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > I would like to implement an additional mode for EXPLAIN ANALYZE that
> > does no timing instrumentation at all. Most of the time just looking for
> > differences between estimated and actual row counts is all you need.
>
> I don't really agree with that premise ... without timings, you cannot
> for instance tell if the planner has over/underestimated the cost of an
> index fetch.

I agree. However, if it's the overhead of calling gettimeofday() that slows everything down, perhaps we should tackle that end. For example, have a sampling mode that only times, say, 5% of the executed nodes:

    EXPLAIN ANALYZE SAMPLE blah;

And then in InstrStart have a quick test that sometimes skips the gettimeofday for this iteration. You'd probably need some heuristics, because you always want to catch the first iteration, but after the 10,000th tuple in an indexscan you're probably not going to learn anything new. You could do a non-random sampling fairly easily:

    if( ntuples < 16 )
        yes
    else if( ntuples < 16*16 && (ntuples%16) == 0 )
        yes
    else if( ntuples < 16*16*16 && ntuples%(16*16) == 0 )
        yes
    else
        etc etc etc

This means that the more often a node is executed, the less often you actually time it. Note, we store ntuples as a double so the mod operation won't work... How does this sound?

--
Martijn van Oosterhout <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
On Sat, Mar 25, 2006 at 04:24:05PM +0100, Martijn van Oosterhout wrote:
> I agree. However, if it's the overhead of calling gettimeofday() that
> slows everything down, perhaps we should tackle that end. For example,
> have a sampling mode that only times say 5% of the executed nodes.
>
> EXPLAIN ANALYZE SAMPLE blah;

<snip>

> You could do a non-random sampling fairly easily:

Actually, I thought of a way to control the sampling that's probably better than the crude if-then-else structure I gave, and which also takes advantage of the fact that the numbers are floating point:

    InstrInit:
        next_sample = 0;

    InstrStart:
        if( ntuples < 16 )
            dosample = yes;
        else if( ntuples > next_sample )
        {
            dosample = yes;
            next_sample += log2(ntuples);
        }
        else
            dosample = no;

This way only about one in every log2(ntuples) executions actually gets timed. You could use log10(), sqrt() or any other function you find reflects the sample you want.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
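Written out as a standalone C program (hypothetical names; this is not the PostgreSQL instrumentation code), the heuristic looks like the sketch below. For 100,000 executions of a node it ends up timing only a small fraction of them, with the gap between timed executions growing as log2 of the running count.

    /*
     * Standalone sketch of the sampling heuristic described above.  Not
     * actual PostgreSQL code: it only counts how many node executions would
     * be timed under the proposed rule.
     */
    #include <math.h>
    #include <stdio.h>

    typedef struct SampleState
    {
        double  ntuples;        /* executions of this node so far */
        double  next_sample;    /* execution count at which to time again */
    } SampleState;

    static int
    should_sample(SampleState *s)
    {
        int     do_sample;

        if (s->ntuples < 16)
            do_sample = 1;              /* always time the first few */
        else if (s->ntuples > s->next_sample)
        {
            do_sample = 1;
            s->next_sample += log2(s->ntuples);     /* back off as count grows */
        }
        else
            do_sample = 0;

        s->ntuples += 1;
        return do_sample;
    }

    int
    main(void)
    {
        SampleState s = {0.0, 0.0};
        long        timed = 0;
        long        i;

        for (i = 0; i < 100000; i++)
            if (should_sample(&s))
                timed++;

        /* only a small fraction of the 100000 executions end up timed */
        printf("timed %ld of 100000 node executions\n", timed);
        return 0;
    }

Scaling the sampled timings back up to an estimate for all executions would be the remaining piece, which isn't sketched here.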
On Sat, 2006-03-25 at 16:24 +0100, Martijn van Oosterhout wrote:
> I agree. However, if it's the overhead of calling gettimeofday() that
> slows everything down, perhaps we should tackle that end. For example,
> have a sampling mode that only times say 5% of the executed nodes.
>
> EXPLAIN ANALYZE SAMPLE blah;

I like this idea. Why not do this all the time? I'd say we don't need the SAMPLE clause at all; just do this for all EXPLAIN ANALYZEs.

> And then in InstrStart have a quick test that skips the gettimeofday
> for this iteration sometimes. You'd probably need some heuristics
> because you always want to catch the first iteration but after the
> 10,000th tuple in an indexscan, you're probably not going to learn
> anything new.
>
> How does this sound?

Something even simpler? First 40, plus a 5% random sample after that? I'd prefer a random sample so we have the highest level of trust in the numbers produced. Otherwise we might accidentally introduce bias from systematic effects, such as nested loop queries speeding up towards the end of their run. (I know we would do that at the start, but we are stuck because we don't know the population size ahead of time, and we know we need a reasonable number of data points.)

Best Regards, Simon Riggs
On Sat, Mar 25, 2006 at 05:38:26PM +0000, Simon Riggs wrote:
> On Sat, 2006-03-25 at 16:24 +0100, Martijn van Oosterhout wrote:
> > I agree. However, if it's the overhead of calling gettimeofday() that
> > slows everything down, perhaps we should tackle that end. For example,
> > have a sampling mode that only times say 5% of the executed nodes.
> >
> > EXPLAIN ANALYZE SAMPLE blah;
>
> I like this idea. Why not do this all the time? I'd say we don't need
> the SAMPLE clause at all, just do this for all EXPLAIN ANALYZEs.

I was wondering about that. But then you may run into weird results if a subselect takes a long time for just a few values. Maybe it should be the default, though, with a FULL mode to say you want to measure everything.

> Something even simpler? First 40 plus 5% random sample after that? I'd
> prefer a random sample so we have the highest level of trust in the
> numbers produced. Otherwise we might accidentally introduce bias from
> systematic effects such as nested loops queries speeding up towards the
> end of their run. (I know we would do that at the start, but we are
> stuck because we don't know the population size ahead of time and we
> know we need a reasonable number of data points).

Well, I was wondering whether a fixed percentage is appropriate. 5% of 10 million is still a lot of timing calls for possibly not much benefit. My follow-up email suggested sampling that happens less and less often as the number of tuples increases, in a logarithmic way. But we could add some randomness; that'd be cool. The question is, what's the overhead of calling random()?

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.