Thread: not exactly a bug report, but surprising behaviour
It seems the sort operation is done very late in the process, after functions in the select column list have already been called and the results stored. This makes using sequences to generate a sequential list of numbers behave surprisingly.

I don't know what the standards say about this; I imagine they don't have anything relevant because sequences are nonstandard. But I'm pretty sure I recall doing queries like this all the time in Oracle and not seeing behaviour like this.

Note that this makes the resulting data different depending on whether or not the index exists. I see the same behaviour with INSERT (...) (SELECT...) so it's not just CREATE TABLE AS doing it.

As I said I'm not sure this is wrong, just wanted to mention it in case it's not intentional.

slo=> create sequence w;
CREATE SEQUENCE
slo=> create table w1 (w integer);
CREATE TABLE
slo=> insert into w1 values (1);
INSERT 229135376 1
slo=> insert into w1 values (2);
INSERT 229135377 1
slo=> insert into w1 values (3);
INSERT 229135378 1
slo=> create table w2 as (select nextval('w'),w from w1 order by w desc);
SELECT
slo=> select * from w2;
 nextval | w
---------+---
       3 | 3
       2 | 2
       1 | 1
(3 rows)

slo=> create index idx_w on w1(w);
CREATE INDEX
slo=> set enable_seqscan = off;
SET
slo=> create table w3 as (select nextval('w'),w from w1 order by w desc);
SELECT
slo=> select * from w3;
 nextval | w
---------+---
       4 | 3
       5 | 2
       6 | 1
(3 rows)

-- 
greg
On Tue, Feb 04, 2003 at 14:11:20 -0500, Gregory Stark <gsstark@mit.edu> wrote:
> It seems the sort operation is done very late in the process, after functions
> in the select column list have already been called and the results stored.
> This makes using sequences to generate a sequential list of numbers behave
> surprisingly.

You can't count on that behavior anyway. Sequences only guarantee a unique number. Other transactions can grab values from the sequence while your transaction is running.
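(For illustration, a hypothetical interleaving of two sessions on the same sequence, showing why one session's nextval() values need not be consecutive; the session labels are made up:)

-- session A: select nextval('w');   -- returns 1
-- session B: select nextval('w');   -- returns 2
-- session A: select nextval('w');   -- returns 3; session A ends up with 1 and 3, not a consecutive run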
On 4 Feb 2003, Gregory Stark wrote:

> It seems the sort operation is done very late in the process, after functions
> in the select column list have already been called and the results stored.
> This makes using sequences to generate a sequential list of numbers behave
> surprisingly.

Select list entries are done before order by since you can order by the output of a select list entry.

> slo=> create sequence w;
> CREATE SEQUENCE
> slo=> create table w1 (w integer);
> CREATE TABLE
> slo=> insert into w1 values (1);
> INSERT 229135376 1
> slo=> insert into w1 values (2);
> INSERT 229135377 1
> slo=> insert into w1 values (3);
> INSERT 229135378 1
> slo=> create table w2 as (select nextval('w'),w from w1 order by w desc);
> SELECT
> slo=> select * from w2;
>  nextval | w
> ---------+---
>        3 | 3
>        2 | 2
>        1 | 1
> (3 rows)

I think the query that would give you what you want in this case is something like the following, to force the order before doing the nextvals:

  create table w2 as
    select nextval('w'), w
    from (select w from w1 order by w desc) w1;

However, I'm not sure that you can count on that working in all cases either.
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:

> Select list entries are done before order by since you can order by the
> output of a select list entry.

I understood that; in fact I can't quite figure out how Oracle could possibly do it differently. Perhaps if the sort refers only to data available immediately it does the sort before calculating the output columns. This would have a couple of big advantages:

1) If rows were excluded from the results by limit/offset clauses then possibly expensive functions in the select list wouldn't have to be calculated. It also means that if the sort is quick but the functions slow, the first rows would be returned quickly to the application even if the total time was the same.

2) The sort could be done on pointers to the tuples rather than pushing the data in the tuple around in the sort. Obviously the transaction integrity issues are tricky with this, but it could save a lot of memory pressure from sorts.

On the other hand, then you get two different behaviours depending on whether any output columns are listed in the order clause, which is a nonobvious side effect.

> I think the query that would give you what you want in this case is
> something like the following to force the order before doing the nextvals:

I don't need a solution for my purposes. I just created the table and then did an update. Of course that isn't guaranteed either since it depends on the physical layout of the records and the behaviour of update, but it worked.

-- 
greg
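(For illustration, a minimal sketch of the create-then-update approach Greg describes; the column name n is made up, and as he says the order in which the UPDATE visits rows is not guaranteed:)

  create table w2 as select w from w1 order by w desc;
  alter table w2 add column n integer;
  -- assigns numbers in whatever physical order the update scans the rows;
  -- here that happened to match the order the rows were just written in,
  -- but nothing guarantees it
  update w2 set n = nextval('w');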
On 4 Feb 2003, Greg Stark wrote:

> Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
>
> > Select list entries are done before order by since you can order by the
> > output of a select list entry.
>
> I understood that, in fact I can't quite figure out how Oracle could possibly
> do it differently. Perhaps if the sort refers only to data available
> immediately it does the sort before calculating the output columns.

Does it always do it when the sort is only on input columns? Or only in some cases? It might be interesting to see. For example, the same sort of thing with an index existing/not existing might make a difference to the plan chosen.

> This would have a couple big advantages:
>
> 1) If rows were excluded from the results by limit/offset clauses then
>    possibly expensive functions in the select list wouldn't have to be
>    calculated. It also means if the sort is quick but the functions slow that
>    the first rows would be returned quickly to the application even if the
>    total time was the same.

That's a useful advantage. You can probably get this effect from pushing clauses into from-list subselects as a current optimization, but it'd be nice since that's not an obvious conversion.

> 2) The sort could be done on pointers to the tuples rather than pushing the
>    data in the tuple around in the sort. Obviously the transaction integrity
>    issues are tricky with this though. But it could save a lot of memory
>    pressure from sorts.

This'd be an advantage in the case of a single table. Unless you can guarantee the join keeps the ordering properties, I don't think this'd help with anything that's doing joins or anything more complicated.
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:

> > This would have a couple big advantages:
> >
> > 1) If rows were excluded from the results by limit/offset clauses then
> >    possibly expensive functions in the select list wouldn't have to be
> >    calculated. It also means if the sort is quick but the functions slow that
> >    the first rows would be returned quickly to the application even if the
> >    total time was the same.
>
> That's a useful advantage. You can probably get this effect from pushing
> clauses into from-list subselects as a current optimization, but it'd be
> nice since that's not an obvious conversion.

Not if the clause depends on the sort results, such as limit/offset or having clauses.

> > 2) The sort could be done on pointers to the tuples rather than pushing the
> >    data in the tuple around in the sort. Obviously the transaction integrity
> >    issues are tricky with this though. But it could save a lot of memory
> >    pressure from sorts.
>
> This'd be an advantage in the case of a single table. Unless you can
> guarantee the join keeps the ordering properties, I don't think this'd
> help with anything that's doing joins or anything more complicated.

Joins shouldn't be a problem: just sort a list of pointers to records and have a structure on the side saying which tables each pointer corresponds to and which columns and expressions are requested from each. I'm not sure how to retain transactional integrity with this though.

The tricky part would be evaluating when it would help. When sorting many records with a narrow result set it wouldn't pay to generate pointers that would be just as large as the original data and possibly push the original data out of cache. But for sorting a limited number of records of many columns each it would be way more efficient.

I'm watching this query I'm working on now drive the cpu to 100% for 6+ hours with virtually no I/O. And I think it's the best I can do. All the time is being spent moving bits around from one place to another. I'm entirely blocked on memory throughput right now. The less data actually pushed around, the faster sorts will be.

It occurs to me that it's possible postgres is doing this already at a lower level of abstraction. The sort engine could load all the records to sort into an array and generate a list of indexes into that array and sort the indexes. But I get the impression from the previous conversation that that's not what's happening.

I'm constantly surprised by how much work there is to build a good database engine. But I'm also constantly amazed by how much postgres is already doing. It's really quite remarkable how complete it is.

-- 
greg
Greg Stark <gsstark@mit.edu> writes:
> I'm watching this query I'm working on now drive the cpu to 100% for 6+ hours
> with virtually no I/O. And I think it's the best I can do. All the time is
> being spent moving bits around from one place to another.

It would be interesting to see a gprof profile of that.

> It occurs to me that it's possible postgres is doing this already at a lower
> level of abstraction.

Yes, tuplesort.c has heard of pushing pointers around rather than copying records. I wonder though whether data is being pushed out to kernel disk buffers and then back in again --- do you have sort_mem set large enough?

Another likely theory is that the interface layers needed to access datatype-specific comparison routines are chewing the cycles. Need facts not speculation to know where the bottleneck is...

			regards, tom lane
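(For illustration, a minimal sketch of the sort_mem check being suggested here. In PostgreSQL of this era the setting is per-backend and measured in kilobytes, so 65536 corresponds roughly to the 64M mentioned later in the thread; the exact value is just an example:)

  show sort_mem;
  set sort_mem = 65536;   -- about 64MB, for this session only
  -- then re-run the query and watch whether pgsql_tmp files appear under the
  -- database's directory; if they do, the sort is still spilling to disk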
On 5 Feb 2003, Greg Stark wrote:

> Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
>
> > > This would have a couple big advantages:
> > >
> > > 1) If rows were excluded from the results by limit/offset clauses then
> > >    possibly expensive functions in the select list wouldn't have to be
> > >    calculated. It also means if the sort is quick but the functions slow that
> > >    the first rows would be returned quickly to the application even if the
> > >    total time was the same.
> >
> > That's a useful advantage. You can probably get this effect from pushing
> > clauses into from-list subselects as a current optimization, but it'd be
> > nice since that's not an obvious conversion.
>
> Not if the clause depends on the sort results, such as limit/offset or having
> clauses.

Sorry, I thought we were speaking specifically of the

  select expensive_function(...), col1, ...
  from table1, ...
  order by col1, ...
  limit ...

case, which probably can safely be written into

  select expensive_function(...), col1, ...
  from (select col1, ... from table1, ... order by col1, ... limit ...) foo;

> > > 2) The sort could be done on pointers to the tuples rather than pushing the
> > >    data in the tuple around in the sort. Obviously the transaction integrity
> > >    issues are tricky with this though. But it could save a lot of memory
> > >    pressure from sorts.
> >
> > This'd be an advantage in the case of a single table. Unless you can
> > guarantee the join keeps the ordering properties, I don't think this'd
> > help with anything that's doing joins or anything more complicated.
>
> Joins shouldn't be a problem, just sort a list of pointers to records and have
> a structure on the side saying which tables each pointer corresponds to and
> which columns and expressions are requested from each. I'm not sure how to
> retain transactional integrity with this though.

Hmm, are you talking about pointers to the original tables or pointers to the output of the last step? The latter should be fine, but the former is problematic I'd think.
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Greg Stark <gsstark@mit.edu> writes:
> > I'm watching this query I'm working on now drive the cpu to 100% for 6+ hours
> > with virtually no I/O. And I think it's the best I can do. All the time is
> > being spent moving bits around from one place to another.
>
> It would be interesting to see a gprof profile of that.

Ok, I have a copy of postgres built with -pg now. I'm running gprof on the gmon.out file but all I'm seeing is this:

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
100.00      0.01     0.01        1    10.00    10.00  InitFreeSpaceMap
  0.00      0.01     0.00    12296     0.00     0.00  LWLockAssign
  0.00      0.01     0.00      323     0.00     0.00  GUC_yylex
  0.00      0.01     0.00      123     0.00     0.00  ShmemAlloc
  0.00      0.01     0.00      123     0.00     0.00  tas
  0.00      0.01     0.00       79     0.00     0.00  AllocSetFreeIndex
  ...

I assume this is only the postmaster initialization, as I've run a smaller version of the job which took 8 minutes. Is there an easy trick to getting gmon.out output from a specific backend process?

> > It occurs to me that it's possible postgres is doing this already at a lower
> > level of abstraction.
>
> Yes, tuplesort.c has heard of pushing pointers around rather than
> copying records. I wonder though whether data is being pushed out to
> kernel disk buffers and then back in again --- do you have sort_mem
> set large enough?

Yes, this is the job I asked about sort_mem for. I have it up to 64M and no pgsql_tmp files are being created.

> Another likely theory is that the interface layers needed to access
> datatype-specific comparison routines are chewing the cycles. Need facts not
> speculation to know where the bottleneck is...

-- 
greg
Greg Stark <gsstark@mit.edu> writes:
> Ok, I have a copy of postgres built with -pg now. I'm running gprof on the
> gmon.out file but all I'm seeing is this:
> I assume this is only the postmaster initialization as I've run a smaller
> version of the job which took 8 minutes. Is there an easy trick to getting
> gmon.out output from a specific backend process?

Two things: one, backends dump gmon.out into $PGDATA/base/yourdb/, whereas the postmaster dumps into wherever you started it (it never does chdir IIRC). Two, if you're on Linux you must compile postmaster.c with -DLINUX_PROFILE, or you won't get any profiling timer interrupts in child processes.

Your sample looks like the second problem...

			regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Two things: one, backends dump gmon.out into $PGDATA/base/yourdb/,
> whereas the postmaster dumps into wherever you started it (it never does
> chdir IIRC). Two, if you're on Linux you must compile postmaster.c with
> -DLINUX_PROFILE, or you won't get any profiling timer interrupts in
> child processes.
>
> Your sample looks like the second problem...

Actually I had both problems, I didn't realize the backends did a chdir. I cancelled it partway through.

  %   cumulative   self              self     total
 time   seconds   seconds     calls   s/call   s/call  name
16.59      97.88    97.88  69608411     0.00     0.00  ExecMakeFunctionResult
 7.08     139.65    41.77  79472258     0.00     0.00  comparetup_heap
 4.50     166.17    26.52 192211935     0.00     0.00  ExecEvalExpr
 3.94     189.41    23.24 155884536     0.00     0.00  AllocSetAlloc
 2.57     204.59    15.18  69608411     0.00     0.00  ExecEvalFuncArgs
 2.57     219.76    15.17  17360796     0.00     0.00  heap_formtuple
 2.25     233.03    13.27 248222443     0.00     0.00  AllocSetFreeIndex
 2.22     246.11    13.08  17360724     0.00     0.00  ExecTargetList
 2.17     258.92    12.81  45871623     0.00     0.00  ExecEvalParam
 1.99     270.65    11.73  17360796     0.00     0.00  DataFill
 1.82     281.37    10.72  23020517     0.00     0.00  AllocSetReset
 1.57     290.62     9.25  29962044     0.00     0.00  ExecEvalVar
 1.29     298.20     7.58  17360796     0.00     0.00  ComputeDataSize
 1.17     305.09     6.89  92337151     0.00     0.00  AllocSetFree
 1.13     311.74     6.64  97183879     0.00     0.00  myFunctionCall2
 1.11     318.27     6.54 155883526     0.00     0.00  MemoryContextAlloc
 1.07     324.57     6.30  97183879     0.00     0.00  inlineApplySortFunction
 1.05     330.75     6.17  17285775     0.00     0.00  newNode

-- 
greg
On Wed, 2003-02-05 at 15:49, Greg Stark wrote:
>   %   cumulative   self              self     total
>  time   seconds   seconds     calls   s/call   s/call  name
> 16.59      97.88    97.88  69608411     0.00     0.00  ExecMakeFunctionResult
>  7.08     139.65    41.77  79472258     0.00     0.00  comparetup_heap
>  4.50     166.17    26.52 192211935     0.00     0.00  ExecEvalExpr

(Semi OT)

Annoyingly enough, I get similarly strange gprof output (all zeros in the two right hand columns) on Debian -- PostgreSQL compiled with -DLINUX_PROFILE -pg using gcc 3.2.2pre, glibc 2.3.1, binutils 2.13.90.0.16. I haven't been able to figure out what the problem is, though.

If anyone has any suggestions on how to fix this, please speak up.

Cheers,

Neil

-- 
Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC
Neil Conway <neilc@samurai.com> writes:

> On Wed, 2003-02-05 at 15:49, Greg Stark wrote:
> >   %   cumulative   self              self     total
> >  time   seconds   seconds     calls   s/call   s/call  name
> > 16.59      97.88    97.88  69608411     0.00     0.00  ExecMakeFunctionResult
> >  7.08     139.65    41.77  79472258     0.00     0.00  comparetup_heap
> >  4.50     166.17    26.52 192211935     0.00     0.00  ExecEvalExpr
>
> (Semi OT)
>
> Annoyingly enough, I get similarly strange gprof output (all zeros in
> the two right hand columns) on Debian -- PostgreSQL compiled with
> -DLINUX_PROFILE -pg using gcc 3.2.2pre, glibc 2.3.1, binutils
> 2.13.90.0.16. I haven't been able to figure out what the problem is,
> though.
>
> If anyone has any suggestions on how to fix this, please speak up.

Well this is Debian also, but I don't think that's related. Those two columns are "seconds per call". Look at how many calls those functions have; they take very little time per call, much less than 1/100th of a second.

-- 
greg
Greg Stark <gsstark@mit.edu> writes:
> Neil Conway <neilc@samurai.com> writes:
>> On Wed, 2003-02-05 at 15:49, Greg Stark wrote:
>>>   %   cumulative   self              self     total
>>>  time   seconds   seconds     calls   s/call   s/call  name
>>> 16.59      97.88    97.88  69608411     0.00     0.00  ExecMakeFunctionResult
>>>  7.08     139.65    41.77  79472258     0.00     0.00  comparetup_heap
>>>  4.50     166.17    26.52 192211935     0.00     0.00  ExecEvalExpr
>>
>> Annoyingly enough, I get similarly strange gprof output (all zeros in
>> the two right hand columns) on Debian --

> Well this is Debian also, but I don't think that's related.

Right. Zeroes in the per-call columns are not unexpected. If you get zeroes in the "self seconds" or "calls" fields then you have a measurement issue ... but what we're seeing here is just ye olde death of a thousand cuts, viz a lot of calls on routines that don't take very long in any one call.

It is annoying that ExecMakeFunctionResult seems to be chewing so much CPU time, since it's nothing more than an interface routine. But offhand I have no ideas on how to improve the situation.

			regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Greg Stark <gsstark@mit.edu> writes:
> > Neil Conway <neilc@samurai.com> writes:
> >> On Wed, 2003-02-05 at 15:49, Greg Stark wrote:
> >>>   %   cumulative   self              self     total
> >>>  time   seconds   seconds     calls   s/call   s/call  name
> >>> 16.59      97.88    97.88  69608411     0.00     0.00  ExecMakeFunctionResult
> >>>  7.08     139.65    41.77  79472258     0.00     0.00  comparetup_heap
> >>>  4.50     166.17    26.52 192211935     0.00     0.00  ExecEvalExpr
> >>
> >> Annoyingly enough, I get similarly strange gprof output (all zeros in
> >> the two right hand columns) on Debian --
>
> > Well this is Debian also, but I don't think that's related.
>
> Right. Zeroes in the per-call columns are not unexpected. If you get
> zeroes in the "self seconds" or "calls" fields then you have a
> measurement issue ... but what we're seeing here is just ye olde death
> of a thousand cuts, viz a lot of calls on routines that don't take very
> long in any one call.

Though it looks from the legend as if these columns really ought to be "ms/call" not "s/call". Even so they would still be 0.00. It looks like ExecMakeFunctionResult is taking about 1.6us/call.

> It is annoying that ExecMakeFunctionResult seems to be chewing so much
> CPU time, since it's nothing more than an interface routine. But
> offhand I have no ideas on how to improve the situation.

Does ExecMakeFunctionResult have to do with plpgsql functions? Or internal functions? I have a normal function that's being used in the query. It in turn calls sqrt, pow, and cos. Should I inline my function manually? Should I do ... * ... instead of pow(..., 2) to calculate the square?

But I think this result is deceptive. It's still only 17% of the time. There doesn't seem to be a way to get the linux gprof to display the total time per function including children. That would indicate the place where algorithmic changes would have the biggest impact.

-- 
greg
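(For illustration, a hypothetical sketch of the manual rewrite Greg is asking about -- the function and its body are made up, not his actual query; it just shows pow($1, 2) replaced by plain multiplication:)

  -- made-up SQL function using pow() for squaring
  create function dist(float8, float8) returns float8 as
    'select sqrt(pow($1, 2) + pow($2, 2))' language sql;

  -- the same function with the squaring hand-inlined as multiplication
  create function dist_inlined(float8, float8) returns float8 as
    'select sqrt($1 * $1 + $2 * $2)' language sql;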
Greg Stark <gsstark@mit.edu> writes:
> But I think this result is deceptive. It's still only 17% of the time. There
> doesn't seem to be a way to get the linux gprof to display the total time per
> function including children.

Er ... did you look all the way through the output? The standard gprof output shows the flat timings you quoted followed by a call-tree report.

			regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Er ... did you look all the way through the output? The standard gprof
> output shows the flat timings you quoted followed by a call-tree report.

Well, yes but I wasn't sure what to look for. It sort of looks to me like gprof chose an odd place to root the recursive cycle which makes the result confusing. At least to me.

index % time    self  children    called     name
[1]     88.3  194.70  326.46     228+534992216  <cycle 2 as a whole> [1]
               97.88   35.00    69608411            ExecMakeFunctionResult <cycle 2> [5]
                3.68  121.86     5731762            postquel_execute <cycle 2> [6]
               13.08   39.77    17360724            ExecTargetList <cycle 2> [17]
               26.52   23.01   192211935            ExecEvalExpr <cycle 2> [18]
                1.74   32.89     6127527            ExecScan <cycle 2> [23]
                0.67   30.64      448463            ExecUnique <cycle 2> [26]
               15.18    0.00    69608411            ExecEvalFuncArgs <cycle 2> [47]
                4.16    8.86    11463396            ExecEvalArrayRef <cycle 2> [54]
                2.74    8.95     5731958            ExecutePlan <cycle 2> [60]
                1.48    8.56     5412663            ExecSort <cycle 2> [65]
                2.70    7.04    17360724            ExecProject <cycle 2> [66]
                3.11    3.13     5560629            ExecNestLoop <cycle 2> [86]
                2.00    3.23     5731890            ExecResult <cycle 2> [93]
                5.20    0.00    23281174            ExecProcNode <cycle 2> [94]
                4.16    0.05    35012021            ExecEvalFunc <cycle 2> [104]
                3.28    0.34     5731762            fmgr_sql <cycle 2> [113]
                3.21    0.03    34596390            ExecEvalOper <cycle 2> [117]
                1.18    1.70     5731958            ExecutorRun <cycle 2> [122]
                1.12    0.00     5731762            postquel_getnext <cycle 2> [152]
                0.47    0.35     5748363            ExecQual <cycle 2> [167]
                0.81    0.00     5678894            ExecIndexScan <cycle 2> [168]
                0.00    0.49        2116            make_op <cycle 2> [182]
                0.13    0.32       59133            ExecIndexReScan <cycle 2> [187]
                0.12    0.00       59133            ExecReScan <cycle 2> [217]
                0.00    0.05        1054            ParseFuncOrColumn <cycle 2> [244]
                0.05    0.00      448463            SubqueryNext <cycle 2> [245]
                0.00    0.04         184            pg_parse_and_rewrite <cycle 2> [263]
                0.01    0.02        4657            simplify_op_or_func <cycle 2> [266]
                0.00    0.03        1005            coerce_to_target_type <cycle 2> [270]
                0.00    0.03        1735            coerce_type <cycle 2> [276]
                0.00    0.03        9859            expression_tree_mutator <cycle 2> [277]
                0.00    0.02         759            typecast_expression <cycle 2> [291]
                0.01    0.01     198+409            set_plan_references <cycle 2> [292]
                0.01    0.00      448463            ExecSubqueryScan <cycle 2> [313]
                0.00    0.00       60857            ExecEvalExprSwitchContext <cycle 2> [350]
                0.00    0.00         242            transformArraySubscripts <cycle 2> [366]
                0.00    0.00         184            init_sql_fcache <cycle 2> [373]
                0.00    0.00    446+6484            transformExpr <cycle 2> [375]
                0.00    0.00       10680            eval_const_expressions_mutator <cycle 2> [377]
                0.00    0.00         198            transformSelectStmt <cycle 2> [387]
                0.00    0.00           5            set_base_rel_pathlists <cycle 2> [391]
                0.00    0.00         202            pg_analyze_and_rewrite <cycle 2> [395]
                0.00    0.00           5            make_one_rel <cycle 2> [408]
                0.00    0.00           5            subplanner <cycle 2> [417]
                0.00    0.00         202            parse_analyze <cycle 2> [438]
                0.00    0.00         199            query_planner <cycle 2> [440]
                0.00    0.00         184            init_execution_state <cycle 2> [450]
                0.00    0.00         200            transformTargetEntry <cycle 2> [453]
                0.00    0.00         198            transformTargetList <cycle 2> [463]
                0.00    0.00         198            planner <cycle 2> [489]
                0.00    0.00           1            transformInsertStmt <cycle 2> [527]
                0.00    0.00         199            grouping_planner <cycle 2> [536]
                0.00    0.00         597            preprocess_expression <cycle 2> [538]
                0.00    0.00         199            subquery_planner <cycle 2> [540]
                0.00    0.00           1            ExecAgg <cycle 2> [557]
                0.00    0.00           7            create_plan <cycle 2> [562]
                0.00    0.00          11            transformColumnRef <cycle 2> [571]
                0.00    0.00           1            set_subquery_pathlist <cycle 2> [610]
                0.00    0.00          27            join_references_mutator <cycle 2> [637]
                0.00    0.00       199+8            pull_up_subqueries <cycle 2> [651]
                0.00    0.00         396            transformWhereClause <cycle 2> [667]
                0.00    0.00           1            create_nestloop_plan <cycle 2> [708]
                0.00    0.00           8            ResolveNew_mutator <cycle 2> [757]
                0.00    0.00         203            transformStmt <cycle 2> [860]
                0.00    0.00           1            set_uppernode_references <cycle 2> [887]
                0.00    0.00           1            create_join_plan <cycle 2> [919]
                0.00    0.00        4232            make_operand <cycle 2> [994]
                0.00    0.00        1837            eval_const_expressions <cycle 2> [1005]
                0.00    0.00        1266            transformIndirection <cycle 2> [1008]
                0.00    0.00        1054            make_arguments <cycle 2> [1015]
                0.00    0.00       199+6            preprocess_qual_conditions <cycle 2> [1069]
                0.00    0.00         198            pg_plan_query <cycle 2> [1072]
                0.00    0.00         170            ExecSeqScan <cycle 2> [1084]
                0.00    0.00          33            replace_correlation_vars_mutator <cycle 2> [1149]
                0.00    0.00           5            join_references <cycle 2> [1220]
                0.00    0.00           3            replace_vars_with_subplan_refs_mutator <cycle 2> [1276]
                0.00    0.00           3            SS_replace_correlation_vars <cycle 2> [1266]
                0.00    0.00           3            ResolveNew <cycle 2> [1265]
                0.00    0.00           2            replace_vars_with_subplan_refs <cycle 2> [1325]
                0.00    0.00         1+1            resolvenew_in_jointree <cycle 2> [1449]
                0.00    0.00           1            set_join_references <cycle 2> [1453]
-----------------------------------------------
                0.00   23.77      11/201        _SPI_execute [11]
                0.00  410.53     190/201        _SPI_execute_plan [4]
[2]     73.6    0.00  434.30     201            _SPI_pquery [2]
              162.25  272.05     190/228        ExecutorRun <cycle 2> [122]
                0.00    0.00     190/5731956    ExecutorStart [8]
                0.00    0.00     189/5731953    ExecutorEnd [38]
                0.00    0.00     127/130        _SPI_checktuples [4917]
-----------------------------------------------
                                                <spontaneous>
[3]     69.6    0.00  410.54                    SPI_execp [3]
                0.00  410.53     190/190        _SPI_execute_plan [4]
                0.00    0.00     190/212        _SPI_copy_plan [365]
                0.00    0.00     189/203        _SPI_end_call [439]
                0.00    0.00     190/215        _SPI_begin_call [788]
-----------------------------------------------
                0.00  410.53     190/190        SPI_execp [3]
[4]     69.6    0.00  410.53     190            _SPI_execute_plan [4]
                0.00  410.53     190/201        _SPI_pquery [2]
                0.00    0.00     379/395        CommandCounterIncrement [454]
                0.00    0.00     190/441        CreateExecutorState [443]
                0.00    0.00     190/389        CreateQueryDesc [535]
                0.00    0.00     189/155883526  MemoryContextAlloc [21]

-- 
greg
Greg Stark <gsstark@mit.edu> writes:
> Well, yes but I wasn't sure what to look for. It sort of looks to me like
> gprof chose an odd place to root the recursive cycle which makes the result
> confusing. At least to me.

_SPI_execute_plan? I thought you said this was a simple SELECT...ORDER BY query.

			regards, tom lane