Thread: Large arrays give long lag on server side before command executes
I'm calling one stored procedure with a prepared statement on the server with 6 arrays of around 1200 elements each as parameters. The parameters are around 220K in total.

This is taking a surprising amount of time, so I put a lot of logging into the application and into the stored procedure that's getting called. It seems that almost all of the time is spent before the stored procedure is even entered.

What raises my suspicion that this is not as it should be, is that the time spent before the prepared statement starts is as long as or longer than the time spent inside the stored procedure: doing 1200 inserts into a table, a fairly complex outer join with that data, and a loop over the roughly 1200 resulting records.

Enabling lots of logging gives me:

2004-09-22 20:27:29 [5812] LOG:  REWRITER STATISTICS
DETAIL:  ! system usage stats:
!       0.000004 elapsed 0.000000 user 0.000000 system sec
!       [11.850198 user 0.215967 sys total]
!       0/0 [0/0] filesystem blocks in/out
!       0/0 [0/3001] page faults/reclaims, 0 [0] swaps
!       0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
!       0/0 [132/209] voluntary/involuntary context switches
!       buffer usage stats:
!       Shared blocks:          0 read,          0 written, buffer hit rate = 0.00%
!       Local  blocks:          0 read,          0 written, buffer hit rate = 0.00%
!       Direct blocks:          0 read,          0 written
2004-09-22 20:27:29 [5812] DEBUG:  ProcessUtility
2004-09-22 20:27:29 [5812] DEBUG:  CommitTransactionCommand
2004-09-22 20:27:29 [5812] LOG:  duration: 1.982 ms
2004-09-22 20:27:30 [5812] DEBUG:  StartTransactionCommand
2004-09-22 20:27:35 [5812] NOTICE:  Recursive status for begun at 2004-09-22 20:27:35.371659
2004-09-22 20:27:39 [5812] LOG:  PARSER STATISTICS
DETAIL:  ! system usage stats:
!       0.000119 elapsed 0.000000 user 0.000000 system sec
!       [20.825833 user 0.340948 sys total]
!       0/0 [0/0] filesystem blocks in/out
!       0/0 [0/3590] page faults/reclaims, 0 [0] swaps
!       0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
!       0/0 [273/362] voluntary/involuntary context switches
!       buffer usage stats:
!       Shared blocks:          0 read,          0 written, buffer hit rate = 0.00%
!       Local  blocks:          0 read,          0 written, buffer hit rate = 0.00%
!       Direct blocks:          0 read,          0 written
CONTEXT:  SQL function "get_asset_guid_safe" during startup
        PL/pgSQL function "maint_status_recursive" line 198 at assignment

What I find interesting is how five seconds pass between the 20:27:30 debug line, where it starts calling my procedure, and the log line I print out at the top of the procedure at 20:27:35.

Any suggestions on how I go about finding the bottleneck here? What tools do other people use for profiling on Linux?

This might sound like a "then don't do that" situation. The data is stuff that will be inserted into a temporary table, so I'd rather not go a different route, but I'm considering switching to a lot of insert statements or a COPY ... FROM.

Regards,

d.
--
David Helgason,
Business Development et al.,
Over the Edge I/S (http://otee.dk)
Direct line +45 2620 0663
Main line +45 3264 5049
Probably should have added the prototypes:

-- the prepared statement
prepare maint_status_recursive(text, integer[], character(32)[], character(32)[], text[], integer[], bytea[]) as
    select * from maint_status_recursive($1,$2,$3,$4,$5,$6,$7);

-- the stored procedure
create or replace function maint_status_recursive(text, integer[], character(32)[], character(32)[], text[], integer[], bytea[])
    returns setof maint_status_type
    language plpgsql

Looking at the processes on the machine, I see the process doing BIND during those seconds. As mentioned, I'm sending 1200-element arrays as all the array parameters, and the parsing is taking an amazing 5+ seconds on an unloaded server... it is a 300MHz Mandrake server, but 5+ seconds still seems extreme for parsing and building some arrays.

I'd be really thankful for pointers to solutions or code to look at.

d.
--
David Helgason,
Business Development et al.,
Over the Edge I/S (http://otee.dk)
Direct line +45 2620 0663
Main line +45 3264 5049

On 24. sep 2004, at 11:09, David Helgason wrote:
> I'm calling one stored procedure with a prepared statement on the
> server with 6 arrays of around 1200 elements each as parameters. The
> parameters are around 220K in total.
> [...]
David Helgason <david@uti.is> writes:
> I'm calling one stored procedure with a prepared statement on the
> server with 6 arrays of around 1200 elements each as parameters. The
> parameters are around 220K in total.

Exactly how are you fetching, building, or otherwise providing the arrays? Which PG version is this exactly?

> Any suggestions how I go about finding the bottleneck here? What tools
> do other people use for profiling on Linux.

Rebuild with profiling enabled (make clean; make PROFILE="-pg -DLINUX_PROFILE") and then use gprof to produce a report from the trace file that the backend drops when it exits.

If that sounds out of your league, send along a self-contained test case and I'll be glad to take a look.

> This might sound like a "then don't do that" situation.

I'm betting on some O(N^2) behavior in the array code, but it'll be difficult to pinpoint without profile results.

			regards, tom lane
On 24. sep 2004, at 16:32, Tom Lane wrote:
> David Helgason <david@uti.is> writes:
>> Any suggestions how I go about finding the bottleneck here? What tools
>> do other people use for profiling on Linux.
>
> Rebuild with profiling enabled (make clean; make PROFILE="-pg -DLINUX_PROFILE")
> and then use gprof to produce a report from the trace file that the
> backend drops when it exits.
>
> If that sounds out of your league, send along a self-contained test case
> and I'll be glad to take a look.

I can try to make a self-contained test case, but right now there are about 10^6 dependencies, so it would be great if this is enough:

Here's the gmon.out file that was created after a short run of postgres, and the output of 'gprof /usr/local/postgres/bin/postgres gmon.out'. The data didn't make much sense to me, though. I'll try to investigate further.

d.
--
David Helgason,
Business Development et al.,
Over the Edge I/S (http://otee.dk)
Direct line +45 2620 0663
Main line +45 3264 5049
[Attachment: gmon.out and gprof output]
David Helgason <david@uti.is> writes:
> Here's the gmon.out file that was created after a short run of
> postgres, and the output of 'gprof /usr/local/postgres/bin/postgres
> gmon.out'. The data didn't make much sense to me though.

I think you got a gmon.out from the wrong backend run. This run looks like it didn't do much except start and exit --- it certainly didn't run for any five seconds, which is what you were saying the problem case took.

			regards, tom lane
On Friday 24 September 2004 7:32 am, Tom Lane wrote:
> David Helgason <david@uti.is> writes:
> > I'm calling one stored procedure with a prepared statement on the
> > server with 6 arrays of around 1200 elements each as parameters.
> > The parameters are around 220K in total.
> [...]
> I'm betting on some O(N^2) behavior in the array code, but it'll be
> difficult to pinpoint without profile results.

Possibly the bug we discussed early last year (IIRC added to todo but not fixed):

http://archives.postgresql.org/pgsql-performance/2003-01/msg00235.php

Cheers,
Steve
Steve Crawford <scrawford@pinpointresearch.com> writes:
> On Friday 24 September 2004 7:32 am, Tom Lane wrote:
>> I'm betting on some O(N^2) behavior in the array code, but it'll be
>> difficult to pinpoint without profile results.

> Possibly the bug we discussed early last year (IIRC added to todo but
> not fixed):
> http://archives.postgresql.org/pgsql-performance/2003-01/msg00235.php

No, it's not the same --- David sent me some results off-list and it seems ReadArrayStr is the culprit. I'm not completely sure why yet.

FWIW, the printtup part of your gripe is fixed in CVS tip.

			regards, tom lane
On 25. sep 2004, at 01:42, Tom Lane wrote:
> Steve Crawford <scrawford@pinpointresearch.com> writes:
>> Possibly the bug we discussed early last year (IIRC added to todo but
>> not fixed):
>> http://archives.postgresql.org/pgsql-performance/2003-01/msg00235.php
>
> No, it's not the same --- David sent me some results off-list and it
> seems ReadArrayStr is the culprit. I'm not completely sure why yet.

After playing around, I found the culprit. As always, I might just as well have gone straight to the code. Too bad C always reads to me like an old manuscript: I know the language, but I have to stutter through it.

Anyway, these lines should trigger anyone's big-O alarm:

    /* adt/arrayfuncs.c, line 575ff in v. 7.4.1.
     * In the big switch statement inside ReadArrayStr() */
    case '\\':
        {
            char *cptr;

            /* Crunch the string on top of the backslash. */
            for (cptr = ptr; *cptr != '\0'; cptr++)   /* BAD :) */
                *cptr = *(cptr + 1);
            if (*ptr == '\0')
                ereport(ERROR,
                        (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                         errmsg("malformed array literal: \"%s\"", arrayStr)));
            break;
        }
    case '\"':
        {
            char *cptr;

            scanning_string = !scanning_string;
            /* Crunch the string on top of the quote. */
            for (cptr = ptr; *cptr != '\0'; cptr++)   /* BAD :) */
                *cptr = *(cptr + 1);
            /* Back up to not miss following character. */
            ptr--;
            break;
        }

Every backslash or quote triggers an O(n) left-shift of the rest of the string, so escape-heavy input makes the whole scan O(n^2).

I'm toying with a reimplementation of this algorithm, but it might be some days before I have the time. Didn't look at the 8.0 beta yet, so of course someone might have been there already.

I'll be back when I have more.

d.
--
David Helgason,
Business Development et al.,
Over the Edge I/S (http://otee.dk)
Direct line +45 2620 0663
Main line +45 3264 5049
David Helgason <david@uti.is> writes:
> On 25. sep 2004, at 01:42, Tom Lane wrote:
>> No, it's not the same --- David sent me some results off-list and it
>> seems ReadArrayStr is the culprit. I'm not completely sure why yet.

> After playing around, I found the culprit.

I suspected as much but couldn't be sure without seeing your test data. I take it you have lots of quotes and/or backslashes in your big array values?

			regards, tom lane
On 26. sep 2004, at 17:25, Tom Lane wrote:
> I suspected as much but couldn't be sure without seeing your test data.
> I take it you have lots of quotes and/or backslashes in your big array
> values?

Yes, I've got bytea-typed digests in there, and those naturally have lots and lots of quoted stuff in them. I get the same, though not as dire, with text fields, unless these contain very plain data.

As soon as I wrap my brain around the details of ReadArrayStr() I'll try to reimplement it. Unless someone intimate with the code gets there first, of course :)

d.
David Helgason <david@uti.is> writes:
> As soon as I wrap my brain around the details of ReadArrayStr() I'll
> try to reimplement it. Unless someone intimate with the code gets there
> first of course :)

I've committed a rewrite into CVS tip. I have not looked at backporting it to 7.4, though --- since the code in that area has changed since 7.4, a backport is probably not quite trivial.

			regards, tom lane