Thread: Large arrays give long lag on server side before command executes
I'm calling one stored procedure with a prepared statement on the server with 6 arrays of around 1200 elements each as parameters. The parameters are around 220K in total.

This is taking a surprising amount of time, so I put a lot of logging into the application and into the stored procedure that's getting called. It seems that almost all of the time is spent before the stored procedure is even entered.

What raises my suspicion that this is not as it should be, is that the time spent before the prepared statement starts is as long as or longer than the time spent inside the stored procedure: doing 1200 inserts into a table, a fairly complex outer join with that data, and a loop over the roughly 1200 resulting records.

Enabling lots of logging gives me:

2004-09-22 20:27:29 [5812] LOG:  REWRITER STATISTICS
DETAIL:  ! system usage stats:
!       0.000004 elapsed 0.000000 user 0.000000 system sec
!       [11.850198 user 0.215967 sys total]
!       0/0 [0/0] filesystem blocks in/out
!       0/0 [0/3001] page faults/reclaims, 0 [0] swaps
!       0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
!       0/0 [132/209] voluntary/involuntary context switches
!       buffer usage stats:
!       Shared blocks:          0 read,          0 written, buffer hit rate = 0.00%
!       Local  blocks:          0 read,          0 written, buffer hit rate = 0.00%
!       Direct blocks:          0 read,          0 written
2004-09-22 20:27:29 [5812] DEBUG:  ProcessUtility
2004-09-22 20:27:29 [5812] DEBUG:  CommitTransactionCommand
2004-09-22 20:27:29 [5812] LOG:  duration: 1.982 ms
2004-09-22 20:27:30 [5812] DEBUG:  StartTransactionCommand
2004-09-22 20:27:35 [5812] NOTICE:  Recursive status for begun at 2004-09-22 20:27:35.371659
2004-09-22 20:27:39 [5812] LOG:  PARSER STATISTICS
DETAIL:  ! system usage stats:
!       0.000119 elapsed 0.000000 user 0.000000 system sec
!       [20.825833 user 0.340948 sys total]
!       0/0 [0/0] filesystem blocks in/out
!       0/0 [0/3590] page faults/reclaims, 0 [0] swaps
!       0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
!       0/0 [273/362] voluntary/involuntary context switches
!       buffer usage stats:
!       Shared blocks:          0 read,          0 written, buffer hit rate = 0.00%
!       Local  blocks:          0 read,          0 written, buffer hit rate = 0.00%
!       Direct blocks:          0 read,          0 written
CONTEXT:  SQL function "get_asset_guid_safe" during startup
        PL/pgSQL function "maint_status_recursive" line 198 at assignment

What I find interesting is how five seconds pass between the 20:27:30 debug line, where it starts calling my procedure, and the log line I print out at the top of the procedure at 20:27:35.

Any suggestions on how I go about finding the bottleneck here? What tools do other people use for profiling on Linux?

This might sound like a "then don't do that" situation. The data is stuff that will be inserted into a temporary table, so I'd rather not go a different route, but I'm considering switching to a lot of insert statements or a COPY ... FROM.

Regards,

d.
--
David Helgason,
Business Development et al.,
Over the Edge I/S (http://otee.dk)
Direct line +45 2620 0663
Main line +45 3264 5049
Probably should have added the prototypes:

-- the prepared statement
prepare maint_status_recursive(text, integer[], character(32)[], character(32)[], text[], integer[], bytea[]) as
    select * from maint_status_recursive($1,$2,$3,$4,$5,$6,$7);

-- the stored procedure
create or replace function maint_status_recursive(text, integer[], character(32)[], character(32)[], text[], integer[], bytea[])
    returns setof maint_status_type
    language plpgsql

Looking at the processes on the machine, I see the process doing BIND during those seconds. As mentioned, I'm sending 1200-element arrays as all the array parameters, and the parsing is taking an amazing 5+ seconds on an unloaded server... it is a 300MHz Mandrake server, but 5+ seconds still seems extreme for parsing and building some arrays.

I'd be really thankful for pointers to solutions or code to look at.

d.
--
David Helgason,
Business Development et al.,
Over the Edge I/S (http://otee.dk)
Direct line +45 2620 0663
Main line +45 3264 5049

On 24. sep 2004, at 11:09, David Helgason wrote:
> I'm calling one stored procedure with a prepared statement on the
> server with 6 arrays of around 1200 elements each as parameters. The
> parameters are around 220K in total.
> [...]
David Helgason <david@uti.is> writes:
> I'm calling one stored procedure with a prepared statement on the
> server with 6 arrays of around 1200 elements each as parameters. The
> parameters are around 220K in total.

Exactly how are you fetching, building, or otherwise providing the arrays? Which PG version is this exactly?

> Any suggestions how I go about finding the bottleneck here? What tools
> do other people use for profiling on Linux.

Rebuild with profiling enabled (make clean; make PROFILE="-pg -DLINUX_PROFILE") and then use gprof to produce a report from the trace file that the backend drops when it exits.

If that sounds out of your league, send along a self-contained test case and I'll be glad to take a look.

> This might sound like a "then don't do that" situation.

I'm betting on some O(N^2) behavior in the array code, but it'll be difficult to pinpoint without profile results.

			regards, tom lane
On 24. sep 2004, at 16:32, Tom Lane wrote:
> David Helgason <david@uti.is> writes:
>> Any suggestions how I go about finding the bottleneck here? What tools
>> do other people use for profiling on Linux.
>
> Rebuild with profiling enabled (make clean; make PROFILE="-pg -DLINUX_PROFILE")
> and then use gprof to produce a report from the trace file that the
> backend drops when it exits.
>
> If that sounds out of your league, send along a self-contained test case
> and I'll be glad to take a look.

I can try to make a self-contained test case, but right now there are about 10^6 dependencies, so it would be great if this is enough:

Here's the gmon.out file that was created after a short run of postgres, and the output of 'gprof /usr/local/postgres/bin/postgres gmon.out'. The data didn't make much sense to me, though. I'll try to investigate further.

d.
--
David Helgason,
Business Development et al.,
Over the Edge I/S (http://otee.dk)
Direct line +45 2620 0663
Main line +45 3264 5049
[Attachment: gmon.out and gprof output]
David Helgason <david@uti.is> writes:
> Here's the gmon.out file that was created after a short run of
> postgres, and the output of 'gprof /usr/local/postgres/bin/postgres
> gmon.out'. The data didn't make much sense to me though.

I think you got a gmon.out from the wrong backend run. This run looks like it didn't do much except start and exit --- it certainly didn't run for any five seconds, which is what you were saying the problem case took.

			regards, tom lane
On Friday 24 September 2004 7:32 am, Tom Lane wrote:
> David Helgason <david@uti.is> writes:
> > I'm calling one stored procedure with a prepared statement on the
> > server with 6 arrays of around 1200 elements each as parameters.
> > The parameters are around 220K in total.
> [...]
> I'm betting on some O(N^2) behavior in the array code, but it'll be
> difficult to pinpoint without profile results.

Possibly the bug we discussed early last year (IIRC added to todo but not fixed):

http://archives.postgresql.org/pgsql-performance/2003-01/msg00235.php

Cheers,
Steve
Steve Crawford <scrawford@pinpointresearch.com> writes:
> On Friday 24 September 2004 7:32 am, Tom Lane wrote:
>> I'm betting on some O(N^2) behavior in the array code, but it'll be
>> difficult to pinpoint without profile results.

> Possibly the bug we discussed early last year (IIRC added to todo but
> not fixed):
> http://archives.postgresql.org/pgsql-performance/2003-01/msg00235.php

No, it's not the same --- David sent me some results off-list and it seems ReadArrayStr is the culprit. I'm not completely sure why yet.

FWIW, the printtup part of your gripe is fixed in CVS tip.

			regards, tom lane
On 25. sep 2004, at 01:42, Tom Lane wrote:
> Steve Crawford <scrawford@pinpointresearch.com> writes:
>> Possibly the bug we discussed early last year (IIRC added to todo but
>> not fixed):
>> http://archives.postgresql.org/pgsql-performance/2003-01/msg00235.php
>
> No, it's not the same --- David sent me some results off-list and it
> seems ReadArrayStr is the culprit. I'm not completely sure why yet.

After playing around, I found the culprit. As always, I might just as well have gone straight to the code. Too bad C always reads to me like an old manuscript: I know the language, but I have to stutter through it.

Anyway, these lines should trigger anyone's big-O alarm:

    /* adt/arrayfuncs.c, line 575ff in v. 7.4.1.
     * In the big switch statement inside ReadArrayStr() */
    case '\\':
        {
            char *cptr;

            /* Crunch the string on top of the backslash. */
            for (cptr = ptr; *cptr != '\0'; cptr++)   /* BAD :) */
                *cptr = *(cptr + 1);
            if (*ptr == '\0')
                ereport(ERROR,
                        (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                         errmsg("malformed array literal: \"%s\"", arrayStr)));
            break;
        }
    case '\"':
        {
            char *cptr;

            scanning_string = !scanning_string;
            /* Crunch the string on top of the quote. */
            for (cptr = ptr; *cptr != '\0'; cptr++)   /* BAD :) */
                *cptr = *(cptr + 1);
            /* Back up to not miss following character. */
            ptr--;
            break;
        }

Every backslash or quote triggers an O(n) left-shift of the rest of the string, so escape-heavy input makes the whole scan O(n^2).

I'm toying with a reimplementation of this algorithm, but it might be some days before I have the time. Didn't look at the 8.0 beta yet, so of course someone might have been there already.

I'll be back when I have more.

d.
--
David Helgason,
Business Development et al.,
Over the Edge I/S (http://otee.dk)
Direct line +45 2620 0663
Main line +45 3264 5049
David Helgason <david@uti.is> writes:
> On 25. sep 2004, at 01:42, Tom Lane wrote:
>> No, it's not the same --- David sent me some results off-list and it
>> seems ReadArrayStr is the culprit. I'm not completely sure why yet.

> After playing around, I found the culprit.

I suspected as much but couldn't be sure without seeing your test data. I take it you have lots of quotes and/or backslashes in your big array values?

			regards, tom lane
On 26. sep 2004, at 17:25, Tom Lane wrote:
> I suspected as much but couldn't be sure without seeing your test data.
> I take it you have lots of quotes and/or backslashes in your big array
> values?

Yes, I've got bytea-typed digests in there, and those naturally have lots and lots of quoted stuff in them. I get the same, though not as dire, with text fields, unless these contain very plain data.

As soon as I wrap my brain around the details of ReadArrayStr() I'll try to reimplement it. Unless someone intimate with the code gets there first, of course :)

d.
David Helgason <david@uti.is> writes:
> As soon as I wrap my brain around the details of ReadArrayStr() I'll
> try to reimplement it. Unless someone intimate with the code gets there
> first of course :)

I've committed a rewrite into CVS tip. I have not looked at backporting it to 7.4, though --- since the code in that area has changed since 7.4, a backport is probably not quite trivial.

			regards, tom lane