Thread: Space management for PGresult

Space management for PGresult

From: Atsushi Ogawa
In libpq's space management for PGresult, the block size is always
PGRESULT_DATA_BLOCKSIZE (2048 bytes). Therefore, when a large query
result is received, malloc is called many times.

My proposal is to enlarge the block size each time a new block is
allocated. The first block is PGRESULT_DATA_BLOCKSIZE bytes, and each
following block is twice the size of the previous one until the size
reaches PGRESULT_MAX_DATA_BLOCKSIZE.

PGRESULT_MAX_DATA_BLOCKSIZE is a new constant. I think 2 MB is a good
enough value for it.
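
[Editor's sketch] The growth policy proposed above can be modelled in a few
lines of C. This is not the attached patch, only a minimal illustration under
stated assumptions: "2 MB" is taken to mean 2 * 1024 * 1024 bytes, the helper
names next_block_size() and blocks_needed() are hypothetical, and per-block
header overhead, alignment padding and oversized requests are ignored.

```c
#include <stddef.h>

/* Sizes named in the message; 2 MB is assumed to be 2 * 1024 * 1024 bytes. */
#define PGRESULT_DATA_BLOCKSIZE      ((size_t) 2048)
#define PGRESULT_MAX_DATA_BLOCKSIZE  ((size_t) 2 * 1024 * 1024)

/* Hypothetical helper: size of the block allocated after one of size cur. */
size_t
next_block_size(size_t cur)
{
    size_t next = cur * 2;

    /* Double the size, but never exceed the cap. */
    return (next > PGRESULT_MAX_DATA_BLOCKSIZE) ? PGRESULT_MAX_DATA_BLOCKSIZE : next;
}

/*
 * Hypothetical helper: number of blocks whose combined size first reaches
 * total_bytes. With doubling == 0 every block has the fixed initial size;
 * otherwise each new block follows the doubling policy.
 */
size_t
blocks_needed(size_t total_bytes, int doubling)
{
    size_t size = PGRESULT_DATA_BLOCKSIZE;
    size_t allocated = 0;
    size_t blocks = 0;

    while (allocated < total_bytes)
    {
        allocated += size;
        blocks++;
        if (doubling)
            size = next_block_size(size);
    }
    return blocks;
}
```

Fed with the allocated byte totals reported later in this thread for the
accounts table, blocks_needed(164950016, 0) returns 80542 and
blocks_needed(161478656, 1) returns 86, the same as the malloc counts in the
test result.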

The test results are as follows:

Test SQL:
select * from accounts; (This is pgbench's accounts table, at scale factor 10.)

The number of malloc calls at pqResultAlloc:
 8.1.0  : 80542
 patched:    86

Execution time:
 8.1.0  : 6.80 sec
 patched: 6.73 sec

regards,

--- Atsushi Ogawa

Re: Space management for PGresult

From: Tom Lane
Atsushi Ogawa <atsushi.ogawa@gmail.com> writes:
> The number of malloc calls at pqResultAlloc:
>  8.1.0  : 80542
>  patched:    86

> Execution time:
>  8.1.0  : 6.80 sec
>  patched: 6.73 sec

This hardly seems worth adding any complexity for ...

            regards, tom lane

Re: Space management for PGresult

From: Alvaro Herrera
Tom Lane wrote:
> Atsushi Ogawa <atsushi.ogawa@gmail.com> writes:
> > The number of malloc calls at pqResultAlloc:
> >  8.1.0  : 80542
> >  patched:    86
>
> > Execution time:
> >  8.1.0  : 6.80 sec
> >  patched: 6.73 sec
>
> This hardly seems worth adding any complexity for ...

What about memory usage?  Is there a noticeable difference?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Space management for PGresult

From: Atsushi Ogawa
Alvaro Herrera wrote:
> Tom Lane wrote:
> > Atsushi Ogawa <atsushi.ogawa@gmail.com> writes:
> > > The number of malloc calls at pqResultAlloc:
> > >  8.1.0  : 80542
> > >  patched:    86
> >
> > > Execution time:
> > >  8.1.0  : 6.80 sec
> > >  patched: 6.73 sec
> >
> > This hardly seems worth adding any complexity for ...
>
> What about memory usage?  Is there a noticeable difference?

Well, I measured memory usage with the attached patch. "allocated(bytes)"
is the total amount of memory allocated by pqResultAlloc, and
"unused(bytes)" is the total of the PGresult->spaceLeft values.

(1)accounts table (4 columns, 1,000,000 tuples)
          malloc calls   allocated(bytes)   unused(bytes)  execution time
--------------------------------------------------------------------------
8.1.0           80,542        164,950,016       2,946,402       6.80 sec.
patched             86        161,478,656         177,650       6.73 sec.

(2)another table (50 columns, 100,000 tuples)
          malloc calls   allocated(bytes)   unused(bytes)  execution time
--------------------------------------------------------------------------
8.1.0           55,557        113,780,736       8,561,518       6.26 sec.
patched             86        104,855,552          83,307       6.21 sec.

The unused memory increases with the number of columns, because the
per-tuple array that PGresult allocates is proportional to the number
of columns.

getAnotherTuple() at fe-protocol3.c:
--------------------------------------------------------------------------
        conn->curTuple = (PGresAttValue *)
            pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
--------------------------------------------------------------------------
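
[Editor's sketch] The relationship between request size and unused space can
be illustrated with a simplified model. It assumes a bump allocator that
abandons the tail of the current block whenever the next request does not
fit, which is a simplification of pqResultAlloc; the function tail_waste() is
hypothetical, and the 16-byte sizeof(PGresAttValue) used in the usage note is
an assumption for a 64-bit build. Field data, alignment and block headers are
ignored, so the numbers are not comparable to the measurements above.

```c
#include <stddef.h>

/*
 * Hypothetical model: serve n_requests allocations of request_size bytes
 * from consecutive blocks of block_size bytes and return the number of
 * bytes that were never handed out (abandoned block tails plus whatever
 * is left in the final block).
 */
size_t
tail_waste(size_t block_size, size_t request_size, size_t n_requests)
{
    size_t space_left = 0;      /* free bytes in the current block */
    size_t wasted = 0;          /* bytes abandoned in earlier blocks */
    size_t i;

    /* Requests larger than a block are outside the scope of this sketch. */
    if (request_size == 0 || request_size > block_size)
        return 0;

    for (i = 0; i < n_requests; i++)
    {
        if (request_size > space_left)
        {
            wasted += space_left;       /* give up the tail of the old block */
            space_left = block_size;    /* start a fresh block */
        }
        space_left -= request_size;
    }
    return wasted + space_left;
}
```

With 2048-byte blocks and 1024 requests, tail_waste(2048, 4 * 16, 1024)
returns 0 because 64-byte requests pack a block exactly, while
tail_waste(2048, 50 * 16, 1024) returns 229376 because only two 800-byte
requests fit in each block and 448 bytes are left over every time.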

regards,

--- Atsushi Ogawa

Re: Space management for PGresult

From: Tom Lane
Atsushi Ogawa <atsushi.ogawa@gmail.com> writes:
> (1)accounts table (4 columns, 1,000,000 tuples)
>           malloc calls   allocated(bytes)   unused(bytes)  execution time
> --------------------------------------------------------------------------
> 8.1.0           80,542        164,950,016       2,946,402       6.80 sec.
> patched             86        161,478,656         177,650       6.73 sec.

This hardly seems credible --- your patch would result in more wasted
memory, not less.  It looks to me like the instrumentation you added
assumes that extra space in a malloc block will never be used later,
which of course is not true ...

            regards, tom lane

Re: Space management for PGresult

From: Simon Riggs
On Wed, 2005-11-23 at 16:21 +0900, Atsushi Ogawa wrote:
> In libpq's space management for PGresult, the block size is always
> PGRESULT_DATA_BLOCKSIZE (2048 bytes). Therefore, when a large query
> result is received, malloc is called many times.
>
> My proposal is to enlarge the block size each time a new block is
> allocated. The first block is PGRESULT_DATA_BLOCKSIZE bytes, and each
> following block is twice the size of the previous one until the size
> reaches PGRESULT_MAX_DATA_BLOCKSIZE.
>
> PGRESULT_MAX_DATA_BLOCKSIZE is a new constant. I think 2 MB is a good
> enough value for it.
>
> The test results are as follows:
>
> Test SQL:
> select * from accounts; (This is pgbench's accounts table, at scale factor 10.)
>
> The number of malloc calls at pqResultAlloc:
>  8.1.0  : 80542
>  patched:    86
>
> Execution time:
>  8.1.0  : 6.80 sec
>  patched: 6.73 sec
>

What this highlights for me is that we have (IMHO) a strange viewpoint
on allocating result memory, not an optimization issue.

We really ought to be streaming the result back to the user, not
downloading it all into a massive client side chunk of memory. It ought
to be possible to do this with very low memory, and would probably have
the side-effect of reducing time-to-first-row. Then we wouldn't have a
memory allocation issue at all.

Consider what will happen if you do "select * from too_big_table". We'll
just run for ages, then blow memory and fail. (That's what it used to
do, does it still? he asks lazily).

Best Regards, Simon Riggs


Re: Space management for PGresult

From: Tom Lane
Simon Riggs <simon@2ndquadrant.com> writes:
> We really ought to be streaming the result back to the user, not
> downloading it all into a massive client side chunk of memory.

Have you been paying any attention to the multiple previous discussions
of that point?  (Latest was on pgsql-interfaces within the past week.)

            regards, tom lane

Re: Space management for PGresult

From: Simon Riggs
On Thu, 2005-11-24 at 12:32 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > We really ought to be streaming the result back to the user, not
> > downloading it all into a massive client side chunk of memory.
>
> Have you been paying any attention to the multiple previous discussions
> of that point?  (Latest was on pgsql-interfaces within the past week.)

Clearly not. Thanks for the heads up.

Best Regards, Simon Riggs