Re: [HACKERS] Postgres Speed or lack thereof - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] Postgres Speed or lack thereof
Date
Msg-id 7986.917207692@sss.pgh.pa.us
In response to Re: [HACKERS] Postgres Speed or lack thereof  (Vadim Mikheev <vadim@krs.ru>)
List pgsql-hackers
Vadim Mikheev <vadim@krs.ru> writes:
> Tom Lane wrote:
>> In other words, we're spending a third of our time mallocing and freeing
>> memory.  A tad high, what?

> I added some debug code to palloc/pfree and it shows that for
> INSERT:
> 1. 80% of allocations are made for <= 32 bytes.
> 2. pfree is used for 25% of them only (others are freed
>    after statement/transaction is done).

I had suspected the former (because most of the allocs are
parser nodes) and knew the latter from profile call counts.

> Note that our mmgr adds 16 bytes to each allocation
> (+ some bytes in malloc) - a great overhead, yes?

Youch ... for a list-node-sized request, that's a lot of overhead.

I did some more profiling this weekend, after Magnus Hagander and I
repaired libpq's little problem with calling recv() once per input byte.
Unsurprisingly, memory allocation is still 50% of backend CPU time in
my test consisting of a long series of INSERT statements.  It's less
than that, but still an interesting number, in two other test scenarios
I set up:

(1) SELECT the whole contents of a 6500-row, 38-column table; repeat
    5 times.  Memory allocation takes 8.74 of 54.45 backend CPU sec,
    or 16%.

(2) Load a database from a pg_dump script.  Script size is 1.3MB.
    This is of course mostly CREATE TABLE, COPY from stdin, and
    CREATE INDEX.  Memory allocation takes 11.08 of 38.25 sec = 29%.

I believe that the reason memory alloc is such a large percentage of the
INSERT scenario is that this is the only one where lots of SQL parsing
is going on.  In the SELECT scenario, most of the palloc/pfree traffic
is temporary field value representations inside printtup (more about
that below).  In the database load scenario, it seems that btree index
building is the biggest hit; the heaviest callers of palloc are:
                0.00    0.01   10642/438019     _bt_search [86]
                0.00    0.01   13531/438019     heap_formtuple [149]
                0.00    0.01   13656/438019     CopyFrom [24]
                0.00    0.01   14196/438019     textin [247]
                0.01    0.01   18275/438019     datetime_in [42]
                0.01    0.03   32595/438019     bpcharin [182]
                0.01    0.03   38882/438019     index_formtuple [79]
                0.01    0.03   39028/438019     _bt_formitem [212]
                0.07    0.16  204564/438019     _bt_setsortkey [78]
[88]     1.3    0.14    0.34  438019            palloc [88]

> I added code to allocate a few big (16K-64K) blocks of memory for
> these small allocations to speed up palloc by skipping
> AllocSetAlloc/malloc.  The new code doesn't free allocated memory (to
> make bookkeeping fast), but keeping in mind 2. above and the memory
> overhead, that seems an appropriate thing to do.  This code also
> speeds up freeing when the statement/transaction is done, because
> only a few blocks have to be freed now.

This is pretty much what I had in mind, but I think that it is not OK
to leave everything to be freed at statement end.  In particular I've
just been looking at printtup(), the guts of SELECT output, and I see
that each field value is palloc'd (by a type-specific output routine)
and then pfree'd by printtup after emitting it.  If you make the pfrees
all no-ops then this means the temporary space needed for SELECT is
at least equal to the total volume of output data ... not good for
huge tables.  (COPY in and out also seem to palloc/pfree each field
value that goes past.)

However, I think we could get around this by making more use of memory
contexts.  The space allocated inside printtup could be palloc'd from a
"temporary" context that gets flushed every so often (say, after every
few hundred tuples of output).  All the pallocs done inside the parser
could go to a "statement" context that is flushed at the end of each
statement.  It'd probably also be a good idea to have two kinds of
memory contexts, the current kind where each block is individually
free-able and the new kind where we only keep track of how to free the
whole context's contents at once.  That way, any particular routine that
was going to allocate a *large* chunk of memory would have a way to free
that chunk explicitly when done with it (but would have to remember
which context it got it from in order to do the free).  For the typical
list-node-sized request, we wouldn't bother with the extra bookkeeping.

I think that palloc() could be defined as always giving you the bulk-
allocated kind of chunk.  pfree() would become a no-op and eventually
could be removed entirely.  Anyone who wants the other kind of chunk
would call a different pair of routines, where the controlling memory
context has to be named at both alloc and free time.  (This approach
gets rid of your concern about the cost of finding which kind of context
a block has been allocated in inside of pfree; we expect the caller
to know that.)

Getting this right is probably going to take some thought and work,
but it looks worthwhile from a performance standpoint.  Maybe for 6.6?

> It shows that we should get rid of system malloc/free and do
> all things in mmgr itself - this would allow us much faster
> free memory contexts at statement/transaction end.

I don't think we can or should stop using malloc(), but we can
ask it for large blocks and do our own allocations inside those
blocks --- was that what you meant?

        regards, tom lane

