Improving the memory allocator - Mailing list pgsql-hackers

From Andres Freund
Subject Improving the memory allocator
Date
Msg-id 201104252345.33408.andres@anarazel.de
Whole thread Raw
Responses Re: Improving the memory allocator  (Robert Haas <robertmhaas@gmail.com>)
Re: Improving the memory allocator  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Improving the memory allocator  (Greg Smith <greg@2ndQuadrant.com>)
List pgsql-hackers
Hi,

In the thread http://archives.postgresql.org/message-id/4DA69D60.4000108@2ndquadrant.com /
"Single client performance on trivial SELECTs" I wondered whether it might be
possible to increase the performance of the current allocator by using some
slab allocator like concept.

While avoiding doing my tax returns and having a long and delayed train ride
I started to play around.

The profile I used in this case was:

pgbench -h /tmp/ -p5433 -s 30 pgbench -S -T 20

I hacked together a patch which records every allocation to a file. That patch
obviously slows everything down quite a bit so that I only get about a third
of the normal tps.

The current profile for that looks like that (hope thats recognizable). I
guess the only unclear column is "compile_time". Thats the result of
__builtin_constant_p(size) for the size argument of the various allocation
operations which is true for the case where size is known at compile time.

          action          |       file       |                 func                 | line  | size  | compile_time |
count
--------------------------+------------------+--------------------------------------+-------+-------+--------------+--------
chunk_alloc             | list.c           | new_list                             |    68 |    16 | t            |
940753chunk_alloc              | list.c           | new_list                             |    72 |    24 | t
|940753 chunk_alloc              | list.c           | new_tail_cell                        |   112 |    16 | t
 | 202908 chunk_alloc_zero         | bitmapset.c      | bms_make_singleton                   |   189 |     8 | f
   | 129122 chunk_alloc_zero_aligned | value.c          | makeString                           |    55 |    16 | t
     | 129122 chunk_alloc_zero_aligned | makefuncs.c      | makeVar                              |    73 |    40 | t
       | 110676 chunk_alloc_zero_aligned | makefuncs.c      | makeTargetEntry                      |   216 |    48 | t
         |  92230 chunk_alloc              | stringinfo.c     | initStringInfo                       |    50 |  1024 |
t           |  73799 chunk_alloc              | bitmapset.c      | bms_copy                             |   119 |     8
|f            |  73784 chunk_alloc              | lock.c           | LockAcquireExtended                  |   559 |
128| t            |  55441 chunk_alloc              | nodeFuncs.c      | expression_tree_mutator              |  2036 |
  40 | t            |  55338 chunk_alloc              | snapmgr.c        | PushActiveSnapshot                   |   282
|   24 | t            |  55338 chunk_alloc              | nbtsearch.c      | _bt_search                           |
121|    24 | t            |  36908 chunk_alloc              | resowner.c       | ResourceOwnerEnlargeCatCacheRefs     |
 612 |   128 | t            |  36893 chunk_alloc              | resowner.c       | ResourceOwnerEnlargeRelationRefs
|  754 |   128 | t            |  36893 chunk_alloc_zero         | resowner.c       | ResourceOwnerCreate
 |   135 |   168 | t            |  36893 chunk_alloc              | nodeFuncs.c      | expression_tree_mutator
   |  2027 |    40 | t            |  36892 chunk_alloc              | snapmgr.c        | CopySnapshot
     |   217 |    72 | f            |  36892 chunk_alloc_zero_aligned | copyfuncs.c      | _copyVar
       |  1041 |    40 | t            |  36892 chunk_alloc_zero_aligned | equivclass.c     | add_eq_member
         |   452 |    32 | t            |  36892 chunk_alloc_zero_aligned | execQual.c       | ExecInitExpr
           |  4231 |    24 | t            |  36892 chunk_alloc_zero_aligned | execTuples.c     | MakeTupleTableSlot
             |   118 |    96 | t            |  36892 chunk_alloc_zero_aligned | gram.y           | makeColumnRef
               | 12286 |    24 | t            |  36892 chunk_alloc              | list.c           | new_head_cell
                 |    93 |    16 | t            |  18529 chunk_alloc              | genam.c          |
RelationGetIndexScan                |    77 |   104 | t            |  18489 chunk_alloc              | nbtree.c
|btbeginscan                          |   385 |  6608 | t            |  18489 chunk_alloc              | tupdesc.c
 | CreateTemplateTupleDesc              |    63 |   160 | f            |  18479 chunk_alloc              | genam.c
   | RelationGetIndexScan                 |    89 |    72 | f            |  18471 chunk_alloc              | nbtree.c
     | btbeginscan                          |   388 |    72 | f            |  18471 chunk_alloc              |
resowner.c      | ResourceOwnerEnlargeBuffers          |   524 |    64 | t            |  18448 chunk_alloc
|trigger.c        | AfterTriggerBeginXact                |  3607 |   112 | t            |  18447 chunk_alloc
 | trigger.c        | AfterTriggerBeginXact                |  3619 |   192 | t            |  18447
 

splitted to contexts:
                  self                   |          action          |       file       |                 func
     | line  | size  | compile_time | count
-----------------------------------------+--------------------------+------------------+--------------------------------------+-------+-------+--------------+--------
MessageContext                         | chunk_alloc              | list.c           | new_list
   |    68 |    16 | t            | 848520 MessageContext                          | chunk_alloc              | list.c
        | new_list                             |    72 |    24 | t            | 848520 MessageContext
      | chunk_alloc              | list.c           | new_tail_cell                        |   112 |    16 | t
 | 166016 MessageContext                          | chunk_alloc_zero         | bitmapset.c      | bms_make_singleton
              |   189 |     8 | f            | 129122 MessageContext                          |
chunk_alloc_zero_aligned| value.c          | makeString                           |    55 |    16 | t            |
129122MessageContext                          | chunk_alloc_zero_aligned | makefuncs.c      | makeVar
          |    73 |    40 | t            | 110676 ExecutorState                           | chunk_alloc              |
list.c          | new_list                             |    68 |    16 | t            |  92230 ExecutorState
              | chunk_alloc              | list.c           | new_list                             |    72 |    24 | t
         |  92230 MessageContext                          | chunk_alloc_zero_aligned | makefuncs.c      |
makeTargetEntry                     |   216 |    48 | t            |  92230 MessageContext                          |
chunk_alloc             | bitmapset.c      | bms_copy                             |   119 |     8 | f            |
73784TopMemoryContext                        | chunk_alloc              | lock.c           | LockAcquireExtended
         |   559 |   128 | t            |  55441 MessageContext                          | chunk_alloc              |
nodeFuncs.c     | expression_tree_mutator              |  2036 |    40 | t            |  55338 TopTransactionContext
              | chunk_alloc              | snapmgr.c        | PushActiveSnapshot                   |   282 |    24 | t
         |  55338 MessageContext                          | chunk_alloc              | stringinfo.c     |
initStringInfo                      |    50 |  1024 | t            |  36894 TopMemoryContext                        |
chunk_alloc             | resowner.c       | ResourceOwnerEnlargeCatCacheRefs     |   612 |   128 | t            |
36893TopMemoryContext                        | chunk_alloc              | resowner.c       |
ResourceOwnerEnlargeRelationRefs    |   754 |   128 | t            |  36893 TopMemoryContext                        |
chunk_alloc_zero        | resowner.c       | ResourceOwnerCreate                  |   135 |   168 | t            |
36893ExecutorState                           | chunk_alloc              | list.c           | new_tail_cell
         |   112 |    16 | t            |  36892 ExecutorState                           | chunk_alloc              |
nbtsearch.c     | _bt_search                           |   121 |    24 | t            |  36892 ExecutorState
              | chunk_alloc              | stringinfo.c     | initStringInfo                       |    50 |  1024 | t
         |  36892 ExecutorState                           | chunk_alloc_zero_aligned | execQual.c       | ExecInitExpr
                      |  4231 |    24 | t            |  36892 ExecutorState                           |
chunk_alloc_zero_aligned| execTuples.c     | MakeTupleTableSlot                   |   118 |    96 | t            |
36892MessageContext                          | chunk_alloc              | nodeFuncs.c      | expression_tree_mutator
         |  2027 |    40 | t            |  36892 MessageContext                          | chunk_alloc_zero_aligned |
copyfuncs.c     | _copyVar                             |  1041 |    40 | t            |  36892 MessageContext
              | chunk_alloc_zero_aligned | equivclass.c     | add_eq_member                        |   452 |    32 | t
         |  36892 MessageContext                          | chunk_alloc_zero_aligned | gram.y           | makeColumnRef
                      | 12286 |    24 | t            |  36892 TopTransactionContext                   | chunk_alloc
        | snapmgr.c        | CopySnapshot                         |   217 |    72 | f            |  36892
TopMemoryContext                       | chunk_alloc              | resowner.c       | ResourceOwnerEnlargeBuffers
   |   524 |    64 | t            |  18448 ExecutorState                           | chunk_alloc              | genam.c
        | RelationGetIndexScan                 |    77 |   104 | t            |  18447 ExecutorState
      | chunk_alloc              | genam.c          | RelationGetIndexScan                 |    89 |    72 | f
 |  18447 ExecutorState                           | chunk_alloc              | nbtree.c         | btbeginscan
              |   385 |  6608 | t            |  18447 ExecutorState                           | chunk_alloc
| nbtree.c         | btbeginscan                          |   388 |    72 | f            |  18447 MessageContext
                 | chunk_alloc              | list.c           | new_head_cell                        |    93 |    16 |
t           |  18447 TopTransactionContext                   | chunk_alloc              | trigger.c        |
AfterTriggerBeginXact               |  3607 |   112 | t            |  18447 TopTransactionContext                   |
chunk_alloc             | trigger.c        | AfterTriggerBeginXact                |  3619 |   192 | t            |
18447

1:
For that allocation profile I found that one significant part of the costs is
AllocSetFreeIndex (which is inlined in optimized mode, but still).

If we would move more stuff to the headers those very common cases with
compile time known sizes could eliminate the AllocSetFreeIndex computation
at runtime as its only dependent on the size.

The problem with that is obviously that it would violate the whole mctx.c
abstraction as it has to be known at compile time which memory manager is
used.

Are we willing to reduce that abstraction?

2:
aset.c fragments horribly for longliving contexts. I haven't generated
sensibly accessible data for that as I currently just store it in an sql
database and sql (at least my queries :P) isn't really that efficient
for figuring out when how much was allocated from that:                            Table "public.allocations"    Column
  |  Type   |                        Modifiers
--------------+---------+----------------------------------------------------------id           | integer | not null
defaultnextval('allocations_id_seq'::regclass) action       | text    |  parent       | text    |  self         | text
 |  size         | bigint  |  addr         | bigint  |  file         | text    |  func         | text    |  line
| integer |  compile_time | boolean | 
 

schema when the data is sensibly large. A sensible benchmark easily produces
three digit million rows...

I guess some custom implementation for evaluating that data should be way much
better at that.

That fragmentation causes two main problems. 1 space wastage and 2. (more 
importantly imo) cache misses.

3:
Given the frequentness of very small allocations the current space overhead
of at least 16 bytes on 64bit Platforms seems quite high.

I don't think its too hard to devise a scheme (actually I have started that)
where the overhead is only sizeof(*void). But again that would reduce the 
the abstraction because it would make it harder to implement something like
pfree/repalloc/GetMemoryChunkContext which need to figure out the context
from a pointer.

If we had a separate api for small allocations we could even devise a schema
where a single chunk wouldn't have any memory overhead, but that seems to
complicated for now.


So after all this my question basically is: How important do we think the
mctx.c abstraction is?
I personally found the speed of AllocSetAlloc being the most limiting factor
in several workloads so I am willing to sacrifice some abstraction to gain
speed here. Especially as I hope its possible to write a single allocator
which is "good enough" for everything.

I currently started working on two pieces:
* A new memory manager implementation. For that the equivalent ofAllocSetFreeIdx seems to be the biggest limit so far.
Itsnot really readyfor anything yet though.
 

* recording and replaying MemoryContext* to get a somewhat reasonableand reproducable micro benchmark. That
unfortunatelydoesn't model theeffects of cache misses that well but I got no better idea so far.
 

Any opinions, input, ideas?

Thanks for reading,

Andres


pgsql-hackers by date:

Previous
From: Darren Duncan
Date:
Subject: Re: "stored procedures" - use cases?
Next
From: Tom Lane
Date:
Subject: Re: branching for 9.2devel