Thread: Performance problem in aset.c

Performance problem in aset.c

From
Tom Lane
Date:
While testing the just-committed executor memory leak fixes, I observe
that there are still slow leaks in some operations such as sorts on
large text fields.  With a little digging, it seems that the leak must
be blamed on the behavior of aset.c for large chunks.  Chunks between
1K and 64K (ALLOC_SMALLCHUNK_LIMIT and ALLOC_BIGCHUNK_LIMIT) are all
kept in the same freelist, and when a request is made for an amount
of memory in that range, aset.c gives back the first chunk that's
big enough from that freelist.

For example, let's say you are allocating and freeing roughly equal
numbers of 2K and 10K blocks.  Over time, about half of the 2K
requests will be answered by returning a 10K block --- which will
prevent the next 10K request from being filled by recycling that
10K block, causing a new 10K block to be allocated.  If aset.c
had given back a 2K block whenever possible, the net memory usage
would be static, but since it doesn't, the usage gradually creeps
up as more and more chunks are used inefficiently.  Our actual
maximum memory usage might be m * 2K plus n * 10K but the allocated
space will creep towards (m + n) * 10K where *all* the active blocks
are the larger size.

A straightforward solution would be to scan the entire freelist
and give back the smallest block that's big enough for the request.
That could be pretty slow (and induce lots of swap thrashing) so
I don't like it much.

Another idea is to split the returned chunk and put the wasted part back
as a smaller free chunk, but I don't think this solves the problem; it
just means that the wasted space ends up on a small-chunk freelist, not
that you can actually do anything with it.  But maybe someone can figure
out a variant that works better.

It might be that this behavior won't be seen much in practice and we
shouldn't slow down aset.c at all to try to deal with it.  But I think
it's worth looking for solutions that won't slow typical cases much.
        regards, tom lane


Re: Performance problem in aset.c

From
Chris Bitmead
Date:
Could you have a set of free lists, one per power of two: chunks of 2^8
bytes or bigger, 2^9, 2^10, 2^11, etc.?  It should be faster than finding
the first big-enough block like now, and the error would be mostly bounded
to a factor of 2.

Tom Lane wrote:
> 
> While testing the just-committed executor memory leak fixes, I observe
> that there are still slow leaks in some operations such as sorts on
> large text fields.  With a little digging, it seems that the leak must
> be blamed on the behavior of aset.c for large chunks.  Chunks between
> 1K and 64K (ALLOC_SMALLCHUNK_LIMIT and ALLOC_BIGCHUNK_LIMIT) are all
> kept in the same freelist, and when a request is made for an amount
> of memory in that range, aset.c gives back the first chunk that's
> big enough from that freelist.
> 
> For example, let's say you are allocating and freeing roughly equal
> numbers of 2K and 10K blocks.  Over time, about half of the 2K
> requests will be answered by returning a 10K block --- which will
> prevent the next 10K request from being filled by recycling that
> 10K block, causing a new 10K block to be allocated.  If aset.c
> had given back a 2K block whenever possible, the net memory usage
> would be static, but since it doesn't, the usage gradually creeps
> up as more and more chunks are used inefficiently.  Our actual
> maximum memory usage might be m * 2K plus n * 10K but the allocated
> space will creep towards (m + n) * 10K where *all* the active blocks
> are the larger size.
> 
> A straightforward solution would be to scan the entire freelist
> and give back the smallest block that's big enough for the request.
> That could be pretty slow (and induce lots of swap thrashing) so
> I don't like it much.
> 
> Another idea is to split the returned chunk and put the wasted part back
> as a smaller free chunk, but I don't think this solves the problem; it
> just means that the wasted space ends up on a small-chunk freelist, not
> that you can actually do anything with it.  But maybe someone can figure
> out a variant that works better.
> 
> It might be that this behavior won't be seen much in practice and we
> shouldn't slow down aset.c at all to try to deal with it.  But I think
> it's worth looking for solutions that won't slow typical cases much.
> 
>                         regards, tom lane


Re: Performance problem in aset.c

From
Philip Warner
Date:
At 23:48 11/07/00 -0400, Tom Lane wrote:
>Another idea is to split the returned chunk and put the wasted part back
>as a smaller free chunk, but I don't think this solves the problem; it
>just means that the wasted space ends up on a small-chunk freelist, not
>that you can actually do anything with it.  But maybe someone can figure
>out a variant that works better.

Can you maintain one free list for each power of 2 (which it might already
be doing by the look of it), and always allocate the max size for the list.
Then when you want a 10k chunk, you get a 16k chunk, but you know from the
request size which list to go to, and anything on the list will satisfy the
requirement.

In the absolute worst case this will double your memory consumption, and on
average should only use 50% more memory. If this worries you, then maintain
twice as many lists and halve the wastage. 

The process of finding the right list is just a matter of examining the MSB
of the chunk size. If you want twice the number of lists, then look at the
top two bits etc.




----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \
                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: Performance problem in aset.c

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> Can you maintain one free list for each power of 2 (which it might already
> be doing by the look of it), and always allocate the max size for the list.
> Then when you want a 10k chunk, you get a 16k chunk, but you know from the
> request size which list to go to, and anything on the list will satisfy the
> requirement.

That is how it works for small chunks (< 1K with the current
parameters).  I don't think we want to do it that way for really
huge chunks though.

Maybe the right answer is to eliminate the gap between small chunks
(which basically work as Philip sketches above) and huge chunks (for
which we fall back on malloc).  The problem is with the stuff in
between, for which we have a kind of half-baked approach...
        regards, tom lane


Re: Performance problem in aset.c

From
Alfred Perlstein
Date:
* Tom Lane <tgl@sss.pgh.pa.us> [000711 22:23] wrote:
> Philip Warner <pjw@rhyme.com.au> writes:
> > Can you maintain one free list for each power of 2 (which it might already
> > be doing by the look of it), and always allocate the max size for the list.
> > Then when you want a 10k chunk, you get a 16k chunk, but you know from the
> > request size which list to go to, and anything on the list will satisfy the
> > requirement.
> 
> That is how it works for small chunks (< 1K with the current
> parameters).  I don't think we want to do it that way for really
> huge chunks though.
> 
> Maybe the right answer is to eliminate the gap between small chunks
> (which basically work as Philip sketches above) and huge chunks (for
> which we fall back on malloc).  The problem is with the stuff in
> between, for which we have a kind of half-baked approach...

Er, are you guys seriously layering your own general purpose allocator
over the OS/c library allocator?

Don't do that!

The only time you may want to do this is if you're doing a special purpose
allocator like a zone or slab allocator, otherwise it's a pessimization.
The algorithms you're discussing to fix these leaks have been implemented
in almost any modern allocator that I know of.

Sorry if i'm totally off base, but "for which we fall back on malloc"
makes me wonder what's going on here.

thanks,
-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."


Re: Performance problem in aset.c

From
Philip Warner
Date:
At 01:21 12/07/00 -0400, Tom Lane wrote:
>Philip Warner <pjw@rhyme.com.au> writes:
>> Can you maintain one free list for each power of 2 (which it might already
>> be doing by the look of it), and always allocate the max size for the list.
>> Then when you want a 10k chunk, you get a 16k chunk, but you know from the
>> request size which list to go to, and anything on the list will satisfy the
>> requirement.
>
>Maybe the right answer is to eliminate the gap between small chunks
>(which basically work as Philip sketches above) and huge chunks (for
>which we fall back on malloc).  The problem is with the stuff in
>between, for which we have a kind of half-baked approach...

That sounds good to me. 

You *might* want to enable some kind of memory statistics in shared memory
(for a mythical future reporting tool) so you can see how many memory
allocations fall into the 'big chunk' range, and adjust your definition of
'big chunk' appropriately.




Re: Performance problem in aset.c

From
Philip Warner
Date:
At 22:33 11/07/00 -0700, Alfred Perlstein wrote:
>
>The only time you may want to do this is if you're doing a special purpose
>allocator like a zone or slab allocator, otherwise it's a pessimization.

That is why it is done (I believe) - memory can be allocated using one
context then freed as a block when the context is no longer used.




Re: Performance problem in aset.c

From
JanWieck@t-online.de (Jan Wieck)
Date:
Alfred Perlstein wrote:
> * Tom Lane <tgl@sss.pgh.pa.us> [000711 22:23] wrote:
> > Philip Warner <pjw@rhyme.com.au> writes:
> > > Can you maintain one free list for each power of 2 (which it might already
> > > be doing by the look of it), and always allocate the max size for the list.
> > > Then when you want a 10k chunk, you get a 16k chunk, but you know from the
> > > request size which list to go to, and anything on the list will satisfy the
> > > requirement.
> >
> > That is how it works for small chunks (< 1K with the current
> > parameters).  I don't think we want to do it that way for really
> > huge chunks though.
> >
> > Maybe the right answer is to eliminate the gap between small chunks
> > (which basically work as Philip sketches above) and huge chunks (for
> > which we fall back on malloc).  The problem is with the stuff in
> > between, for which we have a kind of half-baked approach...
>
> Er, are you guys seriously layering your own general purpose allocator
> over the OS/c library allocator?
>
> Don't do that!
>
> The only time you may want to do this is if you're doing a special purpose
> allocator like a zone or slab allocator, otherwise it's a pessimization.
> The algorithms you're discussing to fix these leaks have been implemented
> in almost any modern allocator that I know of.
>
> Sorry if i'm totally off base, but "for which we fall back on malloc"
> makes me wonder what's going on here.
   To clarify this:

   I developed this in aset.c because of the fact that we use
   a lot (really a lot) of very small chunks being palloc()'d.
   Any allocation must be remembered in some linked lists to
   know what to free at memory context reset or destruction.
   In the old version, every however-small amount was allocated
   using malloc() and remembered separately in one huge List for
   the context.  Traversing this list was awfully slow when a
   context said bye, and I saw no way to speed up this traversal.

   With the actual concept, only big chunks are remembered on
   their own.  All small allocations aren't tracked that
   accurately, and memory context destruction can simply throw
   away all the blocks allocated for it.

   At the time I implemented it, it gained a speedup of ~10% for
   the regression test.  It's an approach of gaining speed by
   wasting memory.
 


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #




Re: Performance problem in aset.c

From
Bruce Momjian
Date:
> At 23:48 11/07/00 -0400, Tom Lane wrote:
> >Another idea is to split the returned chunk and put the wasted part back
> >as a smaller free chunk, but I don't think this solves the problem; it
> >just means that the wasted space ends up on a small-chunk freelist, not
> >that you can actually do anything with it.  But maybe someone can figure
> >out a variant that works better.
> 
> Can you maintain one free list for each power of 2 (which it might already
> be doing by the look of it), and always allocate the max size for the list.
> Then when you want a 10k chunk, you get a 16k chunk, but you know from the
> request size which list to go to, and anything on the list will satisfy the
> requirement.

Sounds like the BSD malloc memory manager.  I can post details, or the
pages from a kernel book I have about it.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: Performance problem in aset.c

From
Karel Zak
Date:
On Tue, 11 Jul 2000, Tom Lane wrote:

> A straightforward solution would be to scan the entire freelist
> and give back the smallest block that's big enough for the request.
> That could be pretty slow (and induce lots of swap thrashing) so
> I don't like it much.

Yes, that is the right solution.  Somewhat better chunk selection could
also be had with more freelists than the current 8.  If we had 16 free
lists, a chunk's real size and the request size would be more similar.

> Another idea is to split the returned chunk and put the wasted part back
> as a smaller free chunk, but I don't think this solves the problem; it
> just means that the wasted space ends up on a small-chunk freelist, not
> that you can actually do anything with it.  But maybe someone can figure
> out a variant that works better.
If I understand correctly, this is a little like my idea of making one
free chunk from a block's residual space.
It is not a bad idea, but we have aligned chunks now.  One question is
whether the wasted part will always have room for a new *aligned* chunk.
A second question: won't this method create smaller and smaller chunks?
For example, after 1000 free/alloc cycles you could end up with very
many small chunks (made from the wasted parts).

Making a chunk from the wasted part is good, but only if you have a use
for those chunks.
*IMHO* the new memory design creates new demands on contexts.  In the old
design we used one context for many different purposes, so
request-allocation sizes were more heterogeneous.  But now we use specific
contexts for specific operations, which probably very often creates
contexts whose chunks are very homogeneous.  For example, when testing my
query cache, 90% of all allocations are in the range 16-32 bytes and one
cached plan needs 2000-5000 bytes --- the effect is that the query cache
does not need 8KB blocks, etc.  The situation will be very similar in
other parts of PG.

Tom added default/max block size settings to the AllocSet context, per
context.  That is good, and it is a first step toward more specific
context settings.  Hmm, I have no idea for the next step right now... but
some settings are needed for chunk operations (for example, a primary
default chunk size and, derived from it, the most frequently used
freelist; a big-chunk limit setting; etc.)

                    Karel