Re: PATCH: two slab-like memory allocators - Mailing list pgsql-hackers

From Petr Jelinek
Subject Re: PATCH: two slab-like memory allocators
Date
Msg-id 885540a7-ac72-5a21-6dae-a4d5b50672da@2ndquadrant.com
Whole thread Raw
In response to Re: PATCH: two slab-like memory allocators  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: PATCH: two slab-like memory allocators
List pgsql-hackers
On 25/09/16 22:17, Tomas Vondra wrote:
> On 09/25/2016 08:48 PM, Petr Jelinek wrote:
> 
>> Slab:
>> In general it seems understandable, the initial description helps to
>> understand what's happening well enough.
>>
>> One thing I don't understand however is why the freelist is both
>> array and doubly linked list and why there is new implementation of
>> said doubly linked list given that we have dlist.
>>
> 
> Hmm, perhaps that should be explained better.
> 
> In AllocSet, we only have a global freelist of chunks, i.e. we have a
> list of free chunks for each possible size (there's 11 sizes, starting
> with 8 bytes and then doubling the size). So freelist[0] is a list of
> free 8B chunks, freelist[1] is a list of free 16B chunks, etc.
> 
> In Slab, the freelist has two levels - first there's a bitmap on each
> block (which is possible, as the chunks have constant size), tracking
> which chunks of that particular block are free. This makes it trivial to
> check that all chunks on the block are free, and free the whole block
> (which is impossible with AllocSet).
> 
> Second, the freelist at the context level tracks blocks with a given
> number of free chunks - so freelist[0] tracks completely full blocks,
> freelist[1] is a list of blocks with 1 free chunk, etc. This is used to
> reuse space on almost full blocks first, in the hope that some of the
> less full blocks will get completely empty (and freed to the OS).
> 
> Is that clear?

Ah okay, makes sense, the documentation of this could be improved then
though as it's all squashed into single sentence that wasn't quite clear
for me.

> 
>>> +/*
>>> + * SlabChunk
>>> + *        The prefix of each piece of memory in an SlabBlock
>>> + *
>>> + * NB: this MUST match StandardChunkHeader as defined by
>>> utils/memutils.h.
>>> + */
>>
>> Is this true? Why? And if it is then why doesn't the SlabChunk
>> actually match the StandardChunkHeader?
> 
> It is true, a lot of stuff in MemoryContext infrastructure relies on
> that. For example when you do pfree(ptr), we actually do something like
> 
>    header = (StandardChunkHeader*)(ptr - sizeof(StandardChunkHeader))
> 
> to get the chunk header - which includes pointer to the memory context
> and other useful stuff.
> 
> This also means we can put additional fields before StandardChunkHeader
> as that does not break this pointer arithmetic, i.e. SlabChunkData is
> effectively defined like this:
> 
> typedef struct SlabChunkData
> {
>     /* block owning this chunk */
>     void *block;
> 
>     /* standard header */
>     StandardChunkHeader header;
> } SlabChunkData;
> 

Yes but your struct then does not match StandardChunkHeader exactly so
it should be explained in more detail (the aset.c where this comment is
also present has struct that matches StandardChunkHeader so it's
sufficient there).

>>
>>> +static void
>>> +SlabCheck(MemoryContext context)
>>> +{
>>> +    /* FIXME */
>>> +}
>>
>> Do you plan to implement this interface?
>>
> 
> Yes, although I'm not sure what checks should go there. The only thing I
> can think of right now is checking that the number of free chunks on a
> block (according to the bitmap) matches the freelist index.
> 

Yeah this context does not seem like it needs too much checking. The
freelist vs free chunks check sounds ok to me. I guess GenSlab then will
call the checks for the underlying contexts.

>>> +#define SLAB_DEFAULT_BLOCK_SIZE        8192
>>> +#define SLAB_LARGE_BLOCK_SIZE        (8 * 1024 * 1024)
>>
>> I am guessing this is based on max_cached_tuplebufs? Maybe these
>> could be written with same style?
>>
> 
> Not sure I understand what you mean by "based on"? I don't quite
> remember how I came up with those constants, but I guess 8kB and 8MB
> seemed like good values.
> 
> Also, what style you mean? I've used the same style as for ALLOCSET_*
> constants in the same file.
> 

I mean using 8 * 1024 for SLAB_DEFAULT_BLOCK_SIZE so that it's more
readable.  ALLOCSET_* does that too (with exception of
ALLOCSET_SEPARATE_THRESHOLD which I have no idea why it's different from
rest of the code).

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Rahila Syed
Date:
Subject: Re: Optimization for lazy_scan_heap
Next
From: Thomas Munro
Date:
Subject: Re: [GENERAL] C++ port of Postgres