Re: pgsql: Generational memory allocator - Mailing list pgsql-committers

From Tom Lane
Subject Re: pgsql: Generational memory allocator
Date 2017-11-25 21:19:20
Msg-id 21466.1511644760@sss.pgh.pa.us
In response to Re: pgsql: Generational memory allocator  (Andres Freund <andres@anarazel.de>)
Responses Re: pgsql: Generational memory allocator  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-committers
Andres Freund <andres@anarazel.de> writes:
> On 2017-11-25 14:50:41 -0500, Tom Lane wrote:
>> Meanwhile, skink has come back with a success as of 0f2458f, rendering
>> the other mystery even deeper --- although at least this confirms the
>> idea that the crashes we are seeing predate the generation.c patch.

> Hm, wonder whether that could be optimization dependent too.

Evidently :-(.  I examined my compiler's assembly output for the
relevant line, snapbuild.c:780:
   ReorderBufferAddNewTupleCids(builder->reorder, xlrec->top_xid, lsn,
                                xlrec->target_node, xlrec->target_tid,
                                xlrec->cmin, xlrec->cmax,
                                xlrec->combocid);
 

which is
    movl    12(%rbx), %eax          combocid
    movq    16(%rbx), %rcx          target_node.spcNode + dbNode
    movq    %rbp, %rdx
    movl    24(%rbx), %r8d          target_node.relNode
    movq    56(%r12), %rdi
    movq    28(%rbx), %r9           target_tid ... 8 bytes worth
    movl    (%rbx), %esi            top_xid
    movl    %eax, 16(%rsp)
    movl    8(%rbx), %eax           cmax
    movl    %eax, 8(%rsp)
    movl    4(%rbx), %eax           cmin
    movl    %eax, (%rsp)
    call    ReorderBufferAddNewTupleCids
 

I've annotated the fetches from *rbx to match the data structure:

typedef struct xl_heap_new_cid
{
    /*
     * store toplevel xid so we don't have to merge cids from different
     * transactions
     */
    TransactionId   top_xid;
    CommandId       cmin;
    CommandId       cmax;

    /*
     * don't really need the combocid since we have the actual values right
     * in this struct, but the padding makes it free and its useful for
     * debugging.
     */
    CommandId       combocid;

    /*
     * Store the relfilenode/ctid pair to facilitate lookups.
     */
    RelFileNode     target_node;
    ItemPointerData target_tid;
} xl_heap_new_cid;

and what's apparent is that it's grabbing 8 bytes, not just 6, from
target_tid.  So the failure case must occur when the xl_heap_new_cid
WAL record is right at the end of the palloc'd record buffer and
valgrind notices that we're fetching padding bytes past its end.

I suspect that gcc feels within its rights to do this because the
size/alignment of xl_heap_new_cid is a multiple of 4 bytes, so that
there ought to be 2 padding bytes at the end.
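
To spell out the arithmetic (back-of-the-envelope; offsets assume the
usual 4-byte alignment of the uint32 fields):

    top_xid       offset  0,  4 bytes
    cmin          offset  4,  4 bytes
    cmax          offset  8,  4 bytes
    combocid      offset 12,  4 bytes
    target_node   offset 16, 12 bytes
    target_tid    offset 28,  6 bytes   -> data ends at offset 34

sizeof(xl_heap_new_cid) is rounded up to a multiple of the struct's
4-byte alignment, i.e. 36, so offsets 34 and 35 are trailing padding.
An 8-byte fetch of target_tid covers offsets 28..35: perfectly legal
as far as the compiler is concerned, but 2 bytes past the 34 bytes of
data actually present in the record buffer.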

We could possibly prevent that by marking the whole xl_heap_new_cid
struct as packed/align(2), the same way ItemPointerData is marked,
but that would render accesses to all of its fields inefficient
on alignment-sensitive machines.  Moreover, if we go that route,
we have a boatload of other xlog record formats with similar hazards.
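
For reference, marking it the way ItemPointerData is marked would look
something like this (a sketch only, to show what we'd be signing up
for, not a proposal):

    typedef struct xl_heap_new_cid
    {
        /* ... fields as above ... */
    }
    #if defined(pg_attribute_packed) && defined(pg_attribute_aligned)
                pg_attribute_packed()
                pg_attribute_aligned(2)
    #endif
    xl_heap_new_cid;

and then every field access potentially turns into byte-at-a-time code
on alignment-picky hardware.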

Instead I propose that we should make sure that the palloc request size
for XLogReaderState->main_data is always maxalign'd.  The existing
behavior in DecodeXLogRecord of palloc'ing it only just barely big
enough for the current record seems pretty brain-dead performance-wise
even without this consideration.  Generally, if we need to enlarge
that buffer, we should enlarge it significantly, IMO.
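
Roughly, something along these lines in DecodeXLogRecord (just a
sketch of the idea, and it assumes adding a main_data_bufsz field to
XLogReaderState to remember the allocated size):

    if (state->main_data == NULL ||
        state->main_data_len > state->main_data_bufsz)
    {
        if (state->main_data)
            pfree(state->main_data);

        /*
         * Allocate in maxaligned chunks, so that a too-wide fetch of the
         * last few bytes of a record still lands inside the allocation;
         * and grow the buffer generously rather than fitting each record
         * exactly.
         */
        state->main_data_bufsz = MAXALIGN(Max(state->main_data_len,
                                              BLCKSZ / 2));
        state->main_data = palloc(state->main_data_bufsz);
    }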

BTW, that comment in the middle of xl_heap_new_cid about "padding
makes this field free" seems entirely wrong.  By my count the
struct is currently 34 bytes, so if by padding it's talking about
maxalign alignment of the whole struct, that field is costing us 8 bytes
on maxalign-8 machines.  There's certainly no internal alignment
that would make it free.
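
For concreteness, on a maxalign-8 machine (counting only the record
data, and assuming record lengths get rounded up to maxalign in the
WAL stream):

    with combocid:      34 bytes -> MAXALIGN(34) = 40
    without combocid:   30 bytes -> MAXALIGN(30) = 32

so the field is costing 40 - 32 = 8 bytes per record, not zero.
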
        regards, tom lane

