Re: segmentation fault when cassert enabled - Mailing list pgsql-hackers

From Jehan-Guillaume de Rorthais
Subject Re: segmentation fault when cassert enabled
Msg-id 20191105172918.3e32a446@firost
In response to Re: segmentation fault when cassert enabled  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: segmentation fault when cassert enabled
List pgsql-hackers
On Fri, 25 Oct 2019 12:28:38 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Jehan-Guillaume de Rorthais <jgdr@dalibo.com> writes:
> > When investigating for the bug reported in thread "logical replication -
> > negative bitmapset member not allowed", I found a way to seg fault
> > postgresql only when cassert is enabled.
> > ...
> > I haven't had time to dig further yet. However, I don't understand why this
> > crash is triggered when cassert is enabled.
> 
> Most likely, it's not so much assertions that provoke the crash as
> CLOBBER_FREED_MEMORY, ie the actual problem here is use of already-freed
> memory.

Thank you. Indeed, enabling CLOBBER_FREED_MEMORY on its own is enough to
trigger the segfault.

In fact, valgrind detects it as a use of an uninitialised value, whether or not
CLOBBER_FREED_MEMORY is defined:

 Conditional jump or move depends on uninitialised value(s)
    at 0x43F410: slot_modify_cstrings (worker.c:398)
    by 0x43FBE9: apply_handle_update (worker.c:744)
    by 0x440088: apply_dispatch (worker.c:968)
    by 0x4405D7: LogicalRepApplyLoop (worker.c:1175)
    by 0x440CD0: ApplyWorkerMain (worker.c:1733)
    by 0x411C34: StartBackgroundWorker (bgworker.c:834)
    by 0x41EA24: do_start_bgworker (postmaster.c:5763)
    by 0x41EB6F: maybe_start_bgworkers (postmaster.c:5976)
    by 0x41F562: sigusr1_handler (postmaster.c:5161)
    by 0x48A072F: ??? (in /lib/x86_64-linux-gnu/libpthread-2.28.so)
    by 0x4B31FF6: select (select.c:41)
    by 0x41FDDE: ServerLoop (postmaster.c:1668)
  Uninitialised value was created by a heap allocation
    at 0x5C579B: palloc (mcxt.c:949)
    by 0x437116: logicalrep_rel_open (relation.c:270)
    by 0x43FA8F: apply_handle_update (worker.c:684)
    by 0x440088: apply_dispatch (worker.c:968)
    by 0x4405D7: LogicalRepApplyLoop (worker.c:1175)
    by 0x440CD0: ApplyWorkerMain (worker.c:1733)
    by 0x411C34: StartBackgroundWorker (bgworker.c:834)
    by 0x41EA24: do_start_bgworker (postmaster.c:5763)
    by 0x41EB6F: maybe_start_bgworkers (postmaster.c:5976)
    by 0x41F562: sigusr1_handler (postmaster.c:5161)
    by 0x48A072F: ??? (in /lib/x86_64-linux-gnu/libpthread-2.28.so)
    by 0x4B31FF6: select (select.c:41)

My best guess so far is that logicalrep_relmap_invalidate_cb is not called after
the DDL on the subscriber, so the relmap cache is not invalidated. We then end up
with slot->tts_tupleDescriptor->natts greater than rel->remoterel->natts in
slot_store_cstrings, leading to the overflow on attrmap and the SIGSEGV.

I haven't followed this path yet.
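
To make the suspected failure mode concrete, here is a minimal standalone
sketch. It is not PostgreSQL code: checked_attrmap_lookup and
ATTRMAP_OUT_OF_RANGE are hypothetical names, and it assumes the DDL added a
column, so that a stale attrmap is indexed with a larger attribute count than
it was allocated for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's AttrNumber (an int16). */
typedef int16_t AttrNumber;

/* Hypothetical sentinel: the index is past the end of the map. */
#define ATTRMAP_OUT_OF_RANGE (-2)

/*
 * Hypothetical helper: look up local attribute i in a map that holds
 * attrmap_len entries. An unchecked attrmap[i] with i >= attrmap_len is
 * the kind of read that would run past the allocation.
 */
static int
checked_attrmap_lookup(const AttrNumber *attrmap, int attrmap_len, int i)
{
    assert(attrmap != NULL);

    if (i < 0 || i >= attrmap_len)
        return ATTRMAP_OUT_OF_RANGE;

    return attrmap[i];
}
```

With a map built for two columns, looking up a third attribute (index 2)
returns ATTRMAP_OUT_OF_RANGE here, whereas a plain attrmap[2] would read memory
the map does not own.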

By the way, I noticed attrmap is declared as AttrNumber * in struct
LogicalRepRelMapEntry, AttrNumber being typedef'd as an int16. However, attrmap
is allocated based on sizeof(int) in logicalrep_rel_open:

  entry->attrmap = palloc(desc->natts * sizeof(int));

It doesn't look like a major problem; it just allocates more memory than
needed.
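
As a rough illustration of the size difference, the standalone sketch below
compares the two element sizes. The helper names are hypothetical and the
typedef is a stand-in for the real AttrNumber; the actual byte counts depend on
the platform's sizeof(int).

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's AttrNumber (an int16). */
typedef int16_t AttrNumber;

/* Bytes requested today: element size taken from int. */
static size_t
attrmap_bytes_allocated(int natts)
{
    return (size_t) natts * sizeof(int);
}

/* Bytes actually needed: element size taken from AttrNumber. */
static size_t
attrmap_bytes_needed(int natts)
{
    return (size_t) natts * sizeof(AttrNumber);
}
```

Where int is 4 bytes, a 10-column table gets 40 bytes for a map that needs 20.
Sizing the allocation with sizeof(AttrNumber) would match the declared type.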

Regards,


