Re: segmentation fault when cassert enabled - Mailing list pgsql-hackers
From | Jehan-Guillaume de Rorthais |
---|---|
Subject | Re: segmentation fault when cassert enabled |
Date | |
Msg-id | 20191105172918.3e32a446@firost |
In response to | Re: segmentation fault when cassert enabled (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: segmentation fault when cassert enabled |
List | pgsql-hackers |
On Fri, 25 Oct 2019 12:28:38 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Jehan-Guillaume de Rorthais <jgdr@dalibo.com> writes:
> > When investigating for the bug reported in thread "logical replication -
> > negative bitmapset member not allowed", I found a way to seg fault
> > postgresql only when cassert is enabled.
> > ...
> > I hadn't time to digg further yet. However, I don't understand why this
> > crash is triggered when cassert is enabled.
>
> Most likely, it's not so much assertions that provoke the crash as
> CLOBBER_FREED_MEMORY, ie the actual problem here is use of already-freed
> memory.

Thank you. Indeed, enabling CLOBBER_FREED_MEMORY on its own is enough to
trigger the segfault.

In fact, valgrind reports it as a use of an uninitialised value, whether
CLOBBER_FREED_MEMORY is defined or not:

  Conditional jump or move depends on uninitialised value(s)
    at 0x43F410: slot_modify_cstrings (worker.c:398)
    by 0x43FBE9: apply_handle_update (worker.c:744)
    by 0x440088: apply_dispatch (worker.c:968)
    by 0x4405D7: LogicalRepApplyLoop (worker.c:1175)
    by 0x440CD0: ApplyWorkerMain (worker.c:1733)
    by 0x411C34: StartBackgroundWorker (bgworker.c:834)
    by 0x41EA24: do_start_bgworker (postmaster.c:5763)
    by 0x41EB6F: maybe_start_bgworkers (postmaster.c:5976)
    by 0x41F562: sigusr1_handler (postmaster.c:5161)
    by 0x48A072F: ??? (in /lib/x86_64-linux-gnu/libpthread-2.28.so)
    by 0x4B31FF6: select (select.c:41)
    by 0x41FDDE: ServerLoop (postmaster.c:1668)
  Uninitialised value was created by a heap allocation
    at 0x5C579B: palloc (mcxt.c:949)
    by 0x437116: logicalrep_rel_open (relation.c:270)
    by 0x43FA8F: apply_handle_update (worker.c:684)
    by 0x440088: apply_dispatch (worker.c:968)
    by 0x4405D7: LogicalRepApplyLoop (worker.c:1175)
    by 0x440CD0: ApplyWorkerMain (worker.c:1733)
    by 0x411C34: StartBackgroundWorker (bgworker.c:834)
    by 0x41EA24: do_start_bgworker (postmaster.c:5763)
    by 0x41EB6F: maybe_start_bgworkers (postmaster.c:5976)
    by 0x41F562: sigusr1_handler (postmaster.c:5161)
    by 0x48A072F: ??? (in /lib/x86_64-linux-gnu/libpthread-2.28.so)
    by 0x4B31FF6: select (select.c:41)

My best guess so far is that logicalrep_relmap_invalidate_cb is not called
after the DDL on the subscriber, so the relmap cache is not invalidated. We
then end up with slot->tts_tupleDescriptor->natts greater than
rel->remoterel->natts in slot_store_cstrings, leading to an overflow on
attrmap and the SIGSEGV. I haven't followed this path yet.

By the way, I noticed that attrmap is declared as AttrNumber * in struct
LogicalRepRelMapEntry, AttrNumber being typedef'd as an int16. However,
attrmap is allocated based on sizeof(int) in logicalrep_rel_open:

  entry->attrmap = palloc(desc->natts * sizeof(int));

It doesn't look like a major problem: it just allocates more memory than
needed.

Regards,
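
To make the attrmap sizing remark concrete, here is a minimal standalone sketch. It is not PostgreSQL code: AttrNumber is redefined locally as int16_t (matching the typedef described above), malloc stands in for palloc, and the helper names attrmap_bytes_as_int, attrmap_bytes_as_attrnumber and alloc_attrmap are hypothetical, introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Local stand-in for PostgreSQL's AttrNumber (described above as an int16). */
typedef int16_t AttrNumber;

/* Bytes requested by the quoted allocation: natts * sizeof(int). */
size_t attrmap_bytes_as_int(int natts)
{
    return (size_t) natts * sizeof(int);
}

/* Bytes actually needed to hold natts AttrNumber entries. */
size_t attrmap_bytes_as_attrnumber(int natts)
{
    return (size_t) natts * sizeof(AttrNumber);
}

/*
 * Hypothetical helper: size the array by its element type.
 * malloc is used here in place of palloc so the sketch compiles on its own.
 */
AttrNumber *alloc_attrmap(int natts)
{
    assert(natts > 0);
    return malloc((size_t) natts * sizeof(AttrNumber));
}
```

On a platform where int is 4 bytes, the sizeof(int) form requests twice the memory the array needs; sizing by sizeof(AttrNumber) keeps the allocation in step with the declared element type.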