Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker - Mailing list pgsql-bugs

From	Ondřej Jirman
Subject	Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker
Date	November 21, 2019 12:59:13
Msg-id	20191121125913.4dhhwpbw67vqx37o@core.my.home
In response to	Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker (Ondřej Jirman <ienieghapheoghaiwida@xff.cz>)
List	pgsql-bugs

On Thu, Nov 21, 2019 at 12:53:26PM +0100, postgresql wrote:
> Hello,
> 
> On Thu, Nov 21, 2019 at 11:39:40AM +0100, Tomas Vondra wrote:
> > On Thu, Nov 21, 2019 at 01:14:18AM +0000, PG Bug reporting form wrote:
> > > Replication of one of my databases (running on ARMv7 machine) started
> > > segfaulting on the subscriber side (x86_64) like this:
> > > 
> > > #0  0x00007fc259739917 in __memmove_sse2_unaligned_erms () from
> > > /usr/lib/libc.so.6
> > > #1  0x000055d033e93d44 in memcpy (__len=620701425, __src=<optimized out>,
> > > __dest=0x55d0356da804) at /usr/include/bits/string_fortified.h:34
> > > #2  tts_virtual_materialize (slot=0x55d0356da3b8) at execTuples.c:235
> > > #3  0x000055d033e94d32 in ExecFetchSlotHeapTuple
> > > (slot=slot@entry=0x55d0356da3b8, materialize=materialize@entry=true,
> > > shouldFree=shouldFree@entry=0x7fff0e7cf387) at execTuples.c:1624
> 
> I forgot to add that the publisher is still PostgreSQL 11.5.
> 

I can also add that I have data checksumming enabled on both ends, and 
it did not detect any corruption:

# pg_verify_checksums -D /var/lib/postgres/data
Checksum scan completed
Data checksum version: 1
Files scanned:  1751
Blocks scanned: 86592
Bad checksums:  0

# pg_checksums /var/lib/postgres/data
Checksum operation completed
Files scanned:  22777
Blocks scanned: 3601527
Bad checksums:  0
Data checksum version: 1

The WAL on the publisher can also still be dumped, up to a point hours after
the issues started.

I've put the dump here, if it's of any use: https://megous.com/dl/tmp/wal_dump.txt

Dump ends with:

pg_waldump: FATAL:  error in WAL record at 2/BBE0E538: invalid record length at 2/BBE0E5A8: wanted 24, got 0

But that seems normal. I get that error on my other database clusters, too.
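
For reference, the two positions in that message are LSNs, written as two hex numbers (high 32 bits / low 32 bits of a 64-bit byte position in the WAL). The sketch below only converts them and measures the distance between the last good record and the point where reading stopped. My understanding, stated as an assumption, is that "wanted 24" refers to the fixed-size record header and "got 0" to zeroed space after the last written record, which would fit the error being harmless. The helper name parse_lsn is made up for this example.

```python
def parse_lsn(text: str) -> int:
    """Convert an LSN written as 'hi/lo' (two hex numbers) to a 64-bit byte position."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


# Positions copied from the pg_waldump error message above.
last_record = parse_lsn("2/BBE0E538")
stopped_at = parse_lsn("2/BBE0E5A8")

# Distance in bytes between the start of the last record and the stop point.
print(stopped_at - last_record)  # 112
```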

I managed to extract the failing logical decoding data from the publisher, if
that helps:


SELECT * FROM pg_logical_slot_peek_binary_changes('l5_hometv', NULL, NULL,
    'proto_version', '1', 'publication_names', 'pub');

 2/BBD86EA0 | 56395 | \x4200000002bbd880b800023acd790ce5510000dc4b
 2/BBD87E90 | 56395 | \x5200004a687075626c696300766964656f73006400080169640000000017ffffffff007469746c650000000019ffffffff00636f7665725f696d6167650000000011ffffffff006d657461646174610000000edaffffffff0063617465676f72790000000017ffffffff007075626c6973686564000000043affffffff006164646564000000045affffffff00706c617965640000000010ffffffff
 2/BBD87E90 | 56395 | \x5500004a684e0008740000000438333933740000005650617a6465726b613a204f206dc3a964696120736520706f7665646520626f6a2e204b64796279206ec3a17320706f6c6974696369206d696c6f76616c692c2062796c6f206279206ec49b636f20c5a17061746ec49b7574000001397b226964223a20226430313064343430303965323131656162323539616331663662323230656538222c202264617465223a2022323031392d31312d3138222c20226e616d65223a202250617a6465726b613a204f206dc3a964696120736520706f7665646520626f6a2e204b64796279206ec3a17320706f6c6974696369206d696c6f76616c692c2062796c6f206279206ec49b636f20c5a17061746ec49b222c2022696d616765223a202268747470733a2f2f63646e2e7873642e637a2f726573697a652f63353535656239633131353333313632386164666539396237343534353731655f657874726163743d302c302c313931392c313038305f726573697a653d3732302c3430355f2e6a70673f686173683d6362316362623836336230353361613561333761346666616439303865303431227d7400000003323432740000000a323031392d31312d3138740000001a323031392d31312d31382031323a35303a30312e383136333736740000000174
 2/BBD880E8 | 56395 | \x430000000002bbd880b800000002bbd880e800023acd790ce551
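
In case it helps with reading the rows above: the first byte of each payload is the pgoutput message type ('B' Begin, 'R' Relation, 'U' Update, 'C' Commit). Below is a minimal sketch that decodes the Begin, Relation and Commit payloads, assuming the message layouts documented for logical replication protocol version 1. The hex strings are copied from the rows above without the \x prefix; the function and variable names are made up for this example, and the Update row is left out only because it is long.

```python
import struct
from datetime import datetime, timedelta, timezone

# PostgreSQL timestamps count microseconds from 2000-01-01.
PG_EPOCH = datetime(2000, 1, 1, tzinfo=timezone.utc)

BEGIN = "4200000002bbd880b800023acd790ce5510000dc4b"
COMMIT = "430000000002bbd880b800000002bbd880e800023acd790ce551"
RELATION = (
    "5200004a68"                        # 'R', relation OID
    "7075626c696300766964656f7300"      # namespace, relation name
    "640008"                            # replica identity, column count
    "0169640000000017ffffffff"
    "007469746c650000000019ffffffff"
    "00636f7665725f696d6167650000000011ffffffff"
    "006d657461646174610000000edaffffffff"
    "0063617465676f72790000000017ffffffff"
    "007075626c6973686564000000043affffffff"
    "006164646564000000045affffffff"
    "00706c617965640000000010ffffffff"
)


def fmt_lsn(value: int) -> str:
    """Format a 64-bit WAL position the way PostgreSQL prints LSNs."""
    return f"{value >> 32:X}/{value & 0xFFFFFFFF:X}"


def pg_time(micros: int) -> datetime:
    return PG_EPOCH + timedelta(microseconds=micros)


def read_cstring(buf: bytes, pos: int):
    """Read a NUL-terminated string; return it and the position after the NUL."""
    end = buf.index(b"\x00", pos)
    return buf[pos:end].decode("utf-8"), end + 1


def decode(hex_payload: str) -> dict:
    buf = bytes.fromhex(hex_payload)
    kind = chr(buf[0])
    if kind == "B":
        final_lsn, ts, xid = struct.unpack_from(">QqI", buf, 1)
        return {"type": "Begin", "final_lsn": fmt_lsn(final_lsn),
                "commit_time": pg_time(ts), "xid": xid}
    if kind == "C":
        _flags, commit_lsn, end_lsn, ts = struct.unpack_from(">BQQq", buf, 1)
        return {"type": "Commit", "commit_lsn": fmt_lsn(commit_lsn),
                "end_lsn": fmt_lsn(end_lsn), "commit_time": pg_time(ts)}
    if kind == "R":
        (relid,) = struct.unpack_from(">I", buf, 1)
        schema, pos = read_cstring(buf, 5)
        table, pos = read_cstring(buf, pos)
        replica_identity = chr(buf[pos])
        (ncols,) = struct.unpack_from(">H", buf, pos + 1)
        pos += 3
        columns = []
        for _ in range(ncols):
            flags = buf[pos]  # bit 0 marks a replica identity key column
            name, pos = read_cstring(buf, pos + 1)
            type_oid, _typmod = struct.unpack_from(">Ii", buf, pos)
            pos += 8
            columns.append((name, type_oid, bool(flags & 1)))
        return {"type": "Relation", "relid": relid,
                "table": f"{schema}.{table}",
                "replica_identity": replica_identity, "columns": columns}
    raise ValueError(f"message type {kind!r} is not handled in this sketch")


for payload in (BEGIN, RELATION, COMMIT):
    print(decode(payload))
```

Decoded this way, the Begin payload carries xid 56395, which matches the xid column, and the Relation payload describes public.videos with eight columns (id, title, cover_image, metadata, category, published, added, played).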


thank you and regards,
    o.
