Re: BUG #16129: Segfault in tts_virtual_materialize in logical replication worker - Mailing list pgsql-bugs

From Ondřej Jirman
Subject Re: BUG #16129: Segfault in tts_virtual_materialize in logical replication worker
Date
Msg-id 20191121135523.aybyo7o6yzxxungd@core.my.home
Whole thread Raw
In response to Re: BUG #16129: Segfault in tts_virtual_materialize in logical replication worker  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: BUG #16129: Segfault in tts_virtual_materialize in logical replication worker  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-bugs
On Thu, Nov 21, 2019 at 02:32:37PM +0100, Tomas Vondra wrote:
> On Thu, Nov 21, 2019 at 01:59:13PM +0100, Ondřej Jirman wrote:
> > On Thu, Nov 21, 2019 at 12:53:26PM +0100, postgresql wrote:
> > > Hello,
> > > 
> > > On Thu, Nov 21, 2019 at 11:39:40AM +0100, Tomas Vondra wrote:
> > > > On Thu, Nov 21, 2019 at 01:14:18AM +0000, PG Bug reporting form wrote:
> > > > > Replication of one of my databases (running on ARMv7 machine) started
> > > > > segfaulting on the subscriber side (x86_64) like this:
> > > > >
> > > > > #0  0x00007fc259739917 in __memmove_sse2_unaligned_erms () from
> > > > > /usr/lib/libc.so.6
> > > > > #1  0x000055d033e93d44 in memcpy (__len=620701425, __src=<optimized out>,
> > > > > __dest=0x55d0356da804) at /usr/include/bits/string_fortified.h:34
> > > > > #2  tts_virtual_materialize (slot=0x55d0356da3b8) at execTuples.c:235
> > > > > #3  0x000055d033e94d32 in ExecFetchSlotHeapTuple
> > > > > (slot=slot@entry=0x55d0356da3b8, materialize=materialize@entry=true,
> > > > > shouldFree=shouldFree@entry=0x7fff0e7cf387) at execTuples.c:1624
> > > 
> > > I forgot to add that the publisher is still running PostgreSQL 11.5.
> > > 
> > 
> > I can also add that I have data checksumming enabled on both ends, and
> > it did not detect any corruption:
> > 
> > # pg_verify_checksums -D /var/lib/postgres/data
> > Checksum scan completed
> > Data checksum version: 1
> > Files scanned:  1751
> > Blocks scanned: 86592
> > Bad checksums:  0
> > 
> > # pg_checksums /var/lib/postgres/data
> > Checksum operation completed
> > Files scanned:  22777
> > Blocks scanned: 3601527
> > Bad checksums:  0
> > Data checksum version: 1
> > 
> > The WAL on the publisher is also dumpable up to a point hours after the issues started.
> > 
> > I've put the dump here, if it's of any use: https://megous.com/dl/tmp/wal_dump.txt
> > 
> > Dump ends with:
> > 
> > pg_waldump: FATAL:  error in WAL record at 2/BBE0E538: invalid record length at 2/BBE0E5A8: wanted 24, got 0
> > 
> > But that seems normal: pg_waldump reports this error when it reaches the current end of WAL. I get it on my other database clusters, too.
> > 
> > I managed to extract the failing logical decoding data from the publisher, if
> > that helps:
> > 
> > 
> > SELECT * FROM pg_logical_slot_peek_binary_changes('l5_hometv', NULL, NULL,
> >     'proto_version', '1', 'publication_names', 'pub');
> > 
> > 2/BBD86EA0 | 56395 | \x4200000002bbd880b800023acd790ce5510000dc4b
> > 2/BBD87E90 | 56395 | \x5200004a687075626c696300766964656f73006400080169640000000017ffffffff007469746c650000000019ffffffff00636f7665725f696d6167650000000011ffffffff006d657461646174610000000edaffffffff0063617465676f72790000000017ffffffff007075626c6973686564000000043affffffff006164646564000000045affffffff00706c617965640000000010ffffffff
> > 2/BBD87E90 | 56395 | \x5500004a684e0008740000000438333933740000005650617a6465726b613a204f206dc3a964696120736520706f7665646520626f6a2e204b64796279206ec3a17320706f6c6974696369206d696c6f76616c692c2062796c6f206279206ec49b636f20c5a17061746ec49b7574000001397b226964223a20226430313064343430303965323131656162323539616331663662323230656538222c202264617465223a2022323031392d31312d3138222c20226e616d65223a202250617a6465726b613a204f206dc3a964696120736520706f7665646520626f6a2e204b64796279206ec3a17320706f6c6974696369206d696c6f76616c692c2062796c6f206279206ec49b636f20c5a17061746ec49b222c2022696d616765223a202268747470733a2f2f63646e2e7873642e637a2f726573697a652f63353535656239633131353333313632386164666539396237343534353731655f657874726163743d302c302c313931392c313038305f726573697a653d3732302c3430355f2e6a70673f686173683d6362316362623836336230353361613561333761346666616439303865303431227d7400000003323432740000000a323031392d31312d3138740000001a323031392d31312d31382031323a35303a30312e383136333736740000000174
> > 2/BBD880E8 | 56395 | \x430000000002bbd880b800000002bbd880e800023acd790ce551
> > 
> 
> Can you show the whole transaction? From the WAL dump it seems it just
> did a single UPDATE:

I'm not sure what you mean. The above is the whole output I got from that
SELECT.
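For what it's worth, the first byte of each \x... row is the pgoutput message type from the logical replication protocol ('B' Begin, 'R' Relation, 'U' Update, 'C' Commit), so the output really is just one Begin/Relation/Update/Commit sequence. A minimal sketch classifying the rows (hex strings truncated from the output above; note the relation OID 0x4a68 = 19048 matches the rel in the WAL dump):

```python
# Classify pgoutput messages from pg_logical_slot_peek_binary_changes output
# by their leading type byte (logical replication protocol, proto_version 1).
MESSAGE_TYPES = {
    b"B": "Begin",
    b"R": "Relation",
    b"I": "Insert",
    b"U": "Update",
    b"D": "Delete",
    b"C": "Commit",
}

def classify(data: bytes) -> str:
    """Return the message type name for one raw pgoutput message."""
    return MESSAGE_TYPES.get(data[:1], "Unknown")

# Leading bytes of the rows above (truncated)
rows = [
    bytes.fromhex("4200000002bbd880b8"),    # Begin
    bytes.fromhex("5200004a68"),            # Relation, OID 0x4a68 = 19048
    bytes.fromhex("5500004a684e"),          # Update, 'N' = new tuple follows
    bytes.fromhex("430000000002bbd880b8"),  # Commit
]
print([classify(r) for r in rows])  # ['Begin', 'Relation', 'Update', 'Commit']
```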

> rmgr: Heap        len (rec/tot):     59/  4075, tx:      56395, lsn: 2/BBD86EA0, prev 2/BBD86E68, desc: LOCK off 2: xid 56395: flags 0 LOCK_ONLY EXCL_LOCK
>     blkref #0: rel 1663/19030/19048 fork main blk 415 (FPW); hole: offset: 56, length: 4176
> rmgr: Heap        len (rec/tot):    523/   523, tx:      56395, lsn: 2/BBD87E90, prev 2/BBD86EA0, desc: HOT_UPDATE off 2 xmax 56395 ; new off 9 xmax 0
>     blkref #0: rel 1663/19030/19048 fork main blk 415
> rmgr: Transaction len (rec/tot):     46/    46, tx:      56395, lsn: 2/BBD880B8, prev 2/BBD87E90, desc: COMMIT 2019-11-20 23:28:47.016273 CET
> 
> Which fields does it update?

The only update that my system does in this database is:

UPDATE videos SET played = TRUE WHERE id = ?
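Incidentally, the Update message shown earlier appears to end with the bytes 74 00000001 74, i.e. a single text-format column of length 1 with value "t", which would match exactly this played = TRUE update. A hedged sketch of decoding one such column, assuming pgoutput's text tuple-data format:

```python
import struct

def read_column(buf: bytes, off: int):
    """Decode one pgoutput tuple column: a kind byte ('t' text, 'n' NULL,
    'u' unchanged TOAST); for 't', a 4-byte big-endian length plus data."""
    kind = chr(buf[off])
    off += 1
    if kind == "t":
        (length,) = struct.unpack_from(">i", buf, off)
        off += 4
        return kind, buf[off:off + length].decode("utf-8"), off + length
    return kind, None, off

tail = bytes.fromhex("740000000174")  # trailing bytes of the Update row
print(read_column(tail, 0)[:2])  # ('t', 't')
```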

The rest is just inserts:

INSERT INTO categories (name, metadata, cover_image, provider) VALUES (?, ?, ?, (SELECT id FROM providers WHERE system_name = ?))
INSERT INTO videos (title, metadata, cover_image, category, published) VALUES (?, ?, ?, ?, ?)

There are no other data modification statements executed.

There may be updates in other databases in the same cluster, but the replication worker segfaults only on the "hometv" database.

regards,
    o.

> regards
> 
> -- 
> Tomas Vondra                  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


