Thread: Wal sender segfault

Wal sender segfault

From
Dmitriy Sarafannikov
Date:
Hi, I'm trying to test logical decoding on a server under load.
I launched pg_recvlogical with the 'test_decoding' plugin, and the WAL sender crashed with a segfault after several minutes of work.

pg_recvlogical --start --slot test_slot --no-loop -d dbname -h 127.0.0.1 -p5432 -U dbuser -w -f /tmp/test_logical.xlog

postgres=# select version();
version
-----------------------------------------------------------------------------------------------
PostgreSQL 9.4.5 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit
(1 row)

I have a core dump file (size 66G).

I launched gdb with the core file and got an incomplete backtrace:

Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007fa9248af742 in ReorderBufferGetTupleBuf ()

(gdb) bt
#0 0x00007fa9248af742 in ReorderBufferGetTupleBuf ()
#1 0x00007fa9248acde2 in LogicalDecodingProcessRecord ()
#2 0x00007fa9248b53b4 in ?? ()
#3 0x00007fa9248b6ed3 in ?? ()
#4 0x00007fa9248b7d8a in exec_replication_command ()
#5 0x00007fa9248f39fe in PostgresMain ()
#6 0x00007fa9246bb92e in ?? ()
#7 0x00007fa92489e58b in PostmasterMain ()
#8 0x00007fa9246bcac2 in main ()
What can I do to get more information about the cause of this segfault?

--
Best regards,
Dmitriy Sarafannikov

Re: Wal sender segfault

From
Andres Freund
Date:
Hi,

Thanks for the report!

On 2016-01-22 15:45:27 +0300, Dmitriy Sarafannikov wrote:
> Hi, i'm trying to test logical decoding on server under load.
> I launched pg_recvlogical with 'test_decoding' plugin and wal sender was crashed with segfault after several minutes of work.
>
> pg_recvlogical --start --slot test_slot --no-loop -d dbname -h 127.0.0.1 -p5432 -U dbuser -w -f /tmp/test_logical.xlog
>
> postgres=# select version();
> version
> -----------------------------------------------------------------------------------------------
> PostgreSQL 9.4.5 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit
> (1 row)
>
> I have core dump file (size 66G)
>
> I launch gdb with core file and getting incomplete backtrace:
>
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0 0x00007fa9248af742 in ReorderBufferGetTupleBuf ()
> (gdb) bt
> #0 0x00007fa9248af742 in ReorderBufferGetTupleBuf ()
> #1 0x00007fa9248acde2 in LogicalDecodingProcessRecord ()
> #2 0x00007fa9248b53b4 in ?? ()
> #3 0x00007fa9248b6ed3 in ?? ()
> #4 0x00007fa9248b7d8a in exec_replication_command ()
> #5 0x00007fa9248f39fe in PostgresMain ()
> #6 0x00007fa9246bb92e in ?? ()
> #7 0x00007fa92489e58b in PostmasterMain ()
> #8 0x00007fa9246bcac2 in main ()

Any chance that one of the modified tables in question uses REPLICA IDENTITY
FULL? There's an open bug about overly large rows produced by that, which
we don't currently handle correctly.  I'm working on fixing that bug.
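
One way to check for such tables is to look at the one-character `relreplident` column of the `pg_class` catalog, where `'f'` means FULL. The sketch below is illustrative only and is not taken from this thread: the helper and constant names are hypothetical, and the query is meant to be run (for example in psql) in the database being decoded.

```python
# Sketch only: names below are hypothetical, not part of PostgreSQL.
# pg_class.relreplident holds one character per table:
# 'd' = DEFAULT, 'n' = NOTHING, 'f' = FULL, 'i' = USING INDEX.
REPLICA_IDENTITY_NAMES = {
    "d": "DEFAULT",
    "n": "NOTHING",
    "f": "FULL",
    "i": "USING INDEX",
}

# Lists ordinary tables whose replica identity is FULL.
FULL_IDENTITY_QUERY = """
SELECT n.nspname, c.relname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND c.relreplident = 'f'
ORDER BY 1, 2;
"""


def tables_with_full_identity(rows):
    """Keep only (schema, table) pairs whose relreplident code is 'f'.

    `rows` is an iterable of (schema, table, relreplident) tuples, e.g.
    the result of selecting those three columns from pg_class.
    """
    return [(schema, table) for schema, table, ident in rows if ident == "f"]


if __name__ == "__main__":
    # Print the query so it can be pasted into psql.
    print(FULL_IDENTITY_QUERY.strip())
```
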

> What can i do to get more info about the reason of this segfault?

You could post a reproducible example... Other than that it's usually
helpful to build postgres with debugging symbols enabled, that'd already
give more context.
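
As a rough illustration of the second suggestion: once symbols are available (on Debian-style packaging they usually come from a separate debug package, assumed here to be `postgresql-9.4-dbg`), gdb can be run non-interactively against the core file. This is a sketch, not a prescribed procedure; the function name is hypothetical and both paths in the usage example are assumptions to be replaced with the real ones.

```python
import shlex


def gdb_backtrace_command(postgres_binary, core_file):
    """Build an argv list that makes gdb print a full backtrace and exit.

    --batch      run the given commands, then exit without prompting
    -ex "bt full"  backtrace including local variables for each frame
    """
    return ["gdb", "--batch", "-ex", "bt full", postgres_binary, core_file]


if __name__ == "__main__":
    # Both paths are assumptions; adjust them to the actual installation.
    cmd = gdb_backtrace_command(
        "/usr/lib/postgresql/9.4/bin/postgres", "/path/to/core"
    )
    # Print a shell-safe command line rather than executing anything.
    print(" ".join(shlex.quote(part) for part in cmd))
```

The list form can be passed directly to `subprocess.run` if the backtrace should be captured to a file.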

Regards,

Andres Freund

Re[2]: [BUGS] Wal sender segfault

From
Dmitriy Sarafannikov
Date:
> Any chance that one of modified tables in question uses REPLICA IDENTITY
> FULL? There's an open bug about too large rows produced by that, which
> we don't currently handle correctly. I'm working on fixing that bug.

Thanks,

Yes, I have 3 tables with REPLICA IDENTITY FULL. Two of these tables have many fields, including large text fields.

--
Best regards,
Dmitriy Sarafannikov

Re: Re[2]: [BUGS] Wal sender segfault

From
Michael Paquier
Date:
On Fri, Jan 22, 2016 at 11:19 PM, Dmitriy Sarafannikov
<dimon99901@mail.ru> wrote:
> Any chance that one of modified tables in question uses REPLICA IDENTITY
> FULL? There's an open bug about too large rows produced by that, which
> we don't currently handle correctly. I'm working on fixing that bug.
>
> Thanks,
>
> Yes, i have 3 tables with REPLICA IDENTITY FULL. 2 of this tables have many
> fields and big text fields.

The original bug is here:
http://www.postgresql.org/message-id/CAKg6ypLd7773AOX4DiOGRwQk1TVOQKhNwjYiVjJnpq8Wo+i62Q@mail.gmail.com
--
Michael

Re[2]: [BUGS] Re[2]: [BUGS] Wal sender segfault

From
Dmitriy Sarafannikov
Date:
Is this the same bug?

(gdb) bt
#0 slist_pop_head_node (head=0x7fa92656bc10) at /build/postgresql-9.4-MZhK6O/postgresql-9.4-9.4.5/build/../src/include/lib/ilist.h:649
#1 ReorderBufferGetTupleBuf (rb=0x7fa92656bb90) at /build/postgresql-9.4-MZhK6O/postgresql-9.4-9.4.5/build/../src/backend/replication/logical/reorderbuffer.c:456
#2 0x00007fa9248acde2 in DecodeUpdate (ctx=<optimized out>, ctx=<optimized out>, buf=0x7fffecd7e300) at /build/postgresql-9.4-MZhK6O/postgresql-9.4-9.4.5/build/../src/backend/replication/logical/decode.c:651
#3 DecodeHeapOp (buf=0x7fffecd7e300, ctx=0x7fa92655db60) at /build/postgresql-9.4-MZhK6O/postgresql-9.4-9.4.5/build/../src/backend/replication/logical/decode.c:430
#4 LogicalDecodingProcessRecord (ctx=0x7fa92655db60, record=<optimized out>) at /build/postgresql-9.4-MZhK6O/postgresql-9.4-9.4.5/build/../src/backend/replication/logical/decode.c:115
#5 0x00007fa9248b53b4 in XLogSendLogical () at /build/postgresql-9.4-MZhK6O/postgresql-9.4-9.4.5/build/../src/backend/replication/walsender.c:2428
#6 0x00007fa9248b6ed3 in WalSndLoop (send_data=send_data@entry=0x7fa9248b5350 <XLogSendLogical>) at /build/postgresql-9.4-MZhK6O/postgresql-9.4-9.4.5/build/../src/backend/replication/walsender.c:1829
#7 0x00007fa9248b7d8a in StartLogicalReplication (cmd=<optimized out>) at /build/postgresql-9.4-MZhK6O/postgresql-9.4-9.4.5/build/../src/backend/replication/walsender.c:996
#8 exec_replication_command (cmd_string=cmd_string@entry=0x7fa926494820 "START_REPLICATION SLOT \"test_slot\" LOGICAL 0/0")
at /build/postgresql-9.4-MZhK6O/postgresql-9.4-9.4.5/build/../src/backend/replication/walsender.c:1321
#9 0x00007fa9248f39fe in PostgresMain (argc=<optimized out>, argv=argv@entry=0x7fa92647b448, dbname=0x7fa92647b370 "dbname", username=<optimized out>)
at /build/postgresql-9.4-MZhK6O/postgresql-9.4-9.4.5/build/../src/backend/tcop/postgres.c:4077
#10 0x00007fa9246bb92e in BackendRun (port=0x7fa9264bc3d0) at /build/postgresql-9.4-MZhK6O/postgresql-9.4-9.4.5/build/../src/backend/postmaster/postmaster.c:4252
#11 BackendStartup (port=0x7fa9264bc3d0) at /build/postgresql-9.4-MZhK6O/postgresql-9.4-9.4.5/build/../src/backend/postmaster/postmaster.c:3917
#12 ServerLoop () at /build/postgresql-9.4-MZhK6O/postgresql-9.4-9.4.5/build/../src/backend/postmaster/postmaster.c:1678
#13 0x00007fa92489e58b in PostmasterMain (argc=5, argv=<optimized out>) at /build/postgresql-9.4-MZhK6O/postgresql-9.4-9.4.5/build/../src/backend/postmaster/postmaster.c:1287
#14 0x00007fa9246bcac2 in main (argc=5, argv=0x7fa92647a570) at /build/postgresql-9.4-MZhK6O/postgresql-9.4-9.4.5/build/../src/backend/main/main.c:228

Sunday, 24 January 2016, 0:08 +09:00 from Michael Paquier <michael.paquier@gmail.com>:

On Fri, Jan 22, 2016 at 11:19 PM, Dmitriy Sarafannikov
<dimon99901@mail.ru> wrote:
> Any chance that one of modified tables in question uses REPLICA IDENTITY
> FULL? There's an open bug about too large rows produced by that, which
> we don't currently handle correctly. I'm working on fixing that bug.
>
> Thanks,
>
> Yes, i have 3 tables with REPLICA IDENTITY FULL. 2 of this tables have many
> fields and big text fields.

The original bug is here:
http://www.postgresql.org/message-id/CAKg6ypLd7773AOX4DiOGRwQk1TVOQKhNwjYiVjJnpq8Wo+i62Q@mail.gmail.com
--
Michael


--
Best regards,
Dmitriy Sarafannikov