Re: BUG #8397: pg_basebackup -x from new standby server sometimes causes Segmentation fault - Mailing list pgsql-bugs

From Magnus Hagander
Subject Re: BUG #8397: pg_basebackup -x from new standby server sometimes causes Segmentation fault
Msg-id CABUevExE8SD5_9yob-cMN-oh+dXUiabEzFTt4mNRJcP=ZXa1WA@mail.gmail.com
In response to BUG #8397: pg_basebackup -x from new standby server sometimes causes Segmentation fault  (harukat@sraoss.co.jp)
Responses Re: BUG #8397: pg_basebackup -x from new standby server sometimes causes Segmentation fault
List pgsql-bugs
On Sat, Aug 24, 2013 at 1:46 PM,  <harukat@sraoss.co.jp> wrote:
> The following bug has been logged on the website:
>
> Bug reference:      8397
> Logged by:          TAKATSUKA Haruka
> Email address:      harukat@sraoss.co.jp
> PostgreSQL version: 9.2.4
> Operating system:   Linux (CentOS6)
> Description:
>
> Hi.
>
>
> I report a small bug.
> pg_basebackup -x from new standby server sometimes causes Segmentation
> fault.
>
>
> (1) create new standby server dir by pg_basebackup without -x
> (2) start new standby server
> (3) pg_basebackup from new standby server with -x
> (!) when the new standby has no WAL files in pg_xlog,
>     the new standby's wal sender crashes
>
>
> new standby server's core file:
>
>
> Core was generated by `postgres: wal sender process postgres ::1(55210)
> sending backup "pg_basebackup'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x0000003b7368ac66 in __rawmemchr_sse2 () from /lib64/libc.so.6
> Missing separate debuginfos, use: debuginfo-install
> glibc-2.12-1.107.el6.x86_64 libxml2-2.7.6-4.el6.x86_64
> zlib-1.2.3-27.el6.x86_64
> (gdb) bt
> #0  0x0000003b7368ac66 in __rawmemchr_sse2 () from /lib64/libc.so.6
> #1  0x0000003b73675990 in _IO_str_init_static_internal () from
> /lib64/libc.so.6
> #2  0x0000003b73669935 in vsscanf () from /lib64/libc.so.6
> #3  0x0000003b736639a8 in sscanf () from /lib64/libc.so.6
> #4  0x0000000000622351 in perform_base_backup (opt=0x7fffc2e22300,
>     tblspcdir=0xd424c0) at basebackup.c:304
> #5  0x0000000000622c50 in SendBaseBackup (cmd=<value optimized out>)
>     at basebackup.c:558
> #6  0x000000000061f5b0 in HandleReplicationCommand () at walsender.c:482
> #7  WalSndHandshake () at walsender.c:257
> #8  WalSenderMain () at walsender.c:181
> #9  0x0000000000650b12 in PostgresMain (argc=1, argv=<value optimized out>,
>     dbname=0xc82a90 "", username=0xc82a70 "postgres") at postgres.c:3715
> #10 0x000000000060c4f1 in BackendRun () at postmaster.c:3614
> #11 BackendStartup () at postmaster.c:3304
> #12 ServerLoop () at postmaster.c:1367
> #13 0x000000000060f031 in PostmasterMain (argc=<value optimized out>,
>     argv=<value optimized out>) at postmaster.c:1127
> #14 0x00000000005ae140 in main (argc=5, argv=0xc80bb0) at main.c:199
>
>
>
>
> ./backend/replication/basebackup.c:304
>    XLogFromFileName(walFiles[0], &tli, &logid, &logseg);
>
>
> In this case, nWalFiles = 0 and walFiles[] palloced zero size.
>
>
> Though pg_basebackup does not have to work in this rare case,
> we should insert something like "if (nWalFiles <= 0) ereport(...);".
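For readers unfamiliar with basebackup.c, the step that ends up with nWalFiles = 0 can be sketched roughly as follows. This is a hypothetical, simplified standalone version, not the actual 9.2 code: collect_wal_files() and compare_wal_names() are made-up names and timeline handling is ignored. It keeps the pg_xlog entries whose 24-hex-digit names fall between the first and last segment of the backup and sorts them, so a pg_xlog with no segments in that range simply yields a count of zero, and anything that then reads the first element unconditionally is reading past an empty array.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define WAL_FNAME_LEN 24    /* timeline, log id, segment: 8 hex digits each */

/* qsort() comparator: fixed-width uppercase hex names sort correctly with strcmp. */
static int
compare_wal_names(const void *a, const void *b)
{
    return strcmp(*(const char *const *) a, *(const char *const *) b);
}

/*
 * Hypothetical simplification of the WAL collection step: copy into out[]
 * the entries that look like segment names and lie within [first, last],
 * then sort them. Returns the number kept, which can legitimately be 0.
 */
static int
collect_wal_files(const char *const *entries, int nentries,
                  const char *first, const char *last, const char **out)
{
    int         n = 0;

    assert(out != NULL || nentries == 0);
    for (int i = 0; i < nentries; i++)
    {
        const char *name = entries[i];

        /* Skip anything that is not exactly 24 uppercase hex digits. */
        if (strlen(name) != WAL_FNAME_LEN ||
            strspn(name, "0123456789ABCDEF") != WAL_FNAME_LEN)
            continue;
        /* Skip segments outside the backup's WAL range. */
        if (strcmp(name, first) < 0 || strcmp(name, last) > 0)
            continue;
        out[n++] = name;
    }
    if (n > 1)
        qsort(out, (size_t) n, sizeof(*out), compare_wal_names);
    return n;
}
```

With an empty or unrelated pg_xlog the function returns 0, which is exactly the case the caller has to handle before touching the first element.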

Yes, we definitely need better error checking there - a crash is never
the right answer.
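A minimal sketch of the kind of guard suggested above, written as a standalone C function rather than backend code. parse_first_wal_file() is a hypothetical name, and it returns -1 where the backend would presumably raise an error with ereport() instead. It refuses to read walFiles[0] when the list is empty and also rejects a first entry that is not exactly 24 hex digits, splitting a valid name into timeline, log id and segment (8 hex digits each) the way XLogFromFileName() does in 9.2.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define WAL_FNAME_LEN 24    /* timeline, log id, segment: 8 hex digits each */

/*
 * Hypothetical guarded version of the parse at basebackup.c:304.
 * Returns 0 on success, -1 if there is no (valid) first WAL file.
 */
static int
parse_first_wal_file(const char *const *walFiles, int nWalFiles,
                     unsigned int *tli, unsigned int *logid,
                     unsigned int *logseg)
{
    assert(tli != NULL && logid != NULL && logseg != NULL);

    /* The reported crash: walFiles[0] was read while nWalFiles == 0. */
    if (nWalFiles <= 0 || walFiles == NULL)
        return -1;

    /* Only accept a well-formed segment file name. */
    if (strlen(walFiles[0]) != WAL_FNAME_LEN ||
        strspn(walFiles[0], "0123456789ABCDEF") != WAL_FNAME_LEN)
        return -1;

    /* Split into timeline, log id and segment number. */
    if (sscanf(walFiles[0], "%8X%8X%8X", tli, logid, logseg) != 3)
        return -1;

    return 0;
}
```

The length and character checks are arguably redundant if the caller already filtered the directory listing, but they make the function safe to call on its own.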

Does this happen only when you take a backup "really quickly" after
setting up the new standby, or is there some scenario later in its
lifetime where it can happen? In the first case, throwing a hard error
seems quite reasonable, but if it's repeatable, perhaps there is
something better we can do?

Also, while we definitely need a sanity check at this point, might it
be worth putting a second check earlier in the process as well? AFAICT
this error is only raised after all the data has already been sent.
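One possible shape for such an earlier check, sketched under the assumption that the start-of-backup segment (timeline, log id, segment) is already known before any data files are streamed: build that segment's file name and look for it in pg_xlog, bailing out up front if it is missing. wal_file_name() and wal_segment_present() are hypothetical helpers, not existing PostgreSQL functions, and a check like this still would not guarantee the segment survives until the end of the backup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

#define WAL_FNAME_LEN 24    /* timeline, log id, segment: 8 hex digits each */

/* Hypothetical helper: format a 9.2-style WAL segment file name. */
static void
wal_file_name(char *buf, size_t buflen,
              unsigned int tli, unsigned int logid, unsigned int logseg)
{
    snprintf(buf, buflen, "%08X%08X%08X", tli, logid, logseg);
}

/*
 * Hypothetical early check: is the segment holding the backup start
 * location present as a regular file in xlogdir?
 */
static bool
wal_segment_present(const char *xlogdir,
                    unsigned int tli, unsigned int logid, unsigned int logseg)
{
    char        fname[WAL_FNAME_LEN + 1];
    char        path[4096];
    struct stat st;

    assert(xlogdir != NULL);
    wal_file_name(fname, sizeof(fname), tli, logid, logseg);

    /* Treat a truncated path as "not present" rather than guessing. */
    if (snprintf(path, sizeof(path), "%s/%s", xlogdir, fname) >= (int) sizeof(path))
        return false;

    return stat(path, &st) == 0 && S_ISREG(st.st_mode);
}
```

Run before the data directory is sent, a check along these lines would report the missing WAL immediately instead of after the whole transfer.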

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
