Re: pg_basebackup: return value 1: reason? - Mailing list pgsql-general

From	Andrej Vanek
Subject	Re: pg_basebackup: return value 1: reason?
Date	May 23, 2016 19:04:59
Msg-id	CAFNFRyEppbLWhYoGGGdvcctgL0OAjpZL1mVKSF5UisF=WCOFuA@mail.gmail.com
In response to	Re: pg_basebackup: return value 1: reason? (Adrian Klaver <adrian.klaver@aklaver.com>)
List	pgsql-general

Hello,

I gave it another try.

I used two variants in my script (launched by crm_mon):

1. /usr/pgsql-9.5/bin/pg_basebackup -U pgreplic -h db-other-site -w -D /opt/geo_stdby_data -c fast -vvv -X stream &>> /tmp/log

2. strace -o /tmp/pg_basebackup.log /usr/pgsql-9.5/bin/pg_basebackup -U pgreplic -h db-other-site -w -D /opt/geo_stdby_data -c fast -vvv -X stream &>> /tmp/log

Result:

variant 2. works fine with return code 0 (with strace)

variant 1. fails with error code 1 (without strace)

Any ideas?

Andrej

----------------------details

Output:

Variant 2:

DEBUG: CommitTransaction

DEBUG: name: unnamed; blockState: STARTED; state: INPROGR, xid/subid/cid: 0/1/0, nestlvl: 1, children:

DEBUG: received replication command: IDENTIFY_SYSTEM

DEBUG: received replication command: BASE_BACKUP LABEL 'pg_basebackup base backup' FAST NOWAIT

-- Mon May 23 17:54:31 CEST 2016 [l1abrnch->l1abrnch:3122/27282:GEO] --INFO-- l1abrnch->l1abrnch (GEO-STDBY-DB / stop: 0): target/returned 0/0 (OK)

transaction log start point: 0/FA000028 on timeline 1

pg_basebackup: starting background WAL receiver

DEBUG: CommitTransaction

DEBUG: name: unnamed; blockState: STARTED; state: INPROGR, xid/subid/cid: 0/1/0, nestlvl: 1, children:

DEBUG: received replication command: IDENTIFY_SYSTEM

DEBUG: received replication command: START_REPLICATION 0/FA000000 TIMELINE 1

WARNING: skipping special file "./pg_hba.conf"

DEBUG: standby "pg_basebackup" has now caught up with primary

DEBUG: write 0/FA000000 flush 0/0 apply 0/0

DEBUG: removing transaction log backup history file "0000000100000000000000F8.00000028.backup"

transaction log end point: 0/FA0000F8

pg_basebackup: waiting for background process to finish streaming ...

pg_basebackup: base backup completed

RETVAL=0

Output:

Variant 1:

DEBUG: CommitTransaction

DEBUG: name: unnamed; blockState: STARTED; state: INPROGR, xid/subid/cid: 0/1/0, nestlvl: 1, children:

DEBUG: received replication command: IDENTIFY_SYSTEM

DEBUG: received replication command: BASE_BACKUP LABEL 'pg_basebackup base backup' FAST NOWAIT

-- Mon May 23 17:55:32 CEST 2016 [l1abrnch->l1abrnch:3122/28785:GEO] --INFO-- l1abrnch->l1abrnch (GEO-STDBY-DB / stop: 0): target/returned 0/0 (OK)

transaction log start point: 0/FC000028 on timeline 1

pg_basebackup: starting background WAL receiver

DEBUG: CommitTransaction

DEBUG: name: unnamed; blockState: STARTED; state: INPROGR, xid/subid/cid: 0/1/0, nestlvl: 1, children:

DEBUG: received replication command: IDENTIFY_SYSTEM

DEBUG: received replication command: START_REPLICATION 0/FC000000 TIMELINE 1

WARNING: skipping special file "./pg_hba.conf"

DEBUG: standby "pg_basebackup" has now caught up with primary

DEBUG: write 0/FC000000 flush 0/0 apply 0/0

DEBUG: removing transaction log backup history file "0000000100000000000000FA.00000028.backup"

transaction log end point: 0/FC0000F8

pg_basebackup: waiting for background process to finish streaming ...

pg_basebackup: could not wait for child process: No child processes

RETVAL=1
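
The message "could not wait for child process: No child processes" is the error text for errno ECHILD. One way a wait can fail like this on Linux is when the process starts with SIGCHLD already set to ignored: exited children are then reaped automatically and waitpid() has nothing left to collect. Whether crm_mon hands that disposition to the script is an assumption, the logs above do not confirm it. A minimal Python sketch of the mechanism only (the function name is hypothetical, and this is not pg_basebackup's code):

```python
import errno
import os
import signal


def wait_with_sigchld_ignored():
    """Fork a child while SIGCHLD is ignored and report how waitpid() ends.

    Returns the errno name if waitpid() fails, or "ok" if it succeeds.
    """
    previous = signal.signal(signal.SIGCHLD, signal.SIG_IGN)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit immediately
        try:
            # With SIGCHLD ignored the kernel reaps the child itself,
            # so there is no exit status left for the parent to collect.
            os.waitpid(pid, 0)
            return "ok"
        except ChildProcessError as exc:
            return errno.errorcode[exc.errno]
    finally:
        # Restore the earlier disposition (fall back to the default).
        signal.signal(
            signal.SIGCHLD,
            previous if previous is not None else signal.SIG_DFL,
        )


if __name__ == "__main__":
    print(wait_with_sigchld_ignored())
```

On Linux this reports ECHILD, the same errno behind the message in Variant 1.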

2016-04-18 16:12 GMT+02:00 Adrian Klaver <adrian.klaver@aklaver.com>:

On 04/17/2016 12:13 PM, Andrej Vanek wrote:
Hello Adrian,

I tried to use -U without "su", launched directly by root: same behaviour.
Finally I reverted my script to use the standard backup (pg_start_backup;
rsync; pg_stop_backup). This works; the only downside is possible
collisions with on-line backup/synchronization of the other two nodes on
the master node...

Back to the pg_basebackup issue: it is clear to me that this is an issue
of the environment which launched pg_basebackup.
Possibly either some privileges or some kernel parameters/limits. Who
knows?
Summary: clusterlabs' crm_mon launched a shell script starting
pg_basebackup, which fails to do some of its work (pg_basebackup: could not
wait for child process: No child processes), probably due to some
failing system call.

How can I report to clusterlabs: What system call fails in pg_basebackup?
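
One thing that could be checked from inside the script that crm_mon launches is whether SIGCHLD arrives already ignored, since on Linux an ignored SIGCHLD makes waitpid() fail with "No child processes". This is only a hypothesis about the failure here. The sketch below parses the SigIgn mask that Linux exposes in /proc/<pid>/status; the function name is hypothetical.

```python
import signal


def sigchld_ignored(status_text):
    """Return True if the SigIgn mask in /proc/<pid>/status text includes SIGCHLD."""
    for line in status_text.splitlines():
        if line.startswith("SigIgn:"):
            mask = int(line.split(":", 1)[1].strip(), 16)
            # Bit (n - 1) of the mask corresponds to signal number n.
            return bool(mask & (1 << (int(signal.SIGCHLD) - 1)))
    raise ValueError("no SigIgn line found")


if __name__ == "__main__":
    # Run this from the same environment that starts pg_basebackup.
    with open("/proc/self/status") as status_file:
        print("SIGCHLD ignored:", sigchld_ignored(status_file.read()))
```

If this prints True when started from crm_mon and False from an interactive shell, that difference would be a concrete detail to include in a report.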

All I can do is point you at:

https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD

Best Regards, Andrej

2016-04-17 1:09 GMT+02:00 Adrian Klaver <adrian.klaver@aklaver.com>:

Is the su - even necessary?

pg_basebackup is a Postgres client program; you can specify the user
you want it to connect as using -U.

Or do you need the script to run as postgres in order to get
permissions on wherever you are creating the backup directory?

have to find out why pg_basebackup cannot fork when launched
from crm_mon.

I assume crm_mon is this:

http://linux.die.net/man/8/crm_mon

from Pacemaker.

I do not use Pacemaker, but I am pretty sure that running what is a
monitoring program in daemon mode and then shelling out to another
program is not workable. The docs seem to bear this out:

http://clusterlabs.org/wiki/PgSQL_Replicated_Cluster#Installation

https://github.com/smbambling/pgsql_ha_cluster/wiki/Building-A-Highly-Available-Multi-Node-PostgreSQL-Cluster

--
Adrian Klaver
adrian.klaver@aklaver.com
