Re: BUG #16440: pg_basebackup intermittently hangs waiting for input unless run with --checkpoint=fast option - Mailing list pgsql-bugs

From Curt Kolovson
Subject Re: BUG #16440: pg_basebackup intermittently hangs waiting for input unless run with --checkpoint=fast option
Msg-id CANhYJV563us5eHeBXhTCHOO9u9aAPVkX47gmKY2SkxPhnJyj+A@mail.gmail.com
In response to Re: BUG #16440: pg_basebackup intermittently hangs waiting for input unless run with --checkpoint=fast option  (Magnus Hagander <magnus@hagander.net>)
Responses Re: BUG #16440: pg_basebackup intermittently hangs waiting for input unless run with --checkpoint=fast option  (Curt Kolovson <curt@kolovson.org>)
List pgsql-bugs
We are using the default checkpoint settings: checkpoint_timeout (5 min) * checkpoint_completion_target (0.5) = 2.5 min. OK on your point that we should be looking at the CPU on the primary, not the client. You're right that "indefinitely" is not a good term to use. We waited over 10 min, and it appeared to be hung. The database was small (81 MB) and there was very little activity. I tried running a checkpoint on the primary while this was going on, and it had no effect. The strange thing about this problem is that it is intermittent: we only see it occasionally, but once it occurs on a system, it is repeatable there.
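
The arithmetic above can be written out as a small sketch. It assumes that the checkpoint requested by a non-fast base backup is paced roughly like a regular timed checkpoint, so the product is only a rough estimate of the expected wait, not a guarantee; the function name is made up for illustration.

```python
def spread_checkpoint_seconds(checkpoint_timeout_s: float,
                              completion_target: float) -> float:
    """Rough estimate of how long a spread (non-fast) checkpoint is paced over.

    Assumption: the checkpoint is spread over
    checkpoint_timeout * checkpoint_completion_target, as in the message above.
    """
    if not 0.0 <= completion_target <= 1.0:
        raise ValueError("checkpoint_completion_target must be between 0 and 1")
    return checkpoint_timeout_s * completion_target


# Settings reported in this thread: checkpoint_timeout = 5 min, target = 0.5
expected = spread_checkpoint_seconds(5 * 60, 0.5)
observed = 10 * 60  # we waited over 10 minutes

print(f"expected ~{expected / 60:.1f} min, waited over {observed / 60:.0f} min")
```

With these settings the estimate is 2.5 min, so a wait of over 10 min is well beyond what a slow checkpoint alone would explain.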

Curt 

On Fri, May 15, 2020 at 10:51 AM Magnus Hagander <magnus@hagander.net> wrote:
On Fri, May 15, 2020 at 7:49 PM PG Bug reporting form <noreply@postgresql.org> wrote:
The following bug has been logged on the website:

Bug reference:      16440
Logged by:          Curt Kolovson
Email address:      ckolovson@gmail.com
PostgreSQL version: 10.9
Operating system:   Linux 4.9.219-1.ph2 x86_64 (VMware Photon OS 2.0)
Description:       

We have noticed what appears to be an intermittent bug in pg_basebackup.
Here is what we are occasionally seeing:
$ /opt/vmware/vpostgres/current/bin/pg_basebackup -l "repmgr base backup"
-D /var/vmware/vpostgres/current/pgdata -h 172.18.50.48 -p 5432 -U repmgr -X
stream --verbose --progress
pg_basebackup: initiating base backup, waiting for checkpoint to complete

And then it hangs indefinitely at this point. It makes no progress (0 CPU),
so it is hanging on some type of input. Here is ps output:
postgres 3386 3370 0 15:01 ?    00:00:00
/opt/vmware/vpostgres/current/bin/pg_basebackup -l repmgr base backup -D
/var/vmware/vpostgres/current/pgdata -h 172.18.50.48 -p 5432 -U repmgr -X
stream

We notice that this behavior only occurs intermittently, but when it does,
it happens repeatedly on that system. The only workarounds we have found are
either to run it with the --checkpoint=fast option, or to restart postgres
on the primary.
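
For reference, the first workaround amounts to adding --checkpoint=fast to the command shown above. The following is only a sketch that assembles that command line; the helper name is hypothetical, and the host, user, and paths are the ones from this report.

```python
import shlex


def basebackup_command(host, port, user, data_dir, label, fast_checkpoint=True):
    """Build the pg_basebackup argument list used in this report.

    Hypothetical helper; it only assembles arguments and does not run anything.
    """
    cmd = [
        "pg_basebackup",
        "-l", label,
        "-D", data_dir,
        "-h", host,
        "-p", str(port),
        "-U", user,
        "-X", "stream",
        "--verbose",
        "--progress",
    ]
    if fast_checkpoint:
        # Workaround: request an immediate checkpoint instead of a spread one
        cmd.append("--checkpoint=fast")
    return cmd


cmd = basebackup_command(
    "172.18.50.48", 5432, "repmgr",
    "/var/vmware/vpostgres/current/pgdata", "repmgr base backup",
)
print(shlex.join(cmd))
```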

We are using synchronous streaming replication.

Define "indefinitely". How long did you wait, and what's the value for your checkpoint_timeout?

It's perfectly normal for it to be waiting quite some time, as it waits for the slow-speed checkpoint to complete on the server. (And you want to look at the processes and status of the server, not the client, to see what it's doing.)
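
One way to look at the server side is to query pg_stat_activity on the primary while the backup is waiting. This is a minimal sketch, assuming PostgreSQL 10 or later (where pg_stat_activity has a backend_type column and lists the checkpointer); the helper name, the "postgres" user, and the database name are assumptions, not taken from the report.

```python
import shlex

# Show what the checkpointer and any WAL sender (the base backup connection)
# are doing or waiting on.
ACTIVITY_QUERY = (
    "SELECT pid, backend_type, state, wait_event_type, wait_event "
    "FROM pg_stat_activity "
    "WHERE backend_type IN ('checkpointer', 'walsender')"
)


def server_activity_command(host, port, user, dbname="postgres"):
    """Build a psql invocation that runs ACTIVITY_QUERY on the primary.

    Hypothetical helper; it only assembles the command and does not run it.
    """
    return [
        "psql", "-X",          # -X: do not read ~/.psqlrc
        "-h", host,
        "-p", str(port),
        "-U", user,
        "-d", dbname,
        "-c", ACTIVITY_QUERY,
    ]


activity_cmd = server_activity_command("172.18.50.48", 5432, "postgres")
print(shlex.join(activity_cmd))
```

Running the printed command a few times during the wait should show whether the checkpointer is active or idle; the connecting role may need sufficient privileges to see other sessions' details.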

--
