Thread: pg_basebackup -F t fails when fsync spends more time than tcp_user_timeout
pg_basebackup -F t fails when fsync spends more time than tcp_user_timeout
From
"r.takahashi_2@fujitsu.com"
Date:
Hi,

pg_basebackup -F t fails when fsync() takes longer than tcp_user_timeout in the following environment.

[Environment]
Postgres 13dev (master branch)
Red Hat Enterprise Linux 7.4

[Error]
$ pg_basebackup -F t --progress --verbose -h <hostname> -D <directory>
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/5A000060 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_15647"
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

[Analysis]
- pg_basebackup -F t creates a tar file and does fsync() for each tablespace.
  (In contrast, -F p does fsync() only once at the end.)
- While pg_basebackup is doing fsync() on the tar file of one tablespace, the wal sender is already sending the content of the next tablespace.
  When fsync() takes a long time, the TCP socket of pg_basebackup returns "zero window" packets to the wal sender.
  This means the TCP socket buffer of pg_basebackup is exhausted, since pg_basebackup cannot receive data during fsync().
- The socket of the wal sender retries sending the packet, but resets the connection after tcp_user_timeout.
  After the wal sender resets the connection, pg_basebackup cannot receive data and fails with the above error.

[Solution]
I think fsync() for each tablespace is not necessary.
Like pg_basebackup -F p, I think fsync() is necessary only once at the end.

Could you give me any comment?

Regards,
Ryohei Takahashi
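The buffer exhaustion described under [Analysis] can be reproduced in miniature. The following is only an illustrative sketch, not pg_basebackup or wal sender code: it assumes a local Unix socket pair as a stand-in for the TCP connection, and the function name fill_until_blocked is hypothetical. A Unix socket pair has no "zero window" packets and no tcp_user_timeout, so the sketch shows only the first half of the problem: a sender can make no progress while its peer is not reading (for example, because the peer is blocked in fsync()), and it can continue as soon as the peer drains the socket.

```python
import socket


def fill_until_blocked(chunk_size=65536):
    """Send to a peer that never reads until the kernel buffers are full.

    Returns (bytes accepted before blocking, bytes drained afterwards,
    whether the sender could send again after the drain).
    """
    sender, receiver = socket.socketpair()
    sender.setblocking(False)
    receiver.setblocking(False)
    chunk = b"x" * chunk_size
    sent_total = 0
    try:
        # Phase 1: the receiver is "busy" (think: stuck in fsync()) and
        # never calls recv(), so the buffers eventually fill up.
        while True:
            try:
                sent_total += sender.send(chunk)
            except BlockingIOError:
                break  # no buffer space left; a TCP peer would see a zero window

        # Phase 2: the receiver comes back and drains everything queued.
        drained = 0
        while True:
            try:
                data = receiver.recv(chunk_size)
            except BlockingIOError:
                break
            drained += len(data)

        # Phase 3: with the buffers empty, the sender can make progress again.
        resumed = sender.send(b"y") == 1
    finally:
        sender.close()
        receiver.close()
    return sent_total, drained, resumed


if __name__ == "__main__":
    sent, drained, resumed = fill_until_blocked()
    print(f"accepted before blocking: {sent} bytes")
    print(f"drained by receiver:      {drained} bytes")
    print(f"sender resumed:           {resumed}")
```

On a real TCP connection with tcp_user_timeout set, the sender does not wait indefinitely in phase 1: if the peer stays unresponsive longer than the timeout, the connection is reset, which is the failure reported above.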
Re: pg_basebackup -F t fails when fsync spends more time than tcp_user_timeout
From
Michael Paquier
Date:
On Mon, Sep 02, 2019 at 04:42:55AM +0000, r.takahashi_2@fujitsu.com wrote:
> I think fsync() for each tablespace is not necessary.
> Like pg_basebackup -F p, I think fsync() is necessary only once at the end.

Yes, I agree that we overlooked that part when introducing tcp_user_timeout. It is possible to sync all the contents of pg_basebackup's tar format once at the end with fsync_dir_recurse().

Looking at the original discussion that brought bc34223b, the proposed patches did what we have now on HEAD, but we did not really discuss doing a single fsync() pass at the end over all the contents of the resulting base directory:
https://www.postgresql.org/message-id/CAB7nPqQL0fCp0eDcVD6+3+Je24xeApU14vKz_pBpNA0sTPwLgQ@mail.gmail.com

Attached is a patch to do that, which should go down to v12 where tcp_user_timeout has been introduced. Takahashi-san, what do you think?
--
Michael
Attachment
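The idea of syncing everything once at the end can be sketched as follows. This is not the attached patch and not PostgreSQL's fsync_dir_recurse(); it is a rough Python analogue on a POSIX system, and the names fsync_path, fsync_tree and demo, as well as the tar file names, are hypothetical. The files are first written without any fsync(), so the receive loop never stalls on disk flushes, and a single recursive pass then flushes every file and directory.

```python
import os
import tempfile


def fsync_path(path):
    """Open a file or directory read-only and flush it to stable storage."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def fsync_tree(root):
    """fsync every regular file under root, then each directory itself.

    Returns the number of paths that were synced.
    """
    synced = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            fsync_path(os.path.join(dirpath, name))
            synced += 1
        # Sync the directory too, so the directory entries are durable.
        fsync_path(dirpath)
        synced += 1
    return synced


def demo():
    """Write three stand-in tar files without syncing, then sync once."""
    with tempfile.TemporaryDirectory() as backup_dir:
        for name in ("base.tar", "16384.tar", "pg_wal.tar"):
            with open(os.path.join(backup_dir, name), "wb") as f:
                f.write(b"\0" * 4096)  # placeholder content, no fsync here
        return fsync_tree(backup_dir)


if __name__ == "__main__":
    print("paths synced in the final pass:", demo())
```

The design point is ordering rather than the amount of work: the same data still has to reach disk, but the flush happens after the last byte has been received, when a slow fsync() can no longer keep the client from reading the socket.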
RE: pg_basebackup -F t fails when fsync spends more time than tcp_user_timeout
From
"r.takahashi_2@fujitsu.com"
Date:
Hi Michael-san,

> Attached is a patch to do that, which should go down to v12 where
> tcp_user_timeout has been introduced. Takahashi-san, what do you
> think?

Thank you for creating the patch.
This patch is what I expected.

I'm not sure whether this patch should be applied to postgres below 11
since I'm not sure whether the OS parameters (ex. tcp_retries2) cause the same error.

Regards,
Ryohei Takahashi
Re: pg_basebackup -F t fails when fsync spends more time than tcp_user_timeout
From
Michael Paquier
Date:
On Mon, Sep 02, 2019 at 08:06:22AM +0000, r.takahashi_2@fujitsu.com wrote:
> I'm not sure whether this patch should be applied to postgres below 11
> since I'm not sure whether the OS parameters (ex. tcp_retries2) cause the same error.

Thinking wider, don't we have the same problem with wal_sender_timeout in the case where a sync request takes longer than the time it would take the backend to terminate the connection?
--
Michael
Attachment
Re: pg_basebackup -F t fails when fsync spends more time than tcp_user_timeout
From
Michael Paquier
Date:
On Mon, Sep 02, 2019 at 05:38:56PM +0900, Michael Paquier wrote:
> Thinking wider, don't we have the same problem with wal_sender_timeout
> in the case where a sync request takes longer than the time it would
> take the backend to terminate the connection?

I have been able to work more on that, and that can indeed happen with wal_sender_timeout. While reviewing the code, I have noticed that there is little point in enabling do_sync when fetching WAL segments. This actually led to too many fsyncs being done for the plain format, as each WAL segment is fsync'd first by walmethods.c, then fsync'd again by fsync_pgdata() in pg_wal/.

Attached is an updated patch, which needs to go down to v10.
--
Michael
Attachment
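The redundant syncing of WAL segments in the plain format can be illustrated with a small counting model. This is only a sketch and not the code in walmethods.c or fsync_pgdata(): the functions write_segment, final_sync and count_syncs and the segment file names are hypothetical, and the do_sync parameter merely mirrors the flag named in the message above. Each segment is optionally fsync'd when it is written, and a final pass then syncs every file in the directory again.

```python
import os
import tempfile
from collections import Counter


def write_segment(path, data, do_sync, counts):
    """Write one stand-in WAL segment, optionally syncing it right away."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        if do_sync:
            os.fsync(f.fileno())
            counts[os.path.basename(path)] += 1


def final_sync(directory, counts):
    """Sync every file in the directory once, as an end-of-backup pass."""
    for name in sorted(os.listdir(directory)):
        fd = os.open(os.path.join(directory, name), os.O_RDONLY)
        try:
            os.fsync(fd)
        finally:
            os.close(fd)
        counts[name] += 1


def count_syncs(do_sync, segments=3):
    """Return the sorted number of fsync calls issued per segment."""
    counts = Counter()
    with tempfile.TemporaryDirectory() as wal_dir:
        for i in range(segments):
            path = os.path.join(wal_dir, f"seg{i:02d}")
            write_segment(path, b"\0" * 1024, do_sync, counts)
        final_sync(wal_dir, counts)
    return sorted(counts.values())


if __name__ == "__main__":
    print("per-segment sync enabled: ", count_syncs(True))
    print("per-segment sync disabled:", count_syncs(False))
```

With the per-segment sync enabled, every segment is flushed twice; with it disabled, the final pass alone provides durability with one flush per segment.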
Re: pg_basebackup -F t fails when fsync spends more time than tcp_user_timeout
From
Michael Paquier
Date:
On Tue, Sep 03, 2019 at 10:58:18AM +0900, Michael Paquier wrote:
> Attached is an updated patch, which needs to go down to v10.

Committed, after doing more double-checks on it. Thanks for the report, Takahashi-san.
--
Michael