Thread: replication timeout in pg_basebackup

replication timeout in pg_basebackup

From
"Aggarwal, Ajay"
Date:
Our environment: Postgres version 9.2.2 running on CentOS 6.4

Our backups using pg_basebackup are frequently failing with following error
"pg_basebackup: could not send feedback packet: server closed the connection unexpectedly       This probably means the server terminated abnormally       before or while processing the request."

We are invoking pg_basebackup with these arguments : pg_basebackup -D backup_dir -X stream -l backup_dir
In postgres logs we see this log message "terminating walsender process due to replication timeout".

Our replication timeout is default 60 seconds. If we increase the replication time to say 180 seconds, we see better results but backups still fail occasionally.

Running strace on pg_basebackup process, we see that the fsync() call takes significant time and could be responsible for causing this timeout in postgres.

Has anybody else run into the same issue? Is there a way to run pg_basebackup without fsync() ?


Re: replication timeout in pg_basebackup

From
Haribabu Kommi
Date:
On Mon, Mar 10, 2014 at 12:52 PM, Aggarwal, Ajay <aaggarwal@verizon.com> wrote:
Our environment: Postgres version 9.2.2 running on CentOS 6.4

Our backups using pg_basebackup are frequently failing with following error
"pg_basebackup: could not send feedback packet: server closed the connection unexpectedly       This probably means the server terminated abnormally       before or while processing the request."

We are invoking pg_basebackup with these arguments : pg_basebackup -D backup_dir -X stream -l backup_dir
In postgres logs we see this log message "terminating walsender process due to replication timeout".

Our replication timeout is default 60 seconds. If we increase the replication time to say 180 seconds, we see better results but backups still fail occasionally.

Running strace on pg_basebackup process, we see that the fsync() call takes significant time and could be responsible for causing this timeout in postgres.

Use the pg_test_fsync utility which is available in postgresql contrib module to test your system sync methods performance. 
 
Has anybody else run into the same issue? Is there a way to run pg_basebackup without fsync() ?

As of now there is no such options available, I feel it is better to find why the sync is taking time?
 
Regards,
Hari Babu
Fujitsu Australia

Re: replication timeout in pg_basebackup

From
"Aggarwal, Ajay"
Date:
Thanks Hari Babu.

I think what is happening is that my dirty cache builds up quickly for the volume where I am backing up. This would trigger flush of these dirty pages to the disk. While this flush is going on pg_basebackup tries to do fsync() on a received WAL file and gets blocked. 

While in this state, i.e. when dirty page count is high, following are the results of pg_test_fsync


# /usr/pgsql-9.2/bin/pg_test_fsync -f /backup/fsync_test
2 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
        open_datasync                      16.854 ops/sec
        fdatasync                          15.242 ops/sec
        fsync                               0.187 ops/sec
        fsync_writethrough                            n/a
        open_sync                          14.747 ops/sec

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
        open_datasync                       6.137 ops/sec
        fdatasync                          14.899 ops/sec
        fsync                               0.007 ops/sec
        fsync_writethrough                            n/a
        open_sync                           1.450 ops/sec

Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB
in different write open_sync sizes.)
         1 * 16kB open_sync write          13.486 ops/sec
         2 *  8kB open_sync writes          6.006 ops/sec
         4 *  4kB open_sync writes          3.446 ops/sec
         8 *  2kB open_sync writes          1.400 ops/sec
        16 *  1kB open_sync writes          0.859 ops/sec

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
        write, fsync, close                 0.009 ops/sec
        write, close, fsync                 0.008 ops/sec

Non-Sync'ed 8kB writes:
        write                           99415.368 ops/sec


However when backups are not going on and dirty pages count is low, below are the results of this test

# /usr/pgsql-9.2/bin/pg_test_fsync -f /backup/fsync_test
2 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
        open_datasync                    1974.243 ops/sec
        fdatasync                        1410.804 ops/sec
        fsync                             181.129 ops/sec
        fsync_writethrough                            n/a
        open_sync                         547.389 ops/sec

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
        open_datasync                     290.109 ops/sec
        fdatasync                         962.378 ops/sec
        fsync                             158.987 ops/sec
        fsync_writethrough                            n/a
        open_sync                         642.309 ops/sec

Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB
in different write open_sync sizes.)
         1 * 16kB open_sync write        1014.456 ops/sec
         2 *  8kB open_sync writes        627.964 ops/sec
         4 *  4kB open_sync writes        340.313 ops/sec
         8 *  2kB open_sync writes        173.581 ops/sec
        16 *  1kB open_sync writes        103.236 ops/sec

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
        write, fsync, close               244.670 ops/sec
        write, close, fsync               207.248 ops/sec

Non-Sync'ed 8kB writes:
        write                           202216.900 ops/sec



From: Haribabu Kommi [kommi.haribabu@gmail.com]
Sent: Monday, March 10, 2014 1:42 AM
To: Aggarwal, Ajay
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] replication timeout in pg_basebackup

On Mon, Mar 10, 2014 at 12:52 PM, Aggarwal, Ajay <aaggarwal@verizon.com> wrote:
Our environment: Postgres version 9.2.2 running on CentOS 6.4

Our backups using pg_basebackup are frequently failing with following error
"pg_basebackup: could not send feedback packet: server closed the connection unexpectedly       This probably means the server terminated abnormally       before or while processing the request."

We are invoking pg_basebackup with these arguments : pg_basebackup -D backup_dir -X stream -l backup_dir
In postgres logs we see this log message "terminating walsender process due to replication timeout".

Our replication timeout is default 60 seconds. If we increase the replication time to say 180 seconds, we see better results but backups still fail occasionally.

Running strace on pg_basebackup process, we see that the fsync() call takes significant time and could be responsible for causing this timeout in postgres.

Use the pg_test_fsync utility which is available in postgresql contrib module to test your system sync methods performance. 
 
Has anybody else run into the same issue? Is there a way to run pg_basebackup without fsync() ?

As of now there is no such options available, I feel it is better to find why the sync is taking time?
 
Regards,
Hari Babu
Fujitsu Australia

Re: replication timeout in pg_basebackup

From
Haribabu Kommi
Date:
On Tue, Mar 11, 2014 at 7:07 AM, Aggarwal, Ajay <aaggarwal@verizon.com> wrote:
> Thanks Hari Babu.
>
> I think what is happening is that my dirty cache builds up quickly for the
> volume where I am backing up. This would trigger flush of these dirty pages
> to the disk. While this flush is going on pg_basebackup tries to do fsync()
> on a received WAL file and gets blocked.

But the sync is executed for every WAL file finish. Does your database
is big in size?
Does your setup is write-heavy operations?

In Linux when it tries to write a bunch of buffers at once, the fysnc
call might block for some time.
In the following link there are some "Tuning Recommendations for
write-heavy operations" which might be useful to you.

http://www.westnet.com/~gsmith/content/linux-pdflush.htm

Any other ideas to handle these kind of problems?

Regards,
Hari Babu
Fujitsu Australia


Re: replication timeout in pg_basebackup

From
"Aggarwal, Ajay"
Date:
I have already tried experimenting with linux dirty_ratio etc. You can only fine tune up to a limit. The backup process
stillfills up the buffer cache very quickly. Yes, my database is about 5-6 GB in size and will grow bigger over time. 

If wish there was a way to slow down pg_basebackup or force it to use direct I/O.
________________________________________
From: Haribabu Kommi [kommi.haribabu@gmail.com]
Sent: Monday, March 10, 2014 8:31 PM
To: Aggarwal, Ajay
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] replication timeout in pg_basebackup

On Tue, Mar 11, 2014 at 7:07 AM, Aggarwal, Ajay <aaggarwal@verizon.com> wrote:
> Thanks Hari Babu.
>
> I think what is happening is that my dirty cache builds up quickly for the
> volume where I am backing up. This would trigger flush of these dirty pages
> to the disk. While this flush is going on pg_basebackup tries to do fsync()
> on a received WAL file and gets blocked.

But the sync is executed for every WAL file finish. Does your database
is big in size?
Does your setup is write-heavy operations?

In Linux when it tries to write a bunch of buffers at once, the fysnc
call might block for some time.
In the following link there are some "Tuning Recommendations for
write-heavy operations" which might be useful to you.

http://www.westnet.com/~gsmith/content/linux-pdflush.htm

Any other ideas to handle these kind of problems?

Regards,
Hari Babu
Fujitsu Australia


Re: replication timeout in pg_basebackup

From
"Aggarwal, Ajay"
Date:
If I don't include WAL files as part of my backup, I do not run into this issue. But a backup without WAL files is not
whatI want. 
________________________________________
From: Aggarwal, Ajay
Sent: Monday, March 10, 2014 9:46 PM
To: Haribabu Kommi
Cc: pgsql-general@postgresql.org
Subject: RE: [GENERAL] replication timeout in pg_basebackup

I have already tried experimenting with linux dirty_ratio etc. You can only fine tune up to a limit. The backup process
stillfills up the buffer cache very quickly. Yes, my database is about 5-6 GB in size and will grow bigger over time. 

If wish there was a way to slow down pg_basebackup or force it to use direct I/O.
________________________________________
From: Haribabu Kommi [kommi.haribabu@gmail.com]
Sent: Monday, March 10, 2014 8:31 PM
To: Aggarwal, Ajay
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] replication timeout in pg_basebackup

On Tue, Mar 11, 2014 at 7:07 AM, Aggarwal, Ajay <aaggarwal@verizon.com> wrote:
> Thanks Hari Babu.
>
> I think what is happening is that my dirty cache builds up quickly for the
> volume where I am backing up. This would trigger flush of these dirty pages
> to the disk. While this flush is going on pg_basebackup tries to do fsync()
> on a received WAL file and gets blocked.

But the sync is executed for every WAL file finish. Does your database
is big in size?
Does your setup is write-heavy operations?

In Linux when it tries to write a bunch of buffers at once, the fysnc
call might block for some time.
In the following link there are some "Tuning Recommendations for
write-heavy operations" which might be useful to you.

http://www.westnet.com/~gsmith/content/linux-pdflush.htm

Any other ideas to handle these kind of problems?

Regards,
Hari Babu
Fujitsu Australia


Re: replication timeout in pg_basebackup

From
John R Pierce
Date:
On 3/9/2014 6:52 PM, Aggarwal, Ajay wrote:
Our replication timeout is default 60 seconds. If we increase the replication time to say 180 seconds, we see better results but backups still fail occasionally.

so increase it to 300 seconds, or whatever.   thats an upper limit, it needs to be big enough that you DONT get into problems when doing stuff like basebackups.





-- 
john r pierce                                      37N 122W
somewhere on the middle of the left coast

Re: replication timeout in pg_basebackup

From
"Aggarwal, Ajay"
Date:
Thats exactly what I was thinking after all other experiments. Couple of questions:
1) why did you say that 300 seconds is the upper limit? Is this enforced by Postgres? What if I want to set it to 10 minutes?
2) whats the downside of bigger replication timeout?

Thanks.

Ajay

From: pgsql-general-owner@postgresql.org [pgsql-general-owner@postgresql.org] on behalf of John R Pierce [pierce@hogranch.com]
Sent: Monday, March 10, 2014 9:58 PM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] replication timeout in pg_basebackup

On 3/9/2014 6:52 PM, Aggarwal, Ajay wrote:
Our replication timeout is default 60 seconds. If we increase the replication time to say 180 seconds, we see better results but backups still fail occasionally.

so increase it to 300 seconds, or whatever.   thats an upper limit, it needs to be big enough that you DONT get into problems when doing stuff like basebackups.





-- 
john r pierce                                      37N 122W
somewhere on the middle of the left coast

Re: replication timeout in pg_basebackup

From
John R Pierce
Date:
On 3/11/2014 5:50 AM, Aggarwal, Ajay wrote:
> Thats exactly what I was thinking after all other experiments. Couple
> of questions:
> 1) why did you say that 300 seconds is the upper limit? Is this
> enforced by Postgres? What if I want to set it to 10 minutes?
> 2) whats the downside of bigger replication timeout?


I said, set it to 300 or whatever.   An unfortunate extra comma confused
my meaning.  What I meant was, whatever you set it to, thats a upper limit.

As I understand it, that timeout is how long the replication can lag the
server before the server decides to stop replication.   with it at 300,
under heavy load, the replication could run as much as 5 minutes (300
seconds) behind before it errors.



--
john r pierce                                      37N 122W
somewhere on the middle of the left coast