Thread: Streaming replication with sync slave, but disconnects due to missing WAL segments

Streaming replication with sync slave, but disconnects due to missing WAL segments

From
Mads.Tandrup@schneider-electric.com
Date:
Hi all

I have a question about sync streaming replication.

I have 2 postgresql 9.1 servers set up with streaming replication. On the
master node the slave is configured as a synchronous standby. I've verified
that pg_stat_replication shows sync_state = sync for the slave node.
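(The check described above can be run from psql on the master; a sketch, with
the pg_stat_replication columns as documented for 9.1:)

```
# On the master; -x gives expanded one-column-per-line output
psql -x -c "SELECT application_name, state, sync_state FROM pg_stat_replication;"
```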

It all seems to work fine. But I have noticed that sometimes, when I restore
backups created by pg_dump, the slave node will disconnect with this message
in the postgresql log:
2013-06-03 13:13:48 GMT 4271  FATAL:  could not receive data from WAL
stream: SSL connection has been closed unexpectedly
2013-06-03 13:13:53 GMT 4270  LOG:  invalid magic number 0000 in log file
15, segment 65, offset 11665408
2013-06-03 13:13:54 GMT 36428  LOG:  streaming replication successfully
connected to primary
2013-06-03 13:13:54 GMT 36428  FATAL:  could not receive data from WAL
stream: FATAL:  requested WAL segment 000000010000000F00000041 has already
been removed
2013-06-03 13:13:58 GMT 36458  LOG:  streaming replication successfully
connected to primary
2013-06-03 13:13:58 GMT 36458  FATAL:  could not receive data from WAL
stream: FATAL:  requested WAL segment 000000010000000F00000041 has already
been removed

On the master I get this in the log file in the same timespan:
2013-06-03 13:13:47 GMT 1471  LOG:  checkpoints are occurring too
frequently (2 seconds apart)
2013-06-03 13:13:47 GMT 1471  HINT:  Consider increasing the configuration
parameter "checkpoint_segments".
2013-06-03 13:13:48 GMT 6189 [unknown] FATAL:  requested WAL segment
000000010000000F00000041 has already been removed
2013-06-03 13:13:48 GMT 6189 [unknown] LOG:  disconnection: session time:
77:37:37.684 user=root database= host=10.216.80.38 port=56114
2013-06-03 13:13:49 GMT 1471  LOG:  checkpoints are occurring too
frequently (2 seconds apart)
2013-06-03 13:13:49 GMT 1471  HINT:  Consider increasing the configuration
parameter "checkpoint_segments".
2013-06-03 13:13:51 GMT 1471  LOG:  checkpoints are occurring too
frequently (2 seconds apart)
2013-06-03 13:13:51 GMT 1471  HINT:  Consider increasing the configuration
parameter "checkpoint_segments".
2013-06-03 13:13:51 GMT 1468  LOG:  received SIGHUP, reloading
configuration files
2013-06-03 13:13:51 GMT 1468  LOG:  parameter "synchronous_standby_names"
removed from configuration file, reset to default
2013-06-03 13:13:53 GMT 1471  LOG:  checkpoints are occurring too
frequently (2 seconds apart)
2013-06-03 13:13:53 GMT 1471  HINT:  Consider increasing the configuration
parameter "checkpoint_segments".
2013-06-03 13:13:53 GMT 44063 [unknown] LOG:  connection received:
host=10.216.80.38 port=34038
2013-06-03 13:13:54 GMT 44063 [unknown] LOG:  replication connection
authorized: user=root
2013-06-03 13:13:54 GMT 44063 [unknown] FATAL:  requested WAL segment
000000010000000F00000041 has already been removed
2013-06-03 13:13:54 GMT 44063 [unknown] LOG:  disconnection: session time:
0:00:00.090 user=root database= host=10.216.80.38 port=34038

What I don't understand is how the slave node can miss a WAL segment when it
is supposed to be synchronous.

Shouldn't sync prevent the server from continuing if the slave is not able
to get WAL segments fast enough?

I have only noticed it while restoring a database, but the general load on
the DB has not been that high, so I'm not sure whether it can occur with
other workloads.

Best regards,
Mads



On Tue, Jun 4, 2013 at 6:25 AM, <Mads.Tandrup@schneider-electric.com> wrote:

Hi all

I have a question about sync streaming replication.

I have 2 postgresql 9.1 servers set up with streaming replication. On the
master node the slave is configured as a synchronous standby. I've verified
that pg_stat_replication shows sync_state = sync for the slave node.

It all seems to work fine. But I have noticed that sometimes, when I restore
backups created by pg_dump, the slave node will disconnect with this message
in the postgresql log:

You cannot use pg_dump to set up a database for receiving streaming replication.

For that it must be a physical copy, not a logical copy which is what pg_dump generates.
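(A physical base copy can be taken with pg_basebackup, available since 9.1; a
sketch, with host and user names as placeholders:)

```
# Run on the standby-to-be, against the master (names are placeholders)
pg_basebackup -h master-node -U replication -D /db/pgdata -P -x
# -x includes the WAL needed to make the copy consistent
```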

Cheers,

Jeff

vacuum_cost_delay and autovacuum_cost_delay

From
"ascot.moss@gmail.com"
Date:
Hi,

if I change the value of "vacuum_cost_delay", what is the impact of it on autovacuum  side?

regards

Re: vacuum_cost_delay and autovacuum_cost_delay

From
Sergey Konoplev
Date:
On Tue, Jun 4, 2013 at 5:06 PM, ascot.moss@gmail.com
<ascot.moss@gmail.com> wrote:
> if I change the value of "vacuum_cost_delay", what is the impact of it on autovacuum  side?

If autovacuum_vacuum_cost_delay is not -1, autovacuum will use that value,
and changing vacuum_cost_delay will have no effect on autovacuum. However,
if it is -1, then autovacuum will use the value of vacuum_cost_delay.
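In postgresql.conf terms, the fallback looks like this (values are
illustrative only):

```
vacuum_cost_delay = 10ms              # used by manual VACUUM
autovacuum_vacuum_cost_delay = -1     # -1: autovacuum falls back to vacuum_cost_delay
#autovacuum_vacuum_cost_delay = 20ms  # an explicit value would override the fallback
```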

http://www.postgresql.org/docs/9.2/static/runtime-config-autovacuum.html#GUC-AUTOVACUUM-VACUUM-COST-DELAY

>
> regards
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general



--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray.ru@gmail.com


Jeff Janes wrote:
> On Tue, Jun 4, 2013 at 6:25 AM, <Mads.Tandrup@schneider-electric.com> wrote:
>> I have a question about sync streaming replication.
>> 
>> I have 2 postgresql 9.1 servers set up with streaming replication. On the
>> master node the slave is configured as a synchronous standby. I've verified
>> that pg_stat_replication shows sync_state = sync for the slave node.
>> 
>> It all seems to work fine. But I have noticed that sometimes, when I restore
>> backups created by pg_dump, the slave node will disconnect with this message
>> in the postgresql log:
> 
> You cannot use pg_dump to set up a database for receiving streaming replication.
> 
> For that it must be a physical copy, not a logical copy which is what pg_dump generates.

I think you misunderstood - he restores a dump on
the *primary* and that makes replication fall over
because it cannot catch up.

Yours,
Laurenz Albe

On Wed, Jun 5, 2013 at 1:30 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
Jeff Janes wrote:
> On Tue, Jun 4, 2013 at 6:25 AM, <Mads.Tandrup@schneider-electric.com> wrote:
>> I have a question about sync streaming replication.
>>
>> I have 2 postgresql 9.1 servers set up with streaming replication. On the
>> master node the slave is configured as a synchronous standby. I've verified
>> that pg_stat_replication shows sync_state = sync for the slave node.
>>
>> It all seems to work fine. But I have noticed that sometimes, when I restore
>> backups created by pg_dump, the slave node will disconnect with this message
>> in the postgresql log:
>
> You cannot use pg_dump to set up a database for receiving streaming replication.
>
> For that it must be a physical copy, not a logical copy which is what pg_dump generates.

I think you misunderstood - he restores a dump on
the *primary* and that makes replication fall over
because it cannot catch up.

Ah, I see.  And the synchronization point doesn't help, because it only occurs at commit-time and if the dump is being loaded in a single transaction, then the transaction could last for several checkpoints.  By the time it tries to synchronize, it is already too late.

Cheers,

Jeff

Re: Streaming replication with sync slave, but disconnects due to missing WAL segments

From
Mads.Tandrup@schneider-electric.com
Date:
Hi

Thanks for your reply. Do you know of any options that I could give
pg_dump/psql to avoid creating one big transaction? I'm using the plain text
format for pg_dump.

Best regards,
Mads
On 06/04/2013 08:25 AM, Mads.Tandrup@schneider-electric.com wrote:

> It all seems to work fine. But I have noticed that sometimes when I
> restore backups created by pg_dump. The slave node will disconnect
> with the message in the postgresql log:

You need to increase wal_keep_segments on the master. Any pg_dump or
pg_restore is one giant transaction, and the dirty little secret about
PG replication is that no slave can replay while a transaction is in
progress.

The slave keeps waiting while the master is doing all this extra work,
but without enough segments in reserve, the master may archive or delete
what it considers "excess" segments before slaves can receive or process
them.
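On the master side, the settings involved would look something like this in
postgresql.conf (the sizes are illustrative; the archive path is chosen to
match the restore_command in the recovery.conf example below):

```
wal_keep_segments = 256                            # retain 256 * 16 MB = 4 GB of WAL
archive_mode      = on
archive_command   = 'cp %p /db/wal/pg_archived/%f'
```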

One way to avoid keeping these around is to set your archive_command to
transmit archived WAL files to a location where the slave can read them.
Then in your recovery.conf, you can make it look something like this:

standby_mode     = 'on'
primary_conninfo = 'host=master-node user=replication'
trigger_file     = '/db/pgdata/promote'
restore_command  = 'cp -f /db/wal/pg_archived/%f %p'

So if your connection gets disrupted, or the slave falls behind, it can
always catch up again.

Honestly in sync rep, I'm surprised the master doesn't keep segments
until a waiting slave either disconnects, or can consume WAL files being
held up by a long-running transaction. Not that it matters, since you
can fake that behavior this way.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com



On Wed, Jun 5, 2013 at 11:26 PM, <Mads.Tandrup@schneider-electric.com> wrote:
Hi

Thanks for your reply. Do you know of any options that I could give pg_dump/psql to avoid creating one big transaction? I'm using the plain text format for pg_dump.


For the plain text format, it is already not one big transaction, unless you
pass -1 to psql.  However, the load of any individual table will still be a
single transaction, so for a very large table it will still be a very long
transaction.

Using pg_dump with --inserts could get around this, but it would probably be
better to fix the fundamental problem by increasing wal_keep_segments or
something of that nature.
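(As command lines, the difference is roughly this; a sketch, with mydb and
the file names as placeholders:)

```
pg_dump -Fp mydb > mydb.sql            # plain-text dump
psql -d mydb -f mydb.sql               # each table's COPY commits separately
psql -1 -d mydb -f mydb.sql            # -1: one giant transaction - what to avoid here
pg_dump --inserts mydb > mydb.sql      # row-per-INSERT statements instead of COPY
```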

Cheers,

Jeff

Re: Streaming replication with sync slave, but disconnects due to missing WAL segments

From
Mads.Tandrup@schneider-electric.com
Date:
Hi Jeff.

Thanks for the clarification.

I'll adjust wal_keep_segments for the expected biggest table in the backup.

Best regards,
Mads
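(A rough way to size wal_keep_segments for this, assuming the 16 MB segment
size of 9.1 and treating the WAL volume generated while the biggest table
restores as the amount to cover; the numbers are illustrative:)

```shell
# ceil(expected_wal_mb * safety_factor / segment_mb), in integer arithmetic
expected_wal_mb=2048   # WAL expected while the largest table restores
safety_factor=2        # headroom for checkpoints and other activity
segment_mb=16          # WAL segment size in 9.1
echo $(( (expected_wal_mb * safety_factor + segment_mb - 1) / segment_mb ))
# prints 256 -> wal_keep_segments = 256 (about 4 GB of WAL kept on the master)
```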




