Thread: streaming replication not working

streaming replication not working

From
John DeSoi
Date:
I have a 9.2.4 (CentOS 6.5) standby that has been working with no problems using log shipping. I wanted to add
streamingreplication which I thought would be as simple as adding primary_conninfo to recovery.conf and restarting the
standby.But on restart there is no message or error about connecting to the primary for replication.
pg_stat_replicationis empty on the primary and I don't see any errors on the primary either.  

Here is what I have on the standby:

postgresql.conf

hot_standby = on
max_wal_senders = 2
wal_level = hot_standby


recovery.conf

standby_mode = 'on'
trigger_file = '/pgsql/9.2/data/failover.trigger'
primary_conninfo = 'host=localhost port=21333 user=postgres'
restore_command = 'pg_standby -t /pgsql/9.2/data/failover.trigger /shared/pgbackup %f %p %r'
archive_cleanup_command = 'pg_archivecleanup /shared/pgbackup %r'


I have a ssh tunnel setup on localhost and have verified the replication user can connect to the primary.

Am I missing something obvious? Do I have to back up the primary again to make this change?

Thanks,

John DeSoi, Ph.D.



Re: streaming replication not working

From
Ray Stell
Date:
On Sep 23, 2013, at 11:20 AM, John DeSoi <desoi@pgedit.com> wrote:
>
> Am I missing something obvious? Do I have to back up the primary again to make this change?

you didn't mention a pg_hba.conf rule.  did you add one for the replication user?

Re: streaming replication not working

From
John DeSoi
Date:
On Sep 23, 2013, at 1:00 PM, Ray Stell <stellr@vt.edu> wrote:

>>
>> Am I missing something obvious? Do I have to back up the primary again to make this change?
>
> you didn't mention a pg_hba.conf rule.  did you add one for the replication user?


You mean on the primary, right? Yes, I have one there. But even if I did not, I would expect to see a connection error
inthe log on the standby. No error or any indication the streaming replication process is running on the standby.  

John DeSoi, Ph.D.



Re: streaming replication not working

From
John DeSoi
Date:
On Sep 23, 2013, at 1:00 PM, Ray Stell <stellr@vt.edu> wrote:

>>
>> Am I missing something obvious? Do I have to back up the primary again to make this change?
>
> you didn't mention a pg_hba.conf rule.  did you add one for the replication user?


You mean on the primary, right? Yes, I have one there. But even if I did not, I would expect to see a connection error
inthe log on the standby. No error or any indication the streaming replication process is running on the standby.  

John DeSoi, Ph.D.



Re: streaming replication not working

From
Ray Stell
Date:

On Sep 23, 2013, at 4:47 PM, John DeSoi wrote:

You mean on the primary, right?

right


Yes, I have one there. But even if I did not, I would expect to see a connection error in the log on the standby. No error or any indication the streaming replication process is running on the standby.

you're right, I was firing from the hip.  sure enough, it does toss an error:
2013-09-23 18:00:01 EDT,0,authentication FATAL:  28000: no pg_hba.conf entry for replication connection from host "xxx", user "repuser", SSL off

I'd guess a firewall issue?  What happens with "telnet primary_host port"   maybe use tcpdump to see what's happening with the traffic?  

Re: streaming replication not working

From
Karl Denninger
Date:
On 9/23/2013 9:30 PM, Ray Stell wrote:

On Sep 23, 2013, at 4:47 PM, John DeSoi wrote:

You mean on the primary, right?

right


Yes, I have one there. But even if I did not, I would expect to see a connection error in the log on the standby. No error or any indication the streaming replication process is running on the standby.

you're right, I was firing from the hip.  sure enough, it does toss an error:
2013-09-23 18:00:01 EDT,0,authentication FATAL:  28000: no pg_hba.conf entry for replication connection from host "xxx", user "repuser", SSL off

I'd guess a firewall issue?  What happens with "telnet primary_host port"   maybe use tcpdump to see what's happening with the traffic?  

No, there is a missing line in pg_hba.conf that should look something like this:

host    replication     repuser        xxx           trust

(where "xxx" is the hostname)

See the pg_hba.conf file for more examples.  Note that "replication" is a special database tag and a replicating connection must have one of these defined as "all" does not match it.

You can use "host", "hostssl" or "hostnossl"; "trust" means that no password is demanded and for obvious reasons should NOT be used for other than a local connection that can be trusted implicitly.  I prefer not to use that method for other than local socket connections and then only on a machine where nobody signs in that is untrusted (e.g. only admins are permitted general access.)  If you are connecting over an insecure channel or untrusted users are on the machine then consider SSL to encrypt the traffic and either use md5 for the password or use a certificate.

You can reload the file without restarting postgres with "pg_ctl -D data-directory reload"

(where "data-directory" is wherever the data directory that has the pg_hba.conf file -- and the rest of the base of the data store -- is)

--
Karl Denninger
karl@denninger.net
Cuda Systems LLC
Attachment

Re: streaming replication not working

From
Albe Laurenz
Date:
John DeSoi wrote:
> I have a 9.2.4 (CentOS 6.5) standby that has been working with no problems using log shipping. I
> wanted to add streaming replication which I thought would be as simple as adding primary_conninfo to
> recovery.conf and restarting the standby. But on restart there is no message or error about connecting
> to the primary for replication. pg_stat_replication is empty on the primary and I don't see any errors
> on the primary either.
> 
> Here is what I have on the standby:
> 
> postgresql.conf
> 
> hot_standby = on
> max_wal_senders = 2
> wal_level = hot_standby

You should set the same parameters on the primary, else it won't work.

> recovery.conf
> 
> standby_mode = 'on'
> trigger_file = '/pgsql/9.2/data/failover.trigger'
> primary_conninfo = 'host=localhost port=21333 user=postgres'
> restore_command = 'pg_standby -t /pgsql/9.2/data/failover.trigger /shared/pgbackup %f %p %r'
> archive_cleanup_command = 'pg_archivecleanup /shared/pgbackup %r'
> 
> 
> I have a ssh tunnel setup on localhost and have verified the replication user can connect to the
> primary.
> 
> Am I missing something obvious? Do I have to back up the primary again to make this change?

No, that shouldn't be necessary.

What are the messages in the secondary's log
after you start it?

Yours,
Laurenz Albe

Re: streaming replication not working

From
John DeSoi
Date:
On Sep 24, 2013, at 5:48 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:

>> Here is what I have on the standby:
>>
>> postgresql.conf
>>
>> hot_standby = on
>> max_wal_senders = 2
>> wal_level = hot_standby
>
> You should set the same parameters on the primary, else it won't work.

On the primary I have


wal_level = hot_standby
wal_keep_segments = 48
max_wal_senders = 2

archive_mode = on
archive_command = 'rsync --whole-file --ignore-existing --delete-after --timeout=30 -a %p backup:/path/backup/%f'
archive_timeout = 300

I don't have "hot_standby = on". I thought that only applied to the standby to allow queries in recovery mode?


>> Am I missing something obvious? Do I have to back up the primary again to make this change?
>
> No, that shouldn't be necessary.
>
> What are the messages in the secondary's log
> after you start it?


Below is what I have for restart with log level at debug2. In looking at the source, I should have an error or a log
message"streaming replication successfully connected to primary". I never get either one. 


LOG:  restored log file "000000010000010F00000001" from archive
LOG:  restored log file "000000010000010F00000002" from archive
LOG:  received fast shutdown request
LOG:  aborting any active transactions
LOG:  shutting down
LOG:  database system is shut down
LOG:  database system was shut down in recovery at 2013-09-24 08:23:53 CDT
DEBUG:  standby_mode = 'on'
DEBUG:  trigger_file = '/pgsql/9.2/data/failover.trigger'
DEBUG:  primary_conninfo = 'host=localhost port=21333 user=postgres'
DEBUG:  restore_command = 'pg_standby -t /pgsql/9.2/data/failover.trigger /shared/pgbackup %f %p %r'
DEBUG:  archive_cleanup_command = 'pg_archivecleanup /shared/pgbackup %r'
LOG:  entering standby mode
LOG:  restored log file "000000010000010F00000002" from archive
DEBUG:  got WAL segment from archive
DEBUG:  checkpoint record is at 10F/204D9B8
DEBUG:  redo record is at 10F/2035608; shutdown FALSE
DEBUG:  next transaction ID: 0/3837659; next OID: 13124290
DEBUG:  next MultiXactId: 3; next MultiXactOffset: 5
DEBUG:  oldest unfrozen transaction ID: 1798, in database 1
DEBUG:  transaction ID wrap limit is 2147485445, limited by database with OID 1
DEBUG:  resetting unlogged relations: cleanup 1 init 0
DEBUG:  initializing for hot standby
LOG:  redo starts at 10F/2035608
DEBUG:  recovery snapshots are now enabled
CONTEXT:  xlog redo  running xacts: nextXid 3837668 latestCompletedXid 3837667 oldestRunningXid 3837668
DEBUG:  checkpointer updated shared memory configuration values
LOG:  consistent recovery state reached at 10F/3000000
LOG:  database system is ready to accept read only connections
LOG:  restored log file "000000010000010F00000003" from archive
DEBUG:  got WAL segment from archive



Thanks,

John DeSoi, Ph.D.

Re: streaming replication not working

From
Guillaume Lelarge
Date:
On Tue, 2013-09-24 at 08:59 -0500, John DeSoi wrote:
> On Sep 24, 2013, at 5:48 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
>
> >> Here is what I have on the standby:
> >>
> >> postgresql.conf
> >>
> >> hot_standby = on
> >> max_wal_senders = 2
> >> wal_level = hot_standby
> >
> > You should set the same parameters on the primary, else it won't work.
>
> On the primary I have
>
>
> wal_level = hot_standby
> wal_keep_segments = 48
> max_wal_senders = 2
>
> archive_mode = on
> archive_command = 'rsync --whole-file --ignore-existing --delete-after --timeout=30 -a %p backup:/path/backup/%f'
> archive_timeout = 300
>
> I don't have "hot_standby = on". I thought that only applied to the standby to allow queries in recovery mode?
>
>
> >> Am I missing something obvious? Do I have to back up the primary again to make this change?
> >
> > No, that shouldn't be necessary.
> >
> > What are the messages in the secondary's log
> > after you start it?
>
>
> Below is what I have for restart with log level at debug2. In looking at the source, I should have an error or a log
message"streaming replication successfully connected to primary". I never get either one. 
>
>
> LOG:  restored log file "000000010000010F00000001" from archive
> LOG:  restored log file "000000010000010F00000002" from archive
> LOG:  received fast shutdown request
> LOG:  aborting any active transactions
> LOG:  shutting down
> LOG:  database system is shut down
> LOG:  database system was shut down in recovery at 2013-09-24 08:23:53 CDT
> DEBUG:  standby_mode = 'on'
> DEBUG:  trigger_file = '/pgsql/9.2/data/failover.trigger'
> DEBUG:  primary_conninfo = 'host=localhost port=21333 user=postgres'
> DEBUG:  restore_command = 'pg_standby -t /pgsql/9.2/data/failover.trigger /shared/pgbackup %f %p %r'
> DEBUG:  archive_cleanup_command = 'pg_archivecleanup /shared/pgbackup %r'
> LOG:  entering standby mode
> LOG:  restored log file "000000010000010F00000002" from archive
> DEBUG:  got WAL segment from archive
> DEBUG:  checkpoint record is at 10F/204D9B8
> DEBUG:  redo record is at 10F/2035608; shutdown FALSE
> DEBUG:  next transaction ID: 0/3837659; next OID: 13124290
> DEBUG:  next MultiXactId: 3; next MultiXactOffset: 5
> DEBUG:  oldest unfrozen transaction ID: 1798, in database 1
> DEBUG:  transaction ID wrap limit is 2147485445, limited by database with OID 1
> DEBUG:  resetting unlogged relations: cleanup 1 init 0
> DEBUG:  initializing for hot standby
> LOG:  redo starts at 10F/2035608
> DEBUG:  recovery snapshots are now enabled
> CONTEXT:  xlog redo  running xacts: nextXid 3837668 latestCompletedXid 3837667 oldestRunningXid 3837668
> DEBUG:  checkpointer updated shared memory configuration values
> LOG:  consistent recovery state reached at 10F/3000000
> LOG:  database system is ready to accept read only connections
> LOG:  restored log file "000000010000010F00000003" from archive
> DEBUG:  got WAL segment from archive
>

Your config file and your debug logs don't match. Your config file says
that the restore command is rsync, but your logs say its pg_standby.

Check if you have a pg_standby process on the slave. That would explain
why the slave never tries to establish a replication connection to the
master.


--
Guillaume
http://blog.guillaume.lelarge.info
http://www.dalibo.com



Re: streaming replication not working

From
John DeSoi
Date:
On Sep 25, 2013, at 8:36 AM, Guillaume Lelarge <guillaume@lelarge.info> wrote:

> Your config file and your debug logs don't match. Your config file says
> that the restore command is rsync, but your logs say its pg_standby.
>
> Check if you have a pg_standby process on the slave. That would explain
> why the slave never tries to establish a replication connection to the
> master.

rsync is only used in the primary configuration to push the WAL files to the standby. But pg_standby is indeed the
problem.I thought pg_standby was a more feature rich option than using cp for the restore command. I see now the
documentationsays it supports creation of a "warm standby". It did not occur to me this meant the standby could not
connectto the primary for streaming replication. Even when using pg_standby, the server was really a "hot standby"
becauseI was able to connect to it and make read-only queries. I think it would be helpful for pg_standby to emit a
warningif primary_conninfo is set it the recovery.conf. 

I changed the restore command to use cp and now everything appears to be working as expected.

Thanks very much for your help and to everyone who offered suggestions.

John DeSoi, Ph.D.