Thread: Setting up replication on Windows, v9.4

Setting up replication on Windows, v9.4

From
Brad White
Date:
I'm setting up a backup for our primary postgres server using the archived WAL files.
Then I'll try to upgrade it to Streaming Replication.
Then I'll upgrade the system to v.latest.
For now, we are on v.9.4.

I do a base backup from the primary to a directory on the NAS.

      "C:\Program Files\PostgreSQL\9.4\bin\pg_basebackup.exe" -D \\diskstation\AccessData\Dev\Backup -P -X s -v -h 192.168.1.118 -p 5432 -U postgres

That appears to go fine.
Then I delete data\*.* and copy everything except the config files from the backup into data.

Copy in recovery.conf
------
    standby_mode          = 'on'
    primary_conninfo      = 'host=192.168.1.118 port=5432 user=replication password=**********'
    restore_command = 'copy "\\\\DISKSTATION\\AccessData\\WALfiles\\%f" "%p"'
------
Copy in postgresql.conf, with settings
------
listen_addresses = '127.0.0.1,192.168.1.118'
wal_level = archive
hot_standby = on
------
Interestingly, the recovery file says 
# Note that recovery.conf must be in $PGDATA directory.
# It should NOT be located in the same directory as postgresql.conf
Those seem contradictory.
And if I remove the postgresql, it just refuses to start.

With all this in place, I start the service, it runs for a bit, then shuts down.
No errors in the event log.
5 postgres processes are left running along with a pid file.

The log file says
------
LOG:  database system was interrupted while in recovery at log time 2022-11-04 13:17:28 PDT
HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
FATAL:  the database system is starting up
FATAL:  the database system is starting up
FATAL:  the database system is starting up
LOG:  entering standby mode
LOG:  consistent recovery state reached at 6A/35000090
LOG:  record with zero length at 6A/35000090
LOG:  started streaming WAL from primary at 6A/35000000 on timeline 1
LOG:  redo starts at 6A/35000090
FATAL:  the database system is starting up
FATAL:  the database system is starting up
FATAL:  the database system is starting up
....
FATAL:  the database system is starting up
FATAL:  the database system is starting up

So it seems that the backup didn't work as well as first appeared.

Re: Setting up replication on Windows, v9.4

From
Ian Lawrence Barwick
Date:
2022年11月5日(土) 7:59 Brad White <b55white@gmail.com>:
>
> I'm setting up a backup for our primary postgres server using the archived WAL files.
> Then I'll try to upgrade it to Streaming Replication.
> Then I'll upgrade the system to v.latest.
> For now, we are on v.9.4.
>
> I do a base backup from the primary to a directory on the NAS.
>
>       "C:\Program Files\PostgreSQL\9.4\bin\pg_basebackup.exe" -D \\diskstation\AccessData\Dev\Backup -P -X s -v -h
192.168.1.118-p 5432 -U postgres 
>
> That appears to go fine.
> Then I delete data\*.* and copy everything except the config files from the backup into data.
>
> Copy in recovery.conf
> ------
>     standby_mode          = 'on'
>     primary_conninfo      = 'host=192.168.1.118 port=5432 user=replication password=**********'
>     restore_command = 'copy "\\\\DISKSTATION\\AccessData\\WALfiles\\%f" "%p"'
> ------
> Copy in postgresql.conf, with settings
> ------
> listen_addresses = '127.0.0.1,192.168.1.118'
> wal_level = archive
> hot_standby = on
> ------

Is this the postgresql.conf applied to the standby? Just wondering
as one of the listen_addresses is the same as the host in primary_conninfo.

> Interestingly, the recovery file says
> # Note that recovery.conf must be in $PGDATA directory.
> # It should NOT be located in the same directory as postgresql.conf
> Those seem contradictory.

I don't know where those lines come from, they're not generated by PostgreSQL.

It is certainly true that recovery.conf *must* be in the $PGDATA directory.  The
above lines would make sense if it's expected that postgresql.conf will be
located in another location (as is usually the case with Debian/Ubuntu
packages), but unless your setup is specifying that, just put both files in
$PGDATA.

> And if I remove the postgresql, it just refuses to start.
>
> With all this in place, I start the service, it runs for a bit, then shuts down.
> No errors in the event log.
> 5 postgres processes are left running along with a pid file.

I'm not familiar with Windows, but it sounds like PostgreSQL is
actually running.
What happens if you try and connect to it?

> The log file says
> ------
> LOG:  database system was interrupted while in recovery at log time 2022-11-04 13:17:28 PDT
> HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier
recoverytarget. 
> FATAL:  the database system is starting up
> FATAL:  the database system is starting up
> FATAL:  the database system is starting up
> LOG:  entering standby mode
> LOG:  consistent recovery state reached at 6A/35000090
> LOG:  record with zero length at 6A/35000090
> LOG:  started streaming WAL from primary at 6A/35000000 on timeline 1
> LOG:  redo starts at 6A/35000090
> FATAL:  the database system is starting up
> FATAL:  the database system is starting up
> FATAL:  the database system is starting up
> ....
> FATAL:  the database system is starting up
> FATAL:  the database system is starting up
>
> So it seems that the backup didn't work as well as first appeared.

The above state looks very much what would happen if the standby is running with
"hot_standby = off", and connection attempts are being made. Try attempting
to connect to it.

Also, check if there's an entry in pg_stat_replication on the primary.


Regards

Ian Barwick



Re: Setting up replication on Windows, v9.4

From
Ian Lawrence Barwick
Date:
2022年11月5日(土) 10:02 Ian Lawrence Barwick <barwick@gmail.com>:
>
> 2022年11月5日(土) 7:59 Brad White <b55white@gmail.com>:
...
> > Interestingly, the recovery file says
> > # Note that recovery.conf must be in $PGDATA directory.
> > # It should NOT be located in the same directory as postgresql.conf
> > Those seem contradictory.
>
> I don't know where those lines come from, they're not generated by PostgreSQL.
>
> It is certainly true that recovery.conf *must* be in the $PGDATA directory.  The
> above lines would make sense if it's expected that postgresql.conf will be
> located in another location (as is usually the case with Debian/Ubuntu
> packages), but unless your setup is specifying that, just put both files in
> $PGDATA.
...

Hmm, looks like those lines came from here:

     https://wiki.postgresql.org/wiki/Streaming_Replication

That's plain wrong and confusing [1], I'll update.

[1] not for the first time either:
https://serverfault.com/questions/760802/why-should-postgresql-conf-not-be-in-the-same-dir-as-recovery-conf

Ian Barwick



Re: Setting up replication on Windows, v9.4

From
Ron
Date:
On 11/4/22 17:59, Brad White wrote:
> I'm setting up a backup for our primary postgres server using the archived 
> WAL files.
> Then I'll try to upgrade it to Streaming Replication.
> Then I'll upgrade the system to v.latest.
> For now, we are on v.9.4.

FYI: v9.4 has been EOL for 2 years 9 months.   (And there have been some big 
advances since then which make replication much easier.)

-- 
Angular momentum makes the world go 'round.



Re: Setting up replication on Windows, v9.4

From
Brad White
Date:

> v9.4 has been EOL for 2 years 9 months. 
As I said, the next step will be to upgrade.
It would make sense to upgrade first, since "there have been some big
advances since then which make replication much easier"
But when we upgraded, the app stopped working.
So I'll need to go through and nail down which exact version causes the issue and then see if we can get it resolved.
I have no way of knowing how long that will take.
So we're doing the replication first.

Re: Setting up replication on Windows, v9.4

From
Rob Sargent
Date:
On 11/7/22 13:59, Brad White wrote:

> v9.4 has been EOL for 2 years 9 months. 
As I said, the next step will be to upgrade.
It would make sense to upgrade first, since "there have been some big
advances since then which make replication much easier"
But when we upgraded, the app stopped working.
So I'll need to go through and nail down which exact version causes the issue and then see if we can get it resolved.
I have no way of knowing how long that will take.
So we're doing the replication first.
Care to share some of the ways the app stopped working?  You might get a leg up on where best to remediate.

Re: Setting up replication on Windows, v9.4

From
Brad White
Date:


On 11/7/2022 3:42 PM, Rob Sargent wrote:
> Care to share some of the ways the app stopped working?  You might get a leg up on where best to remediate.
I don't recall, as that was a few months ago.

We are running MS-Access as a front end with Postgres as the back end.
It appeared to read and write data perfectly fine, but failed in the business logic on the Access side where there didn't appear to be any DB involvement. But switching to v14 caused it and switching away fixed it.
No obvious cause that I could see.
As written, switching databases was very time-consuming.
Since then, I've rewritten the app so that the network admin just changes an INI file that the app reads on start up.
If there is a change, it switches all the tables over in about 15 seconds.
We wanted that ability anyway in case we ever needed to switch to using the backup server.
Now, I am working on getting replication going.
Next up: back to upgrading.

But, since you ask, I'll post here when I get back to that so y'all can follow along as I narrow it down to a specific version.

Re: Setting up replication on Windows, v9.4

From
Brad White
Date:
From the error log:
     FATAL:  hot standby is not possible because wal_level was not set to "hot_standby" or higher on the master server
     HINT:  Either set wal_level to "hot_standby" on the master, or turn off hot_standby here.
I tried setting hot_standby to off, but that didn't help.
I set the wal_level to "hot_standby" and hot_standby to on, and now it appears to have made progress.
The service starts up without failing.

> check if there's an entry in pg_stat_replication on the primary.
No. No entries.
There is now an entry on the primary.
 
But still can't connect from Navicat on the primary server and the log says

    FATAL:  could not receive data from WAL stream: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.

    LOG:  record with zero length at 6B/63000090
    FATAL:  could not connect to the primary server: FATAL:  the database system is starting up

    LOG:  started streaming WAL from primary at 6B/63000000 on timeline 1
    LOG:  startup process (PID 3368) exited with exit code 1
    LOG:  terminating any other active server processes

¯\_(ツ)_/¯

Re: Setting up replication on Windows, v9.4

From
Ron
Date:
On 11/10/22 21:24, Brad White wrote:
On 11/7/2022 3:42 PM, Rob Sargent wrote:
> Care to share some of the ways the app stopped working?  You might get a leg up on where best to remediate.
I don't recall, as that was a few months ago.

We are running MS-Access as a front end with Postgres as the back end.
It appeared to read and write data perfectly fine, but failed in the business logic on the Access side where there didn't appear to be any DB involvement. But switching to v14 caused it and switching away fixed it.
No obvious cause that I could see.

Nothing in the application error logs?  What about the Postgresql error logs?   Or manually running the application queries?

Anyway, try Pg 9.6.  Still EOL, but not as EOL.

--
Angular momentum makes the world go 'round.

Re: Setting up replication on Windows, v9.4

From
Brad White
Date:
> Or manually running the application queries?
That was the odd thing. It didn't appear to be in the middle of running any queries. The database shouldn't have had any effect.

> Anyway, try Pg 9.6.
I have Pg 12 installed, so I'll try that.
I'll start a new thread on my adventures in upgrading.

Thanks,
Brad.


On Fri, Nov 11, 2022 at 12:38 PM Ron <ronljohnsonjr@gmail.com> wrote:
   Nothing in the application error logs?  What about the Postgresql error logs?   Or manually running the application queries?

Anyway, try Pg 9.6.  Still EOL, but not as EOL.

--
Angular momentum makes the world go 'round.

Re: Setting up replication on Windows, v9.4

From
Ron
Date:
Cranking up the log_min_messages level might help, to see what is (or is not) reaching the server.

On 11/11/22 14:35, Brad White wrote:
> Or manually running the application queries?
That was the odd thing. It didn't appear to be in the middle of running any queries. The database shouldn't have had any effect.

> Anyway, try Pg 9.6.
I have Pg 12 installed, so I'll try that.
I'll start a new thread on my adventures in upgrading.

Thanks,
Brad.


On Fri, Nov 11, 2022 at 12:38 PM Ron <ronljohnsonjr@gmail.com> wrote:
   Nothing in the application error logs?  What about the Postgresql error logs?   Or manually running the application queries?

Anyway, try Pg 9.6.  Still EOL, but not as EOL.

--
Angular momentum makes the world go 'round.

--
Angular momentum makes the world go 'round.

Re: Setting up replication on Windows, v9.4

From
Ron
Date:
Note that WAL replication replicates the whole instance not just specific databases.  You need logical replication for that.

Also, I just learned that 9.4 does have WAL replication slots (which makes replication much easier).  v9.6 is where replication is "like rolling off a log" simple.

On 11/4/22 17:59, Brad White wrote:
I'm setting up a backup for our primary postgres server using the archived WAL files.
Then I'll try to upgrade it to Streaming Replication.
Then I'll upgrade the system to v.latest.
For now, we are on v.9.4.

I do a base backup from the primary to a directory on the NAS.

      "C:\Program Files\PostgreSQL\9.4\bin\pg_basebackup.exe" -D \\diskstation\AccessData\Dev\Backup -P -X s -v -h 192.168.1.118 -p 5432 -U postgres

That appears to go fine.
Then I delete data\*.* and copy everything except the config files from the backup into data.

Copy in recovery.conf
------
    standby_mode          = 'on'
    primary_conninfo      = 'host=192.168.1.118 port=5432 user=replication password=**********'
    restore_command = 'copy "\\\\DISKSTATION\\AccessData\\WALfiles\\%f" "%p"'
------
Copy in postgresql.conf, with settings
------
listen_addresses = '127.0.0.1,192.168.1.118'
wal_level = archive
hot_standby = on
------
Interestingly, the recovery file says 
# Note that recovery.conf must be in $PGDATA directory.
# It should NOT be located in the same directory as postgresql.conf
Those seem contradictory.
And if I remove the postgresql, it just refuses to start.

With all this in place, I start the service, it runs for a bit, then shuts down.
No errors in the event log.
5 postgres processes are left running along with a pid file.

The log file says
------
LOG:  database system was interrupted while in recovery at log time 2022-11-04 13:17:28 PDT
HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
FATAL:  the database system is starting up
FATAL:  the database system is starting up
FATAL:  the database system is starting up
LOG:  entering standby mode
LOG:  consistent recovery state reached at 6A/35000090
LOG:  record with zero length at 6A/35000090
LOG:  started streaming WAL from primary at 6A/35000000 on timeline 1
LOG:  redo starts at 6A/35000090
FATAL:  the database system is starting up
FATAL:  the database system is starting up
FATAL:  the database system is starting up
....
FATAL:  the database system is starting up
FATAL:  the database system is starting up

So it seems that the backup didn't work as well as first appeared.

--
Angular momentum makes the world go 'round.