Re: How to start slave after pg_basebackup. Why min_wal_size and wal_keep_segments are duplicated - Mailing list pgsql-general

From Magnus Hagander
Subject Re: How to start slave after pg_basebackup. Why min_wal_size and wal_keep_segments are duplicated
Date
Msg-id CABUevEwKLAtAHAPb-kZcx+ZPxMMLboYiYL4a105M2kBB9Oy9SA@mail.gmail.com
In response to Re: How to start slave after pg_basebackup. Why min_wal_size and wal_keep_segments are duplicated  ("Andrus" <kobruleht2@hot.ee>)
List pgsql-general


On Mon, Jun 1, 2020 at 10:17 AM Andrus <kobruleht2@hot.ee> wrote:
Hi!

> I have tried to re-initiate the replica several times during low-use periods, but this error occurs again.
>remove the whole replica's PGDATA/* and do a pg_basebackup again. But before that, make sure wal_keep_segments is big enough on the
>master and,

I renamed whole cluster before pg_basebackup

>just as important, do a vacuumdb -a (takes much space during the process) and use archiving!

I run vacuumdb --full --all before pg_basebackup

> If named replication slot is used commands like
> vacuumdb --all --full
> will cause main server crash due to disk space limit. pg_wal directory will occupy free disk space. After that main server stops.
>>if you have disk constraints you will run into trouble sooner or later anyway. Make sure, you have enough disk space. There's no
>>way around that anyway.

This space is sufficient for base backup and replication.

>> I tried using wal_keep_segments = 180
>> Will setting wal_keep_segments to a higher value allow replication to start after pg_basebackup?
>it depends. If you start the replica immediately and don't wait for hours or days, you should be good to go. But that depends on
>different factors, for example, how many WAL files are written during the pg_basebackup and pg_ctl start of the replica. If more
>than 180 WALs have gone by on the master because it is really busy, then you're probably lost again. Point being, you'll have to
>launch the replica before WALs are expired!
>Again: Make sure you have enough disk space, use archiving and use a replication slot.

I tried with wal_keep_segments=360 but the problem persists.
The server generates far fewer than 300 WAL files.

Have you verified that wal_keep_segments actually ends up at 360, by connecting to the database and issuing SHOW wal_keep_segments? I've seen far too many examples of people who accidentally had a second line that overrode the one they thought they had changed, and thus still ran with a lower number.
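
One way to spot such an overriding line is to list every active assignment of the parameter in the configuration file; when a parameter appears more than once, the last occurrence wins. The sketch below assumes a single postgresql.conf and does not follow include directives or postgresql.auto.conf, which can also override a value. The function names are illustrative, not part of PostgreSQL. The second function asks the running server directly, which remains the authoritative check.

```shell
# List every active (uncommented) assignment of a parameter in a config file,
# prefixed with its line number. If there are several, the last one wins.
# $1 = parameter name, $2 = path to the configuration file
find_setting() {
    grep -nE "^[[:space:]]*$1([[:space:]]*=|[[:space:]])" "$2"
}

# Ask the running server which value is in effect and where it came from.
# sourcefile/sourceline are only shown to sufficiently privileged roles.
show_live_setting() {
    psql -c "SHOW $1"
    psql -c "SELECT name, setting, sourcefile, sourceline
             FROM pg_settings WHERE name = '$1'"
}
```

For example, run find_setting wal_keep_segments /etc/postgresql/12/main/postgresql.conf on the master (the Debian-style path is an assumption), then show_live_setting wal_keep_segments with the connection variables pointing at the master.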


The shell script starts the server automatically after pg_basebackup completes:

PGHOST=example.com
PGPASSWORD=mypass
PGUSER=replikaator
export PGHOST  PGPASSWORD PGUSER
/etc/init.d/postgresql stop
mv /var/lib/postgresql/12/main /var/lib/postgresql/12/mainennebaasbakuppi
pg_basebackup --verbose --progress --write-recovery-conf -D /var/lib/postgresql/12/main
chmod --recursive --verbose 0700 /var/lib/postgresql/12/main
chown -Rv postgres:postgres /var/lib/postgresql/12/main
/etc/init.d/postgresql start

Do you get any useful output from the -v part of pg_basebackup? It should, for example, tell you the exact start and stop points in the WAL during the base backup, which can be correlated with the missing file.
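
The start and stop points are reported as WAL locations (LSNs), while a missing file is identified by its segment file name, so correlating the two means mapping an LSN to a file name. A minimal sketch, assuming the default 16 MB WAL segment size; the function name and the sample LSN are illustrative:

```shell
# Map a WAL location (LSN) to the name of the segment file that contains it.
# Assumes the default 16 MB WAL segment size.
# $1 = timeline ID (decimal), $2 = LSN in the usual "hi/lo" hex form
lsn_to_walfile() {
    hi=${2%/*}   # high 32 bits, hex
    lo=${2#*/}   # low 32 bits, hex
    # 16 MB = 2^24 bytes, so the segment number within "hi" is lo >> 24.
    printf '%08X%08X%08X\n' "$1" "$((0x$hi))" "$((0x$lo >> 24))"
}
```

lsn_to_walfile 1 0/2000028 prints 000000010000000000000002. On the master, SELECT pg_walfile_name('0/2000028') gives the name for the current timeline without any assumption about the segment size.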

Normally the window between the end of pg_basebackup and the start of the actual service is not big enough to cause a problem, since v12 streams the WAL *during* the backup. (It could be a big problem before that was possible, or if one forgot to enable it before it became the default.) It certainly sounds weird that it should be a problem in your case, unless the chmod and chown commands take a *long* time. But if it is, there is nothing preventing you from creating a slot just for the setup and then getting rid of it. That is:

1. create slot
2. pg_basebackup with slot
3. start replication with slot
4. restart replication without slot once it's caught up
5. drop slot
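
The five steps could be scripted roughly as follows. This is a sketch under assumptions, not a tested procedure: the slot name setup_slot, the SLOT, DATADIR and DRY_RUN variables and all function names are hypothetical, the init script and data directory are taken from the script quoted above, and the psql calls rely on the connection environment (PGHOST and so on) pointing at the right server for each step.

```shell
# Hypothetical names; adjust to the actual setup.
SLOT=${SLOT:-setup_slot}
DATADIR=${DATADIR:-/var/lib/postgresql/12/main}

# With DRY_RUN set to a non-empty value, print commands instead of running them.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Steps 1-2 (on the replica host, connecting to the master):
# create the slot and take the base backup through it in one go.
# --create-slot requires pg_basebackup 11 or later; with --write-recovery-conf
# the slot name is also written into the replica's configuration.
backup_with_slot() {
    run pg_basebackup --verbose --progress --write-recovery-conf \
        --create-slot --slot="$SLOT" -D "$DATADIR"
}

# Step 3 is the unchanged "/etc/init.d/postgresql start" on the replica.

# Check on the master that the replica is streaming before going on.
replication_status() {
    run psql -c "SELECT client_addr, state, sent_lsn, replay_lsn FROM pg_stat_replication"
}

# Step 4 (against the replica, as a superuser): stop using the slot, then restart.
detach_slot() {
    run psql -c "ALTER SYSTEM RESET primary_slot_name"
    run /etc/init.d/postgresql restart
}

# Step 5 (against the master): drop the now inactive slot.
drop_slot() {
    run psql -c "SELECT pg_drop_replication_slot('$SLOT')"
}
```

Setting DRY_RUN=1 before calling any of the functions prints the commands instead of executing them, which is a cheap way to review them first. Note that the slot retains WAL on the master for as long as it exists, so step 5 should not be forgotten.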

However, if you want reliable replication, you really should have a slot. Or at least, you should have either a slot *or* log archiving that's read-accessible from the replica.
