Thread: Replication: slave server has 3x size of production server?
Hi!
I have a database cluster that was created on 9.6.10, on a Linux x64 RHEL server. I made progressive upgrades, first upgrading the slave and then the master.
Currently both are running 9.6.17.
The current production server is 196 GB in size.
Nevertheless, the replicated (slave) server is 598 GB in size.
The replica is 3x the size of the production server; is that normal?
Should I drop the slave server and re-create it? How can I avoid this situation in the future?
Thanks,
Edson
From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Saturday, 22 February 2020 14:33
To: Edson Richter <edsonrichter@hotmail.com>; pgsql-general <pgsql-general@postgresql.org>
Subject: Re: Replication: slave server has 3x size of production server?

On 2/22/20 9:25 AM, Edson Richter wrote:
> Hi!
>
> I've a database cluster created at 9.6.10 linux x64 server rhel. I made
> progressive upgrades, first upgrading slave and then upgrading master.
> Actually both are running 9.6.17.
> Current production server has 196Gb in size.
> Nevertheless, the replicated (slave) server has 598 Gb in size.
> Replication server has 3x size of production server, is that normal?
How are you measuring the sizes?
This is the command:
du --max-depth 1 -h pgDbCluster
Production:
du --max-depth 1 -h pgDbCluster
56M pgDbCluster/pg_log
444K pgDbCluster/global
4,0K pgDbCluster/pg_stat
4,0K pgDbCluster/pg_snapshots
16K pgDbCluster/pg_logical
20K pgDbCluster/pg_replslot
61M pgDbCluster/pg_subtrans
4,0K pgDbCluster/pg_commit_ts
465M pgDbCluster/pg_xlog
4,0K pgDbCluster/pg_twophase
12M pgDbCluster/pg_multixact
4,0K pgDbCluster/pg_serial
195G pgDbCluster/base
284K pgDbCluster/pg_stat_tmp
12M pgDbCluster/pg_clog
4,0K pgDbCluster/pg_dynshmem
12K pgDbCluster/pg_notify
4,0K pgDbCluster/pg_tblspc
196G pgDbCluster
Slave:
du -h --max-depth 1 pgDbCluster
403G pgDbCluster/pg_xlog
120K pgDbCluster/pg_log
424K pgDbCluster/global
0 pgDbCluster/pg_stat
0 pgDbCluster/pg_snapshots
4,0K pgDbCluster/pg_logical
8,0K pgDbCluster/pg_replslot
60M pgDbCluster/pg_subtrans
0 pgDbCluster/pg_commit_ts
0 pgDbCluster/pg_twophase
11M pgDbCluster/pg_multixact
0 pgDbCluster/pg_serial
195G pgDbCluster/base
12M pgDbCluster/pg_clog
0 pgDbCluster/pg_dynshmem
8,0K pgDbCluster/pg_notify
12K pgDbCluster/pg_stat_tmp
0 pgDbCluster/pg_tblspc
598G pgDbCluster
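
For reference, a quick way to look inside pg_xlog and see how many segments are there and how old they are (a sketch only; the pgDbCluster path is the one from the du output above, adjust if yours differs):

ls pgDbCluster/pg_xlog | wc -l              # number of entries (each WAL segment is 16MB)
ls -lt pgDbCluster/pg_xlog | tail -n 5      # oldest files
ls -lt pgDbCluster/pg_xlog | head -n 5      # newest files

403G at 16MB per segment is roughly 25,000 segments, so the age of the oldest ones is the interesting part.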
Edson
Where is the space being taken up on disk?
>
> Shall I drop the slave server and re-create it? How to avoid this
> situation in future?
>
> Thanks,
>
> Edson
>
>
--
Adrian Klaver
adrian.klaver@aklaver.com
From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Saturday, 22 February 2020 14:33
To: Edson Richter <edsonrichter@hotmail.com>; pgsql-general <pgsql-general@postgresql.org>
Subject: Re: Replication: slave server has 3x size of production server?

On 2/22/20 10:05 AM, Edson Richter wrote:
> ------------------------------------------------------------------------
>
> *De:* Adrian Klaver <adrian.klaver@aklaver.com>
> *Enviado:* sábado, 22 de fevereiro de 2020 14:33
> *Para:* Edson Richter <edsonrichter@hotmail.com>; pgsql-general
> <pgsql-general@postgresql.org>
> *Assunto:* Re: Replication: slave server has 3x size of production
> server?
> On 2/22/20 9:25 AM, Edson Richter wrote:
> > Hi!
> >
> > I've a database cluster created at 9.6.10 linux x64 server rhel. I made
> > progressive upgrades, first upgrading slave and then upgrading master.
> > Actually both are running 9.6.17.
> > Current production server has 196Gb in size.
> > Nevertheless, the replicated (slave) server has 598 Gb in size.
> > Replication server has 3x size of production server, is that normal?
>
> How are you measuring the sizes?
>
>
> This is the command:
>
> du --max-depth 1 -h pgDbCluster
>
>
> Production:
>
> du --max-depth 1 -h pgDbCluster
>
> 56M pgDbCluster/pg_log
> 444K pgDbCluster/global
> 4,0K pgDbCluster/pg_stat
> 4,0K pgDbCluster/pg_snapshots
> 16K pgDbCluster/pg_logical
> 20K pgDbCluster/pg_replslot
> 61M pgDbCluster/pg_subtrans
> 4,0K pgDbCluster/pg_commit_ts
> 465M pgDbCluster/pg_xlog
> 4,0K pgDbCluster/pg_twophase
> 12M pgDbCluster/pg_multixact
> 4,0K pgDbCluster/pg_serial
> 195G pgDbCluster/base
> 284K pgDbCluster/pg_stat_tmp
> 12M pgDbCluster/pg_clog
> 4,0K pgDbCluster/pg_dynshmem
> 12K pgDbCluster/pg_notify
> 4,0K pgDbCluster/pg_tblspc
> 196G pgDbCluster
>
>
> Slave:
>
> du -h --max-depth 1 pgDbCluster
>
> 403G pgDbCluster/pg_xlog
> 120K pgDbCluster/pg_log
> 424K pgDbCluster/global
> 0 pgDbCluster/pg_stat
> 0 pgDbCluster/pg_snapshots
> 4,0K pgDbCluster/pg_logical
> 8,0K pgDbCluster/pg_replslot
> 60M pgDbCluster/pg_subtrans
> 0 pgDbCluster/pg_commit_ts
> 0 pgDbCluster/pg_twophase
> 11M pgDbCluster/pg_multixact
> 0 pgDbCluster/pg_serial
> 195G pgDbCluster/base
> 12M pgDbCluster/pg_clog
> 0 pgDbCluster/pg_dynshmem
> 8,0K pgDbCluster/pg_notify
> 12K pgDbCluster/pg_stat_tmp
> 0 pgDbCluster/pg_tblspc
> 598G pgDbCluster
So the WAL logs are not being cleared.
What replication method is being used?
What are the settings for the replication?
Streaming replication. Initiated via pg_basebackup.
Settings on master server:
# - Sending Server(s) -
# Set these on the master and on any standby that will send replication data.
max_wal_senders = 2 # max number of walsender processes (change requires restart)
wal_keep_segments = 25 # in logfile segments, 16MB each; 0 disables
#wal_sender_timeout = 60s # in milliseconds; 0 disables
max_replication_slots = 2 # max number of replication slots (change requires restart)
#track_commit_timestamp = off # collect timestamp of transaction commit (change requires restart)
# - Master Server -
# These settings are ignored on a standby server.
#synchronous_standby_names = '' # standby servers that provide sync rep number of sync standbys and comma-separated list of application_name from standby(s); '*' = all
#vacuum_defer_cleanup_age = 0 # number of xacts by which cleanup is delayed
Settings on slave server:
# - Standby Servers -
# These settings are ignored on a master server.
hot_standby = on # "on" allows queries during recovery (change requires restart)
max_standby_archive_delay = -1 # max delay before canceling queries when reading WAL from archive; -1 allows indefinite delay
max_standby_streaming_delay = -1 # max delay before canceling queries when reading streaming WAL; -1 allows indefinite delay
wal_receiver_status_interval = 10s # send replies at least this often 0 disables
hot_standby_feedback = on # send info from standby to prevent query conflicts
wal_receiver_timeout = 0 # time that receiver waits for communication from master in milliseconds; 0 disables
wal_retrieve_retry_interval = 5s # time to wait before retrying to retrieve WAL after a failed attempt
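
For what it's worth, two quick checks on the standby can show whether something is pinning WAL there: an abandoned replication slot, or archiving that keeps failing. A minimal sketch (connection options omitted, assuming a local psql):

psql -c "SELECT slot_name, active, restart_lsn FROM pg_replication_slots;"
psql -c "SELECT archived_count, failed_count, last_archived_wal, last_failed_wal, last_failed_time FROM pg_stat_archiver;"

A slot with active = f but a non-null restart_lsn, or a growing failed_count, would both keep segments in pg_xlog.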
Regards,
Edson
>
>
> Edson
>
--
Adrian Klaver
adrian.klaver@aklaver.com
From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Saturday, 22 February 2020 16:16
To: Edson Richter <edsonrichter@hotmail.com>; pgsql-general <pgsql-general@postgresql.org>
Subject: Re: Replication: slave server has 3x size of production server?

On 2/22/20 11:03 AM, Edson Richter wrote:
> ------------------------------------------------------------------------
>
>
>
> Streaming replication. Initiated via pg_basebackup.
>
> Settings on master server:
>
> # - Sending Server(s) -
> # Set these on the master and on any standby that will send replication
> data.
> max_wal_senders = 2 # max number of walsender processes
> (change requires restart)
> wal_keep_segments = 25 # in logfile segments, 16MB each; 0 disables
> #wal_sender_timeout = 60s # in milliseconds; 0 disables
> max_replication_slots = 2 # max number of replication
> slots (change requires restart)
> #track_commit_timestamp = off # collect timestamp of transaction
> commit (change requires restart)
> # - Master Server -
> # These settings are ignored on a standby server.
> #synchronous_standby_names = '' # standby servers that provide sync
> rep number of sync standbys and comma-separated list of
> application_name from standby(s); '*' = all
> #vacuum_defer_cleanup_age = 0 # number of xacts by which cleanup is
> delayed
>
>
>
> Settings on slave server:
>
> # - Standby Servers -
> # These settings are ignored on a master server.
> hot_standby = on # "on" allows queries during
> recovery (change requires restart)
> max_standby_archive_delay = -1 # max delay before canceling
> queries when reading WAL from archive; -1 allows indefinite delay
> max_standby_streaming_delay = -1 # max delay before canceling
> queries when reading streaming WAL; -1 allows indefinite delay
> wal_receiver_status_interval = 10s # send replies at least this
> often 0 disables
> hot_standby_feedback = on # send info from standby to
> prevent query conflicts
> wal_receiver_timeout = 0 # time that receiver waits for
> communication from master in milliseconds; 0 disables
> wal_retrieve_retry_interval = 5s # time to wait before retrying
> to retrieve WAL after a failed attempt
What are the settings for:
archive_mode
archive_command
on the standby?
Are the files in pg_xlog on the standby mostly from well in the past?
Currently, the standby server is sending WALs to a backup (barman) server:

archive_mode = always # enables archiving; off, on, or always (change requires restart)
archive_command = 'rsync -e "ssh -2 -C -p 2022" -az %p barman@192.168.0.2:/dados/barman/dbcluster/incoming/%f'

The files are about 7 months old.

Thanks,
Edson
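
As a side note, the same transfer can be tested by hand outside the archiver, to confirm the rsync/ssh path and its exit status (a sketch only; SEGMENT is just whichever WAL file happens to be sitting in pg_xlog):

SEGMENT=$(ls pgDbCluster/pg_xlog | grep -E '^[0-9A-F]{24}$' | head -n 1)
rsync -e "ssh -2 -C -p 2022" -az "pgDbCluster/pg_xlog/$SEGMENT" "barman@192.168.0.2:/dados/barman/dbcluster/incoming/$SEGMENT"
echo "rsync exit status: $?"

A non-zero exit status here is exactly what would make the archiver keep segments around.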
>
>
> Regards,
>
> Edson
>
> >
> >
> > Edson
> >
>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com
>
--
Adrian Klaver
adrian.klaver@aklaver.com
From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Saturday, 22 February 2020 18:12
To: Edson Richter <edsonrichter@hotmail.com>; pgsql-general <pgsql-general@postgresql.org>
Subject: Re: Replication: slave server has 3x size of production server?

On 2/22/20 11:23 AM, Edson Richter wrote:
> ------------------------------------------------------------------------
>
> *De:* Adrian Klaver <adrian.klaver@aklaver.com>
> *Enviado:* sábado, 22 de fevereiro de 2020 16:16
> *Para:* Edson Richter <edsonrichter@hotmail.com>; pgsql-general
> <pgsql-general@postgresql.org>
> *Assunto:* Re: Replication: slave server has 3x size of production
> server?
> On 2/22/20 11:03 AM, Edson Richter wrote:
> > ------------------------------------------------------------------------
> >
>
> >
> >
> > Streaming replication. Initiated via pg_basebackup.
> >
> > Settings on master server:
> >
> > # - Sending Server(s) -
> > # Set these on the master and on any standby that will send replication
> > data.
> > max_wal_senders = 2 # max number of walsender processes
> > (change requires restart)
> > wal_keep_segments = 25 # in logfile segments, 16MB each; 0 disables
> > #wal_sender_timeout = 60s # in milliseconds; 0 disables
> > max_replication_slots = 2 # max number of replication
> > slots (change requires restart)
> > #track_commit_timestamp = off # collect timestamp of transaction
> > commit (change requires restart)
> > # - Master Server -
> > # These settings are ignored on a standby server.
> > #synchronous_standby_names = '' # standby servers that provide sync
> > rep number of sync standbys and comma-separated list of
> > application_name from standby(s); '*' = all
> > #vacuum_defer_cleanup_age = 0 # number of xacts by which cleanup is
> > delayed
> >
> >
> >
> > Settings on slave server:
> >
> > # - Standby Servers -
> > # These settings are ignored on a master server.
> > hot_standby = on # "on" allows queries during
> > recovery (change requires restart)
> > max_standby_archive_delay = -1 # max delay before canceling
> > queries when reading WAL from archive; -1 allows indefinite delay
> > max_standby_streaming_delay = -1 # max delay before canceling
> > queries when reading streaming WAL; -1 allows indefinite delay
> > wal_receiver_status_interval = 10s # send replies at least this
> > often 0 disables
> > hot_standby_feedback = on # send info from standby to
> > prevent query conflicts
> > wal_receiver_timeout = 0 # time that receiver waits for
> > communication from master in milliseconds; 0 disables
> > wal_retrieve_retry_interval = 5s # time to wait before retrying
> > to retrieve WAL after a failed attempt
>
> What are the settings for:
>
> archive_mode
> archive_command
>
> on the standby?
>
> Are the files in pg_xlog on the standby mostly from well in the past?
>
>
> Actually, standby server is sending wals to a backup (barman) server:
>
> archive_mode = always # enables archiving; off, on, or always
> (change requires restart)
> archive_command = 'rsync -e "ssh -2 -C -p 2022" -az %p
> barman@192.168.0.2:/dados/barman/dbcluster/incoming/%f'
And the above is working, the files are showing up on the barman server?
Yes, it is working. The last xlog file is present on all three servers.
Also, comparing the last transaction number on master and slave shows that they are in sync.
Last, but not least, "select max(id)" from a busy table shows the same id (when queried almost simultaneously using a simple test routine).
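
For completeness, the usual 9.6 way to confirm the standby is current (a sketch; the first query runs on the master, the second on the standby):

psql -c "SELECT application_name, state, sent_location, replay_location FROM pg_stat_replication;"   # on the master
psql -c "SELECT pg_last_xlog_receive_location(), pg_last_xlog_replay_location();"                    # on the standby

If sent_location and replay_location match, or trail by very little, streaming is keeping up.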
>
>
> The files are about 7 months old.
Are there newer files that would indicate that the streaming is working?
Yes, streaming is working properly (as stated above).
Thanks,
Edson Richter
>
> Thanks,
>
> Edson
>
> >
> >
> > Regards,
> >
> > Edson
> >
> > >
> > >
> > > Edson
> > >
> >
> > --
> > Adrian Klaver
> > adrian.klaver@aklaver.com
> >
>
>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com
>
--
Adrian Klaver
adrian.klaver@aklaver.com
From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Saturday, 22 February 2020 20:34
To: Edson Richter <edsonrichter@hotmail.com>; pgsql-general <pgsql-general@postgresql.org>
Subject: Re: Replication: slave server has 3x size of production server?

On 2/22/20 2:51 PM, Edson Richter wrote:
>
> Yes, it is working. Last X'log file is present on all thee servers.
> Also, comparting last transaction number on master and slave shows that
> all are in sync.
> Last, but not least, select max(id) from a busy table shows same id
> (when queried almost simultaneously using a simple test routine).
Well, something is keeping those WAL files around. You probably should
analyze your complete setup to see what else is touching those servers.
Is it safe to add "--remove-source-files" to my archive_command on my slave server, as follows?

archive_command = 'rsync --remove-source-files -e "ssh -2 -C -p 2022" -az %p barman@192.168.0.2:/dados/barman/dbcluster/incoming/%f'

and remove the xlog files after they are copied to barman?
I mean, when the archive command starts, the WAL has already been processed by the slave server, so we don't need it after copying to the backup server, right?
Regards,
Edson
>
> >
> >
> > The files are about 7 months old.
>
> Are there newer files that would indicate that the streaming is working?
>
>
> Yes, streaming is working properly (as stated above).
>
> Thanks,
>
>
> Edson Richter
>
>
>>
--
Adrian Klaver
adrian.klaver@aklaver.com
From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Sunday, 23 February 2020 15:42
To: Edson Richter <edsonrichter@hotmail.com>; pgsql-general <pgsql-general@postgresql.org>
Subject: Re: Replication: slave server has 3x size of production server?

On 2/23/20 8:04 AM, Edson Richter wrote:
> ------------------------------------------------------------------------
>
> *De:* Adrian Klaver <adrian.klaver@aklaver.com>
> *Enviado:* sábado, 22 de fevereiro de 2020 20:34
> *Para:* Edson Richter <edsonrichter@hotmail.com>; pgsql-general
> <pgsql-general@postgresql.org>
> *Assunto:* Re: Replication: slave server has 3x size of production
> server?
> On 2/22/20 2:51 PM, Edson Richter wrote:
>
> >
> > Yes, it is working. Last X'log file is present on all thee servers.
> > Also, comparting last transaction number on master and slave shows that
> > all are in sync.
> > Last, but not least, select max(id) from a busy table shows same id
> > (when queried almost simultaneously using a simple test routine).
>
> Well something is keeping those WAL file around. You probably should
> analyze your complete setup to see what else is touching those servers.
>
>
> It is safe to add a "--remove-source-files" into my archive_command as
> folows into my slave server?
I would say not. See:
https://www.postgresql.org/docs/12/wal-configuration.html
"Checkpoints are points in the sequence of transactions at which it is
guaranteed that the heap and index data files have been updated with all
information written before that checkpoint. At checkpoint time, all
dirty data pages are flushed to disk and a special checkpoint record is
written to the log file. (The change records were previously flushed to
the WAL files.) In the event of a crash, the crash recovery procedure
looks at the latest checkpoint record to determine the point in the log
(known as the redo record) from which it should start the REDO
operation. Any changes made to data files before that point are
guaranteed to be already on disk. Hence, after a checkpoint, log
segments preceding the one containing the redo record are no longer
needed and can be recycled or removed. (When WAL archiving is being
done, the log segments must be archived before being recycled or removed.)"
So there is a window where a WAL file has been written but the data it
represents has not yet been checkpointed, so the file is still needed.
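
As an aside, the standby's control file shows which segment its latest restartpoint still depends on; anything older than that, once archived, is what the server itself would normally recycle. A sketch, assuming pg_controldata is on the PATH and the data directory is pgDbCluster:

pg_controldata pgDbCluster | grep 'REDO WAL file'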
I see. Makes sense.
I suppose those long-lived xlog files are of no use then... I would expect PostgreSQL to delete them automatically.
Perhaps, since I have full backups happening on odd days, I can create a "post backup command" in the barman script so it will delete files older than 1 week from the server it is backing up from...
I understand there is no guarantee that these files have already been processed... but if they are needed, they can be recovered from the barman server...
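
If you do go that way, pg_archivecleanup (shipped with PostgreSQL) is the stock tool for pruning segments older than a given one, and combined with the control file it can be dry-run first. A heavily hedged sketch only — verify against the barman catalog before removing anything; the -n flag only prints what would be removed:

pg_archivecleanup -n pgDbCluster/pg_xlog $(pg_controldata pgDbCluster | awk '/REDO WAL file/ {print $NF}')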
Thanks,
Edson
>
>
> archive_command = 'rsync --remove-source-files -e "ssh -2 -C -p 2022"
> -az %p barman@192.168.0.2:/dados/barman/dbcluster/incoming/%f'
>
>
> and remove the xlog file after copy to barman?
> I mean, whem the archive command starts, the wal has been already
> processed by the slave server, so we don't need them after copying to
> backup server, right?
>
>
> Regards,
>
> Edson
>
> >
> > >
> > >
> > > The files are about 7 months old.
> >
> > Are there newer files that would indicate that the streaming is working?
> >
> >
> > Yes, streaming is working properly (as stated above).
> >
> > Thanks,
> >
> >
> > Edson Richter
> >
> >
> >>
>
>
>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com
>
--
Adrian Klaver
adrian.klaver@aklaver.com
Subject: Re: Replication: slave server has 3x size of production server?
From: Jehan-Guillaume de Rorthais
Date:

On Sat, 22 Feb 2020 19:23:05 +0000
Edson Richter <edsonrichter@hotmail.com> wrote:

[...]
> Actually, standby server is sending wals to a backup (barman) server:
>
> archive_mode = always # enables archiving; off, on, or always
> (change requires restart)
> archive_command = 'rsync -e "ssh -2 -C -p 2022" -az %p
> barman@192.168.0.2:/dados/barman/dbcluster/incoming/%f'
>
> The files are about 7 months old.

Did you check the return code of your archive_command?
Did you check the log produced by your archive_command and postmaster?
How many files with ".ready" extension in "$PGDATA/pg_xlog/archive_status/"?
Can you confirm there's no missing WAL between the older one and the newer one in "$PGDATA/pg_xlog" in alphanum order?
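
A rough sketch of those checks, with the pgDbCluster path from the earlier du output standing in for $PGDATA (log file names under pg_log may differ on your install):

ls pgDbCluster/pg_xlog/archive_status/ | grep -c '\.ready$'     # segments still waiting to be archived
ls pgDbCluster/pg_xlog/archive_status/ | grep -c '\.done$'      # segments already archived
ls pgDbCluster/pg_xlog | grep -E '^[0-9A-F]{24}$' | sort | sed -n '1p;$p'   # oldest and newest segment names
grep -i 'archive command failed' pgDbCluster/pg_log/*

Thousands of .ready files, or "archive command failed" entries in the postmaster log, would explain WAL piling up on the standby.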