Thread: Replication: slave server has 3x size of production server?
Hi!
I have a database cluster that was created on 9.6.10, on a Linux x64 RHEL server. I made progressive upgrades, first upgrading the slave and then the master.
Currently both are running 9.6.17.
The current production server is 196 GB in size.
Nevertheless, the replicated (slave) server is 598 GB in size.
The replica is 3x the size of the production server; is that normal?
Should I drop the slave server and re-create it? How can I avoid this situation in the future?
Thanks,
Edson
From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Saturday, 22 February 2020 14:33
To: Edson Richter <edsonrichter@hotmail.com>; pgsql-general <pgsql-general@postgresql.org>
Subject: Re: Replication: slave server has 3x size of production server?

On 2/22/20 9:25 AM, Edson Richter wrote:
> Hi!
>
> I've a database cluster created at 9.6.10 linux x64 server rhel. I made
> progressive upgrades, first upgrading slave and then upgrading master.
> Actually both are running 9.6.17.
> Current production server has 196Gb in size.
> Nevertheless, the replicated (slave) server has 598 Gb in size.
> Replication server has 3x size of production server, is that normal?
How are you measuring the sizes?
This is the command:
du --max-depth 1 -h pgDbCluster
Production:
du --max-depth 1 -h pgDbCluster
56M pgDbCluster/pg_log
444K pgDbCluster/global
4,0K pgDbCluster/pg_stat
4,0K pgDbCluster/pg_snapshots
16K pgDbCluster/pg_logical
20K pgDbCluster/pg_replslot
61M pgDbCluster/pg_subtrans
4,0K pgDbCluster/pg_commit_ts
465M pgDbCluster/pg_xlog
4,0K pgDbCluster/pg_twophase
12M pgDbCluster/pg_multixact
4,0K pgDbCluster/pg_serial
195G pgDbCluster/base
284K pgDbCluster/pg_stat_tmp
12M pgDbCluster/pg_clog
4,0K pgDbCluster/pg_dynshmem
12K pgDbCluster/pg_notify
4,0K pgDbCluster/pg_tblspc
196G pgDbCluster
Slave:
du -h --max-depth 1 pgDbCluster
403G pgDbCluster/pg_xlog
120K pgDbCluster/pg_log
424K pgDbCluster/global
0 pgDbCluster/pg_stat
0 pgDbCluster/pg_snapshots
4,0K pgDbCluster/pg_logical
8,0K pgDbCluster/pg_replslot
60M pgDbCluster/pg_subtrans
0 pgDbCluster/pg_commit_ts
0 pgDbCluster/pg_twophase
11M pgDbCluster/pg_multixact
0 pgDbCluster/pg_serial
195G pgDbCluster/base
12M pgDbCluster/pg_clog
0 pgDbCluster/pg_dynshmem
8,0K pgDbCluster/pg_notify
12K pgDbCluster/pg_stat_tmp
0 pgDbCluster/pg_tblspc
598G pgDbCluster
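
For reference, a quick way to look inside pg_xlog and see how many segments are there and how old they are (a sketch only; the pgDbCluster path is the one from the du output above, adjust if yours differs):

ls pgDbCluster/pg_xlog | wc -l              # number of entries (each WAL segment is 16MB)
ls -lt pgDbCluster/pg_xlog | tail -n 5      # oldest files
ls -lt pgDbCluster/pg_xlog | head -n 5      # newest files

403G at 16MB per segment is roughly 25,000 segments, so the age of the oldest ones is the interesting part.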
Edson
Where is the space being taken up on disk?
>
> Shall I drop the slave server and re-create it? How to avoid this
> situation in future?
>
> Thanks,
>
> Edson
>
>
--
Adrian Klaver
adrian.klaver@aklaver.com
From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Saturday, 22 February 2020 14:33
To: Edson Richter <edsonrichter@hotmail.com>; pgsql-general <pgsql-general@postgresql.org>
Subject: Re: Replication: slave server has 3x size of production server?

On 2/22/20 10:05 AM, Edson Richter wrote:
> ------------------------------------------------------------------------
>
> *De:* Adrian Klaver <adrian.klaver@aklaver.com>
> *Enviado:* sábado, 22 de fevereiro de 2020 14:33
> *Para:* Edson Richter <edsonrichter@hotmail.com>; pgsql-general
> <pgsql-general@postgresql.org>
> *Assunto:* Re: Replication: slave server has 3x size of production
> server?
> On 2/22/20 9:25 AM, Edson Richter wrote:
> > Hi!
> >
> > I've a database cluster created at 9.6.10 linux x64 server rhel. I made
> > progressive upgrades, first upgrading slave and then upgrading master.
> > Actually both are running 9.6.17.
> > Current production server has 196Gb in size.
> > Nevertheless, the replicated (slave) server has 598 Gb in size.
> > Replication server has 3x size of production server, is that normal?
>
> How are you measuring the sizes?
>
>
> This is the command:
>
> du --max-depth 1 -h pgDbCluster
>
>
> Production:
>
> du --max-depth 1 -h pgDbCluster
>
> 56M pgDbCluster/pg_log
> 444K pgDbCluster/global
> 4,0K pgDbCluster/pg_stat
> 4,0K pgDbCluster/pg_snapshots
> 16K pgDbCluster/pg_logical
> 20K pgDbCluster/pg_replslot
> 61M pgDbCluster/pg_subtrans
> 4,0K pgDbCluster/pg_commit_ts
> 465M pgDbCluster/pg_xlog
> 4,0K pgDbCluster/pg_twophase
> 12M pgDbCluster/pg_multixact
> 4,0K pgDbCluster/pg_serial
> 195G pgDbCluster/base
> 284K pgDbCluster/pg_stat_tmp
> 12M pgDbCluster/pg_clog
> 4,0K pgDbCluster/pg_dynshmem
> 12K pgDbCluster/pg_notify
> 4,0K pgDbCluster/pg_tblspc
> 196G pgDbCluster
>
>
> Slave:
>
> du -h --max-depth 1 pgDbCluster
>
> 403G pgDbCluster/pg_xlog
> 120K pgDbCluster/pg_log
> 424K pgDbCluster/global
> 0 pgDbCluster/pg_stat
> 0 pgDbCluster/pg_snapshots
> 4,0K pgDbCluster/pg_logical
> 8,0K pgDbCluster/pg_replslot
> 60M pgDbCluster/pg_subtrans
> 0 pgDbCluster/pg_commit_ts
> 0 pgDbCluster/pg_twophase
> 11M pgDbCluster/pg_multixact
> 0 pgDbCluster/pg_serial
> 195G pgDbCluster/base
> 12M pgDbCluster/pg_clog
> 0 pgDbCluster/pg_dynshmem
> 8,0K pgDbCluster/pg_notify
> 12K pgDbCluster/pg_stat_tmp
> 0 pgDbCluster/pg_tblspc
> 598G pgDbCluster
So the WAL logs are not being cleared.
What replication method is being used?
What are the settings for the replication?
Streaming replication. Initiated via pg_basebackup.
Settings on master server:
# - Sending Server(s) -
# Set these on the master and on any standby that will send replication data.
max_wal_senders = 2 # max number of walsender processes (change requires restart)
wal_keep_segments = 25 # in logfile segments, 16MB each; 0 disables
#wal_sender_timeout = 60s # in milliseconds; 0 disables
max_replication_slots = 2 # max number of replication slots (change requires restart)
#track_commit_timestamp = off # collect timestamp of transaction commit (change requires restart)
# - Master Server -
# These settings are ignored on a standby server.
#synchronous_standby_names = '' # standby servers that provide sync rep number of sync standbys and comma-separated list of application_name from standby(s); '*' = all
#vacuum_defer_cleanup_age = 0 # number of xacts by which cleanup is delayed
Settings on slave server:
# - Standby Servers -
# These settings are ignored on a master server.
hot_standby = on # "on" allows queries during recovery (change requires restart)
max_standby_archive_delay = -1 # max delay before canceling queries when reading WAL from archive; -1 allows indefinite delay
max_standby_streaming_delay = -1 # max delay before canceling queries when reading streaming WAL; -1 allows indefinite delay
wal_receiver_status_interval = 10s # send replies at least this often 0 disables
hot_standby_feedback = on # send info from standby to prevent query conflicts
wal_receiver_timeout = 0 # time that receiver waits for communication from master in milliseconds; 0 disables
wal_retrieve_retry_interval = 5s # time to wait before retrying to retrieve WAL after a failed attempt
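
For what it's worth, two quick checks on the standby can show whether something is pinning WAL there: an abandoned replication slot, or archiving that keeps failing. A minimal sketch (connection options omitted, assuming a local psql):

psql -c "SELECT slot_name, active, restart_lsn FROM pg_replication_slots;"
psql -c "SELECT archived_count, failed_count, last_archived_wal, last_failed_wal, last_failed_time FROM pg_stat_archiver;"

A slot with active = f but a non-null restart_lsn, or a growing failed_count, would both keep segments in pg_xlog.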
Regards,
Edson
>
>
> Edson
>
--
Adrian Klaver
adrian.klaver@aklaver.com
From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Saturday, 22 February 2020 16:16
To: Edson Richter <edsonrichter@hotmail.com>; pgsql-general <pgsql-general@postgresql.org>
Subject: Re: Replication: slave server has 3x size of production server?

On 2/22/20 11:03 AM, Edson Richter wrote:
> ------------------------------------------------------------------------
>
>
>
> Streaming replication. Initiated via pg_basebackup.
>
> Settings on master server:
>
> # - Sending Server(s) -
> # Set these on the master and on any standby that will send replication
> data.
> max_wal_senders = 2 # max number of walsender processes
> (change requires restart)
> wal_keep_segments = 25 # in logfile segments, 16MB each; 0 disables
> #wal_sender_timeout = 60s # in milliseconds; 0 disables
> max_replication_slots = 2 # max number of replication
> slots (change requires restart)
> #track_commit_timestamp = off # collect timestamp of transaction
> commit (change requires restart)
> # - Master Server -
> # These settings are ignored on a standby server.
> #synchronous_standby_names = '' # standby servers that provide sync
> rep number of sync standbys and comma-separated list of
> application_name from standby(s); '*' = all
> #vacuum_defer_cleanup_age = 0 # number of xacts by which cleanup is
> delayed
>
>
>
> Settings on slave server:
>
> # - Standby Servers -
> # These settings are ignored on a master server.
> hot_standby = on # "on" allows queries during
> recovery (change requires restart)
> max_standby_archive_delay = -1 # max delay before canceling
> queries when reading WAL from archive; -1 allows indefinite delay
> max_standby_streaming_delay = -1 # max delay before canceling
> queries when reading streaming WAL; -1 allows indefinite delay
> wal_receiver_status_interval = 10s # send replies at least this
> often 0 disables
> hot_standby_feedback = on # send info from standby to
> prevent query conflicts
> wal_receiver_timeout = 0 # time that receiver waits for
> communication from master in milliseconds; 0 disables
> wal_retrieve_retry_interval = 5s # time to wait before retrying
> to retrieve WAL after a failed attempt
What are the settings for:
archive_mode
archive_command
on the standby?
Are the files in pg_xlog on the standby mostly from well in the past?
Currently, the standby server is sending WALs to a backup (barman) server:

archive_mode = always # enables archiving; off, on, or always (change requires restart)
archive_command = 'rsync -e "ssh -2 -C -p 2022" -az %p barman@192.168.0.2:/dados/barman/dbcluster/incoming/%f'

The files are about 7 months old.

Thanks,
Edson
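
As a side note, the same transfer can be tested by hand outside the archiver, to confirm the rsync/ssh path and its exit status (a sketch only; SEGMENT is just whichever WAL file happens to be sitting in pg_xlog):

SEGMENT=$(ls pgDbCluster/pg_xlog | grep -E '^[0-9A-F]{24}$' | head -n 1)
rsync -e "ssh -2 -C -p 2022" -az "pgDbCluster/pg_xlog/$SEGMENT" "barman@192.168.0.2:/dados/barman/dbcluster/incoming/$SEGMENT"
echo "rsync exit status: $?"

A non-zero exit status here is exactly what would make the archiver keep segments around.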
>
>
> Regards,
>
> Edson
>
> >
> >
> > Edson
> >
>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com
>
--
Adrian Klaver
adrian.klaver@aklaver.com
From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Saturday, 22 February 2020 18:12
To: Edson Richter <edsonrichter@hotmail.com>; pgsql-general <pgsql-general@postgresql.org>
Subject: Re: Replication: slave server has 3x size of production server?

On 2/22/20 11:23 AM, Edson Richter wrote:
> ------------------------------------------------------------------------
>
> *De:* Adrian Klaver <adrian.klaver@aklaver.com>
> *Enviado:* sábado, 22 de fevereiro de 2020 16:16
> *Para:* Edson Richter <edsonrichter@hotmail.com>; pgsql-general
> <pgsql-general@postgresql.org>
> *Assunto:* Re: Replication: slave server has 3x size of production
> server?
> On 2/22/20 11:03 AM, Edson Richter wrote:
> > ------------------------------------------------------------------------
> >
>
> >
> >
> > Streaming replication. Initiated via pg_basebackup.
> >
> > Settings on master server:
> >
> > # - Sending Server(s) -
> > # Set these on the master and on any standby that will send replication
> > data.
> > max_wal_senders = 2 # max number of walsender processes
> > (change requires restart)
> > wal_keep_segments = 25 # in logfile segments, 16MB each; 0 disables
> > #wal_sender_timeout = 60s # in milliseconds; 0 disables
> > max_replication_slots = 2 # max number of replication
> > slots (change requires restart)
> > #track_commit_timestamp = off # collect timestamp of transaction
> > commit (change requires restart)
> > # - Master Server -
> > # These settings are ignored on a standby server.
> > #synchronous_standby_names = '' # standby servers that provide sync
> > rep number of sync standbys and comma-separated list of
> > application_name from standby(s); '*' = all
> > #vacuum_defer_cleanup_age = 0 # number of xacts by which cleanup is
> > delayed
> >
> >
> >
> > Settings on slave server:
> >
> > # - Standby Servers -
> > # These settings are ignored on a master server.
> > hot_standby = on # "on" allows queries during
> > recovery (change requires restart)
> > max_standby_archive_delay = -1 # max delay before canceling
> > queries when reading WAL from archive; -1 allows indefinite delay
> > max_standby_streaming_delay = -1 # max delay before canceling
> > queries when reading streaming WAL; -1 allows indefinite delay
> > wal_receiver_status_interval = 10s # send replies at least this
> > often 0 disables
> > hot_standby_feedback = on # send info from standby to
> > prevent query conflicts
> > wal_receiver_timeout = 0 # time that receiver waits for
> > communication from master in milliseconds; 0 disables
> > wal_retrieve_retry_interval = 5s # time to wait before retrying
> > to retrieve WAL after a failed attempt
>
> What are the settings for:
>
> archive_mode
> archive_command
>
> on the standby?
>
> Are the files in pg_xlog on the standby mostly from well in the past?
>
>
> Actually, standby server is sending wals to a backup (barman) server:
>
> archive_mode = always # enables archiving; off, on, or always
> (change requires restart)
> archive_command = 'rsync -e "ssh -2 -C -p 2022" -az %p
> barman@192.168.0.2:/dados/barman/dbcluster/incoming/%f'
And the above is working, the files are showing up on the barman server?
Yes, it is working. The last xlog file is present on all three servers.
Also, comparing the last transaction number on master and slave shows that they are in sync.
Last, but not least, "select max(id)" from a busy table shows the same id (when queried almost simultaneously using a simple test routine).
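
For completeness, the usual 9.6 way to confirm the standby is current (a sketch; the first query runs on the master, the second on the standby):

psql -c "SELECT application_name, state, sent_location, replay_location FROM pg_stat_replication;"   # on the master
psql -c "SELECT pg_last_xlog_receive_location(), pg_last_xlog_replay_location();"                    # on the standby

If sent_location and replay_location match, or trail by very little, streaming is keeping up.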
>
>
> The files are about 7 months old.
Are there newer files that would indicate that the streaming is working?
Yes, streaming is working properly (as stated above).
Thanks,
Edson Richter
>
> Thanks,
>
> Edson
>
> >
> >
> > Regards,
> >
> > Edson
> >
> > >
> > >
> > > Edson
> > >
> >
> > --
> > Adrian Klaver
> > adrian.klaver@aklaver.com
> >
>
>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com
>
--
Adrian Klaver
adrian.klaver@aklaver.com
From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Saturday, 22 February 2020 20:34
To: Edson Richter <edsonrichter@hotmail.com>; pgsql-general <pgsql-general@postgresql.org>
Subject: Re: Replication: slave server has 3x size of production server?

On 2/22/20 2:51 PM, Edson Richter wrote:
>
> Yes, it is working. Last X'log file is present on all thee servers.
> Also, comparting last transaction number on master and slave shows that
> all are in sync.
> Last, but not least, select max(id) from a busy table shows same id
> (when queried almost simultaneously using a simple test routine).
Well, something is keeping those WAL files around. You probably should
analyze your complete setup to see what else is touching those servers.
Is it safe to add "--remove-source-files" to my archive_command on my slave server, as follows?

archive_command = 'rsync --remove-source-files -e "ssh -2 -C -p 2022" -az %p barman@192.168.0.2:/dados/barman/dbcluster/incoming/%f'

and remove the xlog files after they are copied to barman?
I mean, when the archive command starts, the WAL has already been processed by the slave server, so we don't need it after copying to the backup server, right?
Regards,
Edson
>
> >
> >
> > The files are about 7 months old.
>
> Are there newer files that would indicate that the streaming is working?
>
>
> Yes, streaming is working properly (as stated above).
>
> Thanks,
>
>
> Edson Richter
>
>
>>
--
Adrian Klaver
adrian.klaver@aklaver.com
From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Sunday, 23 February 2020 15:42
To: Edson Richter <edsonrichter@hotmail.com>; pgsql-general <pgsql-general@postgresql.org>
Subject: Re: Replication: slave server has 3x size of production server?

On 2/23/20 8:04 AM, Edson Richter wrote:
> ------------------------------------------------------------------------
>
> *De:* Adrian Klaver <adrian.klaver@aklaver.com>
> *Enviado:* sábado, 22 de fevereiro de 2020 20:34
> *Para:* Edson Richter <edsonrichter@hotmail.com>; pgsql-general
> <pgsql-general@postgresql.org>
> *Assunto:* Re: Replication: slave server has 3x size of production
> server?
> On 2/22/20 2:51 PM, Edson Richter wrote:
>
> >
> > Yes, it is working. Last X'log file is present on all thee servers.
> > Also, comparting last transaction number on master and slave shows that
> > all are in sync.
> > Last, but not least, select max(id) from a busy table shows same id
> > (when queried almost simultaneously using a simple test routine).
>
> Well something is keeping those WAL file around. You probably should
> analyze your complete setup to see what else is touching those servers.
>
>
> It is safe to add a "--remove-source-files" into my archive_command as
> folows into my slave server?
I would say not. See:
https://www.postgresql.org/docs/12/wal-configuration.html
"Checkpoints are points in the sequence of transactions at which it is
guaranteed that the heap and index data files have been updated with all
information written before that checkpoint. At checkpoint time, all
dirty data pages are flushed to disk and a special checkpoint record is
written to the log file. (The change records were previously flushed to
the WAL files.) In the event of a crash, the crash recovery procedure
looks at the latest checkpoint record to determine the point in the log
(known as the redo record) from which it should start the REDO
operation. Any changes made to data files before that point are
guaranteed to be already on disk. Hence, after a checkpoint, log
segments preceding the one containing the redo record are no longer
needed and can be recycled or removed. (When WAL archiving is being
done, the log segments must be archived before being recycled or removed.)"
So there is a window where a WAL file has been written but the data it
represents has not yet been checkpointed, so the file is still needed.
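
As an aside, the standby's control file shows which segment its latest restartpoint still depends on; anything older than that, once archived, is what the server itself would normally recycle. A sketch, assuming pg_controldata is on the PATH and the data directory is pgDbCluster:

pg_controldata pgDbCluster | grep 'REDO WAL file'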
I see. Makes sense.
I suppose those long-lived xlog files are of no use then... I would expect PostgreSQL to delete them automatically.
Perhaps, since I have full backups happening on odd days, I can create a "post backup command" in the barman script so it will delete files older than 1 week from the server it is backing up from...
I understand there is no guarantee that these files have already been processed... but if they are needed, they can be recovered from the barman server...
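
If you do go that way, pg_archivecleanup (shipped with PostgreSQL) is the stock tool for pruning segments older than a given one, and combined with the control file it can be dry-run first. A heavily hedged sketch only — verify against the barman catalog before removing anything; the -n flag only prints what would be removed:

pg_archivecleanup -n pgDbCluster/pg_xlog $(pg_controldata pgDbCluster | awk '/REDO WAL file/ {print $NF}')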
Thanks,
Edson
>
>
> archive_command = 'rsync --remove-source-files -e "ssh -2 -C -p 2022"
> -az %p barman@192.168.0.2:/dados/barman/dbcluster/incoming/%f'
>
>
> and remove the xlog file after copy to barman?
> I mean, whem the archive command starts, the wal has been already
> processed by the slave server, so we don't need them after copying to
> backup server, right?
>
>
> Regards,
>
> Edson
>
> >
> > >
> > >
> > > The files are about 7 months old.
> >
> > Are there newer files that would indicate that the streaming is working?
> >
> >
> > Yes, streaming is working properly (as stated above).
> >
> > Thanks,
> >
> >
> > Edson Richter
> >
> >
> >>
>
>
>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com
>
--
Adrian Klaver
adrian.klaver@aklaver.com
Subject: Re: Replication: slave server has 3x size of production server?
From: Jehan-Guillaume de Rorthais
Date:

On Sat, 22 Feb 2020 19:23:05 +0000
Edson Richter <edsonrichter@hotmail.com> wrote:

[...]
> Actually, standby server is sending wals to a backup (barman) server:
>
> archive_mode = always # enables archiving; off, on, or always
> (change requires restart)
> archive_command = 'rsync -e "ssh -2 -C -p 2022" -az %p
> barman@192.168.0.2:/dados/barman/dbcluster/incoming/%f'
>
> The files are about 7 months old.

Did you check the return code of your archive_command?
Did you check the log produced by your archive_command and postmaster?
How many files with ".ready" extension in "$PGDATA/pg_xlog/archive_status/"?
Can you confirm there's no missing WAL between the older one and the newer one in "$PGDATA/pg_xlog" in alphanum order?
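
A rough sketch of those checks, with the pgDbCluster path from the earlier du output standing in for $PGDATA (log file names under pg_log may differ on your install):

ls pgDbCluster/pg_xlog/archive_status/ | grep -c '\.ready$'     # segments still waiting to be archived
ls pgDbCluster/pg_xlog/archive_status/ | grep -c '\.done$'      # segments already archived
ls pgDbCluster/pg_xlog | grep -E '^[0-9A-F]{24}$' | sort | sed -n '1p;$p'   # oldest and newest segment names
grep -i 'archive command failed' pgDbCluster/pg_log/*

Thousands of .ready files, or "archive command failed" entries in the postmaster log, would explain WAL piling up on the standby.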