Difference in the tablespace folders on primary and secondary nodes - Mailing list pgsql-general
| From | Abhishek Bhola |
|---|---|
| Subject | Difference in the tablespace folders on primary and secondary nodes |
| Date | |
| Msg-id | CAEDsCzhpM3PPc03HtQ_-aqvQ=gfC06qsS-LhAr+b+nNQX1OAwQ@mail.gmail.com |
| Responses | Re: Difference in the tablespace folders on primary and secondary nodes |
| List | pgsql-general |
I recently set up a Postgres 15 master-slave (primary-secondary) cluster on 2 nodes. At the time of setup, I ensured that the data files were exactly the same on both nodes. The size of the DB is 1.5 TB.
The directory structure on both nodes looks as follows:
```
- /PROD/datadg/
|
|-> /PROD/datadg/tablespace
| |-> /PROD/datadg/tablespace/tablespace1
| |-> /PROD/datadg/tablespace/tablespace2
|
|-> /PROD/datadg/data
| |-> /PROD/datadg/data/pg_tblspc
| | |-> /PROD/datadg/data/pg_tblspc/16432 -> /PROD/datadg/tablespace/tablespace1
| | |-> /PROD/datadg/data/pg_tblspc/16433 -> /PROD/datadg/tablespace/tablespace2
```
Almost a week later, I see a difference of almost 2 GB in the size of the tablespace folders on the two nodes, as well as a difference in the number of files.
`autovacuum` is enabled on both nodes and there aren't any `idle in transaction` queries on the slave node. There hasn't been any disruption in streaming replication either: I did not get any error like `WAL segment already removed` on the slave node, `pg_stat_replication` on the master node doesn't show anything out of the ordinary, and the `sent_lsn`, `write_lsn` and `flush_lsn` are updated regularly. I also can't see a difference in row counts for most tables, though I haven't verified all of them.
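For reference, these are the sort of replication checks I mean (just a rough sketch; connection details are placeholders):
```
# On the master: replication progress as reported by the walsender
psql -X -c "SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn
            FROM pg_stat_replication;"

# On the slave: how far WAL has been received and replayed, and whether replay is paused
psql -X -c "SELECT pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn(),
            pg_is_wal_replay_paused();"
```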
So my **first question** is:
* Why is there a difference in the files in the tablespace folders? I can understand a difference in modification timestamps, but some files are simply missing on the slave node (a rough way to diff the file lists is sketched below).
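For example, something along these lines can be run from the slave node to compare the file lists (just a sketch; `master_node` stands in for the primary's hostname):
```
# List relation files under the tablespace directory on each node and diff the lists
ssh master_node "cd /PROD/datadg/tablespace && find . -type f | sort" > /tmp/files_master.txt
(cd /PROD/datadg/tablespace && find . -type f | sort) > /tmp/files_slave.txt
diff /tmp/files_master.txt /tmp/files_slave.txt
```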
Now, if I were to run `vacuumdb` on the master node, there is a chance that the slave node will break with an error like this:
```
PANIC,XX000,"WAL contains references to invalid pages",,,,,"WAL redo at 875E/21A70BD0 for
Heap2/VISIBLE: cutoff xid 60350476 flags 0x01; blkref #0: rel 16405/16419/533716933, fork 2, blk 26; blkref #1: rel 16405/16419/533716933, blk 853758",,,,"","startup
```
When the slave node breaks, these are the steps I usually follow to bring it back:
1. Call `pg_backup_start('backup')` on the master node
2. `rsync` the files from master to slave by running the following on the slave node:
```
rsync -av --delete master_node:/PROD/datadg/data/ /PROD/datadg/data --exclude 'pg_log' --exclude 'pg_replslot'
```
3. Call `pg_backup_stop()` on the master node
4. Start the slave node again; it usually works, even though the tablespace files might still not be identical (a consolidated sketch of these steps follows below).
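Put together, a minimal sketch of that procedure might look like this. It assumes the slave's PostgreSQL service has already been stopped, and it keeps `pg_backup_start()`/`pg_backup_stop()` in one session (which PostgreSQL 15 requires) by running the `rsync` through a `\!` shell escape inside a single psql connection to the master; hostnames and paths are the placeholders from above:
```
# Run on the slave node, with the standby's PostgreSQL already stopped.
# The psql session connects to the master so pg_backup_start()/pg_backup_stop()
# share one backend; \! executes the rsync locally on the slave in between.
psql -h master_node -d postgres -X <<'SQL'
SELECT pg_backup_start('resync-standby');
\! rsync -av --delete master_node:/PROD/datadg/data/ /PROD/datadg/data --exclude 'pg_log' --exclude 'pg_replslot'
SELECT * FROM pg_backup_stop();
SQL
```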
**Second question**:
* What is the best way to bring the slave node back? Is the `rsync` between tablespaces required? And if yes, what is the best method for very large databases, maybe as big as 30 TB or more? I don't want to `rsync` every file just because its timestamp is different. So is a command like the following safe to run, or should an option like `--checksum` be used instead? (A dry-run comparison is sketched after the command.)
```
rsync -av --delete master_node:/PROD/datadg/tablespace/ /PROD/datadg/tablespace --size-only
```
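One way to gauge the difference before transferring anything is to compare dry runs under both options (a sketch; same paths as above):
```
# Dry runs only (-n): show what each option would transfer, without copying anything
rsync -avn --delete --size-only master_node:/PROD/datadg/tablespace/ /PROD/datadg/tablespace
rsync -avn --delete --checksum  master_node:/PROD/datadg/tablespace/ /PROD/datadg/tablespace
```
Note that `--checksum` reads every file in full on both sides, so on a 30 TB tablespace it trades network traffic for a lot of disk I/O, whereas `--size-only` skips any file whose size matches even if its contents differ.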
**Third question:**
* Is it advised to run `vacuumdb` before or after bringing the slave node back again?