pg_rewind copies - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject pg_rewind copies
Date
Msg-id f67feb24-5833-88cb-1020-19a4a2b83ac7@iki.fi
Responses Re: pg_rewind copies  (Cary Huang <cary.huang@highgo.ca>)
List pgsql-hackers
If a file is modified and becomes larger in the source system while 
pg_rewind is running, pg_rewind can leave behind a partial copy of the file.
That's by design, and it's OK for relation files because they're 
replayed from WAL. But it can cause trouble for configuration files.
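
To make the failure mode concrete, here is a minimal Python sketch of a copy that trusts a previously recorded file size. It is a simulation only, not pg_rewind code: the function names, file names, and settings are all hypothetical.

```python
import os
import tempfile


def copy_with_recorded_size(src_path, dst_path, recorded_size):
    """Copy exactly recorded_size bytes, ignoring anything appended later."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.write(src.read(recorded_size))


def demo_partial_copy():
    """Simulate a config file that grows between size lookup and copy."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "source.auto.conf")  # hypothetical name
        dst = os.path.join(tmp, "target.auto.conf")  # hypothetical name

        with open(src, "w") as f:
            f.write("# hypothetical contents\nwork_mem = '4MB'\n")

        # The copier records the size first ...
        recorded_size = os.path.getsize(src)

        # ... then the file is rewritten and becomes larger.
        with open(src, "w") as f:
            f.write("# hypothetical contents\n"
                    "synchronous_standby_names = 'node_b'\n"
                    "work_mem = '4MB'\n")

        copy_with_recorded_size(src, dst, recorded_size)

        with open(src, "rb") as f:
            source_bytes = f.read()
        with open(dst, "rb") as f:
            copied_bytes = f.read()
        return recorded_size, source_bytes, copied_bytes
```

The copy ends up as a prefix of the new contents, cut off in the middle of a setting, which is the kind of file a configuration parser rejects.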

I ran into this while playing with pg_auto_failover. After failover, 
pg_auto_failover would often launch pg_rewind, and run ALTER SYSTEM on 
the primary while pg_rewind was running. The resulting rewound system 
would fail to start up:

Nov 13 09:24:42 pg-node-a pg_autoctl[2217]: 09:24:42 2220 ERROR 2020-11-13 09:24:32.547 GMT [2246] LOG:  syntax error in file "/data/pgdata/postgresql.auto.conf" line 4, near token "'"
Nov 13 09:24:42 pg-node-a pg_autoctl[2217]: 09:24:42 2220 ERROR 2020-11-13 09:24:32.547 GMT [2246] FATAL:  configuration file "postgresql.auto.conf" contains errors

Attached is a patch to mitigate that. It changes pg_rewind so that when 
it copies a whole file, it ignores the original file size. It's not a 
complete cure: it still believes the original size for files larger than 
1 MB. That limit was just expedient given the way the chunking logic in 
libpq_source.c works, but should be enough for configuration files.
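
The rule described above can be sketched roughly as follows. This is an illustration in Python, not the patch itself (which is C and works through the chunking logic in libpq_source.c). The constant and function names are made up, and capping the read for small files at the 1 MB limit is an assumption of this sketch.

```python
import os
import tempfile

# 1 MB limit mentioned in the message; the constant name is hypothetical.
WHOLE_FILE_LIMIT = 1024 * 1024


def copy_whole_file(src_path, dst_path, recorded_size):
    """Copy a whole file, trusting recorded_size only above the limit.

    Returns the number of bytes copied.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        if recorded_size > WHOLE_FILE_LIMIT:
            # Large file: still believe the size recorded earlier.
            data = src.read(recorded_size)
        else:
            # Small file: ignore the recorded size and take whatever is
            # there now, up to the limit (assumption of this sketch).
            data = src.read(WHOLE_FILE_LIMIT)
        dst.write(data)
        return len(data)


def demo_copy(initial, appended):
    """Record the size, let the file grow, then copy it."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "source_file")  # hypothetical name
        dst = os.path.join(tmp, "target_file")  # hypothetical name
        with open(src, "wb") as f:
            f.write(initial)
        recorded_size = os.path.getsize(src)
        with open(src, "ab") as f:
            f.write(appended)
        copied = copy_whole_file(src, dst, recorded_size)
        return recorded_size, copied
```

With a small file, `demo_copy(b"a = 1\n", b"b = 2\n")` copies all 12 bytes even though only 6 were recorded. With a file just over the limit, the appended bytes are still dropped, which is the "not a complete cure" part.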

There's another race condition that this doesn't try to fix: If a file 
is modified while it's being copied, you can have a torn file with one 
half of the file from the old version, and one half from the new. That's 
a much narrower window, though, and pg_basebackup has the same problem.
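
For illustration, the torn-file race can be reproduced with a reader that copies a file in two steps while a writer overwrites it in between. This is a hypothetical Python simulation, not code from pg_rewind or pg_basebackup.

```python
import os
import tempfile


def torn_copy(old, new):
    """Read a file in two halves, with an in-place rewrite in between.

    old and new must have the same length. Returns the bytes the reader saw.
    """
    assert len(old) == len(new)
    half = len(old) // 2
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "some_file")  # hypothetical name
        with open(path, "wb") as f:
            f.write(old)

        # Unbuffered, so the second read really goes back to the file.
        with open(path, "rb", buffering=0) as reader:
            first = reader.read(half)

            # Concurrent modification while the copy is in progress.
            with open(path, "r+b") as writer:
                writer.write(new)

            second = reader.read()
        return first + second
```

Calling `torn_copy(b"AAAAAAAA", b"BBBBBBBB")` gives a result whose first half comes from the old version and whose second half comes from the new one.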

- Heikki

