Thread: pg_rewind: Skip log directory for file type check like pg_wal

pg_rewind: Skip log directory for file type check like pg_wal

From
Soumyadeep Chakraborty
Date:
Hello hackers,

I think we should extend the "log" directory the same courtesy as was
done for pg_wal (pg_xlog) in 0e42397f42b.

Today, even if BOTH source and target servers have symlinked "log"
directories, pg_rewind fails with:

file "log" is of different type in source and target.

Attached is a repro patch using the 004_pg_xlog_symlink.pl test to
demonstrate the failure.
Running make check PROVE_TESTS='t/004_pg_xlog_symlink.pl'
in src/bin/pg_rewind should suffice after applying.

This is because when we use the libpq query to fetch the filemap from
the source server, we consider the log directory as a directory, even if
it is a symlink. This is because pg_stat_file() is used in that query in
libpq_traverse_files() and pg_stat_file() returns isdir=t for symlinks
to directories.

This shortcoming is somewhat called out:

 * XXX: There is no backend function to get a symbolic link's target in
 * general, so if the admin has put any custom symbolic links in the data
 * directory, they won't be copied correctly.

We could fix the query and/or pg_stat_file(). However, we would also
like to support deployments where only one of the primaries and/or
standbys have the symlink. That is not hard to conceive, given primaries
and standbys can have drastically disparate log volume and/or log
collection requirements.

Attached is a patch that treats "log" like we treat "pg_wal".

Regards,
Soumyadeep (VMware)

Attachment

Re: pg_rewind: Skip log directory for file type check like pg_wal

From
Alexander Kukushkin
Date:
Hello Soumyadeep,

The problem indeed exists, but IMO the "log" directory case must be handled differently:
1. We don't need or I would even say we don't want to sync log files from the new primary, because it destroys the actual logs, which could be very important to figure out what has happened with the old primary
2. Unlike "pg_wal", the "log" directory is not necessarily located inside PGDATA. The actual value is configured using "log_directory" GUC, which just happened to be "log" by default. And in fact actual values on source and target could be different.

Regards,
--
Alexander Kukushkin

Re: pg_rewind: Skip log directory for file type check like pg_wal

From
Soumyadeep Chakraborty
Date:
On Mon, Mar 6, 2023 at 12:28 AM Alexander Kukushkin <cyberdemn@gmail.com> wrote:
>
> Hello Soumyadeep,
>
> The problem indeed exists, but IMO the "log" directory case must be handled differently:
> 1. We don't need or I would even say we don't want to sync log files from the new primary, because it destroys the
actuallogs, which could be very important to figure out what has happened with the old primary 

Yes, this can be solved by adding "log" to excludeDirContents. We did
this for GPDB.

> 2. Unlike "pg_wal", the "log" directory is not necessarily located inside PGDATA. The actual value is configured
using"log_directory" GUC, which just happened to be "log" by default. And in fact actual values on source and target
couldbe different. 

I think we only care about files/dirs inside the datadir. Anything
outside is out of scope for
pg_rewind AFAIU. We can only address the common case here. As mentioned in this
comment:

 * XXX: There is no backend function to get a symbolic link's target in
 * general, so if the admin has put any custom symbolic links in the data
 * directory, they won't be copied correctly.

There is not much we can do about custom configurations.

Regards,
Soumyadeep (VMware)



Re: pg_rewind: Skip log directory for file type check like pg_wal

From
Alexander Kukushkin
Date:


On Mon, 6 Mar 2023 at 19:37, Soumyadeep Chakraborty <soumyadeep2007@gmail.com> wrote:

> 2. Unlike "pg_wal", the "log" directory is not necessarily located inside PGDATA. The actual value is configured using "log_directory" GUC, which just happened to be "log" by default. And in fact actual values on source and target could be different.

I think we only care about files/dirs inside the datadir. Anything
outside is out of scope for
pg_rewind AFAIU. We can only address the common case here. As mentioned in this
comment:

 * XXX: There is no backend function to get a symbolic link's target in
 * general, so if the admin has put any custom symbolic links in the data
 * directory, they won't be copied correctly.

That's exactly my point. Users are very creative.
On one node they could set log_directory to for example "pg_log" and on another one "my_log".
And they would be writing logs to $PGDATA/pg_log and $PGDATA/my_log respectively and they are both located inside datadir.

Lets assume that on the source we have "pg_log" and on the target we have "my_log" (they are configured using "log_directory" GUC).
When doing rewind in this case we want neither to remove the content of "my_log" on the target nor to copy content of "pg_log" from the source.
It couldn't be achieved just by introducing a static string "log". The "log_directory" GUC must be examined on both, source and target.

Regards,
--
Alexander Kukushkin

Re: pg_rewind: Skip log directory for file type check like pg_wal

From
Soumyadeep Chakraborty
Date:
On Mon, Mar 6, 2023 at 11:33 PM Alexander Kukushkin <cyberdemn@gmail.com> wrote:
>
>
> Lets assume that on the source we have "pg_log" and on the target we have "my_log" (they are configured using
"log_directory"GUC). 
> When doing rewind in this case we want neither to remove the content of "my_log" on the target nor to copy content of
"pg_log"from the source. 
> It couldn't be achieved just by introducing a static string "log". The "log_directory" GUC must be examined on both,
sourceand target. 

Trouble with doing that is if pg_rewind is run in non-libpq (offline)
mode. Then we would have to parse it out of the conf file(s)?
Is there a standard way of doing that?

Regards,
Soumyadeep (VMware)



Re: pg_rewind: Skip log directory for file type check like pg_wal

From
Alexander Kukushkin
Date:

On Tue, 7 Mar 2023 at 08:52, Soumyadeep Chakraborty <soumyadeep2007@gmail.com> wrote:

> It couldn't be achieved just by introducing a static string "log". The "log_directory" GUC must be examined on both, source and target.

Trouble with doing that is if pg_rewind is run in non-libpq (offline)
mode. Then we would have to parse it out of the conf file(s)?
Is there a standard way of doing that?

pg_rewind is already doing something similar for "restore_command":
/*          
 * Get value of GUC parameter restore_command from the target cluster.
 *  
 * This uses a logic based on "postgres -C" to get the value from the
 * cluster.
 */
static void
getRestoreCommand(const char *argv0)

For the running source cluster one could just use "SHOW log_directory"

Regards,
--
Alexander Kukushkin

Re: pg_rewind: Skip log directory for file type check like pg_wal

From
Daniel Gustafsson
Date:
> On 7 Mar 2023, at 08:33, Alexander Kukushkin <cyberdemn@gmail.com> wrote:

> The "log_directory" GUC must be examined on both, source and target.

Agreed, log_directory must be resolved to the configured values.  Teaching
pg_rewind about those in case they are stored in $PGDATA sounds like a good
idea though.

--
Daniel Gustafsson