Thread: BUG #18971: Server passes an invalid (indirect) path in PGDATA to the external program

The following bug has been logged on the website:

Bug reference:      18971
Logged by:          Dmitry Kovalenko
Email address:      d.kovalenko@postgrespro.ru
PostgreSQL version: 18beta1
Operating system:   Ubuntu 2024.04
Description:

Hello,
PG version is 18beta1 (bbccf7ec)
OS is Ubunty 24.04
---------- SHORT DESCRIPTION
Server passes an invalid (indirect) path in PGDATA to the external program.
Server must provide to the external programs a right, absolute path in
PGDATA.
---------- FULL DESCRIPTION / STEPS
1) Our SPECIAL test starts PG with an indirect path to the database cluster
files:
CWD is:
"/home/dima/MY/Work/CurrentTasks/2025/PBCKP-2289/_Iter003/001--Vanilla-master/Probackup3/work01"
COMMAND is:
pg_ctl start -D

"../../../../../../../../../../../..//home/dima/MY/Work/CurrentTasks/2025/PBCKP-2289/_Iter003/001--Vanilla-master/Probackup3/work01/tests/tmp_dirs/ArchiveTest/test_archive_get_relative_path/restored/data"
-l

"../../../../../../../../../../../..//home/dima/MY/Work/CurrentTasks/2025/PBCKP-2289/_Iter003/001--Vanilla-master/Probackup3/work01/tests/tmp_dirs/ArchiveTest/test_archive_get_relative_path/restored/logs/postgres.log"
postgresql.auto.conf of this cluster has the following lines:
------ [begin of postgresql.auto.conf]
archive_mode =  'on'
archive_command =  'exec

"/home/dima/MY/Work/CurrentTasks/2025/PBCKP-2289/_Iter003/001--Vanilla-master/Probackup3/work01/builddir/src/pg_probackup3"
archive-push -B

/home/dima/MY/Work/CurrentTasks/2025/PBCKP-2289/_Iter003/001--Vanilla-master/Probackup3/work01/tests/tmp_dirs/ArchiveTest/test_archive_get_relative_path/backup
--instance=node --compress-algorithm=zlib --log-level-console=trace
--no-sync --wal-file-path=%p --wal-file-name=%f'
recovery_target_timeline =  'current'
recovery_target_action =  'pause'
restore_command =

'"/home/dima/MY/Work/CurrentTasks/2025/PBCKP-2289/_Iter003/001--Vanilla-master/Probackup3/work01/builddir/src/pg_probackup3"
archive-get -B

"/home/dima/MY/Work/CurrentTasks/2025/PBCKP-2289/_Iter003/001--Vanilla-master/Probackup3/work01/tests/tmp_dirs/ArchiveTest/test_archive_get_relative_path/backup"
--instance "node" --wal-file-path=%p --wal-file-name=%f'
------ [/end of postgresql.auto.conf]
2) pc_ctl starts the postmaster with this PWD:
"/home/dima/MY/Work/CurrentTasks/2025/PBCKP-2289/_Iter003/001--Vanilla-master/Probackup3/work01"
So, the path to the cluster files in "-D" is valid and is:

"/home/dima/MY/Work/CurrentTasks/2025/PBCKP-2289/_Iter003/001--Vanilla-master/Probackup3/work01/tests/tmp_dirs/ArchiveTest/test_archive_get_relative_path/restored/data"
3) During start Postgres runs the external program pg_probackup3 with the
following commandline (I got it with "ps -eo args | grep probackup"):
sh -c --

"/home/dima/MY/Work/CurrentTasks/2025/PBCKP-2289/_Iter003/001--Vanilla-master/Probackup3/work01/builddir/src/pg_probackup3"
archive-get -B

"/home/dima/MY/Work/CurrentTasks/2025/PBCKP-2289/_Iter003/001--Vanilla-master/Probackup3/work01/tests/tmp_dirs/ArchiveTest/test_archive_get_relative_path/backup"
--instance "node" --wal-file-path=pg_wal/RECOVERYXLOG
--wal-file-name=000000010000000000000002
This command runs the next process:

/home/dima/MY/Work/CurrentTasks/2025/PBCKP-2289/_Iter003/001--Vanilla-master/Probackup3/work01/builddir/src/pg_probackup3
archive-get -B

/home/dima/MY/Work/CurrentTasks/2025/PBCKP-2289/_Iter003/001--Vanilla-master/Probackup3/work01/tests/tmp_dirs/ArchiveTest/test_archive_get_relative_path/backup
--instance node --wal-file-path=pg_wal/RECOVERYXLOG
--wal-file-name=000000010000000000000002
4) pg_probackup3 process has the following environment variables:

PWD="/home/dima/MY/Work/CurrentTasks/2025/PBCKP-2289/_Iter003/001--Vanilla-master/Probackup3/work01/tests/tmp_dirs/ArchiveTest/test_archive_get_relative_path/restored/data"

PGDATA="../../../../../../../../../../../../home/dima/MY/Work/CurrentTasks/2025/PBCKP-2289/_Iter003/001--Vanilla-master/Probackup3/work01/tests/tmp_dirs/ArchiveTest/test_archive_get_relative_path/restored/data"
---------- PROBLEM
PGDATA (4) is invalid for pg_probackup3. Because PWD of probackup (4) != PWD
of server (2).
When probackup is trying to use this PGDATA, it fails.
Server must provide to the external programs a right, absolute path in
PGDATA.
Thanks&Regards,
Dmitry Kovalenko,
Postgres Professional, Russia


PG Bug reporting form <noreply@postgresql.org> writes:
> 4) pg_probackup3 process has the following environment variables:
>
PWD="/home/dima/MY/Work/CurrentTasks/2025/PBCKP-2289/_Iter003/001--Vanilla-master/Probackup3/work01/tests/tmp_dirs/ArchiveTest/test_archive_get_relative_path/restored/data"
>
PGDATA="../../../../../../../../../../../../home/dima/MY/Work/CurrentTasks/2025/PBCKP-2289/_Iter003/001--Vanilla-master/Probackup3/work01/tests/tmp_dirs/ArchiveTest/test_archive_get_relative_path/restored/data"
> ---------- PROBLEM
> PGDATA (4) is invalid for pg_probackup3.

I do not think this is a Postgres bug.  The PGDATA environment
variable is not canonical, it is just the default to be used if
you don't specify a -D switch on the postmaster/pg_ctl command line.
In fact, it might not be set at all.  (I see that pg_ctl does set
it, but pg_ctl is not the only way to start the postmaster.)
Therefore, relying on PGDATA in a restore_command script isn't safe.
You should be relying on the current working directory (PWD) instead.

            regards, tom lane



Hello Tom,

> I do not think this is a Postgres bug.  The PGDATA environment
> variable is not canonical, it is just the default to be used if
> you don't specify a -D switch on the postmaster/pg_ctl command line.
> In fact, it might not be set at all.  (I see that pg_ctl does set
> it, but pg_ctl is not the only way to start the postmaster.)
> Therefore, relying on PGDATA in a restore_command script isn't safe.
> You should be relying on the current working directory (PWD) instead.
A test executes pg_ctl with the correct path to cluster and the correct CWD.

pg_ctl (and so on) executes an external program with inconsistent PGDATA 
and CWD.

This is clearly a problem on your side )

I wonder that it was not detected earlier.

For my point of view, if you can't provider a valid path in PGDATA, do 
not do it at all.

With Best Regards,

Dmitry Kovalenko




On Sat, Jun 28, 2025 at 1:19 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Dmitry Kovalenko <d.kovalenko@postgrespro.ru> writes:
> For my point of view, if you can't provider a valid path in PGDATA, do
> not do it at all.

I'm not sure which part of "don't rely on PGDATA" you didn't get.
Under normal circumstances, Postgres itself doesn't set that at all.


If we have to tell someone to not rely on a value that we've set we've done something wrong.  Either haven't explained how to use it correctly, we've set it in unusual circumstances when we should not have, or we are choosing a bad value to set it in the first place.  Setting it to an absolute path would seem like the least problematic approach to take to make our system more usable.

David J.

On Sat, 2025-06-28 at 14:12 -0700, David G. Johnston wrote:
> On Sat, Jun 28, 2025 at 1:19 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I'm not sure which part of "don't rely on PGDATA" you didn't get.
> > Under normal circumstances, Postgres itself doesn't set that at all.
>
> If we have to tell someone to not rely on a value that we've set we've
> done something wrong.

I am not sure that that is an accurate description.  Look at the code
in pg_ctl.c:

    /* process command-line options */
    while ((c = getopt_long(argc, argv, "cD:e:l:m:N:o:p:P:sS:t:U:wW",
                            long_options, &option_index)) != -1)
    {
        switch (c)
        {
            case 'D':
                {
                    char       *pgdata_D;

                    pgdata_D = pg_strdup(optarg);
                    canonicalize_path(pgdata_D);
                    setenv("PGDATA", pgdata_D, 1);

                    /*
                     * We could pass PGDATA just in an environment variable
                     * but we do -D too for clearer postmaster 'ps' display
                     */
                    pgdata_opt = psprintf("-D \"%s\" ", pgdata_D);
                    free(pgdata_D);
                    break;
                }

So we take the value specified with the -D option, canonicalize_path() it
and set it in the environment.  canonicalize_path() does the following:

/*
 * canonicalize_path()
 *
 *  Clean up path by:
 *      o  make Win32 path use Unix slashes
 *      o  remove trailing quote on Win32
 *      o  remove trailing slash
 *      o  remove duplicate (adjacent) separators
 *      o  remove '.' (unless path reduces to only '.')
 *      o  process '..' ourselves, removing it if possible
 *  Modifies path in-place.

So, essentially, Dmitry feeds a weird path to the -D option and then
complains that PGDATA is set to a weird value.

Yours,
Laurenz Albe