Thread: pg_upgrade -c cannot be run if old cluster is running

pg_upgrade -c cannot be run if old cluster is running

From

PG Doc comments form

Date:

04 September 2024, 18:43:41

The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/15/pgupgrade.html
Description:

https://www.postgresql.org/docs/15/pgupgrade.html tells
"You can use pg_upgrade --check to perform only the checks, even if the old
server is still running."
I tried (upgrade from postgres 10 to postgres 15):
/usr/pgsql-15/bin/pg_upgrade -b /usr/pgsql-10/bin/ -B /usr/pgsql-15/bin -d
/var/lib/pgsql/10/data  -D /mnt/pgdata/pgdirbc15/ --link -c
Log tells:
-----------------------------------------------------------------
  pg_upgrade run on Wed Sep  4 09:08:31 2024
-----------------------------------------------------------------

command: "/usr/pgsql-10/bin/pg_ctl" -w -l
"/mnt/pgdata/pgdirbc15/pg_upgrade_output.d/20240904T090831.750/log/pg_upgrade_server.log"
-D "/var/lib/pgsql/10/data" -o "-p 50432 -b  -c listen_addresses='' -c
unix_socket_permissions=0700 -c unix_socket_directories='/var/lib/pgsql'"
start >>
"/mnt/pgdata/pgdirbc15/pg_upgrade_output.d/20240904T090831.750/log/pg_upgrade_server.log"
2>&1
waiting for server to start....2024-09-04 12:08:32.179 CEST [12197] FATAL:
lock file "postmaster.pid" already exists
2024-09-04 12:08:32.179 CEST [12197] HINT:  Is another postmaster (PID
25407) running in data directory "/mnt/pgdata/pgdir"?
 stopped waiting
pg_ctl: could not start server
Examine the log output.


I tried the without --link, same error.
It seems pg_upgrade -c requires the old server to be stopped.

Re: pg_upgrade -c cannot be run if old cluster is running

From

Tom Lane

Date:

04 September 2024, 23:45:54

PG Doc comments form <noreply@postgresql.org> writes:
> https://www.postgresql.org/docs/15/pgupgrade.html tells
> "You can use pg_upgrade --check to perform only the checks, even if the old
> server is still running."

I tested this case, and it seems to work for me.  pg_upgrade will try
to start a server in the old data directory, but if that fails it will
assume that there's a live server to run the checks against.

> waiting for server to start....2024-09-04 12:08:32.179 CEST [12197] FATAL: 
> lock file "postmaster.pid" already exists
> 2024-09-04 12:08:32.179 CEST [12197] HINT:  Is another postmaster (PID
> 25407) running in data directory "/mnt/pgdata/pgdir"?
>  stopped waiting
> pg_ctl: could not start server
> Examine the log output.

What I suspect happened here is some confusion about where the lock
file is, leading to not recognizing that there was possibly an old
live server.  You didn't supply enough detail to reproduce the
problem, but perhaps check whether the old and new Postgres versions
were built with similar options.

            regards, tom lane

RE: pg_upgrade -c cannot be run if old cluster is running

From

Alain Bourgeois

Date:

05 September 2024, 12:24:50

Here is what can be said about config (no extension, nothing special)
There are already one postgresql-15 services (separate cluster, on ports <>5432) that run on this machine, and there is
alsoone postgres 10 cluster (on port 5432). 
The aim is to update the 10 to 15 to get rid of v10.
So we created a new psql 15 cluster.

/usr/lib/systemd/system/postgresql-10.service has PGDATA=/var/lib/pgsql/10/data/,
/var/lib/pgsql/10/data/postgtresql.confholds data_directory = '/mnt/pgdata/pgdir' 

The new psql-15 cluster,/usr/lib/systemd/system/postgresql-15-bc.service, has PGDATA=/mnt/pgdata/pgdirbc15/

Initdb has been run on /mnt/pgdata/pgdirbc15/, postgresql.conf holds data_directory = '/mnt/pgdata/pgdirbc15/',
hba_file= '/mnt/pgdata/pgdirbc15/pg_hba.conf'  
pg_hba.conf has been copied from psql10 data dir
postgresql.conf has been adapted to match the v10 one: listen_addresses = '*', port=5455, max_connections=150,
password_encryption= md5, ssl=on (certificates have been copied from previous cluster to new one), wal_level=replica. I
leftarchive_mode=off on new server although it is on on the old one. 
For new cluster postgresql.conf is in PGDATA directory (which is not the case of the old one).
systemctl start postgresql-15-bc works without any problem. It has been stopped before pg_upgrade --check.

Postgres 10 cannot be stopped, it is a live prod system.

RE: pg_upgrade -c cannot be run if old cluster is running

From

Alain Bourgeois

Date:

05 September 2024, 12:45:02

Tried (as postgres)
/usr/pgsql-15/bin/pg_upgrade --check -b /usr/pgsql-10/bin/ -B /usr/pgsql-15/bin -d /var/lib/pgsql/10/data -D
/mnt/pgdata/pgdirbc15-p5432 -P5455 -Upostgres -v 
Running in verbose mode
Performing Consistency Checks
-----------------------------
Checking cluster versions                                   ok
Current pg_control values:

pg_control version number:            1002
Catalog version number:               201707211
Database system identifier:           6631834114473972710
Latest checkpoint's TimeLineID:       1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          0:555
Latest checkpoint's NextOID:          13809
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:        548
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  0
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Size of a large-object chunk:         2048
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value
Data page checksum version:           0


Values to be changed:

First log segment after reset:        000000010000000000000002
Current pg_control values:

pg_control version number:            1300
Catalog version number:               202209061
Database system identifier:           7410372953202657258
Latest checkpoint's TimeLineID:       1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          0:732
Latest checkpoint's NextOID:          16387
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:        716
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  0
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Size of a large-object chunk:         2048
Date/time type storage:               64-bit integers
Float8 argument passing:              by value

Data page checksum version:           0


Values to be changed:

First log segment after reset:        000000010000000000000002
"/usr/pgsql-10/bin/pg_ctl" -w -l
"/mnt/pgdata/pgdirbc15/pg_upgrade_output.d/20240905T083735.457/log/pg_upgrade_server.log"-D "/var/lib/pgsql/10/data" -o
"-p5432 -b  -c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/var/lib/pgsql'" start
>>"/mnt/pgdata/pgdirbc15/pg_upgrade_output.d/20240905T083735.457/log/pg_upgrade_server.log" 2>&1 

*failure*
There were problems executing ""/usr/pgsql-10/bin/pg_ctl" -w -l
"/mnt/pgdata/pgdirbc15/pg_upgrade_output.d/20240905T083735.457/log/pg_upgrade_server.log"-D "/var/lib/pgsql/10/data" -o
"-p5432 -b  -c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/var/lib/pgsql'" start
>>"/mnt/pgdata/pgdirbc15/pg_upgrade_output.d/20240905T083735.457/log/pg_upg 
rade_server.log" 2>&1"
Consult the last few lines of "/mnt/pgdata/pgdirbc15/pg_upgrade_output.d/20240905T083735.457/log/pg_upgrade_server.log"
forthe probable cause of the failure. 

connection to server on socket "/var/lib/pgsql/.s.PGSQL.5432" failed: No such file or directory
Is the server running locally and accepting connections on that socket?

could not connect to source postmaster started with the command:
"/usr/pgsql-10/bin/pg_ctl" -w -l
"/mnt/pgdata/pgdirbc15/pg_upgrade_output.d/20240905T083735.457/log/pg_upgrade_server.log"-D "/var/lib/pgsql/10/data" -o
"-p5432 -b  -c listen_addresses=''  
-c unix_socket_permissions=0700 -c unix_socket_directories='/var/lib/pgsql'" start
Failure, exiting

Re: pg_upgrade -c cannot be run if old cluster is running

From

Tom Lane

Date:

05 September 2024, 16:39:02

Alain Bourgeois <alain.bourgeois@zetes.com> writes:
> /usr/lib/systemd/system/postgresql-10.service has PGDATA=/var/lib/pgsql/10/data/,
/var/lib/pgsql/10/data/postgtresql.confholds data_directory = '/mnt/pgdata/pgdir' 

This is probably the source of your problem.  IIRC, pg_upgrade
needs to be pointed at the actual old data directory, not
PGDATA=/var/lib/pgsql/10/data.

            regards, tom lane

Re: pg_upgrade -c cannot be run if old cluster is running

From

Daniel Gustafsson

Date:

05 September 2024, 16:41:10

> On 5 Sep 2024, at 15:39, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Alain Bourgeois <alain.bourgeois@zetes.com> writes:
>> /usr/lib/systemd/system/postgresql-10.service has PGDATA=/var/lib/pgsql/10/data/,
/var/lib/pgsql/10/data/postgtresql.confholds data_directory = '/mnt/pgdata/pgdir' 
>
> This is probably the source of your problem.  IIRC, pg_upgrade
> needs to be pointed at the actual old data directory, not
> PGDATA=/var/lib/pgsql/10/data.

Correct, pg_upgrade use the PGDATANEW and PGDATAOLD env variables for the
cluster data directories.

--
Daniel Gustafsson

Re: pg_upgrade -c cannot be run if old cluster is running

From

Tom Lane

Date:

05 September 2024, 17:57:04

[ Please keep the mailing list cc'd ]

Alain Bourgeois <alain.bourgeois@zetes.com> writes:
> I confirm copyig config to data dir and specifying data dir works
> /usr/pgsql-15/bin/pg_upgrade --check -b /usr/pgsql-10/bin/ -B /usr/pgsql-15/bin -d /mnt/pgdata/pgdir -D
/mnt/pgdata/pgdirbc15--check 

Hm.  On second look, there is code in there that intends to deal
with -d pointing at a configuration-only directory, and it seems
to work for me:

$ pg_upgrade --check -d /home/postgres/version10/config ...etc...
Finding the real data directory for the source cluster      ok
Performing Consistency Checks on Old Live Server
------------------------------------------------
Checking cluster versions                                   ok
Checking database user is the install user                  ok
Checking database connection settings                       ok
...

The lack of any comment about "Finding the real data directory"
in your output is suspicious.  I notice that the code that
does this has a short-circuit case that perhaps could get fooled:

    /* If PG_VERSION exists, it can't be a config-only dir */
    snprintf(filename, sizeof(filename), "%s/PG_VERSION", cluster->pgconfig);
    if ((fp = fopen(filename, "r")) != NULL)
    {
        fclose(fp);
        return;
    }

Do you have a PG_VERSION file in /var/lib/pgsql/10/data?
If so, why?

            regards, tom lane

RE: pg_upgrade -c cannot be run if old cluster is running

From

Alain Bourgeois

Date:

05 September 2024, 18:04:34

When the cluster was created, we did initdb  /var/lib/pgsql10/data, the storagebox was not available.
Then storagebox was delivered, we copied data folder to storagebox in /mnt/pgdata/pgdir and changed postgresql.conf. We
didn'tdelete existing datafiles from /var/lib/pgsql10/data (db was empty).

Re: pg_upgrade -c cannot be run if old cluster is running

From

Tom Lane

Date:

05 September 2024, 18:13:18

Alain Bourgeois <alain.bourgeois@zetes.com> writes:
> When the cluster was created, we did initdb  /var/lib/pgsql10/data, the storagebox was not available.
> Then storagebox was delivered, we copied data folder to storagebox in /mnt/pgdata/pgdir and changed postgresql.conf.
Wedidn't delete existing datafiles from /var/lib/pgsql10/data (db was empty). 

Yeah, after further experimentation I saw that pg_upgrade would fail
in a pretty obvious way unless the supposedly-config-only directory
also contains a full set of subdirectories (base, pg_wal, etc).

We could imagine removing that check for PG_VERSION.  I don't think
that the ensuing call of "postgres -C data_directory" would add
anything meaningful to pg_upgrade's runtime.  But then you'd see
"Finding the real data directory" every time, indeed twice (for
source and target directories).  That seems like it would create
more confusion than is justified.

On the whole I think this is self-inflicted damage.  Leaving that
stuff around was just asking for confusion.

            regards, tom lane