Thread: pg_upgrade -c cannot be run if old cluster is running
The following documentation comment has been logged on the website: Page: https://www.postgresql.org/docs/15/pgupgrade.html Description: https://www.postgresql.org/docs/15/pgupgrade.html tells "You can use pg_upgrade --check to perform only the checks, even if the old server is still running." I tried (upgrade from postgres 10 to postgres 15): /usr/pgsql-15/bin/pg_upgrade -b /usr/pgsql-10/bin/ -B /usr/pgsql-15/bin -d /var/lib/pgsql/10/data -D /mnt/pgdata/pgdirbc15/ --link -c Log tells: ----------------------------------------------------------------- pg_upgrade run on Wed Sep 4 09:08:31 2024 ----------------------------------------------------------------- command: "/usr/pgsql-10/bin/pg_ctl" -w -l "/mnt/pgdata/pgdirbc15/pg_upgrade_output.d/20240904T090831.750/log/pg_upgrade_server.log" -D "/var/lib/pgsql/10/data" -o "-p 50432 -b -c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/var/lib/pgsql'" start >> "/mnt/pgdata/pgdirbc15/pg_upgrade_output.d/20240904T090831.750/log/pg_upgrade_server.log" 2>&1 waiting for server to start....2024-09-04 12:08:32.179 CEST [12197] FATAL: lock file "postmaster.pid" already exists 2024-09-04 12:08:32.179 CEST [12197] HINT: Is another postmaster (PID 25407) running in data directory "/mnt/pgdata/pgdir"? stopped waiting pg_ctl: could not start server Examine the log output. I tried the without --link, same error. It seems pg_upgrade -c requires the old server to be stopped.
PG Doc comments form <noreply@postgresql.org> writes: > https://www.postgresql.org/docs/15/pgupgrade.html tells > "You can use pg_upgrade --check to perform only the checks, even if the old > server is still running." I tested this case, and it seems to work for me. pg_upgrade will try to start a server in the old data directory, but if that fails it will assume that there's a live server to run the checks against. > waiting for server to start....2024-09-04 12:08:32.179 CEST [12197] FATAL: > lock file "postmaster.pid" already exists > 2024-09-04 12:08:32.179 CEST [12197] HINT: Is another postmaster (PID > 25407) running in data directory "/mnt/pgdata/pgdir"? > stopped waiting > pg_ctl: could not start server > Examine the log output. What I suspect happened here is some confusion about where the lock file is, leading to not recognizing that there was possibly an old live server. You didn't supply enough detail to reproduce the problem, but perhaps check whether the old and new Postgres versions were built with similar options. regards, tom lane
Here is what can be said about config (no extension, nothing special) There are already one postgresql-15 services (separate cluster, on ports <>5432) that run on this machine, and there is alsoone postgres 10 cluster (on port 5432). The aim is to update the 10 to 15 to get rid of v10. So we created a new psql 15 cluster. /usr/lib/systemd/system/postgresql-10.service has PGDATA=/var/lib/pgsql/10/data/, /var/lib/pgsql/10/data/postgtresql.confholds data_directory = '/mnt/pgdata/pgdir' The new psql-15 cluster,/usr/lib/systemd/system/postgresql-15-bc.service, has PGDATA=/mnt/pgdata/pgdirbc15/ Initdb has been run on /mnt/pgdata/pgdirbc15/, postgresql.conf holds data_directory = '/mnt/pgdata/pgdirbc15/', hba_file= '/mnt/pgdata/pgdirbc15/pg_hba.conf' pg_hba.conf has been copied from psql10 data dir postgresql.conf has been adapted to match the v10 one: listen_addresses = '*', port=5455, max_connections=150, password_encryption= md5, ssl=on (certificates have been copied from previous cluster to new one), wal_level=replica. I leftarchive_mode=off on new server although it is on on the old one. For new cluster postgresql.conf is in PGDATA directory (which is not the case of the old one). systemctl start postgresql-15-bc works without any problem. It has been stopped before pg_upgrade --check. Postgres 10 cannot be stopped, it is a live prod system.
Tried (as postgres) /usr/pgsql-15/bin/pg_upgrade --check -b /usr/pgsql-10/bin/ -B /usr/pgsql-15/bin -d /var/lib/pgsql/10/data -D /mnt/pgdata/pgdirbc15-p5432 -P5455 -Upostgres -v Running in verbose mode Performing Consistency Checks ----------------------------- Checking cluster versions ok Current pg_control values: pg_control version number: 1002 Catalog version number: 201707211 Database system identifier: 6631834114473972710 Latest checkpoint's TimeLineID: 1 Latest checkpoint's full_page_writes: on Latest checkpoint's NextXID: 0:555 Latest checkpoint's NextOID: 13809 Latest checkpoint's NextMultiXactId: 1 Latest checkpoint's NextMultiOffset: 0 Latest checkpoint's oldestXID: 548 Latest checkpoint's oldestXID's DB: 1 Latest checkpoint's oldestActiveXID: 0 Latest checkpoint's oldestMultiXid: 1 Latest checkpoint's oldestMulti's DB: 1 Latest checkpoint's oldestCommitTsXid:0 Latest checkpoint's newestCommitTsXid:0 Maximum data alignment: 8 Database block size: 8192 Blocks per segment of large relation: 131072 WAL block size: 8192 Bytes per WAL segment: 16777216 Maximum length of identifiers: 64 Maximum columns in an index: 32 Maximum size of a TOAST chunk: 1996 Size of a large-object chunk: 2048 Date/time type storage: 64-bit integers Float4 argument passing: by value Float8 argument passing: by value Data page checksum version: 0 Values to be changed: First log segment after reset: 000000010000000000000002 Current pg_control values: pg_control version number: 1300 Catalog version number: 202209061 Database system identifier: 7410372953202657258 Latest checkpoint's TimeLineID: 1 Latest checkpoint's full_page_writes: on Latest checkpoint's NextXID: 0:732 Latest checkpoint's NextOID: 16387 Latest checkpoint's NextMultiXactId: 1 Latest checkpoint's NextMultiOffset: 0 Latest checkpoint's oldestXID: 716 Latest checkpoint's oldestXID's DB: 1 Latest checkpoint's oldestActiveXID: 0 Latest checkpoint's oldestMultiXid: 1 Latest checkpoint's oldestMulti's DB: 1 Latest checkpoint's oldestCommitTsXid:0 Latest checkpoint's newestCommitTsXid:0 Maximum data alignment: 8 Database block size: 8192 Blocks per segment of large relation: 131072 WAL block size: 8192 Bytes per WAL segment: 16777216 Maximum length of identifiers: 64 Maximum columns in an index: 32 Maximum size of a TOAST chunk: 1996 Size of a large-object chunk: 2048 Date/time type storage: 64-bit integers Float8 argument passing: by value Data page checksum version: 0 Values to be changed: First log segment after reset: 000000010000000000000002 "/usr/pgsql-10/bin/pg_ctl" -w -l "/mnt/pgdata/pgdirbc15/pg_upgrade_output.d/20240905T083735.457/log/pg_upgrade_server.log"-D "/var/lib/pgsql/10/data" -o "-p5432 -b -c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/var/lib/pgsql'" start >>"/mnt/pgdata/pgdirbc15/pg_upgrade_output.d/20240905T083735.457/log/pg_upgrade_server.log" 2>&1 *failure* There were problems executing ""/usr/pgsql-10/bin/pg_ctl" -w -l "/mnt/pgdata/pgdirbc15/pg_upgrade_output.d/20240905T083735.457/log/pg_upgrade_server.log"-D "/var/lib/pgsql/10/data" -o "-p5432 -b -c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/var/lib/pgsql'" start >>"/mnt/pgdata/pgdirbc15/pg_upgrade_output.d/20240905T083735.457/log/pg_upg rade_server.log" 2>&1" Consult the last few lines of "/mnt/pgdata/pgdirbc15/pg_upgrade_output.d/20240905T083735.457/log/pg_upgrade_server.log" forthe probable cause of the failure. connection to server on socket "/var/lib/pgsql/.s.PGSQL.5432" failed: No such file or directory Is the server running locally and accepting connections on that socket? could not connect to source postmaster started with the command: "/usr/pgsql-10/bin/pg_ctl" -w -l "/mnt/pgdata/pgdirbc15/pg_upgrade_output.d/20240905T083735.457/log/pg_upgrade_server.log"-D "/var/lib/pgsql/10/data" -o "-p5432 -b -c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/var/lib/pgsql'" start Failure, exiting
Alain Bourgeois <alain.bourgeois@zetes.com> writes: > /usr/lib/systemd/system/postgresql-10.service has PGDATA=/var/lib/pgsql/10/data/, /var/lib/pgsql/10/data/postgtresql.confholds data_directory = '/mnt/pgdata/pgdir' This is probably the source of your problem. IIRC, pg_upgrade needs to be pointed at the actual old data directory, not PGDATA=/var/lib/pgsql/10/data. regards, tom lane
> On 5 Sep 2024, at 15:39, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Alain Bourgeois <alain.bourgeois@zetes.com> writes: >> /usr/lib/systemd/system/postgresql-10.service has PGDATA=/var/lib/pgsql/10/data/, /var/lib/pgsql/10/data/postgtresql.confholds data_directory = '/mnt/pgdata/pgdir' > > This is probably the source of your problem. IIRC, pg_upgrade > needs to be pointed at the actual old data directory, not > PGDATA=/var/lib/pgsql/10/data. Correct, pg_upgrade use the PGDATANEW and PGDATAOLD env variables for the cluster data directories. -- Daniel Gustafsson
[ Please keep the mailing list cc'd ] Alain Bourgeois <alain.bourgeois@zetes.com> writes: > I confirm copyig config to data dir and specifying data dir works > /usr/pgsql-15/bin/pg_upgrade --check -b /usr/pgsql-10/bin/ -B /usr/pgsql-15/bin -d /mnt/pgdata/pgdir -D /mnt/pgdata/pgdirbc15--check Hm. On second look, there is code in there that intends to deal with -d pointing at a configuration-only directory, and it seems to work for me: $ pg_upgrade --check -d /home/postgres/version10/config ...etc... Finding the real data directory for the source cluster ok Performing Consistency Checks on Old Live Server ------------------------------------------------ Checking cluster versions ok Checking database user is the install user ok Checking database connection settings ok ... The lack of any comment about "Finding the real data directory" in your output is suspicious. I notice that the code that does this has a short-circuit case that perhaps could get fooled: /* If PG_VERSION exists, it can't be a config-only dir */ snprintf(filename, sizeof(filename), "%s/PG_VERSION", cluster->pgconfig); if ((fp = fopen(filename, "r")) != NULL) { fclose(fp); return; } Do you have a PG_VERSION file in /var/lib/pgsql/10/data? If so, why? regards, tom lane
When the cluster was created, we did initdb /var/lib/pgsql10/data, the storagebox was not available. Then storagebox was delivered, we copied data folder to storagebox in /mnt/pgdata/pgdir and changed postgresql.conf. We didn'tdelete existing datafiles from /var/lib/pgsql10/data (db was empty).
Alain Bourgeois <alain.bourgeois@zetes.com> writes: > When the cluster was created, we did initdb /var/lib/pgsql10/data, the storagebox was not available. > Then storagebox was delivered, we copied data folder to storagebox in /mnt/pgdata/pgdir and changed postgresql.conf. Wedidn't delete existing datafiles from /var/lib/pgsql10/data (db was empty). Yeah, after further experimentation I saw that pg_upgrade would fail in a pretty obvious way unless the supposedly-config-only directory also contains a full set of subdirectories (base, pg_wal, etc). We could imagine removing that check for PG_VERSION. I don't think that the ensuing call of "postgres -C data_directory" would add anything meaningful to pg_upgrade's runtime. But then you'd see "Finding the real data directory" every time, indeed twice (for source and target directories). That seems like it would create more confusion than is justified. On the whole I think this is self-inflicted damage. Leaving that stuff around was just asking for confusion. regards, tom lane