Bogus error handling in pg_upgrade - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Bogus error handling in pg_upgrade |
Date | |
Msg-id | 1413.1388294704@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: Bogus error handling in pg_upgrade
Re: Bogus error handling in pg_upgrade |
List | pgsql-hackers |
A credulous person might suppose that this chunk of code is designed to abort if pg_resetxlog fails: prep_status("Setting next transaction ID for new cluster"); exec_prog(UTILITY_LOG_FILE, NULL, true, "\"%s/pg_resetxlog\"-f -x %u \"%s\"", new_cluster.bindir, old_cluster.controldata.chkpnt_nxtxid, new_cluster.pgdata); check_ok(); In point of fact, it does no such thing, but blithely continues (even though pg_resetxlog has corrupted things horribly before failing). check_ok() is particularly badly named, since it contains not one iota of error checking. misleadingly_claim_ok() would be a better name. If this isn't broken-by-design, I'd like an explanation why not. In case you're wondering, I'm investigating the problem mentioned at <1387636762.30013.13.camel@vanquo.pezone.net>. I see this output: Performing Upgrade ------------------ Analyzing all rows in the new cluster ok Freezing all rows on the new cluster ok Deleting files from new pg_clog ok Copying old pg_clog to new server ok Setting next transaction ID for new cluster *failure* Consult the last few lines of "pg_upgrade_utility.log" for the probable cause of the failure. ok Deleting files from new pg_multixact/offsets ok Copying old pg_multixact/offsets to new server ok Deleting files from new pg_multixact/members ok Copying old pg_multixact/members to new server ok Setting next multixact ID and offset for new cluster ok Resetting WAL archives *failure* Consult the last few lines of "pg_upgrade_utility.log" for the probable cause of the failure. ok *failure* Consult the last few lines of "pg_upgrade_server_start.log" or "pg_upgrade_server.log" for the probable cause of the failure. connection to database failed: could not connect to server: No such file or directory Is the server running locallyand accepting connections on Unix domain socket "/Users/tgl/pgsql/contrib/pg_upgrade/.s.PGSQL.57632"? could not connect to new postmaster started with the command: "/Users/tgl/pgsql/contrib/pg_upgrade/tmp_check/install//Users/tgl/testversion/bin/pg_ctl" -w -l "pg_upgrade_server.log" -D"/Users/tgl/pgsql/contrib/pg_upgrade/tmp_check/data" -o "-p 57632 -b -c synchronous_commit=off -c fsync=off -c full_page_writes=off -c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/Users/tgl/pgsql/contrib/pg_upgrade'"start Failure, exiting make: *** [check] Error 1 I think the actual problem is that pg_resetxlog rewrites pg_control, zaps everything in pg_xlog/, and then fails before writing a new initial xlog segment. However, pg_upgrade isn't making this any easier to investigate by failing to stop at the first sign of trouble. regards, tom lane
pgsql-hackers by date: