Thread: pg_upgrade of 11 -> 13: free(): invalid pointer
I’m continuing my upgrade journey, this time from 11 to 13, and the process is dying in the copy phase, always on the same DB:

---
Performing Upgrade
------------------
Analyzing all rows in the new cluster                       ok
Freezing all rows in the new cluster                        ok
Deleting files from new pg_xact                             ok
Copying old pg_xact to new server                           ok
Setting next transaction ID and epoch for new cluster       ok
Deleting files from new pg_multixact/offsets                ok
Copying old pg_multixact/offsets to new server              ok
Deleting files from new pg_multixact/members                ok
Copying old pg_multixact/members to new server              ok
Setting next multixact ID and offset for new cluster        ok
Resetting WAL archives                                      ok
Setting frozenxid and minmxid counters in new cluster       ok
Restoring global objects in the new cluster                 ok
Restoring database schemas in the new cluster
  messages
*failure*

Consult the last few lines of "pg_upgrade_dump_16387.log" for
the probable cause of the failure.
Failure, exiting
---

The log contains (which is different each time):

---
pg_restore: WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
pg_restore: creating COMMENT "public.FUNCTION "st_isempty"("rast" "public"."raster")"
pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 5338; 0 0 COMMENT FUNCTION "st_isempty"("rast" "public"."raster") postgres
pg_restore: error: could not execute query: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
Command was: COMMENT ON FUNCTION "public"."st_isempty"("rast" "public"."raster") IS 'args: rast - Returns true if the raster is empty (width = 0 and height = 0). Otherwise, returns false.';
---

And the pgsql13 server log contains:

---
2020-11-17 11:51:40.953 EST [96545] LOG: database system is ready to accept connections
free(): invalid pointer
2020-11-17 11:51:42.880 EST [96545] LOG: server process (PID 96575) was terminated by signal 6: Aborted
2020-11-17 11:51:42.880 EST [96545] LOG: terminating any other active server processes
2020-11-17 11:51:42.880 EST [96582] WARNING: terminating connection because of crash of another server process
2020-11-17 11:51:42.880 EST [96582] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-11-17 11:51:42.880 EST [96582] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2020-11-17 11:51:42.884 EST [96545] LOG: all server processes terminated; reinitializing
2020-11-17 11:51:42.904 EST [96545] LOG: received fast shutdown request
2020-11-17 11:51:42.905 EST [96585] LOG: database system was interrupted; last known up at 2020-11-17 11:51:42 EST
2020-11-17 11:51:42.906 EST [96585] LOG: database system was not properly shut down; automatic recovery in progress
2020-11-17 11:51:42.906 EST [96585] LOG: redo starts at E0/DB6B2960
2020-11-17 11:51:42.907 EST [96545] LOG: abnormal database system shutdown
2020-11-17 11:51:42.909 EST [96545] LOG: database system is shut down
---

So I’m assuming it’s that free() call. Servers have PostGIS 3.0 on them, all installed from repo, and running CentOS 8.
On 11/17/20 8:59 AM, Jeremy Wilson wrote:
> So I’m assuming it’s that free() call. Servers have PostGIS 3.0 on them, all installed from repo, and running CentOS 8.

Was this after a clean install of the corrected RPMs?

-- 
Adrian Klaver
adrian.klaver@aklaver.com
> On Nov 17, 2020, at 12:18 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>
> Was this after a clean install of the corrected RPMs?

Yes, this is a fresh install of CentOS 8, installed using the updated repo and RPMs.
On Tue, Nov 17, 2020 at 11:59:10AM -0500, Jeremy Wilson wrote:
> pg_restore: WARNING: terminating connection because of crash of another server process
> DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
> HINT: In a moment you should be able to reconnect to the database and repeat your command.
> pg_restore: creating COMMENT "public.FUNCTION "st_isempty"("rast" "public"."raster")"
> pg_restore: while PROCESSING TOC:
> pg_restore: from TOC entry 5338; 0 0 COMMENT FUNCTION "st_isempty"("rast" "public"."raster") postgres
> pg_restore: error: could not execute query: server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> Command was: COMMENT ON FUNCTION "public"."st_isempty"("rast" "public"."raster") IS 'args: rast - Returns true if the raster is empty (width = 0 and height = 0). Otherwise, returns false.';

My guess is that this is a crash in the PostGIS shared library. I would
ask the PostGIS team if they know of any crash cases, and if not, I
think you need to do a pg_dump of the database and test-load it into a
new database to see what query makes it fail, and then load debug
symbols and do a backtrace of the stack at the point of the crash.
Yeah, not fun.

-- 
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee
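Before attaching a debugger, it can help to confirm which backend died and with which signal. The following is only a minimal sketch, assuming the server log uses the same line format as the excerpt posted in this thread; the function name `find_backend_crashes` is made up for illustration and is not part of PostgreSQL.

```python
import re

# Matches lines such as:
#   ... LOG: server process (PID 96575) was terminated by signal 6: Aborted
# The pattern is an assumption based on the log excerpt in this thread.
CRASH_RE = re.compile(
    r"server process \(PID (?P<pid>\d+)\) was terminated by signal "
    r"(?P<signal>\d+)(?:: (?P<name>.+))?"
)


def find_backend_crashes(log_text):
    """Return a list of (pid, signal_number, signal_name) for crashed backends."""
    crashes = []
    for line in log_text.splitlines():
        match = CRASH_RE.search(line)
        if match:
            name = match.group("name")
            crashes.append(
                (
                    int(match.group("pid")),
                    int(match.group("signal")),
                    name.strip() if name else None,
                )
            )
    return crashes
```

Fed the server log from the first message, this would report PID 96575 and signal 6 (Aborted), which is consistent with glibc aborting the process after the `free(): invalid pointer` message.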
On Tue, Nov 17, 2020 at 02:44:47PM -0500, Bruce Momjian wrote:
> My guess is that this is a crash in the PostGIS shared library. I would
> ask the PostGIS team if they know of any crash cases, and if not, I
> think you need to do a pg_dump of the database and test-load it into a
> new database to see what query makes it fail, and then load debug
> symbols and do a backtrace of the stack at the point of the crash.
> Yeah, not fun.

Actually, pg_dump --schema-only is what you want to dump and load into a separate database. No need to dump the data.

-- 
Bruce Momjian <bruce@momjian.us>
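The schema-only test load suggested above could be scripted roughly as follows. This is a hedged sketch, not a tested procedure: the database names and dump file name are hypothetical, connection options (host, port, user) are left out, and the helper `schema_only_test_commands` is invented for this example. It only builds the command lines; running them is left to the caller.

```python
def schema_only_test_commands(source_db, scratch_db, dump_file="schema_only.sql"):
    """Build argv lists for a schema-only dump and a test load into a scratch database.

    All names passed in are placeholders chosen by the caller.
    """
    return [
        # Dump object definitions only, no table data.
        ["pg_dump", "--schema-only", "--file", dump_file, source_db],
        # Create an empty database to load into.
        ["createdb", scratch_db],
        # Replay the dump, echoing each statement and stopping at the first
        # error, so the last echoed statement is the one that failed.
        ["psql", "--dbname", scratch_db, "--echo-queries",
         "--set", "ON_ERROR_STOP=1", "--file", dump_file],
    ]


# Example: print the commands instead of executing them.
for argv in schema_only_test_commands("gisdb", "gisdb_schema_test"):
    print(" ".join(argv))
```

One caveat: a plain pg_dump normally recreates an extension with CREATE EXTENSION, whereas the dump pg_upgrade uses restores extension members one by one, so the test load may not fail on the same statement seen in the pg_upgrade log.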
> On Nov 17, 2020, at 11:44 AM, Bruce Momjian <bruce@momjian.us> wrote:
>
> My guess is that this is a crash in the PostGIS shared library. I would
> ask the PostGIS team if they know of any crash cases, and if not, I
> think you need to do a pg_dump of the database and test-load it into a
> new database to see what query makes it fail, and then load debug
> symbols and do a backtrace of the stack at the point of the crash.
> Yeah, not fun.

These kinds of problems have almost always been due to multiple versions of dependencies installed simultaneously. So, packaging fun. You'll get some version of PostGIS compiled against one train of dependencies and another against another train, and for upgrade both trains will end up installed simultaneously, and things will break.

P
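One way to look for the "two trains installed at once" situation described above is to list installed package names and flag library families that appear in more than one version. The sketch below is illustrative only: it assumes versioned package names of the form family-plus-digits (for example `geos38` next to `geos39`), which is a guess about the repository's naming scheme, and the family list and function names are hypothetical.

```python
import re
import subprocess
from collections import defaultdict


def installed_package_names():
    """Return installed RPM package names (requires an RPM-based system)."""
    result = subprocess.run(
        ["rpm", "-qa", "--queryformat", "%{NAME}\n"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.splitlines()


def find_parallel_installs(package_names, families=("geos", "proj", "gdal")):
    """Return {family: [package names]} for families installed in more than one version.

    Only names that are exactly a family followed by digits are considered,
    so sub-packages such as "geos38-devel" are ignored.
    """
    pattern = re.compile(r"^(%s)(\d+)$" % "|".join(map(re.escape, families)))
    by_family = defaultdict(set)
    for raw_name in package_names:
        name = raw_name.strip()
        match = pattern.match(name)
        if match:
            by_family[match.group(1)].add(name)
    return {
        family: sorted(names)
        for family, names in by_family.items()
        if len(names) > 1
    }
```

On the affected host, `find_parallel_installs(installed_package_names())` returning a non-empty result would be a hint, not proof, that the PostGIS builds for 11 and 13 link against different dependency versions.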