Thread: BUG #5004: pg_freespacemap make a SegFault
The following bug has been logged online:

Bug reference: 5004
Logged by: Sebastien Lardiere
Email address: slardiere@hi-media.com
PostgreSQL version: 8.3.7
Operating system: Debian Etch
Description: pg_freespacemap make a SegFault
Details:

I've got a crash with a cluster. Nothing found in the logfile, but a message about a Segfault, so I get a coredump and here is the backtrace :

Core was generated by `postgres: postgres postgres [local] SELECT '.
Program terminated with signal 11, Segmentation fault.
#0 pg_freespacemap_pages (fcinfo=0x7fff4a9bc250) at pg_freespacemap.c:162
162 fctx->record[i].reltablespace = fsmrel->key.spcNode;
(gdb) bt
#0 pg_freespacemap_pages (fcinfo=0x7fff4a9bc250) at pg_freespacemap.c:162
#1 0x0000000000526781 in ExecMakeTableFunctionResult (funcexpr=0x29c2408, econtext=0x29c1b70, expectedDesc=0x29c1ed0, returnDesc=0x7fff4a9bc6d0) at execQual.c:1566
#2 0x00000000005330d2 in FunctionNext (node=0x29bf620) at nodeFunctionscan.c:68
#3 0x000000000052881c in ExecScan (node=0x7fc03f6c5370, accessMtd=0x533030 <FunctionNext>) at execScan.c:68
#4 0x0000000000521f6d in ExecProcNode (node=0x29bf620) at execProcnode.c:356
#5 0x000000000052ca40 in ExecAgg (node=0x29c17f0) at nodeAgg.c:874
#6 0x0000000000521fed in ExecProcNode (node=0x29c17f0) at execProcnode.c:394
#7 0x0000000000520ffd in ExecutorRun (queryDesc=<value optimized out>, direction=ForwardScanDirection, count=0) at execMain.c:1335
#8 0x00000000005ba0d6 in PortalRunSelect (portal=0x29b47a0, forward=<value optimized out>, count=0, dest=0x29af198) at pquery.c:943
#9 0x00000000005bb159 in PortalRun (portal=0x29b47a0, count=9223372036854775807, isTopLevel=1 '\001', dest=0x29af198, altdest=0x29af198, completionTag=0x7fff4a9bcf40 "") at pquery.c:769
#10 0x00000000005b6d2d in exec_simple_query (query_string=0x2969070 "select count(*) as pages from pg_freespacemap_pages ") at postgres.c:1004
#11 0x00000000005b8071 in PostgresMain (argc=4, argv=<value optimized out>, username=0x28bf4b0 "postgres") at postgres.c:3631
#12 0x000000000058ca1b in ServerLoop () at postmaster.c:3207
#13 0x000000000058d73e in PostmasterMain (argc=5, argv=0x28ba310) at postmaster.c:1029
#14 0x0000000000544c15 in main (argc=5, argv=<value optimized out>) at main.c:188

We can see the use of contrib/pg_freespacemap. A munin plugin sent this query "select count(*) as pages from pg_freespacemap_pages " every 5 minutes ( since 1 year, now ) and we obtain graph.

I notice that the graph says that our freespacemap is empty ( a few thousand of pages ) since our first crash. And sometime, the number of pages increase, and we've got a crash.

If you want more detail, ask me ...

Thanks,

PS : Sorry for my poor english
"Sebastien Lardiere" <slardiere@hi-media.com> writes:
> Description: pg_freespacemap make a SegFault

There's a post-8.3.7 fix that might cure this:
http://archives.postgresql.org/pgsql-committers/2009-04/msg00108.php

regards, tom lane
On Fri, Aug 21, 2009 at 04:26:11PM +0000, Sebastien Lardiere wrote:
> The following bug has been logged online:
>
> Bug reference: 5004
> Logged by: Sebastien Lardiere
> Email address: slardiere@hi-media.com
> PostgreSQL version: 8.3.7
> Operating system: Debian Etch
> Description: pg_freespacemap make a SegFault
> Details:
>
> I've got a crash with a cluster. Nothing found in the logfile, but a message
> about a Segfault, so I get a coredump and here is the backtrace :

Can you check if you had any vacuums running at the time of crash?

It might be in logs, something like:
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
STATEMENT: vacuum

if yes - how many vacuum jobs there were?

depesz

--
Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/
jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007
On 21/08/2009 18:51, Tom Lane wrote:
> "Sebastien Lardiere"<slardiere@hi-media.com> writes:
>> Description: pg_freespacemap make a SegFault
>
> There's a post-8.3.7 fix that might cure this:
> http://archives.postgresql.org/pgsql-committers/2009-04/msg00108.php
>
> regards, tom lane

Ok, I'll try to apply this patch,

Thanks,

--
Sébastien Lardière
On 22/08/2009 19:52, hubert depesz lubaczewski wrote:
> On Fri, Aug 21, 2009 at 04:26:11PM +0000, Sebastien Lardiere wrote:
>> The following bug has been logged online:
>>
>> Bug reference: 5004
>> Logged by: Sebastien Lardiere
>> Email address: slardiere@hi-media.com
>> PostgreSQL version: 8.3.7
>> Operating system: Debian Etch
>> Description: pg_freespacemap make a SegFault
>> Details:
>>
>> I've got a crash with a cluster. Nothing found in the logfile, but a message
>> about a Segfault, so I get a coredump and here is the backtrace :
>
> Can you check if you had any vacuums running at the time of crash?

Yes, autovacuum is on. it wasn't "normal" vacuum during the crash, but the last.

Nevertheless, the day before the first crash, I made a big delete on 23 millions of rows, and pg_freespacemap show a big increase of the number of pages in FSM.

Then, when the number of pages in FSM increase, Pg crashes ; but :

> It might be in logs, something like:
> WARNING: terminating connection because of crash of another server process
> DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
> HINT: In a moment you should be able to reconnect to the database and repeat your command.
> STATEMENT: vacuum
>
> if yes - how many vacuum jobs there were?

I never seen in the logs this messages with vacuum, Pg always crash with the query : "select count(*) as pages from pg_freespacemap_pages"

We can see in Munin ( graph attached ), the behavior : The big increase, then, the first crash, and, each time there is a significant increase, a crash, with a reset of FSM.

I had disable the plugin, so there is no more queries with pg_freespacemap, and no crash.

--
Sébastien Lardière