Thread: BUG #5004: pg_freespacemap make a SegFault

BUG #5004: pg_freespacemap make a SegFault

From

"Sebastien Lardiere"

Date:

21 August 2009, 13:32:07

The following bug has been logged online:

Bug reference:      5004
Logged by:          Sebastien Lardiere
Email address:      slardiere@hi-media.com
PostgreSQL version: 8.3.7
Operating system:   Debian Etch
Description:        pg_freespacemap make a SegFault
Details:

I've had crashes on a cluster. There is nothing in the logfile except a
message about a segfault, so I got a core dump; here is the backtrace:

Core was generated by `postgres: postgres postgres [local] SELECT'.
Program terminated with signal 11, Segmentation fault.
#0  pg_freespacemap_pages (fcinfo=0x7fff4a9bc250) at pg_freespacemap.c:162
162         fctx->record[i].reltablespace = fsmrel->key.spcNode;
(gdb) bt
#0  pg_freespacemap_pages (fcinfo=0x7fff4a9bc250) at pg_freespacemap.c:162
#1  0x0000000000526781 in ExecMakeTableFunctionResult (funcexpr=0x29c2408, econtext=0x29c1b70, expectedDesc=0x29c1ed0, returnDesc=0x7fff4a9bc6d0) at execQual.c:1566
#2  0x00000000005330d2 in FunctionNext (node=0x29bf620) at nodeFunctionscan.c:68
#3  0x000000000052881c in ExecScan (node=0x7fc03f6c5370, accessMtd=0x533030 <FunctionNext>) at execScan.c:68
#4  0x0000000000521f6d in ExecProcNode (node=0x29bf620) at execProcnode.c:356
#5  0x000000000052ca40 in ExecAgg (node=0x29c17f0) at nodeAgg.c:874
#6  0x0000000000521fed in ExecProcNode (node=0x29c17f0) at execProcnode.c:394
#7  0x0000000000520ffd in ExecutorRun (queryDesc=<value optimized out>, direction=ForwardScanDirection, count=0) at execMain.c:1335
#8  0x00000000005ba0d6 in PortalRunSelect (portal=0x29b47a0, forward=<value optimized out>, count=0, dest=0x29af198) at pquery.c:943
#9  0x00000000005bb159 in PortalRun (portal=0x29b47a0, count=9223372036854775807, isTopLevel=1 '\001', dest=0x29af198, altdest=0x29af198, completionTag=0x7fff4a9bcf40 "") at pquery.c:769
#10 0x00000000005b6d2d in exec_simple_query (query_string=0x2969070 "select count(*) as pages from pg_freespacemap_pages ") at postgres.c:1004
#11 0x00000000005b8071 in PostgresMain (argc=4, argv=<value optimized out>, username=0x28bf4b0 "postgres") at postgres.c:3631
#12 0x000000000058ca1b in ServerLoop () at postmaster.c:3207
#13 0x000000000058d73e in PostmasterMain (argc=5, argv=0x28ba310) at postmaster.c:1029
#14 0x0000000000544c15 in main (argc=5, argv=<value optimized out>) at main.c:188

As you can see, contrib/pg_freespacemap is involved. A munin plugin has sent
the query "select count(*) as pages from pg_freespacemap_pages" every 5
minutes (for a year now), and we graph the result.

I notice that, since our first crash, the graph shows our freespacemap as
empty (a few thousand pages). Sometimes the number of pages increases, and
then we get a crash.

If you want more detail, ask me ...

Thanks,

PS: Sorry for my poor English

Re: BUG #5004: pg_freespacemap make a SegFault

From

Tom Lane

Date:

21 August 2009, 13:52:45

"Sebastien Lardiere" <slardiere@hi-media.com> writes:
> Description:        pg_freespacemap make a SegFault

There's a post-8.3.7 fix that might cure this:

http://archives.postgresql.org/pgsql-committers/2009-04/msg00108.php

            regards, tom lane

Re: BUG #5004: pg_freespacemap make a SegFault

From

hubert depesz lubaczewski

Date:

22 August 2009, 14:52:58

On Fri, Aug 21, 2009 at 04:26:11PM +0000, Sebastien Lardiere wrote:
>
> The following bug has been logged online:
>
> Bug reference:      5004
> Logged by:          Sebastien Lardiere
> Email address:      slardiere@hi-media.com
> PostgreSQL version: 8.3.7
> Operating system:   Debian Etch
> Description:        pg_freespacemap make a SegFault
> Details:
>
> I've got a crash with a cluster. Nothing found in the logfile, but a message
> about a Segfault, so I get a coredump and here is the backtrace :

Can you check if you had any vacuums running at the time of crash?

It might be in logs, something like:
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
STATEMENT:  vacuum

If yes, how many vacuum jobs were there?

depesz

--
Linkedin: http://www.linkedin.com/in/depesz  /  blog: http://www.depesz.com/
jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007

Re: BUG #5004: pg_freespacemap make a SegFault

From

Sébastien Lardière

Date:

24 August 2009, 11:10:07

On 21/08/2009 18:51, Tom Lane wrote:
> "Sebastien Lardiere"<slardiere@hi-media.com>  writes:
>
>> Description:        pg_freespacemap make a SegFault
>>
> There's a post-8.3.7 fix that might cure this:
>
> http://archives.postgresql.org/pgsql-committers/2009-04/msg00108.php
>
>             regards, tom lane
>

OK, I'll try to apply this patch.

Thanks,

--
Sébastien Lardière

Re: BUG #5004: pg_freespacemap make a SegFault

From

Sébastien Lardière

Date:

24 August 2009, 11:10:53

On 22/08/2009 19:52, hubert depesz lubaczewski wrote:
> On Fri, Aug 21, 2009 at 04:26:11PM +0000, Sebastien Lardiere wrote:
>
>> The following bug has been logged online:
>>
>> Bug reference:      5004
>> Logged by:          Sebastien Lardiere
>> Email address:      slardiere@hi-media.com
>> PostgreSQL version: 8.3.7
>> Operating system:   Debian Etch
>> Description:        pg_freespacemap make a SegFault
>> Details:
>>
>> I've got a crash with a cluster. Nothing found in the logfile, but a message
>> about a Segfault, so I get a coredump and here is the backtrace :
>>
> Can you check if you had any vacuums running at the time of crash?
>

Yes, autovacuum is on. It wasn't a "normal" vacuum during the crash, but
the last.

Nevertheless, the day before the first crash, I did a big delete of 23
million rows, and pg_freespacemap showed a big increase in the number of
pages in the FSM. Since then, when the number of pages in the FSM increases,
Pg crashes; but:

> It might be in logs, something like:
> WARNING:  terminating connection because of crash of another server process
> DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
> HINT:  In a moment you should be able to reconnect to the database and repeat your command.
> STATEMENT:  vacuum
>
> if yes - how many vacuum jobs there were?
>
>

I have never seen these messages with vacuum in the logs; Pg always crashes
with the query:

"select count(*) as pages from pg_freespacemap_pages"

We can see the behavior in Munin (graph attached):

The big increase, then the first crash, and then, each time there is a
significant increase, a crash, with a reset of the FSM.

I have disabled the plugin, so there are no more queries against
pg_freespacemap, and no more crashes.

--
Sébastien Lardière

Attachment

bdd1-pg_fsm-week.png