Re: GIN data corruption bug(s) in 9.6devel - Mailing list pgsql-hackers

From	Jeff Janes
Subject	Re: GIN data corruption bug(s) in 9.6devel
Date	November 6, 2015 01:44:35
Msg-id	CAMkU=1w5x8rY5EvWieJsfZWB3eNGmFGHmOBf9r5VLWDTW72b2g@mail.gmail.com
In response to	GIN data corruption bug(s) in 9.6devel (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses	Re: GIN data corruption bug(s) in 9.6devel (Tomas Vondra <tomas.vondra@2ndquadrant.com>) Re: GIN data corruption bug(s) in 9.6devel (Peter Geoghegan <pg@heroku.com>)
List	pgsql-hackers

On Thu, Nov 5, 2015 at 2:18 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> Hi,
>
> while repeating some full-text benchmarks on master, I've discovered
> that there's a data corruption bug somewhere. What happens is that while
> loading data into a table with GIN indexes (using multiple parallel
> connections), I sometimes get this:
>
> TRAP: FailedAssertion("!(((PageHeader) (page))->pd_special >=
> (__builtin_offsetof (PageHeaderData, pd_linp)))", File: "ginfast.c",
> Line: 537)
> LOG:  server process (PID 22982) was terminated by signal 6: Aborted
> DETAIL:  Failed process was running: autovacuum: ANALYZE messages
>
> The details of the assert are always exactly the same - it's always
> autovacuum and it trips on exactly the same check. And the backtrace
> always looks like this (full backtrace attached):
>
> #0  0x00007f133b635045 in raise () from /lib64/libc.so.6
> #1  0x00007f133b6364ea in abort () from /lib64/libc.so.6
> #2  0x00000000007dc007 in ExceptionalCondition
> (conditionName=conditionName@entry=0x81a088 "!(((PageHeader)
> (page))->pd_special >= (__builtin_offsetof (PageHeaderData, pd_linp)))",
>        errorType=errorType@entry=0x81998b "FailedAssertion",
> fileName=fileName@entry=0x83480a "ginfast.c",
> lineNumber=lineNumber@entry=537) at assert.c:54
> #3  0x00000000004894aa in shiftList (stats=0x0, fill_fsm=1 '\001',
> newHead=26357, metabuffer=130744, index=0x7f133c0f7518) at ginfast.c:537
> #4  ginInsertCleanup (ginstate=ginstate@entry=0x7ffd98ac9160,
> vac_delay=vac_delay@entry=1 '\001', fill_fsm=fill_fsm@entry=1 '\001',
> stats=stats@entry=0x0) at ginfast.c:908
> #5  0x00000000004874f7 in ginvacuumcleanup (fcinfo=<optimized out>) at
> ginvacuum.c:662
> ...
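
For readers unfamiliar with the check that trips in the trace above: the failed condition requires that a page's pd_special offset not point inside the page header. A minimal sketch of that comparison follows, assuming a simplified header struct. ModelPageHeader, MODEL_PAGE_SIZE, model_page_init and model_special_is_sane are hypothetical names for illustration only; this is not PostgreSQL's actual PageHeaderData layout or code. In this model an all-zero page fails the check; whether the real failure involves a zeroed or a reused page is not something the trace establishes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_PAGE_SIZE 8192

/* Simplified stand-in for a page header; NOT PostgreSQL's real layout. */
typedef struct
{
    uint16_t pd_lower;    /* offset to start of free space */
    uint16_t pd_upper;    /* offset to end of free space */
    uint16_t pd_special;  /* offset to start of special space */
    uint16_t pd_linp[];   /* line pointers would begin here */
} ModelPageHeader;

/* Hypothetical helper: zero the page and reserve special space at its end. */
static void
model_page_init(void *page, size_t special_size)
{
    ModelPageHeader *hdr = (ModelPageHeader *) page;

    assert(special_size <= MODEL_PAGE_SIZE - offsetof(ModelPageHeader, pd_linp));
    memset(page, 0, MODEL_PAGE_SIZE);
    hdr->pd_lower = (uint16_t) offsetof(ModelPageHeader, pd_linp);
    hdr->pd_upper = (uint16_t) (MODEL_PAGE_SIZE - special_size);
    hdr->pd_special = hdr->pd_upper;
}

/* Same shape as the failed condition: pd_special must not point into the header. */
static int
model_special_is_sane(const void *page)
{
    const ModelPageHeader *hdr = (const ModelPageHeader *) page;

    return hdr->pd_special >= offsetof(ModelPageHeader, pd_linp);
}
```

Calling model_special_is_sane on a buffer that has only been zeroed returns 0, while the same buffer after model_page_init(page, 16) returns 1.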

This looks like it is probably the same bug discussed here:

http://www.postgresql.org/message-id/CAMkU=1xALfLhUUohFP5v33RzedLVb5aknNUjcEuM9KNBKrB6-Q@mail.gmail.com

And here:

http://www.postgresql.org/message-id/56041B26.2040902@sigaev.ru

The bug theoretically exists in 9.5, but it was not until 9.6 (commit
e95680832854cf300e64c) that free pages were recycled aggressively
enough for it to become likely to be hit.

There are some proposed patches in those threads, but discussion on
them seems to have stalled out.  Can you try one and see if it fixes
the problems you are seeing?

Cheers,

Jeff
