GIN data corruption bug(s) in 9.6devel - Mailing list pgsql-hackers

From Tomas Vondra
Subject GIN data corruption bug(s) in 9.6devel
Date
Msg-id 563BD59A.1040206@2ndquadrant.com
Whole thread Raw
Responses Re: GIN data corruption bug(s) in 9.6devel  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-hackers
Hi,

while repeating some full-text benchmarks on master, I've discovered
that there's a data corruption bug somewhere. What happens is that while
loading data into a table with GIN indexes (using multiple parallel
connections), I sometimes get this:

TRAP: FailedAssertion("!(((PageHeader) (page))->pd_special >=
(__builtin_offsetof (PageHeaderData, pd_linp)))", File: "ginfast.c",
Line: 537)
LOG:  server process (PID 22982) was terminated by signal 6: Aborted
DETAIL:  Failed process was running: autovacuum: ANALYZE messages

The details of the assert are always exactly the same - it's always
autovacuum and it trips on exactly the same check. And the backtrace
always looks like this (full backtrace attached):

#0  0x00007f133b635045 in raise () from /lib64/libc.so.6
#1  0x00007f133b6364ea in abort () from /lib64/libc.so.6
#2  0x00000000007dc007 in ExceptionalCondition
(conditionName=conditionName@entry=0x81a088 "!(((PageHeader)
(page))->pd_special >= (__builtin_offsetof (PageHeaderData, pd_linp)))",
        errorType=errorType@entry=0x81998b "FailedAssertion",
fileName=fileName@entry=0x83480a "ginfast.c",
lineNumber=lineNumber@entry=537) at assert.c:54
#3  0x00000000004894aa in shiftList (stats=0x0, fill_fsm=1 '\001',
newHead=26357, metabuffer=130744, index=0x7f133c0f7518) at ginfast.c:537
#4  ginInsertCleanup (ginstate=ginstate@entry=0x7ffd98ac9160,
vac_delay=vac_delay@entry=1 '\001', fill_fsm=fill_fsm@entry=1 '\001',
stats=stats@entry=0x0) at ginfast.c:908
#5  0x00000000004874f7 in ginvacuumcleanup (fcinfo=<optimized out>) at
ginvacuum.c:662
...

It's not perfectly deterministic - sometimes I had repeat the whole load
multiple times (up to 5x, and each load takes ~30minutes).

I'm pretty sure this is not external issue, because I've reproduced it
on a different machine (different CPU / kernel / libraries / compiler).
It's however interesting that on the other machine I've also observed a
different kind of lockups, where the sessions get stuck on semop() in
gininsert (again, full backtrace attached):

#0  0x0000003f3d4eaf97 in semop () from /lib64/libc.so.6
#1  0x000000000067a41f in PGSemaphoreLock (sema=0x7f93290405d8) at
pg_sema.c:387
#2  0x00000000006df613 in LWLockAcquire (lock=0x7f92a4dce900,
mode=LW_EXCLUSIVE) at lwlock.c:1049
#3  0x00000000004878c6 in ginHeapTupleFastInsert
(ginstate=0x7ffd969c88f0, collector=0x7ffd969caeb0) at ginfast.c:250
#4  0x000000000047423a in gininsert (fcinfo=<value optimized out>) at
gininsert.c:531
...

I'm not sure whether this is a different manifestation of the same issue
or another bug. The systems are not exactly the same - one has a single
socket (i5) while the other one has 2 (Xeon), the compilers and kernels
are different and so on.

I've also seen cases when the load seemingly completed OK, but trying to
dump the table to disk using COPY resulted in

        ERROR:  compressed data is corrupted

which I find rather strange as there was no previous error, and also
COPY should only dump table data (while the asserts were in GIN code
handling index pages, unless I'm mistaken). Seems like a case of
insufficient locking where two backends scribble on the same page
somehow, and then autovacuum hits the assert. Ot maybe not, not sure.

I've been unable to reproduce the issue on REL9_5_STABLE (despite
running the load ~20x on each machine), so that seems safe, and the
issue was introduced by some of the newer commits.

I've already spent too much CPU time on this, so perhaps someone with
better knowledge of the GIN code can take care of this. To reproduce it
you may use the same code I did - it's available here:

        https://bitbucket.org/tvondra/archie

it's a PoC of database with pgsql mailing lists with fulltext. It's a
bit messy, but it's rather simple

        1) clone the repo

           $ git clone https://bitbucket.org/tvondra/archie.git

        2) create a directory for downloading the mbox files

           $ mkdir archie-data

        3) download the mbox files (~4.5GB of data) using the download
           script (make sure archie/bin is on PATH)

           $ cd archie-data
           $ export PATH=../archie/bin:$PATH
           $ ../archie/download

        4) use "run" scipt (attached) to run the load n-times on a given
           commit

           $ run.sh master 10

           NOTICE: The run script is the messiest one, you'll have to
                   edit it to fix paths etc.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Attachment

pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: [PATCH] Skip ALTER x SET SCHEMA if the schema didn't change
Next
From: Jeff Janes
Date:
Subject: Re: GIN data corruption bug(s) in 9.6devel