Re: Duplicate Item Pointers in Gin index - Mailing list pgsql-hackers

From R, Siva
Subject Re: Duplicate Item Pointers in Gin index
Date
Msg-id 1529016918726.65758@amazon.com
In response to Re: Duplicate Item Pointers in Gin index  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Tue, Jun 12, 2018 at 11:01 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:

> FWIW, I've looked at this again. I think that the situation Siva
> reported in the first mail can happen before we get commit 3b2787e.
> That is, gin indexes had had a data corruption bug. I've reproduced
> the situation with PostgreSQL 10.1 and observed that a gin index can
> corrupt.

Thank you so much for trying the steps out and reproducing the issue!
It is good that the correctness of query results is not affected when reading
from a gin index that has duplicates.

One place where an assertion will fail is when we compress the posting list [1],
but since it is a debug assert, it is compiled out in release builds.
There, the correctness of the varbyte encoding is not affected,
but the encoded page will contain redundant zero deltas, one per duplicate.
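
To make the zero-delta point concrete, here is a minimal sketch of delta plus
varbyte encoding. It is not the code in ginpostinglist.c: plain integers stand
in for item pointers, and the names encode_varbyte, compress and decompress are
made up for this illustration. The only thing it is meant to show is that a
duplicate encodes as a single 0x00 byte and still round-trips correctly.

```python
def encode_varbyte(delta: int) -> bytes:
    """Encode a non-negative integer 7 bits at a time, low bits first.
    The high bit of each byte says whether another byte follows."""
    out = bytearray()
    while delta > 0x7F:
        out.append(0x80 | (delta & 0x7F))
        delta >>= 7
    out.append(delta)
    return bytes(out)


def decode_varbyte(buf: bytes, pos: int):
    """Decode one varbyte integer starting at pos; return (value, next_pos)."""
    val = 0
    shift = 0
    while True:
        b = buf[pos]
        pos += 1
        val |= (b & 0x7F) << shift
        if not (b & 0x80):
            return val, pos
        shift += 7


def compress(items):
    """Keep the first item as-is and store the rest as varbyte deltas.
    A debug build would assert val > prev here; this sketch tolerates
    val == prev, which is what a release build effectively does."""
    first = items[0]
    out = bytearray()
    prev = first
    for val in items[1:]:
        out += encode_varbyte(val - prev)
        prev = val
    return first, bytes(out)


def decompress(first, data):
    """Rebuild the item list by summing the decoded deltas."""
    items = [first]
    pos = 0
    while pos < len(data):
        delta, pos = decode_varbyte(data, pos)
        items.append(items[-1] + delta)
    return items


if __name__ == "__main__":
    first, data = compress([100, 105, 105, 300])  # 105 appears twice
    print(data.hex())              # the duplicate shows up as a 00 byte
    print(decompress(first, data))
```

The duplicate costs one extra byte and decodes back to the same duplicated
item, so nothing is lost or misread; the space is simply wasted.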

When we repack leaf items [2], this may cause the creation of a new segment
if all the items (including the duplicates) do not fit in a single page.
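
A rough way to picture the extra segment is the sketch below. It is a toy
greedy packer, not the logic in gindatapage.c: the byte budget (max_bytes), the
6-byte cost assumed for the uncompressed first item of each segment, and the
function names are all assumptions chosen for illustration. It only shows that
the one byte per duplicate can be enough to push the tail of a list into a
second segment.

```python
def varbyte_len(delta: int) -> int:
    """Number of bytes a varbyte encoding (7 payload bits per byte) needs."""
    n = 1
    while delta > 0x7F:
        delta >>= 7
        n += 1
    return n


def count_segments(items, max_bytes: int, header_bytes: int = 6) -> int:
    """Greedily pack sorted items into segments of at most max_bytes.
    Each segment pays header_bytes for its first item (assumed cost) and
    then one varbyte-encoded delta per following item."""
    segments = 0
    used = None   # bytes used in the open segment, None if no segment is open
    prev = None
    for val in items:
        if used is not None:
            need = varbyte_len(val - prev)
            if used + need <= max_bytes:
                used += need
                prev = val
                continue
        # Either nothing is open yet or the item does not fit: start a segment.
        segments += 1
        used = header_bytes
        prev = val
    return segments


if __name__ == "__main__":
    unique = [10, 11, 12, 13, 14]
    with_dups = [10, 11, 11, 12, 12, 13, 14]
    print(count_segments(unique, max_bytes=10))
    print(count_segments(with_dups, max_bytes=10))
```

With the hypothetical 10-byte budget, the list without duplicates fits in one
segment, while the same list with two duplicated items needs a second one.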

[1] -
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/gin/ginpostinglist.c;h=8d2d31ac7236de5e2c62c6fa745af45a2b895b2c;hb=refs/heads/REL_10_STABLE#l209
[2] -
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/gin/gindatapage.c;h=2e5ea479763f32d6dc637e1c27d5975d124f293f;hb=refs/heads/REL_10_STABLE#l1601

Best
Siva
