Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows - Mailing list pgsql-bugs

From Pawel Kudzia
Subject Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows
Date
Msg-id CAJYBUS_5Z+9aHyNQ7wkTEkZF+c_RMn7hMjPkTGU0Uc7OHMU3sQ@mail.gmail.com
Whole thread Raw
In response to Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows  (Pawel Kudzia <kudzia@gmail.com>)
Responses Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows  (Pawel Kudzia <kudzia@gmail.com>)
List pgsql-bugs
Based on the suggestions we've received from IRC we've:
* Moved some of our postgresql instances to another
  physical server,
* Enabled data_checksums on all instances,
* Upgraded upgraded our servers to 13.3
  (Debian 13.3-1.pgdg100+1) with hope that
https://github.com/postgres/postgres/commit/0d779d22a290a89b6c892137a37280b9588ad0cc
  addresses the issue [ although it's ts-vector
  specific, so unlikely to be related to our issue ].

Yet soon after we've observe another inconsistency.

Recap of the problem - SELECTs return rows, based on GIN,
that do not actually meet criteria expressed in WHERE.


SELECT entity_id,attribute_name_ids FROM entity WHERE
( attribute_name_ids && '{7572}' ) AND NOT
( (attribute_name_ids||0) && '{7572}') LIMIT 100 ;


 entity_id |                                   attribute_name_ids
-----------+----------------------------------------------------------------------------------------
  22327791 | {1,2,3,4,6,8,9,10,11,13,14,17,19,21,35,72,366,1659,2208,2211,3270,3273,3279,5787,7650}

(1 row)

EXPLAIN ANALYZE of the query above:


QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=35.46..46.67 rows=100 width=132) (actual
time=307.221..335.842 rows=1 loops=1)
   ->  Bitmap Heap Scan on entity  (cost=35.46..3705.32 rows=32724
width=132) (actual time=307.220..335.839 rows=1 loops=1)
         Recheck Cond: (attribute_name_ids && '{7572}'::integer[])
         Rows Removed by Index Recheck: 72012
         Filter: (NOT ((attribute_name_ids || 0) && '{7572}'::integer[]))
         Rows Removed by Filter: 21501
         Heap Blocks: exact=8998 lossy=9257
         ->  Bitmap Index Scan on entity_attribute_name_ids_gin
(cost=0.00..35.46 rows=32889 width=0) (actual time=19.790..19.790
rows=115485 loops=1)
               Index Cond: (attribute_name_ids && '{7572}'::integer[])
 Planning Time: 0.068 ms
 Execution Time: 335.879 ms
(11 rows)

Relevant part of the table structure:

CREATE TABLE public.entity (
    entity_id bigint NOT NULL,
    attribute_value_ids integer[] NOT NULL,
    attribute_name_ids integer[] NOT NULL,
[..]
);


ALTER TABLE ONLY public.entity ADD CONSTRAINT entity_pkey PRIMARY KEY
(entity_id);
CREATE INDEX entity_attribute_name_ids_gin ON public.entity USING gin
(attribute_name_ids public.gin__int_ops);
CREATE INDEX entity_attribute_value_ids_gin ON public.entity USING gin
(attribute_value_ids public.gin__int_ops);


There are at leat tens of updates / inserts per second to the table all the
time. the issue above manifests very rarely, after few days if not weeks of
uptime. we did not find any deterministic way of reproducing it, but it's
a matter of time till it shows.

The issue persist is related to on-disk data, it replicates from
streaming replication
masters to slaves.

Thank you in advance for your suggestions how to tackle this.

-- 
regards,
Pawel Kudzia



pgsql-bugs by date:

Previous
From: Pantelis Theodosiou
Date:
Subject: Re: BUG #17060: ERROR: column "rownum" does not exist
Next
From: Pawel Kudzia
Date:
Subject: Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows