Re: BUG #9741: Mininal case for the BUG #9735: Error: "ERROR: tuple offset out of range: 0" during bitmap scan - Mailing list pgsql-bugs
From | Maxim Boguk |
---|---|
Subject | Re: BUG #9741: Mininal case for the BUG #9735: Error: "ERROR: tuple offset out of range: 0" during bitmap scan |
Date | |
Msg-id | CAK-MWwQe_tihMYEa=O8hp6d14AQnwdmPLpEcgQaUWpx9xPMEcw@mail.gmail.com Whole thread Raw |
In response to | Re: BUG #9741: Mininal case for the BUG #9735: Error: "ERROR: tuple offset out of range: 0" during bitmap scan (Maxim Boguk <maxim.boguk@gmail.com>) |
List | pgsql-bugs |
On Thu, Mar 27, 2014 at 11:27 PM, Maxim Boguk <maxim.boguk@gmail.com> wrote: > > > > On Thu, Mar 27, 2014 at 10:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Maxim Boguk <maxim.boguk@gmail.com> writes: >> > select * from bt_page_items('transactions_pkey', 88472) where >> itemoffset in >> > (93,94); >> > itemoffset | ctid | itemlen | nulls | vars | data >> > >> ------------+-------------+---------+-------+------+------------------------- >> > 93 | (2413168,1) | 16 | f | f | 7c c2 2c 03 00 00 >> 00 00 >> > 94 | (2337931,0) | 16 | f | f | 7c c2 2c 03 00 00 >> 00 00 >> >> This is sort of what I expected. Surely reindexing the index will get >> rid of that bogus entry? >> >> regards, tom lane >> > > Yep reindexing that particular index fixed problem. > But this index had been reindexed just 2 days ago in the same situation, > and now problem reappear (with different table row/index entries). > > If problem reappear after the last reindex and with 9.3.4 version I will > investigate future. > > > It would be interesting know how simple integer btree index could have > entry with definitely wrong ctid? > It very strange for problem reappear with the only one table (there are > couple other tables in the database with similar load but with no issues at > all), it looks too selective and repeatable for a hardware glitch. > Hi, Situation reappeared just in 3 hours after reindex (and no database crash/restart happens during that period). Problem found via manual search of the potentially troublesome id's: Step 1: search potentially suspicious index entry/id's (with index ctid have offset zero): select *,(get_byte(val, 0))+(get_byte(val, 1)<<8)+(get_byte(val, 2)<<16)+(get_byte(val, 3)<<24) as id from (select gs.i,t.*,(E'\\x'||replace(trim(trailing ' 0' from data), ' ', ''))::bytea as val from (select generate_series(1,90000) as i) as gs, bt_page_items('transactions_pkey', i) as t where ctid::text like '%,0)') as _t order by id,i,itemoffset; i | itemoffset | ctid | itemlen | nulls | vars | data | val | id -------+------------+-------------+---------+-------+------+-------------------------+------------+---------- 83604 | 60 | (2233938,0) | 16 | f | f | df aa 11 03 00 00 00 00 | \xdfaa1103 | 51489503 83648 | 92 | (2245568,0) | 16 | f | f | e7 e9 11 03 00 00 00 00 | \xe7e91103 | 51505639 83819 | 37 | (2253801,0) | 16 | f | f | bc dc 12 03 00 00 00 00 | \xbcdc1203 | 51567804 83907 | 184 | (2256376,0) | 16 | f | f | 1f 5b 13 03 00 00 00 00 | \x1f5b1303 | 51600159 ... Found 25 rows. Now situation become much worse: the test query ( set enable_indexscan to 0; select * from transactions where id in (25 rows from previous query result); ) produce the same error on the master database as well as on the replicas. And reindex index transactions_pkey; doesn't help. However, filenode of index changed for sure (so reindex happened): select *,pg_relation_filenode('transactions_pkey'),pg_relation_filepath('transactions_pkey') from pg_stat_file(pg_relation_filepath('transactions_pkey')); size | access | modification | change | creation | isdir | pg_relation_filenode | pg_relation_filepath -----------+------------------------+------------------------+------------------------+----------+-------+----------------------+---------------------- 760471552 | 2014-03-28 04:02:22+04 | 2014-03-28 04:16:34+04 | 2014-03-28 04:16:34+04 | | f | 408909 | base/16424/408909 reindex index transactions_pkey; REINDEX select *,pg_relation_filenode('transactions_pkey'),pg_relation_filepath('transactions_pkey') from pg_stat_file(pg_relation_filepath('transactions_pkey')); size | access | modification | change | creation | isdir | pg_relation_filenode | pg_relation_filepath -----------+------------------------+------------------------+------------------------+----------+-------+----------------------+---------------------- 760438784 | 2014-03-28 04:18:20+04 | 2014-03-28 04:19:39+04 | 2014-03-28 04:19:39+04 | | f | 408931 | base/16424/408931 set enable_indexscan to 0; SET select id from transactions where id in (51489503,51505639,51567804,... 25 total); ERROR: tuple offset out of range: 0 Any suggestions what I can do next and where problem could be? Now it doesn't seems replication issue... may be some repeatable low level database corruption? But I have no idea what type corruption could produce so weird results. Kind Regards, Maskym
pgsql-bugs by date: