Re: BUG #8656: Duplicate data violating unique constraints - Mailing list pgsql-bugs

From Maciek Sakrejda
Subject Re: BUG #8656: Duplicate data violating unique constraints
Date
Msg-id CAKwe89CLSkt=BgygyQoAMw0rSXM5DuDDpyTbNd7sYDzM0SQNOQ@mail.gmail.com
Whole thread Raw
In response to Re: BUG #8656: Duplicate data violating unique constraints  (Maciek Sakrejda <maciek@heroku.com>)
List pgsql-bugs
Last night, I ran into an issue on a promoted 9.3.2 replica from this
instance (the original was a 9.3.1, but no 9.3.1 replica was ever promoted
from it, just 9.3.2), and a few folks on IRC were incredibly helpful in
debugging the issue (thanks again), but I don't think we came to any clear
conclusion. The final verdict was that a dump/restore is the best bet and
I'm working with the customer to do that, but I want to make sure that the
issues here fit the symptoms of the bugs fixed in 9.3.2, and that there's
nothing else lurking that may bite us in the future. Based on the earlier
comments in this thread, I got the impression that that's not quite the
case. I'm happy to dig in further if that's useful, but I'm not sure what
I'm looking for.

Anyway, the symptoms last night were basically intermittent failures in
creating temporary tables. I ran a statement like:

=> CREATE TEMPORARY TABLE foo(id INTEGER NOT NULL, bar INTEGER, PRIMARY KEY
(id)) ON COMMIT DROP;

and occasionally this would work, but sometimes yield errors like:

ERROR:  could not read block 20 in file "base/16385/12661": read only 0 of
8192 bytes

12661 is pg_type, and the other table that showed up in the error is
pg_constraint. Block 20 here is beyond the end of the table
(pg_relation_size was 155648), and I got similar errors for pg_constraint.
The error was intermittent (sometimes creating the table worked fine) and
the block that could not be read was one to three blocks beyond the end of
the table. I did not see any other errors.

Based on guidance in IRC, I checked the raw page information for the tables
and all their indexes with the pageinspect extension,

select i.* from generate_series(0,18) bn, get_raw_page('pg_type', bn) p,
heap_page_items(p) i;
select b.* from generate_series(1,3) bn, bt_page_items('pg_type_oid_index',
bn) b;

and did not see any blocks outside of the expected range referenced there.

I also checked the FSM for both tables with the pg_freespacemap extension:

select * from pg_freespace('pg_type');

but did not find any blocks outside of the expected range for either.

I also checked relfrozenxid, relminmxid in pg_class for the tables showing
errors:

relname | relfrozenxid | relminmxid
---------------+--------------+------------
 pg_type | 1845 | 1
 pg_constraint | 1845 | 1

select txid_current() was 4011697 at the time.

I think this is all the relevant info from the IRC session. As before,
there was no explicit VACUUM FREEZE and no changes to autovacuum settings.

So, does this fit the bill for the issues fixed in 9.3.2? As I understand
it, the fact that the current txid is much lower than the wraparound
threshold suggests no, right? If not, are there more diagnostics I can run
to learn anything here? If there's another bug to find here, I'd love to
help find it.

Thanks,
Maciek

pgsql-bugs by date:

Previous
From: "Erik Rijkers"
Date:
Subject: Re: BUG #8679: Error in regex function
Next
From: arjsia@gmail.com
Date:
Subject: BUG #8682: sqlstate = 28000