Thread: Duplicate values found when reindexing unique index

Duplicate values found when reindexing unique index

From
"Mason Hale"
Date:
Hello --

I noticed recently some errors in my postgres log like the following:

ERROR:  could not open segment 9 of relation 1663/16830/1384385 (target
block 776929292): No such file or directory

These are occurring at a rate of 1 every 2-3 days. But that rate has been
increasing.

After Googling around, I found this error could indicate index corruption.

I found that the relation with oid 1384385 was the following
index: "index_entry_on_guid_and_feed_id" UNIQUE, btree (guid, feed_id) -- on
a table named entry.

When I tried to rebuild that index, I got the following error:

prod_2=# reindex index index_entry_on_guid_and_feed_id;
ERROR:  could not create unique index
DETAIL:  Table contains duplicated values.

Now, if that index was in fact corrupted, the perhaps it is not such a
surprise that the duplicate values are there. The thing that makes this more
mysterious is that we have another unique index on entry(feed_id, guid) --
and I have no reason to believe it is corrupted (although I guess it still
could be).

Please note: I have previously reported another issue with duplicate values
making into into a table despite having a unique index in place (bug #3724).
This issue involves a completely different table.

We can stay running despite this error but I would very much like to track
down the cause as soon as possible. To avoid destroying any evidence, I'm
going to leave things intact and await further instructions.

Given my experience, the reliability of unique indexes is becoming somewhat
suspect. Please help. ;-)

thanks in advance,
Mason

Re: Duplicate values found when reindexing unique index

From
Tom Lane
Date:
"Mason Hale" <masonhale@gmail.com> writes:
> Given my experience, the reliability of unique indexes is becoming somewhat
> suspect. Please help. ;-)

Well, as in the previous report, there is not enough information here to
offer much chance of understanding what's going wrong.

Have you tried reindexing that other index with the same columns in the
other order?  My guess is that there really are duplicate entries in the
table; if so the other one should fail too.  If so, please try to
identify the duplicated values, along the lines of

    select guid, feed_id from entry group by 1,2 having count(*) > 1

and show us the system columns (ctid,xmin,xmax,cmin,cmax) from the
tuples having duplicate value(s).  Note that you should probably disable
indexscan and bitmapscan while doing this probing, so as not to have the
queries use the suspect indexes.

This is 8.2.5 right?  Was the DB loaded fresh into 8.2.5, or was it
inherited from previous 8.2.x release(s)?

BTW, what are the datatypes of the index columns?

            regards, tom lane

Re: Duplicate values found when reindexing unique index

From
Tom Lane
Date:
"Mason Hale" <masonhale@gmail.com> writes:
> I found a single pair of rows that were duplicated. Interestingly it was not
> just the guid and feed_id that were duplicated but all columns were
> indentical, including the primary key, except an update_at column which is
> automatically populated via a trigger (BEFORE UPDATE on entry FOR EACH ROW).

> The duplicate data included a created_at column which defaults to now() --
> that the two duplicate rows have exactly the same values strongly hints to
> me that the duplicates were created during the same transaction.

Hmm, what sounds more likely to me is that both of those rows are
updated versions of the same original row.

> Here's the system column data you requested.

>     id     |     ctid     | xmin | xmax | cmin | cmax
> -----------+--------------+------+------+------+------
>  151341072 | (1508573,11) |    2 |    0 |   19 |    0
>  151341072 | (1818219,11) |    2 |    0 |   19 |    0
> (2 rows)

I wonder whether it's just a coincidence that these have the same offset
number...

The fact that both rows have been frozen (xmin=2) means we can't really
tell whether they originated in the same transaction or not.

Can you show us all the triggers on this table?  Also, it would be real
interesting to see "pg_filedump -i -f" output for the two blocks in
question (1508573 and 1818219) --- see http://sources.redhat.com/rhdb/
to get a copy of pg_filedump.

            regards, tom lane

Re: Duplicate values found when reindexing unique index

From
Tom Lane
Date:
"Mason Hale" <masonhale@gmail.com> writes:
> I have downloaded, compiled and installed pg_filedump -- but I am not sure
> how to determine which file I should have it dump. I am not very familiar
> with the postgres file structure. Can you please provide some guidance? How
> do I determine the correct file?

> I've determined the relation "entry" has an oid = 16838 -- but the
> /data/base/16830 directory contains 92 1GB files named 16838.[1-92]

In the first place, it's relfilenode not OID to look at --- these
are often the same but not always.

Once you've got past that, there are 131072 blocks per file segment,
so logical block 1508573 would translate to segment .11 block 66781.
pg_filedump is too stupid to do that arithmetic for you :-(

            regards, tom lane

Re: Duplicate values found when reindexing unique index

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

>> Here's the system column data you requested.
>
>>     id     |     ctid     | xmin | xmax | cmin | cmax
>> -----------+--------------+------+------+------+------
>>  151341072 | (1508573,11) |    2 |    0 |   19 |    0
>>  151341072 | (1818219,11) |    2 |    0 |   19 |    0
>> (2 rows)
>
> I wonder whether it's just a coincidence that these have the same offset
> number...

I can't imagine any Postgres bug which would depend on the offsets being the
same. But what I could imagine is filesystem corruption which copied the block
to someplace else in the table or possibly has even mapped the same block into
two different places in the table.

Can you unmount the filesystem and run "fsck -v -n" on it? Is this the only
duplicate record in the table? Are there any other records on either of these
blocks?

To answer the latter question I found a handy trick of converting the tid to a
"point" so I could refer to the block or offset. In 8.3 this looks like:

select ctid from foo where (ctid::text::point)[0]  = 0;

But in 8.2 iirc you had to call the tid output function explicitly because
there was no cast to text.


--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

Re: Duplicate values found when reindexing unique index

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> I wonder whether it's just a coincidence that these have the same offset
>> number...

> I can't imagine any Postgres bug which would depend on the offsets
> being the same. But what I could imagine is filesystem corruption
> which copied the block to someplace else in the table or possibly has
> even mapped the same block into two different places in the table.

That idea was in my mind too, but Mason stated that the rows showed
different "updated_at" values, so they couldn't be mirror images of that
sort.  The pg_filedump output for the two blocks would be more conclusive
about this though --- I was expecting to pay attention to the whole
block contents not only the seemingly-dup rows.

            regards, tom lane

Re: Duplicate values found when reindexing unique index

From
"Mason Hale"
Date:
On Dec 30, 2007 12:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> "Mason Hale" <masonhale@gmail.com> writes:
> > Given my experience, the reliability of unique indexes is becoming
> somewhat
> > suspect. Please help. ;-)
>
> Well, as in the previous report, there is not enough information here to
> offer much chance of understanding what's going wrong.
>

Please just tell me what information you need and I will provide what I can.


> Have you tried reindexing that other index with the same columns in the
> other order?  My guess is that there really are duplicate entries in the
> table; if so the other one should fail too.  If so, please try to
> identify the duplicated values, along the lines of
>
>        select guid, feed_id from entry group by 1,2 having count(*) > 1
>
> and show us the system columns (ctid,xmin,xmax,cmin,cmax) from the
> tuples having duplicate value(s).  Note that you should probably disable
> indexscan and bitmapscan while doing this probing, so as not to have the
> queries use the suspect indexes.


I found a single pair of rows that were duplicated. Interestingly it was not
just the guid and feed_id that were duplicated but all columns were
indentical, including the primary key, except an update_at column which is
automatically populated via a trigger (BEFORE UPDATE on entry FOR EACH ROW).

The duplicate data included a created_at column which defaults to now() --
that the two duplicate rows have exactly the same values strongly hints to
me that the duplicates were created during the same transaction.

Here's the system column data you requested.

    id     |     ctid     | xmin | xmax | cmin | cmax
-----------+--------------+------+------+------+------
 151341072 | (1508573,11) |    2 |    0 |   19 |    0
 151341072 | (1818219,11) |    2 |    0 |   19 |    0
(2 rows)



> This is 8.2.5 right?  Was the DB loaded fresh into 8.2.5, or was it
> inherited from previous 8.2.x release(s)?
>

It is 8.2.5. It was loaded from a pg_dump from an 8.2.3 database into a
fresh 8.2.5 database on new hardware.


> BTW, what are the datatypes of the index columns?
>

id integer not null (serial)
guid character varying not null   ( no size limit defined )
feed_id integer not null


>
>                        regards, tom lane
>

thanks for the help Tom.

I do want to clear out one of the offending duplicates and reindex. But I'll
wait to do that until I get the okay from you.


Mason

Re: Duplicate values found when reindexing unique index

From
"Mason Hale"
Date:
Trolling through my server log I found this error:
2007-12-30 20:02:08 CST (10.11.199.136) PANIC:  right sibling's left-link
doesn't match
2007-12-30 20:02:08 CST (10.11.199.136) STATEMENT:  update bdu.entry set
title=$1, author=$2, description_type=$3, description_length=$4,
description=$5, published_at=$6, republished_at=$7, link=$8,
link_page_id=$9, link_count=$10, enclosure=$11, enclosure_page_id=$12,
enclosure_count=$13 where id=$14
2007-12-30 20:02:08 CST () LOG:  server process (PID 30004) was terminated
by signal 6
2007-12-30 20:02:08 CST () LOG:  terminating any other active server
processes

This seems related to the entry table -- so I wonder if it is related to
this problem?



On Dec 30, 2007 8:23 PM, Mason Hale <masonhale@gmail.com> wrote:

>
>
> On Dec 30, 2007 12:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> > "Mason Hale" <masonhale@gmail.com> writes:
> > > Given my experience, the reliability of unique indexes is becoming
> > somewhat
> > > suspect. Please help. ;-)
> >
> > Well, as in the previous report, there is not enough information here to
> > offer much chance of understanding what's going wrong.
> >
>
> Please just tell me what information you need and I will provide what I
> can.
>
>
> > Have you tried reindexing that other index with the same columns in the
> > other order?  My guess is that there really are duplicate entries in the
> >
> > table; if so the other one should fail too.  If so, please try to
> > identify the duplicated values, along the lines of
> >
> >        select guid, feed_id from entry group by 1,2 having count(*) > 1
> >
> > and show us the system columns (ctid,xmin,xmax,cmin,cmax) from the
> > tuples having duplicate value(s).  Note that you should probably disable
> > indexscan and bitmapscan while doing this probing, so as not to have the
> > queries use the suspect indexes.
>
>
> I found a single pair of rows that were duplicated. Interestingly it was
> not just the guid and feed_id that were duplicated but all columns were
> indentical, including the primary key, except an update_at column which is
> automatically populated via a trigger (BEFORE UPDATE on entry FOR EACH ROW).
>
>
> The duplicate data included a created_at column which defaults to now() --
> that the two duplicate rows have exactly the same values strongly hints to
> me that the duplicates were created during the same transaction.
>
> Here's the system column data you requested.
>
>     id     |     ctid     | xmin | xmax | cmin | cmax
> -----------+--------------+------+------+------+------
>  151341072 | (1508573,11) |    2 |    0 |   19 |    0
>  151341072 | (1818219,11) |    2 |    0 |   19 |    0
> (2 rows)
>
>
>
> > This is 8.2.5 right?  Was the DB loaded fresh into 8.2.5, or was it
> > inherited from previous 8.2.x release(s)?
> >
>
> It is 8.2.5. It was loaded from a pg_dump from an 8.2.3 database into a
> fresh 8.2.5 database on new hardware.
>
>
> > BTW, what are the datatypes of the index columns?
> >
>
> id integer not null (serial)
> guid character varying not null   ( no size limit defined )
> feed_id integer not null
>
>
> >
> >                        regards, tom lane
> >
>
> thanks for the help Tom.
>
> I do want to clear out one of the offending duplicates and reindex. But
> I'll wait to do that until I get the okay from you.
>
>
> Mason
>

Re: Duplicate values found when reindexing unique index

From
"Mason Hale"
Date:
>
>
> Can you show us all the triggers on this table?


Here they are:

Triggers:
    entry_correct_published_at_trigger BEFORE INSERT OR UPDATE ON entry FOR
EACH ROW EXECUTE PROCEDURE correct_published_at()
    entry_feed_page_trigger BEFORE INSERT OR UPDATE ON entry FOR EACH ROW
EXECUTE PROCEDURE entry_feed_page_trigger()
    entry_updated_at_trigger BEFORE UPDATE ON entry FOR EACH ROW EXECUTE
PROCEDURE update_updated_at_timestamp()
    feed_entry_count_trigger AFTER INSERT ON entry FOR EACH ROW EXECUTE
PROCEDURE update_feed_entry_count()


>  Also, it would be real
> interesting to see "pg_filedump -i -f" output for the two blocks in
> question (1508573 and 1818219) --- see http://sources.redhat.com/rhdb/
> to get a copy of pg_filedump.
>

I have downloaded, compiled and installed pg_filedump -- but I am not sure
how to determine which file I should have it dump. I am not very familiar
with the postgres file structure. Can you please provide some guidance? How
do I determine the correct file?

I've determined the relation "entry" has an oid = 16838 -- but the
/data/base/16830 directory contains 92 1GB files named 16838.[1-92]

I've tried:

pg_filedump -i -f -R 1508573 16838

and got the following error:

[postgres@prod-db-2 16830]$ pg_filedump -i -f -R 1508573 16838

*******************************************************************
* PostgreSQL File/Block Formatted Dump Utility - Version 8.2.0
*
* File: 16838
* Options used: -i -f -R 1508573
*
* Dump created on: Sun Dec 30 23:18:01 2007
*******************************************************************
Error: Seek error encountered before requested start block <1508573>.

I tried a few other files in the 1-92 range with the same results.


Mason

Re: Duplicate values found when reindexing unique index

From
Simon Riggs
Date:
On Mon, 2007-12-31 at 01:27 -0500, Tom Lane wrote:
> Gregory Stark <stark@enterprisedb.com> writes:
> > "Tom Lane" <tgl@sss.pgh.pa.us> writes:
> >> I wonder whether it's just a coincidence that these have the same offset
> >> number...
>
> > I can't imagine any Postgres bug which would depend on the offsets
> > being the same. But what I could imagine is filesystem corruption
> > which copied the block to someplace else in the table or possibly has
> > even mapped the same block into two different places in the table.
>
> That idea was in my mind too, but Mason stated that the rows showed
> different "updated_at" values, so they couldn't be mirror images of that
> sort.

Tom,

I think you misread Mason's post of 20:23 GMT-6 where he says the
created_at values are the *same*, not different. Mason's previous bug
report 3724 also had duplicate rows with matching created_at values.

So mangling block numbers and re-writing blocks in the wrong place is a
possibility. Filesystem corruption is just one way that can occur.

--
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com

Re: Duplicate values found when reindexing unique index

From
"Mason Hale"
Date:
>
>
> I think you misread Mason's post of 20:23 GMT-6 where he says the
> created_at values are the *same*, not different. Mason's previous bug
> report 3724 also had duplicate rows with matching created_at values.
>

Yes, to confirm, the created_at values are the same. The *only* values that
are different are updated_at and that value is set via trigger.

Another data point that may or may not be relevant: I've been working to set
up a warm standby server and have run into some weird behavior there as
well. The symptom is the server encounters an error during the file restore,
then pre-maturely exits recovery mode. I'm using the pg_standby program for
the restore_command in my recovery.conf on the standby server.

The error message from the standby server was:

Last week I had turned my attention away from the database restore, and
didn't notice that our standby server had exited recovery mode on 12/20.
Here's the last few lines from the log file (full log attached).

2007-12-20 04:11:43 CST () LOG:  restored log file
"000000010000042200000057" from archive
2007-12-20 04:13:09 CST () LOG:  restored log file
"000000010000042200000058" from archive
2007-12-20 04:14:40 CST () LOG:  restored log file
"000000010000042200000059" from archive
2007-12-20 04:14:40 CST () LOG:  invalid info bits 0001 in log file 1058,
segment 89, offset 0
2007-12-20 04:14:40 CST () LOG:  redo done at 422/58FFEE38
2007-12-20 04:14:40 CST () LOG:  restored log
file "000000010000042200000058" from archive
2007-12-20 04:14:40 CST () LOG:  archive recovery complete
2007-12-20 04:24:57 CST () LOG:  database system is ready

thanks,
Mason

Re: Duplicate values found when reindexing unique index

From
Tom Lane
Date:
"Mason Hale" <masonhale@gmail.com> writes:
>> I think you misread Mason's post of 20:23 GMT-6 where he says the
>> created_at values are the *same*, not different. Mason's previous bug
>> report 3724 also had duplicate rows with matching created_at values.

> Yes, to confirm, the created_at values are the same. The *only* values that
> are different are updated_at and that value is set via trigger.

Which still demonstrates that they aren't exact images.  I'd still like
to see the pg_filedump output though ...

> Last week I had turned my attention away from the database restore, and
> didn't notice that our standby server had exited recovery mode on 12/20.
> Here's the last few lines from the log file (full log attached).

> 2007-12-20 04:11:43 CST () LOG:  restored log file
> "000000010000042200000057" from archive
> 2007-12-20 04:13:09 CST () LOG:  restored log file
> "000000010000042200000058" from archive
> 2007-12-20 04:14:40 CST () LOG:  restored log file
> "000000010000042200000059" from archive
> 2007-12-20 04:14:40 CST () LOG:  invalid info bits 0001 in log file 1058,
> segment 89, offset 0

Do you by any chance still have 000000010000042200000058 and
000000010000042200000059 archived?  If so it would be useful to
look at the first dozen lines of "od -x" dump of each of them.

            regards, tom lane

Re: Duplicate values found when reindexing unique index

From
"Mason Hale"
Date:
On Dec 31, 2007 9:48 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> "Mason Hale" <masonhale@gmail.com> writes:
> >> I think you misread Mason's post of 20:23 GMT-6 where he says the
> >> created_at values are the *same*, not different. Mason's previous bug
> >> report 3724 also had duplicate rows with matching created_at values.
>
> > Yes, to confirm, the created_at values are the same. The *only* values
> that
> > are different are updated_at and that value is set via trigger.
>
> Which still demonstrates that they aren't exact images.  I'd still like
> to see the pg_filedump output though ...


I sent in previous email. You saw it, yes?


> > 2007-12-20 04:11:43 CST () LOG:  restored log file
> > "000000010000042200000057" from archive
> > 2007-12-20 04:13:09 CST () LOG:  restored log file
> > "000000010000042200000058" from archive
> > 2007-12-20 04:14:40 CST () LOG:  restored log file
> > "000000010000042200000059" from archive
> > 2007-12-20 04:14:40 CST () LOG:  invalid info bits 0001 in log file
> 1058,
> > segment 89, offset 0
>
> Do you by any chance still have 000000010000042200000058 and
> 000000010000042200000059 archived?  If so it would be useful to
> look at the first dozen lines of "od -x" dump of each of them.
>


Yes, I do. Here's the output:

[postgres@dev-db-2 wal_archive]$ od -x 000000010000042200000058 | head -n15
0000000 d05e 0002 0001 0000 0423 0000 0000 c100
0000020 f7df 472e e701 4728 0000 0100 2000 0000
0000040 a1db 81e6 0423 0000 0068 c000 0000 0000
0000060 0048 0000 002c 0000 0000 0000 0423 0000
0000100 0020 c100 0000 0000 0000 0000 0001 0000
0000120 0000 0000 780b 2ede 9f68 00f5 7834 0000
0000140 2f4d 0001 1f35 4774 0000 0000 0000 0000
0000160 0000 0000 0000 0000 0000 0000 0000 0000
*
0020000 d05e 0000 0001 0000 0422 0000 2000 5800
0020020 53c2 48bc 0422 0000 1fbc 5800 ce6f 2edd
0020040 003a 0000 001e 0000 0b00 0000 067f 0000
0020060 41be 0000 1ff8 0015 0000 0186 0007 0000
0020100 4337 000a 000c 008f 0122 0000 db84 d429
0020120 0422 0000 2010 5800 ce6f 2edd 003a 0000
[postgres@dev-db-2 wal_archive]$ od -x 000000010000042200000059 | head -n15
0000000 d05e 0001 0001 0000 006b 0000 6000 69dc
0000020 12ae 0000 6380 0024 0010 375a 21cd 1174
0000040 4001 0001 637c 0058 0010 375a 21cd 1174
0000060 4001 0001 6355 0010 0010 375a 21cd 1174
0000100 4001 0001 631d 005a 0010 375a 21cd 1174
0000120 4001 0001 62e8 001e 0010 375a 21cd 1174
0000140 4001 0001 629a 0019 0010 375a 21cd 1174
0000160 4001 0001 604a 0013 0010 375a 21cd 1174
0000200 4001 0001 6045 0025 0010 375a 21cd 1174
0000220 4001 0001 6044 0029 0010 375a 21cd 1174
0000240 4001 0001 603a 0005 0010 375a 21cd 1174
0000260 4001 0001 6020 0035 0010 375a 21cd 1174
0000300 4001 0001 600e 0050 0010 375a 21cd 1174
0000320 4001 0001 600d 0018 0010 375a 21cd 1174
0000340 4001 0001 600b 0033 0010 375a 21cd 1174
[postgres@dev-db-2 wal_archive]$


Mason

Re: Duplicate values found when reindexing unique index

From
Tom Lane
Date:
"Mason Hale" <masonhale@gmail.com> writes:
>> Can you show us all the triggers on this table?

> Here they are:

> Triggers:
>     entry_correct_published_at_trigger BEFORE INSERT OR UPDATE ON entry FOR
> EACH ROW EXECUTE PROCEDURE correct_published_at()
>     entry_feed_page_trigger BEFORE INSERT OR UPDATE ON entry FOR EACH ROW
> EXECUTE PROCEDURE entry_feed_page_trigger()
>     entry_updated_at_trigger BEFORE UPDATE ON entry FOR EACH ROW EXECUTE
> PROCEDURE update_updated_at_timestamp()
>     feed_entry_count_trigger AFTER INSERT ON entry FOR EACH ROW EXECUTE
> PROCEDURE update_feed_entry_count()

Actually I wanted to see the function bodies ...

            regards, tom lane

Re: Duplicate values found when reindexing unique index

From
"Mason Hale"
Date:
Tom, I'll send these to you privately.
Mason

On Dec 31, 2007 10:22 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> "Mason Hale" <masonhale@gmail.com> writes:
> >> Can you show us all the triggers on this table?
>
> > Here they are:
>
> > Triggers:
> >     entry_correct_published_at_trigger BEFORE INSERT OR UPDATE ON entry
> FOR
> > EACH ROW EXECUTE PROCEDURE correct_published_at()
> >     entry_feed_page_trigger BEFORE INSERT OR UPDATE ON entry FOR EACH
> ROW
> > EXECUTE PROCEDURE entry_feed_page_trigger()
> >     entry_updated_at_trigger BEFORE UPDATE ON entry FOR EACH ROW EXECUTE
> > PROCEDURE update_updated_at_timestamp()
> >     feed_entry_count_trigger AFTER INSERT ON entry FOR EACH ROW EXECUTE
> > PROCEDURE update_feed_entry_count()
>
> Actually I wanted to see the function bodies ...
>
>                        regards, tom lane
>

Re: Duplicate values found when reindexing unique index

From
Tom Lane
Date:
"Mason Hale" <masonhale@gmail.com> writes:
> On Dec 31, 2007 9:48 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Do you by any chance still have 000000010000042200000058 and
>> 000000010000042200000059 archived?  If so it would be useful to
>> look at the first dozen lines of "od -x" dump of each of them.

> Yes, I do. Here's the output:

> [postgres@dev-db-2 wal_archive]$ od -x 000000010000042200000058 | head -n15
> 0000000 d05e 0002 0001 0000 0423 0000 0000 c100
> 0000020 f7df 472e e701 4728 0000 0100 2000 0000
> 0000040 a1db 81e6 0423 0000 0068 c000 0000 0000

Hmm, something wrong here.  The data looks sane, but the page header
address ought to be in 0000000100000423000000C1 if I'm not mistaken.
And what's even odder is that the second page of the file has got

> 0020000 d05e 0000 0001 0000 0422 0000 2000 5800
> 0020020 53c2 48bc 0422 0000 1fbc 5800 ce6f 2edd
> 0020040 003a 0000 001e 0000 0b00 0000 067f 0000
> 0020060 41be 0000 1ff8 0015 0000 0186 0007 0000
> 0020100 4337 000a 000c 008f 0122 0000 db84 d429
> 0020120 0422 0000 2010 5800 ce6f 2edd 003a 0000

which *is* what you'd expect to find at the second block of
000000010000042200000058.

And then we have

> [postgres@dev-db-2 wal_archive]$ od -x 000000010000042200000059 | head -n15
> 0000000 d05e 0001 0001 0000 006b 0000 6000 69dc
> 0000020 12ae 0000 6380 0024 0010 375a 21cd 1174
> 0000040 4001 0001 637c 0058 0010 375a 21cd 1174
> 0000060 4001 0001 6355 0010 0010 375a 21cd 1174
> 0000100 4001 0001 631d 005a 0010 375a 21cd 1174

which is just completely off in left field --- that's not even close to
being the right sequence number, plus it's not a valid
first-page-of-file header (which is what the xlog complaint message
was about).  But on its own terms it might be valid data for someplace
in the middle of 000000010000006B00000069.

It might be worth trawling through both files to check the page headers
(every 8K) and see which ones agree with expectation and which don't.
The state of the ...0058 file might be explained by the theory that
you'd archived it a bit too late (after the first page had been
overwritten with newer WAL data), but the ...0059 file seems just plain
broken.  I am starting to wonder about hardware or OS misfeasance
causing writes to be lost or misdirected.

            regards, tom lane

Re: Duplicate values found when reindexing unique index

From
Simon Riggs
Date:
On Mon, 2007-12-31 at 11:53 -0500, Tom Lane wrote:

> It might be worth trawling through both files to check the page headers
> (every 8K) and see which ones agree with expectation and which don't.
> The state of the ...0058 file might be explained by the theory that
> you'd archived it a bit too late (after the first page had been
> overwritten with newer WAL data),

The interlock with .ready and .done should prevent reuse of a file. So
the only way this could happen is if the archive_command queued a
request to copy, rather than performing the copy immediately.

So I was going to say "thats not possible", but perhaps rsync might
become confused by the file renaming mechanism we use?

> but the ...0059 file seems just plain
> broken.

Yeh

> I am starting to wonder about hardware or OS misfeasance
> causing writes to be lost or misdirected.

Agreed

--
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com

Re: Duplicate values found when reindexing unique index

From
Tom Lane
Date:
"Mason Hale" <masonhale@gmail.com> writes:
> Tom, I'll send these to you privately.

Thanks.  I don't see anything particularly surprising there though.
What I was wondering about was whether your application was in the
habit of doing repeated no-op updates on the same "entry" row.

The pg_filedump outputs seem to blow away any theory of hardware-level
duplication of the row --- all the tuples on both pages have the
expected block number in their headers, so it seems PG deliberately
put them where they are.  And the two tuples at issue are both marked
UPDATED, so they clearly are updated versions of some now-lost original.

What is not clear is whether they are independent updates of the same
original or whether there was a chain of updates --- that is, was the
newer one (which from the timestamp must be the one in the
lower-numbered block) made by an update from the older one, or from the
lost original?

Since the older one doesn't show any sign of having been updated itself
(in particular, no xmax and its ctid still points to itself), the former
theory would require assuming that the page update "got lost" --- was
discarded without being written to disk.  On the other hand, the latter
theory seems to require a similar assumption with respect to whatever
page held the original.

Given this, and the index corruption you showed before (the wrong
sibling link, which would represent index breakage quite independent of
what was in the heap), and the curious contents of your WAL files
(likewise not explainable by anything going wrong within a table),
I'm starting to think that Occam's razor says you've got hardware
problems.  Or maybe a kernel-level bug that is causing writes to get
discarded.

            regards, tom lane

Re: Duplicate values found when reindexing unique index

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> On Mon, 2007-12-31 at 11:53 -0500, Tom Lane wrote:
>> The state of the ...0058 file might be explained by the theory that
>> you'd archived it a bit too late (after the first page had been
>> overwritten with newer WAL data),

> The interlock with .ready and .done should prevent reuse of a file. So
> the only way this could happen is if the archive_command queued a
> request to copy, rather than performing the copy immediately.
> So I was going to say "thats not possible", but perhaps rsync might
> become confused by the file renaming mechanism we use?

Actually, the other problem with that theory is that the slave swallowed
the file without complaint.  Since the WAL reader code does check that
the page header contains the expected address, this seems to imply that
what the slave saw must have had 422/58 in it, not the 423/C1 we see
now.  So what needs to be explained is why what Mason is looking at now
is different from what the slave saw ten days ago.

            regards, tom lane

Re: Duplicate values found when reindexing unique index

From
Simon Riggs
Date:
On Mon, 2007-12-31 at 12:33 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > On Mon, 2007-12-31 at 11:53 -0500, Tom Lane wrote:
> >> The state of the ...0058 file might be explained by the theory that
> >> you'd archived it a bit too late (after the first page had been
> >> overwritten with newer WAL data),
>
> > The interlock with .ready and .done should prevent reuse of a file. So
> > the only way this could happen is if the archive_command queued a
> > request to copy, rather than performing the copy immediately.
> > So I was going to say "thats not possible", but perhaps rsync might
> > become confused by the file renaming mechanism we use?
>
> Actually, the other problem with that theory is that the slave swallowed
> the file without complaint.

No, it barfed. Mason showed us a recovery script, so it came from the
slave.

> Since the WAL reader code does check that
> the page header contains the expected address, this seems to imply that
> what the slave saw must have had 422/58 in it, not the 423/C1 we see
> now.  So what needs to be explained is why what Mason is looking at now
> is different from what the slave saw ten days ago.

So the slave did see a problem ten days ago, though I take your point
that the problem we see now may not be the as it was back then.

--
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com

Re: Duplicate values found when reindexing unique index

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> On Mon, 2007-12-31 at 12:33 -0500, Tom Lane wrote:
>> Actually, the other problem with that theory is that the slave swallowed
>> the file without complaint.

> No, it barfed. Mason showed us a recovery script, so it came from the
> slave.

No, it barfed on the *next* file.  422/58 it was fine with:

>>> 2007-12-20 04:11:43 CST () LOG:  restored log file
>>> "000000010000042200000057" from archive
>>> 2007-12-20 04:13:09 CST () LOG:  restored log file
>>> "000000010000042200000058" from archive
>>> 2007-12-20 04:14:40 CST () LOG:  restored log file
>>> "000000010000042200000059" from archive
>>> 2007-12-20 04:14:40 CST () LOG:  invalid info bits 0001 in log file 1058,
>>> segment 89, offset 0
>>> 2007-12-20 04:14:40 CST () LOG:  redo done at 422/58FFEE38

The "invalid info bits" gripe is consistent with what Mason showed us from
the 0059 file, but there's nothing there complaining about the 0058 file.

(Sometime we oughta try to make these messages more consistent about
whether hex or decimal notation is being used...)

            regards, tom lane

Re: Duplicate values found when reindexing unique index

From
Tom Lane
Date:
"Mason Hale" <masonhale@gmail.com> writes:
> [postgres@dev-db-2 wal_archive]$ od -x 000000010000042200000058 | head -n15
> 0000000 d05e 0002 0001 0000 0423 0000 0000 c100
> 0000020 f7df 472e e701 4728 0000 0100 2000 0000
> 0000040 a1db 81e6 0423 0000 0068 c000 0000 0000
> 0000060 0048 0000 002c 0000 0000 0000 0423 0000
> 0000100 0020 c100 0000 0000 0000 0000 0001 0000
> 0000120 0000 0000 780b 2ede 9f68 00f5 7834 0000
> 0000140 2f4d 0001 1f35 4774 0000 0000 0000 0000
> 0000160 0000 0000 0000 0000 0000 0000 0000 0000
> *
> 0020000 d05e 0000 0001 0000 0422 0000 2000 5800

Something else interesting about that: the apparent interloper page
contains just a single WAL record, which appears to be a shutdown
checkpoint bearing the timestamp "Thu Dec 27 2007, 16:55:01 EST".
Not sure if Mason can correlate that with any recent activity...

            regards, tom lane

Re: Duplicate values found when reindexing unique index

From
Simon Riggs
Date:
On Mon, 2007-12-31 at 13:14 -0500, Tom Lane wrote:
> "Mason Hale" <masonhale@gmail.com> writes:
> > [postgres@dev-db-2 wal_archive]$ od -x 000000010000042200000058 | head -n15
> > 0000000 d05e 0002 0001 0000 0423 0000 0000 c100
> > 0000020 f7df 472e e701 4728 0000 0100 2000 0000
> > 0000040 a1db 81e6 0423 0000 0068 c000 0000 0000
> > 0000060 0048 0000 002c 0000 0000 0000 0423 0000
> > 0000100 0020 c100 0000 0000 0000 0000 0001 0000
> > 0000120 0000 0000 780b 2ede 9f68 00f5 7834 0000
> > 0000140 2f4d 0001 1f35 4774 0000 0000 0000 0000
> > 0000160 0000 0000 0000 0000 0000 0000 0000 0000
> > *
> > 0020000 d05e 0000 0001 0000 0422 0000 2000 5800
>
> Something else interesting about that: the apparent interloper page
> contains just a single WAL record, which appears to be a shutdown
> checkpoint bearing the timestamp "Thu Dec 27 2007, 16:55:01 EST".
> Not sure if Mason can correlate that with any recent activity...

OK, well thats clearly an archive overwrite then.

This is happening on the slave and it is a page-level overwrite, of WAL,
not an original page being propagated to WAL via full_page_writes. So it
is either unconnected to the index problems on the primary, or they are
linked in some way, which seems most likely. That tends to argue against
a pure hardware problem. So it could be device drivers, OS or Postgres
causing this. Don't see how it can be the latter...

--
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com

Re: Duplicate values found when reindexing unique index

From
"Mason Hale"
Date:
>
> Something else interesting about that: the apparent interloper page
> contains just a single WAL record, which appears to be a shutdown
> checkpoint bearing the timestamp "Thu Dec 27 2007, 16:55:01 EST".
> Not sure if Mason can correlate that with any recent activity...
>

Nothing jumps to mind, but I'll crawl through the logs looking for anything
interesting.

I'm starting to think that Occam's razor says you've got hardware
> problems.  Or maybe a kernel-level bug that is causing writes to get
> discarded.
>

For what its worth, we do closely monitor the server for RAID and kernel
errors and haven't seen either variety reported in quite some time.

If it is some mysterious hardware or kernel error -- any suggestions for how
I should go about narrowing that down?

Finally, if it would help to see the full wal segment files, let me know and
I will email individually. (I assume the list doesn't want 32MB of
attachments).

Mason

Re: Duplicate values found when reindexing unique index

From
Tom Lane
Date:
"Mason Hale" <masonhale@gmail.com> writes:
> If it is some mysterious hardware or kernel error -- any suggestions for how
> I should go about narrowing that down?

The first thing to ask is whether you're running an up-to-date kernel
(and whose is it).

> Finally, if it would help to see the full wal segment files, let me know and
> I will email individually. (I assume the list doesn't want 32MB of
> attachments).

If you'd send them to me privately, I'd be interested.

            regards, tom lane

Re: Duplicate values found when reindexing unique index

From
"Mason Hale"
Date:
On Dec 31, 2007 12:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> "Mason Hale" <masonhale@gmail.com> writes:
> > If it is some mysterious hardware or kernel error -- any suggestions for
> how
> > I should go about narrowing that down?
>
> The first thing to ask is whether you're running an up-to-date kernel
> (and whose is it).
>

The kernel is not up to date. What we have is:

Kernel: Linux 2.6.18-8.1.15.el5PAE
Distribution: RedHat EL 5.0-32

We can/will update the kernel at our next opportunity.


>
> > Finally, if it would help to see the full wal segment files, let me know
> and
> > I will email individually. (I assume the list doesn't want 32MB of
> > attachments).
>
> If you'd send them to me privately, I'd be interested.
>

 I will send these to you right away.

Mason

Re: Duplicate values found when reindexing unique index

From
Tom Lane
Date:
"Mason Hale" <masonhale@gmail.com> writes:
>> If you'd send them to me privately, I'd be interested.

>  I will send these to you right away.

Okay, the 00058 file looks pretty sane, except for what we already knew
about the first page being overwritten with a much later shutdown
checkpoint.  All the other page headers are as-expected.

The 00059 file, on the other hand, is just plain wacko.  Here are its
first few page headers (or what should be page headers):

0000000 d05e 0001 0001 0000 006b 0000 6000 69dc
0020000 d05e 0001 0001 0000 006b 0000 8000 69dc
0040000 d05e 0001 0001 0000 006b 0000 a000 69dc
0060000 d05e 0001 0001 0000 006b 0000 c000 69dc
0100000 d05e 0001 0001 0000 006b 0000 e000 69dc
0120000 d05e 0001 0001 0000 006b 0000 0000 69dd
0140000 0018 93cc 0018 9414 0018 9588 0018 957c
0160000 000c d63c 0bc7 0009 3c17 0009 000c d63c
0200000 000c b55e 08f7 0009 1373 004b 000c b55e
0220000 2f11 0042 000c 6cc3 0bac 0009 2e14 0043
0240000 4001 0008 c775 0060 0010 d61c 10c3 0c40
0260000 0901 0009 3719 001c 000c 26fa 0901 0009
0300000 0b95 0008 ab38 004b 000c 40b4 0b95 0008
0320000 4001 74de 0002 74e0 0002 0000 0000 0001
0340000 0018 9c48 0018 9c3c 0018 9c30 0018 9c24
0360000 b715 0026 0010 89a1 97e3 0549 4001 0002
0400000 0905 0009 2fc1 0031 000c a3aa 0905 0009
0420000 317c 0015 000c 7017 0bbd 0009 315f 0038
0440000 0018 960c 0018 9600 0018 95f4 0018 95e8
0460000 0018 8f64 0018 8f58 0018 8f4c 0018 8f40

The first six look sane except that they shouldn't be in this file,
and in fact shouldn't be at the front of any WAL file.  After that it
goes into a long stretch of what seems to be just plain garbage data.
Here's a sample chosen at random:

0058daa0: 432b 63b4 cc3f 1200 6525 3e00 1000 81d2  C+c..?..e%>.....
0058dab0: 432b 63b4 cc3f 1200 2a16 0600 1000 39c8  C+c..?..*.....9.
0058dac0: 7a28 3eb4 cc3f 1200 7e14 1600 1000 39c8  z(>..?..~.....9.
0058dad0: 7a28 3eb4 cc3f 1200 7e14 1500 1000 39c8  z(>..?..~.....9.
0058dae0: 7a28 3eb4 cc3f 1200 7e14 1400 1000 39c8  z(>..?..~.....9.
0058daf0: 7a28 3eb4 cc3f 1200 7e14 1300 1000 39c8  z(>..?..~.....9.
0058db00: 7a28 3eb4 cc3f 1200 7e14 1200 1000 39c8  z(>..?..~.....9.
0058db10: 7a28 3eb4 cc3f 1200 2c09 4600 1000 f9a1  z(>..?..,.F.....
0058db20: 8b0a 7cb4 cc3f 1100 73e0 4c00 1000 b0ca  ..|..?..s.L.....
0058db30: d011 4ab4 cc3f 1100 06c2 0700 1000 f777  ..J..?.........w
0058db40: 1a05 aab4 cc3f 1100 dfaa 4600 1000 82df  .....?....F.....
0058db50: 14ac 91b4 cc3f 1100 07a4 0f00 1000 0df4  .....?..........
0058db60: f025 8cb4 cc3f 1100 19a3 4f00 1000 0df4  .%...?....O.....
0058db70: f025 8cb4 cc3f 1100 19a3 4e00 1000 0df4  .%...?....N.....
0058db80: f025 8cb4 cc3f 1100 19a3 4d00 1000 0df4  .%...?....M.....
0058db90: f025 8cb4 cc3f 1100 19a3 4c00 1000 0df4  .%...?....L.....
0058dba0: f025 8cb4 cc3f 1100 19a3 4b00 1000 0df4  .%...?....K.....
0058dbb0: f025 8cb4 cc3f 1100 069f 1e00 1000 8fc1  .%...?..........
0058dbc0: bcc4 9bb4 cc3f 1100 ae9d 0300 1000 8fc1  .....?..........
0058dbd0: bcc4 9bb4 cc3f 1100 8d82 4900 1000 5e2d  .....?....I...^-
0058dbe0: 7dc4 79b4 cc3f 1100 8c82 2800 1000 5e2d  }.y..?....(...^-
0058dbf0: 7dc4 79b4 cc3f 1100 8c82 2700 1000 5e2d  }.y..?....'...^-
0058dc00: 7dc4 79b4 cc3f 1100 8c82 2600 1000 5e2d  }.y..?....&...^-
0058dc10: 7dc4 79b4 cc3f 1100 8c82 2500 1000 5e2d  }.y..?....%...^-
0058dc20: 7dc4 79b4 cc3f 1100 007d 4100 1000 5e2d  }.y..?...}A...^-
0058dc30: 7dc4 79b4 cc3f 1100 e27c 4200 1000 5e2d  }.y..?...|B...^-
0058dc40: 7dc4 79b4 cc3f 1100 cc7c 0300 1000 e183  }.y..?...|......
0058dc50: 2610 91b4 cc3f 1100 2f6f 1200 1000 9473  &....?../o.....s
0058dc60: d989 8ab4 cc3f 1100 6c33 4800 1000 2e08  .....?..l3H.....
0058dc70: 2178 7db4 cc3f 1100 312d 4100 1000 006f  !x}..?..1-A....o

There's a clear 16-byte periodicity to this junk, but it doesn't
look much like anything I'd expect to find in Postgres data.
What it *does* remind me of is certain types of hardware failure
modes, eg a 16-byte-wide DMA path going nuts.  Or it could be valid
data from some non-PG application, though I dunno what offhand.

Continuing to dig through the places where WAL page headers ought to be,
there are additional long stretches of seeming sanity, eg here

11060000 d05e 0001 0001 0000 006b 0000 c000 6a00
11100000 d05e 0001 0001 0000 006b 0000 e000 6a00
11120000 d05e 0001 0001 0000 006b 0000 0000 6a01
11140000 d05e 0001 0001 0000 006b 0000 2000 6a01
11160000 d05e 0001 0001 0000 006b 0000 4000 6a01
11200000 d05e 0001 0001 0000 006b 0000 6000 6a01
11220000 d05e 0000 0001 0000 006b 0000 8000 6a01
11240000 d05e 0000 0001 0000 006b 0000 a000 6a01
11260000 d05e 0000 0001 0000 006b 0000 c000 6a01

and here's an interesting stretch

14260000 d05e 0001 0001 0000 006b 0000 c000 6a0d
14300000 d05e 0000 0001 0000 006b 0000 e000 6a0d
14320000 d05e 0001 0001 0000 006b 0000 0000 6a0e
14340000 d05e 0001 0001 0000 006b 0000 2000 6a0e
14360000 d05e 0000 0001 0000 0067 0000 4000 610e
14400000 d05e 0001 0001 0000 0067 0000 6000 610e
14420000 d05e 0001 0001 0000 0067 0000 8000 610e
14440000 d05e 0001 0001 0000 0067 0000 a000 610e
14460000 d05e 0001 0001 0000 0067 0000 c000 610e
14500000 d05e 0001 0001 0000 0067 0000 e000 610e
14520000 d05e 0000 0001 0000 0067 0000 0000 610f
14540000 d05e 0001 0001 0000 0067 0000 2000 610f
14560000 d05e 0001 0001 0000 0067 0000 4000 610f
14600000 d05e 0001 0001 0000 0067 0000 6000 610f
14620000 d05e 0001 0001 0000 0067 0000 8000 610f
14640000 d05e 0001 0001 0000 0067 0000 a000 610f
14660000 d05e 0001 0001 0000 0067 0000 c000 610f
14700000 d05e 0001 0001 0000 0067 0000 e000 610f
14720000 d05e 0001 0001 0000 0065 0000 0000 5910
14740000 d05e 0000 0001 0000 0065 0000 2000 5910
14760000 d05e 0001 0001 0000 0065 0000 4000 5910
15000000 d05e 0001 0001 0000 0065 0000 6000 5910
15020000 d05e 0001 0001 0000 0061 0000 8000 4b10
15040000 d05e 0001 0001 0000 0061 0000 a000 4b10
15060000 d05e 0001 0001 0000 0061 0000 c000 4b10
15100000 d05e 0001 0001 0000 0061 0000 e000 4b10
15120000 d05e 0001 0001 0000 0061 0000 0000 4b11

and ending up here

77320000 d05e 0001 0001 0000 0043 0000 0000 a1da
77340000 d05e 0001 0001 0000 0043 0000 2000 a1da
77360000 d05e 0001 0001 0000 0043 0000 4000 a1da
77400000 d05e 0001 0001 0000 0043 0000 6000 a1da
77420000 d05e 0001 0001 0000 0043 0000 8000 a1da
77440000 d05e 0001 0001 0000 0043 0000 a000 a1da
77460000 d05e 0001 0001 0000 0043 0000 c000 a1da
77500000 d05e 0000 0001 0000 0043 0000 e000 a1da
77520000 d05e 0001 0001 0000 0043 0000 0000 a1db
77540000 d05e 0001 0001 0000 0043 0000 2000 a1db
77560000 d05e 0001 0001 0000 0043 0000 4000 a1db
77600000 d05e 0001 0001 0000 0043 0000 6000 a1db
77620000 d05e 0001 0001 0000 0043 0000 8000 a1db
77640000 d05e 0000 0001 0000 0043 0000 a000 a1db
77660000 d05e 0001 0001 0000 0043 0000 c000 a1db
77700000 d05e 0001 0001 0000 0043 0000 e000 a1db
77720000 d05e 0001 0001 0000 0043 0000 0000 a1dc
77740000 d05e 0001 0001 0000 0043 0000 2000 a1dc
77760000 d05e 0000 0001 0000 0043 0000 4000 a1dc

What we've got here is chunks of stuff that look like they are
WAL data, all right, but they are in the wrong positions in the
wrong files.

This could be the kernel's fault, but I'm wondering whether the
RAID controller is going south.  Offhand it seems like the controller
would be the only place that would be likely to explain both the
bogus-data and good-data-in-wrong-place behaviors.

            regards, tom lane

Re: Duplicate values found when reindexing unique index

From
"Mason Hale"
Date:
Hello Tom --
a thousand thanks for taking the time to walk through those files.

This could be the kernel's fault, but I'm wondering whether the
> RAID controller is going south.  Offhand it seems like the controller
> would be the only place that would be likely to explain both the
> bogus-data and good-data-in-wrong-place behaviors.
>

To clarify a bit further -- on the production server, the data is written to
a 10-disk RAID 1+0, but the pg_xlog directory is symlinked to a separate,
dedicated SATA II disk.

There is a similar setup on the standby server, except that in addition to
the RAID for the data, and a separate SATA II disk for the pg_xlog, there is
another disk (also SATA II) dedicated for the archive of wal files copied
over from the production server.

I mention all this because if the problem is the wal files in the pg_xlog
directory becoming corrupted, and those exist on a dedicated disk (not in a
RAID), then doesn't that indicate that the RAID controller is not involved?

Mason

Re: Duplicate values found when reindexing unique index

From
Tom Lane
Date:
"Mason Hale" <masonhale@gmail.com> writes:
>> This could be the kernel's fault, but I'm wondering whether the
>> RAID controller is going south.

> To clarify a bit further -- on the production server, the data is written to
> a 10-disk RAID 1+0, but the pg_xlog directory is symlinked to a separate,
> dedicated SATA II disk.

> There is a similar setup on the standby server, except that in addition to
> the RAID for the data, and a separate SATA II disk for the pg_xlog, there is
> another disk (also SATA II) dedicated for the archive of wal files copied
> over from the production server.

Oh.  Maybe it's one of those disks' fault then.  Although WAL corruption
would not lead to corruption of the primary DB as long as there were no
crash/replay events.  Maybe there is more than one issue here, or maybe
it's the kernel's fault after all.

            regards, tom lane

Re: Duplicate values found when reindexing unique index

From
Simon Riggs
Date:
On Mon, 2007-12-31 at 18:35 -0500, Tom Lane wrote:
> "Mason Hale" <masonhale@gmail.com> writes:
> >> This could be the kernel's fault, but I'm wondering whether the
> >> RAID controller is going south.
>
> > To clarify a bit further -- on the production server, the data is written to
> > a 10-disk RAID 1+0, but the pg_xlog directory is symlinked to a separate,
> > dedicated SATA II disk.
>
> > There is a similar setup on the standby server, except that in addition to
> > the RAID for the data, and a separate SATA II disk for the pg_xlog, there is
> > another disk (also SATA II) dedicated for the archive of wal files copied
> > over from the production server.
>
> Oh.  Maybe it's one of those disks' fault then.  Although WAL corruption
> would not lead to corruption of the primary DB as long as there were no
> crash/replay events.  Maybe there is more than one issue here, or maybe
> it's the kernel's fault after all.

The standby replays from the archive drive, whereas the primary does
crash recovery from the pg_xlog. We know that the primary is corrupted
in some way, and so is the standby, plus we know the standby corruption
occurred after it was copied to the archive and restored/recovered. So
we must have problems on at least two drives.

If we have had at least one recent primary server database crash
recovery then we might explain all the corruptions by a common issue
related to the SATA II drives. That might be the device driver but maybe
other things as well.

--
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com

Re: Duplicate values found when reindexing unique index

From
"Mason Hale"
Date:
Hi everyone --

Sorry to revisit a dead horse, but I wanted to clear up some misinformation
--

On Dec 31, 2007 5:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> "Mason Hale" <masonhale@gmail.com> writes:
> >> This could be the kernel's fault, but I'm wondering whether the
> >> RAID controller is going south.
>
> > To clarify a bit further -- on the production server, the data is
> written to
> > a 10-disk RAID 1+0, but the pg_xlog directory is symlinked to a
> separate,
> > dedicated SATA II disk.
>
> > There is a similar setup on the standby server, except that in addition
> to
> > the RAID for the data, and a separate SATA II disk for the pg_xlog,
> there is
> > another disk (also SATA II) dedicated for the archive of wal files
> copied
> > over from the production server.
>

It turns out that the separate SATA II disk was configured as a single-disk
JBOD on the same controller as the 10-disk RAID 1+0.

Since we've seen corruption in the data directory (on the RAID) and in the
pg_xlog directory (on the SATA II disk) the RAID controller is one of the
few common elements between those two partitions and hence is highly
suspect, and may dispel some of the mystery with our situation.

We will be replacing the RAID controller in short order. For what it is
worth it is an Adaptec 31605 with a battery backup module.


>
> Oh.  Maybe it's one of those disks' fault then.  Although WAL corruption
> would not lead to corruption of the primary DB as long as there were no
> crash/replay events.  Maybe there is more than one issue here, or maybe
> it's the kernel's fault after all.
>
>
Given the new information about the RAID controller is managing all the
disks in the question (after all) -- if the RAID controller is going south,
then there would be no need for a crash/replay event for that corruption to
make it into the primary DB. Seems to be pretty damning evidence against the
RAID controller, agreed?

Mason

Re: Duplicate values found when reindexing unique index

From
Alvaro Herrera
Date:
Tom Lane escribió:

> Given this, and the index corruption you showed before (the wrong
> sibling link, which would represent index breakage quite independent of
> what was in the heap), and the curious contents of your WAL files
> (likewise not explainable by anything going wrong within a table),
> I'm starting to think that Occam's razor says you've got hardware
> problems.  Or maybe a kernel-level bug that is causing writes to get
> discarded.

FWIW, on a customer's setup we've seen something that looks very much
like this: table pages being rewritten by something that I couldn't
identify, but which certainly weren't normal table or index pages.  On
looking at your report and diagnosis, I'm inclined to think that they
were WAL records due to the similarity and regularity.  (I'll take a
look again).

I also wrote them off as hardware failure or kernel bugs.

I don't think we've been seen the problem again.  But what happened in
the end was that the customer asked us whether we could make something
about CRC'ing the files we use in order to detect system failure
proactively.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Duplicate values found when reindexing unique index

From
"Michael Orozco"
Date:
Someone tell me now to get off this list

-----Original Message-----
From: pgsql-bugs-owner@postgresql.org [mailto:pgsql-bugs-owner@postgresql.o=
rg] On Behalf Of Alvaro Herrera
Sent: Thursday, January 03, 2008 8:30 PM
To: Tom Lane
Cc: Mason Hale; pgsql-bugs@postgresql.org
Subject: Re: [BUGS] Duplicate values found when reindexing unique index

Tom Lane escribi=C3=B3:

> Given this, and the index corruption you showed before (the wrong
> sibling link, which would represent index breakage quite independent of
> what was in the heap), and the curious contents of your WAL files
> (likewise not explainable by anything going wrong within a table),
> I'm starting to think that Occam's razor says you've got hardware
> problems.  Or maybe a kernel-level bug that is causing writes to get
> discarded.

FWIW, on a customer's setup we've seen something that looks very much
like this: table pages being rewritten by something that I couldn't
identify, but which certainly weren't normal table or index pages.  On
looking at your report and diagnosis, I'm inclined to think that they
were WAL records due to the similarity and regularity.  (I'll take a
look again).

I also wrote them off as hardware failure or kernel bugs.

I don't think we've been seen the problem again.  But what happened in
the end was that the customer asked us whether we could make something
about CRC'ing the files we use in order to detect system failure
proactively.

--=20
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Re: Duplicate values found when reindexing unique index

From
"Kevin Grittner"
Date:
>>> On Fri, Jan 4, 2008 at  8:00 AM, in message
<FD4D6FBAA8631042A2A121E1B0A15AE502256E39@ncuexch2.ncu.local>, "Michael Oro=
zco"
<morozco@numarkcu.org> wrote:=20
> Someone tell me now to get off this list
=20
=46rom the main web page at:
=20
http://www.postgresql.org/
=20
There is a link to "Mailing Lists".  If you take that, you will see, up at =
the top:
=20
How to Subscribe or Unsubscribe

Fill out this form.

Click that link and fill out the form.
=20
-Kevin
=20