Re: Explaining duplicate rows in spite of unique index - Mailing list pgsql-general

From Albe Laurenz
Subject Re: Explaining duplicate rows in spite of unique index
Date
Msg-id D960CB61B694CF459DCFB4B0128514C20393819F@exadv11.host.magwien.gv.at
Whole thread Raw
In response to Explaining duplicate rows in spite of unique index  ("Albe Laurenz" <laurenz.albe@wien.gv.at>)
List pgsql-general
I wrote:

> We recently found a couple of rows in a production database
> that had identical values in the columns constituting the primary key
> (The problem surfaced because a pg_dump could not be restored).
> 
> Now I'm looking for explanations how this could happen.
> 
> The rows originate from around the time when we had a hardware
> failure that corrupted the file system. The database came up
> after a file system check, and people continued working until
> we noticed that some tables were corrupted.
> 
> At that point we restored an online backup and recovered past
> the time of the hardware failure. The WALs were intact and recovery
> completed successfully.
> 
> Now does the following explanation sound plausible:
[...]

For the record, I set up a small experiment and could reproduce the
corruption like this:

- initdb new cluster, start the postmaster
- create a table with a primary key constraint
- perform an online backup
- stop the postmaster
- cp -p pg_clog/0000 pg_clog/0000.bak
- start the postmaster
- insert a record, commit
- stop the postmaster
- mv pg_clog/0000.bak pg_clog/0000
- start the postmaster
- Insert and commit a second record with the same primary key.
  This will succeed since the previous commit is "gone".
- stop the postmaster
- perform PITR from the online backup to the end of WALs

The PITR will succeed, and you end up with two rows with the
same primary key, but only the second one is in the unique index.

Here's what I learned:
- Don't let anybody work on a database after a hardware problem
  that affected the integrity of the file system, even if the
  postmaster comes up without complaining.
- Never perform point-in-time-recovery beyond the time when
  the hardware problem happened.

Yours,
Laurenz Albe

pgsql-general by date:

Previous
From: "Albe Laurenz"
Date:
Subject: Re:
Next
From: "Luigi Antognini"
Date:
Subject: Transaction isolation when applying DDLs