Re: Duplicated IDs - Mailing list pgsql-general

From Alexis Bernard
Subject Re: Duplicated IDs
Date
Msg-id CANiB9R9QGHzB3_Jvv3H+FdkfBhT2LL=dA0sJiUci0cPuBDxuyw@mail.gmail.com
Whole thread Raw
In response to Re: Duplicated IDs  (Alban Hertroys <haramrae@gmail.com>)
Responses Re: Duplicated IDs  (Alban Hertroys <haramrae@gmail.com>)
List pgsql-general
Hi,

Thanks for your detailed response Alban.

Primary key definition: "tracks_pkey" PRIMARY KEY, btree (id)
select version(): PostgreSQL 9.3.4 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.2-16ubuntu6) 4.8.2, 64-bit
Ubuntu trusty with kernel 3.13.0-29-generic.

I have a autovacuum process running and configuration is by default: ps aux | grep vacuum | grep -v grep
postgres   587  0.0  0.0 1174304 3720 ?        Ss   juil.30   0:05 postgres: autovacuum launcher process

I tried to vacuum manualy.

I tried reindex:

=> reindex index tracks_pkey;
ERROR:  failed to find parent tuple for heap-only tuple at (38802,116) in table "tracks

Here are more information about storage (raid 1):

lshw -class disk -class storage
  *-storage               
       description: Serial Attached SCSI controller
       produit: SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon]
       fabriquant: LSI Logic / Symbios Logic
       identifiant matériel: 0
       information bus: pci@0000:01:00.0
       nom logique: scsi0
       version: 03
       bits: 64 bits
       horloge: 33MHz
       fonctionnalités: storage pm pciexpress vpd msi msix bus_master cap_list rom
       configuration: driver=mpt2sas latency=0
       ressources: irq:16 portE/S:2000(taille=256) mémoire:c5140000-c514ffff mémoire:c5100000-c513ffff mémoire:c5400000-c54fffff
     *-disk:0 NON-RÉCLAMÉ
          description: ATA Disk
          produit: ST1000NM0033-9ZM
          fabriquant: Seagate
          identifiant matériel: 0.0.0
          information bus: scsi@0:0.0.0
          version: GA04
          numéro de série: Z1W0HGB9
          configuration: ansiversion=5
     *-disk:1
          description: SCSI Disk
          produit: Virtual Disk
          fabriquant: Dell
          identifiant matériel: 1.0.0
          information bus: scsi@0:1.0.0
          nom logique: /dev/sda
          version: 1028
          taille: 931GiB (999GB)
          capacité: 931GiB (999GB)
          fonctionnalités: 15000rpm partitioned partitioned:dos
          configuration: ansiversion=6 sectorsize=512 signature=000e59a7
     *-disk:2 NON-RÉCLAMÉ
          description: ATA Disk
          produit: ST1000NM0033-9ZM
          fabriquant: Seagate
          identifiant matériel: 0.1.0
          information bus: scsi@0:0.1.0
          version: GA04
          numéro de série: Z1W0HFYZ
          configuration: ansiversion=5
  *-storage
       description: SATA controller
       produit: 6 Series/C200 Series Chipset Family SATA AHCI Controller
       fabriquant: Intel Corporation
       identifiant matériel: 1f.2
       information bus: pci@0000:00:1f.2
       version: 04
       bits: 32 bits
       horloge: 66MHz
       fonctionnalités: storage msi pm ahci_1.0 bus_master cap_list
       configuration: driver=ahci latency=0
       ressources: irq:43 portE/S:3048(taille=8) portE/S:3054(taille=4) portE/S:3040(taille=8) portE/S:3050(taille=4) portE/S:3020(taille=32) mémoire:c5204000-c52047ff

Cheers,
Alexis.


2014-08-09 12:35 GMT+02:00 Alban Hertroys <haramrae@gmail.com>:
On 09 Aug 2014, at 11:38, Alexis Bernard <alexis@bernard.io> wrote:

> Hi,
>
> I am having regurlarly duplicated rows with the same primary key.
>
> => select id, created_at, updated_at from tracks where created_at = '2014-07-03 15:09:16.336488';
>    id   |         created_at         |         updated_at
> --------+----------------------------+----------------------------
>  331804 | 2014-07-03 15:09:16.336488 | 2014-07-03 15:37:55.253729
>  331804 | 2014-07-03 15:09:16.336488 | 2014-07-03 15:37:55.253801
>
> => select id from tracks where id = 331804;
>  id
> ----
> (0 rows)
>

First of all, what is the definition of that primary key? 
What exact version of PG are you using?: select version();
What exact OS is this on? What kind of storage?
To me it looks like you may have run into transaction wrap-around or a corrupted index.

Before you do anything, make a backup.

Theorising that the issue here indeed is transaction wrap-around, what you’re seeing may be data from older transactions that has become newer because your current transaction txid is lower (due to the wraparound) than the txid of the transactions those rows belong(ed) to. If those transactions were committed, then you’re possibly seeing deleted or updated rows that are still around. TX wraparound can occur if you do not vacuum frequently enough and another thing that vacuum does is mark old rows obsolete so that the DB can reclaim the space they use. Seeing data from rows that are no longer there or that has been modified since seems to fit the bill here.

Hence the question: When did you last (auto-)vacuum this table? Did you perhaps turn autovacuum off? Did it fall behind?

If you have been vacuuming and the issue is a corrupt index: Does it help to reindex that table? You said that you have fsync on; what kind of storage is this database on? Something allowed that index to get corrupted. It is more likely that it’s caused by something in the underlying storage system (including the OS) than that it is a bug in PG.

> => delete from tracks where created_at = '2014-07-03 15:09:16.336488' and updated_at = '2014-07-03 15:37:55.253801';
> ERROR:  update or delete on table "tracks" violates foreign key constraint "fk_sources_on_track_id" on table "sources"
> DETAIL:  Key (id)=(331804) is still referenced from table "sources”.

Apparently there is a row from another table referencing this one. So either the referenced row does actually exist (corrupt index theory) or it doesn’t and the referencing row is from an older transaction as well (TX wraparound theory).

Considering that you’re seeing this regularly, my bet is on TX wraparound.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.


pgsql-general by date:

Previous
From: Alban Hertroys
Date:
Subject: Re: Duplicated IDs
Next
From: Adrian Klaver
Date:
Subject: Re: postgresql referencing and creating types as record