Re: BUG #17892: Bug primary key - Mailing list pgsql-bugs

From Alvaro Herrera
Subject Re: BUG #17892: Bug primary key
Date
Msg-id 20230413151808.dlrbdvedxkmtflp2@alvherre.pgsql
Whole thread Raw
In response to BUG #17892: Bug primary key  (PG Bug reporting form <noreply@postgresql.org>)
List pgsql-bugs
On 2023-Apr-12, PG Bug reporting form wrote:

> The following bug has been logged on the website:
> 
> Bug reference:      17892
> Logged by:          Comanici Bogdan
> Email address:      comanicibogdy@yahoo.com
> PostgreSQL version: 13.9
> Operating system:   Debian 11
> Description:        
> 
> We have a database with over 3 milions apps in it
> Something happend we just found duplicates with same id, but we have
> constraints : PRIMARY KEY on id
> 
> How it's happened?
> ID(PRIMARY KEY) SLUG(UNIQUE KEY)
> 29181513    nottiled    0    Home-and-Hobby    Recreation    mirwanda-center    false    0    false    false    false
  true    false    2019-07-15
 
> 11:53:38.000    2023-04-10
> 21:44:34.000    0.0    0.0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    NotTiled
0   0.00    1.8.6    Freeware            0.0    2022-09-19    adsense    true    0            2    false
 
> 29181513    nottiled    0    Home-and-Hobby    Recreation    mirwanda-center    false    0    false    false    false
  true    false    2019-07-15
 
> 11:53:38.000    2023-03-25
> 14:32:55.000    0.0    0.0    0    26    0    1    0    26    0    0    5    0    0    0    5    0    0    NotTiled
22020096    0.00    1.7.0a    Freeware            0.0    2021-09-03    adsense    true    20230410            2
false
> 
> We tried manually to insert a new row with same value, get error for unique
> key slug

It sounds like both are different row versions of the same row, given
that they have the same ID and SLUG.  To me it sounds like the second
row shown here existed, then an UPDATE came along and created the other
one (changing for example the second timestamp column from
'2023-03-25 14:32:55.000' to '2023-04-10 21:44:34.000', among other
things); and then the first one should have been deleted but wasn't.

This could have happened, for example, if VACUUM later processed the
table, removed a few dead entries at the end of the table, but while it
was waiting to truncate the last few pages empty of any live tuple, a
crash came along; then, after recovery, pages which should have been
truncated away, came back to life.

You say you have more cases of the same thing; maybe check the "ctid"
values for the duplicates; if you can see a bunch of them nearby, in a
page close to the physical end of the table, that bug may explain the
issue.


You have a column which in the second copy of the row has the value
20230410, which appears to match another timestamp column in the other
copy of the row.  That's a curious coincidence; I wonder if that column
indicate "recheck this app by this time" or something like that.  Not
that it matters ...

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"This is what I like so much about PostgreSQL.  Most of the surprises
are of the "oh wow!  That's cool" Not the "oh shit!" kind.  :)"
Scott Marlowe, http://archives.postgresql.org/pgsql-admin/2008-10/msg00152.php



pgsql-bugs by date:

Previous
From: "Wetmore, Matthew (CTR)"
Date:
Subject: Error while installing Postgresql
Next
From:
Date:
Subject: 1. PostgreSQL Server Subprocess Went down at function 'pg_detoast_datum_packed'