Thread: BUG #17892: Bug primary key

BUG #17892: Bug primary key

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      17892
Logged by:          Comanici Bogdan
Email address:      comanicibogdy@yahoo.com
PostgreSQL version: 13.9
Operating system:   Debian 11
Description:

We have a database with over 3 million apps in it.
Something happened: we just found duplicates with the same id, but we have
a PRIMARY KEY constraint on id.

How did this happen?
ID(PRIMARY KEY) SLUG(UNIQUE KEY)
29181513    nottiled    0    Home-and-Hobby    Recreation    mirwanda-center    false    0    false    false    false    true    false    2019-07-15 11:53:38.000    2023-04-10 21:44:34.000    0.0    0.0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    NotTiled    0    0.00    1.8.6    Freeware            0.0    2022-09-19    adsense    true    0            2    false

29181513    nottiled    0    Home-and-Hobby    Recreation    mirwanda-center    false    0    false    false    false    true    false    2019-07-15 11:53:38.000    2023-03-25 14:32:55.000    0.0    0.0    0    26    0    1    0    26    0    0    5    0    0    0    5    0    0    NotTiled    22020096    0.00    1.7.0a    Freeware            0.0    2021-09-03    adsense    true    20230410            2    false

We tried to manually insert a new row with the same values, and got an error
for the unique key on slug.


Re: BUG #17892: Bug primary key

From
Tom Lane
Date:
PG Bug reporting form <noreply@postgresql.org> writes:
> We have a database with over 3 million apps in it.
> Something happened: we just found duplicates with the same id, but we have
> a PRIMARY KEY constraint on id.

With no details it's impossible to do more than speculate.
However, if these are string-type columns then a possible cause
is an OS update having changed collation rules underneath you.
There are more details about that in

https://wiki.postgresql.org/wiki/Locale_data_changes

In any case the path to follow is to manually fix the duplicates
and REINDEX.
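
For instance (just a sketch, with made-up names: assume the table is "apps"
and the key column is "id"), that could look roughly like

SELECT ctid, xmin, id FROM apps WHERE id = 29181513;
-- a sequential scan may be needed to see both copies, since the index is suspect
-- then, once you've decided which physical row to keep, delete the other by ctid,
-- e.g. DELETE FROM apps WHERE id = 29181513 AND ctid = '(1234,5)';
REINDEX TABLE apps;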

            regards, tom lane



Re: BUG #17892: Bug primary key

From
Michael Paquier
Date:
On Wed, Apr 12, 2023 at 08:03:28AM -0400, Tom Lane wrote:
> In any case the path to follow is to manually fix the duplicates
> and REINDEX.

A latent bug in Postgres could be a possibility as well, but it is
not really possible to rule that out without more details.

I don't recall anything specific that's been fixed in 13.10, but you
should update to it, at least.
--
Michael


Re: BUG #17892: Bug primary key

From
Peter Geoghegan
Date:
On Wed, Apr 12, 2023 at 3:39 AM PG Bug reporting form
<noreply@postgresql.org> wrote:
> We have a database with over 3 million apps in it.
> Something happened: we just found duplicates with the same id, but we have
> a PRIMARY KEY constraint on id.

You should run pg_amcheck/amcheck on your database to determine the
extent of the problem. You're running Postgres 13, which has the
amcheck corruption checking module, but not the more user-friendly
pg_amcheck utility. So you'll need to install the amcheck contrib
extension.
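Assuming the contrib modules are already available on your server, installing
the extension is just a matter of running, typically as a superuser, in the
affected database:

CREATE EXTENSION IF NOT EXISTS amcheck;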

Once it is installed, you can run a query like the following (you may
want to customize this):

SELECT bt_index_check(index => c.oid, heapallindexed => true),
c.relname,
c.relpages
FROM pg_index i
JOIN pg_opclass op ON i.indclass[0] = op.oid
JOIN pg_am am ON op.opcmethod = am.oid
JOIN pg_class c ON i.indexrelid = c.oid
JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE am.amname = 'btree'
-- Don't check temp tables, which may be from another session:
AND c.relpersistence != 't'
-- Function may throw an error when this is omitted:
AND c.relkind = 'i' AND i.indisready AND i.indisvalid
ORDER BY c.relpages DESC;

The query will throw errors when it finds corruption. That should give
you a general idea of the extent of the problem. Maybe the only
corruption is the corruption that you know about already, but it's
more likely that other indexes are also affected.

If this query takes too long to complete, you may find it useful to add
something that limits the indexes checked, such as: AND n.nspname =
'public' -- that change to the SQL will make the query test only
indexes in the public schema. You could also customize it to check only
indexes on text columns, once you're sure that the problem is
collation related (ask me how if this is useful). You could also specify
"heapallindexed => false" to make it run faster (though this will also be
less thorough).
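
For example, a variant along those lines (just a sketch; adjust the schema
name as needed) would be:

SELECT bt_index_check(index => c.oid, heapallindexed => false),
c.relname,
c.relpages
FROM pg_index i
JOIN pg_opclass op ON i.indclass[0] = op.oid
JOIN pg_am am ON op.opcmethod = am.oid
JOIN pg_class c ON i.indexrelid = c.oid
JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE am.amname = 'btree'
-- Only check indexes in the public schema:
AND n.nspname = 'public'
-- Don't check temp tables, which may be from another session:
AND c.relpersistence != 't'
-- Function may throw an error when this is omitted:
AND c.relkind = 'i' AND i.indisready AND i.indisvalid
ORDER BY c.relpages DESC;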

Do "SET client_min_messages=DEBUG1 " to get a kind of rudimentary
progress indicator, if that seems useful to you.

The docs have further information on what this bt_index_check
function does:

https://www.postgresql.org/docs/13/amcheck.html
--
Peter Geoghegan



Re: BUG #17892: Bug primary key

From
Alvaro Herrera
Date:
On 2023-Apr-12, PG Bug reporting form wrote:

> The following bug has been logged on the website:
> 
> Bug reference:      17892
> Logged by:          Comanici Bogdan
> Email address:      comanicibogdy@yahoo.com
> PostgreSQL version: 13.9
> Operating system:   Debian 11
> Description:        
> 
> We have a database with over 3 million apps in it.
> Something happened: we just found duplicates with the same id, but we have
> a PRIMARY KEY constraint on id.
> 
> How did this happen?
> ID(PRIMARY KEY) SLUG(UNIQUE KEY)
> 29181513    nottiled    0    Home-and-Hobby    Recreation    mirwanda-center    false    0    false    false    false    true    false    2019-07-15 11:53:38.000    2023-04-10 21:44:34.000    0.0    0.0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    NotTiled    0    0.00    1.8.6    Freeware            0.0    2022-09-19    adsense    true    0            2    false
> 
> 29181513    nottiled    0    Home-and-Hobby    Recreation    mirwanda-center    false    0    false    false    false    true    false    2019-07-15 11:53:38.000    2023-03-25 14:32:55.000    0.0    0.0    0    26    0    1    0    26    0    0    5    0    0    0    5    0    0    NotTiled    22020096    0.00    1.7.0a    Freeware            0.0    2021-09-03    adsense    true    20230410            2    false
> 
> We tried to manually insert a new row with the same values, and got an error
> for the unique key on slug.

It sounds like these are two different versions of the same row, given
that they have the same ID and SLUG.  To me it sounds like the second
row shown here existed, then an UPDATE came along and created the other
one (changing, for example, the second timestamp column from
'2023-03-25 14:32:55.000' to '2023-04-10 21:44:34.000', among other
things); and then the original one should have been deleted but wasn't.

This could have happened, for example, if VACUUM later processed the
table and removed a few dead entries at the end of it, but a crash came
along while it was waiting to truncate away the last few pages, which no
longer contained any live tuples; then, after recovery, pages that should
have been truncated away came back to life.

You say you have more cases of the same thing; maybe check the "ctid"
values for the duplicates; if you can see a bunch of them nearby, in a
page close to the physical end of the table, that bug may explain the
issue.
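
For instance (again only a sketch, assuming the table is "apps" and the key
column is "id"), something like this would list each duplicated id together
with its physical location; the page number is the first component of the ctid:

SELECT id, ctid
FROM apps
WHERE id IN (SELECT id FROM apps GROUP BY id HAVING count(*) > 1)
ORDER BY id, ctid;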


You have a column which, in the second copy of the row, has the value
20230410, and which appears to match another timestamp column in the other
copy of the row.  That's a curious coincidence; I wonder if that column
indicates "recheck this app by this time" or something like that.  Not
that it matters ...

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"This is what I like so much about PostgreSQL.  Most of the surprises
are of the "oh wow!  That's cool" Not the "oh shit!" kind.  :)"
Scott Marlowe, http://archives.postgresql.org/pgsql-admin/2008-10/msg00152.php