Thread: Remove duplicated row in pg_largeobject_metadata

Remove duplicated row in pg_largeobject_metadata

From
Tobias Meyer
Date:
Well, it seems we have hit a bug in PostgreSQL 10.

We tried running vacuumlo on a database and it complained at some point with a message

Failed to remove lo 64985186: ERROR:  large object 64985186 does not exist
Removal from database "XXXXX" failed at object 26 of 100.

Yet, object 64985186 is neither in pg_largeobject.loid nor in pg_largeobject_metadata.oid.

Researching similar problems did not yield anything enlightening, except for a comment by Tom Lane that vacuumlo will use an index.

That led me to the assumption that an index had gone wrong, so I ran
REINDEX (VERBOSE) TABLE pg_largeobject_metadata;

Which failed with the message 
ERROR:  could not create unique index "pg_largeobject_metadata_oid_index"
DETAIL:  Key (oid)=(665238) is duplicated.

Well, 
select oid,* from pg_largeobject_metadata where oid=665238;
  oid   | lomowner | lomacl
--------+----------+--------
 665238 |    16123 |
(1 row)

That seemed odd, so I decided to do a VACUUM FULL:
VACUUM (FULL, VERBOSE) pg_largeobject_metadata;

Now I see
select oid,* from pg_largeobject_metadata where oid=665238;
  oid   | lomowner | lomacl
--------+----------+--------
 665238 |    16123 |
 665238 |    16123 |
(2 rows)

Deleting both would be a no-go, as I cannot INSERT a specific oid to add one back, as far as I know.
So how do I delete only one of the two?

Luckily this is only a test instance, so I'm not totally freaked out now. 

Note: I randomized the IDs a bit in this post for security reasons, but kept the scale. Please ignore should they be in an unusual range.

Thanks,
Tobias

Re: Remove duplicated row in pg_largeobject_metadata

From
Tom Lane
Date:
Tobias Meyer <t9m@qad.com> writes:
> Now I see
> select oid,* from pg_largeobject_metadata where oid=665238;
>   oid   | lomowner | lomacl
> --------+----------+--------
>  665238 |    16123 |
>  665238 |    16123 |
> (2 rows)

hmm ... is this a reasonably up-to-date v10?

> Deleting both would be a no-go, as I cannot INSERT a specific oid to add
> one back as far as I know.
> So how do I delete only one of the two?

Delete by ctid.

select ctid, oid, * from pg_largeobject_metadata where oid=665238;
delete from pg_largeobject_metadata where ctid = 'pick one';

            regards, tom lane



Re: Remove duplicated row in pg_largeobject_metadata

From
Tobias Meyer
Date:
Hi Tom, 
thanks for taking a look.
 
> hmm ... is this a reasonably up-to-date v10?

PostgreSQL 10.18, the latest packaged with Ubuntu 18.04.
 
> Delete by ctid.
>
> select ctid, oid, * from pg_largeobject_metadata where oid=665238;
> delete from pg_largeobject_metadata where ctid = 'pick one';

OK, that worked, but it seems there is more than one...

so I did
select ctid as tid,oid,lomowner into temp table temp_pgmeta from pg_largeobject_metadata;
select * from ( select count(tid) as cnt,max(tid), min(tid), oid from temp_pgmeta group by oid) x where cnt > 1;

and that gave me 2951857 rows


I admit we have quite a few LOs in there, around 32 million, and vacuumlo had already cleaned up 47 million in a first run.

And I know this is bad practice, but we forgot to unlink when deleting things - it's why we needed vacuumlo to begin with.


Any ideas what might have happened?


Kind regards, Tobias

Re: Remove duplicated row in pg_largeobject_metadata

From
Tom Lane
Date:
Tobias Meyer <t9m@qad.com> writes:
> so I did
> select ctid as tid,oid,lomowner into temp table temp_pgmeta from
> pg_largeobject_metadata;
> select * from ( select count(tid) as cnt,max(tid), min(tid), oid from
> temp_pgmeta group by oid) x where cnt > 1;

> and that gave me 2951857 rows

Yipes.  Did you verify that the TIDs are all distinct?
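
A quick way to check that, using the temp table from your earlier query (a sketch, not tested here):

```sql
-- if the two counts differ, some ctid occurs more than once
SELECT count(*) AS total_rows, count(DISTINCT tid) AS distinct_tids
FROM temp_pgmeta;
```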

A possible theory is that pg_largeobject_metadata_oid_index has been
corrupt for a long time, allowing a lot of duplicate entries to be made.
However, unless pg_largeobject's pg_largeobject_loid_pn_index is *also*
corrupt, you'd think that creation of such duplicates would still be
stopped by that unique index.  There's something mighty odd here.

            regards, tom lane



Re: Remove duplicated row in pg_largeobject_metadata

From
Tobias Meyer
Date:
> Yipes.  Did you verify that the TIDs are all distinct?

Yes, they were.
 
> A possible theory is that pg_largeobject_metadata_oid_index has been
> corrupt for a long time, allowing a lot of duplicate entries to be made.
> However, unless pg_largeobject's pg_largeobject_loid_pn_index is *also*
> corrupt, you'd think that creation of such duplicates would still be
> stopped by that unique index.  There's something mighty odd here.

That would be a mildly disturbing thought indeed. 
Any way to quickly check that without reindexing?
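
(For the record, v10 ships the amcheck contrib extension, which can verify a btree's ordering invariants without a rebuild, taking only an AccessShareLock. It won't necessarily catch every kind of corruption, but it's a quick first pass; this assumes the contrib packages are installed:)

```sql
-- requires the contrib packages (postgresql-contrib on Debian/Ubuntu)
CREATE EXTENSION amcheck;

-- raises an error if the index violates btree invariants
SELECT bt_index_check('pg_largeobject_metadata_oid_index');
```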

But if those duplicates were inserted during normal operation, that would mean there had been an OID overflow, correct?
And that would also mean we would have to be referencing the same OID in more than one place (different LOs, actually), which I could not see in the other tables. I did not check all 2.9 million of them, though.

Let me roll back the test instance to before the first vacuumlo run and verify whether the index was OK before - I will only get to do that on Monday, though.

Kind regards, 
Tobias


Re: Remove duplicated row in pg_largeobject_metadata

From
Tobias Meyer
Date:
 
> A possible theory is that pg_largeobject_metadata_oid_index has been
> corrupt for a long time, allowing a lot of duplicate entries to be made.
> However, unless pg_largeobject's pg_largeobject_loid_pn_index is *also*
> corrupt, you'd think that creation of such duplicates would still be
> stopped by that unique index.  There's something mighty odd here.

> Let me roll back the test instance to before the first vacuumlo run and verify if the index was OK before - will only get to do that on monday though.

 
To follow up on this and share my research: the index was corrupt in the version before the vacuumlo run as well (and possibly much longer), so vacuumlo was only the indicator, not the cause.
pg_largeobject's pg_largeobject_loid_pn_index was OK, and we most certainly did not have an OID overflow, as max(oid) was around 90 million, far from the 4 billion of the 32-bit rollover. So that indicated another problem source.

I was able to correlate the transaction ids (xmin) of the duplicate rows in pg_largeobject_metadata with times stored in another table, and found that the duplicates were all added on one specific day, while the originals were nicely spread out over time. I was further able to trace that date back to a cluster upgrade (pg_upgradecluster) we performed on that machine, so I guess we have a pretty good candidate. After the upgrade, no more duplicates were produced.
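
(For anyone retracing this, the query was along these lines, a sketch of what I ran:)

```sql
-- duplicated oids, where each copy lives, and the transaction that inserted it
WITH dups AS (
  SELECT oid
  FROM pg_largeobject_metadata
  GROUP BY oid
  HAVING count(*) > 1
)
SELECT m.oid, m.ctid, m.xmin
FROM pg_largeobject_metadata m
JOIN dups d ON d.oid = m.oid
ORDER BY m.oid, m.ctid;
```

Rows in our other tables written by the same transactions (matched on xmin) carry timestamps, which is what dated the duplicates.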

As we have a way to recover, I'm not too concerned about this, but to add to the lessons learned: should there have been messages in the upgrade log, or is this a known issue when upgrading? I'm pretty sure I checked the logs, but might have missed something if I was looking for the wrong things.

Thanks,
Tobias

Re: Remove duplicated row in pg_largeobject_metadata

From
Tom Lane
Date:
Tobias Meyer <t9m@qad.com> writes:
> I was able to correlate the transaction ids (xmin) of the duplicate rows in
> pg_largeobject_metadata with times stored in another table, and found the
> duplicates were all added on one specific day, while the originals nicely
> spread out over time. I was further able to trace that date back to a
> cluster upgrade (pg_upgradecluster) we performed with that machine, so I
> guess we have a pretty good candidate. After the upgrade, no more
> duplicates were produced.

Interesting.  Can you nail down the software versions that were in
use here?  That'd be the old PG server version upgraded from, the
new server version upgraded to, the versions of pg_upgrade and
pg_dump (these probably should match the new server version, but
I'm not certain we enforce that), and the pg_upgradecluster
script version?

            regards, tom lane



Re: Remove duplicated row in pg_largeobject_metadata

From
Tobias Meyer
Date:

> Interesting.  Can you nail down the software versions that were in
> use here?  That'd be the old PG server version upgraded from, the
> new server version upgraded to, the versions of pg_upgrade and
> pg_dump (these probably should match the new server version, but
> I'm not certain we enforce that), and the pg_upgradecluster
> script version?

Well, it was an upgrade from 9.5 to 10. Unfortunately, I can't pin down any more exact versions any longer.
The command we used was "pg_upgradecluster 9.5 main -m upgrade -k -v 10"

Regards, Tobias