Re: Solving the OID-collision problem - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Solving the OID-collision problem
Msg-id 15860.1123597883@sss.pgh.pa.us
In response to Re: Solving the OID-collision problem  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: Solving the OID-collision problem  (Richard Huxton <dev@archonet.com>)
List pgsql-hackers
Simon Riggs <simon@2ndquadrant.com> writes:
> We either need to have a special routine for each catalog table, or we
> scan all tables, all of the time. The latter is a disaster, so let's look
> at the former: spicing the code with appropriate catalog checks would be
> a lot of work and probably very error prone and hard to maintain. We
> would never be sure that any particular check had been done
> appropriately.

I don't think it's as bad as all that.  As my prototype showed, we only
need one base routine for this; the trick is to give it the pg_class
OIDs of the target catalog and the catalog's index on OID.
[ ... click click, grep grep ... ]  There are only eight calls to
newoid() in the backend, and in six of them we know exactly which
catalog we are inserting into and which index could be used to check
uniqueness.  One of them is actually generating a relfilenode value not
a normal OID, so we would need a special routine that looks into the
filesystem to check uniqueness, but that's no big deal.  The only call
that's at all problematic is the one in heap_insert --- there, we know
the target relation, but haven't got any direct access to knowledge
about whether it has an OID index.  There are several ways you could
deal with that.  The simplest is to just have a small constant table
someplace, listing the pg_class OIDs of all the catalogs that have OIDs
and the pg_class OIDs of their OID indexes.  This would be a little bit
of a maintenance gotcha (someone could add a new catalog having OIDs
and forget to add it to that table) but certainly there are many worse
gotchas than that in the system.
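
A minimal sketch of what such a base routine plus the small constant table could look like, in standalone C. This is not the prototype mentioned above and not backend code: `Oid` is redeclared locally, and `next_oid_counter`, `raw_next_oid`, `OidProbe`, `get_new_oid_checked`, `lookup_oid_index`, `toy_probe` and every numeric value in the table are hypothetical stand-ins. In the backend the probe would presumably be a lookup through the catalog's unique index on OID, and the counter's wraparound handling is more involved than skipping zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;
#define InvalidOid ((Oid) 0)

/* Hypothetical stand-in for the shared OID counter. */
Oid next_oid_counter = 1;

/* Hand out the next counter value; skip 0 (InvalidOid) after wraparound. */
Oid
raw_next_oid(void)
{
    Oid result = next_oid_counter++;

    if (result == InvalidOid)
        result = next_oid_counter++;
    return result;
}

/*
 * Hypothetical probe: returns true if `candidate` is already present in
 * `catalog`, as seen through the unique index `oid_index`.
 */
typedef bool (*OidProbe) (Oid catalog, Oid oid_index, Oid candidate);

/*
 * The single base routine: draw candidates until the probe reports no match.
 * Loops forever if every OID is taken; a real version would need a guard.
 */
Oid
get_new_oid_checked(Oid catalog, Oid oid_index, OidProbe probe)
{
    Oid candidate;

    assert(probe != NULL);
    do
    {
        candidate = raw_next_oid();
    } while (probe(catalog, oid_index, candidate));
    return candidate;
}

/* Small constant table for the heap_insert case. Values are placeholders. */
typedef struct
{
    Oid catalog;    /* pg_class OID of a catalog that has OIDs */
    Oid oid_index;  /* pg_class OID of that catalog's index on OID */
} CatalogOidIndex;

static const CatalogOidIndex catalog_oid_indexes[] = {
    {1001, 2001},
    {1002, 2002},
};

/* Return the OID index for a catalog, or InvalidOid if it is not listed. */
Oid
lookup_oid_index(Oid catalog)
{
    size_t n = sizeof(catalog_oid_indexes) / sizeof(catalog_oid_indexes[0]);

    for (size_t i = 0; i < n; i++)
        if (catalog_oid_indexes[i].catalog == catalog)
            return catalog_oid_indexes[i].oid_index;
    return InvalidOid;
}

/* Toy probe for experimentation: pretends OIDs 5, 6 and 7 are in use. */
bool
toy_probe(Oid catalog, Oid oid_index, Oid candidate)
{
    (void) catalog;
    (void) oid_index;
    return candidate >= 5 && candidate <= 7;
}
```

With the counter set to 5, `get_new_oid_checked(1001, lookup_oid_index(1001), toy_probe)` skips the three occupied values and returns 8. A catalog missing from the table yields `InvalidOid` from `lookup_oid_index`, which is exactly the maintenance gotcha described above: the caller has to decide what to do when no index is known.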

I was also toying with the idea of automating it: if the target table
has OIDs, look to see if it has a unique index on OID, and if so use
that.  (If we cache the result in Relation structures, this shouldn't be
too terribly expensive timewise --- in fact, we could make
RelationGetIndexList do and cache this check, which would make it
virtually free since if you're inserting you have certainly got to do
RelationGetIndexList somewhere along the line.)  The interesting thing
about that is that the guaranteed-unique-OID behavior would then be
available for user tables too, if we cared to document how to use it.
This might be gilding the lily though.
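
The caching idea can be sketched in standalone C as follows. Nothing here is the real Relation structure or RelationGetIndexList: `RelationSketch`, `IndexSketch`, `relation_get_oid_index` and all field names are hypothetical, and the `scans` field exists only to make the caching visible. The sketch assumes the cached answer would be reset whenever the table's set of indexes changes, which is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;
#define InvalidOid ((Oid) 0)

/* Hypothetical description of one index on a table. */
typedef struct
{
    Oid  index_oid;
    bool is_unique;        /* index enforces uniqueness */
    bool only_key_is_oid;  /* index has exactly one key column: the OID */
} IndexSketch;

/* Hypothetical, heavily simplified stand-in for a cached relation entry. */
typedef struct
{
    bool               has_oids;
    const IndexSketch *indexes;
    size_t             nindexes;
    bool               oid_index_valid;  /* has the answer been cached? */
    Oid                oid_index;        /* cached answer, InvalidOid if none */
    int                scans;            /* instrumentation for this sketch */
} RelationSketch;

/*
 * Return the OID of a unique index on the OID column, or InvalidOid.
 * The index list is walked at most once per relation entry.
 */
Oid
relation_get_oid_index(RelationSketch *rel)
{
    assert(rel != NULL);
    if (rel->oid_index_valid)
        return rel->oid_index;

    rel->scans++;
    rel->oid_index = InvalidOid;
    if (rel->has_oids)
    {
        for (size_t i = 0; i < rel->nindexes; i++)
        {
            if (rel->indexes[i].is_unique && rel->indexes[i].only_key_is_oid)
            {
                rel->oid_index = rel->indexes[i].index_oid;
                break;
            }
        }
    }
    rel->oid_index_valid = true;
    return rel->oid_index;
}
```

Because the check does not care whether the relation is a catalog, a user table that happens to have a unique index on OID would get the same answer, which is the side effect noted above.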

> 1. When we wrap we set up an OID Free Space Map. We do this once when we
> wrap, rather than every time we collide. We scan all catalog tables and
> set the bits in a single 8192 byte block and write it out to disk.
> We then allocate OIDs from completely untouched chunks,

What if there aren't any "untouched chunks"?  With only 64K-chunk
granularity, I think you'd hit that condition a lot more than you are
hoping.  Also, this seems to assume uniqueness across all tables in an
entire cluster, which is much more than we want; it makes the 32-bit
size of OIDs significantly more worrisome than when they only need to be
unique within a table.
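
The 64K figure presumably comes from the arithmetic of the quoted proposal: an 8192-byte block holds 65536 bits, and spreading the 2^32 possible OIDs over them makes each bit stand for 65536 consecutive OIDs. The sketch below, in standalone C with hypothetical names throughout (`OidChunkMap`, `oid_map_init`, `oid_map_mark`, `oid_map_untouched`), models such a map to show how few in-use OIDs it takes to leave no untouched chunk. It assumes one bit per chunk, which the proposal does not spell out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAP_BYTES      8192u
#define MAP_CHUNKS     (MAP_BYTES * 8u)   /* 65536 bits in the block */
/* 2^32 possible OIDs divided over the bits: 65536 OIDs per chunk. */
#define OIDS_PER_CHUNK ((uint32_t) (UINT64_C(4294967296) / MAP_CHUNKS))

typedef struct
{
    uint8_t bits[MAP_BYTES];
} OidChunkMap;

/* Start with every chunk untouched. */
void
oid_map_init(OidChunkMap *map)
{
    assert(map != NULL);
    memset(map->bits, 0, sizeof(map->bits));
}

/* Record that `oid` is in use by setting the bit of the chunk holding it. */
void
oid_map_mark(OidChunkMap *map, uint32_t oid)
{
    uint32_t chunk = oid / OIDS_PER_CHUNK;

    map->bits[chunk / 8] |= (uint8_t) (1u << (chunk % 8));
}

/* Count the chunks that contain no in-use OID at all. */
uint32_t
oid_map_untouched(const OidChunkMap *map)
{
    uint32_t n = 0;

    for (uint32_t chunk = 0; chunk < MAP_CHUNKS; chunk++)
        if ((map->bits[chunk / 8] & (1u << (chunk % 8))) == 0)
            n++;
    return n;
}
```

Marking one OID per chunk, for example every multiple of `OIDS_PER_CHUNK`, drives `oid_map_untouched` to zero with only 65536 OIDs in use out of roughly four billion. That is the worst case rather than the typical one, but it shows why a cluster-wide map at this granularity can run out of untouched chunks long before the OID space is actually crowded.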
        regards, tom lane

