Thread: An Idea for OID conflicts

An Idea for OID conflicts

From
Gevik Babakhani
Date:
Folks,

I would like to have your thoughts on a solution for the duplicate OIDs.

I wanted to apply the uuid patch on a newly download source from the
CVS. Patching and "make install" went just okay but "make check" and
initdb failed to my surprise. A quick look at duplicate_oids showed that
almost all of the OIDs I got available from unused_oids were already in
use. So naturally I went "!!?! segmentation fault **!!?".

I can only imagine how time consuming this can be for the committers to
correct these manually.

I think we can solve this problem with the combination of the following
three steps.

1. When using new OIDs always start from a fixed number. For example
10000. This way the new OIDs are easy to recognize and the developer can
continue the work. 
2. Always use the new OIDs sequentially.
3. Make a small utility that goes through a patch, finds the new OIDs
and changes them back to a value specified by the committer(s).

Would this be workable?

Regards,
Gevik.





Re: An Idea for OID conflicts

From
Andrew Dunstan
Date:
Gevik Babakhani wrote:
> Folks,
>
> I would like to have your thoughts on a solution for the duplicate OIDs.
>
> I wanted to apply the uuid patch on a newly download source from the
> CVS. Patching and "make install" went just okay but "make check" and
> initdb failed to my surprise. A quick look at duplicate_oids showed that
> almost all of the OIDs I got available from unused_oids were already in
> use. So naturally I went "!!?! segmentation fault **!!?".
>
> I can only imagine how time consuming this can be for the committers to
> correct these manually.
>
> I think we can solve this problem with the combination of the following
> three steps.
>
> 1. When using new OIDs always start from a fixed number. For example
> 10000. This way the new OIDs are easy to recognize and the developer can
> continue the work. 
> 2. Always use the new OIDs sequentially.
> 3. Make a small utility that goes through a patch, finds the new OIDs
> and changes them back to a value specified by the committer(s).
>
> Would this be workable?
>
>   

My idea was to have a file called reserved_oids.h which would contain 
lines like:

#error "do not include this file anywhere"
CATALOG(reserved_for_foo_module,9876) /* 2006-09-18 */

and which would be examined by the unused_oids script.

To get oids reserved, a committer would add lines to that file for you. 
When your patch was done, it would include stuff to remove those lines 
from the file.

That way you will be working with the oids your patch will eventually 
use - no later fixup necessary.

Every release the file would be cleaned of old entries if necessary.

cheers

andrew





Re: An Idea for OID conflicts

From
Tom Lane
Date:
Gevik Babakhani <pgdev@xs4all.nl> writes:
> 3. Make a small utility that goes through a patch, finds the new OIDs
> and changes them back to a value specified by the committer(s).

> Would this be workable?

That utility sounds AI-complete to me ...
        regards, tom lane


Re: An Idea for OID conflicts

From
Gregory Stark
Date:
Gevik Babakhani <pgdev@xs4all.nl> writes:

> 1. When using new OIDs always start from a fixed number. For example
> 10000. This way the new OIDs are easy to recognize and the developer can
> continue the work. 

Reserving a range of OIDs for experimentation seems like a good idea since it
means nobody's development tree would ever be broken by a cvs update. At least
not because their OIDs were pulled out from under them.

But I had another thought. It seems like a lot of the catalog include files
don't really need to be defined so laboriously. Just because types and
operators are in the core doesn't mean they need to be boostrapped using fixed
OIDs and C code.

Those types, functions and operators that aren't used by system tables could
be created by a simple SQL script instead. It's a hell of a lot easier to
write a CREATE OPERATOR CLASS call than to get all the OIDs in in four
different include files to line up properly.

This would make it a whole lot more practical to define all those smallfoo
data types we were discussing too with all the cross-data-type comparison
operators as well.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: An Idea for OID conflicts

From
Gevik Babakhani
Date:
On Mon, 2006-09-18 at 14:36 -0400, Tom Lane wrote:
> Gevik Babakhani <pgdev@xs4all.nl> writes:
> > 3. Make a small utility that goes through a patch, finds the new OIDs
> > and changes them back to a value specified by the committer(s).
> 
> > Would this be workable?
> 
> That utility sounds AI-complete to me ...

Maybe I am missing something here but if the developer used the OIDs
sequentially it won't be that difficult to find and replace the used
range within the patch.









Re: An Idea for OID conflicts

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> Those types, functions and operators that aren't used by system tables could
> be created by a simple SQL script instead.

Only if you don't need to know their OIDs anywhere in the C code.  I'm
not certain offhand how many of the non-core objects are so referenced.
We have in any case got a problem with changing OIDs for any built-in
types, because there are a number of clients out there with hard-wired
knowledge of what the type OIDs returned in query results mean.

T'would be nice to simplify the operator class setup though ...
        regards, tom lane


Re: An Idea for OID conflicts

From
Neil Conway
Date:
On Mon, 2006-09-18 at 14:10 -0400, Andrew Dunstan wrote:
> My idea was to have a file called reserved_oids.h which would contain 
> lines like:
> 
> #error "do not include this file anywhere"
> CATALOG(reserved_for_foo_module,9876) /* 2006-09-18 */
> 
> and which would be examined by the unused_oids script.

Or you needn't even use a header file -- presumably it wouldn't actually
be included by any C source files, so you could just keep the reserved
OIDs as a newline-separated text file. We'd probably want to design the
file format to avoid unnecessary patch conflicts when reserved OID
ranges are removed.

I wonder if the additional committer burden of maintaining this file is
worth it -- personally this problem rarely occurs in practice for me,
and when it does, it is easily resolved via the existing scripts for
finding duplicate and unused OIDs.

-Neil




Re: An Idea for OID conflicts

From
Tom Dunstan
Date:
Whoops, I hadn't read this far when I sent the last message on the other
thread.

> Gevik Babakhani wrote:
>> 3. Make a small utility that goes through a patch, finds the new OIDs
>> and changes them back to a value specified by the committer(s).

This is effectively what I ended up suggesting in the aforementioned
post. There's no reason that the OIDs would have to start at 10000,
though, and that would probably necessitate a change in what I
understand is the policy of starting a db cluster at OID 10000. OTOH,
while waiting for patch acceptance, there's probably benefit in using
OIDs well above the current first free OID: you don't have to repeatedly
run the script. If you started at 9000, say, you'd only have to run the
update script when you were about to submit the patch. If you're right
on the line, you'd have to run it every week or whatever.

>> Would this be workable?

Well, *I* think so, with the suggestions made above.

Andrew Dunstan wrote:
> My idea was to have a file called reserved_oids.h which would contain 
> lines like:
> 
> #error "do not include this file anywhere"
> CATALOG(reserved_for_foo_module,9876) /* 2006-09-18 */
> 
> and which would be examined by the unused_oids script.
> 
> To get oids reserved, a committer would add lines to that file for you. 
> When your patch was done, it would include stuff to remove those lines 
> from the file.
> 
> That way you will be working with the oids your patch will eventually 
> use - no later fixup necessary.

This sounds like a recipe for gaps in the OID list to me. If the patch
gets rejected / dropped, you've got a gap. If the user doesn't use all
the allocated OIDs, you've got a gap. What happens if the patch author
decides that they need more OIDs? More time from a committer to reserve
some. Plus it has the downside that the author has to ask for some to be
reserved up front, they need to have a very good idea of how many
they'll need, and a committer has to agree with them. If you'd asked me
how many I thought I'd need when I started the enum patch, I would have
told you a number much smaller than the number that I ended up using. :)

Of course, if you don't care about gaps, then half of the objections
above go away, and the other half can be solved by reserving more than
you'd need. Given the fairly contiguous nature of OIDs in the catalogs
currently, it would appear that we do care about gaps, though.

I like the script idea much better. It wouldn't be bad to even allow
patches to be submitted with OIDs in the high 9000 or whatever range.
The committer responsible for committing the patch could just run the
update script before comitting the code.

Cheers

Tom



Re: An Idea for OID conflicts

From
"Jim C. Nasby"
Date:
On Mon, Sep 18, 2006 at 07:46:41PM +0100, Gregory Stark wrote:
> 
> Gevik Babakhani <pgdev@xs4all.nl> writes:
> 
> > 1. When using new OIDs always start from a fixed number. For example
> > 10000. This way the new OIDs are easy to recognize and the developer can
> > continue the work. 
> 
> Reserving a range of OIDs for experimentation seems like a good idea since it
> means nobody's development tree would ever be broken by a cvs update. At least
> not because their OIDs were pulled out from under them.
> 
> But I had another thought. It seems like a lot of the catalog include files
> don't really need to be defined so laboriously. Just because types and
> operators are in the core doesn't mean they need to be boostrapped using fixed
> OIDs and C code.
> 
> Those types, functions and operators that aren't used by system tables could
> be created by a simple SQL script instead. It's a hell of a lot easier to
> write a CREATE OPERATOR CLASS call than to get all the OIDs in in four
> different include files to line up properly.

If there's 4 different files involved ISTM it'd be best to have a script
that generates those 4 files from a master list. This is something I
should be able to create if there's interest.

Though I do agree that moving things to SQL where possible probably
makes the most sense...
-- 
Jim Nasby                                    jimn@enterprisedb.com
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)


Re: An Idea for OID conflicts

From
Tom Dunstan
Date:
Gregory Stark wrote:
> Those types, functions and operators that aren't used by system tables could
> be created by a simple SQL script instead. It's a hell of a lot easier to
> write a CREATE OPERATOR CLASS call than to get all the OIDs in in four
> different include files to line up properly.

No kidding.

Just FYI that wouldn't have worked for the enums patch, though, because 
of the pseudo anyenum type. That stuff really did need to be in the 
backend. For more common user defined types like uuid that are being 
discussed, it might work well. Heck, a bunch of the existing casts etc 
could probably be changed to SQL, and would become a great deal more 
readable in the process. Not that I'm advocating fixing a non-broken 
thing... :)

Cheers

Tom


Re: An Idea for OID conflicts

From
Tom Lane
Date:
Tom Dunstan <pgsql@tomd.cc> writes:
> [ some good arguments snipped ]

> I like the script idea much better. It wouldn't be bad to even allow
> patches to be submitted with OIDs in the high 9000 or whatever range.
> The committer responsible for committing the patch could just run the
> update script before comitting the code.

The scary thing about a script is the assumption that it will make all
and only the changes needed.  Four-digit magic numbers are not that
uncommon in C code.  I think it might be safer if we made the arbitrary
OID range for an uncommitted patch be large, say eight digits (maybe
"12345xxx").  The script would knock these down to 4-digit numbers,
ie removing one tab stop, so it wouldn't be too hard on your formatting.
Given the current OID generation mechanism, the presence of large OIDs
in the initial catalogs shouldn't be a problem for testing patches,
even assuming that the OID counter gets that high in your test database.
        regards, tom lane


Re: An Idea for OID conflicts

From
Tom Dunstan
Date:
Tom Lane wrote:
> The scary thing about a script is the assumption that it will make all
> and only the changes needed.  Four-digit magic numbers are not that
> uncommon in C code.  I think it might be safer if we made the arbitrary
> OID range for an uncommitted patch be large, say eight digits (maybe
> "12345xxx").  The script would knock these down to 4-digit numbers,
> ie removing one tab stop, so it wouldn't be too hard on your formatting.
> Given the current OID generation mechanism, the presence of large OIDs
> in the initial catalogs shouldn't be a problem for testing patches,
> even assuming that the OID counter gets that high in your test database.

Well, the only files that a script would touch would be in 
src/include/catalog, since any other reference to them should be through 
a #define anyway IMO. And I figured that the 9000 range was as good a 
magic number as any, at least in that directory. The nice thing about 
9000 numbers is that they're still under the 10000 magic starting point 
for initdb, so you're guaranteed not to run into that range when 
inserting data while testing your patch. I'm a little shady on how OID 
uniqueness works though, so maybe it's not a problem. Are OIDs 
guaranteed to be unique across a whole cluster, or just in a given 
relation (including wraparound scenarios)?

The other point about the script scariness is that obviously you'd have 
to a) pass make check including whatever tests test out your patch (and 
there should be some if you've added a new proc or type or whatever), 
and b) the patch would have to survive review, where OID weirdness 
should leap out when the patch is viewed stripped of all the surrounding 
catalog noise. Maybe. :)

Anyway if having OIDs up above 10000 isn't a problem, then it doesn't 
really matter either way, so having them stand out by being longer seems 
just as good to me as my suggestion.

Thanks

Tom




Re: An Idea for OID conflicts

From
Tom Dunstan
Date:
Whoops, I hadn't read this far when I sent the last message on the other 
thread.

> Gevik Babakhani wrote:
>> 3. Make a small utility that goes through a patch, finds the new OIDs
>> and changes them back to a value specified by the committer(s).

This is effectively what I ended up suggesting in the aforementioned 
post. There's no reason that the OIDs would have to start at 10000, 
though, and that would probably necessitate a change in what I 
understand is the policy of starting a db cluster at OID 10000. OTOH, 
while waiting for patch acceptance, there's probably benefit in using 
OIDs well above the current first free OID: you don't have to repeatedly 
run the script. If you started at 9000, say, you'd only have to run the 
update script when you were about to submit the patch. If you're right 
on the line, you'd have to run it every week or whatever.

>> Would this be workable?

Well, *I* think so, with the suggestions made above.

Andrew Dunstan wrote:
> My idea was to have a file called reserved_oids.h which would contain 
> lines like:
> 
> #error "do not include this file anywhere"
> CATALOG(reserved_for_foo_module,9876) /* 2006-09-18 */
> 
> and which would be examined by the unused_oids script.
> 
> To get oids reserved, a committer would add lines to that file for you. 
> When your patch was done, it would include stuff to remove those lines 
> from the file.
> 
> That way you will be working with the oids your patch will eventually 
> use - no later fixup necessary.

This sounds like a recipe for gaps in the OID list to me. If the patch 
gets rejected / dropped, you've got a gap. If the user doesn't use all 
the allocated OIDs, you've got a gap. What happens if the patch author 
decides that they need more OIDs? More time from a committer to reserve 
some. Plus it has the downside that the author has to ask for some to be 
reserved up front, they need to have a very good idea of how many 
they'll need, and a committer has to agree with them. If you'd asked me 
how many I thought I'd need when I started the enum patch, I would have 
told you a number much smaller than the number that I ended up using. :)

Of course, if you don't care about gaps, then half of the objections 
above go away, and the other half can be solved by reserving more than 
you'd need. Given the fairly contiguous nature of OIDs in the catalogs 
currently, it would appear that we do care about gaps, though.

I like the script idea much better. It wouldn't be bad to even allow 
patches to be submitted with OIDs in the high 9000 or whatever range. 
The committer responsible for committing the patch could just run the 
update script before comitting the code.

Cheers

Tom