Thread: Re: [GENERAL] Using oids

Re: [GENERAL] Using oids

From
"Shridhar Daithankar"
Date:
On 3 Sep 2003 at 11:28, Bo Lorentsen wrote:

> On Wed, 2003-09-03 at 11:10, Shridhar Daithankar wrote:
>
> > Yes. It is correct. As of 7.3.x and onwards oids are optional at table creation
> > times. They default to be available for new objects but that is for backwards
> > compatibility I believe. In future, they would default to be not available for
> > a particular object(hopefully). Right now you need to explicitly specify no
> > oids while creating tables etc.
> I do understand the limitation of the oid as implimented now, but why
> remove the possibility to make unique row references (like after an
> insert), instead of extenting the oid ?
>
> I don't care if they wrap, or is a unique string or anything else, as
> long as I can use it to refetch a row after an insert, without keeping
> track of app. implimentation specific SERIAL fields.

Well, what I do is, declare a serate sequence, retrive next available value and
explicitly insert it into a integer field. That avoids having to retrieve the
latest value again.

I don't know if this is a widespread practice but I find it useful for more
than one way in the environment in which I program.
>
> > About oids not being unique, oids can assume 4 billion different values. If you
> > have more than those many rows in a table, oids will wrap around and will no
> > longer be unique in that object.
> I see that this is a problem, and 4 billion is not alot, but why not
> make another format like in oracle, so that it still is possible to
> refere to a row using a unique --- thing. There must be an internal oid
> somewhere, what we may be able to use.

I understand. With growing use of 64 bit hardware, 4 billion will be history
pretty soon.

However historically oids were assumed to be 32 bit. There could be places
which unintentionally assumed it as such. Cleaning all those places is pretty
difficult given the big code base postgresql has.

If you compile postgresql with Oid as 64 bit integer, that will work in most
cases probably. However it does not guarantee that it will always work. There
always could be some places which assumed 32 bit data types.

That is one of the problem as I understand.

(Gathered and extended from one of Tom's post. correct me if I am wrong)

> > About oids being eliminated, I am sure it would happen some time in the future,
> > looking at the development on this issue. Core team could elaborate more on
> > this.
> Sounds like a sad thing, if this is not replaced by something simular,
> but more fit to a large DB.

You can request this to be a TODO to hackers. They will decide if this is worth
having it. Personally I support it the way it is. Optionally available.

For large table containing billions of rows, Oids add to tuple size and overall
IO. If you are not using Oids, they become overhead. Ability to turn them off
is certainly nice..

Bye
 Shridhar

--
Unfair animal names:-- tsetse fly            -- bullhead-- booby            -- duck-billed
platypus-- sapsucker            -- Clarence        -- Gary Larson


Re: [GENERAL] Using oids

From
Martijn van Oosterhout
Date:
On Wed, Sep 03, 2003 at 12:20:42PM +0200, Bo Lorentsen wrote:
> On Wed, 2003-09-03 at 11:38, Shridhar Daithankar wrote:
>
> > Well, what I do is, declare a serate sequence, retrive next available value and
> > explicitly insert it into a integer field. That avoids having to retrieve the
> > latest value again.
> Yeps, this is what I call an application specific implimentation, as one
> can't do this at a more genral layer (that does not know about your
> table layout).

But your insert function needs to know something about the table it's
inserting into. The sequences have quite predicatable names. Besides, you
can set the name yourself (DCL does this IIRC).

> Like having a general function that insert a row and return the newly
> inserted row, containing the defaults set by PG. My code contain this
> function (http://www.lue.dk/prj/dbc), and I have no way to make this
> work if I'm not able to fetch the oid after an insert, in some way.

The only thing you need to know is the name of the primary key field. This
many be a problem in a generic layer. If you like you can make a UNIQUE
INDEX on the oid column and retry inserts when they fail.

In your code, do create an index on the OID column? If not, that's be a
performance hit,

> > However historically oids were assumed to be 32 bit. There could be places
> > which unintentionally assumed it as such. Cleaning all those places is pretty
> > difficult given the big code base postgresql has.
> One day I will try to look after myself, but what does PG do internaly,
> when referring to rows in a unique way (lets say in an index) ?

Something called CTID usually (AFAIK). It's a (block,tuple) pair. It changes
whenever you do an update. Or a vacuum.

> > For large table containing billions of rows, Oids add to tuple size and overall
> > IO. If you are not using Oids, they become overhead. Ability to turn them off
> > is certainly nice..
> Yeps, if they really are not nessesary.

If I were doing it would extract the primary key of each table on startup
and then change that one line of code to:

os << "SELECT * FROM " << sTable << " WHERE "
   << prikey << " = currval('" << sTable << "_" << prikey << "_seq')";

Say you have a LAST_ID function and you a table with more than one sequence,
which would it return?

Hope this helps,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> "All that is needed for the forces of evil to triumph is for enough good
> men to do nothing." - Edmond Burke
> "The penalty good people pay for not being interested in politics is to be
> governed by people worse than themselves." - Plato

Attachment

Re: [GENERAL] Using oids

From
Martijn van Oosterhout
Date:
On Wed, Sep 03, 2003 at 01:47:01PM +0200, Bo Lorentsen wrote:
> On Wed, 2003-09-03 at 13:19, Martijn van Oosterhout wrote:
> > The only thing you need to know is the name of the primary key field. This
> > many be a problem in a generic layer. If you like you can make a UNIQUE
> > INDEX on the oid column and retry inserts when they fail.
> Hmm, it all end up putting alot of information to a lower layer, and
> this is sad as PG already knows, but it may not tell me.

Well, in a sense it know and in a sense it doesn't. Sequences are not
considered special in terms of returning data to the client. It's just
another function from the parser's point of view.

> > In your code, do create an index on the OID column? If not, that's be a
> > performance hit,
> I'm not sure what you mean !

If you know the OID of a row, PostgreSQL doesn't have a special lookup table
to find it. That's also why they're not unique; the backend would have to
scan through every table to find out if the next one is available.

So, unless you specifically add an index to the table, looking up by OID
will always trigger a sequential scan.

That said, there is no reason why someone couldn't create a last_sequence()
function so you could say SELECT currval( last_sequence() ). Ofcourse, if
your table has no SERIAL field, you're stuffed either way.
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> "All that is needed for the forces of evil to triumph is for enough good
> men to do nothing." - Edmond Burke
> "The penalty good people pay for not being interested in politics is to be
> governed by people worse than themselves." - Plato

Attachment

Re: [GENERAL] Using oids

From
Ashley Cambrell
Date:
> That said, there is no reason why someone couldn't create a last_sequence()
> function so you could say SELECT currval( last_sequence() ). Ofcourse, if
> your table has no SERIAL field, you're stuffed either way.

Instead of SELECT currval( last_sequence() ), what about implementing
oracl type binding?

Ala
http://groups.google.com.au/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&th=5cc63a569577d024#link2

With the new FE/BE changes, how easy would it be to implement? (I
mentioned it the FE/BE discussions)

Ashley Cambrell



Re: [GENERAL] Using oids

From
Bo Lorentsen
Date:
On Wed, 2003-09-03 at 11:38, Shridhar Daithankar wrote:

> Well, what I do is, declare a serate sequence, retrive next available value and
> explicitly insert it into a integer field. That avoids having to retrieve the
> latest value again.
Yeps, this is what I call an application specific implimentation, as one
can't do this at a more genral layer (that does not know about your
table layout).

Like having a general function that insert a row and return the newly
inserted row, containing the defaults set by PG. My code contain this
function (http://www.lue.dk/prj/dbc), and I have no way to make this
work if I'm not able to fetch the oid after an insert, in some way.

> I don't know if this is a widespread practice but I find it useful for more
> than one way in the environment in which I program.
You are not the only one recommenting this solution :-)

> I understand. With growing use of 64 bit hardware, 4 billion will be history
> pretty soon.
Agreed !

> However historically oids were assumed to be 32 bit. There could be places
> which unintentionally assumed it as such. Cleaning all those places is pretty
> difficult given the big code base postgresql has.
One day I will try to look after myself, but what does PG do internaly,
when referring to rows in a unique way (lets say in an index) ?

> If you compile postgresql with Oid as 64 bit integer, that will work in most
> cases probably. However it does not guarantee that it will always work. There
> always could be some places which assumed 32 bit data types.
But if convertet to a string type, all involved places would fail, and
no uncertency are involved when fixing it (well, a little too primitive
argument, I know) :-)

> You can request this to be a TODO to hackers. They will decide if this is worth
> having it. Personally I support it the way it is. Optionally available.
I'm not sure, if my skills reach this fare, but thanks for the advice.
And for the rest --- we disagree :-)

> For large table containing billions of rows, Oids add to tuple size and overall
> IO. If you are not using Oids, they become overhead. Ability to turn them off
> is certainly nice..
Yeps, if they really are not nessesary.

/B


Re: [GENERAL] Using oids

From
Bo Lorentsen
Date:
On Wed, 2003-09-03 at 13:19, Martijn van Oosterhout wrote:

> But your insert function needs to know something about the table it's
> inserting into. The sequences have quite predicatable names. Besides, you
> can set the name yourself (DCL does this IIRC).
No it don't know anything about the table it insert into. I simply do
the following :

1. INSERT data (comming from another layer)
2. Get the last oid
3. SELECT * FROM the same table where oid = what I just found.

I know absolutly nothing about the table, and I like it this way :-)

> The only thing you need to know is the name of the primary key field. This
> many be a problem in a generic layer. If you like you can make a UNIQUE
> INDEX on the oid column and retry inserts when they fail.
Hmm, it all end up putting alot of information to a lower layer, and
this is sad as PG already knows, but it may not tell me.

> In your code, do create an index on the OID column? If not, that's be a
> performance hit,
I'm not sure what you mean !

> Something called CTID usually (AFAIK). It's a (block,tuple) pair. It changes
> whenever you do an update. Or a vacuum.
So no id for a row, but only for a row instance ? Is this the reason for
the growing index files ?

> If I were doing it would extract the primary key of each table on startup
> and then change that one line of code to:
>
> os << "SELECT * FROM " << sTable << " WHERE "
>    << prikey << " = currval('" << sTable << "_" << prikey << "_seq')";
Thanks, but I have to be aware of the "prikey" name, and demand a prikey
for all tables to insert row into :-(

> Hope this helps,

I know what you mean, but I don't like the impact of the solution.

/BL


Re: [GENERAL] Using oids

From
Bo Lorentsen
Date:
On Wed, 2003-09-03 at 17:28, Martijn van Oosterhout wrote:

> If you know the OID of a row, PostgreSQL doesn't have a special lookup table
> to find it. That's also why they're not unique; the backend would have to
> scan through every table to find out if the next one is available.
Ahh, thats not nice, hav'nt checked that, yet.

> So, unless you specifically add an index to the table, looking up by OID
> will always trigger a sequential scan.
I thought it was much more easy for PG to find these, but I quess ctid
are the one that is fast to find.

> That said, there is no reason why someone couldn't create a last_sequence()
> function so you could say SELECT currval( last_sequence() ). Ofcourse, if
> your table has no SERIAL field, you're stuffed either way.
Not as nice as oid's.

/BL