Thread: Large object insert/update and oid use
Hey folks.  I'm new to the list, and not quite what you'd call a DB
Guru, so please be patient with me.  I'm afraid the lead-up here is a
bit verbose . . .

I am working on an application that handles very high volume DB
transactions - on the order of tens of millions per day . . .

Anyway, the current database, which will remain nameless but begins
with O and rhymes with debacle (sorta), has a problem with high volume
work when it comes to binary large objects and rowid use (or oid use,
as I understand Postgres uses).

Here's the problem:

When a record containing a number of basic types (int, varchar(2048),
char(32), etc.) as well as a large object is updated with a LOB, we
used to use the rowid returned by the update of the other fields.
That rowid was then the condition used when selecting the locator for
the large object.

Unfortunately, the rowid returned is always the record's current
physical location; if the updated data won't fit in that location, the
record is shifted to another rowid - and the application gets no
notification of this.  Later, when the object is inserted using the
stale rowid, it gets put in the wrong place.  The result is that
sometimes a request for one object will yield the wrong object.

Of course, there are ways to simply update the whole record, object
and all, but the database version we are using has bugs that cause the
temp tablespace to fill up and not be reused when that method is used,
so we've had to change some of this around to use the key field as the
condition, selecting the locator after the initial creation of the
empty object.  The point is that direct updates of large objects have
problems, as does the use of rowids under high volume.  Regardless,
we've had to implement verification checks to ensure the objects don't
get swapped - this is done by prefixing the object with a 32-character
MD5 hash that is also stored in a separate field of the same record.

I've been looking at the Postgres docs, and I've found the lo_*()
routines, which appear to use OIDs to write to the object.  Is there
any 'shifting' of records during insert because of space availability?
And, probably more important, is there a way to do a direct update of
a large object along with the other fields in a record - without
having to call the lo_creat() and lo_write() functions?  I've done
some searching in the archives, but can't find anything that appears
to answer the question.

If I've been unclear on any of this, I'm sorry; feel free to prod for
more info.

Thanks in advance

Lou
--
Louis LeBlanc    leblanc@keyslapper.org
Fully Funded Hobbyist, KeySlapper Extrordinaire :)
http://www.keyslapper.org    Ô¿Ô¬

Scientists are people who build the Brooklyn Bridge and then buy it.
    -- William Buckley
Louis LeBlanc <db@keyslapper.org> writes:
> [ large objects don't work real well in That Other Database ]

Fascinating.  I'd have thought they'd use a less flaky design.

> I've been looking at the Postgres docs, and I've found the lo_*()
> routines, which appear to use OIDs to write to the object.  Is there
> any 'shifting' of records during insert because of space availability?

No, OIDs are stable identifiers.  It sounds like their rowid equates
to what we call a TID or CTID --- that is, a tuple's current physical
location.  CTID isn't fixed across updates in Postgres, but OID is.

> And, probably more important, is there a way to do a direct update of
> a large object along with the other fields in a record - without
> having to call the lo_creat() and lo_write() functions?

Well, I guess the question is how large are your BLOBs and what are
you doing with them exactly?

Postgres offers two different ways of dealing with large chunks of
data.  One is just to store them as big text or bytea values (which
data type you use depends on whether the data is textual or not).
This works well up to maybe a few megabytes, but it gets unwieldy for
larger values, mainly because there's no easy way to store sections of
a value.  In some cases you can use substr() to fetch sections of a
large value, but there's no equivalent means of writing just part of
it.  Also, if the data isn't textual then you have to deal with messy
quoting rules when constructing SQL commands.  (But as of 7.4 it's
possible to alleviate the quoting problem by using out-of-line
parameters instead.)

The other way is to use lo_creat() and related functions.  In this
case each large object has its own identity (an OID assigned by
lo_creat) and what you put in your table row is just a reference to
the object.  The plus side is that you can read and write a large
object in sections (using lo_read/lo_write/lo_seek), the downside is
that you have a more complicated model of what the database structure
is, and lots more management headaches.  For instance you need to
explicitly delete a large object when you don't need it any more ---
deleting a table row doesn't in itself make referenced large objects
go away.

Personally I'd avoid the large object facility if there were any
chance of dealing with the data as ordinary wide values instead.
It's just easier.

			regards, tom lane
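As a rough illustration of the first approach Tom describes (a plain
bytea column written with an out-of-line binary parameter, available
from 7.4 on), a minimal libpq sketch might look like the following.
The table and column names (docs, id, payload) are invented for the
example; passing the value in binary form is what avoids the quoting
problem Tom mentions.

    /*
     * Sketch: store an arbitrary binary payload in a bytea column using
     * an out-of-line binary parameter.  "docs", "id" and "payload" are
     * assumed names, not anything from the thread.
     */
    #include <stdio.h>
    #include <libpq-fe.h>

    int store_blob(PGconn *conn, const char *id, const char *buf, size_t len)
    {
        const char *params[2];
        int         lengths[2];
        int         formats[2];
        Oid         types[2] = { 0, 17 };  /* 0 = let server infer; 17 = built-in bytea OID */

        params[0]  = id;
        lengths[0] = 0;                    /* ignored for text-format params */
        formats[0] = 0;                    /* text */

        params[1]  = buf;
        lengths[1] = (int) len;
        formats[1] = 1;                    /* binary: no escaping or quoting needed */

        PGresult *res = PQexecParams(conn,
            "INSERT INTO docs (id, payload) VALUES ($1, $2)",
            2, types, params, lengths, formats, 0);

        if (PQresultStatus(res) != PGRES_COMMAND_OK)
        {
            fprintf(stderr, "insert failed: %s", PQerrorMessage(conn));
            PQclear(res);
            return -1;
        }
        PQclear(res);
        return 0;
    }

With this approach the binary value and the other fields can go into a
single INSERT or UPDATE, which is the "direct update" being asked
about; the trade-off, as Tom notes, is that it gets unwieldy once
values grow past a few megabytes.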
Apologies if I missed something, but why are you using the rowid in
Oracle instead of your own primary key value (even if you are
inserting 10's of millions of records a day)?  A number(38) provides a
range of magnitude 1E-130 .. 10E125.

Using a non-rowid PK value would mean that you were not at the mercy
of the database moving your data record elsewhere if it cannot be
accommodated in the original block following an update.  If you use a
number(38) PK value instead of a rowid, you are *sure* that your
record is accessible regardless of updates.

Of course, if you wanted to avoid building/maintaining the number(38)
PK index then that is a different matter...

John Sidney-Woollett
On 01/31/04 06:57 PM, John Sidney-Woollett sat at the `puter and typed:
> Apologies if I missed something, but why are you using the rowid in
> Oracle instead of your own primary key value (even if you are
> inserting 10's of millions of records a day)?  A number(38) provides
> a range of magnitude 1E-130 .. 10E125.

Ah, good question.  Mainly because I didn't write the initial
implementation.  As I understand it, my predecessor was under the
impression that rowid use was much faster than using the record's
existing key, which is a varchar(2048).  Now that I've changed the
code to use the key rather than the rowid, it's not any slower
(faster, in fact, though that may be attributable to any number of
other things . . .).  On top of that, the key is pretty much
guaranteed to be unique.

> Using a non-rowid PK value would mean that you were not at the mercy
> of the database moving your data record elsewhere if it cannot be
> accommodated in the original block following an update.

Exactly.  Which is why we don't use rowids at all anymore.

> If you use a number(38) PK value instead of a rowid, you are *sure*
> that your record is accessible regardless of updates.

We thought of that.  O is able to generate them through triggers, and
they should be unique enough, but it's probably not worth the effort
since the table's PK isn't any slower.

> Of course, if you wanted to avoid building/maintaining the number(38)
> PK index then that is a different matter...

That isn't so much the problem as trying to stop the swapped LOBs.
Ideally, the LOB would be included in the update when all the data is
collected.  Here's how it goes:

  Get a key (varchar(2048)) and create a placeholder record to prevent
  other threads wasting effort on it (this is autocommitted).

  Collect all the data for that key, including the object for the LOB.

  Open an atomic transaction:
    1. Update the record with an empty LOB and the other data
    2. Select the LOB locator using the pk
    3. Write the object to the LOB locator
  Close the atomic transaction, which commits.

The first step of this three step transaction used to return the
rowid, but from time to time it would also result in a shift of the
record to another rowid without notifying the app.  The second step
would use that rowid rather than the pk to write the object in.

Ideally, the three step atomic transaction would be reduced to a
single step transaction.  This is possible in O, but 8.1.7 has a bug
that causes temp tablespace to be eaten up and not freed.

That's what I'm trying to do in Postgres.  So, can it be done?

Thanks
Lou
--
Louis LeBlanc    leblanc@keyslapper.org
Fully Funded Hobbyist, KeySlapper Extrordinaire :)
http://www.keyslapper.org    Ô¿Ô¬

Federal grants are offered for... research into the recreation
potential of interplanetary space travel for the culturally
disadvantaged.
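The three-step workflow described in the message above maps fairly
directly onto the Postgres large-object API, with the stable OID
playing the role the rowid was supposed to play.  Below is a rough
libpq sketch under assumed table and column names (docs, key, lob_oid)
and an arbitrary 8K chunk size; lo_creat, lo_open, lo_write and
friends are the standard libpq calls, and they must run inside a
transaction block.

    /*
     * Sketch: one transaction that creates a large object, attaches its
     * OID to the row (located by the key, never by physical location),
     * and streams the data in chunks.  Table layout is assumed.
     */
    #include <stdio.h>
    #include <libpq-fe.h>
    #include <libpq/libpq-fs.h>     /* INV_READ, INV_WRITE */

    int store_object(PGconn *conn, const char *key,
                     const char *data, size_t len)
    {
        PGresult   *res;
        char        oidbuf[32];
        const char *params[2];
        Oid         lobj;
        int         fd;
        size_t      off = 0;

        res = PQexec(conn, "BEGIN");
        PQclear(res);

        /* 1. create an empty large object; its OID is a stable identifier */
        lobj = lo_creat(conn, INV_READ | INV_WRITE);
        if (lobj == InvalidOid)
            goto fail;

        /* 2. record the OID (and any other fields) against the row's key */
        snprintf(oidbuf, sizeof(oidbuf), "%u", lobj);
        params[0] = oidbuf;
        params[1] = key;
        res = PQexecParams(conn,
            "UPDATE docs SET lob_oid = $1 WHERE key = $2",
            2, NULL, params, NULL, NULL, 0);
        if (PQresultStatus(res) != PGRES_COMMAND_OK) { PQclear(res); goto fail; }
        PQclear(res);

        /* 3. stream the payload into the object in chunks */
        fd = lo_open(conn, lobj, INV_WRITE);
        if (fd < 0)
            goto fail;
        while (off < len)
        {
            size_t n = (len - off > 8192) ? 8192 : len - off;

            if (lo_write(conn, fd, (char *) (data + off), n) < (int) n)
                goto fail;
            off += n;
        }
        lo_close(conn, fd);

        res = PQexec(conn, "COMMIT");
        PQclear(res);
        return 0;

    fail:
        res = PQexec(conn, "ROLLBACK");
        PQclear(res);
        return -1;
    }

The placeholder insert from the first (autocommitted) step stays as it
is today; the only real change is that the row carries a stable OID
reference instead of a physical rowid.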
On 01/31/04 02:00 PM, Tom Lane sat at the `puter and typed:
> Louis LeBlanc <db@keyslapper.org> writes:
> > [ large objects don't work real well in That Other Database ]
>
> Fascinating.  I'd have thought they'd use a less flaky design.

No doubt.  I haven't had the chance to look at 9i or 10g, but 8.1.7 is
pretty bogus in that respect.

> > I've been looking at the Postgres docs, and I've found the lo_*()
> > routines, which appear to use OIDs to write to the object.  Is there
> > any 'shifting' of records during insert because of space availability?
>
> No, OIDs are stable identifiers.  It sounds like their rowid equates
> to what we call a TID or CTID --- that is, a tuple's current physical
> location.  CTID isn't fixed across updates in Postgres, but OID is.

Nice.  Good to know.

> > And, probably more important, is there a way to do a direct update of
> > a large object along with the other fields in a record - without
> > having to call the lo_creat() and lo_write() functions?
>
> Well, I guess the question is how large are your BLOBs and what are
> you doing with them exactly?

Well, we have a couple that are in the 650M range (CD ISOs), but the
average size is around 4K.  They are retrieved from the database via a
server process, and inserted via a client process.  Can't go into too
much more, but basically, they get stored in the core and kept while
they are useful (rollbacks for the table in question are unnecessary,
and O won't let you turn them off either - can you turn them off in
Postgres?).

> Postgres offers two different ways of dealing with large chunks of
> data.  One is just to store them as big text or bytea values (which
> data type you use depends on whether the data is textual or not).
> This works well up to maybe a few megabytes, but it gets unwieldy for
> larger values, mainly because there's no easy way to store sections
> of a value.  In some cases you can use substr() to fetch sections of
> a large value, but there's no equivalent means of writing just part
> of it.  Also, if the data isn't textual then you have to deal with
> messy quoting rules when constructing SQL commands.  (But as of 7.4
> it's possible to alleviate the quoting problem by using out-of-line
> parameters instead.)

Sometimes the data is textual, but sometimes it isn't.  It can be an
iso image, a graphic image file, an executable, literally anything.
Any datatype that requires escaping certain characters would be
impractical because of the potential size variation.

Fetching part or all of the object is important so that large objects
can be retrieved in parts, like a continued ftp transaction would do.
Also, the objects are usually written piecemeal when they exceed the
10M mark.  I suppose this could be done with an import though . . .

> The other way is to use lo_creat() and related functions.  In this
> case each large object has its own identity (an OID assigned by
> lo_creat) and what you put in your table row is just a reference to
> the object.  The plus side is that you can read and write a large
> object in sections (using lo_read/lo_write/lo_seek), the downside is
> that you have a more complicated model of what the database structure
> is, and lots more management headaches.  For instance you need to
> explicitly delete a large object when you don't need it any more ---
> deleting a table row doesn't in itself make referenced large objects
> go away.

This still sounds like our current model except for the fact that the
OID has to be explicitly deleted.
I assume a trigger could be set up to perform this on the fly though?

> Personally I'd avoid the large object facility if there were any
> chance of dealing with the data as ordinary wide values instead.
> It's just easier.

Agreed, but probably not feasible when you consider that some of these
objects could eventually reach the gigabyte threshold.

Thanks for the feedback.
Lou
--
Louis LeBlanc    leblanc@keyslapper.org
Fully Funded Hobbyist, KeySlapper Extrordinaire :)
http://www.keyslapper.org    Ô¿Ô¬

Water, taken in moderation cannot hurt anybody.
    -- Mark Twain
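Since partial, resumable retrieval (the "continued ftp transaction"
case above) is one of the main reasons to prefer large objects here,
this is roughly what fetching a single chunk at a given offset looks
like with the standard libpq calls.  The object's OID is assumed to
have been looked up from the table row already; note that lo_lseek
takes a plain int offset, which bounds how far into a very large
object you can seek.

    /*
     * Sketch: read up to buflen bytes of a large object starting at
     * `offset`.  Returns the number of bytes read, or -1 on error.
     */
    #include <stdio.h>              /* SEEK_SET */
    #include <libpq-fe.h>
    #include <libpq/libpq-fs.h>     /* INV_READ */

    int read_chunk(PGconn *conn, Oid lobj, int offset, char *buf, int buflen)
    {
        PGresult *res;
        int       fd, got = -1;

        res = PQexec(conn, "BEGIN");    /* lo_* calls must run in a transaction */
        PQclear(res);

        fd = lo_open(conn, lobj, INV_READ);
        if (fd >= 0 && lo_lseek(conn, fd, offset, SEEK_SET) >= 0)
            got = lo_read(conn, fd, buf, (size_t) buflen);

        if (fd >= 0)
            lo_close(conn, fd);
        res = PQexec(conn, "COMMIT");
        PQclear(res);
        return got;
    }

At the ~650 MB sizes mentioned above this stays comfortably inside the
signed 32-bit offset range, but it is something to keep in mind for
anything approaching 2 GB.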
Louis LeBlanc <db@keyslapper.org> writes:
>> For instance you need to explicitly delete a
>> large object when you don't need it any more --- deleting a table row
>> doesn't in itself make referenced large objects go away.

> This still sounds like our current model except for the fact that the
> OID has to be explicitly deleted.  I assume a trigger could be set up
> to perform this on the fly though?

Yeah, as long as you keep things simple (no multiple references to
BLOBs) you can just add an ON DELETE trigger to handle that.  Given
that this model is close to what you're already doing, large objects
probably are the way to go.  Good luck with it.

			regards, tom lane
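The ON DELETE trigger Tom suggests would typically just call the
server-side lo_unlink() function on the row's OID column.  For
comparison, here is the same bookkeeping done explicitly from the
client with libpq, under the same assumed table layout (docs, key,
lob_oid) as the earlier sketches; if the trigger route is taken, this
client-side step disappears and deleting the row is enough.

    /*
     * Sketch: delete a row and its referenced large object together.
     * Fetch the OID, unlink the object, and delete the row in one
     * transaction so the reference and the object go away as a unit.
     */
    #include <stdio.h>
    #include <stdlib.h>             /* strtoul */
    #include <libpq-fe.h>

    int delete_object(PGconn *conn, const char *key)
    {
        PGresult   *res;
        const char *params[1] = { key };
        Oid         lobj;
        int         ok = -1;

        res = PQexec(conn, "BEGIN");
        PQclear(res);

        res = PQexecParams(conn,
            "SELECT lob_oid FROM docs WHERE key = $1 FOR UPDATE",
            1, NULL, params, NULL, NULL, 0);
        if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) == 1)
        {
            lobj = (Oid) strtoul(PQgetvalue(res, 0, 0), NULL, 10);
            PQclear(res);

            /* unlink the object first (skip if the column was empty/NULL) */
            if (lobj == 0 || lo_unlink(conn, lobj) >= 0)
            {
                PGresult *del = PQexecParams(conn,
                    "DELETE FROM docs WHERE key = $1",
                    1, NULL, params, NULL, NULL, 0);

                if (PQresultStatus(del) == PGRES_COMMAND_OK)
                    ok = 0;
                PQclear(del);
            }
        }
        else
            PQclear(res);

        res = PQexec(conn, ok == 0 ? "COMMIT" : "ROLLBACK");
        PQclear(res);
        return ok;
    }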