Thread: Large object insert/update and oid use
Hey folks.  I'm new to the list, and not quite what you'd call a DB
Guru, so please be patient with me.  I'm afraid the lead-up here is a
bit verbose . . .

I am working on an application that handles very high volume DB
transactions - on the order of tens of millions per day . . .

Anyway, the current database, which will remain nameless but begins
with O and rhymes with debacle (sorta), has a problem with high volume
work when it comes to binary large objects and rowid use (or oid use,
as I understand Postgres uses).

Here's the problem:

When a record containing a number of basic types (int, varchar(2048),
char(32), etc.) as well as a large object is updated with a LOB, we
used to use the rowid returned by the update of the other fields.
That rowid was then the condition used when selecting the locator for
the large object.

Unfortunately, the rowid returned is always the record's current
physical location; if the updated data won't fit in that location, the
record is shifted to another rowid - and the application gets no
notification of this.  Later, when the object is inserted using the
stale rowid, it gets put in the wrong place.  The result is that
sometimes a request for one object will yield the wrong object.

Of course, there are ways to simply update the whole record, object
and all, but the database version we are using has bugs that cause the
temp tablespace to fill up and not be reused when that method is used,
so we've had to change some of this around to use the key field as the
condition, selecting the locator after the initial creation of the
empty object.  The point is that direct updates of large objects have
problems, as does the use of rowids under high volume.  Regardless,
we've had to implement verification checks to ensure the objects don't
get swapped - this is done by prefixing the object with a 32-character
MD5 hash that is also stored in a separate field of the same record.

I've been looking at the Postgres docs, and I've found the lo_*()
routines, which appear to use OIDs to write to the object.  Is there
any 'shifting' of records during insert because of space availability?
And, probably more important, is there a way to do a direct update of
a large object along with the other fields in a record - without
having to call the lo_creat() and lo_write() functions?  I've done
some searching in the archives, but can't find anything that appears
to answer the question.

If I've been unclear on any of this, I'm sorry; feel free to prod for
more info.

Thanks in advance

Lou
--
Louis LeBlanc    leblanc@keyslapper.org
Fully Funded Hobbyist, KeySlapper Extrordinaire :)
http://www.keyslapper.org    Ô¿Ô¬

Scientists are people who build the Brooklyn Bridge and then buy it.
    -- William Buckley
Louis LeBlanc <db@keyslapper.org> writes:
> [ large objects don't work real well in That Other Database ]

Fascinating.  I'd have thought they'd use a less flaky design.

> I've been looking at the Postgres docs, and I've found the lo_*()
> routines, which appear to use OIDs to write to the object.  Is there
> any 'shifting' of records during insert because of space availability?

No, OIDs are stable identifiers.  It sounds like their rowid equates
to what we call a TID or CTID --- that is, a tuple's current physical
location.  CTID isn't fixed across updates in Postgres, but OID is.

> And, probably more important, is there a way to do a direct update of
> a large object along with the other fields in a record - without
> having to call the lo_creat() and lo_write() functions?

Well, I guess the question is how large are your BLOBs and what are
you doing with them exactly?

Postgres offers two different ways of dealing with large chunks of
data.  One is just to store them as big text or bytea values (which
data type you use depends on whether the data is textual or not).
This works well up to maybe a few megabytes, but it gets unwieldy for
larger values, mainly because there's no easy way to store sections of
a value.  In some cases you can use substr() to fetch sections of a
large value, but there's no equivalent means of writing just part of
it.  Also, if the data isn't textual then you have to deal with messy
quoting rules when constructing SQL commands.  (But as of 7.4 it's
possible to alleviate the quoting problem by using out-of-line
parameters instead.)

The other way is to use lo_creat() and related functions.  In this
case each large object has its own identity (an OID assigned by
lo_creat) and what you put in your table row is just a reference to
the object.  The plus side is that you can read and write a large
object in sections (using lo_read/lo_write/lo_seek), the downside is
that you have a more complicated model of what the database structure
is, and lots more management headaches.  For instance you need to
explicitly delete a large object when you don't need it any more ---
deleting a table row doesn't in itself make referenced large objects
go away.

Personally I'd avoid the large object facility if there were any
chance of dealing with the data as ordinary wide values instead.
It's just easier.

			regards, tom lane
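As a rough illustration of the first approach Tom describes (a plain
bytea column written with an out-of-line binary parameter, available
from 7.4 on), a minimal libpq sketch might look like the following.
The table and column names (docs, id, payload) are invented for the
example; passing the value in binary form is what avoids the quoting
problem Tom mentions.

    /*
     * Sketch: store an arbitrary binary payload in a bytea column using
     * an out-of-line binary parameter.  "docs", "id" and "payload" are
     * assumed names, not anything from the thread.
     */
    #include <stdio.h>
    #include <libpq-fe.h>

    int store_blob(PGconn *conn, const char *id, const char *buf, size_t len)
    {
        const char *params[2];
        int         lengths[2];
        int         formats[2];
        Oid         types[2] = { 0, 17 };  /* 0 = let server infer; 17 = built-in bytea OID */

        params[0]  = id;
        lengths[0] = 0;                    /* ignored for text-format params */
        formats[0] = 0;                    /* text */

        params[1]  = buf;
        lengths[1] = (int) len;
        formats[1] = 1;                    /* binary: no escaping or quoting needed */

        PGresult *res = PQexecParams(conn,
            "INSERT INTO docs (id, payload) VALUES ($1, $2)",
            2, types, params, lengths, formats, 0);

        if (PQresultStatus(res) != PGRES_COMMAND_OK)
        {
            fprintf(stderr, "insert failed: %s", PQerrorMessage(conn));
            PQclear(res);
            return -1;
        }
        PQclear(res);
        return 0;
    }

With this approach the binary value and the other fields can go into a
single INSERT or UPDATE, which is the "direct update" being asked
about; the trade-off, as Tom notes, is that it gets unwieldy once
values grow past a few megabytes.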
Apologies if I missed something, but why are you using the rowid in
Oracle instead of your own primary key value (even if you are
inserting 10's of millions of records a day)?  A number(38) provides a
range of magnitude 1E-130 .. 10E125.

Using a non-rowid PK value would mean that you were not at the mercy
of the database moving your data record elsewhere if it cannot be
accommodated in the original block following an update.  If you use a
number(38) PK value instead of a rowid, you are *sure* that your
record is accessible regardless of updates.

Of course, if you wanted to avoid building/maintaining the number(38)
PK index then that is a different matter...

John Sidney-Woollett
On 01/31/04 06:57 PM, John Sidney-Woollett sat at the `puter and typed:
> Apologies if I missed something, but why are you using the rowid in
> Oracle instead of your own primary key value (even if you are
> inserting 10's of millions of records a day)?  A number(38) provides
> a range of magnitude 1E-130 .. 10E125.

Ah, good question.  Mainly because I didn't write the initial
implementation.  As I understand it, my predecessor was under the
impression that rowid use was much faster than using the record's
existing key, which is a varchar(2048).  Now that I've changed the
code to use the key rather than the rowid, it's not any slower
(faster, in fact, though that may be attributable to any number of
other things . . .).  On top of that, the key is pretty much
guaranteed to be unique.

> Using a non-rowid PK value would mean that you were not at the mercy
> of the database moving your data record elsewhere if it cannot be
> accommodated in the original block following an update.

Exactly.  Which is why we don't use rowids at all anymore.

> If you use a number(38) PK value instead of a rowid, you are *sure*
> that your record is accessible regardless of updates.

We thought of that.  O is able to generate them through triggers, and
they should be unique enough, but it's probably not worth the effort
since the table's PK isn't any slower.

> Of course, if you wanted to avoid building/maintaining the number(38)
> PK index then that is a different matter...

That isn't so much the problem as trying to stop the swapped LOBs.
Ideally, the LOB would be included in the update when all the data is
collected.  Here's how it goes:

  Get a key (varchar(2048)) and create a placeholder record to prevent
  other threads wasting effort on it (this is autocommitted).

  Collect all the data for that key, including the object for the LOB.

  Open an atomic transaction:
    1. Update the record with an empty LOB and the other data
    2. Select the LOB locator using the pk
    3. Write the object to the LOB locator
  Close the atomic transaction, which commits.

The first step of this three step transaction used to return the
rowid, but from time to time it would also result in a shift of the
record to another rowid without notifying the app.  The second step
would use that rowid rather than the pk to write the object in.

Ideally, the three step atomic transaction would be reduced to a
single step transaction.  This is possible in O, but 8.1.7 has a bug
that causes temp tablespace to be eaten up and not freed.

That's what I'm trying to do in Postgres.  So, can it be done?

Thanks
Lou
--
Louis LeBlanc    leblanc@keyslapper.org
Fully Funded Hobbyist, KeySlapper Extrordinaire :)
http://www.keyslapper.org    Ô¿Ô¬

Federal grants are offered for... research into the recreation
potential of interplanetary space travel for the culturally
disadvantaged.
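The three-step workflow described in the message above maps fairly
directly onto the Postgres large-object API, with the stable OID
playing the role the rowid was supposed to play.  Below is a rough
libpq sketch under assumed table and column names (docs, key, lob_oid)
and an arbitrary 8K chunk size; lo_creat, lo_open, lo_write and
friends are the standard libpq calls, and they must run inside a
transaction block.

    /*
     * Sketch: one transaction that creates a large object, attaches its
     * OID to the row (located by the key, never by physical location),
     * and streams the data in chunks.  Table layout is assumed.
     */
    #include <stdio.h>
    #include <libpq-fe.h>
    #include <libpq/libpq-fs.h>     /* INV_READ, INV_WRITE */

    int store_object(PGconn *conn, const char *key,
                     const char *data, size_t len)
    {
        PGresult   *res;
        char        oidbuf[32];
        const char *params[2];
        Oid         lobj;
        int         fd;
        size_t      off = 0;

        res = PQexec(conn, "BEGIN");
        PQclear(res);

        /* 1. create an empty large object; its OID is a stable identifier */
        lobj = lo_creat(conn, INV_READ | INV_WRITE);
        if (lobj == InvalidOid)
            goto fail;

        /* 2. record the OID (and any other fields) against the row's key */
        snprintf(oidbuf, sizeof(oidbuf), "%u", lobj);
        params[0] = oidbuf;
        params[1] = key;
        res = PQexecParams(conn,
            "UPDATE docs SET lob_oid = $1 WHERE key = $2",
            2, NULL, params, NULL, NULL, 0);
        if (PQresultStatus(res) != PGRES_COMMAND_OK) { PQclear(res); goto fail; }
        PQclear(res);

        /* 3. stream the payload into the object in chunks */
        fd = lo_open(conn, lobj, INV_WRITE);
        if (fd < 0)
            goto fail;
        while (off < len)
        {
            size_t n = (len - off > 8192) ? 8192 : len - off;

            if (lo_write(conn, fd, (char *) (data + off), n) < (int) n)
                goto fail;
            off += n;
        }
        lo_close(conn, fd);

        res = PQexec(conn, "COMMIT");
        PQclear(res);
        return 0;

    fail:
        res = PQexec(conn, "ROLLBACK");
        PQclear(res);
        return -1;
    }

The placeholder insert from the first (autocommitted) step stays as it
is today; the only real change is that the row carries a stable OID
reference instead of a physical rowid.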
On 01/31/04 02:00 PM, Tom Lane sat at the `puter and typed:
> Louis LeBlanc <db@keyslapper.org> writes:
> > [ large objects don't work real well in That Other Database ]
>
> Fascinating.  I'd have thought they'd use a less flaky design.

No doubt.  I haven't had the chance to look at 9i or 10g, but 8.1.7 is
pretty bogus in that respect.

> > I've been looking at the Postgres docs, and I've found the lo_*()
> > routines, which appear to use OIDs to write to the object.  Is there
> > any 'shifting' of records during insert because of space availability?
>
> No, OIDs are stable identifiers.  It sounds like their rowid equates
> to what we call a TID or CTID --- that is, a tuple's current physical
> location.  CTID isn't fixed across updates in Postgres, but OID is.

Nice.  Good to know.

> > And, probably more important, is there a way to do a direct update of
> > a large object along with the other fields in a record - without
> > having to call the lo_creat() and lo_write() functions?
>
> Well, I guess the question is how large are your BLOBs and what are
> you doing with them exactly?

Well, we have a couple that are in the 650M range (CD ISOs), but the
average size is around 4K.  They are retrieved from the database via a
server process, and inserted via a client process.  Can't go into too
much more, but basically, they get stored in the core and kept while
they are useful (rollbacks for the table in question are unnecessary,
and O won't let you turn them off either - can you turn them off in
Postgres?).

> Postgres offers two different ways of dealing with large chunks of
> data.  One is just to store them as big text or bytea values (which
> data type you use depends on whether the data is textual or not).
> This works well up to maybe a few megabytes, but it gets unwieldy for
> larger values, mainly because there's no easy way to store sections
> of a value.  In some cases you can use substr() to fetch sections of
> a large value, but there's no equivalent means of writing just part
> of it.  Also, if the data isn't textual then you have to deal with
> messy quoting rules when constructing SQL commands.  (But as of 7.4
> it's possible to alleviate the quoting problem by using out-of-line
> parameters instead.)

Sometimes the data is textual, but sometimes it isn't.  It can be an
iso image, a graphic image file, an executable, literally anything.
Any datatype that requires escaping certain characters would be
impractical because of the potential size variation.

Fetching part or all of the object is important so that large objects
can be retrieved in parts, like a continued ftp transaction would do.
Also, the objects are usually written piecemeal when they exceed the
10M mark.  I suppose this could be done with an import though . . .

> The other way is to use lo_creat() and related functions.  In this
> case each large object has its own identity (an OID assigned by
> lo_creat) and what you put in your table row is just a reference to
> the object.  The plus side is that you can read and write a large
> object in sections (using lo_read/lo_write/lo_seek), the downside is
> that you have a more complicated model of what the database structure
> is, and lots more management headaches.  For instance you need to
> explicitly delete a large object when you don't need it any more ---
> deleting a table row doesn't in itself make referenced large objects
> go away.

This still sounds like our current model except for the fact that the
OID has to be explicitly deleted.
I assume a trigger could be set up to perform this on the fly though?

> Personally I'd avoid the large object facility if there were any
> chance of dealing with the data as ordinary wide values instead.
> It's just easier.

Agreed, but probably not feasible when you consider that some of these
objects could eventually reach the gigabyte threshold.

Thanks for the feedback.
Lou
--
Louis LeBlanc    leblanc@keyslapper.org
Fully Funded Hobbyist, KeySlapper Extrordinaire :)
http://www.keyslapper.org    Ô¿Ô¬

Water, taken in moderation cannot hurt anybody.
    -- Mark Twain
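Since partial, resumable retrieval (the "continued ftp transaction"
case above) is one of the main reasons to prefer large objects here,
this is roughly what fetching a single chunk at a given offset looks
like with the standard libpq calls.  The object's OID is assumed to
have been looked up from the table row already; note that lo_lseek
takes a plain int offset, which bounds how far into a very large
object you can seek.

    /*
     * Sketch: read up to buflen bytes of a large object starting at
     * `offset`.  Returns the number of bytes read, or -1 on error.
     */
    #include <stdio.h>              /* SEEK_SET */
    #include <libpq-fe.h>
    #include <libpq/libpq-fs.h>     /* INV_READ */

    int read_chunk(PGconn *conn, Oid lobj, int offset, char *buf, int buflen)
    {
        PGresult *res;
        int       fd, got = -1;

        res = PQexec(conn, "BEGIN");    /* lo_* calls must run in a transaction */
        PQclear(res);

        fd = lo_open(conn, lobj, INV_READ);
        if (fd >= 0 && lo_lseek(conn, fd, offset, SEEK_SET) >= 0)
            got = lo_read(conn, fd, buf, (size_t) buflen);

        if (fd >= 0)
            lo_close(conn, fd);
        res = PQexec(conn, "COMMIT");
        PQclear(res);
        return got;
    }

At the ~650 MB sizes mentioned above this stays comfortably inside the
signed 32-bit offset range, but it is something to keep in mind for
anything approaching 2 GB.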
Louis LeBlanc <db@keyslapper.org> writes:
>> For instance you need to explicitly delete a
>> large object when you don't need it any more --- deleting a table row
>> doesn't in itself make referenced large objects go away.

> This still sounds like our current model except for the fact that the
> OID has to be explicitly deleted.  I assume a trigger could be set up
> to perform this on the fly though?

Yeah, as long as you keep things simple (no multiple references to
BLOBs) you can just add an ON DELETE trigger to handle that.  Given
that this model is close to what you're already doing, large objects
probably are the way to go.  Good luck with it.

			regards, tom lane
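The ON DELETE trigger Tom suggests would typically just call the
server-side lo_unlink() function on the row's OID column.  For
comparison, here is the same bookkeeping done explicitly from the
client with libpq, under the same assumed table layout (docs, key,
lob_oid) as the earlier sketches; if the trigger route is taken, this
client-side step disappears and deleting the row is enough.

    /*
     * Sketch: delete a row and its referenced large object together.
     * Fetch the OID, unlink the object, and delete the row in one
     * transaction so the reference and the object go away as a unit.
     */
    #include <stdio.h>
    #include <stdlib.h>             /* strtoul */
    #include <libpq-fe.h>

    int delete_object(PGconn *conn, const char *key)
    {
        PGresult   *res;
        const char *params[1] = { key };
        Oid         lobj;
        int         ok = -1;

        res = PQexec(conn, "BEGIN");
        PQclear(res);

        res = PQexecParams(conn,
            "SELECT lob_oid FROM docs WHERE key = $1 FOR UPDATE",
            1, NULL, params, NULL, NULL, 0);
        if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) == 1)
        {
            lobj = (Oid) strtoul(PQgetvalue(res, 0, 0), NULL, 10);
            PQclear(res);

            /* unlink the object first (skip if the column was empty/NULL) */
            if (lobj == 0 || lo_unlink(conn, lobj) >= 0)
            {
                PGresult *del = PQexecParams(conn,
                    "DELETE FROM docs WHERE key = $1",
                    1, NULL, params, NULL, NULL, 0);

                if (PQresultStatus(del) == PGRES_COMMAND_OK)
                    ok = 0;
                PQclear(del);
            }
        }
        else
            PQclear(res);

        res = PQexec(conn, ok == 0 ? "COMMIT" : "ROLLBACK");
        PQclear(res);
        return ok;
    }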