Thread: How does TOAST compare to other databases' mechanisms?
If this question is answered in some FAQ, I'd be happy just for some pointer. If not, what I'd like to know is this: Since I will be using PostgreSQL (because it's the database I know best and because I like it ;)) for a content management type app where I need TOAST, I am wondering whether this mechanism is actually something really advanced and neat (this is what I'd assume) or if it is a not-so-elegant, rather involved construction to get around a legacy limitation? To put it another way: Can I go around bragging to my SQL-minded friends about using this really cool thing that no other database has, or should I keep my mouth shut because it's actually not so cool? How do the MySQL people do it (there's a pretty instructive comparison of MySQL and PostgreSQL at http://www.phpbuilder.com/columns/tim20000705.php3 in the context of both databases' suitability as a web site backend which mentions the 8k barrier as one of the bigger minuses of PostgreSQL - it's a head-to-head race in this comparison by the way, which makes it particularly interesting)? What's Oracle's answer to oversized attributes (or other advanced DBMS's, like Sybase, Informix et al)? Thanks, Frank -- frank joerdens joerdens new media urbanstr. 116 10967 berlin germany e: frank@joerdens.de t: +49 (0)30 69597650 f: +49 (0)30 7864046 h: http://www.joerdens.de pgp public key: http://www.joerdens.de/pgp/frank_joerdens.asc
Frank Joerdens <frank@joerdens.de> writes: > Can I go around bragging to my SQL-minded friends about using this > really cool thing that no other database has, or should I keep my mouth > shut because it's actually not so cool? IMHO it's pretty cool. You get the benefits of BLOB storage without having to deal with weird declarations or access methods. I have no idea whether any other databases do it the same way, but simply removing the limit on physical tuple length wouldn't have been as nice. See, with a toasted column, you don't pay to suck the contents of the column into memory when you read the row for a query that doesn't actually touch that column. So, for example, you might have a table declared like CREATE TABLE foo (key1 text, moddate timestamp, big_horking_data text); and you can do things like SELECT big_horking_data FROM foo WHERE key1 = 'bar' AND moddate > 'yesterday'; Here the table is essentially acting as an index for the BLOB storage: the system won't bother to fetch the BLOB values for the rows that fail the WHERE check. You can't do that without lots of cruft in any non-TOAST-like scheme, AFAICS. regards, tom lane
Tom Lane wrote: > Frank Joerdens <frank@joerdens.de> writes: > > Can I go around bragging to my SQL-minded friends about using this > > really cool thing that no other database has, or should I keep my mouth > > shut because it's actually not so cool? > > IMHO it's pretty cool. You get the benefits of BLOB storage without > having to deal with weird declarations or access methods. I have no > idea whether any other databases do it the same way, but simply removing > the limit on physical tuple length wouldn't have been as nice. See, > with a toasted column, you don't pay to suck the contents of the column > into memory when you read the row for a query that doesn't actually touch > that column. So, for example, you might have a table declared like > > CREATE TABLE foo > (key1 text, > moddate timestamp, > big_horking_data text); > > [...] And it also helps to avoid tables, containing such big_horking_data items, to grow as fast as they would before 7.1. In the case CREATE TABLE customer_call ( cc_callid serial primary key, cc_custid integer foreign key ..., cc_priority integer, cc_calltime timestamp, cc_callclosed timestamp, cc_huge_description text ); UPDATE customer_call SET cc_callclosed = now() WHERE cc_callid = 5432; the cc_huge_description isn't touched. Now think about the way the non-overwriting storage manager in PostgreSQL works. Normally it would store a completely new tuple, containing the description again and VACUUM needs to move alot of data to condense the table again. TOAST will reuse the previous toasted value and NOT outdate it, but put another reference to it into the new tuple. This will avoid alot of write access to the disks and speedup VACUUM. Also, the UPDATE will never even read these items, so the update itself is (like Tom's SELECT sample) working on a small table. All in all it is a very good solution for the very special problems we have in PostgreSQL. It might not compare in any way to what other databases do, but the non-overwriting technology bought us MVCC to be relatively easy. Now it lowers the cost of having it. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
> > idea whether any other databases do it the same way, but simply removing > > the limit on physical tuple length wouldn't have been as nice. Yes, that makes it a lot clearer. This is more or less what I'd been wondering, without really noticing: Why not just remove the limit on physical tuple length? Because the description of TOAST's inner workings seemed so convoluted (what with compression etcetera) that it appeared as either a great, strained effort to remove a "deep, hardwired limitation" (someone's statement regarding the 8K barrier) or something that someone had thought long and hard about to make particularly cool. > All in all it is a very good solution for the very special > problems we have in PostgreSQL. It might not compare in any > way to what other databases do, but the non-overwriting > technology bought us MVCC to be relatively easy. Now it > lowers the cost of having it. I am more than satisfied with both your explanations, thanks. I was afraid I might have to switch to MySQL (the horror! the horror!), also because of a piece of information from another recent thread: > >> For more than 10,000 BLOBs, the database was a clear winner. > >> That is, it took less time to locate and read an 8k-16k BLOB > >> from the MySQL database than from the local disk. For smaller > >> numbers of BLOBs, the filesystem was faster. > > This is undoubtedly right for existing Postgres releases, because > currently PG stores each BLOB as an independent table --- and just to > add insult to injury, creates an index for it :-(. So you actually have > *two* added files in the DB directory per BLOB. Needless to say, this > does not scale real well to large numbers of BLOBs. . . . Tom goes on to say that TOAST is a solution to most of these issues whilst a better BLOB implementation is waiting in the wings (scheduled for 7.1 also) which will provide a more natural solution to some problems - an issue which is also touched upon by Jan in pgsql-sql: > > TOAST is finished and will be shipped with 7.1. It's not a > > solution for huge items, but medium sized text up to some > > hundred K works fine. > > What do you mean by "..not a solution for huge items"? Does TOAST have a size limit? . . . Jan goes on to explain that TOAST does not have an explicit size limit and why it is still less than optimal for really big items. Which leads to another question in the context of my little content management app: Part of it is composed of smaller and larger texts, for which TOAST will be great, because I can still search them, they're not opaque for a, for instance, LIKE query parameter in a SELECT statement (right?), like BLOBs, if I want to search for texts that contain particular words or statements (this will probably be slow if the texts are big and a lot of them are included in the query). But I also have images, most of which will _probably_ not exceed a few hundred K. At which point do I cross the line where TOAST becomes silly and BLOBs are The Way? Regards, Frank
Frank Joerdens <frank@joerdens.de> writes: > another recent thread: >>>>> For more than 10,000 BLOBs, the database was a clear winner. >>>>> That is, it took less time to locate and read an 8k-16k BLOB >>>>> from the MySQL database than from the local disk. For smaller >>>>> numbers of BLOBs, the filesystem was faster. >> >> This is undoubtedly right for existing Postgres releases, because >> currently PG stores each BLOB as an independent table --- and just to >> add insult to injury, creates an index for it :-(. So you actually have >> *two* added files in the DB directory per BLOB. Needless to say, this >> does not scale real well to large numbers of BLOBs. > . . . Tom goes on to say that TOAST is a solution to most of these > issues whilst a better BLOB implementation is waiting in the wings > (scheduled for 7.1 also) BTW, just in case I wasn't being perfectly clear: TOAST doesn't depend on the existing table-per-BLOB implementation of "large objects". All TOAST-created BLOBs are stored in auxiliary tables, where there is one auxiliary table per normal table that contains TOASTable columns. So TOAST shouldn't create problems with directory bloat, even if we don't get around to fixing the large-object problem for 7.1. (I still hope to check over and commit Denis Perchine's proposed fix for LOs, though.) > Which leads to another question in the context of my little content > management app: Part of it is composed of smaller and larger texts, > for which TOAST will be great, because I can still search them, > they're not opaque for a, for instance, LIKE query parameter in a > SELECT statement (right?), like BLOBs, if I want to search for texts > that contain particular words or statements (this will probably be > slow if the texts are big and a lot of them are included in the > query). But I also have images, most of which will _probably_ not > exceed a few hundred K. At which point do I cross the line where TOAST > becomes silly and BLOBs are The Way? Good question. The answer probably depends on operational experience that we don't have yet (in other words, please try it both ways and report back ;-)). However, I'd venture the following observation: using special BLOB syntax makes the most sense if you either need to access sub-sections of a BLOB (because your BLOB is so big you really don't want to fetch/store it as a unit) or need to link to the same physical BLOB from multiple rows (again, this is only critical for really big BLOBs). It doesn't sound like either of these issues applies to your problem. regards, tom lane
At 00:21 9/10/00 -0400, Tom Lane wrote: >> But I also have images, most of which will _probably_ not >> exceed a few hundred K. At which point do I cross the line where TOAST ... > >Good question. The answer probably depends on operational experience >that we don't have yet (in other words, please try it both ways and >report back ;-)). In the case of images, is there a way to tell TOAST not to bother even trying to compress the data? (eg. JPEG files). If so, would I be right in assuming that this would be better for fast retrieval (even for text files)? Or is this anothet case where we need to test - ie. CPU Vs. IO availability? ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Philip Warner <pjw@rhyme.com.au> writes: > In the case of images, is there a way to tell TOAST not to bother even > trying to compress the data? (eg. JPEG files). If so, would I be right in > assuming that this would be better for fast retrieval (even for text > files)? TOAST will not store a compressed value unless the compressed value is smaller than the uncompressed by some reasonable amount (which looks to be 20% by default). I'd expect JPEG-like data always to fail the compression ratio check. So there's no significant CPU cost at read time, but there's some cost at write time to try to do the compression. There is a provision to discourage the toaster from even trying to compress a particular column --- see the attstorage column in pg_attribute. At the moment there's no user interface for that :-( so you'd have to reach in with a manual "UPDATE pg_attribute" to change it from the default value. Someone should work on adding an ALTER command to change it in a more user-friendly fashion. regards, tom lane
Is all the info on using toast on the postgresql site, or is it somewhere else? Also, I'm assuming it is good for more than just test, correct? Adam Lang Systems Engineer Rutgers Casualty Insurance Company ----- Original Message ----- From: "Tom Lane" <tgl@sss.pgh.pa.us> To: "Frank Joerdens" <frank@joerdens.de> Cc: <pgsql-general@postgresql.org> Sent: Friday, October 06, 2000 5:47 PM Subject: Re: [GENERAL] How does TOAST compare to other databases' mechanisms? > Frank Joerdens <frank@joerdens.de> writes: > > Can I go around bragging to my SQL-minded friends about using this > > really cool thing that no other database has, or should I keep my mouth > > shut because it's actually not so cool? > > IMHO it's pretty cool. You get the benefits of BLOB storage without > having to deal with weird declarations or access methods. I have no > idea whether any other databases do it the same way, but simply removing > the limit on physical tuple length wouldn't have been as nice. See, > with a toasted column, you don't pay to suck the contents of the column > into memory when you read the row for a query that doesn't actually touch > that column. So, for example, you might have a table declared like > > CREATE TABLE foo > (key1 text, > moddate timestamp, > big_horking_data text); > > and you can do things like > > SELECT big_horking_data FROM foo > WHERE key1 = 'bar' AND moddate > 'yesterday'; > > Here the table is essentially acting as an index for the BLOB storage: > the system won't bother to fetch the BLOB values for the rows that > fail the WHERE check. You can't do that without lots of cruft in any > non-TOAST-like scheme, AFAICS. > > regards, tom lane
> Tom Lane wrote: > > Frank Joerdens <frank@joerdens.de> writes: > > > Can I go around bragging to my SQL-minded friends about using this > > > really cool thing that no other database has, or should I keep my mouth > > > shut because it's actually not so cool? > > > > IMHO it's pretty cool. You get the benefits of BLOB storage without > > having to deal with weird declarations or access methods. I have no > > idea whether any other databases do it the same way, but simply removing > > the limit on physical tuple length wouldn't have been as nice. See, > > with a toasted column, you don't pay to suck the contents of the column > > into memory when you read the row for a query that doesn't actually touch > > that column. So, for example, you might have a table declared like I think TOAST is a classic case of the Internet group coming up with a better solution than any company could. No one had all the ideas of TOAST, but together, we came up with a solution that has all the advantages and no downside. The long values stay out of the main table, and use the existing database indexing structure to access them when needed. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
> Philip Warner <pjw@rhyme.com.au> writes: > > In the case of images, is there a way to tell TOAST not to bother even > > trying to compress the data? (eg. JPEG files). If so, would I be right in > > assuming that this would be better for fast retrieval (even for text > > files)? > > TOAST will not store a compressed value unless the compressed value is > smaller than the uncompressed by some reasonable amount (which looks to > be 20% by default). I'd expect JPEG-like data always to fail the Yea, I know, I have never heard of JPEG either, but Tom Lane is Mr. Internet JPEG, so humor him. :-) -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
[ Charset ISO-8859-1 unsupported, converting... ] > Is all the info on using toast on the postgresql site, or is it somewhere > else? > > Also, I'm assuming it is good for more than just test, correct? We are keeping TOAST quiet until beta beings. Of course, this has already made it to the general list, so I guess that idea didn't work. :-) -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
At 10:12 9/10/00 -0400, Tom Lane wrote: > >There is a provision to discourage the toaster from even trying to >compress a particular column --- see the attstorage column in >pg_attribute. At the moment there's no user interface for that :-( >so you'd have to reach in with a manual "UPDATE pg_attribute" to >change it from the default value. Someone should work on adding an >ALTER command to change it in a more user-friendly fashion. > If I wanted to do this, how long do you think it would take (given that I have not done anything similar so far)? ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Philip Warner <pjw@rhyme.com.au> writes: > At 10:12 9/10/00 -0400, Tom Lane wrote: >> pg_attribute. At the moment there's no user interface for that :-( >> so you'd have to reach in with a manual "UPDATE pg_attribute" to >> change it from the default value. Someone should work on adding an >> ALTER command to change it in a more user-friendly fashion. > If I wanted to do this, how long do you think it would take (given that I > have not done anything similar so far)? Hard to say. Do you know anything about yacc grammars? You'd have to add a production to gram.y to define the syntax, probably extend the existing AlterStmt data structure (which implies touching support code in backend/nodes), and then add some execution code that checks for a valid command (ie, that the data column type is toastable) and finally applies the pg_attribute change. All told it might be a couple hundred lines of new or changed code. Pretty much all of this could be done by cribbing from existing code (ie. programming-by-example) which is a good thing because there's not much documentation. Someone who already knew what they were doing could do it in an hour or two. Not sure how much learning time you'd need to figure on top of that. But if you're interested in learning to hack the backend, this seems like a pretty reasonable first project. regards, tom lane
At 20:42 13/10/00 -0400, Tom Lane wrote: >Philip Warner <pjw@rhyme.com.au> writes: >> At 10:12 9/10/00 -0400, Tom Lane wrote: >>> Someone should work on adding an >>> ALTER command to change it in a more user-friendly fashion. > >> If I wanted to do this, how long do you think it would take (given that I >> have not done anything similar so far)? > >Hard to say. Do you know anything about yacc grammars? I added a 'get statistics' statement to plpgsql a while back, but I don't think it made it into CVS. So yes, I have at least seen yacc before. > You'd have >to add a production to gram.y to define the syntax, probably extend >the existing AlterStmt data structure (which implies touching support >code in backend/nodes), and then add some execution code that checks >for a valid command (ie, that the data column type is toastable) This sounds fine, but begs the question: if there is TOASTed data in the table and the column is set to 'not TOASTed', will TOAST cope? And vice verca? >finally applies the pg_attribute change. This sounds cloneable. >All told it might be a couple >hundred lines of new or changed code. Pretty much all of this could >be done by cribbing from existing code (ie. programming-by-example) >which is a good thing because there's not much documentation. So it might be worh documenting, too... >Someone who already knew what they were doing could do it in an hour >or two. Not sure how much learning time you'd need to figure on top >of that. But if you're interested in learning to hack the backend, >this seems like a pretty reasonable first project. Sounds good to me; if I can't get it into the beta, then it's no real problem I guess. Main constraint will be my real work, which is fairly intense at the moment. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
> This sounds fine, but begs the question: if there is TOASTed data in the > table and the column is set to 'not TOASTed', will TOAST cope? And vice verca? Yes; the detoasting mechanism doesn't pay any attention to attstorage, only to what's actually in the stored tuple. attstorage is only examined while storing a tuple, and it's really only advisory anyway (see tuptoaster.c). The only "must" involved here is that you must not set attstorage to anything but 'p' for a non-toastable data type; which you determine by checking the type's pg_type entry (typstorage != 'p' means toastable). >> All told it might be a couple >> hundred lines of new or changed code. Pretty much all of this could >> be done by cribbing from existing code (ie. programming-by-example) >> which is a good thing because there's not much documentation. > So it might be worh documenting, too... Yes, if you wanted to keep notes and work them up into documentation for future hackers, that'd be a nice side benefit. regards, tom lane
Philip Warner wrote: > At 20:42 13/10/00 -0400, Tom Lane wrote: > >Philip Warner <pjw@rhyme.com.au> writes: > >> At 10:12 9/10/00 -0400, Tom Lane wrote: > >>> Someone should work on adding an > >>> ALTER command to change it in a more user-friendly fashion. > > > >> If I wanted to do this, how long do you think it would take (given that I > >> have not done anything similar so far)? > > > >Hard to say. Do you know anything about yacc grammars? > > I added a 'get statistics' statement to plpgsql a while back, but I don't > think it made it into CVS. So yes, I have at least seen yacc before. You named it GET DIAGNOSTICS ... Took a while, but it is in the current tree for a couple of weeks now. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #