Thread: update on TOAST status
FYI, during the day I committed a couple of changes to TOAST. - Secondary relations for the toaster (to move off values) are now automatically created during CREATE TABLE, ALTER TABLE ... ADD COLUMN and SELECT ... INTO, whenever the first toastable attribute appears in thetable schema. - The TOAST tables are now of kind RELKIND_TOASTVALUE. - TOAST tables cannot be vacuumed separately. They are allways vacuumend if their master table is, whileVACUUM still holds the lock on the master table. - VACUUM doesn't propagate ANALYZE to TOAST tables. Statistics for them are needless because the toastaccess is allways hardcoded indexed. - TOAST tables are protected against manual INSERT, UPDATE and DELETE operations. SELECT is still possiblefor debugging purposes. The name of the TOAST table is pg_toast_<oid-of-master>. - The chunk_data attribute has been changed to type bytea. For now, "lztext" is the only test candidate datatype to invoke the toaster. It can hold up to multi-megabytesnow. But be warned, this datatype will disappear as soon as "text" is toastable. Next I'll make pg_dump TOAST-safe. Will only take a couple of minutes I think. Toast tables aren't automatically created for system catalogs. Thus I'll add ALTER TABLE pg_rewrite CREATE TOAST TABLE to initdb. So we'll get unlimited view complexity for free. As soon as arrays are toastable, we might want to add pg_class because of relacl too. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
Jan Wieck wrote: > FYI, > > For now, "lztext" is the only test candidate datatype to > invoke the toaster. It can hold up to multi-megabytes now. > But be warned, this datatype will disappear as soon as "text" > is toastable. > I have not been following the TOAST discussion, but why would lztext disappear? It seems like a useful datatype independent of TOAST or not TOAST? Jeff
Jeffery Collins wrote: > Jan Wieck wrote: > > > FYI, > > > > For now, "lztext" is the only test candidate datatype to > > invoke the toaster. It can hold up to multi-megabytes now. > > But be warned, this datatype will disappear as soon as "text" > > is toastable. > > > > I have not been following the TOAST discussion, but why would lztext > disappear? It seems like a useful datatype independent of TOAST or not > TOAST? The "lztext" type was something I developed before TOAST was born. It's was a "text" type that tried to compress thevalue at input time. In the TOAST world, each input value will be passed around as is. Only when it gets down to be stored in a table and the resulting heap tuple exceeds 2K, the toaster will try to compress toastable attributes and/or move off attributes.The behaviour will be configurable on a per tables attribute base. So someone can specify "don't trycompression", "ignore this attribute until all others are toasted" or "never toast this, instead fail and abort -unwise but possible". In the current CVS sources, "lztext" already doesn't know anything about compression anymore. It's more or less equivalent to "text" now, where it's lztextin() function produces a plain varlena structure like textin()does. Only that all it's other functions are aware that the values they recieve might be toasted ones. It'sthe toaster that does the compression/move-off for it now. So as soon as "text" is toastable, there is absolutely no need for "lztext" anymore. We will add an alias to theparser for 7.1, which will disappear in 7.2 again. If you pg_dump/restore your databases during the 7.0->7.1upgrade, all your table schemas will automatically be changed from "lztext" to "text". Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
Jan Wieck wrote: > Toast tables aren't automatically created for system > catalogs. Thus I'll add > > ALTER TABLE pg_rewrite CREATE TOAST TABLE > > to initdb. So we'll get unlimited view complexity for free. > As soon as arrays are toastable, we might want to add > pg_class because of relacl too. Why would we want system catalogs toastable?
Chris Bitmead wrote: > > Jan Wieck wrote: > > > Toast tables aren't automatically created for system > > catalogs. Thus I'll add > > > > ALTER TABLE pg_rewrite CREATE TOAST TABLE > > > > to initdb. So we'll get unlimited view complexity for free. > > As soon as arrays are toastable, we might want to add > > pg_class because of relacl too. > > Why would we want system catalogs toastable? I assume this will allow for Views with large rewrite rules which currently are limited in size. Mike Mascari
> - VACUUM doesn't propagate ANALYZE to TOAST tables. > Statistics for them are needless because the toast access > is allways hardcoded indexed. I don't think statistics are insignificant for TOASTed columns. If I say col=3, the optimizer uses that information for estimating the number of rows returned, and figuring out the type of join and order of join to perform, not just for "use index, don't use index" decisions. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian wrote: > > - VACUUM doesn't propagate ANALYZE to TOAST tables. > > Statistics for them are needless because the toast access > > is allways hardcoded indexed. > > I don't think statistics are insignificant for TOASTed columns. If I > say col=3, the optimizer uses that information for estimating the number > of rows returned, and figuring out the type of join and order of join to > perform, not just for "use index, don't use index" decisions. Ask your boys to give you a training session for "reading" when they go to bed tonight - and greet them from the"police officer" :-) I said "to TOAST tables", not "TOASTed columns". Their master tables will allways have the statistics, including those for toasted columns, if you ask for them via ANALYZE. In normal operation, noone would ever know if a TOAST table is accessed during his query - not even the planner or optimmizer. It's totally transparent and the only one accessing the TOAST tables is the toaster himself - and he knows what he does. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
Bruce Momjian <pgman@candle.pha.pa.us> writes: >> - VACUUM doesn't propagate ANALYZE to TOAST tables. >> Statistics for them are needless because the toast access >> is allways hardcoded indexed. > I don't think statistics are insignificant for TOASTed columns. He didn't say that! I think what he meant is that there's no need for statistics associated with the TOAST table itself, and AFAICS that's OK. BTW, I have thought of a potential problem with indexes on toasted columns. As I understand Jan's current thinking, the idea is 1. During storage of the tuple in the main table, any oversize fields get compressed/moved off. 2. The toasted item in the finished main tuple gets handed to the index routines to be stored in the index. Now, storing the toasted item in the index tuple seems fine, but what I do not like here is the implication that all the comparisons needed to find where to *put* the index tuple are done using a pretoasted value. That seems to imply dozens of redundant decompressions/fetches, another one for each key comparison we have to do. Jan, do you have a way around this that I missed? One simple answer that might help for other scenarios too is to keep a small cache of the last few values that had to be untoasted. Maybe we only need it for moved-off values --- it could be that decompression is fast enough that we should just do it over rather than trying to cache. regards, tom lane
Mike Mascari wrote: > Chris Bitmead wrote: > > > > Jan Wieck wrote: > > > > > Toast tables aren't automatically created for system > > > catalogs. Thus I'll add > > > > > > ALTER TABLE pg_rewrite CREATE TOAST TABLE > > > > > > to initdb. So we'll get unlimited view complexity for free. > > > As soon as arrays are toastable, we might want to add > > > pg_class because of relacl too. > > > > Why would we want system catalogs toastable? > > I assume this will allow for Views with large rewrite rules which > currently are limited in size. Absolutely correnct. With the code in place (after a few more fixes) I was able to create a "SELECT *" view from a 681 attribute table. The resulting rule is about 170K! And more complex things are possible too now, because the rewrite rule size is not limited any longer (as long as you have enough CPU, ram and disk space). Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > >> - VACUUM doesn't propagate ANALYZE to TOAST tables. > >> Statistics for them are needless because the toast access > >> is allways hardcoded indexed. > > > I don't think statistics are insignificant for TOASTed columns. > > He didn't say that! I think what he meant is that there's no need for > statistics associated with the TOAST table itself, and AFAICS that's OK. > > BTW, I have thought of a potential problem with indexes on toasted > columns. As I understand Jan's current thinking, the idea is > > 1. During storage of the tuple in the main table, any oversize fields > get compressed/moved off. > > 2. The toasted item in the finished main tuple gets handed to the index > routines to be stored in the index. Right. > Now, storing the toasted item in the index tuple seems fine, but what > I do not like here is the implication that all the comparisons needed > to find where to *put* the index tuple are done using a pretoasted > value. That seems to imply dozens of redundant decompressions/fetches, > another one for each key comparison we have to do. Dozens - right. I just did a little gdb session tracing a SELECT ... WHERE toasted = 'xxx' The table has 151 rows and an index on 'toasted'. It needed 6 fetches of the attribute. Better than good, because 2^6 is only 64, so btree did a perfect job. Anyhow, in the case of a real TOASTed (read burned) value, it'd mean 6 indexscans to recreate the on disk stored representation plus 6 decompression loops to get the plain oneto compare against. What the hell would an "IN (SELECT ...)" cause? > Jan, do you have a way around this that I missed? > > One simple answer that might help for other scenarios too is to keep > a small cache of the last few values that had to be untoasted. Maybe > we only need it for moved-off values --- it could be that decompression > is fast enough that we should just do it over rather than trying to > cache. I'm still argueing that indexing huge values is a hint for a misleading schema. If this is true, propagating toasted attributes into indices is a dead end street and I'd have to change the heap-access<->toaster interface so that the modified (stored) main tuple isn't visible to the following code (that does the index inserts). What is the value of supporting index tuples >2K? Support of braindead schemas? I can live withoout it! Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
> Bruce Momjian wrote: > > > - VACUUM doesn't propagate ANALYZE to TOAST tables. > > > Statistics for them are needless because the toast access > > > is allways hardcoded indexed. > > > > I don't think statistics are insignificant for TOASTed columns. If I > > say col=3, the optimizer uses that information for estimating the number > > of rows returned, and figuring out the type of join and order of join to > > perform, not just for "use index, don't use index" decisions. > > Ask your boys to give you a training session for "reading" > when they go to bed tonight - and greet them from the "police > officer" :-) Sure. > > I said "to TOAST tables", not "TOASTed columns". > > Their master tables will allways have the statistics, > including those for toasted columns, if you ask for them via > ANALYZE. > > In normal operation, noone would ever know if a TOAST table > is accessed during his query - not even the planner or > optimmizer. It's totally transparent and the only one > accessing the TOAST tables is the toaster himself - and he > knows what he does. > Oh, sure, got it. It is the toast table that doesn't need stats. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
JanWieck@t-online.de (Jan Wieck) writes: > Tom Lane wrote: >> One simple answer that might help for other scenarios too is to keep >> a small cache of the last few values that had to be untoasted. Maybe >> we only need it for moved-off values --- it could be that decompression >> is fast enough that we should just do it over rather than trying to >> cache. > I'm still argueing that indexing huge values is a hint for a > misleading schema. If this is true, propagating toasted > attributes into indices is a dead end street and I'd have to > change the heap-access<->toaster interface so that the > modified (stored) main tuple isn't visible to the following > code (that does the index inserts). But you'll notice that is *not* what I suggested. A detoasted-value cache could be useful in more situations than just an index lookup. I don't necessarily say we've got to have it in 7.1, but let's keep the idea in mind in case we start finding there is a bottleneck here. > What is the value of supporting index tuples >2K? If you're toasting the whole main tuple down to <2K, you might find yourself toasting individual fields that are a good bit less than that. So I don't think indexing a toasted value will be all that unusual. But this is all speculation for now. Let's get it working bulletproof for 7.1, and then worry about speedups after we know they are needed. regards, tom lane
Tom Lane wrote: > JanWieck@t-online.de (Jan Wieck) writes: > > Tom Lane wrote: > >> One simple answer that might help for other scenarios too is to keep > >> a small cache of the last few values that had to be untoasted. Maybe > >> we only need it for moved-off values --- it could be that decompression > >> is fast enough that we should just do it over rather than trying to > >> cache. > > > I'm still argueing that indexing huge values is a hint for a > > misleading schema. If this is true, propagating toasted > > attributes into indices is a dead end street and I'd have to > > change the heap-access<->toaster interface so that the > > modified (stored) main tuple isn't visible to the following > > code (that does the index inserts). > > But you'll notice that is *not* what I suggested. A detoasted-value Haven't missed it in the first read - of course. > cache could be useful in more situations than just an index lookup. > I don't necessarily say we've got to have it in 7.1, but let's keep > the idea in mind in case we start finding there is a bottleneck here. > > > What is the value of supporting index tuples >2K? > > If you're toasting the whole main tuple down to <2K, you might find > yourself toasting individual fields that are a good bit less than > that. So I don't think indexing a toasted value will be all that > unusual. Exactly that's why I'm asking if we wouldn't be better off by limiting index tuples to (blocksize - overhead) / 4 and allways store plain, untoasted values in indices. I've asked now a couple of times "who really has the need for indexing huge values"? All responses I got so far where of the kind "would be nice if we support it" or "I don't like such restrictions". But noone really said "Ineed it". > But this is all speculation for now. Let's get it working bulletproof > for 7.1, and then worry about speedups after we know they are needed. Let me speculate too a little. The experience I have up to now is that the saved time from requiring less blocks in the buffer cache outweights thecost of decompression. Especially with our algorithm, because it is byte oriented (instead of huffman coding beeingbased on a bit stream), causing it to be extremely fast on decompression. And the technique ofmoving off values from the main heap causes the main tuples to be much smaller. As long as the toasted valuesaren't used in qualification or joining, only their references move around through the various executor steps, and only those values that are part of the final result set need to be fetched when sending them tothe client. Given a limited amount of total memory available for one running postmaster, we save alot of disk I/O and hold more values in their compressed format in the shared buffers. With the limit on total memory, the size of the buffercache must be lowered by the size of the new detoasted cache, and that only if we make it shared too. Givenfurther an average of 50% compression ratio (what's not unlikely with typical input like html pages), one cacheddetoasted value would require two compressed ones to go away. Wouldn't really surprise me if we gain speed from it in the average query. Even if some operations might slow down (sorting on maybe toasted fields). We need to see some results and wait for reports for this. But we know already that it can cause trouble with indexed fields, because these are likely to be used for comparision during scans. So do we want to have indices storing plain values allways and limit them in the index-tuple size or not? Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
JanWieck@t-online.de (Jan Wieck) writes: > ... So do we want to have indices storing plain > values allways and limit them in the index-tuple size or not? I think not: it will be seen as a robustness failure, even (or especially) if it doesn't happen often. I can see the bug reports now: "Hey! I tried to insert a long value in my field, and it didn't work! I thought you'd fixed this bug?" You make good arguments that we shouldn't be too concerned about the speed of access to toasted index values, and I'm willing to accept that point of view (at least till we have hard evidence about it). But when I say "it should be bulletproof" I mean it should *work*, without imposing arbitrary limits on the user. Arbitrary limits are exactly what we are trying to eliminate. regards, tom lane
Tom Lane wrote: > JanWieck@t-online.de (Jan Wieck) writes: > > ... So do we want to have indices storing plain > > values allways and limit them in the index-tuple size or not? > > I think not: it will be seen as a robustness failure, even (or > especially) if it doesn't happen often. I can see the bug reports now: > "Hey! I tried to insert a long value in my field, and it didn't work! > I thought you'd fixed this bug?" > > You make good arguments that we shouldn't be too concerned about the > speed of access to toasted index values, and I'm willing to accept > that point of view (at least till we have hard evidence about it). > But when I say "it should be bulletproof" I mean it should *work*, > without imposing arbitrary limits on the user. Arbitrary limits are > exactly what we are trying to eliminate. After debugging something I thought was a bug in the toaster, I've found something really causing headaches. TOAST AS IS IS NOT CAPABLE OF HOLDING INDEXED VALUES! It appears that brtee indices (at least) can keep references to old toast values that survive a VACUUM! Seemsthese references live in nodes actually not referring to a heap tuple any more, but used during tree traversal in comparisions. As if an index tuple delete from a btree not necessarily causes the index value to disappear from the btree completely. It'll never be returned by an index scan, but the value is still theresomewhere. Everything is OK with this up to a VACUUM run. The toaster uses SnapShotAny to fetch toast values. So an external value can be fetched by the toaster even if it is already deleted and committed. If he has a reference somewhere, he has allways a share or higher lock on the main relation preventing VACUUM to mangleup the toast relation (I moved toast relation vacuuming into the lock time of the main table recently). But in the above case it is already vacuumed and not present any more. Now the btree traversal needs to compare againsta value, long gone to the bit heaven, and that cannot work with the toast architecture. Seems the designs of btree and toast are colliding. As soon as "text" is toastable, this'll hurt - be warned. AFAICS, we need to detoast values for index inserts allways and have another toaster inside the index access methodsat some day. In the meantime we should decide a safe maximum index tuple size and emit an explanative errormessage on the attempt to insert oversized index entries instead of possibly corrupting the index. Comment! Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
At 14:02 11/07/00 +0200, Jan Wieck wrote: > AFAICS, we need to detoast values for index inserts allways > and have another toaster inside the index access methods at > some day. We might not need it...at least not in the furst pass. > In the meantime we should decide a safe maximum > index tuple size and emit an explanative error message on the > attempt to insert oversized index entries instead of possibly > corrupting the index. Can I suggest that we also put out a warning when defining an index using a field with a (potentially) unlimited size? Indexing a text field will mostly be a bizarre thing to do, but, eg, indexing the first 255 chars of a text field (via substr) might not be. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Philip Warner wrote: > At 14:02 11/07/00 +0200, Jan Wieck wrote: > > AFAICS, we need to detoast values for index inserts allways > > and have another toaster inside the index access methods at > > some day. > > We might not need it...at least not in the furst pass. The thing is actually broken and needs a fix. As soon as "text" is toastable, it can happen everywhere that text is toasted even if it's actual plain value would perfectly fit into an index tuple. Think of a table with 20 text columns, where the indexed one has a 1024 bytes value, while all others hold 512 bytes. In that case, theindexed one is the biggest and get's toasted first. And if all the data is of nature that compression doesn'tgain enough, it might still be the biggest one after that step and will be considered for move off ... boom. We can't let this in in the first pass! > > In the meantime we should decide a safe maximum > > index tuple size and emit an explanative error message on the > > attempt to insert oversized index entries instead of possibly > > corrupting the index. > > Can I suggest that we also put out a warning when defining an index using a > field with a (potentially) unlimited size? Indexing a text field will > mostly be a bizarre thing to do, but, eg, indexing the first 255 chars of a > text field (via substr) might not be. Marking it BOLD somewhere in the release notes, the CREATE INDEX doc and some other places should be enough.Such a message at every CREATE INDEX is annoying. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
At 14:38 11/07/00 +0200, Jan Wieck wrote: >> Can I suggest that we also put out a warning when defining an index using a >> field with a (potentially) unlimited size? Indexing a text field will >> mostly be a bizarre thing to do, but, eg, indexing the first 255 chars of a >> text field (via substr) might not be. > > Marking it BOLD somewhere in the release notes, the CREATE > INDEX doc and some other places should be enough. Such a > message at every CREATE INDEX is annoying. The suggestion was only if the index contained a text, lztext etc field, but no problem. The way I read your suggestion was that I'd get a real error when doing an insert if the text was too large. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Philip Warner wrote: > At 14:38 11/07/00 +0200, Jan Wieck wrote: > >> Can I suggest that we also put out a warning when defining an index using a > >> field with a (potentially) unlimited size? Indexing a text field will > >> mostly be a bizarre thing to do, but, eg, indexing the first 255 chars of a > >> text field (via substr) might not be. > > > > Marking it BOLD somewhere in the release notes, the CREATE > > INDEX doc and some other places should be enough. Such a > > message at every CREATE INDEX is annoying. > > The suggestion was only if the index contained a text, lztext etc field, > but no problem. The way I read your suggestion was that I'd get a real > error when doing an insert if the text was too large. Yes, that's what I'm after. It's too fragile IMHO to check on multi column indices with char(n) or so if resulting index tuples will fit in the future. The atttypmod field on NUMERIC columns for example doesn't tell the easy way how big the internal representation might grow. And what about variable size user defined types that are marked toastable? Can you estimatethe maximum internal storage size for them? Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
At 15:08 11/07/00 +0200, Jan Wieck wrote: > > The atttypmod field on NUMERIC columns for example doesn't > tell the easy way how big the internal representation might > grow. And what about variable size user defined types that > are marked toastable? Can you estimate the maximum internal > storage size for them? > Well, uncompressed size would be a good upper estimate, since you may be passed already compressed data... ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
JanWieck@t-online.de (Jan Wieck) writes: > After debugging something I thought was a bug in the toaster, > I've found something really causing headaches. > TOAST AS IS IS NOT CAPABLE OF HOLDING INDEXED VALUES! > It appears that brtee indices (at least) can keep references > to old toast values that survive a VACUUM! Seems these > references live in nodes actually not referring to a heap > tuple any more, but used during tree traversal in > comparisions. As if an index tuple delete from a btree not > necessarily causes the index value to disappear from the > btree completely. It'll never be returned by an index scan, > but the value is still there somewhere. Oooh, nasty. Probably the keys you are looking at are in upper- level btree pages and indicate the ranges of keys found in lower pages, rather than being pointers to real tuples. One answer is to rebuild indexes from scratch during VACUUM, before we vacuum the TOAST relation. We've been talking about doing that for a long time. Maybe it's time to bite the bullet and do it. (Of course that means fixing the relation-versioning problem, which it seems we don't have a consensus on yet...) > Seems the designs of btree and toast are colliding. As soon > as "text" is toastable, this'll hurt - be warned. Text *is* marked toastable in current CVS... > AFAICS, we need to detoast values for index inserts allways > and have another toaster inside the index access methods at > some day. In the meantime we should decide a safe maximum > index tuple size and emit an explanative error message on the > attempt to insert oversized index entries instead of possibly > corrupting the index. I don't like that --- seems it would put a definite crimp in the whole point of TOAST, which is not to have arbitrary limits on field sizes. regards, tom lane
Tom Lane wrote: > JanWieck@t-online.de (Jan Wieck) writes: > > After debugging something I thought was a bug in the toaster, > > I've found something really causing headaches. > > TOAST AS IS IS NOT CAPABLE OF HOLDING INDEXED VALUES! > > It appears that brtee indices (at least) can keep references > > to old toast values that survive a VACUUM! Seems these > > references live in nodes actually not referring to a heap > > tuple any more, but used during tree traversal in > > comparisions. As if an index tuple delete from a btree not > > necessarily causes the index value to disappear from the > > btree completely. It'll never be returned by an index scan, > > but the value is still there somewhere. > > Oooh, nasty. Probably the keys you are looking at are in upper- > level btree pages and indicate the ranges of keys found in lower > pages, rather than being pointers to real tuples. So our btree implementation is closer to an ISAM file organization than to a real tree? Anyway, either one or the other is the reason that an attempt to insert a new value results in an lztext_cmp() call that cannot be resolved due to a missing toast value. I added some checks to the detoaster just to throw an elog(ERROR) instead of a coredump in such a case earlier today. > One answer is to rebuild indexes from scratch during VACUUM, > before we vacuum the TOAST relation. We've been talking about > doing that for a long time. Maybe it's time to bite the bullet > and do it. (Of course that means fixing the relation-versioning > problem, which it seems we don't have a consensus on yet...) Doesn't matter if we do it before or after, because the main heap shouldn't contain any more toast references to deleted (later to be vacuumed) toast entries at that time. Anyway, it's a nice idea that should solve the problem. For indices, which can allways be rebuilt from the heap data, I don't see such a big need for the versioning. Only that a partially rebuilt index (rebuild crashed in themiddle) needs another vacuum before the the DB is accessible again. How often does that happen? So why not having vacuum truncating the index file to zero and rebuilding it from scratch in place? Can anyone accessan index while vacuum has a lock on it's heap? > > > Seems the designs of btree and toast are colliding. As soon > > as "text" is toastable, this'll hurt - be warned. > > Text *is* marked toastable in current CVS... Whow - haven't noticed. Will run my tests against text ... parallel. Does it have any impact on the regression test execution time? Does anytoast table (that should now be there in the regression DB) loose it's zero size during the tests? > > > AFAICS, we need to detoast values for index inserts allways > > and have another toaster inside the index access methods at > > some day. In the meantime we should decide a safe maximum > > index tuple size and emit an explanative error message on the > > attempt to insert oversized index entries instead of possibly > > corrupting the index. > > I don't like that --- seems it would put a definite crimp in the > whole point of TOAST, which is not to have arbitrary limits on field > sizes. If we can solve it, let's do so. If we cannot, let's restrict it for 7.1. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
JanWieck@t-online.de (Jan Wieck) writes: > Tom Lane wrote: >> One answer is to rebuild indexes from scratch during VACUUM, >> before we vacuum the TOAST relation. We've been talking about >> doing that for a long time. Maybe it's time to bite the bullet >> and do it. (Of course that means fixing the relation-versioning >> problem, which it seems we don't have a consensus on yet...) > Doesn't matter if we do it before or after, because the main > heap shouldn't contain any more toast references to deleted > (later to be vacuumed) toast entries at that time. No, we must fix the indexes first, so that they contain no bogus values if we fail while vacuuming the TOAST relation. > Anyway, it's a nice idea that should solve the problem. For > indices, which can allways be rebuilt from the heap data, I > don't see such a big need for the versioning. Only that a > partially rebuilt index (rebuild crashed in the middle) needs > another vacuum before the the DB is accessible again. How > often does that happen? If it happens just once on one of your system-table indices, you won't be happy. We've sweated hard to make VACUUM crash-safe, and I don't want to throw that away because of TOAST. >> Text *is* marked toastable in current CVS... > Whow - haven't noticed. > Will run my tests against text ... parallel. Does it have any > impact on the regression test execution time? Does any toast > table (that should now be there in the regression DB) loose > it's zero size during the tests? Yes, there are some nonzero-size toast files in there. Haven't tried to run any timing tests... regards, tom lane
tOM lANE wrote: > JanWieck@t-online.de (Jan Wieck) writes: > > Tom Lane wrote: > >> One answer is to rebuild indexes from scratch during VACUUM, > >> before we vacuum the TOAST relation. We've been talking about > >> doing that for a long time. Maybe it's time to bite the bullet > >> and do it. (Of course that means fixing the relation-versioning > >> problem, which it seems we don't have a consensus on yet...) > > > Doesn't matter if we do it before or after, because the main > > heap shouldn't contain any more toast references to deleted > > (later to be vacuumed) toast entries at that time. > > No, we must fix the indexes first, so that they contain no bogus > values if we fail while vacuuming the TOAST relation. Got me. > > Anyway, it's a nice idea that should solve the problem. For > > indices, which can allways be rebuilt from the heap data, I > > don't see such a big need for the versioning. Only that a > > partially rebuilt index (rebuild crashed in the middle) needs > > another vacuum before the the DB is accessible again. How > > often does that happen? > > If it happens just once on one of your system-table indices, you > won't be happy. We've sweated hard to make VACUUM crash-safe, > and I don't want to throw that away because of TOAST. Alternatively we could go for both methods. Does any system catalog have an index on a varlena field? If not, we coulddo the classic vacuum on anything that is either a catalog or a table that doesn't have a toast relation. Then do the lazy reindex from scratch on anything left. > > >> Text *is* marked toastable in current CVS... > > > Whow - haven't noticed. > > > Will run my tests against text ... parallel. Does it have any > > impact on the regression test execution time? Does any toast > > table (that should now be there in the regression DB) loose > > it's zero size during the tests? > > Yes, there are some nonzero-size toast files in there. Haven't > tried to run any timing tests... No, there aren't. All you've seen are their indices of 16K each. But my tests, formerly using lztext, ran smooth with text. I've looked at textout() and, well, your style of detoasting arguments looks alot better and easier. From the way it's implemented I assume the per tuple memory context is done too, no? Jan #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
JanWieck@t-online.de (Jan Wieck) writes: > I've looked at textout() and, well, your style of detoasting > arguments looks alot better and easier. From the way it's > implemented I assume the per tuple memory context is done > too, no? Not yet --- I'm running regress tests on it right now, though. You're right that I'm assuming the function routines can leak memory without trouble. (We might need to avoid leaks in the comparison routines that are used for indexes, but otherwise I think this scheme will work comfortably.) regards, tom lane
Tom Lane wrote: > JanWieck@t-online.de (Jan Wieck) writes: > > I've looked at textout() and, well, your style of detoasting > > arguments looks alot better and easier. From the way it's > > implemented I assume the per tuple memory context is done > > too, no? > > Not yet --- I'm running regress tests on it right now, though. > You're right that I'm assuming the function routines can leak > memory without trouble. > > (We might need to avoid leaks in the comparison routines that are used > for indexes, but otherwise I think this scheme will work comfortably.) That sounds bad. At least not very good. So we better add a PG_FREEARG_xxx(ptr, argno) macro that does the pfree if the pointer is different from the one in the argument. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
JanWieck@t-online.de (Jan Wieck) writes: >> (We might need to avoid leaks in the comparison routines that are used >> for indexes, but otherwise I think this scheme will work comfortably.) > That sounds bad. At least not very good. > So we better add a PG_FREEARG_xxx(ptr, argno) macro that does > the pfree if the pointer is different from the one in the > argument. Yes, I already borrowed that idea from your original code. I don't like it a whole lot, but as long as the need for it is confined to the indexable comparison operators I think we can tolerate it. The alternative is to hack up the index search routines (and also tuplesort.c, and perhaps other places?) to maintain a short-term memory context for evaluating comparison operators, and reset said context fairly frequently. That might be doable but I haven't yet looked into what it would take. I'm hoping to commit what I have this evening... regards, tom lane