Re: [HACKERS] LZTEXT for rule plan stings - Mailing list pgsql-hackers
From: Don Baccus
Subject: Re: [HACKERS] LZTEXT for rule plan stings
Msg-id: 3.0.1.32.20000225211513.016e43b0@mail.pacifier.com
In response to: Re: [HACKERS] LZTEXT for rule plan stings (wieck@debis.com (Jan Wieck))
List: pgsql-hackers
At 04:06 AM 2/26/00 +0100, Jan Wieck wrote:

> False.
>
> With TOAST, the maximum tuple length is limited by available
> disk space (minus some overhead) and/or the number of bits we
> use to represent the value's original size and/or the size
> addressable by the TOASTer's table at all. Available space
> always limits the amount of data in a DB, and you always have
> to take some overhead into account, but calling this
> _undefined_ isn't correct IMHO - better call it hard to
> figure out. Same is true for non-TOAST lztext.

Or... non-lztext text, for that matter. Of course, the size of text IS UNDEFINED TODAY.

    create table foo (
        t1 text,
        t2 text
    );

Pray tell, what is the maximum size of t1? Is it independent of t2? Or... correct me if I'm mistaken... if t2 contains 8,000+ characters, won't "insert" bomb if I try to insert 8,000+ characters into t1? Or even a few characters?

Exactly where is this vaunted and well-defined limit? (Oops - you can't answer that question, because it depends on the size of BLCKSZ, which of course one can change at will.)

The maximum size of "text" is already undefined, as it depends on BLCKSZ (which the user may not've set herself, and may be unaware of if she's a user at the mercy of some sysadmin) and the actual bytes occupied by other variable-length columns. "bytea", for instance. "text", for instance. "varchar(n)", for instance, which actually is a variable-length string with a maximum length.

PG lets me do this:

    create table foo (
        t1 varchar(8000),
        t2 varchar(8000),
        tn varchar(8000)    -- n fairly large
    );

Can I insert 8K chars into t1? Into t2? Into tn? Trick PG question - into all of them at once?

Sorry, but this is a crap argument. There is no way to know how many characters you can insert into a "text" column unless you have detailed knowledge of the table - not only the types in the table, but the data stored in the pertinent row of the table.
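To make the dependence concrete, here's a back-of-the-envelope sketch. The block size and overhead constants are illustrative assumptions, not PG's actual on-disk accounting:

```python
# Illustrative sketch of the pre-TOAST situation: all of a tuple's data
# must fit in one block, so the headroom for t1 depends on what the
# row's OTHER columns already hold. All constants here are assumptions
# chosen for illustration only.
BLCKSZ = 8192          # assumed block size
TUPLE_OVERHEAD = 40    # assumed fixed per-tuple header cost
VARLENA_HEADER = 4     # assumed length word per variable-length value

def max_insertable(other_column_bytes):
    """Bytes left for t1 once t2 already holds the given payload."""
    used = (TUPLE_OVERHEAD
            + VARLENA_HEADER + other_column_bytes   # t2
            + VARLENA_HEADER)                       # t1's own header
    return max(BLCKSZ - used, 0)

# With t2 empty, t1 gets nearly a whole block; with ~8000 bytes
# already in t2, only a sliver remains for t1.
print(max_insertable(0))
print(max_insertable(8000))
```

The "limit" on t1 is a function of the row's contents, not of the column's declaration - which is the point.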
I should know - I've been fighting this while porting code over from Oracle, where the block size truly limits the size of ONE COLUMN, not the row (tuple) at large. If I can really have a tuple with 1000 varchar(BLCKSZ-overhead) columns, fully filled with data, could you please tell me how to do this? My life will be much simpler.

>> If text were implemented as lztext for a quick 7.1, which apparently
>> was Jan's spin on the idea, then for 7.1 we'd say:
>
> On the first look, it was a tempting solution. But there are
> tons of places in the backend that assume text is binary
> compatible to something, or that the bytes after the VARSIZE are
> plain value bytes, not some compressed garbage to be passed
> through a function first. Replacing TEXT by LZTEXT therefore
> wouldn't be such an easy job, but would be working for the
> wastebasket from the very beginning anyway, because TOAST
> needs to revert it all again.
>
> I don't like that kind of work.

Nor do I, which is why I didn't suggest it when lztext first came up and the discussion drifted into TOAST. Clearly, TOAST is a better solution. In particular, it addresses Hannu's objection that a compressed text type would have no fixed upper limit. Better yet, it would resolve Hannu's misunderstanding that today's text type has such a limit. Because (I love flogging dead horses):

    create table foo (
        i integer,
        t text
    );

and

    create table bar (
        t text
    );

create two columns t with different maximum limits, because the limit is based on tuple size. A compressed text type is only a bad idea because it's a dead end - not because it turns a "defined" max text limit into an undefined one. The maximum number of chars you can stuff into a text column is always undefined unless you dissect exactly how the other columns eat storage.

> Maybe I found some kind of compromise:
>
> - We make LZTEXT a released type, without warning, and anyone
>   can use it as needed.
> - When featuring TOAST, we remove it and create a type
>   alias. This way, the backend will convert the table
>   schemas (WRT lztext -> text) at reload time of the
>   conversion.

I have no strong feelings here. Personally, I can live with just compiling PG with a 16KB block size for the work I'm doing today. But I don't think the upgrade problem's a big deal. If the type's not popularized, only those of us "inside" will know of it, and as far as I'm concerned, hand-editing a pg_dump would be fine with me if I choose to use it. But I'm only speaking for myself. TOAST is clearly the way to go.

On the other hand, I don't see people flinging bricks at Interbase for compressing their text type. After all, they have outer joins...

> Actually I have some problems with the type coercion stuff.
> There are functions lztext(text) and vice versa, but the
> system is unable to find an "=" operator for lztext and text
> when issuing
>
>     SELECT * FROM t1, t2 WHERE t1.lztext_att = t2.text_att;
>
> This worked in past releases (IIRC), so I wonder if the
> failure above is a wanted "feature". I'll commit the stuff I
> have tomorrow and hope someone can help me get the coercion
> working. All we have to do then is to say in the release
> notes and docs: "Never use the LZTEXT type name explicitly
> in an application query (like for type casting) - use TEXT
> instead".

Despite the above, I have no really strong feelings. I only raised the compressed-text issue because my (belated) reading of the Interbase docs made it clear that they do this, and Tom resurrected lztext in regard to views (and my problems there probably made it a red herring in this case, too!). It's an interesting idea, and if TOAST is indeed implemented, probably a moot one. Though... where is the crossover between in-place compression and moving an item to the TOAST table? And... all of the problems with the backend making assumptions about text etc. will have to be addressed by the TOASTer, too.
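As a rough illustration of why in-place compression pays off at all: highly repetitive text (and stored rule plan strings are nothing if not repetitive) shrinks dramatically under even a general-purpose compressor. zlib here just stands in for whatever compressor the backend would actually use, and the repeated query text is a made-up example:

```python
import zlib

# A made-up, highly repetitive "rule plan string" - illustrative only.
plan_string = "SELECT a, b FROM foo WHERE a = 1; " * 200

compressed = zlib.compress(plan_string.encode())

# The compressed form is a small fraction of the original, so the tuple
# dragged through sorts and scans is correspondingly smaller.
print(len(plan_string.encode()), "->", len(compressed))
```

The win shrinks as the text gets less repetitive, which is one way to think about where the crossover to TOASTing an item out-of-line would sit.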
For instance... varchar(4000) might still benefit from being compressed, even if it is not TOASTed, due to PG's love of dragging full tuples around. It saves disk space. Bigger slices of tables can be sorted in memory rather than on disk for any given backend sort/hash buffer size parameter. Today's x86 CPUs, at least, favor shrinking the memory footprint of data, because CPUs tend to be data-starved when working on large amounts of data in RAM. Etc. etc. etc.

So such a compressed implementation may actually be a win, even if Hannu is made happy by affixing fixed varchar(n) limits on the column length.

- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, Pacific Northwest
  Rare Bird Alert Service and other goodies at
  http://donb.photo.net.