Thread: LZTEXT for rule plan stings

LZTEXT for rule plan stings

From
wieck@debis.com (Jan Wieck)
Date:
Hi,
   there  was  discussion  about  readding  the  lztext type for   internal use in pg_rewrite for 7.0, then remove it
againonce   we have TOAST.
 
   Was  because  significant growth of the rule plan strings due   to other changes can cause views/rules to be
rejectedby 7.0,   that easily worked with 6.5.
 
   I've reconstructed the entire lztext type now from CVS, could   reapply changes and reactivate deleted  files  from
./Attic.  Will  work on a NOTICE/ERROR message now, preventing users to   use it in their schemas.
 
   But it requires an initdb and we're in BETA. So I better  ask   if someone complains.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #




Re: [HACKERS] LZTEXT for rule plan stings

From
The Hermit Hacker
Date:
On Fri, 25 Feb 2000, Jan Wieck wrote:

> Hi,
> 
>     there  was  discussion  about  readding  the  lztext type for
>     internal use in pg_rewrite for 7.0, then remove it again once
>     we have TOAST.
> 
>     Was  because  significant growth of the rule plan strings due
>     to other changes can cause views/rules to be rejected by 7.0,
>     that easily worked with 6.5.
> 
>     I've reconstructed the entire lztext type now from CVS, could
>     reapply changes and reactivate deleted  files  from  ./Attic.
>     Will  work on a NOTICE/ERROR message now, preventing users to
>     use it in their schemas.
> 
>     But it requires an initdb and we're in BETA. So I better  ask
>     if someone complains.

this close to the beginning of beta, I would say go for it ... the
benefits of doing so, as I undesrtand it, *far* outweighs the disadvantage
of requiring an initdb ...




Re: [HACKERS] LZTEXT for rule plan stings

From
Don Baccus
Date:
At 09:36 PM 2/25/00 +0100, Jan Wieck wrote:

>    But it requires an initdb and we're in BETA. So I better  ask
>    if someone complains.

Well...here's an example of a view that worked in 6.5, with an 8KB
block size, that fails in 7.0 unless I build with a 16KB block size:

create view ec_products_displayable
as
select * from ec_products
where active_p='t';

Impressively large, eh? :)  I was kinda grossed out when Postgres
choked on it, to be honest.

ec_products in this case has quite a few columns...

You know, I've investigated further and the rule string itself is
no where near 8KB.  More like 1KB.  So there is more to the story
than just the string itself.

Still, it should help because most of my failing views were just
a bit over 8KB.  One still fails with a 16KB block size, though!
Fortunately it's not currently used in the web tool kit.

Anyway, it seems to me that we need SOME solution to this problem.
It is going to be hard to convince users that views like the one
above are really too complex for Postgres to handle.

I still like the idea of "text" being implemented under the hood
as lzText for a quick 7.1 release if that idea works out ...



- Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert
Serviceand other goodies at http://donb.photo.net.
 


Re: [HACKERS] LZTEXT for rule plan stings

From
Hannu Krosing
Date:
Don Baccus wrote:
> 
> 
> I still like the idea of "text" being implemented under the hood
> as lzText for a quick 7.1 release if that idea works out ...

But without TOAST it would result in _undefined_ max tuple length,
which is probably not desirable.

Using it for views is another thing as their max size was undefined 
to begin with.

-----------
Hannu


Re: [HACKERS] LZTEXT for rule plan stings

From
Tom Lane
Date:
Don Baccus <dhogaza@pacifier.com> writes:
> Well...here's an example of a view that worked in 6.5, with an 8KB
> block size, that fails in 7.0 unless I build with a 16KB block size:

> create view ec_products_displayable
> as
> select * from ec_products
> where active_p='t';

> You know, I've investigated further and the rule string itself is
> no where near 8KB.  More like 1KB.  So there is more to the story
> than just the string itself.

Really?  That's interesting.  Could you send me a test case
(create table and create view commands)?
        regards, tom lane


Re: [HACKERS] LZTEXT for rule plan stings

From
Tom Lane
Date:
wieck@debis.com (Jan Wieck) writes:
>     Will  work on a NOTICE/ERROR message now, preventing users to
>     use it in their schemas.

I think a NOTICE is sufficient --- people who really need it will
go in and dike out any ERROR anyway ;-)

>     But it requires an initdb and we're in BETA. So I better  ask
>     if someone complains.

Not me.
        regards, tom lane


Re: [HACKERS] LZTEXT for rule plan stings

From
Don Baccus
Date:
At 03:02 AM 2/26/00 +0200, Hannu Krosing wrote:
>Don Baccus wrote:
>> 
>> 
>> I still like the idea of "text" being implemented under the hood
>> as lzText for a quick 7.1 release if that idea works out ...
>
>But without TOAST it would result in _undefined_ max tuple length,
>which is probably not desirable.

Boy, I'd sure find it desirable.  There's nothing to stop people from
using varchar(8000) or whatever if they want a predictable top limit.
Text is not a standard type, and this wouldn't break standard semantics.

lzText wasn't removed because folks thought it was useless, IIRC,
it was removed because TOAST was an exciting and much more powerful
approach and no one wanted to introduce a new type doomed to disappear
after a single release cycle.

With TOAST, from the user's point of view you'll still have an
_undefined_ max tuple length - the max will just be really, really
large.  Sure, the tuples will actually be fixed but large varying
types can be split off into a series of tuples in the TOASTer
oven, so to speak.  So I guess I have difficulty understanding
your argument.

If text were implemented as lzText for a quick 7.1, which apparently
was Jan's spin on the idea, then for 7.1 we'd say:

"maximum number of characters you can store in a column of typetext varies"

and after TOAST we'd say:

"maximum number of characters you can store in a column of typetext varies"

Right?  The only difference is that the _undefined_ maximum in
the non-TOAST case is "thousands of characters" and in the TOAST
case "gigabytes of characters" but undefined is undefined in my
book.



- Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert
Serviceand other goodies at http://donb.photo.net.
 


Re: [HACKERS] LZTEXT for rule plan stings

From
Don Baccus
Date:
At 08:17 PM 2/25/00 -0500, Tom Lane wrote:
>Don Baccus <dhogaza@pacifier.com> writes:
>> Well...here's an example of a view that worked in 6.5, with an 8KB
>> block size, that fails in 7.0 unless I build with a 16KB block size:
>
>> create view ec_products_displayable
>> as
>> select * from ec_products
>> where active_p='t';
>
>> You know, I've investigated further and the rule string itself is
>> no where near 8KB.  More like 1KB.  So there is more to the story
>> than just the string itself.
>
>Really?  That's interesting.  Could you send me a test case
>(create table and create view commands)?

I'll try to get to it soon.




- Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert
Serviceand other goodies at http://donb.photo.net.
 


Re: [HACKERS] LZTEXT for rule plan stings

From
Don Baccus
Date:
At 08:17 PM 2/25/00 -0500, Tom Lane wrote:

>Really?  That's interesting.  Could you send me a test case
>(create table and create view commands)?

Normally, I wouldn't post the test case to the whole group
but figured folks might find this interesting.  It's not all
that complex a table and the view of course is very simple.

Now...this is running on a snapshot from last weekend, just
before you fixed the pg_dump/reload problem associated with
column alias and views.  I tried downloading the latest sources
via CVS and got bit by the "it don't compile" problem others
have complained about earlier today (ecpg).  Here's the test
case:

create table ec_products (   product_id      integer not null primary key,   sku         varchar(100),   product_name
    varchar(200),   creation_date       datetime default current_timestamp not null,   one_line_description
varchar(400),  detailed_description    varchar(4000),   search_keywords     varchar(4000),   price           numeric,
shipping       numeric,   shipping_additional numeric,   weight          float4,   dirname         varchar(200),
present_p      char(1) check (present_p in ('f','t')) default 't',   active_p        char(1) check (active_p in
('f','t'))default 't',   available_date      datetime default current_timestamp not null,   announcements
varchar(4000),  announcements_expire    datetime,   url         varchar(300),   template_id     integer,   stock_status
      char(1) check (stock_status in ('o','q','m','s','i')),   last_modified       datetime not null,
last_modifying_userinteger not null,   modified_ip_address varchar(20) not null
 
);

create view ec_products_displayable
as
select * from ec_products e
where active_p='t';



- Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert
Serviceand other goodies at http://donb.photo.net.
 


Re: [HACKERS] LZTEXT for rule plan stings

From
Don Baccus
Date:
At 08:18 PM 2/25/00 -0500, Tom Lane wrote:
>wieck@debis.com (Jan Wieck) writes:
>>     Will  work on a NOTICE/ERROR message now, preventing users to
>>     use it in their schemas.
>
>I think a NOTICE is sufficient --- people who really need it will
>go in and dike out any ERROR anyway ;-)

It will also be very easy to upgrade to TOAST for those of us who
are knowledgable enough to edit our pg_dumps from "lztext" to "text"
when that feature finally comes out.  We'll undoubtably have to
dump and reload anyway :)



- Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert
Serviceand other goodies at http://donb.photo.net.
 


Re: [HACKERS] LZTEXT for rule plan stings

From
wieck@debis.com (Jan Wieck)
Date:
Don Baccus wrote:

> At 03:02 AM 2/26/00 +0200, Hannu Krosing wrote:
> >Don Baccus wrote:
> >>
> >>
> >> I still like the idea of "text" being implemented under the hood
> >> as lzText for a quick 7.1 release if that idea works out ...
> >
> >But without TOAST it would result in _undefined_ max tuple length,
> >which is probably not desirable.
   True.

> Boy, I'd sure find it desirable.  There's nothing to stop people from
> using varchar(8000) or whatever if they want a predictable top limit.
> Text is not a standard type, and this wouldn't break standard semantics.
>
> lzText wasn't removed because folks thought it was useless, IIRC,
> it was removed because TOAST was an exciting and much more powerful
> approach and no one wanted to introduce a new type doomed to disappear
> after a single release cycle.
   True.

> With TOAST, from the user's point of view you'll still have an
> _undefined_ max tuple length - the max will just be really, really
> large.  Sure, the tuples will actually be fixed but large varying
> types can be split off into a series of tuples in the TOASTer
> oven, so to speak.  So I guess I have difficulty understanding
> your argument.
   False.
   With  TOAST, the maximum tuple length is limited by available   disk space (minus some overhead) and/or the number
ofbits we   use  to  represent  the  values original size and/or the size   addressable by the TOAST'ers table at  all.
Available  space   allways  limits  the  amount of data in a DB, and you allways   have to take some overhead into
account, but  calling  this   _undefined_  isn't  correct  IMHO  -  better  call it hard to   figure out.
 
   The number  of  bits  representing  the  attributes  size  is   another  story, because we already decided to use
someof the   top bits for special purposes, so  a  single  attribute  will   have  some limit around 1/4 to 1 GB. Not
toobad I think, who   would ever attempt to store a complete server backup  in  one   tuple?  And  which client/server
combowill be able to handle   the required queries using the existing  FE/BE  protocol  and   libpq  implementation
either. Thus  there  are other limits   causing problems before we need to continue this  discussion,   surely.
 

> If text were implemented as lzText for a quick 7.1, which apparently
> was Jan's spin on the idea, then for 7.1 we'd say:
   On  the first look, it was a tempting solution. But there are   ton's of places in the backend, that assume  text
is binary   compatible  to  something  or the bytes after the VARSIZE are   plain value bytes, not some compressed
garbageto  be  passed   through  a function first. Replacing TEXT by LZTEXT therefore   wouldn't be such an easy job,
butwould be  working  for  the   wastebasked  from  the  very  beginning anyway, because TOAST   needs to revert it all
again.
   I don't like that kind of work.
   Maybe I found some kind of compromise:
   -  We make LZTEXT a released type, without warning and anyone      can use it as needed.
   -  When  featuring  TOAST,  we  remove  it  and create a type      alias. This way, the  "backend"  will  convert
the table      schemas   (WRT   lztext->text)   at  reload  time  of  the      conversion.
 
   -  We keep the type alias active past the  next  one  or  two      major   releases.   Someone   skipping   major
releases,     converting from say 7.1 to 9.2, will have  other  problems      than  replacing  all  occurences  of
lztextby text in his      dumps.
 
   Actually I have some problems with the type  coercion  stuff.   There  are  functions  lztext(text)  and  vice
versa,but the   system is unable to find an "=" operator for lztext and  text   when issuing
 
     SELECT * FROM t1, t2, WHERE t1.lztext_att = t2.text_att;
   This  worked  in the past releases (IIRC), so I wonder if the   failure above is a wanted "feature".  I'll commit
thestuff I   have  tomorrow  and  hope  someone  can  help  me  to get the   coercion working. All we have to do then
isto  tell  in  the   release notes and docs "Never use LZTEXT type name explicitly   in an application query (like for
typecasting)  -  use  TEXT   instead".
 


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #




Re: [HACKERS] LZTEXT for rule plan stings

From
Don Baccus
Date:
At 04:06 AM 2/26/00 +0100, Jan Wieck wrote:

>    False.

>    With  TOAST, the maximum tuple length is limited by available
>    disk space (minus some overhead) and/or the number of bits we
>    use  to  represent  the  values original size and/or the size
>    addressable by the TOAST'ers table at  all.  Available  space
>    allways  limits  the  amount of data in a DB, and you allways
>    have to take some overhead into  account,  but  calling  this
>    _undefined_  isn't  correct  IMHO  -  better  call it hard to
>    figure out.

Same is true for non-TOAST lzText.

Or...non lzText text, for that matter.

Of course, the size of text IS UNDEFINED TODAY.

create table foo (   t1 text,   t2 text);

Pray tell, what is the maximum size of t1? 

Is it independent of t2?   Or...correct me if I'm mistaken...if t2
contains 8,000+ characters won't "insert" bomb me if I try to 
insert 8,000+ characters into t1?  Or even a few characters?  Exactly
where is this vaunted and well defined limit?

(oops - you can't answer that question because it depends on the
size of BLCKSZ, which of course one can change at will)

The maximum size of "text" is already undefined, as it depends on:

BLCKSZ (which the user may not've set herself, and maybe is unaware of
if she's a user at the mercy of some sysadmin)

and 

the actual bytes occupied by other variable-length columns.

"bytea" for instance.  "text" for instance.  "varchar(n)" for instance,
which actually is a variable-length string which has a maximum value.

PG lets me do this:

create table foo (  t1  varchar(8000),  t2  varchar(8000),  tn  varchar(8000) -- n fairly large
);

Can I insert 8K chars into t1?

Into t2?

Into t3?

Trick PG question - into all three at once?

Sorry, but this is a crap argument.  There is no way to know how
many characters you can insert into a "text" column unless you have
detailed knowledge of the table, not only the types in the table
but the data stored in the pertinent row of the table.

I should know, I've been fighting this when porting code over from
Oracle, where the blocksize truly limits the size of ONE COLUMN,
not a row (tuple) at large.

If I can really have a tuple with 1000 varchar(BLCKSZ-overhead) columns,
fully filled with data, could you please tell me how to do this?  My
life will be much simpler.

>> If text were implemented as lzText for a quick 7.1, which apparently
>> was Jan's spin on the idea, then for 7.1 we'd say:
>
>    On  the first look, it was a tempting solution. But there are
>    ton's of places in the backend, that assume  text  is  binary
>    compatible  to  something  or the bytes after the VARSIZE are
>    plain value bytes, not some compressed garbage to  be  passed
>    through  a function first. Replacing TEXT by LZTEXT therefore
>    wouldn't be such an easy job, but would be  working  for  the
>    wastebasked  from  the  very  beginning anyway, because TOAST
>    needs to revert it all again.

>    I don't like that kind of work.

Nor do I, which is why I didn't suggest it when lzText first came up
and drifted into a TOAST discussion.  Clearly, TOAST is a better
solution.

In particular, it solves Hannu's objection regarding the fact that
a compressed text type would have no fixed upper limit.

Better yet, it would solve Hannu's misunderstanding that today's
text type has such a limit.

Because (I love flogging dead horses):

create table foo ( i: integer, t: text);

and 

create table bar ( t: text);

create two columns T with different maximum limits.  Because the
limit is based on tuple-size.

A compressed text type is only a bad idea because it's a dead end.
Not because it turns a "defined" max text limit into an undefined
max text limit.  The maximum number of chars you can stuff into
a text var is always undefined unless you dissect exactly how
other columns eat storage.

>    Maybe I found some kind of compromise:
>
>    -  We make LZTEXT a released type, without warning and anyone
>       can use it as needed.
>
>    -  When  featuring  TOAST,  we  remove  it  and create a type
>       alias. This way, the  "backend"  will  convert  the  table
>       schemas   (WRT   lztext->text)   at  reload  time  of  the
>       conversion.

I have no strong feelings here.  Personally, I can live with just
compiling PG with a 16KB blocksize, for the work I'm doing today.

But I don't think the upgrade problem's a big deal.  If the type's
not popularized, only those of us "inside" will know of it, and as
far as I'm concerned, hand-editing a pg_dump would be fine with me if
I choose to use it.

But I'm only speaking for myself.

TOAST is clearly the way to go.

On the other hand, I don't see people flinging bricks at Interbase
for compressing their text type.  After all, they have outer joins...

>    Actually I have some problems with the type  coercion  stuff.
>    There  are  functions  lztext(text)  and  vice versa, but the
>    system is unable to find an "=" operator for lztext and  text
>    when issuing
>
>      SELECT * FROM t1, t2, WHERE t1.lztext_att = t2.text_att;
>
>    This  worked  in the past releases (IIRC), so I wonder if the
>    failure above is a wanted "feature".  I'll commit the stuff I
>    have  tomorrow  and  hope  someone  can  help  me  to get the
>    coercion working. All we have to do then is to  tell  in  the
>    release notes and docs "Never use LZTEXT type name explicitly
>    in an application query (like for type casting)  -  use  TEXT
>    instead".

Despite the above, I have no really strong feelings.  I only raised
the compressed text issue because my (belated) reading of the Interbase
docs made it clear that they do this, and Tom resurrected lztext in
regard to views (and my problems there probably made it a red herring
in this case, too!)  It's an interesting idea, and if TOAST is indeed
implemented probably a moot one.  Though...where is the crossover between
an in-place compression and moving an item to the TOASTed table.  And...
all of the problems with the backend making assumptions about text
etc will have to be addressed by the TOASTER, too.

For instance...varchar(4000) might still benefit from being compressed,
even if it is not TOASTed, due to PG's love of dragging full tuples
around.  Saves disk space.  Bigger slices of tables can be sorted in
memory vs. disk for any given backend sort/hash buffer size parameter.
Today's x86 CPUs, at least, favor shrinking the memory footprint of
data due to the fact that CPUs tend to be data-starved when working
on large amounts of data in RAM.   Etc etc etc.  So such a compressed
implementation may actually be a win even if Hannu's made happy by
affixing fixed varchar(n) limits on the column length.



- Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert
Serviceand other goodies at http://donb.photo.net.
 


Re: [HACKERS] LZTEXT for rule plan stings

From
Tom Lane
Date:
Don Baccus <dhogaza@pacifier.com> writes:
> Here's the test case:

Hmm.  I get a rule string exceeding 8K out of this (in current sources),
as checked by breakpointing at InsertRule() in rewriteDefine.c and
looking at 'actiontree'.

What's your basis for asserting the rule is only ~ 1K?
        regards, tom lane


Re: [HACKERS] LZTEXT for rule plan stings

From
Hannu Krosing
Date:
Don Baccus wrote:
> 
> Boy, I'd sure find it desirable.  There's nothing to stop people from
> using varchar(8000) or whatever if they want a predictable top limit.
> Text is not a standard type, and this wouldn't break standard semantics.
>
> lzText wasn't removed because folks thought it was useless, IIRC,
> it was removed because TOAST was an exciting and much more powerful
> approach and no one wanted to introduce a new type doomed to disappear
> after a single release cycle.
> 
> With TOAST, from the user's point of view you'll still have an
> _undefined_ max tuple length - the max will just be really, really
> large.  Sure, the tuples will actually be fixed but large varying
> types can be split off into a series of tuples in the TOASTer
> oven, so to speak.  So I guess I have difficulty understanding
> your argument.

Acutually it was not undefined but variable that made me uncertain - 
i.e. the fact that max size depends on the contents of string

> If text were implemented as lzText for a quick 7.1, which apparently
> was Jan's spin on the idea, then for 7.1 we'd say:
> 
> "maximum number of characters you can store in a column of type
>  text varies"

... varies from below 8K to ~100K depending on the redundancy of data"

> and after TOAST we'd say:
> 
> "maximum number of characters you can store in a column of type
>  text varies"

Rather "maximum number of characters you can store in a column of type text is limited by available memory and/or disk
space"

-----------------
Hannu


Re: [HACKERS] LZTEXT for rule plan stings

From
Don Baccus
Date:
At 01:27 AM 2/26/00 -0500, Tom Lane wrote:
>Don Baccus <dhogaza@pacifier.com> writes:
>> Here's the test case:
>
>Hmm.  I get a rule string exceeding 8K out of this (in current sources),
>as checked by breakpointing at InsertRule() in rewriteDefine.c and
>looking at 'actiontree'.
>
>What's your basis for asserting the rule is only ~ 1K?

I looked at the string dumped by pg_dump and it didn't appear to be
anywhere near 8KB, so I presumed that the actual data stuffed into
the rule is larger than whatever gets dumped out as the source
representation.

I've never looked at the implementation of rules, so it's unclear
to me just exactly what is being saved and just how much of it
using lzText would impact.

I had breakpointed the debugger at first and that's why I first
said apparently the rule string was > 8KB.  Then I looked at 
pg_dump output and had doubts that the answer was this simple...



- Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert
Serviceand other goodies at http://donb.photo.net.
 


Re: [HACKERS] LZTEXT for rule plan stings

From
Don Baccus
Date:
At 02:41 PM 2/26/00 +0200, Hannu Krosing wrote:

>Rather "maximum number of characters you can store in a column of type
>  text is limited by available memory and/or disk space"

TOAST is obviously ideal, so in a sense this discussion's pointless
because I have no doubt TOAST will happen.

We could still put an 8KB upper limit on a compressed text type if
we wish.  The size savings would be a plus, and you'd be able to
have full-sized 8KB text columns in many tables, at least, that carry
a bunch of other cruft around.

That's really the problem I run in porting over the web toolkit
from arsDigita.

I see tables that have two or three varchar(4000) columns with other
data, i.e. names and stuff that are also varchar but smaller.  I know
that these don't actually get stuffed with 4000 chars but rather
that 4KB is the upper limit of the size of an Oracle varchar and that
the author's been lazy.  If I had a compressed text or varchar type
I'd be quite confident that the application code would run even with
an 8KB block size.

In the interim.  Until TOAST comes or until I have time to dig into
the code and determine more accurate and reasonable sizes for the
varchars.

On the other hand, as I've mentioned I'm also just as happy to run
with a 16KB block size.  From the point of view of distributing the
web toolkit, some of our little group feel uncomfortable with that
requirement but it doesn't really bother me as I know TOAST will solve
the problem and that by end of year we'll be able to run the toolkit
on a default installation of Postgres.

So I'm happy, I run with a 16KB block size and eagerly await TOASTed
tuples.



- Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert
Serviceand other goodies at http://donb.photo.net.
 


Re: [HACKERS] LZTEXT for rule plan stings

From
Bruce Momjian
Date:
> I see tables that have two or three varchar(4000) columns with other
> data, i.e. names and stuff that are also varchar but smaller.  I know
> that these don't actually get stuffed with 4000 chars but rather
> that 4KB is the upper limit of the size of an Oracle varchar and that
> the author's been lazy.  If I had a compressed text or varchar type
> I'd be quite confident that the application code would run even with
> an 8KB block size.

Just to clearify, varchar(4000) does not take 4000 chars on disk, while
char(4000) does use 4000 chars on the disk.

--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] LZTEXT for rule plan stings

From
Tom Lane
Date:
Don Baccus <dhogaza@pacifier.com> writes:
>> What's your basis for asserting the rule is only ~ 1K?

> I looked at the string dumped by pg_dump and it didn't appear to be
> anywhere near 8KB, so I presumed that the actual data stuffed into
> the rule is larger than whatever gets dumped out as the source
> representation.

Yes, the source representation is *vastly* more compact.  A single
result column might look like "tab1.product_id" when dumped by pg_dump,
but the nodetree dump looks more like
     { TARGETENTRY      :resdom         { RESDOM         :resno 1         :restype 23         :restypmod -1
:resnameproduct_id         :reskey 0         :reskeyop 0         :ressortgroupref 0         :resjunk false         }
      :expr         { VAR         :varno 1         :varattno 1         :vartype 23         :vartypmod -1
:varlevelsup0         :varnoold 1         :varoattno 1        }     }
 

and (except for not using any excess whitespace) that is exactly what
goes into a rule action string.

As you can see, this is very amenable to compression, especially
when you have a lot of columns in a view.

Someday we might think about using a more compact representation for
stored rules, but there are advantages to using a format that's fairly
easy for a human to examine.
        regards, tom lane


Re: [HACKERS] LZTEXT for rule plan stings

From
Don Baccus
Date:
At 12:15 PM 2/26/00 -0500, Tom Lane wrote:

>and (except for not using any excess whitespace) that is exactly what
>goes into a rule action string.
>
>As you can see, this is very amenable to compression, especially
>when you have a lot of columns in a view.
>
>Someday we might think about using a more compact representation for
>stored rules, but there are advantages to using a format that's fairly
>easy for a human to examine.

Oh, now I understand, I didn't realize the tree was being stored in
human-readable form as a string, but thought it was being parsed into
a binary form.  That's why I began having doubts that I might've triggered
unecessary work on Jan's part regarding lztext.  Yes, since it's
stored as a text string lztext should help a LOT.



- Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert
Serviceand other goodies at http://donb.photo.net.
 


Re: [HACKERS] LZTEXT for rule plan stings

From
Don Baccus
Date:
At 10:36 AM 2/26/00 -0500, Bruce Momjian wrote:
>> I see tables that have two or three varchar(4000) columns with other
>> data, i.e. names and stuff that are also varchar but smaller.  I know
>> that these don't actually get stuffed with 4000 chars but rather
>> that 4KB is the upper limit of the size of an Oracle varchar and that
>> the author's been lazy.  If I had a compressed text or varchar type
>> I'd be quite confident that the application code would run even with
>> an 8KB block size.
>
>Just to clearify, varchar(4000) does not take 4000 chars on disk, while
>char(4000) does use 4000 chars on the disk.

Yes, I know.  The point is that without digging into how the code actually
uses these tables, I don't know which, if any, of the columns might
actually get stuffed with two, three, or four thousand characters.  If
I'm unlucky, all of them will be.  For now my simple solution is to
run with a 16KB blocksize and not worry about it.

This isn't postgresql's fault or whatever, the basic problem is a
lazy web hacker arbitrarily declaring varchar(4000) columns rather
than sitting down and determining what they need, because in Oracle
the amount taken is also only the number of bytes in the string
stuffed into the column.

This is kind of a pointless discussion.  We all know that TOAST is
going to be ultra-slick.

lztext was resurrected as an idea by Tom Lane in response to the
explosion in the length of the rule strings generated for views
in PG7.0.

That just triggered a memory on my part that Interbase apparently
compresses their text type, a fact I found interesting enough to
mention.

I'm neither lobbying for or against Postgres implementation of lztext,
text as lztext, or anything else.

I just found the notion interesting...

It would be nice if a simple table/view combination such as I posted
here earlier didn't bomb PG7.0 with a default 8KB blocksize, though!

My own views are working fine since I've switched to a 16KB blocksize
for the reasons hinted at above, but the fact that this example fails
in the default 8KB version is pretty grotty.  Tom Lane will probably
have it all fixed via lztext or some other method before most of the
folks on this list read this note :)

Regarding large types, TOAST is clearly the path to follow, and Jan's
plans for TOASTed couples includes compression when  appropriate.  I
also think we can layer SQL3-compliant BLOBs and CLOBs on top of his
TOAST implementation later on - for compatibility reasons only, of
course.



- Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert
Serviceand other goodies at http://donb.photo.net.
 


Re: [HACKERS] LZTEXT for rule plan stings

From
Tom Lane
Date:
Don Baccus <dhogaza@pacifier.com> writes:
> My own views are working fine since I've switched to a 16KB blocksize
> for the reasons hinted at above, but the fact that this example fails
> in the default 8KB version is pretty grotty.  Tom Lane will probably
> have it all fixed via lztext or some other method before most of the
> folks on this list read this note :)

Not me --- Jan gets the credit for lztext.
        regards, tom lane


Re: [HACKERS] LZTEXT for rule plan stings

From
Don Baccus
Date:
At 04:35 PM 2/26/00 -0500, Tom Lane wrote:
>Don Baccus <dhogaza@pacifier.com> writes:
>> My own views are working fine since I've switched to a 16KB blocksize
>> for the reasons hinted at above, but the fact that this example fails
>> in the default 8KB version is pretty grotty.  Tom Lane will probably
>> have it all fixed via lztext or some other method before most of the
>> folks on this list read this note :)
>
>Not me --- Jan gets the credit for lztext.

Did he hook it up to pg_rewrite, then?  If so, I'll try downloading
it and I'll toss my stuff at it...



- Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert
Serviceand other goodies at http://donb.photo.net.