Thread: type coerce problem with lztext

type coerce problem with lztext

From
wieck@debis.com (Jan Wieck)
Date:
Well,
   LZTEXT  is there again, and pg_rewrite uses it for action and   qual strings. This is what it tells:
   pgsql=# select rulename, length(ev_action), octet_length(ev_action)   pgsql-#   from pg_rewrite;       rulename    |
length| octet_length   ----------------+--------+--------------    _RETpg_user    |   3043 |          855
_RETpg_rules  |   3074 |         1139    _RETpg_views   |   4261 |         1252    _RETpg_tables  |   5187 |
1338   _RETpg_indexes |   3525 |         1122   (5 rows)
 
   Yes, the 3043 bytes long rule action string got stored in 855   bytes  in  pg_rewrite.  That's 71.9% compression
rateon this   attempt!
 
   There are functions text(lztext) and  lztext(text)  too,  but   the  system  is  unable  to  find an operator if one
compares  text=lztext in a query. IIRC, creating a  function  named  as   the target type and taking the source type is
whatmade auto-   type-conversion work - so what am I missing here?
 


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #




Re: [HACKERS] type coerce problem with lztext

From
Bruce Momjian
Date:
>     Yes, the 3043 bytes long rule action string got stored in 855
>     bytes  in  pg_rewrite.  That's 71.9% compression rate on this
>     attempt!
> 
>     There are functions text(lztext) and  lztext(text)  too,  but
>     the  system  is  unable  to  find an operator if one compares
>     text=lztext in a query. IIRC, creating a  function  named  as
>     the target type and taking the source type is what made auto-
>     type-conversion work - so what am I missing here?

Added to Features:
New lztext data type for compressed text fieldsLarger views/rules supported

--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] type coerce problem with lztext

From
Don Baccus
Date:
At 01:03 PM 2/27/00 +0100, Jan Wieck wrote:
>Well,
>
>    LZTEXT  is there again, and pg_rewrite uses it for action and
>    qual strings. This is what it tells:
>
>    pgsql=# select rulename, length(ev_action), octet_length(ev_action)
>    pgsql-#   from pg_rewrite;
>        rulename    | length | octet_length
>    ----------------+--------+--------------
>     _RETpg_user    |   3043 |          855
>     _RETpg_rules   |   3074 |         1139
>     _RETpg_views   |   4261 |         1252
>     _RETpg_tables  |   5187 |         1338
>     _RETpg_indexes |   3525 |         1122
>    (5 rows)
>
>    Yes, the 3043 bytes long rule action string got stored in 855
>    bytes  in  pg_rewrite.  That's 71.9% compression rate on this
>    attempt!

This will greatly help counter 7.0's "rule length explosion". 

Thanks.




- Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert
Serviceand other goodies at http://donb.photo.net.
 


Re: [HACKERS] type coerce problem with lztext

From
Tom Lane
Date:
wieck@debis.com (Jan Wieck) writes:
>     There are functions text(lztext) and  lztext(text)  too,  but
>     the  system  is  unable  to  find an operator if one compares
>     text=lztext in a query. IIRC, creating a  function  named  as
>     the target type and taking the source type is what made auto-
>     type-conversion work - so what am I missing here?

I'll take a look.  I think the key may be teaching TypeCategory
to know that lztext is a member of the text type class.
        regards, tom lane


Re: [HACKERS] type coerce problem with lztext

From
Tom Lane
Date:
wieck@debis.com (Jan Wieck) writes:
>     There are functions text(lztext) and  lztext(text)  too,  but
>     the  system  is  unable  to  find an operator if one compares
>     text=lztext in a query. IIRC, creating a  function  named  as
>     the target type and taking the source type is what made auto-
>     type-conversion work - so what am I missing here?

Yup, TypeCategory was the missing ingredient.  Seems to work now.

>     Yes, the 3043 bytes long rule action string got stored in 855
>     bytes  in  pg_rewrite.  That's 71.9% compression rate on this
>     attempt!

Over all the rules in the regression test database, I see:

regression=# select sum(length(ev_action)),sum(octet_length(ev_action)) from pg
_rewrite; sum   |  sum
--------+-------105270 | 38091
(1 row)

or about 64% compression.  Not bad...
        regards, tom lane


Re: [HACKERS] type coerce problem with lztext

From
Bruce Momjian
Date:
> Yup, TypeCategory was the missing ingredient.  Seems to work now.
> 
> >     Yes, the 3043 bytes long rule action string got stored in 855
> >     bytes  in  pg_rewrite.  That's 71.9% compression rate on this
> >     attempt!
> 
> Over all the rules in the regression test database, I see:
> 
> regression=# select sum(length(ev_action)),sum(octet_length(ev_action)) from pg
> _rewrite;
>   sum   |  sum
> --------+-------
>  105270 | 38091
> (1 row)
> 
> or about 64% compression.  Not bad...

We clearly needed this for 7.0 because of the larger plans.  Good thing
Jan had it available, becuase I can imagine some major headaches for
people without it.

--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] type coerce problem with lztext

From
wieck@debis.com (Jan Wieck)
Date:
> wieck@debis.com (Jan Wieck) writes:
> >     There are functions text(lztext) and  lztext(text)  too,  but
> >     the  system  is  unable  to  find an operator if one compares
> >     text=lztext in a query. IIRC, creating a  function  named  as
> >     the target type and taking the source type is what made auto-
> >     type-conversion work - so what am I missing here?
>
> Yup, TypeCategory was the missing ingredient.  Seems to work now.
  Tnx

> Over all the rules in the regression test database, I see:
>
> regression=# select sum(length(ev_action)),sum(octet_length(ev_action)) from pg
> _rewrite;
>   sum   |  sum
> --------+-------
>  105270 | 38091
> (1 row)
>
> or about 64% compression.  Not bad...
   Amazing,  when  looking  at  the simpleness of the algorithm,   isn't it?


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #




Re: [HACKERS] type coerce problem with lztext

From
wieck@debis.com (Jan Wieck)
Date:
Bruce Momjian wrote:

> We clearly needed this for 7.0 because of the larger plans.  Good thing
> Jan had it available, becuase I can imagine some major headaches for
> people without it.
   I  haven't  had  it available. But where able to dig out some   revision numbers, then take some CVS  diffs,  and
reactivate  two files from the CVS Attic directories.
 
   Here's something close to the new limit:
       rulename    | length | octet_length   ----------------+--------+--------------    _RETv1         |  64677 |
  7440
 
   The  view v1 is a simple 'SELECT * FROM t1' and t1 is a table   of 220 columns with the same names and types  as
nearly all   attributes of the system catalogs. Makes me feel comfortable.
 


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #




Re: [HACKERS] type coerce problem with lztext

From
Bruce Momjian
Date:
> Bruce Momjian wrote:
> 
> > We clearly needed this for 7.0 because of the larger plans.  Good thing
> > Jan had it available, becuase I can imagine some major headaches for
> > people without it.
> 
>     I  haven't  had  it available. But where able to dig out some
>     revision numbers, then take some CVS  diffs,  and  reactivate
>     two files from the CVS Attic directories.
> 
>     Here's something close to the new limit:
> 
>         rulename    | length | octet_length
>     ----------------+--------+--------------
>      _RETv1         |  64677 |         7440
> 
>     The  view v1 is a simple 'SELECT * FROM t1' and t1 is a table
>     of 220 columns with the same names and types  as  nearly  all
>     attributes of the system catalogs. Makes me feel comfortable.

Wow, that's a large number, 64k.

--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] type coerce problem with lztext

From
Don Baccus
Date:
At 04:18 PM 2/27/00 -0500, Bruce Momjian wrote:
>> Bruce Momjian wrote:

>>     The  view v1 is a simple 'SELECT * FROM t1' and t1 is a table
>>     of 220 columns with the same names and types  as  nearly  all
>>     attributes of the system catalogs. Makes me feel comfortable.
>
>Wow, that's a large number, 64k.

This is the "explosion" in length due to the column aliases now
being inserted into the rule, apparently.   The limit on views now
is much more tied to the number of columns in the referenced table(s)
than on the actual complexity of the view's definition per se.

lztext is doing a GREAT job of sweeping this problem under the rug,
so to speak, but it's still there...




- Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert
Serviceand other goodies at http://donb.photo.net.
 


Re: [HACKERS] type coerce problem with lztext

From
Tom Lane
Date:
wieck@debis.com (Jan Wieck) writes:
>         rulename    | length | octet_length
>     ----------------+--------+--------------
>      _RETv1         |  64677 |         7440

>     The  view v1 is a simple 'SELECT * FROM t1' and t1 is a table
>     of 220 columns with the same names and types  as  nearly  all
>     attributes of the system catalogs. Makes me feel comfortable.

Wow, better than 8-to-1.  I guess you'd expect good compression on that,
considering the very repetitive nature of the targetlist node string.
Have you tried something with a long, boring WHERE-clause?
        regards, tom lane


Re: [HACKERS] type coerce problem with lztext

From
Tom Lane
Date:
Don Baccus <dhogaza@pacifier.com> writes:
> This is the "explosion" in length due to the column aliases now
> being inserted into the rule, apparently.
> lztext is doing a GREAT job of sweeping this problem under the rug,
> so to speak, but it's still there...

Actually, as far as I can tell 7.0 should be only marginally worse than
prior releases in terms of verbosity of the rule parsetree string.
As a check I did

create table foo (f1 int, f2 char(10), f3 text);
create view foov as select * from foo;
select ev_action from pg_rewrite where rulename = '_RETfoov';

and got (linebreaks inserted for readability)

({ QUERY :command 1 :utility <> :resultRelation 0 :into <> :isPortal
false :isBinary false :isTemp false :unionall false :distinctClause <>
:sortClause <>:rtable (
{ RTE :relname foov :ref { ATTR :relname *CURRENT*:attrs ( "f1" "f2" "f3" )}:relid 148363 :inh false :inFromCl false
:inJoinSetfalse :skipAcl false}
 
{ RTE :relname foov :ref { ATTR :relname *NEW*:attrs ( "f1" "f2" "f3" )}:relid 148363 :inh false :inFromCl false
:inJoinSetfalse :skipAcl false}
 
{ RTE :relname foo :ref { ATTR :relname foo:attrs ( "f1" "f2" "f3" )}:relid 148352 :inh false :inFromCl true :inJoinSet
true:skipAcl false}):targetlist (
 
{ TARGETENTRY :resdom { RESDOM :resno 1 :restype 23 :restypmod -1
:resname f1 :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false }
:expr { VAR :varno 3 :varattno 1 :vartype 23 :vartypmod -1
:varlevelsup 0 :varnoold 3 :varoattno 1}}
{ TARGETENTRY :resdom { RESDOM :resno 2 :restype 1042 :restypmod 14
:resname f2 :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false }
:expr { VAR :varno 3 :varattno 2 :vartype 1042 :vartypmod 14
:varlevelsup 0 :varnoold 3 :varoattno 2}}
{ TARGETENTRY :resdom { RESDOM :resno 3 :restype 25 :restypmod -1
:resname f3 :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false }
:expr { VAR :varno 3 :varattno 3 :vartype 25 :vartypmod -1
:varlevelsup 0 :varnoold 3 :varoattno 3}}):qual <> :groupClause <> :havingQual <> :hasAggs false :hasSubLinks
false :unionClause <> :intersectClause <> :limitOffset <> :limitCount <>
:rowMark <>})

The thrice-repeated list of attribute names in the rtable entries is
new with Thomas' latest changes, and I'd like to see it go away again,
but even so it's not very long compared to the targetlist entries.

The inJoinSet fields in rtable entries are new, and ressortgroupref
used to be called resgroupref which is costing us 4 more bytes per
targetlist item ;-).  But otherwise it's three occurrences of the
field name added onto an existing cost of about 230 bytes per target
entry.  This is not an "explosion".

In fact, if I do
select length(ev_action) from pg_rewrite where rulename = '_RETfoov';
I get 1507 in current sources and 1318 in 6.5.3, or about 15% growth.

My guess is that Don's problems are stemming from rules that reference
tables that have many more columns than are being output.  Citations
of the otherwise-unreferenced columns in the rtable could add a lot of
bulk that wasn't there before.  But it doesn't look to me like the size
of a simple "SELECT *" rule string has grown all that much.
        regards, tom lane


Re: [HACKERS] type coerce problem with lztext

From
Don Baccus
Date:
At 05:02 PM 2/27/00 -0500, Tom Lane wrote:

>My guess is that Don's problems are stemming from rules that reference
>tables that have many more columns than are being output.  Citations
>of the otherwise-unreferenced columns in the rtable could add a lot of
>bulk that wasn't there before.  But it doesn't look to me like the size
>of a simple "SELECT *" rule string has grown all that much.

I'll buy that.  A couple of the views I was having problems with were
indeed returning a few columns from a view joining a couple of tables, with
in two cases a "where" clause with a further subselect returning
a single column (used on the right of an "=").  I might add that the
problem was made worse by the fact that the view itself wasn't as
complex earlier - I updated my PG7.0 snapshot to include Thomas'
last changes at roughly same time I updated the web toolkit.

I picked out one doing just a "select *" as an example because I
felt it would kind of drive the point home that simple views on
relatively small tables were failing...


- Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert
Serviceand other goodies at http://donb.photo.net.
 


Re: [HACKERS] type coerce problem with lztext

From
Peter Eisentraut
Date:
Bruce Momjian writes:

> Added to Features:
> 
>     New lztext data type for compressed text fields

I strongly suggest to not name this new feature. All the attempts to make
it go away silently in 7.1 will get a blow in the face from this.

Regarding which: Make a default description/comment (DESCR macro) "for
internal use only" and don't mention it in the documention (see last
paragraph), that should suffice. If people disregard that, they probably
use int2vector for their production applications as well.


-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden




Re: [HACKERS] type coerce problem with lztext

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
>> New lztext data type for compressed text fields

> I strongly suggest to not name this new feature. All the attempts to make
> it go away silently in 7.1 will get a blow in the face from this.

People *will* use it, if it's there.  Don't fool yourself.

However, we can make it "go away silently" the same way we are making
datetime go away: the 7.1 (or whatever) parser can just translate the
typename lztext to text.  If that weren't feasible then I'd be pretty
worried too.
        regards, tom lane


Re: [HACKERS] type coerce problem with lztext

From
wieck@debis.com (Jan Wieck)
Date:
[Charset ISO-8859-1 unsupported, filtering to ASCII...]
> Bruce Momjian writes:
>
> > Added to Features:
> >
> >  New lztext data type for compressed text fields
>
> I strongly suggest to not name this new feature. All the attempts to make
> it go away silently in 7.1 will get a blow in the face from this.
   I already discovered that this ain't true.
   At  the  time we feature TOAST, we remove LZTEXT and put in a   type alias to TEXT. This way, during  a  dump/reload
upgrade   from any non-toasted to a toasted release, the "backend" will   take care for the silent conversion of table
schemas.We  can   keep  this  alias  for  a faily long time, so external schema   scripts can be modified.
 
   This way, all we have to mention is  exactly  the  above,  so   schema writers take it onto their upgrade-checklist,
andthat   no application query should ever use LZTEXT explicitly  (like   in casting expressions). They shall use TEXT
instead.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #