Thread: type coerce problem with lztext
Well, LZTEXT is there again, and pg_rewrite uses it for action and qual strings. This is what it tells: pgsql=# select rulename, length(ev_action), octet_length(ev_action) pgsql-# from pg_rewrite; rulename | length| octet_length ----------------+--------+-------------- _RETpg_user | 3043 | 855 _RETpg_rules | 3074 | 1139 _RETpg_views | 4261 | 1252 _RETpg_tables | 5187 | 1338 _RETpg_indexes | 3525 | 1122 (5 rows) Yes, the 3043 bytes long rule action string got stored in 855 bytes in pg_rewrite. That's 71.9% compression rateon this attempt! There are functions text(lztext) and lztext(text) too, but the system is unable to find an operator if one compares text=lztext in a query. IIRC, creating a function named as the target type and taking the source type is whatmade auto- type-conversion work - so what am I missing here? Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #========================================= wieck@debis.com (Jan Wieck) #
> Yes, the 3043 bytes long rule action string got stored in 855 > bytes in pg_rewrite. That's 71.9% compression rate on this > attempt! > > There are functions text(lztext) and lztext(text) too, but > the system is unable to find an operator if one compares > text=lztext in a query. IIRC, creating a function named as > the target type and taking the source type is what made auto- > type-conversion work - so what am I missing here? Added to Features: New lztext data type for compressed text fieldsLarger views/rules supported -- Bruce Momjian | http://www.op.net/~candle pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
At 01:03 PM 2/27/00 +0100, Jan Wieck wrote: >Well, > > LZTEXT is there again, and pg_rewrite uses it for action and > qual strings. This is what it tells: > > pgsql=# select rulename, length(ev_action), octet_length(ev_action) > pgsql-# from pg_rewrite; > rulename | length | octet_length > ----------------+--------+-------------- > _RETpg_user | 3043 | 855 > _RETpg_rules | 3074 | 1139 > _RETpg_views | 4261 | 1252 > _RETpg_tables | 5187 | 1338 > _RETpg_indexes | 3525 | 1122 > (5 rows) > > Yes, the 3043 bytes long rule action string got stored in 855 > bytes in pg_rewrite. That's 71.9% compression rate on this > attempt! This will greatly help counter 7.0's "rule length explosion". Thanks. - Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Serviceand other goodies at http://donb.photo.net.
wieck@debis.com (Jan Wieck) writes: > There are functions text(lztext) and lztext(text) too, but > the system is unable to find an operator if one compares > text=lztext in a query. IIRC, creating a function named as > the target type and taking the source type is what made auto- > type-conversion work - so what am I missing here? I'll take a look. I think the key may be teaching TypeCategory to know that lztext is a member of the text type class. regards, tom lane
wieck@debis.com (Jan Wieck) writes: > There are functions text(lztext) and lztext(text) too, but > the system is unable to find an operator if one compares > text=lztext in a query. IIRC, creating a function named as > the target type and taking the source type is what made auto- > type-conversion work - so what am I missing here? Yup, TypeCategory was the missing ingredient. Seems to work now. > Yes, the 3043 bytes long rule action string got stored in 855 > bytes in pg_rewrite. That's 71.9% compression rate on this > attempt! Over all the rules in the regression test database, I see: regression=# select sum(length(ev_action)),sum(octet_length(ev_action)) from pg _rewrite; sum | sum --------+-------105270 | 38091 (1 row) or about 64% compression. Not bad... regards, tom lane
> Yup, TypeCategory was the missing ingredient. Seems to work now. > > > Yes, the 3043 bytes long rule action string got stored in 855 > > bytes in pg_rewrite. That's 71.9% compression rate on this > > attempt! > > Over all the rules in the regression test database, I see: > > regression=# select sum(length(ev_action)),sum(octet_length(ev_action)) from pg > _rewrite; > sum | sum > --------+------- > 105270 | 38091 > (1 row) > > or about 64% compression. Not bad... We clearly needed this for 7.0 because of the larger plans. Good thing Jan had it available, becuase I can imagine some major headaches for people without it. -- Bruce Momjian | http://www.op.net/~candle pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> wieck@debis.com (Jan Wieck) writes: > > There are functions text(lztext) and lztext(text) too, but > > the system is unable to find an operator if one compares > > text=lztext in a query. IIRC, creating a function named as > > the target type and taking the source type is what made auto- > > type-conversion work - so what am I missing here? > > Yup, TypeCategory was the missing ingredient. Seems to work now. Tnx > Over all the rules in the regression test database, I see: > > regression=# select sum(length(ev_action)),sum(octet_length(ev_action)) from pg > _rewrite; > sum | sum > --------+------- > 105270 | 38091 > (1 row) > > or about 64% compression. Not bad... Amazing, when looking at the simpleness of the algorithm, isn't it? Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #========================================= wieck@debis.com (Jan Wieck) #
Bruce Momjian wrote: > We clearly needed this for 7.0 because of the larger plans. Good thing > Jan had it available, becuase I can imagine some major headaches for > people without it. I haven't had it available. But where able to dig out some revision numbers, then take some CVS diffs, and reactivate two files from the CVS Attic directories. Here's something close to the new limit: rulename | length | octet_length ----------------+--------+-------------- _RETv1 | 64677 | 7440 The view v1 is a simple 'SELECT * FROM t1' and t1 is a table of 220 columns with the same names and types as nearly all attributes of the system catalogs. Makes me feel comfortable. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #========================================= wieck@debis.com (Jan Wieck) #
> Bruce Momjian wrote: > > > We clearly needed this for 7.0 because of the larger plans. Good thing > > Jan had it available, becuase I can imagine some major headaches for > > people without it. > > I haven't had it available. But where able to dig out some > revision numbers, then take some CVS diffs, and reactivate > two files from the CVS Attic directories. > > Here's something close to the new limit: > > rulename | length | octet_length > ----------------+--------+-------------- > _RETv1 | 64677 | 7440 > > The view v1 is a simple 'SELECT * FROM t1' and t1 is a table > of 220 columns with the same names and types as nearly all > attributes of the system catalogs. Makes me feel comfortable. Wow, that's a large number, 64k. -- Bruce Momjian | http://www.op.net/~candle pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
At 04:18 PM 2/27/00 -0500, Bruce Momjian wrote: >> Bruce Momjian wrote: >> The view v1 is a simple 'SELECT * FROM t1' and t1 is a table >> of 220 columns with the same names and types as nearly all >> attributes of the system catalogs. Makes me feel comfortable. > >Wow, that's a large number, 64k. This is the "explosion" in length due to the column aliases now being inserted into the rule, apparently. The limit on views now is much more tied to the number of columns in the referenced table(s) than on the actual complexity of the view's definition per se. lztext is doing a GREAT job of sweeping this problem under the rug, so to speak, but it's still there... - Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Serviceand other goodies at http://donb.photo.net.
wieck@debis.com (Jan Wieck) writes: > rulename | length | octet_length > ----------------+--------+-------------- > _RETv1 | 64677 | 7440 > The view v1 is a simple 'SELECT * FROM t1' and t1 is a table > of 220 columns with the same names and types as nearly all > attributes of the system catalogs. Makes me feel comfortable. Wow, better than 8-to-1. I guess you'd expect good compression on that, considering the very repetitive nature of the targetlist node string. Have you tried something with a long, boring WHERE-clause? regards, tom lane
Don Baccus <dhogaza@pacifier.com> writes: > This is the "explosion" in length due to the column aliases now > being inserted into the rule, apparently. > lztext is doing a GREAT job of sweeping this problem under the rug, > so to speak, but it's still there... Actually, as far as I can tell 7.0 should be only marginally worse than prior releases in terms of verbosity of the rule parsetree string. As a check I did create table foo (f1 int, f2 char(10), f3 text); create view foov as select * from foo; select ev_action from pg_rewrite where rulename = '_RETfoov'; and got (linebreaks inserted for readability) ({ QUERY :command 1 :utility <> :resultRelation 0 :into <> :isPortal false :isBinary false :isTemp false :unionall false :distinctClause <> :sortClause <>:rtable ( { RTE :relname foov :ref { ATTR :relname *CURRENT*:attrs ( "f1" "f2" "f3" )}:relid 148363 :inh false :inFromCl false :inJoinSetfalse :skipAcl false} { RTE :relname foov :ref { ATTR :relname *NEW*:attrs ( "f1" "f2" "f3" )}:relid 148363 :inh false :inFromCl false :inJoinSetfalse :skipAcl false} { RTE :relname foo :ref { ATTR :relname foo:attrs ( "f1" "f2" "f3" )}:relid 148352 :inh false :inFromCl true :inJoinSet true:skipAcl false}):targetlist ( { TARGETENTRY :resdom { RESDOM :resno 1 :restype 23 :restypmod -1 :resname f1 :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 3 :varattno 1 :vartype 23 :vartypmod -1 :varlevelsup 0 :varnoold 3 :varoattno 1}} { TARGETENTRY :resdom { RESDOM :resno 2 :restype 1042 :restypmod 14 :resname f2 :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 3 :varattno 2 :vartype 1042 :vartypmod 14 :varlevelsup 0 :varnoold 3 :varoattno 2}} { TARGETENTRY :resdom { RESDOM :resno 3 :restype 25 :restypmod -1 :resname f3 :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 3 :varattno 3 :vartype 25 :vartypmod -1 :varlevelsup 0 :varnoold 3 :varoattno 3}}):qual <> :groupClause <> :havingQual <> :hasAggs false :hasSubLinks false :unionClause <> :intersectClause <> :limitOffset <> :limitCount <> :rowMark <>}) The thrice-repeated list of attribute names in the rtable entries is new with Thomas' latest changes, and I'd like to see it go away again, but even so it's not very long compared to the targetlist entries. The inJoinSet fields in rtable entries are new, and ressortgroupref used to be called resgroupref which is costing us 4 more bytes per targetlist item ;-). But otherwise it's three occurrences of the field name added onto an existing cost of about 230 bytes per target entry. This is not an "explosion". In fact, if I do select length(ev_action) from pg_rewrite where rulename = '_RETfoov'; I get 1507 in current sources and 1318 in 6.5.3, or about 15% growth. My guess is that Don's problems are stemming from rules that reference tables that have many more columns than are being output. Citations of the otherwise-unreferenced columns in the rtable could add a lot of bulk that wasn't there before. But it doesn't look to me like the size of a simple "SELECT *" rule string has grown all that much. regards, tom lane
At 05:02 PM 2/27/00 -0500, Tom Lane wrote: >My guess is that Don's problems are stemming from rules that reference >tables that have many more columns than are being output. Citations >of the otherwise-unreferenced columns in the rtable could add a lot of >bulk that wasn't there before. But it doesn't look to me like the size >of a simple "SELECT *" rule string has grown all that much. I'll buy that. A couple of the views I was having problems with were indeed returning a few columns from a view joining a couple of tables, with in two cases a "where" clause with a further subselect returning a single column (used on the right of an "="). I might add that the problem was made worse by the fact that the view itself wasn't as complex earlier - I updated my PG7.0 snapshot to include Thomas' last changes at roughly same time I updated the web toolkit. I picked out one doing just a "select *" as an example because I felt it would kind of drive the point home that simple views on relatively small tables were failing... - Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Serviceand other goodies at http://donb.photo.net.
Bruce Momjian writes: > Added to Features: > > New lztext data type for compressed text fields I strongly suggest to not name this new feature. All the attempts to make it go away silently in 7.1 will get a blow in the face from this. Regarding which: Make a default description/comment (DESCR macro) "for internal use only" and don't mention it in the documention (see last paragraph), that should suffice. If people disregard that, they probably use int2vector for their production applications as well. -- Peter Eisentraut Sernanders väg 10:115 peter_e@gmx.net 75262 Uppsala http://yi.org/peter-e/ Sweden
Peter Eisentraut <peter_e@gmx.net> writes: >> New lztext data type for compressed text fields > I strongly suggest to not name this new feature. All the attempts to make > it go away silently in 7.1 will get a blow in the face from this. People *will* use it, if it's there. Don't fool yourself. However, we can make it "go away silently" the same way we are making datetime go away: the 7.1 (or whatever) parser can just translate the typename lztext to text. If that weren't feasible then I'd be pretty worried too. regards, tom lane
[Charset ISO-8859-1 unsupported, filtering to ASCII...] > Bruce Momjian writes: > > > Added to Features: > > > > New lztext data type for compressed text fields > > I strongly suggest to not name this new feature. All the attempts to make > it go away silently in 7.1 will get a blow in the face from this. I already discovered that this ain't true. At the time we feature TOAST, we remove LZTEXT and put in a type alias to TEXT. This way, during a dump/reload upgrade from any non-toasted to a toasted release, the "backend" will take care for the silent conversion of table schemas.We can keep this alias for a faily long time, so external schema scripts can be modified. This way, all we have to mention is exactly the above, so schema writers take it onto their upgrade-checklist, andthat no application query should ever use LZTEXT explicitly (like in casting expressions). They shall use TEXT instead. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #========================================= wieck@debis.com (Jan Wieck) #