Thread: lztext.c
I'm going to commit changes to make lztextlen() multi-byte aware. While doing the work, I found that neither POSITION() nor SUBSTRING() has been implemented for lztext in the file.

BTW, is anybody working on making lztext indexable? If not, I will take care of it along with the above additions.
--
Tatsuo Ishii
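A minimal sketch of what "multi-byte aware" means for a length function: it counts characters rather than bytes. This is not the code committed to lztext.c. The helper name `mb_char_count` is hypothetical, and the sketch assumes the decompressed text is valid UTF-8, whereas a real backend has to follow whatever encoding the database uses.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from lztext.c): count the characters, not the
 * bytes, in an already decompressed buffer. ASSUMPTION: the buffer holds
 * valid UTF-8. Other encodings need a different bytes-per-character rule.
 */
static size_t
mb_char_count(const char *buf, size_t nbytes)
{
    size_t nchars = 0;

    assert(buf != NULL || nbytes == 0);

    for (size_t i = 0; i < nbytes; i++)
    {
        /* UTF-8 continuation bytes have the form 10xxxxxx; skip them. */
        if (((unsigned char) buf[i] & 0xC0) != 0x80)
            nchars++;
    }
    return nchars;
}
```

For plain ASCII input the result equals the byte count, so single-byte behavior is unchanged under this assumption.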
Tatsuo Ishii wrote:
> I'm going to commit changes to make lztextlen() aware of
> multi-byte. While doing the work, I found that no POSITION() or
> SUBSTRING() for lztext has been implemented in the file.

Thanks for that. I usually don't have multi-byte support compiled in, and it's surely better if you do the extension and tests.

I know that a lot of functions are missing so far, especially comparison and the ones you mentioned. I planned to get back to it once the multi-byte support is in.

> BTW, does anybody work on making lztext indexable? If no, I will take
> care of it with above additions.

IMHO that is questionable. A compressed data type is meant for storing large amounts of data. Indexing large fields, OTOH, is something to prevent by database design, and the new type offers reasonable compression rates only above some input size. OTOOH, it might get someone around the btree split problems some of us encountered, which I was able to trigger with field contents above 2K already. In such a case it can be a last resort. I'd like to know what others think.

Don't spend much effort on comparison and the SUBSTRING() things right now. I already have an additional, generalized decompressor in mind that can be used in the comparison, for example to decompress two values on the fly and stop the comparison at the first difference, which usually happens early in two random datums.

Tell me when you have the multi-byte (and maybe cyrillic?) stuff committed and I'll get my hands back on the code.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #
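The idea of decompressing two values on the fly and stopping at the first difference can be sketched as follows. This is an illustration only, not the generalized decompressor Jan has in mind: it assumes a toy run-length format of (count, byte) pairs instead of the lztext format, and the names `rle_stream`, `rle_open`, `rle_next` and `rle_compare` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy format for illustration only (NOT the lztext format): the
 * compressed data is a sequence of (count, byte) pairs.
 */
typedef struct
{
    const unsigned char *src;   /* compressed input */
    size_t      len;            /* compressed length in bytes (even) */
    size_t      pos;            /* offset of the current pair */
    unsigned    left;           /* bytes still to emit from that pair */
} rle_stream;

static void
rle_open(rle_stream *s, const unsigned char *src, size_t len)
{
    assert(len % 2 == 0);
    s->src = src;
    s->len = len;
    s->pos = 0;
    s->left = (len > 0) ? src[0] : 0;
}

/* Return the next decompressed byte, or -1 at end of stream. */
static int
rle_next(rle_stream *s)
{
    /* Advance past exhausted (or zero-length) pairs. */
    while (s->pos < s->len && s->left == 0)
    {
        s->pos += 2;
        if (s->pos < s->len)
            s->left = s->src[s->pos];
    }
    if (s->pos >= s->len)
        return -1;
    s->left--;
    return s->src[s->pos + 1];
}

/*
 * Compare two compressed values byte by byte without materializing
 * either one. Returns <0, 0 or >0 like memcmp(); a shorter value that
 * is a prefix of the longer one sorts first.
 */
static int
rle_compare(const unsigned char *a, size_t alen,
            const unsigned char *b, size_t blen)
{
    rle_stream  sa, sb;

    rle_open(&sa, a, alen);
    rle_open(&sb, b, blen);
    for (;;)
    {
        int ca = rle_next(&sa);
        int cb = rle_next(&sb);

        if (ca != cb)
            return (ca < cb) ? -1 : 1;  /* stop at the first difference */
        if (ca == -1)
            return 0;                   /* both streams ended together */
    }
}
```

The point of the design is that the loop returns as soon as the two streams diverge, so the cost depends on the length of the common prefix rather than on the full decompressed size.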
> Don't spend much effort for comparison and the SUBSTRING()
> things right now. I already have an additional, generalized
> decompressor in mind, that can be used in the comparison for
> example to decompress two values on the fly and stop
> comparison at the first difference, which usually happens
> early in two random datums.

Ok.

> Tell me when you have the multi-byte (and maybe cyrillic?)
> stuff committed and I'll take my hands back on the code.

I have committed the changes just now, though cyrillic support is not included. I vaguely recall the discussion about the usefulness of the cyrillic support.
--
Tatsuo Ishii
On Wed, 24 Nov 1999, Tatsuo Ishii wrote:

> Date: Wed, 24 Nov 1999 12:52:53 +0900
> From: Tatsuo Ishii <t-ishii@sra.co.jp>
> To: Jan Wieck <wieck@debis.com>
> Cc: pgsql-hackers@postgreSQL.org
> Subject: Re: [HACKERS] lztext.c
>
> > Don't spend much effort for comparison and the SUBSTRING()
> > things right now. I already have an additional, generalized
> > decompressor in mind, that can be used in the comparison for
> > example to decompress two values on the fly and stop
> > comparison at the first difference, which usually happens
> > early in two random datums.
>
> Ok.
>
> > Tell me when you have the multi-byte (and maybe cyrillic?)
> > stuff committed and I'll take my hands back on the code.
>
> I have committed the changes just now, though cyrillic support is not
> included. I vaguely recall the discussion about the usefulness of
> the cyrillic support.

If you mean --recode, you're right.

> --
> Tatsuo Ishii

_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
Tatsuo Ishii wrote:
> > Don't spend much effort for comparison and the SUBSTRING()
> > things right now. I already have an additional, generalized
> > decompressor in mind, that can be used in the comparison for
> > example to decompress two values on the fly and stop
> > comparison at the first difference, which usually happens
> > early in two random datums.
>
> Ok.
>
> > Tell me when you have the multi-byte (and maybe cyrillic?)
> > stuff committed and I'll take my hands back on the code.
>
> I have committed the changes just now, though cyrillic support is not
> included. I vaguely recall the discussion about the usefulness of
> the cyrillic support.

I added the comparison functions, operators and the default nbtree operator class for indexing.

For SUBSTR() and STRPOS(), I just checked the current setup, and it automatically casts an lztext argument to text in these functions. I assume lztext can now be used in every place where text is allowed.

Is it really worth blowing up the catalogs with rarely used functions whose only gain is that part of the value need not be decompressed? Remember, the algorithm is optimized for decompression speed. It might save some time to do this in a comparison function used inside index scans or btree operations, where it's likely to hit a difference early. But for something like STRPOS(), using the default cast and changing the STRPOS() match search itself into a KMP algorithm (instead of walking through the text and comparing each position against the pattern using strncmp()) would outperform it in any case. With the byte-by-byte strncmp() method, we definitely implemented the slowest and most readable possibility.

I think we would do better to spend our time on adding an lzbpchar type, or to work on compressed tables and tuple splitting to do away with the size limits altogether.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #
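For reference, the KMP (Knuth-Morris-Pratt) search mentioned above can be sketched as below. This is a standalone illustration, not a patch against the backend's STRPOS() code. The function name `kmp_search`, its 0-based return value and the -2 out-of-memory code are assumptions made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: return the 0-based offset of the first occurrence
 * of pat (m bytes) in txt (n bytes), -1 if there is none, or -2 if the
 * failure table cannot be allocated. Each text byte is examined once, so
 * the search is O(n + m) instead of O(n * m) for the strncmp() loop.
 */
static long
kmp_search(const char *txt, size_t n, const char *pat, size_t m)
{
    size_t     *fail;
    size_t      k;
    long        result = -1;

    assert(txt != NULL && pat != NULL);

    if (m == 0)
        return 0;               /* empty pattern matches at the start */
    if (m > n)
        return -1;

    fail = (size_t *) malloc(m * sizeof(size_t));
    if (fail == NULL)
        return -2;

    /*
     * fail[i] is the length of the longest proper prefix of pat[0..i]
     * that is also a suffix of pat[0..i].
     */
    fail[0] = 0;
    k = 0;
    for (size_t i = 1; i < m; i++)
    {
        while (k > 0 && pat[i] != pat[k])
            k = fail[k - 1];
        if (pat[i] == pat[k])
            k++;
        fail[i] = k;
    }

    /* Scan the text; on a mismatch fall back in the pattern only. */
    k = 0;
    for (size_t i = 0; i < n; i++)
    {
        while (k > 0 && txt[i] != pat[k])
            k = fail[k - 1];
        if (txt[i] == pat[k])
            k++;
        if (k == m)
        {
            result = (long) (i + 1 - m);
            break;
        }
    }

    free(fail);
    return result;
}
```

Since SQL string positions are 1-based, a caller would add 1 to a non-negative result. The sketch also works on bytes, so a multi-byte build would still have to convert the byte offset into a character position.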