Re: [HACKERS] lztext.c - Mailing list pgsql-hackers

From wieck@debis.com (Jan Wieck)
Subject Re: [HACKERS] lztext.c
Date
Msg-id m11qoZc-0003kGC@orion.SAPserv.Hamburg.dsh.de
In response to Re: [HACKERS] lztext.c  (Tatsuo Ishii <t-ishii@sra.co.jp>)
List pgsql-hackers
Tatsuo Ishii wrote:

> >    Don't spend much effort on comparison and the SUBSTRING()
> >    things right now. I already have an additional, generalized
> >    decompressor in mind that can be used in the comparison, for
> >    example, to decompress two values on the fly and stop the
> >    comparison at the first difference, which usually happens
> >    early in two random datums.
>
> Ok.
>
> >    Tell me when you have the multi-byte (and maybe cyrillic?)
> >    stuff committed and I'll get my hands back on the code.
>
> I have committed the changes just now, though cyrillic support is not
> included. I vaguely recall the discussion about the usefulness of
> the cyrillic support.

    I added the comparison functions, operators and the default
    nbtree operator class for indexing.
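
    To make the early-exit idea from the quoted paragraph concrete,
    here is a minimal sketch. It is not the lztext code: the real
    compressed format is not shown in this thread, so a toy
    run-length encoding of (count, byte) pairs stands in for it, and
    RleCursor, rle_init, rle_next and rle_compare are hypothetical
    names. The point is only that both values are decoded one byte
    at a time and decoding stops at the first difference.

```c
#include <stddef.h>

/* Toy stand-in for a compressed datum: a sequence of (count, byte) pairs. */
typedef struct
{
    const unsigned char *data;  /* encoded input */
    size_t      len;            /* length of data in bytes */
    size_t      pos;            /* offset of the next unread pair */
    unsigned    left;           /* bytes still to emit from the current pair */
    unsigned char cur;          /* byte value of the current pair */
} RleCursor;

static void
rle_init(RleCursor *c, const unsigned char *data, size_t len)
{
    c->data = data;
    c->len = len;
    c->pos = 0;
    c->left = 0;
    c->cur = 0;
}

/* Return the next decoded byte (0..255), or -1 at end of input. */
static int
rle_next(RleCursor *c)
{
    while (c->left == 0)
    {
        if (c->pos + 2 > c->len)
            return -1;
        c->left = c->data[c->pos];
        c->cur = c->data[c->pos + 1];
        c->pos += 2;
    }
    c->left--;
    return c->cur;
}

/*
 * Compare two encoded values bytewise without materializing either one.
 * Returns -1, 0 or 1. Decoding stops at the first differing byte; a value
 * that ends first sorts before the longer one.
 */
static int
rle_compare(const unsigned char *a, size_t alen,
            const unsigned char *b, size_t blen)
{
    RleCursor   ca, cb;

    rle_init(&ca, a, alen);
    rle_init(&cb, b, blen);
    for (;;)
    {
        int         x = rle_next(&ca);
        int         y = rle_next(&cb);

        if (x != y)
            return (x < y) ? -1 : 1;
        if (x == -1)
            return 0;
    }
}
```

    With two values that differ in their third byte, only three
    bytes of each are ever decoded, no matter how long the values
    are.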

    For SUBSTR() and STRPOS(), I just checked the current setup,
    and it automatically casts an lztext argument to these
    functions to text. I assume lztext can now be used in every
    place where text is allowed. Is it really worth bloating the
    catalogs with rarely used functions whose only gain is that
    they avoid decompressing part of the value?

    Remember, the algorithm is optimized for decompression speed.
    Decompressing on the fly might save some time in a comparison
    function used inside index scans or btree operations, where
    it's likely to hit a difference early. But for something like
    STRPOS(), using the default cast and changing the STRPOS()
    match search itself into a KMP algorithm (instead of walking
    through the text and comparing each position against the
    pattern using strncmp) would outperform it in any case. With
    the byte-by-byte strncmp() method, we definitely implemented
    the slowest and most readable option.

    I think we would do better to spend our time adding an
    lzbpchar type, or working on compressed tables and tuple
    splitting to do away with the size limits altogether.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #
