Re: What is the maximum encoding-conversion growth rate, anyway? - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: What is the maximum encoding-conversion growth rate, anyway?
Msg-id 200707181509.l6IF9AE12790@momjian.us
In response to Re: What is the maximum encoding-conversion growth rate, anyway?  (Tatsuo Ishii <ishii@postgresql.org>)
List pgsql-hackers
This has been saved for the 8.4 release:
http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

Tatsuo Ishii wrote:
> The conclusion of the discussion appears to be that we could safely
> reduce MAX_CONVERSION_GROWTH from 4 to 3 for all existing built-in
> conversions.
> 
> However, since user-defined conversions could have an arbitrary growth
> rate, it would probably be better to leave it as it is now.
> 
> For 8.4, maybe we could change the conversion function's signature so
> that we don't need a fixed conversion rate, as Tom suggested.
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> 
> > Where are we on this?
> > 
> > ---------------------------------------------------------------------------
> > 
> > Tom Lane wrote:
> > > I just rearranged the code in mbutils.c a little bit to make it more
> > > robust if conversion of an over-length string is attempted, and noted
> > > this comment:
> > > 
> > > /*
> > >  * When converting strings between different encodings, we assume that space
> > >  * for converted result is 4-to-1 growth in the worst case. The rates for
> > >  * currently supported encoding pairs are within 3 (SJIS JIS X0201 half-width
> > >  * kana -> UTF8 is the worst case).  So "4" should be enough for the moment.
> > >  *
> > >  * Note that this is not the same as the maximum character width in any
> > >  * particular encoding.
> > >  */
> > > #define MAX_CONVERSION_GROWTH  4
> > > 
> > > It strikes me that this is overly pessimistic, since we do not support
> > > 5- or 6-byte UTF8 characters, and AFAICS there are no 1-byte characters
> > > in any supported encoding that require 4 bytes in another.  Could we
> > > reduce the multiplier to 3?  Or even 2?  This has a direct impact on the
> > > longest COPY lines we can support, so I'd like it not to be larger than
> > > necessary.
> > > 
> > >             regards, tom lane
> > 
> > -- 
> >   Bruce Momjian  <bruce@momjian.us>          http://momjian.us
> >   EnterpriseDB                               http://www.enterprisedb.com
> > 
> >   + If your life is a hard drive, Christ can be your backup. +

-- 
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +