Re: [HACKERS] Radix tree for character conversion - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: [HACKERS] Radix tree for character conversion
Date
Msg-id 01efd334-b839-0450-1b63-f2dea9326a7e@iki.fi
In response to Re: [HACKERS] Radix tree for character conversion  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: [HACKERS] Radix tree for character conversion  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List pgsql-hackers
On 03/17/2017 07:19 AM, Kyotaro HORIGUCHI wrote:
> At Mon, 13 Mar 2017 21:07:39 +0200, Heikki Linnakangas <hlinnaka@iki.fi> wrote in <d5b70078-9f57-0f63-3462-1e564a57739f@iki.fi>
>> Hmm. A somewhat different approach might be more suitable for testing
>> across versions, though. We could modify the perl scripts slightly to
>> print out SQL statements that exercise every mapping. For every
>> supported conversion, the SQL script could:
>>
>> 1. create a database in the source encoding.
>> 2. set client_encoding='<target encoding>'
>> 3. SELECT a string that contains every character in the source
>> encoding.
>
> There are many encodings that can be client-encoding but cannot
> be database-encoding.

Good point.

> I would like to use convert() function. It can be a large
> PL/PgSQL function or a series of "SELECT convert(...)"s. The
> latter is doable on-the-fly (by not generating/storing the whole
> script).
>
> | -- Test for SJIS->UTF-8 conversion
> | ...
> | SELECT convert('\0000', 'SJIS', 'UTF-8'); -- results in error
> | ...
> | SELECT convert('\897e', 'SJIS', 'UTF-8');

Makes sense.
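
For illustration only, generating such a series on the fly could look roughly like the sketch below. It is in Python rather than the perl the existing scripts use, and the function name and its input list are hypothetical; the real list of source codes would come from the mapping tables. It assumes the three-argument convert(bytea, name, name) form with hex-format bytea literals, so '\897e' above is written as '\x897e'::bytea.

```python
def convert_statements(codes, src="SJIS", dst="UTF-8"):
    """Yield a header comment, then one SELECT convert(...) per code.

    codes is an iterable of bytes objects in the source encoding.
    It is a hypothetical input; nothing here reads the real map files.
    """
    yield f"-- Test for {src}->{dst} conversion"
    for code in codes:
        # Hex-format bytea literal, e.g. '\x897e'. This assumes
        # standard_conforming_strings is on, so the backslash is literal.
        yield f"SELECT convert('\\x{code.hex()}'::bytea, '{src}', '{dst}');"
```

Because it is a generator, the statements can be piped straight to a client without ever storing the whole script, e.g. by printing each line of convert_statements([b"\x00", b"\x89\x7e"]).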

>> You could then run those SQL statements against old and new server
>> version, and verify that you get the same results.
>
> Including the result files in the repository will make this easy
> but unacceptably bloats. Put mb/Unicode/README.sanity_check?

Yeah, a README with instructions on how to do that sounds good. There is no
need to include the results in the repository; you can run the script
against an older version when you need something to compare with.

- Heikki



