Thread: pgsql: Add 'noError' argument to encoding conversion functions.

pgsql: Add 'noError' argument to encoding conversion functions.

From: Heikki Linnakangas

Add 'noError' argument to encoding conversion functions.

With the 'noError' argument, you can try to convert a buffer without
knowing the character boundaries beforehand. The functions now need to
return the number of input bytes successfully converted.
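
To illustrate the contract described above, here is a minimal sketch of a converter that takes a noError flag and returns the number of input bytes it managed to convert. It is a toy UTF-8 to LATIN1 routine written for this example only: the name toy_utf8_to_latin1, its signature, and the use of exit() as a stand-in for raising an error are assumptions, not the code in conv.c.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Toy UTF-8 -> LATIN1 converter (hypothetical; not PostgreSQL's conv.c).
 * Converts up to 'len' bytes from 'src' into 'dest' and returns the number
 * of input bytes successfully converted. 'dest' needs room for len + 1 bytes.
 */
static int
toy_utf8_to_latin1(const unsigned char *src, int len,
                   unsigned char *dest, bool noError)
{
    const unsigned char *start = src;

    while (len > 0)
    {
        unsigned char c = src[0];

        if (c < 0x80)
        {
            /* ASCII: copy as is */
            *dest++ = c;
            src += 1;
            len -= 1;
        }
        else if ((c == 0xC2 || c == 0xC3) && len >= 2 &&
                 (src[1] & 0xC0) == 0x80)
        {
            /* two-byte sequence encoding U+0080..U+00FF */
            *dest++ = (unsigned char) (((c & 0x03) << 6) | (src[1] & 0x3F));
            src += 2;
            len -= 2;
        }
        else
        {
            /* invalid, incomplete, or untranslatable input */
            if (!noError)
            {
                fprintf(stderr, "cannot convert byte 0x%02x\n", c);
                exit(EXIT_FAILURE);     /* stand-in for raising an error */
            }
            break;              /* noError: stop and report progress so far */
        }
    }
    *dest = '\0';
    return (int) (src - start);
}
```

With noError set, a caller can hand over a buffer that ends in the middle of a character and use the return value to decide how many bytes to consume, keeping the remainder for the next chunk.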

This is a backwards-incompatible change if you have created a custom
encoding conversion with CREATE CONVERSION. This adds a check to
pg_upgrade for that, refusing the upgrade if there are any user-defined
encoding conversions. Custom conversions are very rare; there are no
commonly used extensions that I know of that use that feature. No other
objects can depend on conversions, so if you do have one, you can fairly
easily drop it before upgrading, and recreate it after the upgrade with
an updated version.

Add regression tests for built-in encoding conversions. This doesn't cover
every conversion, but it covers all the internal functions in conv.c that
are used to implement the conversions.

Reviewed-by: John Naylor
Discussion: https://www.postgresql.org/message-id/e7861509-3960-538a-9025-b75a61188e01%40iki.fi

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/ea1b99a6619cd9dcfd46b82ac0d926b0b80e0ae9

Modified Files
--------------
doc/src/sgml/ref/create_conversion.sgml            |  12 +-
src/backend/commands/conversioncmds.c              |  32 +-
src/backend/utils/error/elog.c                     |   2 +
src/backend/utils/mb/conv.c                        | 139 +++++-
.../cyrillic_and_mic/cyrillic_and_mic.c            | 127 +++--
.../euc2004_sjis2004/euc2004_sjis2004.c            |  94 +++-
.../euc_cn_and_mic/euc_cn_and_mic.c                |  57 ++-
.../euc_jp_and_sjis/euc_jp_and_sjis.c              | 153 ++++--
.../euc_kr_and_mic/euc_kr_and_mic.c                |  57 ++-
.../euc_tw_and_big5/euc_tw_and_big5.c              | 165 +++++--
.../latin2_and_win1250/latin2_and_win1250.c        |  49 +-
.../conversion_procs/latin_and_mic/latin_and_mic.c |  43 +-
.../conversion_procs/utf8_and_big5/utf8_and_big5.c |  37 +-
.../utf8_and_cyrillic/utf8_and_cyrillic.c          |  67 +--
.../utf8_and_euc2004/utf8_and_euc2004.c            |  37 +-
.../utf8_and_euc_cn/utf8_and_euc_cn.c              |  37 +-
.../utf8_and_euc_jp/utf8_and_euc_jp.c              |  37 +-
.../utf8_and_euc_kr/utf8_and_euc_kr.c              |  37 +-
.../utf8_and_euc_tw/utf8_and_euc_tw.c              |  37 +-
.../utf8_and_gb18030/utf8_and_gb18030.c            |  37 +-
.../conversion_procs/utf8_and_gbk/utf8_and_gbk.c   |  37 +-
.../utf8_and_iso8859/utf8_and_iso8859.c            |  43 +-
.../utf8_and_iso8859_1/utf8_and_iso8859_1.c        |  35 +-
.../utf8_and_johab/utf8_and_johab.c                |  37 +-
.../conversion_procs/utf8_and_sjis/utf8_and_sjis.c |  37 +-
.../utf8_and_sjis2004/utf8_and_sjis2004.c          |  37 +-
.../conversion_procs/utf8_and_uhc/utf8_and_uhc.c   |  37 +-
.../conversion_procs/utf8_and_win/utf8_and_win.c   |  43 +-
src/backend/utils/mb/mbutils.c                     |  79 +++-
src/bin/pg_upgrade/check.c                         |  95 ++++
src/include/catalog/catversion.h                   |   2 +-
src/include/catalog/pg_proc.dat                    | 332 ++++++-------
src/include/mb/pg_wchar.h                          |  35 +-
src/test/regress/expected/conversion.out           | 519 +++++++++++++++++++++
src/test/regress/expected/opr_sanity.out           |   7 +-
src/test/regress/input/create_function_1.source    |   4 +
src/test/regress/output/create_function_1.source   |   3 +
src/test/regress/regress.c                         | 134 ++++++
src/test/regress/sql/conversion.sql                | 185 ++++++++
src/test/regress/sql/opr_sanity.sql                |   7 +-
40 files changed, 2333 insertions(+), 631 deletions(-)


Re: pgsql: Add 'noError' argument to encoding conversion functions.

From: Jaime Casanova

On Thu, Apr 01, 2021 at 09:25:00AM +0000, Heikki Linnakangas wrote:
> 
> Add regression tests for built-in encoding conversions. This doesn't cover
> every conversion, but it covers all the internal functions in conv.c that
> are used to implement the conversions.
> 

sqlsmith thinks that test_enc_conversion() must be marked as STRICT. It
crashes with something like this:

```
SELECT test_enc_conversion('\xe4dede'::bytea,
                           null::name,
                           pg_catalog.getdatabaseencoding(),
                           false);
```
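
The thread does not spell out the cause of the crash. A plausible reading, assumed here, is that the NULL argument reaches C code that dereferences it without a null check. The sketch below is a toy model of what STRICT provides: when any argument is null, the function body is skipped and a null result is assumed. ToyArg, toy_name_length and toy_call_strict are hypothetical names for illustration, not PostgreSQL's function manager API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a SQL argument: a value plus a null flag. */
typedef struct
{
    const char *value;          /* NULL when isnull is true */
    bool        isnull;
} ToyArg;

/*
 * Toy function body that assumes its argument is never null, as a C
 * function declared STRICT may do. Called with a null argument it would
 * dereference a NULL pointer.
 */
static size_t
toy_name_length(const ToyArg *arg)
{
    return strlen(arg->value);
}

/*
 * Toy caller modelling strict semantics: if the argument is null, skip
 * the function entirely and report a null result instead.
 */
static bool
toy_call_strict(const ToyArg *arg, size_t *result)
{
    if (arg->isnull)
        return false;           /* result is null; body not executed */
    *result = toy_name_length(arg);
    return true;
}
```

Without the check in toy_call_strict, the null case would go straight into strlen(NULL), which is the kind of failure a fuzzer like sqlsmith tends to find.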

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL



Re: pgsql: Add 'noError' argument to encoding conversion functions.

From: Heikki Linnakangas

On 03/04/2021 05:53, Michael Paquier wrote:
> On Fri, Apr 02, 2021 at 06:53:37PM -0500, Jaime Casanova wrote:
>> sqlsmith thinks that test_enc_conversion() must be marked as STRICT. It
>> crashes with something like this:
> 
> This one has won an open item.

Fixed by adding STRICT. Thanks!

- Heikki