Thread: pg_conversion seems rather strangely defined

pg_conversion seems rather strangely defined

From

Tom Lane

Date:

05 January 2016, 18:47:02

What is the point of pg_conversion.condefault (the flag that says whether
a conversion is "default")?  AFAICS, there is absolutely no way to invoke
a conversion that is not default, which means we might as well eliminate
the concept.

I do not see a lot of point in the namespacing of encoding conversions
either.  Does anyone really need or use search-path-dependent lookup of
conversions?  (If they do, it's probably broken anyway, since for example
we do not trouble to re-identify the client encoding conversion functions
when search_path changes.)

While actually removing pg_conversion.connamespace might not be worth
the trouble, it's mighty tempting to have just a single unique index on
(conforencoding, contoencoding), thereby enforcing that There Can Be Only
One conversion between any given pair of encodings, and then we can just
use that index to find the right entry without any fooling with search
path.

But in any case we should get rid of the concept of defaultness, because
it's pointless; all entries should be equally "default".
        regards, tom lane

Re: pg_conversion seems rather strangely defined

From

Tom Lane

Date:

05 January 2016, 21:57:54

I wrote:
> What is the point of pg_conversion.condefault (the flag that says whether
> a conversion is "default")?  AFAICS, there is absolutely no way to invoke
> a conversion that is not default, which means we might as well eliminate
> the concept.

> I do not see a lot of point in the namespacing of encoding conversions
> either.  Does anyone really need or use search-path-dependent lookup of
> conversions?  (If they do, it's probably broken anyway, since for example
> we do not trouble to re-identify the client encoding conversion functions
> when search_path changes.)

> While actually removing pg_conversion.connamespace might not be worth
> the trouble, it's mighty tempting to have just a single unique index on
> (conforencoding, contoencoding), thereby enforcing that There Can Be Only
> One conversion between any given pair of encodings, and then we can just
> use that index to find the right entry without any fooling with search
> path.

> But in any case we should get rid of the concept of defaultness, because
> it's pointless; all entries should be equally "default".

After further poking around, I realized that there *used* to be a way to
get at non-default encoding conversions, but it was removed here:

commit 02138357ffc8c41a3d646035368712a5394f1f5f
Author: Andrew Dunstan <andrew@dunslane.net>
Date:   Mon Sep 24 01:29:30 2007 +0000
   Remove "convert 'blah' using conversion_name" facility, because if it   produces text it is an encoding hole and if
notit's incompatible   with the spec, whatever the spec means (which we're not sure about anyway).

Now, the big problem there was that the function could produce values of
type "text" that violate the current database encoding.  We could have
retained the facility by making the function take and return bytea, as
convert() does today, but apparently nobody spoke up for the need then
or since.

Still, somebody might want it someday, and it would be pretty messy
to remove the concept of a default encoding only to have to put it
back later.

I still think however that search-path-based lookup of default encoding
conversions is a Bad Idea, and that we only ought to allow one default
conversion to exist for any pair of encodings.

I realized that we could implement that without too much trouble by
redefining pg_conversion.condefault as being true for default conversions
and NULL (not false) for non-default ones.  With this definition, a
unique index on pg_conversion (conforencoding, contoencoding, condefault)
would have the behavior we want --- sort of a poor man's partial unique
index, except that it would work correctly on a system catalog whereas
a true partial index wouldn't.

Thoughts?
        regards, tom lane

Re: pg_conversion seems rather strangely defined

From

Tom Lane

Date:

06 January 2016, 01:02:23

I wrote:
> I still think however that search-path-based lookup of default encoding
> conversions is a Bad Idea, and that we only ought to allow one default
> conversion to exist for any pair of encodings.

> I realized that we could implement that without too much trouble by
> redefining pg_conversion.condefault as being true for default conversions
> and NULL (not false) for non-default ones.  With this definition, a
> unique index on pg_conversion (conforencoding, contoencoding, condefault)
> would have the behavior we want --- sort of a poor man's partial unique
> index, except that it would work correctly on a system catalog whereas
> a true partial index wouldn't.

Turns out that indeed that works just fine.  See attached draft patch.

            regards, tom lane

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 97ef618..dcb8b8e 100644
*** a/doc/src/sgml/catalogs.sgml
--- b/doc/src/sgml/catalogs.sgml
***************
*** 2538,2544 ****
        <entry><structfield>condefault</structfield></entry>
        <entry><type>bool</type></entry>
        <entry></entry>
!       <entry>True if this is the default conversion</entry>
       </row>

      </tbody>
--- 2538,2545 ----
        <entry><structfield>condefault</structfield></entry>
        <entry><type>bool</type></entry>
        <entry></entry>
!       <entry>True if this is the default conversion for these two encodings,
!        else NULL</entry>
       </row>

      </tbody>
diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index f8c7ac3..eb39a9f 100644
*** a/doc/src/sgml/charset.sgml
--- b/doc/src/sgml/charset.sgml
*************** $ <userinput>psql -l</userinput>
*** 1102,1108 ****
       <literal>pg_conversion</> system catalog.  <productname>PostgreSQL</>
       comes with some predefined conversions, as shown in <xref
       linkend="multibyte-translation-table">. You can create a new
!      conversion using the SQL command <command>CREATE CONVERSION</command>.
      </para>

       <table id="multibyte-translation-table">
--- 1102,1108 ----
       <literal>pg_conversion</> system catalog.  <productname>PostgreSQL</>
       comes with some predefined conversions, as shown in <xref
       linkend="multibyte-translation-table">. You can create a new
!      conversion using the SQL command <xref linkend="sql-createconversion">.
      </para>

       <table id="multibyte-translation-table">
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index a0b42c2..f57d891 100644
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***************
*** 1488,1494 ****
          original encoding is specified by
          <parameter>src_encoding</parameter>. The
          <parameter>string</parameter> must be valid in this encoding.
!         Conversions can be defined by <command>CREATE CONVERSION</command>.
          Also there are some predefined conversions. See <xref
          linkend="conversion-names"> for available conversions.
         </entry>
--- 1488,1494 ----
          original encoding is specified by
          <parameter>src_encoding</parameter>. The
          <parameter>string</parameter> must be valid in this encoding.
!         Conversions can be defined with <xref linkend="sql-createconversion">.
          Also there are some predefined conversions. See <xref
          linkend="conversion-names"> for available conversions.
         </entry>
***************
*** 1525,1534 ****
         </entry>
         <entry><type>bytea</type></entry>
         <entry>
!         Convert string to <parameter>dest_encoding</parameter>.
         </entry>
         <entry><literal>convert_to('some text', 'UTF8')</literal></entry>
!        <entry><literal>some text</literal> represented in the UTF8 encoding</entry>
        </row>

        <row>
--- 1525,1535 ----
         </entry>
         <entry><type>bytea</type></entry>
         <entry>
!         Convert string from the database encoding
!         to <parameter>dest_encoding</parameter>.
         </entry>
         <entry><literal>convert_to('some text', 'UTF8')</literal></entry>
!        <entry><literal>some text</literal> represented in UTF8 encoding</entry>
        </row>

        <row>
diff --git a/doc/src/sgml/ref/create_conversion.sgml b/doc/src/sgml/ref/create_conversion.sgml
index d2e2c01..1933107 100644
*** a/doc/src/sgml/ref/create_conversion.sgml
--- b/doc/src/sgml/ref/create_conversion.sgml
*************** CREATE [ DEFAULT ] CONVERSION <replaceab
*** 28,34 ****

    <para>
     <command>CREATE CONVERSION</command> defines a new conversion between
!    character set encodings.  Also, conversions that
     are marked <literal>DEFAULT</> can be used for automatic encoding
     conversion between
     client and server. For this purpose, two conversions, from encoding A to
--- 28,34 ----

    <para>
     <command>CREATE CONVERSION</command> defines a new conversion between
!    character set encodings.  Conversions that
     are marked <literal>DEFAULT</> can be used for automatic encoding
     conversion between
     client and server. For this purpose, two conversions, from encoding A to
*************** CREATE [ DEFAULT ] CONVERSION <replaceab
*** 53,59 ****
        <para>
         The <literal>DEFAULT</> clause indicates that this conversion
         is the default for this particular source to destination
!        encoding. There should be only one default encoding in a schema
         for the encoding pair.
        </para>
       </listitem>
--- 53,59 ----
        <para>
         The <literal>DEFAULT</> clause indicates that this conversion
         is the default for this particular source to destination
!        encoding. There can be only one default conversion in a database
         for the encoding pair.
        </para>
       </listitem>
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 8b105fe..98977f6 100644
*** a/src/backend/catalog/namespace.c
--- b/src/backend/catalog/namespace.c
***************
*** 28,34 ****
  #include "catalog/pg_authid.h"
  #include "catalog/pg_collation.h"
  #include "catalog/pg_conversion.h"
- #include "catalog/pg_conversion_fn.h"
  #include "catalog/pg_namespace.h"
  #include "catalog/pg_opclass.h"
  #include "catalog/pg_operator.h"
--- 28,33 ----
*************** get_conversion_oid(List *name, bool miss
*** 3416,3448 ****
  }

  /*
-  * FindDefaultConversionProc - find default encoding conversion proc
-  */
- Oid
- FindDefaultConversionProc(int32 for_encoding, int32 to_encoding)
- {
-     Oid            proc;
-     ListCell   *l;
-
-     recomputeNamespacePath();
-
-     foreach(l, activeSearchPath)
-     {
-         Oid            namespaceId = lfirst_oid(l);
-
-         if (namespaceId == myTempNamespace)
-             continue;            /* do not look in temp namespace */
-
-         proc = FindDefaultConversion(namespaceId, for_encoding, to_encoding);
-         if (OidIsValid(proc))
-             return proc;
-     }
-
-     /* Not found in path */
-     return InvalidOid;
- }
-
- /*
   * recomputeNamespacePath - recompute path derived variables if needed.
   */
  static void
--- 3415,3420 ----
diff --git a/src/backend/catalog/pg_conversion.c b/src/backend/catalog/pg_conversion.c
index e2feb17..89b1ccf 100644
*** a/src/backend/catalog/pg_conversion.c
--- b/src/backend/catalog/pg_conversion.c
***************
*** 14,19 ****
--- 14,20 ----
   */
  #include "postgres.h"

+ #include "access/genam.h"
  #include "access/heapam.h"
  #include "access/htup_details.h"
  #include "access/sysattr.h"
*************** ConversionCreate(const char *conname, Oi
*** 68,79 ****
      if (def)
      {
          /*
!          * make sure there is no existing default <for encoding><to encoding>
!          * pair in this name space
           */
!         if (FindDefaultConversion(connamespace,
!                                   conforencoding,
!                                   contoencoding))
              ereport(ERROR,
                      (errcode(ERRCODE_DUPLICATE_OBJECT),
                       errmsg("default conversion for %s to %s already exists",
--- 69,79 ----
      if (def)
      {
          /*
!          * make sure there is no existing default conversion for same
!          * encodings
           */
!         if (OidIsValid(FindDefaultConversionProc(conforencoding,
!                                                  contoencoding)))
              ereport(ERROR,
                      (errcode(ERRCODE_DUPLICATE_OBJECT),
                       errmsg("default conversion for %s to %s already exists",
*************** ConversionCreate(const char *conname, Oi
*** 100,106 ****
      values[Anum_pg_conversion_conforencoding - 1] = Int32GetDatum(conforencoding);
      values[Anum_pg_conversion_contoencoding - 1] = Int32GetDatum(contoencoding);
      values[Anum_pg_conversion_conproc - 1] = ObjectIdGetDatum(conproc);
!     values[Anum_pg_conversion_condefault - 1] = BoolGetDatum(def);

      tup = heap_form_tuple(tupDesc, values, nulls);

--- 100,109 ----
      values[Anum_pg_conversion_conforencoding - 1] = Int32GetDatum(conforencoding);
      values[Anum_pg_conversion_contoencoding - 1] = Int32GetDatum(contoencoding);
      values[Anum_pg_conversion_conproc - 1] = ObjectIdGetDatum(conproc);
!     if (def)
!         values[Anum_pg_conversion_condefault - 1] = BoolGetDatum(true);
!     else
!         nulls[Anum_pg_conversion_condefault - 1] = true;

      tup = heap_form_tuple(tupDesc, values, nulls);

*************** ConversionCreate(const char *conname, Oi
*** 145,159 ****
  /*
   * RemoveConversionById
   *
!  * Remove a tuple from pg_conversion by Oid. This function is solely
!  * called inside catalog/dependency.c
   */
  void
  RemoveConversionById(Oid conversionOid)
  {
      Relation    rel;
      HeapTuple    tuple;
!     HeapScanDesc scan;
      ScanKeyData scanKeyData;

      ScanKeyInit(&scanKeyData,
--- 148,162 ----
  /*
   * RemoveConversionById
   *
!  * Remove a tuple from pg_conversion by Oid.
!  * This function is solely called inside catalog/dependency.c.
   */
  void
  RemoveConversionById(Oid conversionOid)
  {
      Relation    rel;
      HeapTuple    tuple;
!     SysScanDesc scan;
      ScanKeyData scanKeyData;

      ScanKeyInit(&scanKeyData,
*************** RemoveConversionById(Oid conversionOid)
*** 161,213 ****
                  BTEqualStrategyNumber, F_OIDEQ,
                  ObjectIdGetDatum(conversionOid));

-     /* open pg_conversion */
      rel = heap_open(ConversionRelationId, RowExclusiveLock);

!     scan = heap_beginscan_catalog(rel, 1, &scanKeyData);

!     /* search for the target tuple */
!     if (HeapTupleIsValid(tuple = heap_getnext(scan, ForwardScanDirection)))
          simple_heap_delete(rel, &tuple->t_self);
      else
          elog(ERROR, "could not find tuple for conversion %u", conversionOid);
!     heap_endscan(scan);
      heap_close(rel, RowExclusiveLock);
  }

  /*
!  * FindDefaultConversion
   *
!  * Find "default" conversion proc by for_encoding and to_encoding in the
!  * given namespace.
   *
   * If found, returns the procedure's oid, otherwise InvalidOid.  Note that
   * you get the procedure's OID not the conversion's OID!
   */
  Oid
! FindDefaultConversion(Oid name_space, int32 for_encoding, int32 to_encoding)
  {
!     CatCList   *catlist;
!     HeapTuple    tuple;
!     Form_pg_conversion body;
!     Oid            proc = InvalidOid;
!     int            i;

!     catlist = SearchSysCacheList3(CONDEFAULT,
!                                   ObjectIdGetDatum(name_space),
!                                   Int32GetDatum(for_encoding),
!                                   Int32GetDatum(to_encoding));

!     for (i = 0; i < catlist->n_members; i++)
      {
!         tuple = &catlist->members[i]->tuple;
!         body = (Form_pg_conversion) GETSTRUCT(tuple);
!         if (body->condefault)
!         {
!             proc = body->conproc;
!             break;
!         }
      }
!     ReleaseSysCacheList(catlist);
!     return proc;
  }
--- 164,229 ----
                  BTEqualStrategyNumber, F_OIDEQ,
                  ObjectIdGetDatum(conversionOid));

      rel = heap_open(ConversionRelationId, RowExclusiveLock);

!     scan = systable_beginscan(rel, ConversionOidIndexId, true,
!                               NULL, 1, &scanKeyData);

!     if (HeapTupleIsValid(tuple = systable_getnext(scan)))
          simple_heap_delete(rel, &tuple->t_self);
      else
          elog(ERROR, "could not find tuple for conversion %u", conversionOid);
!
!     systable_endscan(scan);
      heap_close(rel, RowExclusiveLock);
  }

  /*
!  * FindDefaultConversionProc
   *
!  * Find "default" conversion proc converting from for_encoding to to_encoding.
   *
   * If found, returns the procedure's oid, otherwise InvalidOid.  Note that
   * you get the procedure's OID not the conversion's OID!
   */
  Oid
! FindDefaultConversionProc(int32 for_encoding, int32 to_encoding)
  {
!     Oid            result = InvalidOid;
!     Relation    conRel;
!     ScanKeyData key[3];
!     SysScanDesc conScan;
!     HeapTuple    conTup;

!     conRel = heap_open(ConversionRelationId, AccessShareLock);

!     ScanKeyInit(&key[0],
!                 Anum_pg_conversion_conforencoding,
!                 BTEqualStrategyNumber, F_INT4EQ,
!                 Int32GetDatum(for_encoding));
!     ScanKeyInit(&key[1],
!                 Anum_pg_conversion_contoencoding,
!                 BTEqualStrategyNumber, F_INT4EQ,
!                 Int32GetDatum(to_encoding));
!     ScanKeyInit(&key[2],
!                 Anum_pg_conversion_condefault,
!                 BTEqualStrategyNumber, F_BOOLEQ,
!                 BoolGetDatum(true));
!
!     conScan = systable_beginscan(conRel, ConversionDefaultIndexId, true,
!                                  NULL, 3, key);
!
!     /* Assume there can be at most one match */
!     if (HeapTupleIsValid(conTup = systable_getnext(conScan)))
      {
!         Form_pg_conversion conForm = (Form_pg_conversion) GETSTRUCT(conTup);
!
!         result = conForm->conproc;
      }
!
!     systable_endscan(conScan);
!
!     relation_close(conRel, AccessShareLock);
!
!     return result;
  }
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index 6eb2ac6..3d6f82b 100644
*** a/src/backend/utils/cache/syscache.c
--- b/src/backend/utils/cache/syscache.c
*************** static const struct cachedesc cacheinfo[
*** 303,319 ****
          },
          8
      },
-     {ConversionRelationId,        /* CONDEFAULT */
-         ConversionDefaultIndexId,
-         4,
-         {
-             Anum_pg_conversion_connamespace,
-             Anum_pg_conversion_conforencoding,
-             Anum_pg_conversion_contoencoding,
-             ObjectIdAttributeNumber,
-         },
-         8
-     },
      {ConversionRelationId,        /* CONNAMENSP */
          ConversionNameNspIndexId,
          2,
--- 303,308 ----
diff --git a/src/backend/utils/mb/mbutils.c b/src/backend/utils/mb/mbutils.c
index 7f1c881..b9a9686 100644
*** a/src/backend/utils/mb/mbutils.c
--- b/src/backend/utils/mb/mbutils.c
***************
*** 35,41 ****
  #include "postgres.h"

  #include "access/xact.h"
! #include "catalog/namespace.h"
  #include "mb/pg_wchar.h"
  #include "utils/builtins.h"
  #include "utils/memutils.h"
--- 35,41 ----
  #include "postgres.h"

  #include "access/xact.h"
! #include "catalog/pg_conversion_fn.h"
  #include "mb/pg_wchar.h"
  #include "utils/builtins.h"
  #include "utils/memutils.h"
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 56c0528..3421b12 100644
*** a/src/bin/pg_dump/pg_dump.c
--- b/src/bin/pg_dump/pg_dump.c
*************** dumpConversion(Archive *fout, DumpOption
*** 12178,12183 ****
--- 12178,12184 ----
      conforencoding = PQgetvalue(res, 0, i_conforencoding);
      contoencoding = PQgetvalue(res, 0, i_contoencoding);
      conproc = PQgetvalue(res, 0, i_conproc);
+     /* condefault might be true, false, or null; this works regardless */
      condefault = (PQgetvalue(res, 0, i_condefault)[0] == 't');

      /*
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index ab2c1a8..25a465b 100644
*** a/src/include/catalog/indexing.h
--- b/src/include/catalog/indexing.h
*************** DECLARE_INDEX(pg_constraint_contypid_ind
*** 123,129 ****
  DECLARE_UNIQUE_INDEX(pg_constraint_oid_index, 2667, on pg_constraint using btree(oid oid_ops));
  #define ConstraintOidIndexId  2667

! DECLARE_UNIQUE_INDEX(pg_conversion_default_index, 2668, on pg_conversion using btree(connamespace oid_ops,
conforencodingint4_ops, contoencoding int4_ops, oid oid_ops)); 
  #define ConversionDefaultIndexId  2668
  DECLARE_UNIQUE_INDEX(pg_conversion_name_nsp_index, 2669, on pg_conversion using btree(conname name_ops, connamespace
oid_ops));
  #define ConversionNameNspIndexId  2669
--- 123,130 ----
  DECLARE_UNIQUE_INDEX(pg_constraint_oid_index, 2667, on pg_constraint using btree(oid oid_ops));
  #define ConstraintOidIndexId  2667

! /* NB: this index will not enforce uniqueness for rows with null condefault */
! DECLARE_UNIQUE_INDEX(pg_conversion_default_index, 2668, on pg_conversion using btree(conforencoding int4_ops,
contoencodingint4_ops, condefault bool_ops)); 
  #define ConversionDefaultIndexId  2668
  DECLARE_UNIQUE_INDEX(pg_conversion_name_nsp_index, 2669, on pg_conversion using btree(conname name_ops, connamespace
oid_ops));
  #define ConversionNameNspIndexId  2669
diff --git a/src/include/catalog/namespace.h b/src/include/catalog/namespace.h
index 2ccb3a7..35b0369 100644
*** a/src/include/catalog/namespace.h
--- b/src/include/catalog/namespace.h
*************** extern void PopOverrideSearchPath(void);
*** 135,141 ****

  extern Oid    get_collation_oid(List *collname, bool missing_ok);
  extern Oid    get_conversion_oid(List *conname, bool missing_ok);
- extern Oid    FindDefaultConversionProc(int32 for_encoding, int32 to_encoding);

  /* initialization & transaction cleanup code */
  extern void InitializeSearchPath(void);
--- 135,140 ----
diff --git a/src/include/catalog/pg_conversion.h b/src/include/catalog/pg_conversion.h
index 1cfe57c..7b8fff0 100644
*** a/src/include/catalog/pg_conversion.h
--- b/src/include/catalog/pg_conversion.h
***************
*** 32,38 ****
   *    conforencoding        FOR encoding id
   *    contoencoding        TO encoding id
   *    conproc                OID of the conversion proc
!  *    condefault            TRUE if this is a default conversion
   * ----------------------------------------------------------------
   */
  #define ConversionRelationId  2607
--- 32,41 ----
   *    conforencoding        FOR encoding id
   *    contoencoding        TO encoding id
   *    conproc                OID of the conversion proc
!  *    condefault            TRUE if this is a default conversion, else NULL
!  *
!  * (The odd definition of condefault allows us to make a poor man's
!  * partial unique index on conforencoding/contoencoding.)
   * ----------------------------------------------------------------
   */
  #define ConversionRelationId  2607
*************** CATALOG(pg_conversion,2607)
*** 45,51 ****
      int32        conforencoding;
      int32        contoencoding;
      regproc        conproc;
!     bool        condefault;
  } FormData_pg_conversion;

  /* ----------------
--- 48,57 ----
      int32        conforencoding;
      int32        contoencoding;
      regproc        conproc;
!
! #ifdef CATALOG_VARLEN            /* variable-length fields start here */
!     bool condefault BKI_FORCE_NULL;
! #endif
  } FormData_pg_conversion;

  /* ----------------
diff --git a/src/include/catalog/pg_conversion_fn.h b/src/include/catalog/pg_conversion_fn.h
index 9fcdde6..fb4eef7 100644
*** a/src/include/catalog/pg_conversion_fn.h
--- b/src/include/catalog/pg_conversion_fn.h
*************** extern ObjectAddress ConversionCreate(co
*** 22,27 ****
                   int32 conforencoding, int32 contoencoding,
                   Oid conproc, bool def);
  extern void RemoveConversionById(Oid conversionOid);
! extern Oid    FindDefaultConversion(Oid connamespace, int32 for_encoding, int32 to_encoding);

  #endif   /* PG_CONVERSION_FN_H */
--- 22,27 ----
                   int32 conforencoding, int32 contoencoding,
                   Oid conproc, bool def);
  extern void RemoveConversionById(Oid conversionOid);
! extern Oid    FindDefaultConversionProc(int32 for_encoding, int32 to_encoding);

  #endif   /* PG_CONVERSION_FN_H */
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index 256615b..cd390b0 100644
*** a/src/include/utils/syscache.h
--- b/src/include/utils/syscache.h
*************** enum SysCacheIdentifier
*** 48,54 ****
      CLAOID,
      COLLNAMEENCNSP,
      COLLOID,
-     CONDEFAULT,
      CONNAMENSP,
      CONSTROID,
      CONVOID,
--- 48,53 ----
diff --git a/src/test/regress/expected/conversion.out b/src/test/regress/expected/conversion.out
index 37965ae..7260ed2 100644
*** a/src/test/regress/expected/conversion.out
--- b/src/test/regress/expected/conversion.out
*************** CREATE CONVERSION myconv FOR 'LATIN1' TO
*** 10,23 ****
  CREATE CONVERSION myconv FOR 'LATIN1' TO 'UTF8' FROM iso8859_1_to_utf8;
  ERROR:  conversion "myconv" already exists
  --
! -- create default conversion with qualified name
  --
! CREATE DEFAULT CONVERSION public.mydef FOR 'LATIN1' TO 'UTF8' FROM iso8859_1_to_utf8;
  --
! -- cannot make default conversion with same schema/for_encoding/to_encoding
  --
! CREATE DEFAULT CONVERSION public.mydef2 FOR 'LATIN1' TO 'UTF8' FROM iso8859_1_to_utf8;
! ERROR:  default conversion for LATIN1 to UTF8 already exists
  -- test comments
  COMMENT ON CONVERSION myconv_bad IS 'foo';
  ERROR:  conversion "myconv_bad" does not exist
--- 10,29 ----
  CREATE CONVERSION myconv FOR 'LATIN1' TO 'UTF8' FROM iso8859_1_to_utf8;
  ERROR:  conversion "myconv" already exists
  --
! -- create a default conversion --- has to not match any built-in default,
! -- so we use a dummy function from regress.c
  --
! CREATE DEFAULT CONVERSION public.mydef FOR 'LATIN1' TO 'BIG5' FROM iso8859_1_to_big5;
  --
! -- cannot make two default conversions for same encodings
  --
! CREATE DEFAULT CONVERSION mydef2 FOR 'LATIN1' TO 'BIG5' FROM iso8859_1_to_big5;
! ERROR:  default conversion for LATIN1 to BIG5 already exists
! --
! -- verify that we notice if wrong conversion function is specified
! --
! CREATE CONVERSION bogus FOR 'BIG5' TO 'UTF8' FROM iso8859_1_to_utf8;
! ERROR:  expected source encoding "LATIN1", but got "BIG5"
  -- test comments
  COMMENT ON CONVERSION myconv_bad IS 'foo';
  ERROR:  conversion "myconv_bad" does not exist
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 616e674..2618f69 100644
*** a/src/test/regress/expected/opr_sanity.out
--- b/src/test/regress/expected/opr_sanity.out
*************** WHERE p.oid = c.conproc AND
*** 853,859 ****
  -- there is no way to ask convert() to invoke them, and we cannot call
  -- them directly from SQL.  But there are no non-default built-in
  -- conversions anyway.
- -- (Similarly, this doesn't cope with any search path issues.)
  SELECT p1.oid, p1.conname
  FROM pg_conversion as p1
  WHERE condefault AND
--- 853,858 ----
diff --git a/src/test/regress/input/create_function_1.source b/src/test/regress/input/create_function_1.source
index f2b1561..6ed53a5 100644
*** a/src/test/regress/input/create_function_1.source
--- b/src/test/regress/input/create_function_1.source
*************** CREATE FUNCTION test_atomic_ops()
*** 62,67 ****
--- 62,72 ----
      AS '@libdir@/regress@DLSUFFIX@'
      LANGUAGE C;

+ CREATE FUNCTION iso8859_1_to_big5(int4, int4, cstring, internal, int4)
+     RETURNS void
+     AS '@libdir@/regress@DLSUFFIX@'
+     LANGUAGE C STRICT;
+
  -- Things that shouldn't work:

  CREATE FUNCTION test1 (int) RETURNS int LANGUAGE SQL
diff --git a/src/test/regress/output/create_function_1.source b/src/test/regress/output/create_function_1.source
index 30c2936..603499c 100644
*** a/src/test/regress/output/create_function_1.source
--- b/src/test/regress/output/create_function_1.source
*************** CREATE FUNCTION test_atomic_ops()
*** 55,60 ****
--- 55,64 ----
      RETURNS bool
      AS '@libdir@/regress@DLSUFFIX@'
      LANGUAGE C;
+ CREATE FUNCTION iso8859_1_to_big5(int4, int4, cstring, internal, int4)
+     RETURNS void
+     AS '@libdir@/regress@DLSUFFIX@'
+     LANGUAGE C STRICT;
  -- Things that shouldn't work:
  CREATE FUNCTION test1 (int) RETURNS int LANGUAGE SQL
      AS 'SELECT ''not an integer'';';
diff --git a/src/test/regress/regress.c b/src/test/regress/regress.c
index bb30629..294cdc4 100644
*** a/src/test/regress/regress.c
--- b/src/test/regress/regress.c
***************
*** 29,34 ****
--- 29,35 ----
  #include "commands/trigger.h"
  #include "executor/executor.h"
  #include "executor/spi.h"
+ #include "mb/pg_wchar.h"
  #include "miscadmin.h"
  #include "port/atomics.h"
  #include "utils/builtins.h"
*************** test_atomic_ops(PG_FUNCTION_ARGS)
*** 1120,1122 ****
--- 1121,1153 ----

      PG_RETURN_BOOL(true);
  }
+
+
+ PG_FUNCTION_INFO_V1(iso8859_1_to_big5);
+ Datum
+ iso8859_1_to_big5(PG_FUNCTION_ARGS)
+ {
+     unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
+     unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
+     int            len = PG_GETARG_INT32(4);
+     unsigned short c;
+
+     CHECK_ENCODING_CONVERSION_ARGS(PG_LATIN1, PG_BIG5);
+
+     /*
+      * Since this is a dummy implementation, just throw error for non-ASCII
+      * characters.
+      */
+     while (len > 0)
+     {
+         c = *src;
+         if (c == 0 || IS_HIGHBIT_SET(c))
+             report_invalid_encoding(PG_LATIN1, (const char *) src, len);
+         *dest++ = c;
+         src++;
+         len--;
+     }
+     *dest = '\0';
+
+     PG_RETURN_VOID();
+ }
diff --git a/src/test/regress/sql/conversion.sql b/src/test/regress/sql/conversion.sql
index e31876b..68fb6e2 100644
*** a/src/test/regress/sql/conversion.sql
--- b/src/test/regress/sql/conversion.sql
***************
*** 3,21 ****
  --
  CREATE USER conversion_test_user WITH NOCREATEDB NOCREATEROLE;
  SET SESSION AUTHORIZATION conversion_test_user;
  CREATE CONVERSION myconv FOR 'LATIN1' TO 'UTF8' FROM iso8859_1_to_utf8;
  --
  -- cannot make same name conversion in same schema
  --
  CREATE CONVERSION myconv FOR 'LATIN1' TO 'UTF8' FROM iso8859_1_to_utf8;
  --
! -- create default conversion with qualified name
  --
! CREATE DEFAULT CONVERSION public.mydef FOR 'LATIN1' TO 'UTF8' FROM iso8859_1_to_utf8;
  --
! -- cannot make default conversion with same schema/for_encoding/to_encoding
  --
! CREATE DEFAULT CONVERSION public.mydef2 FOR 'LATIN1' TO 'UTF8' FROM iso8859_1_to_utf8;
  -- test comments
  COMMENT ON CONVERSION myconv_bad IS 'foo';
  COMMENT ON CONVERSION myconv IS 'bar';
--- 3,28 ----
  --
  CREATE USER conversion_test_user WITH NOCREATEDB NOCREATEROLE;
  SET SESSION AUTHORIZATION conversion_test_user;
+
  CREATE CONVERSION myconv FOR 'LATIN1' TO 'UTF8' FROM iso8859_1_to_utf8;
  --
  -- cannot make same name conversion in same schema
  --
  CREATE CONVERSION myconv FOR 'LATIN1' TO 'UTF8' FROM iso8859_1_to_utf8;
  --
! -- create a default conversion --- has to not match any built-in default,
! -- so we use a dummy function from regress.c
  --
! CREATE DEFAULT CONVERSION public.mydef FOR 'LATIN1' TO 'BIG5' FROM iso8859_1_to_big5;
  --
! -- cannot make two default conversions for same encodings
  --
! CREATE DEFAULT CONVERSION mydef2 FOR 'LATIN1' TO 'BIG5' FROM iso8859_1_to_big5;
! --
! -- verify that we notice if wrong conversion function is specified
! --
! CREATE CONVERSION bogus FOR 'BIG5' TO 'UTF8' FROM iso8859_1_to_utf8;
!
  -- test comments
  COMMENT ON CONVERSION myconv_bad IS 'foo';
  COMMENT ON CONVERSION myconv IS 'bar';
diff --git a/src/test/regress/sql/opr_sanity.sql b/src/test/regress/sql/opr_sanity.sql
index d6aa2e8..6216083 100644
*** a/src/test/regress/sql/opr_sanity.sql
--- b/src/test/regress/sql/opr_sanity.sql
*************** WHERE p.oid = c.conproc AND
*** 488,494 ****
  -- there is no way to ask convert() to invoke them, and we cannot call
  -- them directly from SQL.  But there are no non-default built-in
  -- conversions anyway.
- -- (Similarly, this doesn't cope with any search path issues.)

  SELECT p1.oid, p1.conname
  FROM pg_conversion as p1
--- 488,493 ----

Re: pg_conversion seems rather strangely defined

From

Noah Misch

Date:

07 January 2016, 03:56:58

On Tue, Jan 05, 2016 at 01:46:51PM -0500, Tom Lane wrote:
> I do not see a lot of point in the namespacing of encoding conversions
> either.  Does anyone really need or use search-path-dependent lookup of
> conversions?

I have not issued CREATE CONVERSION except to experiment, and I have never
worked in a database in which someone else had created one.  Among PGXN
distributions, CREATE CONVERSION appears only in the pyrseas test suite.  It
could be hard to track down testimony on real-world usage patterns, but I
envision two credible patterns.  First, you could change the default search
path to "corrected_conversions, pg_catalog, $user, public" and inject fixed
versions of the system conversions.  One could use that to backport commit
8d3e090.  Second, you could add conversions we omit entirely, like UTF8 ->
MULE_INTERNAL.  Dropping search-path-dependent lookup would remove the
supported way to fix system conversions.

> (If they do, it's probably broken anyway, since for example
> we do not trouble to re-identify the client encoding conversion functions
> when search_path changes.)

That's bad in principle, but I'll guess it's tolerable in practice.  Switching
among implementations of a particular conversion might happen with O(weeks) or
longer period, like updating your system's iconv() conversion tables.  I can't
easily envision one application switching between implementations over the
course of a session.  (An application doing that today probably works around
the problem, perhaps with extra "SET client_encoding" calls.)

Re: pg_conversion seems rather strangely defined

From

Tom Lane

Date:

07 January 2016, 04:56:23

Noah Misch <noah@leadboat.com> writes:
> On Tue, Jan 05, 2016 at 01:46:51PM -0500, Tom Lane wrote:
>> I do not see a lot of point in the namespacing of encoding conversions
>> either.  Does anyone really need or use search-path-dependent lookup of
>> conversions?

> I have not issued CREATE CONVERSION except to experiment, and I have never
> worked in a database in which someone else had created one.  Among PGXN
> distributions, CREATE CONVERSION appears only in the pyrseas test suite.  It
> could be hard to track down testimony on real-world usage patterns, but I
> envision two credible patterns.  First, you could change the default search
> path to "corrected_conversions, pg_catalog, $user, public" and inject fixed
> versions of the system conversions.  One could use that to backport commit
> 8d3e090.  Second, you could add conversions we omit entirely, like UTF8 ->
> MULE_INTERNAL.  Dropping search-path-dependent lookup would remove the
> supported way to fix system conversions.

The built-in conversions are very intentionally not pinned.  So to my
mind, the supported way to replace one is to drop it and create your own.
The way you describe works only if an appropriate search path is installed
at the time a new session activates its client encoding conversions.  TBH,
I have no idea whether we apply (for example) "ALTER ROLE SET search_path"
before that happens; but even if we do, there's no real guarantee that it
wouldn't change.  We publish no documentation about the order of startup
actions.  Moreover past attempts at defining dependencies between GUC
settings have been spectacular failures, so I really don't want to go
there in this context.

It would only be important to be able to do it like that if different
users of the same database had conflicting ideas about what was the
correct conversion between client and database encodings.  I submit
that that's somewhere around epsilon probability, considering we have
not even heard of anyone replacing the system conversions at all.

(Adding conversions we don't supply is, of course, orthogonal to this.)

Moreover, we have precedent both for this approach being a bad idea
and for us changing it without many complaints.  We used to have
search-path-dependent lookup of default index operator classes.
We found out that that was a bad idea and got rid of it, cf commit
3ac1ac58c.  This situation looks much the same to me.
        regards, tom lane

Re: pg_conversion seems rather strangely defined

From

Noah Misch

Date:

08 January 2016, 01:28:14

On Wed, Jan 06, 2016 at 11:56:14PM -0500, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > On Tue, Jan 05, 2016 at 01:46:51PM -0500, Tom Lane wrote:
> >> I do not see a lot of point in the namespacing of encoding conversions
> >> either.  Does anyone really need or use search-path-dependent lookup of
> >> conversions?
> 
> > I have not issued CREATE CONVERSION except to experiment, and I have never
> > worked in a database in which someone else had created one.  Among PGXN
> > distributions, CREATE CONVERSION appears only in the pyrseas test suite.  It
> > could be hard to track down testimony on real-world usage patterns, but I
> > envision two credible patterns.  First, you could change the default search
> > path to "corrected_conversions, pg_catalog, $user, public" and inject fixed
> > versions of the system conversions.  One could use that to backport commit
> > 8d3e090.  Second, you could add conversions we omit entirely, like UTF8 ->
> > MULE_INTERNAL.  Dropping search-path-dependent lookup would remove the
> > supported way to fix system conversions.
> 
> The built-in conversions are very intentionally not pinned.  So to my
> mind, the supported way to replace one is to drop it and create your own.

I just learned something.  Interesting.

> The way you describe works only if an appropriate search path is installed
> at the time a new session activates its client encoding conversions.  TBH,
> I have no idea whether we apply (for example) "ALTER ROLE SET search_path"
> before that happens; but even if we do, there's no real guarantee that it
> wouldn't change.  We publish no documentation about the order of startup
> actions.  Moreover past attempts at defining dependencies between GUC
> settings have been spectacular failures, so I really don't want to go
> there in this context.
> 
> It would only be important to be able to do it like that if different
> users of the same database had conflicting ideas about what was the
> correct conversion between client and database encodings.  I submit
> that that's somewhere around epsilon probability, considering we have
> not even heard of anyone replacing the system conversions at all.
> 
> (Adding conversions we don't supply is, of course, orthogonal to this.)

Agreed on all those points.  Even so, I don't find that the need to set GUCs
in a particular order makes a good case for removing this ancient capability.
I _would_ send a new feature back for redesign on the strength of such a
defect, but that is different.

Independent from that dearth of positive cause to restrict this, users taking
the "DROP CONVERSION pg_catalog.foo" route get dump/restore problems.  pg_dump
doesn't notice that one dropped a pg_catalog conversion; the user would
manually repeat each drop before each restore.  That's especially awkward for
pg_upgrade.  I guess the user could drop each conversion in the new cluster's
template0, run pg_upgrade, and then recreate conversions in databases that had
not overridden them in the original cluster.  That's a mess.

> Moreover, we have precedent both for this approach being a bad idea
> and for us changing it without many complaints.  We used to have
> search-path-dependent lookup of default index operator classes.
> We found out that that was a bad idea and got rid of it, cf commit
> 3ac1ac58c.  This situation looks much the same to me.

Per the 3ac1ac58c log message, "CREATE OPERATOR CLASS only allows one default
opclass per datatype regardless of schemas."  That had been true since day one
for CREATE OPERATOR CLASS.  It doesn't hold for conversions today, and that's
a crucial difference between that commit and this proposal.

Thanks,
nm

Re: pg_conversion seems rather strangely defined

From

Tom Lane

Date:

08 January 2016, 01:54:24

Noah Misch <noah@leadboat.com> writes:
> On Wed, Jan 06, 2016 at 11:56:14PM -0500, Tom Lane wrote:
>> Moreover, we have precedent both for this approach being a bad idea
>> and for us changing it without many complaints.  We used to have
>> search-path-dependent lookup of default index operator classes.
>> We found out that that was a bad idea and got rid of it, cf commit
>> 3ac1ac58c.  This situation looks much the same to me.

> Per the 3ac1ac58c log message, "CREATE OPERATOR CLASS only allows one default
> opclass per datatype regardless of schemas."  That had been true since day one
> for CREATE OPERATOR CLASS.  It doesn't hold for conversions today, and that's
> a crucial difference between that commit and this proposal.

Well, the state of affairs is slightly different, but that doesn't mean
it's not equally broken.  What I took away from the default-opclass fiasco
is that search-path-based lookup is a good idea only when you are looking
up something *by name*, so that the user can resolve any ambiguity by
schema-qualifying that name.  Searches that aren't based on a
user-specified name shouldn't depend on search_path.  This is important
because there are multiple conflicting concerns when you choose a
search_path setting, and you may not want to allow any particular search
to force your hand on what you put where in your search path.  If, for
example, you don't really want to put "public" on the front of your search
path, the current code gives you no way to select a conversion that's in
"public".

If we need to cater for alternative conversions, my preference would be a
GUC or some other kind of setting that allows selecting a client I/O
conversion by name, whether it is "default" or not.  But given the lack
of apparent demand, I don't really want to design and implement such a
feature right now.
        regards, tom lane

Re: pg_conversion seems rather strangely defined

From

Tatsuo Ishii

Date:

08 January 2016, 03:57:52

> It would only be important to be able to do it like that if different
> users of the same database had conflicting ideas about what was the
> correct conversion between client and database encodings.  I submit
> that that's somewhere around epsilon probability, considering we have
> not even heard of anyone replacing the system conversions at all.

I used to had a customer who needs to have different client and
database encoding than the default.  That is, a slightly different
mapping between Shift-JIS and other database encoding.  Due to
unfortunate historical reasons, there are several Shift-JIS variants
(in addition to the standard defined by government, there are IBM, NEC
and Microsoft versions).  This is the reason why I wanted to have the
functionality at that time.  I'm not sure the customer still uses the
functionality, but it is possible that there are other users who have
similar use cases, since the Shift-JIS variants are still used.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

Re: pg_conversion seems rather strangely defined

From

Tom Lane

Date:

08 January 2016, 04:35:31

Tatsuo Ishii <ishii@postgresql.org> writes:
>> It would only be important to be able to do it like that if different
>> users of the same database had conflicting ideas about what was the
>> correct conversion between client and database encodings.  I submit
>> that that's somewhere around epsilon probability, considering we have
>> not even heard of anyone replacing the system conversions at all.

> I used to had a customer who needs to have different client and
> database encoding than the default.  That is, a slightly different
> mapping between Shift-JIS and other database encoding.  Due to
> unfortunate historical reasons, there are several Shift-JIS variants
> (in addition to the standard defined by government, there are IBM, NEC
> and Microsoft versions).  This is the reason why I wanted to have the
> functionality at that time.  I'm not sure the customer still uses the
> functionality, but it is possible that there are other users who have
> similar use cases, since the Shift-JIS variants are still used.

Hm.  Odd that we've not heard complaints about the removal of
CONVERT(... USING ...), then.

I think it would be a good idea at least to put back some equivalent
of CONVERT(... USING ...), if for no other reason than that it would
ease testing.  As I understood it, the argument to remove it was not
that the functionality was bad, but that we were using a SQL-standard
syntax for what we concluded was not SQL-standard functionality.
I'd propose putting it back with a syntax of, say,
convert_with(input bytea, conversion_name text) returns bytea

As for the client encoding conversion case, I still say a
search-path-based lookup is a horrible idea, and furthermore there
seems no very good reason why it has to be restricted to default
conversions.  Aside from other arguments, that tends to push people
to mark *every* conversion as default, which is outright silly if
you have several competing ones.

As a sketch of an alternative, consider inventing a GUC named
preferred_conversions or some such, which is a list of
possibly-schema-qualified conversion names.  When establishing an
original or new value of client_encoding, we look through the list
for the first entry that exists and performs the desired encoding
conversion (whether or not it is default).  If there is no match,
look up the (unique) default conversion for the encoding pair, and
use that.  (Obviously this has to be done twice, once for each
direction, when setting up client encoding conversions.)  We could
apply the same rules for identifying which specific conversion to use
in convert() and siblings.
        regards, tom lane

Re: pg_conversion seems rather strangely defined

From

Tatsuo Ishii

Date:

12 January 2016, 02:27:01

>> I used to had a customer who needs to have different client and
>> database encoding than the default.  That is, a slightly different
>> mapping between Shift-JIS and other database encoding.  Due to
>> unfortunate historical reasons, there are several Shift-JIS variants
>> (in addition to the standard defined by government, there are IBM, NEC
>> and Microsoft versions).  This is the reason why I wanted to have the
>> functionality at that time.  I'm not sure the customer still uses the
>> functionality, but it is possible that there are other users who have
>> similar use cases, since the Shift-JIS variants are still used.
> 
> Hm.  Odd that we've not heard complaints about the removal of
> CONVERT(... USING ...), then.

Yeah, I'm not sure the customer updated to the newer version of
PostgreSQL.

> I think it would be a good idea at least to put back some equivalent
> of CONVERT(... USING ...), if for no other reason than that it would
> ease testing.  As I understood it, the argument to remove it was not
> that the functionality was bad, but that we were using a SQL-standard
> syntax for what we concluded was not SQL-standard functionality.
> I'd propose putting it back with a syntax of, say,
> 
>     convert_with(input bytea, conversion_name text) returns bytea
> 
> As for the client encoding conversion case, I still say a
> search-path-based lookup is a horrible idea, and furthermore there
> seems no very good reason why it has to be restricted to default
> conversions.  Aside from other arguments, that tends to push people
> to mark *every* conversion as default, which is outright silly if
> you have several competing ones.
> 
> As a sketch of an alternative, consider inventing a GUC named
> preferred_conversions or some such, which is a list of
> possibly-schema-qualified conversion names.  When establishing an
> original or new value of client_encoding, we look through the list
> for the first entry that exists and performs the desired encoding
> conversion (whether or not it is default).  If there is no match,
> look up the (unique) default conversion for the encoding pair, and
> use that.  (Obviously this has to be done twice, once for each
> direction, when setting up client encoding conversions.)  We could
> apply the same rules for identifying which specific conversion to use
> in convert() and siblings.

Sounds good to me.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp