Re: Augment every test postgresql.conf - Mailing list pgsql-hackers

From Noah Misch
Subject Re: Augment every test postgresql.conf
Date
Msg-id 20190512015615.GD1124997@rfd.leadboat.com
Whole thread Raw
In response to Re: Augment every test postgresql.conf  (Andrew Dunstan <andrew.dunstan@2ndquadrant.com>)
Responses Re: Augment every test postgresql.conf  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Sun, Apr 07, 2019 at 07:56:02AM -0400, Andrew Dunstan wrote:
> On Sun, Apr 7, 2019 at 2:41 AM Noah Misch <noah@leadboat.com> wrote:
> >
> > On Sun, Dec 30, 2018 at 10:32:31AM -0500, Andrew Dunstan wrote:
> > > On 12/30/18 12:53 AM, Noah Misch wrote:
> > > > 2. stats_temp_directory is incompatible with TAP suites that start more than
> > > >    one node simultaneously.
> >
> > > The obvious quick fix would be to have PostgresNode.pm set this to the
> > > default after inserting the TEMP_CONFIG file.
> >
> > I'd like to get $SUBJECT in place for variables other than
> > stats_temp_directory, using your quick fix idea.  Attached.  When its time
> > comes, your stats_temp_directory work can delete that section.
> 
> Looks good.

Pushed.  This broke 010_dump_connstr.pl on bowerbird, introducing 'invalid
byte sequence for encoding "UTF8"' errors.  That's because log_connections
renders this 010_dump_connstr.pl solution insufficient:

  # In a SQL_ASCII database, pgwin32_message_to_UTF16() needs to
  # interpret everything as UTF8.  We're going to use byte sequences
  # that aren't valid UTF-8 strings, so that would fail.  Use LATIN1,
  # which accepts any byte and has a conversion from each byte to UTF-8.
  $ENV{LC_ALL}           = 'C';
  $ENV{PGCLIENTENCODING} = 'LATIN1';

The log_connections message prints before CheckMyDatabase() calls
pg_perm_setlocale() to activate that LATIN1 database encoding.  Since
bowerbird does a non-NLS build, GetMessageEncoding()==PG_SQL_ASCII at that
time.  Some options:

1. Make this one test explicitly set log_connections = off.  This workaround
   restores what we had a day ago.

2. Move the log_connections message after CheckMyDatabase() calls
   pg_perm_setlocale(), so it gets regular post-startup encoding treatment.
   That fixes this particular test.  It's still wrong when a database's name
   is not valid in that database's encoding.

3. If GetMessageEncoding()==PG_SQL_ASCII, make pgwin32_message_to_UTF16()
   assume the text is already UTF8, like it does when not in a transaction.
   If UTF8->UTF16 conversion fails, the caller will send untranslated bytes to
   write() or ReportEventA().

4. If GetMessageEncoding()==PG_SQL_ASCII, make pgwin32_message_to_UTF16()
   return NULL.  The caller will always send untranslated bytes to write() or
   ReportEventA().  This seems consistent with the SQL_ASCII concept and with
   pg_do_encoding_conversion()'s interpretation of SQL_ASCII.

5. When including a datname or rolname value in a message, hex-escape
   non-ASCII bytes.  They are byte sequences, not text of known encoding.
   This preserves the most information, but it's overkill and ugly in the
   probably-common case of one encoding across all databases of a cluster.

I'm inclined to do (1) in back branches and (4) in HEAD only.  (If starting
fresh today, I would store the encoding of each rolname and dbname or just use
UTF8 for those particular fields.)  Other preferences?

Thanks,
nm



pgsql-hackers by date:

Previous
From: Rikard Falkeborn
Date:
Subject: Wrong dead return value in jsonb_utils.c
Next
From: Tom Lane
Date:
Subject: Re: Augment every test postgresql.conf