Re: Unicode UTF-8 table formatting for psql text output - Mailing list pgsql-hackers

From Roger Leigh
Subject Re: Unicode UTF-8 table formatting for psql text output
Date
Msg-id 20091114174023.GB23762@codelibre.net
In response to Re: Unicode UTF-8 table formatting for psql text output  (Bruce Momjian <bruce@momjian.us>)
Responses Re: Unicode UTF-8 table formatting for psql text output
List pgsql-hackers
On Mon, Nov 09, 2009 at 05:40:54PM -0500, Bruce Momjian wrote:
> Tom Lane wrote:
> > Greg Stark <gsstark@mit.edu> writes:
> > > While i agree this looks nicer I wonder what it does to things like
> > > excel/gnumeric/ooffice auto-recognizing table layouts and importing
> > > files. I'm not sure our old format was so great for this so maybe this
> > > is actually an improvement I'm asking for.
> >
> > Yeah.  We can do what we like with the UTF8 format but I'm considerably
> > more worried about the aspect of making random changes to the
> > plain-ASCII output.  On the other hand, we changed that just a release
> > or so ago (to put in the multiline output in the first place) and
> > I didn't hear complaints about it that time.
>
> Sorry for the delayed reply:
>
> The line continuation characters were chosen in 8.4 for clarity --- if
> you have found something clearer for 8.5, let's make the improvement.  I
> think clarity is more important in this area than consistency with the
> previous psql output format.

The attached patch is proposed for the upcoming commitfest, and
hopefully takes into account the comments made in this thread.
To summarise the changes:

- it makes the handling of newlines and wrapped lines consistent
  between column header and data lines.
- it includes additional logic such that both the "old" and "new"
  styles are representable using the format structures, so we
  can retain backward compatibility if so desired (it's easy to
  remove if not).
- an "ascii-old" linestyle is added which is identical to the old
  style for those who need guaranteed reproducibility of output,
  but this is not the default.
- The Unicode format uses "↵" in the right-hand margin to indicate
  newlines.  Wrapped lines use "…" in the right-hand margin before,
  and left-hand margin after, a break (so you can visually follow
  the wrap).
- The ASCII format is the same but uses "+" in place of the
  carriage return symbol and "." in place of the ellipsis.
- All the above is documented in the SGML documentation, including
  the old style, which I always found confusing.
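
To make the margin rules in the list above concrete, here is a rough
sketch of how one format structure per linestyle could drive the
rendering of a single cell. This is not the code in the patch: the
struct LineStyleSketch, its field names, and render_cell() are
hypothetical names invented for illustration, and the sketch assumes
single-byte cell text. Only the new "ascii" and "unicode" styles are
modelled, since the symbols of the old style are not spelled out here.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical structure for illustration; not the one in print.c. */
typedef struct
{
    const char *name;
    const char *nl_right;   /* right margin: the line ends at a newline */
    const char *wrap_right; /* right margin: the line is wrapped here */
    const char *wrap_left;  /* left margin: continuation of a wrapped line */
} LineStyleSketch;

static const LineStyleSketch sketch_ascii = {"ascii", "+", ".", "."};

/* UTF-8 encodings of U+21B5 (carriage return arrow) and U+2026 (ellipsis) */
static const LineStyleSketch sketch_unicode =
    {"unicode", "\xe2\x86\xb5", "\xe2\x80\xa6", "\xe2\x80\xa6"};

/*
 * Render one cell into out, one output line per wrapped or newline-split
 * piece, each as: left margin, text padded to width, right margin.
 * Assumes width > 0, outsz > 0 and single-byte characters in text.
 */
static void
render_cell(const LineStyleSketch *st, const char *text, int width,
            char *out, size_t outsz)
{
    const char *p = text;
    int         wrapped = 0;    /* did the previous line end in a wrap? */
    size_t      used = 0;

    out[0] = '\0';
    while (*p != '\0')
    {
        int         len = 0;
        int         n;
        const char *right;

        /* Take up to width characters, stopping at a newline or the end. */
        while (len < width && p[len] != '\0' && p[len] != '\n')
            len++;

        if (p[len] == '\n')
            right = st->nl_right;       /* explicit newline follows */
        else if (p[len] != '\0')
            right = st->wrap_right;     /* more text remains: wrapped */
        else
            right = " ";                /* last line of the cell */

        n = snprintf(out + used, outsz - used, "%s%-*.*s%s\n",
                     wrapped ? st->wrap_left : " ", width, len, p, right);
        if (n < 0 || (size_t) n >= outsz - used)
        {
            out[used] = '\0';           /* buffer full: drop the partial line */
            break;
        }
        used += (size_t) n;

        wrapped = (p[len] != '\0' && p[len] != '\n');
        p += len;
        if (*p == '\n')
            p++;                        /* consume the newline itself */
    }
}
```

Calling render_cell(&sketch_ascii, "abcdef\ngh", 4, buf, sizeof(buf))
should give three lines: "abcd" with "." on the right, "ef" with "."
on the left and "+" on the right, then "gh" with blank margins.
Swapping in &sketch_unicode changes only the symbols, which is the
point of keeping them in the format structure.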

For comparison, I've included a transcript of all three linestyles
with a test script (also attached).

Any changes to the symbols used and/or their placement are trivially
made by just altering the format in print.c.


Regards,
Roger

--
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.

