Re: Unicode UTF-8 table formatting for psql text output - Mailing list pgsql-hackers

From Roger Leigh
Subject Re: Unicode UTF-8 table formatting for psql text output
Date
Msg-id 20091025234826.GA6255@codelibre.net
In response to Re: Unicode UTF-8 table formatting for psql text output  (Roger Leigh <rleigh@codelibre.net>)
Responses Re: Unicode UTF-8 table formatting for psql text output  (Roger Leigh <rleigh@codelibre.net>)
Re: Unicode UTF-8 table formatting for psql text output  (Peter Eisentraut <peter_e@gmx.net>)
Re: Unicode UTF-8 table formatting for psql text output  (Greg Stark <gsstark@mit.edu>)
List pgsql-hackers
On Sat, Oct 24, 2009 at 06:23:24PM +0100, Roger Leigh wrote:
> On Fri, Oct 16, 2009 at 01:38:15PM +0300, Peter Eisentraut wrote:
> > I like the new Unicode tables, but the marking of continuation lines
> > looks pretty horrible:
> >
> >                              List of databases
> >      Name      │ Owner │ Encoding │ Collation │ Ctype │ Access privileges
> > ───────────────┼───────┼──────────┼───────────┼───────┼───────────────────
> >  pl_regression │ peter │ LATIN2   │ cs_CZ     │ cs_CZ │
> >  postgres      │ peter │ LATIN2   │ cs_CZ     │ cs_CZ │
> >  template0     │ peter │ LATIN2   │ cs_CZ     │ cs_CZ │ =c/peter
> >                ╷       ╷          ╷           ╷       ╎ peter=CTc/peter
> >  template1     │ peter │ LATIN2   │ cs_CZ     │ cs_CZ │ =c/peter
> >                ╷       ╷          ╷           ╷       ╎ peter=CTc/peter
> > (4 rows)

Just for reference, this is what the output looks like (abridged)
using the attached patch.  It should display fine if your mail client
handles UTF-8 messages correctly:

rleigh=# \l
                                     List of databases
      Name       │  Owner   │ Encoding │  Collation  │    Ctype    │   Access privileges
─────────────────┼──────────┼──────────┼─────────────┼─────────────┼───────────────────────
 merkelpb        │ rleigh   │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
 postgres        │ postgres │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
 projectb        │ rleigh   │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
 rleigh          │ rleigh   │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
[…]
 template0       │ postgres │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │ =c/postgres          ↵
                 │          │          │             │             │ postgres=CTc/postgres
 template1       │ postgres │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │ =c/postgres          ↵
                 │          │          │             │             │ postgres=CTc/postgres
[…]
(17 rows)

> > This looks more like a rendering mistake than something sensible, and
> > also it doesn't actually help the viewer to tell what lines are
> > continued, which was the original purpose.
>
> I've worked on a solution to this, and the preliminary patch for this
> is attached.  Note there are additional comments in the patch text.

I've attached an updated version of the patch.  It now:
- adds one extra column of padding in wrapped mode when border=0, so
  that wrap marks display correctly on the rightmost column.  This
  removes the special cases previously needed to suppress marks on
  the rightmost column.
- updates the Unicode symbols to use a more commonly available symbol
  (ellipsis) rather than hooked arrows.  This also makes wrapping more
  visually distinct from newlines.  The CR and ellipsis symbols are
  available in several character sets other than Unicode and in many
  common fonts, including all the Windows monospaced terminal fonts.
  Symbol availability isn't an issue on GNU/Linux.
- updates the ASCII symbols to resemble the Unicode ones.
- adds some more comments to the code.
- removes redundant enum values.
I've also attached a trivial test script to run through psql to test
the wrapping, as well as the output I get for psql from 8.4.1 and
the current mainline (both Unicode and ASCII) for comparison.
Hopefully I'm not the only one that finds the proposed change clearer
and more logical than the existing display!


There's just one tiny display glitch I can see: with mixed wrapping
and newlines, the left-hand wrap mark is missing when the last wrapped
line of a cell ends in a newline.  It might not be possible to detect
that case from the line state available when printing the line.
Fixing it might require storing the wrap state so that we know what
happened on the previous line, though a cleverer way of detecting that
we're already wrapped would be better.
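
A minimal sketch of the "store the wrap state" idea, again in Python
and not taken from the patch.  The function split_cell and its
return shape are made up for illustration.  It remembers whether the
previous physical line ended in a wrap, so a continuation line keeps
its left-hand mark even when it ends in a newline.

```python
def split_cell(text, width):
    """Split cell text into (chunk, left_mark, right_mark) triples.

    Marks are None, "wrap" or "newline".  Width is counted in code
    points, which is a simplification of real display width.
    """
    lines = []
    wrapped_prev = False  # did the previous physical line end in a wrap?
    logical = text.split("\n")
    for i, para in enumerate(logical):
        # Hard-wrap each logical line; an empty line still yields one chunk.
        chunks = [para[j:j + width] for j in range(0, len(para), width)] or [""]
        for k, chunk in enumerate(chunks):
            more_chunks = k < len(chunks) - 1
            more_lines = i < len(logical) - 1
            if more_chunks:
                right = "wrap"
            elif more_lines:
                right = "newline"
            else:
                right = None
            # The left mark depends only on the stored state, not on how
            # the current line happens to end.
            left = "wrap" if wrapped_prev else None
            lines.append((chunk, left, right))
            wrapped_prev = more_chunks
    return lines
```

For the text "abcdef\ngh" at width 4, the second line "ef" is both a
wrap continuation and newline-terminated, which is the case described
above.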


Regards,
Roger

--
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.
