Thread: Unicode UTF-8 table formatting for psql text output

Unicode UTF-8 table formatting for psql text output

From
Roger Leigh
Date:
psql currently uses the ASCII characters '-' '|' and '+' to render
tables.  However, most modern terminal emulators are capable of
displaying characters other than ASCII, including box drawing
characters which can be used to create rather more pleasing and
readable tables than ASCII punctuation can achieve.

The following set of patches firstly teach psql how to draw nice
tables, by abstracting the characters used for formatting tables into
a simple structure.  The table formatting can then be passed to
functions needing to know how to draw tables, and they can then use
the provided symbols rather than hard-coding ASCII punctuation.
Tables for ASCII and Unicode UTF-8 are provided.

The conversion of print_aligned_text is a very straightforward
substitution.  print_aligned_vertical is slightly more complex because
its use of string offsets to overwrite the record number assume
"1 byte = 1 character", which is no longer valid for multibyte
encodings such as UTF-8.  This has been refactored to split out the
line drawing out into an inline helper function
(print_aligned_vertical_line) which actually makes the code more like
print_aligned_text with its _print_horizontal_line helper, and
contains no assumptions about indexes into character arrays.

Examples of this in action are shown at the end of this mail.

Portability: UTF-8 strings are encoded as standard C octal escapes,
and so are portable.  I added UTF-8 comments to show what the numbers
encode, which can be removed if needed.  The code depends on
nl_langinfo(3) from <langinfo.h>, but this is #ifdef'd to allow
building on systems without support.  By default, an ASCII table is
used, which will result in identical behaviour to the current psql.
However, if nl_langinfo is available, and it reports that the locale
codeset is UTF-8, it will switch to using Unicode UTF-8 box-drawing
characters, which draw identical tables to the current psql, just with
different characters.

Extensibility: The table formatting can potentially be used to support
other character sets containing other box drawing characters, for
example IBM CP 437 or 850.  However, I have just stuck with UTF-8 for
now!

To follow:
[PATCH 1/6] psql: Abstract table formatting characters used for different line types.
[PATCH 2/6] psql: Add table formats for ASCII and UTF-8
[PATCH 3/6] psql: Create table format
[PATCH 4/6] psql: Pass table formatting object to text output functions
[PATCH 5/6] psql: print_aligned_text uses table formatting
[PATCH 6/6] psql: print_aligned_vertical uses table formatting

In the examples below, I think there's just one minor issue, which is
a leading '-' with border=0 and expanded=1, which I just noticed while
sending this mail.  I'll tidy that up and send another patch.

This is something I really think makes psql more readable and more
usable, which I've been working on over the last couple of nights, and
so here it is for your comments and criticism.  I hope you find it
useful!


Regards,
Roger


Examples:

rleigh=# \pset border 0
Border style is 0.     
rleigh=# SELECT * FROM package_priorities;
id   name
── ─────────1 extra2 important3 optional4 required5 standard
(5 rows)

rleigh=# \pset border 1
Border style is 1.
rleigh=# SELECT * FROM package_priorities;id │   name
────┼─────────── 1 │ extra 2 │ important 3 │ optional 4 │ required 5 │ standard
(5 rows)

rleigh=# \pset border 2
Border style is 2.
rleigh=# SELECT * FROM package_priorities;
┌────┬───────────┐
│ id │   name    │
├────┼───────────┤
│  1 │ extra     │
│  2 │ important │
│  3 │ optional  │
│  4 │ required  │
│  5 │ standard  │
└────┴───────────┘
(5 rows)

rleigh=# \pset border 0
Border style is 0.     
rleigh=# \pset expanded 1
Expanded display is on.
rleigh=# SELECT * FROM package_priorities;
─* Record 1                               
id   1                                    
name extra                                
─* Record 2                               
id   2                                    
name important                            
─* Record 3                               
id   3                                    
name optional                             
─* Record 4                               
id   4                                    
name required                             
─* Record 5                               
id   5                                    
name standard                             

[ this might need a tiny tweak to remove the leading - ]

rleigh=# \pset border 1
Border style is 1.     
rleigh=# SELECT * FROM package_priorities;
─[ RECORD 1 ]───
id   │ 1
name │ extra
─[ RECORD 2 ]───
id   │ 2
name │ important
─[ RECORD 3 ]───
id   │ 3
name │ optional
─[ RECORD 4 ]───
id   │ 4
name │ required
─[ RECORD 5 ]───
id   │ 5
name │ standard

rleigh=# \pset border 2
Border style is 2.
rleigh=# SELECT * FROM package_priorities;
┌─[ RECORD 1 ]─────┐
│ id   │ 1         │
│ name │ extra     │
├─[ RECORD 2 ]─────┤
│ id   │ 2         │
│ name │ important │
├─[ RECORD 3 ]─────┤
│ id   │ 3         │
│ name │ optional  │
├─[ RECORD 4 ]─────┤
│ id   │ 4         │
│ name │ required  │
├─[ RECORD 5 ]─────┤
│ id   │ 5         │
│ name │ standard  │
└──────┴───────────┘

rleigh=# \ds                  List of relationsSchema │           Name            │   Type   │ Owner
────────┼───────────────────────────┼──────────┼────────public │ architectures_id_seq      │ sequence │ rleighpublic │
binaries_id_seq          │ sequence │ rleighpublic │ components_id_seq         │ sequence │ rleighpublic │
distributions_id_seq     │ sequence │ rleighpublic │ package_priorities_id_seq │ sequence │ rleighpublic │
package_sections_id_seq  │ sequence │ rleighpublic │ sections_id_seq           │ sequence │ rleighpublic │
states_id_seq            │ sequence │ rleigh
 
(8 rows)

--  .''`.  Roger Leigh: :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/`. `'   Printing on
GNU/Linux?      http://gutenprint.sourceforge.net/  `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.
 


[PATCH 7/7] psql: Don't print leading - with expanded=1 and border=0

From
Roger Leigh
Date:
Signed-off-by: Roger Leigh <rleigh@debian.org>
---src/bin/psql/print.c |    2 +-1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/bin/psql/print.c b/src/bin/psql/print.c
index e4e9f01..be81adc 100644
--- a/src/bin/psql/print.c
+++ b/src/bin/psql/print.c
@@ -1014,7 +1014,7 @@ print_aligned_vertical_line(const printTableContent *cont,        fputs(lformat->leftvrule,
fout);       fputs(format->hrule, fout);    }
 
-    else
+    else if (opt_border == 1)        fputs(format->hrule, fout);    if (record)
-- 
1.6.3.3



Unicode UTF-8 table formatting for psql text output

From
Roger Leigh
Date:
Small followup patch to correct expanded=1 and border=0 output
as mentioned in previous email.


Regards,
Roger

--  .''`.  Roger Leigh: :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/`. `'   Printing on
GNU/Linux?      http://gutenprint.sourceforge.net/  `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.
 


[PATCH 9/9] psql: print_aligned_vertical: Correct indentation

From
Roger Leigh
Date:
Signed-off-by: Roger Leigh <rleigh@debian.org>
---src/bin/psql/print.c |    8 ++++----1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/bin/psql/print.c b/src/bin/psql/print.c
index c6394ad..10faeb3 100644
--- a/src/bin/psql/print.c
+++ b/src/bin/psql/print.c
@@ -1154,10 +1154,10 @@ print_aligned_vertical(const printTableContent *cont,    /* print records */    for (i = 0, ptr
=cont->cells; *ptr; i++, ptr++)    {
 
-      int    line_count,
-        dcomplete,
-        hcomplete;
-        printTextRule pos = PRINT_RULE_MIDDLE;
+        int        line_count,
+                    dcomplete,
+                hcomplete;
+        printTextRule    pos = PRINT_RULE_MIDDLE;        if (i == 0)              pos = PRINT_RULE_TOP;        else if
(!(*(ptr+1)))
-- 
1.6.3.3



[PATCH 8/9] psql: print_aligned_vertical_line: Correct alignment

From
Roger Leigh
Date:
Correct a corner case where the middle column separator overlaps
the right edge of the record number.

Signed-off-by: Roger Leigh <rleigh@debian.org>
---src/bin/psql/print.c |    8 ++++----1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/bin/psql/print.c b/src/bin/psql/print.c
index be81adc..c6394ad 100644
--- a/src/bin/psql/print.c
+++ b/src/bin/psql/print.c
@@ -1034,16 +1034,16 @@ print_aligned_vertical_line(const printTableContent *cont,    if (opt_border > 0)    {
-        if (--reclen <= 0)
+        if (reclen-- <= 0)            fputs(format->hrule, fout);
-        if (--reclen <= 0)
+        if (reclen-- <= 0)            fputs(lformat->midvrule, fout);
-        if (--reclen <= 0)
+        if (reclen-- <= 0)            fputs(format->hrule, fout);    }    else    {
-        if (reclen-- > 0)
+        if (reclen-- <= 0)            fputs(" ", fout);    }    if (reclen < 0)
-- 
1.6.3.3



Unicode UTF-8 table formatting for psql text output

From
Roger Leigh
Date:
Further minor cleanups to tweak column alignment in a corner case,
and to correct indentation to match the rest of the code.


Regards,
Roger

--  .''`.  Roger Leigh: :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/`. `'   Printing on
GNU/Linux?      http://gutenprint.sourceforge.net/  `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.
 


Re: Unicode UTF-8 table formatting for psql text output

From
Robert Haas
Date:
On Sat, Aug 22, 2009 at 2:13 PM, Roger Leigh<rleigh@debian.org> wrote:
> Further minor cleanups to tweak column alignment in a corner case,
> and to correct indentation to match the rest of the code.

Please read the guidelines here:
http://wiki.postgresql.org/wiki/Submitting_a_Patch

I don't think it's very helpful to submit a patch intended to do
basically one thing split up over 10 threads.  The PostgreSQL style is
heavyweight commits, and I don't believe that there's any reason to
suppose that someone would want to commit any one of these 9 pieces
without the other 8.  So I'd suggest you resend it as a single patch
and then add it here.

https://commitfest.postgresql.org/action/commitfest_view/open

...Robert


Re: Unicode UTF-8 table formatting for psql text output

From
Roger Leigh
Date:
On Sat, Aug 22, 2009 at 07:42:10PM -0400, Robert Haas wrote:
> On Sat, Aug 22, 2009 at 2:13 PM, Roger Leigh<rleigh@debian.org> wrote:
> > Further minor cleanups to tweak column alignment in a corner case,
> > and to correct indentation to match the rest of the code.
>
> Please read the guidelines here:
> http://wiki.postgresql.org/wiki/Submitting_a_Patch
>
> I don't think it's very helpful to submit a patch intended to do
> basically one thing split up over 10 threads.  The PostgreSQL style is
> heavyweight commits, and I don't believe that there's any reason to
> suppose that someone would want to commit any one of these 9 pieces
> without the other 8.

OK.  I have attached a single patch which combines the nine patches
into one.  I did read the patch page above, but it didn't mention
anything on patch size, so I split it into incremental logical
changes in order to make it easily reviewable.  6-9 can be viewed
as a whole, since 7-9 are minor fixes to 6.

The other information requested on that page:

project: postgresql
patch name: psql-utf8-table-1.patch
purpose: adds support for Unicode UTF-8 box drawing to the text
  table drawing functions print_aligned_text and
  print_aligned_vertical.
for: discussion or application.  Testing shows identical formatting
  to current code.
branch: HEAD
compiles: yes
platform-specific: POSIX, but contains appropriate preprocessor
  conditionals for portability.  Not tested on non-POSIX platform,
  but conditional taken from elsewhere (port/chklocale.c).  Only
  two lines of the patch are platform-specific, the rest is
  plain portable C.
regression tests: No.  I'm not aware of any psql regression tests
  testing output formatting.
  Manually confirmed output is unchanged with border=0/1/2 and
  expanded=0/1.
use: Use a locale with LC_CTYPE CODESET=UTF-8.
performance: Not tested, but should be minimal effect.  There's
  an additional pointer dereference during string formatting
  in some places.  Temporary copies of data are used in some
  functions e.g. of format->lrule[] for convenience and to potentially
  save on the number of dereferences, though any optimising compiler
  should make this cost zero (all data and pointers are const).
comments:

psql: Abstract table formatting characters used for different line types.

printTextLineFormat describes the characters used to draw vertical lines across
a horizontal rule at the left side, middle and right hand side. These are
included in the formatting for an entire table (printTextFormat).  The
printTextRule enum is used as an offset into the printTextFormat line rules
(lrule), allowing specification of line styles for the top, middle and bottom
horizontal lines in a table.  The other printTextFormat members, hrule and
vrule define the formatting needed to draw horizontal and vertical rules.
Add table formats for ASCII and UTF-8.  Default to using the ASCII table.
However, if a UTF-8 locale codeset is in use, switch to the UTF-8 table.
print_aligned_text and print_aligned_vertical, and their helper fuctions pass
the table formatting and (where applicable) line style information to allow
correct printing of table lines.
Convert print_aligned_text, and its helper function, to use table formatting in
place of hardcoded ASCII characters.  Convert print_aligned_vertical to use
table formatting in place of hardcoded ASCII characters, and add helper
function print_aligned_vertical_line to format individual lines.


Regards,
Roger

--
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.

Attachment

Re: Unicode UTF-8 table formatting for psql text output

From
Greg Stark
Date:
On Sun, Aug 23, 2009 at 1:49 AM, Roger Leigh<rleigh@codelibre.net> wrote:
>> Please read the guidelines here:
>> http://wiki.postgresql.org/wiki/Submitting_a_Patch
>>
>> I don't think it's very helpful to submit a patch intended to do
>> basically one thing split up over 10 threads.  The PostgreSQL style is
>> heavyweight commits, and I don't believe that there's any reason to
>> suppose that someone would want to commit any one of these 9 pieces
>> without the other 8.
>
> OK.  I have attached a single patch which combines the nine patches
> into one.  I did read the patch page above, but it didn't mention
> anything on patch size, so I split it into incremental logical
> changes in order to make it easily reviewable.  6-9 can be viewed
> as a whole, since 7-9 are minor fixes to 6.

I don't really have an opinion between one big patch versus multiple
smaller patches. That will come down to the code and whether the
separate patches are easier to read. It is sometimes hard to review a
patch when its separated from the code which needs the new
functionality though.

What I do think is getting my inbox suddenly blatted with a screenful
of patches from one author on one topicis pretty annoying. At the very
least please send all the patches as attachments to a single email. I
find it hard to understand how the linux-kernel list functions if
everyone's patches are all mixed up together and you can't save a
message and have all the related code saved but have to go back to the
list to find emails with similar names.

But the more fundamental failure here is that this email should have
been sent first. You dumped a whole pile of changes on us without ever
asking what we thought of the idea which is "not how it's done". It's
actually not as bad as it looked because it's actually not as many
changes as it looked like at first, but still.

Now I happen to like the idea of using UTF8 border characters (I
hacked emacs-w3 to use the terminal font in X to do something similar
bitd so maybe I'm biased by nostalgia though). And the changes are
actually pretty simple despite looking bigger than they really are.
They're actually only a few dozen lines.

I wonder if there's a UTF8 tab character or monospaced space character
or something which would make these tables display better in my mail
client. Currently they're completely messed up because the display
engine doesn't recognize they should be monospaced whereas the usual
output does get recognized. As a result the padding all gets collapsed
and everything is misaligned.

--
greg
http://mit.edu/~gsstark/resume.pdf


Re: Unicode UTF-8 table formatting for psql text output

From
Andrew Dunstan
Date:

Greg Stark wrote:
> I don't really have an opinion between one big patch versus multiple
> smaller patches. That will come down to the code and whether the
> separate patches are easier to read. It is sometimes hard to review a
> patch when its separated from the code which needs the new
> functionality though.
>
>
>   

I have an opinion. These patches involved multiple changes to the same 
few files.

Please, don't do that.

In general one patch per feature is what makes life for reviewers and 
committers easier. For big features, sometimes we ask people to break it 
up into several patches, but this wasn't a big feature at all, and there 
is not the slightest need for more than one patch.

cheers

andrew


Re: Unicode UTF-8 table formatting for psql text output

From
Roger Leigh
Date:
On Sun, Aug 23, 2009 at 01:49:02AM +0100, Roger Leigh wrote:
> On Sat, Aug 22, 2009 at 07:42:10PM -0400, Robert Haas wrote:
> > On Sat, Aug 22, 2009 at 2:13 PM, Roger Leigh<rleigh@debian.org> wrote:
> > > Further minor cleanups to tweak column alignment in a corner case,
> > > and to correct indentation to match the rest of the code.
> >
> > Please read the guidelines here:
> > http://wiki.postgresql.org/wiki/Submitting_a_Patch
> >
> > I don't think it's very helpful to submit a patch intended to do
> > basically one thing split up over 10 threads.  The PostgreSQL style is
> > heavyweight commits, and I don't believe that there's any reason to
> > suppose that someone would want to commit any one of these 9 pieces
> > without the other 8.
>
> OK.  I have attached a single patch which combines the nine patches
> into one.  I did read the patch page above, but it didn't mention

A second patch is attached.  This is identical to the previous one,
except that it's re-diffed against the REL8_4_STABLE branch for
anyone who wishes to use this with 8.4.

Again, apologies for the noise with my first set of patches.


Regards,
Roger

--
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.

Attachment

Re: Unicode UTF-8 table formatting for psql text output

From
Alvaro Herrera
Date:
Roger Leigh escribió:

> +#if (defined(HAVE_LANGINFO_H) && defined(CODESET))
> +    if (!strcmp(nl_langinfo(CODESET), "UTF-8"))
> +        text_format = &utf8format;
> +#endif

I think you should also try to match to "UTF8", and do it in
case-insensitive manner (pg_strcasecmp), just like chklocale.c.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Unicode UTF-8 table formatting for psql text output

From
Roger Leigh
Date:
On Sun, Aug 23, 2009 at 06:33:49PM -0400, Alvaro Herrera wrote:
> Roger Leigh escribió:
>
> > +#if (defined(HAVE_LANGINFO_H) && defined(CODESET))
> > +    if (!strcmp(nl_langinfo(CODESET), "UTF-8"))
> > +        text_format = &utf8format;
> > +#endif
>
> I think you should also try to match to "UTF8", and do it in
> case-insensitive manner (pg_strcasecmp), just like chklocale.c.

An updated copy of the patch is attached.  This has the following
changes:

- matches UTF-8, utf8 and CP65001 case insensitively as for chklocale.c
- uses fprintf in place of repeated fputs/fputc calls
- moved hrule from table to line formatting
- moved vrule from table to separate line format

The latter two allow complete specification of all the characters used
to draw.  You can now have different horizonal rule types for the
different line types (demonstrated by making PRINT_RULE_MIDDLE thick
rather than thin).  The same applies to the vertical rules, but since
these are drawn intermixed with data these are a separate line format
in their own right (PRINT_RULE_DATA).

You can if you like make incredibly ugly tables consisting of all sorts
of thin, thick and double ruled lines running horizontally and
vertically at all edges and positions.  The patch as it is keeps this
subtle!  But, a potential future extension to this could be a \pset
command to customise the table style in addition to border printing,
since the code is now there to do this.


Regards,
Roger

--
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.

Attachment

Re: Unicode UTF-8 table formatting for psql text output

From
Alvaro Herrera
Date:
Roger Leigh escribió:

> An updated copy of the patch is attached.

I give it a try.  It looks very reasonable.  We could argue about the
exact chars to use but that can be changed later.

Did you give expanded output a look?  (\x)  I find it a bit weird that
the first line shows a single-pixel wide line but the subsequent ones
are thicker.

BTW I think you should also look at multiline fields,
a │                                           b                                           
━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━4 │ some text  : and some
more : and then some
 
(1 filas)

And wrapped:

alvherre=# select * from foo;a │                                     b                                      
━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━5 │ En un lugar de la Mancha, de cuyo
nombreno quiero acordarme, no ha mucho   ; tiempo que vivía un hidalgo
 
(1 fila)


-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Unicode UTF-8 table formatting for psql text output

From
Roger Leigh
Date:
On Tue, Aug 25, 2009 at 10:32:50PM -0400, Alvaro Herrera wrote:
> Roger Leigh escribió:
>
> > An updated copy of the patch is attached.
>
> Did you give expanded output a look?  (\x)  I find it a bit weird that
> the first line shows a single-pixel wide line but the subsequent ones
> are thicker.

Yes, it's just due to the fact that the middle lines are using a
thicker line character, while the top and bottom lines are thin.
This can easily be changed to be e.g. all thin.

> BTW I think you should also look at multiline fields,
>
>  a │                                           b
> ━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
>  4 │ some text
>    : and some more
>    : and then some
> (1 filas)
>
> And wrapped:
>
> alvherre=# select * from foo;
>  a │                                     b
> ━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
>  5 │ En un lugar de la Mancha, de cuyo nombre no quiero acordarme, no ha mucho
>    ; tiempo que vivía un hidalgo
> (1 fila)

I initially left these exactly the same as for ASCII (the ':' and ';'
usage).  However, it's quite possible to make it use other characters.
We could use the same lines, or two, three or four dashed lines
('╎' and '╏', or ┆' and '┇' or '┊' and '┋').
There are also additional characters such as half-lines
('╶', '╷', '╹' and '╻').  Is this the kind of this you are referring
to?

The wrapping code also appears slightly broken anyway, since
continuation lines don't get '|' printed for subsequent blank columns
on the same line:

# SELECT * FROM testw;a │          b          │ c │ kfduagahkjdfghalksdkfhajsdkl
━━━┿━━━━━━━━━━━━━━━━━━━━━┿━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━1 │ This is a test stri │   │
; ng2 │ sbuild-createchroot │   │  ;  installs build-ess  ; entials so your wor  ; king3 │ sbuild-createchroot │   │  ;
installs build-ess  ; entials so your wor  ; king environment sh  ; ould already be ok.  ;  You will need only  ;  to
adda few more  ; packages in the chr  ; oot, using e.g. 
(3 rows)

(this is unchanged from psql behaviour in CVS.)

I also see in the code that under some circumstances (curr_nl_line), a
'+' is used instead of a space when printing table headers, but I
haven't been able to trigger this yet.  We could also use dashed
horizontal rules here.


Regards,
Roger

--  .''`.  Roger Leigh: :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/`. `'   Printing on
GNU/Linux?      http://gutenprint.sourceforge.net/  `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail. 

Re: Unicode UTF-8 table formatting for psql text output

From
Alvaro Herrera
Date:
Roger Leigh escribió:
> On Tue, Aug 25, 2009 at 10:32:50PM -0400, Alvaro Herrera wrote:
> > Roger Leigh escribió:
> > 
> > > An updated copy of the patch is attached.
> > 
> > Did you give expanded output a look?  (\x)  I find it a bit weird that
> > the first line shows a single-pixel wide line but the subsequent ones
> > are thicker.
> 
> Yes, it's just due to the fact that the middle lines are using a
> thicker line character, while the top and bottom lines are thin.
> This can easily be changed to be e.g. all thin.

Yeah, I think expanded output should use the same char for all lines.

> > BTW I think you should also look at multiline fields,
> > 
> >  a │                                           b                                           
> > ━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
> >  4 │ some text
> >    : and some more
> >    : and then some
> > (1 filas)
> > 
> > And wrapped:
> > 
> > alvherre=# select * from foo;
> >  a │                                     b                                      
> > ━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
> >  5 │ En un lugar de la Mancha, de cuyo nombre no quiero acordarme, no ha mucho 
> >    ; tiempo que vivía un hidalgo
> > (1 fila)
> 
> I initially left these exactly the same as for ASCII (the ':' and ';'
> usage).  However, it's quite possible to make it use other characters.
> We could use the same lines, or two, three or four dashed lines
> ('╎' and '╏', or ┆' and '┇' or '┊' and '┋').

This works for me (say ╎ for newline-separated strings and ┊ for wrapped
output).

> The wrapping code also appears slightly broken anyway, since
> continuation lines don't get '|' printed for subsequent blank columns
> on the same line:

Yeah ... maybe there's a point to this, or maybe it's just a bug -- I
don't know.

> I also see in the code that under some circumstances (curr_nl_line), a
> '+' is used instead of a space when printing table headers, but I
> haven't been able to trigger this yet.  We could also use dashed
> horizontal rules here.

Hmm, can't say I know what's this about.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Unicode UTF-8 table formatting for psql text output

From
Alvaro Herrera
Date:
Alvaro Herrera escribió:

> > I initially left these exactly the same as for ASCII (the ':' and ';'
> > usage).  However, it's quite possible to make it use other characters.
> > We could use the same lines, or two, three or four dashed lines
> > ('╎' and '╏', or ┆' and '┇' or '┊' and '┋').
> 
> This works for me (say ╎ for newline-separated strings and ┊ for wrapped
> output).

So it'd look like this:

alvherre=# select * from foo;a │                                     b                                      
━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━1 │ En un lugar de la Mancha, de cuyo
nombreno quiero acordarme, no ha mucho   ┊ tiempo que vivía un hidalgo de los de lanza en astillero, adarga antigua,
┊rocín flaco y galgo corredor. Una olla de algo más vaca que carnero, salpi  ┊ cón las más noches, duelos y quebrantos
lossábados, lantejas los viernes,  ┊  algún palomino de añadidura los domingos, consumían las tres partes de su  ┊
hacienda.El resto della concluían sayo de velarte, calzas de velludo par  ┊ a las fiestas, con sus pantuflos de lo
mesmo,y los días de entresemana se  ┊  honraba con su vellorí de lo más fino.  Tenía en su casa una ama que pasa  ┊ ba
delos cuarenta, y una sobrina que no llegaba a los veinte, y un mozo d  ┊ e campo y plaza, que así ensillaba el rocín
comotomaba la podadera.2 │ —«Nunca fuera caballero de  ╎ damas tan bien servido  ╎ como fuera don Quijote  ╎ cuando de
sualdea vino:  ╎ doncellas curaban dél;  ╎ princesas, del su rocino»
 
(2 filas)

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Unicode UTF-8 table formatting for psql text output

From
Roger Leigh
Date:
On Thu, Aug 27, 2009 at 10:17:32AM -0400, Alvaro Herrera wrote:
> Alvaro Herrera escribió:
>
> > > I initially left these exactly the same as for ASCII (the ':' and ';'
> > > usage).  However, it's quite possible to make it use other characters.
> > > We could use the same lines, or two, three or four dashed lines
> > > ('╎' and '╏', or ┆' and '┇' or '┊' and '┋').
> >
> > This works for me (say ╎ for newline-separated strings and ┊ for wrapped
> > output).
>
> So it'd look like this:
>
> alvherre=# select * from foo;
>  a │                                     b
> ━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
>  1 │ En un lugar de la Mancha, de cuyo nombre no quiero acordarme, no ha mucho
>    ┊ tiempo que vivía un hidalgo de los de lanza en astillero, adarga antigua,
>    ┊ rocín flaco y galgo corredor. Una olla de algo más vaca que carnero, salpi
>    ┊ cón las más noches, duelos y quebrantos los sábados, lantejas los viernes,
>    ┊  algún palomino de añadidura los domingos, consumían las tres partes de su
>    ┊  hacienda. El resto della concluían sayo de velarte, calzas de velludo par
>    ┊ a las fiestas, con sus pantuflos de lo mesmo, y los días de entresemana se
>    ┊  honraba con su vellorí de lo más fino.  Tenía en su casa una ama que pasa
>    ┊ ba de los cuarenta, y una sobrina que no llegaba a los veinte, y un mozo d
>    ┊ e campo y plaza, que así ensillaba el rocín como tomaba la podadera.
>  2 │ —«Nunca fuera caballero de
>    ╎ damas tan bien servido
>    ╎ como fuera don Quijote
>    ╎ cuando de su aldea vino:
>    ╎ doncellas curaban dél;
>    ╎ princesas, del su rocino»
> (2 filas)

The attached patch adds support for the above output in wrapped mode.
It also uses single dashes (half lines) to represent lines which
contain no data, just vertical padding.  Output is unchanged for
ASCII output.

The last bit I'm missing is how to handle column wrapping for aligned
text output.  This is was introduced on 8th May 2008 by Bryce Nesbitt,
committed by Bruce Momjian (git commit f92f424d).  This patch
introduced the wrapped format, but I have not been able to activate
the codepaths that wrap column names in the table header (adding '+').

The logic in print_aligned_text goes like this:

col_count = cont->ncolumns;
...
more_col_wrapping = col_count;
curr_nl_line = 0;
while (more_col_wrapping)
{
  for (i = 0; i < cont->ncolumns; i++)
    if (!(this_line + 1)->ptr)   // Surely only true once at end of columns?
      more_col_wrapping--;
  curr_nl_line++;
}

I'm not convinced looking at the code that more_col_wrapping is ever
true except for the first time through the loop, and it's reached
zero after completion of the inner loop, so all of the code used when
curr_nl_line is > 0 is never called.  I'm sure I'm interpreting this
incorrectly, but I can't see under what circumstances I would get the
column name header line wrapped over multiple lines.

It may be that I just haven't got a table with column names of the
correct number and lengths to trigger it?


Regards,
Roger

--
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.

Attachment

Re: Unicode UTF-8 table formatting for psql text output

From
Selena Deckelmann
Date:
Hi!

Below is from Brad Slinger.  I'm just forwarding his message so that
the review will be in the thread for the archives. (Sorry, Brad that I
missed this earlier.. I thought you'd already replied to the list for
some reason.)

Brad says:

Please forgive my top posting.  Below is my patch review.
      The purpose of this patch is to improve psql output
readability.   This is accomplished through table formatting
improvements and line art improvements.
      The patch is a unified diff, rather than a context diff.  The
patch applies from the project root with with -p1 set.  Patch(1)
reports "with fuzz 1" when patching src/bin/psql/print.h.  The patch
touches two files:
      src/bin/psql/print.h      src/bin/psql/print.c
      The patch does not include any documentation or tests.
Comments and style are similar to existing code.
      The patched code compiles without any additional warnings.
Lint gripes about a trailing ',' in 'typedef enum printTextRule' in
print.h.  Other additional lint seem to be false positives.  The
regression tests pass against the new patch.
      The patched code installs and runs.  There were no crashes,
although I didn't try very hard.  The ASCII line art works:
      createdb test      psql test      \pset border 2      \pset format wrapped      SELECT 'short' AS short, lpad('',
1000,'long') AS long;
 
      I can trigger the new line art by setting LC_ALL=en_US.UTF-8 in
the environment.
      As far as I can tell, the patch works as advertised.  The patch
does not change the server/backend.  I don't see any performance
problems.

      Thanks,
      --bts

-- 
http://chesnok.com/daily - me
http://endpoint.com - work


Re: Unicode UTF-8 table formatting for psql text output

From
Roger Leigh
Date:
On Fri, Sep 18, 2009 at 11:30:05AM -0700, Selena Deckelmann wrote:
> Brad says:
>
>        The patched code compiles without any additional warnings.
> Lint gripes about a trailing ',' in 'typedef enum printTextRule' in
> print.h.  Other additional lint seem to be false positives.  The
> regression tests pass against the new patch.

I've attached a new patch which tidies up those extra commas, plus
a patch showing the changes from the previous patch.


Regards,
Roger

--
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.

Attachment

Re: Unicode UTF-8 table formatting for psql text output

From
Selena Deckelmann
Date:
Hi!

On Wed, Sep 23, 2009 at 2:16 AM, Roger Leigh <rleigh@codelibre.net> wrote:
> On Fri, Sep 18, 2009 at 11:30:05AM -0700, Selena Deckelmann wrote:
>> Brad says:
>>
>>        The patched code compiles without any additional warnings.
>> Lint gripes about a trailing ',' in 'typedef enum printTextRule' in
>> print.h.  Other additional lint seem to be false positives.  The
>> regression tests pass against the new patch.
>
> I've attached a new patch which tidies up those extra commas, plus
> a patch showing the changes from the previous patch.

Great!  Thank you.

Brad -- can you review this patch? Is it ready for a committer?

Thanks,
-selena


--
http://chesnok.com/daily - me
http://endpoint.com - work


Re: Unicode UTF-8 table formatting for psql text output

From
Robert Haas
Date:
On Sun, Sep 27, 2009 at 9:24 PM, Selena Deckelmann
<selenamarie@gmail.com> wrote:
> Hi!
>
> On Wed, Sep 23, 2009 at 2:16 AM, Roger Leigh <rleigh@codelibre.net> wrote:
>> On Fri, Sep 18, 2009 at 11:30:05AM -0700, Selena Deckelmann wrote:
>>> Brad says:
>>>
>>>        The patched code compiles without any additional warnings.
>>> Lint gripes about a trailing ',' in 'typedef enum printTextRule' in
>>> print.h.  Other additional lint seem to be false positives.  The
>>> regression tests pass against the new patch.
>>
>> I've attached a new patch which tidies up those extra commas, plus
>> a patch showing the changes from the previous patch.
>
> Great!  Thank you.
>
> Brad -- can you review this patch? Is it ready for a committer?

Brad already marked it that way on the CommitFest application, so I
think that probably means yes.  :-)

...Robert


Re: Unicode UTF-8 table formatting for psql text output

From
"Brad T. Sliger"
Date:
On Sunday 27 September 2009 19:03:33 Robert Haas wrote:
> On Sun, Sep 27, 2009 at 9:24 PM, Selena Deckelmann
>
> <selenamarie@gmail.com> wrote:
> > Hi!
> >
> > On Wed, Sep 23, 2009 at 2:16 AM, Roger Leigh <rleigh@codelibre.net> wrote:
> >> On Fri, Sep 18, 2009 at 11:30:05AM -0700, Selena Deckelmann wrote:
> >>> Brad says:
> >>>
> >>>        The patched code compiles without any additional warnings.
> >>> Lint gripes about a trailing ',' in 'typedef enum printTextRule' in
> >>> print.h.  Other additional lint seem to be false positives.  The
> >>> regression tests pass against the new patch.
> >>
> >> I've attached a new patch which tidies up those extra commas, plus
> >> a patch showing the changes from the previous patch.
> >
> > Great!  Thank you.
> >
> > Brad -- can you review this patch? Is it ready for a committer?
>
> Brad already marked it that way on the CommitFest application, so I
> think that probably means yes.  :-)
>
> ...Robert
I looked at the new (psql-utf8-table-4.patch) patch.  I think the style issues are fixed.
During this review I found that `gmake check` will fail when LANG=en_US.UTF-8 in the environment.  In this case
the patched psql produces UTF8 line art and the tests expect ASCII line art.
pg_regress clears LC_ALL by default, but does not clear LANG by default.  Please find attached a patch that
causes pg_regress to also clear LANG by default.
Thanks,
--bts


Re: Unicode UTF-8 table formatting for psql text output

From
Peter Eisentraut
Date:
On Mon, 2009-09-28 at 20:49 -0700, Brad T. Sliger wrote:
> During this review I found that `gmake check` will fail when
> LANG=en_US.UTF-8 in the environment.  In this case 
> the patched psql produces UTF8 line art and the tests expect ASCII
> line art.
> 
>         pg_regress clears LC_ALL by default, but does not clear LANG
> by default.  Please find attached a patch that 
> causes pg_regress to also clear LANG by default.

It probably doesn't matter much, but I think the proper fix would be to
unset LC_CTYPE.



Re: Unicode UTF-8 table formatting for psql text output

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> On Mon, 2009-09-28 at 20:49 -0700, Brad T. Sliger wrote:
>> pg_regress clears LC_ALL by default, but does not clear LANG
>> by default.  Please find attached a patch that 
>> causes pg_regress to also clear LANG by default.

> It probably doesn't matter much, but I think the proper fix would be to
> unset LC_CTYPE.

Wouldn't we have to clear *all* of these variables to be sure?

The bigger question is exactly how we expect this stuff to interact with
pg_regress' --no-locale switch.  We already do clear all these variables
when --no-locale is specified.  I am wondering just what --locale is
supposed to do, and whether selectively lobotomizing the LC stuff has
any real use at all.
        regards, tom lane


Re: Unicode UTF-8 table formatting for psql text output

From
Roger Leigh
Date:
On Tue, Sep 29, 2009 at 12:01:30PM -0400, Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > On Mon, 2009-09-28 at 20:49 -0700, Brad T. Sliger wrote:
> >> pg_regress clears LC_ALL by default, but does not clear LANG
> >> by default.  Please find attached a patch that 
> >> causes pg_regress to also clear LANG by default.
> 
> > It probably doesn't matter much, but I think the proper fix would be to
> > unset LC_CTYPE.
> 
> Wouldn't we have to clear *all* of these variables to be sure?
> 
> The bigger question is exactly how we expect this stuff to interact with
> pg_regress' --no-locale switch.  We already do clear all these variables
> when --no-locale is specified.  I am wondering just what --locale is
> supposed to do, and whether selectively lobotomizing the LC stuff has
> any real use at all.

While this is currently just hypothetical, the code is also making the
assumption that the C locale is ASCII.  However, there is no guarantee
that this is the case.  ASCII is the minimal requirement, and UTF-8 is
a superset (according to SUS/POSIX).

In Debian, we do have plans to introduce a C.UTF-8 locale, and perhaps
eventually move the C locale to using UTF-8 by default, at which point
the tests would again fail because you would still get the UTF-8
formatting despite all of the LC_ etc. munging.  (At that point, we
would have a C.ASCII for compatibility.)


Regards,
Roger

--  .''`.  Roger Leigh: :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/`. `'   Printing on
GNU/Linux?      http://gutenprint.sourceforge.net/  `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.
 


Re: Unicode UTF-8 table formatting for psql text output

From
Tom Lane
Date:
Roger Leigh <rleigh@codelibre.net> writes:
> In Debian, we do have plans to introduce a C.UTF-8 locale,

Egad, isn't that a contradiction in terms?  C locale means POSIX
behavior and nothing but.
        regards, tom lane


Re: Unicode UTF-8 table formatting for psql text output

From
Roger Leigh
Date:
On Tue, Sep 29, 2009 at 01:41:27PM -0400, Tom Lane wrote:
> Roger Leigh <rleigh@codelibre.net> writes:
> > In Debian, we do have plans to introduce a C.UTF-8 locale,
> 
> Egad, isn't that a contradiction in terms?

Not entirely!

> C locale means POSIX behavior and nothing but.

Indeed it does.  However, making LC_CTYPE be UTF-8 rather than
ASCII is both possible and still strictly conforming to the
letter of the standard.  There would be some collation and
other restrictions ("digit" and other character classes would
be contrained to the ASCII characters compared with other UTF-8
locales). However, any existing programs using ASCII would continue
to function without any changes to their behaviour.  The only
observable change will be that nl_langinfo(CODESET) will return
UTF-8, and it will be valid for programs to use UTF-8 encoded
text in formatted print functions, etc..

I'd just like to stress that this isn't happening anytime soon!
(Well, C.UTF-8 is possible now, it's just not hardcoded into
libc.so.  The intention is to initially provide C.UTF-8 in
addition to C so that programs requiring UTF-8 support have a
locale guaranteed to be present on the system to meet their
needs.)


Regards,
Roger

--  .''`.  Roger Leigh: :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/`. `'   Printing on
GNU/Linux?      http://gutenprint.sourceforge.net/  `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.
 


Re: Unicode UTF-8 table formatting for psql text output

From
Peter Eisentraut
Date:
On Tue, 2009-09-29 at 12:01 -0400, Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > On Mon, 2009-09-28 at 20:49 -0700, Brad T. Sliger wrote:
> >> pg_regress clears LC_ALL by default, but does not clear LANG
> >> by default.  Please find attached a patch that 
> >> causes pg_regress to also clear LANG by default.
> 
> > It probably doesn't matter much, but I think the proper fix would be to
> > unset LC_CTYPE.

(Actually, I should have said, set LC_CTYPE to C.)

> Wouldn't we have to clear *all* of these variables to be sure?

Not really.  The problem here is that psql checks whether the console
uses UTF8, and that is controlled by the LC_CTYPE variable on Unix-ish
systems.

> The bigger question is exactly how we expect this stuff to interact with
> pg_regress' --no-locale switch.  We already do clear all these variables
> when --no-locale is specified.  I am wondering just what --locale is
> supposed to do, and whether selectively lobotomizing the LC stuff has
> any real use at all.

We should do the LANG or LC_CTYPE thing only on the client,
unconditionally.  The --no-locale/--locale options should primarily
determine what the temporary server uses.



Re: Unicode UTF-8 table formatting for psql text output

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> On Tue, 2009-09-29 at 12:01 -0400, Tom Lane wrote:
>> The bigger question is exactly how we expect this stuff to interact with
>> pg_regress' --no-locale switch.  We already do clear all these variables
>> when --no-locale is specified.  I am wondering just what --locale is
>> supposed to do, and whether selectively lobotomizing the LC stuff has
>> any real use at all.

> We should do the LANG or LC_CTYPE thing only on the client,
> unconditionally.  The --no-locale/--locale options should primarily
> determine what the temporary server uses.

Well, that seems fairly reasonable, but it's going to require some
refactoring of pg_regress.  The initialize_environment function
determines what happens in both the client and the temp server.
        regards, tom lane


Re: Unicode UTF-8 table formatting for psql text output

From
Tom Lane
Date:
Roger Leigh <rleigh@codelibre.net> writes:
>> C locale means POSIX behavior and nothing but.

> Indeed it does.  However, making LC_CTYPE be UTF-8 rather than
> ASCII is both possible and still strictly conforming to the
> letter of the standard.  There would be some collation and
> other restrictions ("digit" and other character classes would
> be contrained to the ASCII characters compared with other UTF-8
> locales). However, any existing programs using ASCII would continue
> to function without any changes to their behaviour.  The only
> observable change will be that nl_langinfo(CODESET) will return
> UTF-8, and it will be valid for programs to use UTF-8 encoded
> text in formatted print functions, etc..

I really, really don't believe that that meets either the letter or
the spirit of the C standard, at least not if you are intending to
silently substitute LC_CTYPE=UTF8 when the program has specified
C/POSIX locale.  (If this is just a matter of what the default
LANG environment is, of course you can do anything.)
        regards, tom lane


Re: Unicode UTF-8 table formatting for psql text output

From
Robert Haas
Date:
On Tue, Sep 29, 2009 at 4:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
>> On Tue, 2009-09-29 at 12:01 -0400, Tom Lane wrote:
>>> The bigger question is exactly how we expect this stuff to interact with
>>> pg_regress' --no-locale switch.  We already do clear all these variables
>>> when --no-locale is specified.  I am wondering just what --locale is
>>> supposed to do, and whether selectively lobotomizing the LC stuff has
>>> any real use at all.
>
>> We should do the LANG or LC_CTYPE thing only on the client,
>> unconditionally.  The --no-locale/--locale options should primarily
>> determine what the temporary server uses.
>
> Well, that seems fairly reasonable, but it's going to require some
> refactoring of pg_regress.  The initialize_environment function
> determines what happens in both the client and the temp server.

This seems to mean that we can't apply this patch, since failing the
regression tests is not an acceptable behavior.  I think that means
that the patch author needs to either do the necessary pg_regress
refactoring or figure out some other solution that is acceptable to
the community.   Assuming that non-trivial pg_regress refactoring is
required, I think we should mark this patch as Returned with Feedback,
because that should really be a separate patch, and it's obviously far
too late to submit new patches to this CommitFest.

Objections?

...Robert


Re: Unicode UTF-8 table formatting for psql text output

From
Alvaro Herrera
Date:
Robert Haas escribió:
> On Tue, Sep 29, 2009 at 4:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Peter Eisentraut <peter_e@gmx.net> writes:
> >> On Tue, 2009-09-29 at 12:01 -0400, Tom Lane wrote:
> >>> The bigger question is exactly how we expect this stuff to interact with
> >>> pg_regress' --no-locale switch.  We already do clear all these variables
> >>> when --no-locale is specified.  I am wondering just what --locale is
> >>> supposed to do, and whether selectively lobotomizing the LC stuff has
> >>> any real use at all.
> >
> >> We should do the LANG or LC_CTYPE thing only on the client,
> >> unconditionally.  The --no-locale/--locale options should primarily
> >> determine what the temporary server uses.
> >
> > Well, that seems fairly reasonable, but it's going to require some
> > refactoring of pg_regress.  The initialize_environment function
> > determines what happens in both the client and the temp server.
> 
> This seems to mean that we can't apply this patch, since failing the
> regression tests is not an acceptable behavior.  I think that means
> that the patch author needs to either do the necessary pg_regress
> refactoring or figure out some other solution that is acceptable to
> the community.   Assuming that non-trivial pg_regress refactoring is
> required, I think we should mark this patch as Returned with Feedback,
> because that should really be a separate patch, and it's obviously far
> too late to submit new patches to this CommitFest.
> 
> Objections?

Does the patch pass regression tests in normal conditions?  If it does,
I see no reason to reject it.  If it fails in --locale only, and even
then only when the given locale is UTF8, which IIRC it's a seldom-used
case, we can see about fixing that separately.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Unicode UTF-8 table formatting for psql text output

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Robert Haas escribi�:
>> This seems to mean that we can't apply this patch, since failing the
>> regression tests is not an acceptable behavior.

> Does the patch pass regression tests in normal conditions?

If you consider that "normal" means "LANG=C in environment", then yeah.
Else not so much.  In particular, every one of the buildfarm machines
that tests a non-C locale environment can be expected to go red.

I think that the required patch might not be a whole lot larger than the
one Brad already submitted, so I'm willing to allow a couple more days
for the author to investigate the issue.
        regards, tom lane


Re: Unicode UTF-8 table formatting for psql text output

From
Andrew Dunstan
Date:

Alvaro Herrera wrote:
> Does the patch pass regression tests in normal conditions?  If it does,
> I see no reason to reject it.  If it fails in --locale only, and even
> then only when the given locale is UTF8, which IIRC it's a seldom-used
> case, we can see about fixing that separately.
>
>   

Numerous buildfarm machines (including one of mine) run the regression 
suite using UTF8. So it's far from seldom-used.

cheers

andrew


Re: Unicode UTF-8 table formatting for psql text output

From
Alvaro Herrera
Date:
Tom Lane escribió:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Robert Haas escribi�:
> >> This seems to mean that we can't apply this patch, since failing the
> >> regression tests is not an acceptable behavior.
> 
> > Does the patch pass regression tests in normal conditions?
> 
> If you consider that "normal" means "LANG=C in environment", then yeah.
> Else not so much.  In particular, every one of the buildfarm machines
> that tests a non-C locale environment can be expected to go red.

Huh, you're right, it doesn't work at all.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Unicode UTF-8 table formatting for psql text output

From
Roger Leigh
Date:
On Tue, Sep 29, 2009 at 04:28:57PM -0400, Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > On Tue, 2009-09-29 at 12:01 -0400, Tom Lane wrote:
> >> The bigger question is exactly how we expect this stuff to interact with
> >> pg_regress' --no-locale switch.  We already do clear all these variables
> >> when --no-locale is specified.  I am wondering just what --locale is
> >> supposed to do, and whether selectively lobotomizing the LC stuff has
> >> any real use at all.
> 
> > We should do the LANG or LC_CTYPE thing only on the client,
> > unconditionally.  The --no-locale/--locale options should primarily
> > determine what the temporary server uses.
> 
> Well, that seems fairly reasonable, but it's going to require some
> refactoring of pg_regress.  The initialize_environment function
> determines what happens in both the client and the temp server.

Two possible approaches to fix the tests are as follows:

diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index f2f9603..74cdaa2 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -711,8 +711,7 @@ initialize_environment(void)     * is actually called.)     */    unsetenv("LANGUAGE");
-    unsetenv("LC_ALL");
-    putenv("LC_MESSAGES=C");
+    putenv("LC_ALL=C");    /*     * Set multibyte as requested

Here we just force the locale to C.  This does have the disadvantage
that --no-locale is made redundant, and any tests which are dependent
upon locale (if any?) will be run in the C locale.

diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index f2f9603..65fb49a 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -712,6 +712,7 @@ initialize_environment(void)     */    unsetenv("LANGUAGE");    unsetenv("LC_ALL");
+    putenv("LC_CTYPE=C");    putenv("LC_MESSAGES=C");    /*

Here we set LC_CTYPE to C in addition to LC_MESSAGES (and for much the
same reasons).  However, if you test on non-C locales to check for
issues with other locale codesets, those tests are all going to be
forced to use ASCII.  Is this a problem in practice?

From the POV of my patch, it's working as designed: if the locale
codeset is UTF-8 it's outputting UTF-8.  But, due to the way the
test machinery is looking at the output, this is breaking the tests.
I'm not sure what I can do with my patch to make it behave differently
that is both compatible with its intended use and not break the tests--
this is really just breaking an assumption in the testing code that
the test output will always be ASCII.

Forcing the LC_CTYPE to C will force ASCII output and work around this
problem with the tests.  Another approach would be to let psql know
it's being run in a test environment with a PG_TEST or some other
environment variable which we can check for and use to turn off UTF-8
output if set.  Would that be better?


Regards,
Roger

--  .''`.  Roger Leigh: :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/`. `'   Printing on
GNU/Linux?      http://gutenprint.sourceforge.net/  `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.
 


Re: Unicode UTF-8 table formatting for psql text output

From
Andrew Dunstan
Date:

Roger Leigh wrote:
> Here we just force the locale to C.  This does have the disadvantage
> that --no-locale is made redundant, and any tests which are dependent
> upon locale (if any?) will be run in the C locale.
>
>
>   

That is not a solution. We have not that long ago gone to some lengths 
to provide for buildfarm testing in various locales. We're not going to 
undo that.

cheers

andrew


Re: Unicode UTF-8 table formatting for psql text output

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Roger Leigh wrote:
>> Here we just force the locale to C.  This does have the disadvantage
>> that --no-locale is made redundant, and any tests which are dependent
>> upon locale (if any?) will be run in the C locale.

> That is not a solution.

Right.  I think you may have missed the point of what Peter was saying:
it's okay to force the locale to C on the *client* side of the tests.
The trick is to not change the environment that the temp server is
started with.  This will take some redesign inside pg_regress, I think.
Probably initialize_environment needs to be split into two functions,
one that's run before firing off the temp server and one that runs
after.
        regards, tom lane


Re: Unicode UTF-8 table formatting for psql text output

From
Roger Leigh
Date:
On Tue, Sep 29, 2009 at 04:32:49PM -0400, Tom Lane wrote:
> Roger Leigh <rleigh@codelibre.net> writes:
> >> C locale means POSIX behavior and nothing but.
> 
> > Indeed it does.  However, making LC_CTYPE be UTF-8 rather than
> > ASCII is both possible and still strictly conforming to the
> > letter of the standard.  There would be some collation and
> > other restrictions ("digit" and other character classes would
> > be contrained to the ASCII characters compared with other UTF-8
> > locales). However, any existing programs using ASCII would continue
> > to function without any changes to their behaviour.  The only
> > observable change will be that nl_langinfo(CODESET) will return
> > UTF-8, and it will be valid for programs to use UTF-8 encoded
> > text in formatted print functions, etc..
> 
> I really, really don't believe that that meets either the letter or
> the spirit of the C standard, at least not if you are intending to
> silently substitute LC_CTYPE=UTF8 when the program has specified
> C/POSIX locale.  (If this is just a matter of what the default
> LANG environment is, of course you can do anything.)

We have spent some time reading the relevant standards documents
(C, POSIX, SUSv2, SUSv3) and haven't found anything yet that would
preclude this.  While they all specify minimal requirements for
what the C locale character set must provide (and POSIX/SUS are the
most strict, specifying ASCII outright for each 0-127 codepoint),
these are the minimal requirements for the locale, and
implementation-specific extensions to ASCII are allowed, which would
therefore permit UTF-8.  Note that LC_CTYPE=C is not required to
specify ASCII in any standard (though POSIX/SUS require that it
must contain ASCII as a subset of the whole set).

The language in SUSv2 in fact explicitly states that this is
allowed.  In fact, I've seen documentation that some UNIX systems such
as HPUX already do have a UTF-8 C locale as an option.


Regards,
Roger

--  .''`.  Roger Leigh: :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/`. `'   Printing on
GNU/Linux?      http://gutenprint.sourceforge.net/  `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.
 


Re: Unicode UTF-8 table formatting for psql text output

From
Andrew Dunstan
Date:

Andrew Dunstan wrote:
>
>
> Roger Leigh wrote:
>> Here we just force the locale to C.  This does have the disadvantage
>> that --no-locale is made redundant, and any tests which are dependent
>> upon locale (if any?) will be run in the C locale.
>>
>>
>>   
>
> That is not a solution. We have not that long ago gone to some lengths 
> to provide for buildfarm testing in various locales. We're not going 
> to undo that.
>
>

Thinking about this some more, ISTM a much better way of approaching it 
would be to provide a flag for psql to turn off the fancy formatting, 
and have pg_regress use that flag.

cheers

andrew


Re: Unicode UTF-8 table formatting for psql text output

From
Tom Lane
Date:
Roger Leigh <rleigh@codelibre.net> writes:
> The language in SUSv2 in fact explicitly states that this is
> allowed.  In fact, I've seen documentation that some UNIX systems such
> as HPUX already do have a UTF-8 C locale as an option.

I don't argue with the concept of a C.UTF8 locale --- in fact I think
it sounds pretty useful.  What I think is 100% broken is trying to make
C locale work that way.  C locale is supposed to be the traditional
locale-ignorant, characters-are-bytes behavior.
        regards, tom lane


Re: Unicode UTF-8 table formatting for psql text output

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Thinking about this some more, ISTM a much better way of approaching it 
> would be to provide a flag for psql to turn off the fancy formatting, 
> and have pg_regress use that flag.

Yeah, that's not a bad idea.  There are likely to be other client
programs that won't want this behavioral change either.
        regards, tom lane


Re: Unicode UTF-8 table formatting for psql text output

From
Robert Haas
Date:
On Wed, Sep 30, 2009 at 11:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> Thinking about this some more, ISTM a much better way of approaching it
>> would be to provide a flag for psql to turn off the fancy formatting,
>> and have pg_regress use that flag.
>
> Yeah, that's not a bad idea.  There are likely to be other client
> programs that won't want this behavioral change either.

I'm surprised there isn't one already.  I would think that any new
format would default to off.

...Robert


Re: Unicode UTF-8 table formatting for psql text output

From
Andrew Dunstan
Date:

Robert Haas wrote:
> On Wed, Sep 30, 2009 at 11:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>   
>> Andrew Dunstan <andrew@dunslane.net> writes:
>>     
>>> Thinking about this some more, ISTM a much better way of approaching it
>>> would be to provide a flag for psql to turn off the fancy formatting,
>>> and have pg_regress use that flag.
>>>       
>> Yeah, that's not a bad idea.  There are likely to be other client
>> programs that won't want this behavioral change either.
>>     
>
> I'm surprised there isn't one already.  I would think that any new
> format would default to off.
>
>
>   

I haven't looked, but if there is that's so much less work to do ;-)

cheers

andrew


Re: Unicode UTF-8 table formatting for psql text output

From
Peter Eisentraut
Date:
On Wed, 2009-09-30 at 11:03 -0400, Andrew Dunstan wrote:
> Thinking about this some more, ISTM a much better way of approaching it 
> would be to provide a flag for psql to turn off the fancy formatting, 
> and have pg_regress use that flag.

Well, it might not be a bad idea, but adding a feature just to satisfy
the test suite instead of fixing the test suite doesn't feel satisfying.
Is there another use case?



Re: Unicode UTF-8 table formatting for psql text output

From
Peter Eisentraut
Date:
On Wed, 2009-09-30 at 12:06 -0400, Robert Haas wrote:
> On Wed, Sep 30, 2009 at 11:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Andrew Dunstan <andrew@dunslane.net> writes:
> >> Thinking about this some more, ISTM a much better way of approaching it
> >> would be to provide a flag for psql to turn off the fancy formatting,
> >> and have pg_regress use that flag.
> >
> > Yeah, that's not a bad idea.  There are likely to be other client
> > programs that won't want this behavioral change either.
> 
> I'm surprised there isn't one already.  I would think that any new
> format would default to off.

Well, this isn't a new format, just an enhancement of an existing
format.



Re: Unicode UTF-8 table formatting for psql text output

From
Andrew Dunstan
Date:

Peter Eisentraut wrote:
> On Wed, 2009-09-30 at 11:03 -0400, Andrew Dunstan wrote:
>   
>> Thinking about this some more, ISTM a much better way of approaching it 
>> would be to provide a flag for psql to turn off the fancy formatting, 
>> and have pg_regress use that flag.
>>     
>
> Well, it might not be a bad idea, but adding a feature just to satisfy
> the test suite instead of fixing the test suite doesn't feel satisfying.
> Is there another use case?
>
>   

Sure, as Tom noted pg_regress probably won't be the only user. There are 
lots of legacy scripts out there that parse psql output, and it should 
be of use to them.

cheers

andrew


Re: Unicode UTF-8 table formatting for psql text output

From
Alvaro Herrera
Date:
Andrew Dunstan escribió:
> 
> 
> Peter Eisentraut wrote:
> >On Wed, 2009-09-30 at 11:03 -0400, Andrew Dunstan wrote:
> >>Thinking about this some more, ISTM a much better way of
> >>approaching it would be to provide a flag for psql to turn off
> >>the fancy formatting, and have pg_regress use that flag.
> >
> >Well, it might not be a bad idea, but adding a feature just to satisfy
> >the test suite instead of fixing the test suite doesn't feel satisfying.
> >Is there another use case?
> 
> Sure, as Tom noted pg_regress probably won't be the only user. There
> are lots of legacy scripts out there that parse psql output, and it
> should be of use to them.

All scripts I've seen parsing psql output use unaligned, undecorated
mode.  I have yet to see one messing with the |'s.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Unicode UTF-8 table formatting for psql text output

From
Robert Haas
Date:
On Wed, Sep 30, 2009 at 1:27 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> On Wed, 2009-09-30 at 12:06 -0400, Robert Haas wrote:
>> On Wed, Sep 30, 2009 at 11:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> > Andrew Dunstan <andrew@dunslane.net> writes:
>> >> Thinking about this some more, ISTM a much better way of approaching it
>> >> would be to provide a flag for psql to turn off the fancy formatting,
>> >> and have pg_regress use that flag.
>> >
>> > Yeah, that's not a bad idea.  There are likely to be other client
>> > programs that won't want this behavioral change either.
>>
>> I'm surprised there isn't one already.  I would think that any new
>> format would default to off.
>
> Well, this isn't a new format, just an enhancement of an existing
> format.

I disagree.  This patch is about replacing ASCII art constructed using
| and + and replacing it with box-drawing characters.  I think the
chances that this will not look good in one or more of the terminal
emulators that I use from time to time are excellent, especially if
the approach is at all similar to what pstree does on recent versions
of Fedora.  I'm 100% not interested in tracking down some obscure
encoding/locale issue to make it show up right, and I'm 110% not
interested in helping the people who file bug reports to track down
the analagous issues in their crazy, broken environments.

The way it works right now (and has worked for the last 5 years or
more) is reliable, familiar, and, at least IME, bullet-proof.  I don't
even see a good case for changing the default, let alone not providing
a way to retreat to the old behavior.  Considering that this patch is
nothing more than an exceedingly minor piece of window-dressing,
taking even a small risk of breaking things for anyone who as of today
can log into psql and get work done without pain seems like a bad
trade-off to me.

...Robert


Re: Unicode UTF-8 table formatting for psql text output

From
Roger Leigh
Date:
On Wed, Sep 30, 2009 at 01:30:17PM -0400, Andrew Dunstan wrote:
>
>
> Peter Eisentraut wrote:
>> On Wed, 2009-09-30 at 11:03 -0400, Andrew Dunstan wrote:
>>
>>> Thinking about this some more, ISTM a much better way of approaching
>>> it would be to provide a flag for psql to turn off the fancy
>>> formatting, and have pg_regress use that flag.
>>>
>>
>> Well, it might not be a bad idea, but adding a feature just to satisfy
>> the test suite instead of fixing the test suite doesn't feel satisfying.
>> Is there another use case?
>
> Sure, as Tom noted pg_regress probably won't be the only user. There are
> lots of legacy scripts out there that parse psql output, and it should
> be of use to them.

The attached patch implements this feature.  It adds a
--no-pretty-formatting/-G option to psql (naming isn't my forté,
so feel free to change it!).  This is also documented in the
SGML docs, and help text.  Lastly, this option is used when invoking
psql by pg_regress, which results in a working testsuite in a UTF-8
locale.

Hopefully this is OK!  I'll be happy to make any amendments required.


Regards,
Roger

--
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.

Attachment

Re: Unicode UTF-8 table formatting for psql text output

From
Peter Eisentraut
Date:
On Wed, 2009-09-30 at 14:02 -0400, Alvaro Herrera wrote:
> All scripts I've seen parsing psql output use unaligned, undecorated
> mode.  I have yet to see one messing with the |'s.

Plus, we would have broken that with the : continuation lines.



Re: Unicode UTF-8 table formatting for psql text output

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> The way it works right now (and has worked for the last 5 years or
> more) is reliable, familiar, and, at least IME, bullet-proof.  I don't
> even see a good case for changing the default, let alone not providing
> a way to retreat to the old behavior.

This is pretty much my opinion as well.  I'd vote for requiring a switch
to turn it on.  Those who really want the new behavior by default can
set it up in their .psqlrc.
        regards, tom lane


Re: Unicode UTF-8 table formatting for psql text output

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> On Wed, 2009-09-30 at 14:02 -0400, Alvaro Herrera wrote:
>> All scripts I've seen parsing psql output use unaligned, undecorated
>> mode.  I have yet to see one messing with the |'s.

> Plus, we would have broken that with the : continuation lines.

Only for data containing embedded newlines, which is not that common.
        regards, tom lane


Re: Unicode UTF-8 table formatting for psql text output

From
Tom Lane
Date:
Roger Leigh <rleigh@codelibre.net> writes:
>> On Wed, 2009-09-30 at 11:03 -0400, Andrew Dunstan wrote:
>>> Thinking about this some more, ISTM a much better way of approaching 
>>> it would be to provide a flag for psql to turn off the fancy 
>>> formatting, and have pg_regress use that flag.

> The attached patch implements this feature.  It adds a
> --no-pretty-formatting/-G option to psql (naming isn't my fort�,
> so feel free to change it!).  This is also documented in the
> SGML docs, and help text.  Lastly, this option is used when invoking
> psql by pg_regress, which results in a working testsuite in a UTF-8
> locale.

It would be a good idea to tie this to a psql magic variable (like
ON_ERROR_STOP) so that it could conveniently be set in ~/.psqlrc.
I'm not actually sure that we need a dedicated command line switch
for it, since you could use "-v varname" instead.
        regards, tom lane


Re: Unicode UTF-8 table formatting for psql text output

From
Peter Eisentraut
Date:
On Wed, 2009-09-30 at 18:50 -0400, Tom Lane wrote:
> It would be a good idea to tie this to a psql magic variable (like
> ON_ERROR_STOP) so that it could conveniently be set in ~/.psqlrc.
> I'm not actually sure that we need a dedicated command line switch
> for it, since you could use "-v varname" instead.

Right.  A better yet place would be inside \pset, since that contains
the other formatting options.



Re: Unicode UTF-8 table formatting for psql text output

From
Roger Leigh
Date:
On Wed, Sep 30, 2009 at 06:50:46PM -0400, Tom Lane wrote:
> Roger Leigh <rleigh@codelibre.net> writes:
> >> On Wed, 2009-09-30 at 11:03 -0400, Andrew Dunstan wrote:
> >>> Thinking about this some more, ISTM a much better way of approaching
> >>> it would be to provide a flag for psql to turn off the fancy
> >>> formatting, and have pg_regress use that flag.
>
> > The attached patch implements this feature.  It adds a
> > --no-pretty-formatting/-G option to psql (naming isn't my fort�,
> > so feel free to change it!).  This is also documented in the
> > SGML docs, and help text.  Lastly, this option is used when invoking
> > psql by pg_regress, which results in a working testsuite in a UTF-8
> > locale.
>
> It would be a good idea to tie this to a psql magic variable (like
> ON_ERROR_STOP) so that it could conveniently be set in ~/.psqlrc.
> I'm not actually sure that we need a dedicated command line switch
> for it, since you could use "-v varname" instead.

I have attached a patch which implements the feature as a pset
variable.  This also slightly simplifies some of the patch since
the table style is passed to functions directly in printTableContent
rather than separately.  The psql option '-P tablestyle=ascii' is
passed to psql in pg_regress_main.c which means the testsuite doesn't
fail any more.  The option is documented in the psql docs, and is
also tab-completed.  Users can just put '\pset tablestyle ascii' in
their .psqlrc if they want the old format in a UTF-8 locale.

To follow up on the comments about the problems of defaulting to
UTF-8.  There are just two potential problems with defaulting, both of
which are problems with the user's mis-configuration of their system
and (IMHO) not something that postgresql needs to care about.
1) The user lacks a font containing the line-drawing characters.
   It's very rare for a fixed-width terminal font to not contain
   these characters, and the patch as provided sticks to the most
   basic range from the VT100 set which are most commonly provided.
2) The user's terminal emulator is not configured to accept UTF-8
   input.  If you're using a UTF-8 locale, then this is necessary
   to display non-ASCII characters, and is done automatically by
   almost every terminal emulator out there (on Linux, they default
   to using nl_langinfo(CODESET) unless configured otherwise, which
   is a very simple change if required).  On any current GNU/Linux
   system, you have to go out of your way to break the defaults.

The patch currently switches to UTF-8 automatically /when available/.
IMO this is the correct behaviour since it will work for all but the
handful of users who misconfigured their system, and provides an
immediate benefit.  We spent years making UTF-8 work out of the box on
Linux and Unix systems, and it seems a trifle unfair to penalise all
users for the sake of the few who just didn't set up their terminal
emulator correctly (their setup is already broken, since non-ASCII
characters returned by queries are /already/ going to be displayed
incorrectly).


Regards,
Roger

--
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.

Attachment

Re: Unicode UTF-8 table formatting for psql text output

From
"Brad T. Sliger"
Date:
On Friday 02 October 2009 04:21:35 Roger Leigh wrote:
> On Wed, Sep 30, 2009 at 06:50:46PM -0400, Tom Lane wrote:
> > Roger Leigh <rleigh@codelibre.net> writes:
> > >> On Wed, 2009-09-30 at 11:03 -0400, Andrew Dunstan wrote:
> > >>> Thinking about this some more, ISTM a much better way of approaching
> > >>> it would be to provide a flag for psql to turn off the fancy
> > >>> formatting, and have pg_regress use that flag.
> > >
> > > The attached patch implements this feature.  It adds a
> > > --no-pretty-formatting/-G option to psql (naming isn't my fort�,
> > > so feel free to change it!).  This is also documented in the
> > > SGML docs, and help text.  Lastly, this option is used when invoking
> > > psql by pg_regress, which results in a working testsuite in a UTF-8
> > > locale.
> >
> > It would be a good idea to tie this to a psql magic variable (like
> > ON_ERROR_STOP) so that it could conveniently be set in ~/.psqlrc.
> > I'm not actually sure that we need a dedicated command line switch
> > for it, since you could use "-v varname" instead.
>
> I have attached a patch which implements the feature as a pset
> variable.  This also slightly simplifies some of the patch since
> the table style is passed to functions directly in printTableContent
> rather than separately.  The psql option '-P tablestyle=ascii' is
> passed to psql in pg_regress_main.c which means the testsuite doesn't
> fail any more.  The option is documented in the psql docs, and is
> also tab-completed.  Users can just put '\pset tablestyle ascii' in
> their .psqlrc if they want the old format in a UTF-8 locale.
>
> To follow up on the comments about the problems of defaulting to
> UTF-8.  There are just two potential problems with defaulting, both of
> which are problems with the user's mis-configuration of their system
> and (IMHO) not something that postgresql needs to care about.
> 1) The user lacks a font containing the line-drawing characters.
>    It's very rare for a fixed-width terminal font to not contain
>    these characters, and the patch as provided sticks to the most
>    basic range from the VT100 set which are most commonly provided.
> 2) The user's terminal emulator is not configured to accept UTF-8
>    input.  If you're using a UTF-8 locale, then this is necessary
>    to display non-ASCII characters, and is done automatically by
>    almost every terminal emulator out there (on Linux, they default
>    to using nl_langinfo(CODESET) unless configured otherwise, which
>    is a very simple change if required).  On any current GNU/Linux
>    system, you have to go out of your way to break the defaults.
>
> The patch currently switches to UTF-8 automatically /when available/.
> IMO this is the correct behaviour since it will work for all but the
> handful of users who misconfigured their system, and provides an
> immediate benefit.  We spent years making UTF-8 work out of the box on
> Linux and Unix systems, and it seems a trifle unfair to penalise all
> users for the sake of the few who just didn't set up their terminal
> emulator correctly (their setup is already broken, since non-ASCII
> characters returned by queries are /already/ going to be displayed
> incorrectly).
>
>
> Regards,
> Roger
I looked at psql-utf8-table-5.patch.
Lint(1) says there is an extra trailing ',' in src/bin/psql/print.h. in 'typedef enum printTextRule'.  The addition to
src/bin/psql/command.c could use a comment, like adjacent code.
    'ASCII' and 'UTF8' may need <acronym></acronym> tags in doc/src/sgml/ref/psql-ref.sgml, like adjacent
code.  I'm not sure someone who hasn't seen this patch in action would immediately know what it does from the
documentation.  `gmake html` works without the patch, but fails with the patch:
openjade:ref/psql-ref.sgml:1692:15:E: document type does not allow element "TERM" here; assuming
missing "VARLISTENTRY" start-tag
After the patch, `\pset format wrapped` produces '\pset: unknown option: format'.  I saw this in interactive psql
and from .psqlrc.  I think this can be fixed by changing the addition to src/bin/psql/command.c from an 'if' clause to
an 'else if' clause.
Otherwise, the patch applied, built and installed.  The `gmake check` tests all passed with LANG and/or LC_ALL
set.  The various tablestyle options seem to work.  The default behavior with respect to various LANG and LC_ALL
values seems reasonable and can be overridden.
Thanks,
--bts


Re: Unicode UTF-8 table formatting for psql text output

From
Roger Leigh
Date:
On Fri, Oct 02, 2009 at 05:34:16PM -0700, Brad T. Sliger wrote:
> On Friday 02 October 2009 04:21:35 Roger Leigh wrote:
> > I have attached a patch which implements the feature as a pset
> > variable.  This also slightly simplifies some of the patch since
> > the table style is passed to functions directly in printTableContent
> > rather than separately.  The psql option '-P tablestyle=ascii' is
> > passed to psql in pg_regress_main.c which means the testsuite doesn't
> > fail any more.  The option is documented in the psql docs, and is
> > also tab-completed.  Users can just put '\pset tablestyle ascii' in
> > their .psqlrc if they want the old format in a UTF-8 locale.
>
>     I looked at psql-utf8-table-5.patch.

Many thanks for taking the time to do this.  I've attached a followup
patch which addresses your point below:

>     Lint(1) says there is an extra trailing ',' in src/bin/psql/print.h. in 'typedef enum printTextRule'.  The
additionto  
> src/bin/psql/command.c could use a comment, like adjacent code.

Fixed.

>         'ASCII' and 'UTF8' may need <acronym></acronym> tags in doc/src/sgml/ref/psql-ref.sgml, like adjacent
> code.  I'm not sure someone who hasn't seen this patch in action would immediately know what it does from the
> documentation.  `gmake html` works without the patch, but fails with the patch:

Also fixed.  I also added some additional explanation of the option which
hopefully makes its purpose more obvious.  The <acronym> tag isn't used for
the itemised option list names, but is used in the descriptive text; I can
also add it there if appropriate.

It's likely that "tablestyle" could well be named better.  "format" is
already used, but if there's a more intuitive name that fits better,
I'm happy to change it.
>     openjade:ref/psql-ref.sgml:1692:15:E: document type does not allow element "TERM" here; assuming
> missing "VARLISTENTRY" start-tag

Also fixed.

>     After the patch, `\pset format wrapped` produces '\pset: unknown option: format'.  I saw this in interactive psql

> and from .psqlrc.  I think this can be fixed by changing the addition to src/bin/psql/command.c from an 'if' clause
to 
> an 'else if' clause.

Oops, yes.  Sorry about that hiccup.  I've also fixed this.


Regards,
Roger

--
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.

Attachment

Re: Unicode UTF-8 table formatting for psql text output

From
Peter Eisentraut
Date:
I have a comment on this bit:

> @@ -125,6 +128,17 @@ main(int argc, char *argv[])
>  
>         /* We rely on unmentioned fields of pset.popt to start out
> 0/false/NULL */
>         pset.popt.topt.format = PRINT_ALIGNED;
> +
> +       /* Default table style to plain ASCII */
> +       pset.popt.topt.table_style = &asciiformat;
> +#if (defined(HAVE_LANGINFO_H) && defined(CODESET))
> +       /* If a UTF-8 locale is available, switch to UTF-8 box drawing
> characters */
> +       if (pg_strcasecmp(nl_langinfo(CODESET), "UTF-8") == 0 ||
> +           pg_strcasecmp(nl_langinfo(CODESET), "utf8") == 0 ||
> +           pg_strcasecmp(nl_langinfo(CODESET), "CP65001") == 0)
> +               pset.popt.topt.table_style = &utf8format;
> +#endif
> +
>         pset.popt.topt.border = 1;
>         pset.popt.topt.pager = 1;
>         pset.popt.topt.start_table = true;

Elsewhere in the psql code, notably in mbprint.c, we make the decision
on whether to apply certain Unicode-aware processing based on whether
the client encoding is UTF8.  The same should be done here.

There is a patch somewhere in the pipeline that would automatically set
the psql client encoding to whatever the locale says, but until that is
done, the client encoding should be the sole setting that rules what
kind of character set processing is done on the client side.



Re: Unicode UTF-8 table formatting for psql text output

From
Roger Leigh
Date:
On Sun, Oct 04, 2009 at 11:22:27PM +0300, Peter Eisentraut wrote:
> I have a comment on this bit:
>
> > @@ -125,6 +128,17 @@ main(int argc, char *argv[])
> >
> >         /* We rely on unmentioned fields of pset.popt to start out
> > 0/false/NULL */
> >         pset.popt.topt.format = PRINT_ALIGNED;
> > +
> > +       /* Default table style to plain ASCII */
> > +       pset.popt.topt.table_style = &asciiformat;
> > +#if (defined(HAVE_LANGINFO_H) && defined(CODESET))
> > +       /* If a UTF-8 locale is available, switch to UTF-8 box drawing
> > characters */
> > +       if (pg_strcasecmp(nl_langinfo(CODESET), "UTF-8") == 0 ||
> > +           pg_strcasecmp(nl_langinfo(CODESET), "utf8") == 0 ||
> > +           pg_strcasecmp(nl_langinfo(CODESET), "CP65001") == 0)
> > +               pset.popt.topt.table_style = &utf8format;
> > +#endif
> > +
> >         pset.popt.topt.border = 1;
> >         pset.popt.topt.pager = 1;
> >         pset.popt.topt.start_table = true;
>
> Elsewhere in the psql code, notably in mbprint.c, we make the decision
> on whether to apply certain Unicode-aware processing based on whether
> the client encoding is UTF8.  The same should be done here.
>
> There is a patch somewhere in the pipeline that would automatically set
> the psql client encoding to whatever the locale says, but until that is
> done, the client encoding should be the sole setting that rules what
> kind of character set processing is done on the client side.

OK, that makes sense to a certain extent.  However, the characters
used to draw the table lines are not really that related to the
client encoding for data sent from the database (IMHO).

I think that (as you said) making the client encoding the same as the
locale character set the same in the future would clear up this
discrepancy though.  Using the client encoding, there's no guarantee
the client locale/terminal can handle UTF-8 when the client encoding is
UTF-8.

I have attached an updated patch which implements your suggested
behaviour.  It also renames the option to "linestyle" rather than
"tablestyle" which I think represents its purpose a bit more clearly.


Thanks,
Roger

--
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.

Attachment

Re: Unicode UTF-8 table formatting for psql text output

From
Tom Lane
Date:
Roger Leigh <rleigh@codelibre.net> writes:
> On Sun, Oct 04, 2009 at 11:22:27PM +0300, Peter Eisentraut wrote:
>> Elsewhere in the psql code, notably in mbprint.c, we make the decision
>> on whether to apply certain Unicode-aware processing based on whether
>> the client encoding is UTF8.  The same should be done here.
>> 
>> There is a patch somewhere in the pipeline that would automatically set
>> the psql client encoding to whatever the locale says, but until that is
>> done, the client encoding should be the sole setting that rules what
>> kind of character set processing is done on the client side.

> OK, that makes sense to a certain extent.  However, the characters
> used to draw the table lines are not really that related to the
> client encoding for data sent from the database (IMHO).

Huh?  The data *in* the table is going to be in the client_encoding, and
psql contains no mechanisms that would translate it to something else.
Surrounding it with decoration in a different encoding is just a recipe
for breakage.
        regards, tom lane


Re: Unicode UTF-8 table formatting for psql text output

From
Roger Leigh
Date:
On Mon, Oct 05, 2009 at 04:32:08PM -0400, Tom Lane wrote:
> Roger Leigh <rleigh@codelibre.net> writes:
> > On Sun, Oct 04, 2009 at 11:22:27PM +0300, Peter Eisentraut wrote:
> >> Elsewhere in the psql code, notably in mbprint.c, we make the decision
> >> on whether to apply certain Unicode-aware processing based on whether
> >> the client encoding is UTF8.  The same should be done here.
> >>
> >> There is a patch somewhere in the pipeline that would automatically set
> >> the psql client encoding to whatever the locale says, but until that is
> >> done, the client encoding should be the sole setting that rules what
> >> kind of character set processing is done on the client side.
>
> > OK, that makes sense to a certain extent.  However, the characters
> > used to draw the table lines are not really that related to the
> > client encoding for data sent from the database (IMHO).
>
> Huh?  The data *in* the table is going to be in the client_encoding, and
> psql contains no mechanisms that would translate it to something else.
> Surrounding it with decoration in a different encoding is just a recipe
> for breakage.

Ah, I was under the mistaken assumption that this was iconv()ed or
otherwise translated for correct display.  In that case, I'll leave
the patch as is (using the client encoding for table lines).

I've attached an updated copy of the patch (it just removes the
now unneeded langinfo.h header).


Regards,
Roger

--
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.

Attachment

Re: Unicode UTF-8 table formatting for psql text output

From
Roger Leigh
Date:
On Tue, Oct 06, 2009 at 10:44:27AM +0100, Roger Leigh wrote:
> On Mon, Oct 05, 2009 at 04:32:08PM -0400, Tom Lane wrote:
> > Roger Leigh <rleigh@codelibre.net> writes:
> > > On Sun, Oct 04, 2009 at 11:22:27PM +0300, Peter Eisentraut wrote:
> > >> Elsewhere in the psql code, notably in mbprint.c, we make the decision
> > >> on whether to apply certain Unicode-aware processing based on whether
> > >> the client encoding is UTF8.  The same should be done here.
> > >>
> > >> There is a patch somewhere in the pipeline that would automatically set
> > >> the psql client encoding to whatever the locale says, but until that is
> > >> done, the client encoding should be the sole setting that rules what
> > >> kind of character set processing is done on the client side.
> >
> > > OK, that makes sense to a certain extent.  However, the characters
> > > used to draw the table lines are not really that related to the
> > > client encoding for data sent from the database (IMHO).
> >
> > Huh?  The data *in* the table is going to be in the client_encoding, and
> > psql contains no mechanisms that would translate it to something else.
> > Surrounding it with decoration in a different encoding is just a recipe
> > for breakage.
>
> Ah, I was under the mistaken assumption that this was iconv()ed or
> otherwise translated for correct display.  In that case, I'll leave
> the patch as is (using the client encoding for table lines).
>
> I've attached an updated copy of the patch (it just removes the
> now unneeded langinfo.h header).

This patch included a bit of code not intended for inclusion
(setting of client encoding based on locale), which the attached
(and hopefully final!) revision of the patch excludes.


Regards,
Roger

--
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.

Attachment

Re: Unicode UTF-8 table formatting for psql text output

From
"Brad T. Sliger"
Date:
On Tuesday 06 October 2009 11:35:03 Roger Leigh wrote:
> On Tue, Oct 06, 2009 at 10:44:27AM +0100, Roger Leigh wrote:
> > On Mon, Oct 05, 2009 at 04:32:08PM -0400, Tom Lane wrote:
> > > Roger Leigh <rleigh@codelibre.net> writes:
> > > > On Sun, Oct 04, 2009 at 11:22:27PM +0300, Peter Eisentraut wrote:
> > > >> Elsewhere in the psql code, notably in mbprint.c, we make the
> > > >> decision on whether to apply certain Unicode-aware processing based
> > > >> on whether the client encoding is UTF8.  The same should be done
> > > >> here.
> > > >>
> > > >> There is a patch somewhere in the pipeline that would automatically
> > > >> set the psql client encoding to whatever the locale says, but until
> > > >> that is done, the client encoding should be the sole setting that
> > > >> rules what kind of character set processing is done on the client
> > > >> side.
> > > >
> > > > OK, that makes sense to a certain extent.  However, the characters
> > > > used to draw the table lines are not really that related to the
> > > > client encoding for data sent from the database (IMHO).
> > >
> > > Huh?  The data *in* the table is going to be in the client_encoding,
> > > and psql contains no mechanisms that would translate it to something
> > > else. Surrounding it with decoration in a different encoding is just a
> > > recipe for breakage.
> >
> > Ah, I was under the mistaken assumption that this was iconv()ed or
> > otherwise translated for correct display.  In that case, I'll leave
> > the patch as is (using the client encoding for table lines).
> >
> > I've attached an updated copy of the patch (it just removes the
> > now unneeded langinfo.h header).
>
> This patch included a bit of code not intended for inclusion
> (setting of client encoding based on locale), which the attached
> (and hopefully final!) revision of the patch excludes.
>
>
> Regards,
> Roger
I looked at psql-utf8-table-9.patch.
It applies, builds and installs.  `gmake check` passes and is not affected by values of LANG or LC_ALL in the 
environment.  HTML and man documentation builds and looks good.  The patch doesn't introduce new lint.
The psql utility displays UTF8 line art when the client encoding is UTF8, and ASCII line art is displayed otherwise.  
This can be overridden with `\pset linestyle ASCII` or `\pset linestyle UTF8`.  The psql line art is no longer affected
by
 
the values of LANG or LC_ALL in the environment.
As far as I can tell, everything looks reasonable.
Thanks,
--bts


Re: Unicode UTF-8 table formatting for psql text output

From
Peter Eisentraut
Date:
On Tue, 2009-10-06 at 19:35 +0100, Roger Leigh wrote:
> This patch included a bit of code not intended for inclusion
> (setting of client encoding based on locale), which the attached
> (and hopefully final!) revision of the patch excludes.

Well, the documentation still claims that this is dependent on the
locale.  This should be updated to match the code.

I think the setting ought be called linestyle unicode (instead of utf8),
since the same setting would presumably work in case we ever implement
UTF-16 support on the client side.




Re: Unicode UTF-8 table formatting for psql text output

From
"Kevin Grittner"
Date:
Peter Eisentraut <peter_e@gmx.net> wrote:
> I think the setting ought be called linestyle unicode (instead of
> utf8), since the same setting would presumably work in case we ever
> implement UTF-16 support on the client side.
Yeah, anytime one gets sloppy with the distinction between a character
set and a character encoding scheme, one tends to regret it, sooner or
later.  Here's we're talking about which glyphs to show -- that's
based on a character set.
-Kevin


Re: Unicode UTF-8 table formatting for psql text output

From
Roger Leigh
Date:
On Fri, Oct 09, 2009 at 04:35:46PM -0500, Kevin Grittner wrote:
> Peter Eisentraut <peter_e@gmx.net> wrote:
>
> > I think the setting ought be called linestyle unicode (instead of
> > utf8), since the same setting would presumably work in case we ever
> > implement UTF-16 support on the client side.
>
> Yeah, anytime one gets sloppy with the distinction between a character
> set and a character encoding scheme, one tends to regret it, sooner or
> later.  Here's we're talking about which glyphs to show -- that's
> based on a character set.

The attached updated patch renames all user-visible uses of
"utf8" to "unicode".  It also updates the documentation
regarding "locale" to "psql client character set encoding"
so the docs now match the code exactly.


Regards,
Roger

--
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.

Attachment

Re: Unicode UTF-8 table formatting for psql text output

From
Tom Lane
Date:
Roger Leigh <rleigh@codelibre.net> writes:
> The attached updated patch renames all user-visible uses of
> "utf8" to "unicode".  It also updates the documentation
> regarding "locale" to "psql client character set encoding"
> so the docs now match the code exactly.

Applied with light editorialization.  The main non-cosmetic change
I made was to postpone selection of default line_style until runtime
(see get_line_style() in the committed patch).  The original coding
required knowledge of the line_style default rule not only in three
different places in psql, but in every other place using print.c,
such as createlang/droplang -l (which dumped core with the patch as
submitted).  I changed it so that leaving line_style NULL implies
the default encoding-driven behavior, so that we don't need to touch
any of the callers.
        regards, tom lane


Re: Unicode UTF-8 table formatting for psql text output

From
Roger Leigh
Date:
On Tue, Oct 13, 2009 at 05:08:20PM -0400, Tom Lane wrote:
> Roger Leigh <rleigh@codelibre.net> writes:
> > The attached updated patch renames all user-visible uses of
> > "utf8" to "unicode".  It also updates the documentation
> > regarding "locale" to "psql client character set encoding"
> > so the docs now match the code exactly.
>
> Applied with light editorialization.  The main non-cosmetic change
> I made was to postpone selection of default line_style until runtime
> (see get_line_style() in the committed patch).  The original coding
> required knowledge of the line_style default rule not only in three
> different places in psql, but in every other place using print.c,
> such as createlang/droplang -l (which dumped core with the patch as
> submitted).  I changed it so that leaving line_style NULL implies
> the default encoding-driven behavior, so that we don't need to touch
> any of the callers.

Thanks.  I agree with your improvement in get_line_style(), it's
cleaner this way; I didn't realise print.c was used by the other
programs, sorry about that.


Regards,
Roger

--  .''`.  Roger Leigh: :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/`. `'   Printing on
GNU/Linux?      http://gutenprint.sourceforge.net/  `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail. 

Re: Unicode UTF-8 table formatting for psql text output

From
Peter Eisentraut
Date:
(HTML mail to preserve formatting; let's see if it works.)<br /><br /> I like the new Unicode tables, but the marking
ofcontinuation lines looks pretty horrible:<br /><br />  <tt>                            List of databases</tt><br
/><tt>    Name      │ Owner │ Encoding │ Collation │ Ctype │ Access privileges</tt><br
/><tt>───────────────┼───────┼──────────┼───────────┼───────┼───────────────────</tt><br/><tt> pl_regression │ peter │
LATIN2  │ cs_CZ     │ cs_CZ │</tt><br /><tt> postgres      │ peter │ LATIN2   │ cs_CZ     │ cs_CZ │</tt><br /><tt>
template0    │ peter │ LATIN2   │ cs_CZ     │ cs_CZ │ =c/peter</tt><br /><tt>               ╷       ╷         
╷          ╷       ╎ peter=CTc/peter</tt><br /><tt> template1     │ peter │ LATIN2   │ cs_CZ     │ cs_CZ │
=c/peter</tt><br/><tt>               ╷       ╷          ╷           ╷       ╎ peter=CTc/peter</tt><br /><tt>(4
rows)</tt><br/><br /> This looks more like a rendering mistake than something sensible, and also it doesn't actually
helpthe viewer to tell what lines are continued, which was the original purpose.<br /><br /> On a side note, I don't
reallyunderstand why the access privileges need to be broken up over multiple lines.  There is plenty of space left on
theline.<br /><br /> Note that the above is close to a default setup, so that is what many people will see by default
fromnow on.  

Re: Unicode UTF-8 table formatting for psql text output

From
Roger Leigh
Date:
On Fri, Oct 16, 2009 at 01:38:15PM +0300, Peter Eisentraut wrote:
> (HTML mail to preserve formatting; let's see if it works.)
>
> I like the new Unicode tables, but the marking of continuation lines
> looks pretty horrible:

Yes, I'm not so keen myself.  The ASCII characters used are '|', ':' and
' ' for normal, wrapped and newlines.  Here, we are using vertical lines
with two or three dashes to replace the latter two.

As mentioned earlier in the thread, this patch will be followed up
with a futher patch to unify the folding/newline behaviour between
column header and data lines, which will replace these with normal
vertical lines in all cases, with the addition of characters in the
margin such as as a CR symbol or ellipsis to indicate newlines or
wrapping.  To preserve backward compatibility, the ASCII output will
remain unchanged.

I would have done this in the patch as applied, but was asked to hold
off doing that until the initial work was committed.

> Note that the above is close to a default setup, so that is what many
> people will see by default from now on.

I'll try to get the above proposed change done in the next week or so,
in time for the next CommitFest, if you agree with the general idea.


Regards,
Roger

--  .''`.  Roger Leigh: :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/`. `'   Printing on
GNU/Linux?      http://gutenprint.sourceforge.net/  `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail. 

Re: Unicode UTF-8 table formatting for psql text output

From
Roger Leigh
Date:
On Fri, Oct 16, 2009 at 01:38:15PM +0300, Peter Eisentraut wrote:
> I like the new Unicode tables, but the marking of continuation lines
> looks pretty horrible:
>
>                              List of databases
>      Name      │ Owner │ Encoding │ Collation │ Ctype │ Access
> privileges
> ───────────────┼───────┼──────────┼───────────┼───────┼───────────────────
>  pl_regression │ peter │ LATIN2   │ cs_CZ     │ cs_CZ │
>  postgres      │ peter │ LATIN2   │ cs_CZ     │ cs_CZ │
>  template0     │ peter │ LATIN2   │ cs_CZ     │ cs_CZ │ =c/peter
>                ╷       ╷          ╷           ╷       ╎ peter=CTc/peter
>  template1     │ peter │ LATIN2   │ cs_CZ     │ cs_CZ │ =c/peter
>                ╷       ╷          ╷           ╷       ╎ peter=CTc/peter
> (4 rows)
>
> This looks more like a rendering mistake than something sensible, and
> also it doesn't actually help the viewer to tell what lines are
> continued, which was the original purpose.

I've worked on a solution to this, and the preliminary patch for this
is attached.  Note there are additional comments in the patch text.

This patch does break backward compatibility with older psql versions,
by using symbols in the column border to mark newlines and wrapped
lines rather than putting ':' and ';' and ' ' symbols in the vertical
table lines.  This makes things somewhat more consistent and readable
but at the expense of not perfectly preserving output.  The ASCII
rules are a little more compatible than the Unicode rules, but both
do break things a little.  If the data lines do not contain either
newlines or wrapped text, then the output will remain unchanged.

Any feedback would be appreciated.


Regards,
Roger

--
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.

Attachment

Re: Unicode UTF-8 table formatting for psql text output

From
Roger Leigh
Date:
On Sat, Oct 24, 2009 at 06:23:24PM +0100, Roger Leigh wrote:
> On Fri, Oct 16, 2009 at 01:38:15PM +0300, Peter Eisentraut wrote:
> > I like the new Unicode tables, but the marking of continuation lines
> > looks pretty horrible:
> >
> >                              List of databases
> >      Name      │ Owner │ Encoding │ Collation │ Ctype │ Access
> > privileges
> > ───────────────┼───────┼──────────┼───────────┼───────┼───────────────────
> >  pl_regression │ peter │ LATIN2   │ cs_CZ     │ cs_CZ │
> >  postgres      │ peter │ LATIN2   │ cs_CZ     │ cs_CZ │
> >  template0     │ peter │ LATIN2   │ cs_CZ     │ cs_CZ │ =c/peter
> >                ╷       ╷          ╷           ╷       ╎ peter=CTc/peter
> >  template1     │ peter │ LATIN2   │ cs_CZ     │ cs_CZ │ =c/peter
> >                ╷       ╷          ╷           ╷       ╎ peter=CTc/peter
> > (4 rows)

Just for reference, this is what the output looks like (abridged)
using the attached patch.  Should display fine if your mail client handles
UTF-8 messages correctly:

rleigh=# \l
                                     List of databases
      Name       │  Owner   │ Encoding │  Collation  │    Ctype    │   Access privileges
─────────────────┼──────────┼──────────┼─────────────┼─────────────┼───────────────────────
 merkelpb        │ rleigh   │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
 postgres        │ postgres │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
 projectb        │ rleigh   │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
 rleigh          │ rleigh   │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
[…]
 template0       │ postgres │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │ =c/postgres          ↵
                 │          │          │             │             │ postgres=CTc/postgres
 template1       │ postgres │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │ =c/postgres          ↵
                 │          │          │             │             │ postgres=CTc/postgres
[…]
(17 rows)

> > This looks more like a rendering mistake than something sensible, and
> > also it doesn't actually help the viewer to tell what lines are
> > continued, which was the original purpose.
>
> I've worked on a solution to this, and the preliminary patch for this
> is attached.  Note there are additional comments in the patch text.

I've attached an updated version of the patch.  This now
- adds one column extra padding in wrapped mode if border=0 to allow
  correct display of wrap marks on the rightmost column.  Removed
  special cases needed to suppress marks on the rightmost column.
- updated unicode symbols to use more commonly-available symbol
  (ellipsis) rather than hooked arrows.  Also makes wrapping more
  visually distinct cf. newlines.  The CR and ellipsis symbols are
  commonly available in several character sets other than unicode,
  and are available in many common fonts, including all the Windows
  monospaced terminal fonts.  Symbol availability isn't an issue on
  GNU/Linux.
- updated ASCII symbols to use similar-looking symbols to the Unicode
  ones.
- added some more comments to code
- removed redundant enum values

I've also attached a trivial test script to run through psql to test
the wrapping, as well as the output I get for psql from 8.4.1 and
the current mainline (both Unicode and ASCII) for comparison.
Hopefully I'm not the only one that finds the proposed change clearer
and more logical than the existing display!


There's just one tiny display glitch I can see, and that's if you have
mixed wrapping and newlines, you miss the lefthand wrap mark if the line
is the last wrapped line and it ends in a newline.  It might not be
possible to pick that up if we can't discover that from the line state
when printing out the line.  Might possibly require storing the wrap
state so we know what happened on the previous line, though if there's
a slightly cleverer approach to detecting if we're already wrapped
that would be better.


Regards,
Roger

--
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.

Attachment

Re: Unicode UTF-8 table formatting for psql text output

From
Roger Leigh
Date:
On Sun, Oct 25, 2009 at 11:48:27PM +0000, Roger Leigh wrote:
> There's just one tiny display glitch I can see, and that's if you have
> mixed wrapping and newlines, you miss the lefthand wrap mark if the line
> is the last wrapped line and it ends in a newline.  It might not be
> possible to pick that up if we can't discover that from the line state
> when printing out the line.  Might possibly require storing the wrap
> state so we know what happened on the previous line, though if there's
> a slightly cleverer approach to detecting if we're already wrapped
> that would be better.

This appears simpler and more robust, so I've gone with this approach,
and attached a new patch which fixes it.


Regards,
Roger

--
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.

Attachment

Re: Unicode UTF-8 table formatting for psql text output

From
Peter Eisentraut
Date:
On sön, 2009-10-25 at 23:48 +0000, Roger Leigh wrote:
> Just for reference, this is what the output looks like (abridged)
> using the attached patch.  Should display fine if your mail client handles
> UTF-8 messages correctly:
> 
> rleigh=# \l
>                                      List of databases
>       Name       │  Owner   │ Encoding │  Collation  │    Ctype    │   Access privileges
> ─────────────────┼──────────┼──────────┼─────────────┼─────────────┼───────────────────────
>  merkelpb        │ rleigh   │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
>  postgres        │ postgres │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
>  projectb        │ rleigh   │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
>  rleigh          │ rleigh   │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
> […]
>  template0       │ postgres │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │ =c/postgres          ↵
>                  │          │          │             │             │ postgres=CTc/postgres
>  template1       │ postgres │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │ =c/postgres          ↵
>                  │          │          │             │             │ postgres=CTc/postgres
> […]
> (17 rows)

That's pretty much what I had in mind.  Cool.



Re: Unicode UTF-8 table formatting for psql text output

From
Greg Stark
Date:
2009/10/25 Roger Leigh <rleigh@codelibre.net>:
> rleigh=# \l
>                                     List of databases
>      Name       │  Owner   │ Encoding │  Collation  │    Ctype    │   Access privileges
> ─────────────────┼──────────┼──────────┼─────────────┼─────────────┼───────────────────────
>  merkelpb        │ rleigh   │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
>  postgres        │ postgres │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
>  projectb        │ rleigh   │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
>  rleigh          │ rleigh   │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
> […]
>  template0       │ postgres │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │ =c/postgres          ↵
>                 │          │          │             │             │ postgres=CTc/postgres
>  template1       │ postgres │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │ =c/postgres          ↵
>                 │          │          │             │             │ postgres=CTc/postgres
> […]
> (17 rows)
>


While i agree this looks nicer I wonder what it does to things like
excel/gnumeric/ooffice auto-recognizing table layouts and importing
files. I'm not sure our old format was so great for this so maybe this
is actually an improvement I'm asking for. But as long as we're
changing the format... It would at at least be good to test the
behaviour



--
greg

Re: Unicode UTF-8 table formatting for psql text output

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> While i agree this looks nicer I wonder what it does to things like
> excel/gnumeric/ooffice auto-recognizing table layouts and importing
> files. I'm not sure our old format was so great for this so maybe this
> is actually an improvement I'm asking for.

Yeah.  We can do what we like with the UTF8 format but I'm considerably
more worried about the aspect of making random changes to the
plain-ASCII output.  On the other hand, we changed that just a release
or so ago (to put in the multiline output in the first place) and
I didn't hear complaints about it that time.
        regards, tom lane


Re: Unicode UTF-8 table formatting for psql text output

From
Roger Leigh
Date:
On Mon, Oct 26, 2009 at 01:33:19PM -0400, Tom Lane wrote:
> Greg Stark <gsstark@mit.edu> writes:
> > While i agree this looks nicer I wonder what it does to things like
> > excel/gnumeric/ooffice auto-recognizing table layouts and importing
> > files. I'm not sure our old format was so great for this so maybe this
> > is actually an improvement I'm asking for.
>
> Yeah.  We can do what we like with the UTF8 format but I'm considerably
> more worried about the aspect of making random changes to the
> plain-ASCII output.  On the other hand, we changed that just a release
> or so ago (to put in the multiline output in the first place) and
> I didn't hear complaints about it that time.

I checked (using strace)

gnumeric (via libgda and gnome-database-properties)
openoffice (oobase)

Both spreadsheets require a connection to be set up first for them to
use as a handle, so I did that and traced from there.  Neither made
any use of psql; they both appear to use libpq via their respective
database abstraction libs--no forking of any children observed.

Excel is a bit tougher, I bought my first copy last week for other
reasons, but I lack both windows expertise and debugging tools to trace
things, and I also dual boot my computer with the postgres install on
the Linux partition, making connecting to the database rather hard!  I
think someone else is better suited to check this one!


On a related note, there's something odd with the pager code.  The output
of \l with the pager off:

rleigh=# \l                                    List of databases     Name       │  Owner   │ Encoding │  Collation  │
Ctype    │   Access privileges 
─────────────────┼──────────┼──────────┼─────────────┼─────────────┼───────────────────────
[...]

(header line is 91 characters, 273 bytes)

And with the pager on:

rleigh=# \l                                    List of databases     Name       │  Owner   │ Encoding │  Collation  │
Ctype    │   Access privileges    
─────────────────┼──────────┼──────────┼─────────────┼─────────────┼─────────────────
��─────
[...]

(longest header line 85 characters, 255 bytes, 256 bytes inc. LF,
remainder on second line)

Note that the pager wasn't required and so wasn't actually invoked, but
the output was corrupted.  A newline was inserted almost at the end of
the line and the continuation lacks a leading \342 which (since these
UTF-8 codes are all three-byte) leads to two bytes which are invalid
UTF-8.  Since this spurious newline got inserted exactly on a 256 byte
boundary, I was wondering if there was some buffer either internal to
psql or in the termios/pty layer that was getting flushed.  It also
lost the first byte of the second line (possibly swapped for the \n).

Another wierdness: it only happens if the terminal width is > 85
columns wide, otherwise it just wraps around as one would expect!
AFAICT there are no 255/256 length buffers in the code, and the code
doing the printing is just doing stdio to fout which is either stdout
or a pipe!  Because of this, I can't see how the spurious \n appears
in the middle of a simple loop.  If border=2, you'll see this for all
top mid and bottom ruled lines.

I do see strace showing some termios fiddling, could that be at fault
or is that just readline ncurses initialisation?


Regards,
Roger

--  .''`.  Roger Leigh: :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/`. `'   Printing on
GNU/Linux?      http://gutenprint.sourceforge.net/  `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail. 

Re: Unicode UTF-8 table formatting for psql text output

From
Tom Lane
Date:
Roger Leigh <rleigh@codelibre.net> writes:
> On Mon, Oct 26, 2009 at 01:33:19PM -0400, Tom Lane wrote:
>> Yeah.  We can do what we like with the UTF8 format but I'm considerably
>> more worried about the aspect of making random changes to the
>> plain-ASCII output.

> I checked (using strace)
> gnumeric (via libgda and gnome-database-properties)
> openoffice (oobase)

Even if that were the entire universe of programs we cared about,
whether their internal ODBC logic goes through psql isn't really
the point here.  What I'm worried about is somebody piping the
text output of psql into another program.

> On a related note, there's something odd with the pager code.

Hm, what platform are you testing that on?
        regards, tom lane


Re: Unicode UTF-8 table formatting for psql text output

From
Roger Leigh
Date:
On Mon, Oct 26, 2009 at 07:19:24PM -0400, Tom Lane wrote:
> Roger Leigh <rleigh@codelibre.net> writes:
> > On Mon, Oct 26, 2009 at 01:33:19PM -0400, Tom Lane wrote:
> >> Yeah.  We can do what we like with the UTF8 format but I'm considerably
> >> more worried about the aspect of making random changes to the
> >> plain-ASCII output.
>
> > I checked (using strace)
> > gnumeric (via libgda and gnome-database-properties)
> > openoffice (oobase)
>
> Even if that were the entire universe of programs we cared about,
> whether their internal ODBC logic goes through psql isn't really
> the point here.  What I'm worried about is somebody piping the
> text output of psql into another program.
>
> > On a related note, there's something odd with the pager code.
>
> Hm, what platform are you testing that on?

Debian GNU/Linux (unstable)
linux        2.6.30
eglibc       2.10.1
libreadline6 6.0.5
libncurses5  5.7
gcc          4.3.4

This is the trace of the broken write:

16206 write(1, "      Name       \342\224\202  Owner   \342\224"..., 102) = 102
16206 write(1,
"\342\224\200\342\224\200\342\224\200\342\224\200\342\224\200\342\224\200\342\224\200\342\224\200\342\224\200\342\224\200\342
\224"..., 256) = 256
16206 write(1, "\224\200\342\224\200\342\224\200\342\224\200\342\224\200\342\224\200\n", 18) = 18

I'll attach the whole thing for reference.  What's clear is that the
first write was *exactly* 256 bytes, which is what was requested,
presumably by libc stdio buffering (which shouldn't by itself be a
problem).  Since we use 3-byte UTF-8 and 256/3 is 85 + 1 remainder,
this is where the wierd 85 char forced newline comes from.  Since it
only happens when the terminal window is >85 chars, that's where I'm
assuming some odd termios influence comes from (though it might just
be the source of the window size and be completely innocent).  The
fact that libc did the two separate writes kind of rules out termios
mangling the output post-write().


Regards,
Roger

--
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.

Attachment

Re: Unicode UTF-8 table formatting for psql text output

From
Peter Eisentraut
Date:
On Mon, 2009-10-26 at 10:12 -0700, Greg Stark wrote:
> While i agree this looks nicer I wonder what it does to things like
> excel/gnumeric/ooffice auto-recognizing table layouts and importing
> files. I'm not sure our old format was so great for this so maybe this
> is actually an improvement I'm asking for. But as long as we're
> changing the format... It would at at least be good to test the
> behaviour

What exactly are you referring to here?



Re: Unicode UTF-8 table formatting for psql text output

From
Roger Leigh
Date:
On Mon, Oct 26, 2009 at 11:33:40PM +0000, Roger Leigh wrote:
> On Mon, Oct 26, 2009 at 07:19:24PM -0400, Tom Lane wrote:
> > Roger Leigh <rleigh@codelibre.net> writes:
> > > On Mon, Oct 26, 2009 at 01:33:19PM -0400, Tom Lane wrote:
> > >> Yeah.  We can do what we like with the UTF8 format but I'm considerably
> > >> more worried about the aspect of making random changes to the
> > >> plain-ASCII output.
> >
> > > I checked (using strace)
> > > gnumeric (via libgda and gnome-database-properties)
> > > openoffice (oobase)
> >
> > Even if that were the entire universe of programs we cared about,
> > whether their internal ODBC logic goes through psql isn't really
> > the point here.  What I'm worried about is somebody piping the
> > text output of psql into another program.
> >
> > > On a related note, there's something odd with the pager code.
> >
> > Hm, what platform are you testing that on?
>
> Debian GNU/Linux (unstable)
> linux        2.6.30
> eglibc       2.10.1
> libreadline6 6.0.5
> libncurses5  5.7
> gcc          4.3.4
>
> This is the trace of the broken write:
>
> 16206 write(1, "      Name       \342\224\202  Owner   \342\224"..., 102) = 102
> 16206 write(1,
"\342\224\200\342\224\200\342\224\200\342\224\200\342\224\200\342\224\200\342\224\200\342\224\200\342\224\200\342\224\200\342
> \224"..., 256) = 256
> 16206 write(1, "\224\200\342\224\200\342\224\200\342\224\200\342\224\200\342\224\200\n", 18) = 18

Further tracing showed this to be a bug in the "util-linux" version of
"more" which had a static 256 byte line buffer.  The above was a red
herring--it's writing to a pipe.  I've sent a patch to fix this by
increasing the buffer size.


--  .''`.  Roger Leigh: :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/`. `'   Printing on
GNU/Linux?      http://gutenprint.sourceforge.net/  `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail. 

Re: Unicode UTF-8 table formatting for psql text output

From
Roger Leigh
Date:
On Mon, Oct 26, 2009 at 01:33:19PM -0400, Tom Lane wrote:
> Greg Stark <gsstark@mit.edu> writes:
> > While i agree this looks nicer I wonder what it does to things like
> > excel/gnumeric/ooffice auto-recognizing table layouts and importing
> > files. I'm not sure our old format was so great for this so maybe this
> > is actually an improvement I'm asking for.
>
> Yeah.  We can do what we like with the UTF8 format but I'm considerably
> more worried about the aspect of making random changes to the
> plain-ASCII output.  On the other hand, we changed that just a release
> or so ago (to put in the multiline output in the first place) and
> I didn't hear complaints about it that time.

The attached updated patch makes sure that the ASCII display remains
the same (bar trailing whitespace used for padding).

I'm tempted to add an additional ascii format such as "ascii-clean"
which cleans up the inconsistencies in the formatting as for the
unicode format, while the existing ascii format would remain the
default for backwards compatibility.


Regards,
Roger

--
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.

Attachment

Re: Unicode UTF-8 table formatting for psql text output

From
Greg Stark
Date:
On Mon, Oct 26, 2009 at 11:43 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> On Mon, 2009-10-26 at 10:12 -0700, Greg Stark wrote:
>> While i agree this looks nicer I wonder what it does to things like
>> excel/gnumeric/ooffice auto-recognizing table layouts and importing
>> files. I'm not sure our old format was so great for this so maybe this
>> is actually an improvement I'm asking for. But as long as we're
>> changing the format... It would at at least be good to test the
>> behaviour
>
> What exactly are you referring to here?

run something like this:

$ psql
stark=> \o /tmp/s
stark=> select generate_series(1,10),generate_series(1,5);
$ gnumeric /tmp/s&
$ ooffice /tmp/s&
$ kspread /tmp/s&

With the 8.4 formatting gnumeric automatically guesses that | is the
separator and formats the speadsheet quite reasonably. Open Office
gets confused and opens the word processor, but if you do "insert
sheet from file" and manually deselect the space and semicolon
delimiters and put | as an "other" delimiter then it looks like it
should work. I don't have kspread handy.

Does gnumeric still autorecognize the new formats? Do the newline
indicators in 8.4 mess up gnumeric? Are the new ones better or worse?

This hasn't been a top priority in the past and the ReST discussion
seemed to end up concluding that we shouldn't bother if we can't make
it perfect. I'm not sure I agree with that, but in any case I think as
long as we're changing the format we may as well check to see what the
status is.



-- 
greg


Re: Unicode UTF-8 table formatting for psql text output

From
Roger Leigh
Date:
On Sat, Oct 31, 2009 at 05:11:10AM -0700, Greg Stark wrote:
> On Mon, Oct 26, 2009 at 11:43 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> > On Mon, 2009-10-26 at 10:12 -0700, Greg Stark wrote:
> >> While i agree this looks nicer I wonder what it does to things like
> >> excel/gnumeric/ooffice auto-recognizing table layouts and importing
> >> files. I'm not sure our old format was so great for this so maybe this
> >> is actually an improvement I'm asking for. But as long as we're
> >> changing the format... It would at at least be good to test the
> >> behaviour
> >
> > What exactly are you referring to here?
>
> run something like this:
>
> $ psql
> stark=> \o /tmp/s
> stark=> select generate_series(1,10),generate_series(1,5);
> $ gnumeric /tmp/s&
> $ ooffice /tmp/s&
> $ kspread /tmp/s&
>
> With the 8.4 formatting gnumeric automatically guesses that | is the
> separator and formats the speadsheet quite reasonably. Open Office
> gets confused and opens the word processor, but if you do "insert
> sheet from file" and manually deselect the space and semicolon
> delimiters and put | as an "other" delimiter then it looks like it
> should work. I don't have kspread handy.
>
> Does gnumeric still autorecognize the new formats? Do the newline
> indicators in 8.4 mess up gnumeric? Are the new ones better or worse?
>
> This hasn't been a top priority in the past and the ReST discussion
> seemed to end up concluding that we shouldn't bother if we can't make
> it perfect. I'm not sure I agree with that, but in any case I think as
> long as we're changing the format we may as well check to see what the
> status is.

Surely if people want a machine-readable output format, they should
either

1) use libpq or one of its bindings, or
2) use a dedicated machine-readable output format such as CSV, which  is /designed/ for spreadsheet import.

The standard psql output formats (aligned, unaligned) are for
human-readable output and the others (latex, html, troff-ms) are
marked up for the respective tools.  None of these are really
useful for other programs to parse.

Wouldn't it be much simpler all around to add a "csv" output format
in addition to the above for this purpose?  Spreadsheets can read
it in with no trouble at all.


Regards,
Roger

--  .''`.  Roger Leigh: :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/`. `'   Printing on
GNU/Linux?      http://gutenprint.sourceforge.net/  `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail. 

Re: Unicode UTF-8 table formatting for psql text output

From
Andrew Dunstan
Date:

Roger Leigh wrote:
>
> Wouldn't it be much simpler all around to add a "csv" output format
> in addition to the above for this purpose?  Spreadsheets can read
> it in with no trouble at all.
>
>
>
>   

We've had CSV output since version 8.0.

cheers

andrew


Re: Unicode UTF-8 table formatting for psql text output

From
Roger Leigh
Date:
On Sat, Oct 31, 2009 at 12:25:22PM -0400, Andrew Dunstan wrote:
>
>
> Roger Leigh wrote:
>>
>> Wouldn't it be much simpler all around to add a "csv" output format
>> in addition to the above for this purpose?  Spreadsheets can read
>> it in with no trouble at all.
>
> We've had CSV output since version 8.0.

Really?  The only references I see are in tab-complete.c relating to
COPY.

You can set the field separator to ',' but you can't do a \pset format csv
and get CSV with correct quoting, escaping etc AFAICS.  It'll
still break on line wrapping if wrapping is enabled, and with
newlines in the data.

If that would be a useful addition, I can add it.


Regards,
Roger

--  .''`.  Roger Leigh: :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/`. `'   Printing on
GNU/Linux?      http://gutenprint.sourceforge.net/  `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail. 

Re: Unicode UTF-8 table formatting for psql text output

From
Andrew Dunstan
Date:

Roger Leigh wrote:
> On Sat, Oct 31, 2009 at 12:25:22PM -0400, Andrew Dunstan wrote:
>   
>> Roger Leigh wrote:
>>     
>>> Wouldn't it be much simpler all around to add a "csv" output format
>>> in addition to the above for this purpose?  Spreadsheets can read
>>> it in with no trouble at all.
>>>       
>> We've had CSV output since version 8.0.
>>     
>
> Really?  The only references I see are in tab-complete.c relating to
> COPY.
>
> You can set the field separator to ',' but you can't do a
>   \pset format csv
> and get CSV with correct quoting, escaping etc AFAICS.  It'll
> still break on line wrapping if wrapping is enabled, and with
> newlines in the data.
>
> If that would be a useful addition, I can add it.
>
>
>
>   

It's done by the backend, not by psql, but it has psql support - see \copy

cheers

andrew


Re: Unicode UTF-8 table formatting for psql text output

From
Bruce Momjian
Date:
Tom Lane wrote:
> Greg Stark <gsstark@mit.edu> writes:
> > While i agree this looks nicer I wonder what it does to things like
> > excel/gnumeric/ooffice auto-recognizing table layouts and importing
> > files. I'm not sure our old format was so great for this so maybe this
> > is actually an improvement I'm asking for.
> 
> Yeah.  We can do what we like with the UTF8 format but I'm considerably
> more worried about the aspect of making random changes to the
> plain-ASCII output.  On the other hand, we changed that just a release
> or so ago (to put in the multiline output in the first place) and
> I didn't hear complaints about it that time.

Sorry for the delayed reply:

The line continuation characters were chosen in 8.4 for clarity --- if
you have found something clearer for 8.5, let's make the improvement.  I
think clarity is more important in this area than consistency with the
previous psql output format.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: Unicode UTF-8 table formatting for psql text output

From
Alvaro Herrera
Date:
Andrew Dunstan wrote:

> >You can set the field separator to ',' but you can't do a
> >  \pset format csv
> >and get CSV with correct quoting, escaping etc AFAICS.  It'll
> >still break on line wrapping if wrapping is enabled, and with
> >newlines in the data.
> >
> >If that would be a useful addition, I can add it.
> 
> It's done by the backend, not by psql, but it has psql support - see \copy

I agree with Roger that we should really have a CSV option in \pset format.
COPY as CVS is certainly very useful, but it's a different use case.
(You can't \copy a \-command for example).

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Unicode UTF-8 table formatting for psql text output

From
Roger Leigh
Date:
On Mon, Nov 09, 2009 at 05:40:54PM -0500, Bruce Momjian wrote:
> Tom Lane wrote:
> > Greg Stark <gsstark@mit.edu> writes:
> > > While i agree this looks nicer I wonder what it does to things like
> > > excel/gnumeric/ooffice auto-recognizing table layouts and importing
> > > files. I'm not sure our old format was so great for this so maybe this
> > > is actually an improvement I'm asking for.
> >
> > Yeah.  We can do what we like with the UTF8 format but I'm considerably
> > more worried about the aspect of making random changes to the
> > plain-ASCII output.  On the other hand, we changed that just a release
> > or so ago (to put in the multiline output in the first place) and
> > I didn't hear complaints about it that time.
>
> Sorry for the delayed reply:
>
> The line continuation characters were chosen in 8.4 for clarity --- if
> you have found something clearer for 8.5, let's make the improvement.  I
> think clarity is more important in this area than consistency with the
> previous psql output format.

The attached patch is proposed for the upcoming commitfest, and
hopefully takes into account the comments made in this thread.
To summarise the changes:

- it makes the handling of newlines and wrapped lines consistent
  between column header and data lines.
- it includes additional logic such that both the "old" and "new"
  styles are representable using the format structures, so we
  can retain backward compatibility if so desired (it's easy to
  remove if not).
- an "ascii-old" linestyle is added which is identical to the old
  style for those who need guaranteed reproducibility of output,
  but this is not the default.
- The Unicode format uses "↵" in the right-hand margin to indicate
  newlines.  Wrapped lines use "…" in the right-hand margin before,
  and left-hand margin after, a break (so you can visually follow
  the wrap).
- The ASCII format is the same but uses "+" and "." in place of
  carriage return and ellipsis symbols.
- All the above is documented in the SGML documentation, including
  the old style, which I always found confusing.

For comparison, I've included a transcript of all three linestyles
with a test script (also attached).

Any changes to the symbols used and/or their placement are trivially
made by just altering the format in print.c.


Regards,
Roger

--
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.

Attachment

Re: Unicode UTF-8 table formatting for psql text output

From
Roger Leigh
Date:
On Sat, Nov 14, 2009 at 05:40:24PM +0000, Roger Leigh wrote:
> On Mon, Nov 09, 2009 at 05:40:54PM -0500, Bruce Momjian wrote:
> > Tom Lane wrote:
> > > Greg Stark <gsstark@mit.edu> writes:
> > > > While i agree this looks nicer I wonder what it does to things like
> > > > excel/gnumeric/ooffice auto-recognizing table layouts and importing
> > > > files. I'm not sure our old format was so great for this so maybe this
> > > > is actually an improvement I'm asking for.
> > >
> > > Yeah.  We can do what we like with the UTF8 format but I'm considerably
> > > more worried about the aspect of making random changes to the
> > > plain-ASCII output.  On the other hand, we changed that just a release
> > > or so ago (to put in the multiline output in the first place) and
> > > I didn't hear complaints about it that time.
> >
> > Sorry for the delayed reply:
> >
> > The line continuation characters were chosen in 8.4 for clarity --- if
> > you have found something clearer for 8.5, let's make the improvement.  I
> > think clarity is more important in this area than consistency with the
> > previous psql output format.
>
> The attached patch is proposed for the upcoming commitfest, and
> hopefully takes into account the comments made in this thread.

One note: because it's not possible to know in advance (without
making things much more complex) whether or not a line will wrap
or continue, the code currently makes sure we fully pad output
up to the right margin of the table (finalspaces).  This is in
some cases unnecessary, but is needed when border=1 or else the
continuation/wrap symbols don't end up in the margin and will
look like they are part of the data.

The side effect from this change is that some of the testsuite
expected data will need updating due to the extra pad spaces
now added to the output (triggers, dependency, tsearch,
foreign_data, prepare, with).  If the actual output format is
OK with people then I'll update the patch to include the test
data updates as well.


Regards,
Roger

--  .''`.  Roger Leigh: :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/`. `'   Printing on
GNU/Linux?      http://gutenprint.sourceforge.net/  `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail. 

Re: Unicode UTF-8 table formatting for psql text output

From
Tom Lane
Date:
Roger Leigh <rleigh@codelibre.net> writes:
> The side effect from this change is that some of the testsuite
> expected data will need updating due to the extra pad spaces

No, we are *not* doing that.  Somebody made a change to the print.c
logic last year that started adding "harmless" white space to the
last column, and it was a complete disaster for tracking whether
anything important had changed in regression test output.  Please
undo that part of your patch.
        regards, tom lane


Re: Unicode UTF-8 table formatting for psql text output

From
Roger Leigh
Date:
On Sat, Nov 14, 2009 at 01:31:29PM -0500, Tom Lane wrote:
> Roger Leigh <rleigh@codelibre.net> writes:
> > The side effect from this change is that some of the testsuite
> > expected data will need updating due to the extra pad spaces
>
> No, we are *not* doing that.  Somebody made a change to the print.c
> logic last year that started adding "harmless" white space to the
> last column, and it was a complete disaster for tracking whether
> anything important had changed in regression test output.  Please
> undo that part of your patch.

No problem, done as requested.  I've attached an updated patch that
takes care to exactly match the trailing whitespace the existing
psql outputs.  This fixes most of the changes between observed and
expected test results.

Due to the fact that this patch does alter the output for newlines
and wrapped lines (being its intent), the patch does alter expected
testcase output for these specific cases.  Because the old
wrap/newline code put ":" and ";" in place of "|" between columns,
this meant that it never worked for the first column of data, which
included single column result sets.  This necessitated some changes
to the expected results to reflect this change (which now makes the
output uniform for all columns, irrespective of position).


Regards,
Roger

--
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.

Attachment

Re: Unicode UTF-8 table formatting for psql text output

From
Roger Leigh
Date:
On Sun, Nov 15, 2009 at 12:50:14AM +0000, Roger Leigh wrote:
> On Sat, Nov 14, 2009 at 01:31:29PM -0500, Tom Lane wrote:
> > Roger Leigh <rleigh@codelibre.net> writes:
> > > The side effect from this change is that some of the testsuite
> > > expected data will need updating due to the extra pad spaces
> >
> > No, we are *not* doing that.  Somebody made a change to the print.c
> > logic last year that started adding "harmless" white space to the
> > last column, and it was a complete disaster for tracking whether
> > anything important had changed in regression test output.  Please
> > undo that part of your patch.
>
> No problem, done as requested.  I've attached an updated patch that
> takes care to exactly match the trailing whitespace the existing
> psql outputs.  This fixes most of the changes between observed and
> expected test results.

Attached is an updated patch with a couple of tweaks to ensure output
is formatted and spaced correctly when border=0, which was off in the
last patch.


Regards,
Roger

--
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.

Attachment

Re: Unicode UTF-8 table formatting for psql text output

From
Tom Lane
Date:
Roger Leigh <rleigh@codelibre.net> writes:
> Attached is an updated patch with a couple of tweaks to ensure output
> is formatted and spaced correctly when border=0, which was off in the
> last patch.

Applied wih minor editorialization.  Notably, I renamed the
backwards-compatible option from "ascii-old" to "old-ascii",
because the original submission failed to preserve the documented
behavior that the options could be abbreviated to one letter.
        regards, tom lane


Re: Unicode UTF-8 table formatting for psql text output

From
Peter Eisentraut
Date:
On sön, 2009-11-22 at 00:23 -0500, Tom Lane wrote:
> Roger Leigh <rleigh@codelibre.net> writes:
> > Attached is an updated patch with a couple of tweaks to ensure output
> > is formatted and spaced correctly when border=0, which was off in the
> > last patch.
> 
> Applied wih minor editorialization.  Notably, I renamed the
> backwards-compatible option from "ascii-old" to "old-ascii",
> because the original submission failed to preserve the documented
> behavior that the options could be abbreviated to one letter.

What is the plan behind keeping the old format?  Are we going to remove
it after one release if no one complains, or are we seriously expecting
that someone has code that actually parses this?



Re: Unicode UTF-8 table formatting for psql text output

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> What is the plan behind keeping the old format?  Are we going to remove
> it after one release if no one complains, or are we seriously expecting
> that someone has code that actually parses this?

Plan?  Do we need a plan?  The extra support consists of about two lines
of code and a small table, so there wouldn't be much harm in leaving
it there forever.

On the other hand it seems pretty likely that no one would care after a
release or two.
        regards, tom lane