Thread: [PATCH] Formatting patch for psql

[PATCH] Formatting patch for psql

From
Martijn van Oosterhout
Date:
Hi,

Here is my patch to make psql handle multi-line output sanely. Instead
of embedded newlines splattering your output across the screen,
everything gets indented to the right column.

It basically works by extending the *_width functions to instead work
out the estimate width, height and formatting space of each cell. Then
the output goes through each cell formatting as it goes and outputting
lines at the right places.

I imagine it will be tweaked by others because everybody has different
ideas about how they want the output done. This just has the minimum
extra formatting to make the output unambiguous.

Also available at: http://svana.org/kleptog/pgsql/psql-format.patch

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Attachment

Re: [PATCH] Formatting patch for psql

From
Peter Eisentraut
Date:
Martijn van Oosterhout wrote:
> Here is my patch to make psql handle multi-line output sanely.
> Instead of embedded newlines splattering your output across the
> screen, everything gets indented to the right column.

Shouldn't you use PQmblen() to determine whether an encoding is
single-byte, rather than having an enumerated list?

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: [PATCH] Formatting patch for psql

From
Martijn van Oosterhout
Date:
On Thu, Nov 24, 2005 at 03:45:12PM +0100, Peter Eisentraut wrote:
> Martijn van Oosterhout wrote:
> > Here is my patch to make psql handle multi-line output sanely.
> > Instead of embedded newlines splattering your output across the
> > screen, everything gets indented to the right column.
>
> Shouldn't you use PQmblen() to determine whether an encoding is
> single-byte, rather than having an enumerated list?

Probably. I went through psql to make sure there wasn't a function that
did it, I didn't think of checking for undocumented libpq functions.
Actually, I see there may be another there that would be nice:
PQdsplen.

Will look into it, thanks.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Attachment

Re: [PATCH] Formatting patch for psql

From
Martijn van Oosterhout
Date:
On Thu, Nov 24, 2005 at 03:45:12PM +0100, Peter Eisentraut wrote:
> Shouldn't you use PQmblen() to determine whether an encoding is
> single-byte, rather than having an enumerated list?

Actually, thinking about it the answer is no. The code is relying on
the fact that it's a charset that has ASCII as a subset. Unless someone
can say that every single-byte encoding we support has an ASCII subset
(EBCDIC being the obvious counter-example for not supported encodings)
then I think we have to stick to enumerating them.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Attachment

Re: [PATCH] Formatting patch for psql

From
Peter Eisentraut
Date:
Martijn van Oosterhout wrote:
> On Thu, Nov 24, 2005 at 03:45:12PM +0100, Peter Eisentraut wrote:
> > Shouldn't you use PQmblen() to determine whether an encoding is
> > single-byte, rather than having an enumerated list?
>
> Actually, thinking about it the answer is no. The code is relying on
> the fact that it's a charset that has ASCII as a subset.

Which code does that?

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: [PATCH] Formatting patch for psql

From
Martijn van Oosterhout
Date:
On Thu, Nov 24, 2005 at 11:35:11PM +0100, Peter Eisentraut wrote:
> Martijn van Oosterhout wrote:
> > On Thu, Nov 24, 2005 at 03:45:12PM +0100, Peter Eisentraut wrote:
> > > Shouldn't you use PQmblen() to determine whether an encoding is
> > > single-byte, rather than having an enumerated list?
> >
> > Actually, thinking about it the answer is no. The code is relying on
> > the fact that it's a charset that has ASCII as a subset.
>
> Which code does that?

Well, the code you're referring to: the two places where it enumerates
the encodings in my patch. That code basically understands \n as going
to the next line, but that's ASCII.

What I beleive you were suggesting was to replace the switch statement
with something like

if(PQmblen(...) == 1)
{ code for latin1 }
else if( encoding == UTF8 )
{ code for utf8 }
else
{etc}

And what I'm saying is that you can only do that if all single byte
encodings are ASCII compatable and I'm not prepared to say that...

So in the switch statement I listed the encodings I know to be ASCII
compatable and at some point someone who actually knows the other
encodings will explain what needs to be done.

Hope this helps,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Attachment

Re: [PATCH] Formatting patch for psql

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> And what I'm saying is that you can only do that if all single byte
> encodings are ASCII compatable and I'm not prepared to say that...

psql already assumes that; see the flex processor.

> So in the switch statement I listed the encodings I know to be ASCII
> compatable and at some point someone who actually knows the other
> encodings will explain what needs to be done.

Even if this were an appropriate solution, the formatting code is not
the place to put such knowledge...

            regards, tom lane

Re: [PATCH] Formatting patch for psql

From
Martijn van Oosterhout
Date:
On Thu, Nov 24, 2005 at 09:23:55PM -0500, Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > And what I'm saying is that you can only do that if all single byte
> > encodings are ASCII compatable and I'm not prepared to say that...
>
> psql already assumes that; see the flex processor.

Hmm, Ok, I didn't realise that.

> > So in the switch statement I listed the encodings I know to be ASCII
> > compatable and at some point someone who actually knows the other
> > encodings will explain what needs to be done.
>
> Even if this were an appropriate solution, the formatting code is not
> the place to put such knowledge...

It adds two functions to mbprint.c:

pg_wcssize   - To calculate the screen size required by the string (rows&cols)
pg_wcsformat - To actually format the text in the given lines

I agree that these should actually be operating system functions but we
can't rely on them. psql already has functions there for validating
UTF-8 and converting UTF-8 to UCS so it's in the right place in that
sense.

If you think it's better moved to libpq I can do that. There is a fair
bit of encoding code in there already but the information you need here
is specifically: is this char a control character and in particular, is
it a newline.

Most standard encoding implementations return a screen width of zero
for control characters, but we dont do that. I am hopeful once we have
ICU support we can do away with the enumeration since we can simply
read the Unicode Character database and get consistant results
everywhere for all encodings. But until that is done I think this is
the best we can do.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Attachment

Re: [PATCH] Formatting patch for psql

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> ... There is a fair
> bit of encoding code in there already but the information you need here
> is specifically: is this char a control character and in particular, is
> it a newline.

The appropriate test for that is just "ch == '\n'", in all the encodings
we support (although you do have to be careful that ch isn't a non-first
byte of a multibyte character).  If that's all you need then inventing
a bunch of additional infrastructure is really inappropriate.

            regards, tom lane

Re: [PATCH] Formatting patch for psql

From
Martijn van Oosterhout
Date:
On Fri, Nov 25, 2005 at 09:58:21AM -0500, Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > ... There is a fair
> > bit of encoding code in there already but the information you need here
> > is specifically: is this char a control character and in particular, is
> > it a newline.
>
> The appropriate test for that is just "ch == '\n'", in all the encodings
> we support (although you do have to be careful that ch isn't a non-first
> byte of a multibyte character).  If that's all you need then inventing
> a bunch of additional infrastructure is really inappropriate.

Well, not quite. What about \r (on windows in particular) or \t or
Unicode 0xFEFF (zero width non-breaking space)? For Latin-1 and other
locales based upon ASCII I use (ch < 32) for control characters, not
entirely correct but close.

For UTF-8 I use the function ucs_wcwidth() which already existed in
psql. It already knows what the control characters are, though I have
no idea how up-to-date its table is.

The reason I need to know whether a character is a control character or
not is so it can be printed as a \uNNNN string. Otherwise any of those
characters will destroy the alignment on the screen bringing us right
back to what psql does now: splatter across the screen when there are
control characters in the output.

I didn't add any new information to psql, I just took advantage of what
was already there. The comment about using PQdsplen() is good, but the
routines in libpq are totally inadequate, the utf8 function has a huge
FIXME next to it.

Would people prefer a patch that brought the libpq routines up to
scratch and have mbprint use that?

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Attachment

Re: [PATCH] Formatting patch for psql

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> Would people prefer a patch that brought the libpq routines up to
> scratch and have mbprint use that?

I would.

            regards, tom lane

Re: [PATCH] Formatting patch for psql

From
Martijn van Oosterhout
Date:
On Fri, Nov 25, 2005 at 11:08:45AM -0500, Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > Would people prefer a patch that brought the libpq routines up to
> > scratch and have mbprint use that?
>
> I would.

ACK. Will look into it.
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Attachment