Thread: client side syntax error localisation for psql (v1)

client side syntax error localisation for psql (v1)

From
Fabien COELHO
Date:
Dear patchers,

Please find attached my first attempt at providing a localisation
information on the client side for syntax errors in "psql". Basically, one
function is added which takes care of the burden. I put a lot of
comments in the function to try make it clear what is going on.

It validates for me against current cvs head.

I've also added new regression tests in "errors.sql".

Although I have taken great and painful care to deal with multi-byte
encodings, I'm not sure how I can validate that, as I don't have some
japanese terminal to play with.

Have a nice day,

-- 
Fabien Coelho - coelho@cri.ensmp.fr

Re: client side syntax error localisation for psql (v1)

From
Fabien COELHO
Date:
> Dear patchers,

Sorry, wrong list:-(

-- 
Fabien.


Re: client side syntax error localisation for psql (v1)

From
Tom Lane
Date:
Fabien COELHO <coelho@cri.ensmp.fr> writes:
> Please find attached my first attempt at providing a localisation
> information on the client side for syntax errors in "psql".

"Localisation" tends to mean something quite different around here.
I'd suggest renaming the function to something like ReportSyntaxErrorPosition.

Would it read better to add "..." at the front and/or back of the query
extract, when you have actually truncated text at that end?

The "on line N" bit seems just noise to me.
        regards, tom lane


Re: client side syntax error localisation for psql (v1)

From
Tom Lane
Date:
Fabien COELHO <coelho@cri.ensmp.fr> writes:
>> The "on line N" bit seems just noise to me.

> It depends.

I can see that it would be useful in a very large query.  Perhaps
include it only when the query has more than N lines, for some N
like three to five or so?

Another possibility is to keep the cursor as just "^", and bury the
line info in the query extract.  For instance:

Short single-line query, no truncation needed:
QUERY: SELECT * FROM foo WHRE bar;QUERY:                   ^

Truncation on the right just adds ..., no other change needed:
QUERY: SELECT * FROM foo WHRE lots-of-conditions...QUERY:                   ^

When you truncate on the left, count the number of newlines removed,
and emit "LINE n" if there were any:
QUERY: LINE 4: FROM foo WHRE lots-of-conditions...QUERY:                  ^
orQUERY: LINE 4: ...FROM foo WHRE lots-of-conditions...QUERY:                     ^

(So "LINE n" would never say "LINE 1", only larger line numbers.)

Here I'm imagining that the leading ... gets inserted when text has been
removed after the start of the indicated line.  So another possible
output format is
QUERY: ...FROM foo WHRE lots-of-conditions...QUERY:             ^

if you removed some text but not any newlines from the start of the
query.

I think this wouldn't be terribly complex to implement, and it would
make things look fairly nice for both short and long queries.

One last point: it seems a bit odd to use QUERY: as the prefix for both
the query extract and the cursor line.  I don't have a suggestion what
to use instead, offhand.
        regards, tom lane


Re: client side syntax error localisation for psql (v1)

From
Fabien COELHO
Date:
Dear Tom,

> >> The "on line N" bit seems just noise to me.
> > It depends.
>
> I can see that it would be useful in a very large query.  Perhaps
> include it only when the query has more than N lines, for some N
> like three to five or so?

Yes, I can do that.

> Another possibility is to keep the cursor as just "^", and bury the
> line info in the query extract.  For instance:

Well, I want to preserve space for the query extract, that's where the
context information is! If I put more information there, I'll have to
reduce the extract length.

Moreover the cursor line information content is small, so it seems better
to put the line information there.

> (So "LINE n" would never say "LINE 1", only larger line numbers.)

I agree that LINE 1 looks pretty useless, especially if there is only
one line;-)

> I think this wouldn't be terribly complex to implement, and it would
> make things look fairly nice for both short and long queries.

There is also an alignment issue here, as depending on the number of
figures, the cursor line would have to be fixed accordingly.

> One last point: it seems a bit odd to use QUERY: as the prefix for both
> the query extract and the cursor line.  I don't have a suggestion what
> to use instead, offhand.

I agree, but couldn't think of a better choice either.

There is also a localisation issue here, as the translation of both lines
must match so that the alignment is kept. I thought that if it is the very
same word, the translation should be the same.

Thanks for your comments and ideas, I'll submit a v3 on (my) tomorrow.

Have a nice day,

-- 
Fabien Coelho - coelho@cri.ensmp.fr


Re: client side syntax error localisation for psql (v1)

From
Fabien COELHO
Date:
Dear Tom,

> > Please find attached my first attempt at providing a localisation
> > information on the client side for syntax errors in "psql".
>
> "Localisation" tends to mean something quite different around here. I'd
> suggest renaming the function to something like ReportSyntaxErrorPosition.

Ok. "Localisation" was the good french word;-)

> Would it read better to add "..." at the front and/or back of the query
> extract, when you have actually truncated text at that end?

I thought about it as well, but did not implement it.
I agree that it would look better. I can have a go.

> The "on line N" bit seems just noise to me.

It depends.

If you're in the middle of a sql-script as I often do, giving the line
within the statement isn't bad. Say:

SELECT o.titre
FROM collections AS col, ouvrages AS o, a_ecrit AS e, auteurs AS a
WHERE o.id=e.oeuvre AND e.auteur=a.id AND o.collection=col.idAND col.nom LIKE '%Spirou%' AND a.nom<>'Franquin'
EXCEPT
SELECT o.titre
FROM collections AS col, ouvrages AS o, a_ecrit AS e, auteurs AS a
WHERE o.id=e.oeuvre AND e.auteur=a.id AND o.collection=col.idAND col.nom LIKE '%Spirou%' AND a.nom='Franquin';

All lines look the same... So I would prefer to keep it.

Well, if it is the only issue against the adoption of the patch, I will do
without the line number.

I'll submit a new patch in a moment.

Thanks for your comments,

-- 
Fabien.


Re: client side syntax error localisation for psql (v1)

From
Tom Lane
Date:
Fabien COELHO <coelho@cri.ensmp.fr> writes:
>> Another possibility is to keep the cursor as just "^", and bury the
>> line info in the query extract.  For instance:

> Well, I want to preserve space for the query extract, that's where the
> context information is! If I put more information there, I'll have to
> reduce the extract length.

But in the form you are using, you have to reserve that space anyway.
Consider an error marker at the end of the line:
QUERY: select 'text that is long enough to be a problem':QUERY:                                                  ^ on
line1
 

Having to leave space for "on line 1" at the right end is not better
than putting "line 1:" at the left end.

> Moreover the cursor line information content is small, so it seems better
> to put the line information there.

It seems confusing to me.  The cursor is just a marker.  Consider also
the next step up from this scheme: instead of assuming a dumb terminal,
suppose you can use color or underlining or reverse-video or some such.
Then you might want to do just one line of output:
       QUERY: SELECT foo FROM bar WHRE baz;                                  ----

where my underlining represents using the terminal's highlight ability.
With such a scheme, putting LINE n on the same line fits naturally.

> There is also an alignment issue here, as depending on the number of
> figures, the cursor line would have to be fixed accordingly.

Sure, but you have to buy into that anyway if you are going to do "...".
It's not going to add more than about one strlen() call to what you need
to do.

> There is also a localisation issue here, as the translation of both lines
> must match so that the alignment is kept. I thought that if it is the very
> same word, the translation should be the same.

Hmm, that's a good point, although perhaps you could insert a note to
translators to add space as needed to make the two strings the same
length.
        regards, tom lane


Re: client side syntax error localisation for psql (v1)

From
Peter Eisentraut
Date:
Fabien COELHO wrote:
> There is also a localisation issue here, as the translation of both
> lines must match so that the alignment is kept. I thought that if it
> is the very same word, the translation should be the same.

You can just indent with as many spaces.  This is done in other places 
as well.
Btw., a better word than "query" would be "statement".



Re: client side syntax error localisation for psql (v1)

From
Tatsuo Ishii
Date:
> Please find attached my first attempt at providing a localisation
> information on the client side for syntax errors in "psql". Basically, one
> function is added which takes care of the burden. I put a lot of
> comments in the function to try make it clear what is going on.
> 
> It validates for me against current cvs head.
> 
> I've also added new regression tests in "errors.sql".
> 
> Although I have taken great and painful care to deal with multi-byte
> encodings, I'm not sure how I can validate that, as I don't have some
> japanese terminal to play with.

As far as I understand your code, it will be broken on many multi byte
encodings.

1) a character is not always represented on a terminal propotional to  the storage size. For example a kanji character
inUTF-8 encoding  has a storage size of 3 bytes while it occupies spaces only twice  of ASCII characters on a terminal.
Samething can be said to LATIN  2,3 etc. in UTF-8 perhaps.
 

2) It assume all encodings are "ASCII compatible". Apparently some  client-side-only encodings do not satisfy this
request.Examples  include SJIS, Big5.
 

I think 2) would be solved by carefull codings. 

However 1) is hard to solve. We need kind of a mapping table between
"storage size" and "terminal size" for each encoding. Note that this
is not new problem, and sophiscated editors such as Emacs (mule) has
already concur that.
--
Tatsuo Ishii


Re: client side syntax error localisation for psql (v1)

From
Fabien COELHO
Date:
Dear Tom,

> > Well, I want to preserve space for the query extract, that's where the
> > context information is! If I put more information there, I'll have to
> > reduce the extract length.
>
> But in the form you are using, you have to reserve that space anyway.
> Consider an error marker at the end of the line:
>
>     QUERY: select 'text that is long enough to be a problem':
>     QUERY:                                                  ^ on line 1
>
> Having to leave space for "on line 1" at the right end is not better
> than putting "line 1:" at the left end.

Yes, I partly agree. However the space above can be used nevertheless for
the extract, so it is not totally "lost":

QUERY: select 'text that is long enough to be a problem': from foo ...
QUERY:                                                  ^ on line 1

> > Moreover the cursor line information content is small, so it seems better
> > to put the line information there.
>
> It seems confusing to me.  The cursor is just a marker.

Sure.

> Consider also the next step up from this scheme: instead of assuming a
> dumb terminal, suppose you can use color or underlining or reverse-video
> or some such. Then you might want to do just one line of output:
>
>         QUERY: SELECT foo FROM bar WHRE baz;
>                                    ----
> where my underlining represents using the terminal's highlight ability.
> With such a scheme, putting LINE n on the same line fits naturally.

Sure, but the current status of psql is to have no such thing, and
I'm not going to start terminal-dependent features in the software;-)
I just want to help my students with syntax errors.

As a compromise, I can suggest the following:

LINE 4: WHERE id=123 AND name LIKE 'calvin' GROP BY name...                                           ^

So "QUERY:" is dropped out. If there is very few lines, I can just
put "LINE:". I'll have to deal with the alignment problem, no big deal,
but not very nice either.

> > There is also an alignment issue here, as depending on the number of
> > figures, the cursor line would have to be fixed accordingly.
>
> Sure, but you have to buy into that anyway if you are going to do "...".

Well, I know that "..." is 3 char long, so I can put a "   " in the
cursor line in that case. If it is a figure, the length depends on its
value.

> It's not going to add more than about one strlen() call to what you need
> to do.

I cannot strlen an integer value;-) I have to convert to a string, or
deal directly with the line number.

With my new current implementation, the line number is only shown if the
query is more than 3 lines.

-- 
Fabien Coelho - coelho@cri.ensmp.fr


Re: client side syntax error localisation for psql (v1)

From
Fabien COELHO
Date:
Dear Tatsuo,

Thanks for your reply, as I noticed from the source code that your name
often appears when dealing with multi-byte issues;-)

On Fri, 12 Mar 2004, Tatsuo Ishii wrote:
> As far as I understand your code, it will be broken on many multi byte
> encodings.
>
> 1) a character is not always represented on a terminal propotional to
>    the storage size. For example a kanji character in UTF-8 encoding
>    has a storage size of 3 bytes while it occupies spaces only twice
>    of ASCII characters on a terminal. Same thing can be said to LATIN
>    2,3 etc. in UTF-8 perhaps.

I thought I dealt with that in the code by calling PQmblen for every char.
Am I wrong ?

> 2) It assume all encodings are "ASCII compatible". Apparently some
>    client-side-only encodings do not satisfy this request. Examples
>    include SJIS, Big5.

What I mean by "ASCII compatible" is that spaces, new lines, carriage
returns, tabs and NULL (C string terminaison) are one byte characters.
This assumption seemed pretty safe to me.

If this is not the case, I cannot understand how any error message could
work in psql. If one printf(" "), that would not be a space character?
Or is the terminal doing some "on the fly" translation?? What if a
file is read with such encoding??? Or is there a special compilation
option to generate special strings, but in this case the executable
would not be compatible with any other terminal????

Well, I just underline my lack of knowledge here:-(

If not, how can I detect these special characters that I need to change ?
Maybe I could translate the string to a pg_wchar[] if the function is
available to psql?

Also as I quick and dirty temporary fix, I can skip statement extraction
for those encodings that do not meet my expectations. So I would need to
know what encodings are at risk with the current scheme?

-- 
Fabien Coelho - coelho@cri.ensmp.fr


Re: client side syntax error localisation for psql (v1)

From
Fabien COELHO
Date:
> Fabien COELHO wrote:
> > There is also a localisation issue here, as the translation of both
> > lines must match so that the alignment is kept. I thought that if it
> > is the very same word, the translation should be the same.
>
> You can just indent with as many spaces.  This is done in other places
> as well.

Yes, but then I need to know the length of the translation...
I did that, so I hope sprintf did the translation job.

Thanks,

-- 
Fabien Coelho - coelho@cri.ensmp.fr


Re: client side syntax error localisation for psql (v1)

From
Tatsuo Ishii
Date:
> Dear Tatsuo,
> 
> Thanks for your reply, as I noticed from the source code that your name
> often appears when dealing with multi-byte issues;-)
> 
> On Fri, 12 Mar 2004, Tatsuo Ishii wrote:
> > As far as I understand your code, it will be broken on many multi byte
> > encodings.
> >
> > 1) a character is not always represented on a terminal propotional to
> >    the storage size. For example a kanji character in UTF-8 encoding
> >    has a storage size of 3 bytes while it occupies spaces only twice
> >    of ASCII characters on a terminal. Same thing can be said to LATIN
> >    2,3 etc. in UTF-8 perhaps.
> 
> I thought I dealt with that in the code by calling PQmblen for every char.
> Am I wrong ?

PQmblen returns the storage size, which is not necessarily same as the
character width reprensented in a terminal. For example for a kanji
character in UTF-8 PQmblen returns 3, but it ocuppies 2 x ASCII
character space, not x 3. Isn't that a problem for you?

> > 2) It assume all encodings are "ASCII compatible". Apparently some
> >    client-side-only encodings do not satisfy this request. Examples
> >    include SJIS, Big5.
> 
> What I mean by "ASCII compatible" is that spaces, new lines, carriage
> returns, tabs and NULL (C string terminaison) are one byte characters.
> This assumption seemed pretty safe to me.
> 
> If this is not the case, I cannot understand how any error message could
> work in psql. If one printf(" "), that would not be a space character?
> Or is the terminal doing some "on the fly" translation?? What if a
> file is read with such encoding??? Or is there a special compilation
> option to generate special strings, but in this case the executable
> would not be compatible with any other terminal????
> 
> Well, I just underline my lack of knowledge here:-(
> 
> If not, how can I detect these special characters that I need to change ?
> Maybe I could translate the string to a pg_wchar[] if the function is
> available to psql?

I think you can do it safely using PQmblen.

1) start from the begining of the target string

2) apply PQmblen

3) if it returns 1, you can do the spcecial character detection

4) otherwise it must not be an ASCII character and you can skip as  many characters as PQmnlen returns

5) goto 1) if any characters remain

> Also as I quick and dirty temporary fix, I can skip statement extraction
> for those encodings that do not meet my expectations. So I would need to
> know what encodings are at risk with the current scheme?
> 
> -- 
> Fabien Coelho - coelho@cri.ensmp.fr
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
> 
>                http://archives.postgresql.org
> 


Re: client side syntax error localisation for psql (v1)

From
Tatsuo Ishii
Date:
> PQmblen returns the storage size, which is not necessarily same as the
> character width reprensented in a terminal. For example for a kanji
> character in UTF-8 PQmblen returns 3, but it ocuppies 2 x ASCII
> character space, not x 3. Isn't that a problem for you?
> 
> > > 2) It assume all encodings are "ASCII compatible". Apparently some
> > >    client-side-only encodings do not satisfy this request. Examples
> > >    include SJIS, Big5.
> > 
> > What I mean by "ASCII compatible" is that spaces, new lines, carriage
> > returns, tabs and NULL (C string terminaison) are one byte characters.
> > This assumption seemed pretty safe to me.
> > 
> > If this is not the case, I cannot understand how any error message could
> > work in psql. If one printf(" "), that would not be a space character?
> > Or is the terminal doing some "on the fly" translation?? What if a
> > file is read with such encoding??? Or is there a special compilation
> > option to generate special strings, but in this case the executable
> > would not be compatible with any other terminal????
> > 
> > Well, I just underline my lack of knowledge here:-(
> > 
> > If not, how can I detect these special characters that I need to change ?
> > Maybe I could translate the string to a pg_wchar[] if the function is
> > available to psql?
> 
> I think you can do it safely using PQmblen.
> 
> 1) start from the begining of the target string
> 
> 2) apply PQmblen
> 
> 3) if it returns 1, you can do the spcecial character detection
> 
> 4) otherwise it must not be an ASCII character and you can skip as
>    many characters as PQmnlen returns
> 
> 5) goto 1) if any characters remain         ~~of course this should be 2)
--
Tatsuo Ishii


Re: client side syntax error localisation for psql (v1)

From
Fabien COELHO
Date:
Dear Tatsuo,

> > > 1) a character is not always represented on a terminal propotional to
> > >    the storage size. For example a kanji character in UTF-8 encoding
> > >    has a storage size of 3 bytes while it occupies spaces only twice
> > >    of ASCII characters on a terminal. Same thing can be said to LATIN
> > >    2,3 etc. in UTF-8 perhaps.
> >
> > I thought I dealt with that in the code by calling PQmblen for every char.
> > Am I wrong ?
>
> PQmblen returns the storage size, which is not necessarily same as the
> character width reprensented in a terminal. For example for a kanji
> character in UTF-8 PQmblen returns 3, but it ocuppies 2 x ASCII
> character space, not x 3. Isn't that a problem for you?

If I read you correctly, you mean that 1 character may take 3 bytes
of storage in the string, but it is not guaranteed to be 1 character
from the terminal perspective... Argh, that's definitely an issue:-(

I assumed that one character whatever the encoding would be 1 character
on the display.

If it is not the case, I think I can put/compute this information in the
translation structures that is use by PQmblen, and implement a
PQmbtermlen function...

Maybe you could point me some source of information about display lengths
of characters depending on the encoding?

> > What I mean by "ASCII compatible" is that spaces, new lines, carriage
> > returns, tabs and NULL (C string terminaison) are one byte characters.
> > This assumption seemed pretty safe to me.
>
> I think you can do it safely using PQmblen.

Ok, what you describe is basically what I've done with the qidx
computation as suggested by Tom Lane and then later I check that the
encoded length is one to find my special characters.

Thanks for you reply,

-- 
Fabien Coelho - coelho@cri.ensmp.fr


Re: client side syntax error localisation for psql (v1)

From
Tatsuo Ishii
Date:
> > PQmblen returns the storage size, which is not necessarily same as the
> > character width reprensented in a terminal. For example for a kanji
> > character in UTF-8 PQmblen returns 3, but it ocuppies 2 x ASCII
> > character space, not x 3. Isn't that a problem for you?
> 
> If I read you correctly, you mean that 1 character may take 3 bytes
> of storage in the string, but it is not guaranteed to be 1 character
> from the terminal perspective... Argh, that's definitely an issue:-(
> I assumed that one character whatever the encoding would be 1 character
> on the display.

That's not correct...

One thing I have to note is that some Asian characters such as
Japanese, Chinese require twice the space on a terminal for each
character comparing with plain ASCII characters. This is hard to
explain to those who are not familiar with kanji... Could you take a
look at included screen shot?  As you can see there are four ASCII
characters in the first line. On the second line there are *two* kanji
characters and they occupy same space as above four ASCII
characters. Moreover the strage size for the first line is 4, but the
strage size for the second line may vary depending on the encoding. If
the encoding is EUC_JP or SJIS, it takes 4 bytes, however it takes 6
bytes if the encoding is UTF-8. Got it?

> If it is not the case, I think I can put/compute this information in the
> translation structures that is use by PQmblen, and implement a
> PQmbtermlen function...
> 
> Maybe you could point me some source of information about display lengths
> of characters depending on the encoding?

I could write "PQmbtermlen" function for every encoding supported by
PostgreSQL except UTF-8. Such kind of info for UTF-8 might be quite
complex. I believe there are some mapping tables or functions to get
such kind of info somewhere on the Internet, but I don't remember.

> > I think you can do it safely using PQmblen.
> 
> Ok, what you describe is basically what I've done with the qidx
> computation as suggested by Tom Lane and then later I check that the
> encoded length is one to find my special characters.

Oh, ok.

> Thanks for you reply,

You are welcome!
--
Tatsuo Ishii


Re: client side syntax error localisation for psql (v1)

From
Tom Lane
Date:
Fabien COELHO <coelho@cri.ensmp.fr> writes:
> As a compromise, I can suggest the following:

> LINE 4: WHERE id=123 AND name LIKE 'calvin' GROP BY name...
>                                             ^

That works for me.  I don't mind it saying LINE 1: in the one-line case.

>> It's not going to add more than about one strlen() call to what you need
>> to do.

> I cannot strlen an integer value;-) I have to convert to a string, or
> deal directly with the line number.

Well, what I'm imagining is that you sprintf("LINE %d") and then strlen
that in the process of constructing the first line, and later add that
length to the offset you need in the second line.

Tatsuo's complaints about characters that span more than one position
are a much nastier problem, in any case :-(.  Can we cope?
        regards, tom lane


Re: client side syntax error localisation for psql (v1)

From
Fabien COELHO
Date:
Dear Tatsuo,

> One thing I have to note is that some Asian characters such as Japanese,
> Chinese require twice the space on a terminal for each character
> comparing with plain ASCII characters. This is hard to explain to those
> who are not familiar with kanji...

I learnt a little bit of chinese a few years ago, but I never saw the
computer version.

> Could you take a look at included screen shot?

I haven't found it. However I've made a little bit of trying with my
emacs 21 mule demonstration, and indeed there is two different character
widths...

> As you can see there are four ASCII characters in the first line. On the
> second line there are *two* kanji characters and they occupy same space
> as above four ASCII characters. Moreover the strage size for the first
> line is 4, but the strage size for the second line may vary depending on
> the encoding. If the encoding is EUC_JP or SJIS, it takes 4 bytes,
> however it takes 6 bytes if the encoding is UTF-8. Got it?

Yep.

> > Maybe you could point me some source of information about display lengths
> > of characters depending on the encoding?
>
> I could write "PQmbtermlen" function for every encoding supported by
> PostgreSQL

That would be great ! ;-)

> except UTF-8. Such kind of info for UTF-8 might be quite
> complex. I believe there are some mapping tables or functions to get
> such kind of info somewhere on the Internet, but I don't remember.

That would be even greater;-)

I guess a "return 1" version would be a false quick fix that could
be improved later on...

-- 
Fabien COELHO.


Re: client side syntax error localisation for psql (v1)

From
Tatsuo Ishii
Date:
> > Could you take a look at included screen shot?
> 
> I haven't found it. However I've made a little bit of trying with my

Oops. I have included this time.

> > > Maybe you could point me some source of information about display lengths
> > > of characters depending on the encoding?
> >
> > I could write "PQmbtermlen" function for every encoding supported by
> > PostgreSQL
> 
> That would be great ! ;-)

Ok, I will work on this.

> > except UTF-8. Such kind of info for UTF-8 might be quite
> > complex. I believe there are some mapping tables or functions to get
> > such kind of info somewhere on the Internet, but I don't remember.
> 
> That would be even greater;-)
> 
> I guess a "return 1" version would be a false quick fix that could
> be improved later on...

Agreeed.

Re: client side syntax error localisation for psql (v1)

From
Fabien COELHO
Date:
On Sat, 13 Mar 2004, Tatsuo Ishii wrote:

> Oops. I have included this time.

How ! a japanese vi !

> > > I could write "PQmbtermlen" function for every encoding supported by
> > > PostgreSQL
> >
> > That would be great ! ;-)
>
> Ok, I will work on this.

Thanks.

-- 
Fabien Coelho - coelho@cri.ensmp.fr