Thread: encoding of PostgreSQL messages
Hi all ! How can I programmatically detect which encoding a PostgreSQL server I am trying to connect to sends back messages -- before I connect (so client_encoding and the pg_settings table are flat out). Thanks, Karsten -- GPG key ID E4071346 @ wwwkeys.pgp.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
Karsten Hilbert wrote: > Hi all ! > > How can I programmatically detect which encoding a > PostgreSQL server I am trying to connect to sends back > messages -- before I connect (so client_encoding and > the pg_settings table are flat out). Hmm, isn't client_encoding reported in the startup packet sent by the server, after auth? -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Tue, Dec 23, 2008 at 06:45:17PM -0300, Alvaro Herrera wrote: > > How can I programmatically detect which encoding a > > PostgreSQL server I am trying to connect to sends back > > messages -- before I connect (so client_encoding and > > the pg_settings table are flat out). > > Hmm, isn't client_encoding reported in the startup packet sent by the > server, after auth? That would not quite be enough -- I am talking about messages reported *during* auth, say FATAL: password authentication failed for user "postgres" or fe_sendauth: no password supplied both of which, in other locales, may contain non-ASCII characters. Karsten -- GPG key ID E4071346 @ wwwkeys.pgp.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
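To illustrate where this bites, here is a minimal libpq sketch (the connection parameters are placeholders and the wrong password is deliberate): the text returned by PQerrorMessage() after a failed authentication arrives as a byte string whose encoding the client has no way to determine at that point.

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	/* placeholder connection string; the password is wrong on purpose */
	PGconn *conn = PQconnectdb("host=localhost dbname=testdb "
	                           "user=postgres password=wrong");

	if (PQstatus(conn) != CONNECTION_OK)
	{
		/*
		 * This may be a translated FATAL message containing umlauts or
		 * other non-ASCII characters, in an encoding we cannot query yet.
		 */
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
	}

	PQfinish(conn);
	return 0;
}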
On Dec 23, 2008, at 4:24 PM, Karsten Hilbert wrote:

> How can I programmatically detect which encoding a
> PostgreSQL server I am trying to connect to sends back
> messages -- before I connect (so client_encoding and
> the pg_settings table are flat out).

I don't think there is a way because you can't get any information without authorizing first.

It is also unclear to me how things work in the other direction. When authenticating, what if the user or database name has non-ASCII characters? Are they interpreted in the encoding of the server, since the client has not established an encoding?

John DeSoi, Ph.D.
Karsten Hilbert wrote: > On Tue, Dec 23, 2008 at 06:45:17PM -0300, Alvaro Herrera wrote: > > Hmm, isn't client_encoding reported in the startup packet sent by the > > server, after auth? > > That would not quite be enough -- I am talking about > messages reported *during* auth, say > > FATAL: password authentication failed for user "postgres" > > or > > fe_sendauth: no password supplied > > both of which, in other locales, may contain non-ASCII characters. Those are sent in the server encoding IIRC (which admittedly you don't have a way to know, at that point.) And I'm now wondering if we should delay initializing the translation stuff until after client_encoding has been reported. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Mon, Dec 29, 2008 at 09:07:14AM -0300, Alvaro Herrera wrote:

> > That would not quite be enough -- I am talking about
> > messages reported *during* auth, say
> >
> > FATAL: password authentication failed for user "postgres"
> >
> > or
> >
> > fe_sendauth: no password supplied
> >
> > both of which, in other locales, may contain non-ASCII characters.
>
> Those are sent in the server encoding IIRC (which admittedly you don't
> have a way to know, at that point.)
>
> And I'm now wondering if we should delay initializing the translation
> stuff until after client_encoding has been reported.

Or else

- just don't pass those messages through gettext so they are
  always in 7 bit ASCII English

- do pass them through gettext but append a 7-bit ASCII filter so
  things do get passed in pseudo-ASCII (this will work for many
  single-byte encodings but, more often than not, not for multibyte ones)

Both changes would be less intrusive than postponing the translation. If I had a choice I would opt for the first.

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
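A rough sketch of the second option, assuming a hypothetical helper called on the already-translated string (it keeps Latin-type single-byte text recognizable, but would turn each multibyte character into several question marks):

/*
 * Hypothetical filter: squash a message to 7-bit ASCII in place by
 * replacing every byte with the high bit set by '?'.
 */
static void
ascii_squash(char *msg)
{
	for (; *msg != '\0'; msg++)
	{
		if ((unsigned char) *msg > 127)
			*msg = '?';
	}
}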
Karsten Hilbert <Karsten.Hilbert@gmx.net> writes: > On Mon, Dec 29, 2008 at 09:07:14AM -0300, Alvaro Herrera wrote: >> And I'm now wondering if we should delay initializing the translation >> stuff until after client_encoding has been reported. > Or else > - just don't pass those messages through gettext so they are > always in 7 bit ASCII English What's the difference? The user-visible result would be the same AFAICS. (One or the other might be less messy internally, but I'm not sure which offhand.) regards, tom lane
> Karsten Hilbert <Karsten.Hilbert@gmx.net> writes:
> > On Mon, Dec 29, 2008 at 09:07:14AM -0300, Alvaro Herrera wrote:
> >> And I'm now wondering if we should delay initializing the translation
> >> stuff until after client_encoding has been reported.
>
> > Or else
>
> > - just don't pass those messages through gettext so they are
> > always in 7 bit ASCII English
>
> What's the difference? The user-visible result would be the same
> AFAICS. (One or the other might be less messy internally, but I'm
> not sure which offhand.)

That was the reason for the suggestion: perhaps less messy and surely lower impact on the existing code as it would not mean moving code later in the initialization but rather just removing the gettext wrappers around a few strings. No difference in the result.

The difference to my other suggestion (no translation vs. translation but then replacing characters > 127 by, say, '?' or a space) is: I could *assume* a given encoding, namely 7 bit ASCII. Or rather I could assume that I can display the message as "something pretty similar to what the original message said, perhaps without umlauts and accents but still recognizable in the local language".

Now, surely, I could dig down the layers to where "my application space" receives the message from PostgreSQL and filter there. It is, however, good to have some knowledge of the encoding where knowledge can be had.

The concrete problem is this: I connect to PostgreSQL from Python. Let's assume PG is set to German. If the wrong password is supplied the PG error message string contains an umlaut. This is passed to libpq, which in turn passes it to the C part of psycopg2 which then turns this into an exception. An exception, by default in Python, is printed to the console, which may be in any encoding incompatible with the latin1 the PG message happens to be in. Thus, printing the PG message may or may not fail due to Unicode de-/encoding errors.

The solution is to find the right layer to take control of the encoding but this is eventually only possible if the encoding is *known*. Thus the plea for "7-bit-ascii English by default until the encoding *can* be known". Going to "7-bit-ascii filter of the original by default until the encoding can be known" only tries to preserve a bit more of the original language. I may be wrong in feasibility.

Thanks for considering,
Karsten
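As a sketch of the client-side band-aid described above (the LATIN1 guess below is exactly the assumption one would rather not have to make, and the byte-string handling is hedged for the Python 2 era of this thread):

import psycopg2

ASSUMED_SERVER_ENCODING = 'latin1'   # a guess -- this is the unknown quantity

try:
    # wrong password on purpose, to provoke a FATAL message during auth
    conn = psycopg2.connect("host=localhost dbname=gnumed "
                            "user=postgres password=wrong")
except psycopg2.OperationalError as exc:
    raw = exc.args[0] if exc.args else ''
    if isinstance(raw, bytes):
        # decode with the assumed encoding, replacing bytes that do not fit
        raw = raw.decode(ASSUMED_SERVER_ENCODING, 'replace')
    print(u'connection failed: %s' % raw)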
On Wednesday 31 December 2008 18:57:29 Karsten Hilbert wrote: > The solution is to find the right layer to take control of the encoding but > this is eventually only possible if the encoding is *known*. Thus the plea > for "7-bit-ascii English by default until the encoding *can* be known". > Going to "7-bit-ascii filter of the original by default until the encoding > can be known" only tries to preserve a bit more of the original language. I > may be wrong in feasibility. The proper fix is probably to include the client encoding in the connection startup message.
> On Wednesday 31 December 2008 18:57:29 Karsten Hilbert wrote:
> > The solution is to find the right layer to take control of the encoding but
> > this is eventually only possible if the encoding is *known*. Thus the plea
> > for "7-bit-ascii English by default until the encoding *can* be known".
> > Going to "7-bit-ascii filter of the original by default until the encoding
> > can be known" only tries to preserve a bit more of the original language.
> > I may be wrong in feasibility.
>
> The proper fix is probably to include the client encoding in the
> connection startup message.

Absolutely, either the desired client encoding being sent to the server or the current server encoding being sent to the client.

It would, however, take way longer to trickle down into being supported by client interfaces such as psycopg2 <evil grin>. A fixed startup encoding would not need support from those parts of the equation.

Karsten
Peter Eisentraut <peter_e@gmx.net> writes: > On Wednesday 31 December 2008 18:57:29 Karsten Hilbert wrote: >> The solution is to find the right layer to take control of the encoding but >> this is eventually only possible if the encoding is *known*. Thus the plea >> for "7-bit-ascii English by default until the encoding *can* be known". >> Going to "7-bit-ascii filter of the original by default until the encoding >> can be known" only tries to preserve a bit more of the original language. I >> may be wrong in feasibility. > The proper fix is probably to include the client encoding in the connection > startup message. What of errors occurring before such an option could be applied? I think that ultimately it's necessary to accept that there will be some window during connection startup where sending plain ASCII (English) messages is the best recourse. I'm not sure what the best way to implement that is. On reflection though, trying to mark the individual messages that might need that treatment doesn't seem like a winner: there's too much possibility for mistakes, or code drift causing a marking to become wrong; and I'm not even convinced that the very same source-code message might not fall into both categories at different times. So having a mode switch somehow occurring inside elog.c once we are ready to translate seems like the right approach. regards, tom lane
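A conceptual sketch of such a switch (invented names, not actual elog.c code): the wrapper returns the untranslated msgid until a flag is flipped at the point where the client's encoding has been established.

#include <libintl.h>
#include <stdbool.h>

/* Hypothetical flag, set once client_encoding has been negotiated. */
static bool translation_ready = false;

/* Hypothetical wrapper used in place of a direct gettext() call. */
static const char *
maybe_gettext(const char *msgid)
{
	if (!translation_ready)
		return msgid;			/* plain ASCII English during startup */
	return gettext(msgid);		/* translated, in a known encoding */
}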
> > The proper fix is probably to include the client encoding in the
> > connection startup message.
>
> What of errors occurring before such an option could be applied?
>
> I think that ultimately it's necessary to accept that there will be some
> window during connection startup where sending plain ASCII (English)
> messages is the best recourse. I'm not sure what the best way to
> implement that is. On reflection though, trying to mark the individual
> messages that might need that treatment doesn't seem like a winner:
> there's too much possibility for mistakes, or code drift causing a
> marking to become wrong; and I'm not even convinced that the very same
> source-code message might not fall into both categories at different
> times. So having a mode switch somehow occurring inside elog.c once
> we are ready to translate seems like the right approach.

Hm, so maybe both Peter and Alvaro are right:

1) Setting the translation wrapper to a NOOP as early as possible.

   Thus, the first messages are sent in 7-bit ASCII English.

2) Including the server encoding in the very (?) first message sent to the client *after* which ...

3) ... proper initialization of the gettext system can happen because now the client can know what to make of the subsequent messages.

That way no strings should need special marking, and proper translation (or not) is achieved automatically?

It is surely acceptable to see the very first messages in English and/or 7-bit ASCII - I just want to be able to know their encoding, either by definition or by being told.

Thanks,
Karsten
> Hm, so maybe both Peter and Alvaro are right:
>
> 1) Setting the translation wrapper to a NOOP as early as possible.
>
>    Thus, the first messages are sent in 7-bit ASCII English.

Despite being *marked* for translation and a translation existing in the .po file, that is.

Karsten
On Wednesday 31 December 2008 20:23:47 Tom Lane wrote: > > The proper fix is probably to include the client encoding in the > > connection startup message. > > What of errors occurring before such an option could be applied? Connection errors are handled by the client, which knows the client encoding. If the setting of the client encoding would be one of the first things to be done on the server side, you would only have a handful of possible error conditions left (e.g., setlocale failed, out of memory). You could choose to report those in plain ASCII or send a special error code that the client can resolve. Although I guess no one could fault us if "could not set language" is reported not translated. ;-) > I think that ultimately it's necessary to accept that there will be some > window during connection startup where sending plain ASCII (English) > messages is the best recourse. Ultimately yes. But we currently handle the client encoding quite late in the startup sequence so that many connection startup failure messages that are of interest to normal users would likely be affected. So moving the client encoding handling to the earliest possible phase would still be desirable.
Bruce, et al,

given the thread partially quoted below, would this warrant a TODO item "improve communication of encoding between client and server regarding early startup messages"?

A very usable band-aid for 8.4 - short of a proper fix - would be the minimally invasive sending of messages in 7-bit English until server_encoding can be retrieved by the client by current means.

Thanks,
Karsten

On Thu, Jan 01, 2009 at 08:33:56PM +0200, Peter Eisentraut wrote:
> On Wednesday 31 December 2008 20:23:47 Tom Lane wrote:
> > > The proper fix is probably to include the client encoding in the
> > > connection startup message.
> >
> > What of errors occurring before such an option could be applied?
>
> Connection errors are handled by the client, which knows the client encoding.
> If the setting of the client encoding would be one of the first things to be
> done on the server side, you would only have a handful of possible error
> conditions left (e.g., setlocale failed, out of memory). You could choose to
> report those in plain ASCII or send a special error code that the client can
> resolve. Although I guess no one could fault us if "could not set language"
> is reported not translated. ;-)
>
> > I think that ultimately it's necessary to accept that there will be some
> > window during connection startup where sending plain ASCII (English)
> > messages is the best recourse.
>
> Ultimately yes. But we currently handle the client encoding quite late in the
> startup sequence so that many connection startup failure messages that are of
> interest to normal users would likely be affected. So moving the client
> encoding handling to the earliest possible phase would still be desirable.

--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
Added to TODO:

        Improve encoding of connection startup messages sent to the client

            Currently some authentication error messages are sent in the server
            encoding

          * http://archives.postgresql.org/pgsql-general/2008-12/msg00801.php
          * http://archives.postgresql.org/pgsql-general/2009-01/msg00005.php

---------------------------------------------------------------------------

Karsten Hilbert wrote:
> Bruce, et al,
>
> given the thread partially quoted below, would this warrant a
> TODO item "improve communication of encoding between client
> and server regarding early startup messages"?
>
> A very usable band-aid for 8.4 - short of a proper fix -
> would be the minimally invasive sending of messages in 7-bit
> English until server_encoding can be retrieved by the client
> by current means.
>
> Thanks,
> Karsten

--
Bruce Momjian <bruce@momjian.us>  http://momjian.us
EnterpriseDB  http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Hi,

This topic seems to be related to the bug report "[ODBC] Localized error messages, wrong charset".

Bruce Momjian wrote:
> Added to TODO:
>
> Improve encoding of connection startup messages sent to the client
>
> Currently some authentication error messages are sent in the server
> encoding

Is it true? IIRC the backend knows nothing about the server encoding in the authentication phase.

The psqlodbc Unicode driver sends a connection startup message which contains the client_encoding(=UTF8) guc parameter. Attached is a trial patch so that the psqlodbc Unicode driver can get properly localized password error messages.

regards,
Hiroshi Inoue

Index: postmaster/postmaster.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/postmaster/postmaster.c,v
retrieving revision 1.570
diff -c -c -r1.570 postmaster.c
*** postmaster/postmaster.c	4 Jan 2009 22:19:59 -0000	1.570
--- postmaster/postmaster.c	30 Jan 2009 14:05:35 -0000
***************
*** 1552,1557 ****
--- 1552,1560 ----
  							   pstrdup(nameptr));
  				port->guc_options = lappend(port->guc_options,
  							   pstrdup(valptr));
+ 				if (stricmp(nameptr, "client_encoding") == 0 &&
+ 					stricmp(valptr, "UTF8") == 0)
+ 					bind_textdomain_codeset(PG_TEXTDOMAIN("postgres"), "UTF-8");
  			}
  			offset = valoffset + strlen(valptr) + 1;
  		}
Can someone comment on this?

---------------------------------------------------------------------------

Hiroshi Inoue wrote:
> The psqlodbc Unicode driver sends a connection startup message which
> contains the client_encoding(=UTF8) guc parameter. Attached is
> a trial patch so that the psqlodbc Unicode driver can get
> properly localized password error messages.
>
> regards,
> Hiroshi Inoue

--
Bruce Momjian <bruce@momjian.us>  http://momjian.us
EnterpriseDB  http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Peter Eisentraut wrote: > Bruce Momjian wrote: >> Can someone comment on this? > > Looks like a horrible hack to me. Recoding stuff to the client encoding > in the server outside the existing recoding mechanism looks pretty evil > to me. > Plus, it does not address the problem of what happens to > messages sent before this, it just moves the point of "before" a bit > earlier for some special cases. > > I think we have discussed more proper solutions earlier in this thread. > IMO the best approach would be for the client to include the client > encoding in the startup package. ??? My patch exactly references the client_encoding included in the startup message. regards, Hiroshi Inoue
Bruce Momjian wrote:
> Can someone comment on this?

Looks like a horrible hack to me. Recoding stuff to the client encoding in the server outside the existing recoding mechanism looks pretty evil to me. Plus, it does not address the problem of what happens to messages sent before this, it just moves the point of "before" a bit earlier for some special cases.

I think we have discussed more proper solutions earlier in this thread. IMO the best approach would be for the client to include the client encoding in the startup package.
Peter Eisentraut <peter_e@gmx.net> writes: > Bruce Momjian wrote: >> Can someone comment on this? > Looks like a horrible hack to me. Recoding stuff to the client encoding > in the server outside the existing recoding mechanism looks pretty evil > to me. Plus, it does not address the problem of what happens to > messages sent before this, it just moves the point of "before" a bit > earlier for some special cases. > I think we have discussed more proper solutions earlier in this thread. > IMO the best approach would be for the client to include the client > encoding in the startup package. Huh? Clients already do that (or at least some are capable of it, including libpq). The hard problems are (1) there's still a "before", ie we might fail before scanning the options in the packet, and (2) the sent encoding might itself be invalid, and you still have to report that somehow. I believe the only real "fix" is to guarantee that messages are sent as untranslated ASCII until we have sent an encoding indicator at the end of the startup sequence. Which has its own pretty clear downside: no more translation of authorization failures. regards, tom lane
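For reference, a sketch of two ways a libpq client can already get its desired client_encoding into the startup packet / environment (the connection parameters are placeholders):

#include <stdlib.h>
#include <libpq-fe.h>

int
main(void)
{
	/* Variant 1: pass it as a backend option inside the startup packet. */
	PGconn *conn = PQconnectdb("host=localhost dbname=testdb user=app "
	                           "options='-c client_encoding=UTF8'");
	PQfinish(conn);

	/* Variant 2: let libpq pick it up from the environment. */
	setenv("PGCLIENTENCODING", "UTF8", 1);
	conn = PQconnectdb("host=localhost dbname=testdb user=app");
	PQfinish(conn);

	return 0;
}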
On Sun, Feb 08, 2009 at 10:38:16AM -0500, Tom Lane wrote: > I believe the only real "fix" is to guarantee that messages are sent > as untranslated ASCII until we have sent an encoding indicator at > the end of the startup sequence. Which has its own pretty clear > downside: no more translation of authorization failures. I, for one, would be perfectly fine with this. It has several advantages: - it does allow to "know" the encoding right from the beginning - it does not require client library changes (IOW the change does not need to trickle down: server -> libpq -> psycopg2 -> GNUmed) - the type of (auth) failure can be deduced from the (now always English) message string which, again, allows for client-side translation, if so desired Karsten -- GPG key ID E4071346 @ wwwkeys.pgp.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
>> Bruce Momjian wrote:
>>> Can someone comment on this?
>
>> Looks like a horrible hack to me. Recoding stuff to the client encoding
>> in the server outside the existing recoding mechanism looks pretty evil
>> to me. Plus, it does not address the problem of what happens to
>> messages sent before this, it just moves the point of "before" a bit
>> earlier for some special cases.
>
>> I think we have discussed more proper solutions earlier in this thread.
>> IMO the best approach would be for the client to include the client
>> encoding in the startup package.
>
> Huh? Clients already do that (or at least some are capable of it,
> including libpq).

Yes, the psqlodbc driver has done it because protocol 3 allowed it from the first.

> The hard problems are (1) there's still a "before",

Yes, but isn't it an improvement that properly localized password errors or no-database errors etc. can be seen? Currently I see unreadable error messages for those cases via the psqlodbc driver. The attached patch in my previous posting is an example of how to solve the problem.

regards,
Hiroshi Inoue
Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
>> Bruce Momjian wrote:
>>> Can someone comment on this?
>
>> I think we have discussed more proper solutions earlier in this thread.
>> IMO the best approach would be for the client to include the client
>> encoding in the startup package.
>
> Huh? Clients already do that (or at least some are capable of it,
> including libpq).

What is a recommended way to do it using libpq, via the "options" parameter or the "PGCLIENTENCODING" environment variable?

> The hard problems are (1) there's still a "before",
> ie we might fail before scanning the options in the packet, and (2)
> the sent encoding might itself be invalid, and you still have to report
> that somehow.
>
> I believe the only real "fix" is to guarantee that messages are sent
> as untranslated ASCII until we have sent an encoding indicator at
> the end of the startup sequence. Which has its own pretty clear
> downside: no more translation of authorization failures.

I'm afraid I'm misunderstanding your point. I'm thinking of the following steps in the backend code.

1. Set LC_MESSAGES to "C" until the client_encoding is determined.

2. When a client_encoding is specified in the startup message, bind the corresponding codeset to the textdomain and set LC_MESSAGES to the one specified in the startup message, or restore the LC_MESSAGES overridden by step 1, before the authorization step. Then we can see properly localized authorization failure messages.

3. Reset LC_MESSAGES to the current one in InitializeClientEncoding() and unbind the codeset if necessary in SetDatabaseEncoding().

Comments?

regards,
Hiroshi Inoue
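A rough sketch of steps 1 and 2 above (the helper is invented; setlocale() and bind_textdomain_codeset() are the standard calls involved, and real code would of course map PostgreSQL encoding names to iconv codeset names rather than special-casing UTF8):

#include <locale.h>
#include <libintl.h>
#include <string.h>

/* Hypothetical helper, called while the startup packet is parsed. */
static void
apply_startup_client_encoding(const char *enc)
{
	if (enc == NULL)
	{
		/* Step 1: nothing known yet -- keep messages untranslated. */
		setlocale(LC_MESSAGES, "C");
		return;
	}

	/*
	 * Step 2: a client_encoding arrived, so bind the message catalog
	 * to the matching codeset before authentication starts.
	 */
	if (strcmp(enc, "UTF8") == 0)
		bind_textdomain_codeset("postgres", "UTF-8");

	/* ... restore the configured LC_MESSAGES here ... */
}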
Hiroshi Inoue <inoue@tpf.co.jp> writes: > I'm thinking of the following steps in the backend code. > 1.Set LC_MESSAGES to "C" until the client_encoding is > determined. > 2.When a client_encoding is specifed in the startup > message, bind the corrsponding codeset to the > textdomain and set LC_MESSAGES to the specified one > in the startup message or restore the LC_MESSAGES > overridden by step 1 before authorization step. > Then we can see properly localized authorization > failure messages. > 3.Reset LC_MESSAGES to the current one in Initialize > ClientEncoding() and unbind the codeset if necessary > in SetDatabaseEncoding(). Reflecting on the bigger picture ... I would imagine that the vast majority of existing applications depend on client_encoding settings that come from postgresql.conf, ALTER USER SET, ALTER DATABASE SET, or just the default (== database encoding). I don't think a solution that penalizes those cases and makes only the case of setting it via PGCLIENTENCODING work nicely is going to make very many people happy. Mind you, I don't really know how to do better, but I do see that the case of client_encoding being specified in the startup message is not going to help enough people to be particularly useful. regards, tom lane
Tom Lane wrote:
> Hiroshi Inoue <inoue@tpf.co.jp> writes:
>> I'm thinking of the following steps in the backend code.
>
>> 1. Set LC_MESSAGES to "C" until the client_encoding is determined.
>
>> 2. When a client_encoding is specified in the startup message, bind the
>>    corresponding codeset to the textdomain and set LC_MESSAGES to the one
>>    specified in the startup message, or restore the LC_MESSAGES overridden
>>    by step 1, before the authorization step. Then we can see properly
>>    localized authorization failure messages.
>
>> 3. Reset LC_MESSAGES to the current one in InitializeClientEncoding() and
>>    unbind the codeset if necessary in SetDatabaseEncoding().
>
> Reflecting on the bigger picture ... I would imagine that the vast
> majority of existing applications depend on client_encoding settings
> that come from postgresql.conf, ALTER USER SET, ALTER DATABASE SET, or
> just the default (== database encoding). I don't think a solution that
> penalizes those cases

Removing step 1 resolves the penalty. In the first place, step 1 comes from your or Karsten's suggestion.

> and makes only the case of setting it via
> PGCLIENTENCODING work nicely is going to make very many people happy.
> Mind you, I don't really know how to do better, but I do see that the
> case of client_encoding being specified in the startup message is
> not going to help enough people to be particularly useful.

Maybe not enough currently, because collaboration between the backend and clients is obviously needed to solve this problem. The backend should provide clients with a way to specify the client_encoding on the fly which can be applied to authorization failure messages. Then clients which are eager to solve this problem would use that way. Using the information in the startup message is almost the only way to achieve it.

regards,
Hiroshi Inoue
Tom Lane wrote: > Reflecting on the bigger picture ... I would imagine that the vast > majority of existing applications depend on client_encoding settings > that come from postgresql.conf, ALTER USER SET, ALTER DATABASE SET, or > just the default (== database encoding). I don't think a solution that > penalizes those cases and makes only the case of setting it via > PGCLIENTENCODING work nicely is going to make very many people happy. I don't have any survey data available, but I think this assessment is semantically wrong. Usefully, the client encoding can come only from the client, or be defaulted (and even that is semantically wrong). I see the other cases as workarounds.
Tom Lane wrote: > I believe the only real "fix" is to guarantee that messages are sent > as untranslated ASCII until we have sent an encoding indicator at > the end of the startup sequence. Which has its own pretty clear > downside: no more translation of authorization failures. We should process the client encoding sent in the startup package before authentication.
Peter Eisentraut <peter_e@gmx.net> writes: > Tom Lane wrote: >> Reflecting on the bigger picture ... I would imagine that the vast >> majority of existing applications depend on client_encoding settings >> that come from postgresql.conf, ALTER USER SET, ALTER DATABASE SET, or >> just the default (== database encoding). I don't think a solution that >> penalizes those cases and makes only the case of setting it via >> PGCLIENTENCODING work nicely is going to make very many people happy. > I don't have any survey data available, but I think this assessment is > semantically wrong. Usefully, the client encoding can come only from > the client, or be defaulted (and even that is semantically wrong). In an ideal world, perhaps so, but do you deny my point that that's not reality? regards, tom lane
On Wednesday 11 February 2009 18:00:31 Tom Lane wrote: > Peter Eisentraut <peter_e@gmx.net> writes: > > Tom Lane wrote: > >> Reflecting on the bigger picture ... I would imagine that the vast > >> majority of existing applications depend on client_encoding settings > >> that come from postgresql.conf, ALTER USER SET, ALTER DATABASE SET, or > >> just the default (== database encoding). I don't think a solution that > >> penalizes those cases and makes only the case of setting it via > >> PGCLIENTENCODING work nicely is going to make very many people happy. > > > > I don't have any survey data available, but I think this assessment is > > semantically wrong. Usefully, the client encoding can come only from > > the client, or be defaulted (and even that is semantically wrong). > > In an ideal world, perhaps so, but do you deny my point that that's not > reality? I have never seen a setup where the client encoding did not come from the default or the client (and the person who set it up knew what they were doing). I don't think the other cases are worth optimizing.
Hi,

I have 14 tables filled with meteorological data, one record per parameter per hour. The id field holds the parameter type (1=temperature, 2=humidity ...). My problem is that for short periods (maybe one week, one month) there are two stations with the same data, I mean the temperature of table1 is equal to the humidity of table3. I need to discover those cases.

I could pick one record in the first station and then compare it with the ones in the other tables for all the parameters at that particular date. If two records are equal (it probably happens) I must then check the next one in the time series. If the second record is equal too, then probably the two series may be equal and I must raise an alert from my application. Is there a better and faster way to perform such a check?

-- tables
CREATE TABLE table1
(
  fulldate timestamp,
  id smallint NOT NULL,
  meanvalue real
) WITH (OIDS=FALSE);
--.....................
--.....................
CREATE TABLE table14
(
  fulldate timestamp,
  id smallint NOT NULL,
  meanvalue real
) WITH (OIDS=FALSE);
--
-- inserts
insert into table1 (select
  ('2009-01-01'::timestamp + interval '1 hour' * s.a)::timestamp,
  1::smallint, round(cast(random() as numeric), 1)::real
  from generate_series(0,1000) as s(a)
);
--
insert into table2 (select
  ('2009-01-01'::timestamp + interval '1 hour' * s.a)::timestamp,
  1::smallint, round(cast(random() as numeric), 1)::real
  from generate_series(0,1000) as s(a)
);
--
-- same data as table 1 -- MUST BE FOUND BY THE CHECK ROUTINE
insert into table3 (select fulldate, id, meanvalue from table1);

Thanks in advance,
Paolo Saudin
Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
>> Tom Lane wrote:
>>> Reflecting on the bigger picture ... I would imagine that the vast
>>> majority of existing applications depend on client_encoding settings
>>> that come from postgresql.conf, ALTER USER SET, ALTER DATABASE SET, or
>>> just the default (== database encoding). I don't think a solution that
>>> penalizes those cases and makes only

Not only libpq but also the JDBC driver and the ODBC driver send the startup packet including the client_encoding. Libpq can be changed to allow a *client_encoding=xxxxx* definition in conninfo.

>>> the case of setting it via
>>> PGCLIENTENCODING work nicely is going to make very many people happy.

Not a few libraries/applications issue SET client_encoding TO ... immediately after the connection is established. What's wrong with urging such clients to eliminate the SET commands and use this nice feature of the FE/BE protocol?

regards,
Hiroshi Inoue
On Thu, Feb 12, 2009 at 10:28:38PM +0900, Hiroshi Inoue wrote: > >>> Reflecting on the bigger picture ... I would imagine that the vast > >>> majority of existing applications depend on client_encoding settings > >>> that come from postgresql.conf, ALTER USER SET, ALTER DATABASE SET, or > >>> just the default (== database encoding). I don't think a solution that > >>> penalizes those cases and makes only While I agree that it is desirable to find a solution which does not break (or at least "allows to make not break") such applications I tend to think that those applications are not based on sound engineering with respect to this detail. A clean solution - which allows for the unclean solution to coexist - is very desirable. One might introduce a GUC variable "pre_connect_encoding". If not set it would default to 7-bit ascii as proposed but can be set to either some explicit encoding or else to, say, "deduce_from_environment" which would reinstate the current behaviour. That would allow those who don't want to/cannot fix client code to keep things working as before. > not only but also the JDBC driver or the ODBC driver sends the > startup packet including the client_encoding. Libpq can be changed > to allow *client_encoding=xxxxx* definition in conninfo. > > > the case of setting it via > >>> PGCLIENTENCODING work nicely is going to make very many people happy. > > Not a few libraries/applications issue SET client_encoding to ... > immediately after the connection was establised. What's wrong with > urging such clients to eliminate the SET commands and use the nice > feature of FE/BE procotol? Sounds sane to me as a developer using a Python PostgreSQL adapter. It can be argued that it is *still* useful to have a well known initial encoding (7-bit ascii) -- this would be used for returning an error if the startup packet included an invalid encoding ... Also it would allow current application code to assume that encoding until the encoding can be set by current means - this would allow current application code to behave correctly without having to wait for database adapter code to support setting the encoding in the startup packet. Or put another way: I could fix GNUmed properly NOW rather than having to wait for psycopg2 to support setting the client encoding in the startup packet. Thanks, Karsten -- GPG key ID E4071346 @ wwwkeys.pgp.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
On Wed, Feb 11, 2009 at 02:20:47PM +0900, Hiroshi Inoue wrote: > Tom Lane wrote: > > Hiroshi Inoue <inoue@tpf.co.jp> writes: > >> I'm thinking of the following steps in the backend code. > > > >> 1.Set LC_MESSAGES to "C" until the client_encoding is > >> determined. I have tried that but it didn't work out for some reason. > Removing step 1 resolves the penalty. In the first place step 1 > comes from your or Karsten's suggestion. Not quite. My suggestion was to not *translate* strings (and assume 7-bit ascii) until the client encoding is known. > Maybe not enough currently because collaboration between the backend > and clients is needed to solve this problem ovbiously. The backend > should provide clients the way to specify the client_encoding on the > fly which can be applied to authorization failure messages. Then > clients which are eager to solve this problem would use the way. > Using the information in the startup message is almost unique way > to achieve it. Sounds good to me as far as I can see. Karsten -- GPG key ID E4071346 @ wwwkeys.pgp.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
On Thursday 12 February 2009 12:06:41 am Paolo Saudin wrote: > Hi, > > I have 14 tables filled with meteorological data, one record per parameter > per hour. The id field holds the parameter type (1=temperature, 2=humidity > ...) My problem is that for short periods (maybe one week, one month) there > are two stations with the same data, I mean the temperature of table1 is > equal to the humidity of table3. I need to discover those cases. Before I can start to answer this I need some clarification. How can temperature and humidity be the same data? > > I could pick one record in the first station and then compare it with the > ones in the other tables for all the parameters at that particular date. If > two records are equals (it probably happens) I must then check the next one > in the timeserie. If the second record is equal too, then probably the two > series may be equals and I must raise an alert from my application. Is > there a better and faster way to perform such a check ? > > -- tables > CREATE TABLE table1 > ( > fulldate timestamp, > id smallint NOT NULL, > meanvalue real > ) WITH (OIDS=FALSE); > --..................... > --..................... > CREATE TABLE table14 > ( > fulldate timestamp, > id smallint NOT NULL, > meanvalue real > ) WITH (OIDS=FALSE); > -- > -- inserts > insert into table1(select > ('2009-01-01'::timestamp + interval '1 hour' * s.a)::timestamp, > 1::smallint, round(cast(random() as numeric), 1)::real > from generate_series(0,1000) as s(a) > ); > -- > insert into table2(select > ('2009-01-01'::timestamp + interval '1 hour' * s.a)::timestamp, > 1::smallint, round(cast(random() as numeric), 1)::real > from generate_series(0,1000) as s(a) > ); > -- > -- same data as table 1 -- MUST BE FOUND BY THE CKECK ROUTINE > insert into table3(select fulldate, id, meanvalue from table1); > > > Thank in advance, > Paolo Saudin -- Adrian Klaver aklaver@comcast.net
>-----Original Message-----
>From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On behalf of Adrian Klaver
>Sent: Thursday, 12 February 2009 17:28
>To: pgsql-general@postgresql.org
>Cc: Paolo Saudin
>Subject: Re: [GENERAL] How to check if 2 series of data are equal

>On Thursday 12 February 2009 12:06:41 am Paolo Saudin wrote:
>> Hi,
>>
>> I have 14 tables filled with meteorological data, one record per parameter
>> per hour. The id field holds the parameter type (1=temperature, 2=humidity
>> ...) My problem is that for short periods (maybe one week, one month) there
>> are two stations with the same data, I mean the temperature of table1 is
>> equal to the humidity of table3. I need to discover those cases.

>Before I can start to answer this I need some clarification. How can temperature
>and humidity be the same data?

It can be the same data (and it is) because of errors in the remote stations' configurations.
The station and parameter IDs were mixed up, resulting in the same data in different tables ...
----- "Paolo Saudin" <paolo@ecometer.it> wrote: > >-----Messaggio originale----- > >Da: pgsql-general-owner@postgresql.org > [mailto:pgsql-general-owner@postgresql.org] Per conto di Adrian > Klaver > >Inviato: giovedì 12 febbraio 2009 17.28 > >A: pgsql-general@postgresql.org > >Cc: Paolo Saudin > >Oggetto: Re: [GENERAL] How to check if 2 series of data are equal > > >On Thursday 12 February 2009 12:06:41 am Paolo Saudin wrote: > >> Hi, > >> > >> I have 14 tables filled with meteorological data, one record per > parameter > >> per hour. The id field holds the parameter type (1=temperature, > 2=humidity > >> ...) My problem is that for short periods (maybe one week, one > month) there > >> are two stations with the same data, I mean the temperature of > table1 is > >> equal to the humidity of table3. I need to discover those cases. > > >Before I can start to answer this I need some clarification. How can > temperature > >and humidity be the same data? > > Can be the same data ( and it is ) because of errors in the remote > stations configurations. > The Stations and parameters IDs were mixed up resulting in same data > in different tables ... > I am afraid I more confused now. From the table schema the value is a real number only and has no units. As I understandthe units designation lies in the id. If the ids are mixed up I can't see how it is possible to differentiate betweena value of 25 that maybe degrees C or % relative humidity for instance. You are going to have to step me through this. Adrian Klaver aklaver@comcast.net
On Thu, Feb 12, 2009 at 09:06:41AM +0100, Paolo Saudin wrote:
> I have 14 tables filled with meteorological data, one record
> per parameter per hour. The id field holds the parameter type
> (1=temperature, 2=humidity ...) My problem is that for short periods
> (maybe one week, one month) there are two stations with the same data,
> I mean the temperature of table1 is equal to the humidity of table3. I
> need to discover those cases.

I'm assuming that it's one table per sensor station below. Would something like this work:

  SELECT station, fulldate,
    COUNT(CASE WHEN sensor = 1 THEN 1 END) AS num_sensor1_readings,
    COUNT(CASE WHEN sensor = 2 THEN 1 END) AS num_sensor2_readings,
    COUNT(CASE WHEN sensor = 3 THEN 1 END) AS num_sensor3_readings
  FROM (
    SELECT 1 AS station, fulldate, id AS sensor FROM table1
    UNION ALL SELECT 2, fulldate, id FROM table2
    UNION ALL SELECT 3, fulldate, id FROM table3) x
  GROUP BY station, fulldate
  ORDER BY station, fulldate;

That way you'll get a list of all the duplicate values. If the "fulldate" column is the actual time it was received and isn't truncated off to the nearest hour, you will probably want to use date_trunc('hour',fulldate) in the outer select.

Hope that helps!

--
Sam  http://samason.me.uk/
>-----Original Message-----
>From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On behalf of Adrian Klaver
>Sent: Thursday, 12 February 2009 18:57
>To: Paolo Saudin
>Cc: pgsql-general@postgresql.org
>Subject: Re: R: [GENERAL] How to check if 2 series of data are equal

>I am afraid I am more confused now. From the table schema the value is a real number only and has no units. As I understand it,
>the units designation lies in the id. If the ids are mixed up I can't see how it is possible to differentiate between a value of 25
>that may be degrees C or % relative humidity, for instance. You are going to have to step me through this.

Yes, the parameter is defined by the id and stored in another table with the name, units and other properties. I need to find out a sequence of meanvalues (without taking care of ids) which exists in another table.

Here is some sample data; I need to find out if some sequence of data in table1 is equal to data in table2, table3 ... tableN.

Table1
fulldate, id, meanvalue
2009-01-01 00:00:00, 1, 12.3 -- temperature
2009-01-01 01:00:00, 1, 12.5
2009-01-01 02:00:00, 1, 12.6
2009-01-01 03:00:00, 1, 12.7
2009-01-01 04:00:00, 1, 12.8
2009-01-01 05:00:00, 1, 12.2

Table1
fulldate, id, meanvalue
2009-01-01 00:00:00, 2, 80.3 -- humidity
2009-01-01 01:00:00, 2, 81.6
2009-01-01 02:00:00, 2, 82.1
2009-01-01 03:00:00, 2, 79.8
2009-01-01 04:00:00, 2, 77.2
2009-01-01 05:00:00, 2, 77.1
------------------------------------------------------------------

Table2
fulldate, id, meanvalue
2009-01-01 00:00:00, 1, 12.3 -- temperature
2009-01-01 01:00:00, 1, 11.8
2009-01-01 02:00:00, 1, 82.1 ! WRONG DATA - SAME AS Table2 id->2
2009-01-01 03:00:00, 1, 79.8 ! WRONG DATA - SAME AS Table2 id->2
2009-01-01 04:00:00, 1, 77.2 ! WRONG DATA - SAME AS Table2 id->2
2009-01-01 05:00:00, 1, 13.1

Table2
fulldate, id, meanvalue
2009-01-01 00:00:00, 2, 78.9 -- humidity
2009-01-01 01:00:00, 2, 76.4
2009-01-01 02:00:00, 2, 74.7
2009-01-01 03:00:00, 2, 73.1
2009-01-01 04:00:00, 2, 71.6
2009-01-01 05:00:00, 1, 70.8

Hope this might help,
Paolo Saudin
On Thursday 12 February 2009 11:37:37 am Paolo Saudin wrote: > > > > Can be the same data ( and it is ) because of errors in the remote > > stations configurations. > > The Stations and parameters IDs were mixed up resulting in same data > > in different tables ... > > > > > >I am afraid I more confused now. From the table schema the value is a real > > number only and has no units. As I understand the units >designation lies > > in the id. If the ids are mixed up I can't see how it is possible to > > differentiate between a value of 25 that maybe >degrees C or % relative > > humidity for instance. You are going to have to step me through this. > > Yes, the parameter is defined by the id and stored in another table with > the name, units and other properties. I need to find out a sequence of > meanvalues (without taking care of ids) which exists in another table > > Here is some sample data, I need to found out if some sequence of data in > table1 is equal to data in table2, table3 ... tableN. > > Table1 > fulldate, id, meanvalue > 2009-01-01 00:00:00, 1, 12.3 -- temperature > 2009-01-01 01:00:00, 1, 12.5 > 2009-01-01 02:00:00, 1, 12.6 > 2009-01-01 03:00:00, 1, 12.7 > 2009-01-01 04:00:00, 1, 12.8 > 2009-01-01 05:00:00, 1, 12.2 > > Table1 > fulldate, id, meanvalue > 2009-01-01 00:00:00, 2, 80.3 -- humidity > 2009-01-01 01:00:00, 2, 81.6 > 2009-01-01 02:00:00, 2, 82.1 > 2009-01-01 03:00:00, 2, 79.8 > 2009-01-01 04:00:00, 2, 77.2 > 2009-01-01 05:00:00, 2, 77.1 > ------------------------------------------------------------------ > > Table2 > fulldate, id, meanvalue > 2009-01-01 00:00:00, 1, 12.3 -- temperature > 2009-01-01 01:00:00, 1, 11.8 > 2009-01-01 02:00:00, 1, 82.1 ! WRONG DATA - SAME AS Table2 id->2 > 2009-01-01 03:00:00, 1, 79.8 ! WRONG DATA - SAME AS Table2 id->2 > 2009-01-01 04:00:00, 1, 77.2 ! WRONG DATA - SAME AS Table2 id->2 > 2009-01-01 05:00:00, 1, 13.1 I am going to assume you mean Table1 above. > > Table2 > fulldate, id, meanvalue > 2009-01-01 00:00:00, 2, 78.9 -- humidity > 2009-01-01 01:00:00, 2, 76.4 > 2009-01-01 02:00:00, 2, 74.7 > 2009-01-01 03:00:00, 2, 73.1 > 2009-01-01 04:00:00, 2, 71.6 > 2009-01-01 05:00:00, 1, 70.8 > > Hope this might help, > Paolo Saudin I modified Sams query- SELECT fulldate,sensor FROM (SELECT fulldate,sensor,count(sensor) FROM (SELECT 1 AS station, fulldate, meanvalue AS sensor FROM table1 UNION SELECT 2, fulldate, meanvalue FROM table2 ORDER BY fulldate,sensor) AS x GROUP BY fulldate,sensor) AS y WHERE y.count>1; and got- fulldate | sensor ---------------------+-------- 2009-01-01 00:00:00 | 12.3 2009-01-01 02:00:00 | 82.1 2009-01-01 03:00:00 | 79.8 2009-01-01 04:00:00 | 77.2 Though I think you might want to deal with the remote sensor problem first. I would be hesitant to trust any of the data. Just a thought. -- Adrian Klaver aklaver@comcast.net
On Thu, 2009-02-12 at 14:21 -0800, Adrian Klaver wrote: > On Thursday 12 February 2009 11:37:37 am Paolo Saudin wrote: > > > SELECT fulldate,sensor > FROM (SELECT fulldate,sensor,count(sensor) > FROM (SELECT 1 AS station, fulldate, meanvalue AS sensor FROM table1 > UNION > SELECT 2, fulldate, meanvalue FROM table2 ORDER BY > fulldate,sensor) AS x > GROUP BY fulldate,sensor) AS y > WHERE y.count>1; > > > and got- > > fulldate | sensor > ---------------------+-------- > 2009-01-01 00:00:00 | 12.3 > 2009-01-01 02:00:00 | 82.1 > 2009-01-01 03:00:00 | 79.8 > 2009-01-01 04:00:00 | 77.2 You might want to do a UNION with all your tables to add a field to identify the table, and use min() and max() to show the offending tables.
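A sketch of that idea, using the table and column names from the earlier examples (extend the UNION ALL to cover all 14 tables):

-- For every (fulldate, meanvalue) pair that appears in more than one
-- table, report which tables it came from.
SELECT fulldate,
       meanvalue,
       min(station) AS first_table,
       max(station) AS last_table,
       count(*)     AS occurrences
  FROM (SELECT 1 AS station, fulldate, meanvalue FROM table1
        UNION ALL
        SELECT 2, fulldate, meanvalue FROM table2
        UNION ALL
        SELECT 3, fulldate, meanvalue FROM table3
        -- ... UNION ALL the remaining tables ...
       ) AS readings
 GROUP BY fulldate, meanvalue
HAVING count(*) > 1
 ORDER BY fulldate;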
-----Original Message-----
From: Adrian Klaver [mailto:aklaver@comcast.net]
Sent: Thursday, 12 February 2009 23:22
To: Paolo Saudin
Cc: pgsql-general@postgresql.org
Subject: Re: R: R: [GENERAL] How to check if 2 series of data are equal

>I modified Sams query-
>
>SELECT fulldate,sensor
>  FROM (SELECT fulldate,sensor,count(sensor)
>    FROM (SELECT 1 AS station, fulldate, meanvalue AS sensor FROM table1
>      UNION
>      SELECT 2, fulldate, meanvalue FROM table2 ORDER BY fulldate,sensor) AS x
>    GROUP BY fulldate,sensor) AS y
>  WHERE y.count>1;
>
>and got-
>
> fulldate            | sensor
>---------------------+--------
> 2009-01-01 00:00:00 |   12.3
> 2009-01-01 02:00:00 |   82.1
> 2009-01-01 03:00:00 |   79.8
> 2009-01-01 04:00:00 |   77.2

Thank you very much to you all, this one works perfectly!!

>Though I think you might want to deal with the remote sensor problem first. I
>would be hesitant to trust any of the data. Just a thought.

The problem has been fixed and does not happen any more. Unfortunately there are 14 years (1992-2006) in which data could be corrupted for short periods. Now I must find them out ...

Thanks once more
Paolo Saudin
Hi.

Sorry very late reaction. I desire problem solution. Therefore, one piece of evidence....

I tried a JDBC program.
http://winpg.jp/~saito/pg_work/LC_MESSAGE_CHECK/connect_problem/jdbctestx.java

C:\home\HIROSHI>java jdbctestx
org.postgresql.util.PSQLException: FATAL: ???[??"postgres"??????????????
        at org.postgresql.core.v3.ConnectionFactoryImpl.readStartupMessages(ConnectionFactoryImpl.java:444)
        at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:99)
        at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:66)
        at org.postgresql.jdbc2.AbstractJdbc2Connection.<init>(AbstractJdbc2Connection.java:124)
        at org.postgresql.jdbc2.Jdbc2Connection.<init>(Jdbc2Connection.java:24)
        at org.postgresql.Driver.makeConnection(Driver.java:386)
        at org.postgresql.Driver.connect(Driver.java:260)
        at java.sql.DriverManager.getConnection(Unknown Source)
        at java.sql.DriverManager.getConnection(Unknown Source)
        at jdbctestx.main(jdbctestx.java:21)

Then, the server log is this.
http://winpg.jp/~saito/pg_work/LC_MESSAGE_CHECK/connect_problem/server.log

Regards,
Hiroshi Saito

----- Original Message ----- From: "Hiroshi Inoue" <inoue@tpf.co.jp>
> Yes, but isn't it an improvement that properly localized password errors
> or no-database errors etc. can be seen? Currently I see unreadable
> error messages for those cases via the psqlodbc driver. The attached
> patch in my previous posting is an example of how to solve the problem.
>
> regards,
> Hiroshi Inoue
On Tue, Mar 03, 2009 at 12:35:37AM +0900, Hiroshi Saito wrote: > Sorry very late reaction. > I desire problem solution. So do I :-) Ganbatte ! Karsten -- GPG key ID E4071346 @ wwwkeys.pgp.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
Hi, Karsten-san.

Yeah, it was a problem that the driver, which only relays the messages, cannot solve by itself -- but we keep persevering without giving up!
-- arigatou gozaimasu (thank you) :-)

Regards,
Hiroshi Saito

----- Original Message ----- From: "Karsten Hilbert" <Karsten.Hilbert@gmx.net>
> On Tue, Mar 03, 2009 at 12:35:37AM +0900, Hiroshi Saito wrote:
>
>> Sorry very late reaction.
>> I desire problem solution.
> So do I :-)
>
> Ganbatte !
>
> Karsten
Is there any chance of this being considered for 8.4 ? Unfortunately I cannot provide patches myself as my knowledge of C is next to nothing. Thanks, Karsten -- GPG key ID E4071346 @ wwwkeys.pgp.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346