Thread: encoding of PostgreSQL messages
Hi all ! How can I programmatically detect which encoding a PostgreSQL server I am trying to connect to sends back messages -- before I connect (so client_encoding and the pg_settings table are flat out). Thanks, Karsten -- GPG key ID E4071346 @ wwwkeys.pgp.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
Karsten Hilbert wrote: > Hi all ! > > How can I programmatically detect which encoding a > PostgreSQL server I am trying to connect to sends back > messages -- before I connect (so client_encoding and > the pg_settings table are flat out). Hmm, isn't client_encoding reported in the startup packet sent by the server, after auth? -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Tue, Dec 23, 2008 at 06:45:17PM -0300, Alvaro Herrera wrote: > > How can I programmatically detect which encoding a > > PostgreSQL server I am trying to connect to sends back > > messages -- before I connect (so client_encoding and > > the pg_settings table are flat out). > > Hmm, isn't client_encoding reported in the startup packet sent by the > server, after auth? That would not quite be enough -- I am talking about messages reported *during* auth, say FATAL: password authentication failed for user "postgres" or fe_sendauth: no password supplied both of which, in other locales, may contain non-ASCII characters. Karsten -- GPG key ID E4071346 @ wwwkeys.pgp.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
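To illustrate where this bites, here is a minimal libpq sketch (the connection parameters are placeholders and the wrong password is deliberate): the text returned by PQerrorMessage() after a failed authentication arrives as a byte string whose encoding the client has no way to determine at that point.

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	/* placeholder connection string; the password is wrong on purpose */
	PGconn *conn = PQconnectdb("host=localhost dbname=testdb "
	                           "user=postgres password=wrong");

	if (PQstatus(conn) != CONNECTION_OK)
	{
		/*
		 * This may be a translated FATAL message containing umlauts or
		 * other non-ASCII characters, in an encoding we cannot query yet.
		 */
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
	}

	PQfinish(conn);
	return 0;
}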
On Dec 23, 2008, at 4:24 PM, Karsten Hilbert wrote:

> How can I programmatically detect which encoding a
> PostgreSQL server I am trying to connect to sends back
> messages -- before I connect (so client_encoding and
> the pg_settings table are flat out).

I don't think there is a way because you can't get any information without authorizing first.

It is also unclear to me how things work in the other direction. When authenticating, what if the user or database name has non-ASCII characters? Are they interpreted in the encoding of the server, since the client has not established an encoding?

John DeSoi, Ph.D.
Karsten Hilbert wrote: > On Tue, Dec 23, 2008 at 06:45:17PM -0300, Alvaro Herrera wrote: > > Hmm, isn't client_encoding reported in the startup packet sent by the > > server, after auth? > > That would not quite be enough -- I am talking about > messages reported *during* auth, say > > FATAL: password authentication failed for user "postgres" > > or > > fe_sendauth: no password supplied > > both of which, in other locales, may contain non-ASCII characters. Those are sent in the server encoding IIRC (which admittedly you don't have a way to know, at that point.) And I'm now wondering if we should delay initializing the translation stuff until after client_encoding has been reported. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Mon, Dec 29, 2008 at 09:07:14AM -0300, Alvaro Herrera wrote:

> > That would not quite be enough -- I am talking about
> > messages reported *during* auth, say
> >
> > FATAL: password authentication failed for user "postgres"
> >
> > or
> >
> > fe_sendauth: no password supplied
> >
> > both of which, in other locales, may contain non-ASCII characters.
>
> Those are sent in the server encoding IIRC (which admittedly you don't
> have a way to know, at that point.)
>
> And I'm now wondering if we should delay initializing the translation
> stuff until after client_encoding has been reported.

Or else

- just don't pass those messages through gettext so they are
  always in 7 bit ASCII English

- do pass them through gettext but append a 7-bit ASCII filter so
  things do get passed in pseudo-ASCII (this will work for many
  single-byte encodings but, more often than not, not for multibyte ones)

Both changes would be less intrusive than postponing the translation. If I had a choice I would opt for the first.

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
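A rough sketch of the second option, assuming a hypothetical helper called on the already-translated string (it keeps Latin-type single-byte text recognizable, but would turn each multibyte character into several question marks):

/*
 * Hypothetical filter: squash a message to 7-bit ASCII in place by
 * replacing every byte with the high bit set by '?'.
 */
static void
ascii_squash(char *msg)
{
	for (; *msg != '\0'; msg++)
	{
		if ((unsigned char) *msg > 127)
			*msg = '?';
	}
}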
Karsten Hilbert <Karsten.Hilbert@gmx.net> writes: > On Mon, Dec 29, 2008 at 09:07:14AM -0300, Alvaro Herrera wrote: >> And I'm now wondering if we should delay initializing the translation >> stuff until after client_encoding has been reported. > Or else > - just don't pass those messages through gettext so they are > always in 7 bit ASCII English What's the difference? The user-visible result would be the same AFAICS. (One or the other might be less messy internally, but I'm not sure which offhand.) regards, tom lane
> Karsten Hilbert <Karsten.Hilbert@gmx.net> writes:
> > On Mon, Dec 29, 2008 at 09:07:14AM -0300, Alvaro Herrera wrote:
> >> And I'm now wondering if we should delay initializing the translation
> >> stuff until after client_encoding has been reported.
>
> > Or else
>
> > - just don't pass those messages through gettext so they are
> > always in 7 bit ASCII English
>
> What's the difference? The user-visible result would be the same
> AFAICS. (One or the other might be less messy internally, but I'm
> not sure which offhand.)

That was the reason for the suggestion: perhaps less messy and surely lower impact on the existing code as it would not mean moving code later in the initialization but rather just removing the gettext wrappers around a few strings. No difference in the result.

The difference to my other suggestion (no translation vs. translation but then replacing characters > 127 by, say, '?' or a space) is: I could *assume* a given encoding, namely 7 bit ASCII. Or rather I could assume that I can display the message as "something pretty similar to what the original message said, perhaps without umlauts and accents but still recognizable in the local language".

Now, surely, I could dig down the layers to where "my application space" receives the message from PostgreSQL and filter there. It is, however, good to have some knowledge of the encoding where knowledge can be had.

The concrete problem is this: I connect to PostgreSQL from Python. Let's assume PG is set to German. If the wrong password is supplied the PG error message string contains an umlaut. This is passed to libpq, which in turn passes it to the C part of psycopg2 which then turns this into an exception. An exception, by default in Python, is printed to the console, which may be in any encoding incompatible with the latin1 the PG message happens to be in. Thus, printing the PG message may or may not fail due to Unicode de-/encoding errors.

The solution is to find the right layer to take control of the encoding but this is eventually only possible if the encoding is *known*. Thus the plea for "7-bit-ascii English by default until the encoding *can* be known". Going to "7-bit-ascii filter of the original by default until the encoding can be known" only tries to preserve a bit more of the original language. I may be wrong in feasibility.

Thanks for considering,
Karsten
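As a sketch of the client-side band-aid described above (the LATIN1 guess below is exactly the assumption one would rather not have to make, and the byte-string handling is hedged for the Python 2 era of this thread):

import psycopg2

ASSUMED_SERVER_ENCODING = 'latin1'   # a guess -- this is the unknown quantity

try:
    # wrong password on purpose, to provoke a FATAL message during auth
    conn = psycopg2.connect("host=localhost dbname=gnumed "
                            "user=postgres password=wrong")
except psycopg2.OperationalError as exc:
    raw = exc.args[0] if exc.args else ''
    if isinstance(raw, bytes):
        # decode with the assumed encoding, replacing bytes that do not fit
        raw = raw.decode(ASSUMED_SERVER_ENCODING, 'replace')
    print(u'connection failed: %s' % raw)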
On Wednesday 31 December 2008 18:57:29 Karsten Hilbert wrote: > The solution is to find the right layer to take control of the encoding but > this is eventually only possible if the encoding is *known*. Thus the plea > for "7-bit-ascii English by default until the encoding *can* be known". > Going to "7-bit-ascii filter of the original by default until the encoding > can be known" only tries to preserve a bit more of the original language. I > may be wrong in feasibility. The proper fix is probably to include the client encoding in the connection startup message.
> On Wednesday 31 December 2008 18:57:29 Karsten Hilbert wrote:
> > The solution is to find the right layer to take control of the encoding but
> > this is eventually only possible if the encoding is *known*. Thus the plea
> > for "7-bit-ascii English by default until the encoding *can* be known".
> > Going to "7-bit-ascii filter of the original by default until the encoding
> > can be known" only tries to preserve a bit more of the original language.
> > I may be wrong in feasibility.
>
> The proper fix is probably to include the client encoding in the
> connection startup message.

Absolutely, either the desired client encoding being sent to the server or the current server encoding being sent to the client.

It would, however, take way longer to trickle down into being supported by client interfaces such as psycopg2 <evil grin>. A fixed startup encoding would not need support from those parts of the equation.

Karsten
Peter Eisentraut <peter_e@gmx.net> writes: > On Wednesday 31 December 2008 18:57:29 Karsten Hilbert wrote: >> The solution is to find the right layer to take control of the encoding but >> this is eventually only possible if the encoding is *known*. Thus the plea >> for "7-bit-ascii English by default until the encoding *can* be known". >> Going to "7-bit-ascii filter of the original by default until the encoding >> can be known" only tries to preserve a bit more of the original language. I >> may be wrong in feasibility. > The proper fix is probably to include the client encoding in the connection > startup message. What of errors occurring before such an option could be applied? I think that ultimately it's necessary to accept that there will be some window during connection startup where sending plain ASCII (English) messages is the best recourse. I'm not sure what the best way to implement that is. On reflection though, trying to mark the individual messages that might need that treatment doesn't seem like a winner: there's too much possibility for mistakes, or code drift causing a marking to become wrong; and I'm not even convinced that the very same source-code message might not fall into both categories at different times. So having a mode switch somehow occurring inside elog.c once we are ready to translate seems like the right approach. regards, tom lane
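A conceptual sketch of such a switch (invented names, not actual elog.c code): the wrapper returns the untranslated msgid until a flag is flipped at the point where the client's encoding has been established.

#include <libintl.h>
#include <stdbool.h>

/* Hypothetical flag, set once client_encoding has been negotiated. */
static bool translation_ready = false;

/* Hypothetical wrapper used in place of a direct gettext() call. */
static const char *
maybe_gettext(const char *msgid)
{
	if (!translation_ready)
		return msgid;			/* plain ASCII English during startup */
	return gettext(msgid);		/* translated, in a known encoding */
}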
> > The proper fix is probably to include the client encoding in the
> > connection startup message.
>
> What of errors occurring before such an option could be applied?
>
> I think that ultimately it's necessary to accept that there will be some
> window during connection startup where sending plain ASCII (English)
> messages is the best recourse. I'm not sure what the best way to
> implement that is. On reflection though, trying to mark the individual
> messages that might need that treatment doesn't seem like a winner:
> there's too much possibility for mistakes, or code drift causing a
> marking to become wrong; and I'm not even convinced that the very same
> source-code message might not fall into both categories at different
> times. So having a mode switch somehow occurring inside elog.c once
> we are ready to translate seems like the right approach.

Hm, so maybe both Peter and Alvaro are right:

1) Setting the translation wrapper to a NOOP as early as possible.

   Thus, the first messages are sent in 7-bit ASCII English.

2) Including the server encoding in the very (?) first message sent to the client *after* which ...

3) ... proper initialization of the gettext system can happen because now the client can know what to make of the subsequent messages.

That way no strings should need special marking, and proper translation (or not) is achieved automatically?

It is surely acceptable to see the very first messages in English and/or 7-bit ASCII - I just want to be able to know their encoding, either by definition or by being told.

Thanks,
Karsten
> Hm, so maybe both Peter and Alvaro are right:
>
> 1) Setting the translation wrapper to a NOOP as early as possible.
>
>    Thus, the first messages are sent in 7-bit ASCII English.

Despite being *marked* for translation and a translation existing in the .po file, that is.

Karsten
On Wednesday 31 December 2008 20:23:47 Tom Lane wrote: > > The proper fix is probably to include the client encoding in the > > connection startup message. > > What of errors occurring before such an option could be applied? Connection errors are handled by the client, which knows the client encoding. If the setting of the client encoding would be one of the first things to be done on the server side, you would only have a handful of possible error conditions left (e.g., setlocale failed, out of memory). You could choose to report those in plain ASCII or send a special error code that the client can resolve. Although I guess no one could fault us if "could not set language" is reported not translated. ;-) > I think that ultimately it's necessary to accept that there will be some > window during connection startup where sending plain ASCII (English) > messages is the best recourse. Ultimately yes. But we currently handle the client encoding quite late in the startup sequence so that many connection startup failure messages that are of interest to normal users would likely be affected. So moving the client encoding handling to the earliest possible phase would still be desirable.
Bruce, et al,

given the thread partially quoted below, would this warrant a TODO item "improve communication of encoding between client and server regarding early startup messages"?

A very usable band-aid for 8.4 - short of a proper fix - would be the minimally invasive sending of messages in 7-bit English until server_encoding can be retrieved by the client by current means.

Thanks,
Karsten

On Thu, Jan 01, 2009 at 08:33:56PM +0200, Peter Eisentraut wrote:
> On Wednesday 31 December 2008 20:23:47 Tom Lane wrote:
> > > The proper fix is probably to include the client encoding in the
> > > connection startup message.
> >
> > What of errors occurring before such an option could be applied?
>
> Connection errors are handled by the client, which knows the client encoding.
> If the setting of the client encoding would be one of the first things to be
> done on the server side, you would only have a handful of possible error
> conditions left (e.g., setlocale failed, out of memory). You could choose to
> report those in plain ASCII or send a special error code that the client can
> resolve. Although I guess no one could fault us if "could not set language"
> is reported not translated. ;-)
>
> > I think that ultimately it's necessary to accept that there will be some
> > window during connection startup where sending plain ASCII (English)
> > messages is the best recourse.
>
> Ultimately yes. But we currently handle the client encoding quite late in the
> startup sequence so that many connection startup failure messages that are of
> interest to normal users would likely be affected. So moving the client
> encoding handling to the earliest possible phase would still be desirable.

--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
Added to TODO:

        Improve encoding of connection startup messages sent to the client

            Currently some authentication error messages are sent in the server
            encoding

          * http://archives.postgresql.org/pgsql-general/2008-12/msg00801.php
          * http://archives.postgresql.org/pgsql-general/2009-01/msg00005.php

---------------------------------------------------------------------------

Karsten Hilbert wrote:
> Bruce, et al,
>
> given the thread partially quoted below, would this warrant a
> TODO item "improve communication of encoding between client
> and server regarding early startup messages"?
>
> A very usable band-aid for 8.4 - short of a proper fix -
> would be the minimally invasive sending of messages in 7-bit
> English until server_encoding can be retrieved by the client
> by current means.
>
> Thanks,
> Karsten

--
Bruce Momjian <bruce@momjian.us>  http://momjian.us
EnterpriseDB  http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Hi,

This topic seems to be related to the bug report "[ODBC] Localized error messages, wrong charset".

Bruce Momjian wrote:
> Added to TODO:
>
> Improve encoding of connection startup messages sent to the client
>
> Currently some authentication error messages are sent in the server
> encoding

Is it true? IIRC the backend knows nothing about the server encoding in the authentication phase.

The psqlodbc Unicode driver sends a connection startup message which contains the client_encoding(=UTF8) guc parameter. Attached is a trial patch so that the psqlodbc Unicode driver can get properly localized password error messages.

regards,
Hiroshi Inoue

Index: postmaster/postmaster.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/postmaster/postmaster.c,v
retrieving revision 1.570
diff -c -c -r1.570 postmaster.c
*** postmaster/postmaster.c	4 Jan 2009 22:19:59 -0000	1.570
--- postmaster/postmaster.c	30 Jan 2009 14:05:35 -0000
***************
*** 1552,1557 ****
--- 1552,1560 ----
  							   pstrdup(nameptr));
  				port->guc_options = lappend(port->guc_options,
  							   pstrdup(valptr));
+ 				if (stricmp(nameptr, "client_encoding") == 0 &&
+ 					stricmp(valptr, "UTF8") == 0)
+ 					bind_textdomain_codeset(PG_TEXTDOMAIN("postgres"), "UTF-8");
  			}
  			offset = valoffset + strlen(valptr) + 1;
  		}
Can someone comment on this?

---------------------------------------------------------------------------

Hiroshi Inoue wrote:
> The psqlodbc Unicode driver sends a connection startup message which
> contains the client_encoding(=UTF8) guc parameter. Attached is
> a trial patch so that the psqlodbc Unicode driver can get
> properly localized password error messages.
>
> regards,
> Hiroshi Inoue

--
Bruce Momjian <bruce@momjian.us>  http://momjian.us
EnterpriseDB  http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Peter Eisentraut wrote: > Bruce Momjian wrote: >> Can someone comment on this? > > Looks like a horrible hack to me. Recoding stuff to the client encoding > in the server outside the existing recoding mechanism looks pretty evil > to me. > Plus, it does not address the problem of what happens to > messages sent before this, it just moves the point of "before" a bit > earlier for some special cases. > > I think we have discussed more proper solutions earlier in this thread. > IMO the best approach would be for the client to include the client > encoding in the startup package. ??? My patch exactly references the client_encoding included in the startup message. regards, Hiroshi Inoue
Bruce Momjian wrote:
> Can someone comment on this?

Looks like a horrible hack to me. Recoding stuff to the client encoding in the server outside the existing recoding mechanism looks pretty evil to me. Plus, it does not address the problem of what happens to messages sent before this, it just moves the point of "before" a bit earlier for some special cases.

I think we have discussed more proper solutions earlier in this thread. IMO the best approach would be for the client to include the client encoding in the startup package.
Peter Eisentraut <peter_e@gmx.net> writes: > Bruce Momjian wrote: >> Can someone comment on this? > Looks like a horrible hack to me. Recoding stuff to the client encoding > in the server outside the existing recoding mechanism looks pretty evil > to me. Plus, it does not address the problem of what happens to > messages sent before this, it just moves the point of "before" a bit > earlier for some special cases. > I think we have discussed more proper solutions earlier in this thread. > IMO the best approach would be for the client to include the client > encoding in the startup package. Huh? Clients already do that (or at least some are capable of it, including libpq). The hard problems are (1) there's still a "before", ie we might fail before scanning the options in the packet, and (2) the sent encoding might itself be invalid, and you still have to report that somehow. I believe the only real "fix" is to guarantee that messages are sent as untranslated ASCII until we have sent an encoding indicator at the end of the startup sequence. Which has its own pretty clear downside: no more translation of authorization failures. regards, tom lane
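For reference, a sketch of two ways a libpq client can already get its desired client_encoding into the startup packet / environment (the connection parameters are placeholders):

#include <stdlib.h>
#include <libpq-fe.h>

int
main(void)
{
	/* Variant 1: pass it as a backend option inside the startup packet. */
	PGconn *conn = PQconnectdb("host=localhost dbname=testdb user=app "
	                           "options='-c client_encoding=UTF8'");
	PQfinish(conn);

	/* Variant 2: let libpq pick it up from the environment. */
	setenv("PGCLIENTENCODING", "UTF8", 1);
	conn = PQconnectdb("host=localhost dbname=testdb user=app");
	PQfinish(conn);

	return 0;
}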
On Sun, Feb 08, 2009 at 10:38:16AM -0500, Tom Lane wrote: > I believe the only real "fix" is to guarantee that messages are sent > as untranslated ASCII until we have sent an encoding indicator at > the end of the startup sequence. Which has its own pretty clear > downside: no more translation of authorization failures. I, for one, would be perfectly fine with this. It has several advantages: - it does allow to "know" the encoding right from the beginning - it does not require client library changes (IOW the change does not need to trickle down: server -> libpq -> psycopg2 -> GNUmed) - the type of (auth) failure can be deduced from the (now always English) message string which, again, allows for client-side translation, if so desired Karsten -- GPG key ID E4071346 @ wwwkeys.pgp.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
>> Bruce Momjian wrote:
>>> Can someone comment on this?
>
>> Looks like a horrible hack to me. Recoding stuff to the client encoding
>> in the server outside the existing recoding mechanism looks pretty evil
>> to me. Plus, it does not address the problem of what happens to
>> messages sent before this, it just moves the point of "before" a bit
>> earlier for some special cases.
>
>> I think we have discussed more proper solutions earlier in this thread.
>> IMO the best approach would be for the client to include the client
>> encoding in the startup package.
>
> Huh? Clients already do that (or at least some are capable of it,
> including libpq).

Yes, the psqlodbc driver has done it because protocol 3 allowed it from the first.

> The hard problems are (1) there's still a "before",

Yes, but isn't it an improvement that properly localized password errors or no-database errors etc. can be seen? Currently I see unreadable error messages for those cases via the psqlodbc driver. The attached patch in my previous posting is an example of how to solve the problem.

regards,
Hiroshi Inoue
Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
>> Bruce Momjian wrote:
>>> Can someone comment on this?
>
>> I think we have discussed more proper solutions earlier in this thread.
>> IMO the best approach would be for the client to include the client
>> encoding in the startup package.
>
> Huh? Clients already do that (or at least some are capable of it,
> including libpq).

What is a recommended way to do it using libpq, via the "options" parameter or the "PGCLIENTENCODING" environment variable?

> The hard problems are (1) there's still a "before",
> ie we might fail before scanning the options in the packet, and (2)
> the sent encoding might itself be invalid, and you still have to report
> that somehow.
>
> I believe the only real "fix" is to guarantee that messages are sent
> as untranslated ASCII until we have sent an encoding indicator at
> the end of the startup sequence. Which has its own pretty clear
> downside: no more translation of authorization failures.

I'm afraid I'm misunderstanding your point. I'm thinking of the following steps in the backend code.

1. Set LC_MESSAGES to "C" until the client_encoding is determined.

2. When a client_encoding is specified in the startup message, bind the corresponding codeset to the textdomain and set LC_MESSAGES to the one specified in the startup message, or restore the LC_MESSAGES overridden by step 1, before the authorization step. Then we can see properly localized authorization failure messages.

3. Reset LC_MESSAGES to the current one in InitializeClientEncoding() and unbind the codeset if necessary in SetDatabaseEncoding().

Comments?

regards,
Hiroshi Inoue
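A rough sketch of steps 1 and 2 above (the helper is invented; setlocale() and bind_textdomain_codeset() are the standard calls involved, and real code would of course map PostgreSQL encoding names to iconv codeset names rather than special-casing UTF8):

#include <locale.h>
#include <libintl.h>
#include <string.h>

/* Hypothetical helper, called while the startup packet is parsed. */
static void
apply_startup_client_encoding(const char *enc)
{
	if (enc == NULL)
	{
		/* Step 1: nothing known yet -- keep messages untranslated. */
		setlocale(LC_MESSAGES, "C");
		return;
	}

	/*
	 * Step 2: a client_encoding arrived, so bind the message catalog
	 * to the matching codeset before authentication starts.
	 */
	if (strcmp(enc, "UTF8") == 0)
		bind_textdomain_codeset("postgres", "UTF-8");

	/* ... restore the configured LC_MESSAGES here ... */
}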
Hiroshi Inoue <inoue@tpf.co.jp> writes: > I'm thinking of the following steps in the backend code. > 1.Set LC_MESSAGES to "C" until the client_encoding is > determined. > 2.When a client_encoding is specifed in the startup > message, bind the corrsponding codeset to the > textdomain and set LC_MESSAGES to the specified one > in the startup message or restore the LC_MESSAGES > overridden by step 1 before authorization step. > Then we can see properly localized authorization > failure messages. > 3.Reset LC_MESSAGES to the current one in Initialize > ClientEncoding() and unbind the codeset if necessary > in SetDatabaseEncoding(). Reflecting on the bigger picture ... I would imagine that the vast majority of existing applications depend on client_encoding settings that come from postgresql.conf, ALTER USER SET, ALTER DATABASE SET, or just the default (== database encoding). I don't think a solution that penalizes those cases and makes only the case of setting it via PGCLIENTENCODING work nicely is going to make very many people happy. Mind you, I don't really know how to do better, but I do see that the case of client_encoding being specified in the startup message is not going to help enough people to be particularly useful. regards, tom lane
Tom Lane wrote:
> Hiroshi Inoue <inoue@tpf.co.jp> writes:
>> I'm thinking of the following steps in the backend code.
>
>> 1. Set LC_MESSAGES to "C" until the client_encoding is determined.
>
>> 2. When a client_encoding is specified in the startup message, bind the
>>    corresponding codeset to the textdomain and set LC_MESSAGES to the one
>>    specified in the startup message, or restore the LC_MESSAGES overridden
>>    by step 1, before the authorization step. Then we can see properly
>>    localized authorization failure messages.
>
>> 3. Reset LC_MESSAGES to the current one in InitializeClientEncoding() and
>>    unbind the codeset if necessary in SetDatabaseEncoding().
>
> Reflecting on the bigger picture ... I would imagine that the vast
> majority of existing applications depend on client_encoding settings
> that come from postgresql.conf, ALTER USER SET, ALTER DATABASE SET, or
> just the default (== database encoding). I don't think a solution that
> penalizes those cases

Removing step 1 resolves the penalty. In the first place, step 1 comes from your or Karsten's suggestion.

> and makes only the case of setting it via
> PGCLIENTENCODING work nicely is going to make very many people happy.
> Mind you, I don't really know how to do better, but I do see that the
> case of client_encoding being specified in the startup message is
> not going to help enough people to be particularly useful.

Maybe not enough currently, because collaboration between the backend and clients is obviously needed to solve this problem. The backend should provide clients with a way to specify the client_encoding on the fly which can be applied to authorization failure messages. Then clients which are eager to solve this problem would use that way. Using the information in the startup message is almost the only way to achieve it.

regards,
Hiroshi Inoue
Tom Lane wrote: > Reflecting on the bigger picture ... I would imagine that the vast > majority of existing applications depend on client_encoding settings > that come from postgresql.conf, ALTER USER SET, ALTER DATABASE SET, or > just the default (== database encoding). I don't think a solution that > penalizes those cases and makes only the case of setting it via > PGCLIENTENCODING work nicely is going to make very many people happy. I don't have any survey data available, but I think this assessment is semantically wrong. Usefully, the client encoding can come only from the client, or be defaulted (and even that is semantically wrong). I see the other cases as workarounds.
Tom Lane wrote: > I believe the only real "fix" is to guarantee that messages are sent > as untranslated ASCII until we have sent an encoding indicator at > the end of the startup sequence. Which has its own pretty clear > downside: no more translation of authorization failures. We should process the client encoding sent in the startup package before authentication.
Peter Eisentraut <peter_e@gmx.net> writes: > Tom Lane wrote: >> Reflecting on the bigger picture ... I would imagine that the vast >> majority of existing applications depend on client_encoding settings >> that come from postgresql.conf, ALTER USER SET, ALTER DATABASE SET, or >> just the default (== database encoding). I don't think a solution that >> penalizes those cases and makes only the case of setting it via >> PGCLIENTENCODING work nicely is going to make very many people happy. > I don't have any survey data available, but I think this assessment is > semantically wrong. Usefully, the client encoding can come only from > the client, or be defaulted (and even that is semantically wrong). In an ideal world, perhaps so, but do you deny my point that that's not reality? regards, tom lane
On Wednesday 11 February 2009 18:00:31 Tom Lane wrote: > Peter Eisentraut <peter_e@gmx.net> writes: > > Tom Lane wrote: > >> Reflecting on the bigger picture ... I would imagine that the vast > >> majority of existing applications depend on client_encoding settings > >> that come from postgresql.conf, ALTER USER SET, ALTER DATABASE SET, or > >> just the default (== database encoding). I don't think a solution that > >> penalizes those cases and makes only the case of setting it via > >> PGCLIENTENCODING work nicely is going to make very many people happy. > > > > I don't have any survey data available, but I think this assessment is > > semantically wrong. Usefully, the client encoding can come only from > > the client, or be defaulted (and even that is semantically wrong). > > In an ideal world, perhaps so, but do you deny my point that that's not > reality? I have never seen a setup where the client encoding did not come from the default or the client (and the person who set it up knew what they were doing). I don't think the other cases are worth optimizing.
Hi,

I have 14 tables filled with meteorological data, one record per parameter per hour. The id field holds the parameter type (1=temperature, 2=humidity ...). My problem is that for short periods (maybe one week, one month) there are two stations with the same data, I mean the temperature of table1 is equal to the humidity of table3. I need to discover those cases.

I could pick one record in the first station and then compare it with the ones in the other tables for all the parameters at that particular date. If two records are equal (it probably happens) I must then check the next one in the time series. If the second record is equal too, then probably the two series may be equal and I must raise an alert from my application. Is there a better and faster way to perform such a check?

-- tables
CREATE TABLE table1
(
  fulldate timestamp,
  id smallint NOT NULL,
  meanvalue real
) WITH (OIDS=FALSE);
--.....................
--.....................
CREATE TABLE table14
(
  fulldate timestamp,
  id smallint NOT NULL,
  meanvalue real
) WITH (OIDS=FALSE);
--
-- inserts
insert into table1 (select
  ('2009-01-01'::timestamp + interval '1 hour' * s.a)::timestamp,
  1::smallint, round(cast(random() as numeric), 1)::real
  from generate_series(0,1000) as s(a)
);
--
insert into table2 (select
  ('2009-01-01'::timestamp + interval '1 hour' * s.a)::timestamp,
  1::smallint, round(cast(random() as numeric), 1)::real
  from generate_series(0,1000) as s(a)
);
--
-- same data as table 1 -- MUST BE FOUND BY THE CHECK ROUTINE
insert into table3 (select fulldate, id, meanvalue from table1);

Thanks in advance,
Paolo Saudin
Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
>> Tom Lane wrote:
>>> Reflecting on the bigger picture ... I would imagine that the vast
>>> majority of existing applications depend on client_encoding settings
>>> that come from postgresql.conf, ALTER USER SET, ALTER DATABASE SET, or
>>> just the default (== database encoding). I don't think a solution that
>>> penalizes those cases and makes only

Not only libpq but also the JDBC driver and the ODBC driver send the startup packet including the client_encoding. Libpq can be changed to allow a *client_encoding=xxxxx* definition in conninfo.

>>> the case of setting it via
>>> PGCLIENTENCODING work nicely is going to make very many people happy.

Not a few libraries/applications issue SET client_encoding TO ... immediately after the connection is established. What's wrong with urging such clients to eliminate the SET commands and use this nice feature of the FE/BE protocol?

regards,
Hiroshi Inoue
On Thu, Feb 12, 2009 at 10:28:38PM +0900, Hiroshi Inoue wrote: > >>> Reflecting on the bigger picture ... I would imagine that the vast > >>> majority of existing applications depend on client_encoding settings > >>> that come from postgresql.conf, ALTER USER SET, ALTER DATABASE SET, or > >>> just the default (== database encoding). I don't think a solution that > >>> penalizes those cases and makes only While I agree that it is desirable to find a solution which does not break (or at least "allows to make not break") such applications I tend to think that those applications are not based on sound engineering with respect to this detail. A clean solution - which allows for the unclean solution to coexist - is very desirable. One might introduce a GUC variable "pre_connect_encoding". If not set it would default to 7-bit ascii as proposed but can be set to either some explicit encoding or else to, say, "deduce_from_environment" which would reinstate the current behaviour. That would allow those who don't want to/cannot fix client code to keep things working as before. > not only but also the JDBC driver or the ODBC driver sends the > startup packet including the client_encoding. Libpq can be changed > to allow *client_encoding=xxxxx* definition in conninfo. > > > the case of setting it via > >>> PGCLIENTENCODING work nicely is going to make very many people happy. > > Not a few libraries/applications issue SET client_encoding to ... > immediately after the connection was establised. What's wrong with > urging such clients to eliminate the SET commands and use the nice > feature of FE/BE procotol? Sounds sane to me as a developer using a Python PostgreSQL adapter. It can be argued that it is *still* useful to have a well known initial encoding (7-bit ascii) -- this would be used for returning an error if the startup packet included an invalid encoding ... Also it would allow current application code to assume that encoding until the encoding can be set by current means - this would allow current application code to behave correctly without having to wait for database adapter code to support setting the encoding in the startup packet. Or put another way: I could fix GNUmed properly NOW rather than having to wait for psycopg2 to support setting the client encoding in the startup packet. Thanks, Karsten -- GPG key ID E4071346 @ wwwkeys.pgp.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
On Wed, Feb 11, 2009 at 02:20:47PM +0900, Hiroshi Inoue wrote: > Tom Lane wrote: > > Hiroshi Inoue <inoue@tpf.co.jp> writes: > >> I'm thinking of the following steps in the backend code. > > > >> 1.Set LC_MESSAGES to "C" until the client_encoding is > >> determined. I have tried that but it didn't work out for some reason. > Removing step 1 resolves the penalty. In the first place step 1 > comes from your or Karsten's suggestion. Not quite. My suggestion was to not *translate* strings (and assume 7-bit ascii) until the client encoding is known. > Maybe not enough currently because collaboration between the backend > and clients is needed to solve this problem ovbiously. The backend > should provide clients the way to specify the client_encoding on the > fly which can be applied to authorization failure messages. Then > clients which are eager to solve this problem would use the way. > Using the information in the startup message is almost unique way > to achieve it. Sounds good to me as far as I can see. Karsten -- GPG key ID E4071346 @ wwwkeys.pgp.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
On Thursday 12 February 2009 12:06:41 am Paolo Saudin wrote: > Hi, > > I have 14 tables filled with meteorological data, one record per parameter > per hour. The id field holds the parameter type (1=temperature, 2=humidity > ...) My problem is that for short periods (maybe one week, one month) there > are two stations with the same data, I mean the temperature of table1 is > equal to the humidity of table3. I need to discover those cases. Before I can start to answer this I need some clarification. How can temperature and humidity be the same data? > > I could pick one record in the first station and then compare it with the > ones in the other tables for all the parameters at that particular date. If > two records are equals (it probably happens) I must then check the next one > in the timeserie. If the second record is equal too, then probably the two > series may be equals and I must raise an alert from my application. Is > there a better and faster way to perform such a check ? > > -- tables > CREATE TABLE table1 > ( > fulldate timestamp, > id smallint NOT NULL, > meanvalue real > ) WITH (OIDS=FALSE); > --..................... > --..................... > CREATE TABLE table14 > ( > fulldate timestamp, > id smallint NOT NULL, > meanvalue real > ) WITH (OIDS=FALSE); > -- > -- inserts > insert into table1(select > ('2009-01-01'::timestamp + interval '1 hour' * s.a)::timestamp, > 1::smallint, round(cast(random() as numeric), 1)::real > from generate_series(0,1000) as s(a) > ); > -- > insert into table2(select > ('2009-01-01'::timestamp + interval '1 hour' * s.a)::timestamp, > 1::smallint, round(cast(random() as numeric), 1)::real > from generate_series(0,1000) as s(a) > ); > -- > -- same data as table 1 -- MUST BE FOUND BY THE CKECK ROUTINE > insert into table3(select fulldate, id, meanvalue from table1); > > > Thank in advance, > Paolo Saudin -- Adrian Klaver aklaver@comcast.net
>-----Original Message-----
>From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On behalf of Adrian Klaver
>Sent: Thursday, 12 February 2009 17:28
>To: pgsql-general@postgresql.org
>Cc: Paolo Saudin
>Subject: Re: [GENERAL] How to check if 2 series of data are equal

>On Thursday 12 February 2009 12:06:41 am Paolo Saudin wrote:
>> Hi,
>>
>> I have 14 tables filled with meteorological data, one record per parameter
>> per hour. The id field holds the parameter type (1=temperature, 2=humidity
>> ...) My problem is that for short periods (maybe one week, one month) there
>> are two stations with the same data, I mean the temperature of table1 is
>> equal to the humidity of table3. I need to discover those cases.

>Before I can start to answer this I need some clarification. How can temperature
>and humidity be the same data?

It can be the same data (and it is) because of errors in the remote stations' configurations.
The station and parameter IDs were mixed up, resulting in the same data in different tables ...
----- "Paolo Saudin" <paolo@ecometer.it> wrote: > >-----Messaggio originale----- > >Da: pgsql-general-owner@postgresql.org > [mailto:pgsql-general-owner@postgresql.org] Per conto di Adrian > Klaver > >Inviato: giovedì 12 febbraio 2009 17.28 > >A: pgsql-general@postgresql.org > >Cc: Paolo Saudin > >Oggetto: Re: [GENERAL] How to check if 2 series of data are equal > > >On Thursday 12 February 2009 12:06:41 am Paolo Saudin wrote: > >> Hi, > >> > >> I have 14 tables filled with meteorological data, one record per > parameter > >> per hour. The id field holds the parameter type (1=temperature, > 2=humidity > >> ...) My problem is that for short periods (maybe one week, one > month) there > >> are two stations with the same data, I mean the temperature of > table1 is > >> equal to the humidity of table3. I need to discover those cases. > > >Before I can start to answer this I need some clarification. How can > temperature > >and humidity be the same data? > > Can be the same data ( and it is ) because of errors in the remote > stations configurations. > The Stations and parameters IDs were mixed up resulting in same data > in different tables ... > I am afraid I more confused now. From the table schema the value is a real number only and has no units. As I understandthe units designation lies in the id. If the ids are mixed up I can't see how it is possible to differentiate betweena value of 25 that maybe degrees C or % relative humidity for instance. You are going to have to step me through this. Adrian Klaver aklaver@comcast.net
On Thu, Feb 12, 2009 at 09:06:41AM +0100, Paolo Saudin wrote:
> I have 14 tables filled with meteorological data, one record
> per parameter per hour. The id field holds the parameter type
> (1=temperature, 2=humidity ...) My problem is that for short periods
> (maybe one week, one month) there are two stations with the same data,
> I mean the temperature of table1 is equal to the humidity of table3. I
> need to discover those cases.

I'm assuming that it's one table per sensor station below. Would something like this work:

  SELECT station, fulldate,
    COUNT(CASE WHEN sensor = 1 THEN 1 END) AS num_sensor1_readings,
    COUNT(CASE WHEN sensor = 2 THEN 1 END) AS num_sensor2_readings,
    COUNT(CASE WHEN sensor = 3 THEN 1 END) AS num_sensor3_readings
  FROM (
    SELECT 1 AS station, fulldate, id AS sensor FROM table1
    UNION ALL SELECT 2, fulldate, id FROM table2
    UNION ALL SELECT 3, fulldate, id FROM table3) x
  GROUP BY station, fulldate
  ORDER BY station, fulldate;

That way you'll get a list of all the duplicate values. If the "fulldate" column is the actual time it was received and isn't truncated off to the nearest hour, you will probably want to use date_trunc('hour',fulldate) in the outer select.

Hope that helps!

--
Sam  http://samason.me.uk/
>-----Original Message-----
>From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On behalf of Adrian Klaver
>Sent: Thursday, 12 February 2009 18:57
>To: Paolo Saudin
>Cc: pgsql-general@postgresql.org
>Subject: Re: R: [GENERAL] How to check if 2 series of data are equal

>I am afraid I am more confused now. From the table schema the value is a real number only and has no units. As I understand it,
>the units designation lies in the id. If the ids are mixed up I can't see how it is possible to differentiate between a value of 25
>that may be degrees C or % relative humidity, for instance. You are going to have to step me through this.

Yes, the parameter is defined by the id and stored in another table with the name, units and other properties. I need to find out a sequence of meanvalues (without taking care of ids) which exists in another table.

Here is some sample data; I need to find out if some sequence of data in table1 is equal to data in table2, table3 ... tableN.

Table1
fulldate, id, meanvalue
2009-01-01 00:00:00, 1, 12.3 -- temperature
2009-01-01 01:00:00, 1, 12.5
2009-01-01 02:00:00, 1, 12.6
2009-01-01 03:00:00, 1, 12.7
2009-01-01 04:00:00, 1, 12.8
2009-01-01 05:00:00, 1, 12.2

Table1
fulldate, id, meanvalue
2009-01-01 00:00:00, 2, 80.3 -- humidity
2009-01-01 01:00:00, 2, 81.6
2009-01-01 02:00:00, 2, 82.1
2009-01-01 03:00:00, 2, 79.8
2009-01-01 04:00:00, 2, 77.2
2009-01-01 05:00:00, 2, 77.1
------------------------------------------------------------------

Table2
fulldate, id, meanvalue
2009-01-01 00:00:00, 1, 12.3 -- temperature
2009-01-01 01:00:00, 1, 11.8
2009-01-01 02:00:00, 1, 82.1 ! WRONG DATA - SAME AS Table2 id->2
2009-01-01 03:00:00, 1, 79.8 ! WRONG DATA - SAME AS Table2 id->2
2009-01-01 04:00:00, 1, 77.2 ! WRONG DATA - SAME AS Table2 id->2
2009-01-01 05:00:00, 1, 13.1

Table2
fulldate, id, meanvalue
2009-01-01 00:00:00, 2, 78.9 -- humidity
2009-01-01 01:00:00, 2, 76.4
2009-01-01 02:00:00, 2, 74.7
2009-01-01 03:00:00, 2, 73.1
2009-01-01 04:00:00, 2, 71.6
2009-01-01 05:00:00, 1, 70.8

Hope this might help,
Paolo Saudin
On Thursday 12 February 2009 11:37:37 am Paolo Saudin wrote: > > > > Can be the same data ( and it is ) because of errors in the remote > > stations configurations. > > The Stations and parameters IDs were mixed up resulting in same data > > in different tables ... > > > > > >I am afraid I more confused now. From the table schema the value is a real > > number only and has no units. As I understand the units >designation lies > > in the id. If the ids are mixed up I can't see how it is possible to > > differentiate between a value of 25 that maybe >degrees C or % relative > > humidity for instance. You are going to have to step me through this. > > Yes, the parameter is defined by the id and stored in another table with > the name, units and other properties. I need to find out a sequence of > meanvalues (without taking care of ids) which exists in another table > > Here is some sample data, I need to found out if some sequence of data in > table1 is equal to data in table2, table3 ... tableN. > > Table1 > fulldate, id, meanvalue > 2009-01-01 00:00:00, 1, 12.3 -- temperature > 2009-01-01 01:00:00, 1, 12.5 > 2009-01-01 02:00:00, 1, 12.6 > 2009-01-01 03:00:00, 1, 12.7 > 2009-01-01 04:00:00, 1, 12.8 > 2009-01-01 05:00:00, 1, 12.2 > > Table1 > fulldate, id, meanvalue > 2009-01-01 00:00:00, 2, 80.3 -- humidity > 2009-01-01 01:00:00, 2, 81.6 > 2009-01-01 02:00:00, 2, 82.1 > 2009-01-01 03:00:00, 2, 79.8 > 2009-01-01 04:00:00, 2, 77.2 > 2009-01-01 05:00:00, 2, 77.1 > ------------------------------------------------------------------ > > Table2 > fulldate, id, meanvalue > 2009-01-01 00:00:00, 1, 12.3 -- temperature > 2009-01-01 01:00:00, 1, 11.8 > 2009-01-01 02:00:00, 1, 82.1 ! WRONG DATA - SAME AS Table2 id->2 > 2009-01-01 03:00:00, 1, 79.8 ! WRONG DATA - SAME AS Table2 id->2 > 2009-01-01 04:00:00, 1, 77.2 ! WRONG DATA - SAME AS Table2 id->2 > 2009-01-01 05:00:00, 1, 13.1 I am going to assume you mean Table1 above. > > Table2 > fulldate, id, meanvalue > 2009-01-01 00:00:00, 2, 78.9 -- humidity > 2009-01-01 01:00:00, 2, 76.4 > 2009-01-01 02:00:00, 2, 74.7 > 2009-01-01 03:00:00, 2, 73.1 > 2009-01-01 04:00:00, 2, 71.6 > 2009-01-01 05:00:00, 1, 70.8 > > Hope this might help, > Paolo Saudin I modified Sams query- SELECT fulldate,sensor FROM (SELECT fulldate,sensor,count(sensor) FROM (SELECT 1 AS station, fulldate, meanvalue AS sensor FROM table1 UNION SELECT 2, fulldate, meanvalue FROM table2 ORDER BY fulldate,sensor) AS x GROUP BY fulldate,sensor) AS y WHERE y.count>1; and got- fulldate | sensor ---------------------+-------- 2009-01-01 00:00:00 | 12.3 2009-01-01 02:00:00 | 82.1 2009-01-01 03:00:00 | 79.8 2009-01-01 04:00:00 | 77.2 Though I think you might want to deal with the remote sensor problem first. I would be hesitant to trust any of the data. Just a thought. -- Adrian Klaver aklaver@comcast.net
On Thu, 2009-02-12 at 14:21 -0800, Adrian Klaver wrote: > On Thursday 12 February 2009 11:37:37 am Paolo Saudin wrote: > > > SELECT fulldate,sensor > FROM (SELECT fulldate,sensor,count(sensor) > FROM (SELECT 1 AS station, fulldate, meanvalue AS sensor FROM table1 > UNION > SELECT 2, fulldate, meanvalue FROM table2 ORDER BY > fulldate,sensor) AS x > GROUP BY fulldate,sensor) AS y > WHERE y.count>1; > > > and got- > > fulldate | sensor > ---------------------+-------- > 2009-01-01 00:00:00 | 12.3 > 2009-01-01 02:00:00 | 82.1 > 2009-01-01 03:00:00 | 79.8 > 2009-01-01 04:00:00 | 77.2 You might want to do a UNION with all your tables to add a field to identify the table, and use min() and max() to show the offending tables.
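A sketch of that idea, using the table and column names from the earlier examples (extend the UNION ALL to cover all 14 tables):

-- For every (fulldate, meanvalue) pair that appears in more than one
-- table, report which tables it came from.
SELECT fulldate,
       meanvalue,
       min(station) AS first_table,
       max(station) AS last_table,
       count(*)     AS occurrences
  FROM (SELECT 1 AS station, fulldate, meanvalue FROM table1
        UNION ALL
        SELECT 2, fulldate, meanvalue FROM table2
        UNION ALL
        SELECT 3, fulldate, meanvalue FROM table3
        -- ... UNION ALL the remaining tables ...
       ) AS readings
 GROUP BY fulldate, meanvalue
HAVING count(*) > 1
 ORDER BY fulldate;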
-----Original Message-----
From: Adrian Klaver [mailto:aklaver@comcast.net]
Sent: Thursday, 12 February 2009 23:22
To: Paolo Saudin
Cc: pgsql-general@postgresql.org
Subject: Re: R: R: [GENERAL] How to check if 2 series of data are equal

>I modified Sams query-
>
>SELECT fulldate,sensor
>  FROM (SELECT fulldate,sensor,count(sensor)
>    FROM (SELECT 1 AS station, fulldate, meanvalue AS sensor FROM table1
>      UNION
>      SELECT 2, fulldate, meanvalue FROM table2 ORDER BY fulldate,sensor) AS x
>    GROUP BY fulldate,sensor) AS y
>  WHERE y.count>1;
>
>and got-
>
> fulldate            | sensor
>---------------------+--------
> 2009-01-01 00:00:00 |   12.3
> 2009-01-01 02:00:00 |   82.1
> 2009-01-01 03:00:00 |   79.8
> 2009-01-01 04:00:00 |   77.2

Thank you very much to you all, this one works perfectly!!

>Though I think you might want to deal with the remote sensor problem first. I
>would be hesitant to trust any of the data. Just a thought.

The problem has been fixed and does not happen any more. Unfortunately there are 14 years (1992-2006) in which data could be corrupted for short periods. Now I must find them out ...

Thanks once more
Paolo Saudin
Hi.

Sorry very late reaction. I desire problem solution. Therefore, one piece of evidence....

I tried a JDBC program.
http://winpg.jp/~saito/pg_work/LC_MESSAGE_CHECK/connect_problem/jdbctestx.java

C:\home\HIROSHI>java jdbctestx
org.postgresql.util.PSQLException: FATAL: ???[??"postgres"??????????????
        at org.postgresql.core.v3.ConnectionFactoryImpl.readStartupMessages(ConnectionFactoryImpl.java:444)
        at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:99)
        at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:66)
        at org.postgresql.jdbc2.AbstractJdbc2Connection.<init>(AbstractJdbc2Connection.java:124)
        at org.postgresql.jdbc2.Jdbc2Connection.<init>(Jdbc2Connection.java:24)
        at org.postgresql.Driver.makeConnection(Driver.java:386)
        at org.postgresql.Driver.connect(Driver.java:260)
        at java.sql.DriverManager.getConnection(Unknown Source)
        at java.sql.DriverManager.getConnection(Unknown Source)
        at jdbctestx.main(jdbctestx.java:21)

Then, the server log is this.
http://winpg.jp/~saito/pg_work/LC_MESSAGE_CHECK/connect_problem/server.log

Regards,
Hiroshi Saito

----- Original Message ----- From: "Hiroshi Inoue" <inoue@tpf.co.jp>
> Yes, but isn't it an improvement that properly localized password errors
> or no-database errors etc. can be seen? Currently I see unreadable
> error messages for those cases via the psqlodbc driver. The attached
> patch in my previous posting is an example of how to solve the problem.
>
> regards,
> Hiroshi Inoue
On Tue, Mar 03, 2009 at 12:35:37AM +0900, Hiroshi Saito wrote: > Sorry very late reaction. > I desire problem solution. So do I :-) Ganbatte ! Karsten -- GPG key ID E4071346 @ wwwkeys.pgp.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
Hi, Karsten-san.

Yeah, it was a problem that the driver, which only relays the messages, cannot solve by itself -- but we keep persevering without giving up!
-- arigatou gozaimasu (thank you) :-)

Regards,
Hiroshi Saito

----- Original Message ----- From: "Karsten Hilbert" <Karsten.Hilbert@gmx.net>
> On Tue, Mar 03, 2009 at 12:35:37AM +0900, Hiroshi Saito wrote:
>
>> Sorry very late reaction.
>> I desire problem solution.
> So do I :-)
>
> Ganbatte !
>
> Karsten
Is there any chance of this being considered for 8.4 ? Unfortunately I cannot provide patches myself as my knowledge of C is next to nothing. Thanks, Karsten -- GPG key ID E4071346 @ wwwkeys.pgp.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346