Thread: [HACKERS] Getting rid of "unknown error" in dblink and postgres_fdw
We have a report in pgsql-general of a dblink query failing withERROR: unknown error This is, to say the least, unhelpful. And it violates our error message style guidelines. Where that is coming from is a situation where we've failed to extract any primary message from a remote error. (I theorize that today's report is triggered by an out-of-memory situation, but that's only an unsupported guess at the moment.) I propose that we should change that string to "could not obtain message string for error on connection "foo"", or something along that line. postgres_fdw has the same disease. It wouldn't have the notion of a named connection, but maybe we could insert the foreign server name instead. A possible objection is that if we really are on the hairy edge of OOM, trying to construct such error strings might push us over the edge and then you get "out of memory" instead. But IMO that's not worse; it could be argued to be a more useful description of the problem. Comments? regards, tom lane
Re: [HACKERS] Getting rid of "unknown error" in dblink and postgres_fdw
From
"David G. Johnston"
Date:
A possible objection is that if we really are on the hairy edge of OOM,
trying to construct such error strings might push us over the edge
What am I missing here? Constructing said string occurs on the local end of the connection but the memory problem exists on the remote end.
David J.
I wrote: > We have a report in pgsql-general of a dblink query failing with > ERROR: unknown error Er, fingers faster than brain this morning. Actually that report is in pgsql-bugs, specifically bug #14471: https://www.postgresql.org/message-id/20161221094443.25614.47807%40wrigleys.postgresql.org regards, tom lane
"David G. Johnston" <david.g.johnston@gmail.com> writes: > On Wed, Dec 21, 2016 at 9:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> A possible objection is that if we really are on the hairy edge of OOM, >> trying to construct such error strings might push us over the edge > What am I missing here? Constructing said string occurs on the local end > of the connection but the memory problem exists on the remote end. Um, that's not clear, in fact it's not likely. The backend usually manages to tell you so if it's out of memory --- or, worst case, it crashes trying, in which case the local libpq ought to gin up a report about loss of connection. So the root cause I'm suspecting is that the local libpq was unable to obtain memory for a PGresult or something like that. That theory has some holes of its own, because libpq also keeps some cards up its sleeve that usually let it report out-of-memory successfully, but it's the best I can do with the info at hand. In any case, the point of the error style guidelines is that it's *always* possible to do better than "unknown error"; now that it's been proven that this case is reachable in the field, we should try harder. regards, tom lane
On 12/21/2016 08:49 AM, Tom Lane wrote: > We have a report in pgsql-general of a dblink query failing with > ERROR: unknown error > This is, to say the least, unhelpful. And it violates our error > message style guidelines. > > Where that is coming from is a situation where we've failed to extract > any primary message from a remote error. (I theorize that today's report > is triggered by an out-of-memory situation, but that's only an unsupported > guess at the moment.) > > I propose that we should change that string to "could not obtain message > string for error on connection "foo"", or something along that line. > > postgres_fdw has the same disease. It wouldn't have the notion of a named > connection, but maybe we could insert the foreign server name instead. > > A possible objection is that if we really are on the hairy edge of OOM, > trying to construct such error strings might push us over the edge and > then you get "out of memory" instead. But IMO that's not worse; it > could be argued to be a more useful description of the problem. > > Comments? Seems reasonable to me. I can work on it if you'd like. Do you think this should be backpatched? Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development
Joe Conway <mail@joeconway.com> writes: > On 12/21/2016 08:49 AM, Tom Lane wrote: >> I propose that we should change that string to "could not obtain message >> string for error on connection "foo"", or something along that line. >> >> postgres_fdw has the same disease. It wouldn't have the notion of a named >> connection, but maybe we could insert the foreign server name instead. > Seems reasonable to me. I can work on it if you'd like. Do you think > this should be backpatched? If you have time for it, please do, I have lots on my plate already. I'd vote for back-patching; the benefits of a clearer error message are obvious, and it hardly seems likely that any existing applications are depending on this specific error string. regards, tom lane
On 12/21/2016 09:27 AM, Tom Lane wrote: > Joe Conway <mail@joeconway.com> writes: >> On 12/21/2016 08:49 AM, Tom Lane wrote: >>> I propose that we should change that string to "could not obtain message >>> string for error on connection "foo"", or something along that line. >>> >>> postgres_fdw has the same disease. It wouldn't have the notion of a named >>> connection, but maybe we could insert the foreign server name instead. > >> Seems reasonable to me. I can work on it if you'd like. Do you think >> this should be backpatched? > > If you have time for it, please do, I have lots on my plate already. Ok, will do. > I'd vote for back-patching; the benefits of a clearer error message > are obvious, and it hardly seems likely that any existing applications > are depending on this specific error string. Agreed. Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development
I wrote: >>> I propose that we should change that string to "could not obtain message >>> string for error on connection "foo"", or something along that line. BTW, looking closer, I notice that the dblink case already has errcontext("Error occurred on dblink connection named \"%s\": %s.", dblink_context_conname, dblink_context_msg))); so we probably don't need the connection name in the primary error message. Now I think "could not obtain message string for remote error" would be a sufficient message. In the postgres_fdw case, I'd be inclined to use the same replacement primary message. Maybe we should think about adding the server name to the errcontext there, but that seems like an independent improvement. regards, tom lane
On 12/21/2016 10:08 AM, Tom Lane wrote: > I wrote: >>>> I propose that we should change that string to "could not obtain message >>>> string for error on connection "foo"", or something along that line. > > BTW, looking closer, I notice that the dblink case already has > > errcontext("Error occurred on dblink connection named \"%s\": %s.", > dblink_context_conname, dblink_context_msg))); > > so we probably don't need the connection name in the primary error > message. Now I think "could not obtain message string for remote error" > would be a sufficient message. > > In the postgres_fdw case, I'd be inclined to use the same replacement > primary message. Maybe we should think about adding the server name > to the errcontext there, but that seems like an independent improvement. Committed that way. I did notice that postgres_fdw has the following stanza that I don't see in dblink: 8<------------------ /** If we don't get a message from the PGresult, try the PGconn. This* is needed because for connection-level failures,PQexec may just* return NULL, not a PGresult at all.*/ if (message_primary == NULL)message_primary = PQerrorMessage(conn); 8<------------------ I wonder if the original issue on pgsql-bugs was a connection-level failure rather than OOM? Seems like dblink ought to do the same. Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development
Joe Conway <mail@joeconway.com> writes: > I did notice that postgres_fdw has the following stanza that I don't see > in dblink: > 8<------------------ > /* > * If we don't get a message from the PGresult, try the PGconn. This > * is needed because for connection-level failures, PQexec may just > * return NULL, not a PGresult at all. > */ > if (message_primary == NULL) > message_primary = PQerrorMessage(conn); > 8<------------------ > I wonder if the original issue on pgsql-bugs was a connection-level > failure rather than OOM? Seems like dblink ought to do the same. Oooh ... I had thought that code was in both, which was why I was having a hard time explaining the OP's failure. But I see you're right, which provides a very straightforward explanation for the report. I believe that if libpq is unable to malloc a PGresult, it will return NULL but put an "out of memory" message into the PGconn's error buffer. I had supposed that we'd capture and report the latter, but as the dblink code stands, it won't. In short, yes, please copy that bit into dblink. regards, tom lane
On 12/21/2016 04:22 PM, Tom Lane wrote: > Joe Conway <mail@joeconway.com> writes: >> I did notice that postgres_fdw has the following stanza that I don't see >> in dblink: > >> 8<------------------ >> /* >> * If we don't get a message from the PGresult, try the PGconn. This >> * is needed because for connection-level failures, PQexec may just >> * return NULL, not a PGresult at all. >> */ >> if (message_primary == NULL) >> message_primary = PQerrorMessage(conn); >> 8<------------------ > >> I wonder if the original issue on pgsql-bugs was a connection-level >> failure rather than OOM? Seems like dblink ought to do the same. > > Oooh ... I had thought that code was in both, which was why I was having > a hard time explaining the OP's failure. But I see you're right, > which provides a very straightforward explanation for the report. > I believe that if libpq is unable to malloc a PGresult, it will return > NULL but put an "out of memory" message into the PGconn's error buffer. > I had supposed that we'd capture and report the latter, but as the > dblink code stands, it won't. > > In short, yes, please copy that bit into dblink. The attached should do the trick I think. You think it is reasonable to backpatch this part too? Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development
Attachment
Joe Conway <mail@joeconway.com> writes: > On 12/21/2016 04:22 PM, Tom Lane wrote: >> In short, yes, please copy that bit into dblink. > The attached should do the trick I think. I see that you need to pass the PGconn into dblink_res_error() now, but what's the point of the new "bool fail" parameter? > You think it is reasonable to backpatch this part too? Yes, definitely. regards, tom lane
On 12/21/2016 09:20 PM, Tom Lane wrote: > Joe Conway <mail@joeconway.com> writes: >> On 12/21/2016 04:22 PM, Tom Lane wrote: >>> In short, yes, please copy that bit into dblink. > >> The attached should do the trick I think. > > I see that you need to pass the PGconn into dblink_res_error() now, but > what's the point of the new "bool fail" parameter? That part isn't new -- we added it sometime prior to 9.2: 8<--------------if (fail) level = ERROR;else level = NOTICE; 8<-------------- It allows dblink to throw a NOTICE on remote errors rather than an actual ERROR, e.g. for an autonomous transaction. From the docs (9.2 in this case) 8<-------------- fail_on_error If true (the default when omitted) then an error thrown on the remote side of the connection causes an error to also be thrown locally. If false, the remote error is locally reported as a NOTICE, and the function returns no rows. 8<-------------- >> You think it is reasonable to backpatch this part too? > > Yes, definitely. Ok, will do. Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development
Joe Conway <mail@joeconway.com> writes: > On 12/21/2016 09:20 PM, Tom Lane wrote: >> I see that you need to pass the PGconn into dblink_res_error() now, but >> what's the point of the new "bool fail" parameter? > That part isn't new -- we added it sometime prior to 9.2: Oh! I misread the patch --- something about an unluckily-placed line wrap and not looking very closely :-(. Yeah, it's fine as is. Sorry for the noise. regards, tom lane
On 12/22/2016 06:55 AM, Tom Lane wrote: > Joe Conway <mail@joeconway.com> writes: >> On 12/21/2016 09:20 PM, Tom Lane wrote: >>> I see that you need to pass the PGconn into dblink_res_error() now, but >>> what's the point of the new "bool fail" parameter? > >> That part isn't new -- we added it sometime prior to 9.2: > > Oh! I misread the patch --- something about an unluckily-placed line > wrap and not looking very closely :-(. Yeah, it's fine as is. > Sorry for the noise. Thanks -- committed/backpatched to 9.2 Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development