Thread: \copy: unexpected response (4)
I have set up a batch job that pipes a couple hundred of \copy commands to psql. It starts out well enough, sprinkled with a few error messages that I intend to circle back and clean up after batching all of the ALTER commands that I need, but then: \copy gm1 from export/month1-88/13m/GM1.dat csv psql:copy.sql:6328161: ERROR: invalid input syntax for integer: "743.5" CONTEXT: COPY gm1, line 415306, column gm1c: "743.5" \copy gm2a from export/month1-88/13m/GM2A.dat csv \copy gm2e from export/month1-88/13m/GM2E.dat csv \copy gm3 from export/month1-88/13m/GM3.dat csv \copy gm4 from export/month1-88/13m/GM4.dat csv psql:copy.sql:8059525: \copy: unexpected response (4) psql:copy.sql:8059525: \copy: unexpected response (4) psql:copy.sql:8059525: \copy: unexpected response (4) psql:copy.sql:8059525: \copy: unexpected response (4) psql:copy.sql:8059525: \copy: unexpected response (4) psql:copy.sql:8059525: \copy: unexpected response (4) psql:copy.sql:8059525: \copy: unexpected response (4) psql:copy.sql:8059525: \copy: unexpected response (4) . . . (I am echoing everything and interleaving stderr into a log file.) That last message is repeated literally ad inifinitum -- I mean I found 300G of them after leaving my job running overnight! I tried a \copy on that last file by hand and simply got another 'invalid input syntax' message. Can anyone speculate on what might lead to this condition? At first I thought it was a networking problem but doing what I could to minimize that possibility had no effect. Maybe I need to use separate connections? The only reference I could find to that error message was http://doxygen.postgresql.org/bin_2psql_2copy_8h.html but I know absolutely nil about the source code. Thanks. Neil -- View this message in context: http://www.nabble.com/%5Ccopy%3A-unexpected-response-%284%29-tp24866027p24866027.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
Neil Best <nbest@ci.uchicago.edu> writes: > psql:copy.sql:8059525: \copy: unexpected response (4) > psql:copy.sql:8059525: \copy: unexpected response (4) > psql:copy.sql:8059525: \copy: unexpected response (4) > psql:copy.sql:8059525: \copy: unexpected response (4) > psql:copy.sql:8059525: \copy: unexpected response (4) > psql:copy.sql:8059525: \copy: unexpected response (4) > psql:copy.sql:8059525: \copy: unexpected response (4) > psql:copy.sql:8059525: \copy: unexpected response (4) Hmm. It looks like psql could get into an infinite loop if the server failed to exit COPY IN mode for some reason, but it's not at all clear how that could happen (or what to do about it). What server version and what psql version is this? What does the server's log show? Can you provide a reproducible test case? regards, tom lane
Tom Lane-2 wrote: > > Hmm. It looks like psql could get into an infinite loop if the server > failed to exit COPY IN mode for some reason, but it's not at all clear > how that could happen (or what to do about it). What server version > and what psql version is this? What does the server's log show? > Can you provide a reproducible test case? > Maybe it is a networking issue after all. I also thought to check the server logs and found this: 2009-08-06 16:52:16 CDTCONTEXT: COPY gm1, line 415306, column gm1c: "743.5" 2009-08-06 16:52:16 CDTSTATEMENT: COPY gm1 FROM STDIN CSV 2009-08-06 16:52:51 CDTLOG: checkpoints are occurring too frequently (7 seconds apart) 2009-08-06 16:52:51 CDTHINT: Consider increasing the configuration parameter "checkpoint_segments". 2009-08-06 16:53:02 CDTLOG: checkpoints are occurring too frequently (11 seconds apart) 2009-08-06 16:53:02 CDTHINT: Consider increasing the configuration parameter "checkpoint_segments". 2009-08-06 16:53:20 CDTERROR: invalid input syntax for integer: "68.84" 2009-08-06 16:53:20 CDTCONTEXT: COPY gm4, line 411272, column gm4c: "68.84" 2009-08-06 16:53:20 CDTSTATEMENT: COPY gm4 FROM STDIN CSV 2009-08-06 16:53:20 CDTLOG: SSL renegotiation failure 2009-08-06 16:53:20 CDTSTATEMENT: COPY gm4 FROM STDIN CSV 2009-08-06 16:53:20 CDTLOG: SSL error: unexpected record 2009-08-06 16:53:20 CDTSTATEMENT: COPY gm4 FROM STDIN CSV 2009-08-06 16:53:20 CDTLOG: could not send data to client: Connection reset by peer 2009-08-06 16:53:20 CDTSTATEMENT: COPY gm4 FROM STDIN CSV 2009-08-06 16:53:20 CDTLOG: SSL error: unexpected record 2009-08-06 16:53:20 CDTLOG: could not receive data from client: Connection reset by peer 2009-08-06 16:53:20 CDTLOG: incomplete message from client 2009-08-06 17:38:21 CDTLOG: could not receive data from client: Operation timed out 2009-08-06 17:38:21 CDTLOG: unexpected EOF on client connection (END) As for the versions: monthly=> select version(); version -------------------------------------------------------------- PostgreSQL 8.3.7 on i386-apple-darwin9.6.0, compiled by GCC $ psql --version psql (PostgreSQL) 8.1.11 Could this mismatch cause the problem? I will find out if it's feasible to copy my data to the server to rule out these client and network vagaries. Should I be concerned abou the checkpoint messages? I haven't looked up what those are all about yet. -- View this message in context: http://www.nabble.com/%5Ccopy%3A-unexpected-response-%284%29-tp24866027p24867582.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
Neil Best <nbest@ci.uchicago.edu> writes: > Tom Lane-2 wrote: >> Hmm. It looks like psql could get into an infinite loop if the server >> failed to exit COPY IN mode for some reason, but it's not at all clear >> how that could happen (or what to do about it). What server version >> and what psql version is this? What does the server's log show? >> Can you provide a reproducible test case? > Maybe it is a networking issue after all. I also thought to check the > server logs and found this: > 2009-08-06 16:53:20 CDTERROR: invalid input syntax for integer: "68.84" > 2009-08-06 16:53:20 CDTCONTEXT: COPY gm4, line 411272, column gm4c: "68.84" > 2009-08-06 16:53:20 CDTSTATEMENT: COPY gm4 FROM STDIN CSV > 2009-08-06 16:53:20 CDTLOG: SSL renegotiation failure > 2009-08-06 16:53:20 CDTSTATEMENT: COPY gm4 FROM STDIN CSV > 2009-08-06 16:53:20 CDTLOG: SSL error: unexpected record > 2009-08-06 16:53:20 CDTSTATEMENT: COPY gm4 FROM STDIN CSV > 2009-08-06 16:53:20 CDTLOG: could not send data to client: Connection reset > by peer Hmm, so it looks like the connection dropped and libpq failed to recognize that, or maybe libpq was okay but psql needs to check a bit more carefully here. I'll take a look. regards, tom lane
I wrote: > Hmm, so it looks like the connection dropped and libpq failed to > recognize that, or maybe libpq was okay but psql needs to check a bit > more carefully here. I'll take a look. I could not reproduce this problem in testing, but after eyeballing the code awhile I have a theory. It looks like it is possible for PQputCopyEnd to fail and leave the PGconn in COPY_IN state, but this only happens (1) if the output buffer contained at least 8K already, causing pqSendSome to be invoked from pqPutMsgEnd, and (2) pqSendSome returned failure. In that situation the loop in copy.c becomes infinite, since there's no mechanism for getting out of COPY_IN state. This case would be relatively difficult to trigger, but it seems to fit all the facts, if we assume that the connection had failed for some reason just at that point. BTW, the "SSL renegotiation failure" bit suggests that it could have been an OpenSSL bug not a real network lossage, so you might want to see how up-to-date your openssl libraries are. Anyway, it seems to me that the most appropriate fix is to add some code to that loop, along the lines of /* * Make sure we have pumped libpq dry of results; else it may still be in * ASYNC_BUSY state, leading to false readings in, eg, get_prompt(). */ while ((result = PQgetResult(pset.db)) != NULL) { success = false; psql_error("\\copy: unexpected response (%d)\n", PQresultStatus(result)); + /* if still in COPY IN state, try to get out of it */ + if (PQresultStatus(result) == PGRES_COPY_IN) + PQputCopyEnd(conn, _("trying to exit copy mode")); PQclear(result); } This would cover this particular case and perhaps others as well. regards, tom lane
On Fri, Aug 7, 2009 at 12:33 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > BTW, the "SSL renegotiation failure" bit > suggests that it could have been an OpenSSL bug not a real network > lossage, so you might want to see how up-to-date your openssl libraries > are. Thanks for your comments, Tom. The operation seems more reliable if I move the data to the server and do it across a local connection, which I presume does not involve SSL, so that may be the weak link as you surmise. Would you expect the SSL library problem more likely to be on the server or the client, or is it just hard to say? Does either of them have a facility that exposes the SSL version information or do I have to go to the OS for that? Incidentally, I have not experienced any sort of instability in my interactive sessions over an SSL connection, so is it related to the \copy operation itself, the higher volume of data across the connection, or the fact that I am asking it to do multiple \copies in rapid succession, would you say? I expect it's hard to say definitively, but maybe you or someone else can say something about likelihoods. I appreciate the information.
Neil Best <nbest@ci.uchicago.edu> writes: > On Fri, Aug 7, 2009 at 12:33 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >> BTW, the "SSL renegotiation failure" bit >> suggests that it could have been an OpenSSL bug not a real network >> lossage, so you might want to see how up-to-date your openssl libraries >> are. > Thanks for your comments, Tom. The operation seems more reliable if I > move the data to the server and do it across a local connection, which > I presume does not involve SSL, so that may be the weak link as you > surmise. Would you expect the SSL library problem more likely to be > on the server or the client, or is it just hard to say? You're talking like you've found this to be repeatable. Is it? regards, tom lane
Tom Lane-2 wrote: > > Neil Best <nbest@ci.uchicago.edu> writes: >> On Fri, Aug 7, 2009 at 12:33 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >>> BTW, the "SSL renegotiation failure" bit >>> suggests that it could have been an OpenSSL bug not a real network >>> lossage, so you might want to see how up-to-date your openssl libraries >>> are. > >> Thanks for your comments, Tom. The operation seems more reliable if I >> move the data to the server and do it across a local connection, which >> I presume does not involve SSL, so that may be the weak link as you >> surmise. Would you expect the SSL library problem more likely to be >> on the server or the client, or is it just hard to say? > > You're talking like you've found this to be repeatable. Is it? > > -- View this message in context: http://www.nabble.com/%5Ccopy%3A-unexpected-response-%284%29-tp24866027p24973703.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
Tom Lane-2 wrote: > > Neil Best <nbest@ci.uchicago.edu> writes: >> On Fri, Aug 7, 2009 at 12:33 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >>> BTW, the "SSL renegotiation failure" bit >>> suggests that it could have been an OpenSSL bug not a real network >>> lossage, so you might want to see how up-to-date your openssl libraries >>> are. > >> Thanks for your comments, Tom. The operation seems more reliable if I >> move the data to the server and do it across a local connection, which >> I presume does not involve SSL, so that may be the weak link as you >> surmise. Would you expect the SSL library problem more likely to be >> on the server or the client, or is it just hard to say? > > You're talking like you've found this to be repeatable. Is it? > > It is indeed repeatable, Tom. I was able to perform my \copy batch job multiple times without error by working over the local connection. I had to run it many times to iteratively catch all of the type mismatches that had to be fixed, so I truncated and \copied over 200 tables about 7 times in this manner. I did not attempt any software upgrades. Any theories? -- View this message in context: http://www.nabble.com/%5Ccopy%3A-unexpected-response-%284%29-tp24866027p24973785.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
Neil Best <nbest@ci.uchicago.edu> writes: > Tom Lane-2 wrote: >> You're talking like you've found this to be repeatable. Is it? > It is indeed repeatable, Tom. I was able to perform my \copy batch job > multiple times without error by working over the local connection. I had to > run it many times to iteratively catch all of the type mismatches that had > to be fixed, so I truncated and \copied over 200 tables about 7 times in > this manner. I did not attempt any software upgrades. Any theories? Sorry, I meant to ask whether the *failure* was repeatable. regards, tom lane
Tom Lane-2 wrote: > > Sorry, I meant to ask whether the *failure* was repeatable. > Oh, I see. Well, to the extent that i had the same problem in two different remote clients before finding that the local connection work-around made it go away, I would say that it was repeatable. In short, it happened more than once and was not intermittent, as far as I could tell. Does that help? -- View this message in context: http://www.nabble.com/%5Ccopy%3A-unexpected-response-%284%29-tp24866027p24976506.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
I have encountered the symmetric error to this -> PGRES_COPY_OUT. We are using a foreign data wrapper into a large (and unnamed) database system which generates a substantial pipeline of rows for copyout to bring back through libPQ to respond to the psql request. If the back end is blown away the request hangs in the perl command line script containing the psql command. That command is then terminated with an alarm(60) call, which was unfortunately a too short timeout. Nevertheless, the gigabyte of error messages (one per row I assume) that are generated is probably too much error notification. One (or two) error message(s) would do. And the fix should possibly look something like: /* * Make sure we have pumped libpq dry of results; else it may still be in * ASYNC_BUSY state, leading to false readings in, eg, get_prompt(). */ while ((result = PQgetResult(pset.db)) != NULL) { success = false; psql_error("\\copy: unexpected response (%d)\n", PQresultStatus(result)); /* if still in COPY IN or COPY OUT state, try to get out of it */ if (PQresultStatus(result) == PGRES_COPY_IN || PQresultStatus(result) == PGRES_COPY_OUT) PQputCopyEnd(conn, _("trying to exit copy mode")); PQclear(result); } I think this becomes an issue only when foreign data wrappers are used, because both ends of the conversation can die with a pipeline of data being shoveled through a shared library call under the foreign data wrapper, while postgres bravely and pointlessly carries on. Charlie ... -- View this message in context: http://postgresql.1045698.n5.nabble.com/copy-unexpected-response-4-tp1922077p5461710.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.