Thread: problem with PQsendQuery/PQgetResult and COPY FROM statement
I attempt to execute a "COPY test FROM STDIN" using PQsendQuery. Subsequently, after the appropriate poll()/PQconsumeInput/PQisBusy calls, PQgetResult returns tens of millions of (invalid?) non-null PGresults. This behavior seems incorrect, and sometimes causes my application to exhaust memory and crash. The postgres version is 8.1.3. I can reproduce the problem in about 50 lines of C. I include below (1) the code, (2) a psql dump of the table in question, (3) the code's output. I'd appreciate any insight or suggestions you may have. Thanks, max poletto ====================================================================== (1) code // compile with: g++ -Wall -O -o pgtest pgtest.cc -lpq #include <cerrno> #include <cstdio> #include <ctime> #include <vector> using namespace std; #include <libpq-fe.h> #include <sys/poll.h> void retrieve(PGconn *conn, time_t timeout) { vector<PGresult *> res; while (1) { int r; do { struct pollfd pfds[1]; pfds[0].fd = PQsocket(conn); pfds[0].events= POLLIN; pfds[0].revents = 0; r = poll(pfds, sizeof(pfds) / sizeof(struct pollfd), 1000 * timeout); } while (r < 0 && errno == EINTR); if (r <= 0 || !PQconsumeInput(conn)) return; int i = 0; PGresult*oldr = 0; while (!PQisBusy(conn)) { PGresult *r = PQgetResult(conn); if (r) { res.push_back(r); if (++i % 5000000 == 0) { printf("%d results\n", i); } if (r == oldr) { printf("r==oldr (%p)\n",r); } oldr = r; } else { printf("PQgetResult return 0 after %d results\n", i); return; } } } } int main() { PGconn *conn = PQconnectdb("dbname=testdb user=postgres"); if (!conn) return -1; if (PQsendQuery(conn, "COPY test FROMSTDIN") > 0) { retrieve(conn, 20); if (PQputCopyEnd(conn, 0) < 0) return -1; } PQfinish(conn); return 0; } ====================================================================== (2) psql session root@tm01-5% psql testdb postgres Welcome to psql 8.1.3, the PostgreSQL interactive terminal. Type: \copyright for distribution terms \h for help with SQL commands \? for help with psql commands \g orterminate with semicolon to execute query \q to quit testdb=# \d test Table "public.test"Column | Type | Modifiers --------+---------+-----------a | integer |b | integer |c | integer | testdb=# select * from test;a | b | c ---+---+---1 | 2 | 3 (1 row) testdb=# ====================================================================== (3) output root@tm01-5% ./pgtest 5000000 results 10000000 results 15000000 results 20000000 results 25000000 results PQgetResult return 0 after 25649299 results 4.640u 4.696s 0:09.34 99.8% 0+0k 0+0io 0pf+0w In this toy example, the process VM size exceeds 2GB before PQgetResult finally returns 0. The real application, which has already allocated ~1GB prior to the query, runs out of memory (3GB limit on 32-bit linux) before PQgetResult ever returns 0.
On May 19 11:51, max.poletto@gmail.com wrote: > if (PQsendQuery(conn, "COPY test FROM STDIN") > 0) { > retrieve(conn, 20); Shouldn't you be send()'ing instead of retrieve()'ing? COPY tbl FROM stdin, requests data from client to COPY FROM stdin TO tbl. Regards.
max.poletto@gmail.com writes: > I attempt to execute a "COPY test FROM STDIN" using PQsendQuery. > Subsequently, after the appropriate poll()/PQconsumeInput/PQisBusy > calls, PQgetResult returns tens of millions of (invalid?) non-null > PGresults. You should fix your code to pay some attention to what those results are. I'm betting they are error results. > This behavior seems incorrect, and sometimes causes my > application to exhaust memory and crash. Well, that's because you're not PQclear'ing a result when done with it. regards, tom lane
On Fri, May 19, 2006 at 11:51:00AM -0700, max.poletto@gmail.com wrote: > I attempt to execute a "COPY test FROM STDIN" using PQsendQuery. > Subsequently, after the appropriate poll()/PQconsumeInput/PQisBusy > calls, PQgetResult returns tens of millions of (invalid?) non-null > PGresults. This behavior seems incorrect, and sometimes causes my > application to exhaust memory and crash. In addition to the issues other people have pointed out, when using COPY you transfer the data with PQputCopyData()/PQgetCopyData(). Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Thanks for all your replies, but I must clarify some things. First, note that what I posted is just a small example that reproduces behavior that appears incorrect. The real code is a C++ wrapper around libpq that supports non-blocking queries and reuses open connections. Volkan and Martijn: I know about PQ{put,get}CopyData, but my example never gets to that point. I must first determine whether the (asynchronous) PQsendQuery() of "COPY test FROM STDIN" succeeded. That's all that retrieve() tries to do in my example. Tom: of course I should (and eventually do) use PQclear(), but I may not want to right away, because I must return to the user a vector of result objects (for example, all the result rows from a query). I do not expect PQgetResult to return millions of non-null PGresult objects after a PQsendQuery("COPY test FROM STDIN"). I expect exactly one non-null result, with a result status of PGRES_COPY_IN. Moreover, the manual says: If a COPY command is issued via PQexec in a string that could contain additional commands, the application must continue fetching results via PQgetResult after completing the COPY sequence.Only when PQgetResult returns NULL is it certain that the PQexec command string is done and it is safe to issue more commands. I assumed this to be true for PQexec or "one of the equivalent functions" mentioned in the manual, such as PQsendQuery. However, if I add the following switch statement to my example: while (!PQisBusy(conn)) { PGresult *r = PQgetResult(conn); if (r) { switch (PQresultStatus(r)) { casePGRES_COPY_IN: break; case PGRES_EMPTY_QUERY: printf("PGRES_EMPTY_QUERY\n"); break; case PGRES_COMMAND_OK: printf("PGRES_COMMAND_OK\n"); break; case PGRES_TUPLES_OK: printf("PGRES_TUPLES_OK\n"); break; case PGRES_COPY_OUT: printf("PGRES_COPY_OUT\n"); break; case PGRES_BAD_RESPONSE: printf("PGRES_BAD_RESPONSE\n"); break; case PGRES_NONFATAL_ERROR: printf("PGRES_NONFATAL_ERROR\n"); break; case PGRES_FATAL_ERROR: printf("PGRES_FATAL_ERROR\n"); break; } res.push_back(r); if (++i % 5000000 == 0) { printf("%d results\n",i); } if (r == oldr) { printf("r==oldr (%p)\n", r); } oldr = r; } else { printf("PQgetResultreturn 0 after %d results\n", i); return; } } the code still prints only: 5000000 results 10000000 results 15000000 results 20000000 results 25000000 results PQgetResult return 0 after 25649299 results In other words, there are >25M distinct non-null results, and all of them have status code PGRES_COPY_IN, and none of them have errors. So it appears that I should check whether the first PGresult object has a status code of PGRES_COPY_IN, and ignore subsequent PGresults even if they are not NULL. I don't object to this interface, but it is not what I would conclude after RTFM. max
max.poletto@gmail.com writes: > I do not expect PQgetResult to return millions of non-null PGresult > objects after a PQsendQuery("COPY test FROM STDIN"). I expect exactly > one non-null result, with a result status of PGRES_COPY_IN. If you call it exactly once, it'll say that exactly once. If you keep calling it "millions of times", it'll keep saying that. > Moreover, the manual says: > If a COPY command is issued via PQexec in a string that could > contain > additional commands, the application must continue fetching results > via PQgetResult after completing the COPY sequence. Only when > PQgetResult returns NULL is it certain that the PQexec command > string > is done and it is safe to issue more commands. Indeed. You forgot to "complete the COPY sequence" before returning to the PQgetResult loop. As long as the thing is in COPY mode, PQgetResult will return a result saying PGRES_COPY_IN. The point of this paragraph is that you might want to consider doing more PQgetResults *after* you've ended COPY mode. regards, tom lane
> If you call it exactly once, it'll say that exactly once. If you keep > calling it "millions of times", it'll keep saying that. OK. I wonder, though, why at some point it does in fact return 0. > Indeed. You forgot to "complete the COPY sequence" before returning > to the PQgetResult loop. As long as the thing is in COPY mode, > PQgetResult will return a result saying PGRES_COPY_IN. The point Got it. Thanks. max
"Massimiliano Poletto" <max.poletto@gmail.com> writes: >> If you call it exactly once, it'll say that exactly once. If you keep >> calling it "millions of times", it'll keep saying that. > OK. I wonder, though, why at some point it does in fact return 0. Probably because you ran out of memory to create new PGresult structs. regards, tom lane