Thread: Problems w/ LO
I am having some problems with LO in the postgres 6.5 snapshot (dated
5/27). Here is the problem:

I am writing a program that searches through ~250MB of text stored as large
objects. The search function seems to be running out of RAM: I get a
'NOTICE: ShmemAlloc: out of memory' error after the program has run for a
while. From running 'free', I can see that I am not using any swap space
yet, so the machine does not actually seem to be out of memory. The
postmaster process grows constantly, even though I am not generating any
data that should make it grow at all. When I comment out the lo_open and
lo_close calls, everything is fine, so I am guessing there is some kind of
leak in lo_open and lo_close, if not in the backend itself. Please take a
look at the code:

http://x.cwru.edu/~bap/search_4.c

- Brandon

------------------------------------------------------
 Smith Computer Lab Administrator, Case Western Reserve University
 bap@scl.cwru.edu   216-368-5066   http://cwrulug.cwru.edu
------------------------------------------------------
 PGP Public Key Fingerprint: 1477 2DCF 8A4F CA2C 8B1F 6DFE 3B7C FDFB
On 27-May-99 Brandon Palmer wrote:
> I am having some problems w/ LO in postgres 6.5.snapshot (date marked
> 5/27). [...] I am guessing that there is some kind of a leak in the
> lo_open and lo_close functions if not in the back end in postmaster.

What are you running it on? What kind of limits do you have in your shell
(man limits in FreeBSD)?

Vince.
--
Vince Vielhaber -- KA8CSH   email: vev@michvhf.com   flame-mail: /dev/null
       # include <std/disclaimers.h>                 TEAM-OS2
  Online Campground Directory   http://www.camping-usa.com
  Online Giftshop Superstore    http://www.cloudninegifts.com
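A side note, not from the original mails: the backend inherits its resource
limits from whatever environment started the postmaster, which can differ
from an interactive shell's settings. A minimal sketch for checking them
from C, using the standard getrlimit() interface:

#include <stdio.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>

/* Print the resource limits most relevant to a growing backend process.
 * Illustrative only; run it in the same environment that starts postmaster. */
int
main(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_DATA, &rl) == 0)
		printf("data segment:  soft=%ld  hard=%ld\n",
		       (long) rl.rlim_cur, (long) rl.rlim_max);

#ifdef RLIMIT_RSS
	if (getrlimit(RLIMIT_RSS, &rl) == 0)
		printf("resident set:  soft=%ld  hard=%ld\n",
		       (long) rl.rlim_cur, (long) rl.rlim_max);
#endif

	return 0;
}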
>I am having some problems w/ LO in postgres 6.5.snapshot (date marked
>5/27). Here is the problem:

It seems 6.5 has a problem with LOs. Sorry, but I don't have time right now
to track it down, since I have another problem with higher priority: I've
been looking into the "stuck spin lock" problem under high load. Unless it
is solved, PostgreSQL will not be usable in the "real world."

Question to hackers: why does s_lock_stuck() call abort()? Shouldn't it be
elog(ERROR) or elog(FATAL)?
--
Tatsuo Ishii
> I am having some problems w/ LO in postgres 6.5.snapshot (date marked
> 5/27). [...] When I have commented out the lo_open and lo_close function
> calls, everything is ok so I am guessing that there is some kind of a
> leak in the lo_open and lo_close functions if not in the back end in
> postmaster.

I have taken a look at your code. There are some minor errors in it, but
they should not cause 'NOTICE: ShmemAlloc: out of memory' anyway.

I couldn't run your program since I don't have your test data, so I made a
small test program to check whether the problem is caused by LO (it's
stolen from test/examples/testlo.c). In the program a ~4k LO is read 10000
times within one transaction. The backend process size grew a little, but I
couldn't reproduce the problem you mentioned.

I have attached my test program and your program (modified so that it does
not use the LO calls). Can you try them and report back what happens?
---
Tatsuo Ishii
---------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include "libpq-fe.h"
#include "libpq/libpq-fs.h"

#define BUFSIZE 1024

int
main(int argc, char **argv)
{
	PGconn	   *conn;
	PGresult   *res;
	int			lobj_fd;
	int			nbytes;
	int			i;
	char		buf[BUFSIZE];

	conn = PQsetdb(NULL, NULL, NULL, NULL, "test");

	/* check to see that the backend connection was successfully made */
	if (PQstatus(conn) == CONNECTION_BAD)
	{
		fprintf(stderr, "%s", PQerrorMessage(conn));
		exit(0);
	}

	res = PQexec(conn, "begin");
	PQclear(res);

	/* read the same ~4k large object 10000 times in one transaction */
	for (i = 0; i < 10000; i++)
	{
		lobj_fd = lo_open(conn, 20225, INV_READ);	/* OID of the test LO */
		if (lobj_fd < 0)
		{
			fprintf(stderr, "can't open large object");
			exit(0);
		}
		printf("start read\n");
		while ((nbytes = lo_read(conn, lobj_fd, buf, BUFSIZE)) > 0)
			printf("read %d\n", nbytes);
		lo_close(conn, lobj_fd);
	}

	res = PQexec(conn, "end");
	PQclear(res);
	PQfinish(conn);
	exit(0);
}
---------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include "libpq-fe.h"
#include "libpq/libpq-fs.h"

#define BUFSIZE 1024

int		lobj_fd;
char	buf[BUFSIZE];		/* was malloc'd at file scope, which is not valid C */
int		nbytes;

int print_lo(PGconn *, char *, int);

int
print_lo(PGconn *conn, char *search_for, int in_oid)
{
	return 1;				/* LO access disabled for this test: always "match" */

	/* unreachable while the early return above is in place */
	lobj_fd = lo_open(conn, in_oid, INV_READ);
	while ((nbytes = lo_read(conn, lobj_fd, buf, BUFSIZE)) > 0)
	{
		/* note: buf is not NUL-terminated after lo_read */
		if (strstr(buf, search_for))
		{
			lo_close(conn, lobj_fd);
			return 1;
		}
	}
	lo_close(conn, lobj_fd);
	return 0;
}

int
main(int argc, char **argv)
{
	char	   *search_1,
			   *search_2;
	char	   *_insert;
	int			i;
	int			nFields;
	PGconn	   *conn;
	PGresult   *res;

	_insert = (char *) malloc(1024);
	search_1 = argv[1];
	search_2 = argv[2];

	conn = PQsetdb(NULL, NULL, NULL, NULL, "lkb_alpha2");

	res = PQexec(conn, "BEGIN");
	PQclear(res);
	res = PQexec(conn, "CREATE TEMP TABLE __a (finds INT4)");
	PQclear(res);
	res = PQexec(conn, "CREATE TEMP TABLE __b (finds INT4)");
	PQclear(res);

	res = PQexec(conn, "SELECT did from master_table");
	nFields = PQnfields(res);

	for (i = 0; i < PQntuples(res); i++)
	{
		if (print_lo(conn, search_1, atoi(PQgetvalue(res, i, 0))))
		{
			printf("+");
			fflush(stdout);
			sprintf(_insert, "INSERT INTO __a VALUES (%i)",
					atoi(PQgetvalue(res, i, 0)));
			PQexec(conn, _insert);
		}
		else
		{
			printf(".");
			fflush(stdout);
		}
	}
	printf("\n\n");

	res = PQexec(conn, "SELECT finds from __a");
	for (i = 0; i < PQntuples(res); i++)
	{
		if (print_lo(conn, search_2, atoi(PQgetvalue(res, i, 0))))
		{
			printf("+");
			fflush(stdout);
			sprintf(_insert, "INSERT INTO __b VALUES (%i)",
					atoi(PQgetvalue(res, i, 0)));
			PQexec(conn, _insert);
		}
		else
		{
			printf(".");
			fflush(stdout);
		}
	}

	res = PQexec(conn, "SELECT finds FROM __b");
	nFields = PQnfields(res);
	for (i = 0; i < PQntuples(res); i++)
		printf("\n\nMatch: %i", atoi(PQgetvalue(res, i, 0)));
	printf("\n\n");

	res = PQexec(conn, "END");
	PQclear(res);
	PQfinish(conn);
	exit(0);
}
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> I've been looking into the "stuck spin lock" problem under high load.
> Unless it is solved, PostgreSQL will not be usable in the "real world."

> Question to hackers: why does s_lock_stuck() call abort()? Shouldn't it
> be elog(ERROR) or elog(FATAL)?

I think that is probably the right thing. elog(ERROR) will not do anything
to release the stuck spinlock, and IIRC not even elog(FATAL) will. The
only way out is to clobber all the backends and reinitialize shared memory.
The postmaster will not do that unless a backend dies without making an
exit report --- which means doing abort().

			regards, tom lane
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> I couldn't run your program since I don't have your test data, so I made
> a small test program to check whether the problem is caused by LO (it's
> stolen from test/examples/testlo.c). In the program a ~4k LO is read
> 10000 times within one transaction. The backend process size grew a
> little, but I couldn't reproduce the problem you mentioned.

I tried the same thing, except I simply put a loop around the begin/end
transaction part of testlo.c so that it would create and access many large
objects in a single backend process. With today's sources I do not see a
'ShmemAlloc: out of memory' error even after several thousand iterations.
(But I do not know if this test would have triggered one before...)

What I do see is a significant backend memory leak --- several kilobytes
per cycle.

I think the problem here is that inv_create is done with the palloc memory
context set to the private memory context created by lo_open ... and this
memory context is never cleaned out as long as the backend survives. So
whatever junk data might get palloc'd and not freed during the index
creation step will just hang around indefinitely. And that code is far
from leak-free.

What I propose doing about it is modifying lo_commit to destroy lo_open's
private memory context. This will mean going back to the old semantics
wherein large object descriptors are not valid across transactions. But I
think that's the safest thing anyway. We can detect the case where someone
tries to use a stale LO handle if we zero out the LO "cookies" array as a
side-effect of lo_commit.

Comments? Objections?

			regards, tom lane
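A side note, not from the original mails: purely for illustration, here is
a rough sketch of the proposed lo_commit behavior. Everything in it is a
placeholder based on the description above -- the names (cookies,
MAX_LOBJ_FDS, fscxt) and the context-destruction call are assumptions, not
the actual 6.5 source.

/* Sketch only: all names and calls here are placeholders based on the
 * proposal above, not the actual PostgreSQL 6.5 source. */

#define MAX_LOBJ_FDS 256								/* assumed table size */

typedef struct LargeObjectDesc LargeObjectDesc;			/* opaque for the sketch */
typedef struct MemoryContextData *MemoryContext;		/* opaque for the sketch */

static LargeObjectDesc *cookies[MAX_LOBJ_FDS];			/* open LO handles */
static MemoryContext	fscxt;							/* lo_open's private context */

extern void destroy_context(MemoryContext cxt);			/* stand-in for the real call */

void
lo_commit(int isCommit)
{
	int		i;

	/* Zero out the cookies array so that any attempt to use a stale LO
	 * handle in a later transaction can be detected and rejected.  The
	 * same cleanup is done on commit and on abort. */
	for (i = 0; i < MAX_LOBJ_FDS; i++)
		cookies[i] = NULL;

	/* Destroy the private memory context created by lo_open, releasing
	 * everything palloc'd (and leaked) during large-object operations. */
	if (fscxt != NULL)
	{
		destroy_context(fscxt);
		fscxt = NULL;
	}

	(void) isCommit;		/* commit vs. abort doesn't change the cleanup here */
}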
> I think the problem here is that inv_create is done with the palloc
> memory context set to the private memory context created by lo_open
> ... and this memory context is never cleaned out as long as the backend
> survives. So whatever junk data might get palloc'd and not freed during
> the index creation step will just hang around indefinitely. And that
> code is far from leak-free.
>
> What I propose doing about it is modifying lo_commit to destroy
> lo_open's private memory context. This will mean going back to the
> old semantics wherein large object descriptors are not valid across
> transactions. But I think that's the safest thing anyway. We can
> detect the case where someone tries to use a stale LO handle if we
> zero out the LO "cookies" array as a side-effect of lo_commit.

Makes sense.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
>I think the problem here is that inv_create is done with the palloc
>memory context set to the private memory context created by lo_open
>... and this memory context is never cleaned out as long as the backend
>survives. [...]
>
>What I propose doing about it is modifying lo_commit to destroy
>lo_open's private memory context. This will mean going back to the
>old semantics wherein large object descriptors are not valid across
>transactions. But I think that's the safest thing anyway. We can
>detect the case where someone tries to use a stale LO handle if we
>zero out the LO "cookies" array as a side-effect of lo_commit.
>
>Comments? Objections?

Then why should we use the private memory context, if all LO operations
must be in a transaction?
--
Tatsuo Ishii
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>> What I propose doing about it is modifying lo_commit to destroy
>> lo_open's private memory context. This will mean going back to the
>> old semantics wherein large object descriptors are not valid across
>> transactions. But I think that's the safest thing anyway.

> Then why should we use the private memory context, if all LO operations
> must be in a transaction?

Right now, we could dispense with the private context. But I think it's
best to leave it there for future flexibility. For example, I was thinking
about flushing the context only if no LOs remain open (easily checked,
since lo_commit scans the cookies array anyway); that would allow
cross-transaction LO handles without imposing a permanent memory leak.

The trouble with that --- and this is a bug that was there anyway --- is
that you need some way of cleaning up LO handles that are opened during an
aborted transaction. They might be pointing at an LO relation that doesn't
exist anymore. (And even if it does, the semantics of xact abort are
supposed to be that all side effects are undone; opening an LO handle
would be such a side effect.)

As things now stand, LO handles are always closed at end of transaction
regardless of whether it was commit or abort, so there is no bug.

We could think about someday adding the bookkeeping needed to keep track
of LO handles opened during the current xact versus ones already open, and
thereby allow them to live across xact boundaries without risking the bug.
But that'd be a New Feature, so it's not getting done for 6.5.

			regards, tom lane
> Right now, we could dispense with the private context. But I think it's
> best to leave it there for future flexibility. [...]
>
> As things now stand, LO handles are always closed at end of transaction
> regardless of whether it was commit or abort, so there is no bug.
>
> We could think about someday adding the bookkeeping needed to keep track
> of LO handles opened during the current xact versus ones already open,
> and thereby allow them to live across xact boundaries without risking
> the bug. But that'd be a New Feature, so it's not getting done for 6.5.

Now I understand your point. Thank you for your detailed explanation!
---
Tatsuo Ishii