Thread: big text field -> message type 0x44
Hi,

I've been trying to ask on general, and tried to search the archives without much result, so I'll try here.

I'm using PostgreSQL 7.2.1 on Solaris 8/sparc. In a table, I have a text field, which may contain long ascii strings. However, when trying to put data larger than about 32000 characters (probably 32767), I get various errors in different situations. I'll try to list the ones I've seen here, hoping that it will help you find the problem.

Using libpq from my application, connecting to localhost:5432, I can insert large ascii strings to the field using the INSERT command, but I cannot get it with SELECT. I then get a "message type 0x44 arrived from server while idle" error.

Using libpq from my application, connecting to the unix socket, I'm unable to insert the large ascii string. I get a PGRES_NONFATAL_ERROR, but no text message is available, i.e. PQresultErrorMessage(result) returns an empty string. When running SELECT here, I still get the message type 0x44 error.

Using psql connecting to either unix socket or localhost:5432, I can run the same SELECT and the correct data is printed.

The same application and PostgreSQL version running in Linux works well, so I've only seen this on Solaris.

Since it works in psql, it must be possible for my application to work too, but I just can't figure out why it doesn't. Are there known problems with large strings on Solaris?

Greetings,
Tomas
Tomas Berndtsson <tomas@nocrew.org> writes: > Since it works in psql, it must be possible for my application to work > too, but I just can't figure out why it doesn't. I think it's got to be a bug in your application code. A bug in libpq is the only other possibility --- but seeing that psql also uses libpq, I'm inclined to discount that. (You're sure you are linking the same version of libpq into your app that psql uses, right?) regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Tomas Berndtsson <tomas@nocrew.org> writes:
> > Since it works in psql, it must be possible for my application to work
> > too, but I just can't figure out why it doesn't.
>
> I think it's got to be a bug in your application code.
>
> A bug in libpq is the only other possibility --- but seeing that psql
> also uses libpq, I'm inclined to discount that. (You're sure you are
> linking the same version of libpq into your app that psql uses,
> right?)

Yep, there is only one installation of PostgreSQL on the machine. My application is multithreaded, and I have been very careful to open a new connection for each thread.

Could it have anything to do with semaphores and shared memory in Solaris? My /etc/system contains this:

    set shmsys:shminfo_shmmax=0x2000000
    set shmsys:shminfo_shmmin=1
    set shmsys:shminfo_shmmni=256
    set shmsys:shminfo_shmseg=256
    set semsys:seminfo_semmap=256
    set semsys:seminfo_semmni=256
    set semsys:seminfo_semmns=256
    set semsys:seminfo_semmnu=256
    set semsys:seminfo_semmsl=256
    set semsys:seminfo_semopm=256
    set semsys:seminfo_semume=256
    set semsys:seminfo_semusz=256

I have these values to be able to have more connections than default to PostgreSQL. Maybe they need to be even higher?

What's strange is that the same application and PostgreSQL version works fine in Linux.

Tomas
Tomas Berndtsson <tomas@nocrew.org> writes: > Yep, there is only one installation of PostgreSQL on the machine. My > application is multithreaded, and I have been very careful to open a > new connection for each thread. Could it have anything to do with > semaphores and shared memory in Solaris? I wouldn't think so; the client-side code doesn't have anything to do with either shared memory or semaphores. But your comment about threading immediately focuses my attention on that. Let's see (checks ASCII codes...) message 0x44 is 'D' which is a data message. The only situations I've seen before in which libpq comes out with this complaint are (1) when it's lost sync with the backend as a result of running out of memory to store a large query result (its recovery from that situation is pretty crummy :-(), or (2) when someone's confused libpq by trying concurrent queries with one PGconn. You say you didn't do (2), so that leaves (1). Is it possible that your threading setup limits the amount of memory libpq can malloc? regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Tomas Berndtsson <tomas@nocrew.org> writes:
> > Yep, there is only one installation of PostgreSQL on the machine. My
> > application is multithreaded, and I have been very careful to open a
> > new connection for each thread. Could it have anything to do with
> > semaphores and shared memory in Solaris?
>
> I wouldn't think so; the client-side code doesn't have anything to do
> with either shared memory or semaphores. But your comment about
> threading immediately focuses my attention on that.
>
> Let's see (checks ASCII codes...) message 0x44 is 'D' which is a data
> message. The only situations I've seen before in which libpq comes out
> with this complaint are (1) when it's lost sync with the backend as a
> result of running out of memory to store a large query result (its
> recovery from that situation is pretty crummy :-(), or (2) when
> someone's confused libpq by trying concurrent queries with one PGconn.
>
> You say you didn't do (2), so that leaves (1). Is it possible that your
> threading setup limits the amount of memory libpq can malloc?

I don't know what I would do to limit it. The machine has 2GB RAM, and over 1GB free.

However, after some semi-random looking through the source code of libpq, I tried to change a value, namely here:

fe-misc.c row 510 in pqReadData():

    if (conn->inEnd > 32768 &&
        (conn->inBufSize - conn->inEnd) >= 8192)
    {
        someread = 1;
        goto tryAgain;
    }

I changed the 32768 value to 131072, and sure enough, my application was able to get larger fields without any errors. The best thing would of course be to have no limit to it. That would mean taking the whole if-statement out, right? I've only tried with the value change, though.

There's a comment above this, saying it's a hack for some kernels that only give back one packet, even if there is more. But, it seems to confuse the Solaris kernel in some mysterious way when running threads.

I haven't seen that it breaks anything else by changing this value, but if you think it might, please tell me. I wouldn't want to risk breaking other stuff.

Tomas
Tomas Berndtsson <tomas@nocrew.org> writes: > However, after some semi-random looking through the source code of > libpq, I tried to change a value, namely here: > fe-misc.c row 510 in pqReadData(): > if (conn->inEnd > 32768 && > (conn->inBufSize - conn->inEnd) >= 8192) > I changed the 32768 value to 131072, and sure enough, my application > was able to get larger fields without any errors. That's really interesting. I cannot see anything unsafe about that retry loop --- could you instrument it some more to determine exactly what happens after we go back to try to read more? Also, are you using SSL by any chance? Perhaps the problem is that the SSL library doesn't react the same as a bare recv() call? regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Tomas Berndtsson <tomas@nocrew.org> writes:
> > However, after some semi-random looking through the source code of
> > libpq, I tried to change a value, namely here:
> > fe-misc.c row 510 in pqReadData():
> > if (conn->inEnd > 32768 &&
> > (conn->inBufSize - conn->inEnd) >= 8192)
>
> > I changed the 32768 value to 131072, and sure enough, my application
> > was able to get larger fields without any errors.
>
> That's really interesting. I cannot see anything unsafe about that
> retry loop --- could you instrument it some more to determine exactly
> what happens after we go back to try to read more?
>
> Also, are you using SSL by any chance? Perhaps the problem is that
> the SSL library doesn't react the same as a bare recv() call?

Nope, no SSL.

I inserted some debug printing in the code. This is the output:

    LIBPQ: recv inbufsize=16384 inend=0 nread=8192
    LIBPQ: recv inbufsize=16384 inend=6194 nread=8192
    LIBPQ: recv inbufsize=32768 inend=14386 nread=8192
    LIBPQ: recv inbufsize=32768 inend=22578 nread=8192
    LIBPQ: recv inbufsize=65536 inend=30770 nread=8192
    LIBPQ: trying again
    LIBPQ: recv inbufsize=65536 inend=38962 nread=-1
    LIBPQ: SOCK_ERRNO = 25 (Inappropriate ioctl for device)
    message type 0x44 arrived from server while idle

The "recv" row is printed right after recv is called. "trying again" is printed inside the

    if (conn->inEnd > 32768 &&
        (conn->inBufSize - conn->inEnd) >= 8192)

After it tries again, it always gets error from recv() for some reason that I don't know. I also don't understand why errno is set to ENOTTY at this point, that makes no sense at all. But it does, and libpq doesn't recognise the errno code and therefore returns -1 from pqReadData().

By skipping the trying again if-statement, pqReadData() will always return proper data, and let the calling function deal with the fact that there is more data to be read.

I don't know if I can help you more than this. I have absolutely no idea why recv() would fail with ENOTTY.

Tomas
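[Editor's sketch] To make it easier to see why "trying again" shows up exactly where it does in the log above, the quoted condition can be evaluated on the logged numbers. The helper below is hypothetical and is not part of libpq; it only mirrors the `if` test quoted from pqReadData(), with the 32768 constant turned into a parameter. It assumes the logged `inend` is the value before `nread` is added, which the log itself suggests (38962 = 30770 + 8192).

```c
#include <stdbool.h>

/* Hypothetical helper, NOT libpq code: mirrors the retry test quoted from
 * pqReadData(). in_end is the buffer fill level after the read has been
 * accounted for; threshold is 32768 in the stock code and 131072 in the
 * locally patched build described in the thread. */
bool would_retry_read(int in_end, int in_buf_size, int threshold)
{
    return in_end > threshold && (in_buf_size - in_end) >= 8192;
}
```

With the logged values, `would_retry_read(30770, 32768, 32768)` is false (fourth read), `would_retry_read(38962, 65536, 32768)` is true (fifth read, followed by "trying again"), and `would_retry_read(38962, 65536, 131072)` is false, which is consistent with the patched threshold avoiding the failing retry for this result size rather than fixing its cause.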
Tomas Berndtsson <tomas@nocrew.org> writes: > After it tries again, it always gets error from recv() for some reason > that I don't know. I also don't understand why errno is set to ENOTTY > at this point, that makes no sense at all. Are you sure it is set? Try setting errno=0 just before recv() (inside the retry loop). Maybe recv() is neglecting to set it in this case. I suddenly have a recollection of something about some platform failing to set errno when using threads. Try searching the PG archives. > By skipping the trying again if-statement, pqReadData() will always > return proper data, and let the calling function deal with the fact > that there is more data to be read. I have no confidence in this. If the calling function comes back for more data, why wouldn't the recv() fail the same way? A few more instructions in between shouldn't change its behavior, one would think. regards, tom lane
Tom Lane wrote: > Tomas Berndtsson <tomas@nocrew.org> writes: > > After it tries again, it always gets error from recv() for some reason > > that I don't know. I also don't understand why errno is set to ENOTTY > > at this point, that makes no sense at all. > > Are you sure it is set? Try setting errno=0 just before recv() (inside > the retry loop). Maybe recv() is neglecting to set it in this case. > > I suddenly have a recollection of something about some platform failing > to set errno when using threads. Try searching the PG archives. I don't know whether or not things have changed significantly since Solaris 2.4 (and perhaps 2.5), but I seem to remember that back then a lot of the networking code was implemented in libraries on top of SVr4 TLI (Transport Layer Interface), and thus functions like recv() that made use of internet domain sockets were actually just wrappers around the TLI stuff. If it's still implemented that way, I suppose there's the possibility that recv() isn't thread-safe under Solaris, but I doubt it. Such a deficiency would be quite glaring considering what threads are used for. Just food for thought, for what it's worth... - Kevin
Tom Lane <tgl@sss.pgh.pa.us> writes: > Tomas Berndtsson <tomas@nocrew.org> writes: > > After it tries again, it always gets error from recv() for some reason > > that I don't know. I also don't understand why errno is set to ENOTTY > > at this point, that makes no sense at all. > > Are you sure it is set? Try setting errno=0 just before recv() (inside > the retry loop). Maybe recv() is neglecting to set it in this case. Indeed you were right in this. But, if I added -D_REENTRANT to the Makefile for libpq, it started to set it. If libpq should be thread safe, I believe it should be compiled with -D_REENTRANT. When I did this, recv still returns error, but now sets errno to EAGAIN, so pqReadData() returns 1, giving the same result as removing the if-statement that does the try again thing. > > By skipping the trying again if-statement, pqReadData() will always > > return proper data, and let the calling function deal with the fact > > that there is more data to be read. > > I have no confidence in this. If the calling function comes back for > more data, why wouldn't the recv() fail the same way? A few more > instructions in between shouldn't change its behavior, one would think. No, I agree it sounds strange. I still haven't figured out why recv fails after the goto, but not when calling the function again. Tomas
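[Editor's sketch] The difference between the two outcomes reported in this thread (an unrecognised errno makes the read fail, while EAGAIN is treated as "nothing more to read right now") comes down to how the caller classifies errno after a failed recv() on a non-blocking socket. The enum and function below are illustrative assumptions showing the general pattern; they are not taken from the libpq source.

```c
#include <errno.h>

/* Illustrative only, NOT libpq code. */
typedef enum {
    READ_RETRY,        /* interrupted by a signal: call recv() again */
    READ_NO_MORE_NOW,  /* non-blocking socket has no data at the moment */
    READ_FAILED        /* anything else is reported as a real error */
} read_verdict;

read_verdict classify_recv_errno(int err)
{
    if (err == EINTR)
        return READ_RETRY;
    /* EAGAIN and EWOULDBLOCK may or may not be the same value, so both
     * are tested with || rather than as two switch cases. */
    if (err == EAGAIN || err == EWOULDBLOCK)
        return READ_NO_MORE_NOW;
    return READ_FAILED;
}
```

Under this scheme a stale ENOTTY falls through to `READ_FAILED`, while the EAGAIN seen after building with -D_REENTRANT lands in `READ_NO_MORE_NOW`, matching the behaviour described above.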
Tomas Berndtsson <tomas@nocrew.org> writes: > Indeed you were right in this. But, if I added -D_REENTRANT to the > Makefile for libpq, it started to set it. If libpq should be thread > safe, I believe it should be compiled with -D_REENTRANT. > When I did this, recv still returns error, but now sets errno to > EAGAIN, so pqReadData() returns 1, giving the same result as removing > the if-statement that does the try again thing. Okay, so it seems -D_REENTRANT is the appropriate fix. We could either add that to the template/solaris file, or just add a note to FAQ_Solaris advising that it be added to the configure switches if people intend to use libpq in threaded programs. Is there any cost or downside to just adding it always in template/solaris? regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > Tomas Berndtsson <tomas@nocrew.org> writes: > > Indeed you were right in this. But, if I added -D_REENTRANT to the > > Makefile for libpq, it started to set it. If libpq should be thread > > safe, I believe it should be compiled with -D_REENTRANT. > > > When I did this, recv still returns error, but now sets errno to > > EAGAIN, so pqReadData() returns 1, giving the same result as removing > > the if-statement that does the try again thing. > > Okay, so it seems -D_REENTRANT is the appropriate fix. > > We could either add that to the template/solaris file, or just add a > note to FAQ_Solaris advising that it be added to the configure switches > if people intend to use libpq in threaded programs. Is there any > cost or downside to just adding it always in template/solaris? Not that I know of. Some data (like errno) is made local for the thread, so I suppose it takes a little more memory and maybe more disk space, but else than that I don't think it affects much. But, then again, I'm not an expert at these things. Someone else might know more what the real difference is. Tomas
Tom Lane writes:

> Okay, so it seems -D_REENTRANT is the appropriate fix.
>
> We could either add that to the template/solaris file, or just add a
> note to FAQ_Solaris advising that it be added to the configure switches
> if people intend to use libpq in threaded programs. Is there any
> cost or downside to just adding it always in template/solaris?

However, _REENTRANT is not a Solarisism... On all (recent) UNIX systems it toggles on correct handling for thread specific instances of historically global variables (eg errno). It should be considered for all platforms if libpq is intended to be used from threaded programs.

You'll probably find Tomas's code breaks on Linux too...

Lee.
Lee Kindness <lkindness@csl.co.uk> writes: > Tom Lane writes: > > Okay, so it seems -D_REENTRANT is the appropriate fix. > > > > We could either add that to the template/solaris file, or just add a > > note to FAQ_Solaris advising that it be added to the configure switches > > if people intend to use libpq in threaded programs. Is there any > > cost or downside to just adding it always in template/solaris? > > However, _REENTRANT is not a Solarisism... On all (recent) UNIX > systems it toggles on correct handling for thread specific instances > of historically global variables (eg errno). It should be considered > for all platforms if libpq is intended to be used from threaded > programs. I know libpq is "officially" non-threadsafe, but is there anything in there that would actually cause a problem, assuming either a connection per thread or proper locking on the application's part? Most of the data in the library seems to be per-connection... -Doug
Lee Kindness <lkindness@csl.co.uk> writes: > Tom Lane writes: > > Okay, so it seems -D_REENTRANT is the appropriate fix. > > > > We could either add that to the template/solaris file, or just add a > > note to FAQ_Solaris advising that it be added to the configure switches > > if people intend to use libpq in threaded programs. Is there any > > cost or downside to just adding it always in template/solaris? > > However, _REENTRANT is not a Solarisism... On all (recent) UNIX > systems it toggles on correct handling for thread specific instances > of historically global variables (eg errno). It should be considered > for all platforms if libpq is intended to be used from threaded > programs. > > You'll probably find Tomas's code breaks on Linux too... Actually, I've tried it in Linux, and it works there. Might be that the recv() doesn't return -1 when trying again in Linux. In that case, for this particular problem, it wouldn't matter if it's reentrant or not. Tomas
Doug McNaught <doug@mcnaught.org> writes:

> Lee Kindness <lkindness@csl.co.uk> writes:
>
> > Tom Lane writes:
> > > Okay, so it seems -D_REENTRANT is the appropriate fix.
> > >
> > > We could either add that to the template/solaris file, or just add a
> > > note to FAQ_Solaris advising that it be added to the configure switches
> > > if people intend to use libpq in threaded programs. Is there any
> > > cost or downside to just adding it always in template/solaris?
> >
> > However, _REENTRANT is not a Solarisism... On all (recent) UNIX
> > systems it toggles on correct handling for thread specific instances
> > of historically global variables (eg errno). It should be considered
> > for all platforms if libpq is intended to be used from threaded
> > programs.
>
> I know libpq is "officially" non-threadsafe, but is there anything in
> there that would actually cause a problem, assuming either a
> connection per thread or proper locking on the application's part?
> Most of the data in the library seems to be per-connection...

The documentation states:

"libpq is thread-safe as of PostgreSQL 7.0, so long as no two threads attempt to manipulate the same PGconn object at the same time."

Tomas
Lee Kindness <lkindness@csl.co.uk> writes: > Tom Lane writes: >>> Okay, so it seems -D_REENTRANT is the appropriate fix. > However, _REENTRANT is not a Solarisism... On all (recent) UNIX > systems it toggles on correct handling for thread specific instances > of historically global variables (eg errno). It should be considered > for all platforms if libpq is intended to be used from threaded > programs. Now that I think about it, what that macro is probably really doing is switching the code from looking at a static "errno" variable to looking at a per-thread variable. So in fact -D_REENTRANT would be correct if you intended to link with a thread-aware libc, and wrong if you intended to link with a non-aware libc. (Is there such a thing as a non-threaded implementation of libc on the platforms where -D_REENTRANT does anything?) If this analysis is right then I think we should *not* force _REENTRANT; it will have to be up to users to choose the mechanism they want to use in their programs. regards, tom lane
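[Editor's sketch] Whether a given build really sees a per-thread errno can be checked directly by comparing the address of errno in two threads. The function names and the context struct below are made up for illustration; the sketch assumes POSIX threads and needs to be built with whatever threading options the platform requires (the thread discusses -D_REENTRANT for Solaris).

```c
#include <errno.h>
#include <pthread.h>

/* Illustrative only. Passed to the second thread. */
struct errno_probe {
    int *main_errno;  /* address of errno as seen by the calling thread */
    int  differs;     /* set to 1 if the new thread sees another object */
};

/* Thread body: compare this thread's errno location with the caller's. */
static void *probe_errno(void *arg)
{
    struct errno_probe *p = arg;

    p->differs = (&errno != p->main_errno);
    return NULL;
}

/* Returns 1 if errno is per-thread in this build, 0 if both threads share
 * a single errno object, and -1 if the probe thread could not be run. */
int errno_is_per_thread(void)
{
    struct errno_probe p = { &errno, 0 };
    pthread_t tid;

    if (pthread_create(&tid, NULL, probe_errno, &p) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return p.differs;
}
```

A result of 0 would mean the code was compiled against the single global errno, in which case an error code stored by a thread-aware libc would not be visible to it, which fits the stale ENOTTY reported earlier in the thread.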
--On Thursday, December 05, 2002 14:02:04 -0500 Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Lee Kindness <lkindness@csl.co.uk> writes:
>> Tom Lane writes:
>>>> Okay, so it seems -D_REENTRANT is the appropriate fix.
>
>> However, _REENTRANT is not a Solarisism... On all (recent) UNIX
>> systems it toggles on correct handling for thread specific instances
>> of historically global variables (eg errno). It should be considered
>> for all platforms if libpq is intended to be used from threaded
>> programs.
>
> Now that I think about it, what that macro is probably really doing is
> switching the code from looking at a static "errno" variable to looking
> at a per-thread variable. So in fact -D_REENTRANT would be correct if
> you intended to link with a thread-aware libc, and wrong if you intended
> to link with a non-aware libc. (Is there such a thing as a non-threaded
> implementation of libc on the platforms where -D_REENTRANT does
> anything?) If this analysis is right then I think we should *not*
> force _REENTRANT; it will have to be up to users to choose the mechanism
> they want to use in their programs.

YES. I believe UnixWare7 has such. You need -Kthread to get a threaded version of SOME calls.

If you need more details, Ask.

--
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 972-414-9812             E-Mail: ler@lerctr.org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749
Tom Lane wrote:

> Lee Kindness <lkindness@csl.co.uk> writes:
> > Tom Lane writes:
> >>> Okay, so it seems -D_REENTRANT is the appropriate fix.
>
> > However, _REENTRANT is not a Solarisism... On all (recent) UNIX
> > systems it toggles on correct handling for thread specific instances
> > of historically global variables (eg errno). It should be considered
> > for all platforms if libpq is intended to be used from threaded
> > programs.
>
> Now that I think about it, what that macro is probably really doing is
> switching the code from looking at a static "errno" variable to looking
> at a per-thread variable. So in fact -D_REENTRANT would be correct if
> you intended to link with a thread-aware libc, and wrong if you intended
> to link with a non-aware libc. (Is there such a thing as a non-threaded
> implementation of libc on the platforms where -D_REENTRANT does
> anything?) If this analysis is right then I think we should *not*
> force _REENTRANT; it will have to be up to users to choose the mechanism
> they want to use in their programs.

As far as I remember, on some platforms -lpthread does replace some of the libc functions with thread-safe ones. That could be quite confusing.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Tom Lane writes:

> Lee Kindness <lkindness@csl.co.uk> writes:
> > Tom Lane writes:
> >>> Okay, so it seems -D_REENTRANT is the appropriate fix.
> > However, _REENTRANT is not a Solarisism... On all (recent) UNIX
> > systems it toggles on correct handling for thread specific instances
> > of historically global variables (eg errno). It should be considered
> > for all platforms if libpq is intended to be used from threaded
> > programs.
> Now that I think about it, what that macro is probably really doing is
> switching the code from looking at a static "errno" variable to looking
> at a per-thread variable. So in fact -D_REENTRANT would be correct if
> you intended to link with a thread-aware libc, and wrong if you intended
> to link with a non-aware libc. (Is there such a thing as a non-threaded
> implementation of libc on the platforms where -D_REENTRANT does
> anything?) If this analysis is right then I think we should *not*
> force _REENTRANT; it will have to be up to users to choose the mechanism
> they want to use in their programs.

I think in the long term the libraries are going to have to be looked at in detail to ensure they work as would be expected from multithreaded programs. I cannot see any harm in adding -D_REENTRANT to CFLAGS even though some platforms supersede it with -lthread or something (because they still define _REENTRANT behind the scenes).

I remember in the past reading in detail the issues involved with making shared libraries work as expected from threads. However, I no longer have access to that book, but think it was "Multithreaded Programming with Pthreads"... Again, something I'd like to look at later this month.

Workwise, the threaded code we had which used embedded SQL calls in C fell into heaps when moved from Ingres to PostgreSQL. And Ingres's ESQL/C is real crap for threading and we employed loads of mutexes... So, ...

Lee.