Thread: big text field -> message type 0x44

From
Tomas Berndtsson
Date:
Hi, I've been trying to ask on general, and tried to search the
archives without much result, so I'll try here.

I'm using PostgreSQL 7.2.1 on Solaris 8/sparc. In a table, I have a
text field, which may contain long ascii strings. However, when trying
to put data larger than about 32000 characters (probably 32767), I get
various errors in different situations. I'll try to list the ones I've
seen here, hoping that it will help you find the problem.

Using libpq from my application, connecting to localhost:5432, I can
insert large ascii strings to the field using the INSERT command, but
I cannot get it with SELECT. I then get a "message type 0x44 arrived
from server while idle" error.

Using libpq from my application, connecting to the unix socket, I'm
unable to insert the large ascii string. I get a PGRES_NONFATAL_ERROR,
but no text message is available, i.e. PQresultErrorMessage(result)
returns an empty string. When running SELECT here, I still get the
message type 0x44 error.

Using psql connecting to either unix socket or localhost:5432, I can
run the same SELECT and the correct data is printed.

The same application and PostgreSQL version running in Linux works
well, so I've only seen this on Solaris.

Since it works in psql, it must be possible for my application to work
too, but I just can't figure out why it doesn't. Are there known
problems with large strings on Solaris?


Greetings,

Tomas



Re: big text field -> message type 0x44

From
Tom Lane
Date:
Tomas Berndtsson <tomas@nocrew.org> writes:
> Since it works in psql, it must be possible for my application to work
> too, but I just can't figure out why it doesn't.

I think it's got to be a bug in your application code.

A bug in libpq is the only other possibility --- but seeing that psql
also uses libpq, I'm inclined to discount that.  (You're sure you are
linking the same version of libpq into your app that psql uses,
right?)
        regards, tom lane


Re: big text field -> message type 0x44

From
Tomas Berndtsson
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Tomas Berndtsson <tomas@nocrew.org> writes:
> > Since it works in psql, it must be possible for my application to work
> > too, but I just can't figure out why it doesn't.
> 
> I think it's got to be a bug in your application code.
> 
> A bug in libpq is the only other possibility --- but seeing that psql
> also uses libpq, I'm inclined to discount that.  (You're sure you are
> linking the same version of libpq into your app that psql uses,
> right?)

Yep, there is only one installation of PostgreSQL on the machine. My
application is multithreaded, and I have been very careful to open a
new connection for each thread. Could it have anything to do with
semaphores and shared memory in Solaris? My /etc/system contains this:

set shmsys:shminfo_shmmax=0x2000000
set shmsys:shminfo_shmmin=1
set shmsys:shminfo_shmmni=256
set shmsys:shminfo_shmseg=256

set semsys:seminfo_semmap=256
set semsys:seminfo_semmni=256
set semsys:seminfo_semmns=256
set semsys:seminfo_semmnu=256
set semsys:seminfo_semmsl=256
set semsys:seminfo_semopm=256
set semsys:seminfo_semume=256
set semsys:seminfo_semusz=256

I have these values to be able to have more connections than default
to PostgreSQL. Maybe they need to be even higher?

What's strange is that the same application and PostgreSQL version
works fine in Linux.


Tomas


Re: big text field -> message type 0x44

From
Tom Lane
Date:
Tomas Berndtsson <tomas@nocrew.org> writes:
> Yep, there is only one installation of PostgreSQL on the machine. My
> application is multithreaded, and I have been very careful to open a
> new connection for each thread. Could it have anything to do with
> semaphores and shared memory in Solaris?

I wouldn't think so; the client-side code doesn't have anything to do
with either shared memory or semaphores.  But your comment about
threading immediately focuses my attention on that.

Let's see (checks ASCII codes...) message 0x44 is 'D' which is a data
message.  The only situations I've seen before in which libpq comes out
with this complaint are (1) when it's lost sync with the backend as a
result of running out of memory to store a large query result (its
recovery from that situation is pretty crummy :-(), or (2) when
someone's confused libpq by trying concurrent queries with one PGconn.

You say you didn't do (2), so that leaves (1).  Is it possible that your
threading setup limits the amount of memory libpq can malloc?
        regards, tom lane
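
To make the "0x44 is 'D'" step concrete, here is a minimal sketch of reading the type byte that leads each backend message. The function name and the description strings are hypothetical, not libpq's own, and the short list of codes is an assumption taken from the frontend/backend protocol documentation rather than from this thread.

```c
/* Map a backend message type byte to a short description.
 * Hypothetical helper for illustration only: libpq has no function of
 * this name, and only a few type codes are listed here. */
static const char *describe_message_type(unsigned char type)
{
    switch (type)
    {
        case 'D': return "data row";         /* 0x44, the byte in the error */
        case 'E': return "error";
        case 'N': return "notice";
        case 'Z': return "ready for query";
        default:  return "other";
    }
}
```

Calling describe_message_type(0x44) yields "data row": the client was handed row data at a moment when it believed no query was in progress, which fits both of the causes listed above.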


Re: big text field -> message type 0x44

From
Tomas Berndtsson
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Tomas Berndtsson <tomas@nocrew.org> writes:
> > Yep, there is only one installation of PostgreSQL on the machine. My
> > application is multithreaded, and I have been very careful to open a
> > new connection for each thread. Could it have anything to do with
> > semaphores and shared memory in Solaris?
> 
> I wouldn't think so; the client-side code doesn't have anything to do
> with either shared memory or semaphores.  But your comment about
> threading immediately focuses my attention on that.
> 
> Let's see (checks ASCII codes...) message 0x44 is 'D' which is a data
> message.  The only situations I've seen before in which libpq comes out
> with this complaint are (1) when it's lost sync with the backend as a
> result of running out of memory to store a large query result (its
> recovery from that situation is pretty crummy :-(), or (2) when
> someone's confused libpq by trying concurrent queries with one PGconn.
> 
> You say you didn't do (2), so that leaves (1).  Is it possible that your
> threading setup limits the amount of memory libpq can malloc?

I don't know what I would do to limit it. The machine has 2GB RAM, and
over 1GB free.

However, after some semi-random looking through the source code of
libpq, I tried to change a value, namely here:

fe-misc.c row 510 in pqReadData():
        if (conn->inEnd > 32768 &&
            (conn->inBufSize - conn->inEnd) >= 8192)
        {
            someread = 1;
            goto tryAgain;
        }


I changed the 32768 value to 131072, and sure enough, my application
was able to get larger fields without any errors. The best thing would
of course be to have no limit to it. That would mean taking the whole
if-statement out, right? I've only tried with the value change,
though. There's a comment above this, saying it's a hack for some
kernels that only give back one packet, even if there is more. But, it
seems to confuse the Solaris kernel in some mysterious way when
running threads. 

I haven't seen that it breaks anything else by changing this value,
but if you think it might, please tell me. I wouldn't want to risk
breaking other stuff.


Tomas


Re: big text field -> message type 0x44

From
Tom Lane
Date:
Tomas Berndtsson <tomas@nocrew.org> writes:
> However, after some semi-random looking through the source code of
> libpq, I tried to change a value, namely here:
> fe-misc.c row 510 in pqReadData():
>                 if (conn->inEnd > 32768 &&
>                         (conn->inBufSize - conn->inEnd) >= 8192)

> I changed the 32768 value to 131072, and sure enough, my application
> was able to get larger fields without any errors.

That's really interesting.  I cannot see anything unsafe about that
retry loop --- could you instrument it some more to determine exactly
what happens after we go back to try to read more?

Also, are you using SSL by any chance?  Perhaps the problem is that
the SSL library doesn't react the same as a bare recv() call?
        regards, tom lane


Re: big text field -> message type 0x44

From
Tomas Berndtsson
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Tomas Berndtsson <tomas@nocrew.org> writes:
> > However, after some semi-random looking through the source code of
> > libpq, I tried to change a value, namely here:
> > fe-misc.c row 510 in pqReadData():
> >                 if (conn->inEnd > 32768 &&
> >                         (conn->inBufSize - conn->inEnd) >= 8192)
> 
> > I changed the 32768 value to 131072, and sure enough, my application
> > was able to get larger fields without any errors.
> 
> That's really interesting.  I cannot see anything unsafe about that
> retry loop --- could you instrument it some more to determine exactly
> what happens after we go back to try to read more?
> 
> Also, are you using SSL by any chance?  Perhaps the problem is that
> the SSL library doesn't react the same as a bare recv() call?

Nope, no SSL.

I inserted some debug printing in the code. This is the output:

LIBPQ: recv inbufsize=16384 inend=0 nread=8192
LIBPQ: recv inbufsize=16384 inend=6194 nread=8192
LIBPQ: recv inbufsize=32768 inend=14386 nread=8192
LIBPQ: recv inbufsize=32768 inend=22578 nread=8192
LIBPQ: recv inbufsize=65536 inend=30770 nread=8192
LIBPQ: trying again
LIBPQ: recv inbufsize=65536 inend=38962 nread=-1
LIBPQ: SOCK_ERRNO = 25 (Inappropriate ioctl for device)
message type 0x44 arrived from server while idle

The "recv" row is printed right after recv is called.

"trying again" is printed inside the
if (conn->inEnd > 32768 &&   (conn->inBufSize - conn->inEnd) >= 8192)

After it tries again, it always gets error from recv() for some reason
that I don't know. I also don't understand why errno is set to ENOTTY
at this point, that makes no sense at all. But it does, and libpq
doesn't recognise the errno code and therefore returns -1 from
pqReadData().

By skipping the trying again if-statement, pqReadData() will always
return proper data, and let the calling function deal with the fact
that there is more data to be read.

I don't know if I can help you more than this. I have absolutely no
idea why recv() would fail with ENOTTY.


Tomas
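
The trace above can be checked against the quoted condition with a small sketch. The function name and the threshold parameter are hypothetical; in libpq the 32768 is hard-coded, and the parameter exists only so the 131072 experiment can be tried too. The inend values in the trace appear to be printed before nread is added, so after the fifth recv() the buffer holds 30770 + 8192 = 38962 bytes, which matches the inend shown on the next line.

```c
/* Returns 1 when the quoted condition would send pqReadData() back to
 * recv() for more data, 0 otherwise. `threshold` stands in for the
 * hard-coded 32768; the 8192 is the free space the condition requires. */
static int would_retry_read(int in_end, int in_buf_size, int threshold)
{
    return in_end > threshold && (in_buf_size - in_end) >= 8192;
}
```

With the values from the trace, would_retry_read(38962, 65536, 32768) is 1, which is the point where "trying again" is printed. The same call with a threshold of 131072 gives 0, so raising the constant only avoids the retry for results of this size. It does not explain why the retried recv() fails.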


Re: big text field -> message type 0x44

From
Tom Lane
Date:
Tomas Berndtsson <tomas@nocrew.org> writes:
> After it tries again, it always gets error from recv() for some reason
> that I don't know. I also don't understand why errno is set to ENOTTY
> at this point, that makes no sense at all.

Are you sure it is set?  Try setting errno=0 just before recv() (inside
the retry loop).  Maybe recv() is neglecting to set it in this case.

I suddenly have a recollection of something about some platform failing
to set errno when using threads.  Try searching the PG archives.

> By skipping the trying again if-statement, pqReadData() will always
> return proper data, and let the calling function deal with the fact
> that there is more data to be read.

I have no confidence in this.  If the calling function comes back for
more data, why wouldn't the recv() fail the same way?  A few more
instructions in between shouldn't change its behavior, one would think.
        regards, tom lane
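
The errno=0 suggestion can be sketched as a small probe. This is not the libpq code: the function name and its return convention are hypothetical, and it is only meant to show how clearing errno first separates "recv() failed and set errno" from "recv() failed and left a stale value behind".

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Call recv() on `fd` with errno cleared first.
 * Returns 0 if recv() succeeded, the errno value if it failed and set
 * errno, and -1 if it failed but errno is still 0 (the suspicious case,
 * where the code is looking at a different errno than the one libc set). */
static int recv_errno_probe(int fd)
{
    char    buf[16];
    ssize_t n;

    errno = 0;
    n = recv(fd, buf, sizeof buf, 0);
    if (n >= 0)
        return 0;
    return errno != 0 ? errno : -1;
}
```

On a descriptor that is not open, such as -1, the probe should come back with EBADF. A return of -1 would mean the ENOTTY seen earlier was left over from some unrelated call.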


Re: big text field -> message type 0x44

From
Kevin Brown
Date:
Tom Lane wrote:
> Tomas Berndtsson <tomas@nocrew.org> writes:
> > After it tries again, it always gets error from recv() for some reason
> > that I don't know. I also don't understand why errno is set to ENOTTY
> > at this point, that makes no sense at all.
> 
> Are you sure it is set?  Try setting errno=0 just before recv() (inside
> the retry loop).  Maybe recv() is neglecting to set it in this case.
> 
> I suddenly have a recollection of something about some platform failing
> to set errno when using threads.  Try searching the PG archives.

I don't know whether or not things have changed significantly since
Solaris 2.4 (and perhaps 2.5), but I seem to remember that back then a
lot of the networking code was implemented in libraries on top of SVr4
TLI (Transport Layer Interface), and thus functions like recv() that
made use of internet domain sockets were actually just wrappers around
the TLI stuff.

If it's still implemented that way, I suppose there's the possibility
that recv() isn't thread-safe under Solaris, but I doubt it.  Such a
deficiency would be quite glaring considering what threads are used
for.

Just food for thought, for what it's worth...


- Kevin


Re: big text field -> message type 0x44

From
Tomas Berndtsson
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Tomas Berndtsson <tomas@nocrew.org> writes:
> > After it tries again, it always gets error from recv() for some reason
> > that I don't know. I also don't understand why errno is set to ENOTTY
> > at this point, that makes no sense at all.
> 
> Are you sure it is set?  Try setting errno=0 just before recv() (inside
> the retry loop).  Maybe recv() is neglecting to set it in this case.

Indeed you were right in this. But, if I added -D_REENTRANT to the
Makefile for libpq, it started to set it. If libpq should be thread
safe, I believe it should be compiled with -D_REENTRANT. 

When I did this, recv still returns error, but now sets errno to
EAGAIN, so pqReadData() returns 1, giving the same result as removing
the if-statement that does the try again thing. 

> > By skipping the trying again if-statement, pqReadData() will always
> > return proper data, and let the calling function deal with the fact
> > that there is more data to be read.
> 
> I have no confidence in this.  If the calling function comes back for
> more data, why wouldn't the recv() fail the same way?  A few more
> instructions in between shouldn't change its behavior, one would think.

No, I agree it sounds strange. I still haven't figured out why recv
fails after the goto, but not when calling the function again. 


Tomas


Re: big text field -> message type 0x44

From
Tom Lane
Date:
Tomas Berndtsson <tomas@nocrew.org> writes:
> Indeed you were right in this. But, if I added -D_REENTRANT to the
> Makefile for libpq, it started to set it. If libpq should be thread
> safe, I believe it should be compiled with -D_REENTRANT. 

> When I did this, recv still returns error, but now sets errno to
> EAGAIN, so pqReadData() returns 1, giving the same result as removing
> the if-statement that does the try again thing. 

Okay, so it seems -D_REENTRANT is the appropriate fix.

We could either add that to the template/solaris file, or just add a
note to FAQ_Solaris advising that it be added to the configure switches
if people intend to use libpq in threaded programs.  Is there any
cost or downside to just adding it always in template/solaris?
        regards, tom lane


Re: big text field -> message type 0x44

From
Tomas Berndtsson
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Tomas Berndtsson <tomas@nocrew.org> writes:
> > Indeed you were right in this. But, if I added -D_REENTRANT to the
> > Makefile for libpq, it started to set it. If libpq should be thread
> > safe, I believe it should be compiled with -D_REENTRANT. 
> 
> > When I did this, recv still returns error, but now sets errno to
> > EAGAIN, so pqReadData() returns 1, giving the same result as removing
> > the if-statement that does the try again thing. 
> 
> Okay, so it seems -D_REENTRANT is the appropriate fix.
> 
> We could either add that to the template/solaris file, or just add a
> note to FAQ_Solaris advising that it be added to the configure switches
> if people intend to use libpq in threaded programs.  Is there any
> cost or downside to just adding it always in template/solaris?

Not that I know of. Some data (like errno) is made local for the
thread, so I suppose it takes a little more memory and maybe more disk
space, but other than that I don't think it affects much. But, then
again, I'm not an expert at these things. Someone else might know
more what the real difference is.


Tomas


Re: big text field -> message type 0x44

From
Lee Kindness
Date:
Tom Lane writes:
 > Okay, so it seems -D_REENTRANT is the appropriate fix.
 > 
 > We could either add that to the template/solaris file, or just add a
 > note to FAQ_Solaris advising that it be added to the configure switches
 > if people intend to use libpq in threaded programs.  Is there any
 > cost or downside to just adding it always in template/solaris?

However, _REENTRANT is not a Solarisism... On all (recent) UNIX
systems it toggles on correct handling for thread specific instances
of historically global variables (eg errno). It should be considered
for all platforms if libpq is intended to be used from threaded
programs.

You'll probably find Tomas's code breaks on Linux too...

Lee.


Re: big text field -> message type 0x44

From
Doug McNaught
Date:
Lee Kindness <lkindness@csl.co.uk> writes:

> Tom Lane writes:
>  > Okay, so it seems -D_REENTRANT is the appropriate fix.
>  > 
>  > We could either add that to the template/solaris file, or just add a
>  > note to FAQ_Solaris advising that it be added to the configure switches
>  > if people intend to use libpq in threaded programs.  Is there any
>  > cost or downside to just adding it always in template/solaris?
> 
> However, _REENTRANT is not a Solarisism... On all (recent) UNIX
> systems it toggles on correct handling for thread specific instances
> of historically global variables (eg errno). It should be considered
> for all platforms if libpq is intended to be used from threaded
> programs.

I know libpq is "officially" non-threadsafe, but is there anything in
there that would actually cause a problem, assuming either a
connection per thread or proper locking on the application's part?
Most of the data in the library seems to be per-connection...

-Doug


Re: big text field -> message type 0x44

From
Tomas Berndtsson
Date:
Lee Kindness <lkindness@csl.co.uk> writes:

> Tom Lane writes:
>  > Okay, so it seems -D_REENTRANT is the appropriate fix.
>  > 
>  > We could either add that to the template/solaris file, or just add a
>  > note to FAQ_Solaris advising that it be added to the configure switches
>  > if people intend to use libpq in threaded programs.  Is there any
>  > cost or downside to just adding it always in template/solaris?
> 
> However, _REENTRANT is not a Solarisism... On all (recent) UNIX
> systems it toggles on correct handling for thread specific instances
> of historically global variables (eg errno). It should be considered
> for all platforms if libpq is intended to be used from threaded
> programs.
> 
> You'll probably find Tomas's code breaks on Linux too...

Actually, I've tried it in Linux, and it works there. Might be that
the recv() doesn't return -1 when trying again in Linux. In that case,
for this particular problem, it wouldn't matter if it's reentrant or
not.


Tomas


Re: big text field -> message type 0x44

From
Tomas Berndtsson
Date:
Doug McNaught <doug@mcnaught.org> writes:

> Lee Kindness <lkindness@csl.co.uk> writes:
> 
> > Tom Lane writes:
> >  > Okay, so it seems -D_REENTRANT is the appropriate fix.
> >  > 
> >  > We could either add that to the template/solaris file, or just add a
> >  > note to FAQ_Solaris advising that it be added to the configure switches
> >  > if people intend to use libpq in threaded programs.  Is there any
> >  > cost or downside to just adding it always in template/solaris?
> > 
> > However, _REENTRANT is not a Solarisism... On all (recent) UNIX
> > systems it toggles on correct handling for thread specific instances
> > of historically global variables (eg errno). It should be considered
> > for all platforms if libpq is intended to be used from threaded
> > programs.
> 
> I know libpq is "officially" non-threadsafe, but is there anything in
> there that would actually cause a problem, assuming either a
> connection per thread or proper locking on the application's part?
> Most of the data in the library seems to be per-connection...

The documentation states:

"libpq is thread-safe as of PostgreSQL 7.0, so long as no two threads
attempt to manipulate the same PGconn object at the same time."


Tomas


Re: big text field -> message type 0x44

From
Tom Lane
Date:
Lee Kindness <lkindness@csl.co.uk> writes:
> Tom Lane writes:
>>> Okay, so it seems -D_REENTRANT is the appropriate fix.

> However, _REENTRANT is not a Solarisism... On all (recent) UNIX
> systems it toggles on correct handling for thread specific instances
> of historically global variables (eg errno). It should be considered
> for all platforms if libpq is intended to be used from threaded
> programs.

Now that I think about it, what that macro is probably really doing is
switching the code from looking at a static "errno" variable to looking
at a per-thread variable.  So in fact -D_REENTRANT would be correct if
you intended to link with a thread-aware libc, and wrong if you intended
to link with a non-aware libc.  (Is there such a thing as a non-threaded
implementation of libc on the platforms where -D_REENTRANT does
anything?)  If this analysis is right then I think we should *not*
force _REENTRANT; it will have to be up to users to choose the mechanism
they want to use in their programs.
        regards, tom lane
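
Whether errno is a single static variable or a per-thread one can be checked directly on a given platform. The following is a minimal sketch, assuming POSIX threads are available; the function names are hypothetical. It compares the address of errno as seen by a second thread with the address seen by the caller.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Thread body: record where this thread's errno lives. */
static void *record_errno_address(void *arg)
{
    *(uintptr_t *) arg = (uintptr_t) &errno;
    return NULL;
}

/* Returns 1 if a second thread sees a different errno location than the
 * caller, 0 if both share one location, -1 if the thread could not run. */
static int errno_is_per_thread(void)
{
    pthread_t tid;
    uintptr_t other = 0;

    if (pthread_create(&tid, NULL, record_errno_address, &other) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return other != (uintptr_t) &errno;
}
```

A result of 0 would correspond to the situation described above, where code compiled against a static errno never sees the value that a thread-aware libc stored for the current thread. The compiler and linker options needed to build this vary by platform (for example -pthread with gcc, or -D_REENTRANT as discussed in this thread).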


Re: big text field -> message type 0x44

From
Larry Rosenman
Date:

--On Thursday, December 05, 2002 14:02:04 -0500 Tom Lane 
<tgl@sss.pgh.pa.us> wrote:

> Lee Kindness <lkindness@csl.co.uk> writes:
>> Tom Lane writes:
>>>> Okay, so it seems -D_REENTRANT is the appropriate fix.
>
>> However, _REENTRANT is not a Solarisism... On all (recent) UNIX
>> systems it toggles on correct handling for thread specific instances
>> of historically global variables (eg errno). It should be considered
>> for all platforms if libpq is intended to be used from threaded
>> programs.
>
> Now that I think about it, what that macro is probably really doing is
> switching the code from looking at a static "errno" variable to looking
> at a per-thread variable.  So in fact -D_REENTRANT would be correct if
> you intended to link with a thread-aware libc, and wrong if you intended
> to link with a non-aware libc.  (Is there such a thing as a non-threaded
> implementation of libc on the platforms where -D_REENTRANT does
> anything?)  If this analysis is right then I think we should *not*
> force _REENTRANT; it will have to be up to users to choose the mechanism
> they want to use in their programs.
>
YES.  I believe UnixWare7 has such.  You need -Kthread to get a threaded
version of SOME calls.

If you need more details, ask.


>             regards, tom lane



-- 
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 972-414-9812                 E-Mail: ler@lerctr.org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749





Re: big text field -> message type 0x44

From
Bruce Momjian
Date:
Tom Lane wrote:
> Lee Kindness <lkindness@csl.co.uk> writes:
> > Tom Lane writes:
> >>> Okay, so it seems -D_REENTRANT is the appropriate fix.
> 
> > However, _REENTRANT is not a Solarisism... On all (recent) UNIX
> > systems it toggles on correct handling for thread specific instances
> > of historically global variables (eg errno). It should be considered
> > for all platforms if libpq is intended to be used from threaded
> > programs.
> 
> Now that I think about it, what that macro is probably really doing is
> switching the code from looking at a static "errno" variable to looking
> at a per-thread variable.  So in fact -D_REENTRANT would be correct if
> you intended to link with a thread-aware libc, and wrong if you intended
> to link with a non-aware libc.  (Is there such a thing as a non-threaded
> implementation of libc on the platforms where -D_REENTRANT does
> anything?)  If this analysis is right then I think we should *not*
> force _REENTRANT; it will have to be up to users to choose the mechanism
> they want to use in their programs.

As far as I remember, on some platforms -lpthread does replace some of
the libc functions with thread-safe ones.  That could be quite confusing.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: big text field -> message type 0x44

From
Lee Kindness
Date:
Tom Lane writes:
 > Lee Kindness <lkindness@csl.co.uk> writes:
 > > Tom Lane writes:
 > >>> Okay, so it seems -D_REENTRANT is the appropriate fix.
 > > However, _REENTRANT is not a Solarisism... On all (recent) UNIX
 > > systems it toggles on correct handling for thread specific instances
 > > of historically global variables (eg errno). It should be considered
 > > for all platforms if libpq is intended to be used from threaded
 > > programs.
 > Now that I think about it, what that macro is probably really doing is
 > switching the code from looking at a static "errno" variable to looking
 > at a per-thread variable.  So in fact -D_REENTRANT would be correct if
 > you intended to link with a thread-aware libc, and wrong if you intended
 > to link with a non-aware libc.  (Is there such a thing as a non-threaded
 > implementation of libc on the platforms where -D_REENTRANT does
 > anything?)  If this analysis is right then I think we should *not*
 > force _REENTRANT; it will have to be up to users to choose the mechanism
 > they want to use in their programs.

I think in the long-term the libraries are going to have to be looked
at in detail to ensure they work as would be expected from
multithreaded programs. I cannot see any harm in adding -D_REENTRANT
to CFLAGS even though some platforms supersede it with -lthread or
something (because they still define _REENTRANT behind the scenes).

I remember in the past reading in detail the issues involved with
making shared libraries work as expected from threads. However I
no longer have access to that book, but think it was "Multithreaded
Programming with Pthreads"...

Again, something I'd like to look at later this month. Workwise the
threaded code we had which used embedded SQL calls in C fell into
heaps when moved from Ingres to PostgreSQL. And Ingres's ESQL/C is
real crap for threading and we employed loads of mutexes... So, ...

Lee.