Thread: psql leaking?

psql leaking?

From
"Russ Brown"
Date:
Hello,

Today I tried connecting to my database locally via psql. I got the usual
welcome & basic help messages, but it never got to the prompt: it just
hung. So I checked top and the psql process was increasing in size at
quite a rate (up to a gig in under 30 seconds).

I'd been using psql with no problems only a couple of hours ago, and I
haven't installed anything for at least a couple of days that I can think
of.

The first thing I did was to pg_dumpall a backup and try recompiling the
server. That didn't work, so I tried doing initdb again. No joy. So I
upgraded to 7.4.5 (was 7.4.3 before), and again no joy. Note that it
happens right after a fresh initdb, before any of my data has been
restored: this is effectively a clean compile and install that is
failing at this point.

I've turned logging up full and got this (starting from when I run the
psql command):

2004-09-04 16:43:29 DEBUG:  forked new backend, pid=23315 socket=8
2004-09-04 16:43:29 DEBUG:  /usr/bin/postmaster child[23315]: starting with (
2004-09-04 16:43:29 DEBUG:      postgres
2004-09-04 16:43:29 DEBUG:      -v196608
2004-09-04 16:43:29 DEBUG:      -p
2004-09-04 16:43:29 DEBUG:      template1
2004-09-04 16:43:29 DEBUG:  )
2004-09-04 16:43:29 DEBUG:  InitPostgres
2004-09-04 16:44:53 LOG:  unexpected EOF on client connection
2004-09-04 16:44:53 DEBUG:  proc_exit(0)
2004-09-04 16:44:53 DEBUG:  shmem_exit(0)
2004-09-04 16:44:53 DEBUG:  exit(0)
2004-09-04 16:44:53 DEBUG:  reaping dead processes
2004-09-04 16:44:53 DEBUG:  child process (PID 23315) exited with exit code 0

The LOG line is the point at which I pressed CTRL+C in the client.

I'm suspecting that there's a problem with one of the libraries that psql
uses: I really can't see this being a psql bug as I'd have noticed it
before: there must be something else going on. Trouble is, I don't know
what libraries it uses other than glibc (which I'm currently recompiling
just in case).
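
For illustration, one way to find out which shared libraries a binary such
as psql is linked against is to run ldd on it and look for the line-editing
libraries. The following is only a sketch: the helper names are
hypothetical, and the list of library prefixes is an assumption about what
is worth checking, not an exhaustive list.

```python
import re
import subprocess

# Prefixes of libraries that provide line editing or terminal handling.
# This list is an assumption made for illustration.
LINE_EDITING_PREFIXES = ("libreadline", "libedit", "libtermcap", "libncurses")


def parse_ldd(output):
    """Return {library name: resolved path or None} from ldd-style output."""
    libs = {}
    for line in output.splitlines():
        # Typical line: "libfoo.so.1 => /usr/lib/libfoo.so.1 (0x...)"
        m = re.match(r"\s*(\S+)\s+=>\s+(\S+)", line)
        if m:
            name, path = m.groups()
            # "libfoo.so.1 => not found" means the library did not resolve.
            libs[name] = None if path == "not" else path
    return libs


def line_editing_libs(libs):
    """Pick out the libraries whose names start with a known prefix."""
    return sorted(n for n in libs if n.startswith(LINE_EDITING_PREFIXES))


def linked_libs(binary):
    """Run ldd on a binary (ldd must be on PATH) and parse its output."""
    result = subprocess.run(
        ["ldd", binary], capture_output=True, text=True, check=True
    )
    return parse_ldd(result.stdout)
```

Calling line_editing_libs(linked_libs("/usr/bin/psql")) would then list
whichever of those libraries the binary actually uses (the path is the one
the postmaster log above suggests for this install; adjust as needed).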

I'm at a loss, and can't think of anything else. This has just happened
totally unprompted and without any real clues that I can see.

Anyone else have any ideas?

Thanks.

--

  Russell Brown

Re: psql leaking?

From
"Russ Brown"
Date:
A little more information that I've realised I didn't supply. The server
(well, it's a desktop machine really) is running on Gentoo Linux, 2.6.7
kernel. There are no problems with PHP accessing the database (from the
same machine), and psql doesn't have a problem with either having SQL
files directed at it or running individual commands via the -c option. It
seems to be just when trying to get an interactive prompt. I've tried
recompiling readline but that didn't fix it (my recompile of glibc also
appears to have failed). Each of those recompiles was followed by a
recompile of PostgreSQL. I've run ldconfig and also revdep-rebuild, with
no effect.

I'm convinced that there's a library that I've missed that I need to
recompile but I can't for the life of me think of what it is...

On Sat, 04 Sep 2004 17:05:32 +0100, Russ Brown <postgres@dot4dot.plus.com>
wrote:

> Hello,
>
> Today I tried connecting to my database locally via psql. I got the
> usual welcome & basic help messages, but it never got to the prompt: it
> just hung. So I checked top and the psql process was increasing in size
> at quite a rate (up to a gig in under 30 seconds).
>
> I'd been using psql with no problems only a couple of hours ago, and I
> haven't installed anything for at least a couple of days that I can
> think of.
>
> The first thing I did was to pg_dumpall a backup and try recompiling the
> server. That didn't work, so I tried doing initdb again. No joy. So I
> upgraded to 7.4.5 (was 7.4.3 before), and again no joy. Note that it
> happens just after I've run initdb, so even when my database has gone
> nowhere near it: this is effectively a clean compile and install that is
> failing at this point.
>
> I've turned logging up full and got this (starting from when I run the
> psql command):
>
> 2004-09-04 16:43:29 DEBUG:  forked new backend, pid=23315 socket=8
> 2004-09-04 16:43:29 DEBUG:  /usr/bin/postmaster child[23315]: starting
> with (
> 2004-09-04 16:43:29 DEBUG:      postgres
> 2004-09-04 16:43:29 DEBUG:      -v196608
> 2004-09-04 16:43:29 DEBUG:      -p
> 2004-09-04 16:43:29 DEBUG:      template1
> 2004-09-04 16:43:29 DEBUG:  )
> 2004-09-04 16:43:29 DEBUG:  InitPostgres
> 2004-09-04 16:44:53 LOG:  unexpected EOF on client connection
> 2004-09-04 16:44:53 DEBUG:  proc_exit(0)
> 2004-09-04 16:44:53 DEBUG:  shmem_exit(0)
> 2004-09-04 16:44:53 DEBUG:  exit(0)
> 2004-09-04 16:44:53 DEBUG:  reaping dead processes
> 2004-09-04 16:44:53 DEBUG:  child process (PID 23315) exited with exit
> code 0
>
> The LOG is the point at which I CTRL+C the client.
>
> I'm suspecting that there's a problem with one of the libraries that
> psql uses: I really can't see this being a psql bug as I'd have noticed
> it before: there must be something else going on. Trouble is, I don't
> know what libraries it uses other than glibc (which I'm currently
> recompiling just in case).
>
> I'm at a loss, and can't think of anything else. This has just happened
> totally unprompted and without any real clues that I can see.
>
> Anyone else have any ideas?
>
> Thanks.
>



--

  Russell Brown

Re: psql leaking?

From
"Russ Brown"
Date:
Thanks very much for your reply, Daniel.

I'm not a C programmer, so I don't really know anything about the tools
mentioned. I don't seem to get any core file at all when psql is
terminated (checked in the working directory, home directory and temp
directory): I reckon that's probably due to a system setting that disables
core dumps. I don't know what it might be though.
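
For reference, the setting usually involved is the per-process core file
size limit (RLIMIT_CORE, shown by "ulimit -c" in the shell). A minimal
sketch of checking it, with a hypothetical helper name:

```python
import resource


def core_dumps_enabled():
    """True if the soft RLIMIT_CORE limit allows a non-empty core file."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    # A soft limit of 0 suppresses core files; any other value, including
    # resource.RLIM_INFINITY, permits them.
    return soft != 0
```

If this returns False, "ulimit -c unlimited" in the shell that starts psql
raises the limit for that session. Note also that CTRL+C sends SIGINT,
which by default terminates a process without dumping core; a signal such
as SIGQUIT or SIGABRT is needed to get a core file.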

Anyway, I compiled strace and ltrace. ltrace gave no output, but strace
certainly did, and it helped me find a workaround to the problem (though
not a fix).

From what I can tell (and bearing in mind my lack of knowledge and
experience on this) strace shows that psql opens a number of files, most
of which are libraries. After this it goes into what looks like an
infinite loop with plenty of commands exactly like this:

read(4, "", 131072)                     = 0

Every now and then there are a pair of lines like this (but not exactly
the same values in each case):

brk(0)                                  = 0x808e000
brk(0x80af000)                          = 0x80af000

So I looked at the last sane instructions before the loop, and it's
reading the history file:

open("/home/rbrown/.psql_history", O_RDONLY) = 4
fstat64(4, {st_mode=S_IFREG|0600, st_size=384, ...}) = 0
mmap2(NULL, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x403ad000
read(4, "_HiStOrY_V2_\nSET\\040search_path="..., 131072) = 384
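
A read() that returns 0 means end of file, so a long run of such calls on
one descriptor, together with the heap growing via brk(), is consistent
with a loop that never treats end of file as a stopping condition. The
sketch below shows how trace output like the above could be checked
mechanically. The function name and the regular expressions are
assumptions made for illustration and only cover the open() and read()
line shapes shown in this message.

```python
import re
from collections import Counter

# Successful open(): capture the path and the returned descriptor.
OPEN_RE = re.compile(r'^open\("([^"]+)",.*\)\s+=\s+(\d+)$')
# read(): capture the descriptor and the return value.
READ_RE = re.compile(r'^read\((\d+),.*\)\s+=\s+(-?\d+)')


def eof_reads(strace_lines):
    """Count read() calls that returned 0, keyed by the file they hit."""
    fd_to_path = {}
    counts = Counter()
    for line in strace_lines:
        line = line.strip()
        m = OPEN_RE.match(line)
        if m:
            fd_to_path[int(m.group(2))] = m.group(1)
            continue
        m = READ_RE.match(line)
        if m and int(m.group(2)) == 0:
            fd = int(m.group(1))
            # Fall back to the bare descriptor if no open() was seen.
            counts[fd_to_path.get(fd, f"fd {fd}")] += 1
    return counts
```

Fed the lines from this trace, it attributes every zero-length read to
/home/rbrown/.psql_history, which is what points at the history file.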

So I moved the file away and I was able to log into psql! Result! And
for the acid test, I quit and connected again. No go, same problem. So I
moved away the file again and it's fine.

So, the problem is the history file, and since I can move the file away
and have exactly the same problem happen again, I'm starting to wonder if
it's not a psql bug. But surely something like this would have happened
before? There must be something on my system that is prompting the unusual
behaviour from psql.

If anyone has any ideas I'll be happy to try them, but in the meantime I
have a workaround so I can get on with things.

Oh, and here's the content of the history file as written by just opening
and closing psql (obtained from cat):

_HiStOrY_V2_
\134l
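
The backslash sequences in that file look like three-digit octal character
codes: in ASCII, \040 is a space and \134 is a backslash, so "\134l" would
be the "\l" command typed at the prompt. A small sketch of decoding them,
assuming that is the only escape form in use (the function name is
hypothetical):

```python
import re


def decode_history_line(line):
    """Replace backslash + three octal digits with the character it encodes."""
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), line)
```

For example, decode_history_line(r"SET\040search_path=") gives
"SET search_path=".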

Thanks again Daniel.

On Sat, 04 Sep 2004 23:03:32 +0200, Daniel Verite
<daniel@manitou-mail.org> wrote:

>      Russ Brown writes
>
>> > I'm suspecting that there's a problem with one of the libraries that
>> > psql uses: I really can't see this being a psql bug as I'd have
>> noticed
>> > it before: there must be something else going on. Trouble is, I don't
>> > know what libraries it uses other than glibc (which I'm currently
>> > recompiling just in case).
>
> libreadline, libtermcap?
>
>> >
>> > I'm at a loss, and can't think of anything else. This has just
>> happened
>> > totally unprompted and without any real clues that I can see.
>
> Some suggestions:
>
> - kill psql when it's errating and examine the stack of the resulting
> core file
> with gdb
>
> - launch psql with strace and see if the output gives any clue.
>
> - ditto with ltrace
>
> Regards,
>



--

  Russell Brown

Re: psql leaking? - SOLVED

From
"Russ Brown"
Date:
Fixed it. On a whim I googled for '_HiStOrY_V2_', which I thought looked a
bit funny. It turned up source files for libedit, which from what I can
tell is supposed to be a readline replacement.

Sure enough, I had that library installed, so I uninstalled it and psql
refused to run at all, as I'd expected. So I recompiled PostgreSQL and now
everything works. It seems that the PostgreSQL build process picked up and
used libedit instead of readline: I don't know if that's a core issue or a
Gentoo ebuild issue...

Anyway, problem solved, and the moral of the story is libedit don't work
with psql. :-)
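
In case it helps anyone else spot the same thing: the "_HiStOrY_V2_" first
line is the marker that gave libedit away. A minimal sketch of checking a
history file for it follows. The function names and return labels are
hypothetical, and a file without the marker is only assumed to come from
readline or something else.

```python
LIBEDIT_COOKIE = "_HiStOrY_V2_"


def writer_from_first_line(first_line):
    """Classify a history file by its first line."""
    if first_line.rstrip("\n") == LIBEDIT_COOKIE:
        return "libedit"
    return "readline-or-other"


def history_writer(path):
    """Guess which library wrote the history file at the given path."""
    try:
        with open(path, "r", errors="replace") as f:
            return writer_from_first_line(f.readline())
    except FileNotFoundError:
        return "missing"
```

Running history_writer on ~/.psql_history (with the path expanded, for
example via os.path.expanduser) after a psql session shows which library
that psql build wrote the file with.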

Thanks again Daniel.

On Sat, 04 Sep 2004 22:35:10 +0100, Russ Brown <postgres@dot4dot.plus.com>
wrote:

> Thanks very much for your reply, Daniel.
>
> I'm not a C programmer, so I don't really know anything about the tools
> mentioned. I don't seem to get any core file at all when psql is
> terminated (checked in the working directory, home directory and temp
> directory): I reckon that's probably due to a system setting that
> disables core dumps. I don't know what it might be though.
>
> Anyway, I compiled strace and ltrace. ltrace gave no output, but strace
> certainly did, and it helped me find a workaround to the problem (though
> not a fix).
>
>  From what I can tell (and bearing in mind my lack of knowlege and
> experience on this) strace shows that psql opens a number of files, most
> of which are libraries. After this it goes into what looks like an
> infinite loop with plenty of commands exactly like this:
>
> read(4, "", 131072)                     = 0
>
> Every now and then there are a pair of lines like this (but not exactly
> the same values in each case):
>
> brk(0)                                  = 0x808e000
> brk(0x80af000)                          = 0x80af000
>
> So I looked at the last sane instructions before the loop, and its'
> reading the history file:
>
> open("/home/rbrown/.psql_history", O_RDONLY) = 4
> fstat64(4, {st_mode=S_IFREG|0600, st_size=384, ...}) = 0
> mmap2(NULL, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
> 0) = 0x403ad000
> read(4, "_HiStOrY_V2_\nSET\\040search_path="..., 131072) = 384
>
> So I moved the file away and it I was able to log into psql! Result! And
> for the acid test, I quit and connected again. No go, same problem. So I
> moved away the file again and it's fine.
>
> So, the problem is the history file, and since I can move the file away
> and have exactly the same problem happen again, I'm starting to wonder
> if it's not a psql bug. But surely something like this would have
> happened before? There must be something on my system that is prompting
> the unusual behaviour from psql.
>
> If anyone has any ideas I'll be happy to try them, but in the meantime I
> have a workaround so I can get on with things.
>
> Oh, and here's the content of the history file as written by just
> opening and closing psql (obtained from cat):
>
> _HiStOrY_V2_
> \134l
>
> Thanks again Daniel.
>
> On Sat, 04 Sep 2004 23:03:32 +0200, Daniel Verite
> <daniel@manitou-mail.org> wrote:
>
>>      Russ Brown writes
>>
>>> > I'm suspecting that there's a problem with one of the libraries that
>>> > psql uses: I really can't see this being a psql bug as I'd have
>>> noticed
>>> > it before: there must be something else going on. Trouble is, I don't
>>> > know what libraries it uses other than glibc (which I'm currently
>>> > recompiling just in case).
>>
>> libreadline, libtermcap?
>>
>>> >
>>> > I'm at a loss, and can't think of anything else. This has just
>>> happened
>>> > totally unprompted and without any real clues that I can see.
>>
>> Some suggestions:
>>
>> - kill psql when it's errating and examine the stack of the resulting
>> core file
>> with gdb
>>
>> - launch psql with strace and see if the output gives any clue.
>>
>> - ditto with ltrace
>>
>> Regards,
>>
>
>
>



--

  Russell Brown

Re: psql leaking?

From
Tom Lane
Date:
"Russ Brown" <postgres@dot4dot.plus.com> writes:
> A little more information that I've realised I didn't supply. The server
> (well, it's a desktop machine really) is running on Gentoo Linux, 2.6.7
> kernel. There are no problems with PHP accessing the database (from the
> same machine), and psql doesn't have a problem with either having SQL
> files directed at it or running individual commands via the -c option. It
> seems to be just when trying to get an interactive prompt. I've tried
> recompiling readline but that didn't fix it (my recompile of glic appears
> to have not succeeded too). Each of those recompiles was followed by a
> recompile of PostgreSQL. I've run ldconfig and also revdep-rebuild, with
> no effect.

Gentoo?  Have you perhaps updated any library sources recently?  If it's
an actual recently-introduced bug then no amount of recompiling is going
to make it go away.

The interactive-only aspect does seem to point the finger at readline,
but I wouldn't swear to that.  You might try recompiling --without-readline
just to see.

Another approach is to make a debug build and then use gdb to get an
idea of where it's looping.

            regards, tom lane