Thread: memory problem again

memory problem again

From
Daniel Kalchev
Date:
Hello,

I have this problem with PostgreSQL 6.5.2:

table timelog199911 has 

logs=> select count(*) from timelog199911;
 count
------
208749
(1 row)


logs=> select distinct confid 
logs-> from timelog199910
logs-> where
logs-> confid IS NOT NULL;
pqReadData() -- backend closed the channel unexpectedly.
        This probably means the backend terminated abnormally
        before or while processing the request.
 
We have lost the connection to the backend, so further processing is 
impossible.  Terminating.

The logged message in stderr (of postmaster) is 

FATAL 1:  Memory exhausted in AllocSetAlloc()

The process size grows to 76 MB (this is somehow a limit of Postgres on 
BSD/OS, but this is not my question now).

Why would it require so much memory? The same query without DISTINCT is 
processed fast, but then I get back far more data than the application needs.
The table's schema is:

Table    = timelog
+----------------------------------+----------------------------------+-------+
|              Field               |              Type                | Length|
+----------------------------------+----------------------------------+-------+
| loginname                        | text                             |   var |
| site                             | varchar()                        |    16 |
| start_time                       | datetime                         |     8 |
| elapsed                          | timespan                         |    12 |
| port                             | text                             |   var |
| valid                            | bool default 't'                 |     1 |
| ipaddress                        | inet                             |   var |
| confid                           | int4                             |     4 |
| session_id                       | text                             |   var |
+----------------------------------+----------------------------------+-------+
Indices:  timelog_loginname_idx         timelog_start_time_idx

(the indexes are btree on the indicated fields).

Weird, isn't it? 

Daniel



Re: [HACKERS] memory problem again

From
Tom Lane
Date:
Daniel Kalchev <daniel@digsys.bg> writes:
> I have this problem with PostgreSQL 6.5.2:

> logs=> select distinct confid 
> logs-> from timelog199910
> logs-> where
> logs-> confid IS NOT NULL;
> pqReadData() -- backend closed the channel unexpectedly.

> The logged message in stderr (of postmaster) is 
> FATAL 1:  Memory exhausted in AllocSetAlloc()

Odd.  I can't replicate this here.  (I'm using 6.5.3, but I doubt that
matters.)  There must be some factor involved that you haven't told us.
You don't have any triggers or rules on the table, do you?

Has anyone else seen anything like this?
        regards, tom lane


Re: [HACKERS] memory problem again

From
Daniel Kalchev
Date:
Tom... this is getting even more weird:

logs=> select distinct confid from timelog199911;
pqReadData() -- backend closed the channel unexpectedly.
[...]

Now this:
logs=> \copy timelog199911 to timelog199911
Successfully copied.
logs=> drop table timelog199911;
DROP
logs=> CREATE TABLE "timelog199911" (
logs->     "loginname" text,
logs->     "site" character varying(16),
logs->     "start_time" datetime,
logs->     "elapsed" timespan,
logs->     "port" text,
logs->     "valid" bool,
logs->     "ipaddress" inet,
logs->     "confid" int4,
logs->     "session_id" text);
 
[...]
CREATE
logs=> CREATE  INDEX "timelog199911_loginname_idx" on "timelog199911" using 
btree ( "loginname" "text_ops" );
CREATE
logs=> \copy timelog199911 from timelog199911
(ok, I know it's smarter to build the index after copying in the data)
Successfully copied.

logs=> select distinct confid from timelog199911;
sbrk: grow failed, return = 12
sbrk: grow failed, return = 12
pqReadData() -- backend closed the channel unexpectedly.
[...]

logs=> select confid from timelog199911;
[...]
(208749 rows)

Weird!

Daniel





(resolution?) Re: [HACKERS] memory problem again

From
Daniel Kalchev
Date:
I found out how to resolve this problem, yet it does not explain why it 
happens anyway!

I had postmaster started with this script:

unlimit
postmaster -D/usr/local/pgsql/data -B 256 -i -o "-e -S 8192" >> \
    /usr/local/pgsql/errlog 2>&1 &

Removing all the extra parameters from the postmaster command line

postmaster -D/usr/local/pgsql/data -i -o "-e" >> /usr/local/pgsql/errlog 2>&1 &

made it work....

Perhaps some memory management problem? I guess the -S option is the culprit 
here, but this machine has 256 MB RAM and actually never swaps (yet).

Hope this helps somehow.

Daniel





Re: (resolution?) Re: [HACKERS] memory problem again

From
Tom Lane
Date:
Daniel Kalchev <daniel@digsys.bg> writes:
> I found out how to resolve this problem, yet it does not explain why it 
> happens anyway!
> I had postmaster started with this script:
> postmaster -D/usr/local/pgsql/data -B 256 -i -o "-e -S 8192" >> 
> /usr/local/pgsql/errlog 2>&1 &
> Removing all the parameters to postmaster
> postmaster -D/usr/local/pgsql/data -i -o "-e" >> /usr/local/pgsql/errlog 2>&1 &
> made it work....
> Perhaps some memory management problem? I guess the -S option is the culprit 
> here, but this machine has 256 MB RAM and actually never swaps (yet).

8192 * 1K = 8 meg workspace per sort sure doesn't sound unreasonable.
There is a sort going on under-the-hood in your SELECT DISTINCT (it's
implemented in the same fashion as "sort | uniq"), but under ordinary
circumstances that doesn't cause any problem.  I can see a couple of
possibilities:

1. You have a very small kernel limit on per-process data space,
   probably 8M or at most 16M.
2. Something is broken in the sort code that makes it fail to
   obey the -S limit.
 
I favor #1, since if #2 were true we'd probably have noticed it before.

You might try experimenting with a couple of different -S values (-B
shouldn't make any difference here, it just affects the size of the
shared-memory-block request), and watching the size of the backend
process with top(1) or something like it.

In the meantime, find out where kernel parameters are set on your
system, and look at what MAXDSIZ is set to...
        regards, tom lane


Re: (resolution?) Re: [HACKERS] memory problem again

From
Daniel Kalchev
Date:
Tom,

I think the #2 is more likely, because:

the kernel is compiled with large enough data size:

# support for larger processes and number of childs
options         "DFLDSIZ=\(128*1024*1024\)"
options         "MAXDSIZ=\(256*1024*1024\)"
options         "CHILD_MAX=256"
options         "OPEN_MAX=256"
options         "KMAPENTRIES=4000" # Prevents kmem malloc errors !
options         "KMEMSIZE=\(32*1024*1024\)"

the default postgres account limits are:

coredumpsize    unlimited
cputime         unlimited
datasize        131072 kbytes
filesize        unlimited
maxproc         256
memorylocked    85380 kbytes
memoryuse       256136 kbytes
openfiles       128
stacksize       2048 kbytes

I run the postmaster after unlimit, which sets limits thus:

coredumpsize    unlimited
cputime         unlimited
datasize        262144 kbytes
filesize        unlimited
maxproc         4116
memorylocked    256140 kbytes
memoryuse       256136 kbytes
openfiles       13196
stacksize       262144 kbytes

I will do some experimentation with the -S flag to see how it works.

BTW, this postgres is compiled with a default of 64 backends - I recently saw a 
note here that this may somehow interfere with the -S option....

(another unrelated bug, but still about memory allocation)
Still - this does not explain why postgres cannot allocate more than 76 MB 
(according to top) on BSD/OS (it never did, actually, in any previous version 
either), while a simple malloc(1 MB) loop allocates up to the process limit.

Maybe at some point postgres tries to allocate a 'larger' chunk, which the 
BSD/OS malloc does not like?

Daniel





Re: (resolution?) Re: [HACKERS] memory problem again

From
Bruce Momjian
Date:
> (another unrelated bug, but still about memory allocation)
> Still - this does not explain why postgres cannot allocate more than 76 MB 
> (according to top) on BSD/OS (it never did, actually, in any previous version 
> either), while a simple malloc(1 MB) loop allocates up to the process limit.
> 
> Maybe at some point postgres tries to allocate a 'larger' chunk, which the 
> BSD/OS malloc does not like?
> 

You can easily put in elog(NOTICE, ...) calls and dump out the allocations to
see what is being requested.  It is also possible the top display is not
accurate in some way.  Do the ps vm flags show this too?

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: (resolution?) Re: [HACKERS] memory problem again

From
Tom Lane
Date:
Daniel Kalchev <daniel@digsys.bg> writes:
> I run the postmaster after unlimit, which sets limits thus:
> datasize        262144 kbytes

Oh well, so much for the small-DSIZ theory.  But I still don't much
care for the other theory (sort ignores -S) because (a) I can't
reproduce any such behavior here, (b) I've been through that code
recently and didn't see anything that looked like it would cause
that behavior, and (c) if it were true then we ought to be seeing
more complaints.

I think there's probably some platform-specific issue that's causing
the misbehavior you see, but I'm at a loss to guess what it is.
Anyone have any ideas?

> BTW, this postgres is compiled with default of 64 backends - I saw recently 
> note here that this may interfere with the -S option somehow....

I must've missed that --- I don't know any reason for number of backends
to interfere with -S, because -S just sets the amount of memory that
any one backend thinks it can expend for local working storage (per
sort or hash node).  Can you recall where/when this discussion was?

> (another not related bug, but still on memory allocation)
> Still - this does not explain why postgres cannot allocated more than
> 76 MB (according to top) on BSD/OS (never did, actually - any previous
> version too), while a simple malloc(1 MB) loop allocates up to the
> process limit.

That does seem odd.  Could it be that the shared memory segment used
by Postgres gets placed at 64M or so in your process's virtual address
space, thus preventing the malloc arena from expanding past that point?
If so, is there anything we can do to force a higher placement?

> Maybe at some time postrges tries to allocate 'larger' chunk, which
> the BSD/OS malloc does not like?

There is some code in aset.c that asks for larger and larger chunks,
but it should fall back to asking for a smaller chunk if it can't
get a bigger one.  More to the point, the sort operation invoked by
SELECT DISTINCT shouldn't ask for more than (roughly) your -S setting.
So I'm still clueless where the problem is :-(
        regards, tom lane