Thread: memory problem again
Hello,

I have this problem with PostgreSQL 6.5.2: table timelog199911 has

logs=> select count(*) from timelog199911;
 count
------
208749
(1 row)

logs=> select distinct confid
logs-> from timelog199910
logs-> where
logs-> confid IS NOT NULL;
pqReadData() -- backend closed the channel unexpectedly.
        This probably means the backend terminated abnormally
        before or while processing the request.
We have lost the connection to the backend, so further processing is
impossible.  Terminating.

The logged message in stderr (of postmaster) is

FATAL 1:  Memory exhausted in AllocSetAlloc()

The process size grows to 76 MB (this is somehow a limit of Postgres on
BSD/OS, but that is not my question now). Why would it require so much
memory? The same query without distinct is processed fast, but I don't
need that much data back in the application.

The format is:

Table    = timelog
+----------------------------------+----------------------------------+-------+
|              Field               |              Type                | Length|
+----------------------------------+----------------------------------+-------+
| loginname                        | text                             |   var |
| site                             | varchar()                        |    16 |
| start_time                       | datetime                         |     8 |
| elapsed                          | timespan                         |    12 |
| port                             | text                             |   var |
| valid                            | bool default 't'                 |     1 |
| ipaddress                        | inet                             |   var |
| confid                           | int4                             |     4 |
| session_id                       | text                             |   var |
+----------------------------------+----------------------------------+-------+
Indices:  timelog_loginname_idx
          timelog_start_time_idx

(both indexes are btree on the indicated fields)

Weird, isn't it?

Daniel
Daniel Kalchev <daniel@digsys.bg> writes:
> I have this problem with PostgreSQL 6.5.2:

> logs=> select distinct confid
> logs-> from timelog199910
> logs-> where
> logs-> confid IS NOT NULL;
> pqReadData() -- backend closed the channel unexpectedly.

> The logged message in stderr (of postmaster) is
> FATAL 1:  Memory exhausted in AllocSetAlloc()

Odd.  I can't replicate this here.  (I'm using 6.5.3, but I doubt that
matters.)  There must be some factor involved that you haven't told us.
You don't have any triggers or rules on the table, do you?

Has anyone else seen anything like this?

			regards, tom lane
Tom... this is getting even more weird:

logs=> select distinct confid from timelog199911;
pqReadData() -- backend closed the channel unexpectedly.
[...]

Now this:

logs=> \copy timelog199911 to timelog199911
Successfully copied.
logs=> drop table timelog199911;
DROP
logs=> CREATE TABLE "timelog199911" (
        "loginname" text,
        "site" character varying(16),
        "start_time" datetime,
        "elapsed" timespan,
        "port" text,
        "valid" bool,
        "ipaddress" inet,
        "confid" int4,
        "session_id" text);
[...]
CREATE
logs=> CREATE INDEX "timelog199911_loginname_idx" on "timelog199911"
       using btree ( "loginname" "text_ops" );
CREATE
logs=> \copy timelog199911 from timelog199911
(ok, I know it's smarter to build the index after copying in the data)
Successfully copied.
logs=> select distinct confid from timelog199911;
sbrk: grow failed, return = 12
sbrk: grow failed, return = 12
pqReadData() -- backend closed the channel unexpectedly.
[...]
logs=> select confid from timelog199911;
[...]
(208749 rows)

Weird!

Daniel
I found out how to resolve this problem, yet it does not explain why it
happens anyway!

I had postmaster started with this script:

unlimit
postmaster -D/usr/local/pgsql/data -B 256 -i -o "-e -S 8192" >> /usr/local/pgsql/errlog 2>&1 &

Removing all the parameters to postmaster

postmaster -D/usr/local/pgsql/data -i -o "-e" >> /usr/local/pgsql/errlog 2>&1 &

made it work....

Perhaps some memory management problem? I guess the -S option is the culprit
here, but this machine has 256 MB RAM and actually never swaps (yet).

Hope this helps somehow.

Daniel
Daniel Kalchev <daniel@digsys.bg> writes:
> I found out how to resolve this problem, yet it does not explain why it
> happens anyway!
> I had postmaster started with this script:
> postmaster -D/usr/local/pgsql/data -B 256 -i -o "-e -S 8192" >>
> /usr/local/pgsql/errlog 2>&1 &
> Removing all the parameters to postmaster
> postmaster -D/usr/local/pgsql/data -i -o "-e" >> /usr/local/pgsql/errlog 2>&1 &
> made it work....
> Perhaps some memory management problem? I guess the -S option is the culprit
> here, but this machine has 256 MB RAM and actually never swaps (yet).

8192 * 1K = 8 meg workspace per sort sure doesn't sound unreasonable.
There is a sort going on under-the-hood in your SELECT DISTINCT (it's
implemented in the same fashion as "sort | uniq"), but under ordinary
circumstances that doesn't cause any problem.  I can see a couple of
possibilities:

1. You have a very small kernel limit on per-process data space,
   probably 8M or at most 16M.
2. Something is broken in the sort code that makes it fail to
   obey the -S limit.

I favor #1, since if #2 were true we'd probably have noticed it before.

You might try experimenting with a couple of different -S values (-B
shouldn't make any difference here, it just affects the size of the
shared-memory-block request), and watching the size of the backend
process with top(1) or something like it.

In the meantime, find out where kernel parameters are set on your
system, and look at what MAXDSIZ is set to...

			regards, tom lane
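(A standalone illustration of the "sort | uniq" strategy Tom describes - this
is only a sketch in plain C, not the backend's sort code. The point is that
every input value gets sorted first, which is where the -S sort workspace is
consumed; duplicates are dropped only afterwards, by comparing adjacent values.)

/* Illustration only -- not PostgreSQL source.  A SELECT DISTINCT done
 * "like sort | uniq" first sorts all the input values and then emits a
 * value only when it differs from its predecessor. */
#include <stdio.h>
#include <stdlib.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;
    return (x > y) - (x < y);
}

int main(void)
{
    int vals[] = {42, 7, 42, 7, 7, 13, 42};   /* stand-in for the confid column */
    size_t n = sizeof(vals) / sizeof(vals[0]);
    size_t i;

    qsort(vals, n, sizeof(int), cmp_int);     /* the "sort" step: touches every row */

    for (i = 0; i < n; i++)                   /* the "uniq" step: drop adjacent dups */
        if (i == 0 || vals[i] != vals[i - 1])
            printf("%d\n", vals[i]);

    return 0;
}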
Tom, I think #2 is more likely, because:

the kernel is compiled with large enough data size:

# support for larger processes and number of childs
options         "DFLDSIZ=\(128*1024*1024\)"
options         "MAXDSIZ=\(256*1024*1024\)"
options         "CHILD_MAX=256"
options         "OPEN_MAX=256"
options         "KMAPENTRIES=4000"
# Prevents kmem malloc errors !
options         "KMEMSIZE=\(32*1024*1024\)"

the default postgres account limits are:

coredumpsize    unlimited
cputime         unlimited
datasize        131072 kbytes
filesize        unlimited
maxproc         256
memorylocked    85380 kbytes
memoryuse       256136 kbytes
openfiles       128
stacksize       2048 kbytes

I run the postmaster after unlimit, which sets limits thus:

coredumpsize    unlimited
cputime         unlimited
datasize        262144 kbytes
filesize        unlimited
maxproc         4116
memorylocked    256140 kbytes
memoryuse       256136 kbytes
openfiles       13196
stacksize       262144 kbytes

I will do some experimentation with the -S flag to see how it works.

BTW, this postgres is compiled with the default of 64 backends - I recently
saw a note here that this may interfere with the -S option somehow....

(another unrelated bug, but still on memory allocation)
Still - this does not explain why postgres cannot allocate more than 76 MB
(according to top) on BSD/OS (never did, actually - any previous version too),
while a simple malloc(1 MB) loop allocates up to the process limit.

Maybe at some point postgres tries to allocate a 'larger' chunk, which the
BSD/OS malloc does not like?

Daniel
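(For reference, the "simple malloc(1 MB) loop" test Daniel mentions can be
reconstructed as a standalone program along these lines - a sketch only; run
it under the same login class/limits as the postgres account so the numbers
are comparable to what the backend sees.)

/* Reconstruction of the described test: keep allocating (and touching)
 * 1 MB chunks until malloc() fails, then report how far we got.  Run it
 * under the resource limits you want to probe, since it will consume
 * memory up to whatever the process is allowed. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    size_t chunk = 1024 * 1024;
    size_t total = 0;
    void *p;

    while ((p = malloc(chunk)) != NULL)
    {
        memset(p, 0xAA, chunk);   /* touch the pages so they are really committed */
        total += chunk;
    }
    printf("allocated %lu MB before malloc() failed\n",
           (unsigned long) (total / (1024 * 1024)));
    return 0;
}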
> (another unrelated bug, but still on memory allocation)
> Still - this does not explain why postgres cannot allocate more than 76 MB
> (according to top) on BSD/OS (never did, actually - any previous version too),
> while a simple malloc(1 MB) loop allocates up to the process limit.
>
> Maybe at some point postgres tries to allocate a 'larger' chunk, which the
> BSD/OS malloc does not like?

You can easily put in elog(NOTICE...) and dump out the allocations to see
what is being requested.  It is also possible the top display is not accurate
in some way.  Does ps show this in the vm flags too?

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
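(A minimal standalone sketch of what Bruce suggests - logging each allocation
request so oversized ones stand out in the log. The log_malloc() wrapper here
is hypothetical, invented for this illustration; inside the backend the
equivalent would be an elog(NOTICE, ...) added where AllocSetAlloc() obtains
its memory.)

/* Illustration of the idea: wrap the allocation call and log any request
 * over a threshold, so a runaway or unexpectedly large request is visible.
 * log_malloc() is a made-up helper for this sketch, not a PostgreSQL
 * function. */
#include <stdio.h>
#include <stdlib.h>

static void *log_malloc(size_t size)
{
    if (size >= 64 * 1024)                       /* threshold is arbitrary */
        fprintf(stderr, "NOTICE: allocation request of %lu bytes\n",
                (unsigned long) size);
    return malloc(size);
}

int main(void)
{
    void *small = log_malloc(512);               /* silent */
    void *big = log_malloc(8 * 1024 * 1024);     /* logged: 8 MB request */

    free(small);
    free(big);
    return 0;
}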
Daniel Kalchev <daniel@digsys.bg> writes:
> I run the postmaster after unlimit, which sets limits thus:
> datasize        262144 kbytes

Oh well, so much for the small-DSIZ theory.  But I still don't much care
for the other theory (sort ignores -S) because (a) I can't reproduce any
such behavior here, (b) I've been through that code recently and didn't
see anything that looked like it would cause that behavior, and (c) if
it were true then we ought to be seeing more complaints.

I think there's probably some platform-specific issue that's causing the
misbehavior you see, but I'm at a loss to guess what it is.  Anyone have
any ideas?

> BTW, this postgres is compiled with the default of 64 backends - I recently
> saw a note here that this may interfere with the -S option somehow....

I must've missed that --- I don't know any reason for number of backends
to interfere with -S, because -S just sets the amount of memory that any
one backend thinks it can expend for local working storage (per sort or
hash node).  Can you recall where/when this discussion was?

> (another unrelated bug, but still on memory allocation)
> Still - this does not explain why postgres cannot allocate more than
> 76 MB (according to top) on BSD/OS (never did, actually - any previous
> version too), while a simple malloc(1 MB) loop allocates up to the
> process limit.

That does seem odd.  Could it be that the shared memory segment used by
Postgres gets placed at 64M or so in your process's virtual address
space, thus preventing the malloc arena from expanding past that point?
If so, is there anything we can do to force a higher placement?

> Maybe at some point postgres tries to allocate a 'larger' chunk, which
> the BSD/OS malloc does not like?

There is some code in aset.c that asks for larger and larger chunks, but
it should fall back to asking for a smaller chunk if it can't get a
bigger one.  More to the point, the sort operation invoked by SELECT
DISTINCT shouldn't ask for more than (roughly) your -S setting.  So I'm
still clueless where the problem is :-(

			regards, tom lane
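(The growth-with-fallback behavior Tom attributes to aset.c can be sketched
in standalone C like this - purely illustrative, not the 6.5 source: each new
block request doubles the previous one, and if malloc() refuses, the request
shrinks back toward the minimum actually needed instead of failing outright.)

/* Sketch of the strategy described above: try progressively larger
 * blocks, back off on allocation failure.  Not PostgreSQL code. */
#include <stdio.h>
#include <stdlib.h>

static void *alloc_block(size_t *blocksize, size_t needed)
{
    size_t request = *blocksize * 2;        /* try a bigger block each time */
    void *p;

    if (request < needed)
        request = needed;

    for (;;)
    {
        p = malloc(request);
        if (p != NULL || request == needed)
            break;                          /* success, or nothing left to shrink */
        request /= 2;                       /* back off toward the minimum */
        if (request < needed)
            request = needed;
    }

    if (p != NULL)
        *blocksize = request;
    return p;
}

int main(void)
{
    size_t blocksize = 8 * 1024;            /* current block size (arbitrary start) */
    void *blk = alloc_block(&blocksize, 1024);

    if (blk != NULL)
        printf("got a block of %lu bytes\n", (unsigned long) blocksize);
    free(blk);
    return 0;
}

Note that under this scheme a single allocation never needs to exceed the
caller's actual request, which is why a sort bounded by -S should not, by
itself, push the backend toward the 76 MB ceiling discussed above.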