Thread: 8.4devel out of memory

8.4devel out of memory

From
"Kevin Grittner"
Date:
I was testing a very complex statistical query, with (among other
things) many EXISTS and NOT EXISTS tests against a build of the source
snapshot from 3 September.  (The query looks pretty innocent, but
those aren't tables, they're complicated views.)  Under 8.3.3 this
query runs successfully, but takes a few hours.  I started it last
night before leaving, on the same machine where 8.3.3 has been
running, and in the morning found this:

olr=# explain analyze
SELECT
      "MS"."sMatterNo",
      CAST(COUNT(*) AS int) AS "count"
  FROM
   "MatterSearch" "MS"
   JOIN "MatterDateStat" "S" ON
     (
      "S"."matterNo" = "MS"."sMatterNo" AND
       "S"."isOnHold" = FALSE
   )
  WHERE
  (
   "MS"."matterStatusCode" IN ('OP', 'RO')
  )
  GROUP BY "MS"."sMatterNo"
;
ERROR:  out of memory
DETAIL:  Failed on request of size 8.

It was running for about half an hour before I left, and I didn't
notice the error, so I'm pretty sure it took longer than that for this
error to appear.

kgrittn@OLR-DEV-PG:~> df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2              20G  8.0G   11G  43% /
tmpfs                 2.0G   16K  2.0G   1% /dev/shm
/dev/sda3             253G  7.9G  245G   4% /var/pgsql/data
kgrittn@OLR-DEV-PG:~> free -m
             total       used       free     shared    buffers
cached
Mem:          4049       2239       1809          0         94
1083
-/+ buffers/cache:       1061       2987
Swap:         1027        561        466

There are several development databases on this machine, all fairly
small, but enough that there's usually no significant free memory --
it gets used as cache.  The 1.8 GB free this morning suggests that
something allocated and free a lot of memory.

kgrittn@OLR-DEV-PG:~/postgresql-snapshot> uname -a
Linux OLR-DEV-PG 2.6.5-7.286-bigsmp #1 SMP Thu May 31 10:12:58 UTC 2007
i686 i686 i386 GNU/Linux
kgrittn@OLR-DEV-PG:~/postgresql-snapshot> cat /proc/version
Linux version 2.6.5-7.286-bigsmp (geeko@buildhost) (gcc version 3.3.3
(SuSE Linux)) #1 SMP Thu May 31 10:12:58 UTC 2007
kgrittn@OLR-DEV-PG:~/postgresql-snapshot> cat /etc/SuSE-release
SUSE LINUX Enterprise Server 9 (i586)
VERSION = 9
PATCHLEVEL = 3

Attached are the plans from 8.3.3 and 8.4devel.  Also attached are the
non-default 8.3.3 postgresql.conf settings; the file is the same for
8.4devel except for the port number.  I don't know if the specifics of
the views and tables would be useful here, or just noise, so I'll omit
them unless someone asks for them.

What would be the reasonable next step here?

-Kevin

kgrittn@OLR-DEV-PG:~> /usr/local/pgsql-8.4dev/bin/pg_config
BINDIR = /usr/local/pgsql-8.4dev/bin
DOCDIR = /usr/local/pgsql-8.4dev/share/doc
HTMLDIR = /usr/local/pgsql-8.4dev/share/doc
INCLUDEDIR = /usr/local/pgsql-8.4dev/include
PKGINCLUDEDIR = /usr/local/pgsql-8.4dev/include
INCLUDEDIR-SERVER = /usr/local/pgsql-8.4dev/include/server
LIBDIR = /usr/local/pgsql-8.4dev/lib
PKGLIBDIR = /usr/local/pgsql-8.4dev/lib
LOCALEDIR = /usr/local/pgsql-8.4dev/share/locale
MANDIR = /usr/local/pgsql-8.4dev/share/man
SHAREDIR = /usr/local/pgsql-8.4dev/share
SYSCONFDIR = /usr/local/pgsql-8.4dev/etc
PGXS = /usr/local/pgsql-8.4dev/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--prefix=/usr/local/pgsql-8.4dev'
'--enable-integer-datetimes' '--enable-debug' '--disable-nls'
CC = gcc
CPPFLAGS = -D_GNU_SOURCE
CFLAGS = -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wendif-labels
-fno-strict-aliasing -g
CFLAGS_SL = -fpic
LDFLAGS = -Wl,-rpath,'/usr/local/pgsql-8.4dev/lib'
LDFLAGS_SL =
LIBS = -lpgport -lz -lreadline -lcrypt -ldl -lm
VERSION = PostgreSQL 8.4devel
kgrittn@OLR-DEV-PG:~> /usr/local/pgsql-8.4dev/bin/pg_controldata
/var/pgsql/data/kgrittn
pg_control version number:            842
Catalog version number:               200808311
Database system identifier:           5242286260647024629
Database cluster state:               in production
pg_control last modified:             Thu 04 Sep 2008 05:17:28 PM CDT
Latest checkpoint location:           0/26E7A718
Prior checkpoint location:            0/26E7A6D4
Latest checkpoint's REDO location:    0/26E7A718
Latest checkpoint's TimeLineID:       1
Latest checkpoint's NextXID:          0/3561
Latest checkpoint's NextOID:          49152
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Time of latest checkpoint:            Thu 04 Sep 2008 05:17:28 PM CDT
Minimum recovery ending location:     0/0
Maximum data alignment:               4
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        2000
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by reference
Maximum length of locale name:        128
LC_COLLATE:                           C
LC_CTYPE:                             C


Attachment

Re: 8.4devel out of memory

From
"Kevin Grittner"
Date:
>>> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:

> ERROR:  out of memory
> DETAIL:  Failed on request of size 8.

> What would be the reasonable next step here?

I bet the log would be of interest.  :-)

-Kevin

Attachment

Re: 8.4devel out of memory

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>     PortalHeapMemory: 1620992 total in 200 blocks; 5856 free (8 chunks); 1615136 used
>       ExecutorState: 2787288448 total in 364 blocks; 328 free (5 chunks); 2787288120 used

Ouch.  We have created a memory leak someplace, but it's hard to tell
where from this info.  Can you boil this down into a self-contained test
case?  I doubt it depends at all on the specific data, so the table &
view definitions plus some dummy data would probably be enough to
reproduce it.

Is this a 32-bit or 64-bit build?  Also, does the leak still occur if
you just run the query as-is rather than EXPLAIN ANALYZE it?
        regards, tom lane


Re: 8.4devel out of memory

From
"Kevin Grittner"
Date:
>>> Tom Lane <tgl@sss.pgh.pa.us> wrote: 
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>>     PortalHeapMemory: 1620992 total in 200 blocks; 5856 free (8
chunks); 
> 1615136 used
>>       ExecutorState: 2787288448 total in 364 blocks; 328 free (5
chunks); 
> 2787288120 used
> 
> Ouch.  We have created a memory leak someplace, but it's hard to
tell
> where from this info.  Can you boil this down into a self-contained
test
> case?  I doubt it depends at all on the specific data, so the table
&
> view definitions plus some dummy data would probably be enough to
> reproduce it.
The tables and views aren't that hard; finding a way to generate
enough fake data may be a challenge.  (I assume that since it took
over a half hour to run out of memory, the volume of data needs to be
sufficient to get there.)
> Is this a 32-bit or 64-bit build?
32-bit PostgreSQL on 32-bit Linux.
> Also, does the leak still occur if
> you just run the query as-is rather than EXPLAIN ANALYZE it?
I will find out.
-Kevin


Re: 8.4devel out of memory

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> The tables and views aren't that hard; finding a way to generate
> enough fake data may be a challenge.  (I assume that since it took
> over a half hour to run out of memory, the volume of data needs to be
> sufficient to get there.)

We don't really need 2GB of leakage to find the problem ... a query
that generates a couple hundred meg of bloat would be plenty.

Since we don't know how much space the query would have needed to run to
completion, it's likely that something involving much less than a tenth
as much data would still be enough to make the leak obvious.
        regards, tom lane


Re: 8.4devel out of memory

From
"Kevin Grittner"
Date:
>>> Tom Lane <tgl@sss.pgh.pa.us> wrote: 
> Also, does the leak still occur if
> you just run the query as-is rather than EXPLAIN ANALYZE it?
The machine became unresponsive similar to the early symptoms of the
apparent memory leak cited in this post:
http://archives.postgresql.org/pgsql-bugs/2008-07/msg00105.php
It was impossible to kill the offending process, since the connections
were unresponsive; we didn't wait for the OOM killer, but rebooted the
machine.
-Kevin


Re: 8.4devel out of memory

From
"Kevin Grittner"
Date:
>>> Tom Lane <tgl@sss.pgh.pa.us> wrote: 
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> The tables and views aren't that hard; finding a way to generate
>> enough fake data may be a challenge.  (I assume that since it took
>> over a half hour to run out of memory, the volume of data needs to
be
>> sufficient to get there.)
> 
> We don't really need 2GB of leakage to find the problem ... a query
> that generates a couple hundred meg of bloat would be plenty.
> 
> Since we don't know how much space the query would have needed to run
to
> completion, it's likely that something involving much less than a
tenth
> as much data would still be enough to make the leak obvious.
I've got the database down to the minimum database objects needed for
the query (three tables, five views on them, and a bunch of domains),
and gotten those tables down to 10% of their original size, with all
confidential data turned to meaningless literals.  The problem still
happens on today's snapshot.
kgrittn@OLR-DEV-PG:~> cat /proc/version
Linux version 2.6.5-7.286-bigsmp (geeko@buildhost) (gcc version 3.3.3
(SuSE Linux)) #1 SMP Thu May 31 10:12:58 UTC 2007
kgrittn@OLR-DEV-PG:~> cat /etc/SuSE-release
SUSE LINUX Enterprise Server 9 (i586)
VERSION = 9
PATCHLEVEL = 3
kgrittn@OLR-DEV-PG:~> /usr/local/pgsql-8.4dev-20081008/bin/pg_config
BINDIR = /usr/local/pgsql-8.4dev-20081008/bin
DOCDIR = /usr/local/pgsql-8.4dev-20081008/share/doc
HTMLDIR = /usr/local/pgsql-8.4dev-20081008/share/doc
INCLUDEDIR = /usr/local/pgsql-8.4dev-20081008/include
PKGINCLUDEDIR = /usr/local/pgsql-8.4dev-20081008/include
INCLUDEDIR-SERVER = /usr/local/pgsql-8.4dev-20081008/include/server
LIBDIR = /usr/local/pgsql-8.4dev-20081008/lib
PKGLIBDIR = /usr/local/pgsql-8.4dev-20081008/lib
LOCALEDIR = /usr/local/pgsql-8.4dev-20081008/share/locale
MANDIR = /usr/local/pgsql-8.4dev-20081008/share/man
SHAREDIR = /usr/local/pgsql-8.4dev-20081008/share
SYSCONFDIR = /usr/local/pgsql-8.4dev-20081008/etc
PGXS = /usr/local/pgsql-8.4dev-20081008/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--prefix=/usr/local/pgsql-8.4dev-20081008'
'--enable-integer-datetimes' '--enable-debug' '--disable-nls'
CC = gcc
CPPFLAGS = -D_GNU_SOURCE
CFLAGS = -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wendif-labels
-fno-strict-aliasing -g
CFLAGS_SL = -fpic
LDFLAGS = -Wl,-rpath,'/usr/local/pgsql-8.4dev-20081008/lib'
LDFLAGS_SL =
LIBS = -lpgport -lz -lreadline -lcrypt -ldl -lm
VERSION = PostgreSQL 8.4devel
listen_addresses = '*'
port = 5666
max_connections = 200
shared_buffers = 256MB
temp_buffers = 10MB
max_prepared_transactions = 0
work_mem = 16MB
maintenance_work_mem = 400MB
bgwriter_lru_maxpages = 1000
bgwriter_lru_multiplier = 4.0
wal_buffers = 256kB
checkpoint_segments = 50
seq_page_cost = 0.1
random_page_cost = 0.1
effective_cache_size = 3GB
geqo = off
default_statistics_target = 100
from_collapse_limit = 20
join_collapse_limit = 20
logging_collector = on
log_connections = on
log_disconnections = on
log_line_prefix = '[%m] %p %q<%u %d %r> '
autovacuum_naptime = 1min
autovacuum_vacuum_threshold = 10
autovacuum_analyze_threshold = 10
datestyle = 'iso, mdy'
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
default_text_search_config = 'pg_catalog.english'
escape_string_warning = off
sql_inheritance = off
standard_conforming_strings = on
vmstat 1 output at start and end of process
(process interrupted by ^C)
procs -----------memory---------- ---swap-- -----io---- --system--
----cpu----r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us
sy id wa0  0  31620 426432  11876 2363148    0    0     2     1    1     7  4 
0 96  00  0  31620 426432  11876 2363148    0    0     0     0 1016   680  0 
0 100  00  0  31620 426432  11876 2363148    0    0     0     0 1034   787  3 
0 98  00  0  31620 426432  11884 2363140    0    0     0    12 1017   702  0 
0 99  10  0  31620 426308  11884 2363140    0    0     0    27 1019   704  0 
0 100  00  0  31620 426308  11884 2363140    0    0     0     0 1023   777  0 
0 99  00  0  31620 426324  11884 2363140    0    0     0     0 1017   677  0 
0 100  00  0  31620 426324  11884 2363140    0    0     0     0 1007   716  3 
0 97  00  0  31620 426324  11884 2363140    0    0     0     0 1011   705  0 
0 100  00  0  31620 426316  11884 2363140    0    0     0    30 1017   715  0 
0 100  00  0  31620 426308  11884 2363140    0    0     0     0 1020   812  0 
1 99  00  0  31620 426308  11884 2363140    0    0     0     0 1016   686  0 
0 100  00  0  31620 426308  11884 2363140    0    0     0     0 1013   717  0 
0 100  00  0  31620 426308  11884 2363140    0    0     0     0 1013   712  0 
0 100  00  0  31620 426340  11884 2363140    0    0     0     6 1013   693  0 
0 100  00  0  31620 426340  11884 2363140    0    0     0     8 1028   773  0 
0 100  00  0  31620 426340  11884 2363140    0    0     0     0 1014   728  0 
0 100  00  0  31620 426200  11884 2363140    0    0     0     8 1047   787  2 
0 99  00  0  31620 426200  11884 2363140    0    0     0     0 1021   729  0 
0 100  00  0  31620 426200  11884 2363140    0    0     0    26 1017   718  0 
0 100  00  0  31620 426200  11884 2363140    0    0     0     4 1018   792  0 
0 100  00  0  31620 426200  11884 2363140    0    0     0     8 1026   717  0 
0 100  00  0  31620 426200  11884 2363140    0    0     0     9 1026   747  9 
0 92  00  0  31620 426200  11884 2363140    0    0     0     8 1013   681  0 
0 100  00  0  31620 426200  11884 2363140    0    0     0    18 1018   679  0 
0 100  00  0  31620 426192  11884 2363140    0    0     0     0 1021   759  0 
0 100  00  0  31620 426192  11884 2363140    0    0     0     0 1009   670  0 
0 100  00  0  31620 426192  11884 2363140    0    0     0     0 1018   679  0 
0 100  00  0  31620 426192  11884 2363140    0    0     0     0 1020   711  0 
0 100  00  0  31620 426192  11884 2363140    0    0     0    86 1025   710  0 
0 100  01  0  31620 426168  11884 2363140    0    0     0     0 1020   754  0 
0 100  01  0  31620 419968  11884 2366224    0    0     0     0 1042   714 38 
1 61  01  0  31620 418480  11884 2366224    0    0     0    13 1036   870 66
25 10  01  0  31620 416992  11884 2366224    0    0     0     0 1016   745 49 
1 50  01  0  31620 415520  11884 2366224    0    0     0    24 1020   755 49 
1 50  02  0  31620 414032  11884 2366224    0    0     0    13 1029   831 50 
1 50  01  0  31620 412544  11884 2366224    0    0     0     0 1012   716 50 
1 50  01  0  31620 411180  11884 2366224    0    0     0     0 1012   742 49 
1 50  01  0  31620 409692  11884 2366224    0    0     0     0 1009   755 49 
1 50  01  0  31620 408204  11884 2366224    0    0     0    84 1020   730 49 
1 50  01  0  31620 406592  11884 2366224    0    0     0     0 1019   801 49 
1 50  01  0  31620 405104  11884 2366224    0    0     0     0 1010   735 49 
3 49  01  0  31620 403756  11884 2366224    0    0     0     0 1011   748 49 
1 50  01  0  31620 402268  11884 2366224    0    0     0     0 1019   712 49 
1 50  01  0  31620 400780  11884 2366224    0    0     0    22 1021   731 49 
1 50  01  0  31620 399268  11884 2366224    0    0     0     0 1017   789 49 
1 50  01  0  31620 397780  11884 2366224    0    0     0     0 1015   705 49 
1 50  01  0  31620 396292  11884 2366224    0    0     0     0 1038   749 50 
1 49  02  0  31620 394680  11884 2366224    0    0     0     0 1019   767 49 
1 50  01  0  31620 393192  11884 2366224    0    0     0   109 1028   716 49 
1 50  01  0  31620 391704  11884 2366224    0    0     0     4 1021   775 49 
1 50  01  0  31620 390216  11884 2366224    0    0     0     4 1017   699 50 
1 50  0
procs -----------memory---------- ---swap-- -----io---- --system--
----cpu----r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us
sy id wa3  0  31860 136604   4348 2183840    0    0     0     4 1016   665 49 
1 50  02  0  31860 137844   4348 2181784    0    0     0     7 1031   739 49 
1 50  02  0  31860 137596   4348 2180756    0    0     0    19 1016   669 49 
1 50  02  0  31860 137348   4348 2178700    0    0     0     4 1013   670 49 
1 50  02  0  31860 137240   4356 2177664    0    0     0    27 1023   709 49 
1 49  12  0  31860 136992   4356 2176636    0    0     0   105 1034   680 49 
1 50  02  0  31860 136744   4356 2175608    0    0     0     4 1017   724 49 
2 50  03  0  31860 136620   4356 2173552    0    0     0    19 1018   665 49 
1 50  02  0  31860 137380   4356 2171496    0    0     0     7 1014   662 50 
1 50  02  0  31860 137256   4356 2170468    0    0     0     4 1022   653 49 
1 50  02  0  31860 137008   4356 2169440    0    0     0    28 1017   669 49 
1 50  02  0  31860 136860   4356 2168412    0    0     0    23 1026   742 50 
1 50  02  0  31860 318396   4356 2167384    0    0     0    25 1016   732 51 
3 47  01  0  31860 639556   4356 2167384    0    0     0    11 1016   686  0 
2 98  01  0  31860 639556   4356 2167384    0    0     0    19 1039   757  2 
1 98  01  0  31860 639556   4356 2167384    0    0     0    36 1019   639  0 
0 100  01  0  31860 639060   4356 2167384    0    0     0  8240 1125   779  0 
0 98  11  0  31860 639680   4356 2167384    0    0     0   104 1091   767  0 
0 78 222  0  31860 639680   4356 2167384    0    0     0     0 1011   637  0 
0 100  01  0  31860 639680   4356 2167384    0    0     0     0 1017   652  0 
0 100  01  0  31860 639696   4356 2167384    0    0     0    29 1021   666  0 
0 100  01  0  31860 639696   4356 2167384    0    0     0     0 1034   739  0 
0 100  01  0  31860 643416   4356 2167384    0    0     0     0 1018   677  0 
0 99  01  0  31860 643416   4356 2167384    0    0     0     5 1012   649  0 
0 100  01  0  31860 643416   4356 2167384    0    0     0     0 1013   655  0 
0 100  01  0  31860 643416   4356 2167384    0    0     0    41 1017   637  0 
0 100  01  0  31860 643416   4356 2167384    0    0     0     0 1023   724  0 
0 100  02  0  31860 643284   4364 2167376    0    0     0    44 1023   698  2 
1 96  11  0  31860 643408   4364 2167376    0    0     0     0 1014   643  7 
0 93  01  0  31860 643408   4364 2167376    0    0     0    26 1046   741  1 
0 99  0
The query:
SELECT     "MS"."sMatterNo",     CAST(COUNT(*) AS int) AS "count" FROM  "MatterSearch" "MS"  JOIN "MatterDateStat" "S"
ON   (     "S"."matterNo" = "MS"."sMatterNo" AND      "S"."isOnHold" = FALSE  ) WHERE (  "MS"."matterStatusCode" IN
('OP','RO') ) GROUP BY "MS"."sMatterNo"
 
;
The pg_dump tarball is 20MB.  Should I email it directly to you?
-Kevin


Re: 8.4devel out of memory

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> I've got the database down to the minimum database objects needed for
> the query (three tables, five views on them, and a bunch of domains),
> and gotten those tables down to 10% of their original size, with all
> confidential data turned to meaningless literals.  The problem still
> happens on today's snapshot.

Excellent.

> The pg_dump tarball is 20MB.  Should I email it directly to you?

Works for me.
        regards, tom lane


Re: 8.4devel out of memory

From
Tom Lane
Date:
Hah, got it.  The reason the leak only manifests on 32-bit is that it
only manifests if int8 is a pass-by-reference datatype.  The new API
for amgetbitmap causes a query-lifespan leak of one palloc(sizeof(int8))
per bitmap index scan.  The distinguishing feature of your query seems
to be just that it does enough repeated bitmapscans to notice ...

The attached patch cures the leak for me, but I find it a tad ugly.
I'm tempted to think that it would be better if we changed the call
signature for amgetbitmap so that it returned the count through an
"int64 *" output parameter.  Thoughts anyone?
        regards, tom lane


Index: src/backend/access/index/indexam.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/access/index/indexam.c,v
retrieving revision 1.110
diff -c -r1.110 indexam.c
*** src/backend/access/index/indexam.c    11 Sep 2008 14:01:09 -0000    1.110
--- src/backend/access/index/indexam.c    9 Oct 2008 22:47:54 -0000
***************
*** 655,660 ****
--- 655,661 ---- {     FmgrInfo   *procedure;     int64        ntids;
+     Datum        d;      SCAN_CHECKS;     GET_SCAN_PROCEDURE(amgetbitmap);
***************
*** 665,673 ****     /*      * have the am's getbitmap proc do all the work.      */
!     ntids = DatumGetInt64(FunctionCall2(procedure,
!                                         PointerGetDatum(scan),
!                                         PointerGetDatum(bitmap)));
pgstat_count_index_tuples(scan->indexRelation,ntids); 
 
--- 666,680 ----     /*      * have the am's getbitmap proc do all the work.      */
!     d = FunctionCall2(procedure,
!                       PointerGetDatum(scan),
!                       PointerGetDatum(bitmap));
! 
!     ntids = DatumGetInt64(d);
! 
! #ifndef USE_FLOAT8_BYVAL
!     pfree(DatumGetPointer(d));
! #endif      pgstat_count_index_tuples(scan->indexRelation, ntids); 



Re: 8.4devel out of memory

From
"Kevin Grittner"
Date:
>>> Tom Lane <tgl@sss.pgh.pa.us> wrote: 
> The attached patch cures the leak for me
I see that the patch was applied.  A CVS checkout from this morning
fixes the leak for me, too; the vmstat output stayed rock steady
during the run.
Thanks!
-Kevin