Re: OOM-killer issue with a specific query 9 of 20) - Mailing list pgsql-performance

From nabble.30.miller_2555@spamgourmet.com
Subject Re: OOM-killer issue with a specific query 9 of 20)
Date
Msg-id CABLpH8zUZroESqpe+fiJ4pvtEgm5sHehUHRJkDGbRRp9Fk9hTw@mail.gmail.com
Whole thread Raw
In response to Re: OOM-killer issue with a specific query  (Scott Marlowe <scott.marlowe@gmail.com>)
Responses Re: OOM-killer issue with a specific query 9 of 20)
List pgsql-performance
On Tue, Dec 20, 2011 at 8:24 AM, Scott Marlowe -
scott.marlowe@gmail.com
<+nabble+miller_2555+3b65e832a3.scott.marlowe#gmail.com@spamgourmet.com>
wrote:
>
> On Mon, Dec 19, 2011 at 8:52 AM,  <nabble.30.miller_2555@spamgourmet.com> wrote:
> > I can probably fix by making the following sysctl adjustments:
> >     vm.overcommit_memory = 2
> >     vm.overcommit_ratio = 0

FYI - for the sake of others visiting this post, disabling the OS
memory overcommit does not appear an easy solution in my case as the
box fails to bootstrap due to insufficient memory.

> > Under steady-state conditions, the following shows the virtual memory size
> > for postgres backend processes:
> >      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> >     8506 postgres  20   0 2327m 3084 1792 S  0.0  0.0   0:00.33 postgres
> >     8504 postgres  20   0 2326m  14m  13m S  0.0  0.2   0:01.32 postgres
> >     8505 postgres  20   0 2326m  728  452 S  0.0  0.0   0:00.91 postgres
> >     3582 postgres  20   0 2325m  54m  53m S  0.0  0.7   0:02.03 postgres
>
> FYI, this is not swap usage.  VIRT is all the memory a process has
> handles open to everywhere, including libs that it's linked against
> that might not even be loaded.  Generally speaking, VIRT is close to
> worthless for troubleshooting.
>
Thanks - I misunderstood the meaning of VIRT (its been ahwile since
I've consulted the man page).

> > My current relevant postgresql.conf settings are the following:
> >     shared_buffers = 2100MB
> >     temp_buffers = 8MB
> >     work_mem = 32MB
> >     maintenance_work_mem = 16MB
> >     max_stack_depth = 2MB
> >     constraint_exclusion = partition
>
> What's max_connections?
>
max_connections=20. As a sidenote, this is a development box and there
are no other active connections to the database while this test case
was run.

> > When executing the query, I've been watching the "top" activity, sorted by
> > resident memory. Upon execution, no other processes appear to take
> > additional resident memory, except a postgres backend process servicing the
> > query, which goes to +6Gb (triggering the OOM-killer). Given the settings in
> > postgresql.conf, and my anecdotal understanding of Postgres memory
> > management functions, I am uncertain why Postgres exhausts physical memory
> > instead of swapping to temporary files.
>
> > EXPLAIN ANALYZE output:
> >     Note: could not produce output for exact query due to OOM-killer, but
> > ran query by limiting the subquery to the first 50 results. The planner
> > iterates over all partitions, but only the first two partitions are noted
> > for brevity.
>
> This may be one instance where the regular explain will be more
> useful.  it's quite likely that the query changes when there is no
> limit.  If you compare what explain for the full query says, and what
> explain (analyze) for the abridged one says, the part that's causing
> you to run out of memory may be more obvious.
>
I've run EXPLAIN on the query, but AFAICS the query plan does not
appear significantly different than the abridged version for this
particular query (output attached below). In an effort to analyze the
base case, I re-ran the query (without LIMIT) for a selected partition
of tableA and tableB (both tables are partitioned by "Date" and the
"Date" column on each partition of tableB references the "Date" column
of the corresponding partition of tableA as a foreign key constraint).
The tableA partition holds 82,939 records (record width is 108 bytes,
per EXPLAIN) and the tableB partition holds 13,718 records (record
width is 312 bytes, per EXPLAIN) For a single table partition, `top`
shows the following resource usage of running postmaster processes:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1818 postgres  20   0 2325m  56m  55m S  0.0  0.7   0:32.10 postmaster
 2810 postgres  20   0  156m 1044  420 S  0.0  0.0   0:01.50 postmaster
 2813 postgres  20   0 2326m 256m 255m S  0.0  3.2   0:09.61 postmaster
 2814 postgres  20   0 2326m 2220 1592 S  0.0  0.0   0:04.30 postmaster
 2815 postgres  20   0 2327m 3996 2148 S  0.0  0.0   0:00.66 postmaster
 2816 postgres  20   0  156m 1272  504 S  0.0  0.0   0:09.14 postmaster
29661 postgres  20   0 2335m  49m  40m S  0.0  0.6   0:00.24 postmaster

While I could run the query partition-by-partition, I'd still like to
be able to run a full query across all partitions.

EXPLAIN output excerpt:
                                                         QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=3125664.13..3130782.83 rows=2047480 width=55)
   Sort Key: A."Date"
   ->  Result  (cost=11553.15..2856046.51 rows=2047480 width=55)
         ->  Append  (cost=11553.15..2856046.51 rows=2047480 width=55)
               ->  Seq Scan on tableA A  (cost=11553.15..11571.02
rows=1 width=44)
                     Filter: ("Boolean" AND (NOT (hashed SubPlan 1))
AND (("text" = 'abc'::text) OR ("text" = 'xyz'::text)))
                     SubPlan 1
                       ->  Result  (cost=0.00..10357.52 rows=478252 width=8)
                             ->  Append  (cost=0.00..10357.52
rows=478252 width=8)
                                   ->  Seq Scan on tableB
(cost=0.00..15.30 rows=530 width=8)
                                   ->  Seq Scan on tableB_201201
tableB  (cost=0.00..15.30 rows=530 width=8)
                                   ->  Seq Scan on tableB_201112
tableB  (cost=0.00..251.25 rows=12125 width=8)
                                   ->  Seq Scan on tableB_201111
tableB  (cost=0.00..604.89 rows=29189 width=8)
                                   ->  Seq Scan on tableB_201110
tableB  (cost=0.00..490.30 rows=23630 width=8)
                                   ->  ...
               ->  Seq Scan on tableA_201201 A
(cost=11553.15..11571.02 rows=1 width=44)
                     Filter: ("Boolean" AND (NOT (hashed SubPlan 1))
AND (("text" = 'abc'::text) OR ("text" = 'xyz'::text)))
                     SubPlan 1
                       ->  Result  (cost=0.00..10357.52 rows=478252 width=8)
                             ->  Append  (cost=0.00..10357.52
rows=478252 width=8)
                                   ->  Seq Scan on tableB
(cost=0.00..15.30 rows=530 width=8)
                                   ->  Seq Scan on tableB_201201
tableB  (cost=0.00..15.30 rows=530 width=8)
                                   ->  Seq Scan on tableB_201112
tableB  (cost=0.00..251.25 rows=12125 width=8)
                                   ->  Seq Scan on tableB_201111
tableB  (cost=0.00..604.89 rows=29189 width=8)
                                   ->  Seq Scan on tableB_201110
tableB  (cost=0.00..490.30 rows=23630 width=8)
                                   ->  ...
               ->  Seq Scan on tableA_201112 A
(cost=11553.15..12346.63 rows=5980 width=55)
                     Filter: ("Boolean" AND (NOT (hashed SubPlan 1))
AND (("text" = 'abc'::text) OR ("text" = 'xyz'::text)))
                     SubPlan 1
                       ->  Result  (cost=0.00..10357.52 rows=478252 width=8)
                             ->  Append  (cost=0.00..10357.52
rows=478252 width=8)
                                   ->  Seq Scan on tableB
(cost=0.00..15.30 rows=530 width=8)
                                   ->  Seq Scan on tableB_201201
tableB  (cost=0.00..15.30 rows=530 width=8)
                                   ->  Seq Scan on tableB_201112
tableB  (cost=0.00..251.25 rows=12125 width=8)
                                   ->  Seq Scan on tableB_201111
tableB  (cost=0.00..604.89 rows=29189 width=8)
                                   ->  Seq Scan on tableB_201110
tableB  (cost=0.00..490.30 rows=23630 width=8)
                                   ->  ...
               ->  ...
(23112 rows)


pgsql-performance by date:

Previous
From: Scott Marlowe
Date:
Subject: Re: OOM-killer issue with a specific query
Next
From: Tom Lane
Date:
Subject: Re: OOM-killer issue with a specific query 9 of 20)