OOM-killer issue with a specific query - Mailing list pgsql-performance

From nabble.30.miller_2555@spamgourmet.com
Subject OOM-killer issue with a specific query
Date
Msg-id CABLpH8y=QiHwshW6ss66_62_ddxQFQ=f9MtKv8GqThi-h90xbA@mail.gmail.com
Whole thread Raw
Responses Re: OOM-killer issue with a specific query  (Scott Marlowe <scott.marlowe@gmail.com>)
List pgsql-performance
Hi - I'm running into an OOM-killer issue when running a specific query (no virtual machine running) and, based on researching the issue, I can probably fix by making the following sysctl adjustments:
    vm.overcommit_memory = 2
    vm.overcommit_ratio = 0
However, I am perplexed as to why I am running into the issue in the first place. The machine (running Linux 2.6.34.7-61.fc13.x86_64) is dedicated to Postgres (v9.0.0 [RPM package: postgresql90-9.0.0-1PGDG.fc13.1.x86_64]) and the following memory usage is pretty typical for the system (via "top"):
    Mem:   8121992k total,  2901960k used,  5220032k free,   237408k buffers
    Swap:  1048572k total,   235940k used,   812632k free,  2053768k cached
Under steady-state conditions, the following shows the virtual memory size for postgres backend processes:
     PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    8506 postgres  20   0 2327m 3084 1792 S  0.0  0.0   0:00.33 postgres
    8504 postgres  20   0 2326m  14m  13m S  0.0  0.2   0:01.32 postgres
    8505 postgres  20   0 2326m  728  452 S  0.0  0.0   0:00.91 postgres
    3582 postgres  20   0 2325m  54m  53m S  0.0  0.7   0:02.03 postgres
My current relevant postgresql.conf settings are the following:
    shared_buffers = 2100MB
    temp_buffers = 8MB
    work_mem = 32MB
    maintenance_work_mem = 16MB
    max_stack_depth = 2MB
    constraint_exclusion = partition
When executing the query, I've been watching the "top" activity, sorted by resident memory. Upon execution, no other processes appear to take additional resident memory, except a postgres backend process servicing the query, which goes to +6Gb (triggering the OOM-killer). Given the settings in postgresql.conf, and my anecdotal understanding of Postgres memory management functions, I am uncertain why Postgres exhausts physical memory instead of swapping to temporary files. Do I need to lower my work_mem setting since the subquery involves a partitioned table, causing a multiplier effect to the memory used (I have tried per-connection settings of 6MB)? Would tweaking  query planning settings help?

Thanks in advance!

If it helps, I have included the query (with column names aliased to their data type), a brief description of the applicable table's contents, and an abridged copy of the EXPLAIN ANALYZE output

SELECT "bigint", "date", "text"
FROM tableA AS A
WHERE A."boolean" = 'true' AND
(A."text" = 'abc' OR A."text" = 'xyz') AND
A."bigint" NOT IN (SELECT "bigint" FROM tableB)
ORDER BY A."date" DESC;

tableA:
    - total table contains ~11 million records (total width: 109 bytes)
    - partitioned by month (180 partitions)
    - each table partition contains ~100k records
tableB:
    - total table contains ~400k records (total width: 279 bytes)
    - partitioned by month (96 partitions)
    - each table partition contains ~30k records


EXPLAIN ANALYZE output:
    Note: could not produce output for exact query due to OOM-killer, but ran query by limiting the subquery to the first 50 results. The planner iterates over all partitions, but only the first two partitions are noted for brevity.

                                                                              QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=451279.67..451279.70 rows=10 width=55) (actual time=18343.085..18343.090 rows=10 loops=1)
   ->  Sort  (cost=451279.67..456398.37 rows=2047480 width=55) (actual time=18343.083..18343.087 rows=10 loops=1)
         Sort Key: A."Date"
         Sort Method:  top-N heapsort  Memory: 26kB
         ->  Result  (cost=1.21..407034.37 rows=2047480 width=55) (actual time=0.793..17014.726 rows=4160606 loops=1)
               ->  Append  (cost=1.21..407034.37 rows=2047480 width=55) (actual time=0.792..16119.298 rows=4160606 loops=1)
                     ->  Seq Scan on tableA A  (cost=1.21..19.08 rows=1 width=44) (actual time=0.002..0.002 rows=0 loops=1)
                           Filter: ("boolean" AND (NOT (hashed SubPlan 1)) AND (("text" = 'abc'::text) OR ("text" = 'xyz'::text)))
                           SubPlan 1
                             ->  Limit  (cost=0.00..1.08 rows=50 width=8) (actual time=0.010..0.054 rows=50 loops=210)
                                   ->  Result  (cost=0.00..9249.46 rows=427846 width=8) (actual time=0.009..0.044 rows=50 loops=210)
                                         ->  Append  (cost=0.00..9249.46 rows=427846 width=8) (actual time=0.008..0.031 rows=50 loops=210)
                                               ->  Seq Scan on tableB  (cost=0.00..15.30 rows=530 width=8) (actual time=0.001..0.001 rows=0 loops=210)
                                               ->  Seq Scan on tableB_201201 tableB  (cost=0.00..15.30 rows=530 width=8) (actual time=0.000..0.000 rows=0 loops=210)
                                               ->  Seq Scan on tableB_201112 tableB  (cost=0.00..251.25 rows=12125 width=8) (actual time=0.006..0.019 rows=50 loops=210)
                                               ->  ...
                     ->  Seq Scan on tableA_201201 A  (cost=1.21..19.08 rows=1 width=44) (actual time=0.001..0.001 rows=0 loops=1)
                           Filter: ("boolean" AND (NOT (hashed SubPlan 1)) AND (("text" = 'abc'::text) OR ("text" = 'xyz'::text)))
                           SubPlan 1
                             ->  Limit  (cost=0.00..1.08 rows=50 width=8) (actual time=0.010..0.054 rows=50 loops=210)
                                   ->  Result  (cost=0.00..9249.46 rows=427846 width=8) (actual time=0.009..0.044 rows=50 loops=210)
                                         ->  Append  (cost=0.00..9249.46 rows=427846 width=8) (actual time=0.008..0.031 rows=50 loops=210)
                                               ->  Seq Scan on tableB  (cost=0.00..15.30 rows=530 width=8) (actual time=0.001..0.001 rows=0 loops=210)
                                               ->  Seq Scan on tableB_201201 tableB  (cost=0.00..15.30 rows=530 width=8) (actual time=0.000..0.000 rows=0 loops=210)
                                               ->  Seq Scan on tableB_201112 tableB  (cost=0.00..251.25 rows=12125 width=8) (actual time=0.006..0.019 rows=50 loops=210)
                                               ->  ...
                     ->  Seq Scan on tableA_201112 A  (cost=1.21..794.69 rows=5980 width=55) (actual time=0.789..12.686 rows=12075 loops=1)
                           Filter: ("boolean" AND (NOT (hashed SubPlan 1)) AND (("text" = 'abc'::text) OR ("text" = 'xyz'::text)))
                           SubPlan 1
                             ->  Limit  (cost=0.00..1.08 rows=50 width=8) (actual time=0.010..0.054 rows=50 loops=210)
                                   ->  Result  (cost=0.00..9249.46 rows=427846 width=8) (actual time=0.009..0.044 rows=50 loops=210)
                                         ->  Append  (cost=0.00..9249.46 rows=427846 width=8) (actual time=0.008..0.031 rows=50 loops=210)
                                               ->  Seq Scan on tableB  (cost=0.00..15.30 rows=530 width=8) (actual time=0.001..0.001 rows=0 loops=210)
                                               ->  Seq Scan on tableB_201201 tableB  (cost=0.00..15.30 rows=530 width=8) (actual time=0.000..0.000 rows=0 loops=210)
                                               ->  Seq Scan on tableB_201112 tableB  (cost=0.00..251.25 rows=12125 width=8) (actual time=0.006..0.019 rows=50 loops=210)
                                               ->  Seq Scan on tableB_201111 tableB  (cost=0.00..604.89 rows=29189 width=8) (never executed)
                                               ->  ...
                     ->  Seq Scan on tableA_201111 A  (cost=1.21..2666.12 rows=14670 width=55) (actual time=0.441..36.680 rows=29189 loops=1)
                           Filter: ("boolean" AND (NOT (hashed SubPlan 1)) AND (("text" = 'abc'::text) OR ("text" = 'xyz'::text)))
                           SubPlan 1
                             ->  Limit  (cost=0.00..1.08 rows=50 width=8) (actual time=0.010..0.054 rows=50 loops=210)
                                   ->  Result  (cost=0.00..9249.46 rows=427846 width=8) (actual time=0.009..0.044 rows=50 loops=210)
                                         ->  Append  (cost=0.00..9249.46 rows=427846 width=8) (actual time=0.008..0.031 rows=50 loops=210)
                                               ->  Seq Scan on tableB  (cost=0.00..15.30 rows=530 width=8) (actual time=0.001..0.001 rows=0 loops=210)
                                               ->  Seq Scan on tableB_201201 tableB  (cost=0.00..15.30 rows=530 width=8) (actual time=0.000..0.000 rows=0 loops=210)
                                               ->  Seq Scan on tableB_201112 tableB  (cost=0.00..251.25 rows=12125 width=8) (actual time=0.006..0.019 rows=50 loops=210)
                                               ->  Seq Scan on tableB_201111 tableB  (cost=0.00..604.89 rows=29189 width=8) (never executed)
                                               ->  ...
                     ->  ...
 Total runtime: 18359.851 ms
(23327 rows)


pgsql-performance by date:

Previous
From: Marti Raudsepp
Date:
Subject: Re: Dramatic change in memory usage with version 9.1
Next
From: Cédric Villemain
Date:
Subject: Re: Dramatic change in memory usage with version 9.1