Re: Out of memory on SELECT in 8.3.5 - Mailing list pgsql-general

From Matt Magoffin
Subject Re: Out of memory on SELECT in 8.3.5
Date
Msg-id 51355.192.168.1.106.1234157867.squirrel@msqr.us
Whole thread Raw
In response to Re: Out of memory on SELECT in 8.3.5  (Stephen Frost <sfrost@snowman.net>)
Responses Re: Out of memory on SELECT in 8.3.5
List pgsql-general
>> PostgreSQL 8.3.5 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC)
>> 4.1.2 20070626 (Red Hat 4.1.2-14)
>
> Does the result from 'free' look reasonable on this box?

I think so:

             total       used       free     shared    buffers     cached
Mem:      16432296   16273964     158332          0     173536   14321340
-/+ buffers/cache:    1779088   14653208
Swap:      2096440        560    2095880


> How many PG processes are running, generally?

I see about 90 currently, of which 24 are Slony connections and the rest
are client connections.

> Do you see the PG process running this query growing to fill the
available memory?  That query really
> didn't look like it was chewing up all that much memory to me..

Just running top, it does appear to chew through a fair amount of memory.
Here's a snapshot from top of the postgres processing running this query
from just before it ran out of memory:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 4486 postgres  18   0 4576m 3.6g 3.3g R   90 23.1   0:34.23 postgres:
postgres lms_nna [local] EXPLAIN

These values did start out low, for example the RES memory started in the
130MB range, then climbed to the 3.6GB you see here.

> Have
> you tried reducing your work_mem setting to see if that makes the errors
> go away?  It might cause a different plan which might be less efficient
> for some queries, not sure if you have anything which would be
> dramatically affected by that..Do the row counts in the plan match up
> with what you'd expect?  Can you provide output from 'explain analyze'
> on a successful run?

I set the work_mem to 2MB, and the query does actually complete (explain
analyze output below), so does this mean that the query simply uses too
much memory for sorting/joining, and we'd have to either allocate enough
work_mem to allow the query to complete, or a smaller work_mem as shown
here to make the query use slower disk-based sorting? The row counts are
matching what we'd expect from this query.


                                                                                                          QUERY
PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=2345301.47..2345402.31 rows=40336 width=234) (actual
time=203429.526..203599.840 rows=49317 loops=1)
   Sort Key: s.sale_date, s.vin
   Sort Method:  external merge  Disk: 4776kB
   ->  Hash Left Join  (cost=1469244.28..2337665.81 rows=40336 width=234)
(actual time=162362.791..202883.728 rows=49317 loops=1)
         Hash Cond: (ml.lead_id = phone.lead_id)
         ->  Hash Left Join  (cost=1349360.82..2159151.44 rows=40336
width=219) (actual time=155499.666..194941.423 rows=49317
loops=1)
               Hash Cond: (ml.lead_id = email.lead_id)
               ->  Nested Loop Left Join  (cost=1236733.28..1999980.90
rows=40336 width=204) (actual time=141890.479..186344.717
rows=49317 loops=1)
                     ->  Nested Loop Left Join
(cost=1236733.28..1806406.02 rows=40336 width=141)
(actual time=141890.419..166782.070 rows=49317
loops=1)
                           ->  Nested Loop Left Join
(cost=1236733.28..1586899.03 rows=40336
width=126) (actual time=141890.368..166124.253
rows=49317 loops=1)
                                 ->  Nested Loop Left Join
(cost=1236733.28..1413543.83 rows=40336
width=118) (actual
time=141890.281..156284.612 rows=49317
loops=1)
                                       ->  Merge Left Join
(cost=1236733.28..1237778.33
rows=40336 width=89) (actual
time=141890.184..143717.900
rows=49317 loops=1)
                                             Merge Cond: (ml.lead_id =
lrdm.lead_id)
                                             ->  Sort
(cost=43356.21..43457.05
rows=40336 width=78) (actual
time=1722.385..1794.668
rows=49317 loops=1)
                                                   Sort Key: ml.lead_id
                                                   Sort Method:  external
merge  Disk: 4152kB
                                                   ->  Hash Left Join
(cost=11354.33..38475.05
rows=40336 width=78)
(actual
time=919.319..1578.556
rows=49317 loops=1)
                                                         Hash Cond:
((s.dealer_code)::text
=
(d.dealer_code)::text)
                                                         ->  Hash Left
Join
(cost=11018.04..37584.13
rows=40336
width=60) (actual
time=914.936..1445.926
rows=49317
loops=1)
                                                               Hash Cond:
(s.id =
ml.sale_id)
                                                               Filter:
(((s.sale_type
=
'd'::bpchar)
AND (NOT
ml.lost_sale))
OR
((s.sale_type
=
'c'::bpchar)
AND
(ml.lead_pos
= 0)) OR
(s.sale_type
=
'0'::bpchar))
                                                               ->  Index
Scan using
mb_sale_sale_date_idx
on mb_sale
s
(cost=0.00..14027.94
rows=43804
width=50)
(actual
time=0.149..126.835
rows=49152
loops=1)
                                                                     Index
Cond:
((sale_date
>=
'2009-01-01'::date)
AND
(sale_date
<=
'2009-01-31'::date))
                                                               ->  Hash
(cost=5577.35..5577.35
rows=281335
width=26)
(actual
time=914.089..914.089
rows=281335
loops=1)
                                                                     ->
Seq
Scan
on
mb_lead
ml
(cost=0.00..5577.35
rows=281335
width=26)
(actual
time=0.026..394.384
rows=281335
loops=1)
                                                         ->  Hash
(cost=321.13..321.13
rows=1213
width=23) (actual
time=4.343..4.343
rows=1213
loops=1)
                                                               ->  Seq
Scan on
dealer d
(cost=0.00..321.13
rows=1213
width=23)
(actual
time=0.013..2.451
rows=1213
loops=1)
                                             ->  Materialize
(cost=1193377.08..1194985.94
rows=128709 width=19) (actual
time=135323.691..140326.709
rows=1442527 loops=1)
                                                   ->  Sort
(cost=1193377.08..1193698.85
rows=128709 width=19)
(actual
time=135323.681..137742.900
rows=1442488 loops=1)
                                                         Sort Key:
lrdm.lead_id
                                                         Sort Method:
external merge
Disk: 50312kB
                                                         ->  Bitmap Heap
Scan on
lead_reporting_meta
lrdm
(cost=118695.77..1179811.20
rows=128709
width=19) (actual
time=90238.108..126737.875
rows=1460904
loops=1)
                                                               Recheck
Cond:
(item_key =
'[ALS:prospectid]TrackingCode'::text)
                                                               Filter:
(pos = 1)
                                                               ->  Bitmap
Index Scan
on
lead_reporting_meta_item_key_idx

(cost=0.00..118663.60
rows=1476580
width=0)
(actual
time=90223.734..90223.734
rows=1476483
loops=1)
                                                                     Index
Cond:
(item_key
=
'[ALS:prospectid]TrackingCode'::text)
                                       ->  Index Scan using
lead_reporting_address_lead_id_idx
on lead_reporting_address address
(cost=0.00..4.35 rows=1 width=37)
(actual time=0.251..0.252 rows=0
loops=49317)
                                             Index Cond: (address.lead_id
= ml.lead_id)
                                 ->  Index Scan using
lead_reporting_street_address_id_idx on
lead_reporting_street address_street
(cost=0.00..4.29 rows=1 width=24) (actual
time=0.196..0.197 rows=0 loops=49317)
                                       Index Cond:
(address_street.address_id =
address.id)
                                       Filter: (address_street.pos = 0)
                           ->  Index Scan using lead_pkey on lead l
(cost=0.00..5.43 rows=1 width=23) (actual
time=0.010..0.010 rows=0 loops=49317)
                                 Index Cond: (l.id = ml.lead_id)
                     ->  Index Scan using lead_reporting_data_pkey on
lead_reporting_data lrd  (cost=0.00..4.79 rows=1
width=71) (actual time=0.393..0.393 rows=0
loops=49317)
                           Index Cond: (lrd.lead_id = ml.lead_id)
               ->  Hash  (cost=85837.99..85837.99 rows=1459164 width=23)
(actual time=7607.947..7607.947 rows=1521294 loops=1)
                     ->  Seq Scan on lead_reporting_list_data email
(cost=0.00..85837.99 rows=1459164 width=23) (actual
time=0.030..5063.069 rows=1521294 loops=1)
                           Filter: ((list_type = 'e'::bpchar) AND (pos = 0))
         ->  Hash  (cost=85837.99..85837.99 rows=1854357 width=23) (actual
time=6863.050..6863.050 rows=1803678 loops=1)
               ->  Seq Scan on lead_reporting_list_data phone
(cost=0.00..85837.99 rows=1854357 width=23) (actual
time=0.020..3920.096 rows=1803678 loops=1)
                     Filter: ((list_type = 'p'::bpchar) AND (pos = 0))
 Total runtime: 203657.960 ms
(52 rows)



pgsql-general by date:

Previous
From: Stephen Frost
Date:
Subject: Re: Out of memory on SELECT in 8.3.5
Next
From: Stephen Frost
Date:
Subject: Re: Out of memory on SELECT in 8.3.5