Thread: Out of memory on SELECT in 8.3.5
We've been having persistent out-of-memory errors in our production 8.3 deployment, which is now running 8.3.5. I'm not sure the query here is the cause of the problem, but this is our most recent example which triggered an out-of-memory error for us. Perhaps our configuration needs tweaking somewhere...

shared_buffers = 4000MB
#temp_buffers = 8MB
work_mem = 128MB
maintenance_work_mem = 256MB
#max_stack_depth = 2MB

I was wondering if somebody could spot anything odd-looking in the query plan, or in the memory parameters dumped to the log, that might be causing this?

explain SELECT
    s.sale_date,
    s.vin,
    s.dealer_region as sale_region,
    s.dealer_area as sale_area,
    s.dealer_code as sale_dealer,
    d.name as sale_dealer_name,
    s.model as sale_model,
    s.sale_type,
    ml.lead_id,
    l.created,
    lrd.reporting_date as lead_date,
    ml.lead_pos as lead_order,
    ml.lost_sale,
    ml.range,
    lrd.dealer_region as lead_region,
    lrd.dealer_area as lead_area,
    lrd.dealer_code as lead_dealer,
    lrd.model as lead_model,
    l.source as lead_source,
    lrd.oem_type as lead_type,
    lrd.oem_subtype as lead_subtype,
    lrd.unique_all as lead_unique_all,
    lrd.unique_region as lead_unique_region,
    lrd.unique_area as lead_unique_area,
    lrd.unique_dealer as lead_unique_dealer,
    lrd.customer_fname,
    lrd.customer_lname,
    lrdm.item_value as lead_tracking_code,
    address_street.street as owner_street,
    address.city as owner_city,
    address.state as owner_state,
    address.postal_code as owner_postal_code,
    address.country as owner_country,
    email.list_value as owner_email,
    phone.list_value as owner_phone
FROM mb_sale s
LEFT OUTER JOIN dealer d on d.dealer_code = s.dealer_code
LEFT OUTER JOIN mb_lead ml on ml.sale_id = s.id
LEFT OUTER JOIN lead_reporting_data lrd ON lrd.lead_id = ml.lead_id
LEFT OUTER JOIN lead_reporting_meta lrdm ON lrdm.lead_id = ml.lead_id
    AND lrdm.item_key = '[ALS:prospectid]TrackingCode' AND lrdm.pos=1
LEFT OUTER JOIN lead l ON l.id = ml.lead_id
LEFT OUTER JOIN lead_reporting_address address ON address.lead_id = ml.lead_id
LEFT OUTER JOIN lead_reporting_street address_street ON address_street.address_id = address.id
    AND address_street.pos = 0
LEFT OUTER JOIN lead_reporting_list_data phone ON phone.lead_id = ml.lead_id
    AND phone.list_type = 'p' AND phone.pos = 0
LEFT OUTER JOIN lead_reporting_list_data email ON email.lead_id = ml.lead_id
    AND email.list_type = 'e' AND email.pos = 0
WHERE ((s.sale_type = 'd' and ml.lost_sale = FALSE)
    OR (s.sale_type = 'c' and ml.lead_pos = 0)
    OR (s.sale_type = '0'))
    AND s.sale_date BETWEEN date('2009-01-01') AND date('2009-01-31')
ORDER BY s.sale_date, s.vin

                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Sort  (cost=2297525.76..2297626.60 rows=40336 width=234)
  Sort Key: s.sale_date, s.vin
  ->  Hash Left Join  (cost=1450193.63..2294440.10 rows=40336 width=234)
      Hash Cond: (ml.lead_id = phone.lead_id)
    ->  Hash Left Join  (cost=1341176.18..2140101.73 rows=40336 width=219)
        Hash Cond: (ml.lead_id = email.lead_id)
      ->  Nested Loop Left Join  (cost=1237098.64..2000317.19 rows=40336 width=204)
        ->  Nested Loop Left Join  (cost=1237098.64..1806747.26 rows=40336 width=141)
          ->  Nested Loop Left Join  (cost=1237098.64..1633400.88 rows=40336 width=133)
            ->  Nested Loop Left Join  (cost=1237098.64..1413899.40 rows=40336 width=118)
              ->  Merge Left Join  (cost=1237098.64..1238143.69 rows=40336 width=89)
                  Merge Cond: (ml.lead_id = lrdm.lead_id)
                ->  Sort  (cost=46384.08..46484.92 rows=40336 width=78)
                    Sort Key: ml.lead_id
                  ->  Hash Left Join  (cost=9430.33..43298.42 rows=40336 width=78)
                      Hash Cond: ((s.dealer_code)::text = (d.dealer_code)::text)
                    ->  Hash Left Join  (cost=9094.04..42407.50 rows=40336 width=60)
                        Hash Cond: (s.id = ml.sale_id)
                        Filter: (((s.sale_type = 'd'::bpchar) AND (NOT ml.lost_sale)) OR ((s.sale_type = 'c'::bpchar) AND (ml.lead_pos = 0)) OR (s.sale_type = '0'::bpchar))
                      ->  Index Scan using mb_sale_sale_date_idx on mb_sale s  (cost=0.00..14027.94 rows=43804 width=50)
                          Index Cond: ((sale_date >= '2009-01-01'::date) AND (sale_date <= '2009-01-31'::date))
                      ->  Hash  (cost=5577.35..5577.35 rows=281335 width=26)
                        ->  Seq Scan on mb_lead ml  (cost=0.00..5577.35 rows=281335 width=26)
                    ->  Hash  (cost=321.13..321.13 rows=1213 width=23)
                      ->  Seq Scan on dealer d  (cost=0.00..321.13 rows=1213 width=23)
                ->  Sort  (cost=1190714.57..1191036.34 rows=128709 width=19)
                    Sort Key: lrdm.lead_id
                  ->  Bitmap Heap Scan on lead_reporting_meta lrdm  (cost=118675.76..1179791.19 rows=128709 width=19)
                      Recheck Cond: (item_key = '[ALS:prospectid]TrackingCode'::text)
                      Filter: (pos = 1)
                    ->  Bitmap Index Scan on lead_reporting_meta_item_key_idx  (cost=0.00..118643.58 rows=1476580 width=0)
                        Index Cond: (item_key = '[ALS:prospectid]TrackingCode'::text)
              ->  Index Scan using lead_reporting_address_lead_id_idx on lead_reporting_address address  (cost=0.00..4.34 rows=1 width=37)
                  Index Cond: (address.lead_id = ml.lead_id)
            ->  Index Scan using lead_pkey on lead l  (cost=0.00..5.43 rows=1 width=23)
                Index Cond: (l.id = ml.lead_id)
          ->  Index Scan using lead_reporting_street_address_id_idx on lead_reporting_street address_street  (cost=0.00..4.29 rows=1 width=24)
              Index Cond: (address_street.address_id = address.id)
              Filter: (address_street.pos = 0)
        ->  Index Scan using lead_reporting_data_pkey on lead_reporting_data lrd  (cost=0.00..4.79 rows=1 width=71)
            Index Cond: (lrd.lead_id = ml.lead_id)
      ->  Hash  (cost=85837.99..85837.99 rows=1459164 width=23)
        ->  Seq Scan on lead_reporting_list_data email  (cost=0.00..85837.99 rows=1459164 width=23)
            Filter: ((list_type = 'e'::bpchar) AND (pos = 0))
    ->  Hash  (cost=85837.99..85837.99 rows=1854357 width=23)
      ->  Seq Scan on lead_reporting_list_data phone  (cost=0.00..85837.99 rows=1854357 width=23)
          Filter: ((list_type = 'p'::bpchar) AND (pos = 0))
(47 rows)

TopMemoryContext: 154528 total in 18 blocks; 5240 free (20 chunks); 149288 used
TopTransactionContext: 8192 total in 1 blocks; 7624 free (0 chunks); 568 used
PL/PgSQL function context: 8192 total in 1 blocks; 5512 free (4 chunks); 2680 used
PL/PgSQL function context: 57344 total in 3 blocks; 42720 free (11 chunks); 14624 used
PL/PgSQL function context: 8192 total in 1 blocks; 4648 free (7 chunks); 3544 used
Rendezvous variable hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
PLpgSQL function cache: 24520 total in 2 blocks; 3744 free (0 chunks); 20776 used
Record information cache: 24576 total in 2 blocks; 15984 free (5 chunks); 8592 used
CFuncHash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
RI compare cache: 24576 total in 2 blocks; 15984 free (5 chunks); 8592 used
RI query cache: 24576 total in 2 blocks; 11888 free (5 chunks); 12688 used
Type information cache: 24576 total in 2 blocks; 11888 free (5 chunks); 12688 used
Operator class cache: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
Operator lookup cache: 24576 total in 2 blocks; 11888 free (5 chunks); 12688 used
Prepared Queries: 24576 total in 2 blocks; 15984 free (5 chunks); 8592 used
MessageContext: 8192 total in 1 blocks; 6976 free (0 chunks); 1216 used
smgr relation table: 24576 total in 2 blocks; 3584 free (4 chunks); 20992 used
TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0 chunks); 32 used
Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
PortalMemory: 8192 total in 1 blocks; 7888 free (0 chunks); 304 used
PortalHeapMemory: 1021952 total in 126 blocks; 11008 free (181 chunks); 1010944 used
ExecutorState: 4972936 total in 6 blocks; 38456 free (11 chunks); 4934480 used
HashTableContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
HashBatchContext: 148889648 total in 28 blocks; 2532544 free (5 chunks); 146357104 used
HashTableContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
HashBatchContext: 144695344 total in 27 blocks; 828368 free (12 chunks); 143866976 used
TIDBitmap: 50356272 total in 16 blocks; 7517392 free (41 chunks); 42838880 used
TupleSort: 142649352 total in 25 blocks; 92237584 free (1339345 chunks); 50411768 used
HashTableContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
HashBatchContext: 131072 total in 4 blocks; 20064 free (3 chunks); 111008 used
HashTableContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
HashBatchContext: 25419824 total in 13 blocks; 2650448 free (5 chunks); 22769376 used
TupleSort: 9953328 total in 11 blocks; 1277840 free (9 chunks); 8675488 used
TupleSort: 1138736 total in 8 blocks; 768 free (8 chunks); 1137968 used
ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
JoinRelHashTable: 8192 total in 1 blocks; 624 free (0 chunks); 7568 used
Relcache by OID: 24576 total in 2 blocks; 10752 free (3 chunks); 13824 used
CacheMemoryContext: 1341680 total in 21 blocks; 174632 free (83 chunks); 1167048 used
unnamed prepared statement: 516096 total in 6 blocks; 231360 free (2 chunks); 284736 used
CachedPlan: 3072 total in 2 blocks; 1160 free (0
chunks); 1912 used pg_statistic_relid_att_index: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used locks_pkey: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_list_data_value_lower_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_list_data_value_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_list_data_subtype_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_list_data_type_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_list_data_lead_id_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_list_data_pkey: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_street_street_lower_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_street_address_id_idx: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used lead_reporting_street_pkey: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_address_postal_code_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_address_state_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_address_city_lower_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_address_lead_id_idx: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used lead_reporting_address_pkey: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used mb_lead_sale_id_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used mb_lead_lead_id_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used mb_lead_pkey: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used mb_sale_dealer_region_area_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used mb_sale_dealer_code_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used mb_sale_sale_date_idx: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used mb_sale_type_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used mb_sale_model_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used mb_sale_dealer_region_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used mb_sale_dealer_area_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used mb_sale_vin_key: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used mb_sale_pkey: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used CachedPlan: 3072 total in 2 blocks; 648 free (0 chunks); 2424 used pg_toast_17933_index: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used config_xml_name_key: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used config_xml_pkey: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used CachedPlan: 1024 total in 1 blocks; 312 free (0 chunks); 712 used CachedPlanSource: 1024 total in 1 blocks; 56 free (0 chunks); 968 used SPI Plan: 1024 total in 1 blocks; 832 free (0 chunks); 192 used CachedPlan: 1024 total in 1 blocks; 336 free (0 chunks); 688 used CachedPlanSource: 1024 total in 1 blocks; 80 free (0 chunks); 944 used SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used CachedPlanSource: 1024 total in 1 blocks; 24 free (0 chunks); 1000 used SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used CachedPlan: 7168 total in 3 blocks; 3888 free (1 chunks); 3280 used CachedPlan: 7168 total in 3 blocks; 3920 free (0 chunks); 3248 used CachedPlanSource: 3072 total in 2 blocks; 888 free (0 chunks); 2184 used SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used CachedPlan: 7168 total in 3 blocks; 3704 free (1 chunks); 3464 used CachedPlanSource: 3072 
total in 2 blocks; 896 free (0 chunks); 2176 used SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used CachedPlan: 3072 total in 2 blocks; 1456 free (1 chunks); 1616 used CachedPlanSource: 3072 total in 2 blocks; 1664 free (0 chunks); 1408 used SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used CachedPlan: 1024 total in 1 blocks; 336 free (0 chunks); 688 used CachedPlanSource: 1024 total in 1 blocks; 80 free (0 chunks); 944 used SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used CachedPlan: 3072 total in 2 blocks; 312 free (0 chunks); 2760 used CachedPlanSource: 3072 total in 2 blocks; 1568 free (0 chunks); 1504 used SPI Plan: 1024 total in 1 blocks; 800 free (0 chunks); 224 used cl_event_first_response_pkey: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used CachedPlan: 1024 total in 1 blocks; 336 free (0 chunks); 688 used CachedPlanSource: 1024 total in 1 blocks; 80 free (0 chunks); 944 used SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used CachedPlan: 1024 total in 1 blocks; 336 free (0 chunks); 688 used CachedPlanSource: 1024 total in 1 blocks; 80 free (0 chunks); 944 used SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used CachedPlan: 3072 total in 2 blocks; 1728 free (0 chunks); 1344 used CachedPlanSource: 3072 total in 2 blocks; 1536 free (0 chunks); 1536 used SPI Plan: 1024 total in 1 blocks; 800 free (0 chunks); 224 used CachedPlan: 15360 total in 4 blocks; 7104 free (1 chunks); 8256 used CachedPlanSource: 7168 total in 3 blocks; 2976 free (2 chunks); 4192 used SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used CachedPlan: 3072 total in 2 blocks; 1568 free (0 chunks); 1504 used CachedPlanSource: 3072 total in 2 blocks; 1352 free (0 chunks); 1720 used SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used CachedPlan: 3072 total in 2 blocks; 368 free (0 chunks); 2704 used CachedPlanSource: 3072 total in 2 blocks; 40 free (0 chunks); 3032 used SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used CachedPlan: 7168 total in 3 blocks; 3704 free (1 chunks); 3464 used CachedPlanSource: 3072 total in 2 blocks; 896 free (0 chunks); 2176 used SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used cl_event_crm_idx: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used cl_event_lead_id_idx: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used cl_event_event_name_idx: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used cl_event_crm_lead_id_idx: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used cl_event_pkey: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used locality_data_postal_code_norm_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used locality_data_postal_code_key: 2048 total in 1 blocks; 392 free (0 chunks); 1656 used locality_data_pkey: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_meta_item_value_key_idx: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used lead_reporting_meta_item_key_idx: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used lead_reporting_meta_item_type_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_meta_lead_id_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_meta_pkey: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used CachedPlan: 1024 total in 1 blocks; 312 free (0 chunks); 712 used CachedPlanSource: 1024 total in 1 blocks; 56 free (0 chunks); 968 used SPI Plan: 1024 total in 1 blocks; 832 free (0 chunks); 192 used CachedPlan: 1024 total in 1 blocks; 
312 free (0 chunks); 712 used CachedPlanSource: 1024 total in 1 blocks; 56 free (0 chunks); 968 used SPI Plan: 1024 total in 1 blocks; 832 free (0 chunks); 192 used CachedPlanSource: 3072 total in 2 blocks; 1248 free (1 chunks); 1824 used SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used pg_toast_18404_index: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used pg_attrdef_adrelid_adnum_index: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used CachedPlan: 3072 total in 2 blocks; 936 free (0 chunks); 2136 used CachedPlanSource: 3072 total in 2 blocks; 1424 free (0 chunks); 1648 used SPI Plan: 1024 total in 1 blocks; 832 free (0 chunks); 192 used CachedPlanSource: 3072 total in 2 blocks; 360 free (0 chunks); 2712 used SPI Plan: 1024 total in 1 blocks; 800 free (0 chunks); 224 used CachedPlan: 7168 total in 3 blocks; 3888 free (1 chunks); 3280 used CachedPlanSource: 3072 total in 2 blocks; 360 free (0 chunks); 2712 used SPI Plan: 1024 total in 1 blocks; 800 free (0 chunks); 224 used CachedPlan: 7168 total in 3 blocks; 3704 free (1 chunks); 3464 used CachedPlanSource: 3072 total in 2 blocks; 896 free (0 chunks); 2176 used SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used region_pkey: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used pg_toast_18005_index: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used request_state_idx: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used lead_id_idx: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used lead_billing_contention_pkey: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used lead_reporting_data_delivery_meth_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_data_typeflags_idx: 2048 total in 1 blocks; 600 free (0 chunks); 1448 used lead_reporting_data_unique_dealer_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_data_unique_area_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_data_unique_region_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_data_unique_all_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_data_oem_subtype_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_data_oem_type_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_data_dealer_area_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_data_customer_lname_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_data_customer_fname_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_data_customer_postal_code_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_data_dealer_region_area_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_data_dealer_region_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_data_dealer_code_idx: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used lead_reporting_data_reporting_date_idx: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used lead_reporting_data_model_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_data_pkey: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used lead_reporting: 23552 total in 5 blocks; 7240 free (0 chunks); 16312 used lead_processing_state_step_idx: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used reporting_modified_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used reporting_last_processed_date_idx: 2048 total in 1 
blocks; 752 free (0 chunks); 1296 used reporting_date_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_created_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_processing_step_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_destination_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_modified_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_source_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_pkey: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used metadata_key_value_idx: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used metadata_pkey: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used dealer_metadata_pkey: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used xml_schema_pkey: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used ehcache_expiration_idx: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used ehcache_pkey: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used external_system_user_name_key: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used external_system_name_key: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used external_system_pkey: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used dealer_external_system_id_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used dealer_dealer_code_key: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used dealer_pkey: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used CachedPlan: 1024 total in 1 blocks; 816 free (0 chunks); 208 used CachedPlanSource: 1024 total in 1 blocks; 664 free (1 chunks); 360 used xslt_style_sheet_pkey: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used pg_index_indrelid_index: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used CachedPlan: 1024 total in 1 blocks; 816 free (0 chunks); 208 used CachedPlanSource: 1024 total in 1 blocks; 664 free (1 chunks); 360 used pg_ts_dict_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used pg_aggregate_fnoid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used pg_language_name_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used pg_ts_dict_dictname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used pg_namespace_nspname_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used pg_opfamily_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used pg_opclass_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used pg_ts_parser_prsname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used pg_amop_fam_strat_index: 3072 total in 2 blocks; 1384 free (2 chunks); 1688 used pg_opclass_am_name_nsp_index: 3072 total in 2 blocks; 1576 free (3 chunks); 1496 used pg_trigger_tgrelid_tgname_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used pg_cast_source_target_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used pg_auth_members_role_member_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used pg_attribute_relid_attnum_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used pg_ts_config_cfgname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used pg_authid_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used pg_ts_config_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used pg_conversion_default_index: 3072 total in 2 blocks; 1432 free (3 chunks); 1640 used pg_language_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used pg_enum_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 
1328 used pg_proc_proname_args_nsp_index: 3072 total in 2 blocks; 1576 free (3 chunks); 1496 used pg_ts_parser_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used pg_database_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used pg_conversion_name_nsp_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used pg_class_relname_nsp_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used pg_attribute_relid_attnam_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used pg_class_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used pg_amproc_fam_proc_index: 3072 total in 2 blocks; 1384 free (2 chunks); 1688 used pg_operator_oprname_l_r_n_index: 3072 total in 2 blocks; 1384 free (2 chunks); 1688 used pg_index_indexrelid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used pg_type_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used pg_rewrite_rel_rulename_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used pg_authid_rolname_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used pg_auth_members_member_role_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used pg_enum_typid_label_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used pg_constraint_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used pg_conversion_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used pg_ts_template_tmplname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used pg_ts_config_map_index: 3072 total in 2 blocks; 1624 free (3 chunks); 1448 used pg_namespace_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used pg_type_typname_nsp_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used pg_operator_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used pg_amop_opr_fam_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used pg_proc_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used pg_opfamily_am_name_nsp_index: 3072 total in 2 blocks; 1624 free (3 chunks); 1448 used pg_ts_template_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used MdSmgr: 24576 total in 2 blocks; 16192 free (0 chunks); 8384 used LOCALLOCK hash: 24576 total in 2 blocks; 11872 free (5 chunks); 12704 used Timezones: 53584 total in 2 blocks; 3744 free (0 chunks); 49840 used ErrorContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used 2009-02-08 20:11:27 CST lms_nna ERROR: out of memory 2009-02-08 20:11:27 CST lms_nna DETAIL: Failed on request of size 243.
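For readers comparing these numbers against the settings quoted at the top of the post, the values the server is actually running with can be read back from any live session. A minimal sketch using the standard pg_settings view (nothing here is specific to this installation):

    -- Show the memory-related settings in effect, with their units.
    SELECT name, setting, unit
    FROM pg_settings
    WHERE name IN ('shared_buffers', 'temp_buffers', 'work_mem',
                   'maintenance_work_mem', 'max_stack_depth', 'max_connections');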
* Matt Magoffin (postgresql.org@msqr.us) wrote:
> We've been having persistent out-of-memory errors in our production
> 8.3 deployment, which is now running 8.3.5. I'm not sure the query here is
> the cause of the problem, but this is our most recent example which
> triggered an out-of-memory error for us.

erm.. How much memory do you have in the system?

> Perhaps our configuration needs tweaking somewhere...
>
> shared_buffers = 4000MB

I hope you've got a fair bit more than 4G of memory if you're going to
use 4G for shared buffers... Once that memory is dedicated to shared
buffers it's not going to be available for other usage.

> #temp_buffers = 8MB
> work_mem = 128MB
> maintenance_work_mem = 256MB
> #max_stack_depth = 2MB

Those are bumped up from the defaults, but don't look too bad to me.
Do realize that 'work_mem' is the amount PG is allowed to use for each
action and that complex queries can end up using many times that amount
(but I doubt that's the real issue here..).

Thanks,

Stephen
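For illustration, the per-operation nature of work_mem that Stephen describes can be reproduced with a toy sort; this sketch uses a generated table rather than anything from the poster's schema:

    -- work_mem is a per-sort/per-hash budget, not a per-query cap.
    SET work_mem = '1MB';
    EXPLAIN ANALYZE
    SELECT g, random() AS r FROM generate_series(1, 500000) AS g ORDER BY 2;
    -- With 1MB the Sort node reports an external (on-disk) merge; with a
    -- larger setting the same sort runs in memory.  A plan containing
    -- several Sort/Hash nodes can use up to work_mem for each of them.
    SET work_mem = '64MB';
    EXPLAIN ANALYZE
    SELECT g, random() AS r FROM generate_series(1, 500000) AS g ORDER BY 2;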
> erm.. How much memory do you have in the system?

This system has 16GB of RAM, and Postgres is basically the only service
running on the box.

>> shared_buffers = 4000MB
>
> I hope you've got a fair bit more than 4G of memory if you're going to
> use 4G for shared buffers... Once that memory is dedicated to shared
> buffers it's not going to be available for other usage.

Yep, we've got 16GB to work with here. I should have also mentioned the
architecture in my original post, sorry. SELECT version() returns this:

PostgreSQL 8.3.5 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC)
4.1.2 20070626 (Red Hat 4.1.2-14)

Regards,

Matt
* Matt Magoffin (postgresql.org@msqr.us) wrote:
> Yep, we've got 16GB to work with here. I should have also mentioned the
> architecture in my original post, sorry. SELECT version() returns this:
>
> PostgreSQL 8.3.5 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC)
> 4.1.2 20070626 (Red Hat 4.1.2-14)

Does the result from 'free' look reasonable on this box? How many PG
processes are running, generally? Do you see the PG process running this
query growing to fill the available memory? That query really didn't look
like it was chewing up all that much memory to me..

Have you tried reducing your work_mem setting to see if that makes the
errors go away? It might cause a different plan which might be less
efficient for some queries, not sure if you have anything which would be
dramatically affected by that.. Do the row counts in the plan match up
with what you'd expect? Can you provide output from 'explain analyze'
on a successful run?

Thanks,

Stephen
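A sketch of how the suggested test can be run without touching postgresql.conf, so only the current session is affected (the trivial SELECT is just a placeholder for the full query posted above):

    BEGIN;
    SET LOCAL work_mem = '2MB';   -- session/transaction local only
    EXPLAIN ANALYZE SELECT 1;     -- substitute the full SELECT from the first post
    COMMIT;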
>> PostgreSQL 8.3.5 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC)
>> 4.1.2 20070626 (Red Hat 4.1.2-14)
>
> Does the result from 'free' look reasonable on this box?

I think so:

             total       used       free     shared    buffers     cached
Mem:      16432296   16273964     158332          0     173536   14321340
-/+ buffers/cache:    1779088   14653208
Swap:      2096440        560    2095880

> How many PG processes are running, generally?

I see about 90 currently, of which 24 are Slony connections and the rest
are client connections.

> Do you see the PG process running this query growing to fill the
> available memory? That query really didn't look like it was chewing up
> all that much memory to me..

Just running top, it does appear to chew through a fair amount of memory.
Here's a snapshot from top of the postgres process running this query
from just before it ran out of memory:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 4486 postgres  18   0 4576m 3.6g 3.3g R   90 23.1   0:34.23 postgres: postgres lms_nna [local] EXPLAIN

These values did start out low, for example the RES memory started in the
130MB range, then climbed to the 3.6GB you see here.

> Have you tried reducing your work_mem setting to see if that makes the
> errors go away? It might cause a different plan which might be less
> efficient for some queries, not sure if you have anything which would be
> dramatically affected by that.. Do the row counts in the plan match up
> with what you'd expect? Can you provide output from 'explain analyze'
> on a successful run?

I set the work_mem to 2MB, and the query does actually complete (explain
analyze output below), so does this mean that the query simply uses too
much memory for sorting/joining, and we'd have to either allocate enough
work_mem to allow the query to complete, or a smaller work_mem as shown
here to make the query use slower disk-based sorting? The row counts are
matching what we'd expect from this query.
                                                              QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Sort  (cost=2345301.47..2345402.31 rows=40336 width=234) (actual time=203429.526..203599.840 rows=49317 loops=1)
  Sort Key: s.sale_date, s.vin
  Sort Method: external merge  Disk: 4776kB
  ->  Hash Left Join  (cost=1469244.28..2337665.81 rows=40336 width=234) (actual time=162362.791..202883.728 rows=49317 loops=1)
      Hash Cond: (ml.lead_id = phone.lead_id)
    ->  Hash Left Join  (cost=1349360.82..2159151.44 rows=40336 width=219) (actual time=155499.666..194941.423 rows=49317 loops=1)
        Hash Cond: (ml.lead_id = email.lead_id)
      ->  Nested Loop Left Join  (cost=1236733.28..1999980.90 rows=40336 width=204) (actual time=141890.479..186344.717 rows=49317 loops=1)
        ->  Nested Loop Left Join  (cost=1236733.28..1806406.02 rows=40336 width=141) (actual time=141890.419..166782.070 rows=49317 loops=1)
          ->  Nested Loop Left Join  (cost=1236733.28..1586899.03 rows=40336 width=126) (actual time=141890.368..166124.253 rows=49317 loops=1)
            ->  Nested Loop Left Join  (cost=1236733.28..1413543.83 rows=40336 width=118) (actual time=141890.281..156284.612 rows=49317 loops=1)
              ->  Merge Left Join  (cost=1236733.28..1237778.33 rows=40336 width=89) (actual time=141890.184..143717.900 rows=49317 loops=1)
                  Merge Cond: (ml.lead_id = lrdm.lead_id)
                ->  Sort  (cost=43356.21..43457.05 rows=40336 width=78) (actual time=1722.385..1794.668 rows=49317 loops=1)
                    Sort Key: ml.lead_id
                    Sort Method: external merge  Disk: 4152kB
                  ->  Hash Left Join  (cost=11354.33..38475.05 rows=40336 width=78) (actual time=919.319..1578.556 rows=49317 loops=1)
                      Hash Cond: ((s.dealer_code)::text = (d.dealer_code)::text)
                    ->  Hash Left Join  (cost=11018.04..37584.13 rows=40336 width=60) (actual time=914.936..1445.926 rows=49317 loops=1)
                        Hash Cond: (s.id = ml.sale_id)
                        Filter: (((s.sale_type = 'd'::bpchar) AND (NOT ml.lost_sale)) OR ((s.sale_type = 'c'::bpchar) AND (ml.lead_pos = 0)) OR (s.sale_type = '0'::bpchar))
                      ->  Index Scan using mb_sale_sale_date_idx on mb_sale s  (cost=0.00..14027.94 rows=43804 width=50) (actual time=0.149..126.835 rows=49152 loops=1)
                          Index Cond: ((sale_date >= '2009-01-01'::date) AND (sale_date <= '2009-01-31'::date))
                      ->  Hash  (cost=5577.35..5577.35 rows=281335 width=26) (actual time=914.089..914.089 rows=281335 loops=1)
                        ->  Seq Scan on mb_lead ml  (cost=0.00..5577.35 rows=281335 width=26) (actual time=0.026..394.384 rows=281335 loops=1)
                    ->  Hash  (cost=321.13..321.13 rows=1213 width=23) (actual time=4.343..4.343 rows=1213 loops=1)
                      ->  Seq Scan on dealer d  (cost=0.00..321.13 rows=1213 width=23) (actual time=0.013..2.451 rows=1213 loops=1)
                ->  Materialize  (cost=1193377.08..1194985.94 rows=128709 width=19) (actual time=135323.691..140326.709 rows=1442527 loops=1)
                  ->  Sort  (cost=1193377.08..1193698.85 rows=128709 width=19) (actual time=135323.681..137742.900 rows=1442488 loops=1)
                      Sort Key: lrdm.lead_id
                      Sort Method: external merge  Disk: 50312kB
                    ->  Bitmap Heap Scan on lead_reporting_meta lrdm  (cost=118695.77..1179811.20 rows=128709 width=19) (actual time=90238.108..126737.875 rows=1460904 loops=1)
                        Recheck Cond: (item_key = '[ALS:prospectid]TrackingCode'::text)
                        Filter: (pos = 1)
                      ->  Bitmap Index Scan on lead_reporting_meta_item_key_idx  (cost=0.00..118663.60 rows=1476580 width=0) (actual time=90223.734..90223.734 rows=1476483 loops=1)
                          Index Cond: (item_key = '[ALS:prospectid]TrackingCode'::text)
              ->  Index Scan using lead_reporting_address_lead_id_idx on lead_reporting_address address  (cost=0.00..4.35 rows=1 width=37) (actual time=0.251..0.252 rows=0 loops=49317)
                  Index Cond: (address.lead_id = ml.lead_id)
            ->  Index Scan using lead_reporting_street_address_id_idx on lead_reporting_street address_street  (cost=0.00..4.29 rows=1 width=24) (actual time=0.196..0.197 rows=0 loops=49317)
                Index Cond: (address_street.address_id = address.id)
                Filter: (address_street.pos = 0)
          ->  Index Scan using lead_pkey on lead l  (cost=0.00..5.43 rows=1 width=23) (actual time=0.010..0.010 rows=0 loops=49317)
              Index Cond: (l.id = ml.lead_id)
        ->  Index Scan using lead_reporting_data_pkey on lead_reporting_data lrd  (cost=0.00..4.79 rows=1 width=71) (actual time=0.393..0.393 rows=0 loops=49317)
            Index Cond: (lrd.lead_id = ml.lead_id)
      ->  Hash  (cost=85837.99..85837.99 rows=1459164 width=23) (actual time=7607.947..7607.947 rows=1521294 loops=1)
        ->  Seq Scan on lead_reporting_list_data email  (cost=0.00..85837.99 rows=1459164 width=23) (actual time=0.030..5063.069 rows=1521294 loops=1)
            Filter: ((list_type = 'e'::bpchar) AND (pos = 0))
    ->  Hash  (cost=85837.99..85837.99 rows=1854357 width=23) (actual time=6863.050..6863.050 rows=1803678 loops=1)
      ->  Seq Scan on lead_reporting_list_data phone  (cost=0.00..85837.99 rows=1854357 width=23) (actual time=0.020..3920.096 rows=1803678 loops=1)
          Filter: ((list_type = 'p'::bpchar) AND (pos = 0))
Total runtime: 203657.960 ms
(52 rows)
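The "external merge  Disk: ..." lines above are where sorts spilled to temp files. A sketch of one way to make those spills visible in routine operation, using the standard 8.3 log_temp_files setting (changing it requires superuser):

    -- 0 means "log every temporary file, with its size, when it is removed";
    -- -1 (the default) disables the logging.
    SET log_temp_files = 0;
    -- ... run the query; each spill is then recorded in the server log
    -- with its size, which is where figures like the 50MB merge come from.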
* Matt Magoffin (postgresql.org@msqr.us) wrote:
> > Does the result from 'free' look reasonable on this box?
>
> I think so:
>
>              total       used       free     shared    buffers     cached
> Mem:      16432296   16273964     158332          0     173536   14321340
> -/+ buffers/cache:    1779088   14653208
> Swap:      2096440        560    2095880

That certainly looks fine.. And you've got 14G or so which should be
available for this query. Was this near the time the query was running?
Could you give us what 'free' returns when the query is close to the
out-of-memory error? I'd expect the 2nd row under 'free' to be getting
low for the allocation to fail.

> Just running top, it does appear to chew through a fair amount of memory.
> Here's a snapshot from top of the postgres process running this query
> from just before it ran out of memory:
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  4486 postgres  18   0 4576m 3.6g 3.3g R   90 23.1   0:34.23 postgres: postgres lms_nna [local] EXPLAIN
>
> These values did start out low, for example the RES memory started in the
> 130MB range, then climbed to the 3.6GB you see here.

Uhh.. I saw that your system was 64-bit, but is your PG process
compiled as 64bit? Maybe you're hitting an artificial 32-bit limit,
which isn't exactly helped by your shared_buffers being set up so high
to begin with? Run 'file' on your postgres binary, like so:

sfrost@snowman:/home/sfrost> file /usr/lib/postgresql/8.3/bin/postgres
/usr/lib/postgresql/8.3/bin/postgres: ELF 64-bit LSB executable, x86-64,
version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux
2.6.8, stripped

(that's on Debian, your postgres binary will probably be somewhere else).

> I set the work_mem to 2MB, and the query does actually complete (explain
> analyze output below), so does this mean that the query simply uses too
> much memory for sorting/joining, and we'd have to either allocate enough
> work_mem to allow the query to complete, or a smaller work_mem as shown
> here to make the query use slower disk-based sorting? The row counts are
> matching what we'd expect from this query.

Allocating more work_mem won't help in this situation. PG's out-of-memory
error is only ever caused by an actual allocation failure. The work_mem,
to PG, is more of a guideline than any kind of hard limit. Based on the
explain analyze, I continue to feel that your query isn't actually all
that big in terms of memory usage (the disk-based sorting taking place
was in the 50M range or so, from what I saw, and you should have plenty
of memory for such a query..).

Check your binary.. I suspect that's where your issue is. Now, I don't
recall offhand if you can just recompile-in-place or if you need to do a
dump/restore, but I'd definitely do a backup just in case and keep it
handy (or maybe use it just to be safe) and see if PG complains when it's
started up.

Thanks,

Stephen
> * Matt Magoffin (postgresql.org@msqr.us) wrote:
>> Just running top, it does appear to chew through a fair amount of memory.
>> Here's a snapshot from top of the postgres process running this query
>> from just before it ran out of memory:
>>
>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>  4486 postgres  18   0 4576m 3.6g 3.3g R   90 23.1   0:34.23 postgres: postgres lms_nna [local] EXPLAIN
>>
>> These values did start out low, for example the RES memory started in the
>> 130MB range, then climbed to the 3.6GB you see here.

That is almost certainly meaningless; it just reflects the process
touching a larger and larger fraction of shared buffers over its
existence. The number to pay attention to is the non-shared memory size
(VIRT - SHR is probably the right number here).

Stephen Frost <sfrost@snowman.net> writes:
> Uhh.. I saw that your system was 64-bit, but is your PG process
> compiled as 64bit? Maybe you're hitting an artificial 32-bit limit,
> which isn't exactly helped by your shared_buffers being set up so high
> to begin with? Run 'file' on your postgres binary, like so:

I think it must be compiled 64-bit, or he'd not be able to get
shared_buffers that high to start with. However, it's possible that the
postmaster's been started under a ulimit setting that constrains each
backend to just a few hundred meg of per-process memory.

			regards, tom lane
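A sketch of one way to look at the "VIRT minus SHR" figure Tom mentions without relying on top, by reading the kernel's per-process accounting (4486 is the backend PID from the top snapshot earlier in the thread; substitute your own):

    # Private vs. mostly-shared memory of one backend.
    grep -E '^Vm(Peak|Size|RSS|Data|Stk)' /proc/4486/status
    # VmData (plus stack) roughly tracks memory the backend has allocated
    # privately, while most of VmRSS here is just shared_buffers pages the
    # process happens to have touched.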
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > Uhh.. I saw that your system was 64-bit, but is your PG process
> > compiled as 64bit? Maybe you're hitting an artificial 32-bit limit,
> > which isn't exactly helped by your shared_buffers being set up so high
> > to begin with? Run 'file' on your postgres binary, like so:
>
> I think it must be compiled 64-bit, or he'd not be able to get
> shared_buffers that high to start with. However, it's possible that the
> postmaster's been started under a ulimit setting that constrains each
> backend to just a few hundred meg of per-process memory.

I'm not so sure.. He has it as '4000MB', which would leave 96M free,
which doesn't seem that far from where his query is ending up at based
on the memory info he provided and the size of the sorts being done.
The ulimit is certainly something else to check tho.

Thanks,

Stephen
>>              total       used       free     shared    buffers     cached
>> Mem:      16432296   16273964     158332          0     173536   14321340
>> -/+ buffers/cache:    1779088   14653208
>> Swap:      2096440        560    2095880
>
> That certainly looks fine.. And you've got 14G or so which should be
> available for this query. Was this near the time the query was running?
> Could you give us what 'free' returns when the query is close to the
> out-of-memory error? I'd expect the 2nd row under 'free' to be getting
> low for the allocation to fail.

It was near the time... here's free just before executing the query:

             total       used       free     shared    buffers     cached
Mem:      16432296   16342056      90240          0     116868   14561200
-/+ buffers/cache:    1663988   14768308
Swap:      2096440        448    2095992

And then here is free close to the time the query runs out of memory (but
while the query is still running):

             total       used       free     shared    buffers     cached
Mem:      16432296   16348260      84036          0      41344   14167384
-/+ buffers/cache:    2139532   14292764
Swap:      2096440        448    2095992

> Uhh.. I saw that your system was 64-bit, but is your PG process
> compiled as 64bit? Maybe you're hitting an artificial 32-bit limit,
> which isn't exactly helped by your shared_buffers being set up so high
> to begin with?

Yes, it's compiled as 64-bit:

/opt/lms/postgres-8.3_64/bin/postgres: ELF 64-bit LSB executable, AMD
x86-64, version 1 (SYSV), for GNU/Linux 2.6.9, dynamically linked (uses
shared libs), for GNU/Linux 2.6.9, not stripped

We've had nagging memory-related issues with 8.3 that manifest themselves
like memory leaks... some posts I've made in the past have led to some
leaks getting fixed... but I've not been able to track down more specific
causes. It's just that over time Postgres seems to accumulate memory and
then we start to see out-of-memory errors like this one. Again, this query
could have nothing to do with the root cause, but this is just the example
I have to work with at the moment.

Regards,

Matt
> I think it must be compiled 64-bit, or he'd not be able to get
> shared_buffers that high to start with. However, it's possible that the
> postmaster's been started under a ulimit setting that constrains each
> backend to just a few hundred meg of per-process memory.

Here's the output of ulimit -a by the "postgres" user the database is
running under:

[postgres@170226-db7 ~]$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
max nice                        (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 139264
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
max rt priority                 (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 139264
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

I think this means it does not have an artificial memory limit imposed,
but is there a specific setting beyond these I could check, do you think?

Regards,

Matt
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> I think it must be compiled 64-bit, or he'd not be able to get
>> shared_buffers that high to start with.

> I'm not so sure.. He has it as '4000MB', which would leave 96M free,
> which doesn't seem that far from where his query is ending up at based
> on the memory info he provided and the size of the sorts being done.
> The ulimit is certainly something else to check tho.

Hmm ... a gig here, a gig there, pretty soon you're talking about real
memory? He's got several sorts and hashes that are each taking over
100MB according to the memory context dump, so it seems impossible that
it all fits into a strict 32-bit address space. There's surely no harm
in double-checking the executable's file type though.

			regards, tom lane
> Hmm ... a gig here, a gig there, pretty soon you're talking about real
> memory? He's got several sorts and hashes that are each taking over
> 100MB according to the memory context dump, so it seems impossible that
> it all fits into a strict 32-bit address space. There's surely no harm
> in double-checking the executable's file type though.

I did reply to this in a different email in this thread, but just to be
safe, yes, I did verify Postgres is a 64-bit binary:

[postgres@170226-db7 ~]$ file /opt/lms/postgres-8.3_64/bin/postgres
/opt/lms/postgres-8.3_64/bin/postgres: ELF 64-bit LSB executable, AMD
x86-64, version 1 (SYSV), for GNU/Linux 2.6.9, dynamically linked (uses
shared libs), for GNU/Linux 2.6.9, not stripped

Regards,

Matt
"Matt Magoffin" <postgresql.org@msqr.us> writes: > Here's the output of ulimit -a by the "postgres" user the database is > running under: > ... > I think this means it does not have an artificial memory limit imposed, Agreed, that ulimit isn't reflecting any such limit, but is that really the same environment the postmaster gets started in? I wouldn't trust a system startup script to be launched in the same environment that a login shell gets. You might try adding ulimit -a >/tmp/something to the startup script to find out for sure. regards, tom lane
* Matt Magoffin (postgresql.org@msqr.us) wrote:
> > I think it must be compiled 64-bit, or he'd not be able to get
> > shared_buffers that high to start with. However, it's possible that the
> > postmaster's been started under a ulimit setting that constrains each
> > backend to just a few hundred meg of per-process memory.
>
> Here's the output of ulimit -a by the "postgres" user the database is
> running under:
[...]
> I think this means it does not have an artificial memory limit imposed,
> but is there a specific setting beyond these I could check, do you think?

How about cat /proc/<pid>/limits for the postmaster?

And maybe:
	status
	stat
	maps

Though I'm kinda grasping at straws here, to be honest. I've had PG up
and running through >16G of memory at a time before.

Thanks,

Stephen
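A sketch of the /proc checks Stephen lists, for a single process. The PID is a placeholder; the postmaster's PID can be taken from the first line of postmaster.pid under the data directory, and /proc/<pid>/limits may not exist on older kernels:

    PGPID=4698    # placeholder; e.g. PGPID=$(head -1 /var/lib/pgsql/data/postmaster.pid)
    cat /proc/$PGPID/limits 2>/dev/null || echo "no limits file on this kernel"
    cat /proc/$PGPID/status
    cat /proc/$PGPID/stat
    cat /proc/$PGPID/maps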
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Hmm ... a gig here, a gig there, pretty soon you're talking about real
> memory? He's got several sorts and hashes that are each taking over
> 100MB according to the memory context dump, so it seems impossible that
> it all fits into a strict 32-bit address space. There's surely no harm
> in double-checking the executable's file type though.

Argh, it's late and I'm not thinking. Yeah, I had seen the 100M hashes
and just wasn't thinking clearly. Sorry for the noise.

Stephen
"Matt Magoffin" <postgresql.org@msqr.us> writes: > We've had nagging memory-related issues with 8.3 that manifest themselves > like memory leaks... some posts I've made in the past have led to some > leaks getting fixed... but I've not been able to track down more specific > causes. It's just that over time Postgres seems to accumulate memory and > then we start to see out-of-memory errors like this one. Again, this query > could have nothing to do with the root cause, but this is just the example > I have to work with at the moment. Yeah, I remember we found a few xml-related leaks based on your reports. However, there's not anything here to suggest that this query is straining the capabilities of a 64-bit system with lots o RAM. It seems certain you're hitting some artificial process-size limit, and the only one I know about is ulimit. I wasn't aware of /proc/<pid>/limits before, but now that I've heard of it, checking that for the postmaster and/or a backend seems like a great idea. regards, tom lane
> How about cat /proc/<pid>/limits for the postmaster? > And maybe: > status > stat > maps > > Though I'm kinda grasping at straws here, to be honest. I've had PG up > and running through >16G of memory at a time before. There is no /prod/<pid>/limits file, but here are status: Name: postmaster State: S (sleeping) SleepAVG: 95% Tgid: 4698 Pid: 4698 PPid: 1 TracerPid: 0 Uid: 26 26 26 26 Gid: 26 26 26 26 FDSize: 1024 Groups: 26 VmPeak: 4301216 kB VmSize: 4301204 kB VmLck: 0 kB VmHWM: 85980 kB VmRSS: 85980 kB VmData: 940 kB VmStk: 84 kB VmExe: 4480 kB VmLib: 3512 kB VmPTE: 284 kB StaBrk: 00ade000 kB Brk: 0c3db000 kB StaStk: 7fff21fecf40 kB Threads: 1 SigQ: 0/139264 SigPnd: 0000000000000000 ShdPnd: 0000000000000000 SigBlk: 0000000000000000 SigIgn: 0000000001303000 SigCgt: 0000000000014a07 CapInh: 0000000000000000 CapPrm: 0000000000000000 CapEff: 0000000000000000 Cpus_allowed: 7fffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff Mems_allowed: 00000000,00000001 stat: 4698 (postmaster) S 1 4686 4686 0 -1 4202496 27092033 7777059343 31 2809 6684 47834 13037706 1828097 15 0 1 0 9296 4404432896 21495 18446744073709551615 4194304 8778268 140733763735360 18446744073709551615 232627404115 0 0 19935232 84487 0 0 0 17 2 0 0 69 maps: 00400000-00860000 r-xp 00000000 08:03 1120168 /opt/lms/64/postgres/8.3/bin/postgres 00a5f000-00a6b000 rw-p 0045f000 08:03 1120168 /opt/lms/64/postgres/8.3/bin/postgres 00a6b000-00ade000 rw-p 00a6b000 00:00 0 0c39c000-0c3db000 rw-p 0c39c000 00:00 0 3012200000-3012234000 r-xp 00000000 08:07 5237058 /usr/lib64/libxslt.so.1.1.17 3012234000-3012433000 ---p 00034000 08:07 5237058 /usr/lib64/libxslt.so.1.1.17 3012433000-3012435000 rw-p 00033000 08:07 5237058 /usr/lib64/libxslt.so.1.1.17 3012a00000-3012a05000 r-xp 00000000 08:07 457029 /lib64/libcrypt-2.5.so 3012a05000-3012c04000 ---p 00005000 08:07 457029 /lib64/libcrypt-2.5.so 3012c04000-3012c05000 r--p 00004000 08:07 457029 /lib64/libcrypt-2.5.so 3012c05000-3012c06000 rw-p 00005000 08:07 457029 /lib64/libcrypt-2.5.so 3012c06000-3012c34000 rw-p 3012c06000 00:00 0 3629600000-362961a000 r-xp 00000000 08:07 457227 /lib64/ld-2.5.so 3629819000-362981a000 r--p 00019000 08:07 457227 /lib64/ld-2.5.so 362981a000-362981b000 rw-p 0001a000 08:07 457227 /lib64/ld-2.5.so 3629a00000-3629b46000 r-xp 00000000 08:07 457228 /lib64/libc-2.5.so 3629b46000-3629d46000 ---p 00146000 08:07 457228 /lib64/libc-2.5.so 3629d46000-3629d4a000 r--p 00146000 08:07 457228 /lib64/libc-2.5.so 3629d4a000-3629d4b000 rw-p 0014a000 08:07 457228 /lib64/libc-2.5.so 3629d4b000-3629d50000 rw-p 3629d4b000 00:00 0 3629e00000-3629e02000 r-xp 00000000 08:07 457229 /lib64/libdl-2.5.so 3629e02000-362a002000 ---p 00002000 08:07 457229 /lib64/libdl-2.5.so 362a002000-362a003000 r--p 00002000 08:07 457229 /lib64/libdl-2.5.so 362a003000-362a004000 rw-p 00003000 08:07 457229 /lib64/libdl-2.5.so 362aa00000-362aa82000 r-xp 00000000 08:07 457235 /lib64/libm-2.5.so 362aa82000-362ac81000 ---p 00082000 08:07 457235 /lib64/libm-2.5.so 362ac81000-362ac82000 r--p 00081000 08:07 457235 /lib64/libm-2.5.so 362ac82000-362ac83000 rw-p 00082000 08:07 457235 /lib64/libm-2.5.so 362b600000-362b614000 r-xp 00000000 08:07 5231431 /usr/lib64/libz.so.1.2.3 362b614000-362b813000 ---p 00014000 08:07 5231431 /usr/lib64/libz.so.1.2.3 362b813000-362b814000 rw-p 00013000 08:07 5231431 /usr/lib64/libz.so.1.2.3 362ba00000-362bb33000 r-xp 00000000 08:07 5235374 /usr/lib64/libxml2.so.2.6.26 362bb33000-362bd33000 ---p 00133000 08:07 5235374 /usr/lib64/libxml2.so.2.6.26 362bd33000-362bd3c000 rw-p 
00133000 08:07 5235374 /usr/lib64/libxml2.so.2.6.26 362bd3c000-362bd3d000 rw-p 362bd3c000 00:00 0 2aaaaaaab000-2aaaaaaac000 rw-p 2aaaaaaab000 00:00 0 2aaaaaabd000-2aaaaaac1000 rw-p 2aaaaaabd000 00:00 0 2aaaaaac1000-2aaaaaacb000 r-xp 00000000 08:07 456988 /lib64/libnss_files-2.5.so 2aaaaaacb000-2aaaaacca000 ---p 0000a000 08:07 456988 /lib64/libnss_files-2.5.so 2aaaaacca000-2aaaaaccb000 r--p 00009000 08:07 456988 /lib64/libnss_files-2.5.so 2aaaaaccb000-2aaaaaccc000 rw-p 0000a000 08:07 456988 /lib64/libnss_files-2.5.so 2aaaaaccc000-2aaaae299000 r--p 00000000 08:07 5229018 /usr/lib/locale/locale-archive 2aaaae299000-2aabafc41000 rw-s 00000000 00:08 0 /SYSV0063da81 (deleted) 7fff21fda000-7fff21fef000 rw-p 7fff21fda000 00:00 0 [stack] ffffffffff600000-ffffffffffe00000 ---p 00000000 00:00 0 [vdso] Hope this helps, Matt
> Agreed, that ulimit isn't reflecting any such limit, but is that really
> the same environment the postmaster gets started in? I wouldn't trust
> a system startup script to be launched in the same environment that a
> login shell gets. You might try adding
> 	ulimit -a >/tmp/something
> to the startup script to find out for sure.

Our startup script is calling su -l postgres ... So I thought the limits
would be the same (from the -l flag)? I then tried to mimic this with the
following:

[root@170226-db7 ~]# su -l postgres -c "ulimit -a"
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
max nice                        (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 139264
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
max rt priority                 (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 139264
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

which I think should accurately reflect what the postmaster environment
sees.

Regards,

Matt
* Matt Magoffin (postgresql.org@msqr.us) wrote:
> There is no /prod/<pid>/limits file, but here are

erp, that stinks. Must be on an older kernel? I've got it under (a
Debian-built) 2.6.26. I can't recall if there's another way to get
limit info for an active process.. Could use Tom's suggestion of
echo'ing ulimit -a out to a file somewhere during database start-up.

Thanks,

Stephen
> Yeah, I remember we found a few xml-related leaks based on your reports.
> However, there's not anything here to suggest that this query is
> straining the capabilities of a 64-bit system with lots o RAM. It seems
> certain you're hitting some artificial process-size limit, and the only
> one I know about is ulimit.
>
> I wasn't aware of /proc/<pid>/limits before, but now that I've heard
> of it, checking that for the postmaster and/or a backend seems like
> a great idea.

This doesn't seem to exist for any process on this box:

[root@170226-db7 ~]# ls /proc/*/limit*
ls: /proc/*/limit*: No such file or directory

If this were a system-defined process-size limit, then should the query
still run out of memory after restarting Postgres? Most likely we'll have
to restart Postgres soon, and I'll retry this query after doing so. Based
on past experience, I'd expect the query to complete at that time.

From what we experience, Postgres seems to be slowly accumulating memory
in the fashion of a small memory leak, and things start to fail with
out-of-memory errors after the server has been running for some time
(e.g. roughly 4-6 weeks). Restarting Postgres clears out the problems
(after a restart we can immediately run queries that were failing before
the restart)... but then the cycle starts again. I just bring this up
wondering if there is something possibly accumulating within Postgres
that isn't getting freed and might cause an out-of-memory error like this
in some way.

Regards,

Matt
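A crude way to tell a slow, weeks-long accumulation apart from ordinary shared-buffer usage is to snapshot per-process memory on an interval and compare over time; a sketch (path and interval are arbitrary, and process sizes must be read with Tom's earlier shared-buffers caveat in mind):

    while true; do
        { date; ps -o pid,vsz,rss,etime,args -C postgres; echo; } >> /var/tmp/pg-mem.log
        sleep 3600
    done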
> erp, that stinks. Must be on an older kernel? I've got it under (a > Debian-built) 2.6.26. I can't recall if there's another way to get > limit info for an active process.. Could use Tom's suggestion of > echo'ing ulimit -a out to a file somewhere during database start-up. Yes, this is a RHEL 5 box with kernel 2.6.18... must not exist for this ancient kernel :-) Regards, Matt
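For anyone following along on a kernel new enough to have the file (it exists on Stephen's 2.6.26 but not this 2.6.18 box), the check would look roughly like this; the postmaster.pid path is an assumption about where the data directory lives:

# per-process limits for the running postmaster, where available
PGPID=$(head -1 /var/lib/pgsql/data/postmaster.pid)
cat /proc/$PGPID/limits    # shows max address space, max open files, etc.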
> [root@170226-db7 ~]# su -l postgres -c "ulimit -a" > core file size (blocks, -c) 0 > data seg size (kbytes, -d) unlimited > max nice (-e) 0 > file size (blocks, -f) unlimited > pending signals (-i) 139264 > max locked memory (kbytes, -l) 32 > max memory size (kbytes, -m) unlimited > open files (-n) 1024 > pipe size (512 bytes, -p) 8 > POSIX message queues (bytes, -q) 819200 > max rt priority (-r) 0 > stack size (kbytes, -s) 10240 > cpu time (seconds, -t) unlimited > max user processes (-u) 139264 > virtual memory (kbytes, -v) unlimited > file locks (-x) unlimited I just noticed something: the "open files" limit lists 1024, which is the default for this system. A quick count of open data files currently in use by Postgres returns almost 7000, though. [root@170226-db7 ~]# lsof -u postgres |egrep '(/pg_data|/pg_index|/pg_log)' |wc -l 6749 We have 100+ postgres processes running, so for an individual process, could the 1024 file limit be doing anything to this query? Or would I see an explicit error message regarding this condition? Regards, Matt
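Since the rlimit is per process, a per-backend count is more telling than an lsof total across every postgres process; something like the following, where 12345 stands in for one backend's PID:

# descriptors held by a single backend, rather than the whole postgres user
ls /proc/12345/fd | wc -l
# or the same count via lsof
lsof -p 12345 | wc -l

Note that Postgres also caps itself with max_files_per_process (1000 by default), so an individual backend normally stays just under the 1024 rlimit anyway.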
On Mon, Feb 9, 2009 at 8:23 AM, Matt Magoffin <postgresql.org@msqr.us> wrote:
> I just noticed something: the "open files" limit lists 1024, which is the
> default for this system. A quick count of open data files currently in use
> by Postgres returns almost 7000, though.
>
> [root@170226-db7 ~]# lsof -u postgres |egrep '(/pg_data|/pg_index|/pg_log)' |wc -l
> 6749
>
> We have 100+ postgres processes running, so for an individual process,
> could the 1024 file limit be doing anything to this query? Or would I see
> an explicit error message regarding this condition?

You would get an "Open files rlimit 1024 reached for uid xxxx" message in syslog (which you should check out anyhow).

I wonder if it isn't just another one of those 'this only happens on 64-bit machines' problems :)

-- GJ
>> We have 100+ postgres processes running, so for an individual process, >> could the 1024 file limit be doing anything to this query? Or would I >> see >> an explicit error message regarding this condition? > > you would get one of "Open files rlimit 1024 reached for uid xxxx" in > syslog (which you should checkout anyhow). No... nothing like this in syslog. Regards, Matt
On Mon, Feb 9, 2009 at 8:53 AM, Matt Magoffin <postgresql.org@msqr.us> wrote:
>>> We have 100+ postgres processes running, so for an individual process,
>>> could the 1024 file limit be doing anything to this query? Or would I
>>> see an explicit error message regarding this condition?
>>
>> you would get one of "Open files rlimit 1024 reached for uid xxxx" in
>> syslog (which you should check out anyhow).
>
> No... nothing like this in syslog.

So did the backend crash on this one, or did it just produce an 'out of memory' message?

-- GJ
Matt Magoffin wrote:
> We have 100+ postgres processes running, so for an individual process,
> could the 1024 file limit be doing anything to this query? Or would I see
> an explicit error message regarding this condition?

With 100 concurrent postgres connections, if they all did something requiring large amounts of work_mem, you could allocate 100 * 125MB (I believe that's what you said it was set to?), which is something like 12GB :-O

In fact, a single query that's doing multiple sorts of large datasets for a messy join (or other similar activity) can involve several instances of work_mem. Multiply that by 100 queries, and ouch.

Have you considered using a connection pool to reduce the postgres process count?
On Mon, Feb 9, 2009 at 2:17 AM, John R Pierce <pierce@hogranch.com> wrote: > Matt Magoffin wrote: >> >> We have 100+ postgres processes running, so for an individual process, >> could the 1024 file limit be doing anything to this query? Or would I see >> an explicit error message regarding this condition? >> >> > > with 100 concurrent postgres connections, if they all did something > requiring large amounts of work_mem, you could allocate 100 * 125MB (I > believe thats what you said it was set to?) which is like 12GB :-O > > in fact a single query thats doing multiple sorts of large datasets for a > messy join (or other similar activity) can involve several instances of > workmem. multiply that by 100 queries, and ouch. > > have you considered using a connection pool to reduce the postgres process > count? No matter what I am pretty conservative with work_mem for these reasons. Plus, I tested most of our queries and raising work_mem above 16Meg had no real positive effect on most queries. If I have a single reporting query that can use work_mem over that I set it and run that query by itself (from things like cron jobs) rather than just leaving work_mem really high. High work_mem is a bit of a foot gun.
"Matt Magoffin" <postgresql.org@msqr.us> writes: > I just noticed something: the "open files" limit lists 1024, which is the > default for this system. A quick count of open data files currently in use > by Postgres returns almost 7000, though. In principle, any problem with that ought to result in explicit error messages like "too many open files". In practice, what can happen is that Postgres can run the kernel out of file table slots (which are a finite system-wide resource) and then everything else on the box starts to fall over. Postgres itself will often survive just fine, since it has a preprogrammed response to ENFILE to close some other file it doesn't need right now and try again. Most other userland code will just croak if open() starts failing. Which is a longwinded way of saying "that doesn't seem to match the symptoms, but ..." If you're not dead certain that your kernel is configured to allow *well* north of 7000 open files, you might consider cutting max_files_per_process in half at your next restart. I am wondering if there could be some kind of memory-pressure effect inside the kernel proper that causes lots of open files to translate into unexpected denials of requests for more process memory. It's kind of grasping at straws, admittedly. But it seems like we've eliminated the possibility of a material memory leak inside the Postgres backend process (even if you don't trust the memory context dump on its own, the reported process VSIZE backs it up), and the other printout you showed said that there was hardly any usage of swap space, which seems to defeat the theory that there's system-wide memory pressure in the normal sense. OTOH ... wait a minute. Have you maybe got the system configured to start denying memory requests before it gets into significant swapping? We typically suggest setting vm.overcommit_memory=2 on Linux, but I'm not sure whether that results in the kernel trying to avoid using swap at all. Maybe this really is an issue with system-wide total memory use rather than any one process. regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > OTOH ... wait a minute. Have you maybe got the system configured to > start denying memory requests before it gets into significant swapping? > We typically suggest setting vm.overcommit_memory=2 on Linux, but > I'm not sure whether that results in the kernel trying to avoid using > swap at all. Maybe this really is an issue with system-wide total > memory use rather than any one process. Err, yea, overcommit_memory=2, a small amount of swap space, and a low overcommit_ratio could cause this to happen... The default ratio is 50 though, which should mean, on this system, there is about 10G available for user processes, but his usage shows only a bit over 2G being used outside of buffers/caches (based on free output).. Matt, can you provide the output from these: cat /proc/sys/vm/overcommit_memory cat /proc/sys/vm/overcommit_ratio cat /proc/meminfo ? Thanks, Stephen
> So did the backend crashed on this one, or just produced 'out of > memory ' message ? No crash, just the error message. -- m@
> Err, yea, overcommit_memory=2, a small amount of swap space, and a low
> overcommit_ratio could cause this to happen... The default
> ratio is 50 though, which should mean, on this system, there is about
> 10G available for user processes, but his usage shows only a bit over 2G
> being used outside of buffers/caches (based on free output)..
>
> Matt, can you provide the output from these:
> cat /proc/sys/vm/overcommit_memory
> cat /proc/sys/vm/overcommit_ratio
> cat /proc/meminfo

Sure, here you go:

[root@170226-db7 ~]# cat /proc/sys/vm/overcommit_memory
2
[root@170226-db7 ~]# cat /proc/sys/vm/overcommit_ratio
50
[root@170226-db7 ~]# cat /proc/meminfo
MemTotal: 16432296 kB
MemFree: 160188 kB
Buffers: 164764 kB
Cached: 14006456 kB
SwapCached: 0 kB
Active: 6984144 kB
Inactive: 8471456 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 16432296 kB
LowFree: 160188 kB
SwapTotal: 2096440 kB
SwapFree: 2095992 kB
Dirty: 540 kB
Writeback: 0 kB
AnonPages: 1270076 kB
Mapped: 4258192 kB
Slab: 211448 kB
PageTables: 559248 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 10312588 kB
Committed_AS: 9760756 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 263556 kB
VmallocChunk: 34359474475 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
Hugepagesize: 2048 kB

Regards,
Matt
* Matt Magoffin (postgresql.org@msqr.us) wrote: > [root@170226-db7 ~]# cat /proc/meminfo > CommitLimit: 10312588 kB > Committed_AS: 9760756 kB I suspect this may be it... Apparently, while you're only using about 2G, you've got 10G or so of outstanding commitments, and Linux is refusing to allocate more. You probably want to up your overcommit_ratio, esp. in light of the fact that you've only got 2G of swap on this box. I'd probably say up it to 80, which would give you 14.8G of commitable memory, leaving some room in-memory (1.2G) for cache/buffers and whatnot. Alternativly, you could go for 90, which would allow commits up to 16.4G, so if everyone used all their memory, you'd be into swap. Enjoy, Stephen
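If you do raise the ratio, it can be changed on the fly and then persisted across reboots. With overcommit_memory=2 the kernel's limit is roughly swap + RAM * overcommit_ratio / 100, which is where the ~14.8G figure above comes from:

sysctl -w vm.overcommit_ratio=80                       # takes effect immediately
echo "vm.overcommit_ratio = 80" >> /etc/sysctl.conf    # so it survives a reboot

# sanity check afterwards: CommitLimit in /proc/meminfo should jump accordingly
grep -E 'CommitLimit|Committed_AS' /proc/meminfo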
> Which is a longwinded way of saying "that doesn't seem to match the
> symptoms, but ..." If you're not dead certain that your kernel is
> configured to allow *well* north of 7000 open files, you might consider
> cutting max_files_per_process in half at your next restart.

I think it is; looking at /proc/sys/fs/file-max I get

[root@170226-db7 64]# cat /proc/sys/fs/file-max
1573080

> OTOH ... wait a minute. Have you maybe got the system configured to
> start denying memory requests before it gets into significant swapping?
> We typically suggest setting vm.overcommit_memory=2 on Linux, but
> I'm not sure whether that results in the kernel trying to avoid using
> swap at all. Maybe this really is an issue with system-wide total
> memory use rather than any one process.

We do have vm.overcommit_memory set to 2 on this system.

I should add that since yesterday, many queries, and even simple attempts to start a transaction, have been failing on this system. I just saw one of our applications attempt to initiate a transaction (this is a Java app, so through JDBC here) and it resulted in an out-of-memory error. This is all I see in the logs:

[2009-02-09 13:52:13 CST lms_kia FATAL: out of memory
TopMemoryContext: 53952 total in 6 blocks; 7048 free (7 chunks); 46904 used
Relcache by OID: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
CacheMemoryContext: 91024 total in 4 blocks; 3856 free (9 chunks); 87168 used
MdSmgr: 0 total in 0 blocks; 0 free (0 chunks); 0 used
LOCALLOCK hash: 8192 total in 1 blocks; 3744 free (0 chunks); 4448 used
Timezones: 53584 total in 2 blocks; 3744 free (0 chunks); 49840 used
ErrorContext: 8192 total in 1 blocks; 8160 free (4 chunks); 32 used
2009-02-09 13:52:29 CST lms_kia FATAL: out of memory
2009-02-09 13:52:29 CST lms_kia DETAIL: Failed on request of size 2456.
TopMemoryContext: 97184 total in 11 blocks; 2216 free (7 chunks); 94968 used TopTransactionContext: 8192 total in 1 blocks; 7648 free (0 chunks); 544 used Prepared Queries: 24576 total in 2 blocks; 15984 free (5 chunks); 8592 used Type information cache: 24576 total in 2 blocks; 11888 free (5 chunks); 12688 used Operator class cache: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used Operator lookup cache: 24576 total in 2 blocks; 11888 free (5 chunks); 12688 used MessageContext: 8192 total in 1 blocks; 6976 free (0 chunks); 1216 used smgr relation table: 24576 total in 2 blocks; 11840 free (4 chunks); 12736 used TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0 chunks); 32 used Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used PortalMemory: 8192 total in 1 blocks; 7888 free (0 chunks); 304 used PortalHeapMemory: 122880 total in 17 blocks; 7640 free (102 chunks); 115240 used ExecutorState: 89312 total in 4 blocks; 1960 free (2 chunks); 87352 used HashTableContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used HashBatchContext: 125820976 total in 26 blocks; 496 free (6 chunks); 125820480 used HashTableContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used HashBatchContext: 524336 total in 1 blocks; 0 free (0 chunks); 524336 used TIDBitmap: 516096 total in 6 blocks; 222528 free (20 chunks); 293568 used TupleSort: 32816 total in 2 blocks; 7360 free (0 chunks); 25456 used ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used ExprContext: 8192 total in 1 blocks; 8096 free (0 chunks); 96 used ExprContext: 8192 total in 1 blocks; 8160 free (1 chunks); 32 used ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used AggContext: 8192 total in 1 blocks; 8016 free (0 chunks); 176 used TupleHashTable: 8192 total in 1 blocks; 3744 free (0 chunks); 4448 used ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used Relcache by OID: 24576 total in 2 blocks; 13872 free (3 chunks); 10704 used CacheMemoryContext: 1341680 total in 21 blocks; 502840 free (2 chunks); 838840 used unnamed prepared statement: 57344 total in 3 blocks; 2008 free (2 chunks); 55336 used cl_event_crm_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used cl_event_lead_id_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used cl_event_event_name_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used cl_event_crm_lead_id_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used cl_event_pkey: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used cl_event_first_response_pkey: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_data_delivery_meth_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_data_typeflags_idx: 2048 total in 1 blocks; 600 free (0 chunks); 1448 used lead_reporting_data_unique_dealer_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_data_unique_area_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_data_unique_region_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_data_unique_all_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_data_oem_subtype_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used 
lead_reporting_data_oem_type_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_data_dealer_area_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_data_customer_lname_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_data_customer_fname_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_data_customer_postal_code_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_data_dealer_region_area_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_data_dealer_region_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_data_dealer_code_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_data_reporting_date_idx: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used lead_reporting_data_model_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used lead_reporting_data_pkey: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used config_xml_name_key: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used config_xml_pkey: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used CachedPlan: 1024 total in 1 blocks; 816 free (0 chunks); 208 used CachedPlanSource: 1024 total in 1 blocks; 664 free (1 chunks); 360 used ehcache_expiration_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used ehcache_pkey: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used CachedPlan: 1024 total in 1 blocks; 816 free (0 chunks); 208 used CachedPlanSource: 1024 total in 1 blocks; 664 free (1 chunks); 360 used region_pkey: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used dealer_external_system_id_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used dealer_dealer_code_key: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used dealer_pkey: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used xslt_style_sheet_pkey: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used pg_index_indrelid_index: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used pg_ts_dict_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used pg_aggregate_fnoid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used pg_language_name_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used pg_statistic_relid_att_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used pg_ts_dict_dictname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used pg_namespace_nspname_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used pg_opfamily_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used pg_opclass_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used pg_ts_parser_prsname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used pg_amop_fam_strat_index: 3072 total in 2 blocks; 1384 free (2 chunks); 1688 used pg_opclass_am_name_nsp_index: 3072 total in 2 blocks; 1576 free (3 chunks); 1496 used pg_trigger_tgrelid_tgname_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used pg_cast_source_target_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used pg_auth_members_role_member_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used pg_attribute_relid_attnum_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used pg_ts_config_cfgname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used pg_authid_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used pg_ts_config_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used 
pg_conversion_default_index: 3072 total in 2 blocks; 1432 free (3 chunks); 1640 used pg_language_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used pg_enum_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used pg_proc_proname_args_nsp_index: 3072 total in 2 blocks; 1576 free (3 chunks); 1496 used pg_ts_parser_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used pg_database_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used pg_conversion_name_nsp_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used pg_class_relname_nsp_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used pg_attribute_relid_attnam_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used pg_class_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used pg_amproc_fam_proc_index: 3072 total in 2 blocks; 1384 free (2 chunks); 1688 used pg_operator_oprname_l_r_n_index: 3072 total in 2 blocks; 1384 free (2 chunks); 1688 used pg_index_indexrelid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used pg_type_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used pg_rewrite_rel_rulename_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used pg_authid_rolname_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used pg_auth_members_member_role_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used pg_enum_typid_label_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used pg_constraint_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used pg_conversion_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used pg_ts_template_tmplname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used pg_ts_config_map_index: 3072 total in 2 blocks; 1624 free (3 chunks); 1448 used pg_namespace_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used pg_type_typname_nsp_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used pg_operator_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used pg_amop_opr_fam_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used pg_proc_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used pg_opfamily_am_name_nsp_index: 3072 total in 2 blocks; 1624 free (3 chunks); 1448 used pg_ts_template_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used MdSmgr: 8192 total in 1 blocks; 6752 free (0 chunks); 1440 used LOCALLOCK hash: 24576 total in 2 blocks; 15984 free (5 chunks); 8592 used Timezones: 53584 total in 2 blocks; 3744 free (0 chunks); 49840 used ErrorContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used Does this provide any useful information? I have other queries that are failing as well, and I can provide explain output for those if it might help. Regards, Matt
> * Matt Magoffin (postgresql.org@msqr.us) wrote: >> [root@170226-db7 ~]# cat /proc/meminfo >> CommitLimit: 10312588 kB >> Committed_AS: 9760756 kB > > I suspect this may be it... Apparently, while you're only using about > 2G, you've got 10G or so of outstanding commitments, and Linux is > refusing to allocate more. > > You probably want to up your overcommit_ratio, esp. in light of the fact > that you've only got 2G of swap on this box. I'd probably say up it to > 80, which would give you 14.8G of commitable memory, leaving some room > in-memory (1.2G) for cache/buffers and whatnot. Alternativly, you could > go for 90, which would allow commits up to 16.4G, so if everyone used > all their memory, you'd be into swap. Thanks for the advice. Should we have more than 2GB of swap available? I thought the goal for a Postgres system was to avoid swap use at all cost? Would it be better for us to add more swap, or adjust this overcommit_ratio as you discuss? Regards, Matt
> I suspect this may be it... Apparently, while you're only using about > 2G, you've got 10G or so of outstanding commitments, and Linux is > refusing to allocate more. > > You probably want to up your overcommit_ratio, esp. in light of the fact > that you've only got 2G of swap on this box. I'd probably say up it to > 80, which would give you 14.8G of commitable memory, leaving some room > in-memory (1.2G) for cache/buffers and whatnot. Alternativly, you could > go for 90, which would allow commits up to 16.4G, so if everyone used > all their memory, you'd be into swap. Also, by adjusting this, would I possibly just be delaying the problem we currently have (i.e. over time, we start to run out of memory)? I just wonder why the system is reaching this limit at all... do you feel it is quite normal for a system with this memory configuration to be configured with the ratio set to 80? I'm not terribly familiar with these VM parameters, so I apologize if I sound vague. Regards, Matt
* Matt Magoffin (postgresql.org@msqr.us) wrote:
> Thanks for the advice. Should we have more than 2GB of swap available? I
> thought the goal for a Postgres system was to avoid swap use at all cost?
> Would it be better for us to add more swap, or adjust this
> overcommit_ratio as you discuss?

You do want to avoid swap, but at the same time you don't want to end up with lots of real memory unavailable for use (as here...). After thinking about this for a bit, my suggestion would be to add swap up to the amount of memory you have, for starters. That would give you 24G of committable memory. I would then pay attention to free, swap usage, and /proc/meminfo. If you start using a lot of swap, you might want to adjust PG to use less memory; if you see a lot of free memory but you're close to your commit limit, up the ratio some.

Basically, the swap would be there just in case all those folks with outstanding committed memory decide to actually use it. That's not all *that* likely to happen, in the end.

Thanks,
Stephen
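A rough sketch of adding that much swap without repartitioning, assuming a local filesystem has room for a swap file; the path and size here are placeholders:

# create and enable a 14GB swap file, bringing total swap close to the 16GB of RAM
dd if=/dev/zero of=/swapfile1 bs=1M count=14336
chmod 600 /swapfile1
mkswap /swapfile1
swapon /swapfile1
# and add it to /etc/fstab so it persists:
#   /swapfile1  none  swap  sw  0 0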
On Mon, Feb 9, 2009 at 1:25 PM, Matt Magoffin <postgresql.org@msqr.us> wrote: >> * Matt Magoffin (postgresql.org@msqr.us) wrote: >>> [root@170226-db7 ~]# cat /proc/meminfo >>> CommitLimit: 10312588 kB >>> Committed_AS: 9760756 kB >> >> I suspect this may be it... Apparently, while you're only using about >> 2G, you've got 10G or so of outstanding commitments, and Linux is >> refusing to allocate more. >> >> You probably want to up your overcommit_ratio, esp. in light of the fact >> that you've only got 2G of swap on this box. I'd probably say up it to >> 80, which would give you 14.8G of commitable memory, leaving some room >> in-memory (1.2G) for cache/buffers and whatnot. Alternativly, you could >> go for 90, which would allow commits up to 16.4G, so if everyone used >> all their memory, you'd be into swap. > > Thanks for the advice. Should we have more than 2GB of swap available? I > thought the goal for a Postgres system was to avoid swap use at all cost? > Would it be better for us to add more swap, or adjust this > overcommit_ratio as you discuss? I'd do both. But only after I'd reduced work_mem. Given that reducing work_mem removed the problem, it looks to me like pgsql is requesting several large blocks of ram, then only using a small port of them. But overcommit set to 2 means that the OS will not allow an overcommit of memory to these allocations, the allocations fail, and you get your error.
On Mon, Feb 9, 2009 at 1:32 PM, Matt Magoffin <postgresql.org@msqr.us> wrote: >> I suspect this may be it... Apparently, while you're only using about >> 2G, you've got 10G or so of outstanding commitments, and Linux is >> refusing to allocate more. >> >> You probably want to up your overcommit_ratio, esp. in light of the fact >> that you've only got 2G of swap on this box. I'd probably say up it to >> 80, which would give you 14.8G of commitable memory, leaving some room >> in-memory (1.2G) for cache/buffers and whatnot. Alternativly, you could >> go for 90, which would allow commits up to 16.4G, so if everyone used >> all their memory, you'd be into swap. > > Also, by adjusting this, would I possibly just be delaying the problem we > currently have (i.e. over time, we start to run out of memory)? I just > wonder why the system is reaching this limit at all... do you feel it is > quite normal for a system with this memory configuration to be configured > with the ratio set to 80? I'm not terribly familiar with these VM > parameters, so I apologize if I sound vague. I think that you're fixing a symptom, but ignoring the cause. Twiddling VM parameters may help out, but this problem of too much memory allocated is the real issue, so yeah, you're just putting off the inevitable. Test your individual queries with varying settings for work_mem. Find the weighted-averaged knee of diminishing returns and set work_mem closer to that. I'm willing to be it's way less than 128M. 2M to 16M are the numbers I come up with the most.
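One way to run the kind of work_mem sweep Scott describes, assuming psql access from a shell; the database name, the list of values, and report_query.sql (a file holding EXPLAIN ANALYZE of one representative query) are all placeholders:

# time one representative query at several work_mem settings; repeat per query of interest
for wm in 2MB 4MB 8MB 16MB 32MB 64MB 128MB; do
  echo -n "work_mem=$wm: "
  psql -d lms_kia -At -c "SET work_mem = '$wm'; $(cat report_query.sql)" | tail -1
done

With -At the last line printed for each run is the plan's "Total runtime" figure, which makes it easy to see where larger work_mem stops paying off.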
> with 100 concurrent postgres connections, if they all did something > requiring large amounts of work_mem, you could allocate 100 * 125MB (I > believe thats what you said it was set to?) which is like 12GB :-O > > in fact a single query thats doing multiple sorts of large datasets for > a messy join (or other similar activity) can involve several instances > of workmem. multiply that by 100 queries, and ouch. > > have you considered using a connection pool to reduce the postgres > process count? We do have a connection pool here, and as I mentioned about a quarter of these are Slony-controlled processes for replication. Most connections are not doing complex queries of this sort, in fact when this query runs it is the only query running, most connections are idle (kept open by a connection pool) or performing short insert/update operations. But I see your point about the large work_mem, it was set that high to help speed up big queries such as this one. -- m@
* Scott Marlowe (scott.marlowe@gmail.com) wrote: > I'd do both. But only after I'd reduced work_mem. Given that > reducing work_mem removed the problem, it looks to me like pgsql is > requesting several large blocks of ram, then only using a small port > of them. But overcommit set to 2 means that the OS will not allow an > overcommit of memory to these allocations, the allocations fail, and > you get your error. Reducing work_mem solved the problem by reducing the memory used, sure. In the end, however, we actually want to make the other 14G or so in the box useful for something, not just try to fit everything in under that 10G committed limit. PG allocates the memory it needs, it doesn't just allocate whatever work_mem is set to (in fact, I don't believe work_mem is ever directly used in allocations). I'm not against looking at lowering work_mem to something else, but you don't want to suddenly get shifted over to disk-based sorts or similar when your data set grows too big if there's a bunch of unused memory in the box. Thanks, Stephen
* Scott Marlowe (scott.marlowe@gmail.com) wrote: > I think that you're fixing a symptom, but ignoring the cause. > Twiddling VM parameters may help out, but this problem of too much > memory allocated is the real issue, so yeah, you're just putting off > the inevitable. I don't think changing work_mem down is actually going to reduce the memory allocated without changing the plan to something less optimal. In the end, all of this is putting off the inevitable, if you get enough PGs going and enough requests and whatnot, you're going to start running out of memory again. Same if you get larger data sets that take up more hash table space or similar. Eventually you might need a bigger box, but let's try to get everything in the current box to at least be used first.. Thanks, Stephen
> I'd do both. But only after I'd reduced work_mem. Given that > reducing work_mem removed the problem, it looks to me like pgsql is > requesting several large blocks of ram, then only using a small port > of them. But overcommit set to 2 means that the OS will not allow an > overcommit of memory to these allocations, the allocations fail, and > you get your error. OK, I've re-configured work_mem, and set vm.overcommit_ratio to 80. Without restarting Postgres, I was now able to run that big query posted earlier in this thread that failed... so the overcommit_ratio adjustment helped there. The EXPLAIN ANALYZE for that I've included below, where I can see it did use in-memory sorting for some of the sorts. QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Sort (cost=2297842.92..2297943.76 rows=40336 width=234) (actual time=370440.041..370483.133 rows=49317 loops=1) Sort Key: s.sale_date, s.vin Sort Method: quicksort Memory: 9357kB -> Hash Left Join (cost=1450365.72..2294757.26 rows=40336 width=234) (actual time=293212.004..369857.956 rows=49317 loops=1) Hash Cond: (ml.lead_id = phone.lead_id) -> Hash Left Join (cost=1341348.27..2140418.89 rows=40336 width=219) (actual time=286374.489..362880.702 rows=49317 loops=1) Hash Cond: (ml.lead_id = email.lead_id) -> Nested Loop Left Join (cost=1237270.73..2000634.35 rows=40336 width=204) (actual time=278652.051..355022.014 rows=49317 loops=1) -> Nested Loop Left Join (cost=1237270.73..1807035.53 rows=40336 width=141) (actual time=278635.414..323774.871 rows=49317 loops=1) -> Nested Loop Left Join (cost=1237270.73..1587513.22 rows=40336 width=126) (actual time=278632.427..317952.620 rows=49317 loops=1) -> Nested Loop Left Join (cost=1237270.73..1414103.19 rows=40336 width=118) (actual time=278606.034..297951.038 rows=49317 loops=1) -> Merge Left Join (cost=1237270.73..1238315.78 rows=40336 width=89) (actual time=278587.812..279498.796 rows=49317 loops=1) Merge Cond: (ml.lead_id = lrdm.lead_id) -> Sort (cost=46384.08..46484.92 rows=40336 width=78) (actual time=6200.810..6240.526 rows=49317 loops=1) Sort Key: ml.lead_id Sort Method: quicksort Memory: 8472kB -> Hash Left Join (cost=9430.33..43298.42 rows=40336 width=78) (actual time=1079.869..6084.010 rows=49317 loops=1) Hash Cond: ((s.dealer_code)::text = (d.dealer_code)::text) -> Hash Left Join (cost=9094.04..42407.50 rows=40336 width=60) (actual time=1074.170..5947.646 rows=49317 loops=1) Hash Cond: (s.id = ml.sale_id) Filter: (((s.sale_type = 'd'::bpchar) AND (NOT ml.lost_sale)) OR ((s.sale_type = 'c'::bpchar) AND (ml.lead_pos = 0)) OR (s.sale_type = '0'::bpchar)) -> Index Scan using mb_sale_sale_date_idx on mb_sale s (cost=0.00..14027.94 rows=43804 width=50) (actual time=55.663..4683.901 rows=49152 loops=1) Index Cond: ((sale_date >= '2009-01-01'::date) AND (sale_date <= '2009-01-31'::date)) -> Hash (cost=5577.35..5577.35 rows=281335 width=26) (actual time=1018.108..1018.108 rows=281335 loops=1) -> Seq Scan on mb_lead ml (cost=0.00..5577.35 rows=281335 width=26) (actual time=3.451..516.245 rows=281335 loops=1) -> Hash (cost=321.13..321.13 rows=1213 width=23) (actual time=5.577..5.577 rows=1213 loops=1) -> Seq Scan on dealer d (cost=0.00..321.13 rows=1213 width=23) (actual time=0.056..3.684 rows=1213 loops=1) -> Sort (cost=1190886.66..1191208.43 rows=128709 width=19) (actual time=270075.460..271851.519 rows=1442527 loops=1) 
Sort Key: lrdm.lead_id Sort Method: external sort Disk: 56072kB -> Bitmap Heap Scan on lead_reporting_meta lrdm (cost=118847.85..1179963.28 rows=128709 width=19) (actual time=103684.796..261544.708 rows=1462381 loops=1) Recheck Cond: (item_key = '[ALS:prospectid]TrackingCode'::text) Filter: (pos = 1) -> Bitmap Index Scan on lead_reporting_meta_item_key_idx (cost=0.00..118815.67 rows=1476580 width=0) (actual time=102982.150..102982.150 rows=1484068 loops=1) Index Cond: (item_key = '[ALS:prospectid]TrackingCode'::text) -> Index Scan using lead_reporting_address_lead_id_idx on lead_reporting_address address (cost=0.00..4.35 rows=1 width=37) (actual time=0.370..0.371 rows=0 loops=49317) Index Cond: (address.lead_id = ml.lead_id) -> Index Scan using lead_reporting_street_address_id_idx on lead_reporting_street address_street (cost=0.00..4.29 rows=1 width=24) (actual time=0.402..0.403 rows=0 loops=49317) Index Cond: (address_street.address_id = address.id) Filter: (address_street.pos = 0) -> Index Scan using lead_pkey on lead l (cost=0.00..5.43 rows=1 width=23) (actual time=0.114..0.115 rows=0 loops=49317) Index Cond: (l.id = ml.lead_id) -> Index Scan using lead_reporting_data_pkey on lead_reporting_data lrd (cost=0.00..4.79 rows=1 width=71) (actual time=0.630..0.630 rows=0 loops=49317) Index Cond: (lrd.lead_id = ml.lead_id) -> Hash (cost=85837.99..85837.99 rows=1459164 width=23) (actual time=7719.918..7719.918 rows=1522674 loops=1) -> Seq Scan on lead_reporting_list_data email (cost=0.00..85837.99 rows=1459164 width=23) (actual time=6.258..5105.843 rows=1522674 loops=1) Filter: ((list_type = 'e'::bpchar) AND (pos = 0)) -> Hash (cost=85837.99..85837.99 rows=1854357 width=23) (actual time=6834.882..6834.882 rows=1805273 loops=1) -> Seq Scan on lead_reporting_list_data phone (cost=0.00..85837.99 rows=1854357 width=23) (actual time=0.027..3917.300 rows=1805273 loops=1) Filter: ((list_type = 'p'::bpchar) AND (pos = 0)) Total runtime: 370595.083 ms (51 rows)
> I don't think changing work_mem down is actually going to reduce the > memory allocated without changing the plan to something less optimal. > In the end, all of this is putting off the inevitable, if you get enough > PGs going and enough requests and whatnot, you're going to start running > out of memory again. Same if you get larger data sets that take up more > hash table space or similar. Eventually you might need a bigger box, > but let's try to get everything in the current box to at least be used > first.. Yes... and indeed changing vm.overcommit_ratio to 80 does allow that previously-failing query to execute successfully. Do you think this is also what caused the out-of-memory error we saw today just when a transaction was initiated? Regards, Matt
* Matt Magoffin (postgresql.org@msqr.us) wrote: > Also, by adjusting this, would I possibly just be delaying the problem we > currently have (i.e. over time, we start to run out of memory)? I just > wonder why the system is reaching this limit at all... do you feel it is > quite normal for a system with this memory configuration to be configured > with the ratio set to 80? I'm not terribly familiar with these VM > parameters, so I apologize if I sound vague. In terms of 'normal', I tend to build my systems with equal swap and memory, and then leave the overcommit ratio at the default. Still, I think the best thing is to just keep tabs on how much memory has been committed, how much is actually being used, how much swap you're using, and what your commit limit is. You can then fine-tune things as needed. Stephen
* Matt Magoffin (postgresql.org@msqr.us) wrote:
> Yes... and indeed changing vm.overcommit_ratio to 80 does allow that
> previously-failing query to execute successfully. Do you think this is
> also what caused the out-of-memory error we saw today just when a
> transaction was initiated?

Almost certainly... You were running close to the edge *system-wide* on memory allocations based on your meminfo results. You'll want to keep an eye on the limit vs. the committed amount going forward so as to avoid this happening in the future. It's just about the same as watching how you're doing on overall memory usage in the box or on swap usage. Set up some nagios monitoring, and alarm once Committed_AS/CommitLimit goes over 90%.

Thanks,
Stephen
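A trivial version of that check, suitable for cron or for wrapping as a nagios plugin; the 90% threshold is just the suggested alarm point above:

#!/bin/sh
# compare outstanding commitments against the kernel's commit limit
limit=$(awk '/^CommitLimit:/ {print $2}' /proc/meminfo)
committed=$(awk '/^Committed_AS:/ {print $2}' /proc/meminfo)
pct=$((committed * 100 / limit))
echo "Committed_AS is ${pct}% of CommitLimit"
if [ "$pct" -ge 90 ]; then
    echo "WARNING: over 90% of the commit limit is in use"
    exit 1
fi
exit 0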
On Mon, Feb 9, 2009 at 2:01 PM, Matt Magoffin <postgresql.org@msqr.us> wrote: >> I don't think changing work_mem down is actually going to reduce the >> memory allocated without changing the plan to something less optimal. >> In the end, all of this is putting off the inevitable, if you get enough >> PGs going and enough requests and whatnot, you're going to start running >> out of memory again. Same if you get larger data sets that take up more >> hash table space or similar. Eventually you might need a bigger box, >> but let's try to get everything in the current box to at least be used >> first.. > > Yes... and indeed changing vm.overcommit_ratio to 80 does allow that > previously-failing query to execute successfully. Do you think this is > also what caused the out-of-memory error we saw today just when a > transaction was initiated? Curious, what's the explain analyze look like for that one?
On Mon, Feb 9, 2009 at 1:57 PM, Stephen Frost <sfrost@snowman.net> wrote: > * Scott Marlowe (scott.marlowe@gmail.com) wrote: >> I think that you're fixing a symptom, but ignoring the cause. >> Twiddling VM parameters may help out, but this problem of too much >> memory allocated is the real issue, so yeah, you're just putting off >> the inevitable. > > I don't think changing work_mem down is actually going to reduce the > memory allocated without changing the plan to something less optimal. > In the end, all of this is putting off the inevitable, if you get enough > PGs going and enough requests and whatnot, you're going to start running > out of memory again. Same if you get larger data sets that take up more > hash table space or similar. Eventually you might need a bigger box, > but let's try to get everything in the current box to at least be used > first.. That's not always true. It's not uncommon to have the same plan just spill to disk later. If the dataset is big enough that you're gonna need to spill to disk anyway, then the difference in performance is often pretty small. Again, it's why I was saying he needs to benchmark ALL his queries, by weight, and see what work_mem gives optimal results without catering to just one or two big queries. If needs be, set work_mem for the connection big and avoid the foot gun that is large global work_mem. Far better to waste a few gigs of memory than to start having midday failures under max load due to oom conditions.
>> Yes... and indeed changing vm.overcommit_ratio to 80 does allow that
>> previously-failing query to execute successfully. Do you think this is
>> also what caused the out-of-memory error we saw today just when a
>> transaction was initiated?
>
> Curious, what's the explain analyze look like for that one?

Do you mean the transaction initiation? I'm not sure how to get an EXPLAIN for that; the application never got to do anything. From the application side it failed with out-of-memory while trying to open the connection. The most precise information I have is that in the JDBC driver, it failed at:

Caused by: org.postgresql.util.PSQLException: FATAL: out of memory
    at org.postgresql.core.v3.ConnectionFactoryImpl.readStartupMessages(ConnectionFactoryImpl.java:444)
    at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:99)
    at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:66)
    at org.postgresql.jdbc2.AbstractJdbc2Connection.<init>(AbstractJdbc2Connection.java:124)
    at org.postgresql.jdbc3.AbstractJdbc3Connection.<init>(AbstractJdbc3Connection.java:30)
    at org.postgresql.jdbc3g.Jdbc3gConnection.<init>(Jdbc3gConnection.java:24)
    at org.postgresql.Driver.makeConnection(Driver.java:386)
    at org.postgresql.Driver.connect(Driver.java:260)

And the corresponding error in the Postgres log is:

2009-02-09 13:52:13 CST lms_kia FATAL: out of memory
TopMemoryContext: 53952 total in 6 blocks; 7048 free (7 chunks); 46904 used
Relcache by OID: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
CacheMemoryContext: 91024 total in 4 blocks; 3856 free (9 chunks); 87168 used
MdSmgr: 0 total in 0 blocks; 0 free (0 chunks); 0 used
LOCALLOCK hash: 8192 total in 1 blocks; 3744 free (0 chunks); 4448 used
Timezones: 53584 total in 2 blocks; 3744 free (0 chunks); 49840 used
ErrorContext: 8192 total in 1 blocks; 8160 free (4 chunks); 32 used
2009-02-09 13:52:29 CST lms_kia FATAL: out of memory
2009-02-09 13:52:29 CST lms_kia DETAIL: Failed on request of size 2456.
Stephen Frost <sfrost@snowman.net> writes: > * Matt Magoffin (postgresql.org@msqr.us) wrote: >> Yes... and indeed changing vm.overcommit_ratio to 80 does allow that >> previously-failing query to execute successfully. Do you think this is >> also what caused the out-of-memory error we saw today just when a >> transaction was initiated? > Almost certainly.. You were running close to the edge *system-wide* on > memory allocations based on your meminfo results. You'll want to keep > an eye on the limit vs. the committed amount going forward so as to > avoid this happening in the future. It's just about the same as > watching how you're doing on overall memory usage in the box or on swap > usage. Set up some nagios monitoring, and alarm once > Committed_As/CommitLimit goes over 90%. IIRC, our documentation points out vm.overcommit_memory as a critical parameter, but says nothing about overcommit_ratio nor how to find out how close you are to hitting that limit. Without necessarily being prescriptive about the best strategy, should we add some info to let people know about these things? This thread was mostly news to me, so I suspect it's news to some other folk as well. regards, tom lane
On Mon, Feb 9, 2009 at 2:40 PM, Matt Magoffin <postgresql.org@msqr.us> wrote:
>>> Yes... and indeed changing vm.overcommit_ratio to 80 does allow that
>>> previously-failing query to execute successfully. Do you think this is
>>> also what caused the out-of-memory error we saw today just when a
>>> transaction was initiated?
>>
>> Curious, what's the explain analyze look like for that one?
>
> Do you mean the transaction initiation? I'm not sure how to get an EXPLAIN
> for that, the application never got to do anything, from the application
> side it failed with out-of-memory while trying to open the connection. Or,
> the most precise I have is that in the JDBC driver, it failed at

No, explain analyze for the query that wouldn't execute before but now does, with, I assume, a large work_mem. I'd like to see how it differs from the one with smaller work_mem.
> No, explain analyze for the query that wouldn't execute before but now > does, with, I assume, a large work_mem. I'd like to see how it > differes from the one with smaller work_mem. Ah, I pasted that in an earlier email, sent February 10, 2009 9:58:00 AM GMT+13:00... that plan was the one using still the 128MB of work_mem after changing the overcommit_ratio to 80. Regards, Matt
On Mon, Feb 9, 2009 at 3:57 PM, Matt Magoffin <postgresql.org@msqr.us> wrote:
>> No, explain analyze for the query that wouldn't execute before but now
>> does, with, I assume, a large work_mem. I'd like to see how it
>> differs from the one with smaller work_mem.
>
> Ah, I pasted that in an earlier email, sent February 10, 2009 9:58:00 AM
> GMT+13:00... that plan was the one using still the 128MB of work_mem after
> changing the overcommit_ratio to 80.

Oops, my bad. I thought that was just explain. I'll go look.
On Mon, Feb 9, 2009 at 1:58 PM, Matt Magoffin <postgresql.org@msqr.us> wrote: I wonder if this is the problem, or part of it. This part of the explain analyze on down, there's 1.4M rows, when the planner seems to expect the number of rows to be chopped down quite a bit more when it goes from the bitmap index scan to the bitmap heap scan. I'm wondering if you're hitting one of those issues where pgsql thinks it can fit some operation into work_mem and it can't and the allocation fails. Just a guess, I'd run that by someone else before I took it as fact. > -> Sort > (cost=1190886.66..1191208.43 > rows=128709 width=19) (actual > time=270075.460..271851.519 > rows=1442527 loops=1) > Sort Key: lrdm.lead_id > Sort Method: external > sort Disk: 56072kB > -> Bitmap Heap Scan on > lead_reporting_meta > lrdm > (cost=118847.85..1179963.28 > rows=128709 width=19) > (actual > time=103684.796..261544.708 > rows=1462381 loops=1) > Recheck Cond: > (item_key = > '[ALS:prospectid]TrackingCode'::text) > Filter: (pos = 1) > -> Bitmap Index > Scan on > lead_reporting_meta_item_key_idx > > (cost=0.00..118815.67 > rows=1476580 > width=0) (actual > time=102982.150..102982.150 > rows=1484068 > loops=1) > Index Cond: > (item_key = > '[ALS:prospectid]TrackingCode'::text) > -> Index Scan using > lead_reporting_address_lead_id_idx > on lead_reporting_address address > (cost=0.00..4.35 rows=1 width=37) > (actual time=0.370..0.371 rows=0 > loops=49317) > Index Cond: (address.lead_id > = ml.lead_id) > -> Index Scan using > lead_reporting_street_address_id_idx on > lead_reporting_street address_street > (cost=0.00..4.29 rows=1 width=24) (actual > time=0.402..0.403 rows=0 loops=49317) > Index Cond: > (address_street.address_id = > address.id) > Filter: (address_street.pos = 0) > -> Index Scan using lead_pkey on lead l > (cost=0.00..5.43 rows=1 width=23) (actual > time=0.114..0.115 rows=0 loops=49317) > Index Cond: (l.id = ml.lead_id) > -> Index Scan using lead_reporting_data_pkey on > lead_reporting_data lrd (cost=0.00..4.79 rows=1 > width=71) (actual time=0.630..0.630 rows=0 > loops=49317) > Index Cond: (lrd.lead_id = ml.lead_id) > -> Hash (cost=85837.99..85837.99 rows=1459164 width=23) > (actual time=7719.918..7719.918 rows=1522674 loops=1) > -> Seq Scan on lead_reporting_list_data email > (cost=0.00..85837.99 rows=1459164 width=23) (actual > time=6.258..5105.843 rows=1522674 loops=1) > Filter: ((list_type = 'e'::bpchar) AND (pos = 0)) > -> Hash (cost=85837.99..85837.99 rows=1854357 width=23) (actual > time=6834.882..6834.882 rows=1805273 loops=1) > -> Seq Scan on lead_reporting_list_data phone > (cost=0.00..85837.99 rows=1854357 width=23) (actual > time=0.027..3917.300 rows=1805273 loops=1) > Filter: ((list_type = 'p'::bpchar) AND (pos = 0)) > Total runtime: 370595.083 ms > (51 rows)