OOM Errors as a result of table inheritance and a bad plan(?) - Mailing list pgsql-performance

From Arctic Toucan
Subject OOM Errors as a result of table inheritance and a bad plan(?)
Date
Msg-id BAY126-W604E273FFED65685953D679CB20@phx.gbl
Whole thread Raw
Responses Re: OOM Errors as a result of table inheritance and a bad plan(?)
List pgsql-performance
In keeping with some of the recent threads regarding the planner...
 
I have a fair sized data warehouse in which I am trying to perform an aggregation, but getting OOM errors in Postgres(8.2.4).
I believe the reason for the OOM is that Postgres is attempting to do a hash aggregation, but it has grossly underestimated the rows resulting from the aggregation.
The data in the database is very uniformly distributed so I don't believe that the table stats are the cause of the problem.
This may be related to table inheritance, and can be demonstrated pretty easily.
 
CREATE TABLE foo(a INT);
ANALYZE foo;
CREATE TABLE foo_1() INHERITS(foo);
insert into foo_1 select  generate_series(1,100000);
insert into foo_1 select  generate_series(1,100000);
insert into foo_1 select  generate_series(1,100000);
ANALYZE foo_1;

CREATE TABLE foo_2() INHERITS(foo);
insert into foo_2 select  generate_series(1,100000);
insert into foo_2 select  generate_series(1,100000);
insert into foo_2 select  generate_series(1,100000);
ANALYZE foo_2;
-- If I query a particular partition, the plan estimate for the hash aggregate is good
EXPLAIN ANALYZE SELECT a,COUNT(*) from foo_1 group by a;
 HashAggregate  (cost=5822.00..7061.01 rows=99121 width=4) (actual time=554.556..657.121 rows=100000 loops=1)
   ->  Seq Scan on foo_1  (cost=0.00..4322.00 rows=300000 width=4) (actual time=0.014..203.290 rows=300000 loops=1)
 Total runtime: 712.211 ms

-- If I query the base table, the plan estimate for the hash aggregate is off by several orders of magnitude
EXPLAIN ANALYZE SELECT a,COUNT(*) from foo group by a;
HashAggregate  (cost=11686.10..11688.60 rows=200 width=4) (actual time=1724.188..1826.630 rows=100000 loops=1)
   ->  Append  (cost=0.00..8675.40 rows=602140 width=4) (actual time=0.016..1045.134 rows=600000 loops=1)
         ->  Seq Scan on foo  (cost=0.00..31.40 rows=2140 width=4) (actual time=0.001..0.001 rows=0 loops=1)
         ->  Seq Scan on foo_1 foo  (cost=0.00..4322.00 rows=300000 width=4) (actual time=0.012..205.130 rows=300000 loops=1)
         ->  Seq Scan on foo_2 foo  (cost=0.00..4322.00 rows=300000 width=4) (actual time=0.011..203.542 rows=300000 loops=1)
 Total runtime: 1879.550 ms
(6 rows)
-- Is there something magical about the hash aggregate estimate of 200 rows?
-- I can have 30,000 or 300,000 rows in each child partition table and multiple partition's with different values of "a" and yet it always come up with 200.
-- eg.
create table foo_3() inherits(foo);
insert into foo_3 select generate_series(100000,300000);
analyze foo_3;
EXPLAIN ANALYZE SELECT a,COUNT(*) from foo group by a;
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=15568.12..15570.62 rows=200 width=4) (actual time=2386.813..2691.254 rows=300000 loops=1)
   ->  Append  (cost=0.00..11557.41 rows=802141 width=4) (actual time=0.016..1403.121 rows=800001 loops=1)
         ->  Seq Scan on foo  (cost=0.00..31.40 rows=2140 width=4) (actual time=0.002..0.002 rows=0 loops=1)
         ->  Seq Scan on foo_1 foo  (cost=0.00..4322.00 rows=300000 width=4) (actual time=0.013..201.549 rows=300000 loops=1)
         ->  Seq Scan on foo_2 foo  (cost=0.00..4322.00 rows=300000 width=4) (actual time=0.010..211.332 rows=300000 loops=1)
         ->  Seq Scan on foo_3 foo  (cost=0.00..2882.01 rows=200001 width=4) (actual time=0.011..137.262 rows=200001 loops=1)
 Total runtime: 2851.990 ms

Is this a bug, or some subtlety of the Postgres query planner?
 
In my particular case, I am doing a join out to another small table as part of the aggregation and using constraint exclusion on the partitions, but I believe the cause of my problem is the same.
 
I am running on 64bit FreeBSD/Postgres 8.2.4 on a machine with 8GB of memory.
 
An explain of the query and resulting OOM diagnostics follow:
 
The aggregation will result in 5,000,000 rows, not 5,000.
In the stats_dtl table there are 12 observations(hourly) for each customer
There are 125 different policy_id and 25 different policy_group_id's
Policy's and policy_groups are even distributed across all customers
 
 userquery Scan table_4760  (cost=2897243.60..2897418.60 rows=5000 width=152)
   ->  HashAggregate  (cost=2897243.60..2897368.60 rows=5000 width=40)
         ->  Hash Join  (cost=7.81..2241002.00 rows=37499520 width=40)
               Hash Cond: (public.customer_stats_dtl.policy_id = policy.policy_id)
               ->  Append  (cost=0.00..1641001.87 rows=59999232 width=40)
                     ->  Seq Scan on customer_stats_dtl  (cost=0.00..22.45 rows=4 width=40)
                           Filter: ((period_start >= '2007-09-08 20:00:00-04'::timestamp with time zone) AND (period_start < '2007-09-09 08:00:00-04'::timestamp with time zone))
                     ->  Seq Scan on customer_stats_dtl_027 customer_stats_dtl  (cost=0.00..1640979.42 rows=59999228 width=40)
                           Filter: ((period_start >= '2007-09-08 20:00:00-04'::timestamp with time zone) AND (period_start < '2007-09-09 08:00:00-04'::timestamp with time zone))
               ->  Hash  (cost=6.25..6.25 rows=125 width=8)
                     ->  Seq Scan on policy  (cost=0.00..6.25 rows=125 width=8)
                    
TopMemoryContext: 268400 total in 32 blocks; 20912 free (43 chunks); 247488 used
unnamed prepared statement: 2097152 total in 8 blocks; 721456 free (2 chunks); 1375696 used
TopTransactionContext: 8192 total in 1 blocks; 7648 free (0 chunks); 544 used
SPI Plan: 3072 total in 2 blocks; 1320 free (0 chunks); 1752 used
SPI Plan: 7168 total in 3 blocks; 1312 free (0 chunks); 5856 used
SPI Plan: 3072 total in 2 blocks; 1320 free (0 chunks); 1752 used
SPI Plan: 3072 total in 2 blocks; 928 free (0 chunks); 2144 used
SPI Plan: 7168 total in 3 blocks; 3920 free (0 chunks); 3248 used
SPI Plan: 3072 total in 2 blocks; 1224 free (0 chunks); 1848 used
PL/PgSQL function context: 24576 total in 2 blocks; 12008 free (9 chunks); 12568 used
CFuncHash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
Rendezvous variable hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
PLpgSQL function cache: 24224 total in 2 blocks; 3744 free (0 chunks); 20480 used
Type information cache: 24576 total in 2 blocks; 11888 free (5 chunks); 12688 used
S_2: 1024 total in 1 blocks; 488 free (0 chunks); 536 used
S_1: 1024 total in 1 blocks; 488 free (0 chunks); 536 used
Prepared Queries: 24576 total in 2 blocks; 11888 free (5 chunks); 12688 used
Record information cache: 24576 total in 2 blocks; 15984 free (5 chunks); 8592 used
MessageContext: 8192 total in 1 blocks; 7744 free (0 chunks); 448 used
Operator class cache: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
smgr relation table: 57344 total in 3 blocks; 21872 free (8 chunks); 35472 used
TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0 chunks); 32 used
Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
PortalMemory: 8192 total in 1 blocks; 7888 free (1 chunks); 304 used
PortalHeapMemory: 1024 total in 1 blocks; 768 free (0 chunks); 256 used
ExecutorState: 122880 total in 4 blocks; 68728 free (2 chunks); 54152 used
HashTableContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
HashBatchContext: 32888 total in 3 blocks; 14512 free (0 chunks); 18376 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
AggContext: 1033887744 total in 133 blocks; 688 free (0 chunks); 1033887056 used               **
TupleHashTable: 566485040 total in 78 blocks; 2064896 free (304 chunks); 564420144 used        **
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 8192 total in 1 blocks; 8040 free (1 chunks); 152 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
Relcache by OID: 57344 total in 3 blocks; 31024 free (6 chunks); 26320 used
CacheMemoryContext: 4679720 total in 23 blocks; 404104 free (1 chunks); 4275616 used
 
If you have actually read this far, the following wrinkle is where I am currently stuck.
 
I set ENABLE_HASHAGG=OFF and still get an OOM :-(
In this situation Postgres runs for much longer before getting the OOM and I believe it has actually written some records to the aggregation
I base this on seeing autovacuum kick off against the aggregation table after the query fails, which may not be valid.

userquery Scan table_4760  (cost=8771233.77..9521474.17 rows=5000 width=152)
  ->  GroupAggregate  (cost=8771233.77..9521349.17 rows=5000 width=40)
        ->  Sort  (cost=8771233.77..8864982.57 rows=37499520 width=40)
              Sort Key: public.customer_stats_dtl.user_id, policy.policy_group_id
              ->  Hash Join  (cost=7.81..2241002.00 rows=37499520 width=40)
                    Hash Cond: (public.customer_stats_dtl.policy_id = policy.policy_id)
                    ->  Append  (cost=0.00..1641001.87 rows=59999232 width=40)
                          ->  Seq Scan on customer_stats_dtl  (cost=0.00..22.45 rows=4 width=40)
                                Filter: ((period_start >= '2007-09-08 20:00:00-04'::timestamp with time zone) AND (period_start < '2007-09-09 08:00:00-04'::timestamp with time zone))
                          ->  Seq Scan on customer_stats_dtl_027 customer_stats_dtl  (cost=0.00..1640979.42 rows=59999228 width=40)
                                Filter: ((period_start >= '2007-09-08 20:00:00-04'::timestamp with time zone) AND (period_start < '2007-09-09 08:00:00-04'::timestamp with time zone))
                    ->  Hash  (cost=6.25..6.25 rows=125 width=8)
                          ->  Seq Scan on policy  (cost=0.00..6.25 rows=125 width=8)
 
TopMemoryContext: 874608 total in 106 blocks; 14240 free (45 chunks); 860368 used
unnamed prepared statement: 2097152 total in 8 blocks; 720544 free (2 chunks); 1376608 used
TopTransactionContext: 8192 total in 1 blocks; 7648 free (0 chunks); 544 used
AfterTriggerEvents: 758112256 total in 102 blocks; 1792 free (8 chunks); 758110464 used
S_2: 1024 total in 1 blocks; 488 free (0 chunks); 536 used
Type information cache: 24576 total in 2 blocks; 11888 free (5 chunks); 12688 used
SPI Plan: 3072 total in 2 blocks; 1320 free (0 chunks); 1752 used
SPI Plan: 7168 total in 3 blocks; 1312 free (0 chunks); 5856 used
SPI Plan: 3072 total in 2 blocks; 1320 free (0 chunks); 1752 used
SPI Plan: 3072 total in 2 blocks; 928 free (0 chunks); 2144 used
SPI Plan: 7168 total in 3 blocks; 3920 free (0 chunks); 3248 used
SPI Plan: 3072 total in 2 blocks; 1224 free (0 chunks); 1848 used
PL/PgSQL function context: 24576 total in 2 blocks; 12008 free (9 chunks); 12568
 used
CFuncHash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
Rendezvous variable hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
PLpgSQL function cache: 24224 total in 2 blocks; 3744 free (0 chunks); 20480 used
S_1: 1024 total in 1 blocks; 488 free (0 chunks); 536 used
Prepared Queries: 24576 total in 2 blocks; 11888 free (5 chunks); 12688 used
Record information cache: 24576 total in 2 blocks; 15984 free (5 chunks); 8592 used
MessageContext: 8192 total in 1 blocks; 7744 free (0 chunks); 448 used
Operator class cache: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
smgr relation table: 253952 total in 5 blocks; 40912 free (16 chunks); 213040 used
TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0 chunks); 32 used
Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
PortalMemory: 8192 total in 1 blocks; 7888 free (1 chunks); 304 used
PortalHeapMemory: 1024 total in 1 blocks; 768 free (0 chunks); 256 used
ExecutorState: 122880 total in 4 blocks; 61720 free (15 chunks); 61160 used
ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
HashTableContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
HashBatchContext: 32888 total in 3 blocks; 14512 free (0 chunks); 18376 used
TupleSort: 822437960 total in 105 blocks; 424201336 free (5267294 chunks); 398236624 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
AggContext: 8192 total in 1 blocks; 8000 free (3 chunks); 192 used
ExprContext: 8192 total in 1 blocks; 8000 free (0 chunks); 192 used
ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
ExprContext: 8192 total in 1 blocks; 8040 free (0 chunks); 152 used
Relcache by OID: 253952 total in 5 blocks; 117488 free (8 chunks); 136464 used
CacheMemoryContext: 17262632 total in 25 blocks; 3573056 free (0 chunks); 13689576 used


Explore the seven wonders of the world Learn more!

pgsql-performance by date:

Previous
From: "Peter Koczan"
Date:
Subject: Re: sequence query performance issues
Next
From: Peter Schuller
Date:
Subject: Non-blocking vacuum full