Bad row estimates - Mailing list pgsql-performance

From: Alex Adriaanse
Subject: Bad row estimates
Msg-id: 4408E91C.90104@innovacomputing.com
List: pgsql-performance
Hello,

I am doing some query optimizations for one of my clients who runs
PostgreSQL 8.1.1, and am trying to cut down on the runtime of this
particular query as it runs very frequently:

SELECT count(*) FROM test_table_1
    INNER JOIN test_table_2 ON
        (test_table_2.s_id = 13300613 AND test_table_1.id = test_table_2.n_id)
    WHERE now() BETWEEN test_table_1.start_ts AND test_table_1.end_ts
    AND test_table_1.id = test_table_1.g_id;


The related tables are as follows:

           Table "public.test_table_1"
  Column  |           Type           | Modifiers
----------+--------------------------+-----------
 id       | numeric(20,0)            | not null
 g_id     | numeric(20,0)            |
 start_ts | timestamp with time zone |
 end_ts   | timestamp with time zone |
Indexes:
    "test_table_1_pkey" PRIMARY KEY, btree (id)
    "test_table_1_ts_index" btree (start_ts, end_ts)

    Table "public.test_table_2"
 Column |     Type      | Modifiers
--------+---------------+-----------
 s_id   | numeric(20,0) |
 n_id   | numeric(20,0) |
Indexes:
    "test_table_2_n_id" btree (n_id)
    "test_table_2_s_id" btree (s_id)


When I run the query it uses the following plan:

 Aggregate  (cost=217.17..217.18 rows=1 width=0) (actual time=107.829..107.830 rows=1 loops=1)
   ->  Nested Loop  (cost=11.09..217.16 rows=1 width=0) (actual time=107.817..107.817 rows=0 loops=1)
         ->  Index Scan using test_table_1_ts_index on test_table_1  (cost=0.01..204.05 rows=1 width=22) (actual time=3.677..4.388 rows=155 loops=1)
               Index Cond: ((now() >= start_ts) AND (now() <= end_ts))
               Filter: (id = g_id)
         ->  Bitmap Heap Scan on test_table_2  (cost=11.09..13.10 rows=1 width=12) (actual time=0.664..0.664 rows=0 loops=155)
               Recheck Cond: ((test_table_2.s_id = 13300613::numeric) AND ("outer".id = test_table_2.n_id))
               ->  BitmapAnd  (cost=11.09..11.09 rows=1 width=0) (actual time=0.662..0.662 rows=0 loops=155)
                      ->  Bitmap Index Scan on test_table_2_s_id  (cost=0.00..2.48 rows=136 width=0) (actual time=0.014..0.014 rows=1 loops=155)
                           Index Cond: (s_id = 13300613::numeric)
                      ->  Bitmap Index Scan on test_table_2_n_id  (cost=0.00..8.36 rows=959 width=0) (actual time=0.645..0.645 rows=891 loops=155)
                           Index Cond: ("outer".id = test_table_2.n_id)
 Total runtime: 107.947 ms


However, when I turn off enable_nestloop it runs as follows:

 Aggregate  (cost=465.86..465.87 rows=1 width=0) (actual time=5.763..5.764 rows=1 loops=1)
   ->  Merge Join  (cost=465.16..465.86 rows=1 width=0) (actual time=5.752..5.752 rows=0 loops=1)
         Merge Cond: ("outer".id = "inner".n_id)
         ->  Sort  (cost=204.06..204.07 rows=1 width=22) (actual time=5.505..5.505 rows=1 loops=1)
               Sort Key: test_table_1.id
                ->  Index Scan using test_table_1_ts_index on test_table_1  (cost=0.01..204.05 rows=1 width=22) (actual time=4.458..4.995 rows=155 loops=1)
                     Index Cond: ((now() >= start_ts) AND (now() <= end_ts))
                     Filter: (id = g_id)
         ->  Sort  (cost=261.10..261.44 rows=136 width=12) (actual time=0.235..0.236 rows=1 loops=1)
               Sort Key: test_table_2.n_id
                ->  Bitmap Heap Scan on test_table_2  (cost=2.48..256.28 rows=136 width=12) (actual time=0.218..0.219 rows=1 loops=1)
                     Recheck Cond: (s_id = 13300613::numeric)
                      ->  Bitmap Index Scan on test_table_2_s_id  (cost=0.00..2.48 rows=136 width=0) (actual time=0.168..0.168 rows=1 loops=1)
                           Index Cond: (s_id = 13300613::numeric)
 Total runtime: 5.893 ms

As you can see, the total runtime drops from 108 ms to 6 ms, indicating
that a Merge Join is much better than a Nested Loop in this case.  It
looks like the planner chooses a Nested Loop because it incorrectly
estimates that the condition (now() BETWEEN test_table_1.start_ts AND
test_table_1.end_ts AND test_table_1.id = test_table_1.g_id) will
return 1 row, whereas in reality it returns 155 rows.
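To confirm that the misestimate comes from the WHERE conditions alone and not from the join, the filter can be checked in isolation (my own diagnostic suggestion, not something from the original post):

```sql
-- Run only the misestimated scan: the planner's estimated row count
-- for this condition can then be compared directly against the actual
-- count (estimated 1 row vs. 155 actual rows).
EXPLAIN ANALYZE
SELECT 1 FROM test_table_1
WHERE now() BETWEEN start_ts AND end_ts
  AND id = g_id;
```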

I have set the statistics target for test_table_1.id and
test_table_1.g_id to 1000, and have ANALYZEd both tables.  This does
not seem to make any difference -- the planner still thinks the
criteria will return only 1 row.
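For reference, raising the per-column statistics targets presumably looked something like this (the exact commands are not shown in the original post; in 8.1 the target is capped at 1000):

```sql
-- Increase the sample size the planner keeps for these columns,
-- then re-gather statistics for both tables.
ALTER TABLE test_table_1 ALTER COLUMN id SET STATISTICS 1000;
ALTER TABLE test_table_1 ALTER COLUMN g_id SET STATISTICS 1000;
ANALYZE test_table_1;
ANALYZE test_table_2;
```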
However, if I add a boolean column named "equal_ids" to test_table_1
with the value (test_table_1.id = test_table_1.g_id) and use that
column in the query instead of the equality, the planner does make a
much better row estimate.  Essentially:

ALTER TABLE test_table_1 ADD equal_ids BOOLEAN;
UPDATE test_table_1 SET equal_ids = (id = g_id);
VACUUM FULL test_table_1;
ANALYZE VERBOSE test_table_1;
    INFO:  analyzing "public.test_table_1"
    INFO:  "test_table_1": scanned 83 of 83 pages, containing 8827 live rows and 0 dead rows; 8827 rows in sample, 8827 estimated total rows


The plans listed above already reflect these changes.  When I substitute
"test_table_1.id = test_table_1.g_id" with "test_table_1.equal_ids" in
the query I get the following plan:

 Aggregate  (cost=469.76..469.77 rows=1 width=0) (actual time=5.711..5.712 rows=1 loops=1)
   ->  Merge Join  (cost=468.52..469.76 rows=2 width=0) (actual time=5.703..5.703 rows=0 loops=1)
         Merge Cond: ("outer".id = "inner".n_id)
         ->  Sort  (cost=207.42..207.69 rows=108 width=11) (actual time=5.462..5.462 rows=1 loops=1)
               Sort Key: test_table_1.id
                ->  Index Scan using test_table_1_ts_index on test_table_1  (cost=0.01..203.77 rows=108 width=11) (actual time=4.547..4.984 rows=155 loops=1)
                     Index Cond: ((now() >= start_ts) AND (now() <= end_ts))
                     Filter: equal_ids
         ->  Sort  (cost=261.10..261.44 rows=136 width=12) (actual time=0.231..0.232 rows=1 loops=1)
               Sort Key: test_table_2.n_id
                ->  Bitmap Heap Scan on test_table_2  (cost=2.48..256.28 rows=136 width=12) (actual time=0.212..0.213 rows=1 loops=1)
                     Recheck Cond: (s_id = 13300613::numeric)
                      ->  Bitmap Index Scan on test_table_2_s_id  (cost=0.00..2.48 rows=136 width=0) (actual time=0.177..0.177 rows=1 loops=1)
                           Index Cond: (s_id = 13300613::numeric)
 Total runtime: 5.830 ms

The row estimate (108) is much better in this case.

Here's some information on the data in these tables:

SELECT count(*) FROM test_table_1;
 count
-------
  8827

SELECT count(*) FROM test_table_2;
  count
---------
 1149533

SELECT equal_ids, count(equal_ids) FROM test_table_1 GROUP BY equal_ids;
 equal_ids | count
-----------+-------
 f         |   281
 t         |  8546

SELECT equal_ids, count(equal_ids) FROM test_table_1
WHERE now() BETWEEN test_table_1.start_ts AND test_table_1.end_ts
GROUP BY equal_ids;
 equal_ids | count
-----------+-------
 t         |   155

SELECT attname, null_frac, n_distinct FROM pg_stats
WHERE tablename = 'test_table_1' AND attname IN ('id', 'g_id', 'equal_ids');
  attname  | null_frac | n_distinct
-----------+-----------+------------
 id        |         0 |         -1
 g_id      |         0 |  -0.968166
 equal_ids |         0 |          2


Any ideas on how I could go about getting PostgreSQL to use a Merge Join
without having to resort to using the equal_ids column or disabling
enable_nestloop?  Let me know if you need any additional info.
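In the meantime, one stopgap I could use is to disable nested loops for
just this query rather than globally; SET LOCAL confines the change to
the enclosing transaction (a workaround sketch, not something I would
want as a permanent fix):

```sql
BEGIN;
SET LOCAL enable_nestloop = off;  -- reverts automatically at COMMIT/ROLLBACK
SELECT count(*) FROM test_table_1
    INNER JOIN test_table_2 ON
        (test_table_2.s_id = 13300613 AND test_table_1.id = test_table_2.n_id)
    WHERE now() BETWEEN test_table_1.start_ts AND test_table_1.end_ts
    AND test_table_1.id = test_table_1.g_id;
COMMIT;
```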

Thanks!

Alex
