Thread: Bad row estimates

Bad row estimates

From: Alex Adriaanse

Hello,

I am doing some query optimizations for one of my clients who runs
PostgreSQL 8.1.1, and am trying to cut down on the runtime of this
particular query as it runs very frequently:

SELECT count(*) FROM test_table_1
    INNER JOIN test_table_2 ON
        (test_table_2.s_id = 13300613 AND test_table_1.id = test_table_2.n_id)
    WHERE now() BETWEEN test_table_1.start_ts AND test_table_1.end_ts
    AND test_table_1.id = test_table_1.g_id;


The related tables are as follows:

           Table "public.test_table_1"
  Column  |           Type           | Modifiers
----------+--------------------------+-----------
 id       | numeric(20,0)            | not null
 g_id     | numeric(20,0)            |
 start_ts | timestamp with time zone |
 end_ts   | timestamp with time zone |
Indexes:
    "test_table_1_pkey" PRIMARY KEY, btree (id)
    "test_table_1_ts_index" btree (start_ts, end_ts)

    Table "public.test_table_2"
 Column |     Type      | Modifiers
--------+---------------+-----------
 s_id   | numeric(20,0) |
 n_id   | numeric(20,0) |
Indexes:
    "test_table_2_n_id" btree (n_id)
    "test_table_2_s_id" btree (s_id)


When I run the query it uses the following plan:

 Aggregate  (cost=217.17..217.18 rows=1 width=0) (actual time=107.829..107.830 rows=1 loops=1)
   ->  Nested Loop  (cost=11.09..217.16 rows=1 width=0) (actual time=107.817..107.817 rows=0 loops=1)
         ->  Index Scan using test_table_1_ts_index on test_table_1  (cost=0.01..204.05 rows=1 width=22) (actual time=3.677..4.388 rows=155 loops=1)
               Index Cond: ((now() >= start_ts) AND (now() <= end_ts))
               Filter: (id = g_id)
         ->  Bitmap Heap Scan on test_table_2  (cost=11.09..13.10 rows=1 width=12) (actual time=0.664..0.664 rows=0 loops=155)
               Recheck Cond: ((test_table_2.s_id = 13300613::numeric) AND ("outer".id = test_table_2.n_id))
               ->  BitmapAnd  (cost=11.09..11.09 rows=1 width=0) (actual time=0.662..0.662 rows=0 loops=155)
                     ->  Bitmap Index Scan on test_table_2_s_id  (cost=0.00..2.48 rows=136 width=0) (actual time=0.014..0.014 rows=1 loops=155)
                           Index Cond: (s_id = 13300613::numeric)
                     ->  Bitmap Index Scan on test_table_2_n_id  (cost=0.00..8.36 rows=959 width=0) (actual time=0.645..0.645 rows=891 loops=155)
                           Index Cond: ("outer".id = test_table_2.n_id)
 Total runtime: 107.947 ms


However, when I turn off enable_nestloop it runs as follows:

 Aggregate  (cost=465.86..465.87 rows=1 width=0) (actual time=5.763..5.764 rows=1 loops=1)
   ->  Merge Join  (cost=465.16..465.86 rows=1 width=0) (actual time=5.752..5.752 rows=0 loops=1)
         Merge Cond: ("outer".id = "inner".n_id)
         ->  Sort  (cost=204.06..204.07 rows=1 width=22) (actual time=5.505..5.505 rows=1 loops=1)
               Sort Key: test_table_1.id
                ->  Index Scan using test_table_1_ts_index on test_table_1  (cost=0.01..204.05 rows=1 width=22) (actual time=4.458..4.995 rows=155 loops=1)
                     Index Cond: ((now() >= start_ts) AND (now() <= end_ts))
                     Filter: (id = g_id)
         ->  Sort  (cost=261.10..261.44 rows=136 width=12) (actual time=0.235..0.236 rows=1 loops=1)
               Sort Key: test_table_2.n_id
                ->  Bitmap Heap Scan on test_table_2  (cost=2.48..256.28 rows=136 width=12) (actual time=0.218..0.219 rows=1 loops=1)
                     Recheck Cond: (s_id = 13300613::numeric)
                      ->  Bitmap Index Scan on test_table_2_s_id  (cost=0.00..2.48 rows=136 width=0) (actual time=0.168..0.168 rows=1 loops=1)
                           Index Cond: (s_id = 13300613::numeric)
 Total runtime: 5.893 ms

As you can see, the total runtime drops from 108ms to 6ms, indicating
that it is much better to use a Merge Join rather than a Nested Loop in
this case.  It looks like the planner chooses a Nested Loop because it
incorrectly estimates the (now() BETWEEN test_table_1.start_ts AND
test_table_1.end_ts AND test_table_1.id = test_table_1.g_id) condition
to return 1 row, whereas in reality it returns 155 rows.

I have set the statistics target for test_table_1.id and test_table_1.g_id
to 1000, and have ANALYZEd both tables.  This does not seem to make a bit
of difference -- the planner keeps thinking the criteria will only return
1 row.  However, if I add a boolean column named "equal_ids" to
test_table_1 with the value (test_table_1.id = test_table_1.g_id), and use
that in the query instead of the equality, it produces a much better row
estimate.  Essentially:

ALTER TABLE test_table_1 ADD equal_ids BOOLEAN;
UPDATE test_table_1 SET equal_ids = (id = g_id);
VACUUM FULL test_table_1;
ANALYZE VERBOSE test_table_1;
    INFO:  analyzing "public.test_table_1"
    INFO:  "test_table_1": scanned 83 of 83 pages, containing 8827 live rows and 0 dead rows; 8827 rows in sample, 8827 estimated total rows

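For reference, the statistics-target change mentioned above was along these
lines (exact commands reconstructed from memory):

ALTER TABLE test_table_1 ALTER COLUMN id SET STATISTICS 1000;
ALTER TABLE test_table_1 ALTER COLUMN g_id SET STATISTICS 1000;
ANALYZE test_table_1;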

The plans listed above already reflect these changes.  When I substitute
"test_table_1.id = test_table_1.g_id" with "test_table_1.equal_ids" in
the query I get the following plan:

 Aggregate  (cost=469.76..469.77 rows=1 width=0) (actual time=5.711..5.712 rows=1 loops=1)
   ->  Merge Join  (cost=468.52..469.76 rows=2 width=0) (actual time=5.703..5.703 rows=0 loops=1)
         Merge Cond: ("outer".id = "inner".n_id)
         ->  Sort  (cost=207.42..207.69 rows=108 width=11) (actual time=5.462..5.462 rows=1 loops=1)
               Sort Key: test_table_1.id
                ->  Index Scan using test_table_1_ts_index on test_table_1  (cost=0.01..203.77 rows=108 width=11) (actual time=4.547..4.984 rows=155 loops=1)
                     Index Cond: ((now() >= start_ts) AND (now() <= end_ts))
                     Filter: equal_ids
         ->  Sort  (cost=261.10..261.44 rows=136 width=12) (actual time=0.231..0.232 rows=1 loops=1)
               Sort Key: test_table_2.n_id
                ->  Bitmap Heap Scan on test_table_2  (cost=2.48..256.28 rows=136 width=12) (actual time=0.212..0.213 rows=1 loops=1)
                     Recheck Cond: (s_id = 13300613::numeric)
                      ->  Bitmap Index Scan on test_table_2_s_id  (cost=0.00..2.48 rows=136 width=0) (actual time=0.177..0.177 rows=1 loops=1)
                           Index Cond: (s_id = 13300613::numeric)
 Total runtime: 5.830 ms

The row estimate (108) is much better in this case.

Here's some information on the data in these tables:

SELECT count(*) FROM test_table_1;
 count
-------
  8827

SELECT count(*) FROM test_table_2;
  count
---------
 1149533

SELECT equal_ids, count(equal_ids) FROM test_table_1 GROUP BY equal_ids;
 equal_ids | count
-----------+-------
 f         |   281
 t         |  8546

SELECT equal_ids, count(equal_ids) FROM test_table_1
    WHERE now() BETWEEN test_table_1.start_ts AND test_table_1.end_ts GROUP BY equal_ids;
 equal_ids | count
-----------+-------
 t         |   155

SELECT attname, null_frac, n_distinct FROM pg_stats
    WHERE tablename = 'test_table_1' AND attname IN ('id', 'g_id', 'equal_ids');
  attname  | null_frac | n_distinct
-----------+-----------+------------
 id        |         0 |         -1
 g_id      |         0 |  -0.968166
 equal_ids |         0 |          2


Any ideas on how I could go about getting PostgreSQL to use a Merge Join
without having to resort to using the equal_ids column or disabling
enable_nestloop?  Let me know if you need any additional info.

Thanks!

Alex

Re: Bad row estimates

From: Greg Stark

Alex Adriaanse <alex@innovacomputing.com> writes:

> SELECT count(*) FROM test_table_1
>     INNER JOIN test_table_2 ON
>         (test_table_2.s_id = 13300613 AND test_table_1.id = test_table_2.n_id)
>     WHERE now() BETWEEN test_table_1.start_ts AND test_table_1.end_ts
>     AND test_table_1.id = test_table_1.g_id;

I don't know if this is the entire answer but this query is touching on two of
Postgres's specific difficulties in analyzing statistics:

The "now() BETWEEN test_table_1.start_ts AND test_table_1.end_ts can't be
answered completely using a btree index. You could try using a GIST index here
but I'm not clear how much it would help you (or how much work it would be).

The "test_table_1.id = test_table_1.g_id" clause depends on intercolumn
"correlation" which Postgres doesn't make any attempt at analyzing. That's why
you've found that no matter how much you increase the statitics goal it can't
come up with a better estimate.

Actually the "now() between ..." clause also suffers from the inter-column
dependency issue which is why the estimates for it are off as well.

> However, if I add a boolean column named "equal_ids" to test_table_1 with
> the value (test_table_1.id = test_table_1.g_id), and use that in the query
> instead of the equality it does make a much better row estimate.

One thing you could try is making an expression index on that expression. You
don't need to actually have a redundant column bloating your table. In 8.1 I
believe Postgres will even calculate statistics for these expression indexes.
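
For example, something like this (the index name is just illustrative),
followed by an ANALYZE:

  CREATE INDEX test_table_1_equal_ids_idx ON test_table_1 ((id = g_id));
  ANALYZE test_table_1;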

In fact you could go one step further and try a partial index like:

  CREATE INDEX eq_start ON test_table_1 (start_ts) WHERE id = g_id

The ideal combination might be to create a partial GIST index :)

(I don't think the end_ts in the index is buying you much, despite its
appearance in the Index Cond in the plan.)

--
greg

Re: Bad row estimates

From: Greg Stark

Greg Stark <gsstark@MIT.EDU> writes:

> The "now() BETWEEN test_table_1.start_ts AND test_table_1.end_ts can't be
> answered completely using a btree index. You could try using a GIST index here
> but I'm not clear how much it would help you (or how much work it would be).

To add to my own comment you could also try creating two separate indexes on
start_ts and end_ts. Postgres can combine the two indexes using a bitmap scan.
It's not a complete solution like a GIST index would be though.
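
For instance (index names are just illustrative):

  CREATE INDEX test_table_1_start_ts_idx ON test_table_1 (start_ts);
  CREATE INDEX test_table_1_end_ts_idx ON test_table_1 (end_ts);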

It also doesn't help at all with the planner estimating how many records will
actually match.

--
greg

Re: Bad row estimates

From: "Jim C. Nasby"

On Sat, Mar 04, 2006 at 02:01:35AM -0500, Greg Stark wrote:
> Alex Adriaanse <alex@innovacomputing.com> writes:
>
> > SELECT count(*) FROM test_table_1
> >     INNER JOIN test_table_2 ON
> >         (test_table_2.s_id = 13300613 AND test_table_1.id = test_table_2.n_id)
> >     WHERE now() BETWEEN test_table_1.start_ts AND test_table_1.end_ts
> >     AND test_table_1.id = test_table_1.g_id;

Something else that helps in cases like this is to place both an upper
and lower boundary on one (or both) fields if possible. For example, if
you know that start_ts and end_ts will always be within 1 hour of each
other, adding the following will help:

AND start_ts >= now()-'1 hour'::interval AND end_ts <= now()+'1 hour'::interval
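
In context the query would look something like this (using a 1-hour bound
purely as an example):

SELECT count(*) FROM test_table_1
    INNER JOIN test_table_2 ON
        (test_table_2.s_id = 13300613 AND test_table_1.id = test_table_2.n_id)
    WHERE now() BETWEEN test_table_1.start_ts AND test_table_1.end_ts
    AND start_ts >= now() - '1 hour'::interval
    AND end_ts <= now() + '1 hour'::interval
    AND test_table_1.id = test_table_1.g_id;
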
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Bad row estimates

From: Tom Lane

Greg Stark <gsstark@mit.edu> writes:
> (I don't think the end_ts in the index is buying you much, despite its
> appearance in the Index Cond in the plan.)

Well, it saves some trips to the heap, but the indexscan is still going
to run from the beginning of the index to start_ts = now(), because
btree has no idea that there's any correlation between the two index
columns.

If you could put some a-priori bound on the interval width, you could
add a WHERE constraint "AND now() - max_width <= start_ts", which'd
constrain the index scan and possibly also get you a better planner
estimate.  Otherwise I think you really need a special datatype for time
intervals and a GIST or r-tree index on it :-(.

            regards, tom lane

Re: Bad row estimates

From: Greg Stark

Tom Lane <tgl@sss.pgh.pa.us> writes:

> Otherwise I think you really need a special datatype for time
> intervals and a GIST or r-tree index on it :-(.

You could actually take shortcuts using expression indexes to do this. If it
works out well then you might want to implement a real data type to avoid the
overhead of the SQL conversion functions.

Here's an example. If I were to do this for real I would look for a better
datatype than the box datatype and I would wrap the whole conversion in an SQL
function. But this will serve to demonstrate:

stark=> create table interval_test (start_ts timestamp with time zone, end_ts timestamp with time zone);
CREATE TABLE

stark=> create index interval_idx on interval_test using gist (box(point(start_ts::abstime::integer,
end_ts::abstime::integer), point(start_ts::abstime::integer, end_ts::abstime::integer))); 
CREATE INDEX

stark=> explain select * from interval_test where
            box(point(now()::abstime::integer, now()::abstime::integer),
                point(now()::abstime::integer, now()::abstime::integer))
            ~ box(point(start_ts::abstime::integer, end_ts::abstime::integer),
                  point(start_ts::abstime::integer, end_ts::abstime::integer));

                                    QUERY PLAN
------------------------------------------------------------------------------------
 Index Scan using interval_idx on interval_test  (cost=0.07..8.36 rows=2 width=16)
   Index Cond: (box(point((((now())::abstime)::integer)::double precision, (((now())::abstime)::integer)::double precision), point((((now())::abstime)::integer)::double precision, (((now())::abstime)::integer)::double precision)) ~ box(point((((start_ts)::abstime)::integer)::double precision, (((end_ts)::abstime)::integer)::double precision), point((((start_ts)::abstime)::integer)::double precision, (((end_ts)::abstime)::integer)::double precision)))
(2 rows)
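
If you did wrap the conversion in an SQL function as mentioned above, it might
look roughly like this (an untested sketch -- ts_box is just an illustrative
name, and it would need to be IMMUTABLE so it can be used in an index):

  create function ts_box(timestamptz, timestamptz) returns box as
    'select box(point($1::abstime::integer, $2::abstime::integer),
                point($1::abstime::integer, $2::abstime::integer))'
  language sql immutable;

The index expression and the WHERE clause would then both collapse to
ts_box(start_ts, end_ts) and ts_box(now(), now()).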

--
greg

Re: Bad row estimates

From: Alex Adriaanse

Thank you all for your valuable input.  I have tried creating a partial
index, a GIST index, and a GIST + partial index, as suggested, but it
does not seem to make a significant difference.  For instance:

CREATE INDEX test_table_1_interval_idx ON test_table_1 USING GIST
    (box(point(start_ts::abstime::integer, start_ts::abstime::integer),
         point(end_ts::abstime::integer, end_ts::abstime::integer)))
    WHERE id = g_id;

ANALYZE test_table_1;

EXPLAIN ANALYZE SELECT count(*) FROM test_table_1
    INNER JOIN test_table_2 ON (test_table_2.s_id=13300613 AND test_table_1.id = test_table_2.n_id)
    WHERE box(point(start_ts::abstime::integer, start_ts::abstime::integer),
              point(end_ts::abstime::integer, end_ts::abstime::integer))
          ~ box(point(now()::abstime::integer, now()::abstime::integer),
                point(now()::abstime::integer, now()::abstime::integer))
    AND test_table_1.id = test_table_1.g_id;
                                                QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=15.09..15.10 rows=1 width=0) (actual time=69.771..69.772 rows=1 loops=1)
   ->  Nested Loop  (cost=9.06..15.08 rows=1 width=0) (actual time=69.752..69.752 rows=0 loops=1)
         ->  Index Scan using test_table_1_interval_idx on test_table_1  (cost=0.07..4.07 rows=1 width=22) (actual time=2.930..3.607 rows=135 loops=1)
               Index Cond: (box(point((((start_ts)::abstime)::integer)::double precision, (((start_ts)::abstime)::integer)::double precision), point((((end_ts)::abstime)::integer)::double precision, (((end_ts)::abstime)::integer)::double precision)) ~ box(point((((now())::abstime)::integer)::double precision, (((now())::abstime)::integer)::double precision), point((((now())::abstime)::integer)::double precision, (((now())::abstime)::integer)::double precision)))
         ->  Bitmap Heap Scan on test_table_2  (cost=8.99..11.00 rows=1 width=12) (actual time=0.486..0.486 rows=0 loops=135)
               Recheck Cond: ((test_table_2.s_id = 13300613::numeric) AND ("outer".id = test_table_2.n_id))
               ->  BitmapAnd  (cost=8.99..8.99 rows=1 width=0) (actual time=0.485..0.485 rows=0 loops=135)
                     ->  Bitmap Index Scan on test_table_2_s_id  (cost=0.00..2.17 rows=48 width=0) (actual time=0.015..0.015 rows=1 loops=135)
                           Index Cond: (s_id = 13300613::numeric)
                     ->  Bitmap Index Scan on test_table_2_n_id  (cost=0.00..6.57 rows=735 width=0) (actual time=0.467..0.467 rows=815 loops=135)
                           Index Cond: ("outer".id = test_table_2.n_id)
 Total runtime: 69.961 ms

(Note: without the GIST index the query currently runs in about 65ms)

Its row estimates are still way off.  As a matter of fact, it almost
seems as if the index doesn't affect row estimates at all.

What would you guys suggest?

Thanks,

Alex

Greg Stark wrote:
> You could actually take shortcuts using expression indexes to do this. If it
> works out well then you might want to implement a real data type to avoid the
> overhead of the SQL conversion functions.
>
> Here's an example. If I were to do this for real I would look for a better
> datatype than the box datatype and I would wrap the whole conversion in an SQL
> function. But this will serve to demonstrate:
>
> [example snipped]

Re: Bad row estimates

From: Greg Stark

Alex Adriaanse <alex@innovacomputing.com> writes:

> Its row estimates are still way off.  As a matter of fact, it almost seems as
> if the index doesn't affect row estimates at all.

Indexes normally don't affect estimates.  Expression indexes do effectively
create a new column to generate stats for, but that doesn't really help here
because there aren't any estimation functions for the geometric GIST indexes.

> ->  BitmapAnd  (cost=8.99..8.99 rows=1 width=0) (actual time=0.485..0.485 rows=0 loops=135)
>      ->  Bitmap Index Scan on test_table_2_s_id  (cost=0.00..2.17 rows=48 width=0) (actual time=0.015..0.015 rows=1 loops=135)
>            Index Cond: (s_id = 13300613::numeric)
>      ->  Bitmap Index Scan on test_table_2_n_id  (cost=0.00..6.57 rows=735 width=0) (actual time=0.467..0.467 rows=815 loops=135)
>            Index Cond: ("outer".id = test_table_2.n_id)

If this query is representative then it seems you might be better off without
the test_table_2_n_id index. Of course this could be a problem if you need
that index for other purposes.

I'm puzzled that test_table_2_s_id's estimate isn't more precise. Are there
some values of s_id that are quite common and others that are unique? You
might try raising the statistics target on s_id.
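
Something along these lines, for instance (the exact target value is a guess):

  ALTER TABLE test_table_2 ALTER COLUMN s_id SET STATISTICS 1000;
  ANALYZE test_table_2;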

Incidentally, 70ms is pretty good. I'm usually happy if all my mundane queries
are under 100ms and the more complex queries in the vicinity of 300ms. Trying
to optimize below 100ms is hard because you'll find a lot of variability in
the performance. Any extraneous disk i/o from checkpoints, vacuums, even other
services, will throw off your expectations.

--
greg