Re: Bad row estimates - Mailing list pgsql-performance

From Alex Adriaanse
Subject Re: Bad row estimates
Date
Msg-id 440F1115.4020400@innovacomputing.com
Whole thread Raw
In response to Re: Bad row estimates  (Greg Stark <gsstark@mit.edu>)
Responses Re: Bad row estimates  (Greg Stark <gsstark@mit.edu>)
List pgsql-performance
Thank you all for your valuable input.  I have tried creating a partial
index, a GIST index, and a GIST + partial index, as suggested, but it
does not seem to make a significant difference.  For instance:

CREATE INDEX test_table_1_interval_idx ON test_table_1 USING GIST
    (box(point(start_ts::abstime::integer, start_ts::abstime::integer), point(end_ts::abstime::integer,
end_ts::abstime::integer)))
    WHERE id = g_id;

ANALYZE test_table_1;

EXPLAIN ANALYZE SELECT count(*) FROM test_table_1
    INNER JOIN test_table_2 ON (test_table_2.s_id=13300613 AND test_table_1.id = test_table_2.n_id)
    WHERE box(point(start_ts::abstime::integer, start_ts::abstime::integer), point(end_ts::abstime::integer,
end_ts::abstime::integer))
        ~
box(point(now()::abstime::integer,now()::abstime::integer),point(now()::abstime::integer,now()::abstime::integer))
    AND test_table_1.id = test_table_1.g_id;
                                                                    QUERY PLAN
                                        

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=15.09..15.10 rows=1 width=0) (actual time=69.771..69.772 rows=1 loops=1)
   ->  Nested Loop  (cost=9.06..15.08 rows=1 width=0) (actual time=69.752..69.752 rows=0 loops=1)
         ->  Index Scan using test_table_1_interval_idx on test_table_1  (cost=0.07..4.07 rows=1 width=22) (actual
time=2.930..3.607rows=135 loops=1) 
               Index Cond: (box(point((((start_ts)::abstime)::integer)::double precision,
(((start_ts)::abstime)::integer)::doubleprecision), point((((end_ts)::abstime)::integer)::double precision,
(((end_ts)::abstime)::integer)::doubleprecision)) ~ box(point((((now())::abstime)::integer)::double precision,
(((now())::abstime)::integer)::doubleprecision), point((((now())::abstime)::integer)::double precision,
(((now())::abstime)::integer)::doubleprecision))) 
         ->  Bitmap Heap Scan on test_table_2  (cost=8.99..11.00 rows=1 width=12) (actual time=0.486..0.486 rows=0
loops=135)
               Recheck Cond: ((test_table_2.s_id = 13300613::numeric) AND ("outer".id = test_table_2.n_id))
               ->  BitmapAnd  (cost=8.99..8.99 rows=1 width=0) (actual time=0.485..0.485 rows=0 loops=135)
                     ->  Bitmap Index Scan on test_table_2_s_id  (cost=0.00..2.17 rows=48 width=0) (actual
time=0.015..0.015rows=1 loops=135) 
                           Index Cond: (s_id = 13300613::numeric)
                     ->  Bitmap Index Scan on test_table_2_n_id  (cost=0.00..6.57 rows=735 width=0) (actual
time=0.467..0.467rows=815 loops=135) 
                           Index Cond: ("outer".id = test_table_2.n_id)
 Total runtime: 69.961 ms

(Note: without the GIST index the query currently runs in about 65ms)

Its row estimates are still way off.  As a matter of fact, it almost
seems as if the index doesn't affect row estimates at all.

What would you guys suggest?

Thanks,

Alex

Greg Stark wrote:
> You could actually take short cuts using expression indexes to do this. If it
> works out well then you might want to implement a real data type to avoid the
> overhead of the SQL conversion functions.
>
> Here's an example. If I were to do this for real I would look for a better
> datatype than the box datatype and I would wrap the whole conversion in an SQL
> function. But this will serve to demonstrate:
>
> stark=> create table interval_test (start_ts timestamp with time zone, end_ts timestamp with time zone);
> CREATE TABLE
>
> stark=> create index interval_idx on interval_test using gist (box(point(start_ts::abstime::integer,
end_ts::abstime::integer), point(start_ts::abstime::integer, end_ts::abstime::integer))); 
> CREATE INDEX
>
> stark=> explain select * from interval_test where
box(point(now()::abstime::integer,now()::abstime::integer),point(now()::abstime::integer,now()::abstime::integer))~
box(point(start_ts::abstime::integer,end_ts::abstime::integer) , point(start_ts::abstime::integer,
end_ts::abstime::integer));
>
                                                                                                       QUERY PLAN

                                                                                               
>
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Index Scan using interval_idx on interval_test  (cost=0.07..8.36 rows=2 width=16)
>    Index Cond: (box(point((((now())::abstime)::integer)::double precision, (((now())::abstime)::integer)::double
precision),point((((now())::abstime)::integer)::double precision, (((now())::abstime)::integer)::double precision)) ~
box(point((((start_ts)::abstime)::integer)::doubleprecision, (((end_ts)::abstime)::integer)::double precision),
point((((start_ts)::abstime)::integer)::doubleprecision, (((end_ts)::abstime)::integer)::double precision))) 
> (2 rows)
>
>

pgsql-performance by date:

Previous
From: Ruben Rubio Rey
Date:
Subject: Is good idea an array of 365 elements in a cell of a table, in order to perform searchs?
Next
From: Greg Stark
Date:
Subject: Re: Bad row estimates