Re: Bad row estimates - Mailing list pgsql-performance

From Greg Stark
Subject Re: Bad row estimates
Date
Msg-id 87wtfasng0.fsf@stark.xeocode.com
Whole thread Raw
In response to Bad row estimates  (Alex Adriaanse <alex@innovacomputing.com>)
Responses Re: Bad row estimates  (Greg Stark <gsstark@mit.edu>)
Re: Bad row estimates  ("Jim C. Nasby" <jnasby@pervasive.com>)
Re: Bad row estimates  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Alex Adriaanse <alex@innovacomputing.com> writes:

> SELECT count(*) FROM test_table_1
>     INNER JOIN test_table_2 ON
>         (test_table_2.s_id = 13300613 AND test_table_1.id = test_table_2.n_id)
>     WHERE now() BETWEEN test_table_1.start_ts AND test_table_1.end_ts
>     AND test_table_1.id = test_table_1.g_id;

I don't know if this is the entire answer but this query is touching on two of
Postgres's specific difficulties in analyzing statistics:

The "now() BETWEEN test_table_1.start_ts AND test_table_1.end_ts can't be
answered completely using a btree index. You could try using a GIST index here
but I'm not clear how much it would help you (or how much work it would be).

The "test_table_1.id = test_table_1.g_id" clause depends on intercolumn
"correlation" which Postgres doesn't make any attempt at analyzing. That's why
you've found that no matter how much you increase the statitics goal it can't
come up with a better estimate.

Actually the "now() between ..." clause also suffers from the inter-column
dependency issue which is why the estimates for it are off as well.

> However, if I add a boolean column named "equal_ids" to test_table_1 with
> the value (test_table_1.id = test_table_1.g_id), and use that in the query
> instead of the equality it does make a much better row estimate.

One thing you could try is making an expression index on that expression. You
don't need to actually have a redundant column bloating your table. In 8.1 I
believe Postgres will even calculate statistics for these expression indexes.

In fact you could go one step further and try a partial index like:

  CREATE INDEX eq_start ON test_table (start_ts) WHERE id = g_id

The ideal combination might be to create a partial GIST index :)

(I don't think the end_ts in the index is buying you much, despite its
appearance in the Index Cond in the plan.)

--
greg

pgsql-performance by date:

Previous
From: Alex Adriaanse
Date:
Subject: Bad row estimates
Next
From: Greg Stark
Date:
Subject: Re: Bad row estimates