Home > mailing lists

Re: Bad row estimates - Mailing list pgsql-performance

From	Greg Stark
Subject	Re: Bad row estimates
Date	March 4, 2006 03:01:53
Msg-id	87wtfasng0.fsf@stark.xeocode.com Whole thread Raw
In response to	Bad row estimates (Alex Adriaanse <alex@innovacomputing.com>)
Responses	Re: Bad row estimates Re: Bad row estimates Re: Bad row estimates
List	pgsql-performance

Tree view

Alex Adriaanse <alex@innovacomputing.com> writes:

> SELECT count(*) FROM test_table_1
>     INNER JOIN test_table_2 ON
>         (test_table_2.s_id = 13300613 AND test_table_1.id = test_table_2.n_id)
>     WHERE now() BETWEEN test_table_1.start_ts AND test_table_1.end_ts
>     AND test_table_1.id = test_table_1.g_id;

I don't know if this is the entire answer but this query is touching on two of
Postgres's specific difficulties in analyzing statistics:

The "now() BETWEEN test_table_1.start_ts AND test_table_1.end_ts can't be
answered completely using a btree index. You could try using a GIST index here
but I'm not clear how much it would help you (or how much work it would be).

The "test_table_1.id = test_table_1.g_id" clause depends on intercolumn
"correlation" which Postgres doesn't make any attempt at analyzing. That's why
you've found that no matter how much you increase the statitics goal it can't
come up with a better estimate.

Actually the "now() between ..." clause also suffers from the inter-column
dependency issue which is why the estimates for it are off as well.

> However, if I add a boolean column named "equal_ids" to test_table_1 with
> the value (test_table_1.id = test_table_1.g_id), and use that in the query
> instead of the equality it does make a much better row estimate.

One thing you could try is making an expression index on that expression. You
don't need to actually have a redundant column bloating your table. In 8.1 I
believe Postgres will even calculate statistics for these expression indexes.

In fact you could go one step further and try a partial index like:

  CREATE INDEX eq_start ON test_table (start_ts) WHERE id = g_id

The ideal combination might be to create a partial GIST index :)

(I don't think the end_ts in the index is buying you much, despite its
appearance in the Index Cond in the plan.)

--
greg

pgsql-performance by date:

From: Alex Adriaanse
Date: 03 March 2006, 21:11:01
Subject: Bad row estimates

From: Greg Stark
Date: 04 March 2006, 04:15:12
Subject: Re: Bad row estimates

Re: Bad row estimates - Mailing list pgsql-performance

Previous

Next