Thread: Problem with planner

Problem with planner

From
hubert depesz lubaczewski
Date:
Hi,
we have 8.3.11 installation on client site, with table, which looks like
this:

$ \d objects
                                              Table "public.objects"
         Column          |           Type           |                           Modifiers
-------------------------+--------------------------+---------------------------------------------------------------
...
 state                   | text                     |
...
 ending_tsz              | timestamp with time zone | default (now() + '4 mons'::interval)
...
Indexes:
    "objects_stat_user_id_creation_tsz" btree (state, user_id, creation_tsz)
    "objects_ending_tsz_active" btree (ending_tsz) WHERE state = 'active'::text
    "objects_ending_tsz_idx" btree (ending_tsz)
...


and we have a query:
select count(*) from objects where state='active'::text and ending_tsz <= ( select now() - '1 day'::interval );

Normally this query has been getting plan, using
objects_ending_tsz_active, which is sane and fast.

But today, without any sensible reason, it switched to:

                                                       QUERY PLAN
 

-------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=6719810.62..6719810.63 rows=1 width=0)
   InitPlan
     ->  Result  (cost=0.00..0.01 rows=1 width=0)
   ->  Bitmap Heap Scan on objects  (cost=1289719.58..6711422.56 rows=3355219 width=0)
         Recheck Cond: ((state = 'active'::text) AND (ending_tsz <= $0))
         ->  BitmapAnd  (cost=1289719.58..1289719.58 rows=3355219 width=0)
               ->  Bitmap Index Scan on objects_stat_user_id_creation_tsz  (cost=0.00..334318.95 rows=10065657 width=0)
                     Index Cond: (state = 'active'::text)
               ->  Bitmap Index Scan on objects_ending_tsz_idx  (cost=0.00..953722.77 rows=24986738 width=0)
                     Index Cond: (ending_tsz <= $0)
(10 rows)

running analyze objects 2 times in a row fixed the issue, but hour later
- the problem came back.

what can be wrong?

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
                                                             http://depesz.com/

Re: Problem with planner

From
Tom Lane
Date:
hubert depesz lubaczewski <depesz@depesz.com> writes:
> and we have a query:
> select count(*) from objects where state='active'::text and ending_tsz <= ( select now() - '1 day'::interval );

Try getting rid of the sub-select.  There might have been a reason to do
it like that ten years ago, but these days it's a guaranteed
pessimization.

            regards, tom lane

Re: Problem with planner

From
hubert depesz lubaczewski
Date:
On Mon, Aug 08, 2011 at 12:51:40PM -0400, Tom Lane wrote:
> hubert depesz lubaczewski <depesz@depesz.com> writes:
> > and we have a query:
> > select count(*) from objects where state='active'::text and ending_tsz <= ( select now() - '1 day'::interval );
>
> Try getting rid of the sub-select.  There might have been a reason to do
> it like that ten years ago, but these days it's a guaranteed
> pessimization.

looks like even worse plan:

                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=9168023.89..9168023.90 rows=1 width=0)
   ->  Bitmap Heap Scan on objects  (cost=336429.41..9147670.19 rows=8141478 width=0)
         Recheck Cond: (state = 'active'::text)
         Filter: (ending_tsz <= (now() - '1 day'::interval))
         ->  Bitmap Index Scan on objects_stat_user_id_creation_tsz  (cost=0.00..334394.04 rows=10064868 width=0)
               Index Cond: (state = 'active'::text)
(6 rows)

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
                                                             http://depesz.com/

Re: Problem with planner

From
pasman pasmański
Date:
Try to change index: objects_endings_tsz_active(state,endings_tsz)
where state='active'.

2011/8/8, hubert depesz lubaczewski <depesz@depesz.com>:
> Hi,
> we have 8.3.11 installation on client site, with table, which looks like
> this:
>
> $ \d objects
>                                               Table "public.objects"
>          Column          |           Type           |
>    Modifiers
> -------------------------+--------------------------+---------------------------------------------------------------
> ...
>  state                   | text                     |
> ...
>  ending_tsz              | timestamp with time zone | default (now() + '4
> mons'::interval)
> ...
> Indexes:
>     "objects_stat_user_id_creation_tsz" btree (state, user_id, creation_tsz)
>     "objects_ending_tsz_active" btree (ending_tsz) WHERE state =
> 'active'::text
>     "objects_ending_tsz_idx" btree (ending_tsz)
> ...
>
>
> and we have a query:
> select count(*) from objects where state='active'::text and ending_tsz <= (
> select now() - '1 day'::interval );
>
> Normally this query has been getting plan, using
> objects_ending_tsz_active, which is sane and fast.
>
> But today, without any sensible reason, it switched to:
>
>                                                        QUERY PLAN
>
>
-------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=6719810.62..6719810.63 rows=1 width=0)
>    InitPlan
>      ->  Result  (cost=0.00..0.01 rows=1 width=0)
>    ->  Bitmap Heap Scan on objects  (cost=1289719.58..6711422.56
> rows=3355219 width=0)
>          Recheck Cond: ((state = 'active'::text) AND (ending_tsz <= $0))
>          ->  BitmapAnd  (cost=1289719.58..1289719.58 rows=3355219 width=0)
>                ->  Bitmap Index Scan on objects_stat_user_id_creation_tsz
> (cost=0.00..334318.95 rows=10065657 width=0)
>                      Index Cond: (state = 'active'::text)
>                ->  Bitmap Index Scan on objects_ending_tsz_idx
> (cost=0.00..953722.77 rows=24986738 width=0)
>                      Index Cond: (ending_tsz <= $0)
> (10 rows)
>
> running analyze objects 2 times in a row fixed the issue, but hour later
> - the problem came back.
>
> what can be wrong?
>
> Best regards,
>
> depesz
>
> --
> The best thing about modern society is how easy it is to avoid contact with
> it.
>
> http://depesz.com/
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


--
------------
pasman

Re: Problem with planner

From
hubert depesz lubaczewski
Date:
On Mon, Aug 08, 2011 at 10:01:58PM +0200, pasman pasmański wrote:
> Try to change index: objects_endings_tsz_active(state,endings_tsz)
> where state='active'.

adding new indexes on this machine is quite complicated, as it's heavily
used production system.
And the current index should be chosen by planner anyway - it is after
analyze, but then it breaks again.

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
                                                             http://depesz.com/

Re: Problem with planner

From
Cédric Villemain
Date:
2011/8/8 hubert depesz lubaczewski <depesz@depesz.com>:
> Hi,
> we have 8.3.11 installation on client site, with table, which looks like
> this:
>
> $ \d objects
>                                              Table "public.objects"
>         Column          |           Type           |                           Modifiers
> -------------------------+--------------------------+---------------------------------------------------------------
> ...
>  state                   | text                     |
> ...
>  ending_tsz              | timestamp with time zone | default (now() + '4 mons'::interval)
> ...
> Indexes:
>    "objects_stat_user_id_creation_tsz" btree (state, user_id, creation_tsz)
>    "objects_ending_tsz_active" btree (ending_tsz) WHERE state = 'active'::text
>    "objects_ending_tsz_idx" btree (ending_tsz)
> ...
>
>
> and we have a query:
> select count(*) from objects where state='active'::text and ending_tsz <= ( select now() - '1 day'::interval );
>
> Normally this query has been getting plan, using
> objects_ending_tsz_active, which is sane and fast.
>
> But today, without any sensible reason, it switched to:
>
>                                                       QUERY PLAN
>
-------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=6719810.62..6719810.63 rows=1 width=0)
>   InitPlan
>     ->  Result  (cost=0.00..0.01 rows=1 width=0)
>   ->  Bitmap Heap Scan on objects  (cost=1289719.58..6711422.56 rows=3355219 width=0)
>         Recheck Cond: ((state = 'active'::text) AND (ending_tsz <= $0))
>         ->  BitmapAnd  (cost=1289719.58..1289719.58 rows=3355219 width=0)
>               ->  Bitmap Index Scan on objects_stat_user_id_creation_tsz  (cost=0.00..334318.95 rows=10065657
width=0)
>                     Index Cond: (state = 'active'::text)
>               ->  Bitmap Index Scan on objects_ending_tsz_idx  (cost=0.00..953722.77 rows=24986738 width=0)
>                     Index Cond: (ending_tsz <= $0)
> (10 rows)
>
> running analyze objects 2 times in a row fixed the issue, but hour later
> - the problem came back.
>
> what can be wrong?

The plan turn bad without any new ANALYZE, right  ?
does the table increase more quickly now than before ? is it now way
larger than before ?
Also, do you have an explain with the 'good' plan ?
--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation

Re: Problem with planner

From
hubert depesz lubaczewski
Date:
On Tue, Aug 09, 2011 at 05:11:09PM +0200, Cédric Villemain wrote:
> The plan turn bad without any new ANALYZE, right  ?

Right.

> does the table increase more quickly now than before ? is it now way
> larger than before ?
> Also, do you have an explain with the 'good' plan ?

changes in the objects table are more or less the same (in volume) day
to day.

as for good plan. sure. If i'll disable bitmap scans, I get:

$ explain analyze select count(*) from objects where state='active' and ending_tsz <= (select now() - '1 day'::interval
);
                                                                       QUERY PLAN
                                 

---------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=7651119.35..7651119.36 rows=1 width=0) (actual time=63.150..63.151 rows=1 loops=1)
   InitPlan
     ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.010..0.010 rows=1 loops=1)
   ->  Index Scan using objects_ending_tsz_active on objects  (cost=0.00..7642758.99 rows=3344138 width=0) (actual
time=63.131..63.131rows=0 loops=1) 
         Index Cond: (ending_tsz <= $0)
 Total runtime: 63.279 ms
(6 rows)

for comparison, normal plan, with enable_bitmapscan = true:

$ explain  select count(*) from objects where state='active' and ending_tsz <= (select now() - '1 day'::interval );
                                                       QUERY PLAN
 

-------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=6726333.89..6726333.90 rows=1 width=0)
   InitPlan
     ->  Result  (cost=0.00..0.01 rows=1 width=0)
   ->  Bitmap Heap Scan on objects  (cost=1295618.40..6717973.52 rows=3344138 width=0)
         Recheck Cond: ((state = 'active'::text) AND (ending_tsz <= $0))
         ->  BitmapAnd  (cost=1295618.40..1295618.40 rows=3344138 width=0)
               ->  Bitmap Index Scan on objects_stat_user_id_creation_tsz  (cost=0.00..333925.70 rows=10032414 width=0)
                     Index Cond: (state = 'active'::text)
               ->  Bitmap Index Scan on objects_ending_tsz_idx  (cost=0.00..960020.38 rows=25015994 width=0)
                     Index Cond: (ending_tsz <= $0)
(10 rows)

Now. Interesting is, that if i'll change the query like Tom suggested, things go really bad:

$ set enable_bitmapscan = true;
SET
$ explain  select count(*) from objects where state='active' and ending_tsz <= now() - '1 day'::interval;
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=9192258.28..9192258.29 rows=1 width=0)
   ->  Bitmap Heap Scan on objects  (cost=335966.69..9171848.45 rows=8163932 width=0)
         Recheck Cond: (state = 'active'::text)
         Filter: (ending_tsz <= (now() - '1 day'::interval))
         ->  Bitmap Index Scan on objects_stat_user_id_creation_tsz  (cost=0.00..333925.70 rows=10032414 width=0)
               Index Cond: (state = 'active'::text)
(6 rows)

$ set enable_bitmapscan = false;
SET
$ explain  select count(*) from objects where state='active' and ending_tsz <= now() - '1 day'::interval;
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 Aggregate  (cost=10814335.47..10814335.48 rows=1 width=0)
   ->  Seq Scan on objects  (cost=0.00..10793925.64 rows=8163932 width=0)
         Filter: ((state = 'active'::text) AND (ending_tsz <= (now() - '1 day'::interval)))
(3 rows)

Current stats of the table:

$ select * from pg_stat_user_tables where relname = 'objects';
-[ RECORD 1 ]----+------------------------------
relid            | 71635994
schemaname       | public
relname          | objects
seq_scan         | 181
seq_tup_read     | 3164627085
idx_scan         | 164923232565
idx_tup_fetch    | 1359016133552
n_tup_ins        | 31372199
n_tup_upd        | 698411462
n_tup_del        | 1
n_tup_hot_upd    | 20426973
n_live_tup       | 75016862
n_dead_tup       | 494489
last_vacuum      | 2011-03-31 06:15:39.866869+00
last_autovacuum  | 2011-08-09 05:51:35.050683+00
last_analyze     | 2011-08-09 03:30:14.986266+00
last_autoanalyze | 2010-09-27 05:10:10.793584+00

Best regards,

depesz


Re: Problem with planner

From
Tom Lane
Date:
hubert depesz lubaczewski <depesz@depesz.com> writes:
> as for good plan. sure. If i'll disable bitmap scans, I get:

> $ explain analyze select count(*) from objects where state='active' and ending_tsz <= (select now() - '1
day'::interval); 
>                                                                        QUERY PLAN
                                   
>
---------------------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=7651119.35..7651119.36 rows=1 width=0) (actual time=63.150..63.151 rows=1 loops=1)
>    InitPlan
>      ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.010..0.010 rows=1 loops=1)
>    ->  Index Scan using objects_ending_tsz_active on objects  (cost=0.00..7642758.99 rows=3344138 width=0) (actual
time=63.131..63.131rows=0 loops=1) 
>          Index Cond: (ending_tsz <= $0)
>  Total runtime: 63.279 ms
> (6 rows)

There's absolutely no hope of getting the planner to make intelligent
plan choices when its estimates are that far off (3344138 rows estimated
versus none actual).  It's just luck if you get the "right" plan despite
that.

Now part of the reason why the estimate is so bad is the sub-select,
which completely defeats selectivity estimation for the ending_tsz
condition.  Could we see EXPLAIN ANALYZE for the query without a
sub-select, both with and without enable_bitmapscan?

            regards, tom lane

Re: Problem with planner

From
hubert depesz lubaczewski
Date:
On Tue, Aug 09, 2011 at 12:11:48PM -0400, Tom Lane wrote:
> Now part of the reason why the estimate is so bad is the sub-select,
> which completely defeats selectivity estimation for the ending_tsz
> condition.  Could we see EXPLAIN ANALYZE for the query without a
> sub-select, both with and without enable_bitmapscan?

working on it.

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
                                                             http://depesz.com/

Re: Problem with planner

From
hubert depesz lubaczewski
Date:
On Tue, Aug 09, 2011 at 12:11:48PM -0400, Tom Lane wrote:
> There's absolutely no hope of getting the planner to make intelligent
> plan choices when its estimates are that far off (3344138 rows estimated
> versus none actual).  It's just luck if you get the "right" plan despite
> that.
> Now part of the reason why the estimate is so bad is the sub-select,
> which completely defeats selectivity estimation for the ending_tsz
> condition.  Could we see EXPLAIN ANALYZE for the query without a
> sub-select, both with and without enable_bitmapscan?

version with disabled bitmapscans:
$ explain analyze  select count(*) from objects where state='active' and ending_tsz <= now() - '1 day'::interval;
                                                            QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=10815555.72..10815555.73 rows=1 width=0) (actual time=1416200.548..1416200.548 rows=1 loops=1)
   ->  Seq Scan on objects  (cost=0.00..10795673.36 rows=7952943 width=0) (actual time=1210074.356..1416200.498 rows=13
loops=1)
         Filter: ((state = 'active'::text) AND (ending_tsz <= (now() - '1 day'::interval)))
 Total runtime: 1416200.678 ms
(4 rows)


version with enabled bitmap scans is still working on (3 hours now), will post explain analyze when it finishes.

explain (with enabled bitmap scans) looks like this:

$ explain select count(*) from objects where state='active' and ending_tsz <= now() - '1 day'::interval;
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=9117480.50..9117480.51 rows=1 width=0)
   ->  Bitmap Heap Scan on objects  (cost=326375.21..9097597.40 rows=7953240 width=0)
         Recheck Cond: (state = 'active'::text)
         Filter: (ending_tsz <= (now() - '1 day'::interval))
         ->  Bitmap Index Scan on objects_stat_user_id_creation_tsz  (cost=0.00..324386.90 rows=9754574 width=0)
               Index Cond: (state = 'active'::text)
(6 rows)


value of state and ending_tsz are pretty correlated - it's very
unlikely to have rows matching both criteria (state = 'active' and
ending_tsz <= now() - '1 day').

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
                                                             http://depesz.com/

Re: Problem with planner

From
Tom Lane
Date:
hubert depesz lubaczewski <depesz@depesz.com> writes:
> version with disabled bitmapscans:
> $ explain analyze  select count(*) from objects where state='active' and ending_tsz <= now() - '1 day'::interval;
>                                                             QUERY PLAN
>
----------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=10815555.72..10815555.73 rows=1 width=0) (actual time=1416200.548..1416200.548 rows=1 loops=1)
>    ->  Seq Scan on objects  (cost=0.00..10795673.36 rows=7952943 width=0) (actual time=1210074.356..1416200.498
rows=13loops=1) 
>          Filter: ((state = 'active'::text) AND (ending_tsz <= (now() - '1 day'::interval)))
>  Total runtime: 1416200.678 ms
> (4 rows)

Hmm, so it's not using the index at all here.  The problem clearly is
that the rowcount estimate is still completely bogus :-(, even though
it's presumably getting a reasonable estimate now for the ending_tsz
test in isolation.

I suppose what's going on here is that the "state" and "ending_tsz"
columns are highly correlated, such that there are lots of 'active'
items but hardly any of them ended more than a day ago?  If so,
you're going to have to rethink the representation somehow to get
good results, because there's no way the planner will see this until
we have cross-column stats in some form.

The least invasive fix that I can think of offhand is to set up an
index (non-partial) on the expression

    case when state = 'active' then ending_tsz else null end

and phrase the query as

    WHERE (case when state = 'active' then ending_tsz else null end) <= (now() - '1 day'::interval)

This should result in condensing the stats about active items'
ending_tsz into a format the planner can deal with, assuming
you're running a PG version that will keep and use stats on
expression indexes.

            regards, tom lane

Re: Problem with planner

From
hubert depesz lubaczewski
Date:
On Tue, Aug 09, 2011 at 04:08:39PM -0400, Tom Lane wrote:
> I suppose what's going on here is that the "state" and "ending_tsz"
> columns are highly correlated, such that there are lots of 'active'
> items but hardly any of them ended more than a day ago?  If so,

yes, that's correct.

> you're going to have to rethink the representation somehow to get
> good results, because there's no way the planner will see this until
> we have cross-column stats in some form.
>
> The least invasive fix that I can think of offhand is to set up an
> index (non-partial) on the expression
>
>     case when state = 'active' then ending_tsz else null end
>
> and phrase the query as
>
>     WHERE (case when state = 'active' then ending_tsz else null end) <= (now() - '1 day'::interval)
>
> This should result in condensing the stats about active items'
> ending_tsz into a format the planner can deal with, assuming
> you're running a PG version that will keep and use stats on
> expression indexes.

it's 8.3.11.
I solved the problem by adding "enable_bitmapscan = false" (and keeping
the query in original format, with subselect) which caused the plan to
be ok.

but I'm much more interested to understand why pg chooses *not* to use
index which is tailored specifically for the query - it has exactly
matching where clause, and it indexes the column that we use for
comparison.

the thing is - i solved the problem for now. I might add new index the
way you suggest, and it might help. but it's is very unnerving that
postgresql will just choose to ignore specially made index, perfectly
matching the criteria in query.

since I can't test it - is there any chance (Cédric suggested something
like this) that some newer version has more logic to try harder to use
best index?

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
                                                             http://depesz.com/

Re: Problem with planner

From
Tom Lane
Date:
hubert depesz lubaczewski <depesz@depesz.com> writes:
> I solved the problem by adding "enable_bitmapscan = false" (and keeping
> the query in original format, with subselect) which caused the plan to
> be ok.

I doubt that solution is any more robust than what you had before ---
in particular, it's likely to fall back to seqscans.

> but I'm much more interested to understand why pg chooses *not* to use
> index which is tailored specifically for the query - it has exactly
> matching where clause, and it indexes the column that we use for
> comparison.

Because the planner thinks it will have to pull a huge number of rows
from the index.  Whether the index is "tailored" for the query
is irrelevant if it looks more expensive to use than a seqscan.

            regards, tom lane

Re: Problem with planner

From
hubert depesz lubaczewski
Date:
On Tue, Aug 09, 2011 at 05:18:38PM -0400, Tom Lane wrote:
> hubert depesz lubaczewski <depesz@depesz.com> writes:
> > I solved the problem by adding "enable_bitmapscan = false" (and keeping
> > the query in original format, with subselect) which caused the plan to
> > be ok.
>
> I doubt that solution is any more robust than what you had before ---
> in particular, it's likely to fall back to seqscans.
>
> > but I'm much more interested to understand why pg chooses *not* to use
> > index which is tailored specifically for the query - it has exactly
> > matching where clause, and it indexes the column that we use for
> > comparison.
>
> Because the planner thinks it will have to pull a huge number of rows
> from the index.  Whether the index is "tailored" for the query
> is irrelevant if it looks more expensive to use than a seqscan.


The query with enabled bitmap scans finally finished:
                                                                                   QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=9117479.68..9117479.69 rows=1 width=0) (actual time=24964197.316..24964197.317 rows=1 loops=1)
   ->  Bitmap Heap Scan on objects  (cost=326375.14..9097597.32 rows=7952942 width=0) (actual
time=24804152.598..24964197.207rows=48 loops=1) 
         Recheck Cond: (state = 'active'::text)
         Filter: (ending_tsz <= (now() - '1 day'::interval))
         ->  Bitmap Index Scan on objects_stat_user_id_creation_tsz  (cost=0.00..324386.90 rows=9754574 width=0)
(actualtime=24788705.254..24788705.254 rows=10582798 loops=1) 
               Index Cond: (state = 'active'::text)
 Total runtime: 24964211.224 ms
(7 rows)

In the mean time, based on your other mail, I created this index:

create index concurrently depesz_test_idx on objects (state, ending_tsz) WHERE state = 'active'::text;

and now it's being used by plans:
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Aggregate  (cost=8510568.24..8510568.25 rows=1 width=0)
   ->  Bitmap Heap Scan on objects  (cost=183260.11..8490232.49 rows=8134302 width=0)
         Recheck Cond: ((state = 'active'::text) AND (ending_tsz <= (now() - '1 day'::interval)))
         ->  Bitmap Index Scan on depesz_test_idx  (cost=0.00..181226.54 rows=8134302 width=0)
               Index Cond: ((state = 'active'::text) AND (ending_tsz <= (now() - '1 day'::interval)))
(5 rows)

but when I'll disable bitmap scans it reverts back to seq scan.

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
                                                             http://depesz.com/

Re: Problem with planner

From
Cédric Villemain
Date:
2011/8/9 hubert depesz lubaczewski <depesz@depesz.com>:
> On Tue, Aug 09, 2011 at 04:08:39PM -0400, Tom Lane wrote:
>> I suppose what's going on here is that the "state" and "ending_tsz"
>> columns are highly correlated, such that there are lots of 'active'
>> items but hardly any of them ended more than a day ago?  If so,
>
> yes, that's correct.
>
>> you're going to have to rethink the representation somehow to get
>> good results, because there's no way the planner will see this until
>> we have cross-column stats in some form.
>>
>> The least invasive fix that I can think of offhand is to set up an
>> index (non-partial) on the expression
>>
>>       case when state = 'active' then ending_tsz else null end
>>
>> and phrase the query as
>>
>>       WHERE (case when state = 'active' then ending_tsz else null end) <= (now() - '1 day'::interval)
>>
>> This should result in condensing the stats about active items'
>> ending_tsz into a format the planner can deal with, assuming
>> you're running a PG version that will keep and use stats on
>> expression indexes.
>
> it's 8.3.11.
> I solved the problem by adding "enable_bitmapscan = false" (and keeping
> the query in original format, with subselect) which caused the plan to
> be ok.
>
> but I'm much more interested to understand why pg chooses *not* to use
> index which is tailored specifically for the query - it has exactly
> matching where clause, and it indexes the column that we use for
> comparison.
>
> the thing is - i solved the problem for now. I might add new index the
> way you suggest, and it might help. but it's is very unnerving that
> postgresql will just choose to ignore specially made index, perfectly
> matching the criteria in query.
>
> since I can't test it - is there any chance (Cédric suggested something
> like this) that some newer version has more logic to try harder to use
> best index?

I wondered if it is the same logic to choose between bitmap and
indexscan in both 8.3 and HEAD.
It looks like it is (except that now you can put the not-wanted index
on another tablepsace and increase the cost of accessing it, which is
another no-so-pretty way to workaround the issue).

>
> Best regards,
>
> depesz
>
> --
> The best thing about modern society is how easy it is to avoid contact with it.
>                                                             http://depesz.com/
>



--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation