Thread: An "obvious" index not being used

An "obvious" index not being used

From: Daniele Varrazzo
Hello,

I am experiencing a query for which a useful index is not being used by
PostgreSQL. The query is of the form:

     select count(*) from foo
     where foo.account_id in (
         select id from accounts where system = 'abc');

and the size of the tables it works on is:

   - 270 records in "accounts", 22 of which match the condition system = 'abc';
   - 5.3M records in "foo", 92K of which match the query condition.
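
For reference, here is a minimal sketch of the schema involved (the table,
column and index names are the real ones; the column types and the foreign
key are my best guess, as they don't appear explicitly in this thread):

     create table accounts (
         id     integer primary key,
         system varchar              -- 'abc' is one of the account types
     );
     create index iaccounts_x1 on accounts (system);

     create table foo (
         account_id integer references accounts (id)
         -- other fields omitted
     );
     create index ifoo_x1 on foo (account_id);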

There is an index on the field "foo.account_id" but it is not used. The
resulting query plan is:

  Aggregate  (cost=300940.70..300940.71 rows=1 width=0) (actual time=13412.088..13412.089 rows=1 loops=1)
    ->  Hash IN Join  (cost=11.97..299858.32 rows=432953 width=0) (actual time=0.678..13307.074 rows=92790 loops=1)
          Hash Cond: (foo.account_id = accounts.id)
          ->  Seq Scan on foo  (cost=0.00..275591.14 rows=5313514 width=4) (actual time=0.014..7163.538 rows=5313514 loops=1)
          ->  Hash  (cost=11.70..11.70 rows=22 width=4) (actual time=0.199..0.199 rows=22 loops=1)
                ->  Bitmap Heap Scan on accounts  (cost=1.42..11.70 rows=22 width=4) (actual time=0.092..0.160 rows=22 loops=1)
                      Recheck Cond: (("system")::text = 'abc'::text)
                      ->  Bitmap Index Scan on iaccounts_x1  (cost=0.00..1.42 rows=22 width=0) (actual time=0.077..0.077 rows=22 loops=1)
                            Index Cond: (("system")::text = 'abc'::text)
  Total runtime: 13412.226 ms


There is a seqscan on the large table. If seqscans are disabled (the exact
commands used are shown after the plan), the plan becomes much more acceptable:

  Aggregate  (cost=2471979.99..2471980.00 rows=1 width=0) (actual time=630.977..630.978 rows=1 loops=1)
    ->  Nested Loop  (cost=1258.12..2470897.61 rows=432953 width=0) (actual time=0.164..526.174 rows=92790 loops=1)
          ->  HashAggregate  (cost=12.75..12.97 rows=22 width=4) (actual time=0.131..0.169 rows=22 loops=1)
                ->  Bitmap Heap Scan on accounts  (cost=2.42..12.70 rows=22 width=4) (actual time=0.047..0.091 rows=22 loops=1)
                      Recheck Cond: (("system")::text = 'abc'::text)
                      ->  Bitmap Index Scan on iaccounts_x1  (cost=0.00..2.42 rows=22 width=0) (actual time=0.036..0.036 rows=22 loops=1)
                            Index Cond: (("system")::text = 'abc'::text)
          ->  Bitmap Heap Scan on foo  (cost=1245.37..111275.14 rows=83024 width=4) (actual time=3.086..14.391 rows=4218 loops=22)
                Recheck Cond: (foo.account_id = accounts.id)
                ->  Bitmap Index Scan on ifoo_x1  (cost=0.00..1224.61 rows=83024 width=0) (actual time=2.962..2.962 rows=4218 loops=22)
                      Index Cond: (foo.account_id = accounts.id)
  Total runtime: 631.121 ms

where the index "ifoo_x1" is used.
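
For the record, the plan above was obtained by disabling seqscans for the
session with the standard planner setting:

     set enable_seqscan = off;
     explain analyze select count(*) from foo
     where foo.account_id in (
         select id from accounts where system = 'abc');
     set enable_seqscan = on;   -- restore the default afterwards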


A similar query plan can also be obtained by performing the inner query first
and hardcoding its result in a new query:

     explain analyze select count(*) from foo
     where account_id in
(70,33,190,21,191,223,203,202,148,246,85,281,280,319,234,67,245,310,318,279,320,9);


I have tried to:

   - rewrite the query with a JOIN instead of an IN (no change in the plan),
   - rewrite the query using EXISTS (it gets worse),
   - raise the statistics for the foo.account_id field to 100 and to 1000,
   - decrease the random_page_cost down to 1,
   - vacuum-analyze the tables at each change,

none of which has changed the situation.
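
In terms of commands, what I tried was roughly the following (my own
reconstruction of the list above, not a transcript):

     -- raise the per-column stats target, then re-gather the stats
     alter table foo alter column account_id set statistics 1000;
     vacuum analyze foo;
     vacuum analyze accounts;

     -- lower the random I/O cost estimate for the session
     set random_page_cost = 1;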

The system is a 64-bit Ubuntu Hardy running PG 8.3. The issue has been
confirmed on Mac OS X 10.5/PG 8.3. Although I ran fewer tests on the PG 8.2 we
recently switched from, I think the issue is present on that version too.

This is the first time I have seen the query planner miss a rather obvious
plan: is there any other setting to tweak to make it do the right thing? (A
sensible tweak, that is: setting random_page_cost to 1 was just an attempt to
get the index used, nothing to actually put in production.)

If you want to reproduce the issue, an anonymized dataset is available at
http://piro.develer.com/test.sql.bz2 . The file size is 46MB (1.5GB
uncompressed). Chris Mair, who tested it on Mac OS X, also noticed that PG
behaved correctly with the freshly imported data: as soon as he VACUUMed the
database he started experiencing the described issue.

Thank you very much.

--
Daniele Varrazzo - Develer S.r.l.
http://www.develer.com

Re: An "obvious" index not being used

From: Tom Lane
Daniele Varrazzo <piro@develer.com> writes:
> There is an index on the field "foo.account_id" but it is not used. The
> resulting query plan is:

>   Aggregate  (cost=300940.70..300940.71 rows=1 width=0) (actual time=13412.088..13412.089 rows=1 loops=1)
>     ->  Hash IN Join  (cost=11.97..299858.32 rows=432953 width=0) (actual time=0.678..13307.074 rows=92790 loops=1)
>           Hash Cond: (foo.account_id = accounts.id)
>           ->  Seq Scan on foo  (cost=0.00..275591.14 rows=5313514 width=4) (actual time=0.014..7163.538 rows=5313514 loops=1)

Well, if the estimate of 432953 rows selected were correct, it'd be
right not to use the index.  Fetching one row in ten is not a job
for an indexscan.  (I'm not sure it'd prefer an indexscan even with an
accurate 92K-row estimate, but at least you'd be in the realm where
tweaking random_page_cost would make a difference.)

I'm not sure why that estimate is so bad, given that you said you
increased the stats target on the table.  Is there anything particularly
skewed about the distribution of the account IDs?
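
One standard way to check is to look at the column's entry in the pg_stats
view after an ANALYZE (a sketch; the view and its columns are standard
PostgreSQL):

     select null_frac, n_distinct, most_common_vals, most_common_freqs
     from pg_stats
     where tablename = 'foo' and attname = 'account_id';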

            regards, tom lane

Re: An "obvious" index not being used

From: Daniele Varrazzo
Tom Lane wrote:
> Daniele Varrazzo <piro@develer.com> writes:
>> There is an index on the field "foo.account_id" but it is not used. The
>> resulting query plan is:
>
>>   Aggregate  (cost=300940.70..300940.71 rows=1 width=0) (actual time=13412.088..13412.089 rows=1 loops=1)
>>     ->  Hash IN Join  (cost=11.97..299858.32 rows=432953 width=0) (actual time=0.678..13307.074 rows=92790 loops=1)
>>           Hash Cond: (foo.account_id = accounts.id)
>>           ->  Seq Scan on foo  (cost=0.00..275591.14 rows=5313514 width=4) (actual time=0.014..7163.538 rows=5313514 loops=1)
>
> Well, if the estimate of 432953 rows selected were correct, it'd be
> right not to use the index.  Fetching one row in ten is not a job
> for an indexscan.  (I'm not sure it'd prefer an indexscan even with an
> accurate 92K-row estimate, but at least you'd be in the realm where
> tweaking random_page_cost would make a difference.)

Let me guess: because the accounts table has an estimated (and correct) count
of 22 records fetched out of 270, i.e. ~8%, it assumes that it will need to
fetch 8% of the 5.3M records (which... yes, matches the estimate of 433K).
Well, this seems terribly wrong for this data set :(
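
The arithmetic indeed checks out (a quick sanity check, runnable anywhere):

     select (22.0 / 270) * 5313514;   -- ~432953, exactly the planner's estimate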

> I'm not sure why that estimate is so bad, given that you said you
> increased the stats target on the table.  Is there anything particularly
> skewed about the distribution of the account IDs?

Probably there is, in the sense that the relatively many accounts of the 'abc'
type are referenced by relatively few records. In the plan for the hardcoded
query the estimate is:

->  Bitmap Index Scan on ifoo_x1  (cost=0.00..4115.67 rows=178308 width=0) (actual time=89.766..89.766 rows=92790 loops=1)

which is actually more accurate.

I suspect the statistics on foo.account_id are not used at all in this query:
the query planner can only estimate the number of accounts to look for, not
how they are distributed in the referencing table. It seems the only way to
get the proper plan would be to add a load of fake accounts! Well, I'd rather
have the query executed in two steps, so that the stats are used correctly: this
is the first time it has happened to me.
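
For the record, a sketch of that two-step approach as a PL/pgSQL function
(the function name and body are mine, not from the thread: it fetches the id
list first, then builds the outer query with the literal values visible to
the planner):

     create or replace function count_foo_for_system(sys text)
     returns bigint as $$
     declare
         id_list text;
         result  bigint;
     begin
         -- step 1: fetch the matching account ids as a comma-separated list
         select array_to_string(array(
             select id from accounts where system = sys), ',')
         into id_list;
         if id_list = '' then
             return 0;   -- no matching accounts: avoid an invalid "in ()"
         end if;
         -- step 2: plan the outer query with the literals known,
         -- so the stats on foo.account_id can actually be used
         execute 'select count(*) from foo where account_id in ('
                 || id_list || ')'
         into result;
         return result;
     end;
     $$ language plpgsql;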

--
Daniele Varrazzo - Develer S.r.l.
http://www.develer.com

Re: An "obvious" index not being used

From: Kevin Grittner
>>> Daniele Varrazzo <piro@develer.com> wrote:

>      select count(*) from foo
>      where foo.account_id in (
>          select id from accounts where system = 'abc');

>   Total runtime: 13412.226 ms

Out of curiosity, how does it do with the logically equivalent form?

select count(*) from foo
where exists (select * from accounts
  where accounts.id = foo.account_id
    and accounts.system = 'abc');

-Kevin

Re: An "obvious" index not being used

From: Daniele Varrazzo
>>>> Daniele Varrazzo <piro@develer.com> wrote:
>
>>      select count(*) from foo
>>      where foo.account_id in (
>>          select id from accounts where system = 'abc');
>
>>   Total runtime: 13412.226 ms
>
> Out of curiosity, how does it do with the logically equivalent form?
>
> select count(*) from foo
> where exists (select * from accounts
>   where accounts.id = foo.account_id
>     and accounts.system = 'abc');

I tried it: it is slower and the query plan still includes the seqscan:

 Aggregate  (cost=44212346.30..44212346.31 rows=1 width=0) (actual time=21510.468..21510.469 rows=1 loops=1)
   ->  Seq Scan on foo  (cost=0.00..44205704.40 rows=2656760 width=0) (actual time=0.058..21402.752 rows=92790 loops=1)
         Filter: (subplan)
         SubPlan
           ->  Index Scan using accounts_pkey on accounts  (cost=0.00..8.27 rows=1 width=288) (actual time=0.002..0.002 rows=0 loops=5313519)
                 Index Cond: (id = $0)
                 Filter: (("system")::text = 'abc'::text)
 Total runtime: 21510.531 ms

Here the estimate is even further off: 2656760 is exactly 50% of the
records in the table.
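
That 50% looks like the planner's default selectivity for a filter clause it
cannot estimate, such as the subplan here (a quick check):

     select 5313519 * 0.5;   -- 2656759.5, rounded to the rows=2656760 estimate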

--
Daniele Varrazzo - Develer S.r.l.
http://www.develer.com

Re: An "obvious" index not being used

From: Francisco Reyes
Daniele Varrazzo writes:

> I suspect the statistics on foo.account_id are not used at all in this query:
> the query planner can only estimate the number of accounts to look for, not

You mentioned you bumped your default_statistics_target.
What did you increase it to?
My data sets are so "strange" that anything less than 350 gives many bad
plans.

Re: An "obvious" index not being used

From: Daniele Varrazzo
Francisco Reyes writes:
> Daniele Varrazzo writes:
>
>> I suspect the statistics on foo.account_id are not used at all in this
>> query: the query planner can only estimate the number of accounts to
>> look for, not
>
> You mentioned you bumped your default_statistics_target.
> What did you increase it to?
> My data sets are so "strange" that anything less than 350 gives many bad
> plans.

Not default_statistics_target: I used "ALTER TABLE ... ALTER COLUMN ... SET
STATISTICS" to change the stats only for the columns I was interested in, going
up to 1000. I think the result is the same, but it was a red herring anyway:
these stats couldn't be used at all in my query.

In my problem I had two tables: a small one (accounts) and a large one (foo).
The way the query is written doesn't allow the stats from the large table to be
used at all unless the records from the small table are fetched. This is
independent of the stats accuracy.

What the planner does is assume an even distribution of the data in the
joined fields. The assumption is probably better than not having anything, but
in my data set (where there were a bunch of accounts with many foo each, but
many accounts with very few foo) it proved false.
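
The skew is easy to see with a per-account tally (a sketch):

     select account_id, count(*) as n
     from foo
     group by account_id
     order by n desc
     limit 10;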

The stats can only be used if the planner knows at planning time what values
to look for in the field: this is why performance becomes acceptable if the
query is split in two parts. In that case we may end up in your
situation: a data set may be "strange" and thus require an increase in the
stats resolution. I can't remember whether the default of 10 was too low, but
100 was definitely enough for me.

It would be nice if the planner could perform the "split query" optimization
automatically, i.e. fetch the records from the small table in order to plan the
action on the larger one. But I suspect this doesn't fit at all into the
current PostgreSQL query pipeline... or does it?

--
Daniele Varrazzo - Develer S.r.l.
http://www.develer.com

Re: An "obvious" index not being used

From: Tom Lane
Daniele Varrazzo <piro@develer.com> writes:
> In my problem I had two tables: a small one (accounts) and a large one (foo).
> The way the query is written doesn't allow the stats from the large table to be
> used at all unless the records from the small table are fetched. This is
> independent of the stats accuracy.

> What the planner does is assume an even distribution of the data in the
> joined fields.

Sir, you don't know what you're talking about.

            regards, tom lane

Re: An "obvious" index not being used

From: Daniele Varrazzo
Tom Lane wrote:
> Daniele Varrazzo <piro@develer.com> writes:
>> In my problem I had two tables: a small one (accounts) and a large one (foo).
>> The way the query is written doesn't allow the stats from the large table to be
>> used at all unless the records from the small table are fetched. This is
>> independent of the stats accuracy.
>
>> What the planner does is assume an even distribution of the data in the
>> joined fields.
>
> Sir, you don't know what you're talking about.

This is probably correct; I am not familiar with the PG internals.

I was just reporting the analysis I proposed in my previous message in this
thread
(http://archives.postgresql.org/pgsql-performance/2008-06/msg00095.php). You
gave me a hint of where the backend was failing to estimate correctly, and I
deduced a guess at the strategy the backend could have used to reach that
result - one not matching the reality of my data set, but, I think, matching the
picture it could have formed from the stats data without performing any further fetch.

Nobody refuted that message; of course, that may have been because it was
laughable:

Daniele Varrazzo wrote:
 > Tom Lane wrote:
 >> Daniele Varrazzo <piro@develer.com> writes:
 >>> There is an index on the field "foo.account_id" but it is not used. The
 >>> resulting query plan is:
 >>
 >>>   Aggregate  (cost=300940.70..300940.71 rows=1 width=0) (actual time=13412.088..13412.089 rows=1 loops=1)
 >>>     ->  Hash IN Join  (cost=11.97..299858.32 rows=432953 width=0) (actual time=0.678..13307.074 rows=92790 loops=1)
 >>>           Hash Cond: (foo.account_id = accounts.id)
 >>>           ->  Seq Scan on foo  (cost=0.00..275591.14 rows=5313514 width=4) (actual time=0.014..7163.538 rows=5313514 loops=1)
 >>
 >> Well, if the estimate of 432953 rows selected were correct, it'd be
 >> right not to use the index.  Fetching one row in ten is not a job
 >> for an indexscan.  (I'm not sure it'd prefer an indexscan even with an
 >> accurate 92K-row estimate, but at least you'd be in the realm where
 >> tweaking random_page_cost would make a difference.)
 >
 > Let me guess: because the accounts table has an estimated (and correct)
 > count of 22 records fetched out of 270, i.e. ~8%, it assumes that it will
 > need to fetch 8% of the 5.3M records (which... yes, matches the
 > estimate of 433K).

This is the idea I had of how the query planner behaved in that query, and of
why the query performs as I expect when the joined values are explicit. Was it
wrong?

Thank you very much. Again, the only reason I think I was right is that
nobody refuted my previous email.

Regards,

--
Daniele Varrazzo - Develer S.r.l.
http://www.develer.com