Thread: partial indexed not being used.

partial indexed not being used.

From
Ron Arts
Date:
Hello

I have a table that has a boolean column telling if that particular
record has been billed already. So most of the time that column holds
only true values.

To select the unbilled record I use the following query:

# select * where billed = false order by calldate;

and defined the following index:

# Create index cdr_billed_index on cdr (billed) where billed = false;

This query took surprisingly long, and explain seems to be telling
me the query is using a sequential scan:

=# explain select cdr.* from cdr where billed = false order by calldate;

                             QUERY PLAN
-------------------------------------------------------------------
  Sort  (cost=37448.75..37526.94 rows=31273 width=465)
    Sort Key: calldate
    ->  Seq Scan on cdr  (cost=0.00..20323.81 rows=31273 width=465)
          Filter: (billed = false)

How can I ensure the patial billing index will be used?

thanks for any insights,
Ron

Attachment

Re: partial indexed not being used.

From
Michael Fuhr
Date:
On Mon, Feb 12, 2007 at 01:23:13AM +0100, Ron Arts wrote:
> I have a table that has a boolean column telling if that particular
> record has been billed already. So most of the time that column holds
> only true values.

How often is "most of the time"?  What are the results of the following
queries?

select version();
select count(*) from cdr;
select count(*) from cdr where billed = false;

> =# explain select cdr.* from cdr where billed = false order by calldate;
>
>                             QUERY PLAN
> -------------------------------------------------------------------
>  Sort  (cost=37448.75..37526.94 rows=31273 width=465)
>    Sort Key: calldate
>    ->  Seq Scan on cdr  (cost=0.00..20323.81 rows=31273 width=465)
>          Filter: (billed = false)
>
> How can I ensure the patial billing index will be used?

If using the index would be slower than a sequential scan then you
don't want the query to use the index.  Let's see how accurate the
row count estimates are and whether using an index really would be
faster -- please post the output of the following queries:

set enable_seqscan to on;
explain analyze select cdr.* from cdr where billed = false order by calldate;
set enable_seqscan to off;
explain analyze select cdr.* from cdr where billed = false order by calldate;

If the number of rows returned isn't close to the planner's estimate
(31273 in the output you showed) then trying running ANALYZE or
VACUUM ANALYZE on the table, then run the above statements again.

--
Michael Fuhr

Re: partial indexed not being used.

From
Michael Fuhr
Date:
On Sun, Feb 11, 2007 at 07:27:29PM -0700, Michael Fuhr wrote:
> If using the index would be slower than a sequential scan then you
> don't want the query to use the index.  Let's see how accurate the
> row count estimates are and whether using an index really would be
> faster -- please post the output of the following queries:

Also, what's your hardware configuration and what non-default
settings do you have in postgresql.conf?  In particular, how much
memory do you have and what values do you have for shared_buffers,
effective_cache_size, work_mem, and random_page_cost?

--
Michael Fuhr

Re: partial indexed not being used.

From
Ron Arts
Date:
Michael Fuhr schreef:
> On Mon, Feb 12, 2007 at 01:23:13AM +0100, Ron Arts wrote:
>> I have a table that has a boolean column telling if that particular
>> record has been billed already. So most of the time that column holds
>> only true values.
>
> How often is "most of the time"?  What are the results of the following
> queries?
>

Michael,

most of the time means: 9 out of ten times.

> select version();

# select version();
                                                             version

-------------------------------------------------------------------------------------------------------------------------------
  PostgreSQL 7.4.7 on i386-redhat-linux-gnu, compiled by GCC i386-redhat-linux-gcc (GCC) 3.3.3 20040412 (Red Hat Linux
3.3.3-7)
(1 row)

> select count(*) from cdr;

# select count(*) from cdr;
  count
-------
  62547
(1 row)

> select count(*) from cdr where billed = false;
>

# select count(*) from cdr where billed = false;
  count
-------
      3
(1 row)


I am worried that is is doing a sequential scan. On production systems
the cdr table might contain of millions of records.

Thanks,
Ron

>> =# explain select cdr.* from cdr where billed = false order by calldate;
>>
>>                             QUERY PLAN
>> -------------------------------------------------------------------
>>  Sort  (cost=37448.75..37526.94 rows=31273 width=465)
>>    Sort Key: calldate
>>    ->  Seq Scan on cdr  (cost=0.00..20323.81 rows=31273 width=465)
>>          Filter: (billed = false)
>>
>> How can I ensure the patial billing index will be used?
>
> If using the index would be slower than a sequential scan then you
> don't want the query to use the index.  Let's see how accurate the
> row count estimates are and whether using an index really would be
> faster -- please post the output of the following queries:
>
> set enable_seqscan to on;
> explain analyze select cdr.* from cdr where billed = false order by calldate;
> set enable_seqscan to off;
> explain analyze select cdr.* from cdr where billed = false order by calldate;
>
> If the number of rows returned isn't close to the planner's estimate
> (31273 in the output you showed) then trying running ANALYZE or
> VACUUM ANALYZE on the table, then run the above statements again.
>


Attachment

Re: partial indexed not being used.

From
Tom Lane
Date:
Ron Arts <ron.arts@neonova.nl> writes:
> # select count(*) from cdr where billed = false;
>   count
> -------
>       3
> (1 row)

>> =# explain select cdr.* from cdr where billed = false order by calldate;
>>
>>                             QUERY PLAN
>> -------------------------------------------------------------------
>>  Sort  (cost=37448.75..37526.94 rows=31273 width=465)
>>    Sort Key: calldate
>>    ->  Seq Scan on cdr  (cost=0.00..20323.81 rows=31273 width=465)
                                                ^^^^^^^^^^
>>          Filter: (billed = false)

You haven't ANALYZEd this table (lately, or perhaps ever).  The planner
is thus working with a default selectivity estimate (which I think is
50% for a bool column --- that seems to match your numbers anyway), and
that leads it to the conclusion that a seqscan is the right thing.
Which it would indeed be, if half the table has to be retrieved.

            regards, tom lane

Re: partial indexed not being used.

From
Ron Arts
Date:
Tom Lane schreef:
> Ron Arts <ron.arts@neonova.nl> writes:
>> # select count(*) from cdr where billed = false;
>>   count
>> -------
>>       3
>> (1 row)
>
>>> =# explain select cdr.* from cdr where billed = false order by calldate;
>>>
>>>                             QUERY PLAN
>>> -------------------------------------------------------------------
>>>  Sort  (cost=37448.75..37526.94 rows=31273 width=465)
>>>    Sort Key: calldate
>>>    ->  Seq Scan on cdr  (cost=0.00..20323.81 rows=31273 width=465)
>                                                 ^^^^^^^^^^
>>>          Filter: (billed = false)
>
> You haven't ANALYZEd this table (lately, or perhaps ever).  The planner
> is thus working with a default selectivity estimate (which I think is
> 50% for a bool column --- that seems to match your numbers anyway), and
> that leads it to the conclusion that a seqscan is the right thing.
> Which it would indeed be, if half the table has to be retrieved.
>
>             regards, tom lane
>

Oh man,

I totally forgot about analyze because I have it running daily from a script.
And of course the script was broken, and I didn't notice.

Thanks!

Ron



--
NeoNova BV, The Netherlands
Professional internet and VoIP solutions

http://www.neonova.nl   Kruislaan 419              1098 VA Amsterdam
info: 020-5628292       servicedesk: 020-5628292   fax: 020-5628291
KvK Amsterdam 34151241

The following disclaimer applies to this email:
http://www.neonova.nl/maildisclaimer

Attachment