Thread: Index not used, performance problem

Index not used, performance problem

From
Andreas Kostyrka
Date:
Hi!

I've got the following problem:
PostgreSQL 7.2.1-2 (Debian) on Duron/700MHz, 512MB, IDE hdd (laptop).

I've got a table that has 6400 rows, an index on the deleted, nachname,
vorname and hvvsnummer attributes, and my O-R wrapper generate queries
like this:

SELECT patient.id, patient.vorname, patient.nachname, patient.titel,
patient.geburtsdatum, patient.hvvsnummer, patient.geschlecht,
patient.adresse_id, patient.beruf, patient.kommentar, patient.cave,
patient.zusatzversicherung, patient.deleted FROM patient WHERE
((((patient.deleted = 'f') AND (patient.nachname LIKE 'K%')) AND
(patient.vorname LIKE '%')) AND (patient.hvvsnummer LIKE '%'))

This results in a SeqScan von patient. Even more curious is that simpler
queries like

select * from patient where deleted='f'; OR:
select * from patient where nachname LIKE 'K%';

all result in SeqScan on patient.

I've "analyzed" and "reindex" the table already multiple times, and
still PostgreSQL insists upon not using any index.

TIA for any pointers,

Andreas

mpp2=# \d patient
                 Table "patient"
       Column       |     Type     |  Modifiers
--------------------+--------------+-------------
 id                 | integer      | not null
 vorname            | text         | not null
 nachname           | text         | not null
 titel              | text         |
 geburtsdatum       | date         |
 hvvsnummer         | text         |
 geschlecht         | character(1) |
 adresse_id         | integer      |
 beruf              | text         |
 kommentar          | text         |
 cave               | text         |
 zusatzversicherung | text         |
 deleted            | boolean      | default 'f'
Indexes: patient_deleted,
         patient_hvvsnummer,
         patient_nachname,
         patient_vorname
Primary key: patient_pkey
Check constraints: "patient_geschlecht" (((geschlecht = 'm'::bpchar) OR
(geschlecht = 'w'::bpchar)) OR (geschlecht = '?'::bpchar))
Triggers: RI_ConstraintTrigger_352787,
          RI_ConstraintTrigger_352789,
          RI_ConstraintTrigger_352801,
          RI_ConstraintTrigger_352803,
          RI_ConstraintTrigger_352815

mpp2=# select count(*) from patient;
 count
-------
  6406
(1 row)

mpp2=# explain SELECT * FROM patient WHERE (patient.nachname LIKE 'K%');
NOTICE:  QUERY PLAN:

Seq Scan on patient  (cost=0.00..173.07 rows=272 width=70)

EXPLAIN
mpp2=# explain SELECT * FROM patient WHERE NOT deleted;
NOTICE:  QUERY PLAN:

Seq Scan on patient  (cost=0.00..157.06 rows=6406 width=70)

EXPLAIN
mpp2=# explain SELECT * FROM patient WHERE deleted='f';
NOTICE:  QUERY PLAN:

Seq Scan on patient  (cost=0.00..173.07 rows=6406 width=70)

EXPLAIN



Attachment

Re: Index not used, performance problem

From
Christopher Kings-Lynne
Date:
Hi Andreas,

A few points:

PostgreSQL is rarely going to use an index for a boolean column.  The
reason is that since almost by definition true will occupy 50% of the rows
and false will occupy 50% (say).  In this case, a sequential scan is
always faster.  You would say that the 'selectivity' isn't good enough.

As for the LIKE searches, the only ones that PostgreSQL can index are of
the form 'FOO%', which is what you are doing.  However, I believe that
PostgreSQL cannot do this if your database encoding is anything other than
'C'.  So, if you are using an Austrian encoding, it might not be able to
use the index.

Some things to try:

If you are always seeking over all four columns, then drop the 4
individual indexes and create one like this:

create index my_key on patient(nachname, vorname, hvvsnummer);

That would be more efficient, in the C locale.

Also, what is the point of searching for LIKE '%'? Why not just leave that
out?

Chris

On 29 Mar 2003, Andreas Kostyrka wrote:

> Hi!
>
> I've got the following problem:
> PostgreSQL 7.2.1-2 (Debian) on Duron/700MHz, 512MB, IDE hdd (laptop).
>
> I've got a table that has 6400 rows, an index on the deleted, nachname,
> vorname and hvvsnummer attributes, and my O-R wrapper generate queries
> like this:
>
> SELECT patient.id, patient.vorname, patient.nachname, patient.titel,
> patient.geburtsdatum, patient.hvvsnummer, patient.geschlecht,
> patient.adresse_id, patient.beruf, patient.kommentar, patient.cave,
> patient.zusatzversicherung, patient.deleted FROM patient WHERE
> ((((patient.deleted = 'f') AND (patient.nachname LIKE 'K%')) AND
> (patient.vorname LIKE '%')) AND (patient.hvvsnummer LIKE '%'))
>
> This results in a SeqScan von patient. Even more curious is that simpler
> queries like
>
> select * from patient where deleted='f'; OR:
> select * from patient where nachname LIKE 'K%';
>
> all result in SeqScan on patient.
>
> I've "analyzed" and "reindex" the table already multiple times, and
> still PostgreSQL insists upon not using any index.
>
> TIA for any pointers,
>
> Andreas
>
> mpp2=# \d patient
>                  Table "patient"
>        Column       |     Type     |  Modifiers
> --------------------+--------------+-------------
>  id                 | integer      | not null
>  vorname            | text         | not null
>  nachname           | text         | not null
>  titel              | text         |
>  geburtsdatum       | date         |
>  hvvsnummer         | text         |
>  geschlecht         | character(1) |
>  adresse_id         | integer      |
>  beruf              | text         |
>  kommentar          | text         |
>  cave               | text         |
>  zusatzversicherung | text         |
>  deleted            | boolean      | default 'f'
> Indexes: patient_deleted,
>          patient_hvvsnummer,
>          patient_nachname,
>          patient_vorname
> Primary key: patient_pkey
> Check constraints: "patient_geschlecht" (((geschlecht = 'm'::bpchar) OR
> (geschlecht = 'w'::bpchar)) OR (geschlecht = '?'::bpchar))
> Triggers: RI_ConstraintTrigger_352787,
>           RI_ConstraintTrigger_352789,
>           RI_ConstraintTrigger_352801,
>           RI_ConstraintTrigger_352803,
>           RI_ConstraintTrigger_352815
>
> mpp2=# select count(*) from patient;
>  count
> -------
>   6406
> (1 row)
>
> mpp2=# explain SELECT * FROM patient WHERE (patient.nachname LIKE 'K%');
> NOTICE:  QUERY PLAN:
>
> Seq Scan on patient  (cost=0.00..173.07 rows=272 width=70)
>
> EXPLAIN
> mpp2=# explain SELECT * FROM patient WHERE NOT deleted;
> NOTICE:  QUERY PLAN:
>
> Seq Scan on patient  (cost=0.00..157.06 rows=6406 width=70)
>
> EXPLAIN
> mpp2=# explain SELECT * FROM patient WHERE deleted='f';
> NOTICE:  QUERY PLAN:
>
> Seq Scan on patient  (cost=0.00..173.07 rows=6406 width=70)
>
> EXPLAIN
>
>
>


Re: Index not used, performance problem

From
Andrew Sullivan
Date:
On Sat, Mar 29, 2003 at 09:47:51PM +0800, Christopher Kings-Lynne wrote:
> the form 'FOO%', which is what you are doing.  However, I believe that
> PostgreSQL cannot do this if your database encoding is anything other than
> 'C'.  So, if you are using an Austrian encoding, it might not be able to

That is, you need to have had the LOCALE set to 'C' when you did
initdb.  It's not enough to change it afterwards.

A
--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: Index not used, performance problem

From
"John K. Herreshoff"
Date:
In Linux (Redhat) where, exactly, does one set the LOCALE to C?

TIA :-)

John.

On Saturday 29 March 2003 10:34 am, Andrew Sullivan wrote:
> On Sat, Mar 29, 2003 at 09:47:51PM +0800, Christopher Kings-Lynne wrote:
> > the form 'FOO%', which is what you are doing.  However, I believe that
> > PostgreSQL cannot do this if your database encoding is anything other
> > than 'C'.  So, if you are using an Austrian encoding, it might not be
> > able to
>
> That is, you need to have had the LOCALE set to 'C' when you did
> initdb.  It's not enough to change it afterwards.
>
> A


Re: Index not used, performance problem

From
Andreas Kostyrka
Date:
On Sat, 2003-03-29 at 14:47, Christopher Kings-Lynne wrote:
> As for the LIKE searches, the only ones that PostgreSQL can index are of
> the form 'FOO%', which is what you are doing.  However, I believe that
> PostgreSQL cannot do this if your database encoding is anything other than
> 'C'.  So, if you are using an Austrian encoding, it might not be able to
> use the index.
Well, I use LATIN1. How do I store 8-bit chars else? And if so,
PostgreSQL seems quite strongly broken, because a relational database
relies by design heavily on indexes.

> Also, what is the point of searching for LIKE '%'? Why not just leave that
> out?
Well, it's about generating the SQL query.
Actually it's just a border case for searching for a given prefix.

Andreas

Attachment

Re: Index not used, performance problem

From
Jord Tanner
Date:
# su postgres
% export LANG=C
% /usr/local/pgsql/bin/initdb blah blah

That always works for me!

On Sat, 2003-03-29 at 08:49, John K. Herreshoff wrote:
> In Linux (Redhat) where, exactly, does one set the LOCALE to C?
>
> TIA :-)
>
> John.
>
> On Saturday 29 March 2003 10:34 am, Andrew Sullivan wrote:
> > On Sat, Mar 29, 2003 at 09:47:51PM +0800, Christopher Kings-Lynne wrote:
> > > the form 'FOO%', which is what you are doing.  However, I believe that
> > > PostgreSQL cannot do this if your database encoding is anything other
> > > than 'C'.  So, if you are using an Austrian encoding, it might not be
> > > able to
> >
> > That is, you need to have had the LOCALE set to 'C' when you did
> > initdb.  It's not enough to change it afterwards.
> >
> > A
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly
--
Jord Tanner <jord@indygecko.com>


Re: Index not used, performance problem

From
Andreas Kostyrka
Date:
On Sat, 2003-03-29 at 14:47, Christopher Kings-Lynne wrote:
> Hi Andreas,
>
> A few points:
>
> PostgreSQL is rarely going to use an index for a boolean column.  The
> reason is that since almost by definition true will occupy 50% of the rows
> and false will occupy 50% (say).  In this case, a sequential scan is
> always faster.  You would say that the 'selectivity' isn't good enough.
Well, perhaps it should collect statistics, because a "deleted" column
is a prime candidate for a strongly skewed population.

Andreas

Attachment

Re: Index not used, performance problem

From
Matt Mello
Date:
I have many boolean columns, and my queries almost always use indexes.
Just because a column can have only 2 values does not mean that 50% of
them will be true and 50% will be false.  The ratio of T|F depends on
the content.  I have some boolean columns with less than 1% true.
Obviously, an index will help with these ... and it does, tremendously.

If you only have 6400 rows, it is *possible* that the planner will
choose not to use an index, as using an index might be slower than just
seqscanning.

If you do lots of updates on that table, you might need to do a vacuum
full occasionally, although I'm not certain how much that benefits a
boolean field.

Also, if possible, I would consider upgrading to a more recent version.
  I have seen many of the experts here post news about significant bug
fixes between 7.2 and 7.3.  (My experience with boolean fields is using
7.3.)

In addition, when posting to the list, it is helpful to post an "explain
analyze" for a query, as it gives more & better details (for those same
experts, of which I am not).


Andreas Kostyrka wrote:
> On Sat, 2003-03-29 at 14:47, Christopher Kings-Lynne wrote:
>
>>Hi Andreas,
>>
>>A few points:
>>
>>PostgreSQL is rarely going to use an index for a boolean column.  The
>>reason is that since almost by definition true will occupy 50% of the rows
>>and false will occupy 50% (say).  In this case, a sequential scan is
>>always faster.  You would say that the 'selectivity' isn't good enough.
>
> Well, perhaps it should collect statistics, because a "deleted" column
> is a prime candidate for a strongly skewed population.
>
> Andreas

--
Matt Mello
512-350-6900


Re: Index not used, performance problem

From
Tom Lane
Date:
Andreas Kostyrka <andreas@mtg.co.at> writes:
> On Sat, 2003-03-29 at 14:47, Christopher Kings-Lynne wrote:
>> As for the LIKE searches, the only ones that PostgreSQL can index are of
>> the form 'FOO%', which is what you are doing.  However, I believe that
>> PostgreSQL cannot do this if your database encoding is anything other than
>> 'C'.  So, if you are using an Austrian encoding, it might not be able to
>> use the index.

> Well, I use LATIN1. How do I store 8-bit chars else?

You are both confusing locale with encoding.  The LIKE optimization
requires 'C' locale, but it should work with any encoding (or at least
any single-byte encoding; not sure about multibyte).

> And if so, PostgreSQL seems quite strongly broken, because a
> relational database relies by design heavily on indexes.

Some of us would reply that the locales are broken ;-).  The bizarre
sorting rules demanded by so many locales are what make it impossible
to optimize a LIKE prefix into an indexscan.  See the archives for
the reasons why our many tries at this have failed.

            regards, tom lane


Re: Index not used, performance problem

From
"scott.marlowe"
Date:
On 29 Mar 2003, Andreas Kostyrka wrote:

> On Sat, 2003-03-29 at 14:47, Christopher Kings-Lynne wrote:
> > Hi Andreas,
> >
> > A few points:
> >
> > PostgreSQL is rarely going to use an index for a boolean column.  The
> > reason is that since almost by definition true will occupy 50% of the rows
> > and false will occupy 50% (say).  In this case, a sequential scan is
> > always faster.  You would say that the 'selectivity' isn't good enough.
> Well, perhaps it should collect statistics, because a "deleted" column
> is a prime candidate for a strongly skewed population.

It does.  When you run analyze.  You have vacuumed and analyzed the
database right?

Assuming you have, it's often better to make a partial index for your
booleans.  I'll assume that patient.deleted being true is a more rare
condition than false, since false is the default.

So, create your index this way to make it smaller and faster:

create index dxname on sometable (bool_field) where bool_field IS TRUE;

Now you have a tiny little index that gets scanned ultra fast and is easy
to maintain.  You have to, however, access it the same way.  the proper
way to reference a bool field is with IS [NOT] {TRUE|FALSE}

select * from some_table where bool_field IS TRUE would match the index I
created aboce.

select * from some_table where bool_field = 't' would not.


Re: Index not used, performance problem

From
Tom Lane
Date:
"scott.marlowe" <scott.marlowe@ihs.com> writes:
> So, create your index this way to make it smaller and faster:
> create index dxname on sometable (bool_field) where bool_field IS TRUE;

Also note that the index itself could be on some other column; for
example if you do

    create index fooi on foo (intcol) where boolcol;

then a query like

    select ... from foo where intcol >= 42 and boolcol;

could use the index to exploit both WHERE conditions.

> You have to, however, access it the same way.  the proper
> way to reference a bool field is with IS [NOT] {TRUE|FALSE}

This strikes me as pedantry.  "WHERE bool" (resp. "WHERE NOT bool") has
the same semantics and is easier to read, at least to me.  (Of course,
if you think differently, then by all means write the form that seems
clearest to you.)

But yeah, the condition appearing in the actual queries had best match
what's used in the partial-index CREATE command exactly.  The planner is
not real smart about deducing "this implies that".

            regards, tom lane