Thread: Index not used, performance problem
Hi! I've got the following problem: PostgreSQL 7.2.1-2 (Debian) on Duron/700MHz, 512MB, IDE hdd (laptop). I've got a table that has 6400 rows, an index on the deleted, nachname, vorname and hvvsnummer attributes, and my O-R wrapper generate queries like this: SELECT patient.id, patient.vorname, patient.nachname, patient.titel, patient.geburtsdatum, patient.hvvsnummer, patient.geschlecht, patient.adresse_id, patient.beruf, patient.kommentar, patient.cave, patient.zusatzversicherung, patient.deleted FROM patient WHERE ((((patient.deleted = 'f') AND (patient.nachname LIKE 'K%')) AND (patient.vorname LIKE '%')) AND (patient.hvvsnummer LIKE '%')) This results in a SeqScan von patient. Even more curious is that simpler queries like select * from patient where deleted='f'; OR: select * from patient where nachname LIKE 'K%'; all result in SeqScan on patient. I've "analyzed" and "reindex" the table already multiple times, and still PostgreSQL insists upon not using any index. TIA for any pointers, Andreas mpp2=# \d patient Table "patient" Column | Type | Modifiers --------------------+--------------+------------- id | integer | not null vorname | text | not null nachname | text | not null titel | text | geburtsdatum | date | hvvsnummer | text | geschlecht | character(1) | adresse_id | integer | beruf | text | kommentar | text | cave | text | zusatzversicherung | text | deleted | boolean | default 'f' Indexes: patient_deleted, patient_hvvsnummer, patient_nachname, patient_vorname Primary key: patient_pkey Check constraints: "patient_geschlecht" (((geschlecht = 'm'::bpchar) OR (geschlecht = 'w'::bpchar)) OR (geschlecht = '?'::bpchar)) Triggers: RI_ConstraintTrigger_352787, RI_ConstraintTrigger_352789, RI_ConstraintTrigger_352801, RI_ConstraintTrigger_352803, RI_ConstraintTrigger_352815 mpp2=# select count(*) from patient; count ------- 6406 (1 row) mpp2=# explain SELECT * FROM patient WHERE (patient.nachname LIKE 'K%'); NOTICE: QUERY PLAN: Seq Scan on patient (cost=0.00..173.07 rows=272 width=70) EXPLAIN mpp2=# explain SELECT * FROM patient WHERE NOT deleted; NOTICE: QUERY PLAN: Seq Scan on patient (cost=0.00..157.06 rows=6406 width=70) EXPLAIN mpp2=# explain SELECT * FROM patient WHERE deleted='f'; NOTICE: QUERY PLAN: Seq Scan on patient (cost=0.00..173.07 rows=6406 width=70) EXPLAIN
Attachment
Hi Andreas, A few points: PostgreSQL is rarely going to use an index for a boolean column. The reason is that since almost by definition true will occupy 50% of the rows and false will occupy 50% (say). In this case, a sequential scan is always faster. You would say that the 'selectivity' isn't good enough. As for the LIKE searches, the only ones that PostgreSQL can index are of the form 'FOO%', which is what you are doing. However, I believe that PostgreSQL cannot do this if your database encoding is anything other than 'C'. So, if you are using an Austrian encoding, it might not be able to use the index. Some things to try: If you are always seeking over all four columns, then drop the 4 individual indexes and create one like this: create index my_key on patient(nachname, vorname, hvvsnummer); That would be more efficient, in the C locale. Also, what is the point of searching for LIKE '%'? Why not just leave that out? Chris On 29 Mar 2003, Andreas Kostyrka wrote: > Hi! > > I've got the following problem: > PostgreSQL 7.2.1-2 (Debian) on Duron/700MHz, 512MB, IDE hdd (laptop). > > I've got a table that has 6400 rows, an index on the deleted, nachname, > vorname and hvvsnummer attributes, and my O-R wrapper generate queries > like this: > > SELECT patient.id, patient.vorname, patient.nachname, patient.titel, > patient.geburtsdatum, patient.hvvsnummer, patient.geschlecht, > patient.adresse_id, patient.beruf, patient.kommentar, patient.cave, > patient.zusatzversicherung, patient.deleted FROM patient WHERE > ((((patient.deleted = 'f') AND (patient.nachname LIKE 'K%')) AND > (patient.vorname LIKE '%')) AND (patient.hvvsnummer LIKE '%')) > > This results in a SeqScan von patient. Even more curious is that simpler > queries like > > select * from patient where deleted='f'; OR: > select * from patient where nachname LIKE 'K%'; > > all result in SeqScan on patient. > > I've "analyzed" and "reindex" the table already multiple times, and > still PostgreSQL insists upon not using any index. > > TIA for any pointers, > > Andreas > > mpp2=# \d patient > Table "patient" > Column | Type | Modifiers > --------------------+--------------+------------- > id | integer | not null > vorname | text | not null > nachname | text | not null > titel | text | > geburtsdatum | date | > hvvsnummer | text | > geschlecht | character(1) | > adresse_id | integer | > beruf | text | > kommentar | text | > cave | text | > zusatzversicherung | text | > deleted | boolean | default 'f' > Indexes: patient_deleted, > patient_hvvsnummer, > patient_nachname, > patient_vorname > Primary key: patient_pkey > Check constraints: "patient_geschlecht" (((geschlecht = 'm'::bpchar) OR > (geschlecht = 'w'::bpchar)) OR (geschlecht = '?'::bpchar)) > Triggers: RI_ConstraintTrigger_352787, > RI_ConstraintTrigger_352789, > RI_ConstraintTrigger_352801, > RI_ConstraintTrigger_352803, > RI_ConstraintTrigger_352815 > > mpp2=# select count(*) from patient; > count > ------- > 6406 > (1 row) > > mpp2=# explain SELECT * FROM patient WHERE (patient.nachname LIKE 'K%'); > NOTICE: QUERY PLAN: > > Seq Scan on patient (cost=0.00..173.07 rows=272 width=70) > > EXPLAIN > mpp2=# explain SELECT * FROM patient WHERE NOT deleted; > NOTICE: QUERY PLAN: > > Seq Scan on patient (cost=0.00..157.06 rows=6406 width=70) > > EXPLAIN > mpp2=# explain SELECT * FROM patient WHERE deleted='f'; > NOTICE: QUERY PLAN: > > Seq Scan on patient (cost=0.00..173.07 rows=6406 width=70) > > EXPLAIN > > >
On Sat, Mar 29, 2003 at 09:47:51PM +0800, Christopher Kings-Lynne wrote: > the form 'FOO%', which is what you are doing. However, I believe that > PostgreSQL cannot do this if your database encoding is anything other than > 'C'. So, if you are using an Austrian encoding, it might not be able to That is, you need to have had the LOCALE set to 'C' when you did initdb. It's not enough to change it afterwards. A -- ---- Andrew Sullivan 204-4141 Yonge Street Liberty RMS Toronto, Ontario Canada <andrew@libertyrms.info> M2P 2A8 +1 416 646 3304 x110
In Linux (Redhat) where, exactly, does one set the LOCALE to C? TIA :-) John. On Saturday 29 March 2003 10:34 am, Andrew Sullivan wrote: > On Sat, Mar 29, 2003 at 09:47:51PM +0800, Christopher Kings-Lynne wrote: > > the form 'FOO%', which is what you are doing. However, I believe that > > PostgreSQL cannot do this if your database encoding is anything other > > than 'C'. So, if you are using an Austrian encoding, it might not be > > able to > > That is, you need to have had the LOCALE set to 'C' when you did > initdb. It's not enough to change it afterwards. > > A
On Sat, 2003-03-29 at 14:47, Christopher Kings-Lynne wrote: > As for the LIKE searches, the only ones that PostgreSQL can index are of > the form 'FOO%', which is what you are doing. However, I believe that > PostgreSQL cannot do this if your database encoding is anything other than > 'C'. So, if you are using an Austrian encoding, it might not be able to > use the index. Well, I use LATIN1. How do I store 8-bit chars else? And if so, PostgreSQL seems quite strongly broken, because a relational database relies by design heavily on indexes. > Also, what is the point of searching for LIKE '%'? Why not just leave that > out? Well, it's about generating the SQL query. Actually it's just a border case for searching for a given prefix. Andreas
Attachment
# su postgres % export LANG=C % /usr/local/pgsql/bin/initdb blah blah That always works for me! On Sat, 2003-03-29 at 08:49, John K. Herreshoff wrote: > In Linux (Redhat) where, exactly, does one set the LOCALE to C? > > TIA :-) > > John. > > On Saturday 29 March 2003 10:34 am, Andrew Sullivan wrote: > > On Sat, Mar 29, 2003 at 09:47:51PM +0800, Christopher Kings-Lynne wrote: > > > the form 'FOO%', which is what you are doing. However, I believe that > > > PostgreSQL cannot do this if your database encoding is anything other > > > than 'C'. So, if you are using an Austrian encoding, it might not be > > > able to > > > > That is, you need to have had the LOCALE set to 'C' when you did > > initdb. It's not enough to change it afterwards. > > > > A > > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly -- Jord Tanner <jord@indygecko.com>
On Sat, 2003-03-29 at 14:47, Christopher Kings-Lynne wrote: > Hi Andreas, > > A few points: > > PostgreSQL is rarely going to use an index for a boolean column. The > reason is that since almost by definition true will occupy 50% of the rows > and false will occupy 50% (say). In this case, a sequential scan is > always faster. You would say that the 'selectivity' isn't good enough. Well, perhaps it should collect statistics, because a "deleted" column is a prime candidate for a strongly skewed population. Andreas
Attachment
I have many boolean columns, and my queries almost always use indexes. Just because a column can have only 2 values does not mean that 50% of them will be true and 50% will be false. The ratio of T|F depends on the content. I have some boolean columns with less than 1% true. Obviously, an index will help with these ... and it does, tremendously. If you only have 6400 rows, it is *possible* that the planner will choose not to use an index, as using an index might be slower than just seqscanning. If you do lots of updates on that table, you might need to do a vacuum full occasionally, although I'm not certain how much that benefits a boolean field. Also, if possible, I would consider upgrading to a more recent version. I have seen many of the experts here post news about significant bug fixes between 7.2 and 7.3. (My experience with boolean fields is using 7.3.) In addition, when posting to the list, it is helpful to post an "explain analyze" for a query, as it gives more & better details (for those same experts, of which I am not). Andreas Kostyrka wrote: > On Sat, 2003-03-29 at 14:47, Christopher Kings-Lynne wrote: > >>Hi Andreas, >> >>A few points: >> >>PostgreSQL is rarely going to use an index for a boolean column. The >>reason is that since almost by definition true will occupy 50% of the rows >>and false will occupy 50% (say). In this case, a sequential scan is >>always faster. You would say that the 'selectivity' isn't good enough. > > Well, perhaps it should collect statistics, because a "deleted" column > is a prime candidate for a strongly skewed population. > > Andreas -- Matt Mello 512-350-6900
Andreas Kostyrka <andreas@mtg.co.at> writes: > On Sat, 2003-03-29 at 14:47, Christopher Kings-Lynne wrote: >> As for the LIKE searches, the only ones that PostgreSQL can index are of >> the form 'FOO%', which is what you are doing. However, I believe that >> PostgreSQL cannot do this if your database encoding is anything other than >> 'C'. So, if you are using an Austrian encoding, it might not be able to >> use the index. > Well, I use LATIN1. How do I store 8-bit chars else? You are both confusing locale with encoding. The LIKE optimization requires 'C' locale, but it should work with any encoding (or at least any single-byte encoding; not sure about multibyte). > And if so, PostgreSQL seems quite strongly broken, because a > relational database relies by design heavily on indexes. Some of us would reply that the locales are broken ;-). The bizarre sorting rules demanded by so many locales are what make it impossible to optimize a LIKE prefix into an indexscan. See the archives for the reasons why our many tries at this have failed. regards, tom lane
On 29 Mar 2003, Andreas Kostyrka wrote: > On Sat, 2003-03-29 at 14:47, Christopher Kings-Lynne wrote: > > Hi Andreas, > > > > A few points: > > > > PostgreSQL is rarely going to use an index for a boolean column. The > > reason is that since almost by definition true will occupy 50% of the rows > > and false will occupy 50% (say). In this case, a sequential scan is > > always faster. You would say that the 'selectivity' isn't good enough. > Well, perhaps it should collect statistics, because a "deleted" column > is a prime candidate for a strongly skewed population. It does. When you run analyze. You have vacuumed and analyzed the database right? Assuming you have, it's often better to make a partial index for your booleans. I'll assume that patient.deleted being true is a more rare condition than false, since false is the default. So, create your index this way to make it smaller and faster: create index dxname on sometable (bool_field) where bool_field IS TRUE; Now you have a tiny little index that gets scanned ultra fast and is easy to maintain. You have to, however, access it the same way. the proper way to reference a bool field is with IS [NOT] {TRUE|FALSE} select * from some_table where bool_field IS TRUE would match the index I created aboce. select * from some_table where bool_field = 't' would not.
"scott.marlowe" <scott.marlowe@ihs.com> writes: > So, create your index this way to make it smaller and faster: > create index dxname on sometable (bool_field) where bool_field IS TRUE; Also note that the index itself could be on some other column; for example if you do create index fooi on foo (intcol) where boolcol; then a query like select ... from foo where intcol >= 42 and boolcol; could use the index to exploit both WHERE conditions. > You have to, however, access it the same way. the proper > way to reference a bool field is with IS [NOT] {TRUE|FALSE} This strikes me as pedantry. "WHERE bool" (resp. "WHERE NOT bool") has the same semantics and is easier to read, at least to me. (Of course, if you think differently, then by all means write the form that seems clearest to you.) But yeah, the condition appearing in the actual queries had best match what's used in the partial-index CREATE command exactly. The planner is not real smart about deducing "this implies that". regards, tom lane