Thread: Sequence scans on indexed row
Hello all. I am doing a straight forward update on a table with over 1.6 million records based on a where clause that only uses one of its indexed columns (btree on a varchar). However postgres instists on not using the index and instead does a sequence scan that takes ages to complete. The weird thing is that the same table has another column indexed in exactly the same way (btree on a varchar) which postgres correctly uses the index on. One difference between the two columns is that on one I have fixed the maximum size of the varchar to 80 while the other is free to grow as much as it wants. The other difference between these two columns (or rather the kind of data they contain) is that the column for which postgres *uses* the index on contains around 32000 distinct values while the other only contains 14 distinct values. I have a suspicion that postgres (i'm using 7.1.3) incorrectly assumes that a sequence scan is cheaper for the column with the 14 distinct values in it and ends up scanning 1.6 million rows. Bug? If so, has it been fixed on later versions? TIA for any replies, kp
On Sat, 2003-04-19 at 22:36, kp wrote: > The other difference between these two columns (or rather the kind of > data they contain) is that the column for which postgres *uses* the > index on contains around 32000 distinct values while the other only > contains 14 distinct values. Did you do a vacuum analyze on the tables in question? There are some other more complex possibilities, but best to start w/ the basics. -- Tim Ellis Senior Database Architect and author, tedia2sql (http://tedia2sql.tigris.org) If this helped you, http://svcs.affero.net/rm.php?r=philovivero
kp <pgsql@pobox.gr> writes: > The other difference between these two columns (or rather the kind of > data they contain) is that the column for which postgres *uses* the > index on contains around 32000 distinct values while the other only > contains 14 distinct values. I think it's doing what it's supposed to, then. Using an indexscan to select 1/14th of a table is a loser --- the seqscan will be faster. (If you disbelieve this, try timing it both ways. You can set enable_seqscan to off to force the planner to do it the other way.) > I have a suspicion that postgres (i'm using 7.1.3) incorrectly assumes > that a sequence scan is cheaper for the column with the 14 distinct > values in it and ends up scanning 1.6 million rows. The only way it's wrong is if the particular value being searched for is much less than 1/14th of the table. IIRC Postgres 7.1 does not have statistics about anything beyond the most common value of the column, and so it's easily misled by nonuniform data distributions. If that's your situation then an update to 7.2.* or 7.3.* seems called for. regards, tom lane
Tom Lane wrote: > The only way it's wrong is if the particular value being searched for is > much less than 1/14th of the table. IIRC Postgres 7.1 does not have > statistics about anything beyond the most common value of the column, > and so it's easily misled by nonuniform data distributions. If that's > your situation then an update to 7.2.* or 7.3.* seems called for. And this is exactly the case... This is the data distribution for the column with the 14 distinct values: code | record_count ----------+-------------- | 186802 aa | 432 bb | 31557 cc | 5599 dd | 15511 ee | 8364 ff | 769796 gg | 328 hh | 4911 ii | 1904 jj | 9145 kk | 576944 ll | 2296 OK. I will upgrade my installation and see what happens. Thanks very much for the advice. Best regards... kp
Hello, I had the same problem days ago. My fault was that I didn't use 'XXX' on update. I mean: Update Table set column = XXX where indexcoulm='something'; Perhaps it is something like this... good luck yours Cris.. ----- Original Message ----- From: "kp" <pgsql@pobox.gr> To: <pgsql-admin@postgresql.org> Sent: Sunday, April 20, 2003 7:36 AM Subject: [ADMIN] Sequence scans on indexed row > Hello all. > > I am doing a straight forward update on a table with over 1.6 million > records based on a where clause that only uses one of its indexed > columns (btree on a varchar). However postgres instists on not using the > index and instead does a sequence scan that takes ages to complete. > > The weird thing is that the same table has another column indexed in > exactly the same way (btree on a varchar) which postgres correctly uses > the index on. > > One difference between the two columns is that on one I have fixed the > maximum size of the varchar to 80 while the other is free to grow as > much as it wants. > > The other difference between these two columns (or rather the kind of > data they contain) is that the column for which postgres *uses* the > index on contains around 32000 distinct values while the other only > contains 14 distinct values. > > I have a suspicion that postgres (i'm using 7.1.3) incorrectly assumes > that a sequence scan is cheaper for the column with the 14 distinct > values in it and ends up scanning 1.6 million rows. > > Bug? If so, has it been fixed on later versions? > > TIA for any replies, > kp > > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly