Thread: Sequence scans on indexed row

Sequence scans on indexed row

From
kp
Date:
Hello all.

I am doing a straight forward update on a table with over 1.6 million
records based on a where clause that only uses one of its indexed
columns (btree on a varchar). However postgres instists on not using the
index and instead does a sequence scan that takes ages to complete.

The weird thing is that the same table has another column indexed in
exactly the same way (btree on a varchar) which postgres correctly uses
the index on.

One difference between the two columns is that on one I have fixed the
maximum size of the varchar to 80 while the other is free to grow as
much as it wants.

The other difference between these two columns (or rather the kind of
data they contain) is that the column for which postgres *uses* the
index on contains around 32000 distinct values while the other only
contains 14 distinct values.

I have a suspicion that postgres (i'm using 7.1.3) incorrectly assumes
that a sequence scan is cheaper for the column with the 14 distinct
values in it and ends up scanning 1.6 million rows.

Bug? If so, has it been fixed on later versions?

TIA for any replies,
kp


Re: Sequence scans on indexed row

From
Tim Ellis
Date:
On Sat, 2003-04-19 at 22:36, kp wrote:
> The other difference between these two columns (or rather the kind of
> data they contain) is that the column for which postgres *uses* the
> index on contains around 32000 distinct values while the other only
> contains 14 distinct values.

Did you do a vacuum analyze on the tables in question? There are some
other more complex possibilities, but best to start w/ the basics.

--
Tim Ellis
Senior Database Architect and author, tedia2sql (http://tedia2sql.tigris.org)
If this helped you, http://svcs.affero.net/rm.php?r=philovivero


Re: Sequence scans on indexed row

From
Tom Lane
Date:
kp <pgsql@pobox.gr> writes:
> The other difference between these two columns (or rather the kind of
> data they contain) is that the column for which postgres *uses* the
> index on contains around 32000 distinct values while the other only
> contains 14 distinct values.

I think it's doing what it's supposed to, then.  Using an indexscan to
select 1/14th of a table is a loser --- the seqscan will be faster.
(If you disbelieve this, try timing it both ways.  You can set
enable_seqscan to off to force the planner to do it the other way.)

> I have a suspicion that postgres (i'm using 7.1.3) incorrectly assumes
> that a sequence scan is cheaper for the column with the 14 distinct
> values in it and ends up scanning 1.6 million rows.

The only way it's wrong is if the particular value being searched for is
much less than 1/14th of the table.  IIRC Postgres 7.1 does not have
statistics about anything beyond the most common value of the column,
and so it's easily misled by nonuniform data distributions.  If that's
your situation then an update to 7.2.* or 7.3.* seems called for.

            regards, tom lane


Re: Sequence scans on indexed row

From
kp
Date:
Tom Lane wrote:
> The only way it's wrong is if the particular value being searched for is
> much less than 1/14th of the table.  IIRC Postgres 7.1 does not have
> statistics about anything beyond the most common value of the column,
> and so it's easily misled by nonuniform data distributions.  If that's
> your situation then an update to 7.2.* or 7.3.* seems called for.
And this is exactly the case... This is the data distribution for the
column with the 14 distinct values:
  code     | record_count
----------+--------------
           |     186802
  aa       |        432
  bb       |      31557
  cc       |       5599
  dd       |      15511
  ee       |       8364
  ff       |     769796
  gg       |        328
  hh       |       4911
  ii       |       1904
  jj       |       9145
  kk       |     576944
  ll       |       2296

OK. I will upgrade my installation and see what happens.

Thanks very much for the advice.

Best regards...
kp


Re: Sequence scans on indexed row

From
"Cristina Surroca"
Date:
Hello,
I had the same problem days ago. My fault was that I didn't use 'XXX' on
update. I mean:
Update Table set column = XXX where indexcoulm='something';

Perhaps it is something like this...

good luck

yours

Cris..
----- Original Message -----
From: "kp" <pgsql@pobox.gr>
To: <pgsql-admin@postgresql.org>
Sent: Sunday, April 20, 2003 7:36 AM
Subject: [ADMIN] Sequence scans on indexed row


> Hello all.
>
> I am doing a straight forward update on a table with over 1.6 million
> records based on a where clause that only uses one of its indexed
> columns (btree on a varchar). However postgres instists on not using the
> index and instead does a sequence scan that takes ages to complete.
>
> The weird thing is that the same table has another column indexed in
> exactly the same way (btree on a varchar) which postgres correctly uses
> the index on.
>
> One difference between the two columns is that on one I have fixed the
> maximum size of the varchar to 80 while the other is free to grow as
> much as it wants.
>
> The other difference between these two columns (or rather the kind of
> data they contain) is that the column for which postgres *uses* the
> index on contains around 32000 distinct values while the other only
> contains 14 distinct values.
>
> I have a suspicion that postgres (i'm using 7.1.3) incorrectly assumes
> that a sequence scan is cheaper for the column with the 14 distinct
> values in it and ends up scanning 1.6 million rows.
>
> Bug? If so, has it been fixed on later versions?
>
> TIA for any replies,
> kp
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly