Thread: Seqscan rather than Index

From:
Jon Anderson
Date:

I have a table 'Alias' with 541162 rows.  It's created as follows:

CREATE TABLE alias
(
  id int4 NOT NULL,
  person_id int4 NOT NULL,
  last_name varchar(30),
  first_name varchar(30),
  middle_name varchar(30),
  questioned_identity_flag varchar,
  CONSTRAINT alias_pkey PRIMARY KEY (id)
)

After populating the data (I can provide a data file if necessary),
I created two indexes as follows:
CREATE INDEX "PX_Alias"  ON alias  USING btree  (id);
ALTER TABLE alias CLUSTER ON "PX_Alias";
CREATE INDEX "IX_Alias_Last_Name"  ON alias  USING btree (last_name);
VACUUM FULL ANALYZE alias;

Then I run a query:
SELECT * FROM Alias WHERE last_name = 'ANDERSON'
This results in a seqscan, rather than an index scan:
   {SEQSCAN
   :startup_cost 0.00
   :total_cost 11970.53
   :plan_rows 3608
   :plan_width 41
   :targetlist (
      {TARGETENTRY
      :resdom
         {RESDOM
         :resno 1
         :restype 23
         :restypmod -1
         :resname id
         :ressortgroupref 0
         :resorigtbl 2780815
         :resorigcol 1
         :resjunk false
         }
      :expr
         {VAR
         :varno 1
         :varattno 1
         :vartype 23
         :vartypmod -1
         :varlevelsup 0
         :varnoold 1
         :varoattno 1
         }
      }
      {TARGETENTRY
      :resdom
         {RESDOM
         :resno 2
         :restype 23
         :restypmod -1
         :resname person_id
         :ressortgroupref 0
         :resorigtbl 2780815
         :resorigcol 2
         :resjunk false
         }
      :expr
         {VAR
         :varno 1
         :varattno 2
         :vartype 23
         :vartypmod -1
         :varlevelsup 0
         :varnoold 1
         :varoattno 2
         }
      }
      {TARGETENTRY
      :resdom
         {RESDOM
         :resno 3
         :restype 1043
         :restypmod 34
         :resname last_name
         :ressortgroupref 0
         :resorigtbl 2780815
         :resorigcol 3
         :resjunk false
         }
      :expr
         {VAR
         :varno 1
         :varattno 3
         :vartype 1043
         :vartypmod 34
         :varlevelsup 0
         :varnoold 1
         :varoattno 3
         }
      }
      {TARGETENTRY
      :resdom
         {RESDOM
         :resno 4
         :restype 1043
         :restypmod 34
         :resname first_name
         :ressortgroupref 0
         :resorigtbl 2780815
         :resorigcol 4
         :resjunk false
         }
      :expr
         {VAR
         :varno 1
         :varattno 4
         :vartype 1043
         :vartypmod 34
         :varlevelsup 0
         :varnoold 1
         :varoattno 4
         }
      }
      {TARGETENTRY
      :resdom
         {RESDOM
         :resno 5
         :restype 1043
         :restypmod 34
         :resname middle_name
         :ressortgroupref 0
         :resorigtbl 2780815
         :resorigcol 5
         :resjunk false
         }
      :expr
         {VAR
         :varno 1
         :varattno 5
         :vartype 1043
         :vartypmod 34
         :varlevelsup 0
         :varnoold 1
         :varoattno 5
         }
      }
      {TARGETENTRY
      :resdom
         {RESDOM
         :resno 6
         :restype 1043
         :restypmod -1
         :resname questioned_identity_flag
         :ressortgroupref 0
         :resorigtbl 2780815
         :resorigcol 6
         :resjunk false
         }
      :expr
         {VAR
         :varno 1
         :varattno 6
         :vartype 1043
         :vartypmod -1
         :varlevelsup 0
         :varnoold 1
         :varoattno 6
         }
      }
   )
   :qual (
      {OPEXPR
      :opno 98
      :opfuncid 67
      :opresulttype 16
      :opretset false
      :args (
         {RELABELTYPE
         :arg
            {VAR
            :varno 1
            :varattno 3
            :vartype 1043
            :vartypmod 34
            :varlevelsup 0
            :varnoold 1
            :varoattno 3
            }
         :resulttype 25
         :resulttypmod -1
         :relabelformat 0
         }
         {CONST
         :consttype 25
         :constlen -1
         :constbyval false
         :constisnull false
         :constvalue 12 [ 12 0 0 0 65 78 68 69 82 83 79 78 ]
         }
      )
      }
   )
   :lefttree <>
   :righttree <>
   :initPlan <>
   :extParam (b)
   :allParam (b)
   :nParamExec 0
   :scanrelid 1
   }

Seq Scan on alias  (cost=0.00..11970.53 rows=3608 width=41) (actual
time=0.000..2103.000 rows=4443 loops=1)
  Filter: ((last_name)::text = 'ANDERSON'::text)
Total runtime: 2153.000 ms


If I:
SET enable_seqscan TO off;

Then the query takes about 300 milliseconds, and uses the index scan.
It seems that the cost estimate is slightly higher for the index scan,
but in reality, it is much faster:


   {INDEXSCAN
   :startup_cost 0.00
   :total_cost 12148.18
   :plan_rows 3608
   :plan_width 41
   :targetlist (
      {TARGETENTRY
      :resdom
         {RESDOM
         :resno 1
         :restype 23
         :restypmod -1
         :resname id
         :ressortgroupref 0
         :resorigtbl 2780815
         :resorigcol 1
         :resjunk false
         }
      :expr
         {VAR
         :varno 1
         :varattno 1
         :vartype 23
         :vartypmod -1
         :varlevelsup 0
         :varnoold 1
         :varoattno 1
         }
      }
      {TARGETENTRY
      :resdom
         {RESDOM
         :resno 2
         :restype 23
         :restypmod -1
         :resname person_id
         :ressortgroupref 0
         :resorigtbl 2780815
         :resorigcol 2
         :resjunk false
         }
      :expr
         {VAR
         :varno 1
         :varattno 2
         :vartype 23
         :vartypmod -1
         :varlevelsup 0
         :varnoold 1
         :varoattno 2
         }
      }
      {TARGETENTRY
      :resdom
         {RESDOM
         :resno 3
         :restype 1043
         :restypmod 34
         :resname last_name
         :ressortgroupref 0
         :resorigtbl 2780815
         :resorigcol 3
         :resjunk false
         }
      :expr
         {VAR
         :varno 1
         :varattno 3
         :vartype 1043
         :vartypmod 34
         :varlevelsup 0
         :varnoold 1
         :varoattno 3
         }
      }
      {TARGETENTRY
      :resdom
         {RESDOM
         :resno 4
         :restype 1043
         :restypmod 34
         :resname first_name
         :ressortgroupref 0
         :resorigtbl 2780815
         :resorigcol 4
         :resjunk false
         }
      :expr
         {VAR
         :varno 1
         :varattno 4
         :vartype 1043
         :vartypmod 34
         :varlevelsup 0
         :varnoold 1
         :varoattno 4
         }
      }
      {TARGETENTRY
      :resdom
         {RESDOM
         :resno 5
         :restype 1043
         :restypmod 34
         :resname middle_name
         :ressortgroupref 0
         :resorigtbl 2780815
         :resorigcol 5
         :resjunk false
         }
      :expr
         {VAR
         :varno 1
         :varattno 5
         :vartype 1043
         :vartypmod 34
         :varlevelsup 0
         :varnoold 1
         :varoattno 5
         }
      }
      {TARGETENTRY
      :resdom
         {RESDOM
         :resno 6
         :restype 1043
         :restypmod -1
         :resname questioned_identity_flag
         :ressortgroupref 0
         :resorigtbl 2780815
         :resorigcol 6
         :resjunk false
         }
      :expr
         {VAR
         :varno 1
         :varattno 6
         :vartype 1043
         :vartypmod -1
         :varlevelsup 0
         :varnoold 1
         :varoattno 6
         }
      }
   )
   :qual <>
   :lefttree <>
   :righttree <>
   :initPlan <>
   :extParam (b)
   :allParam (b)
   :nParamExec 0
   :scanrelid 1
   :indxid (o 5117678)
   :indxqual ((
      {OPEXPR
      :opno 98
      :opfuncid 67
      :opresulttype 16
      :opretset false
      :args (
         {VAR
         :varno 1
         :varattno 1
         :vartype 1043
         :vartypmod 34
         :varlevelsup 0
         :varnoold 1
         :varoattno 3
         }
         {CONST
         :consttype 25
         :constlen -1
         :constbyval false
         :constisnull false
         :constvalue 12 [ 12 0 0 0 65 78 68 69 82 83 79 78 ]
         }
      )
      }
   ))
   :indxqualorig ((
      {OPEXPR
      :opno 98
      :opfuncid 67
      :opresulttype 16
      :opretset false
      :args (
         {RELABELTYPE
         :arg
            {VAR
            :varno 1
            :varattno 3
            :vartype 1043
            :vartypmod 34
            :varlevelsup 0
            :varnoold 1
            :varoattno 3
            }
         :resulttype 25
         :resulttypmod -1
         :relabelformat 0
         }
         {CONST
         :consttype 25
         :constlen -1
         :constbyval false
         :constisnull false
         :constvalue 12 [ 12 0 0 0 65 78 68 69 82 83 79 78 ]
         }
      )
      }
   ))
   :indxstrategy ((i 3))
   :indxsubtype ((o 0))
   :indxlossy ((i 0))
   :indxorderdir 1
   }

Index Scan using "IX_Alias_Last_Name" on alias  (cost=0.00..12148.18
rows=3608 width=41) (actual time=0.000..200.000 rows=4443 loops=1)
  Index Cond: ((last_name)::text = 'ANDERSON'::text)
Total runtime: 220.000 ms

Dropping the index and cluster on the id doesn't make any difference.

According to the pg_stats view, 'ANDERSON' is one of the most frequent
values; however, querying for another common name, 'JACKSON', will use
the index scan.
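A quick way to check this (a diagnostic sketch; pg_stats is the standard
statistics view, and the columns named below are its stock columns):

```sql
-- If 'ANDERSON' appears in most_common_vals, the planner multiplies its
-- sampled frequency by the table's row count to get the row estimate,
-- which is why common names get a higher estimate than rare ones.
SELECT null_frac, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'alias' AND attname = 'last_name';
```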

Any hints on what to do to make PostgreSQL use the index?  This seems
like a fairly simple case, doesn't it?  (I'm using 8.0-rc1 on Windows.)

From:
Tom Lane
Date:

Jon Anderson <> writes:
> Any hints on what to do to make PostgreSQL use the index?

You might want to reduce random_page_cost a little.

Keep in mind that your test case is small enough to fit in RAM and is
probably not reflective of what will happen with larger tables.
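For example, for the current session only (the value 2 here is just
somewhere to start experimenting, not a recommendation):

```sql
-- Lower the planner's penalty for random page fetches relative to
-- sequential ones, then recheck which plan is chosen.
SET random_page_cost = 2;
EXPLAIN ANALYZE SELECT * FROM alias WHERE last_name = 'ANDERSON';
```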

            regards, tom lane

From:
David Brown
Date:

> You might want to reduce random_page_cost a little.

> Keep in mind that your test case is small enough to fit in RAM and is
> probably not reflective of what will happen with larger tables.

I am also running 8.0 rc1 for Windows. Despite many hours spent tweaking
various planner cost constants, I found little effect on cost estimates.
Even reducing random_page_cost from 4.0 to 0.1 had negligible impact and
failed to significantly influence the planner.

Increasing the statistics target for the last_name column to 250 or so
*may* help, at least if you're only selecting one name at a time. That's
the standard advice around here and the only thing I've found useful.
Half the threads in this forum are about under-utilized indexes. It
would be great if someone could admit the planner is broken and talk
about actually fixing it!

I'm unconvinced that the planner only favours sequential scans as table
size decreases. In my experience so far, larger tables have the same
problem, only it's more noticeable.

The issue hits PostgreSQL harder than others because of its awful
sequential scan speed, which is two to five times slower than other
DBMS. The archives show there has been talk for years about this but,
it seems, no solution. The obvious thing to consider is the block size,
but people have tried increasing this in the past with only marginal
success.
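Concretely, the statistics-target change looks like this (a sketch using
the column from the original post; 250 is the value suggested above):

```sql
-- Collect a larger most-common-values/histogram sample for last_name,
-- then re-gather statistics so the planner actually sees it.
ALTER TABLE alias ALTER COLUMN last_name SET STATISTICS 250;
ANALYZE alias;
```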

Regards

David

From:
Richard Huxton
Date:

David Brown wrote:
>> You might want to reduce random_page_cost a little.
>
>
>> Keep in mind that your test case is small enough to fit in RAM and
>> is probably not reflective of what will happen with larger tables.
>
>
> I am also running 8.0 rc1 for Windows. Despite many hours spent
> tweaking various planner cost constants, I found little effect on
> cost estimates. Even reducing random_page_cost from 4.0 to 0.1 had
> negligible impact and failed to significantly influence the planner.

I'm not sure setting random_page_cost below 1.0 makes much sense.

> Increasing the statistics target for the last_name column to 250 or
> so *may* help, at least if you're only selecting one name at a time.

Not going to do anything in this case. The planner is roughly right
about how many rows will be returned, it's just not expecting everything
to be in RAM.

> That's the standard advice around here and the only thing I've found
> useful. Half the threads in this forum are about under-utilized
> indexes. It would be great if someone could admit the planner is
> broken and talk about actually fixing it!

Not sure I agree here - when the stats are accurate, you can get the
planner to make near-optimal choices most of the time. Is there any
particular pattern you've seen?

> I'm unconvinced that the planner only favours sequential scans as
> table size decreases. In my experience so far, larger tables have the
> same problem only it's more noticeable.

Hmm - assuming your statistics are good, this would suggest the other
cost settings just aren't right for your hardware.
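The settings in question, with their stock 8.0 defaults noted as
comments (a sketch for inspecting them):

```sql
SHOW random_page_cost;       -- default 4.0
SHOW cpu_tuple_cost;         -- default 0.01
SHOW cpu_index_tuple_cost;   -- default 0.001
SHOW cpu_operator_cost;      -- default 0.0025
SHOW effective_cache_size;   -- default 1000 (8K pages)
```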

> The issue hits PostgreSQL harder than others because of its awful
> sequential scan speed, which is two to five times slower than other
> DBMS. The archives show there has been talk for years about this, but
> it seems, no solution. The obvious thing to consider is the block
> size, but people have tried increasing this in the past with only
> marginal success.

Must admit this puzzles me. Are you saying you can't saturate your disk
I/O? Or are you saying other DBMS store records in 0.5 to 0.2 times less
space than PG?

--
   Richard Huxton
   Archonet Ltd

From:
Greg Stark
Date:

Richard Huxton <> writes:

> Not going to do anything in this case. The planner is roughly right about how
> many rows will be returned, it's just not expecting everything to be in RAM.

That doesn't make sense or else it would switch to the index at
random_page_cost = 1.0. If it was still using a sequential scan at
random_page_cost < 1 then perhaps he had some problem with his query like
mismatched data types that forced it to use a full scan.

> > That's the standard advice around here and the only thing I've found
> > useful. Half the threads in this forum are about under-utilized
> > indexes. It would be great if someone could admit the planner is
> > broken and talk about actually fixing it!
>
> Not sure I agree here - when the stats are accurate, you can get the planner to
> make near-optimal choices most of the time. Is there any particular pattern
> you've seen?

The most common cause I've seen here is that Postgres makes very pessimistic
assumptions about selectivity when it doesn't know better. Every other
database I've tested assumes 'col > ?' is about 5% selectivity. Postgres
assumes 33%.

Postgres is also more pessimistic about the efficiency of index scans. It's
willing to use a sequential scan down to well below 5% selectivity when other
databases use the more traditional rule of thumb of 10%.

In combination these effects do seem to cause an _awful_ lot of complaints.
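The default is easy to observe with a parameterized query, where the
planner cannot see the comparison value at plan time (a sketch; the
statement name is made up):

```sql
-- With an unknown parameter, the planner falls back to its default
-- inequality selectivity (roughly 1/3 of the table) for "col > $1".
PREPARE demo(int4) AS SELECT * FROM alias WHERE id > $1;
EXPLAIN EXECUTE demo(500000);
-- The row estimate should be about a third of the table's row count,
-- regardless of the actual parameter value passed.
```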


> > The issue hits PostgreSQL harder than others because of its awful
> > sequential scan speed, which is two to five times slower than other
> > DBMS. The archives show there has been talk for years about this, but
> > it seems, no solution. The obvious thing to consider is the block
> > size, but people have tried increasing this in the past with only
> > marginal success.
>
> Must admit this puzzles me. Are you saying you can't saturate your disk I/O? Or
> are you saying other DBMS store records in 0.5 to 0.2 times less space than PG?

I don't know what he's talking about either. Perhaps he's thinking of people
who haven't been running vacuum enough?

--
greg

From:
Tom Lane
Date:

Greg Stark <> writes:
> Postgres is also more pessimistic about the efficiency of index scans. It's
> willing to use a sequential scan down to well below 5% selectivity when other
> databases use the more traditional rule of thumb of 10%.

However, other databases are probably basing their analysis on a
different execution model.  Since we have to visit both heap and index
in all cases, we do indeed have a larger penalty for index use.

I've looked pretty closely at the cost model for index access, believe me.
It's not pessimistic; if anything it is undercharging for index access.
(For one thing it treats the index's internal fetches as sequential
access, when in reality they are probably random.)

I think the one effect that's not being modeled is amortization of index
fetches across successive queries.  The cost model is pretty much based
on the assumption that each query starts from ground zero, whereas in
reality a heavily used index will certainly have all its upper levels in
RAM, and if it's not too large the leaf pages might all be cached too.
I wouldn't want to switch the planner over to making that assumption
exclusively, but we could talk about having a cost parameter that dials
the assumption up or down.

Awhile back I tried rewriting btcostestimate to charge zero for
accessing the metapage and the upper index levels, but charge
random_page_cost for fetching leaf pages.  For small indexes this came
out with worse (larger) numbers than we have now, which is not the
direction we want to go in :-(.  So I think that we have to somehow
honestly model caching of index pages across queries.

Of course, to be completely fair such a modification should account for
caching of heap pages as well, so it would also bring down the estimates
for seqscans.  But I'd be willing to accept a model that considers only
caching of index pages as a zero-order approximation.

            regards, tom lane

From:
Greg Stark
Date:

Tom Lane <> writes:

> Greg Stark <> writes:
> > Postgres is also more pessimistic about the efficiency of index scans. It's
> > willing to use a sequential scan down to well below 5% selectivity when other
> > databases use the more traditional rule of thumb of 10%.
>
> However, other databases are probably basing their analysis on a
> different execution model.  Since we have to visit both heap and index
> in all cases, we do indeed have a larger penalty for index use.

It's only in special cases that other databases do not have to look at the
heap. For simple queries like "select * from x where foo > ?" they still have
to look at the heap. I never looked into how much of a bonus Oracle gives for
the index-only case, I'm not sure it even takes it into account.

> I've looked pretty closely at the cost model for index access, believe me.
> It's not pessimistic; if anything it is undercharging for index access.

I think there's another effect here beyond the physical arithmetic. There's a
kind of teleological reasoning that goes something like "If the user created
the index chances are it's because he wanted it to be used".

I guess that argues more for more aggressive selectivity estimates than for
biased index costing though. If I'm doing "where foo > ?" then if there's an
index on foo I probably put it there for a reason and want it to be used even
if postgres doesn't really have a clue how selective the query will be.

> I think the one effect that's not being modeled is amortization of index
> fetches across successive queries.

And across multiple fetches in a single query, such as with a nested loop.

It seems like the effective_cache_size parameter should be having some
influence here.
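A sketch of inspecting and overriding it per session (the value is
arbitrary; in this era the setting is expressed in 8K disk pages, so it
is a planner hint about OS cache size, not an allocation):

```sql
SHOW effective_cache_size;
SET effective_cache_size = 50000;  -- ~400MB of assumed OS cache
EXPLAIN SELECT * FROM alias WHERE last_name = 'ANDERSON';
```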

--
greg

From:
Tom Lane
Date:

Greg Stark <> writes:
> Tom Lane <> writes:
>> I think the one effect that's not being modeled is amortization of index
>> fetches across successive queries.

> And across multiple fetches in a single query, such as with a nested loop.

Right, that's effectively the same problem.  You could imagine making a
special-purpose solution for nestloop queries but I think the issue is
more general than that.

> It seems like the effective_cache_size parameter should be having some
> influence here.

But it doesn't :-(.  e_c_s is currently only used to estimate
amortization of repeated heap-page fetches within a single indexscan.

            regards, tom lane

From:
"Steinar H. Gunderson"
Date:

On Fri, Dec 17, 2004 at 10:47:57AM -0500, Greg Stark wrote:
>> Must admit this puzzles me. Are you saying you can't saturate your disk I/O? Or
>> are you saying other DBMS store records in 0.5 to 0.2 times less space than PG?
> I don't know what he's talking about either. Perhaps he's thinking of people
> who haven't been running vacuum enough?

I'm a bit unsure -- should counting ~3 million rows (no OIDs, PG 7.4,
everything in cache, 32-byte rows) take ~3500ms on an Athlon 64 2800+?

/* Steinar */
--
Homepage: http://www.sesse.net/

From:
"Steinar H. Gunderson"
Date:

On Fri, Dec 17, 2004 at 10:56:27PM +0100, Steinar H. Gunderson wrote:
> I'm a bit unsure -- should counting ~3 million rows (no OIDs, PG 7.4,
> everything in cache, 32-byte rows) take ~3500ms on an Athlon 64 2800+?

(I realize I was a bit unclear here. This is a completely separate case, not
related to the original poster -- I was just wondering if what I'm seeing is
normal or not.)

/* Steinar */
--
Homepage: http://www.sesse.net/

From:
Frank Wiles
Date:

On Fri, 17 Dec 2004 23:09:07 +0100
"Steinar H. Gunderson" <> wrote:

> On Fri, Dec 17, 2004 at 10:56:27PM +0100, Steinar H. Gunderson wrote:
> > I'm a bit unsure -- should counting ~3 million rows (no OIDs, PG
> > 7.4, everything in cache, 32-byte rows) take ~3500ms on an Athlon 64
> > 2800+?
>
> (I realize I was a bit unclear here. This is a completely separate
> case, not related to the original poster -- I was just wondering if
> what I'm seeing is normal or not.)

  It depends more on your disk IO than the processor.  Counting isn't
  processor intensive, but reading through the entire table on disk
  is.  I've also seen a huge difference between select count(*) and
  select count(1) in older versions, haven't tried it on a recent
  version however.

 ---------------------------------
   Frank Wiles <>
   http://www.wiles.org
 ---------------------------------


From:
"Steinar H. Gunderson"
Date:

On Fri, Dec 17, 2004 at 05:02:29PM -0600, Frank Wiles wrote:
>   It depends more on your disk IO than the processor.  Counting isn't
>   processor intensive, but reading through the entire table on disk
>   is.  I've also seen a huge difference between select count(*) and
>   select count(1) in older versions, haven't tried it on a recent
>   version however.

Like I said, all in cache, so no disk IO. count(*) and count(1) give me
identical results. (BTW, I don't think this is a count problem, it's a
"sequential scan" problem -- I'm just trying to find out if this is natural
or not, ie. if this is just something I have to expect in a relational
database, even with no I/O.)

/* Steinar */
--
Homepage: http://www.sesse.net/

From:
Bruno Wolff III
Date:

On Fri, Dec 17, 2004 at 22:56:27 +0100,
  "Steinar H. Gunderson" <> wrote:
>
> I'm a bit unsure -- should counting ~3 million rows (no OIDs, PG 7.4,
> everything in cache, 32-byte rows) take ~3500ms on an Athlon 64 2800+?

It doesn't seem totally out of whack. You will be limited by the memory
bandwidth and it looks like you get something on the order of a few
hundred references to memory per row. That may be a little high, but
it doesn't seem ridiculously high.

From:
Tom Lane
Date:

Frank Wiles <> writes:
>   I've also seen a huge difference between select count(*) and
>   select count(1) in older versions,

That must have been before my time, ie, pre-6.4 or so.  There is
certainly zero difference now.

            regards, tom lane

From:
"Steinar H. Gunderson"
Date:

On Fri, Dec 17, 2004 at 10:39:18PM -0600, Bruno Wolff III wrote:
> It doesn't seem totally out of wack. You will be limited by the memory
> bandwidth and it looks like you get something on the order of a few
> hundred references to memory per row. That may be a little high, but
> it doesn't seem ridiculously high.

I just tested 8.0.0rc1 -- I got a _50%_ speedup on this operation...

/* Steinar */
--
Homepage: http://www.sesse.net/

From:
Frank Wiles
Date:

On Fri, 17 Dec 2004 23:37:37 -0500
Tom Lane <> wrote:

> Frank Wiles <> writes:
> >   I've also seen a huge difference between select count(*) and
> >   select count(1) in older versions,
>
> That must have been before my time, ie, pre-6.4 or so.  There is
> certainly zero difference now.

  Yeah, now that I think about it, that sounds about right for the
  time frame when I last benchmarked it.

 ---------------------------------
   Frank Wiles <>
   http://www.wiles.org
 ---------------------------------