Thread: why sequential scan

why sequential scan

From
newsreader@mediaone.net
Date:
Can someone explain why pg is doing
a sequential scan on table item with the following
statement?

-----
q=> explain select distinct h.id,i.item,i.heading,i.finish from item i ,body_index h where h.id=i.item and
(h.string='book')order by finish; 
NOTICE:  QUERY PLAN:

Unique  (cost=6591.46..6606.51 rows=150 width=24)
  ->  Sort  (cost=6591.46..6591.46 rows=1505 width=24)
        ->  Hash Join  (cost=5323.27..6512.04 rows=1505 width=24)
              ->  Seq Scan on item i  (cost=0.00..964.39 rows=5139 width=20)
              ->  Hash  (cost=5319.51..5319.51 rows=1505 width=4)
                    ->  Index Scan using body_index_string on body_index h  (cost=0.00..5319.51 rows=1505 width=4)

---------
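As background for the plan above: a hash join builds an in-memory hash table from the smaller input (here, the ~1505 ids the index scan finds for 'book') and then makes a single sequential pass over the other input, probing the hash for each row. A minimal Python sketch of the idea (my illustration, not PostgreSQL's implementation; the table contents are made up):

```python
# Illustrative sketch of the hash join in the plan above -- not pg internals.
# Build side: the ~1500 ids from the index scan on body_index.
# Probe side: one sequential pass over the ~5000 rows of item.

def hash_join(build_ids, item_rows):
    """build_ids: iterable of ids; item_rows: list of dicts with an 'item' key."""
    hashed = set(build_ids)        # Hash node: one pass over the build input
    out = []
    for row in item_rows:          # Seq Scan on item: one pass, no index probes
        if row["item"] in hashed:  # O(1) hash lookup per row
            out.append(row)
    return out

# Made-up data sized like the plan's estimates: 5139 item rows,
# and pretend every third id matches 'book'.
items = [{"item": i, "finish": i % 7} for i in range(5139)]
matches = hash_join(range(0, 5139, 3), items)
```

Note that the item table is read exactly once, which is why the planner pairs the hash with a seq scan instead of repeated index lookups.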

"item" table has integer primary key "item".  It has
15 or so other columns.

The performance is not very impressive with about
5000 records in the item table and 1.5 million records in
body_index, and both are supposed to get
much bigger in the real-life situation.

Is the performance bottleneck that
particular sequential scan?

The database has just been vacuumed.

Thanks in advance

Re: why sequential scan

From
Tom Lane
Date:
newsreader@mediaone.net writes:
> Can someone explain why pg is doing
> a sequential scan on table item with the following
> statement

Looks like a fairly reasonable plan to me, if the rows estimates are
accurate.  Are they?

            regards, tom lane

Re: why sequential scan

From
newsreader@mediaone.net
Date:
The two estimates I understand are quite
good.

select distinct id from body_index where string='book'

returns about 1500 rows.  That matches
the bottom line of the plan.

There are 5139 rows in table item, which is
the same number of rows shown in the plan's
sequential scan.

If I were doing a manual join I would do

q=> select distinct id from body_index where string='book'

which gives me an index scan.

I would then iterate over each id I get and
look up in item like this

q=> select * from item where item = ? order by finish

EXPLAIN gives me a 1-row estimate for each lookup.
At most 1500 rows, no?
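
For comparison, the manual strategy described above is what the planner calls a nested-loop join with an inner index scan: one index probe per matching id. A toy sketch (my illustration; a dict stands in for the btree index on item.item):

```python
# Toy nested-loop join: one "index probe" per id from body_index.
# A dict stands in for the btree index on item.item (my simplification).

def nestloop_join(outer_ids, item_index):
    probes = 0
    out = []
    for oid in outer_ids:          # iterate over the ~1500 matching ids
        probes += 1                # each iteration is one index probe
        row = item_index.get(oid)  # "select * from item where item = ?"
        if row is not None:
            out.append(row)
    return out, probes

# Made-up index over the 5139 item rows; 1500 ids on the outer side.
index = {i: {"item": i, "finish": i % 7} for i in range(5139)}
rows, probes = nestloop_join(range(1500), index)
```

The row counts match the estimates; the open question is whether 1500 probes are cheaper than one sequential pass.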

Below is the original plan for easier reference
-------------
q=> explain select distinct h.id,i.item,i.heading,i.finish from item i ,body_index h where h.id=i.item and
(h.string='book') order by finish;
NOTICE:  QUERY PLAN:

Unique  (cost=6591.46..6606.51 rows=150 width=24)
  ->  Sort  (cost=6591.46..6591.46 rows=1505 width=24)
        ->  Hash Join  (cost=5323.27..6512.04 rows=1505 width=24)
              ->  Seq Scan on item i  (cost=0.00..964.39 rows=5139 width=20)
              ->  Hash  (cost=5319.51..5319.51 rows=1505 width=4)
                    ->  Index Scan using body_index_string on body_index h  (cost=0.00..5319.51 rows=1505 width=4)
--------------


Thanks

On Thu, Aug 16, 2001 at 10:59:18AM -0400, Tom Lane wrote:
> newsreader@mediaone.net writes:
> > Can someone explain why pg is doing
> > a sequential scan on table item with the following
> > statement
>
> Looks like a fairly reasonable plan to me, if the rows estimates are
> accurate.  Are they?
>
>             regards, tom lane

Re: why sequential scan

From
Tom Lane
Date:
newsreader@mediaone.net writes:
> I would then iterate over each id I get and
> look up in item like this

> q=> select * from item where item =? order by finish

That's a nestloop join with inner indexscan.  The planner did consider
that, and rejected it as slower than the hashjoin it chose.  Now,
whether its cost model is accurate for your situation is hard to tell;
but personally I'd bet that it's right.  1500 index probes probably
are slower than a sequential scan over 5000 items.
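
Rough back-of-envelope arithmetic (my numbers, not the planner's actual cost model) shows why: each btree probe pays a few random page fetches, while the seq scan pays a cheaper sequential rate per page.

```python
# Back-of-envelope cost arithmetic -- assumed numbers, not pg's cost model.
# Assume a btree probe touches ~3 index pages plus 1 heap page, all random
# I/O, while the seq scan reads the table's pages sequentially.

RANDOM_PAGE = 4.0  # assumed relative cost of a random page fetch
SEQ_PAGE = 1.0     # relative cost of a sequential page fetch

probes = 1505                              # planner's estimate of matching ids
probe_cost = probes * (3 + 1) * RANDOM_PAGE

item_pages = 964                           # roughly the 964.39 in the Seq Scan node
seq_cost = item_pages * SEQ_PAGE + 5139 * 0.01  # pages plus a small per-row CPU cost
```

Under these assumptions the nested loop's probe cost is more than twenty times the seq scan's, which is consistent with Tom's bet.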

You could probably force the planner to choose that plan by setting
enable_hashjoin and enable_mergejoin to OFF.  It'd be interesting to
see the EXPLAIN result in that situation, as well as actual timings
of the query both ways.

            regards, tom lane

Re: why sequential scan

From
newsreader@mediaone.net
Date:
On Thu, Aug 16, 2001 at 08:10:41PM -0400, newsreader@mediaone.net wrote:
> OK, I set enable_hashjoin and enable_mergejoin to off
> and performance is much, much worse: a just-over-1-second
> job becomes a one-minute job.
>
> Perhaps I should re-check when the database
> gets bigger.
>
> Thanks a lot
>
> On Thu, Aug 16, 2001 at 12:45:28PM -0400, Tom Lane wrote:
> > newsreader@mediaone.net writes:
> > > I would then iterate over each id I get and
> > > look up in item like this
> >
> > > q=> select * from item where item =? order by finish
> >
> > That's a nestloop join with inner indexscan.  The planner did consider
> > that, and rejected it as slower than the hashjoin it chose.  Now,
> > whether its cost model is accurate for your situation is hard to tell;
> > but personally I'd bet that it's right.  1500 index probes probably
> > are slower than a sequential scan over 5000 items.
> >
> > You could probably force the planner to choose that plan by setting
> > enable_hashjoin and enable_mergejoin to OFF.  It'd be interesting to
> > see the EXPLAIN result in that situation, as well as actual timings
> > of the query both ways.
> >
> >             regards, tom lane