Thread: why sequential scan
Can someone explain why pg is doing a sequential scan on table item with the following statement ----- q=> explain select distinct h.id,i.item,i.heading,i.finish from item i ,body_index h where h.id=i.item and (h.string='book')order by finish; NOTICE: QUERY PLAN: Unique (cost=6591.46..6606.51 rows=150 width=24) -> Sort (cost=6591.46..6591.46 rows=1505 width=24) -> Hash Join (cost=5323.27..6512.04 rows=1505 width=24) -> Seq Scan on item i (cost=0.00..964.39 rows=5139 width=20) -> Hash (cost=5319.51..5319.51 rows=1505 width=4) -> Index Scan using body_index_string on body_index h (cost=0.00..5319.51 rows=1505 width=4) --------- "item" table has integer primary key "item". It has 15 or so other columns. The performance is not very impressive with about 5000 records in item table and 1.5 million record in body_index and both are supposed to get much bigger in the real life situation Is the performance bottle neck that particular sequential scan? The database has just been vacuumed. Thanks in advance
newsreader@mediaone.net writes: > Can someone explain why pg is doing > a sequential scan on table item with the following > statement Looks like a fairly reasonable plan to me, if the rows estimates are accurate. Are they? regards, tom lane
Two estimates I undestand are quite good. select distinct id on body_index where string='book' returns about 1500 rows. That matches with the bottom line of the plan There are 5139 rows in table item. It is the same number of rows in the plan for sequential scan If I were doing a maual join I would do q=> select distinct id on body_index where string='book' which gives me an index scan I would then iterate over each id I get and look up in item like this q=> select * from item where item =? order by finish Explain gives me 1 row estimate for each lookup. At most 1500 rows. No? Below is the original plan for easier reference ------------- q=> explain select distinct h.id,i.item,i.heading,i.finish from item i ,body_index h where h.id=i.item and +(h.string='book') order by finish; NOTICE: QUERY PLAN: Unique (cost=6591.46..6606.51 rows=150 width=24) -> Sort (cost=6591.46..6591.46 rows=1505 width=24) -> Hash Join (cost=5323.27..6512.04 rows=1505 width=24) -> Seq Scan on item i (cost=0.00..964.39 rows=5139 width=20) -> Hash (cost=5319.51..5319.51 rows=1505 width=4) -> Index Scan using body_index_string on body_index h (cost=0.00..5319.51 rows=1505 width=4) -------------- Thanks On Thu, Aug 16, 2001 at 10:59:18AM -0400, Tom Lane wrote: > newsreader@mediaone.net writes: > > Can someone explain why pg is doing > > a sequential scan on table item with the following > > statement > > Looks like a fairly reasonable plan to me, if the rows estimates are > accurate. Are they? > > regards, tom lane
newsreader@mediaone.net writes: > I would then iterate over each id I get and > look up in item like this > q=> select * from item where item =? order by finish That's a nestloop join with inner indexscan. The planner did consider that, and rejected it as slower than the hashjoin it chose. Now, whether its cost model is accurate for your situation is hard to tell; but personally I'd bet that it's right. 1500 index probes probably are slower than a sequential scan over 5000 items. You could probably force the planner to choose that plan by setting enable_hashjoin and enable_mergejoin to OFF. It'd be interesting to see the EXPLAIN result in that situation, as well as actual timings of the query both ways. regards, tom lane
On Thu, Aug 16, 2001 at 08:10:41PM -0400, newsreader@mediaone.net wrote: > Ok I set enable_hashjoin and enable_mergejoin to off > and performance is much much worse: just over 1 second > job becomes a minute job > > Perhaps I should re-check if the database > gets bigger. > > Thanks a lot > > On Thu, Aug 16, 2001 at 12:45:28PM -0400, Tom Lane wrote: > > newsreader@mediaone.net writes: > > > I would then iterate over each id I get and > > > look up in item like this > > > > > q=> select * from item where item =? order by finish > > > > That's a nestloop join with inner indexscan. The planner did consider > > that, and rejected it as slower than the hashjoin it chose. Now, > > whether its cost model is accurate for your situation is hard to tell; > > but personally I'd bet that it's right. 1500 index probes probably > > are slower than a sequential scan over 5000 items. > > > > You could probably force the planner to choose that plan by setting > > enable_hashjoin and enable_mergejoin to OFF. It'd be interesting to > > see the EXPLAIN result in that situation, as well as actual timings > > of the query both ways. > > > > regards, tom lane > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 5: Have you checked our extensive FAQ? > > > > http://www.postgresql.org/users-lounge/docs/faq.html