Thread: This table won't use INDEX until I DUMP/RESTORE it ?

This table won't use INDEX until I DUMP/RESTORE it ?

From
Chris Miles
Date:
I have a DB that appears to perform badly.  A test of one table
with one of the typical queries gives me a query plan indicating
a Seq Scan;

 DB=# EXPLAIN select pcref,pcseqnbr from catrecrel where pcbsref='something'  and (pccaref is null or pccaref='') and
pcparis null order by pcseqnbr ; 
 NOTICE:  QUERY PLAN:

 Sort  (cost=38266.65..38266.65 rows=4 width=58)
   ->  Seq Scan on catrecrel  (cost=0.00..38266.61 rows=4 width=58)

Ok, that's no good, but it _should_ be using an index instead, and
if I dump this table, restore it onto a different (non-live) DB, and
try again, I see that it does indeed plan to use the index:

 DB=# EXPLAIN select pcref,pcseqnbr from catrecrel where pcbsref='something'  and (pccaref is null or pccaref='') and
pcparis null order by pcseqnbr ; 
 NOTICE:  QUERY PLAN:

 Sort  (cost=469.92..469.92 rows=1 width=58)
   ->  Index Scan using ind_pcbsref on catrecrel  (cost=0.00..469.91 rows=1 width=58)

Now, why is this?  The first (live) DB is VACUUM ANALYSEd nightly, and
was done so again just before this test.  Data, schema and indexes should
be the same in both (well it was dumped/restored directly from one to the
other).

The only fix I can think of is to dump and restore the whole DB, based on
the fact that a newly restored DB appears to work much better, but surely
I shouldn't have to do this?  What else can i do to fix it?

Cheers,
CM

--
Chris Miles
http://chrismiles.info/


Re: This table won't use INDEX until I DUMP/RESTORE it ?

From
"Priya G"
Date:
<div style="background-color:"><div><p>try to analyze the table. That may help to use the index<br /><br
/></div><div></div>>From:Chris Miles  <div></div>>To: pgsql-admin@postgresql.org <div></div>>CC: Chris Miles
<div></div>>Subject:[ADMIN] This table won't use INDEX until I DUMP/RESTORE it ? <div></div>>Date: Tue, 19 Aug
200317:46:16 +0100 <div></div>> <div></div>>I have a DB that appears to perform badly. A test of one table
<div></div>>withone of the typical queries gives me a query plan indicating <div></div>>a Seq Scan;
<div></div>><div></div>>DB=# EXPLAIN select pcref,pcseqnbr from catrecrel where
<div></div>>pcbsref='something'and (pccaref is null or pccaref='') and pcpar <div></div>>is null order by
pcseqnbr; <div></div>>NOTICE: QUERY PLAN: <div></div>> <div></div>>Sort (cost=38266.65..38266.65 rows=4
width=58)<div></div>> -> Seq Scan on catrecrel (cost=0.00..38266.61 rows=4 width=58) <div></div>>
<div></div>>Ok,that's no good, but it _should_ be using an index instead, and <div></div>>if I dump this table,
restoreit onto a different (non-live) DB, and <div></div>>try again, I see that it does indeed plan to use the
index:<div></div>> <div></div>>DB=# EXPLAIN select pcref,pcseqnbr from catrecrel where
<div></div>>pcbsref='something'and (pccaref is null or pccaref='') and pcpar <div></div>>is null order by
pcseqnbr; <div></div>>NOTICE: QUERY PLAN: <div></div>> <div></div>>Sort (cost=469.92..469.92 rows=1 width=58)
<div></div>>-> Index Scan using ind_pcbsref on catrecrel (cost=0.00..469.91 <div></div>>rows=1 width=58)
<div></div>><div></div>>Now, why is this? The first (live) DB is VACUUM ANALYSEd nightly, <div></div>>and
<div></div>>wasdone so again just before this test. Data, schema and indexes <div></div>>should <div></div>>be
thesame in both (well it was dumped/restored directly from one <div></div>>to the <div></div>>other).
<div></div>><div></div>>The only fix I can think of is to dump and restore the whole DB, <div></div>>based on
<div></div>>thefact that a newly restored DB appears to work much better, but <div></div>>surely <div></div>>I
shouldn'thave to do this? What else can i do to fix it? <div></div>> <div></div>>Cheers, <div></div>>CM
<div></div>><div></div>>-- <div></div>>Chris Miles <div></div>>http://chrismiles.info/ <div></div>>
<div></div>><div></div>>---------------------------(end of <div></div>>broadcast)---------------------------
<div></div>>TIP5: Have you checked our extensive FAQ? <div></div>> <div></div>>
http://www.postgresql.org/docs/faqs/FAQ.html<div></div></div><br clear="all" /><hr /><b>MSN 8:</b> <a
href="http://g.msn.com/8HMLENUS/2746??PS=">Get6 months for $9.95/month.</a> 

Re: This table won't use INDEX until I DUMP/RESTORE it ?

From
Stephan Szabo
Date:
On Tue, 19 Aug 2003, Chris Miles wrote:

> I have a DB that appears to perform badly.  A test of one table
> with one of the typical queries gives me a query plan indicating
> a Seq Scan;
>
>  DB=# EXPLAIN select pcref,pcseqnbr from catrecrel where
> pcbsref='something' and (pccaref is null or pccaref='') and pcpar is
> null order by pcseqnbr ;
>  NOTICE:  QUERY PLAN:
>
>  Sort  (cost=38266.65..38266.65 rows=4 width=58)
>    ->  Seq Scan on catrecrel  (cost=0.00..38266.61 rows=4 width=58)

What does it give if you set enable_seqscan=off; before the explain?
And what does explain analyze give both with seqscan disabled and enabled?

Also, what version are you running?




Re: This table won't use INDEX until I DUMP/RESTORE it ?

From
Stephan Szabo
Date:
On Thu, 21 Aug 2003, Chris Miles wrote:

>
> Stephan Szabo wrote:
> > On Tue, 19 Aug 2003, Chris Miles wrote:
> >>I have a DB that appears to perform badly.  A test of one table
> >>with one of the typical queries gives me a query plan indicating
> >>a Seq Scan;
> >
> > What does it give if you set enable_seqscan=off; before the explain?
>
> ok, with enable_seqscan=off it gives an index scan for the explain.
>
> > And what does explain analyze give both with seqscan disabled and enabled?
>
> test=# set enable_seqscan=on;
> SET VARIABLE
> test=# EXPLAIN ANALYSE select pcref,pcseqnbr from catrecrel where
> pcbsref='test' and (pccaref is null or pccaref='') and pcpar is null
> order by pcseqnbr ;
> NOTICE:  QUERY PLAN:
>
> Sort  (cost=38288.75..38288.75 rows=4 width=58) (actual time=7271.47..7272.59 rows=743 loops=1)
>   -> Seq Scan on catrecrel (cost=0.00..38288.70 rows=4 width=58)
> (actual time=0.10..7266.19 rows=743 loops=1)
> Total runtime: 7273.92 msec
>
> EXPLAIN
> test=# set enable_seqscan=off;
> SET VARIABLE
> test=# EXPLAIN ANALYSE select pcref,pcseqnbr from catrecrel where
> pcbsref='test' and (pccaref is null or pccaref='') and pcpar is null
> order by pcseqnbr ;
> NOTICE:  QUERY PLAN:
>
> Index Scan using ind_pcseqnbr on catrecrel (cost=0.00..38390.48 rows=4
> width=58) (actual time=0.28..229.19 rows=743 loops=1)
> Total runtime: 230.53 msec

Well it seems to be off with the number of records, but not in the
direction that would generally make index scan better.  How big
is the table and what does vacuum verbose give?  I don't understand why it
isn't doing a sort and scan on pcbsref, how many of the rows have
pcbsref='test'?

In general, an index on (pcbsref, pcseqnbr) might give the best results
(as it could do that check and the ordering with one index).

> > Also, what version are you running?
>
> Sorry forgot to mention it is: 7.2.3

You should probably move to at least 7.2.4, IIRC there was a serious
bug fixed.

> So why do I have to force seqscan off to get better behaviour?
> This wouldn't be practical to do within our code.

That was more to get its idea for the cost of the index scan.
As a short term thing, if you havent played with random_page_cost,
I'd suggest lowering it from the default 4 to say 3 or 2 and see
what plan it gives you.





Re: This table won't use INDEX until I DUMP/RESTORE it ?

From
Chris Miles
Date:
Stephan Szabo wrote:
> On Tue, 19 Aug 2003, Chris Miles wrote:
>>I have a DB that appears to perform badly.  A test of one table
>>with one of the typical queries gives me a query plan indicating
>>a Seq Scan;
>
> What does it give if you set enable_seqscan=off; before the explain?

ok, with enable_seqscan=off it gives an index scan for the explain.

> And what does explain analyze give both with seqscan disabled and enabled?

test=# set enable_seqscan=on;
SET VARIABLE
test=# EXPLAIN ANALYSE select pcref,pcseqnbr from catrecrel where pcbsref='test'  and (pccaref is null or pccaref='')
andpcpar is null order by pcseqnbr ; 
NOTICE:  QUERY PLAN:

Sort  (cost=38288.75..38288.75 rows=4 width=58) (actual time=7271.47..7272.59 rows=743 loops=1)
  ->  Seq Scan on catrecrel  (cost=0.00..38288.70 rows=4 width=58) (actual time=0.10..7266.19 rows=743 loops=1)
Total runtime: 7273.92 msec

EXPLAIN
test=# set enable_seqscan=off;
SET VARIABLE
test=# EXPLAIN ANALYSE select pcref,pcseqnbr from catrecrel where pcbsref='test'  and (pccaref is null or pccaref='')
andpcpar is null order by pcseqnbr ; 
NOTICE:  QUERY PLAN:

Index Scan using ind_pcseqnbr on catrecrel  (cost=0.00..38390.48 rows=4 width=58) (actual time=0.28..229.19 rows=743
loops=1)
Total runtime: 230.53 msec

EXPLAIN

> Also, what version are you running?

Sorry forgot to mention it is: 7.2.3

So why do I have to force seqscan off to get better behaviour?
This wouldn't be practical to do within our code.

Is a newer version, such as 7.3.4, much smarter with query planning?

Regards,
Chris.

--
Chris Miles
http://chrismiles.info/


Re: This table won't use INDEX until I DUMP/RESTORE it ?

From
"scott.marlowe"
Date:
On Thu, 21 Aug 2003, Chris Miles wrote:

>
> Stephan Szabo wrote:
> > On Tue, 19 Aug 2003, Chris Miles wrote:
> >>I have a DB that appears to perform badly.  A test of one table
> >>with one of the typical queries gives me a query plan indicating
> >>a Seq Scan;
> >
> > What does it give if you set enable_seqscan=off; before the explain?
>
> ok, with enable_seqscan=off it gives an index scan for the explain.
>
> > And what does explain analyze give both with seqscan disabled and enabled?
>
> test=# set enable_seqscan=on;
> SET VARIABLE
> test=# EXPLAIN ANALYSE select pcref,pcseqnbr from catrecrel where pcbsref='test'  and (pccaref is null or pccaref='')
andpcpar is null order by pcseqnbr ; 
> NOTICE:  QUERY PLAN:
>
> Sort  (cost=38288.75..38288.75 rows=4 width=58) (actual time=7271.47..7272.59 rows=743 loops=1)
>   ->  Seq Scan on catrecrel  (cost=0.00..38288.70 rows=4 width=58) (actual time=0.10..7266.19 rows=743 loops=1)
> Total runtime: 7273.92 msec

Note that Postgresql "thinks" the cost of a seq scan is about 38288
here...

> EXPLAIN
> test=# set enable_seqscan=off;
> SET VARIABLE
> test=# EXPLAIN ANALYSE select pcref,pcseqnbr from catrecrel where pcbsref='test'  and (pccaref is null or pccaref='')
andpcpar is null order by pcseqnbr ; 
> NOTICE:  QUERY PLAN:
>
> Index Scan using ind_pcseqnbr on catrecrel  (cost=0.00..38390.48 rows=4 width=58) (actual time=0.28..229.19 rows=743
loops=1)
> Total runtime: 230.53 msec

and here is "thinks" the cost of an index scan is 38390.

however, given that the seq scan is using 7.2 seconds, and the index scan
is .23 seconds, it would appear the query analyzer is making a bad choice.

> > Also, what version are you running?
>
> Sorry forgot to mention it is: 7.2.3
>
> So why do I have to force seqscan off to get better behaviour?
> This wouldn't be practical to do within our code.

No, enable_seqscan=off is a kind of hammer to the forebrain method of
forcing postgresql to do what you want.  It is not something you should
have to use in production, just troubleshooting.

> Is a newer version, such as 7.3.4, much smarter with query planning?

7.3.4 was mostly bug fixes, but there might be a tweak on the planner
somewhere.

More importantly, the problem here is likely that your machine has very
fast random I/O and more memory than postgresql realizes, so it is making
choices as though it were on a slower machine (I/O wise) with less memory.

You should set effective_cache_size appropriately (very approximate
method, take the size of the buffer + kernel cache and divide by 8192 for
effective_cache_size.)

Then you might want to play around with random_page_cost and the
cpu*tuple*cost variables in postgresql.conf.

Note that you can set these from a psql session for testing to see when
they kick in.  On fast boxes with lots of ram, it's not uncommon for
random page cost to need a setting of between 1 and 2 to get the planner
to behave properly.  Going any lower than 1 means that likely the cpu*cost
vars aren't set right, or that you've found a corner case.

Your query doesn't really look like a corner case, more like the planner
is just missing the index scan by a few points, so in this case, adjusting
the cpu*cost vars and lowering random_page_cost should do it.


Re: This table won't use INDEX until I DUMP/RESTORE it ?

From
Tom Lane
Date:
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> That was more to get its idea for the cost of the index scan.
> As a short term thing, if you havent played with random_page_cost,
> I'd suggest lowering it from the default 4 to say 3 or 2 and see
> what plan it gives you.

Given the closeness of the two cost estimates, random_page_cost
shouldn't need to be moved much at all to flip the plan choice to
indexscan for this particular query.  You might want to move it more
so that the ratio of the estimates approximates reality, but I advise
caution on that, or at least caution on using this example as the
test case.  There's a fairly large misestimate of the number of returned
rows in there too, and so if you use this case you'll be conflating
the effects of that misestimate with the effects of random_page_cost
being wrong for your environment.

            regards, tom lane