Thread: Yet again on indices...

Yet again on indices...

From
Jean-Paul ARGUDO
Date:
Ok, 

I'm working on query analysis for a program in ecpg for business puposes. Look
at what I found on with PG 7.2: Please be cool with my french2english processor,
I got few bogomips in my brain dedicated to english (should have listen more in
class..):
----

line 962 (in the ecpg source..)

EXPLAIN SELECT t12_bskid, t12_pnb, t12_lne, t12_tck
FROM T12_20011231
WHERE t12_bskid >= 1   
ORDER BY t12_bskid, t12_pnb, t12_tck, t12_lne;

NOTICE:  QUERY PLAN:
Sort  (cost=3006.13..3006.13 rows=25693 width=46) ->  Seq Scan on t12_20011231  (cost=0.00..1124.20 rows=25693
width=46)

=> not good, table t12_20011231 as 26K tuples :-(

=> create index t12_idx_bskid_20011231 on t12_20011231 (t12_bskid);

Sort  (cost=3006.13..3006.13 rows=25693 width=46) ->  Seq Scan on t12_20011231  (cost=0.00..1124.20 rows=25693
width=46)

=> probably statistic refresh to be done: 
$ /usr/local/pgsql/bin/vacuumdb --analyze dbks

Sort  (cost=3006.13..3006.13 rows=25693 width=46) ->  Seq Scan on t12_20011231  (cost=0.00..1124.20 rows=25693
width=46)


=> Uh? Seq scan cheaper than index???  

=> let's disable seqscan to read cost of index:
postgresql.conf : enable_seqscan = false

Sort  (cost=3126.79..3126.79 rows=25693 width=46) ->  Index Scan using t12_idx_bskid_20011231 on t12_20011231
(cost=0.00..1244.86 rows=25693 width=46)

=> Uh? seq scan'cost is lower than index scan??  => mailto hackers

----

What's your opinion? 

I have to tell that this select opperates in a forloop statment . 
I hardly believe reading 26K tuples is cheaper thant index reading, but maybe
you'll ask me about buffers that should store de 26K tuples?...

But just after this query, there is another one that maybe will put data in
buffers, kicking t12_20011231 data blocks...

Well I feel a little stuck there. I'll continue with enable_scans=false, but
I feel bad beeing forced to do so... and still asking myself if this is good
idea.

Thanks for support, best regards.

-- 
Jean-Paul ARGUDO



Re: Yet again on indices...

From
Jean-Paul ARGUDO
Date:
> > postgresql.conf : enable_seqscan = false
> You could just do 
> set enable_seqscan to 'off'
> in sql

thanks for the tip :-)
> > => Uh? seq scan'cost is lower than index scan??  => mailto hackers
> It often is. Really.
> > What's your opinion? 
> What are the real performance numbers ?

Finally, testing and testing again shows the choice of table scan is faster than
index scan on this 26K tuples table. really impresive.
I posted another mail about Oracle vs PG results in a comparative survey I'm
currently working on for 1 month. Please read it, I feel a bit disapointed with
Oracle's 1200 tps..

Thanks for your support Hannu!

-- 
Jean-Paul ARGUDO



Re: Yet again on indices...

From
Tom Lane
Date:
Jean-Paul ARGUDO <jean-paul.argudo@idealx.com> writes:
> EXPLAIN SELECT t12_bskid, t12_pnb, t12_lne, t12_tck
> FROM T12_20011231
> WHERE t12_bskid >= 1   
> ORDER BY t12_bskid, t12_pnb, t12_tck, t12_lne;

> Sort  (cost=3006.13..3006.13 rows=25693 width=46)
>   ->  Seq Scan on t12_20011231  (cost=0.00..1124.20 rows=25693 width=46)

> => Uh? Seq scan cheaper than index???  

For that kind of query, very probably.  How much of the table is
actually selected by "WHERE t12_bskid >= 1"?
        regards, tom lane


Re: Yet again on indices...

From
Stephan Szabo
Date:
On Wed, 27 Feb 2002, Jean-Paul ARGUDO wrote:

> EXPLAIN SELECT t12_bskid, t12_pnb, t12_lne, t12_tck
> FROM T12_20011231
> WHERE t12_bskid >= 1
> ORDER BY t12_bskid, t12_pnb, t12_tck, t12_lne;
>
> NOTICE:  QUERY PLAN:
>
> Sort  (cost=3006.13..3006.13 rows=25693 width=46)
>   ->  Seq Scan on t12_20011231  (cost=0.00..1124.20 rows=25693 width=46)
>
> => not good, table t12_20011231 as 26K tuples :-(

>
> => create index t12_idx_bskid_20011231 on t12_20011231 (t12_bskid);
>
> Sort  (cost=3006.13..3006.13 rows=25693 width=46)
>   ->  Seq Scan on t12_20011231  (cost=0.00..1124.20 rows=25693 width=46)
>
> => probably statistic refresh to be done:
> $ /usr/local/pgsql/bin/vacuumdb --analyze dbks
>
> Sort  (cost=3006.13..3006.13 rows=25693 width=46)
>   ->  Seq Scan on t12_20011231  (cost=0.00..1124.20 rows=25693 width=46)
>
>
> => Uh? Seq scan cheaper than index???
>
> => let's disable seqscan to read cost of index:
> postgresql.conf : enable_seqscan = false
>
> Sort  (cost=3126.79..3126.79 rows=25693 width=46)
>   ->  Index Scan using t12_idx_bskid_20011231 on t12_20011231
> (cost=0.00..1244.86 rows=25693 width=46)
>
> => Uh? seq scan'cost is lower than index scan??  => mailto hackers
>
> ----
>

> What's your opinion?

Well you didn't send the schema, or explain analyze results to show
which is actually faster, but...

Sequence scan *can be* faster than index scan when a large portion of the
table is going to be read.  If the data is randomly distributed,
eventually you end up reading most/all of the table blocks anyway to get
the validity information for the rows and you're doing it in random order,
plus you're reading parts of the index as well. How many rows are in
the table, and how many match t12_bskid >=1?



Re: Yet again on indices...

From
Thomas Swan
Date:
Tom Lane wrote:<br /><blockquote cite="mid22262.1014823120@sss.pgh.pa.us" type="cite"><pre wrap="">Jean-Paul ARGUDO <a
class="moz-txt-link-rfc2396E"href="mailto:jean-paul.argudo@idealx.com"><jean-paul.argudo@idealx.com></a>
writes:<br/></pre><blockquote type="cite"><pre wrap="">EXPLAIN SELECT t12_bskid, t12_pnb, t12_lne, t12_tck<br />FROM
T12_20011231<br/>WHERE t12_bskid >= 1   <br />ORDER BY t12_bskid, t12_pnb, t12_tck, t12_lne;<br
/></pre></blockquote><prewrap=""><br /></pre><blockquote type="cite"><pre wrap="">Sort  (cost=3006.13..3006.13
rows=25693width=46)<br />  ->  Seq Scan on t12_20011231  (cost=0.00..1124.20 rows=25693 width=46)<br
/></pre></blockquote><prewrap=""></pre></blockquote> Try the following:<br /><br /> EXPLAIN ANALYZE SELECT t12_bskid,
t12_pnb,t12_lne, t12_tck FROM T12_20011231 WHERE t12_bskid >= 1  ORDER BY t12_bskid, t12_pnb, t12_tck, t12_lne;<br
/><prewrap=""></pre> and see what the actual results are.  Then turn the seq_scans off and do the same thing.<br
/><blockquotecite="mid22262.1014823120@sss.pgh.pa.us" type="cite"><pre wrap=""><br /></pre><blockquote type="cite"><pre
wrap="">=>Uh? Seq scan cheaper than index???  <br /></pre></blockquote><pre wrap=""><br />For that kind of query,
veryprobably.  How much of the table is<br />actually selected by "WHERE t12_bskid >= 1"?<br /><br />
regards,tom lane<br /><br />---------------------------(end of broadcast)---------------------------<br />TIP 2: you
canget off all lists at once with the unregister command<br />    (send "unregister YourEmailAddressHere" to <a
class="moz-txt-link-abbreviated"href="mailto:majordomo@postgresql.org">majordomo@postgresql.org</a>)<br
/></pre></blockquote><br/><br /> 

Re: Yet again on indices...

From
Hannu Krosing
Date:
On Wed, 2002-02-27 at 14:48, Jean-Paul ARGUDO wrote:
> Ok, 
> 
> I'm working on query analysis for a program in ecpg for business puposes. Look
> at what I found on with PG 7.2: Please be cool with my french2english processor,
> I got few bogomips in my brain dedicated to english (should have listen more in
> class..):
> ----
> 
> line 962 (in the ecpg source..)
> 
> EXPLAIN SELECT t12_bskid, t12_pnb, t12_lne, t12_tck
> FROM T12_20011231
> WHERE t12_bskid >= 1   
> ORDER BY t12_bskid, t12_pnb, t12_tck, t12_lne;
> 
...

> 
> 
> => Uh? Seq scan cheaper than index???  
> 
> => let's disable seqscan to read cost of index:
> postgresql.conf : enable_seqscan = false

You could just do 

set enable_seqscan to 'off'

in sql

> Sort  (cost=3126.79..3126.79 rows=25693 width=46)
>   ->  Index Scan using t12_idx_bskid_20011231 on t12_20011231
> (cost=0.00..1244.86 rows=25693 width=46)
> 
> => Uh? seq scan'cost is lower than index scan??  => mailto hackers

It often is. Really.

> ----
> 
> What's your opinion? 

What are the real performance numbers ?

If they are other than what postgresql optimiser thinks you can change
them in system table.

----------------
Hannu