Re: again on index usage - Mailing list pgsql-hackers

From Daniel Kalchev
Subject Re: again on index usage
Date
Msg-id 200201100939.LAA28676@dcave.digsys.bg
Whole thread Raw
In response to Re: again on index usage  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: again on index usage  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: again on index usage  ("Hiroshi Inoue" <Inoue@tpf.co.jp>)
List pgsql-hackers
>>>Tom Lane said:> Daniel Kalchev <daniel@digsys.bg> writes:> > In any case, if we need to hit 50k pages (assuming the
indexeddata is > > randomly scattered in the file), and having to read these three times each    , it > > will be less
I/Othan having to read 1.7 million records.> > How do you arrive at that?  Assuming 100 records per page (probably the>
rightorder of magnitude), the seqscan alternative is 17k page reads.> Yes, you examine more tuples, but CPUs are lots
fasterthan disks.
 

I tried this:

db=# select * into iplog_test from iplog_gate200112;
SELECT
db=# create index iplog_test_ipaddr_idx on iplog_test(ipaddr);
CREATE
db=# cluster iplog_test_ipaddr_idx on iplog_test;
CLUSTER
db=# create index iplog_test_ipdate_idx on iplog_test(ipdate);
CREATE
db=# vacuum verbose analyze iplog_test;
NOTICE:  --Relation iplog_test--
NOTICE:  Pages 17722: Changed 0, reaped 0, Empty 0, New 0; Tup 1706202: Vac 0, 
Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 80, MaxLen 88; Re-using: Free/Avail. 
Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 1.48s/-1.86u sec.
NOTICE:  Index iplog_test_ipaddr_idx: Pages 5621; Tuples 1706202. CPU 
0.51s/1.80u sec.
NOTICE:  Index iplog_test_ipdate_idx: Pages 4681; Tuples 1706202. CPU 
0.36s/1.92u sec.
NOTICE:  --Relation pg_toast_253297758--
NOTICE:  Pages 0: Changed 0, reaped 0, Empty 0, New 0; Tup 0: Vac 0, Keep/VTL 
0/0, Crash 0, UnUsed 0, MinLen 0, MaxLen 0; Re-using: Free/Avail. Space 0/0; 
EndEmpty/Avail. Pages 0/0. CPU 0.00s/0.00u sec.
NOTICE:  Index pg_toast_253297758_idx: Pages 1; Tuples 0. CPU 0.00s/0.00u sec.
NOTICE:  Analyzing...
VACUUM
db=# explain
db-# SELECT sum(input), sum(output) FROM iplog_test 
db-# WHERE 
db-# '2001-12-01 00:00:00+02' <= ipdate AND ipdate < '2001-12-02 00:00:00+02' 
AND
db-# '2001-12-01 00:00:00+02' <= ipdate AND ipdate < '2002-01-01 00:00:00+02' 
AND
db-# ipaddr <<= '193.68.240.0/20' AND 'uni-gw' ~ router;
NOTICE:  QUERY PLAN:

Aggregate  (cost=56112.97..56112.97 rows=1 width=16) ->  Seq Scan on iplog_test  (cost=0.00..56111.54 rows=284
width=16)

EXPLAIN

query runs for ca 3.5 minutes.

db=# set enable_seqscan to off;

the query plan is

Aggregate  (cost=100507.36..100507.36 rows=1 width=16) ->  Index Scan using iplog_test_ipdate_idx on iplog_test  
(cost=0.00..100505.94 rows=284 width=16)

query runs for ca 2.2 minutes.

Moves closer to your point :-)

Anyway, the platform is an dual Pentium III 550 MHz (intel) machine with 512 
MB RAM, with 15000 RPM Cheetah for the database, running BSD/OS 4.2. The 
machine is reasonably loaded all the time, so this is very much real-time test.

I agree, that with the 'wrong' clustering the index scan is not so much faster 
than the sequential scan.

Perhaps I need to tune this machine's costs to prefer more disk intensive 
operations over CPU intensive operations?

Let's see what 4.2 will result in.

Daniel



pgsql-hackers by date:

Previous
From: Justin Clift
Date:
Subject: Re: Increasing checkpoint distance helps 7.2 noticeably
Next
From: Michael Meskes
Date:
Subject: Re: ECPG: include sqlca